Measuring PaperQA2 with LFRQA
Overview

This tutorial measures PaperQA2's performance on LFRQA, a long-form question-answering benchmark from the RAG-QA Arena project. LFRQA's science split pairs human-written long-form answers, with citations, against the Robust-QA document collection, so we need to download both the annotations and the underlying documents.
Download the Annotations
```python
# Create a new directory for the dataset
!mkdir -p data/rag-qa-benchmarking

# Get the annotated questions
!curl https://raw.githubusercontent.com/awslabs/rag-qa-arena/refs/heads/main/data/\
annotations_science_with_citation.jsonl \
    -o data/rag-qa-benchmarking/annotations_science_with_citation.jsonl
```

Download the Robust-QA Documents
```python
# Download the Lotte dataset, which includes the required documents
!curl https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/lotte.tar.gz --output lotte.tar.gz

# Extract the dataset
!tar -xvzf lotte.tar.gz

# Move the science test collection to our dataset folder
!cp lotte/science/test/collection.tsv ./data/rag-qa-benchmarking/science_test_collection.tsv

# Clean up unnecessary files
!rm lotte.tar.gz
!rm -rf lotte
```

Load the Data
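With both downloads in place, we can load the corpus and the annotations. A minimal sketch with pandas, assuming the Lotte collection is a headerless TSV of `(doc_id, text)` pairs and the annotations are JSON Lines (the column names here are our own):

```python
import pandas as pd

def load_data(collection_path: str, annotations_path: str):
    """Load the Robust-QA passages and the LFRQA annotations."""
    # The Lotte collection is a headerless TSV; the column names are ours
    docs = pd.read_csv(collection_path, sep="\t", names=["doc_id", "text"])
    # The annotations file holds one JSON question record per line
    annotations = pd.read_json(annotations_path, lines=True)
    return docs, annotations
```

For this dataset, `collection_path` would be `data/rag-qa-benchmarking/science_test_collection.tsv` and `annotations_path` the JSONL downloaded above.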
Select the Documents to Use
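The full science collection is far larger than we need for a benchmark run, so we work with a subset. One simple, reproducible approach (an assumption of this sketch, not a requirement of LFRQA) is to keep a fixed prefix of the collection:

```python
import pandas as pd

def select_documents(docs: pd.DataFrame, n_docs: int) -> pd.DataFrame:
    """Keep the first n_docs passages from the collection."""
    # A contiguous prefix keeps the subset reproducible across runs;
    # raise n_docs to trade indexing cost for benchmark coverage
    return docs.iloc[:n_docs].copy()
```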
Prepare the Document Files
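paper-qa indexes a directory of files, so each selected passage goes into its own plain-text file. A sketch, naming each file after its `doc_id` (a convention we choose here):

```python
import os

import pandas as pd

def write_document_files(docs: pd.DataFrame, out_dir: str) -> None:
    """Write each selected passage to its own text file for indexing."""
    os.makedirs(out_dir, exist_ok=True)
    for row in docs.itertuples():
        # One plain-text file per passage, named after its doc_id
        path = os.path.join(out_dir, f"{row.doc_id}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(str(row.text))
```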
Create the Manifest File
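A manifest tells paper-qa about each file's metadata up front, so it can skip metadata lookups during indexing. The sketch below assumes the `file_location`/`doi`/`title` column convention from paper-qa's manifest support; check the version you have installed:

```python
import pandas as pd

def build_manifest(docs: pd.DataFrame, manifest_path: str) -> pd.DataFrame:
    """Build a paper-qa manifest CSV for the selected passages."""
    # file_location/doi/title follow paper-qa's manifest convention
    # (assumed here); Robust-QA passages have no DOIs, so we leave it blank
    manifest = pd.DataFrame({
        "file_location": docs["doc_id"].astype(str) + ".txt",
        "doi": "",
        "title": docs["doc_id"].astype(str),
    })
    manifest.to_csv(manifest_path, index=False)
    return manifest
```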
Filter and Save Questions
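Since we indexed only a subset of the corpus, we should keep only the questions whose gold passages fall inside that subset; otherwise PaperQA2 is graded on documents it never saw. A sketch, assuming the annotations carry a `gold_doc_ids` field (inspect the JSONL to confirm the actual key name):

```python
import json

def filter_questions(annotations_path: str, out_path: str, selected_doc_ids) -> int:
    """Keep only questions whose gold passages are in the selected subset."""
    selected = {str(d) for d in selected_doc_ids}
    kept = []
    with open(annotations_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # "gold_doc_ids" is an assumed field name; check your copy
            # of the annotations for the key it actually uses
            gold = [str(d) for d in record.get("gold_doc_ids", [])]
            if gold and all(d in selected for d in gold):
                kept.append(record)
    with open(out_path, "w", encoding="utf-8") as f:
        for record in kept:
            f.write(json.dumps(record) + "\n")
    return len(kept)
```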
Install paperqa
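The package is published on PyPI as `paper-qa` (imported as `paperqa`); in a notebook cell, prefix the command with `!`:

```shell
pip install paper-qa
```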
Index the Documents
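With the files and manifest in place, paper-qa can build its search index. This is a sketch only, assuming paper-qa v5's `Settings`/`get_directory_index` API and the directory and manifest paths chosen in the earlier steps; indexing also needs a configured embedding/LLM provider (e.g. an `OPENAI_API_KEY`):

```python
import asyncio

from paperqa.agents.search import get_directory_index
from paperqa.settings import IndexSettings, Settings

async def build_index() -> None:
    # Point paper-qa at the document files and manifest created above.
    # The API names here assume paper-qa v5; the paths are the ones we
    # chose in earlier steps, not fixed requirements.
    settings = Settings()
    settings.agent.index = IndexSettings(
        name="lfrqa-science",
        paper_directory="data/rag-qa-benchmarking/files",
        manifest_file="data/rag-qa-benchmarking/manifest.csv",
    )
    await get_directory_index(settings=settings)

# In a script: asyncio.run(build_index())
```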
Benchmark!
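Finally, run every filtered question through the system and collect predictions alongside the annotated answers. The loop below takes the answering function as a parameter (`ask_fn`), which in practice would wrap a paper-qa query against the index built above; the `question`/`answer` field names are assumed from the annotations. Grading, which in LFRQA is done by an LLM judge comparing predictions against the human long-form answers, is a separate step:

```python
import json

def run_benchmark(questions_path: str, ask_fn) -> list[dict]:
    """Run every filtered LFRQA question through an answering function.

    ask_fn(question: str) -> str; in practice this wraps a paper-qa query.
    """
    results = []
    with open(questions_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            results.append({
                "question": record["question"],
                "predicted_answer": ask_fn(record["question"]),
                "gold_answer": record.get("answer", ""),
            })
    return results
```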