settings_tutorial
Setup
This tutorial is available as a Jupyter notebook here.
This tutorial aims to show how to use the Settings class to configure PaperQA. We will be using both OpenAI and Anthropic models, so we need to set the OPENAI_API_KEY and ANTHROPIC_API_KEY environment variables. Using both providers makes it clear when the PaperQA agent is using one model or the other. We use python-dotenv to load the environment variables from a .env file, so our first step is to create a .env file and install the required packages.
# fmt: off
# Create .env file with OpenAI API and Anthropic API keys
# Replace <your-openai-api-key> and <your-anthropic-api-key> with your actual API keys
!echo "OPENAI_API_KEY=<your-openai-api-key>" > .env # fmt: skip
!echo "ANTHROPIC_API_KEY=<your-anthropic-api-key>" >> .env # fmt: skip
!uv pip install -q nest-asyncio python-dotenv aiohttp fhlmi "paper-qa[local]"
# fmt: on

import os
import aiohttp
import nest_asyncio
from dotenv import load_dotenv
nest_asyncio.apply()
load_dotenv(".env")

We will use the lmi package to get the model names, and the .papers directory to save the documents we will use.
The Settings class is used to configure the PaperQA settings. Official documentation can be found here and the open source code can be found here.
Here is a basic example of how to use the Settings class. We will be unnecessarily verbose for the sake of clarity. Please notice that most of the settings are optional and the defaults are good for most cases. Refer to the descriptions of each setting for more information.
Within this Settings object, we will look specifically at how the LLMs are configured and at how PaperQA looks for papers.
A common source of confusion is that PaperQA uses multiple LLMs: llm, summary_llm, agent_llm, and embedding. Hence, if llm is set to an Anthropic model, summary_llm and agent_llm will still require an OPENAI_API_KEY, since OpenAI models are the default.
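To make the failure modes discussed below easy to reproduce, a quick stdlib-only check of which provider keys are set can help. This is a minimal sketch; the variable names match the keys written to .env above, and `available_providers` is a hypothetical helper, not part of PaperQA:

```python
import os

def available_providers() -> dict:
    """Report which provider API keys are set in the environment."""
    return {
        "openai": bool(os.environ.get("OPENAI_API_KEY")),
        "anthropic": bool(os.environ.get("ANTHROPIC_API_KEY")),
    }

# Example: simulate having only an Anthropic key configured.
os.environ.pop("OPENAI_API_KEY", None)
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-placeholder"
print(available_providers())  # → {'openai': False, 'anthropic': True}
```

Running this before and after editing .env makes it obvious which models the agent will actually be able to call.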
Among the settings that select models in PaperQA, we have llm, summary_llm, agent_llm, and embedding:

- llm: main LLM used by the agent to reason about the question, extract metadata from documents, etc.
- summary_llm: LLM used to summarize the papers.
- agent_llm: LLM used to answer questions and select tools.
- embedding: embedding model used to embed the papers.
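As a sketch of how these four knobs map onto the Settings object (assuming a PaperQA v5-style API, where agent_llm lives in the nested AgentSettings; exact field names may differ across versions, and the model names are illustrative):

```python
from paperqa import Settings
from paperqa.settings import AgentSettings

settings = Settings(
    llm="gpt-4o-2024-08-06",            # main LLM: reasoning, metadata extraction
    summary_llm="gpt-4o-2024-08-06",    # summarizes retrieved paper chunks
    agent=AgentSettings(
        agent_llm="gpt-4o-2024-08-06",  # answers questions and selects tools
    ),
    embedding="text-embedding-3-small", # embeds the papers for retrieval
)
```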
Let's see some examples around this concept. First, we define the settings with llm set to an OpenAI model. Note that this is not a complete list of settings, so take your time to read through the Settings class and all the customization it allows.
As is evident, PaperQA is highly customizable. We reiterate that despite this fine-grained customization, the defaults are good for most cases, though users are welcome to explore the settings and tailor PaperQA to their needs.
We also set settings.verbosity to 1, which will print the agent configuration. Feel free to set it to 0 to silence the logging after your first run.
That probably worked fine. Let's now remove OPENAI_API_KEY and run the same question with the same settings.
This fails: without a valid OPENAI_API_KEY, the agent cannot use OpenAI models. Let's change the settings to an Anthropic model and see if it works.
Now the agent uses Anthropic models only, and although we don't have a valid OPENAI_API_KEY, the question is answered because the agent never calls OpenAI models. Note that we also changed the embedding, because it defaulted to text-embedding-3-small, which is an OpenAI model. PaperQA implements a few embedding models; please refer to the documentation for more information.
In addition, notice that this is a very verbose example for the sake of clarity. We could have set only the LLM names and used the default settings for the rest:
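A compact version might look like this (a sketch assuming the same v5-style Settings API; the Claude model name and the local st- sentence-transformers embedding are illustrative choices, the latter enabled by the paper-qa[local] extra installed above):

```python
from paperqa import Settings

settings = Settings(
    llm="claude-3-5-sonnet-20240620",
    summary_llm="claude-3-5-sonnet-20240620",
    embedding="st-multi-qa-MiniLM-L6-cos-v1",  # local, non-OpenAI embedding
)
# agent_llm lives under the nested agent settings
settings.agent.agent_llm = "claude-3-5-sonnet-20240620"
```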
The output
PaperQA returns a PQASession object, which contains not only the answer but also all the information gathered to answer the question. We recommend printing the PQASession object (print(response.session)) to understand the information it contains. Let's check the PQASession object:
In addition to the answer, the PQASession object contains all the references and contexts used to generate the answer.
Because PaperQA splits the documents into chunks, each chunk is a valid reference. You can see that it also references the page where each context was found.
Lastly, PQASession.contexts contains the contexts used to generate the answer. Each context has a score, which is the similarity between the question and the context. PaperQA uses this score to decide which contexts are most relevant to answering the question.
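To make the scoring concrete, here is a stdlib-only sketch of ranking contexts by score. It uses a stand-in Context class rather than the real PaperQA model; only the attribute names (context, score) mirror the session object described above:

```python
from dataclasses import dataclass

@dataclass
class Context:  # stand-in for PaperQA's Context model
    context: str
    score: int

contexts = [
    Context("Chunk about unrelated background.", 2),
    Context("Chunk that directly answers the question.", 9),
    Context("Partially relevant chunk.", 5),
]

# Rank contexts the way PaperQA does conceptually: highest score first.
ranked = sorted(contexts, key=lambda c: c.score, reverse=True)
print([c.score for c in ranked])  # → [9, 5, 2]
```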