Question and answer

LangGraph PDF Retrieval-Augmented Generation (RAG) Tool

This tool answers user questions using the traditional RAG pipeline:

1. Retrieve relevant chunks from ALL papers in the vector store
2. Rerank the chunks using the NVIDIA NIM reranker to find the most relevant ones
3. Generate an answer using the top reranked chunks

Traditional RAG Pipeline Flow

Query → Retrieve chunks from ALL papers → Rerank chunks → Generate answer

This ensures the best possible chunks are selected across all available papers, not just from pre-selected papers.
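
To make the flow concrete, here is a minimal, self-contained sketch of the three stages. The scoring callables are trivial stand-ins for the real embedding-based retriever and the NVIDIA NIM reranker, and none of these names come from the package itself; the function returns the top chunks that would be handed to the LLM.

from typing import Callable

def traditional_rag(
    question: str,
    chunks: list[str],
    retrieve_score: Callable[[str, str], float],  # stand-in for vector similarity
    rerank_score: Callable[[str, str], float],    # stand-in for the NIM reranker
    top_k: int = 25,
    top_n: int = 5,
) -> list[str]:
    # 1. Retrieve: coarse similarity search over chunks from ALL papers
    candidates = sorted(
        chunks, key=lambda c: retrieve_score(question, c), reverse=True
    )[:top_k]
    # 2. Rerank: rescore candidates with the slower, more accurate reranker
    reranked = sorted(
        candidates, key=lambda c: rerank_score(question, c), reverse=True
    )
    # 3. The top reranked chunks are what the LLM sees when generating the answer
    return reranked[:top_n]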

QuestionAndAnswerInput

Bases: BaseModel

Pydantic schema for the PDF Q&A tool inputs.

Fields

question: User's free-text query to answer based on PDF content.
tool_call_id: LangGraph-injected call identifier for tracking.
state: Shared agent state dict containing:
  - article_data: metadata mapping of paper IDs to info (e.g., 'pdf_url', title).
  - text_embedding_model: embedding model instance for chunk indexing.
  - llm_model: chat/LLM instance for answer generation.

Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
class QuestionAndAnswerInput(BaseModel):
    """
    Pydantic schema for the PDF Q&A tool inputs.

    Fields:
      question: User's free-text query to answer based on PDF content.
      tool_call_id: LangGraph-injected call identifier for tracking.
      state: Shared agent state dict containing:
        - article_data: metadata mapping of paper IDs to info (e.g., 'pdf_url', title).
        - text_embedding_model: embedding model instance for chunk indexing.
        - llm_model: chat/LLM instance for answer generation.
    """

    question: str = Field(
        description="User question for generating a PDF-based answer."
    )
    tool_call_id: Annotated[str, InjectedToolCallId]
    state: Annotated[dict, InjectedState]
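
In normal operation LangGraph injects tool_call_id and state, so callers only supply question. A hand-built instance like the one below is useful only to illustrate the expected shape; every concrete value here is a placeholder.

payload = QuestionAndAnswerInput(
    question="What methods does the paper use?",
    tool_call_id="call_001",  # injected by LangGraph at runtime
    state={
        "article_data": {
            "paper_1": {"pdf_url": "https://example.org/paper_1.pdf", "title": "Example Paper"},
        },
        "text_embedding_model": embedding_model,  # placeholder embeddings instance
        "llm_model": chat_model,                  # placeholder chat/LLM instance
    },
)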

question_and_answer(question, state, tool_call_id)

LangGraph tool for Retrieval-Augmented Generation over PDFs using the traditional RAG pipeline.

Traditional RAG Pipeline Implementation
  1. Load ALL available PDFs into Milvus vector store (if not already loaded)
  2. Retrieve relevant chunks from ALL papers using vector similarity search
  3. Rerank retrieved chunks using NVIDIA NIM semantic reranker
  4. Generate answer using top reranked chunks with source attribution

This approach ensures the best chunks are selected across all available papers, rather than pre-selecting papers and potentially missing relevant information.

Parameters:

question (str, required):
    The free-text question to answer.
state (dict, required):
    Injected agent state; must include:
      - article_data: mapping paper IDs → metadata (pdf_url, title, etc.)
      - text_embedding_model: embedding model instance.
      - llm_model: chat/LLM instance.
tool_call_id (str, required):
    Internal identifier for this tool invocation.

Returns:

Command[Any]:
    Updates the conversation state with a ToolMessage containing the answer.

Raises:

ValueError:
    When required models or metadata are missing in state.
RuntimeError:
    When no relevant chunks can be retrieved for the query.
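
Because the tool relies on InjectedState and InjectedToolCallId, it is normally executed through LangGraph's prebuilt ToolNode, which fills in those arguments automatically so the LLM only has to supply question. A minimal wiring sketch (the chat model llm is a placeholder):

from langgraph.prebuilt import ToolNode

# ToolNode resolves the injected state and tool-call id at runtime
tool_node = ToolNode([question_and_answer])
llm_with_tools = llm.bind_tools([question_and_answer])  # `llm`: placeholder chat model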

Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
@tool(args_schema=QuestionAndAnswerInput, parse_docstring=True)
def question_and_answer(
    question: str,
    state: Annotated[dict, InjectedState],
    tool_call_id: Annotated[str, InjectedToolCallId],
) -> Command[Any]:
    """
    LangGraph tool for Retrieval-Augmented Generation over PDFs using the traditional RAG pipeline.

    Traditional RAG Pipeline Implementation:
      1. Load ALL available PDFs into Milvus vector store (if not already loaded)
      2. Retrieve relevant chunks from ALL papers using vector similarity search
      3. Rerank retrieved chunks using NVIDIA NIM semantic reranker
      4. Generate answer using top reranked chunks with source attribution

    This approach ensures the best chunks are selected across all available papers,
    rather than pre-selecting papers and potentially missing relevant information.

    Args:
      question (str): The free-text question to answer.
      state (dict): Injected agent state; must include:
        - article_data: mapping paper IDs → metadata (pdf_url, title, etc.)
        - text_embedding_model: embedding model instance.
        - llm_model: chat/LLM instance.
      tool_call_id (str): Internal identifier for this tool invocation.

    Returns:
      Command[Any]: updates conversation state with a ToolMessage(answer).

    Raises:
      ValueError: when required models or metadata are missing in state.
      RuntimeError: when no relevant chunks can be retrieved for the query.
    """
    call_id = f"qa_call_{time.time()}"
    logger.info(
        "Starting PDF Question and Answer tool (Traditional RAG Pipeline) - Call %s",
        call_id,
    )
    logger.info("%s: Question: '%s'", call_id, question)

    helper.start_call(config, call_id)

    # Extract models and article metadata
    text_emb, llm_model, article_data = helper.get_state_models_and_data(state)

    # Initialize or reuse Milvus vector store
    logger.info("%s: Initializing vector store", call_id)
    vs = helper.init_vector_store(text_emb)

    # Load ALL papers (traditional RAG approach)
    logger.info(
        "%s: Loading all %d papers into vector store (traditional RAG approach)",
        call_id,
        len(article_data),
    )
    load_all_papers(
        vector_store=vs,
        articles=article_data,
        call_id=call_id,
        config=config,
        has_gpu=helper.has_gpu,
    )

    # Traditional RAG Pipeline: Retrieve from ALL papers, then rerank
    logger.info(
        "%s: Starting traditional RAG pipeline: retrieve → rerank → generate",
        call_id,
    )

    # Retrieve and rerank chunks in one step
    reranked_chunks = retrieve_and_rerank_chunks(
        vs, question, config, call_id, helper.has_gpu
    )

    if not reranked_chunks:
        msg = f"No relevant chunks found for question: '{question}'"
        logger.warning("%s: %s", call_id, msg)
        # Raise per the documented contract: no retrievable chunks is a hard failure
        raise RuntimeError(msg)

    # Generate answer using reranked chunks
    logger.info(
        "%s: Generating answer using %d reranked chunks",
        call_id,
        len(reranked_chunks),
    )
    response_text = format_answer(
        question,
        reranked_chunks,
        llm_model,
        article_data,
        config,
        call_id=call_id,
        has_gpu=helper.has_gpu,
    )

    logger.info(
        "%s: Successfully traditional completed RAG pipeline",
        call_id,
    )

    return Command(
        update={
            "messages": [
                ToolMessage(
                    content=response_text,
                    tool_call_id=tool_call_id,
                )
            ],
        }
    )
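
For reference, the returned Command is only a declarative state update: when LangGraph applies it, the ToolMessage is appended to the shared messages list. A small illustration of reading the answer back out of such a Command (values are placeholders):

from langchain_core.messages import ToolMessage
from langgraph.types import Command

cmd = Command(
    update={
        "messages": [ToolMessage(content="example answer", tool_call_id="call_001")],
    }
)
answer = cmd.update["messages"][-1].content  # "example answer"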