Question and answer
LangGraph PDF Retrieval-Augmented Generation (RAG) Tool
This tool answers user questions using the traditional RAG pipeline:

1. Retrieve relevant chunks from ALL papers in the vector store.
2. Rerank the chunks with the NVIDIA NIM reranker to find the most relevant ones.
3. Generate an answer from the top reranked chunks.
Traditional RAG Pipeline Flow
Query → Retrieve chunks from ALL papers → Rerank chunks → Generate answer
This ensures the best possible chunks are selected across all available papers, not just from pre-selected papers.
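To make the flow concrete, here is a minimal sketch of the three stages. The `retrieve`, `rerank`, and `generate` callables stand in for the tool's real helpers, which are not shown on this page:

```python
from typing import Callable, List

from langchain_core.documents import Document


def run_rag_pipeline(
    question: str,
    retrieve: Callable[[str], List[Document]],
    rerank: Callable[[str, List[Document]], List[Document]],
    generate: Callable[[str, List[Document]], str],
) -> str:
    """Run the three pipeline stages in order and return the final answer."""
    chunks = retrieve(question)            # 1. retrieve from ALL papers
    top_chunks = rerank(question, chunks)  # 2. NVIDIA NIM semantic rerank
    return generate(question, top_chunks)  # 3. answer from the top chunks
```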
QuestionAndAnswerInput
Bases: BaseModel
Pydantic schema for the PDF Q&A tool inputs.
Fields
- question: User's free-text query to answer based on PDF content.
- tool_call_id: LangGraph-injected call identifier for tracking.
- state: Shared agent state dict containing:
    - article_data: metadata mapping of paper IDs to info (e.g., pdf_url, title).
    - text_embedding_model: embedding model instance for chunk indexing.
    - llm_model: chat/LLM instance for answer generation.
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
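The collapsed source listing did not survive extraction. As a stand-in, here is a hedged sketch of how such a schema is typically declared with LangGraph's injected arguments; it mirrors the documented fields but is not the file's actual source:

```python
from typing import Annotated, Any

from langchain_core.tools import InjectedToolCallId
from langgraph.prebuilt import InjectedState
from pydantic import BaseModel, Field


class QuestionAndAnswerInput(BaseModel):
    """Pydantic schema for the PDF Q&A tool inputs."""

    question: str = Field(
        description="User's free-text query to answer based on PDF content."
    )
    tool_call_id: Annotated[str, InjectedToolCallId]
    state: Annotated[dict[str, Any], InjectedState]
```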
question_and_answer(question, state, tool_call_id)
LangGraph tool for Retrieval-Augmented Generation over PDFs using the traditional RAG pipeline.
Traditional RAG Pipeline Implementation
- Load ALL available PDFs into the Milvus vector store (if not already loaded)
- Retrieve relevant chunks from ALL papers using vector similarity search
- Rerank the retrieved chunks using the NVIDIA NIM semantic reranker
- Generate an answer from the top reranked chunks, with source attribution
This approach ensures the best chunks are selected across all available papers, rather than pre-selecting papers and potentially missing relevant information.
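A hedged sketch of the retrieve-then-rerank core, assuming a LangChain-compatible vector store (e.g., Milvus) and the `NVIDIARerank` compressor from `langchain-nvidia-ai-endpoints`; the model name and chunk counts are illustrative defaults, not the tool's actual configuration:

```python
from typing import List

from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStore
from langchain_nvidia_ai_endpoints import NVIDIARerank


def retrieve_and_rerank(
    store: VectorStore, question: str, fetch_k: int = 25, top_n: int = 5
) -> List[Document]:
    """Fetch candidate chunks from ALL papers, then keep the best after reranking."""
    candidates = store.similarity_search(question, k=fetch_k)
    reranker = NVIDIARerank(model="nvidia/llama-3.2-nv-rerankqa-1b-v2", top_n=top_n)
    return list(reranker.compress_documents(documents=candidates, query=question))
```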
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `question` | `str` | The free-text question to answer. | *required* |
| `state` | `dict` | Injected agent state; must include: `article_data` (mapping of paper IDs to metadata such as `pdf_url` and title), `text_embedding_model` (embedding model instance), and `llm_model` (chat/LLM instance). | *required* |
| `tool_call_id` | `str` | Internal identifier for this tool invocation. | *required* |
Returns:
| Type | Description |
| --- | --- |
| `Command[Any]` | Updates the conversation state with a `ToolMessage` containing the answer. |
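For readers unfamiliar with the return type, this is roughly how such a `Command` is built; the `messages` state key follows the common LangGraph agent convention and is an assumption here:

```python
from typing import Any

from langchain_core.messages import ToolMessage
from langgraph.types import Command


def build_reply(answer: str, tool_call_id: str) -> Command[Any]:
    """Wrap the generated answer in a Command that appends a ToolMessage."""
    return Command(
        update={"messages": [ToolMessage(content=answer, tool_call_id=tool_call_id)]}
    )
```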
Raises:
| Type | Description |
| --- | --- |
| `ValueError` | Raised when required models or metadata are missing from the state. |
| `RuntimeError` | Raised when no relevant chunks can be retrieved for the query. |
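A minimal sketch of the fail-fast validation implied by the `ValueError` row; the key names come from the documented state fields, while the helper itself is illustrative:

```python
from typing import Any


def validate_state(state: dict[str, Any]) -> None:
    """Raise ValueError when required models or metadata are missing from state."""
    for key in ("article_data", "text_embedding_model", "llm_model"):
        if not state.get(key):
            raise ValueError(f"Missing required state entry: {key!r}")
```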
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py