Question and answer
question_and_answer: Tool for performing Q&A on PDF documents using retrieval augmented generation.
This module provides functionality to extract text from PDF binary data, split it into chunks, retrieve relevant segments via a vector store, and generate an answer to a user-provided question using a language model chain.
QuestionAndAnswerInput
Bases: BaseModel
Input schema for the PDF Question and Answer tool.
Attributes:
Name | Type | Description |
---|---|---|
question | str | The question to ask regarding the PDF content. |
tool_call_id | str | Unique identifier for the tool call, injected automatically. |
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
extract_text_from_pdf_data(pdf_bytes)
Extract text content from PDF binary data.
This function uses PyPDF2 to read the provided PDF bytes and concatenates the text extracted from each page.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pdf_bytes | bytes | The binary data of the PDF document. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The complete text extracted from the PDF. |
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
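The page-by-page extraction described above can be sketched as follows. This is a minimal illustration, not the module's actual source: the names `join_page_text` and `extract_text_sketch` are hypothetical, and the sketch assumes a PyPDF2 release that exposes `PdfReader` and `page.extract_text()` (2.x or later).

```python
import io
from typing import Iterable


def join_page_text(pages: Iterable) -> str:
    """Concatenate the text of each page; pages with no text contribute ''."""
    # extract_text() may return None or '' for image-only pages.
    return "".join(page.extract_text() or "" for page in pages)


def extract_text_sketch(pdf_bytes: bytes) -> str:
    """Hypothetical stand-in for extract_text_from_pdf_data."""
    # Imported lazily so the helper above can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    # PdfReader expects a file-like object, so wrap the raw bytes.
    reader = PdfReader(io.BytesIO(pdf_bytes))
    return join_page_text(reader.pages)
```

Splitting the concatenation into its own helper keeps that step easy to exercise with stub page objects, independent of any real PDF.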
generate_answer(question, pdf_bytes, llm_model)
Generate an answer for a question using retrieval augmented generation on PDF content.
This function extracts text from the PDF data, splits the text into manageable chunks, performs a similarity search to retrieve the most relevant segments, and then uses a question-answering chain (built using the provided llm_model) to generate an answer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
question | str | The question to be answered. | required |
pdf_bytes | bytes | The binary content of the PDF document. | required |
llm_model | BaseChatModel | The language model instance to use for answering. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing the answer generated by the language model. |
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
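To make the chunk-and-retrieve steps concrete, here is a dependency-free sketch. It is an assumption-laden stand-in, not the function's implementation: the real code uses embeddings and a vector store, whereas this example scores chunks with a toy bag-of-words cosine similarity. The names `split_into_chunks` and `retrieve`, the chunk size, and the overlap are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def _vectorize(text: str) -> Counter:
    # Toy "embedding": lowercase word counts (assumption, not a real embedding).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = _vectorize(question)
    ranked = sorted(chunks, key=lambda c: _cosine(q, _vectorize(c)), reverse=True)
    return ranked[:k]
```

In the actual pipeline, the retrieved chunks would then be passed, together with the question, to the question-answering chain built from `llm_model`; that step is omitted here because it depends on the language model.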
generate_answer2(question, pdf_url, text_embedding_model)
Generate an answer for a question using retrieval augmented generation on PDF content.
This function extracts text from the PDF data, splits the text into manageable chunks, performs a similarity search to retrieve the most relevant segments, and then uses a question-answering chain (built using the provided llm_model) to generate an answer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
question | str | The question to be answered. | required |
pdf_bytes | bytes | The binary content of the PDF document. | required |
llm_model | BaseChatModel | The language model instance to use for answering. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing the answer generated by the language model. |
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
question_and_answer_tool(question, tool_call_id, state)
Answer a question using PDF content stored in the state via retrieval augmented generation.
This tool retrieves the PDF binary data from the state (under the key "pdf_data"), extracts its textual content, and generates an answer to the specified question. It also extracts the llm_model (of type BaseChatModel) from the state to use for answering.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
question | str | The question regarding the PDF content. | required |
tool_call_id | str | Unique identifier for the current tool call. | required |
state | dict | A dictionary representing the current state, expected to contain PDF data under the key "pdf_data" with a sub-key "pdf_object" for the binary content, and a key "llm_model" holding the language model instance. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary containing the generated answer or an error message. |
Source code in aiagents4pharma/talk2scholars/tools/pdf/question_and_answer.py
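The expected layout of `state` can be illustrated with a small lookup helper. This is only a sketch under stated assumptions: `read_tool_inputs` is a hypothetical name, and the error messages and the choice to raise `ValueError` are illustrative rather than taken from the tool's source. Only the keys `"pdf_data"`, `"pdf_object"`, and `"llm_model"` come from the description above.

```python
from typing import Any, Dict, Tuple


def read_tool_inputs(state: Dict[str, Any]) -> Tuple[bytes, Any]:
    """Pull the PDF bytes and the language model out of the state dictionary."""
    pdf_data = state.get("pdf_data")
    if not pdf_data:
        raise ValueError("state has no 'pdf_data' entry")

    # The binary content lives under the 'pdf_object' sub-key.
    pdf_bytes = pdf_data.get("pdf_object")
    if not pdf_bytes:
        raise ValueError("'pdf_data' has no 'pdf_object' entry")

    llm_model = state.get("llm_model")
    if llm_model is None:
        raise ValueError("state has no 'llm_model' entry")

    return pdf_bytes, llm_model


# Example state; object() stands in for a real BaseChatModel instance.
example_state = {"pdf_data": {"pdf_object": b"%PDF-1.4"}, "llm_model": object()}
pdf_bytes, llm_model = read_tool_inputs(example_state)
```

A caller could catch the `ValueError` and turn it into the error-message dictionary mentioned in the Returns table.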