Query Dataframe
Query the metadata table of the most recently displayed papers.
This tool loads state['last_displayed_papers'] into a pandas DataFrame and uses an
LLM-driven DataFrame agent to execute metadata-level queries. It supports both
natural-language prompts (e.g., “list titles by author X”) and direct Python expressions
over the DataFrame.
Capabilities
- Filter, sort, and aggregate rows using metadata columns (e.g., Title, Authors, Venue, Year).
- Extract paper identifiers from a designated column (default: 'paper_ids'), optionally for a single row.
- Return the DataFrame agent’s textual result as a ToolMessage.
Requirements
- state['llm_model']: model used to instantiate the DataFrame agent.
- state['last_displayed_papers']: dictionary mapping row keys → metadata records.
Notes
- Operates strictly on the metadata table; it does not parse or read PDF content.
- When extract_ids=True, the tool constructs a Python expression for the agent to evaluate
and return identifiers from id_column. If row_number is provided (1-based), only that row’s
first identifier is returned; otherwise a list of first identifiers is returned, one from each row that has values.
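The extraction semantics in the last note can be illustrated in plain Python, without pandas or an LLM. This is only a sketch of the described behavior, not the tool's code: the helper name first_ids, the row keys, and all sample records except the identifier "arxiv:2301.12345" are hypothetical, and returning None for a row with no identifiers and raising ValueError for an out-of-range row are assumptions.

```python
from typing import Any, Dict, List, Optional, Union


def first_ids(
    records: Dict[str, Dict[str, Any]],
    id_column: str = "paper_ids",
    row_number: Optional[int] = None,
) -> Union[str, List[str], None]:
    """Mimic the documented extract_ids behavior on plain dict records."""
    rows = list(records.values())  # dicts preserve insertion order
    if row_number is not None:
        # row_number is 1-based, as in the documentation above.
        if not 1 <= row_number <= len(rows):
            raise ValueError(f"row_number must be between 1 and {len(rows)}")
        ids = rows[row_number - 1].get(id_column) or []
        return ids[0] if ids else None  # assumption: None when the row has no IDs
    # No row_number: first identifier from every row that has values.
    return [row[id_column][0] for row in rows if row.get(id_column)]


# Hypothetical stand-in for state['last_displayed_papers'].
papers = {
    "row_a": {"Title": "Paper A", "Year": 2024, "paper_ids": ["arxiv:2301.12345"]},
    "row_b": {"Title": "Paper B", "Year": 2022, "paper_ids": []},
    "row_c": {"Title": "Paper C", "Year": 2023, "paper_ids": ["arxiv:0000.00001", "doi:10.0000/x"]},
}

all_first = first_ids(papers)                 # skips row_b, which has no IDs
third_first = first_ids(papers, row_number=3)  # first ID of the third row only
```

In this sketch the row without identifiers is skipped in the list form, which matches "all rows that have values" in the note above.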
NoPapersFoundError
Bases: Exception
Exception raised when no papers are found in the state.
Source code in aiagents4pharma/talk2scholars/tools/s2/query_dataframe.py
QueryDataFrameInput
Bases: BaseModel
Input schema for querying the last displayed papers metadata DataFrame.
Fields
question (str): The query to execute. Accepts natural language (e.g., "List titles from 2024") or a Python expression over the DataFrame (e.g., "df['Title'].tolist()").
extract_ids (bool, default=False): When True, the tool prepares a Python expression for the DataFrame agent to extract identifiers from id_column. Use this to obtain IDs from the metadata table.
id_column (str, default="paper_ids"): Name of the column that contains per-row lists of identifiers (e.g., ["arxiv:2301.12345"]). Used only when extract_ids=True.
row_number (int | None, default=None): 1-based row index. When provided with extract_ids=True, returns only that row’s first identifier. When omitted, returns a list of first identifiers from each applicable row.
tool_call_id (InjectedToolCallId): Internal identifier for tracing the tool invocation.
state (dict): Agent state containing:
- 'last_displayed_papers': dict with the current results table (rows → metadata)
- 'llm_model': model object or reference for the DataFrame agent
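To make the field defaults concrete, the sketch below mirrors the model-supplied fields with a standard-library dataclass. QueryDataFrameInput itself is a pydantic BaseModel; the class name QueryArgs is hypothetical, the injected tool_call_id and state fields are left out, and both validation checks are assumptions about reasonable behavior rather than the schema's actual validators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryArgs:
    """Hypothetical stand-in for the model-supplied fields of QueryDataFrameInput."""

    question: str
    extract_ids: bool = False
    id_column: str = "paper_ids"
    row_number: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed check: ID extraction needs a column to read from.
        if self.extract_ids and not self.id_column:
            raise ValueError("id_column is required when extract_ids=True")
        # Assumed check: row_number is 1-based, so 0 and negatives are invalid.
        if self.row_number is not None and self.row_number < 1:
            raise ValueError("row_number is 1-based and must be >= 1")


# Natural-language query: the defaults leave ID extraction off.
nl_query = QueryArgs(question="List titles from 2024")

# ID extraction scoped to the first row of the table.
id_query = QueryArgs(
    question="Get the ID of the first paper",
    extract_ids=True,
    row_number=1,
)
```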
Source code in aiagents4pharma/talk2scholars/tools/s2/query_dataframe.py
query_dataframe(question, state, tool_call_id, **kwargs)
Execute a metadata query against the DataFrame built from last_displayed_papers.
Behavior
- Builds a pandas DataFrame from state['last_displayed_papers'].
- Instantiates a pandas DataFrame agent with state['llm_model'].
- Runs either:
• the provided question (natural-language prompt or Python expression), or
• a constructed Python expression when extract_ids=True (optionally scoped to row_number, 1-based).
- Returns the DataFrame agent’s output text in a ToolMessage.
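The choice between the two run modes can be sketched as a small function that decides what text is handed to the DataFrame agent. The function name build_agent_input and the exact pandas expressions are assumptions for illustration; this section does not show the expression the tool actually constructs, nor any prompt text it may wrap around it.

```python
from typing import Optional


def build_agent_input(
    question: str,
    extract_ids: bool = False,
    id_column: str = "paper_ids",
    row_number: Optional[int] = None,
) -> str:
    """Return the text to run through the DataFrame agent (illustrative only)."""
    if not extract_ids:
        # Default mode: pass the user's question through unchanged.
        return question
    if not id_column:
        raise ValueError("id_column is required when extract_ids=True")
    column = f"df[{id_column!r}]"
    if row_number is not None:
        if row_number < 1:
            raise ValueError("row_number is 1-based and must be >= 1")
        # Convert the 1-based row_number to a 0-based positional index.
        return f"{column}.iloc[{row_number - 1}][0]"
    # All rows: first identifier of every row whose list is non-empty.
    return (
        f"{column}.dropna()"
        ".apply(lambda ids: ids[0] if ids else None)"
        ".dropna().tolist()"
    )


single_row = build_agent_input("", extract_ids=True, row_number=1)
all_rows = build_agent_input("", extract_ids=True)
passthrough = build_agent_input("List titles where Year >= 2023")
```

Only strings are built here; nothing is evaluated against a DataFrame, so the sketch runs without pandas installed.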
Parameters
question (str):
Natural-language query or Python expression to run on the DataFrame.
state (dict):
Must provide 'llm_model' and 'last_displayed_papers'.
tool_call_id (str):
Internal identifier for the tool call.
**kwargs:
extract_ids (bool): Enable ID extraction from id_column.
id_column (str): Column containing lists of identifiers (default: "paper_ids").
row_number (int | None): 1-based index for a single-row extraction.
Returns
Command:
update = {
"messages": [
ToolMessage(
content=
Errors
- Raises ValueError if 'llm_model' is missing from state.
- Raises NoPapersFoundError if state['last_displayed_papers'] is missing or empty.
- Raises ValueError if a required argument for the chosen mode is invalid (e.g., no id_column when extract_ids=True).
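The state checks listed above might look roughly like the following. This is a minimal sketch: NoPapersFoundError is redefined locally as a stand-in for the real class, validate_state is a hypothetical helper name, and the order of the two checks is an assumption.

```python
from typing import Any, Dict


class NoPapersFoundError(Exception):
    """Local stand-in for the exception documented above."""


def validate_state(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return the papers table, or raise one of the documented errors."""
    if state.get("llm_model") is None:
        raise ValueError("Missing 'llm_model' in state.")
    papers = state.get("last_displayed_papers")
    if not papers:  # covers a missing key, None, and an empty dict
        raise NoPapersFoundError("No papers found in state.")
    return papers


# Hypothetical state: any non-None object stands in for the model here.
state = {
    "llm_model": object(),
    "last_displayed_papers": {"row_a": {"Title": "Paper A", "Year": 2024}},
}
papers = validate_state(state)
```

A caller can then distinguish a configuration problem (ValueError) from an empty results table (NoPapersFoundError) and respond to each differently.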
Examples
- Natural language: question="List titles where Year >= 2023"
- Python list of titles: question="df.query('Year >= 2023')['Title'].tolist()"
- Extract first ID from row 1: extract_ids=True, row_number=1
- Extract first IDs from all rows: extract_ids=True
Source code in aiagents4pharma/talk2scholars/tools/s2/query_dataframe.py