Skip to content

Retrieve Semantic Scholar ID

Resolve a paper title to a Semantic Scholar paperId.

This module provides a tool that queries the Semantic Scholar API for the best match to a given paper title (full or partial) and returns the corresponding paperId string. Configuration is loaded via Hydra and the top ranked result is returned.

RetrieveSemanticScholarPaperIdInput

Bases: BaseModel

Input schema for title→paperId resolution.

Fields

paper_title : str Paper title to search. Accepts full titles or informative partial titles. tool_call_id : InjectedToolCallId Runtime-injected identifier for tracing the tool invocation.

Source code in aiagents4pharma/talk2scholars/tools/s2/retrieve_semantic_scholar_paper_id.py
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
class RetrieveSemanticScholarPaperIdInput(BaseModel):
    """
    Input schema for title→paperId resolution.

    Fields
    -------
    paper_title : str
        Paper title to search. Accepts full titles or informative partial titles.
    tool_call_id : InjectedToolCallId
        Runtime-injected identifier for tracing the tool invocation.
    """

    paper_title: str = Field(
        ..., description="The paper title to search for on Semantic Scholar."
    )
    tool_call_id: Annotated[str, InjectedToolCallId]

retrieve_semantic_scholar_paper_id(paper_title, tool_call_id)

Look up a Semantic Scholar paperId from a paper title.

Behavior

  • Loads Hydra config from tools.retrieve_semantic_scholar_paper_id.
  • Sends a search request with query=<paper_title>, limit=1, and requested fields.
  • Parses the top hit and returns its paperId as the ToolMessage content (plain string).

Parameters

paper_title : str Title or informative partial title to resolve. tool_call_id : str Runtime-injected identifier for the tool call.

Returns

Command update = { "messages": [ ToolMessage( content="", # Semantic Scholar paperId string tool_call_id= ) ] }

Exceptions

ValueError Raised when no match is found for the provided title. requests.RequestException Raised on network/HTTP errors (timeout, connection issues, etc.).

Examples

retrieve_semantic_scholar_paper_id("Attention Is All You Need", "tc_123")

Source code in aiagents4pharma/talk2scholars/tools/s2/retrieve_semantic_scholar_paper_id.py
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
@tool(
    "retrieve_semantic_scholar_paper_id",
    args_schema=RetrieveSemanticScholarPaperIdInput,
    parse_docstring=True,
)
def retrieve_semantic_scholar_paper_id(
    paper_title: str,
    tool_call_id: str,
) -> Command[Any]:
    """
    Look up a Semantic Scholar paperId from a paper title.

    Behavior
    --------
    - Loads Hydra config from `tools.retrieve_semantic_scholar_paper_id`.
    - Sends a search request with `query=<paper_title>`, `limit=1`, and requested fields.
    - Parses the top hit and returns its `paperId` as the ToolMessage content (plain string).

    Parameters
    ----------
    paper_title : str
        Title or informative partial title to resolve.
    tool_call_id : str
        Runtime-injected identifier for the tool call.

    Returns
    -------
    Command
        update = {
          "messages": [
            ToolMessage(
              content="<paperId>",  # Semantic Scholar paperId string
              tool_call_id=<tool_call_id>
            )
          ]
        }

    Exceptions
    ----------
    ValueError
        Raised when no match is found for the provided title.
    requests.RequestException
        Raised on network/HTTP errors (timeout, connection issues, etc.).

    Examples
    --------
    >>> retrieve_semantic_scholar_paper_id("Attention Is All You Need", "tc_123")
    """
    # Load hydra configuration
    with hydra.initialize(version_base=None, config_path="../../configs"):
        cfg = hydra.compose(
            config_name="config",
            overrides=["tools/retrieve_semantic_scholar_paper_id=default"],
        )
        cfg = cfg.tools.retrieve_semantic_scholar_paper_id
        logger.info("Loaded configuration for Semantic Scholar paper ID retrieval tool")
    logger.info("Retrieving ID of paper with title: %s", paper_title)
    endpoint = cfg.api_endpoint
    params = {
        "query": paper_title,
        "limit": 1,
        "fields": ",".join(cfg.api_fields),
    }

    response = requests.get(endpoint, params=params, timeout=10)
    data = response.json()
    papers = data.get("data", [])
    logger.info("Received %d papers", len(papers))
    if not papers:
        logger.error("No papers found for query: %s", paper_title)
        raise ValueError(f"No papers found for query: {paper_title}. Try again.")
    # Extract the paper ID from the top result
    paper_id = papers[0]["paperId"]
    logger.info("Found paper ID: %s", paper_id)
    # Prepare the response content (just the ID)
    response_text = paper_id
    return Command(
        update={
            "messages": [
                ToolMessage(
                    content=response_text,
                    tool_call_id=tool_call_id,
                )
            ],
        }
    )