Paper Download Agent

This module defines the paper download agent, which fetches paper details and PDFs from preprint servers (arXiv, BioRxiv, and MedRxiv). It is part of the Talk2Scholars project.

get_app(uniq_id, llm_model)

Initializes and returns the LangGraph application for the Talk2Scholars paper download agent.

This agent supports downloading scientific papers from multiple preprint servers, including arXiv, BioRxiv, and MedRxiv. It can intelligently handle user queries by extracting or resolving necessary identifiers (e.g., arXiv ID or DOI) from the paper title and routing the request to the appropriate download tool.
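
To make the routing idea concrete, here is a minimal sketch of how an identifier could be classified before a download tool is chosen. It is an illustration only, not the agent's actual logic: in the agent itself the language model selects the tool. The function name `classify_identifier`, the return labels, and both regular expressions are assumptions introduced for this example. Because bioRxiv and medRxiv share the `10.1101` DOI prefix, the sketch does not try to tell them apart.

```python
import re

# Assumed patterns (illustrative, not taken from the Talk2Scholars source).
# New-style arXiv IDs look like 2301.12345 or 2301.12345v2.
ARXIV_ID = re.compile(r"^(?:arxiv:)?\d{4}\.\d{4,5}(?:v\d+)?$", re.IGNORECASE)
# bioRxiv and medRxiv both register DOIs under the 10.1101 prefix.
PREPRINT_DOI = re.compile(r"^10\.1101/\S+$")


def classify_identifier(text: str) -> str:
    """Return 'arxiv', 'biorxiv_or_medrxiv', or 'needs_resolution'."""
    candidate = text.strip()
    if ARXIV_ID.match(candidate):
        return "arxiv"
    if PREPRINT_DOI.match(candidate):
        return "biorxiv_or_medrxiv"
    # Anything else (for example a paper title) has to be resolved first.
    return "needs_resolution"
```

A title such as "Attention Is All You Need" falls into the last branch, which corresponds to the case where the identifier must first be resolved from the paper title.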

Parameters:

Name        Type            Description                                              Default
uniq_id     str             A unique identifier for tracking the current session.    required
llm_model   BaseChatModel   The language model to be used by the agent.              required

Returns:

Name Type Description
StateGraph

A compiled LangGraph application that enables the paper download agent to process user queries and retrieve research papers from arXiv (using arXiv ID), BioRxiv and MedRxiv (using DOI resolved from the paper title or provided directly).

Source code in aiagents4pharma/talk2scholars/agents/paper_download_agent.py
def get_app(uniq_id: str, llm_model: BaseChatModel):
    """
    Initializes and returns the LangGraph application for the Talk2Scholars paper download agent.

    This agent supports downloading scientific papers from multiple preprint servers, including
    arXiv, BioRxiv, and MedRxiv. It can intelligently handle user queries by extracting or resolving
    necessary identifiers (e.g., arXiv ID or DOI) from the paper title and routing the request to
    the appropriate download tool.

    Args:
        uniq_id (str): A unique identifier for tracking the current session.
        llm_model (BaseChatModel): The language model to be used by the agent.

    Returns:
        StateGraph: A compiled LangGraph application that enables the paper download agent to
        process user queries and retrieve research papers from arXiv (using arXiv ID),
        BioRxiv and MedRxiv (using DOI resolved from the paper title or provided directly).
    """

    # Load Hydra configuration
    logger.info("Loading Hydra configuration for Talk2Scholars paper download agent")
    with hydra.initialize(version_base=None, config_path="../configs"):
        cfg = hydra.compose(
            config_name="config",
            overrides=["agents/talk2scholars/paper_download_agent=default"],
        )
        cfg = cfg.agents.talk2scholars.paper_download_agent

    # Wrap the three download tools in a single ToolNode
    tools = ToolNode([download_arxiv_paper, download_medrxiv_paper, download_biorxiv_paper])

    # Define the model
    logger.info("Using model %s", llm_model)
    model = create_react_agent(
        llm_model,
        tools=tools,
        state_schema=Talk2Scholars,
        prompt=cfg.paper_download_agent,
        checkpointer=MemorySaver(),
    )

    def paper_download_agent_node(state: Talk2Scholars) -> Dict[str, Any]:
        """
        Processes the current state to fetch the research paper from arXiv, BioRxiv, or MedRxiv.
        """
        logger.info("Creating paper download agent node with thread_id: %s", uniq_id)
        result = model.invoke(state, {"configurable": {"thread_id": uniq_id}})
        return result

    # Define new graph
    workflow = StateGraph(Talk2Scholars)

    # Adding node for paper download agent
    workflow.add_node("paper_download_agent", paper_download_agent_node)

    # Entering into the agent
    workflow.add_edge(START, "paper_download_agent")

    # Memory management for states between graph runs
    checkpointer = MemorySaver()

    # Compile the graph
    app = workflow.compile(checkpointer=checkpointer, name="paper_download_agent")

    # Logging the information and returning the app
    logger.info("Compiled the graph")
    return app
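
The listing above passes `uniq_id` as the `thread_id` on every invocation and attaches a `MemorySaver` checkpointer, so state is kept per thread between graph runs. The following toy sketch illustrates that idea with the standard library only. It is not LangGraph's implementation: `InMemoryThreadStore` and `run_turn` are hypothetical names introduced for this example, and plain strings stand in for real message objects.

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryThreadStore:
    """Toy stand-in for a checkpointer: one message history per thread_id."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = defaultdict(list)

    def load(self, thread_id: str) -> List[str]:
        # Return a copy so callers cannot mutate stored history by accident.
        return list(self._threads[thread_id])

    def save(self, thread_id: str, messages: List[str]) -> None:
        self._threads[thread_id] = list(messages)


def run_turn(store: InMemoryThreadStore, thread_id: str, user_message: str) -> List[str]:
    """Append one message to the thread's history and persist it."""
    history = store.load(thread_id)
    history.append(user_message)
    store.save(thread_id, history)
    return history


store = InMemoryThreadStore()
run_turn(store, "session-a", "Download arXiv 1706.03762")
second = run_turn(store, "session-a", "Now fetch the PDF")
other = run_turn(store, "session-b", "Unrelated request")
```

Here `second` contains both messages from `session-a`, while `other` contains only the single message from `session-b`, which is the isolation a per-session `thread_id` is meant to provide.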