Task categories and types for AI agents for life sciences

ℹ️
Date: 2024-10-14 to 2024-10-15
Location: BioLabs Heidelberg and online

Category 1: AI agents for computational modelling and simulation

Special software requirements: basico (https://github.com/copasi/basico); see the basico documentation
Task types 1 and 3, data for computational modelling: ODE models in SBML format from BioModels, e.g., insulin dynamics and IBD dynamics
Task type 2, computational IBD model for parameter fitting: paper and XML file
Task type 3, computational IBD model in a non-standard format: paper and MATLAB code
Task type 3, ground truth for MATLAB-to-SBML conversion: paper, MATLAB equations, and SBML file
Data for model annotation: Cell Ontology, ChEBI, and UniProtKB

🧑‍🏫
Lilija Wehling

Description: Forward simulation of a mathematical model and reporting of the biomarker trajectories and predicted clinical efficacy
Input: simulation parameters such as initial concentrations
Output: time course of the simulated species
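A minimal sketch of the forward-simulation step, assuming a hypothetical two-species model rather than one of the BioModels files: a drug D is eliminated at first order (rate k_el) and inhibits production of a biomarker B (rate k_prod, half-maximal inhibition at ic50), which is cleared at rate k_clear. The integrator is a hand-written fixed-step RK4 so the example runs with the standard library only; in the task itself basico would load the SBML model and run the time course through COPASI. The parameter names, values, and the use of maximal biomarker reduction as an efficacy proxy are illustrative assumptions.

```python
import math

# Hypothetical parameters for an illustrative drug/biomarker model (not from BioModels).
params = {"k_el": 0.3, "ic50": 1.0, "k_prod": 2.0, "k_clear": 0.5}


def rhs(state, p):
    """Derivatives [dD/dt, dB/dt] of the hypothetical model."""
    d, b = state
    inhibition = d / (d + p["ic50"])  # fraction of biomarker production blocked by the drug
    dd = -p["k_el"] * d  # first-order drug elimination
    db = p["k_prod"] * (1.0 - inhibition) - p["k_clear"] * b
    return [dd, db]


def simulate(y0, p, t_end, dt=0.01):
    """Classic fixed-step RK4; returns a list of (t, [D, B]) samples."""
    t, y = 0.0, list(y0)
    trajectory = [(t, list(y))]
    for step in range(1, int(round(t_end / dt)) + 1):
        s1 = rhs(y, p)
        s2 = rhs([yi + 0.5 * dt * si for yi, si in zip(y, s1)], p)
        s3 = rhs([yi + 0.5 * dt * si for yi, si in zip(y, s2)], p)
        s4 = rhs([yi + dt * si for yi, si in zip(y, s3)], p)
        y = [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
             for yi, a, b, c, e in zip(y, s1, s2, s3, s4)]
        t = step * dt  # recompute t from the step index to avoid accumulated rounding
        trajectory.append((t, list(y)))
    return trajectory


baseline_b = params["k_prod"] / params["k_clear"]  # drug-free steady state of B
traj = simulate([5.0, baseline_b], params, t_end=24.0)
b_min = min(state[1] for _, state in traj)
print(f"maximal biomarker reduction: {1.0 - b_min / baseline_b:.1%}")
```

Because D decays exponentially in this toy model, its trajectory can be checked against the analytic solution D0·exp(-k_el·t), a quick sanity test for any hand-rolled integrator.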

Description: Fitting the parameters of a mathematical model to observed data (the reverse of forward simulation) and reporting of the parameter ranges
Input: time course of the model species
Output: fitted model parameters
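A minimal sketch of the reverse problem, assuming a hypothetical one-species decay model x(t) = x0·exp(-k·t) with known x0: the rate constant k is fitted by least squares over a grid, and the reported range is every grid value whose sum of squared errors (SSE) stays within a factor of two of the best SSE. The model, the synthetic data with hand-picked noise, and the factor-of-two tolerance are assumptions for illustration; in the task, COPASI's parameter-estimation task (accessible through basico) would operate on the SBML model instead.

```python
import math

TIMES = list(range(11))
X0 = 10.0  # assumed known initial concentration
# Synthetic observations: 10*exp(-0.4 t) plus fixed, hand-picked noise (illustrative only).
NOISE = [0.1, -0.15, 0.05, 0.12, -0.08, 0.03, -0.05, 0.04, -0.02, 0.01, -0.01]
OBSERVED = [X0 * math.exp(-0.4 * t) + e for t, e in zip(TIMES, NOISE)]


def model(k, t):
    return X0 * math.exp(-k * t)


def sse(k):
    """Sum of squared residuals between model and observations for rate constant k."""
    return sum((model(k, t) - y) ** 2 for t, y in zip(TIMES, OBSERVED))


def fit_with_range(k_grid, tolerance_factor=2.0):
    """Best-fitting k on the grid and the span of k values with SSE <= factor * best SSE."""
    scores = [(sse(k), k) for k in k_grid]
    best_sse, best_k = min(scores)
    accepted = [k for s, k in scores if s <= tolerance_factor * best_sse]
    return best_k, (min(accepted), max(accepted))


k_grid = [0.01 + i * 0.001 for i in range(991)]  # 0.01 ... 1.0
best_k, (k_low, k_high) = fit_with_range(k_grid)
print(f"k = {best_k:.3f}, accepted range [{k_low:.3f}, {k_high:.3f}]")
```

A grid search is used only because it is transparent and dependency-free; with several parameters, a gradient-based or global optimiser and a proper profile-likelihood analysis would be the usual choice.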

Description: Creating a mathematical model from scratch
Input: Original article describing the mathematical model and list of equations
Output: SBML model with annotated species
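The annotated SBML output can be sketched with the standard-library XML tools. This assumes SBML Level 3 Version 2 and shows only a compartment and two species, each annotated with a MIRIAM-style RDF `bqbiol:is` qualifier pointing to an identifiers.org URI. The species, concentrations, and identifiers (ChEBI glucose, UniProtKB human insulin P01308) are illustrative choices. Reactions, parameters, and kinetic laws transcribed from the paper's equations would be added the same way, and in practice a library such as python-libsbml plus the SBML validator would be used to guarantee a valid file.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version2/core"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"
ET.register_namespace("", SBML_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def q(ns, tag):
    """Qualified ElementTree name {namespace}tag."""
    return f"{{{ns}}}{tag}"


def build_sbml(model_id, species):
    """species: list of (id, initial concentration, identifiers.org URI) tuples."""
    sbml = ET.Element(q(SBML_NS, "sbml"), {"level": "3", "version": "2"})
    model = ET.SubElement(sbml, q(SBML_NS, "model"), {"id": model_id})
    compartments = ET.SubElement(model, q(SBML_NS, "listOfCompartments"))
    ET.SubElement(compartments, q(SBML_NS, "compartment"),
                  {"id": "cell", "size": "1", "constant": "true"})
    species_list = ET.SubElement(model, q(SBML_NS, "listOfSpecies"))
    for sid, conc, uri in species:
        sp = ET.SubElement(species_list, q(SBML_NS, "species"), {
            "id": sid, "metaid": f"meta_{sid}", "compartment": "cell",
            "initialConcentration": str(conc), "hasOnlySubstanceUnits": "false",
            "boundaryCondition": "false", "constant": "false"})
        # MIRIAM annotation: <rdf:Description rdf:about="#metaid"><bqbiol:is><rdf:Bag><rdf:li .../>
        annotation = ET.SubElement(sp, q(SBML_NS, "annotation"))
        rdf = ET.SubElement(annotation, q(RDF_NS, "RDF"))
        desc = ET.SubElement(rdf, q(RDF_NS, "Description"), {q(RDF_NS, "about"): f"#meta_{sid}"})
        bag = ET.SubElement(ET.SubElement(desc, q(BQBIOL_NS, "is")), q(RDF_NS, "Bag"))
        ET.SubElement(bag, q(RDF_NS, "li"), {q(RDF_NS, "resource"): uri})
    return ET.tostring(sbml, encoding="unicode")


# Hypothetical species list; identifiers are examples of the annotation format.
xml_text = build_sbml("toy_insulin_model", [
    ("glucose", 5.0, "https://identifiers.org/CHEBI:17234"),
    ("insulin", 0.1, "https://identifiers.org/uniprot:P01308"),
])
print(xml_text)
```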

🧑‍🏫
Gurdeep Singh

Description: Integration of multiple scRNA-seq datasets, correction for batch effects, and annotation of cells
Input: multiple cell x gene datasets for a particular disease (e.g., rheumatoid arthritis, atopic dermatitis, inflammatory bowel disease)
Output: UMAP visualization with cell annotations
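As a deliberately simple illustration of what batch correction does, the sketch below centres each gene within each batch on that batch's mean and adds back the global mean, so that per-batch gene means coincide. This is a crude stand-in assumed only for illustration; real integration would use a dedicated method (e.g., ComBat via scanpy's `sc.pp.combat`, Harmony, or scVI), followed by neighbourhood graph construction, UMAP, and cell-type annotation. The toy matrix and batch labels are hypothetical.

```python
from collections import defaultdict


def center_batches(matrix, batches):
    """Per-batch mean-centring of a cells x genes matrix (lists), returning a new matrix.

    matrix: list of cells, each a list of gene values; batches: batch label per cell.
    """
    n_cells, n_genes = len(matrix), len(matrix[0])
    global_mean = [sum(row[g] for row in matrix) / n_cells for g in range(n_genes)]
    rows_by_batch = defaultdict(list)
    for i, label in enumerate(batches):
        rows_by_batch[label].append(i)
    corrected = [list(row) for row in matrix]  # copy so the input is not modified
    for idx in rows_by_batch.values():
        batch_mean = [sum(matrix[i][g] for i in idx) / len(idx) for g in range(n_genes)]
        for i in idx:
            for g in range(n_genes):
                corrected[i][g] = matrix[i][g] - batch_mean[g] + global_mean[g]
    return corrected


# Hypothetical log-normalised expression: 4 cells x 3 genes from two batches,
# where batch "B" carries a constant +1.0 technical offset on every gene.
X = [[1.0, 2.0, 0.5], [1.2, 2.2, 0.7], [2.0, 3.0, 1.5], [2.2, 3.2, 1.7]]
labels = ["A", "A", "B", "B"]
Xc = center_batches(X, labels)
print(Xc)
```

Mean-centring also removes genuine biological differences whenever cell-type composition differs between batches, which is exactly why model-based integration methods are preferred on real data.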

Description: Simulation of gene perturbations and reporting of the predicted differentially expressed genes, followed by pathway enrichment analysis
Input: cell x gene dataset for a particular disease; knockout gene list
Output: list of differentially expressed genes and pathway enrichment analysis visualization
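Pathway enrichment of the predicted differentially expressed (DE) genes is commonly done as an over-representation analysis: a one-sided hypergeometric test asks whether a pathway contains more DE genes than expected by chance, given the size of the background gene universe. The sketch below implements that test with the standard library; the gene sets, DE list, and 20,000-gene universe are hypothetical, and the perturbation model that would produce the DE list is not shown.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes in universe, n in pathway, N drawn as DE)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrich(de_genes, pathways, universe_size):
    """One-sided over-representation test per pathway, sorted by ascending p-value."""
    de = set(de_genes)
    results = []
    for name, members in pathways.items():
        overlap = de & set(members)
        p = hypergeom_sf(len(overlap), universe_size, len(members), len(de))
        results.append((name, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical gene sets and DE list; the universe size is an assumption.
pathways = {
    "TNF signalling (toy)": ["TNF", "IL6", "NFKB1", "RELA", "CXCL8"],
    "glycolysis (toy)": ["HK2", "PFKM", "PKM", "LDHA", "ENO1"],
}
de_genes = ["TNF", "IL6", "NFKB1", "CXCL8", "LDHA"]
for name, n_hits, p in enrich(de_genes, pathways, universe_size=20000):
    print(f"{name}: {n_hits} DE genes, p = {p:.2e}")
```

With many pathways tested, the p-values would normally be adjusted for multiple testing (e.g., Benjamini-Hochberg) before being reported or visualised.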

Special software requirements: PyTorch Geometric (PyG) and the models available through it, LLMGraphTransformer, and a schema-agnostic graph foundation model (e.g., ULTRA)
Biomedical knowledge graph dataset: PrimeKG, specifically the subset used in STaRK for textual Q&A
Other data: PubMed for original articles
Graph library: NetworkX

🧑‍🏫
Ahmad Wisnu Mulyadi

Description: Knowledge graph Q&A and retrieval of k-hop subgraphs as explanations
Input: Natural language question (see the PrimeKG subset used in https://arxiv.org/abs/2404.13207)
Output: Ranked node answers and visualization of the k-hop subgraphs
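The k-hop retrieval step amounts to a breadth-first search outward from the answer (or seed) nodes, keeping every node within k hops and the edges among them as the explanation subgraph. The sketch below does this over (head, relation, tail) triples with the standard library; NetworkX offers the same for a single centre node via `nx.ego_graph(G, node, radius=k)`. The triples are a hypothetical PrimeKG-style fragment, and the relation names are assumptions rather than PrimeKG's actual schema.

```python
from collections import deque


def k_hop_subgraph(triples, seeds, k):
    """Nodes within k hops of any seed (as node -> hop distance) plus the triples among them.

    triples: list of (head, relation, tail); edges are traversed in both directions.
    """
    adjacency = {}
    for head, _, tail in triples:
        adjacency.setdefault(head, []).append(tail)
        adjacency.setdefault(tail, []).append(head)
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond the k-hop frontier
        for neighbour in adjacency.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    sub = [(h, r, t) for h, r, t in triples if h in dist and t in dist]
    return dist, sub


# Hypothetical PrimeKG-style fragment; relation names are illustrative.
triples = [
    ("adalimumab", "targets", "TNF"),
    ("adalimumab", "indicated_for", "rheumatoid arthritis"),
    ("TNF", "associated_with", "rheumatoid arthritis"),
    ("TNF", "interacts_with", "TNFRSF1A"),
    ("TNFRSF1A", "present_in", "synovium"),
]
dist, explanation = k_hop_subgraph(triples, ["adalimumab"], k=1)
print(dist)
print(explanation)
```

The returned triples can be passed directly to a plotting routine; keeping the hop distance per node makes it easy to colour nodes by distance from the answer.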

Description: Disease knowledge graph construction from text, using a text-to-graph model to build the initial knowledge graph and a link prediction model to fill in gaps in the reconstructed graph
Input: List of disease MeSH terms with associated PubMed articles, and a list of nodes and edges (same as in PrimeKG)
Output: NetworkX representation of the knowledge graph and visualization
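The gap-filling step can be illustrated with a classical neighbourhood heuristic, the Adamic-Adar index, which scores a missing edge (u, v) by summing 1/log(degree(w)) over their common neighbours w. This is a swapped-in baseline for illustration only; the task envisages a learned link-prediction model (e.g., via PyG or ULTRA). NetworkX provides the same score as `nx.adamic_adar_index`. The edge list is a hypothetical fragment of a text-derived disease graph.

```python
import math
from itertools import combinations


def adamic_adar_candidates(edges, top_n=5):
    """Score non-adjacent node pairs by the Adamic-Adar index, highest first."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    scores = []
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            continue  # already linked; only missing edges are candidates
        common = nbrs[u] & nbrs[v]
        # A common neighbour is adjacent to both u and v, so its degree is >= 2 and log > 0.
        score = sum(1.0 / math.log(len(nbrs[w])) for w in common)
        if score > 0:
            scores.append((score, u, v))
    scores.sort(reverse=True)
    return scores[:top_n]


# Hypothetical edges extracted from text (untyped, undirected).
edges = [
    ("IBD", "TNF"), ("IBD", "IL23A"),
    ("Crohn disease", "TNF"), ("Crohn disease", "IL23A"),
    ("TNF", "infliximab"),
]
for score, u, v in adamic_adar_candidates(edges):
    print(f"{u} -- {v}: {score:.3f}")
```

The heuristic ignores node and edge types; in a PrimeKG-style graph, candidates would normally be restricted to type-compatible pairs (for example disease-protein), which is one reason learned, schema-aware models are preferred.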

Description: Same as task type 1 (knowledge graph Q&A), but additionally using protein embeddings from https://www.uniprot.org/help/embeddings and a vector similarity search over drug target embeddings
Input: Natural language question (see subset used in https://arxiv.org/abs/2404.13207 for PrimeKG)
Output: Ranked node answers and visualization of the k-hop subgraphs
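The additional vector similarity search can be sketched as a brute-force cosine-similarity ranking of target embeddings against a query embedding. The four-dimensional vectors and protein names below are toy placeholders (real UniProt per-protein embeddings are far higher-dimensional), and how the query embedding is derived from the natural-language question is left open.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=3):
    """Rank entries of index (name -> embedding) by cosine similarity to query."""
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy embeddings standing in for per-protein drug-target embeddings.
index = {
    "protein_A": [0.9, 0.1, 0.0, 0.2],
    "protein_B": [0.1, 0.8, 0.3, 0.0],
    "protein_C": [0.85, 0.15, 0.05, 0.25],
}
query = [1.0, 0.1, 0.0, 0.2]
for name, score in top_k_similar(query, index, k=2):
    print(f"{name}: {score:.3f}")
```

A linear scan is fine for a few thousand proteins; for larger collections an approximate nearest-neighbour index (e.g., FAISS) would typically replace it.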