Task categories and types for AI agents for life sciences
Date: 2024-10-14 to 2024-10-15
Location: BioLabs Heidelberg and online
Category 1: AI agents for computational modelling and simulation
- Special software requirements: basico (https://github.com/copasi/basico) and its documentation
 - Task types 1 and 3, data for computational modelling: ODE-based mathematical models in SBML format from BioModels, e.g., insulin dynamics and inflammatory bowel disease (IBD) dynamics
 - Task type 2, computational IBD model for parameter fitting: paper and XML file
 - Task type 3, computational IBD model in non-standard format: paper and MATLAB code
 - Task type 3, ground truth for MATLAB-to-SBML conversion: paper, MATLAB equations, and SBML file
 - Data for model annotation: Cell Ontology, ChEBI, and UniProtKB
 
Lilija Wehling
Task type 1
- Description: Forward simulation of a mathematical model and reporting of the biomarker trajectories and predicted clinical efficacy
 - Input: simulation parameters such as initial concentrations
 - Output: time course of the simulated species
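
A forward simulation of this kind can be sketched without any modelling library. The example below is only an illustration under stated assumptions: the one-species model (`toy_rhs`), its rate constants, and the species name `biomarker` are hypothetical and not taken from any BioModels entry, and the fixed-step Runge-Kutta integrator stands in for whatever solver the agent would call through basico on a real SBML model.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def simulate(
    rhs: Callable[[float, State], State],
    initial: State,
    t_end: float,
    steps: int,
) -> Tuple[List[float], List[State]]:
    """Integrate dy/dt = rhs(t, y) with fixed-step RK4 from t = 0 to t_end."""
    dt = t_end / steps
    names = list(initial)
    y = dict(initial)
    times = [0.0]
    trajectory = [dict(y)]
    for i in range(steps):
        t = i * dt
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, {n: y[n] + dt / 2 * k1[n] for n in names})
        k3 = rhs(t + dt / 2, {n: y[n] + dt / 2 * k2[n] for n in names})
        k4 = rhs(t + dt, {n: y[n] + dt * k3[n] for n in names})
        y = {
            n: y[n] + dt / 6 * (k1[n] + 2 * k2[n] + 2 * k3[n] + k4[n])
            for n in names
        }
        times.append((i + 1) * dt)
        trajectory.append(dict(y))
    return times, trajectory


# Hypothetical toy model: constant production and first-order clearance.
K_PROD = 2.0   # assumed production rate
K_CLEAR = 0.5  # assumed clearance rate constant


def toy_rhs(t: float, y: State) -> State:
    return {"biomarker": K_PROD - K_CLEAR * y["biomarker"]}


# Input: initial concentrations. Output: time course of each species.
times, trajectory = simulate(toy_rhs, {"biomarker": 0.0}, t_end=20.0, steps=200)
```

For this toy model the trajectory approaches the analytical steady state `K_PROD / K_CLEAR`, which gives a simple sanity check on the integrator.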
 
Task type 2
- Description: Reverse fitting of a mathematical model and reporting of the parameter ranges
 - Input: time-course of species
 - Output: fitted model parameters
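
The fitting direction (time course in, parameters out) can be sketched as a least-squares problem. This is a minimal illustration, assuming a hypothetical single-parameter exponential decay model and noise-free synthetic observations; the golden-section search is a stand-in for the optimizers a real parameter estimation tool would provide, and it returns a point estimate rather than parameter ranges.

```python
import math
from typing import Callable, List


def decay_model(t: float, y0: float, k: float) -> float:
    """Hypothetical model: first-order decay y(t) = y0 * exp(-k * t)."""
    return y0 * math.exp(-k * t)


def sum_squared_error(k: float, times: List[float], observed: List[float], y0: float) -> float:
    """Objective: squared distance between model prediction and data."""
    return sum((decay_model(t, y0, k) - y) ** 2 for t, y in zip(times, observed))


def golden_section_min(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-6) -> float:
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Synthetic "measurements" generated with an assumed true rate constant.
K_TRUE = 0.3
Y0 = 10.0
obs_times = [float(t) for t in range(11)]
obs_values = [decay_model(t, Y0, K_TRUE) for t in obs_times]

k_fit = golden_section_min(
    lambda k: sum_squared_error(k, obs_times, obs_values, Y0), lo=0.01, hi=2.0
)
```

A one-dimensional search is enough here because the objective has a single minimum in `k`; models with several parameters would need a multivariate optimizer and, for parameter ranges, some form of uncertainty analysis.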
 
Task type 3
- Description: Creating a mathematical model from scratch
 - Input: Original article describing the mathematical model and list of equations
 - Output: SBML model with annotated species
 
Category 2: AI agents for omics and foundation models
- Special software requirements: Cell2Sentence
 - Data for analysis: cell-by-gene (cell x gene) matrices
 - Other tools/analyses: gene set enrichment analysis of differentially expressed genes using Gene Ontology (GO); UMAP
 
Gurdeep Singh
Task type 1
- Description: Integration of multiple scRNA-seq datasets, correction for batch effects, and annotation of cells
 - Input: multiple cell x gene datasets for a particular disease (e.g., rheumatoid arthritis, atopic dermatitis, inflammatory bowel disease, etc.)
 - Output: UMAP visualization with cell annotation
 
Task type 2
- Description: Simulation of gene perturbation and reporting of the predicted differentially expressed genes using pathway enrichment analysis
 - Input: cell x gene dataset for a particular disease; knockout gene list
 - Output: list of differentially expressed genes and pathway enrichment analysis visualization
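
The enrichment step at the end of this task can be illustrated with an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is one common choice and not necessarily the method the task intends; the gene identifiers, the pathway names, and the `enrich` helper are all hypothetical placeholders.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_pvalue(n_universe: int, n_pathway: int, n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when drawing n_selected genes without replacement
    from a universe that contains n_pathway pathway genes."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def enrich(
    de_genes: Set[str], pathways: Dict[str, Set[str]], universe: Set[str]
) -> List[Tuple[str, int, float]]:
    """Return (pathway, overlap size, p-value) tuples sorted by p-value."""
    selected = de_genes & universe
    results = []
    for name, members in pathways.items():
        members = members & universe
        overlap = len(selected & members)
        p = hypergeom_pvalue(len(universe), len(members), len(selected), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Placeholder data: ten background genes, two made-up pathways.
universe = {f"GENE_{i}" for i in range(10)}
pathways = {
    "pathway_X": {"GENE_0", "GENE_1", "GENE_2", "GENE_3"},
    "pathway_Y": {"GENE_7", "GENE_8", "GENE_9"},
}
de_genes = {"GENE_0", "GENE_1", "GENE_2"}  # hypothetical differentially expressed genes

ranking = enrich(de_genes, pathways, universe)
```

With real data the p-values would also need multiple-testing correction across pathways before being reported or visualized.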
 
Category 3: AI agents for biomedical knowledge graph reasoning and construction
- Special software requirements: PyTorch Geometric (PyG) and the models available through it, LLMGraphTransformer, and a schema-agnostic graph foundation model (e.g., ULTRA)
 - Biomedical knowledge graph dataset: PrimeKG, specifically the subset used in STaRK for textual Q&A
 - Other data: PubMed for original articles
 - Graph database: NetworkX
 
Ahmad Wisnu Mulyadi
Task type 1
- Description: Knowledge graph Q&A and retrieval of k-hop subgraph explanations
 - Input: Natural language question (see subset used in https://arxiv.org/abs/2404.13207 for PrimeKG)
 - Output: Ranked answer nodes and visualization of k-hop subgraphs
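
The k-hop subgraph part of this task amounts to a bounded breadth-first search around an answer node. The sketch below is illustrative only: it treats edges as undirected (an assumption), uses plain dictionaries instead of NetworkX or PyG, and the node and relation names are placeholders rather than PrimeKG entries.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def k_hop_subgraph(edges: List[Triple], seed: str, k: int) -> Tuple[Dict[str, int], List[Triple]]:
    """Return hop distances for nodes within k hops of seed, plus the
    triples whose two endpoints both fall inside that neighbourhood."""
    adjacency: Dict[str, Set[str]] = {}
    for head, _relation, tail in edges:
        adjacency.setdefault(head, set()).add(tail)
        adjacency.setdefault(tail, set()).add(head)  # undirected view

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    kept = [t for t in edges if t[0] in distance and t[2] in distance]
    return distance, kept


# Placeholder knowledge graph with made-up node and relation names.
toy_edges: List[Triple] = [
    ("drug_A", "targets", "protein_B"),
    ("protein_B", "associated_with", "disease_C"),
    ("disease_C", "has_phenotype", "phenotype_D"),
    ("phenotype_D", "observed_in", "disease_E"),
]

hops, subgraph = k_hop_subgraph(toy_edges, seed="drug_A", k=2)
```

The returned triples can be passed to any graph drawing tool for visualization; the hop distances are useful for colouring nodes by their distance from the answer.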
 
Task type 2
- Description: Disease knowledge graph construction from text using a text-to-graph model to construct the initial knowledge graph and a link prediction model to fill in gaps in the reconstructed knowledge graph
 - Input: List of disease MeSH terms and associated articles from PubMed and list of nodes and edges (same as in PrimeKG)
 - Output: NetworkX representation of the knowledge graph and visualization
 
Task type 3
- Description: Same as task type 1, but including protein embeddings from https://www.uniprot.org/help/embeddings and an additional vector similarity search over drug-target embeddings
 - Input: Natural language question (see subset used in https://arxiv.org/abs/2404.13207 for PrimeKG)
 - Output: Ranked answer nodes and visualization of k-hop subgraphs
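
The vector similarity search added in this task type can be sketched as a cosine-similarity ranking. This is a toy illustration under stated assumptions: the three-dimensional vectors and the protein names are invented placeholders, whereas real protein embeddings are much higher-dimensional and would normally be searched with a dedicated vector index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Rank all stored embeddings by cosine similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder embeddings for hypothetical drug targets.
target_embeddings = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.9, 0.1, 0.0],
    "protein_C": [0.0, 1.0, 0.0],
}

query_vector = [1.0, 0.0, 0.0]  # e.g., the embedding of a known target
nearest = top_k_similar(query_vector, target_embeddings, k=2)
```

The ranked names could then be merged with the knowledge graph ranking from task type 1, for example by re-scoring candidate answer nodes that are also close in embedding space.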