Task categories and types for AI agents for life sciences
Date: 2024-10-14 to 2024-10-15
Location: BioLabs Heidelberg and online
Category 1: AI agents for computational modelling and simulation
- Special software requirements: basico (https://github.com/copasi/basico) and its documentation
- Task types 1 and 3, data for computational modelling: ODE-based mathematical models in SBML format from BioModels, e.g., insulin dynamics and IBD dynamics
- Task type 2, computational IBD model for parameter fitting: paper and XML file
- Task type 3, computational IBD model in non-standard format: paper and MATLAB code
- Task type 3, ground truth for MATLAB-to-SBML conversion: paper, MATLAB equations, and SBML file
- Data for model annotation: cell ontology, ChEBI, and UniProtKB
Lilija Wehling
Task type 1
- Description: Forward simulation of a mathematical model and reporting of the biomarker trajectories and predicted clinical efficacy
- Input: simulation parameters such as initial concentrations
- Output: time-course of simulation species
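As a rough illustration of the input/output contract for this task type, the sketch below integrates a toy two-species ODE model with a hand-written Runge-Kutta (RK4) step. The model (A converts to B, B is degraded), the rate constants, and the initial concentrations are illustrative assumptions, not data from BioModels; a real agent would more likely load an SBML file and simulate it with basico.

```python
# Toy forward simulation: A -> B -> (degraded), first-order kinetics.
# Model structure, rate constants, and initial concentrations are
# illustrative assumptions, not taken from the task data.

def derivatives(state, params):
    """Right-hand side of the ODE system dA/dt, dB/dt."""
    a, b = state
    return [-params["k_conv"] * a,
            params["k_conv"] * a - params["k_deg"] * b]

def rk4_step(state, params, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [x + h * s for x, s in zip(base, slope)]

    k1 = derivatives(state, params)
    k2 = derivatives(shifted(state, k1, dt / 2), params)
    k3 = derivatives(shifted(state, k2, dt / 2), params)
    k4 = derivatives(shifted(state, k3, dt), params)
    return [x + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
            for x, s1, s2, s3, s4 in zip(state, k1, k2, k3, k4)]

def simulate(initial, params, t_end, dt):
    """Return the time course as a list of (time, A, B) tuples."""
    n_steps = round(t_end / dt)
    state = list(initial)
    course = [(0.0, *state)]
    for i in range(1, n_steps + 1):
        state = rk4_step(state, params, dt)
        course.append((i * dt, *state))
    return course

# Input: simulation parameters such as initial concentrations.
time_course = simulate(initial=[10.0, 0.0],
                       params={"k_conv": 0.5, "k_deg": 0.1},
                       t_end=10.0, dt=0.1)

# Output: time course of the simulated species (every 25th point shown).
for t, a, b in time_course[::25]:
    print(f"t={t:5.1f}  A={a:7.4f}  B={b:7.4f}")
```

The fixed step size keeps the sketch short; stiff models would need an adaptive or implicit solver instead.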
Task type 2
- Description: Reverse fitting of a mathematical model and reporting of the parameter ranges
- Input: time-course of species
- Output: fitted model parameters
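To make the direction of this task concrete, here is a minimal sketch that recovers two parameters of an exponential decay, A(t) = A0 * exp(-k t), from a time course by linear least squares on the log-transformed concentrations. The data are synthetic and noise-free (generated inside the example), and the single-species model is an assumption chosen for brevity; fitting the IBD model would call for a nonlinear, model-aware optimiser.

```python
import math

# Synthetic, noise-free observations generated from A(t) = 8 * exp(-0.3 t).
# These values are made up for illustration, not experimental data.
observed = [(t, 8.0 * math.exp(-0.3 * t)) for t in range(0, 11, 2)]

def fit_exponential_decay(data):
    """Least-squares fit of ln A = ln A0 - k * t; returns A0 and k."""
    times = [t for t, _ in data]
    log_conc = [math.log(a) for _, a in data]
    n = len(data)
    t_mean = sum(times) / n
    y_mean = sum(log_conc) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_conc))
             / sum((t - t_mean) ** 2 for t in times))
    intercept = y_mean - slope * t_mean
    return {"A0": math.exp(intercept), "k": -slope}

# Input: time course of a species. Output: fitted model parameters.
fitted = fit_exponential_decay(observed)
print(fitted)
```

With noisy data the same fit can be repeated on resampled time courses to report a parameter range rather than a single value.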
Task type 3
- Description: Creating a mathematical model from scratch
- Input: Original article describing the mathematical model and list of equations
- Output: SBML model with annotated species
Category 2: AI agents for omics and foundation models
- Special software requirements: Cell2Sentence
- Data for analysis: cell by gene
- Other tools/analyses: differential gene set enrichment analysis using GO, UMAP
Gurdeep Singh
Task type 1
- Description: Integration of multiple scRNA-seq datasets, correction for batch effects, and annotation of cells
- Input: multiple cell x gene datasets for a particular disease (e.g., rheumatoid arthritis, atopic dermatitis, or inflammatory bowel disease)
- Output: UMAP visualization with cell annotation
Task type 2
- Description: Simulation of gene perturbation and reporting of the predicted differentially expressed genes using pathway enrichment analysis
- Input: cell x gene dataset for a particular disease; knockout gene list
- Output: list of differentially expressed genes and pathway enrichment analysis visualization
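One simple way to turn a list of differentially expressed genes into a pathway-level result is an over-representation test based on the hypergeometric distribution. The sketch below is only one possible enrichment method, not necessarily the one intended for the task, and the gene identifiers and the pathway are hypothetical placeholders rather than GO terms.

```python
from math import comb

def enrichment_p_value(background, pathway, de_genes):
    """One-sided hypergeometric p-value: P(overlap >= observed overlap)."""
    n_background = len(background)
    n_pathway = len(pathway & background)
    n_de = len(de_genes & background)
    n_overlap = len(pathway & de_genes & background)
    total = comb(n_background, n_de)
    tail = sum(comb(n_pathway, k) * comb(n_background - n_pathway, n_de - k)
               for k in range(n_overlap, min(n_pathway, n_de) + 1))
    return tail / total

# Hypothetical gene identifiers, used only to exercise the function.
background = {f"gene{i}" for i in range(20)}
pathway = {"gene0", "gene1", "gene2", "gene3", "gene4"}
de_genes = {"gene0", "gene1", "gene2", "gene10"}

p_value = enrichment_p_value(background, pathway, de_genes)
print(f"overlap={len(pathway & de_genes)}  p={p_value:.4f}")
```

When many pathways are tested, the resulting p-values would also need a multiple-testing correction before reporting.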
Category 3: AI agents for biomedical knowledge graph reasoning and construction
- Special software requirements: PyTorch Geometric and available models through PyG, LLMGraphTransformer, and schema-agnostic graph foundation model (e.g., ULTRA)
- Biomedical knowledge graph dataset: PrimeKG, specifically the subset used in STaRK for textual Q&A
- Other data: PubMed for original articles
- Graph database: NetworkX
Ahmad Wisnu Mulyadi
Task type 1
- Description: Knowledge graph Q&A and retrieval of the k-hop subgraph explanations
- Input: Natural language question (see subset used in https://arxiv.org/abs/2404.13207 for PrimeKG)
- Output: Ranked answer nodes and visualization of k-hop subgraphs
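The k-hop subgraph around an answer node can be collected with a breadth-first search that stops expanding at depth k. The sketch below uses plain Python dictionaries and treats edges as undirected, which is an assumption; the node names are placeholders, not PrimeKG entries. With NetworkX, the same result should be obtainable from its `ego_graph` function.

```python
from collections import deque

# Hypothetical toy edges; node names are placeholders, not PrimeKG entries.
edges = [
    ("drug_A", "protein_X"),
    ("protein_X", "disease_D"),
    ("protein_X", "pathway_Q"),
    ("disease_D", "phenotype_P"),
    ("phenotype_P", "gene_G"),
]

def build_adjacency(edge_list):
    """Undirected adjacency map: node -> set of neighbouring nodes."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def k_hop_subgraph(edge_list, seed, k):
    """Return ({node: hop distance}, [edges]) for nodes within k hops of seed."""
    adjacency = build_adjacency(edge_list)
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    sub_edges = [(u, v) for u, v in edge_list if u in distance and v in distance]
    return distance, sub_edges

hops, sub_edges = k_hop_subgraph(edges, "drug_A", k=2)
print(hops)
print(sub_edges)
```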
Task type 2
- Description: Disease knowledge graph construction from text using a text-to-graph model to construct the initial knowledge graph and a link prediction model to fill in gaps in the reconstructed knowledge graph
- Input: List of disease MeSH terms and associated articles from PubMed and list of nodes and edges (same as in PrimeKG)
- Output: NetworkX representation of the knowledge graph and visualization
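The two stages described above can be mimicked on a very small scale: triples (standing in for the output of a text-to-graph model) are assembled into a graph, and unlinked node pairs are then scored as candidate missing edges. The sketch uses the Jaccard coefficient of shared neighbours as a deliberately simple stand-in for a trained link prediction model; the triples and relation names are hypothetical.

```python
from itertools import combinations

# Hypothetical triples standing in for text-to-graph model output.
triples = [
    ("drug_A", "targets", "protein_X"),
    ("drug_B", "targets", "protein_X"),
    ("drug_A", "treats", "disease_D"),
    ("protein_X", "associated_with", "disease_D"),
]

def build_neighbours(triple_list):
    """Undirected neighbour map built from (head, relation, tail) triples."""
    neighbours = {}
    for head, _relation, tail in triple_list:
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)
    return neighbours

def jaccard_candidates(neighbours):
    """Score unlinked node pairs by the Jaccard overlap of their neighbours."""
    scored = []
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already linked, nothing to predict
        union = neighbours[u] | neighbours[v]
        score = len(neighbours[u] & neighbours[v]) / len(union) if union else 0.0
        if score > 0:
            scored.append((u, v, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)

adjacency = build_neighbours(triples)
candidate_links = jaccard_candidates(adjacency)
for u, v, score in candidate_links:
    print(f"{u} -- {v}: {score:.2f}")
```

The heuristic ignores relation types, so it only proposes that two nodes may be connected, not how.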
Task type 3
- Description: Same as Task type 1, with the addition of protein embeddings from https://www.uniprot.org/help/embeddings and vector similarity search over drug target embeddings
- Input: Natural language question (see subset used in https://arxiv.org/abs/2404.13207 for PrimeKG)
- Output: Ranked answer nodes and visualization of k-hop subgraphs
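The vector similarity search in this task type can be sketched as a cosine-similarity ranking over stored embeddings. The three-dimensional vectors and target names below are made up for illustration; real protein embeddings are much higher-dimensional and would normally be searched with a vector index rather than a full scan.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)

def top_k_similar(query, embeddings, k):
    """Return the k (name, similarity) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vector))
              for name, vector in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Made-up toy embeddings; names and values are placeholders.
toy_embeddings = {
    "target_1": [1.0, 0.0, 0.0],
    "target_2": [0.8, 0.6, 0.0],
    "target_3": [0.0, 0.0, 1.0],
}

ranked = top_k_similar([1.0, 0.1, 0.0], toy_embeddings, k=2)
for name, similarity in ranked:
    print(f"{name}: {similarity:.3f}")
```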