Setup for AI agents for life sciences

Infrastructure and core software

Cloud infrastructure: Amazon Web Services
Computational research platform: Code Ocean
Version control: GitHub
AI Agent application requirements: Streamlit, Ollama, LangChain, and FAISS

Setup

Step 1: Introduction to our computational research platform with Code Ocean

View a short overview video of the Code Ocean platform
Review further information in the Code Ocean user guide if needed

Step 2: Enable simultaneous capsule collaboration with version control using git and GitHub

Navigate to our GitHub repository
Each team will have their own branch [Team Name] created by the coaches

VPEHackathonAIAgentsCOTemplate/main -> VPEHackathonAIAgentsCOTemplate/[Team Name]
Each team member will need to add their own personal access token in Code Ocean
1. Follow GitHub instructions to generate a personal access token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
2. Save your personal access token! You will also need it when working with Git
3. Then on Code Ocean, click on the account icon on the bottom left side, and go to credentials.
4. Click on ⊕ Add credential and choose GitHub, then add your username (in GitHub) and the token you have created.
Each team member will create their own capsule in Code Ocean by cloning the template repository
1. Click on the New Capsule button on the top right corner.
2. Select Copy from public Git.
3. Paste the git repository address: (i.e., https://github.com/VirtualPatientEngine/VPEHackathonAIAgentsCOTemplate)
4. Click clone
5. The capsule will be cloned within a few seconds.
Each team member will need to attach shared data assets to their own capsule in Code Ocean
1. In the capsule view, in the data folder in the files tree click ⚙️manage
2. Attach the data-assets by clicking the plus sign (⊕)
3. The data assets are collections: cellxgene census metadata 2024-04-24 and ollama_models_09_2024
Individual team members will contribute by syncing with their team's branch (see Step 4 below)
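Once the data assets are attached, a quick check from the capsule can confirm they are visible. The following is a minimal sketch, not part of the template: the folder names in `EXPECTED_ASSETS` are assumptions (the mounted names depend on how the assets were attached), and `missing_assets` is a hypothetical helper.

```python
from pathlib import Path

# Assumed folder names; adjust them to the names shown in your capsule's data folder.
EXPECTED_ASSETS = ["cellxgene_census_metadata_2024-04-24", "ollama_models_09_2024"]


def missing_assets(data_root, expected=EXPECTED_ASSETS):
    """Return the expected asset folders that are not present under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


# Example: report anything that is not attached under /data.
print(missing_assets("/data"))
```

An empty list means every expected folder was found.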

💡Tip
1. Use the command line terminal in the VS Code editor for running git commands
2. Consult the quick reference of git commands if you forget one, and the full documentation if git --help is not sufficient

Step 3: Familiarization with the template Code Ocean AI Agents capsule

README and overview of the repository
The Streamlit Application with three starter examples

# test that we can run the streamlit app
python /code/streamlit_app.py
# Run the streamlit app
streamlit run /code/streamlit_app.py
# Stop the streamlit app
[Ctrl] + [C]
Extended examples needed for completing the Hackathon challenges
Downloading datasets and models (Ollama example)

# Create the source directory if it doesn't exist
mkdir -p /scratch/.ollama
# delete the existing symbolic link
rm /root/.ollama
# Create the new symbolic link (each write to /root/.ollama will be directed to /scratch/.ollama)
ln -s /scratch/.ollama /root/.ollama
# copy the key:
cp /data/.ollama/id_ed25519 /scratch/.ollama/id_ed25519
# start the ollama server from the scratch directory (it keeps running in this terminal)
cd /scratch
ollama serve
# in a second terminal, list and pull models
ollama list
ollama pull llama3.1
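To confirm from Python that the server is up and a model was pulled, one option is to query Ollama's REST endpoint `/api/tags`, which lists local models. This is a minimal sketch that assumes the server listens on Ollama's default address `http://localhost:11434`; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the Ollama server uses its default host and port.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload):
    """Extract model names from the JSON returned by the /api/tags endpoint."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Ask a running Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `list_local_models()` while `ollama serve` is running should return the same models that `ollama list` prints. It raises an error if the server is not reachable.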
Downloading datasets and models (Stark example)

# Create and activate a virtual environment (Optional since we are working in a docker container)
python -m venv .venv
source .venv/bin/activate
# Install stark via pip
pip install stark-qa
# Download to scratch (run the next two lines inside the Python interpreter)
python
from stark_qa import load_skb
skb = load_skb("prime", download_processed=True, root="/scratch")
# Exit the interpreter, then deactivate the virtual environment when done
exit()
deactivate
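Before starting a large download, it can help to check how much space is free in the target folder. The sketch below uses only the standard library. The threshold passed to `has_room` is a placeholder, since the size of the dataset is not stated here, and both helper names are hypothetical.

```python
import shutil


def free_gb(path):
    """Return the free space, in gigabytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb):
    """Return True if at least required_gb gigabytes are free at path."""
    return free_gb(path) >= required_gb
```

For example, `has_room("/scratch", 20)` checks for 20 GB of free space, where 20 is an arbitrary illustrative value.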

💡Tip
1. Use the Scratch folder for downloading large data files
2. Use the VS Code editor to launch the Ollama server, interact with Streamlit, coding, etc.
3. If you use a virtual environment, be sure to add the virtual environment directory to .gitignore!
4. Ollama cheat sheet
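Tip 3 can be automated with a few lines of Python. This is a minimal sketch with a hypothetical helper, `ensure_ignored`, which appends an entry to `.gitignore` only if it is not already listed. The entry `.venv/` matches the directory name used in the Stark example.

```python
from pathlib import Path


def ensure_ignored(entry, gitignore=".gitignore"):
    """Append entry to the .gitignore file unless it is already listed.

    Returns True if the entry was added, False if it was already present.
    """
    path = Path(gitignore)
    text = path.read_text() if path.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "\n" if text and not text.endswith("\n") else ""
    with path.open("a") as fh:
        fh.write(f"{prefix}{entry}\n")
    return True
```

Run `ensure_ignored(".venv/")` from the repository root. Calling it a second time leaves the file unchanged.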

Step 4: Launching, working in, and stopping the capsule

Click the VS Code icon on the top right under Reproducible run to launch a cloud workstation on AWS. The first time you launch a capsule, it will take a few minutes to allocate the resources on AWS.
In a new terminal, add the remote team branches

# list the existing remotes, then add the VPE repository (with your team's branch) as the upstream remote
git remote -v
git remote add upstream https://[user name]:[token]@github.com/VirtualPatientEngine/VPEHackathonAIAgentsCOTemplate.git
git fetch --all --prune
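A user name or token that contains characters such as `@` or `:` would break the remote URL unless it is percent-encoded. The sketch below builds the URL safely. The helper `upstream_url` is hypothetical, and reading the credentials from the environment variables `GITHUB_USER` and `GITHUB_TOKEN` is an assumption, not something the template sets up.

```python
from urllib.parse import quote

REPO = "github.com/VirtualPatientEngine/VPEHackathonAIAgentsCOTemplate.git"


def upstream_url(user, token, repo=REPO):
    """Build the HTTPS remote URL, percent-encoding the credentials."""
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{repo}"


# Usage (assumed variable names):
#   import os
#   print(upstream_url(os.environ["GITHUB_USER"], os.environ["GITHUB_TOKEN"]))
```

The printed URL can then be passed to `git remote add upstream`. Keep in mind that a URL with an embedded token is stored in the repository's git configuration, so treat the capsule as containing a secret.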
Check to see that your team's branch is there, e.g., upstream/[Team Name]. This is the branch that your team will sync with
Create your branch derived from your team's branch

# check to see that your team's branch is there (-a also lists remote branches)
git branch -a -v
# switch to your team's branch
git checkout [Team Name]
# create a branch starting from your team's branch for your features (feat) and fixes (fix)
git checkout -b [feat or fix]/[name]
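The `[feat or fix]/[name]` pattern can be generated from a free-text description so that branch names stay consistent across a team. This is a sketch only: the lowercase, hyphen-separated slug is an assumed convention, and `branch_name` is a hypothetical helper.

```python
import re


def branch_name(kind, name):
    """Return '<kind>/<slug>' for a feature (feat) or fix branch."""
    if kind not in ("feat", "fix"):
        raise ValueError("kind must be 'feat' or 'fix'")
    # Assumed convention: lowercase, with runs of other characters replaced by '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError("name must contain at least one letter or digit")
    return f"{kind}/{slug}"


print(branch_name("feat", "My Cool Feature!"))  # feat/my-cool-feature
```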
Hack away 😀
Commit your changes

# stage your changes for the next commit
git add .
# commit the staged changes
git commit -m "feat: my cool feature"
# push your branch to the origin remote
git push origin [feat or fix]/[name]
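The example commit message starts with the same `feat` prefix used in the branch names. If a team wants to keep messages in that shape, a small check like the following could be used. This is a sketch under the assumption that messages should begin with `feat: ` or `fix: `. The guide does not mandate this, and `is_valid_message` is a hypothetical helper.

```python
import re

# Assumed convention: "<feat|fix>: <description>" on the first line.
_PATTERN = re.compile(r"^(feat|fix): \S.*$")


def is_valid_message(message):
    """Return True if the first line of message follows the assumed convention."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return bool(_PATTERN.match(first_line))
```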
Update your branch with your team's changes

# fetch all changes from upstream branches
git fetch --all --prune
# update the local team branch
git checkout [Team Name]
git pull upstream [Team Name]
# merge changes from your local team branch into your branch
git checkout [feat or fix]/[name]
git merge [Team Name]
Share your changes with your team's branch

# ensure your local team branch is up to date
git checkout [Team Name]
git fetch --all --prune
git pull upstream [Team Name]
# merge your changes (and resolve any conflicts)
git merge [feat or fix]/[name]
git push upstream [Team Name]
# delete your old branch and begin a new one
git branch -D [feat or fix]/[name]
git checkout -b [feat or fix]/[name]
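The sharing sequence is easy to get wrong by hand, so it can be useful to print it for a given team and branch before running anything. The sketch below builds the commands as lists and runs them only on request. It passes the remote name explicitly to `git pull` (assumed to be `upstream`, matching the push command), and both helpers are hypothetical.

```python
import subprocess


def share_commands(team, branch, remote="upstream"):
    """Return the git commands that merge branch into the team branch and push it."""
    return [
        ["git", "checkout", team],
        ["git", "fetch", "--all", "--prune"],
        ["git", "pull", remote, team],
        ["git", "merge", branch],
        ["git", "push", remote, team],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command, e.g. a merge conflict.
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `run_all(share_commands("my-team", "feat/my-feature"))` only prints the commands, which makes it safe to review them first. The team and branch names here are placeholders.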
When you are done, please shut down the capsule to save resources by clicking the red power button on the top left.