BioBridge-PrimeKG Loader¶
In this tutorial, we explain how to load the BioBridge-PrimeKG dataset, a dataset for cross-modality prediction that uses PrimeKG as the knowledge graph.
Prior information about the BioBridge-PrimeKG dataset can be found in the following repositories:
First of all, we need to import the necessary libraries as follows.
# Import necessary libraries
import sys
sys.path.append('../../..')
from aiagents4pharma.talk2knowledgegraphs.datasets.biobridge_primekg import BioBridgePrimeKG
Load BioBridge-PrimeKG¶
The BioBridgePrimeKG class downloads the data from the related GitHub repository if it is not available locally. Otherwise, the data is loaded from the local directories defined by local_dir and primekg_dir.
# Define biobridge primekg data by providing a local directory where the data is stored
biobridge_data = BioBridgePrimeKG(primekg_dir="../../../../data/primekg/",
local_dir="../../../../data/biobridge_primekg/")
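The load-from-disk-or-download behaviour described above can be illustrated with a minimal sketch. This is not the loader's actual implementation: `fetch_if_missing` and `fake_download` are hypothetical names introduced here, and the download step is replaced by a stand-in that writes a placeholder file so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_if_missing(path: Path, download: Callable[[Path], None]) -> str:
    """Hypothetical helper: reuse a local file if present, otherwise download it."""
    if path.exists():
        return "local"
    # Create the target directory before writing the downloaded file.
    path.parent.mkdir(parents=True, exist_ok=True)
    download(path)
    return "downloaded"


def fake_download(path: Path) -> None:
    # Stand-in for a real HTTP download (assumption for illustration only).
    path.write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "biobridge_primekg" / "data_config.json"
    first = fetch_if_missing(target, fake_download)   # file is absent
    second = fetch_if_missing(target, fake_download)  # file now exists

print(first, second)
```

The second call finds the file already on disk, which mirrors the "already exists. Loading the data from the local directory." messages in the loader's log.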
To load the BioBridge dataframes and the train-test split, we just need to call the load_data() method as follows.
# Invoke a method to load the data
biobridge_data.load_data()
Loading PrimeKG dataset... Loading nodes of PrimeKG dataset ... ../../../../data/primekg/primekg_nodes.tsv.gz already exists. Loading the data from the local directory. Loading edges of PrimeKG dataset ... ../../../../data/primekg/primekg_edges.tsv.gz already exists. Loading the data from the local directory. Loading data config file of BioBridgePrimeKG... Downloading data_config.json from https://raw.githubusercontent.com/RyanWangZf/BioBridge/refs/heads/main/data/BindData/data_config.json to ../../../../data/biobridge_primekg/...
1.03kiB [00:00, 3.78MiB/s]
Building node embeddings... Downloading protein.pkl from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/embeddings/esm2b_unimo_pubmedbert/protein.pkl to ../../../../data/biobridge_primekg/embeddings...
100%|██████████| 197M/197M [00:13<00:00, 14.7MiB/s]
Downloading mf.pkl from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/embeddings/esm2b_unimo_pubmedbert/mf.pkl to ../../../../data/biobridge_primekg/embeddings...
100%|██████████| 34.3M/34.3M [00:02<00:00, 12.4MiB/s]
Downloading cc.pkl from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/embeddings/esm2b_unimo_pubmedbert/cc.pkl to ../../../../data/biobridge_primekg/embeddings...
100%|██████████| 12.5M/12.5M [00:01<00:00, 7.26MiB/s]
Downloading bp.pkl from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/embeddings/esm2b_unimo_pubmedbert/bp.pkl to ../../../../data/biobridge_primekg/embeddings...
100%|██████████| 85.9M/85.9M [00:10<00:00, 8.09MiB/s]
Downloading drug.pkl from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/embeddings/esm2b_unimo_pubmedbert/drug.pkl to ../../../../data/biobridge_primekg/embeddings...
100%|██████████| 28.7M/28.7M [00:04<00:00, 6.86MiB/s]
Downloading disease.pkl from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/embeddings/esm2b_unimo_pubmedbert/disease.pkl to ../../../../data/biobridge_primekg/embeddings...
100%|██████████| 53.1M/53.1M [00:02<00:00, 20.2MiB/s]
Building full triplets... Downloading protein.csv from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/Processed/protein.csv to ../../../../data/biobridge_primekg/processed...
100%|██████████| 11.7M/11.7M [00:00<00:00, 17.2MiB/s]
Downloading mf.csv from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/Processed/molecular.csv to ../../../../data/biobridge_primekg/processed...
100%|██████████| 2.24M/2.24M [00:00<00:00, 13.9MiB/s]
Downloading cc.csv from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/Processed/cellular.csv to ../../../../data/biobridge_primekg/processed...
100%|██████████| 1.05M/1.05M [00:00<00:00, 4.17MiB/s]
Downloading bp.csv from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/Processed/biological.csv to ../../../../data/biobridge_primekg/processed...
100%|██████████| 6.79M/6.79M [00:00<00:00, 38.3MiB/s]
Downloading drug.csv from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/Processed/drug.csv to ../../../../data/biobridge_primekg/processed...
100%|██████████| 9.47M/9.47M [00:00<00:00, 15.6MiB/s]
Downloading disease.csv from https://media.githubusercontent.com/media/RyanWangZf/BioBridge/refs/heads/main/data/Processed/disease.csv to ../../../../data/biobridge_primekg/processed...
100%|██████████| 11.4M/11.4M [00:01<00:00, 7.92MiB/s]
Building train-test split...
Number of 1 nodes in train: 16918
Number of 1 nodes in test: 1879
Number of 6 nodes in train: 6084
Number of 6 nodes in test: 675
Number of 2 nodes in train: 15349
Number of 2 nodes in test: 1705
Number of 0 nodes in train: 24669
Number of 0 nodes in test: 2740
Number of 5 nodes in train: 9856
Number of 5 nodes in test: 1095
Number of 7 nodes in train: 3610
Number of 7 nodes in test: 401
As a result, we obtained several processed files in the local directory as explained in the subsequent sections.
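The split log reports node counts per node type id. As a quick sanity check, the sketch below computes the test fraction for each type from those logged counts. The counts are copied from the log; the id-to-name mapping is taken from the node_type entry of the data config. The roughly 10% test share is an observation from these numbers, not something the tutorial states.

```python
# (train, test) node counts copied from the load_data() log, keyed by node type id.
split_counts = {
    1: (16918, 1879),
    6: (6084, 675),
    2: (15349, 1705),
    0: (24669, 2740),
    5: (9856, 1095),
    7: (3610, 401),
}

# Names for those ids, taken from the node_type mapping in the data config.
node_type_names = {
    0: "biological_process",
    1: "gene/protein",
    2: "disease",
    5: "molecular_function",
    6: "drug",
    7: "cellular_component",
}

# Share of nodes of each type that ended up in the test set.
test_fractions = {
    node_type_names[type_id]: test / (train + test)
    for type_id, (train, test) in split_counts.items()
}

for name, fraction in test_fractions.items():
    print(f"{name}: {fraction:.3f}")
```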
Check initial PrimeKG Dataframes¶
Firstly, we can get the initial PrimeKG data by invoking the method get_primekg() and further check the dataframes of nodes and edges.
# Get the initial data of PrimeKG
primekg_nodes = biobridge_data.get_primekg().get_nodes()
primekg_edges = biobridge_data.get_primekg().get_edges()
# Check PrimeKG nodes
primekg_nodes.head()
| | node_index | node_name | node_source | node_id | node_type |
---|---|---|---|---|---|
0 | 0 | PHYHIP | NCBI | 9796 | gene/protein |
1 | 1 | GPANK1 | NCBI | 7918 | gene/protein |
2 | 2 | ZRSR2 | NCBI | 8233 | gene/protein |
3 | 3 | NRF1 | NCBI | 4899 | gene/protein |
4 | 4 | PI4KA | NCBI | 5297 | gene/protein |
# Check the dimensions of the PrimeKG nodes
primekg_nodes.shape
(129375, 5)
# Check PrimeKG edges
primekg_edges.head()
| | head_index | head_name | head_source | head_id | head_type | tail_index | tail_name | tail_source | tail_id | tail_type | display_relation | relation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | PHYHIP | NCBI | 9796 | gene/protein | 8889 | KIF15 | NCBI | 56992 | gene/protein | ppi | protein_protein |
1 | 1 | GPANK1 | NCBI | 7918 | gene/protein | 2798 | PNMA1 | NCBI | 9240 | gene/protein | ppi | protein_protein |
2 | 2 | ZRSR2 | NCBI | 8233 | gene/protein | 5646 | TTC33 | NCBI | 23548 | gene/protein | ppi | protein_protein |
3 | 3 | NRF1 | NCBI | 4899 | gene/protein | 11592 | MAN1B1 | NCBI | 11253 | gene/protein | ppi | protein_protein |
4 | 4 | PI4KA | NCBI | 5297 | gene/protein | 2122 | RGS20 | NCBI | 8601 | gene/protein | ppi | protein_protein |
# Check the dimensions of the PrimeKG edges
primekg_edges.shape
(8100498, 12)
Check BioBridge-PrimeKG Data Config¶
The BioBridgePrimeKG class provides a method, get_data_config(), to load the data config file, which contains the following information:
- node_type: The node type mapping
- relation_type: The relation type mapping
- emb_dim: The embedding dimension (pre-loaded embeddings from the repository)
biobridge_data.get_data_config().keys()
dict_keys(['node_type', 'relation_type', 'emb_dim'])
# Check the node type within data config of the BioBridge PrimeKG
biobridge_data.get_data_config()['node_type']
{'biological_process': 0, 'gene/protein': 1, 'disease': 2, 'effect/phenotype': 3, 'anatomy': 4, 'molecular_function': 5, 'drug': 6, 'cellular_component': 7, 'pathway': 8, 'exposure': 9}
# Check the relation type within data config of the BioBridge PrimeKG
biobridge_data.get_data_config()['relation_type']
{'expression present': 0, 'synergistic interaction': 1, 'interacts with': 2, 'ppi': 3, 'phenotype present': 4, 'parent-child': 5, 'associated with': 6, 'side effect': 7, 'contraindication': 8, 'expression absent': 9, 'target': 10, 'indication': 11, 'enzyme': 12, 'transporter': 13, 'off-label use': 14, 'linked to': 15, 'phenotype absent': 16, 'carrier': 17}
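One possible use of this mapping is to encode edges as integer triplets. The sketch below assumes that the keys of relation_type correspond to the values of the display_relation column in the PrimeKG edges (the value ppi appears in both, but the tutorial does not state this explicitly). The edge records are the five rows shown in the edges preview, reduced to the fields needed here.

```python
# Subset of the relation_type mapping from the data config.
relation_type = {"ppi": 3}

# The five edges from the primekg_edges.head() preview, as plain records.
edges = [
    {"head_index": 0, "display_relation": "ppi", "tail_index": 8889},
    {"head_index": 1, "display_relation": "ppi", "tail_index": 2798},
    {"head_index": 2, "display_relation": "ppi", "tail_index": 5646},
    {"head_index": 3, "display_relation": "ppi", "tail_index": 11592},
    {"head_index": 4, "display_relation": "ppi", "tail_index": 2122},
]

# Encode each edge as (head node index, relation id, tail node index).
# Assumption: display_relation values are valid keys of relation_type.
triplets = [
    (edge["head_index"], relation_type[edge["display_relation"]], edge["tail_index"])
    for edge in edges
]

print(triplets[0])
```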
# Check the embedding dimension within data config of the BioBridge PrimeKG
# Note that not all node types have embeddings
biobridge_data.get_data_config()['emb_dim']
{'biological_process': 768, 'cellular_component': 768, 'disease': 768, 'drug': 512, 'molecular_function': 768, 'gene/protein': 2560}
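To make the note about missing embeddings concrete, the following sketch compares the node_type and emb_dim dictionaries shown above and lists the node types without an embedding dimension. It also groups the remaining types by dimension. The dictionaries are copied from the outputs above; the variable names are chosen here for illustration.

```python
# Copied from the data config outputs shown above.
node_type = {
    "biological_process": 0, "gene/protein": 1, "disease": 2,
    "effect/phenotype": 3, "anatomy": 4, "molecular_function": 5,
    "drug": 6, "cellular_component": 7, "pathway": 8, "exposure": 9,
}
emb_dim = {
    "biological_process": 768, "cellular_component": 768, "disease": 768,
    "drug": 512, "molecular_function": 768, "gene/protein": 2560,
}

# Node types that appear in the mapping but have no embedding dimension.
missing = sorted(set(node_type) - set(emb_dim))

# Group node types that share the same embedding dimension.
types_by_dim = {}
for name, dim in emb_dim.items():
    types_by_dim.setdefault(dim, []).append(name)

print(missing)
print(types_by_dim)
```

Because the dimensions differ across modalities (512, 768, and 2560), embeddings of different node types cannot be stacked into a single matrix without a projection step.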
Check BioBridge-PrimeKG Node Information¶
BioBridge provides node information for each modality of PrimeKG. In particular, it includes information for the following modalities:
- protein (sequence)
- molecular function (texts)
- cellular component (texts)
- biological process (texts)
- drug (SMILES strings)
- disease (texts)
# Get the node information of the BioBridge PrimeKG
biobridge_node_info = biobridge_data.get_node_info_dict()
biobridge_node_info.keys()
dict_keys(['gene/protein', 'molecular_function', 'cellular_component', 'biological_process', 'drug', 'disease'])
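Each modality stores its raw feature in a different column: sequence for gene/protein, description for the three GO-based types, smiles for drug, and definition for disease, as the tables in this section show. The sketch below gathers that into one lookup. The `feature_column` dictionary and `raw_feature` helper are hypothetical conveniences, not part of the loader, and the two sample rows are taken from the tables in this section.

```python
# Column holding the raw modality feature for each node type,
# read off the dataframes displayed in this section.
feature_column = {
    "gene/protein": "sequence",
    "molecular_function": "description",
    "cellular_component": "description",
    "biological_process": "description",
    "drug": "smiles",
    "disease": "definition",
}


def raw_feature(node_type: str, row: dict) -> str:
    """Hypothetical helper: return the raw feature of a node record."""
    return row[feature_column[node_type]]


# Sample records with values copied from the displayed tables.
drug_row = {"node_index": 39894, "node_name": "Sulfur hexafluoride",
            "smiles": "FS(F)(F)(F)(F)F"}
mf_row = {"node_index": 53519, "node_name": "catalytic activity, acting on DNA",
          "description": "Catalytic activity that acts to modify DNA."}

print(raw_feature("drug", drug_row))
print(raw_feature("molecular_function", mf_row))
```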
# Check a sample node information of gene/protein
biobridge_node_info['gene/protein']
| | node_index | node_id | node_type | node_name | node_source | sequence |
---|---|---|---|---|---|---|
0 | 0 | 9796 | gene/protein | PHYHIP | NCBI | MELLSTPHSIEINNITCDSFRISWAMEDSDLERVTHYFIDLNKKEN... |
1 | 1 | 7918 | gene/protein | GPANK1 | NCBI | MSRPLLITFTPATDPSDLWKDGQQQPQPEKPESTLDGAAARAFYEA... |
2 | 2 | 8233 | gene/protein | ZRSR2 | NCBI | MAAPEKMTFPEKPSHKKYRAALKKEKRKKRRQELARLRDSGLSQKE... |
3 | 3 | 4899 | gene/protein | NRF1 | NCBI | MEEHGVTQTEHMATIEAHAVAQQVQQVHVATYTEHSMLSADEDSPS... |
4 | 4 | 5297 | gene/protein | PI4KA | NCBI | MAAAPARGGGGGGGGGGGCSGSGSSASRGFYFNTVLSLARSLAVQR... |
... | ... | ... | ... | ... | ... | ... |
19157 | 83735 | 100133251 | gene/protein | PRR23D2 | NCBI | MYGYRRLRSPRDSQTEPQNDNEGETSLATTQMNPPKRRQVEQGPST... |
19158 | 83735 | 100133251 | gene/protein | PRR23D2 | NCBI | MYGYRRLRSPRDSQTEPQNDNEGETSLATTQMNPPKRRQVEQGPST... |
19159 | 83740 | 389649 | gene/protein | C8orf86 | NCBI | MRPLGKGLLPAEELIRSNLGVGRSLRDCLSQSGKLAEELGSKRLKP... |
19160 | 83746 | 343990 | gene/protein | CRACDL | NCBI | MISTRVMDIKLREAAEGLGEDSTGKKKSKFKTFKKFFGKKKRKESP... |
19161 | 83747 | 340441 | gene/protein | POTEA | NCBI | MVAEVSPKLAASPMKKPFGFRGKMGKWCCCCFPCCRGSGKNNMGAW... |
19162 rows × 6 columns
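The preview above contains a repeated record: node_index 83735 (PRR23D2) appears in two consecutive rows. A minimal sketch for spotting and dropping such repeats is given below, using plain records built from the last five displayed rows. It illustrates one way to handle them and is not a step the loader is known to perform.

```python
from collections import Counter

# Last five rows of the gene/protein preview, reduced to two fields.
rows = [
    {"node_index": 83735, "node_name": "PRR23D2"},
    {"node_index": 83735, "node_name": "PRR23D2"},
    {"node_index": 83740, "node_name": "C8orf86"},
    {"node_index": 83746, "node_name": "CRACDL"},
    {"node_index": 83747, "node_name": "POTEA"},
]

# Node indices that occur more than once.
counts = Counter(row["node_index"] for row in rows)
duplicated = [index for index, n in counts.items() if n > 1]

# Keep only the first record for each node index.
seen = set()
unique_rows = []
for row in rows:
    if row["node_index"] not in seen:
        seen.add(row["node_index"])
        unique_rows.append(row)

print(duplicated, len(unique_rows))
```

On the actual dataframe, pandas' drop_duplicates(subset="node_index") performs the same keep-first filtering. Some care is needed for the disease table, where one node_index can map to several distinct mondo_id entries, so dropping by node_index alone would discard information there.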
# Check a sample node information of molecular_function
biobridge_node_info['molecular_function']
| | node_index | node_id | node_type | node_name | node_source | description |
---|---|---|---|---|---|---|
0 | 53517 | 8168 | molecular_function | methyltransferase activity | GO | Catalysis of the transfer of a methyl group to... |
1 | 53518 | 140101 | molecular_function | catalytic activity, acting on a tRNA | GO | Catalytic activity that acts to modify a tRNA. |
2 | 53519 | 140097 | molecular_function | catalytic activity, acting on DNA | GO | Catalytic activity that acts to modify DNA. |
3 | 53520 | 140096 | molecular_function | catalytic activity, acting on a protein | GO | Catalytic activity that acts to modify a protein. |
4 | 53521 | 140098 | molecular_function | catalytic activity, acting on RNA | GO | Catalytic activity that acts to modify RNA, dr... |
... | ... | ... | ... | ... | ... | ... |
10961 | 124216 | 42880 | molecular_function | D-glucuronate transmembrane transporter activity | GO | Enables the transfer of D-glucuronate, the D-e... |
10962 | 124217 | 61922 | molecular_function | histone propionyltransferase activity | GO | Catalysis of the reaction: propionyl-CoA + his... |
10963 | 124218 | 61995 | molecular_function | ATP-dependent protein-DNA complex displacement... | GO | An activity that displaces proteins or protein... |
10964 | 124219 | 51266 | molecular_function | sirohydrochlorin ferrochelatase activity | GO | Catalysis of the reaction: siroheme + 2 H+ = F... |
10965 | 124220 | 51740 | molecular_function | ethylene binding | GO | Binding to ethylene (C2-H4, ethene), a simple ... |
10966 rows × 6 columns
# Check a sample node information of cellular_component
biobridge_node_info['cellular_component']
| | node_index | node_id | node_type | node_name | node_source | description |
---|---|---|---|---|---|---|
0 | 55515 | 110165 | cellular_component | cellular anatomical entity | GO | A part of a cellular organism that is either a... |
1 | 55516 | 30137 | cellular_component | COPI-coated vesicle | GO | A vesicle with a coat formed of the COPI coat ... |
2 | 55517 | 30133 | cellular_component | transport vesicle | GO | Any of the vesicles of the constitutive secret... |
3 | 55520 | 5777 | cellular_component | peroxisome | GO | A small organelle enclosed by a single membran... |
4 | 55524 | 99512 | cellular_component | supramolecular fiber | GO | A polymer consisting of an indefinite number o... |
... | ... | ... | ... | ... | ... | ... |
4008 | 127430 | 44169 | cellular_component | host cell rough endoplasmic reticulum membrane | GO | The lipid bilayer surrounding the host cell ro... |
4009 | 127431 | 98652 | cellular_component | collagen type VII anchoring fibril | GO | An antiparallel dimer of two collagen VII trim... |
4010 | 127432 | 90732 | cellular_component | cofilin-actin rod | GO | A cellular structure consisting of parallel, h... |
4011 | 127433 | 779 | cellular_component | condensed chromosome, centromeric region | GO | The region of a condensed chromosome that incl... |
4012 | 127434 | 32282 | cellular_component | plastid acetyl-CoA carboxylase complex | GO | An acetyl-CoA carboxylase complex located in t... |
4013 rows × 6 columns
# Check a sample node information of biological_process
biobridge_node_info['biological_process']
| | node_index | node_id | node_type | node_name | node_source | description |
---|---|---|---|---|---|---|
0 | 39898 | 51581 | biological_process | negative regulation of neurotransmitter uptake | GO | Any process that stops, prevents, or reduces t... |
1 | 39899 | 43271 | biological_process | negative regulation of ion transport | GO | Any process that stops, prevents, or reduces t... |
2 | 39900 | 51611 | biological_process | regulation of serotonin uptake | GO | Any process that modulates the frequency, rate... |
3 | 39901 | 51616 | biological_process | regulation of histamine uptake | GO | Any process that modulates the frequency, rate... |
4 | 39902 | 51956 | biological_process | negative regulation of amino acid transport | GO | Any process that stops, prevents, or reduces t... |
... | ... | ... | ... | ... | ... | ... |
27473 | 115046 | 60654 | biological_process | mammary gland cord elongation | GO | The process in which the mammary gland sprout ... |
27474 | 115047 | 1903696 | biological_process | protein localization to horsetail-astral micro... | GO | A process in which a protein is transported to... |
27475 | 115048 | 3372 | biological_process | establishment or maintenance of cytoskeleton p... | GO | Any cellular process that results in the speci... |
27476 | 115049 | 97624 | biological_process | UDP-galactose transmembrane import into Golgi ... | GO | The directed movement of UDP-galactose into th... |
27477 | 115050 | 48072 | biological_process | compound eye pigmentation | GO | Establishment of a pattern of pigment in the c... |
27478 rows × 6 columns
# Check a sample node information of drug
biobridge_node_info['drug']
| | node_index | description | half_life | indication | mechanism_of_action | protein_binding | pharmacodynamics | state | atc_1 | atc_2 | ... | node_type | node_name | node_source | name | smiles | logP ALOGPS | logP ChemAxon | solubility ALOGPS | pKa (strongest acidic) | pKa (strongest basic) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14014 | Flunisolide (marketed as AeroBid, Nasalide, Na... | The half-life is 1.8 hours | For the maintenance treatment of asthma as a p... | Flunisolide is a glucocorticoid receptor agoni... | Approximately 40% after oral inhalation | Flunisolide is a synthetic corticosteroid. It ... | Flunisolide is a solid. | Flunisolide is anatomically related to respira... | Flunisolide is in the therapeutic group of nas... | ... | drug | Flunisolide | DrugBank | Flunisolide | [H][C@@]12C[C@@]3([H])[C@]4([H])C[C@H](F)C5=CC... | 2.20 | 1.56 | 3.74e-02 g/l | 13.73 | -2.90 |
1 | 14015 | Alclometasone is synthetic glucocorticoid ster... | NaN | For the relief of the inflammatory and pruriti... | The mechanism of the anti-inflammatory activit... | NaN | Alclometasone is a synthetic corticosteroid fo... | Alclometasone is a solid. | Alclometasone is anatomically related to derma... | Alclometasone is in the therapeutic group of c... | ... | drug | Alclometasone | DrugBank | Alclometasone | [H][C@@]12C[C@@H](C)[C@](O)(C(=O)CO)[C@@]1(C)C... | 2.11 | 1.68 | 1.37e-01 g/l | 12.45 | -2.90 |
2 | 14016 | Medrysone is a corticosteroid used in ophthalm... | NaN | For the treatment of allergic conjunctivitis, ... | There is no generally accepted explanation for... | NaN | Medrysone is a topical anti-inflammatory corti... | Medrysone is a solid. | Medrysone is anatomically related to sensory o... | Medrysone is in the therapeutic group of ophth... | ... | drug | Medrysone | DrugBank | Medrysone | [H][C@@]12CC[C@H](C(C)=O)[C@@]1(C)C[C@H](O)[C@... | 3.06 | 3.13 | 3.37e-02 g/l | 19.14 | -0.26 |
3 | 14017 | A glucocorticoid employed, usually as eye drop... | NaN | For the ophthalmic treatment of corticosteroid... | There is no generally accepted explanation for... | NaN | Corticosteroids such as fluorometholone inhibi... | Fluorometholone is a solid. | Fluorometholone is anatomically related to der... | Fluorometholone is in the therapeutic group of... | ... | drug | Fluorometholone | DrugBank | Fluorometholone | [H][C@@]12CC[C@](O)(C(C)=O)[C@@]1(C)C[C@H](O)[... | 2.34 | 2.42 | 1.66e-02 g/l | 12.65 | -3.40 |
4 | 14018 | Beclomethasone dipropionate is a second-genera... | Following intravenous administration, the half... | Indicated for oral inhalation use in the maint... | Beclomethasone dipropionate is a corticosteroi... | Based on the findings of _in vitro_ studies, t... | Inflammatory conditions, including asthma, der... | Beclomethasone dipropionate is a solid. | Beclomethasone dipropionate is anatomically re... | Beclomethasone dipropionate is in the therapeu... | ... | drug | Beclomethasone dipropionate | DrugBank | Beclomethasone dipropionate | [H][C@@]12C[C@H](C)[C@](OC(=O)CC)(C(=O)COC(=O)... | 3.69 | 4.43 | 2.08e-03 g/l | 13.85 | -3.30 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6943 | 21955 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Metabutethamine is anatomically related to ner... | Metabutethamine is in the therapeutic group of... | ... | drug | Metabutethamine | DrugBank | NaN | CC(C)CNCCOC(=O)C1=CC=CC(N)=C1 | NaN | NaN | NaN | NaN | NaN |
6944 | 21956 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Quinisocaine is anatomically related to dermat... | Quinisocaine is in the therapeutic group of an... | ... | drug | Quinisocaine | DrugBank | NaN | CCCCC1=CC2=CC=CC=C2C(OCCN(C)C)=N1 | NaN | NaN | NaN | NaN | NaN |
6945 | 39888 | Benzyl benzoate is one of the older preparatio... | NaN | Used to kill lice and the mites responsible fo... | Benzyl benzoate exerts toxic effects on the ne... | NaN | Benzyl benzoate is one of the older preparatio... | Benzyl benzoate is a liquid. | Benzyl benzoate is anatomically related to ant... | Benzyl benzoate is in the therapeutic group of... | ... | drug | Benzyl benzoate | DrugBank | NaN | O=C(OCC1=CC=CC=C1)C1=CC=CC=C1 | NaN | NaN | NaN | NaN | NaN |
6946 | 39894 | Sulfur hexafluoride is an ultrasound contrast ... | The terminal half-life of SF6 in blood was app... | Echocardiography: Sulfur hexafluoride is indic... | Within the blood, the acoustic impedance of Lu... | NaN | Sulfur hexafluoride provides useful echocardio... | NaN | Sulfur hexafluoride is anatomically related to... | Sulfur hexafluoride is in the therapeutic grou... | ... | drug | Sulfur hexafluoride | DrugBank | NaN | FS(F)(F)(F)(F)F | NaN | NaN | NaN | NaN | NaN |
6947 | 39895 | Butoconazole is an imidazole antifungal used i... | NaN | For the local treatment of vulvovaginal candid... | The exact mechanism of the antifungal action o... | NaN | Butoconazole is an imidazole derivative that h... | Butoconazole is a solid. | Butoconazole is anatomically related to genito... | Butoconazole is in the therapeutic group of gy... | ... | drug | Butoconazole | DrugBank | NaN | ClC1=CC=C(CCC(CN2C=CN=C2)SC2=C(Cl)C=CC=C2Cl)C=C1 | NaN | NaN | NaN | NaN | NaN |
6948 rows × 29 columns
# Check a sample node information of disease
biobridge_node_info['disease']
| | node_index | mondo_id | mondo_name | definition |
---|---|---|---|---|
0 | 27165 | 8019 | mullerian aplasia and hyperandrogenism | Deficiency of the glycoprotein WNT4, associate... |
1 | 27165 | 8019 | mullerian aplasia and hyperandrogenism | Deficiency of the glycoprotein WNT4, associate... |
2 | 27166 | 11043 | myelodysplasia, immunodeficiency, facial dysmo... | NaN |
3 | 27168 | 8878 | bone dysplasia, lethal Holmgren type | Bone dysplasia lethal Holmgren type (BDLH) is ... |
4 | 27169 | 8905 | predisposition to invasive fungal disease due ... | A rare, genetic primary immunodeficiency chara... |
... | ... | ... | ... | ... |
44128 | 99866 | 44144 | panic disorder with agoraphobia | A disorder in which an individual experiences ... |
44129 | 99916 | 44797 | desmoplastic nevus | A benign melanocytic nevus characterized by th... |
44130 | 99916 | 44800 | desmoplastic spitz nevus | A Spitz nevus associated with fibrous stroma f... |
44131 | 99969 | 100023 | self-limited familial and non-familial neonata... | A neonatal/infantile epilepsy sndrome that is ... |
44132 | 99969 | 100024 | self-limited familial and non-familial infanti... | This syndrome is characterized by the onset of... |
44133 rows × 4 columns
Check BioBridge-PrimeKG Node Embeddings (Pre-Loaded)¶
BioBridge provides a dictionary of pre-loaded embeddings, which can be obtained as follows.
# Check node embeddings
emb_dict = biobridge_data.get_node_embeddings()
emb_dict[0]
[0.04029838368296623, -0.018344514071941376, 0.02762659639120102, -0.026468712836503983, 0.021834833547472954, -0.04956040903925896, 0.013426685705780983, 0.04726368933916092, -0.025193220004439354, -0.004347709938883781, -0.09398091584444046, -0.02682836912572384, 0.06272736936807632, 0.03773018345236778, -0.0003949799865949899, -0.10644476860761642, -0.04382409527897835, -0.03279171884059906, 0.03302460163831711, 0.0036869393661618233, -0.0472925640642643, 0.015392928384244442, 0.01283049676567316, -0.04233483597636223, 0.009237916208803654, -0.05455828458070755, 0.024593649432063103, -0.09538378566503525, -0.0695975124835968, -0.010735561139881611, 0.005214910954236984, 0.11971891671419144, -0.0430755540728569, -0.00513798464089632, 0.04484416916966438, 0.08294414728879929, 0.07404263317584991, 0.022557679563760757, -0.046012863516807556, 0.016955774277448654, -0.023771632462739944, 0.00910295732319355, 0.008506315760314465, 0.0026993537321686745, -0.07880035042762756, 0.02047019824385643, -0.024598246440291405, -0.006052911747246981, 0.023546278476715088, -0.01073275413364172, 0.007697388529777527, 0.03541181609034538, -0.015209227800369263, -0.04220151901245117, 0.005686447024345398, -0.022106457501649857, -0.0501108355820179, -0.01996927708387375, -0.07490301877260208, -0.034318242222070694, 0.06406942754983902, 0.026413891464471817, -0.01750134490430355, -0.0020737317390739918, 0.011436696164309978, -0.0027924978639930487, 0.01444043405354023, 0.03244240954518318, 0.055468522012233734, 0.000421048462158069, 0.07648606598377228, -0.034907933324575424, 0.03364640101790428, 0.012630841694772243, 0.0356338694691658, 0.029361367225646973, 0.05810529738664627, 0.051039308309555054, -0.05525490269064903, -0.006498970091342926, 0.014916720800101757, -0.02827579900622368, -0.02439919486641884, -0.0032098768278956413, 0.02892136573791504, -0.04411356523633003, -0.05595407262444496, 0.008885009214282036, 0.002434935187920928, -0.0008174182148650289, 
-0.0003146776289213449, 0.00360421696677804, -0.0549507774412632, -0.03480708599090576, -0.005562361795455217, -0.026715315878391266, 0.010748428292572498, 0.05795736983418465, -0.008814256638288498, -0.01192760281264782, -0.03355904296040535, 0.0390462726354599, -0.02706330083310604, -0.005029067862778902, -0.015869874507188797, -0.0827607735991478, 0.00616435706615448, 0.0004398504097480327, 0.052674341946840286, 0.016801806166768074, 0.021987032145261765, -0.03487316891551018, 0.08734723180532455, 0.05325174331665039, 0.027245769277215004, 0.01973266899585724, -0.02645018883049488, -0.007567515131086111, 0.01668708026409149, -0.05888340249657631, 0.011039909906685352, -0.018678313121199608, -0.053122736513614655, -0.07906474173069, -0.008969047106802464, -0.10221055895090103, 0.02861125022172928, -0.03449633717536926, 0.042843014001846313, 0.02469048462808132, 0.0266242828220129, 0.0005339858471415937, 0.006560288369655609, 0.054612383246421814, 0.014275429770350456, -0.040155429393053055, 0.009025336243212223, 0.05959588661789894, -0.10562986880540848, -0.0026736047584563494, 0.054434821009635925, 0.02268270216882229, -0.014410329982638359, 0.03142722323536873, 0.036639828234910965, -0.0051039415411651134, -0.04204719513654709, 0.021408583968877792, 0.014998323284089565, -0.021329469978809357, -0.04066392406821251, -0.033883653581142426, 0.02957429364323616, 0.016370132565498352, -0.05582666024565697, -0.05259693041443825, 0.04019647836685181, -0.014453473500907421, 0.036645177751779556, -0.04221060499548912, 0.019559290260076523, -0.09889122098684311, -0.042848050594329834, -0.07701241970062256, 0.025375893339514732, 0.06667166203260422, -0.026460988447070122, -0.06721235811710358, -0.05854760482907295, -0.03200861066579819, 0.01613408327102661, -0.01020011305809021, -0.009928660467267036, 0.030260812491178513, -0.041137296706438065, -0.0013622258557006717, -0.0090464036911726, 0.07113339751958847, -0.017014382407069206, 7.682084105908871e-05, 
-0.05589859560132027, 0.018907127901911736, 0.005991958547383547, 0.045321524143218994, -0.021731318905949593, 0.00869719684123993, -0.033495739102363586, -0.038452114909887314, -0.014180691912770271, 0.039115455001592636, 0.03866371884942055, 0.00921943224966526, 0.056556735187768936, 0.044964201748371124, 0.05020255595445633, 0.12839728593826294, -0.10168173164129257, -0.05194401741027832, -0.037121597677469254, -0.057500407099723816, 0.03707585483789444, -0.0053098490461707115, 0.010336501523852348, -0.002029292518272996, 0.015148420818150043, 0.029777564108371735, -0.013076030649244785, 0.026804285123944283, 0.028570037335157394, 0.05173419415950775, 0.005377662368118763, -0.007118920795619488, 0.058820776641368866, 0.015195094980299473, 0.005127663258463144, 0.02712017484009266, -0.04155531898140907, 0.08129262179136276, 0.0034462169278413057, 0.021197492256760597, -0.04931671917438507, -0.02739494852721691, -0.017645297572016716, 0.020231327041983604, -0.022530455142259598, 0.03107580542564392, -0.027658868581056595, -0.004237980581820011, 0.01176244206726551, 0.018966855481266975, -0.028157614171504974, -0.02793066017329693, -0.016387593001127243, -0.06764520704746246, -0.0585671104490757, -0.03066752664744854, -0.006047348957508802, 0.02732912451028824, 0.028488492593169212, -0.020036261528730392, -0.05717623978853226, 0.08039935678243637, 0.015198281034827232, 0.013982664793729782, -0.026102596893906593, 0.05188038945198059, -0.02306080237030983, 0.0523577518761158, -0.029244564473628998, 0.04167806729674339, 0.016317546367645264, -0.024670863524079323, 0.024628395214676857, -0.007798952050507069, 0.0019167391583323479, -0.012449810281395912, 0.023807384073734283, 0.04891068860888481, -0.02878706529736519, 0.04812026023864746, -0.00854082778096199, 0.02675958350300789, 0.03990050405263901, -0.04700407758355141, -0.000715451140422374, 0.04258646443486214, -0.050874169915914536, -0.035909537225961685, -0.06904200464487076, 0.08753672987222672, 
-0.04223060607910156, 0.03271983563899994, -0.02652662992477417, -0.020858824253082275, -0.007859435863792896, -0.049515414983034134, 0.0014385459944605827, 0.009278316050767899, 0.02445099875330925, 0.05049492418766022, 0.028869640082120895, -0.11794491857290268, -0.0052613443695008755, -0.021137278527021408, -0.0008545375894755125, 0.016163095831871033, -0.041098516434431076, -0.063350610435009, 0.0005626750644296408, -0.010312208905816078, 0.06307674199342728, -0.018192289397120476, -0.04241204634308815, -0.06316392123699188, -0.01767602562904358, 0.02913868986070156, 0.0025597047060728073, -0.03809165209531784, 0.04566282406449318, -0.053724080324172974, -0.030606618151068687, -0.07493606954813004, -0.028194785118103027, 0.01545566413551569, -0.022702626883983612, -0.035386305302381516, 0.00242732185870409, -0.11948322504758835, 0.026384694501757622, 0.0420764796435833, -0.025798527523875237, -0.04058421775698662, 0.04793267324566841, 0.09118802100419998, -0.0456298403441906, -0.016518861055374146, 0.021775487810373306, 0.05567555129528046, -0.016222810372710228, -0.038117244839668274, 0.00914144329726696, -0.03193476051092148, -0.044928137212991714, -0.017014067620038986, -0.013558733277022839, 0.009222832508385181, 0.029769128188490868, -0.012996071018278599, -0.001385899493470788, 0.08605766296386719, -0.09426447004079819, -0.009061155840754509, -0.0788058266043663, -0.021059589460492134, -0.09020312875509262, -0.007842637598514557, -0.03338266536593437, 0.006379834841936827, -0.08588118106126785, 0.028855012729763985, -0.027429860085248947, 0.007522427476942539, 0.0577651672065258, -0.01195206306874752, -0.05130549892783165, 0.06743904203176498, -0.07836314290761948, -0.007012663874775171, 0.0008620519074611366, 0.021442534402012825, -0.045336510986089706, 0.019339581951498985, -0.04125940054655075, 0.026645295321941376, -0.01852567493915558, -0.03230265900492668, -0.011601553298532963, -0.019917314872145653, 0.002146124141290784, -0.006789618171751499, 
-0.07294871658086777, -0.03047441877424717, -0.04620600864291191, -0.03889620676636696, -0.0018508592620491982, -0.05474083498120308, 0.0575009360909462, -0.09167023748159409, -0.017845213413238525, 0.014621570706367493, -0.021408580243587494, -0.04479019343852997, 0.040940091013908386, -0.03586449846625328, -0.022995227947831154, 0.05194389820098877, -0.0400242917239666, -0.005897575989365578, -0.05356553941965103, 0.021755682304501534, -0.0556231364607811, -0.009665188379585743, -0.07814400643110275, -0.0248993132263422, 0.051646724343299866, -0.014475435018539429, -0.026011835783720016, -0.028019314631819725, -0.013895790092647076, -0.026618381962180138, -0.026522569358348846, 0.006866053212434053, -0.05588879436254501, 0.008098273538053036, -0.035045333206653595, -0.04856543987989426, -0.12497542798519135, -0.06941547244787216, -0.07000140845775604, -0.002956899581477046, 0.05796799436211586, 0.03406064584851265, 0.02198636159300804, -0.03506630286574364, -0.09122215956449509, 0.07022522389888763, -0.010751993395388126, -0.021536145359277725, -0.0658908486366272, 0.030613498762249947, 0.04103873670101166, -0.056617170572280884, 0.0009565976215526462, -0.03549102693796158, 0.02737591043114662, 0.09322978556156158, -0.02481578104197979, 0.019094109535217285, -0.0009404918528161943, -0.04606598615646362, -0.08887539058923721, 0.14027400314807892, -0.028917312622070312, 0.08674173057079315, -0.058721017092466354, -0.02155769057571888, -0.014222664758563042, -0.008267270401120186, -0.008061092346906662, 0.024227401241660118, 0.02411738596856594, -0.0006882682209834456, 0.010349034331738949, 0.074966661632061, -0.029603248462080956, -0.0699971467256546, -0.04555271938443184, 0.04706747457385063, 0.0005282397614791989, -0.051725104451179504, -0.043576933443546295, -0.0508698970079422, 0.02345978654921055, -0.05290379002690315, 0.0160776786506176, -0.020897693932056427, 0.08659598231315613, -0.022158373147249222, 0.032175127416849136, 0.03747594729065895, 
0.026388248428702354, 0.05571660399436951, -0.016463806852698326, -0.11929627507925034, -0.006662018597126007, -0.02942674234509468, -0.07644764333963394, -0.02004392445087433, 0.035897329449653625, -0.021836351603269577, 0.03283437341451645, -0.8788740038871765, 0.007262303959578276, -0.025003785267472267, -0.03642859682440758, -0.023705899715423584, 0.09447610378265381, 0.03486751392483711, 0.03762674331665039, -0.05564364790916443, -0.05592398717999458, 0.030010532587766647, -0.008316848427057266, -0.06401187926530838, -0.030883846804499626, -0.05355474352836609, 0.005283691920340061, 0.08208303898572922, -0.02150379680097103, 0.02914205752313137, -0.004671927075833082, -0.018069660291075706, 0.03764035180211067, -0.016507839784026146, -0.015029777772724628, -0.07049934566020966, 0.0871996209025383, -0.05977731570601463, -0.012134957127273083, 0.03551500290632248, -0.039921727031469345, 0.022012514993548393, -0.030912531539797783, -0.03631019592285156, 0.04041189327836037, -0.07042663544416428, -0.0902336984872818, -0.01601950265467167, 0.06027404963970184, -0.012120296247303486, -0.04864587262272835, -0.07397894561290741, 0.014651961624622345, -0.022165078669786453, 0.022759854793548584, 0.07348082959651947, -0.03677331656217575, -0.012386320158839226, 0.05858563631772995, -0.02218996174633503, 0.008143575862050056, -0.03325444459915161, -0.020750392228364944, 0.01226009801030159, -0.06507302820682526, -0.02838653326034546, 0.03512018918991089, -0.07374884188175201, -0.04458392411470413, 0.014208108186721802, 0.006506034638732672, 0.04502125456929207, -0.05022962763905525, -0.03807517886161804, -0.0015837108949199319, -0.023940693587064743, -0.001527251093648374, -0.0025195118505507708, 0.03518985956907272, -0.022629553452134132, -0.04013960063457489, -0.020576782524585724, -0.016225513070821762, 0.017195936292409897, 0.0590323880314827, 0.012796652503311634, 0.020174741744995117, -0.041081614792346954, 0.010530127212405205, 0.11716574430465698, 
-0.016666928306221962, 0.0007364454795606434, -0.014586145058274269, 0.024738501757383347, 0.0203242264688015, -0.027696950361132622, -0.10626038908958435, -0.047153081744909286, 0.03113548457622528, -0.041734106838703156, 0.02135242149233818, -0.054463066160678864, -0.02550467662513256, -0.04976227134466171, -0.0036031783092767, -0.023282168433070183, 0.003934537060558796, 0.016612937673926353, -0.05549006909132004, 0.006502739153802395, -0.010194506496191025, -0.026895839720964432, -0.05504419282078743, 0.04134650155901909, -0.021918490529060364, 0.08288206160068512, -0.002869946416467428, 0.006473280489444733, -0.018180307000875473, -0.04501872509717941, -0.00502518005669117, -0.008978396654129028, 0.06203753501176834, -0.004928195383399725, -0.048844099044799805, -0.001990334829315543, -0.019436050206422806, 0.0453951470553875, -0.03885947912931442, -0.0037116529420018196, 0.0016183465486392379, 0.09054219722747803, -0.01895354874432087, -0.007716404274106026, -0.019651880487799644, 0.05022988095879555, 0.021585622802376747, 0.02868681214749813, 0.054208312183618546, 0.037250783294439316, -0.024597831070423126, 0.04432501271367073, 0.04692010581493378, 0.0113906878978014, -0.07716446369886398, -0.002090864349156618, 0.04667116329073906, 0.02923983335494995, -0.028486795723438263, 0.008978152647614479, 0.01671082340180874, -0.03686099871993065, 0.013529421761631966, -0.0039811343885958195, -0.03534573316574097, 0.04083346202969551, -0.049462124705314636, -0.026194045320153236, -0.02555175870656967, -0.025919219478964806, 0.08391035348176956, -0.014914309605956078, -0.05555816739797592, 0.056586429476737976, 0.010482246056199074, 0.020059935748577118, -0.010832001455128193, 0.012922094203531742, 0.0124970106408, 0.035321783274412155, 0.0035621607676148415, 0.026547446846961975, -0.0317254476249218, 0.03233250230550766, -0.017827149480581284, -0.009764323942363262, -0.05894114077091217, -0.031127793714404106, 0.012316172011196613, -0.02887091413140297, 
-0.04860110953450203, -0.02247508242726326, 0.008307523094117641, -0.07274939864873886, 0.01793665997684002, -0.0004808412923011929, 0.06278204917907715, -0.00035824760561808944, 0.0535513199865818, 0.049769457429647446, -0.0905897468328476, -0.07754891365766525, -0.006993803661316633, -0.005104719661176205, -0.029252367094159126, 0.024552688002586365, -0.046478405594825745, 0.04393654689192772, 0.018892405554652214, -0.05117754638195038, 0.052933961153030396, -0.016686277464032173, -0.009523441083729267, 0.014690752141177654, -0.030630838125944138, 0.00929953157901764, -0.006144703831523657, 0.008730385452508926, 0.005642009899020195, 0.008334859274327755, -0.017376074567437172, -0.04243287816643715, 0.06394188851118088, 0.0005083686555735767, 0.012362301349639893, -0.0076433587819337845, 0.03501686453819275, -0.05928255617618561, 0.0293326023966074, 0.027897877618670464, 0.05359531566500664, -0.09746351838111877, -0.0428154356777668, -0.0030792669858783484, -0.030750954523682594, -0.017249159514904022, -0.09774590283632278, 0.003080026712268591, -0.032788779586553574, -0.048432860523462296, -0.006401875987648964, 0.05935097485780716, -0.05838523060083389, 0.03460827097296715, 0.012104760855436325, 0.033294230699539185, 0.0034700653050094843, -0.06990322470664978, -0.007960968650877476, -0.0005259870667941868, 0.05518404394388199, -0.043537311255931854, -0.08248536288738251, -0.08272276818752289, -0.08141064643859863, -0.05888890102505684, 2.45047926902771, 0.021380750462412834, -0.03541437163949013, -0.011724271811544895, -0.0075055803172290325, -0.054347362369298935, -0.05494547635316849, 0.05644000321626663, 0.0020361223723739386, -0.05477894842624664, 0.021001845598220825, 0.03689528629183769, 0.011772021651268005, -0.052495479583740234, -0.035718295723199844, 0.012685157358646393, -0.013880845159292221, 0.043528664857149124, 0.0044511230662465096, 0.0053535387851297855, -0.021547934040427208, 0.029770473018288612, -0.04571501538157463, -0.05976163595914841, 
-0.00859000999480486, 0.020589597523212433, -0.028627701103687286, 0.05886236950755119, 0.014256313443183899, -0.032165542244911194, 0.01051384024322033, -0.0075911870226264, -0.019449882209300995, -0.005164731293916702, 0.029473166912794113, 0.014231467619538307, -0.002796831773594022, 0.03397424519062042, 0.016431564465165138, -0.0058226203545928, 0.012438229285180569, -0.007458977866917849, -0.0036106721963733435, -0.023416901007294655, 0.007608114276081324, 0.0331648550927639, -0.003738521132618189, 0.05441484600305557, 0.013743729330599308, -0.027098162099719048, -0.03515302762389183, 0.016407350078225136, 0.018741033971309662, 0.004434184171259403, -0.010119606740772724, -0.027706919237971306, 0.045854631811380386, -0.02652665041387081, -0.015669364482164383, 0.04051104187965393, 0.014299261383712292, -0.051518477499485016, 0.0247501190751791, -0.0769663080573082, -0.03227701783180237, -0.0682896077632904, -0.006666714325547218, 0.04629950597882271, -0.04621082916855812, 0.0123417554423213, 0.008207732811570168, 0.003075622720643878, -0.0018046770710498095, -0.01756148971617222, -0.09142625331878662, -0.008598546497523785, -5.038573362980969e-06, 0.02628733590245247, -0.06391257792711258, 0.022308045998215675, 0.024979427456855774, -0.0027987626381218433, 0.019262349233031273, 0.015299794264137745, -0.05245068296790123, 0.014805704355239868, -0.013463210314512253, 0.022786477580666542, 0.01999025233089924, -0.016625698655843735, -0.02650211565196514, -0.020000098273158073, -0.13169878721237183, -0.009055743925273418, -0.03594351187348366, -0.0012482318561524153, -0.011181912384927273, -0.05377069488167763, -0.024881837889552116, 0.006542439106851816, -0.030912261456251144, -0.001976432278752327, -0.02869308926165104, 0.07431095838546753, -0.08398990333080292, -0.05648504197597504, -0.03853749856352806, -0.039883531630039215, 0.03985270857810974, 0.010380552150309086, 0.0511120930314064, -0.06727979332208633, -0.045582376420497894, 0.036217015236616135, 
2.9765227736788802e-05, -0.01794726960361004, 0.0512266606092453, -0.05094507336616516, 0.03755544126033783, 0.04006528481841087, -0.00048634508857503533, -0.008994167670607567, 0.03841407224535942, 0.04628074914216995, 0.00605130847543478, -0.015114928595721722, 0.001103266142308712, -0.04150160774588585, -0.020136136561632156, -0.044954150915145874, 0.053435299545526505, 0.036948833614587784, -0.019517654553055763, 0.027120642364025116, 0.038925446569919586, -0.003229887457564473, -0.004076499026268721, 0.07427597045898438, 0.023801051080226898, -0.016691233962774277, 0.07389700412750244, 0.0049940189346671104, 0.014642499387264252, -0.014922719448804855, -0.02269822359085083, 0.028411969542503357, 0.015860455110669136, -0.0031095293816179037, -0.026218971237540245, -0.020904041826725006, 0.026014165952801704, 0.028709586709737778, -0.0011511467164382339, 0.005598618648946285, 0.0762813463807106, -0.031927745789289474, -0.04980437457561493, -0.023837530985474586, -0.039071787148714066, 0.00042603458859957755, 0.0011782472720369697, -0.01679646037518978, -0.031536996364593506, 0.011054012924432755, -0.018133563920855522, 0.036679286509752274, 0.014095216989517212, -0.05711539834737778, 0.04295315593481064, 0.04036344960331917, 0.028301406651735306, 0.04286765307188034, -0.014873746782541275, -0.049614034593105316, -0.005986377131193876, 0.010808815248310566, 0.00981096737086773, 0.008440791629254818, -0.059120651334524155, 0.029024597257375717, -0.016045887023210526, 0.05206911265850067, 0.07946864515542984, 0.03045222908258438, 0.016964105889201164, -0.017693987116217613, -0.018403902649879456, -0.060654621571302414, -0.05516315996646881, 0.029268890619277954, -0.029849201440811157, 0.03935197740793228, -0.03141409903764725, -0.012519449926912785, -0.056619949638843536, -0.030109399929642677, -0.0008069179602898657, 0.06630870699882507, -0.02657996118068695, -0.019648756831884384, -0.044789258390665054, 0.035150978714227676, -0.06873136013746262, 
-0.024617673829197884, -0.011287517845630646, -0.06314080953598022, 0.08604546636343002, -0.023259015753865242, 0.05435391142964363, 0.023080874234437943, -0.02119659073650837, -0.02811380848288536, -0.006150928325951099, -0.05723372846841812, 0.019368156790733337, 0.03942255675792694, 0.0029540536925196648, -0.03450785577297211, -0.0603058822453022, 0.07363598793745041, -0.028946802020072937, -0.1019098237156868, -0.06409598141908646, -0.0001573656190885231, -0.01745186559855938, -0.016582688316702843, 0.0019188413862138987, 0.06132078543305397, 0.018504027277231216, 0.053272560238838196, -0.01765173301100731, 0.000961085082963109, 0.02714916318655014, -0.04547201097011566, 0.03736450523138046, -0.050277117639780045, 0.07904423773288727, -0.12237613648176193, -0.04073382914066315, -0.06591887772083282, -0.005742086097598076, 0.0017023910768330097, -0.037875909358263016, 0.018116453662514687, 0.05403118208050728, -0.10028786957263947, -0.06842974573373795, -0.10418018698692322, 0.018444741144776344, 0.05776314437389374, -0.0315215028822422, -0.04068070277571678, 0.06567439436912537, -0.05020696669816971, -0.03296293690800667, -0.10668456554412842, -0.04050064831972122, -0.05788307636976242, 0.011562791652977467, 0.06537213176488876, 0.0017488849116489291, 0.03379515931010246, -0.0542314313352108, 0.03217236325144768, -0.03804267197847366, -0.03753761947154999, -0.002307848772034049, -0.04246191307902336, -0.03515712544322014, -0.021012013778090477, -0.04304037243127823, 0.06375716626644135, 0.026737023144960403, -0.09157785028219223, 0.061818331480026245, -0.02742522954940796, -0.002763529308140278, 0.041928499937057495, -0.05356898531317711, 0.0539068877696991, -0.018068518489599228, 0.025555135682225227, -0.006111381109803915, -0.021271517500281334, -0.07312703132629395, 0.01476153265684843, 0.012981124222278595, 0.003919216804206371, 0.0145395677536726, 0.00764697277918458, -0.04976822808384895, 0.05506889894604683, -0.10115853697061539, 0.08098918944597244, 
0.06652805209159851, -0.05758703872561455, -0.004396950826048851, 0.028381312265992165, 0.0681462362408638, 0.0003248592547606677, -0.013774129562079906, -0.02230236865580082, -0.02134030871093273, 0.044915515929460526, ...]
# Check the number of node embeddings
len(emb_dict)
85466
# Check embedding dimension
len(emb_dict[0])
2560
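The check above inspects the embedding of a single node only. As a minimal sketch, the same idea can be extended to confirm that every entry shares one dimension. The helper name `embedding_dims` and the toy dictionary are hypothetical stand-ins for illustration; with the real data one would pass `emb_dict` instead.

```python
def embedding_dims(emb_dict):
    """Return the set of distinct embedding lengths found in the dictionary."""
    return {len(vector) for vector in emb_dict.values()}

# Toy stand-in for emb_dict: node index -> embedding (list of floats)
toy_emb_dict = {
    0: [0.1, 0.2, 0.3],
    1: [0.0, 0.5, 0.5],
    2: [1.0, 0.0, 0.0],
}

dims = embedding_dims(toy_emb_dict)
# A single distinct length means all embeddings have the same dimension
assert len(dims) == 1, f"Inconsistent embedding dimensions: {dims}"
print(dims)  # {3}
```

If the returned set has more than one element, some nodes carry embeddings of a different size and should be inspected before further use.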
# Cross-check the node embeddings with the node types
primekg_nodes[primekg_nodes.node_index == 0]
| | node_index | node_name | node_source | node_id | node_type |
---|---|---|---|---|---|
0 | 0 | PHYHIP | NCBI | 9796 | gene/protein |
Check Triplet Splits of BioBridge-PrimeKG¶
Lastly, the BioBridge-PrimeKG splits consist of train and test dataframes for both triplets and nodes.
# Get all triplets of BioBridge PrimeKG
triplets = biobridge_data.get_primekg_triplets()
triplets.head()
| | head_index | head_name | head_source | head_id | head_type | tail_index | tail_name | tail_source | tail_id | tail_type | display_relation | relation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | PHYHIP | NCBI | 9796 | 1 | 8889 | KIF15 | NCBI | 56992 | 1 | 3 | protein_protein |
1 | 1 | GPANK1 | NCBI | 7918 | 1 | 2798 | PNMA1 | NCBI | 9240 | 1 | 3 | protein_protein |
2 | 2 | ZRSR2 | NCBI | 8233 | 1 | 5646 | TTC33 | NCBI | 23548 | 1 | 3 | protein_protein |
3 | 3 | NRF1 | NCBI | 4899 | 1 | 11592 | MAN1B1 | NCBI | 11253 | 1 | 3 | protein_protein |
4 | 4 | PI4KA | NCBI | 5297 | 1 | 2122 | RGS20 | NCBI | 8601 | 1 | 3 | protein_protein |
triplets
| | head_index | head_name | head_source | head_id | head_type | tail_index | tail_name | tail_source | tail_id | tail_type | display_relation | relation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | PHYHIP | NCBI | 9796 | 1 | 8889 | KIF15 | NCBI | 56992 | 1 | 3 | protein_protein |
1 | 1 | GPANK1 | NCBI | 7918 | 1 | 2798 | PNMA1 | NCBI | 9240 | 1 | 3 | protein_protein |
2 | 2 | ZRSR2 | NCBI | 8233 | 1 | 5646 | TTC33 | NCBI | 23548 | 1 | 3 | protein_protein |
3 | 3 | NRF1 | NCBI | 4899 | 1 | 11592 | MAN1B1 | NCBI | 11253 | 1 | 3 | protein_protein |
4 | 4 | PI4KA | NCBI | 5297 | 1 | 2122 | RGS20 | NCBI | 8601 | 1 | 3 | protein_protein |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3904605 | 52855 | B cell receptor transport into membrane raft | GO | 32597 | 0 | 34572 | CD24 | NCBI | 100133941 | 1 | 2 | bioprocess_protein |
3904606 | 113352 | chemokine receptor transport out of membrane raft | GO | 32600 | 0 | 34572 | CD24 | NCBI | 100133941 | 1 | 2 | bioprocess_protein |
3904607 | 42264 | negative regulation of cytoskeleton organization | GO | 51494 | 0 | 57675 | IQCJ-SCHIP1 | NCBI | 100505385 | 1 | 2 | bioprocess_protein |
3904608 | 109904 | mesendoderm migration | GO | 90133 | 0 | 58770 | APELA | NCBI | 100506013 | 1 | 2 | bioprocess_protein |
3904609 | 44810 | regulation of endoplasmic reticulum unfolded p... | GO | 1900101 | 0 | 57692 | PIGBOS1 | NCBI | 101928527 | 1 | 2 | bioprocess_protein |
3904610 rows × 12 columns
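To get a feel for how the triplets are distributed over relation types, one option is to count the rows per value of the `relation` column. The sketch below uses a small toy dataframe with a few of the columns shown above rather than the real `triplets` dataframe, so the counts are illustrative only.

```python
import pandas as pd

# Toy stand-in for the triplets dataframe (only a few of its columns)
toy_triplets = pd.DataFrame(
    {
        "head_index": [0, 1, 2, 52855],
        "tail_index": [8889, 2798, 5646, 34572],
        "relation": [
            "protein_protein",
            "protein_protein",
            "protein_protein",
            "bioprocess_protein",
        ],
    }
)

# Count the number of triplets for each relation type
relation_counts = toy_triplets["relation"].value_counts()
print(relation_counts)
```

On the real data, the same call would be `triplets["relation"].value_counts()`.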
Finally, we can check the number of samples in each split as follows.
# Check the number of samples in each split of the biobridge primekg dataframes
biobridge_splits = biobridge_data.get_train_test_split()
list(biobridge_splits.keys())
['train', 'node_train', 'test', 'node_test']
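Because the splits are returned as a dictionary of dataframes, the size of every split can be collected in one pass. This is a minimal sketch on a toy dictionary that reuses the four keys listed above; `toy_splits` and its contents are made up for illustration.

```python
import pandas as pd

# Toy stand-in for biobridge_splits: same keys, tiny dataframes
toy_splits = {
    "train": pd.DataFrame({"head_index": [0, 1, 2], "tail_index": [8889, 2798, 5646]}),
    "node_train": pd.DataFrame({"node_index": [0, 1, 2], "node_type": [1, 1, 1]}),
    "test": pd.DataFrame({"head_index": [8], "tail_index": [1785]}),
    "node_test": pd.DataFrame({"node_index": [8], "node_type": [1]}),
}

# Map each split name to its number of rows
split_sizes = {name: len(df) for name, df in toy_splits.items()}
print(split_sizes)  # {'train': 3, 'node_train': 3, 'test': 1, 'node_test': 1}
```

Replacing `toy_splits` with `biobridge_splits` should give the row counts of the actual dataframes inspected in the cells that follow.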
# Check dataframe of training triplets
biobridge_splits["train"]
| | head_index | head_name | head_source | head_id | head_type | tail_index | tail_name | tail_source | tail_id | tail_type | display_relation | relation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | PHYHIP | NCBI | 9796 | 1 | 8889 | KIF15 | NCBI | 56992 | 1 | 3 | protein_protein |
1 | 1 | GPANK1 | NCBI | 7918 | 1 | 2798 | PNMA1 | NCBI | 9240 | 1 | 3 | protein_protein |
2 | 2 | ZRSR2 | NCBI | 8233 | 1 | 5646 | TTC33 | NCBI | 23548 | 1 | 3 | protein_protein |
3 | 3 | NRF1 | NCBI | 4899 | 1 | 11592 | MAN1B1 | NCBI | 11253 | 1 | 3 | protein_protein |
4 | 4 | PI4KA | NCBI | 5297 | 1 | 2122 | RGS20 | NCBI | 8601 | 1 | 3 | protein_protein |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3768281 | 124473 | longitudinal sarcoplasmic reticulum | GO | 14801 | 7 | 58744 | DHRS7C | NCBI | 201140 | 1 | 2 | cellcomp_protein |
3768282 | 55747 | myofilament | GO | 36379 | 7 | 57367 | MYBPHL | NCBI | 343263 | 1 | 2 | cellcomp_protein |
3768285 | 126945 | lateral wall of outer hair cell | GO | 120249 | 7 | 22033 | SLC26A5 | NCBI | 375611 | 1 | 2 | cellcomp_protein |
3768286 | 125456 | Swi5-Swi2 complex | GO | 34974 | 7 | 57415 | SWI5 | NCBI | 375757 | 1 | 2 | cellcomp_protein |
3768287 | 55667 | SUMO ligase complex | GO | 106068 | 7 | 35398 | SUMO4 | NCBI | 387082 | 1 | 2 | cellcomp_protein |
3510930 rows × 12 columns
# Check dataframe of training nodes
biobridge_splits["node_train"]
| | node_index | node_type |
---|---|---|
0 | 0 | 1 |
1 | 1 | 1 |
2 | 2 | 1 |
3 | 3 | 1 |
4 | 4 | 1 |
... | ... | ... |
76481 | 127431 | 7 |
76482 | 127432 | 7 |
76483 | 127433 | 7 |
76484 | 127239 | 7 |
76485 | 127316 | 7 |
76486 rows × 2 columns
# Check dataframe of testing triplets
biobridge_splits["test"]
| | head_index | head_name | head_source | head_id | head_type | tail_index | tail_name | tail_source | tail_id | tail_type | display_relation | relation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | 8 | MT1A | NCBI | 4489 | 1 | 1785 | TP53 | NCBI | 7157 | 1 | 3 | protein_protein |
12 | 12 | CD7 | NCBI | 924 | 1 | 7681 | SFXN5 | NCBI | 94097 | 1 | 3 | protein_protein |
16 | 16 | SNRPD2 | NCBI | 6633 | 1 | 3235 | PRPF4 | NCBI | 9128 | 1 | 3 | protein_protein |
18 | 19 | VAV3 | NCBI | 10451 | 1 | 3005 | ZRANB1 | NCBI | 54764 | 1 | 3 | protein_protein |
29 | 16 | SNRPD2 | NCBI | 6633 | 1 | 216 | NCSTN | NCBI | 23385 | 1 | 3 | protein_protein |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3768273 | 125342 | myosin V complex | GO | 31475 | 7 | 9639 | DYNLL2 | NCBI | 140735 | 1 | 2 | cellcomp_protein |
3768277 | 55608 | extracellular membrane-bounded organelle | GO | 65010 | 7 | 57129 | PHOSPHO1 | NCBI | 162466 | 1 | 2 | cellcomp_protein |
3768283 | 124243 | axonemal outer doublet | GO | 97545 | 7 | 59351 | CFAP100 | NCBI | 348807 | 1 | 2 | cellcomp_protein |
3768284 | 124243 | axonemal outer doublet | GO | 97545 | 7 | 59352 | CFAP73 | NCBI | 387885 | 1 | 2 | cellcomp_protein |
3768288 | 126258 | uropod membrane | GO | 31259 | 7 | 57434 | SCIMP | NCBI | 388325 | 1 | 2 | cellcomp_protein |
393680 rows × 12 columns
# Check dataframe of testing nodes
biobridge_splits["node_test"]
| | node_index | node_type |
---|---|---|
0 | 8 | 1 |
1 | 12 | 1 |
2 | 16 | 1 |
3 | 19 | 1 |
4 | 34 | 1 |
... | ... | ... |
8490 | 127404 | 7 |
8491 | 127415 | 7 |
8492 | 127421 | 7 |
8493 | 127425 | 7 |
8494 | 127434 | 7 |
8495 rows × 2 columns
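As a quick sanity check on the row counts reported above, the train and test triplets should add up to the full triplets dataframe. The sketch below hard-codes the three counts shown in the outputs of this tutorial; the variable names are chosen here for illustration.

```python
# Row counts copied from the dataframe outputs shown in this tutorial
n_total = 3904610  # all triplets
n_train = 3510930  # training triplets
n_test = 393680    # testing triplets

# The two splits together should cover every triplet exactly once
assert n_train + n_test == n_total

# Share of triplets held out for testing
test_fraction = n_test / n_total
print(f"test fraction: {test_fraction:.4f}")  # test fraction: 0.1008
```

The counts are consistent with roughly 10% of the triplets being held out for testing.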