PyG2DataFrame
In [1]:
import pickle
import pandas as pd
In [2]:
# Load the knowledge graph
pyg_file = "../../../aiagents4pharma/talk2knowledgegraphs/tests/files/primekg_ibd_pyg_graph.pkl"
with open(pyg_file, "rb") as f:
    pyg_data = pickle.load(f)
In [3]:
pyg_data
Out[3]:
Data(x=[3426, 768], edge_index=[2, 12752], edge_attr=[12752, 768], node_id=[3426], node_name=[3426], node_type=[3426], enriched_node=[3426], key=[12752], head_id=[12752], head_name=[12752], tail_id=[12752], tail_name=[12752], edge_type=[12752], enriched_edge=[12752])
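The sizes in this output show which attributes describe nodes (first dimension 3426) and which describe edges (first dimension 12752). A minimal sketch of that grouping follows. It works on a plain dictionary of sizes copied from the output above rather than on the PyG object itself, and the names `sizes`, `node_level`, and `edge_level` are illustrative, not part of the original notebook.

```python
# Sizes copied by hand from the Data(...) output above.
sizes = {
    "x": (3426, 768),
    "edge_index": (2, 12752),
    "edge_attr": (12752, 768),
    "node_id": (3426,),
    "node_name": (3426,),
    "node_type": (3426,),
    "enriched_node": (3426,),
    "key": (12752,),
    "head_id": (12752,),
    "head_name": (12752,),
    "tail_id": (12752,),
    "tail_name": (12752,),
    "edge_type": (12752,),
    "enriched_edge": (12752,),
}

# x has one row per node; edge_index has one column per edge.
num_nodes = sizes["x"][0]
num_edges = sizes["edge_index"][1]

# edge_index is skipped because its first dimension is always 2.
node_level = [k for k, s in sizes.items() if k != "edge_index" and s[0] == num_nodes]
edge_level = [k for k, s in sizes.items() if k != "edge_index" and s[0] == num_edges]

print("node-level:", node_level)
print("edge-level:", edge_level)
```

Attributes in the same group have the same length, so they can sit side by side as columns of one DataFrame.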
In [4]:
# Convert the node attributes of the PyG data to a pandas DataFrame
df_nodes = pd.DataFrame({
    "node_id": pyg_data.node_id,
    "node_name": pyg_data.node_name,
    "node_type": pyg_data.node_type,
    "enriched_node": pyg_data.enriched_node,
    # One 768-dimensional embedding (as a Python list) per node
    "embedded_node": pyg_data.x.tolist(),
})
df_nodes
Out[4]:
| | node_id | node_name | node_type | enriched_node | embedded_node |
---|---|---|---|---|---|
0 | SMAD3_(144) | SMAD3 | gene/protein | SMAD3 belongs to gene/protein category. The SM... | [0.02653600461781025, 0.05420931056141853, -0.... |
1 | IL10RB_(179) | IL10RB | gene/protein | IL10RB belongs to gene/protein category. The p... | [0.02476494573056698, 0.02278200164437294, -0.... |
2 | GNA12_(192) | GNA12 | gene/protein | GNA12 belongs to gene/protein category. Predic... | [0.00479594711214304, 0.04921527951955795, -0.... |
3 | HNF4A_(279) | HNF4A | gene/protein | HNF4A belongs to gene/protein category. The pr... | [0.013905026949942112, 0.032602787017822266, -... |
4 | VCAM1_(417) | VCAM1 | gene/protein | VCAM1 belongs to gene/protein category. This g... | [0.04729974642395973, 0.03262118622660637, -0.... |
... | ... | ... | ... | ... | ... |
3421 | IRAK2 mediated activation of TAK1 complex upon... | IRAK2 mediated activation of TAK1 complex upon... | pathway | IRAK2 mediated activation of TAK1 complex upon... | [-0.014931154437363148, 0.03044624999165535, -... |
3422 | TRAF6 mediated IRF7 activation in TLR7/8 or 9 ... | TRAF6 mediated IRF7 activation in TLR7/8 or 9 ... | pathway | TRAF6 mediated IRF7 activation in TLR7/8 or 9 ... | [0.03156436234712601, 0.05540117993950844, -0.... |
3423 | Antigen processing: Ubiquitination & Proteasom... | Antigen processing: Ubiquitination & Proteasom... | pathway | Antigen processing: Ubiquitination & Proteasom... | [0.04519890621304512, 0.029452601447701454, -0... |
3424 | Antigen Presentation: Folding, assembly and pe... | Antigen Presentation: Folding, assembly and pe... | pathway | Antigen Presentation: Folding, assembly and pe... | [0.014839296229183674, 0.04876236990094185, -0... |
3425 | Kinesins_(129367) | Kinesins | pathway | Kinesins belongs to pathway category. This pat... | [0.038248274475336075, 0.07633280754089355, -0... |
3426 rows × 5 columns
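The `node_id` values in the table appear to follow a `<node_name>_(<number>)` pattern, for example `SMAD3_(144)`. If the name and the numeric suffix are needed separately, a small helper along these lines could split them. This is a sketch under that assumption: the helper `split_node_id` and the pattern are hypothetical, and the meaning of the number (presumably an index in the source knowledge graph) is not confirmed by the notebook.

```python
import re

# Assumed format: "<name>_(<integer>)", inferred from the rows shown above.
NODE_ID_PATTERN = re.compile(r"^(?P<name>.*)_\((?P<index>\d+)\)$")


def split_node_id(node_id):
    """Return (name, index), or (node_id, None) if the format does not match."""
    match = NODE_ID_PATTERN.match(node_id)
    if match is None:
        return node_id, None
    return match.group("name"), int(match.group("index"))


# Identifiers copied from the tables in this notebook.
examples = ["SMAD3_(144)", "Crohn disease_(37784)", "Kinesins_(129367)"]
parsed = [split_node_id(value) for value in examples]
print(parsed)
```

In the notebook itself, the same helper could be applied column-wise with `df_nodes["node_id"].map(split_node_id)`.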
In [5]:
# Convert the edge attributes of the PyG data to a pandas DataFrame
df_edges = pd.DataFrame({
    "head_id": pyg_data.head_id,
    "head_name": pyg_data.head_name,
    "edge_type": pyg_data.edge_type,
    "tail_id": pyg_data.tail_id,
    "tail_name": pyg_data.tail_name,
    "enriched_edge": pyg_data.enriched_edge,
    # One 768-dimensional embedding (as a Python list) per edge
    "embedded_edge": pyg_data.edge_attr.tolist(),
})
df_edges
Out[5]:
| | head_id | head_name | edge_type | tail_id | tail_name | enriched_edge | embedded_edge |
---|---|---|---|---|---|---|---|
0 | SMAD3_(144) | SMAD3 | (gene/protein, associated with, disease) | Crohn disease_(37784) | Crohn disease | SMAD3 (gene/protein) has a direct relationship... | [0.052218832075595856, 0.011464782059192657, -... |
1 | SMAD3_(144) | SMAD3 | (gene/protein, associated with, disease) | inflammatory bowel disease_(28158) | inflammatory bowel disease | SMAD3 (gene/protein) has a direct relationship... | [0.04878539964556694, 0.027767326682806015, -0... |
2 | SMAD3_(144) | SMAD3 | (gene/protein, associated with, disease) | Crohn's colitis_(83770) | Crohn's colitis | SMAD3 (gene/protein) has a direct relationship... | [0.04968055710196495, 0.013924038037657738, -0... |
3 | SMAD3_(144) | SMAD3 | (gene/protein, associated with, disease) | Crohn ileitis and jejunitis_(35814) | Crohn ileitis and jejunitis | SMAD3 (gene/protein) has a direct relationship... | [0.03398257866501808, 0.014872003346681595, -0... |
4 | SMAD3_(144) | SMAD3 | (gene/protein, interacts with, pathway) | Signaling by NODAL_(62373) | Signaling by NODAL | SMAD3 (gene/protein) has a direct relationship... | [0.01159461960196495, 0.01849970780313015, -0.... |
... | ... | ... | ... | ... | ... | ... | ... |
12747 | IRAK2 mediated activation of TAK1 complex upon... | IRAK2 mediated activation of TAK1 complex upon... | (pathway, interacts with, gene/protein) | TLR4_(3259) | TLR4 | IRAK2 mediated activation of TAK1 complex upon... | [-0.00019741167488973588, 0.006676936056464910... |
12748 | TRAF6 mediated IRF7 activation in TLR7/8 or 9 ... | TRAF6 mediated IRF7 activation in TLR7/8 or 9 ... | (pathway, interacts with, gene/protein) | TLR9_(10113) | TLR9 | TRAF6 mediated IRF7 activation in TLR7/8 or 9 ... | [0.03718600049614906, 0.01651887036859989, -0.... |
12749 | Antigen processing: Ubiquitination & Proteasom... | Antigen processing: Ubiquitination & Proteasom... | (pathway, interacts with, gene/protein) | HERC2_(1777) | HERC2 | Antigen processing: Ubiquitination & Proteasom... | [0.057375308126211166, 0.009233011864125729, -... |
12750 | Antigen Presentation: Folding, assembly and pe... | Antigen Presentation: Folding, assembly and pe... | (pathway, interacts with, gene/protein) | ERAP2_(12763) | ERAP2 | Antigen Presentation: Folding, assembly and pe... | [0.008740102872252464, 0.007800932973623276, -... |
12751 | Kinesins_(129367) | Kinesins | (pathway, interacts with, gene/protein) | KIF21B_(8564) | KIF21B | Kinesins (pathway) has a direct relationship o... | [0.01051196176558733, 0.04535209387540817, -0.... |
12752 rows × 7 columns
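Once the edges are in a DataFrame, ordinary pandas filtering can answer questions such as "which relations start at SMAD3?". The sketch below uses a small stand-in DataFrame whose rows are copied from the preview above (embeddings and IDs omitted), so it runs without the pickle file. It assumes `edge_type` holds `(head type, relation, tail type)` tuples; if the real column holds strings, the same code still works but the keys would be strings. The names `toy_edges` and `edges_from` are illustrative only.

```python
from collections import Counter

import pandas as pd

# Stand-in for df_edges, with rows copied from the preview above.
# Assumption: edge_type values are tuples, not strings.
toy_edges = pd.DataFrame({
    "head_name": ["SMAD3", "SMAD3", "SMAD3", "Kinesins"],
    "edge_type": [
        ("gene/protein", "associated with", "disease"),
        ("gene/protein", "associated with", "disease"),
        ("gene/protein", "interacts with", "pathway"),
        ("pathway", "interacts with", "gene/protein"),
    ],
    "tail_name": [
        "Crohn disease",
        "inflammatory bowel disease",
        "Signaling by NODAL",
        "KIF21B",
    ],
})


def edges_from(df, head_name):
    """Return the rows whose head node has the given name."""
    return df[df["head_name"] == head_name].reset_index(drop=True)


smad3_edges = edges_from(toy_edges, "SMAD3")

# Count how often each edge type occurs among the selected edges.
type_counts = Counter(smad3_edges["edge_type"])
print(smad3_edges[["head_name", "tail_name"]])
print(type_counts)
```

With the real data, `edges_from(df_edges, "SMAD3")` would select the corresponding rows of the full edge table.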