BioBridge-PrimeKG (IBD) Multimodal Data Dump
In this tutorial, we prepare a Milvus database for storing and searching the nodes and edges of a graph.
In particular, we use the PrimeKG multimodal data from the BioBridge project.
# Load necessary libraries
import os
import glob
import hydra
import cudf
import cupy as cp
import numpy as np
from pymilvus import (
db,
connections,
FieldSchema,
CollectionSchema,
DataType,
Collection,
utility,
MilvusClient
)
from tqdm import tqdm
import time
import pickle
from langchain_openai import OpenAIEmbeddings
Set up OpenAI API Key for Re-Embedding
os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"
emb_model = OpenAIEmbeddings(model="text-embedding-ada-002",
openai_api_key=os.environ["OPENAI_API_KEY"])
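Loading IBD BioBridge-PrimeKG Multimodal Data
First, we load the IBD subset of the BioBridge-PrimeKG multimodal graph. It is stored as a pickled PyG graph object that holds the node and edge attributes together with their enrichment and embeddings.
Later in this tutorial, the nodes and the edges are each written to a separate folder containing their enrichment and embedding files.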
# Load pickle of the graph data
with open('../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal_pyg_graph.pkl', 'rb') as f:
    graph = pickle.load(f)
def normalize_matrix(m, axis=1):
    """
    Normalize each row (axis=1) or column (axis=0) of a 2D matrix to unit L2 norm using CuPy.
    Parameters:
        m (cupy.ndarray): 2D matrix to normalize.
        axis (int): Axis along which the L2 norm is computed (default: 1, i.e. rows).
    Returns:
        cupy.ndarray: Normalized matrix.
    """
    norms = cp.linalg.norm(m, axis=axis, keepdims=True)
    return m / norms
def normalize_vector(v):
    """
    Normalize a vector to unit L2 norm using CuPy.
    Parameters:
        v (array-like): Vector to normalize; converted to a cupy.ndarray.
    Returns:
        cupy.ndarray: Normalized vector.
    """
    v = cp.asarray(v)
    norm = cp.linalg.norm(v)
    return v / norm
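The two helpers divide by the L2 norm without guarding against all-zero vectors, which would turn such rows into NaN. Below is a minimal, stdlib-only sketch of the same normalization. It uses pure Python instead of CuPy, for illustration only, and adds an assumed epsilon guard. It also checks that, once vectors have unit length, their inner product equals their cosine similarity. That is presumably why the node vectors are normalized before being indexed with the `IP` metric later on.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Return v scaled to unit L2 norm; (near-)zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm < eps:  # assumed guard: avoid dividing by zero
        return list(v)
    return [x / norm for x in v]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))

a, b = [3.0, 4.0], [1.0, 2.0]
a_n, b_n = l2_normalize(a), l2_normalize(b)
print(a_n)  # [0.6, 0.8]
# For unit vectors, inner product and cosine similarity coincide
print(math.isclose(inner_product(a_n, b_n), cosine(a, b)))  # True
print(l2_normalize([0.0, 0.0]))  # [0.0, 0.0]
```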
Nodes Preprocessing (including re-embedding)
# Convert the list of embeddings to a 2D CuPy array (N x D)
graph_desc_x_cp = cp.asarray(graph['desc_x'].tolist())
# Normalize all rows (vectors) using broadcasting
graph_desc_x_normalized = normalize_matrix(graph_desc_x_cp, axis=1)
graph_x_normalized = [normalize_vector(v).tolist() for v in graph['x']]
# Convert the graph nodes to a cudf DataFrame
nodes_df = cudf.DataFrame({
'node_id': graph['node_id'],
'node_name': graph['node_name'],
'node_type': graph['node_type'],
'desc': graph['desc'],
'desc_emb': graph_desc_x_normalized.tolist(),
'feat': graph['enriched_node'],
'feat_emb': graph_x_normalized,
})
nodes_df.reset_index(inplace=True)
nodes_df.rename(columns={'index': 'node_index'}, inplace=True)
nodes_df.head(3)
| | node_index | node_id | node_name | node_type | desc | desc_emb | feat | feat_emb |
---|---|---|---|---|---|---|---|---|
0 | 0 | SMAD3_(144) | SMAD3 | gene/protein | SMAD3 belongs to gene/protein node. SMAD3 is S... | [0.02974936784975063, 0.05350021171537046, -0.... | MSSILPFTPPIVKRLLGWKKGEQNGQEEKWCEKAVKSLVKKLKKTG... | [-0.0010794274069904548, -0.0028632148270051, ... |
1 | 1 | IL10RB_(179) | IL10RB | gene/protein | IL10RB belongs to gene/protein node. IL10RB is... | [0.02842173040130417, 0.01986006372730412, -0.... | MAWSLGSWLGGCLLVSALGMVPPPENVRMNSVNFKNILQWESPAFA... | [-0.007157766077247574, 0.006195289622587354, ... |
2 | 2 | GNA12_(192) | GNA12 | gene/protein | GNA12 belongs to gene/protein node. GNA12 is G... | [0.003668847841835145, 0.051380571197126614, -... | MSGVVRTLSRCLLPAEAGGARERRAGSGARDAEREARRRSRDIDAL... | [-0.001562959383761, -0.01338132129666802, -0.... |
Optional: Nodes Re-embedding using OpenAI
You may skip this section if you want to keep 'nomic-embed-text' as the default embedding model.
# Checking embedding dimensions before proceeding
emb_dim = len(nodes_df['desc_emb'].iloc[0])
print(f"Embedding dimension: {emb_dim}")
Embedding dimension: 768
# For textual data, we will re-embed the descriptions using OpenAI embeddings
mini_batch_size = 100
desc_embeddings = []
# Convert the column to a Python list once, outside the loop
desc_texts = nodes_df['desc'].to_pandas().tolist()
for i in tqdm(range(0, len(desc_texts), mini_batch_size), desc="Re-embedding descriptions"):
    batch = desc_texts[i:i+mini_batch_size]
    embeddings = emb_model.embed_documents(batch)
    desc_embeddings.extend(embeddings)
nodes_df['desc_emb'] = desc_embeddings
Re-embedding descriptions: 100%|██████████| 30/30 [00:27<00:00, 1.07it/s]
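The mini-batching pattern used for re-embedding can be factored into a small helper. This is a minimal sketch that assumes any callable mapping a list of strings to a list of vectors, which is what `emb_model.embed_documents` does. `fake_embed` is a hypothetical stand-in so the example runs without an API key.

```python
from typing import Callable, List

def embed_in_batches(texts: List[str],
                     embed: Callable[[List[str]], List[List[float]]],
                     batch_size: int = 100) -> List[List[float]]:
    """Embed texts in fixed-size mini-batches, preserving input order."""
    out: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        out.extend(embed(texts[start:start + batch_size]))
    # Guard against a model returning fewer/more vectors than inputs
    if len(out) != len(texts):
        raise ValueError("embedding model returned a different number of vectors")
    return out

# Hypothetical stand-in for emb_model.embed_documents: one 2-d vector per text
calls = []
def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t)), 1.0] for t in batch]

vectors = embed_in_batches([f"node {i}" for i in range(250)], fake_embed, batch_size=100)
print(len(vectors), calls)  # 250 [100, 100, 50]
```

In the notebook, this would be called as `embed_in_batches(nodes_df['desc'].to_pandas().tolist(), emb_model.embed_documents)`. Converting the column to a list once avoids a GPU-to-host copy per batch. The length check catches a silent mismatch between the inputs and the returned vectors.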
# Checking embeddings dimensions after re-embedding
emb_dim = len(nodes_df['desc_emb'].iloc[0])
print(f"Embedding dimension after re-embedding: {emb_dim}")
Embedding dimension after re-embedding: 1536
# Get the text-based nodes for re-embedding
text_based_df = nodes_df[nodes_df.node_type.isin(['disease', 'biological_process', 'cellular_component', 'molecular_function'])]
for nt, text_based_df_ in text_based_df.groupby("node_type"):
    print(f"Re-embedding {nt} nodes")
    # Checking embedding dimensions before proceeding
    emb_dim = len(text_based_df_['feat_emb'].iloc[0])
    print(f"Embedding dimension: {emb_dim}")
    print('---')
Re-embedding biological_process nodes
Embedding dimension: 768
---
Re-embedding cellular_component nodes
Embedding dimension: 768
---
Re-embedding disease nodes
Embedding dimension: 768
---
Re-embedding molecular_function nodes
Embedding dimension: 768
---
# Update textual pre-loaded embeddings with OpenAI embeddings
# Since there are many node records, we split them into mini-batches
mini_batch_size = 100
text_node_indexes = []
text_node_embeddings = []
# Convert to pandas once, outside the loop
text_based_pdf = text_based_df.to_pandas()
text_feats = text_based_pdf['feat'].tolist()
text_indexes = text_based_pdf['node_index'].tolist()
for i in tqdm(range(0, len(text_feats), mini_batch_size), desc="Re-embedding text nodes"):
    outputs = emb_model.embed_documents(text_feats[i:i+mini_batch_size])
    text_node_indexes.extend(text_indexes[i:i+mini_batch_size])
    text_node_embeddings.extend(outputs)
dic_text_embeddings = dict(zip(text_node_indexes, text_node_embeddings))
Re-embedding text nodes: 100%|██████████| 22/22 [00:58<00:00, 2.65s/it]
# Replace the embeddings of the nodes with the updated embeddings for text-based nodes
nodes_df["feat_emb"] = nodes_df.to_pandas().apply(lambda x: dic_text_embeddings[x["node_index"]] if x["node_index"] in dic_text_embeddings else x["feat_emb"], axis=1)
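The `apply` call keeps each node's existing `feat_emb` unless its `node_index` is a key of `dic_text_embeddings`. In plain Python, that rule is a `dict.get` with the old value as the default. Here is a sketch with made-up toy vectors; the helper name is hypothetical.

```python
def replace_embeddings(node_indexes, old_embeddings, updates):
    """Return a new list where vectors found in `updates` (index -> vector) replace the old ones."""
    return [updates.get(idx, old) for idx, old in zip(node_indexes, old_embeddings)]

node_indexes = [0, 1, 2, 3]
old = [[0.1], [0.2], [0.3], [0.4]]
updates = {1: [9.0, 9.0], 3: [8.0, 8.0]}  # e.g. re-embedded text-based nodes
print(replace_embeddings(node_indexes, old, updates))
# [[0.1], [9.0, 9.0], [0.3], [8.0, 8.0]]
```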
# Get the text-based nodes again to verify the updated embedding dimensions
text_based_df = nodes_df[nodes_df.node_type.isin(['disease', 'biological_process', 'cellular_component', 'molecular_function'])]
for nt, text_based_df_ in text_based_df.groupby("node_type"):
    print(f"Re-embedding {nt} nodes")
    # Checking embedding dimensions after re-embedding
    emb_dim = len(text_based_df_['feat_emb'].iloc[0])
    print(f"Embedding dimension: {emb_dim}")
    print('---')
Re-embedding biological_process nodes
Embedding dimension: 1536
---
Re-embedding cellular_component nodes
Embedding dimension: 1536
---
Re-embedding disease nodes
Embedding dimension: 1536
---
Re-embedding molecular_function nodes
Embedding dimension: 1536
---
Edges Preprocessing (including re-embedding)
# Convert the list of edge embeddings to a 2D CuPy array (M x D)
graph_edge_attr_cp = cp.asarray(graph['edge_attr'].tolist())
# Normalize all rows (vectors) using broadcasting
graph_edge_attr_normalized = normalize_matrix(graph_edge_attr_cp, axis=1)
# Convert the graph edges to a cudf DataFrame
edges_df = cudf.DataFrame({
'triplet_index': graph['triplet_index'],
'head_id': graph['head_id'],
'head_name': graph['head_name'],
'tail_id': graph['tail_id'],
'tail_name': graph['tail_name'],
'display_relation': graph['display_relation'],
'edge_type': graph['edge_type'],
'edge_type_str': ['|'.join(e) for e in graph['edge_type']],
'feat': graph['enriched_edge'],
'edge_emb': graph_edge_attr_normalized.tolist(),
})
edges_df = edges_df.merge(
nodes_df[['node_index', 'node_id']],
left_on='head_id',
right_on='node_id',
how='left'
)
edges_df.rename(columns={'node_index': 'head_index'}, inplace=True)
edges_df.drop(columns=['node_id'], inplace=True)
edges_df = edges_df.merge(
nodes_df[['node_index', 'node_id']],
left_on='tail_id',
right_on='node_id',
how='left'
)
edges_df.rename(columns={'node_index': 'tail_index'}, inplace=True)
edges_df.drop(columns=['node_id'], inplace=True)
edges_df.head(3)
| | triplet_index | head_id | head_name | tail_id | tail_name | display_relation | edge_type | edge_type_str | feat | edge_emb | head_index | tail_index |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8602 | cytokine-mediated signaling pathway_(47242) | cytokine-mediated signaling pathway | IL10RB_(179) | IL10RB | interacts with | [biological_process, interacts with, gene/prot... | biological_process\|interacts with\|gene/protein | cytokine-mediated signaling pathway (biologica... | [0.016838406414606846, 0.019238545922865967, -... | 1455 | 1 |
1 | 8603 | cytokine-mediated signaling pathway_(47242) | cytokine-mediated signaling pathway | IL12B_(6168) | IL12B | interacts with | [biological_process, interacts with, gene/prot... | biological_process\|interacts with\|gene/protein | cytokine-mediated signaling pathway (biologica... | [0.018197947379867397, 0.03141968316046658, -0... | 1455 | 59 |
2 | 8604 | cytokine-mediated signaling pathway_(47242) | cytokine-mediated signaling pathway | IRF5_(3646) | IRF5 | interacts with | [biological_process, interacts with, gene/prot... | biological_process\|interacts with\|gene/protein | cytokine-mediated signaling pathway (biologica... | [0.018029207941198132, 0.019414354880667273, -... | 1455 | 46 |
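The two left merges translate `head_id` and `tail_id` into the positional `head_index` and `tail_index`. Below is a stdlib sketch of the same lookup. It uses IDs and indexes from the table above plus one hypothetical unknown ID. It also reports IDs with no matching node, which would show up as nulls after a left merge.

```python
def map_ids_to_indexes(edge_ids, node_id_to_index):
    """Map node IDs to node indexes; unmatched IDs become None, like a left merge."""
    mapped = [node_id_to_index.get(i) for i in edge_ids]
    missing = sorted({i for i, m in zip(edge_ids, mapped) if m is None})
    return mapped, missing

node_id_to_index = {"SMAD3_(144)": 0, "IL10RB_(179)": 1, "IL12B_(6168)": 59}
tail_ids = ["IL10RB_(179)", "IL12B_(6168)", "UNKNOWN_(0)"]  # last ID is hypothetical
print(map_ids_to_indexes(tail_ids, node_id_to_index))
# ([1, 59, None], ['UNKNOWN_(0)'])
```

In the notebook, `edges_df['head_index'].isna().sum()` and `edges_df['tail_index'].isna().sum()` would be the cuDF counterpart of the `missing` list.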
Optional: Edges Re-embedding using OpenAI
# Checking embedding dimensions before re-embedding
emb_dim = len(edges_df['edge_emb'].iloc[0])
print(f"Embedding dimension before re-embedding: {emb_dim}")
Embedding dimension before re-embedding: 768
# For textual data, we re-embed the edge features ('feat') using OpenAI embeddings
mini_batch_size = 100
edge_embeddings = []
# Convert the column to a Python list once, outside the loop
edge_texts = edges_df['feat'].to_pandas().tolist()
for i in tqdm(range(0, len(edge_texts), mini_batch_size), desc="Re-embedding edges"):
    batch = edge_texts[i:i+mini_batch_size]
    embeddings = emb_model.embed_documents(batch)
    edge_embeddings.extend(embeddings)
edges_df['edge_emb'] = edge_embeddings
Re-embedding edges: 100%|██████████| 113/113 [01:37<00:00, 1.16it/s]
# Checking embeddings dimensions after re-embedding
emb_dim = len(edges_df['edge_emb'].iloc[0])
print(f"Embedding dimension after re-embedding: {emb_dim}")
Embedding dimension after re-embedding: 1536
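A Milvus `FLOAT_VECTOR` field is declared with one fixed `dim`, so every vector written to it must have that length. The re-embedding sections are optional, so 768- and 1536-dimensional vectors can differ between columns or collections if only some of them are run. Here is a minimal sketch of a pre-insert check; the helper names are hypothetical.

```python
from collections import Counter

def embedding_dims(vectors):
    """Count how many vectors have each dimension."""
    return Counter(len(v) for v in vectors)

def check_single_dim(vectors, expected=None):
    """Return the common dimension, or raise if vectors disagree (or differ from `expected`)."""
    dims = embedding_dims(vectors)
    if not dims:
        raise ValueError("no vectors given")
    if len(dims) != 1:
        raise ValueError(f"mixed embedding dimensions: {dict(dims)}")
    (dim,) = dims  # the single key of the Counter
    if expected is not None and dim != expected:
        raise ValueError(f"expected dim {expected}, got {dim}")
    return dim

print(check_single_dim([[0.0] * 1536, [0.1] * 1536]))  # 1536
try:
    check_single_dim([[0.0] * 768, [0.0] * 1536])
except ValueError as err:
    print(err)  # mixed embedding dimensions: {768: 1, 1536: 1}
```

For example, `check_single_dim(edges_df['edge_emb'].to_pandas().tolist(), expected=1536)` could run before an edge collection is built.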
Storing dataframes
# Store the DataFrame into compressed parquet files
storage_path = "../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/"
os.makedirs(storage_path, exist_ok=True)
# Nodes enrichment & embeddings
nodes_enrichment = nodes_df[['node_index', 'node_id', 'node_name', 'node_type', 'desc', 'feat']].to_pandas().copy()
os.makedirs(os.path.join(storage_path, 'nodes', 'enrichment'), exist_ok=True)
for nt, nodes_df_ in nodes_enrichment.groupby('node_type'):
    print(nt, nodes_df_.shape)
    # Replace '/' (e.g. in 'gene/protein') so the node type is a valid file name
    nodes_df_.to_parquet(os.path.join(storage_path, 'nodes', 'enrichment', f"{nt.replace('/', '_')}.parquet.gzip"),
                         compression='gzip',
                         index=False)
print("Nodes enrichment saved.")
print('---')
nodes_embeddings = nodes_df[['node_index', 'node_id', 'node_type', 'desc_emb', 'feat_emb']].to_pandas().copy()
os.makedirs(os.path.join(storage_path, 'nodes', 'embedding'), exist_ok=True)
for nt, nodes_df_ in nodes_embeddings.groupby('node_type'):
    print(nt, nodes_df_.shape)
    nodes_df_[['node_index', 'node_id', 'desc_emb', 'feat_emb']].to_parquet(os.path.join(storage_path, 'nodes', 'embedding', f"{nt.replace('/', '_')}.parquet.gzip"),
                                                                             compression='gzip',
                                                                             index=False)
print("Nodes embeddings saved.")
biological_process (1615, 6)
cellular_component (202, 6)
disease (7, 6)
drug (748, 6)
gene/protein (102, 6)
molecular_function (317, 6)
Nodes enrichment saved.
---
biological_process (1615, 5)
cellular_component (202, 5)
disease (7, 5)
drug (748, 5)
gene/protein (102, 5)
molecular_function (317, 5)
Nodes embeddings saved.
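Node types such as `gene/protein` contain a `/`, which would otherwise be read as a directory separator in the output path. The code therefore replaces it with `_`, and the file list printed in the loading section shows `gene_protein.parquet.gzip`. Below is a small stdlib sketch of that mapping. It also checks that two node types do not collapse onto the same file name. The helper names are hypothetical.

```python
def node_type_to_filename(node_type, suffix=".parquet.gzip"):
    """Turn a node type into a flat file name by replacing '/' with '_'."""
    return node_type.replace("/", "_") + suffix

def filenames_for(node_types):
    """Map each node type to its file name, refusing collisions (e.g. 'a/b' vs 'a_b')."""
    names = {nt: node_type_to_filename(nt) for nt in node_types}
    if len(set(names.values())) != len(names):
        raise ValueError("two node types map to the same file name")
    return names

types = ["biological_process", "cellular_component", "disease", "drug",
         "gene/protein", "molecular_function"]
print(filenames_for(types)["gene/protein"])  # gene_protein.parquet.gzip
```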
# Edges enrichment & embeddings
edges_enrichment = edges_df[['triplet_index', 'head_id', 'head_index', 'tail_id', 'tail_index',
'edge_type', 'edge_type_str', 'display_relation', 'feat']].to_pandas().copy()
os.makedirs(os.path.join(storage_path, 'edges', 'enrichment'), exist_ok=True)
edges_enrichment.to_parquet(os.path.join(storage_path, 'edges', 'enrichment', 'edges.parquet.gzip'),
                            compression='gzip',
                            index=False)
print("Edges enrichment saved.")
print('---')
edges_embeddings = edges_df[['triplet_index', 'head_index', 'tail_index', 'edge_emb']].to_pandas().copy()
os.makedirs(os.path.join(storage_path, 'edges', 'embedding'), exist_ok=True)
# Split the edge embeddings into files of chunk_size rows each (edges_0, edges_1, ...)
chunk_size = 1000
for i in range(0, edges_embeddings.shape[0], chunk_size):
    et = f'edges_{i // chunk_size}'
    edges_embeddings_chunk = edges_embeddings.iloc[i:i + chunk_size]
    # Save each chunk to a separate parquet file
    edges_embeddings_chunk.to_parquet(os.path.join(storage_path, 'edges', 'embedding', f'{et}.parquet.gzip'),
                                      compression='gzip',
                                      index=False)
print("Edges embeddings saved.")
Edges enrichment saved.
---
Edges embeddings saved.
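The edge embeddings are written as files of `chunk_size = 1000` rows named `edges_<i // chunk_size>`. The loading code below reads them back in groups of 5 files, which is why 12 files yield `len(edges_embedding_df) == 3`. Below is a sketch of that arithmetic. It uses a hypothetical row count that produces 12 files, and the helper names are hypothetical.

```python
import math

def chunk_file_names(n_rows, rows_per_file=1000, prefix="edges"):
    """Names produced by the writer loop: one file per block of rows_per_file rows."""
    return [f"{prefix}_{i // rows_per_file}" for i in range(0, n_rows, rows_per_file)]

def group_count(n_files, files_per_group=5):
    """Number of DataFrames produced when files are concatenated files_per_group at a time."""
    return math.ceil(n_files / files_per_group)

names = chunk_file_names(11_250)  # hypothetical row count between 11,001 and 12,000
print(len(names), names[-1])      # 12 edges_11
print(group_count(len(names)))    # 3
```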
Loading dataframes
# Set storage path for the dataframes
storage_path = "../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/"
# Loop over nodes and edges
graph_dict = {}
for element in ["nodes", "edges"]:
    # Make an empty dictionary for each folder
    graph_dict[element] = {}
    for stage in ["enrichment", "embedding"]:
        print(element, stage)
        # Create the file pattern for the current subfolder
        file_list = glob.glob(os.path.join(storage_path,
                                           element,
                                           stage, '*.parquet.gzip'))
        print(file_list)
        # Read and concatenate all dataframes in the folder,
        # except the edges embedding, which is too large to read in one go;
        # for it, we read the files in groups of chunk_size instead
        if element == "edges" and stage == "embedding":
            # For edges embedding, only read two columns: triplet_index and edge_emb
            chunk_size = 5
            graph_dict[element][stage] = []
            for i in range(0, len(file_list), chunk_size):
                chunk_files = file_list[i:i+chunk_size]
                chunk_df = cudf.concat([cudf.read_parquet(f, columns=["triplet_index", "edge_emb"]) for f in chunk_files], ignore_index=True)
                graph_dict[element][stage].append(chunk_df)
        else:
            # For nodes enrichment, nodes embedding and edges enrichment, read and concatenate
            # all dataframes in the folder; these are small enough to read in one go
            graph_dict[element][stage] = cudf.concat([cudf.read_parquet(f) for f in file_list], ignore_index=True)
nodes enrichment
['../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/enrichment/gene_protein.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/enrichment/disease.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/enrichment/biological_process.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/enrichment/cellular_component.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/enrichment/molecular_function.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/enrichment/drug.parquet.gzip']
nodes embedding
['../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/embedding/gene_protein.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/embedding/disease.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/embedding/biological_process.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/embedding/cellular_component.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/embedding/molecular_function.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/nodes/embedding/drug.parquet.gzip']
edges enrichment
['../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/enrichment/edges.parquet.gzip']
edges embedding
['../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_3.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_5.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_0.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_9.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_8.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_7.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_11.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_1.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_2.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_4.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_6.parquet.gzip', '../../../aiagents4pharma/talk2knowledgegraphs/tests/files/biobridge_multimodal/edges/embedding/edges_10.parquet.gzip']
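As the printed lists show, `glob.glob` returns files in no particular order (`edges_3` comes before `edges_0`). This is likely harmless here, because the edge embedding chunks keep their `triplet_index` column and the node files are merged on `node_id`. If a deterministic order is wanted, though, plain `sorted()` would still place `edges_10` before `edges_2`. Below is a sketch of a numeric sort key for the edge embedding files. It assumes their names follow the `edges_<n>.parquet.gzip` pattern used when saving, and the helper name is hypothetical.

```python
import os
import re

def chunk_number(path):
    """Extract <n> from .../edges_<n>.parquet.gzip; raise if the name does not match."""
    match = re.search(r"_(\d+)\.parquet\.gzip$", os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))

files = ["emb/edges_3.parquet.gzip", "emb/edges_10.parquet.gzip",
         "emb/edges_0.parquet.gzip", "emb/edges_2.parquet.gzip"]
print([os.path.basename(f) for f in sorted(files)])
# ['edges_0.parquet.gzip', 'edges_10.parquet.gzip', 'edges_2.parquet.gzip', 'edges_3.parquet.gzip']
print([os.path.basename(f) for f in sorted(files, key=chunk_number)])
# ['edges_0.parquet.gzip', 'edges_2.parquet.gzip', 'edges_3.parquet.gzip', 'edges_10.parquet.gzip']
```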
# Get nodes enrichment and embedding dataframes
nodes_enrichment_df = graph_dict['nodes']['enrichment']
nodes_embedding_df = graph_dict['nodes']['embedding']
# Get edges enrichment and embedding dataframes
edges_enrichment_df = graph_dict['edges']['enrichment']
edges_embedding_df = graph_dict['edges']['embedding'] # a list of DataFrames, one per group of files
# Merge nodes enrichment and embedding dataframes
merged_nodes_df = nodes_enrichment_df.merge(
nodes_embedding_df[["node_id", "desc_emb", "feat_emb"]],
on="node_id",
how="left"
)
# del nodes_enrichment_df, nodes_embedding_df # Free memory
# Check dataframe
nodes_enrichment_df.head(3)
| | node_index | node_id | node_name | node_type | desc | feat |
---|---|---|---|---|---|---|
0 | 0 | SMAD3_(144) | SMAD3 | gene/protein | SMAD3 belongs to gene/protein node. SMAD3 is S... | MSSILPFTPPIVKRLLGWKKGEQNGQEEKWCEKAVKSLVKKLKKTG... |
1 | 1 | IL10RB_(179) | IL10RB | gene/protein | IL10RB belongs to gene/protein node. IL10RB is... | MAWSLGSWLGGCLLVSALGMVPPPENVRMNSVNFKNILQWESPAFA... |
2 | 2 | GNA12_(192) | GNA12 | gene/protein | GNA12 belongs to gene/protein node. GNA12 is G... | MSGVVRTLSRCLLPAEAGGARERRAGSGARDAEREARRRSRDIDAL... |
# Check dataframe
nodes_embedding_df.head(3)
| | node_index | node_id | desc_emb | feat_emb |
---|---|---|---|---|
0 | 0 | SMAD3_(144) | [-0.03699171170592308, -0.005479035433381796, ... | [-0.0010794274069904548, -0.0028632148270051, ... |
1 | 1 | IL10RB_(179) | [-0.02927332930266857, -0.0068625640124082565,... | [-0.007157766077247574, 0.006195289622587354, ... |
2 | 2 | GNA12_(192) | [-0.02188265137374401, -0.01718498021364212, -... | [-0.001562959383761, -0.01338132129666802, -0.... |
# Check dataframe
edges_enrichment_df.head(3)
| | triplet_index | head_id | head_index | tail_id | tail_index | edge_type | edge_type_str | display_relation | feat |
---|---|---|---|---|---|---|---|---|---|
0 | 8602 | cytokine-mediated signaling pathway_(47242) | 1455 | IL10RB_(179) | 1 | [biological_process, interacts with, gene/prot... | biological_process\|interacts with\|gene/protein | interacts with | cytokine-mediated signaling pathway (biologica... |
1 | 8603 | cytokine-mediated signaling pathway_(47242) | 1455 | IL12B_(6168) | 59 | [biological_process, interacts with, gene/prot... | biological_process\|interacts with\|gene/protein | interacts with | cytokine-mediated signaling pathway (biologica... |
2 | 8604 | cytokine-mediated signaling pathway_(47242) | 1455 | IRF5_(3646) | 46 | [biological_process, interacts with, gene/prot... | biological_process\|interacts with\|gene/protein | interacts with | cytokine-mediated signaling pathway (biologica... |
# Check dataframes
len(edges_embedding_df)
3
# Check the first chunk of edges embedding
edges_embedding_df[0].head(3)
| | triplet_index | edge_emb |
---|---|---|
0 | 7408 | [-0.030603667721152306, -0.020534928888082504,... |
1 | 7071 | [-0.028667431324720383, -0.004508184734731913,... |
2 | 7072 | [-0.04058527573943138, -0.010810465551912785, ... |
Set up Milvus Database
# Configuration for Milvus
milvus_host = "localhost"
milvus_port = "19530"
milvus_uri = "http://localhost:19530"
milvus_token = "root:Milvus"
milvus_user = "root"
milvus_password = "Milvus"
milvus_database = "t2kg_primekg"
# Connect to Milvus
connections.connect(
alias="default",
host=milvus_host,
port=milvus_port,
user=milvus_user,
password=milvus_password
)
# Check if the database exists, create if it doesn't
if milvus_database not in db.list_database():
db.create_database(milvus_database)
# Switch to the desired database
db.using_database(milvus_database)
# List all existing collections in the database and drop them
# (destructive: this starts the tutorial from an empty database)
for coll in utility.list_collections():
    print(f"Collection: {coll}")
    # Print the number of entities in the collection
    collection = Collection(name=coll)
    print(collection.num_entities)
    # Drop the collection
    if utility.has_collection(coll):
        print(f"Dropping collection: {coll}")
        utility.drop_collection(coll)
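The cell above drops every collection in the `t2kg_primekg` database, not only the ones this tutorial creates. If other collections might live there, a more conservative option is to drop only names that carry the tutorial's prefix. Below is a sketch of that filter. The helper and the example collection names are hypothetical. The commented usage relies only on the pymilvus calls already used above (`utility.list_collections`, `utility.drop_collection`) and needs a running Milvus.

```python
def collections_to_drop(existing, prefix):
    """Select only collections whose name starts with `prefix`."""
    return [name for name in existing if name.startswith(prefix)]

# Usage with pymilvus (requires a running Milvus, so shown as a comment):
# for name in collections_to_drop(utility.list_collections(), f"{milvus_database}_"):
#     utility.drop_collection(name)

print(collections_to_drop(["t2kg_primekg_nodes", "t2kg_primekg_edges", "other_coll"],
                          "t2kg_primekg_"))
# ['t2kg_primekg_nodes', 't2kg_primekg_edges']
```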
# Utility: generator that splits a list into consecutive chunks of chunk_size items
def chunked(data_list, chunk_size):
    for i in range(0, len(data_list), chunk_size):
        yield data_list[i:i + chunk_size]
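The `chunked` helper is not used by the insert loop below, which slices each column by hand. Below is a sketch of how the same generator could batch column-oriented data, which is the list-of-equally-long-columns shape that `collection.insert` receives in this tutorial. It repeats `chunked` so the sketch is self-contained. `column_batches` is a hypothetical helper.

```python
def chunked(data_list, chunk_size):
    """Yield consecutive slices of data_list of length chunk_size (the last may be shorter)."""
    for i in range(0, len(data_list), chunk_size):
        yield data_list[i:i + chunk_size]

def column_batches(columns, batch_size):
    """Yield column-oriented batches: each batch holds the same row range of every column."""
    if not columns:
        return
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # Chunk the row positions, then slice every column with the same range
    for row_range in chunked(range(n_rows), batch_size):
        yield [col[row_range.start:row_range.stop] for col in columns]

ids = list(range(7))
names = [f"n{i}" for i in ids]
for batch in column_batches([ids, names], batch_size=3):
    print(batch)
# [[0, 1, 2], ['n0', 'n1', 'n2']]
# [[3, 4, 5], ['n3', 'n4', 'n5']]
# [[6], ['n6']]
```

With the `data` list built below, this would read `for batch in column_batches(data, batch_size): collection.insert(batch)`.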
Building Node Collection (Description Embedding)
%%time
# Configuration for Milvus collection
node_coll_name = f"{milvus_database}_nodes"
# Define schema for the collection
# Leave out the feat and feat_emb fields for now
desc_emb_dim = len(merged_nodes_df.iloc[0]['desc_emb'].to_arrow().to_pylist()[0])
node_fields = [
FieldSchema(name="node_index", dtype=DataType.INT64, is_primary=True),
FieldSchema(name="node_id", dtype=DataType.VARCHAR, max_length=1024),
FieldSchema(name="node_name", dtype=DataType.VARCHAR, max_length=1024,
enable_analyzer=True, enable_match=True),
FieldSchema(name="node_type", dtype=DataType.VARCHAR, max_length=1024,
enable_analyzer=True, enable_match=True),
FieldSchema(name="desc", dtype=DataType.VARCHAR, max_length=40960,
enable_analyzer=True, enable_match=True),
FieldSchema(name="desc_emb", dtype=DataType.FLOAT_VECTOR, dim=desc_emb_dim),
]
schema = CollectionSchema(fields=node_fields, description=f"Schema for collection {node_coll_name}")
# Create collection if it doesn't exist
if not utility.has_collection(node_coll_name):
    collection = Collection(name=node_coll_name, schema=schema)
else:
    collection = Collection(name=node_coll_name)
# Create indexes
collection.create_index(
field_name="node_index",
index_params={"index_type": "STL_SORT"}, # STL_SORT
index_name="node_index_index"
)
# Create index for node_name, node_type, desc fields (inverted)
collection.create_index(
field_name="node_name",
index_params={"index_type": "INVERTED"},
index_name="node_name_index"
)
collection.create_index(
field_name="node_type",
index_params={"index_type": "INVERTED"},
index_name="node_type_index"
)
collection.create_index(
field_name="desc",
index_params={"index_type": "INVERTED"},
index_name="desc_index"
)
collection.create_index(
field_name="desc_emb",
index_params={"index_type": "GPU_CAGRA", "metric_type": "IP"}, # GPU_CAGRA requires a GPU-enabled Milvus; use "AUTOINDEX" otherwise
index_name="desc_emb_index"
)
# Prepare data for insertion
# Normalize the embeddings
graph_desc_emb_cp = cp.asarray(merged_nodes_df["desc_emb"].list.leaves).astype(cp.float32).reshape(merged_nodes_df.shape[0], -1)
graph_desc_emb_norm = normalize_matrix(graph_desc_emb_cp, axis=1)
data = [
merged_nodes_df["node_index"].to_arrow().to_pylist(),
merged_nodes_df["node_id"].to_arrow().to_pylist(),
merged_nodes_df["node_name"].to_arrow().to_pylist(),
merged_nodes_df["node_type"].to_arrow().to_pylist(),
merged_nodes_df["desc"].to_arrow().to_pylist(),
graph_desc_emb_norm.tolist(), # Use normalized embeddings
]
# Insert data in batches
batch_size = 500
total = len(data[0])
for i in tqdm(range(0, total, batch_size)):
    batch = [
        col[i:i+batch_size] for col in data
    ]
    collection.insert(batch)
# Flush to persist data
collection.flush()
# Get collection stats
print(collection.num_entities)
100%|██████████| 6/6 [00:04<00:00, 1.46it/s]
2991
CPU times: user 1.3 s, sys: 137 ms, total: 1.44 s
Wall time: 8.61 s
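Each `VARCHAR` field in the schema above has a `max_length`: 1024 for the ID, name and type fields, and 40960 for `desc`. Values longer than this are not accepted on insert. Below is a stdlib sketch of a pre-insert check. It measures UTF-8 bytes, which is the conservative choice because a string's byte count is never smaller than its character count. The limits dictionary mirrors the schema above, and the helper name is hypothetical.

```python
MAX_LENGTHS = {"node_id": 1024, "node_name": 1024, "node_type": 1024, "desc": 40960}

def too_long(rows, max_lengths=MAX_LENGTHS):
    """Return (row_number, field, byte_length) for every value longer than its limit in UTF-8 bytes."""
    problems = []
    for row_number, row in enumerate(rows):
        for field, limit in max_lengths.items():
            n_bytes = len(str(row.get(field, "")).encode("utf-8"))
            if n_bytes > limit:
                problems.append((row_number, field, n_bytes))
    return problems

rows = [
    {"node_id": "SMAD3_(144)", "node_name": "SMAD3", "node_type": "gene/protein", "desc": "short"},
    # 600 characters, but 1200 bytes in UTF-8 (each "\u00e9" takes 2 bytes)
    {"node_id": "X", "node_name": "\u00e9" * 600, "node_type": "disease", "desc": ""},
]
print(too_long(rows))  # [(1, 'node_name', 1200)]
```

In the notebook, the rows could come from `merged_nodes_df[list(MAX_LENGTHS)].to_pandas().to_dict("records")`.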
# List all collections
for coll in utility.list_collections():
    print(f"Collection: {coll}")
    # Print the number of entities in the collection
    collection = Collection(name=coll)
    print(collection.num_entities)
Collection: t2kg_primekg_nodes
2991
%%time
# Assume node_coll_name is defined and collection exists
collection = Collection(node_coll_name)
# Load the collection into memory before query
collection.load()
# Query by expr on node_index
expr = "node_index == 0"
output_fields = ["node_index", "node_id", "node_name", "node_type", "desc", "desc_emb"]
results = collection.query(expr, output_fields=output_fields)
print(results)
data: ['{\'node_index\': 0, \'node_id\': \'SMAD3_(144)\', \'node_name\': \'SMAD3\', \'node_type\': \'gene/protein\', \'desc\': "SMAD3 belongs to gene/protein node. SMAD3 is SMAD family member 3. The SMAD family of proteins are a group of intracellular signal transducer proteins similar to the gene products of the Drosophila gene \'mothers against decapentaplegic\' (Mad) and the C. elegans gene Sma. The SMAD3 protein functions in the transforming growth factor-beta signaling pathway, and transmits signals from the cell surface to the nucleus, regulating gene activity and cell proliferation. This protein forms a complex with other SMAD proteins and binds DNA, functioning both as a transcription factor and tumor suppressor. Mutations in this gene are associated with aneurysms-osteoarthritis syndrome and Loeys-Dietz Syndrome 3. [provided by RefSeq, May 2022].", \'desc_emb\': [-0.036991715, -0.005479036, -0.03023007, -0.012918158, -0.02741491, 0.025599528, -0.024547132, 0.011050156, -0.028256828, -0.0019304885, 0.034308106, 0.0019403548, 0.0078074615, 0.0057158247, -0.0024862853, 0.012885272, 0.021034762, -0.024205104, 0.01377323, 0.020732198, -0.008090293, 0.019416703, -0.02427088, -0.034781683, 0.008794083, 0.034229174, 0.02398147, -0.039701633, -0.015391288, -0.007932434, 0.020653268, 0.0048706196, -0.0089059, 0.021876678, -0.00043699093, 0.011405339, 0.004081323, -0.008070561, 0.012543242, -0.013056286, 0.019035209, 0.0015465285, 0.00945183, -0.014259963, -0.010320056, -0.0123327635, 0.0027411622, -0.032440104, -0.026822938, 0.010024071, -0.0035649908, 0.016035883, -0.031045677, -0.01640422, -0.0077022216, -0.0028102258, 0.016890954, 0.01015562, 0.00947814, -0.010214817, 0.00562374, -0.012477468, -0.029388152, -0.002177144, 0.007261531, 0.011102776, 2.7954264e-05, -0.004939683, -0.011997312, 0.022323947, 0.048673306, 0.03480799, -0.011359298, -0.02307378, -0.0044200625, -0.020074451, -0.03572884, 0.015246584, -0.036465514, -0.0035814345, 0.0044101966, -0.023560511, 
0.0060479874, 0.005923015, 0.014154724, 0.0093137035, -0.0008501385, 0.0013985354, -0.0011683238, 0.0073536155, 0.016391065, 0.025652148, -0.0024994402, 0.0052093593, -0.028309448, 0.0027345847, 0.0020538163, 0.032098074, -0.0074062357, -0.015891178, -6.4387306e-05, 0.020942677, -0.015720163, -0.0017874285, -0.041595947, -0.020127071, -0.0057059587, -0.0016838333, 0.026651924, -0.025967866, -0.015141345, 0.04509516, -0.0013080952, -0.018561631, 0.01205651, -0.023968315, 0.029888041, 0.011517157, -0.007787729, 0.0044726823, 0.0065084104, 0.055513877, 0.013865314, -0.03586039, 0.015233429, -0.00037635484, -0.016154276, -0.01012931, -0.025191724, -0.0009619555, 0.014733542, 0.0066794246, 0.015259739, 0.0021738552, -0.012674792, 0.0047357813, -0.00069145695, -0.008063983, -0.01265506, -0.047252573, 0.013944245, 0.016022727, -0.03293999, -0.0070313197, -0.009649155, -0.004873908, 0.03733374, -0.00235638, -0.01645684, -0.0022790947, -0.004498992, -0.009951718, 0.009820169, -0.015036105, 0.014904556, 0.018009124, -0.009616267, 0.019798197, -0.014167879, 0.0046239644, -0.00830735, 0.028967194, -0.004364154, 0.0011321477, 0.010484493, 0.009701774, -0.011477692, -0.0063045085, -0.028914575, -0.0273886, -0.025507445, 0.025691614, -0.0077679968, 0.010392409, -0.023021158, 0.00010698672, -0.0021919433, 0.010905452, -0.018995745, -0.014615146, -0.017232982, -0.009300549, 0.028440997, -0.000105342355, -0.019469323, -0.0055086343, 0.011648706, -0.0064459243, -0.0093729, -0.0046371194, 0.0046239644, 0.011464537, -0.00958338, -0.036044557, -0.61417824, 0.0042095836, 0.009050604, -0.012780032, 0.004814711, 0.04322716, -0.004413485, -0.005663205, 0.0016205251, 0.001856492, -0.02307378, 0.03838614, 0.015088725, -0.022981694, -0.02741491, -0.03338726, -0.012043355, -0.026138881, 0.0475946, -0.016391065, -0.025704768, 0.009307126, -0.00065528083, 0.009497873, -0.004752225, 0.024783922, 0.009576802, -0.030914126, 0.01911414, 0.017193517, -0.017930195, 0.00689977, 0.021205775, 0.015207119, 
0.039859492, -0.00042095833, -0.0050942535, 0.04038569, 0.015443908, 0.010931762, -0.004643697, -0.010438452, 0.017824955, 0.0050350563, -0.021679355, 0.031729735, 0.010767325, -0.006084163, -0.003374244, -0.026993953, 0.008623068, 0.007537785, -0.028098969, -0.028519927, 0.0030009726, 0.0072549535, 0.04062248, -0.01653577, 0.036255036, 0.0028316025, -0.008537562, 0.022021383, -0.0190089, 0.007978477, 0.018535322, -0.0147861615, -0.013299652, 0.00752463, -0.0016295691, 0.00684715, 0.012760299, 0.028914575, 0.006649826, 0.011004114, -0.00012281377, 0.05314599, 0.0023810456, 0.044358484, -0.00095537805, 0.011438227, 0.0075180526, -0.0022379856, -0.02254758, -0.016206896, 0.030940436, 0.025218034, -0.023363188, 0.038991265, 0.01879842, -0.0027016974, 0.018061744, -0.0078206165, -0.0034794838, -0.044963613, -0.0035485472, -0.015509684, -0.010274014, 0.030835196, -0.014404668, -0.02382361, -0.006462368, 0.010083267, 0.004278647, -0.017390842, 0.016325291, -0.003979372, -0.039517462, 0.015272894, 0.016825179, -0.017982814, -0.0040221256, -0.00045549005, 0.0037360052, -0.022113467, -0.028283138, -0.011254058, 0.0013960688, -0.010116155, 0.018680027, -0.029782802, 0.010668662, -0.02767801, 0.009978028, 0.010168775, 0.014154724, -0.0008600047, 0.0112014385, -0.010004338, 0.0032032297, -0.010379254, 0.005475747, -0.018403772, -0.005202782, -0.04230631, 0.017127743, -0.003119367, -0.001003887, 0.0045581893, 0.028177898, -0.02358682, -0.0059822127, -0.0039563505, 0.0029828844, -0.021653045, -0.015799092, -0.0028299582, -0.0093137035, 0.013076018, 0.008669111, -0.008649378, -0.018627407, 0.0070444746, 0.005452726, 0.0076890667, 0.020955833, -0.022718595, -0.01187234, -0.004804845, -0.014299428, 0.0025158839, -0.01932462, 0.03020376, -0.02799373, -0.012609017, -0.013444357, -0.03025638, -0.016246362, 0.019429859, -0.00700501, -0.02640198, -0.009787281, -0.021008452, -0.026849248, 0.006991855, 0.0056697824, 0.016548924, 0.0190089, -0.007708799, -0.003699829, 0.0023974893, 
-0.009182154, -0.029598633, -0.0142204985, -0.010398987, 0.05259348, 0.004301668, 0.021442564, -0.0025356163, 0.0117736785, 0.014878246, -0.031913906, 0.020784818, 0.0010425296, 0.012385383, 0.0017117875, -0.020245465, 0.03520264, -0.0028414687, -0.01929831, 0.016246362, -0.005028479, -0.029282913, 0.0015095302, -0.0019518654, 0.0008912477, -0.03312416, 0.0006778909, -0.013615371, -0.012477468, 0.0006413037, -0.0036208993, 0.005061366, 0.0014363559, 0.014575682, 0.004002393, 0.019942902, -0.00015487896, -0.021771438, -0.0027724053, -0.013496976, -0.008346815, -0.0010556846, 0.018522168, -0.01898259, 0.006965545, 0.018351153, 0.019798197, 0.02799373, 0.0039760834, -0.040096283, -0.010188507, 0.0075312075, 0.014549372, 0.016391065, -0.012891849, -0.0050679436, -0.005192916, -0.01664101, 0.031677116, -0.02252127, -0.013628526, 0.017206673, 0.01887735, -0.0075312075, -0.006064431, -0.017982814, 0.03830721, -0.0028677785, 0.009629422, 0.012885272, -0.015522839, 0.010530536, -0.014509907, -0.006123628, 0.01181972, -0.02637567, 0.00040739228, 0.026691388, 0.055408638, 0.014430977, 0.028704096, -0.01312206, 0.015430753, 0.0075706723, 0.01986397, 0.034492273, 0.0032903813, -0.014062639, 0.034071315, -0.011293523, -0.029177673, 0.01898259, -0.011661861, -0.043937527, 0.007952167, -0.00044932368, 0.013589061, 0.02041648, -0.016733095, 0.02523119, -0.012977356, -0.044595275, 0.02366575, 0.024744457, -0.027704319, 0.0049857255, -0.02398147, -0.006413037, -0.00078518596, 0.029361842, -0.00656103, 0.03304523, 0.018482702, 0.022310792, 0.0030404374, 0.0053573526, 0.012780032, 0.0049331053, 0.021192621, 0.0046502743, 0.0024566865, 0.0075904047, -0.0014766429, -0.016233206, 0.038123038, 0.002200165, 0.0026605881, -0.0011617463, -0.022218706, -0.008695421, -0.0024484647, -0.036623377, -0.009682042, -0.014641456, -0.0019173336, -0.0010153976, 0.0044693938, 0.0076167146, 0.015522839, 0.0044496614, -0.018048588, -0.04283251, -0.00042465815, 0.014615146, 0.0043444214, 0.046068627, 
-0.0044332175, -0.0004195195, 0.00017471415, 0.0042589144, -0.02744122, 0.0075838272, 0.0190089, -0.01018193, -0.023455271, -0.0011642129, 0.015430753, -0.008899323, -0.0045351684, 0.022744905, 0.0032755819, -0.010201662, 0.014799316, -0.005189627, 0.0014240231, -0.0068931924, 0.019206224, 0.0356236, 0.03491323, 0.0005286645, -0.019232534, 0.028783025, 0.024073554, -0.022560736, -0.003923463, 0.021311015, 0.000589095, 0.018114364, -0.013247033, -0.008583603, 0.015733318, 0.02650722, -0.021718819, 0.023652596, 0.023008004, 0.016325291, 0.024770766, 0.0029713737, -0.02512595, -0.00689977, -0.017127743, -0.0139574, 0.006172959, -0.009846479, 0.00016299803, -0.041806426, 0.0020965699, -0.022836989, 0.04541088, -0.041332845, -0.0034202863, -0.017969659, 0.002338292, -0.026691388, -0.010602888, -0.011832875, 0.0030782577, -0.005380374, -0.00035189485, -0.014259963, 0.012872117, 0.005867107, -0.019969212, -0.0106423525, 0.009682042, -0.00749832, -0.03372929, -0.0073536155, 0.009109802, 0.026651924, 0.020192845, -0.008379702, -0.0069852774, 0.039280675, 0.00040903664, -0.033755597, 0.0043049566, -0.019956056, -0.0013730477, 0.015391288, 0.008419167, -0.0027230743, -0.013944245, -0.0007716199, 0.011602664, 0.007379926, -0.008031096, -0.020771664, -0.008583603, -0.021666199, 0.031861287, 0.0021672777, 0.0042589144, -0.0072944183, 0.001271919, -0.030466858, -0.028204208, -0.013003666, -0.0064064595, -0.004265492, 0.013450934, 2.1107011e-05, -0.011267213, -0.012405116, -0.010149042, 0.0113921845, 0.011096198, -0.017561857, 0.017114587, 0.017943349, 0.013220723, 0.0056599164, -0.0004850887, -0.006038121, -0.012786609, -0.009945141, 0.0029664407, 0.010662085, -0.019469323, 0.017824955, -0.0024287323, -0.01265506, -0.016009573, 0.008175801, 0.020455943, 0.039201744, 0.008701998, -0.012161749, -0.009484718, 0.00028365356, -0.016890954, -0.0054297047, -0.0078206165, -0.013891624, -0.0019387105, 0.01023455, 0.011642129, -0.017588167, -0.0038083573, -0.021876678, -0.028414687, 
0.0039596395, 0.027151812, 0.0119578475, -0.010241127, -0.011688171, -0.00269512, -0.010918607, -0.010859409, -0.046963163, -0.0057816, 0.0005315421, 0.028625166, 0.034413345, 0.02158727, 0.0006108829, 0.02270544, 0.023350032, 0.010958072, 0.0052685565, -0.0015021306, 0.0053869514, -0.01653577, 0.014549372, 0.0045450344, 0.0067419107, 0.023047468, -0.005636895, 0.023336878, 0.017009348, -0.0026211233, -0.008484942, 0.044148006, -0.03825459, -0.010379254, 0.007379926, 0.0076298695, 0.0089059, -0.025941556, -0.009958296, 0.008169223, 0.028493617, -0.009122957, 0.0043477104, 0.014115259, -0.022258172, 0.041359156, -0.0069852774, 0.015641233, 0.000544286, -0.0042194496, 0.0036307655, -0.016759405, 0.004489126, -0.009215041, 0.02028493, -0.0015341957, 0.015391288, -0.0045088585, 0.016746249, -0.024047244, -0.032308552, -0.011655284, -0.011497424, -0.012720834, -0.041517016, -0.02377099, -0.03864924, 0.005314599, 0.024599751, -0.009760971, 0.021389944, 0.008550717, -0.006659692, 0.029309222, 0.00593617, 0.034097627, -0.01656208, 0.026638769, 0.019706111, 0.014628301, -0.017127743, -0.0067550656, -0.0023744681, -0.00026433225, 0.0214031, -0.0028957329, -0.012582707, 0.006100607, -0.004189851, -0.007149714, -0.015614923, -0.010576578, 0.018758956, -0.010372677, -0.003673519, 0.0030108388, 0.013812695, 0.00828104, 0.00089453644, -0.0038379559, -0.00044726822, -0.012431426, -0.012398538, -0.016127966, 0.024626061, 0.009550492, 0.021074226, -0.0029368422, -0.007833771, 0.0102674365, -0.015838558, 0.011089621, -0.002168922, 0.012569552, 0.041359156, 0.028572546, -0.013358849, 0.0066958684, 0.017614476, -0.00035847232, -0.010372677, -0.005278423, 0.034439653, -0.009280816, -0.0151676545, 0.010523958, -0.0100635355, -0.0042062947, 0.0120302, -0.0025504155, -0.016825179, -0.03559729, 0.006024966, 0.021863524, -0.0025471267, -0.009931985, 0.019416703, -0.029546013, -0.02392885, -0.03262427, -0.013016821, 0.012115707, -0.061038956, -0.0049100844, -0.0039201747, 0.017614476, 
0.019535098, 0.029361842, -0.025375893, 0.020100761, 0.03304523, -0.020666422, 0.0043345555, -0.0025833028, -0.0055316556, -0.0146019915, 0.020061295, -0.01871949, -0.04270096, 0.008609913, 0.011984157, -0.015404443, -0.0040352806, -0.0042918017, -0.003020705, -0.005548099, -0.03333464, 0.03567622, -0.021600425, 0.009168999, -0.032492723, -0.010188507, 0.017509237, -0.019403549, 0.0093729, -0.005840797, 0.017298756, -0.0031390993, 0.012516933, -0.026717698, 0.0087546185, -0.0004637119, 0.002780627, -0.01401002, 0.018259067, -0.000803274, 0.022823835, -0.0031621205, 0.009905675, -0.009215041, 0.010662085, 0.013141793, -0.014536217, -0.015930643, 0.019285154, -0.029598633, 0.0016698561, -0.024257723, -0.004949549, 0.0009890876, -0.031729735, -0.03023007, 0.0012488979, -0.039385915, 0.006442636, 0.016443685, 0.002861201, -0.012556397, 0.002039017, -0.03299261, -0.02153465, -0.017903885, -0.0006992677, -0.0028628455, 0.035228953, -0.014588837, 0.007176024, -0.0066399598, -0.0001528235, -0.019337773, -0.008116603, 0.0059427475, -0.021389944, 0.0030108388, 0.006630094, -0.021981917, -0.015904333, 0.024599751, 0.018219603, -0.008287617, 0.01932462, -0.052567173, 0.019614028, -0.024231413, 0.006149938, -0.025494289, 0.008715153, 0.021284705, -0.027099192, -0.0050712326, 0.002155767, 0.009024294, -0.014246808, 0.010912029, -0.00042794688, 0.00661365, -0.004015548, 0.024152484, -0.008320505, 0.0020110628, 0.0190089, 0.025757387, 0.028388377, 0.006712312, 0.025757387, 0.021140002, -0.0013286497, -0.004587788, -0.016496304, -0.0045187245, -0.010662085, -0.0091361115, 0.0043477104, -0.025375893, -0.027257051, -0.017061967, -0.032334864, -0.005137007, -0.012661637, -0.00079011905, -0.025270654, 0.014694076, -0.00825473, -0.02033755, -0.0017348087, 0.000118291755, -0.022508116, -0.017088277, -0.005969058, 0.009991183, -0.012010467, -0.023165863, 0.004324689, -0.008123181, 0.002530683, 0.0006540476, -0.007945589, -0.015062415, 0.0025191726, 0.20111284, -0.0038642657, 
-0.0146019915, 0.014299428, 0.02509964, 0.036176108, 0.038780786, -0.0034400187, -0.004949549, 0.0016608122, 0.011602664, 0.011885495, 0.006965545, 0.0011617463, 0.0008213621, -0.016785715, -0.026415136, 0.0013154948, -0.030756267, -0.01914045, 0.002538905, -0.0053113103, -0.024336653, -0.0042490484, 0.013589061, 0.009432098, -0.04072772, -0.0032525607, 0.025507445, 0.015338669, -0.02523119, -0.00052003155, 0.018732646, -0.003926752, 0.004101055, 0.001003887, -0.009412365, -0.020758508, 0.019732421, 0.02358682, -0.008583603, 0.004939683, -0.014312583, -0.010037226, 0.011227748, 0.008951942, -0.0029351977, 0.008465209, -0.020140225, 0.002690187, -0.017325066, 0.010214817, -0.0044101966, 0.014444132, -0.008215265, 0.00016022315, 0.014115259, 0.016838335, 0.013286497, 0.010379254, -0.017995968, -0.011332988, -0.03309785, 0.014930866, -0.0072681084, 0.013233878, -0.006314375, 0.026033642, 0.022981694, -0.0452004, -0.0035584134, -0.01265506, -0.010938339, -0.011004114, -0.024481358, 0.0065511637, 0.05033083, 0.002900666, 0.019127294, 0.018232757, -0.008550717, -0.022100313, -0.028809335, 0.016575234, 0.004873908, -0.011076466, -0.0024731301, -0.007721954, 0.00066021393, -0.025033865, 0.0095110275, -0.0132799195, -0.0009759327, 0.0061466494, 0.006186114, 0.007123404, 0.011175129, 0.0015958596, 0.008169223, -0.006084163, -0.00596248, 0.06972122, 0.004189851, 0.015825402, 0.005248824, 0.017101433, -0.004804845, 0.012701102, 0.008070561, -0.03357143, 0.004239182, -0.026309896, -0.011431649, -0.00013802419, 0.013233878, 0.011543467, 0.029493393, -0.032071766, 0.015246584, -0.009011139, 0.004301668, -0.0102608595, -0.0055579655, 0.023573667, 0.0034169976, 0.015457063, -0.0065544527, -0.017798645, 0.00687346, -0.025770543, -0.0008830259, 0.00117819, 0.0004480904, 0.01382585, 0.016627854, -0.020705888, 0.008432322, 0.018548477, -0.003400554, 0.026257277, -0.022455497, 0.0023794011, 0.021705665, 0.0068076854, -0.0044101966, -0.007024742, 0.020995297, 0.02377099, -0.013733765, 
-0.009497873, -0.024441892, -0.00833366, 0.010024071, -0.011471114, 0.0112014385, -0.019311463, -0.004890352, -0.012911581, 0.010688395, 0.004337844, -0.024639217, 0.011530312, 0.008221842, -0.017982814, -0.012365651, -0.0012414982, -0.16417375, 0.006186114, 0.0166147, -0.034939542, 0.005564543, 0.024941782, 0.03343988, 0.0060151, 0.0089059, 0.0005113986, 0.03754422, 0.022218706, -0.023915695, -0.0020044853, -0.0061795367, 0.024284033, 0.0151676545, 0.045963388, -0.0050975424, 0.023021158, 0.002637567, -0.015825402, 0.011425072, -0.011497424, 0.008728308, 0.016259516, 0.013286497, -0.036333967, -0.031124607, 0.01135272, -0.0019469323, 0.0051304298, 0.023284258, 0.0006281488, 0.007176024, 0.01020824, 0.0073930807, 9.645249e-05, -0.016022727, 0.024560288, 0.0023826899, 0.030440548, -0.0050087464, 0.009767549, -0.016417375, 0.03017745, 0.005426416, -0.0010819945, 0.027204432, -0.0073601934, 0.02523119, -0.013760075, 0.0027526729, 0.037281122, 0.039596394, 0.019508788, -0.0023662462, 0.021126846, -0.0069852774, -0.024323499, -0.020377014, -0.017456615, 0.031966526, -0.01012931, 0.0056566275, -0.020034986, -0.017206673, 0.032334864, -0.02799373, 0.016785715, -0.029703872, 0.0049692816, 0.0030305712, -0.010958072, 0.004110921, 0.0113001, -0.023389498, 0.014352048, -0.00049906585, 0.0010244416, -0.013418047, 0.00440033, -0.031598184, -0.012747144, 0.002713208, 0.0071957563, 0.0117736785, 0.007051052, 0.004485837, -0.0144178225, 0.016706785, -0.007202334, 0.012477468, -0.016075347, 0.026559839, 0.011155396, -0.018680027, -0.018180138, -0.023257948, -0.006238734, 0.019127294, -0.0123261865, -0.009076914, 0.016601544, 0.011596086, 0.016825179, -0.022350257, 0.014207344, 0.040701408, -0.0136548355, 0.0042194496, -0.017561857, 0.013424624, 0.005663205, -0.0013228945, 0.026086261, -0.006001945, -0.010043803, 0.014812471, -0.012497201, 0.06745858, 0.025586374, 0.01986397, 0.0151545, -0.017219827, -0.02310009, -0.10839677, -0.01892997, 0.008438899, 0.03309785, -0.049409986, 
-0.004137231, -0.0085638715, 0.033860836, -0.014865091, 0.029730182, -0.028993504, -0.029203983, 0.0018532033, -0.008991407, 0.0048936405, -0.010135887, 0.01760132, -0.007248376, -0.017482925, 0.03830721, -0.011734214, -0.016035883, 0.016114812, -0.007136559, 0.020587493, 0.016851489, -0.03333464, 0.0062354454, -3.029235e-05, 0.0073141507, -0.0031703424, -0.0300459, 0.04848914, 0.01650946, -0.007721954, -0.022416031, 0.009655732, -0.02145572, 0.04788401, 0.0008805593, 0.029177673, 0.0002542605, -0.011865763, -0.036886476, -0.0078206165, -0.014391513, -0.039649013, 0.031492945, 0.022639666, -0.016009573, -0.033992387, -0.0013171391, -0.027730629, -0.005038345, 0.028467307, -0.009741239, 0.031545565, 0.0047226264, -0.01124748, 0.007149714, -0.009997761, 0.012457736, 0.013247033, -0.0071562915, 0.005334331, -0.029256603, -0.014352048, -0.056829374, 0.054303624, -0.01869318, 0.008767773, -0.006965545, -0.014575682, -0.01320099, -0.027072882, 0.027730629, -0.011411917, -0.025560064, 0.014457287, -0.038780786, -0.004716049, -0.01502295, -0.0041701184, -0.017338222, 0.018864196, 0.019403549, 0.008846703, -0.013878469, 0.023876231, -0.037412673, 0.012457736, 0.03025638, -0.001290007, 0.00040410354, -0.01393109, 0.012017045, -0.007873236, -0.0064064595, 0.0166147, 0.033308327, -0.030808887, -0.033360947, -0.06577474, 0.013339117, 0.0007350327, 0.0065676076, 0.027309671, -0.026888713, 0.021705665, -0.008524407, 0.0048245774, -0.010885719, 0.0020094183, -0.004341133, -0.0155359935, -0.0038576883, -0.009649155, -0.014049484, 0.03849138, -0.009333435, 0.030703647, 0.03280844, 0.0035945894, -0.012115707, 0.01382585, 0.039649013, -0.0029565745, -0.0069063473, -0.00010914496, 0.007892969, 0.0015687275, 0.023521047, -0.014904556, -0.037096955, -0.011457959, 0.005627029, -0.018153828, -0.039649013, -0.026783474, -0.003400554, 0.019824507, 0.028230518, -0.0031341664, -0.04764722, 0.024836542, -0.012490623, 0.018364308, 0.003288737, -0.024205104, -0.009708351, -0.0057026697, 
-0.009793859, 0.031545565, 0.008846703, 0.009392633, -0.025560064, -0.022376567, -0.012964201, 0.015878022, -6.8189904e-05, -0.022810679, -0.003824801, 0.02733598, 0.025967866, 0.010570001, 0.012102552, 0.0011995669, -0.008484942, -0.024928626, 0.02025862, 0.0050515, -0.015496528, -0.017193517, -0.005913149, 0.0079982085, 0.011359298, 0.0038379559, 0.0078206165, -0.028967194, -0.024481358, -0.024573442, 0.028256828, 0.029230293, 0.0017035657, -0.008471787, 0.028914575, 0.014838781, -0.0065347203, -0.032229625, 0.014299428, 0.016088502, 0.006429481, -0.014233653, 0.041464396, 0.007912701, -0.0076035596, 0.009017717, 0.023402652, 0.017509237, 0.016917264, 0.01312206, 0.036255036, 0.001963376, -0.003660364, -0.015601768, 0.001428134, -0.022245016, 0.009201886, -0.022008227, -0.04814711, -0.0056072967, -0.004390464, 0.012352496, -0.014667766, 0.0041569634, 0.0071299816, -0.003877421, 0.013214145, 0.024797076, -0.018811576, -0.011063311, 0.025691614, 0.012668215, 0.0027362292, -0.020245465, -0.00051386515, 0.031834975, -0.000110378234, 0.033913456, -0.04327978, 0.020995297, -0.032150693, -0.025849473, 0.00039074305, -0.034281794, 0.0001238415, 0.009563647, -0.008616491, -0.0067188893, 0.011069888, -0.001682189, 0.083086655, 0.0070181647, -0.008412589, -0.022692285, -0.011115931, 0.039070196, 0.007860081, 0.007886391, 0.00033257352, -0.02653353, -0.007425968, -0.03491323, -0.0069852774, -0.016049037, -0.019272, 0.041517016, -0.015956953, 0.023942005, 0.0043214005, 0.0005685404, 0.009215041, -0.016733095, -0.020021832, 0.0004320578, -0.036228728, 0.0004008148, 0.023376342, 0.005814487, 0.011319833, -0.037070643, -0.012253834, 0.0035814345, -0.046831615, -0.009800436, 0.0009290682, -0.018640561, 0.01929831, -0.024468202, 0.0030009726, 0.026691388, 0.010221395, -0.0022412743, -0.0020982143, -0.02759908, 0.01671994, -0.0028496906, -0.0021804327, -0.004265492, -0.006761643]}'] CPU times: user 11.4 ms, sys: 2.55 ms, total: 13.9 ms Wall time: 1.49 s
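The next cell fetches two nodes with a scalar filter on `node_index` and prints every returned field. That includes the full `desc_emb` vector, which is why the printed output is so long.

Below is a minimal sketch of two helpers. The names `build_in_expr` and `summarize_rows` are hypothetical and not part of pymilvus.

- `build_in_expr` builds the same kind of `in` filter string from a list of indices.
- `summarize_rows` condenses each returned row so the embedding is reported only by its dimension and L2 norm.

It assumes each row is a dict keyed by field name, as in the printed results.

```python
import math
from typing import Any, Dict, Iterable, List


def build_in_expr(field: str, values: Iterable[int]) -> str:
    """Build a Milvus boolean filter such as 'node_index in [0, 1]' (hypothetical helper)."""
    joined = ", ".join(str(int(v)) for v in values)
    return f"{field} in [{joined}]"


def summarize_rows(rows: Iterable[Dict[str, Any]],
                   emb_field: str = "desc_emb") -> List[Dict[str, Any]]:
    """Drop the raw embedding from each row and keep only its length and L2 norm."""
    summary = []
    for row in rows:
        # Copy every field except the (long) embedding vector
        compact = {k: v for k, v in row.items() if k != emb_field}
        emb = row.get(emb_field)
        if emb is not None:
            compact[f"{emb_field}_dim"] = len(emb)
            # L2 norm of the vector, rounded for display
            compact[f"{emb_field}_norm"] = round(math.sqrt(sum(x * x for x in emb)), 4)
        summary.append(compact)
    return summary
```

After running the query cell, the filter and the compact view can be used like this:

```python
expr = build_in_expr("node_index", [0, 1])
for row in summarize_rows(results):
    print(row)
```

If the stored embeddings were L2-normalized (for example with `normalize_vector` above), the reported norm should be close to 1.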
%%time
# Assumes node_coll_name is defined and the collection already exists
collection = Collection(node_coll_name)
# Load the collection into memory before querying it
collection.load()
# Query with a boolean filter expression on the node_index field
expr = "node_index in [0, 1]"
output_fields = ["node_index", "node_id", "node_name", "node_type", "desc", "desc_emb"]
results = collection.query(expr, output_fields=output_fields)
print(results)
data: ['{\'node_name\': \'SMAD3\', \'node_type\': \'gene/protein\', \'desc\': "SMAD3 belongs to gene/protein node. SMAD3 is SMAD family member 3. The SMAD family of proteins are a group of intracellular signal transducer proteins similar to the gene products of the Drosophila gene \'mothers against decapentaplegic\' (Mad) and the C. elegans gene Sma. The SMAD3 protein functions in the transforming growth factor-beta signaling pathway, and transmits signals from the cell surface to the nucleus, regulating gene activity and cell proliferation. This protein forms a complex with other SMAD proteins and binds DNA, functioning both as a transcription factor and tumor suppressor. Mutations in this gene are associated with aneurysms-osteoarthritis syndrome and Loeys-Dietz Syndrome 3. [provided by RefSeq, May 2022].", \'desc_emb\': [-0.036991715, -0.005479036, -0.03023007, -0.012918158, -0.02741491, 0.025599528, -0.024547132, 0.011050156, -0.028256828, -0.0019304885, 0.034308106, 0.0019403548, 0.0078074615, 0.0057158247, -0.0024862853, 0.012885272, 0.021034762, -0.024205104, 0.01377323, 0.020732198, -0.008090293, 0.019416703, -0.02427088, -0.034781683, 0.008794083, 0.034229174, 0.02398147, -0.039701633, -0.015391288, -0.007932434, 0.020653268, 0.0048706196, -0.0089059, 0.021876678, -0.00043699093, 0.011405339, 0.004081323, -0.008070561, 0.012543242, -0.013056286, 0.019035209, 0.0015465285, 0.00945183, -0.014259963, -0.010320056, -0.0123327635, 0.0027411622, -0.032440104, -0.026822938, 0.010024071, -0.0035649908, 0.016035883, -0.031045677, -0.01640422, -0.0077022216, -0.0028102258, 0.016890954, 0.01015562, 0.00947814, -0.010214817, 0.00562374, -0.012477468, -0.029388152, -0.002177144, 0.007261531, 0.011102776, 2.7954264e-05, -0.004939683, -0.011997312, 0.022323947, 0.048673306, 0.03480799, -0.011359298, -0.02307378, -0.0044200625, -0.020074451, -0.03572884, 0.015246584, -0.036465514, -0.0035814345, 0.0044101966, -0.023560511, 0.0060479874, 0.005923015, 0.014154724, 
0.0093137035, -0.0008501385, 0.0013985354, -0.0011683238, 0.0073536155, 0.016391065, 0.025652148, -0.0024994402, 0.0052093593, -0.028309448, 0.0027345847, 0.0020538163, 0.032098074, -0.0074062357, -0.015891178, -6.4387306e-05, 0.020942677, -0.015720163, -0.0017874285, -0.041595947, -0.020127071, -0.0057059587, -0.0016838333, 0.026651924, -0.025967866, -0.015141345, 0.04509516, -0.0013080952, -0.018561631, 0.01205651, -0.023968315, 0.029888041, 0.011517157, -0.007787729, 0.0044726823, 0.0065084104, 0.055513877, 0.013865314, -0.03586039, 0.015233429, -0.00037635484, -0.016154276, -0.01012931, -0.025191724, -0.0009619555, 0.014733542, 0.0066794246, 0.015259739, 0.0021738552, -0.012674792, 0.0047357813, -0.00069145695, -0.008063983, -0.01265506, -0.047252573, 0.013944245, 0.016022727, -0.03293999, -0.0070313197, -0.009649155, -0.004873908, 0.03733374, -0.00235638, -0.01645684, -0.0022790947, -0.004498992, -0.009951718, 0.009820169, -0.015036105, 0.014904556, 0.018009124, -0.009616267, 0.019798197, -0.014167879, 0.0046239644, -0.00830735, 0.028967194, -0.004364154, 0.0011321477, 0.010484493, 0.009701774, -0.011477692, -0.0063045085, -0.028914575, -0.0273886, -0.025507445, 0.025691614, -0.0077679968, 0.010392409, -0.023021158, 0.00010698672, -0.0021919433, 0.010905452, -0.018995745, -0.014615146, -0.017232982, -0.009300549, 0.028440997, -0.000105342355, -0.019469323, -0.0055086343, 0.011648706, -0.0064459243, -0.0093729, -0.0046371194, 0.0046239644, 0.011464537, -0.00958338, -0.036044557, -0.61417824, 0.0042095836, 0.009050604, -0.012780032, 0.004814711, 0.04322716, -0.004413485, -0.005663205, 0.0016205251, 0.001856492, -0.02307378, 0.03838614, 0.015088725, -0.022981694, -0.02741491, -0.03338726, -0.012043355, -0.026138881, 0.0475946, -0.016391065, -0.025704768, 0.009307126, -0.00065528083, 0.009497873, -0.004752225, 0.024783922, 0.009576802, -0.030914126, 0.01911414, 0.017193517, -0.017930195, 0.00689977, 0.021205775, 0.015207119, 0.039859492, -0.00042095833, 
-0.0050942535, 0.04038569, 0.015443908, 0.010931762, -0.004643697, -0.010438452, 0.017824955, 0.0050350563, -0.021679355, 0.031729735, 0.010767325, -0.006084163, -0.003374244, -0.026993953, 0.008623068, 0.007537785, -0.028098969, -0.028519927, 0.0030009726, 0.0072549535, 0.04062248, -0.01653577, 0.036255036, 0.0028316025, -0.008537562, 0.022021383, -0.0190089, 0.007978477, 0.018535322, -0.0147861615, -0.013299652, 0.00752463, -0.0016295691, 0.00684715, 0.012760299, 0.028914575, 0.006649826, 0.011004114, -0.00012281377, 0.05314599, 0.0023810456, 0.044358484, -0.00095537805, 0.011438227, 0.0075180526, -0.0022379856, -0.02254758, -0.016206896, 0.030940436, 0.025218034, -0.023363188, 0.038991265, 0.01879842, -0.0027016974, 0.018061744, -0.0078206165, -0.0034794838, -0.044963613, -0.0035485472, -0.015509684, -0.010274014, 0.030835196, -0.014404668, -0.02382361, -0.006462368, 0.010083267, 0.004278647, -0.017390842, 0.016325291, -0.003979372, -0.039517462, 0.015272894, 0.016825179, -0.017982814, -0.0040221256, -0.00045549005, 0.0037360052, -0.022113467, -0.028283138, -0.011254058, 0.0013960688, -0.010116155, 0.018680027, -0.029782802, 0.010668662, -0.02767801, 0.009978028, 0.010168775, 0.014154724, -0.0008600047, 0.0112014385, -0.010004338, 0.0032032297, -0.010379254, 0.005475747, -0.018403772, -0.005202782, -0.04230631, 0.017127743, -0.003119367, -0.001003887, 0.0045581893, 0.028177898, -0.02358682, -0.0059822127, -0.0039563505, 0.0029828844, -0.021653045, -0.015799092, -0.0028299582, -0.0093137035, 0.013076018, 0.008669111, -0.008649378, -0.018627407, 0.0070444746, 0.005452726, 0.0076890667, 0.020955833, -0.022718595, -0.01187234, -0.004804845, -0.014299428, 0.0025158839, -0.01932462, 0.03020376, -0.02799373, -0.012609017, -0.013444357, -0.03025638, -0.016246362, 0.019429859, -0.00700501, -0.02640198, -0.009787281, -0.021008452, -0.026849248, 0.006991855, 0.0056697824, 0.016548924, 0.0190089, -0.007708799, -0.003699829, 0.0023974893, -0.009182154, -0.029598633, 
-0.0142204985, -0.010398987, 0.05259348, 0.004301668, 0.021442564, -0.0025356163, 0.0117736785, 0.014878246, -0.031913906, 0.020784818, 0.0010425296, 0.012385383, 0.0017117875, -0.020245465, 0.03520264, -0.0028414687, -0.01929831, 0.016246362, -0.005028479, -0.029282913, 0.0015095302, -0.0019518654, 0.0008912477, -0.03312416, 0.0006778909, -0.013615371, -0.012477468, 0.0006413037, -0.0036208993, 0.005061366, 0.0014363559, 0.014575682, 0.004002393, 0.019942902, -0.00015487896, -0.021771438, -0.0027724053, -0.013496976, -0.008346815, -0.0010556846, 0.018522168, -0.01898259, 0.006965545, 0.018351153, 0.019798197, 0.02799373, 0.0039760834, -0.040096283, -0.010188507, 0.0075312075, 0.014549372, 0.016391065, -0.012891849, -0.0050679436, -0.005192916, -0.01664101, 0.031677116, -0.02252127, -0.013628526, 0.017206673, 0.01887735, -0.0075312075, -0.006064431, -0.017982814, 0.03830721, -0.0028677785, 0.009629422, 0.012885272, -0.015522839, 0.010530536, -0.014509907, -0.006123628, 0.01181972, -0.02637567, 0.00040739228, 0.026691388, 0.055408638, 0.014430977, 0.028704096, -0.01312206, 0.015430753, 0.0075706723, 0.01986397, 0.034492273, 0.0032903813, -0.014062639, 0.034071315, -0.011293523, -0.029177673, 0.01898259, -0.011661861, -0.043937527, 0.007952167, -0.00044932368, 0.013589061, 0.02041648, -0.016733095, 0.02523119, -0.012977356, -0.044595275, 0.02366575, 0.024744457, -0.027704319, 0.0049857255, -0.02398147, -0.006413037, -0.00078518596, 0.029361842, -0.00656103, 0.03304523, 0.018482702, 0.022310792, 0.0030404374, 0.0053573526, 0.012780032, 0.0049331053, 0.021192621, 0.0046502743, 0.0024566865, 0.0075904047, -0.0014766429, -0.016233206, 0.038123038, 0.002200165, 0.0026605881, -0.0011617463, -0.022218706, -0.008695421, -0.0024484647, -0.036623377, -0.009682042, -0.014641456, -0.0019173336, -0.0010153976, 0.0044693938, 0.0076167146, 0.015522839, 0.0044496614, -0.018048588, -0.04283251, -0.00042465815, 0.014615146, 0.0043444214, 0.046068627, -0.0044332175, -0.0004195195, 
0.00017471415, 0.0042589144, -0.02744122, 0.0075838272, 0.0190089, -0.01018193, -0.023455271, -0.0011642129, 0.015430753, -0.008899323, -0.0045351684, 0.022744905, 0.0032755819, -0.010201662, 0.014799316, -0.005189627, 0.0014240231, -0.0068931924, 0.019206224, 0.0356236, 0.03491323, 0.0005286645, -0.019232534, 0.028783025, 0.024073554, -0.022560736, -0.003923463, 0.021311015, 0.000589095, 0.018114364, -0.013247033, -0.008583603, 0.015733318, 0.02650722, -0.021718819, 0.023652596, 0.023008004, 0.016325291, 0.024770766, 0.0029713737, -0.02512595, -0.00689977, -0.017127743, -0.0139574, 0.006172959, -0.009846479, 0.00016299803, -0.041806426, 0.0020965699, -0.022836989, 0.04541088, -0.041332845, -0.0034202863, -0.017969659, 0.002338292, -0.026691388, -0.010602888, -0.011832875, 0.0030782577, -0.005380374, -0.00035189485, -0.014259963, 0.012872117, 0.005867107, -0.019969212, -0.0106423525, 0.009682042, -0.00749832, -0.03372929, -0.0073536155, 0.009109802, 0.026651924, 0.020192845, -0.008379702, -0.0069852774, 0.039280675, 0.00040903664, -0.033755597, 0.0043049566, -0.019956056, -0.0013730477, 0.015391288, 0.008419167, -0.0027230743, -0.013944245, -0.0007716199, 0.011602664, 0.007379926, -0.008031096, -0.020771664, -0.008583603, -0.021666199, 0.031861287, 0.0021672777, 0.0042589144, -0.0072944183, 0.001271919, -0.030466858, -0.028204208, -0.013003666, -0.0064064595, -0.004265492, 0.013450934, 2.1107011e-05, -0.011267213, -0.012405116, -0.010149042, 0.0113921845, 0.011096198, -0.017561857, 0.017114587, 0.017943349, 0.013220723, 0.0056599164, -0.0004850887, -0.006038121, -0.012786609, -0.009945141, 0.0029664407, 0.010662085, -0.019469323, 0.017824955, -0.0024287323, -0.01265506, -0.016009573, 0.008175801, 0.020455943, 0.039201744, 0.008701998, -0.012161749, -0.009484718, 0.00028365356, -0.016890954, -0.0054297047, -0.0078206165, -0.013891624, -0.0019387105, 0.01023455, 0.011642129, -0.017588167, -0.0038083573, -0.021876678, -0.028414687, 0.0039596395, 0.027151812, 
0.0119578475, -0.010241127, -0.011688171, -0.00269512, -0.010918607, -0.010859409, -0.046963163, -0.0057816, 0.0005315421, 0.028625166, 0.034413345, 0.02158727, 0.0006108829, 0.02270544, 0.023350032, 0.010958072, 0.0052685565, -0.0015021306, 0.0053869514, -0.01653577, 0.014549372, 0.0045450344, 0.0067419107, 0.023047468, -0.005636895, 0.023336878, 0.017009348, -0.0026211233, -0.008484942, 0.044148006, -0.03825459, -0.010379254, 0.007379926, 0.0076298695, 0.0089059, -0.025941556, -0.009958296, 0.008169223, 0.028493617, -0.009122957, 0.0043477104, 0.014115259, -0.022258172, 0.041359156, -0.0069852774, 0.015641233, 0.000544286, -0.0042194496, 0.0036307655, -0.016759405, 0.004489126, -0.009215041, 0.02028493, -0.0015341957, 0.015391288, -0.0045088585, 0.016746249, -0.024047244, -0.032308552, -0.011655284, -0.011497424, -0.012720834, -0.041517016, -0.02377099, -0.03864924, 0.005314599, 0.024599751, -0.009760971, 0.021389944, 0.008550717, -0.006659692, 0.029309222, 0.00593617, 0.034097627, -0.01656208, 0.026638769, 0.019706111, 0.014628301, -0.017127743, -0.0067550656, -0.0023744681, -0.00026433225, 0.0214031, -0.0028957329, -0.012582707, 0.006100607, -0.004189851, -0.007149714, -0.015614923, -0.010576578, 0.018758956, -0.010372677, -0.003673519, 0.0030108388, 0.013812695, 0.00828104, 0.00089453644, -0.0038379559, -0.00044726822, -0.012431426, -0.012398538, -0.016127966, 0.024626061, 0.009550492, 0.021074226, -0.0029368422, -0.007833771, 0.0102674365, -0.015838558, 0.011089621, -0.002168922, 0.012569552, 0.041359156, 0.028572546, -0.013358849, 0.0066958684, 0.017614476, -0.00035847232, -0.010372677, -0.005278423, 0.034439653, -0.009280816, -0.0151676545, 0.010523958, -0.0100635355, -0.0042062947, 0.0120302, -0.0025504155, -0.016825179, -0.03559729, 0.006024966, 0.021863524, -0.0025471267, -0.009931985, 0.019416703, -0.029546013, -0.02392885, -0.03262427, -0.013016821, 0.012115707, -0.061038956, -0.0049100844, -0.0039201747, 0.017614476, 0.019535098, 0.029361842, 
-0.025375893, 0.020100761, 0.03304523, -0.020666422, 0.0043345555, -0.0025833028, -0.0055316556, -0.0146019915, 0.020061295, -0.01871949, -0.04270096, 0.008609913, 0.011984157, -0.015404443, -0.0040352806, -0.0042918017, -0.003020705, -0.005548099, -0.03333464, 0.03567622, -0.021600425, 0.009168999, -0.032492723, -0.010188507, 0.017509237, -0.019403549, 0.0093729, -0.005840797, 0.017298756, -0.0031390993, 0.012516933, -0.026717698, 0.0087546185, -0.0004637119, 0.002780627, -0.01401002, 0.018259067, -0.000803274, 0.022823835, -0.0031621205, 0.009905675, -0.009215041, 0.010662085, 0.013141793, -0.014536217, -0.015930643, 0.019285154, -0.029598633, 0.0016698561, -0.024257723, -0.004949549, 0.0009890876, -0.031729735, -0.03023007, 0.0012488979, -0.039385915, 0.006442636, 0.016443685, 0.002861201, -0.012556397, 0.002039017, -0.03299261, -0.02153465, -0.017903885, -0.0006992677, -0.0028628455, 0.035228953, -0.014588837, 0.007176024, -0.0066399598, -0.0001528235, -0.019337773, -0.008116603, 0.0059427475, -0.021389944, 0.0030108388, 0.006630094, -0.021981917, -0.015904333, 0.024599751, 0.018219603, -0.008287617, 0.01932462, -0.052567173, 0.019614028, -0.024231413, 0.006149938, -0.025494289, 0.008715153, 0.021284705, -0.027099192, -0.0050712326, 0.002155767, 0.009024294, -0.014246808, 0.010912029, -0.00042794688, 0.00661365, -0.004015548, 0.024152484, -0.008320505, 0.0020110628, 0.0190089, 0.025757387, 0.028388377, 0.006712312, 0.025757387, 0.021140002, -0.0013286497, -0.004587788, -0.016496304, -0.0045187245, -0.010662085, -0.0091361115, 0.0043477104, -0.025375893, -0.027257051, -0.017061967, -0.032334864, -0.005137007, -0.012661637, -0.00079011905, -0.025270654, 0.014694076, -0.00825473, -0.02033755, -0.0017348087, 0.000118291755, -0.022508116, -0.017088277, -0.005969058, 0.009991183, -0.012010467, -0.023165863, 0.004324689, -0.008123181, 0.002530683, 0.0006540476, -0.007945589, -0.015062415, 0.0025191726, 0.20111284, -0.0038642657, -0.0146019915, 0.014299428, 0.02509964, 
0.036176108, 0.038780786, -0.0034400187, ..., -0.004265492, -0.006761643], \'node_index\': 0, \'node_id\': \'SMAD3_(144)\'}', "{'node_name': 'IL10RB', 'node_type': 'gene/protein', 'desc': 'IL10RB belongs to gene/protein node. IL10RB is interleukin 10 receptor subunit beta. The protein encoded by this gene belongs to the cytokine receptor family. It is an accessory chain essential for the active interleukin 10 receptor complex. Coexpression of this and IL10RA proteins has been shown to be required for IL10-induced signal transduction. This gene and three other interferon receptor genes, IFAR2, IFNAR1, and IFNGR2, form a class II cytokine receptor gene cluster located in a small region on chromosome 21. [provided by RefSeq, Jul 2008].', 'desc_emb': [-0.02927333, -0.006862564, -0.042112965, ..., -0.0051925, -0.01776193], 'node_index': 1, 'node_id': 'IL10RB_(179)'}"]
CPU times: user 3.35 ms, sys: 154 μs, total: 3.5 ms Wall time: 6.23 ms
%%time
# Assumes node_coll_name is defined and the collection already exists
collection = Collection(node_coll_name)
# Load the collection into memory before querying
collection.load()
# Vector similarity search in Milvus
vector_to_search = nodes_df["desc_emb"].iloc[0]
search_params = {"metric_type": "IP"}
results = collection.search(
data=[vector_to_search],
anns_field="desc_emb",
param=search_params,
limit=10,
output_fields=["node_id", "node_name"]
)
results
CPU times: user 5.9 ms, sys: 2.02 ms, total: 7.91 ms Wall time: 11.6 ms
data: [[{'node_index': 0, 'distance': 0.9999999403953552, 'entity': {'node_id': 'SMAD3_(144)', 'node_name': 'SMAD3'}}, {'node_index': 2182, 'distance': 0.8984372019767761, 'entity': {'node_id': 'SMAD protein signal transduction_(101792)', 'node_name': 'SMAD protein signal transduction'}}, {'node_index': 2023, 'distance': 0.8962712287902832, 'entity': {'node_id': 'SMAD protein complex_(55601)', 'node_name': 'SMAD protein complex'}}, {'node_index': 2913, 'distance': 0.8886438012123108, 'entity': {'node_id': 'heteromeric SMAD protein complex_(124869)', 'node_name': 'heteromeric SMAD protein complex'}}, {'node_index': 1009, 'distance': 0.8831610083580017, 'entity': {'node_id': 'regulation of SMAD protein signal transduction_(41433)', 'node_name': 'regulation of SMAD protein signal transduction'}}, {'node_index': 2152, 'distance': 0.8783092498779297, 'entity': {'node_id': 'positive regulation of SMAD protein signal transduction_(101088)', 'node_name': 'positive regulation of SMAD protein signal transduction'}}, {'node_index': 7, 'distance': 0.8766131401062012, 'entity': {'node_id': 'STAT3_(729)', 'node_name': 'STAT3'}}, {'node_index': 2223, 'distance': 0.8753368854522705, 'entity': {'node_id': 'SMAD protein complex assembly_(103465)', 'node_name': 'SMAD protein complex assembly'}}, {'node_index': 2858, 'distance': 0.8635619878768921, 'entity': {'node_id': 'co-SMAD binding_(123250)', 'node_name': 'co-SMAD binding'}}, {'node_index': 2149, 'distance': 0.860752284526825, 'entity': {'node_id': 'common-partner SMAD protein phosphorylation_(101002)', 'node_name': 'common-partner SMAD protein phosphorylation'}}]]
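The search above ranks hits by inner product ("IP"). The stored vectors were L2-normalized with normalize_matrix, so for a unit-length query the inner product equals cosine similarity. That matches the result: the query node SMAD3 comes back first with a score of about 1.0.

The NumPy sketch below performs the same ranking by brute force on random toy vectors. It is an exact search shown only for illustration. Milvus indexes such as GPU_CAGRA, which is used for the per-type collections, are approximate, so results on large collections can differ slightly. Note that normalize_rows adds a zero-norm guard that the notebook's normalize_matrix does not have. Both helper names are hypothetical.

```python
import numpy as np

def normalize_rows(m: np.ndarray) -> np.ndarray:
    """L2-normalize each row; all-zero rows stay zero instead of becoming NaN."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return np.divide(m, norms, out=np.zeros_like(m), where=norms > 0)

def top_k_inner_product(db: np.ndarray, query: np.ndarray, k: int):
    """Exact top-k search by inner product, the score an "IP" index ranks by."""
    scores = db @ query
    order = np.argsort(-scores)[:k]  # highest scores first
    return order, scores[order]

# Toy database of 100 unit-length 16-dimensional vectors
rng = np.random.default_rng(0)
embeddings = normalize_rows(rng.normal(size=(100, 16)).astype(np.float32))

# Query with the first stored vector, as the notebook does with row 0
ids, scores = top_k_inner_product(embeddings, embeddings[0], k=5)
print(ids[0], round(float(scores[0]), 5))  # top hit is the query itself (index 0), score ~1.0
```

For unit vectors, no other vector can score above the query's score against itself (Cauchy-Schwarz). This is why a self-match near 1.0 is a quick sanity check that normalization happened.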
nodes_df.loc[0]
| | node_index | node_id | node_name | node_type | desc | desc_emb | feat | feat_emb |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | SMAD3_(144) | SMAD3 | gene/protein | SMAD3 belongs to gene/protein node. SMAD3 is S... | [-0.03699171170592308, -0.005479035433381796, ... | MSSILPFTPPIVKRLLGWKKGEQNGQEEKWCEKAVKSLVKKLKKTG... | [-0.0010794274069904548, -0.0028632148270051, ... |
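Each row carries two vectors:

- desc_emb: the embedding of the text description.
- feat_emb: presumably the embedding of the feat column, which for SMAD3 is a protein sequence.

The collection-building cell reads each vector dimension from the first row of a node-type group only. A FLOAT_VECTOR field, however, is declared with one fixed dim. The sketch below is a cheap pre-insert check that all rows agree. It works on plain Python lists rather than cudf columns, and vector_dim is a hypothetical helper.

```python
from typing import Iterable, Mapping, Sequence

def vector_dim(rows: Iterable[Mapping[str, Sequence[float]]], field: str) -> int:
    """Return the common length of `field` across rows, or raise if lengths differ."""
    lengths = {len(row[field]) for row in rows}
    if not lengths:
        raise ValueError(f"no rows to infer the '{field}' dimension from")
    if len(lengths) > 1:
        # A FLOAT_VECTOR field has one fixed dim, so mixed lengths cannot be inserted
        raise ValueError(f"inconsistent '{field}' lengths: {sorted(lengths)}")
    return lengths.pop()

# Toy rows with made-up values; real embeddings are much longer
rows = [
    {"desc_emb": [0.1, 0.2, 0.3], "feat_emb": [0.6, 0.8]},
    {"desc_emb": [0.0, 1.0, 0.0], "feat_emb": [1.0, 0.0]},
]
print(vector_dim(rows, "desc_emb"), vector_dim(rows, "feat_emb"))  # 3 2
```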
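Building Node Collection (Node Type-specific Embedding)
Because node attributes in PrimeKG differ by node type, we build a separate Milvus collection for each node type. The cell below also reads the desc_emb and feat_emb dimensions separately for each type.
Each collection is named after its node type as {milvus_database}_nodes_{node_type}, with any / replaced by _. For example, gene/protein becomes t2kg_primekg_nodes_gene_protein.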
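The grouping and naming step can be sketched in plain Python as follows. The replacement of / with _ is needed because Milvus collection names may contain only letters, digits, and underscores. The database prefix t2kg_primekg is taken from the output below. The helper names and the node_x record are hypothetical, and the lists stand in for the cudf groupby.

```python
from collections import defaultdict
from typing import Dict, List

def collection_name(database: str, node_type: str) -> str:
    """Same f-string as the notebook: '/' is replaced by '_'."""
    return f"{database}_nodes_{node_type.replace('/', '_')}"

def plan_collections(database: str, nodes: List[dict]) -> Dict[str, List[dict]]:
    """Group node records by node_type and key each group by its collection name."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for node in nodes:
        groups[node["node_type"]].append(node)
    return {collection_name(database, t): rows for t, rows in groups.items()}

# Toy records; node_x is a made-up placeholder
nodes = [
    {"node_name": "SMAD3", "node_type": "gene/protein"},
    {"node_name": "IL10RB", "node_type": "gene/protein"},
    {"node_name": "node_x", "node_type": "cellular_component"},
]
for name, rows in plan_collections("t2kg_primekg", nodes).items():
    print(name, len(rows))
# t2kg_primekg_nodes_gene_protein 2
# t2kg_primekg_nodes_cellular_component 1
```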
%%time
# Loop over group enrichment nodes by node_type
for node_type, nodes_df in tqdm(merged_nodes_df.groupby('node_type')):
print(f"Processing node type: {node_type}")
# Milvus collection name for this node_type
node_coll_name = f"{milvus_database}_nodes_{node_type.replace('/', '_')}"
# Define collection schema
desc_emb_dim = len(nodes_df.iloc[0]['desc_emb'].to_arrow().to_pylist()[0])
feat_emb_dim = len(nodes_df.iloc[0]['feat_emb'].to_arrow().to_pylist()[0])
node_fields = [
FieldSchema(name="node_index", dtype=DataType.INT64, is_primary=True, auto_id=False),
FieldSchema(name="node_id", dtype=DataType.VARCHAR, max_length=1024),
FieldSchema(name="node_name", dtype=DataType.VARCHAR, max_length=1024,
enable_analyzer=True, enable_match=True),
FieldSchema(name="node_type", dtype=DataType.VARCHAR, max_length=1024,
enable_analyzer=True, enable_match=True),
FieldSchema(name="desc", dtype=DataType.VARCHAR, max_length=40960,
enable_analyzer=True, enable_match=True),
FieldSchema(name="desc_emb", dtype=DataType.FLOAT_VECTOR, dim=desc_emb_dim),
FieldSchema(name="feat", dtype=DataType.VARCHAR, max_length=40960,
enable_analyzer=True, enable_match=True),
FieldSchema(name="feat_emb", dtype=DataType.FLOAT_VECTOR, dim=feat_emb_dim),
]
schema = CollectionSchema(fields=node_fields, description=f"schema for collection {node_coll_name}")
# Create collection if not exists
if not utility.has_collection(node_coll_name):
collection = Collection(name=node_coll_name, schema=schema)
else:
collection = Collection(name=node_coll_name)
# Create index for node_index field (scalar)
collection.create_index(
field_name="node_index",
index_params={"index_type": "STL_SORT"},
index_name="node_index_index"
)
# Create index for node_name, node_type, desc fields (inverted)
collection.create_index(
field_name="node_name",
index_params={"index_type": "INVERTED"},
index_name="node_name_index"
)
collection.create_index(
field_name="node_type",
index_params={"index_type": "INVERTED"},
index_name="node_type_index"
)
collection.create_index(
field_name="desc",
index_params={"index_type": "INVERTED"},
index_name="desc_index"
)
collection.create_index(
field_name="desc_emb",
index_params={"index_type": "GPU_CAGRA", "metric_type": "IP"}, # AUTOINDEX
index_name="desc_emb_index"
)
# Create index for feat_emb (vector)
collection.create_index(
field_name="feat_emb",
index_params={"index_type": "GPU_CAGRA", "metric_type": "IP"}, # AUTOINDEX
index_name="feat_emb_index"
)
# Prepare data for insertion
# Normalize the embeddings
graph_desc_emb_cp = cp.asarray(nodes_df["desc_emb"].list.leaves).astype(cp.float32).reshape(nodes_df.shape[0], -1)
graph_desc_emb_norm = normalize_matrix(graph_desc_emb_cp, axis=1)
graph_feat_emb_cp = cp.asarray(nodes_df["feat_emb"].list.leaves).astype(cp.float32).reshape(nodes_df.shape[0], -1)
graph_feat_emb_norm = normalize_matrix(graph_feat_emb_cp, axis=1)
# Columns must be lists of values in order matching schema fields
data = [
nodes_df["node_index"].to_arrow().to_pylist(),
nodes_df["node_id"].to_arrow().to_pylist(),
nodes_df["node_name"].to_arrow().to_pylist(),
nodes_df["node_type"].to_arrow().to_pylist(),
nodes_df["desc"].to_arrow().to_pylist(),
graph_desc_emb_norm.tolist(), # Use normalized embeddings
nodes_df["feat"].to_arrow().to_pylist(),
graph_feat_emb_norm.tolist(), # Use normalized embeddings
]
# Batch insert data in chunks
batch_size = 500
total_rows = len(data[0])
for i in tqdm(range(0, total_rows, batch_size)):
batch = [col[i:i + batch_size] for col in data]
collection.insert(batch)
# Flush the collection to ensure data is persisted
collection.flush()
# Print collection stats (number of entities)
stats = collection.num_entities
print(f"Collection {node_coll_name} stats:")
print(stats)
Processing node type: biological_process
Collection t2kg_primekg_nodes_biological_process stats:
1615
Processing node type: cellular_component
Collection t2kg_primekg_nodes_cellular_component stats:
202
Processing node type: disease
Collection t2kg_primekg_nodes_disease stats:
7
Processing node type: drug
Collection t2kg_primekg_nodes_drug stats:
748
Processing node type: gene/protein
Collection t2kg_primekg_nodes_gene_protein stats:
102
Processing node type: molecular_function
Collection t2kg_primekg_nodes_molecular_function stats:
317
CPU times: user 1.77 s, sys: 238 ms, total: 2 s Wall time: 31.2 s
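The node insertion above slices the column-wise `data` list into batches of 500 rows before each `collection.insert` call. Below is a minimal, Milvus-independent sketch of that slicing written as a reusable generator. The name `iter_column_batches` is hypothetical and is not part of pymilvus.

```python
from typing import Iterator, List, Sequence


def iter_column_batches(
    columns: Sequence[Sequence], batch_size: int = 500
) -> Iterator[List[Sequence]]:
    """Yield column-wise slices of at most `batch_size` rows.

    `columns` is a list of equally long columns in schema order, like the
    `data` list above. Each yielded batch keeps that column order.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_rows = len(columns[0])
    # All columns must have the same length, otherwise rows would misalign
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    for start in range(0, n_rows, batch_size):
        yield [col[start:start + batch_size] for col in columns]


# Toy example: 3 columns with 5 rows and batch size 2 give batches of 2, 2, 1 rows
toy = [[0, 1, 2, 3, 4], ["a", "b", "c", "d", "e"], [[0.1], [0.2], [0.3], [0.4], [0.5]]]
batches = list(iter_column_batches(toy, batch_size=2))
print([len(b[0]) for b in batches])  # [2, 2, 1]
```

With a live collection, the insertion loop could then be written as `for batch in iter_column_batches(data, 500): collection.insert(batch)`.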
# List all collections
for coll in utility.list_collections():
    print(f"Collection: {coll}")
    # Get a handle to the collection to read its entity count
    collection = Collection(name=coll)
    print(collection.num_entities)
Collection: t2kg_primekg_nodes
2991
Collection: t2kg_primekg_nodes_biological_process
1615
Collection: t2kg_primekg_nodes_cellular_component
202
Collection: t2kg_primekg_nodes_molecular_function
317
Collection: t2kg_primekg_nodes_disease
7
Collection: t2kg_primekg_nodes_gene_protein
102
Collection: t2kg_primekg_nodes_drug
748
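As a quick sanity check, the per-type collections should together hold exactly the entities of the combined `t2kg_primekg_nodes` collection: 1615 + 202 + 317 + 7 + 102 + 748 = 2991. The snippet below hard-codes the counts printed above. With a live connection, one could instead read `Collection(name).num_entities` for each collection name.

```python
# Entity counts copied from the collection listing above
per_type_counts = {
    "t2kg_primekg_nodes_biological_process": 1615,
    "t2kg_primekg_nodes_cellular_component": 202,
    "t2kg_primekg_nodes_molecular_function": 317,
    "t2kg_primekg_nodes_disease": 7,
    "t2kg_primekg_nodes_gene_protein": 102,
    "t2kg_primekg_nodes_drug": 748,
}
all_nodes_count = 2991  # t2kg_primekg_nodes

# The per-type collections partition the full node set, so the sums must match
per_type_total = sum(per_type_counts.values())
print(per_type_total, per_type_total == all_nodes_count)  # 2991 True
```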
merged_nodes_df[merged_nodes_df.node_type == 'gene/protein']
index | node_index | node_id | node_name | node_type | desc | feat | desc_emb | feat_emb |
---|---|---|---|---|---|---|---|---|
48 | 16 | IL1R2_(1654) | IL1R2 | gene/protein | IL1R2 belongs to gene/protein node. IL1R2 is i... | MLRLYVLVMGVSAFTLQPAAHTGAARSCRFRGRHYKREFRLEGEPV... | [-0.02899833954870701, 0.0021472955122590065, ... | [-0.0014144123640997008, -0.001413213325978187... |
49 | 17 | HERC2_(1777) | HERC2 | gene/protein | HERC2 belongs to gene/protein node. HERC2 is H... | MPSESFCLAAQARLDSKWLKTDIQLAFTRDGLCGLWNEMVKDGEIV... | [-0.014023775234818459, 0.009176542051136494, ... | [-0.007428297050351188, -0.010088199667370488,... |
50 | 18 | FCGR2A_(1990) | FCGR2A | gene/protein | FCGR2A belongs to gene/protein node. FCGR2A is... | MTMETQMSQNVCPRNLWLLQPLTVLLLLASADSQAAAPPKAVLKLE... | [-0.030694980174303055, 0.008019879460334778, ... | [0.005696470124890766, -0.006343481862099802, ... |
51 | 19 | CXCR1_(2012) | CXCR1 | gene/protein | CXCR1 belongs to gene/protein node. CXCR1 is C... | MSNITDPQMWDFDDLNFTGMPPADEDYSPCMLETETLNKYVVIIAY... | [-0.027637897059321404, -0.0004926534020341933... | [0.010659271964210543, -0.0016738061039453167,... |
52 | 20 | FN1_(2057) | FN1 | gene/protein | FN1 belongs to gene/protein node. FN1 is fibro... | MLRGPGPGLLLLAVQCLGTAVPSTGASKSKRQAQQMVQPQSPVAVS... | [-0.03269859775900841, 0.0019264572765678167, ... | [-0.0009853045367468072, -0.004182791243864574... |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1168 | 848 | IL17REL_(34781) | IL17REL | gene/protein | IL17REL belongs to gene/protein node. IL17REL ... | MSRSVLEALTSSTAMQCVPSDGCAMLLRVRASITLHERLRGLEACA... | [-0.02713669277727604, -0.011936459690332413, ... | [0.0003768516783435937, -0.005117051265290013,... |
1169 | 849 | TAGAP_(34814) | TAGAP | gene/protein | TAGAP belongs to gene/protein node. TAGAP is T... | MKLRSSHNASKTLNANNMETLIECQSEGDIKEHPLLASCESEDSIC... | [-0.03348768502473831, -0.009191920049488544, ... | [-0.0010800918479102228, -0.003681655277429923... |
1170 | 850 | DENND1B_(34887) | DENND1B | gene/protein | DENND1B belongs to gene/protein node. DENND1B ... | MDCRTKANPDRTFDLVLKVKCHASENEDPVVLWKFPEDFGDQEILQ... | [-0.02666034922003746, 0.005825655534863472, -... | [-0.000950883844682143, -0.0007957585650949377... |
1171 | 851 | IL21_(34967) | IL21 | gene/protein | IL21 belongs to gene/protein node. IL21 is int... | MRSSPGNMERIVICLMVIFLGTLVHKSSSQGQDRHMIRMRQLIDIV... | [-0.02861925959587097, 0.005471138749271631, -... | [0.002036418016520262, -0.001478201039299714, ... |
1172 | 852 | FAM92B_(35156) | FAM92B | gene/protein | FAM92B belongs to gene/protein node. | MNIVFSRDSQVRVMENTVANTEKYFGQFCSLLAAYTRKTARLRDKA... | [-0.005422372370958328, -0.011813736520707607,... | [0.0008678497828269967, -0.0047449115179252635... |
102 rows × 8 columns
%%time
# Assumes the gene/protein node collection was created above
collection = Collection('t2kg_primekg_nodes_gene_protein')
# Load the collection into memory before query
collection.load()
# Vector similarity search in Milvus
vector_to_search = normalize_vector(merged_nodes_df[merged_nodes_df.node_type == 'gene/protein']['feat_emb'].iloc[0]).tolist()
search_params = {"metric_type": "IP"}
results = collection.search(
data=[vector_to_search],
anns_field="feat_emb",
param=search_params,
limit=10,
output_fields=["node_id", "node_name"]
)
results
CPU times: user 8.64 ms, sys: 53 μs, total: 8.69 ms Wall time: 20.8 ms
data: [[{'node_index': 16, 'distance': 1.000000238418579, 'entity': {'node_id': 'IL1R2_(1654)', 'node_name': 'IL1R2'}}, {'node_index': 82, 'distance': 0.9777935147285461, 'entity': {'node_id': 'IL18RAP_(11588)', 'node_name': 'IL18RAP'}}, {'node_index': 59, 'distance': 0.9770528078079224, 'entity': {'node_id': 'IL12B_(6168)', 'node_name': 'IL12B'}}, {'node_index': 4, 'distance': 0.9742822051048279, 'entity': {'node_id': 'VCAM1_(417)', 'node_name': 'VCAM1'}}, {'node_index': 64, 'distance': 0.9739426970481873, 'entity': {'node_id': 'IL2RA_(7059)', 'node_name': 'IL2RA'}}, {'node_index': 51, 'distance': 0.9738389253616333, 'entity': {'node_id': 'ICAM1_(4968)', 'node_name': 'ICAM1'}}, {'node_index': 75, 'distance': 0.9732135534286499, 'entity': {'node_id': 'TLR9_(10113)', 'node_name': 'TLR9'}}, {'node_index': 18, 'distance': 0.9724373817443848, 'entity': {'node_id': 'FCGR2A_(1990)', 'node_name': 'FCGR2A'}}, {'node_index': 73, 'distance': 0.9714288711547852, 'entity': {'node_id': 'ICOSLG_(9454)', 'node_name': 'ICOSLG'}}, {'node_index': 31, 'distance': 0.9704809188842773, 'entity': {'node_id': 'TGFB1_(2889)', 'node_name': 'TGFB1'}}]]
# Check the ground truth for the search
merged_nodes_df[merged_nodes_df.node_type == 'gene/protein'].iloc[0]
index | node_index | node_id | node_name | node_type | desc | feat | desc_emb | feat_emb |
---|---|---|---|---|---|---|---|---|
48 | 16 | IL1R2_(1654) | IL1R2 | gene/protein | IL1R2 belongs to gene/protein node. IL1R2 is i... | MLRLYVLVMGVSAFTLQPAAHTGAARSCRFRGRHYKREFRLEGEPV... | [-0.02899833954870701, 0.0021472955122590065, ... | [-0.0014144123640997008, -0.001413213325978187... |
# Get node indices from the results
[n['node_index'] for n in results[0]]
[16, 82, 59, 4, 64, 51, 75, 18, 73, 31]
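GPU_CAGRA is an approximate nearest-neighbour index, so its result list is not guaranteed to match an exact search. One way to check it is to recompute the exact top-k on the host with NumPy, as sketched below. Several parts of this sketch are assumptions:

- `exact_top_k` is a hypothetical helper.
- Random vectors stand in for the real `feat_emb` column. In the notebook, one would pass the gene/protein embeddings as a NumPy array instead.
- The returned positions are 0-based row positions in the matrix, not `node_index` values.

```python
import numpy as np


def exact_top_k(matrix: np.ndarray, query: np.ndarray, k: int = 10):
    """Exact inner-product top-k over row-normalized embeddings.

    Returns (row_positions, scores) sorted by descending score.
    """
    # L2-normalize rows and the query so inner product == cosine similarity
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = m @ q
    # Sort by descending score and keep the first k positions
    top = np.argsort(-scores)[:k]
    return top, scores[top]


rng = np.random.default_rng(0)
emb = rng.normal(size=(102, 1536)).astype(np.float32)  # random stand-in data
idx, sc = exact_top_k(emb, emb[0], k=10)
print(idx[0], round(float(sc[0]), 5))  # 0 1.0
```

Comparing the exact list with the Milvus result shows how much recall the approximate index gives up on this data.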
# Get the similarity scores (inner product of L2-normalized vectors, i.e. cosine similarity)
[n['distance'] for n in results[0]]
[1.000000238418579, 0.9777935147285461, 0.9770528078079224, 0.9742822051048279, 0.9739426970481873, 0.9738389253616333, 0.9732135534286499, 0.9724373817443848, 0.9714288711547852, 0.9704809188842773]
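These scores can be read as cosine similarities because the embeddings were L2-normalized before insertion and the index uses the `IP` (inner product) metric. For unit vectors, cos(a, b) = a·b / (‖a‖ ‖b‖) = (a/‖a‖)·(b/‖b‖). The top hit scoring 1.000000238 rather than exactly 1 is consistent with float32 rounding. The small NumPy check below illustrates both points, using random stand-in vectors rather than the real embeddings.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(size=1536).astype(np.float32)
b = rng.normal(size=1536).astype(np.float32)

# Cosine similarity on the raw vectors
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Inner product after L2-normalization (what an IP index computes)
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
inner = float(a_n @ b_n)

print(np.isclose(cosine, inner, atol=1e-6))  # True
# Self-similarity equals 1 only up to float32 rounding
print(float(a_n @ a_n))
```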
Building Edge Collection
Subsequently, we also build the edges collection in Milvus.
Because PrimeKG contains a very large number of edge records, we again process the data in chunks to avoid running out of memory.
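The edge loop that follows processes one embedding chunk at a time. Each chunk is inner-joined with the enrichment table on `triplet_index`, so only that chunk's embeddings are held in memory next to the enrichment table. Below is a CPU-only sketch of the same pattern, using pandas instead of cuDF. The toy frames are hypothetical stand-ins for `edges_enrichment_df` and the chunked `edges_embedding_df`.

```python
import pandas as pd

# Hypothetical stand-ins for edges_enrichment_df and the chunked edges_embedding_df
enrichment = pd.DataFrame({
    "triplet_index": [0, 1, 2, 3, 4],
    "feat": ["e0", "e1", "e2", "e3", "e4"],
})
embedding_chunks = [
    pd.DataFrame({"triplet_index": [0, 1, 2], "edge_emb": [[0.1], [0.2], [0.3]]}),
    pd.DataFrame({"triplet_index": [3, 4], "edge_emb": [[0.4], [0.5]]}),
]

merged_sizes = []
for chunk in embedding_chunks:
    # Inner join keeps only edges present in both the enrichment and this chunk
    merged = enrichment.merge(
        chunk[["triplet_index", "edge_emb"]], on="triplet_index", how="inner"
    )
    merged_sizes.append(len(merged))
    # ... normalize the embeddings and insert `merged` in batches here ...

print(merged_sizes)  # [3, 2]
```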
%%time
# Define collection name
edge_coll_name = f"{milvus_database}_edges"
# Define schema
edge_fields = [
FieldSchema(name="triplet_index", dtype=DataType.INT64, is_primary=True, auto_id=False),
FieldSchema(name="head_id", dtype=DataType.VARCHAR, max_length=1024),
FieldSchema(name="head_index", dtype=DataType.INT64),
FieldSchema(name="tail_id", dtype=DataType.VARCHAR, max_length=1024),
FieldSchema(name="tail_index", dtype=DataType.INT64),
FieldSchema(name="edge_type", dtype=DataType.VARCHAR, max_length=1024),
FieldSchema(name="display_relation", dtype=DataType.VARCHAR, max_length=1024),
FieldSchema(name="feat", dtype=DataType.VARCHAR, max_length=40960),
FieldSchema(name="feat_emb", dtype=DataType.FLOAT_VECTOR, dim=1536),
]
edge_schema = CollectionSchema(fields=edge_fields, description="Schema for edges collection")
# Create collection if not exists
if not utility.has_collection(edge_coll_name):
    collection = Collection(name=edge_coll_name, schema=edge_schema)
else:
    collection = Collection(name=edge_coll_name)
# Create indexes
collection.create_index(field_name="triplet_index", index_params={"index_type": "STL_SORT"}, index_name="triplet_index_index")
collection.create_index(field_name="head_index", index_params={"index_type": "STL_SORT"}, index_name="head_index_index")
collection.create_index(field_name="tail_index", index_params={"index_type": "STL_SORT"}, index_name="tail_index_index")
collection.create_index(field_name="feat_emb", index_params={"index_type": "GPU_CAGRA", "metric_type": "IP"}, index_name="feat_emb_index") # GPU_CAGRA needs a GPU-enabled Milvus; AUTOINDEX is an alternative
# Iterate over chunked edges embedding df
for edges_df in tqdm(edges_embedding_df):
    # Merge enrichment with embedding
    merged_edges_df = edges_enrichment_df.merge(
        edges_df[["triplet_index", "edge_emb"]],
        on="triplet_index",
        how="inner"
    )
    # Prepare data fields in column-wise format
    # Normalize the embeddings
    edges_edge_emb_cp = cp.asarray(merged_edges_df["edge_emb"].list.leaves).astype(cp.float32).reshape(merged_edges_df.shape[0], -1)
    edges_edge_emb_norm = normalize_matrix(edges_edge_emb_cp, axis=1)
    data = [
        merged_edges_df["triplet_index"].to_arrow().to_pylist(),
        merged_edges_df["head_id"].to_arrow().to_pylist(),
        merged_edges_df["head_index"].to_arrow().to_pylist(),
        merged_edges_df["tail_id"].to_arrow().to_pylist(),
        merged_edges_df["tail_index"].to_arrow().to_pylist(),
        merged_edges_df["edge_type_str"].to_arrow().to_pylist(),
        merged_edges_df["display_relation"].to_arrow().to_pylist(),
        merged_edges_df["feat"].to_arrow().to_pylist(),
        edges_edge_emb_norm.tolist(),  # Use normalized embeddings
    ]
    # Insert in chunks
    batch_size = 500
    for i in tqdm(range(0, len(data[0]), batch_size)):
        batch_data = [d[i:i+batch_size] for d in data]
        collection.insert(batch_data)
    # Flush to ensure persistence
    collection.flush()
    # Print collection stats
    print(collection.num_entities)
    time.sleep(5)  # Sleep to avoid overwhelming the server
100%|██████████| 10/10 [00:06<00:00, 1.52it/s]
5000
100%|██████████| 9/9 [00:05<00:00, 1.62it/s]
9272
100%|██████████| 4/4 [00:02<00:00, 1.50it/s]
11272
100%|██████████| 3/3 [00:34<00:00, 11.59s/it]
CPU times: user 1.99 s, sys: 402 ms, total: 2.39 s Wall time: 37 s
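The printed counts are cumulative, because `num_entities` reports the collection's total after each flush. Differencing them recovers the rows inserted per chunk. Dividing each by the batch size of 500 and rounding up reproduces the 10, 9, and 4 insert calls shown by the inner progress bars.

```python
import math

# Cumulative num_entities printed after each edge chunk
cumulative = [5000, 9272, 11272]
batch_size = 500

# Rows inserted per chunk = difference between consecutive cumulative counts
per_chunk = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
# Number of insert calls per chunk = ceil(rows / batch_size)
batches = [math.ceil(n / batch_size) for n in per_chunk]

print(per_chunk)  # [5000, 4272, 2000]
print(batches)    # [10, 9, 4]
```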
# List all collections
for coll in utility.list_collections():
    print(f"Collection: {coll}")
    # Get a handle to the collection to read its entity count
    collection = Collection(name=coll)
    print(collection.num_entities)
Collection: t2kg_primekg_edges
11272
Collection: t2kg_primekg_nodes_drug
748
Collection: t2kg_primekg_nodes
2991
Collection: t2kg_primekg_nodes_biological_process
1615
Collection: t2kg_primekg_nodes_cellular_component
202
Collection: t2kg_primekg_nodes_molecular_function
317
Collection: t2kg_primekg_nodes_disease
7
Collection: t2kg_primekg_nodes_gene_protein
102
%%time
# Assumes the edges collection was created above
collection = Collection('t2kg_primekg_edges')
# Load the collection into memory before query
collection.load()
# Query by expr on triplet_index
expr = "triplet_index == 0"
output_fields = ["triplet_index", "head_id", "tail_id", "edge_type", "feat", "feat_emb"]
results = collection.query(expr, output_fields=output_fields)
results
CPU times: user 10.6 ms, sys: 5.68 ms, total: 16.3 ms Wall time: 1.52 s
data: ["{'tail_id': 'LTF_(3233)', 'edge_type': 'drug|carrier|gene/protein', 'feat': 'Rose bengal (drug) has a direct relationship of drug_protein:carrier with LTF (gene/protein).', 'feat_emb': [-0.011582981, 0.0018133956, 0.0050554257, ..., -0.017558489, -0.021800896, -0.013336153], 'triplet_index': 0, 'head_id': 'Rose bengal_(14118)'}"]
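The query above filters on a single primary key with `triplet_index == 0`. Milvus boolean expressions also support the `in` operator for matching a list of values, so several edges can be fetched in one `collection.query` call. The helper below, which builds such an expression string, is hypothetical.

```python
from typing import Iterable


def triplet_filter(indices: Iterable[int]) -> str:
    """Build a Milvus boolean expression selecting one or more triplet_index values."""
    values = [int(i) for i in indices]  # ints only, so the string is safe to embed
    if not values:
        raise ValueError("at least one index is required")
    if len(values) == 1:
        return f"triplet_index == {values[0]}"
    return f"triplet_index in {values}"


print(triplet_filter([0]))             # triplet_index == 0
print(triplet_filter([0, 196, 5636]))  # triplet_index in [0, 196, 5636]
```

For example, `collection.query(triplet_filter([0, 5636]), output_fields=output_fields)` would fetch an edge and its reverse counterpart together.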
# Check the ground truth for the search
results[0]['triplet_index'], results[0]['head_id'], results[0]['tail_id'], results[0]['edge_type'], results[0]['feat']
(0, 'Rose bengal_(14118)', 'LTF_(3233)', 'drug|carrier|gene/protein', 'Rose bengal (drug) has a direct relationship of drug_protein:carrier with LTF (gene/protein).')
%%time
# Assume the edge collection 't2kg_primekg_edges' already exists
collection = Collection('t2kg_primekg_edges')
# Load the collection into memory before query
collection.load()
# Vector similarity search in Milvus
# Query with the ground-truth edge's own embedding (note: `results` is overwritten by the search below)
vector_to_search = np.array(results[0]['feat_emb']).tolist()
search_params = {"metric_type": "IP"}
results = collection.search(
data=[vector_to_search],
anns_field="feat_emb",
param=search_params,
limit=10,
output_fields=["head_id", "tail_id", "edge_type", "feat"]
)
results
CPU times: user 2.89 ms, sys: 0 ns, total: 2.89 ms Wall time: 9.52 ms
data: [[{'triplet_index': 0, 'distance': 1.000000238418579, 'entity': {'edge_type': 'drug|carrier|gene/protein', 'feat': 'Rose bengal (drug) has a direct relationship of drug_protein:carrier with LTF (gene/protein).', 'head_id': 'Rose bengal_(14118)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 5636, 'distance': 0.9769061207771301, 'entity': {'edge_type': 'gene/protein|carrier|drug', 'feat': 'LTF (gene/protein) has a direct relationship of drug_protein:carrier with Rose bengal (drug).', 'head_id': 'LTF_(3233)', 'tail_id': 'Rose bengal_(14118)'}}, {'triplet_index': 196, 'distance': 0.900477409362793, 'entity': {'edge_type': 'drug|target|gene/protein', 'feat': '3h-Indole-5,6-Diol (drug) has a direct relationship of drug_protein:target with LTF (gene/protein).', 'head_id': '3h-Indole-5,6-Diol_(18278)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 199, 'distance': 0.898104727268219, 'entity': {'edge_type': 'drug|target|gene/protein', 'feat': 'alpha-D-Fucopyranose (drug) has a direct relationship of drug_protein:target with LTF (gene/protein).', 'head_id': 'alpha-D-Fucopyranose_(18280)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 198, 'distance': 0.8961387872695923, 'entity': {'edge_type': 'drug|target|gene/protein', 'feat': 'Nitrilotriacetic acid (drug) has a direct relationship of drug_protein:target with LTF (gene/protein).', 'head_id': 'Nitrilotriacetic acid_(18279)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 197, 'distance': 0.8955156803131104, 'entity': {'edge_type': 'drug|target|gene/protein', 'feat': 'Lauric acid (drug) has a direct relationship of drug_protein:target with LTF (gene/protein).', 'head_id': 'Lauric acid_(14354)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 201, 'distance': 0.8937671780586243, 'entity': {'edge_type': 'drug|target|gene/protein', 'feat': '(R)-Atenolol (drug) has a direct relationship of drug_protein:target with LTF (gene/protein).', 'head_id': '(R)-Atenolol_(18281)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 5837, 
'distance': 0.8916683197021484, 'entity': {'edge_type': 'gene/protein|target|drug', 'feat': 'LTF (gene/protein) has a direct relationship of drug_protein:target with (R)-Atenolol (drug).', 'head_id': 'LTF_(3233)', 'tail_id': '(R)-Atenolol_(18281)'}}, {'triplet_index': 202, 'distance': 0.8896037340164185, 'entity': {'edge_type': 'drug|target|gene/protein', 'feat': 'Parecoxib (drug) has a direct relationship of drug_protein:target with LTF (gene/protein).', 'head_id': 'Parecoxib_(15587)', 'tail_id': 'LTF_(3233)'}}, {'triplet_index': 5832, 'distance': 0.8886452913284302, 'entity': {'edge_type': 'gene/protein|target|drug', 'feat': 'LTF (gene/protein) has a direct relationship of drug_protein:target with 3h-Indole-5,6-Diol (drug).', 'head_id': 'LTF_(3233)', 'tail_id': '3h-Indole-5,6-Diol_(18278)'}}]]
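The `distance` values above are inner products, because the search uses `metric_type: "IP"`. For L2-normalized embeddings (compare the `normalize_matrix` / `normalize_vector` helpers earlier in this notebook), the inner product equals cosine similarity, so the query edge scores about 1.0 against itself; the extra 0.000000238 is float32 rounding. The sketch below does an exact, brute-force inner-product top-k search in NumPy to show what the ranking means. It assumes a small random matrix in place of the real edge embeddings, and the helper names are hypothetical. Depending on the index type configured for the collection, Milvus may search approximately, so its ranks can differ slightly from an exact search.

```python
import numpy as np

def l2_normalize(m, axis=-1):
    """NumPy counterpart of the CuPy normalize_matrix / normalize_vector helpers."""
    return m / np.linalg.norm(m, axis=axis, keepdims=True)

def top_k_inner_product(query, matrix, k=10):
    """Exact (brute-force) top-k search by inner product, highest score first."""
    scores = matrix @ query                   # one score per stored row
    k = min(k, scores.shape[0])
    # argpartition selects the k largest scores without a full sort ...
    idx = np.argpartition(-scores, k - 1)[:k]
    # ... then only those k are sorted in descending order
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]

rng = np.random.default_rng(0)

# For unit vectors, the inner product equals cosine similarity
a, b = rng.standard_normal(8), rng.standard_normal(8)
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
ip_of_normalized = l2_normalize(a) @ l2_normalize(b)

# Hypothetical stand-in for the edge embeddings: 100 random unit vectors
# with 1536 dimensions (the output size of text-embedding-ada-002)
emb = l2_normalize(rng.standard_normal((100, 1536)).astype(np.float32))

# Query with the first edge's own embedding, as the notebook does
idx, scores = top_k_inner_product(emb[0], emb, k=10)
print(np.isclose(cosine, ip_of_normalized), idx[0], scores[0])
```

Because the query vector is itself stored in the matrix, it always ranks first with a score of roughly 1.0. This matches the first hit (`triplet_index` 0) in the Milvus output above.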
# Get the edge (triplet) indices from the results
[n['triplet_index'] for n in results[0]]
[0, 5636, 196, 199, 198, 197, 201, 5837, 202, 5832]
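The list comprehension above indexes each hit like a dict. As a ground-truth check, the query edge (`triplet_index` 0) should come back as the top hit, and the scores should be non-increasing. The following is a minimal sketch of that check. It uses plain dicts copied from the first three printed hits as a hypothetical stand-in for the pymilvus hit objects.

```python
# Plain-dict stand-in for the first three pymilvus hits printed above
hits = [
    {"triplet_index": 0, "distance": 1.000000238418579,
     "entity": {"head_id": "Rose bengal_(14118)", "tail_id": "LTF_(3233)"}},
    {"triplet_index": 5636, "distance": 0.9769061207771301,
     "entity": {"head_id": "LTF_(3233)", "tail_id": "Rose bengal_(14118)"}},
    {"triplet_index": 196, "distance": 0.900477409362793,
     "entity": {"head_id": "3h-Indole-5,6-Diol_(18278)", "tail_id": "LTF_(3233)"}},
]

query_index = 0  # ground-truth triplet_index of the query edge

ids = [h["triplet_index"] for h in hits]
scores = [h["distance"] for h in hits]

# Ground-truth check: the query edge should be the top hit
top_hit_ok = ids[0] == query_index
# Hits should be ordered from most to least similar
sorted_ok = all(a >= b for a, b in zip(scores, scores[1:]))
print(ids, top_hit_ok, sorted_ok)  # → [0, 5636, 196] True True
```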
# Get the similarity scores (inner products, i.e. cosine similarity for unit-length embeddings)
[n['distance'] for n in results[0]]
[1.000000238418579, 0.9769061207771301, 0.900477409362793, 0.898104727268219, 0.8961387872695923, 0.8955156803131104, 0.8937671780586243, 0.8916683197021484, 0.8896037340164185, 0.8886452913284302]
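The top two hits (0 and 5636) are the same Rose bengal/LTF carrier relation stored in both directions. Likewise, 196/5832 and 201/5837 are drug|target|gene/protein edges paired with their gene/protein|target|drug reverses. If you want only one hit per relation, you can collapse reversed pairs with a direction-independent key. The sketch below does this, with two stated assumptions: `edge_type` is taken to follow the `head_type|relation|tail_type` pattern seen in the output, and a swapped head/tail with the same relation is treated as the same fact. The `(head_id, tail_id, edge_type)` values are copied from the hits above, and `dedupe_reverse_edges` is a hypothetical helper.

```python
# (triplet_index, head_id, tail_id, edge_type) copied from the hits printed above
hits = [
    (0, "Rose bengal_(14118)", "LTF_(3233)", "drug|carrier|gene/protein"),
    (5636, "LTF_(3233)", "Rose bengal_(14118)", "gene/protein|carrier|drug"),
    (196, "3h-Indole-5,6-Diol_(18278)", "LTF_(3233)", "drug|target|gene/protein"),
    (199, "alpha-D-Fucopyranose_(18280)", "LTF_(3233)", "drug|target|gene/protein"),
    (198, "Nitrilotriacetic acid_(18279)", "LTF_(3233)", "drug|target|gene/protein"),
    (197, "Lauric acid_(14354)", "LTF_(3233)", "drug|target|gene/protein"),
    (201, "(R)-Atenolol_(18281)", "LTF_(3233)", "drug|target|gene/protein"),
    (5837, "LTF_(3233)", "(R)-Atenolol_(18281)", "gene/protein|target|drug"),
    (202, "Parecoxib_(15587)", "LTF_(3233)", "drug|target|gene/protein"),
    (5832, "LTF_(3233)", "3h-Indole-5,6-Diol_(18278)", "gene/protein|target|drug"),
]

def dedupe_reverse_edges(hits):
    """Keep only the highest-ranked hit for each relation, ignoring direction.

    Assumes edge_type has the form 'head_type|relation|tail_type'.
    """
    seen = set()
    kept = []
    for triplet_index, head, tail, edge_type in hits:  # hits are already ranked
        relation = edge_type.split("|")[1]
        # frozenset makes (head, tail) and (tail, head) produce the same key
        key = (frozenset((head, tail)), relation)
        if key not in seen:
            seen.add(key)
            kept.append(triplet_index)
    return kept

print(dedupe_reverse_edges(hits))  # → [0, 196, 199, 198, 197, 201, 202]
```

Keying on the relation as well as the node pair keeps two different relations between the same drug and protein (for example, carrier and target) as separate hits.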