Vector Normalization
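Vector normalization utilities that provide COSINE similarity on GPU indexes. Because GPU indexes don't support the COSINE distance metric, vectors are normalized to unit length and the IP (Inner Product) metric is used instead. For unit-length vectors, the inner product equals the cosine similarity.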
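The equivalence this module relies on can be checked numerically. The following is a minimal, standard-library-only sketch; the helper names (`cosine_similarity`, `to_unit`, `inner_product`) are illustrative and are not part of this module.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def to_unit(v):
    """Scale a non-zero vector to unit (L2) length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a, b):
    """Plain dot product, i.e. the IP metric."""
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]

cos_raw = cosine_similarity(a, b)                # cosine on the raw vectors
ip_unit = inner_product(to_unit(a), to_unit(b))  # IP on the normalized vectors

# Both values agree up to floating-point rounding.
print(cos_raw, ip_unit)
```

Because the two quantities agree, an index that only offers IP can rank results by cosine similarity as long as both stored vectors and query vectors are normalized.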
NormalizingEmbeddings
Bases: Embeddings
Wrapper around an embedding model that automatically normalizes outputs. This is needed for GPU indexes when using COSINE similarity.
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
__getattr__(name)
Delegate other attributes to the underlying model.
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
__init__(embedding_model, normalize_for_gpu=True)
Initialize the normalizing wrapper.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding_model | Embeddings | The underlying embedding model | required |
normalize_for_gpu | bool | Whether to normalize embeddings (for GPU compatibility) | True |
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
embed_documents(texts)
Embed documents and optionally normalize.
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
embed_query(text)
Embed query and optionally normalize.
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
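The wrapper methods described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual source: `FakeEmbeddings` is a hypothetical stand-in for a real embedding model, `NormalizingWrapperSketch` and `_unit` are illustrative names, and returning a zero vector unchanged is an assumed behavior.

```python
import math
from typing import List


class FakeEmbeddings:
    """Hypothetical stand-in for a real embedding model."""

    model_name = "fake-model"

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), 1.0] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text)), 1.0]


def _unit(vector: List[float]) -> List[float]:
    """Scale to unit length; a zero vector is returned unchanged (assumption)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


class NormalizingWrapperSketch:
    """Illustrative wrapper that normalizes the wrapped model's outputs."""

    def __init__(self, embedding_model, normalize_for_gpu: bool = True):
        self.embedding_model = embedding_model
        self.normalize_for_gpu = normalize_for_gpu

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        embeddings = self.embedding_model.embed_documents(texts)
        if self.normalize_for_gpu:
            return [_unit(v) for v in embeddings]
        return embeddings

    def embed_query(self, text: str) -> List[float]:
        embedding = self.embedding_model.embed_query(text)
        return _unit(embedding) if self.normalize_for_gpu else embedding

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so the
        # methods defined above are never routed through here.
        if name == "embedding_model":
            # Guard against infinite recursion if the attribute is not set yet.
            raise AttributeError(name)
        return getattr(self.embedding_model, name)


wrapped = NormalizingWrapperSketch(FakeEmbeddings())
query_vec = wrapped.embed_query("abc")           # raw vector is [3.0, 1.0]
doc_vecs = wrapped.embed_documents(["a", "bcd"])
raw = NormalizingWrapperSketch(FakeEmbeddings(), normalize_for_gpu=False)
```

With `normalize_for_gpu=False` the wrapper passes the underlying vectors through untouched, and attributes such as `model_name` resolve on the wrapped model via `__getattr__`.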
normalize_vector(vector)
Normalize a single vector to unit length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vector | Union[List[float], ndarray] | Input vector as list or numpy array | required |
Returns:
Type | Description |
---|---|
List[float] | Normalized vector as list |
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
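A function with this signature might look like the sketch below. It assumes NumPy and L2 normalization; the name `normalize_vector_sketch` is illustrative, and returning a zero vector unchanged is an assumption, since the handling of zero vectors is not documented here.

```python
from typing import List, Union

import numpy as np


def normalize_vector_sketch(vector: Union[List[float], np.ndarray]) -> List[float]:
    """Return the input scaled to unit L2 norm, as a plain Python list."""
    arr = np.asarray(vector, dtype=np.float64)
    norm = np.linalg.norm(arr)
    if norm == 0.0:
        # Assumption: a zero vector has no direction, so return it unchanged
        # rather than dividing by zero.
        return arr.tolist()
    return (arr / norm).tolist()


unit = normalize_vector_sketch([3.0, 4.0])
from_array = normalize_vector_sketch(np.array([0.0, 2.0]))
zero = normalize_vector_sketch([0.0, 0.0])
```

Accepting both lists and arrays through `np.asarray` and always returning a list matches the parameter and return types documented above.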
normalize_vectors_batch(vectors)
Normalize a batch of vectors to unit length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vectors | List[List[float]] | List of vectors | required |
Returns:
Type | Description |
---|---|
List[List[float]] | List of normalized vectors |
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
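Batch normalization is usually done in one vectorized step rather than row by row. The sketch below shows one way to do that with NumPy; the name `normalize_vectors_batch_sketch` is illustrative, and both the empty-input handling and leaving zero rows unchanged are assumptions, not documented behavior.

```python
from typing import List

import numpy as np


def normalize_vectors_batch_sketch(vectors: List[List[float]]) -> List[List[float]]:
    """Scale every row to unit L2 norm; zero rows are left unchanged (assumption)."""
    if not vectors:
        # Assumption: an empty batch yields an empty result.
        return []
    matrix = np.asarray(vectors, dtype=np.float64)
    # One norm per row, kept as a column so it broadcasts across each row.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Replace zero norms with 1 so zero rows divide cleanly and stay zero.
    safe_norms = np.where(norms == 0.0, 1.0, norms)
    return (matrix / safe_norms).tolist()


batch = normalize_vectors_batch_sketch([[3.0, 4.0], [0.0, 0.0], [0.0, 2.0]])
empty = normalize_vectors_batch_sketch([])
```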
should_normalize_vectors(has_gpu, use_cosine)
Determine if vectors should be normalized based on hardware and similarity metric.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
has_gpu | bool | Whether GPU is being used | required |
use_cosine | bool | Whether COSINE similarity is desired | required |
Returns:
Type | Description |
---|---|
bool | True if vectors should be normalized |
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
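Given the module's stated purpose (normalization substitutes for COSINE on GPU indexes), the decision plausibly reduces to a logical AND of the two flags. The sketch below encodes that assumption; `should_normalize_sketch` is an illustrative name, and the actual function may apply additional rules.

```python
def should_normalize_sketch(has_gpu: bool, use_cosine: bool) -> bool:
    """Assumed rule: normalize only when a GPU index must emulate COSINE via IP."""
    return has_gpu and use_cosine


# Truth table for the assumed rule.
decisions = {
    (gpu, cosine): should_normalize_sketch(gpu, cosine)
    for gpu in (False, True)
    for cosine in (False, True)
}
for (gpu, cosine), result in decisions.items():
    print(f"has_gpu={gpu!s:5} use_cosine={cosine!s:5} -> normalize={result}")
```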
wrap_embedding_model_if_needed(embedding_model, has_gpu, use_cosine=True)
Wrap embedding model with normalization if needed for GPU compatibility.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding_model | Embeddings | Original embedding model | required |
has_gpu | bool | Whether GPU is being used | required |
use_cosine | bool | Whether COSINE similarity is desired | True |
Returns:
Type | Description |
---|---|
Embeddings | Original or wrapped embedding model |
Source code in aiagents4pharma/talk2scholars/tools/pdf/utils/vector_normalization.py
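The "original or wrapped" return value can be illustrated with a small conditional factory. Everything in this sketch is a stand-in: `IdentityModel` is a hypothetical embedding model, `NormalizedModel` is a minimal wrapper written only for this example, and the condition `has_gpu and use_cosine` is an assumption about when wrapping is needed.

```python
import math


class IdentityModel:
    """Hypothetical embedding model that always returns the same vector."""

    def embed_query(self, text):
        return [3.0, 4.0]


class NormalizedModel:
    """Minimal illustrative wrapper that unit-normalizes query embeddings."""

    def __init__(self, inner):
        self.inner = inner

    def embed_query(self, text):
        vector = self.inner.embed_query(text)
        norm = math.sqrt(sum(x * x for x in vector))
        return vector if norm == 0 else [x / norm for x in vector]


def wrap_if_needed_sketch(model, has_gpu, use_cosine=True):
    """Wrap only when a GPU index has to emulate COSINE with IP (assumed rule)."""
    if has_gpu and use_cosine:
        return NormalizedModel(model)
    return model


model = IdentityModel()
cpu_model = wrap_if_needed_sketch(model, has_gpu=False)  # returned unchanged
gpu_model = wrap_if_needed_sketch(model, has_gpu=True)   # wrapped
gpu_vec = gpu_model.embed_query("any text")
```

Callers can use the returned object the same way in both cases, so the rest of the pipeline does not need to know whether normalization is active.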