Embeddings
An embedding model is a tool that converts text data into a vector representation. The quality of the embedding model
is crucial for the quality of the search results. Currently, django-semantic-search
supports just a single integration
with the vector embedding models:
The Sentence Transformers library provides a way to convert text data into a vector
representation. There are over 5,000 pre-trained models
available, and you can choose the one that fits your needs the
best.
One of the available models is all-MiniLM-L6-v2
, which is a lightweight model that provides a good balance between the
quality of the search results and the resource consumption.
Bases: BaseEmbeddingModel
, TextEmbeddingMixin
Sentence-transformers model for embedding text.
It is a wrapper around the sentence-transformers library. Users would rarely need to use this class directly, but
rather specify it in the Django settings.
Requirements:
pip install django-semantic-search[sentence-transformers]
Usage:
settings.pySEMANTIC_SEARCH = {
"default_embeddings": {
"model": "django_semantic_search.embeddings.SentenceTransformerModel",
"configuration": {
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
},
},
...
}
Source code in src/django_semantic_search/embeddings/sentence_transformers.py
| class SentenceTransformerModel(BaseEmbeddingModel, TextEmbeddingMixin):
"""
Sentence-transformers model for embedding text.
It is a wrapper around the sentence-transformers library. Users would rarely need to use this class directly, but
rather specify it in the Django settings.
**Requirements:**
```shell
pip install django-semantic-search[sentence-transformers]
```
**Usage:**
```python title="settings.py"
SEMANTIC_SEARCH = {
"default_embeddings": {
"model": "django_semantic_search.embeddings.SentenceTransformerModel",
"configuration": {
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
},
},
...
}
```
"""
def __init__(
self,
model_name: str,
document_prompt: Optional[str] = None,
query_prompt: Optional[str] = None,
):
"""
Initialize the sentence-transformers _model.
:param model_name: name of the model to use.
"""
from sentence_transformers import SentenceTransformer
self._model = SentenceTransformer(model_name)
self._document_prompt = document_prompt
self._query_prompt = query_prompt
def vector_size(self) -> int:
"""
Return the size of the individual embedding.
:return: size of the embedding.
"""
return self._model.get_sentence_embedding_dimension()
def embed_document(self, document: str) -> Vector:
"""
Embed a document into a vector.
:param document: document to embed.
:return: document embedding.
"""
return self._model.encode(document, prompt=self._document_prompt).tolist()
def embed_query(self, query: str) -> Vector:
"""
Embed a query into a vector.
:param query: query to embed.
:return: query embedding.
"""
return self._model.encode(query, prompt=self._query_prompt).tolist()
|