Embeddings
An embedding model converts text into a vector representation. The quality of the embedding model
is crucial for the quality of the search results. Currently, django-semantic-search
supports a single embedding model integration:
Sentence Transformers
The Sentence Transformers library provides a way to convert text data into a vector representation. There are over 5,000 pre-trained models available, and you can choose the one that best fits your needs.
One of the available models is all-MiniLM-L6-v2, which is a lightweight model that provides a good balance between the
quality of the search results and the resource consumption.
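To make the idea of a "vector representation" concrete, the following is a minimal sketch, not part of django-semantic-search itself. The helper names `embed` and `cosine_similarity` are hypothetical and introduced only for illustration. `embed` assumes the sentence-transformers package is installed and imports it lazily; `cosine_similarity` is a common way to compare two embedding vectors and is plain Python.

```python
import math
from typing import List, Sequence


def embed(texts: List[str], model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    """Encode texts into vectors (hypothetical helper).

    Assumes `sentence-transformers` is installed; the model is downloaded
    on first use.
    """
    # Imported lazily so the rest of this sketch runs without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts)  # one vector per input text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)
```

With the library installed, something like `query_vec, doc_vec = embed(["python web framework", "Django makes web development fast"])` followed by `cosine_similarity(query_vec, doc_vec)` would give a score where higher values indicate closer meaning.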
django_semantic_search.embeddings.SentenceTransformerModel
Bases: BaseEmbeddingModel, TextEmbeddingMixin
Sentence-transformers model for embedding text.
It is a wrapper around the sentence-transformers library. Users rarely need to use this class directly; instead, they specify it in the Django settings.
Requirements:
Usage:
SEMANTIC_SEARCH = {
    "default_embeddings": {
        "model": "django_semantic_search.embeddings.SentenceTransformerModel",
        "configuration": {
            "model_name": "sentence-transformers/all-MiniLM-L6-v2",
        },
    },
    ...
}
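The "model" entry in the settings is a dotted import path, and "configuration" holds keyword-style options. As a rough illustration of how such an entry could be turned into an object, here is a minimal sketch. It is not the library's actual loading code: the function name `load_from_settings` is hypothetical, and passing the configuration as constructor keyword arguments is an assumption.

```python
from importlib import import_module
from typing import Any, Dict


def load_from_settings(entry: Dict[str, Any]) -> Any:
    """Resolve a {"model": "pkg.module.Class", "configuration": {...}} entry.

    Hypothetical helper: assumes the configuration keys map directly to
    constructor keyword arguments of the referenced class.
    """
    # Split "pkg.module.Class" into the module path and the class name.
    module_path, _, class_name = entry["model"].rpartition(".")
    cls = getattr(import_module(module_path), class_name)
    # Instantiate the class with the configured options (if any).
    return cls(**entry.get("configuration", {}))
```

The same pattern works for any importable class, for example `load_from_settings({"model": "datetime.timedelta", "configuration": {"days": 2}})`.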
Some models accept prompts to be used for the document and query. These prompts are used as additional
instructions for the model to generate embeddings. For example, if the document_prompt is set to "Doc: ", the
model will generate embeddings with the prompt "Doc: " followed by the document text. Similarly, the
query_prompt is used for the query, if set.
SEMANTIC_SEARCH = {
    "default_embeddings": {
        "model": "django_semantic_search.embeddings.SentenceTransformerModel",
        "configuration": {
            "model_name": "sentence-transformers/all-MiniLM-L6-v2",
            "document_prompt": "Doc: ",
            "query_prompt": "Query: ",
        },
    },
    ...
}
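The effect of the prompts described above can be sketched in plain Python. This is only an illustration of the "prompt followed by the text" behavior; the helper `apply_prompt` is hypothetical, and how the wrapper passes prompts to the underlying model is not shown in this documentation.

```python
from typing import Optional


def apply_prompt(text: str, prompt: Optional[str] = None) -> str:
    """Return the string that would be embedded (hypothetical helper).

    If a prompt is configured, it is placed in front of the text;
    otherwise the text is used unchanged.
    """
    return f"{prompt}{text}" if prompt else text


# Values mirroring the configuration example above.
document_input = apply_prompt("Django is a web framework.", "Doc: ")
query_input = apply_prompt("python web framework", "Query: ")
unprompted_input = apply_prompt("python web framework")
```

Here `document_input` is "Doc: Django is a web framework." and `query_input` is "Query: python web framework", while `unprompted_input` is left unchanged because no prompt was given.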