Documents

django-semantic-search was designed to mimic some of the patterns used in popular Django libraries, such as django-import-export, to reduce the learning curve for new users.

The core concept of the library is a Document subclass that represents a single searchable entity. You define one document class for each model you want to search. The document class is responsible for converting model instances into vector representations, storing them in the vector search engine, and performing search queries.

Documents

django_semantic_search.Document

Bases: ABC, Generic[T]

Base class for all documents. There is a one-to-one mapping between a document subclass and a model class; the document subclass configures how instances of that model are converted to documents.

Usage:

products/models.py
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=255)
    description = models.TextField()

products/documents.py
from django_semantic_search import Document, VectorIndex
from django_semantic_search.decorators import register_document

from products.models import Product

@register_document
class ProductDocument(Document):
    class Meta:
        model = Product
        indexes = [
            VectorIndex("name"),
            VectorIndex("description"),
        ]

django-semantic-search automatically handles the backend configuration. The register_document decorator registers model signals that update the documents in the vector store whenever a model instance is updated or deleted, so you don't have to call the save or delete methods on document instances yourself.
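
The signal-driven synchronisation described above can be illustrated with a small, framework-free sketch. It uses neither Django nor django-semantic-search: `ToySignal`, `ToyVectorStore`, and `register_sync` are hypothetical names invented for this illustration, and the decorator's actual internals may differ.

```python
from typing import Callable, Dict, List


class ToySignal:
    """Hypothetical stand-in for a model signal (not Django's Signal class)."""

    def __init__(self) -> None:
        self._receivers: List[Callable[[dict], None]] = []

    def connect(self, receiver: Callable[[dict], None]) -> None:
        self._receivers.append(receiver)

    def send(self, instance: dict) -> None:
        # Notify every connected receiver about the instance.
        for receiver in self._receivers:
            receiver(instance)


class ToyVectorStore:
    """Hypothetical in-memory store keyed by primary key."""

    def __init__(self) -> None:
        self.documents: Dict[int, dict] = {}


post_save = ToySignal()
post_delete = ToySignal()
store = ToyVectorStore()


def register_sync(target: ToyVectorStore) -> None:
    """Connect receivers so the store follows saves and deletes."""

    def on_save(instance: dict) -> None:
        # Insert or overwrite the document for this primary key.
        target.documents[instance["pk"]] = dict(instance)

    def on_delete(instance: dict) -> None:
        # Remove the document if it exists.
        target.documents.pop(instance["pk"], None)

    post_save.connect(on_save)
    post_delete.connect(on_delete)


register_sync(store)

product = {"pk": 1, "name": "Kettle"}
post_save.send(product)            # the store now holds the document
product["name"] = "Electric kettle"
post_save.send(product)            # same key, refreshed content
saved_name = store.documents[1]["name"]
post_delete.send(product)          # the document is removed again
```

The point of the sketch is only that the caller never touches the store directly: once the receivers are connected, saving or deleting an instance is enough to keep the store in step.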

Search example:

products/views.py
from django.http import JsonResponse
from products.documents import ProductDocument

def my_view(request):
    query = "this is a query"
    results = ProductDocument.objects.find(name=query)
    return JsonResponse(
        {
            # Serialize the queryset returned by find() above
            "results": list(results.values())
        }
    )

The find method on the objects attribute of the document class will return the queryset of the model instances that are similar to the query. The search is performed using the selected vector index passed as a keyword argument to the find method. In our case, we are searching for the query in the name field of the Product model. If we want to search in the description field, we would call ProductDocument.objects.find(description=query).
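
As a rough illustration of how a keyword argument can select the field to search, here is a toy, in-memory sketch. It is not the library's implementation: `toy_embed` is a hypothetical word-count "embedding", `PRODUCTS` is made-up data, the standalone `find` function returns a plain list rather than a queryset, and the restriction to exactly one keyword is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Counter:
    """Hypothetical embedding: lower-cased word counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up records standing in for Product rows.
PRODUCTS: List[Dict[str, str]] = [
    {"name": "red kettle", "description": "boils water quickly"},
    {"name": "blue mug", "description": "holds hot water or tea"},
]


def find(**kwargs: str) -> List[Dict[str, str]]:
    """Rank products by similarity on the single field named by the keyword."""
    if len(kwargs) != 1:
        raise ValueError("Pass exactly one field=query keyword argument.")
    ((field, query),) = kwargs.items()
    query_vector = toy_embed(query)
    return sorted(
        PRODUCTS,
        key=lambda product: cosine(query_vector, toy_embed(product[field])),
        reverse=True,
    )


by_name = find(name="kettle")            # compares against the name field
by_description = find(description="tea")  # compares against the description field
```

The same query text can therefore rank records differently depending on which field's vectors it is compared against, which is why the field name is part of the call.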

Source code in src/django_semantic_search/documents.py
class Document(abc.ABC, Generic[T]):
    """
    Base class for all documents. There is a one-to-one mapping between a document subclass and a model class;
    the document subclass configures how instances of that model are converted to documents.

    **Usage**:

    ```python title="products/models.py"
    from django.db import models

    class Product(models.Model):
        name = models.CharField(max_length=255)
        description = models.TextField()

    ```

    ```python title="products/documents.py"
    from django_semantic_search import Document, VectorIndex
    from django_semantic_search.decorators import register_document

    from products.models import Product

    @register_document
    class ProductDocument(Document):
        class Meta:
            model = Product
            indexes = [
                VectorIndex("name"),
                VectorIndex("description"),
            ]
    ```

    `django-semantic-search` automatically handles the backend configuration. The `register_document`
    decorator registers model signals that update the documents in the vector store whenever a model instance
    is updated or deleted, so you don't have to call the `save` or `delete` methods on document instances yourself.

    **Search example:**

    ```python title="products/views.py"
    from django.http import JsonResponse
    from products.documents import ProductDocument

    def my_view(request):
        query = "this is a query"
        results = ProductDocument.objects.find(name=query)
        return JsonResponse(
            {
                # Serialize the queryset returned by find() above
                "results": list(results.values())
            }
        )
    ```

    The `find` method on the `objects` attribute of the document class will return the queryset of the model instances
    that are similar to the query. The search is performed using the selected vector index passed as a keyword argument
    to the `find` method. In our case, we are searching for the query in the `name` field of the `Product` model. If we
    want to search in the `description` field, we would call `ProductDocument.objects.find(description=query)`.
    """

    # Important:
    # The following descriptors have to be defined in the specific order, as they depend on each other
    # and the order of the descriptors is the order in which they are executed.
    meta = MetaManager()
    index_configuration = IndexConfigurationManager()
    backend = BackendManager()
    objects: DocumentManager = DocumentManagerDescriptor[T]()

    def __init__(self, instance: T):
        self._instance = instance

    def save(self) -> None:
        """
        Save the document in the vector store.
        """
        if not self._instance.pk:
            raise ValueError(
                "The model instance has to be saved before creating a document."
            )
        self.backend.save(self)

    def delete(self) -> None:
        """
        Delete the document from the vector store.
        """
        self.backend.delete(self.id)

    @property
    def id(self) -> DocumentID:
        if not self._instance.pk:
            raise ValueError(
                "The model instance has to be saved before accessing the ID."
            )
        return self._instance.pk

    def vectors(self) -> Dict[str, Vector]:
        """
        Return the vectors for the document.
        :return: dictionary of the vectors.
        """
        return {
            index.index_name: index.get_model_embedding(self._instance)
            for index in self.meta.indexes
        }

    def metadata(self) -> Dict[str, MetadataValue]:
        """
        Return the metadata for the document.
        :return: dictionary of the metadata.
        """
        include_fields = getattr(
            self.meta, "include_fields", Document.Meta.include_fields
        )
        if "*" in include_fields:
            include_fields = [field.name for field in self._instance._meta.fields]
        return {field: getattr(self._instance, field) for field in include_fields}

    class Meta:
        # The model this document is associated with
        model: Type[models.Model] = None
        # Namespace for the documents in the vector store, defaults to the model name
        namespace: Optional[str] = None
        # List of vector indexes created out of the model fields
        indexes: Iterable[VectorIndex] = []
        # Model fields that should be included in the metadata
        include_fields: List[str] = ["*"]
        # Flag to disable signals on the model, so the documents are not updated on model changes
        disable_signals: bool = False
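
The wildcard handling in `metadata()` and the `include_fields` default above can be tried out without Django. The following is a minimal sketch under that assumption: `ToyProduct` is a hypothetical dataclass standing in for a model instance, and `build_metadata` is an illustrative helper that mirrors the logic of the method rather than calling the library.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Sequence


@dataclass
class ToyProduct:
    """Hypothetical stand-in for a Django model instance."""

    id: int
    name: str
    description: str


def build_metadata(
    instance: ToyProduct, include_fields: Sequence[str] = ("*",)
) -> Dict[str, Any]:
    # "*" expands to every declared field, mirroring the wildcard default.
    if "*" in include_fields:
        include_fields = [field.name for field in fields(instance)]
    return {name: getattr(instance, name) for name in include_fields}


product = ToyProduct(id=1, name="red kettle", description="boils water quickly")
all_fields = build_metadata(product)                           # wildcard default
only_name = build_metadata(product, include_fields=["name"])   # explicit list
```

Listing fields explicitly keeps the stored metadata small, while the wildcard default copies every field of the instance.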