Showing posts with label ORM. Show all posts
Showing posts with label ORM. Show all posts

2022-04-09

Generic Functionality without Generic Relations

When you have some generic functionality like anything commentable, likable, or upvotable, it’s common to use Generic Relations in Django. The problem with Generic Relations is that they create the relationships at the application level instead of the database level, and that requires a lot of database queries if you want to aggregate content that shares the generic functionality. There is another way that I will show you in this article.

I learned this technique at my first job in 2002 and then rediscovered it again with Django a few years ago. The trick is to have a generic Item model where every other autonomous model has a one-to-one relation to the Item. Moreover, the Item model has an item_type field, allowing you to recognize the backward one-to-one relationship.

Then whenever you need to have some generic categories, you link them to the Item. Whenever you create generic functionality like media gallery, comments, likes, or upvotes, you attach them to the Item. Whenever you need to work with permissions, publishing status, or workflows, you deal with the Item. Whenever you need to create a global search or trash bin, you work with the Item instances.

Let’s have a look at some code.

Items

First, I'll create the items app with two models: the previously mentioned Item and the abstract model ItemBase with the one-to-one relation for various models to inherit:

# items/models.py
import sys

from django.db import models
from django.apps import apps

if "makemigrations" in sys.argv:
    from django.utils.translation import gettext_noop as _
else:
    from django.utils.translation import gettext_lazy as _


class Item(models.Model):
    """
    A generic model for all autonomous models to link to.
    
    Currently these autonomous models are available:
    - content.Post
    - companies.Company
    - accounts.User
    """
    ITEM_TYPE_CHOICES = (
        ("content.Post", _("Post")),
        ("companies.Company", _("Company")),
        ("accounts.User", _("User")),
    )
    item_type = models.CharField(
        max_length=200, choices=ITEM_TYPE_CHOICES, editable=False, db_index=True
    )

    class Meta:
        verbose_name = _("Item")
        verbose_name_plural = _("Items")

    def __str__(self):
        content_object_title = (
            str(self.content_object) if self.content_object else "BROKEN REFERENCE"
        )
        return (
            f"{content_object_title} ({self.get_item_type_display()})"
        )

    @property
    def content_object(self):
        app_label, model_name = self.item_type.split(".")
        model = apps.get_model(app_label, model_name)
        return model.objects.filter(item=self).first()


class ItemBase(models.Model):
    """
    An abstract model for the autonomous models that will link to the Item.
    """
    item = models.OneToOneField(
        Item,
        verbose_name=_("Item"),
        editable=False,
        blank=True,
        null=True,
        on_delete=models.CASCADE,
        related_name="%(app_label)s_%(class)s",
    )

    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        if not self.item:
            model = type(self)
            item = Item.objects.create(
                item_type=f"{model._meta.app_label}.{model.__name__}"
            )
            self.item = item
        super().save()

    def delete(self, *args, **kwargs):
        if self.item:
            self.item.delete()
        super().delete(*args, **kwargs)

Then let's create some autonomous models that will have one-to-one relations with the Item. By "autonomous models," I mean those which are enough by themselves, such as posts, companies, or accounts. Models like types, categories, tags, or likes, wouldn't be autonomous.

Posts

Second, I create the content app with the Post model. This model extends ItemBase which will create the one-to-one relation on save, and will define the item_type as content.Post:

# content/models.py
import sys

from django.contrib.auth.base_user import BaseUserManager
from django.db import models
from django.contrib.auth.models import AbstractUser

if "makemigrations" in sys.argv:
    from django.utils.translation import gettext_noop as _
else:
    from django.utils.translation import gettext_lazy as _

from items.models import ItemBase


class Post(ItemBase):
    title = models.CharField(_("Title"), max_length=255)
    slug = models.SlugField(_("Slug"), max_length=255)
    content = models.TextField(_("Content"))

    class Meta:
        verbose_name = _("Post")
        verbose_name_plural = _("Posts")

Companies

Third, I create the companies app with the Company model. This model also extends ItemBase which will create the one-to-one relation on save, and will define the item_type as companies.Company:

# companies/models.py
import sys

from django.contrib.auth.base_user import BaseUserManager
from django.db import models
from django.contrib.auth.models import AbstractUser

if "makemigrations" in sys.argv:
    from django.utils.translation import gettext_noop as _
else:
    from django.utils.translation import gettext_lazy as _

from items.models import ItemBase


class Company(ItemBase):
    name = models.CharField(_("Name"), max_length=255)
    slug = models.SlugField(_("Slug"), max_length=255)
    description = models.TextField(_("Description"))

    class Meta:
        verbose_name = _("Company")
        verbose_name_plural = _("Companies")

Accounts

Fourth, I'll have a more extensive example with the accounts app containing the User model. This model extends AbstractUser from django.contrib.auth as well as ItemBase for the one-to-one relation. The item_type set at the Item model will be accounts.User:

# accounts/models.py
import sys

from django.db import models
from django.contrib.auth.base_user import BaseUserManager
from django.contrib.auth.models import AbstractUser

if "makemigrations" in sys.argv:
    from django.utils.translation import gettext_noop as _
else:
    from django.utils.translation import gettext_lazy as _

from items.models import ItemBase


class UserManager(BaseUserManager):
    def create_user(self, username="", email="", password="", **extra_fields):
        if not email:
            raise ValueError("Enter an email address")
        email = self.normalize_email(email)
        user = self.model(username=username, email=email, **extra_fields)
        user.set_password(password)
        user.save(using=self._db)
        return user

    def create_superuser(self, username="", email="", password=""):
        user = self.create_user(email=email, password=password, username=username)
        user.is_superuser = True
        user.is_staff = True
        user.save(using=self._db)
        return user


class User(AbstractUser, ItemBase):
    # change username to non-editable non-required field
    username = models.CharField(
        _("Username"), max_length=150, editable=False, blank=True
    )
    # change email to unique and required field
    email = models.EmailField(_("Email address"), unique=True)
    bio = models.TextField(_("Bio"))

    USERNAME_FIELD = "email"
    REQUIRED_FIELDS = []

    objects = UserManager()

Creating new items

I will use the Django shell to create several autonomous model instances and the related Items too:

>>> from content.models import Post
>>> from companies.models import Company
>>> from accounts.models import User
>>> from items.models import Item
>>> post = Post.objects.create(
...     title="Hello, World!",
...     slug="hello-world",
...     content="Lorem ipsum…",
... )
>>> company = Company.objects.create(
...     name="Aidas & Co",
...     slug="aidas-co",
...     description="Lorem ipsum…",
... )
>>> user = User.objects.create_user(
...     username="aidas",
...     email="aidas@example.com",
...     password="jdf234oha&6sfhasdfh",
... )
>>> Item.objects.count()
3

Aggregating content from all those relations

Lastly, here is an example of having posts, companies, and users in a single view. For that, we will use the Item queryset with annotations:

from django import forms
from django.db import models
from django.shortcuts import render
from django.utils.translation import gettext, gettext_lazy as _

from .models import Item


class SearchForm(forms.Form):
    q = forms.CharField(label=_("Search"), required=False)
    

def all_items(request):
    qs = Item.objects.annotate(
        title=models.Case(
            models.When(
                item_type="content.Post", 
                then="content_post__title",
            ),
            models.When(
                item_type="companies.Company", 
                then="companies_company__name",
            ),
            models.When(
                item_type="accounts.User",
                then="accounts_user__email",
            ),
            default=models.Value(gettext("<Untitled>")),
        ),
        description=models.Case(
            models.When(
                item_type="content.Post",
                then="content_post__content",
            ),
            models.When(
                item_type="companies.Company",
                then="companies_company__description",
            ),
            models.When(
                item_type="accounts.User", 
                then="accounts_user__bio",
                ),
            default=models.Value(""),
        ),
    )
    
    form = SearchForm(data=request.GET, prefix="search")
    if form.is_valid():
        query = form.cleaned_data["q"]
        if query:
            qs = qs.annotate(
                search=SearchVector(
                    "title",
                    "description",
                )
            ).filter(search=query)

    context = {
        "queryset": qs,
        "search_form": form,
    }
    return render(request, "items/all_items.html", context)

Final words

You can have generic functionality and still avoid multiple hits to the database by using the Item one-to-one approach instead of generic relations.

The name of the Item model can be different, and you can even have multiple such models for various purposes, for example, TaggedItem for tags only.

Do you use anything similar in your projects?

Do you see how this approach could be improved?

Let me know in the comments!


Cover picture by Pixabay

2019-05-04

My 5 Favorite Talks at DjangoCon Europe 2019

Django people at the DjangoCon Europe 2019

This year DjangoCon Europe happened in Copenhagen, Denmark, at a very creative and special place AFUK - Academy for Untamed Creativity. Surrounded by artistic souls, we learned more about web technologies, got to know each other at ecologic reusable disposable cups of coffee, enjoyed delicious authentic food, and socialized with Django-branded beverages. As always, there was also a party with people drinking IPA (not to be confused with API). And at the weekend Django developers were solving bugs for Django and related open-source software at the coding sprints.

Here I would like to present you with the top 5 talks that I liked most of all.

Django and Web Security Headers

Adam Johnson (@AdamChainz) was talking about special response headers that tell browsers to treat data of the website more securely and which Django settings are responsible for those headers.

Summary by rixx

Docs or it didn't Happen!

Mikey Ariel (@ThatDocsLady) was talking about the necessity of documentation and what to write there.

Summary by rixx

Pushing the ORM to its Limits

Sigurd Ljødal (@sigurdlj) was talking about advanced Django ORM use cases.

Summary by rixx

Logging Rethought 2: The Actions of Frank Taylor Jr.

Markus Holtermann (@m_holtermann) was talking about logging structured data to log files instead of the traditional plain text messages.

Summary by rixx

Maintaining a Django Codebase after 10k Commits

Joachim Jablon (@Ewjoachim) and Stéphane Angel (@twidi) were talking about the best practices developing large-scale Django projects.

Summary by rixx

Honorable Mentions

More Information

You can see all talk descriptions, video records and some slides at the official conference website.

Amazing conference photos were taken by Bartek Pawlik.

I was also really astonished how effective were the talk summaries written by rixx almost in real time.


Cover photo by Bartek Pawlik.

2018-06-17

Data Filtering in a Django Website using Elasticsearch

In my Web Development with Django Cookbook section Forms and Views there is a recipe Filtering object lists. It shows you how to filter a Django QuerySet dynamically by different filter parameters selected in a form. From practice, the approach is working well, but with lots of data and complex nested filters, the performance might get slow. You know - because of all those INNER JOINS in SQL, the page might take even 12 seconds to load. And that is not preferable behavior. I know that I could denormalize the database or play with indices to optimize SQL. But I found a better way to increase the loading speed. Recently we started using Elasticsearch for one of the projects and its data filtering performance seems to be enormously faster: in our case, it increased from 2 to 16 times depending on which query parameters you choose.

What is Elasticsearch?

Elasticsearch is java-based search engine which stores data in JSON format and allows you to query it using special JSON-based query language. Using elasticsearch-dsl and django-elasticsearch-dsl, I can bind my Django models to Elasticsearch indexes and rewrite my object list views to use Elasticsearch queries instead of Django ORM. The API of Elasticsearch DSL is chainable like with Django QuerySets or jQuery functions, and we'll have a look at it soon.

The Setup

At first, let's install Elasticsearch server. Elasticsearch is quite a complex system, but it comes with convenient configuration defaults.

On macOS you can install and start the server with Homebrew:

$ brew install elasticsearch
$ brew services start elasticsearch

For other platforms, the installation instructions are also quite clear.

Then in your Django project's virtual environment install django-elasticsearch-dsl. I guess, "DSL" stands for "domain specific language".

With pipenv it would be the following from the project's directory:

$ pipenv install django-elasticsearch-dsl

If you are using just pip and virtual environment, then you would do this with your project's environment activated.

(venv)$ pip install django-elasticsearch-dsl

This, in turn, will install related lower level client libraries: elasticsearch-dsl and elasticsearch-py.

In the Django project settings, add 'django_elasticsearch_dsl' to INSTALLED_APPS.

Finally, add the lines defining default connection configuration there:

ELASTICSEARCH_DSL={
    'default': {
        'hosts': 'localhost:9200'
    },
}

Elasticsearch Documents for Django Models

For the illustration how to use Elasticsearch with Django, I'll create Author and Book models, and then I will create Elasticsearch index document for the books.

models.py

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)
    isbn = models.CharField(_("ISBN"), blank=True, max_length=20)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title

Nothing fancy here. Just an Author model with fields id, first_name, last_name, author_name, and a Book model with fields id, title, authors, publishing_date, and isbn. Let's go to the documents.

documents.py

In the same directory of your app, create documents.py with the following content:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django_elasticsearch_dsl import DocType, Index, fields
from .models import Author, Book

# Name of the Elasticsearch index
search_index = Index('library')
# See Elasticsearch Indices API reference for available settings
search_index.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@search_index.doc_type
class BookDocument(DocType):
    authors = fields.NestedField(properties={
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'author_name': fields.TextField(),
        'pk': fields.IntegerField(),
    }, include_in_root=True)

    isbn = fields.KeywordField(
        index='not_analyzed',
    )

    class Meta:
        model = Book # The model associated with this DocType

        # The fields of the model you want to be indexed in Elasticsearch
        fields = [
            'title',
            'publishing_date',
        ]
        related_models = [Author]

    def get_instances_from_related(self, related_instance):
        """If related_models is set, define how to retrieve the Book instance(s) from the related model."""
        if isinstance(related_instance, Author):
            return related_instance.book_set.all()

Here we defined a BookDocument which will have fields: title, publishing_date, authors, and isbn.

The authors will be a list of nested dictionaries at the BookDocument. The isbn will be a KeywordField which means that it will be not tokenized, lowercased, nor otherwise processed and handled the whole as is.

The values for those document fields will be read from the Book model.

Using signals, the document will be automatically updated either when a Book instance or Author instance is added, changed, or deleted. In the method get_instances_from_related(), we tell the search engine which books to update when an author is updated.

Building the Index

When the index document is ready, let's build the index at the server:

(venv)$ python manage.py search_index --rebuild

Django QuerySets vs. Elasticsearch Queries

The concepts of SQL and Elasticsearch queries are quite different. One is working with relational tables and the other works with dictionaries. One is using queries that are kind of human-readable logical sentences and another is using nested JSON structures. One is using the content verbosely and another does string processing in the background and gives search relevance for each result.

Even when there are lots of differences, I will try to draw analogies between Django ORM and elasticsearch-dsl API as close as possible.

1. Query definition

Django QuerySet:

queryset = MyModel.objects.all()

Elasticsearch query:

search = MyModelDocument.search()

2. Count

Django QuerySet:

queryset = queryset.count()

Elasticsearch query:

search = search.count()

3. Iteration

Django QuerySet:

for item in queryset:
    print(item.title)

Elasticsearch query:

for item in search:
    print(item.title)

4. To see the generated query:

Django QuerySet:

>>> queryset.query

Elasticsearch query:

>>> search.to_dict()

5. Filter by single field containing a value

Django QuerySet:

queryset = queryset.filter(my_field__icontains=value)

Elasticsearch query:

search = search.filter('match_phrase', my_field=value)

6. Filter by single field equal to a value

Django QuerySet:

queryset = queryset.filter(my_field__exact=value)

Elasticsearch query:

search = search.filter('match', my_field=value)

If a field type is a string, not a number, it has to be defined as KeywordField in the index document:

my_field = fields.KeywordField()

7. Filter with either of the conditions (OR)

Django QuerySet:

from django.db import models
queryset = queryset.filter(
    models.Q(my_field=value) |
    models.Q(my_field2=value2)
)

Elasticsearch query:

from elasticsearch_dsl.query import Q
search = search.query(
    Q('match', my_field=value) |
    Q('match', my_field2=value2)
)

8. Filter with all of the conditions (AND)

Django QuerySet:

from django.db import models
queryset = queryset.filter(
    models.Q(my_field=value) &
    models.Q(my_field2=value2)
)

Elasticsearch query:

from elasticsearch_dsl.query import Q
search = search.query(
    Q('match', my_field=value) & 
    Q('match', my_field2=value2)
)

9. Filter by values less than or equal to certain value

Django QuerySet:

from datetime import datetime

queryset = queryset.filter(
    published_at__lte=datetime.now(),
)

Elasticsearch query:

from datetime import datetime

search = search.filter(
    'range',
    published_at={'lte': datetime.now()}
)

10. Filter by a value in a nested field

Django QuerySet:

queryset = queryset.filter(
    category__pk=category_id,
)

Elasticsearch query:

from elasticsearch_dsl.query import Q

search = search.filter(
    'nested', 
    path='category', 
    query=Q('match', category__pk=category_id)
)

11. Filter by one of many values in a related model

Django QuerySet:

queryset = queryset.filter(
    category__pk__in=category_ids,
)

Elasticsearch query:

from django.utils.six.moves import reduce
from elasticsearch_dsl.query import Q

search = search.query(
    reduce(operator.ior, [
        Q(
            'nested', 
            path='category', 
            query=Q('match', category__pk=category_id),
        )
        for category_id in category_ids
    ])
)

Here the reduce() function combines a list of Q() conditions using the bitwise OR operator (|).

12. Ordering

Django QuerySet:

queryset = queryset.order_by('-my_field', 'my_field2')

Elasticsearch query:

search = search.sort('-my_field', 'my_field2')

13. Creating query dynamically

Django QuerySet:

import operator
from django.utils.six.moves import reduce

filters = []
if value1:
    filters.append(models.Q(
        my_field1=value1,
    ))
if value2:
    filters.append(models.Q(
        my_field2=value2,
    ))
queryset = queryset.filter(
    reduce(operator.iand, filters)
)

Elasticsearch query:

import operator
from django.utils.six.moves import reduce
from elasticsearch_dsl.query import Q

queries = []
if value1:
    queries.append(Q(
        'match',
        my_field1=value1,
    ))
if value2:
    queries.append(Q(
        'match',
        my_field2=value2,
    ))
search = search.query(
    reduce(operator.iand, queries)
)

14. Pagination

Django QuerySet:

from django.core.paginator import (
    Paginator, Page, EmptyPage, PageNotAnInteger
)

paginator = Paginator(queryset, paginate_by)
page_number = request.GET.get('page')
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    page = paginator.page(1)
except EmptyPage:
    page = paginator.page(paginator.num_pages)

Elasticsearch query:

from django.core.paginator import (
    Paginator, Page, EmptyPage, PageNotAnInteger
)
from django.utils.functional import LazyObject

class SearchResults(LazyObject):
    def __init__(self, search_object):
        self._wrapped = search_object

    def __len__(self):
        return self._wrapped.count()

    def __getitem__(self, index):
        search_results = self._wrapped[index]
        if isinstance(index, slice):
            search_results = list(search_results)
        return search_results

search_results = SearchResults(search)

paginator = Paginator(search_results, paginate_by)
page_number = request.GET.get('page')
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    page = paginator.page(1)
except EmptyPage:
    page = paginator.page(paginator.num_pages)

ElasticSearch doesn't work with Django's pagination by default. Therefore, we have to wrap the search query with lazy SearchResults class to provide the necessary functionality.

Example

I built an example with books written about Django. You can download it from Github and test it.

Takeaways

  • Filtering with Elasticsearch is much faster than with SQL databases.
  • But it comes at the cost of additional deployment and support time.
  • If you have multiple websites using Elasticsearch on the same server, configure a new cluster and node for each of those websites.
  • Django ORM can be in a way mapped to Elasticsearch DSL.
  • I summarized the comparison of Django ORM and Elasticsearch DSL, mentioned in this article, into a cheat sheet. Print it on a single sheet of paper and use it as a reference for your developments.

Get Django ORM vs. Elasticsearch DSL Cheat Sheet


Cover photo by Karl Fredrickson.

2018-05-27

QuerySet Filters on Many-to-many Relations

Django ORM (Object-relational mapping) makes querying the database so intuitive, that at some point you might forget that SQL is being used in the background.

This year at the DjangoCon Europe Katie McLaughlin was giving a talk and mentioned one thing that affects the SQL query generated by Django ORM, depending on how you call the QuerySet or manager methods. This particularity is especially relevant when you are creating your QuerySets dynamically. Here it is. When you have a many-to-many relationship, and you try to filter objects by the fields of the related model, every new filter() method of a QuerySet creates a new INNER JOIN clause. I won't discuss whether that's a Django bug or a feature, but these are my observations about it.

The Books and Authors Example

Let's create an app with books and authors, where each book can be written by multiple authors.

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title

The similar app with sample data can be found in this repository.

Inefficient Filter

With the above models, you could define the following QuerySet to select books which author is me, Aidas Bendoraitis.

queryset = Book.objects.filter(
    authors__first_name='Aidas',
).filter(
    authors__last_name='Bendoraitis',
)

We can check what SQL query it would generate with str(queryset.query) (or queryset.query.__str__()).

The output would be something like this:

SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
INNER JOIN `libraryapp_book_authors` T4 ON ( `libraryapp_book`.`id` = T4.`book_id` )
INNER JOIN `libraryapp_author` T5 ON ( T4.`author_id` = T5.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND T5.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;

Did you notice, that the database table libraryapp_author was attached through the libraryapp_book_authors table to the libraryapp_book table TWICE where just ONCE would be enough?

Efficient Filter

On the other hand, if you are defining query expressions in the same filter() method like this:

queryset = Book.objects.filter(
    authors__first_name='Aidas',
    authors__last_name='Bendoraitis',
)

The generated SQL query will be much shorter and (theoretically) would perform faster:

SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND `libraryapp_author`.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;

The same SQL query can be achieved using the Q() objects:

queryset = Book.objects.filter(
    models.Q(authors__first_name='Aidas') &
    models.Q(authors__last_name='Bendoraitis')
)

The Q() objects add a lot of flexibility to filters allowing to OR, AND, and negate query expressions.

Dynamic Filtering

So to have faster performance, when creating QuerySets dynamically, DON'T use filter() multiple times:

queryset = Book.objects.all()
if first_name:
    queryset = queryset.filter(
        authors__first_name=first_name,
    )
if last_name:
    queryset = queryset.filter(
        authors__last_name=last_name,
    )

DO this instead:

filters = models.Q()
if first_name:
    filters &= models.Q(
        authors__first_name=first_name,
    )
if last_name:
    filters &= models.Q(
        authors__last_name=last_name,
    )
queryset = Book.objects.filter(filters)

Here the empty Q() doesn't have any impact for the generated SQL query, so you don't need the complexity of creating a list of filters and then joining all of them with the bitwise AND operator, like this:

import operator
from django.utils.six.moves import reduce

filters = []
if first_name:
    filters.append(models.Q(
        authors__first_name=first_name,
    ))
if last_name:
    filters.append(models.Q(
        authors__last_name=last_name,
    ))
queryset = Book.objects.filter(reduce(operator.iand, filters))

Profiling

In DEBUG mode, you can check how long the previously executed SQL queries took by checking django.db.connection.queries:

>>> from django.db import connection
>>> connection.queries
[{'sql': 'SELECT …', 'time': '0.001'}, {'sql': 'SELECT …', 'time': '0.004'}]

The Takeaways

  • When querying many-to-many relationships, avoid using multiple filter() methods, make use of Q() objects instead.
  • You can check the SQL query of a QuerySet with str(queryset.query).
  • Check the performance of recently executed SQL queries with django.db.connection.queries.
  • With small datasets, the performance difference is not so obvious. For your specific cases you should do the benchmarks yourself.

Cover photo by Tobias Fischer.