2024-05-01

Generating Fake Django Model Instances with Factory Boy

As you might know, I am developing PyBazaar, a Python Developer Marketplace. For a project of that scope, I need to create hundreds or thousands of data entries to ensure that everything works as expected. Factory Boy is a tool that allows me to create model instances in batches, and this blog post is about it.

The benefits of using Factory Boy

By creating a bunch of fake entries, I can achieve the following:

  • Work on list and detail representation and styling.
  • Work on and try functionality like filters, sorting, and pagination.
  • Check and improve performance with loads of data entries.
  • Create dummy data for unit or functional tests.

Factory Boy seemed like a pretty complex package, so I want to simplify things and introduce you to all the necessary parts for creating a fake model instances.

Model preparation

At PyBazaar, I have users with profiles, job offers, and resources that can be faked in batch. The related categories are predefined and don't need to be faked.

To make it possible to distinguish between real and fake entries, I added a new boolean field is_fake to all those models that I can create in batch:

# For testing and debugging
is_fake = models.BooleanField(_("Fake"), default=False)

Here is what the list of profiles can look like when created with Factory Boy:

The setup

The installation is pretty straightforward:

(venv)$ pip install factory-boy==3.3.0

And then in each app where you need to create fake entries, create a file factories.py with factory classes, e.g.:

import random
import factory
from pybazaar.apps.accounts.models import User

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User

    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    
    # ...
    
    is_fake = True

For factory classes, I also add custom class methods delete_fake() and recreate_batch() so that I can quickly create entries or delete them:

class ProfileFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Profile

    user = factory.SubFactory(UserFactory)

    # ...
        
    is_fake = True

    @classmethod
    def recreate_batch(cls, size, **kwargs):
        cls.delete_fake()
        cls.create_batch(size=size, **kwargs)

    @classmethod
    def delete_fake(cls):
        for profile in Profile.objects.filter(is_fake=True):
            profile.delete()
        for user in User.objects.filter(is_fake=True):
            user.delete()

Factory class attributes tell the system what values to assign to the models when creating instances. Let's explore multiple cases that we can use as values.

Assigning a static value

If it's a simple static value, you can just assign it. It will be the same for all fake entries:

is_fake = True
publishing_status = Profile.PublishingStatusChoices.PUBLISHED

Assigning a value from a list

If it's a value from a list, use the Iterator class:

title = factory.Iterator([
    "Developer",
    "Software Engineer",
    "Programmer",
])

experience_level = factory.Iterator(
    Profile.ExperienceLevelChoices.values
)

Assigning generated value of a certain type

Factory Boy uses the Faker package to allow the creating of fake names, paragraphs, or locations. You can use those as follows:

first_name = factory.Faker("first_name")
last_name = factory.Faker("last_name")
summary = factory.Faker("paragraph")
city = factory.Faker("city")
state = factory.Faker("state")
country = factory.Faker("country_code")

Assigning an instance

If it's a foreign key and you want a random value, use this:

resource_type = factory.LazyAttribute(
    lambda o: ResourceType.objects.order_by("?").first()
)

Assigning a value from a function

Similarly, you can assign a value from a function:

description = factory.LazyAttribute(
    lambda o: generate_quill_content()
)

Assigning a random value

Or a random value:

is_available_for_work = factory.LazyAttribute(
    lambda o: random.choice([True, False])
)

Assigning a value based on attributes or methods of the model instance

Once you define attributes like first_name or last_name, you can set other values depending on those:

username = factory.LazyAttribute(
    lambda o: f"{o.first_name}_{o.last_name}".lower()
)
email = factory.LazyAttribute(
    lambda o: f"{o.first_name}_{o.last_name}@example.com".lower()
)

Assigning a password

There is a special django.Password class for generating password values:

password = factory.django.Password("Pa$$w0rd")

Assigning dummy images

Here is how to create and assign a dummy single-color image:

avatar = factory.django.ImageField(
    width=200, height=200, color="rgb(2,132,199)"
)

Having two factories depending on each other

As we have profiles depending on users, we can define the codependence with SubFactory class.

class ProfileFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Profile

    user = factory.SubFactory(UserFactory)

Then, when creating profiles, the users will be created automatically for them, too.

Attaching many-to-many relations

In Django, many-to-many relationships must be added after creating a model instance. We can achieve that with the PostGeneration class:

def attach_categories(obj, create, extracted, **kwargs):
    obj.specializations.add(
        *list(Specialization.objects.order_by("?")[: random.randint(3, 7)])
    )

class ProfileFactory(factory.django.DjangoModelFactory):
    # ...
    do_afterwards = factory.PostGeneration(attach_categories)

How you call this attribute doesn't matter - it should just not clash with other field names or attributes.

A complete example

So the final factories.py file could look like this:

import random
import factory
import json
from pybazaar.apps.accounts.models import User
from pybazaar.apps.profiles.models import Profile
from pybazaar.apps.categories.models import Specialization

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User

    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    username = factory.LazyAttribute(
        lambda o: f"{o.first_name}_{o.last_name}".lower()
    )
    email = factory.LazyAttribute(
        lambda o: f"{o.first_name}_{o.last_name}@example.com".lower()
    )
    password = factory.django.Password("Pa$$w0rd")
    is_fake = True

def generate_quill_content():
    return json.dumps(
        {
            "delta": '',
            "html": "<p>Hey there</p>",
        }
    )

def attach_categories(obj, create, extracted, **kwargs):
    obj.specializations.add(
        *list(Specialization.objects.order_by("?")[: random.randint(3, 7)])
    )

class ProfileFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Profile

    user = factory.SubFactory(UserFactory)
    title = factory.Iterator(
        [
            "Developer",
            "Software Engineer",
            "Programmer",
        ]
    )
    avatar = factory.django.ImageField(
        width=200, height=200, color="rgb(2,132,199)"
    )
    bio = factory.LazyAttribute(lambda o: generate_quill_content())
    city = factory.Faker("city")
    state = factory.Faker("state")
    country = factory.Faker("country_code")
    is_available_for_work = factory.LazyAttribute(
        lambda o: random.choice([True, False])
    )
    experience_level = factory.Iterator(
        Profile.ExperienceLevelChoices.values
    )
    publishing_status = Profile.PublishingStatusChoices.PUBLISHED
    is_fake = True

    do_afterwards = factory.PostGeneration(attach_categories)

    @classmethod
    def recreate_batch(cls, size, **kwargs):
        cls.delete_fake()
        cls.create_batch(size=size, **kwargs)

    @classmethod
    def delete_fake(cls):
        for profile in Profile.objects.filter(is_fake=True):
            profile.delete()
        for user in User.objects.filter(is_fake=True):
            user.delete()

Creating fake entries

Lastly, I can create the fake entries from the Django shell as follows:

>>> from pybazaar.apps.profiles.factories import ProfileFactory
>>> ProfileFactory.recreate_batch(100)

And later, when I don't need those anymore:

>>> from pybazaar.apps.profiles.factories import ProfileFactory
>>> ProfileFactory.delete_fake()

Whenever I add new fields to the models, I can easily tweak the factories and recreate the whole bunch of models in one step.

Final words

Factory Boy doesn't guarantee data validation and integrity. For example, city, state, and country will be three separate random values that don't match a real location. However, that is sufficient to test your website's basic look and feel or performance.


Cover image by Google DeepMind

No comments:

Post a Comment