2024-07-29

Setting up DigitalOcean Spaces for Django Media

At DigitalOcean, when you need to store large amounts of static data (images, documents, or videos), you have two options: Volumes Block Storage and Spaces Object Storage. With Volumes, you mount an extra drive on your server and manage your files through the file system, whereas with Spaces, you store files in the cloud and use a special API to create, read, and delete them.

Spaces Object Storage is comparable with AWS S3 Cloud Object Storage. It even supports the same API for dealing with data there.

Here are some benefits of using Spaces:

  • Spaces is at least twice as cheap as Volumes.
  • Content Delivery Network (CDN) is available for media file caching.
  • Spaces is easier to configure, and its user interface is friendlier than that of AWS S3.
  • It is relatively easy to start using Spaces for media files with the django-storages package.

I will walk you through setting up Spaces for your media files.

Create Spaces Object Storage at DigitalOcean

When creating Spaces Object Storage at DigitalOcean, you will be asked for these values:

  • Data center - choose the one closest to your business for legal reasons. For example, I chose Frankfurt for PyBazaar.
  • Enable CDN for caching - enable it for server-side caching.
  • Spaces bucket name - lowercase name for your bucket (e.g. "pybazaar")
  • Select a project - Project name for grouping your DigitalOcean resources (e.g. "PyBazaar")

For the created Spaces instance, I keep the settings unchanged:

  • File Listing: Restricted
  • CDN: Enabled
  • CORS Configurations: Unset

This will create an instance whose resources (your media files) can be accessed under https://pybazaar.fra1.digitaloceanspaces.com and https://pybazaar.fra1.cdn.digitaloceanspaces.com

Create API keys for spaces at DigitalOcean

Now go to API → Spaces Keys and choose Generate New Key.

There you'll generate an Access Key and a Secret Key. You'll need them in the Django settings and, for example, in the Transit app for easy mass file management.

Connect Django to Spaces

Install django-storages and boto3 in your Django project:

(venv)$ pip install boto3
(venv)$ pip install django-storages[s3]

Add the STORAGES setting:

STORAGES = {
    "default": {
        "BACKEND": "storages.backends.s3.S3Storage",
        "OPTIONS": {
            "bucket_name": "pybazaar",
            "access_key": get_secret("SPACES_ACCESS_KEY"),
            "secret_key": get_secret("SPACES_SECRET_KEY"),
            "region_name": "fra1",
            "endpoint_url": "https://pybazaar.fra1.digitaloceanspaces.com",
            "default_acl": "public-read",
            "location": "media",
            # required for the correct storage.exists() functioning
            "file_overwrite": False,
            # don't append any authentication parameters to the files.
            "querystring_auth": False,  
        },
    },
    "staticfiles": {
        # For static files, use file-system storage 
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
        # Or Whitenoise storage
        # "BACKEND": "whitenoise.storage.CompressedStaticFilesStorage",
    },
}
MEDIA_URL = "https://pybazaar.fra1.cdn.digitaloceanspaces.com/pybazaar/media/"

The implementation of get_secret() depends on your needs. You can import values from environment variables or from a JSON or INI file. Here's a version I am using in my projects. Just make sure not to include your secrets in the Git repository.
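
As a minimal sketch (not necessarily the version mentioned above), get_secret() could read from environment variables and fall back to a JSON file. The secrets.json file name, the SECRETS_FILE variable, and the lookup order are assumptions made for illustration:

```python
import json
import os
from pathlib import Path

# Hypothetical location of a JSON file with secrets; keep it out of Git.
SECRETS_FILE = Path(os.environ.get("SECRETS_FILE", "secrets.json"))


def get_secret(name, default=None):
    """Return a secret from the environment or, failing that, from a JSON file."""
    # 1. Environment variables take precedence.
    if name in os.environ:
        return os.environ[name]
    # 2. Fall back to the JSON file, if it exists.
    if SECRETS_FILE.exists():
        secrets = json.loads(SECRETS_FILE.read_text())
        if name in secrets:
            return secrets[name]
    # 3. Use the default, or fail loudly so that misconfiguration is noticed early.
    if default is not None:
        return default
    raise KeyError(f"Secret {name!r} is not set")
```

Raising an error for a missing secret is a design choice: a typo in a key name then surfaces at startup instead of as a failed upload later.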

MEDIA_ROOT is not used by django-storages.

For static files, I recommend using the file system or Whitenoise, although theoretically you could use django-storages for them, too.

Try connecting your local environment to Spaces first, then check it remotely.

If you use a rich-text editor supporting images or links to documents, you should ensure that the media paths don't change when you dump your pages from production to development. Otherwise, the images and links will be broken. To ensure that, set MEDIA_URL to "/pybazaar/media/" when you switch back to the file system storage locally.
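
One way to keep the paths identical in both environments is to derive the media settings from a single switch. The following is only a sketch: the media_settings() helper and the USE_SPACES environment variable are hypothetical names, and the S3 options are left out for brevity.

```python
import os
from pathlib import Path


def media_settings(use_spaces, base_dir):
    """Return media-related settings for Spaces or for the local file system."""
    if use_spaces:
        return {
            "MEDIA_URL": "https://pybazaar.fra1.cdn.digitaloceanspaces.com/pybazaar/media/",
            "DEFAULT_STORAGE": {
                "BACKEND": "storages.backends.s3.S3Storage",
                "OPTIONS": {},  # fill in the same options as in the STORAGES setting
            },
        }
    return {
        # Same path component as the production MEDIA_URL.
        "MEDIA_URL": "/pybazaar/media/",
        "MEDIA_ROOT": str(Path(base_dir) / "media"),
        "DEFAULT_STORAGE": {
            "BACKEND": "django.core.files.storage.FileSystemStorage",
        },
    }


# In a settings file, USE_SPACES=1 (a hypothetical variable) would select Spaces.
config = media_settings(os.environ.get("USE_SPACES") == "1", Path.cwd())
MEDIA_URL = config["MEDIA_URL"]
```

The "DEFAULT_STORAGE" entry would then be assigned to STORAGES["default"].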

Use the Storage API for all your file management

Use the Django Storage operations instead of file-system operations for all your media file operations: listing directory contents, creating a file, reading a file, updating file content, deleting a file, or checking the existence of a file.

For example, instead of this code:

import os
from django.conf import settings

with open(os.path.join(settings.MEDIA_ROOT, "README.txt"), "w") as f: 
    f.write("Hello, World!")

use this one:

from django.core.files.storage import default_storage
from django.core.files.base import ContentFile

default_storage.save("README.txt", ContentFile("Hello, World!"))

Ensure that django-imagekit works well

If you are using django-imagekit or another image manipulation library, make sure that you use the latest version that supports the STORAGES setting.

For older Django versions (< 4.2), you might need to set AWS_* settings instead of the STORAGES dictionary. See the django-storages documentation.

Upload some files

You can upload files to Spaces one by one in the DigitalOcean console. Or better, use Transit v5 on macOS.

Add a new connection and choose Amazon S3:

  • Address: fra1.digitaloceanspaces.com
  • Access Key ID: [your access key]
  • Secret: [your secret key]
  • Remote Path: /pybazaar/pybazaar/media

Upload some files there. Check if they are accessible from the CDN endpoint URL.
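
The accessibility check can also be scripted. This is a rough sketch using only the Python standard library; the cdn_url() and is_accessible() helpers are hypothetical names, and the base URL is the MEDIA_URL from the settings above:

```python
import urllib.request
from urllib.parse import quote

CDN_BASE_URL = "https://pybazaar.fra1.cdn.digitaloceanspaces.com/pybazaar/media/"


def cdn_url(name):
    """Build the public CDN URL for a file name relative to the media folder."""
    return CDN_BASE_URL + quote(name.lstrip("/"))


def is_accessible(url, timeout=10):
    """Return True if a HEAD request to the URL is answered with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

For a file uploaded as README.txt, calling is_accessible(cdn_url("README.txt")) should return True once the file is publicly readable.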

Set up a subdomain of yours

To have a dedicated subdomain of yours pointing to the spaces, e.g. https://media.pybazaar.com, you can follow the instructions in this article.

If you manage your DNS settings on DigitalOcean, it's not a big deal: you just click a few buttons to set the CNAME record and enable Let's Encrypt certificates.

But if you manage your DNS elsewhere, you will have to manually set and regularly update your SSL settings, and set a CNAME record to point your subdomain (e.g. media.pybazaar.com) to their CDN endpoint (e.g. pybazaar.fra1.cdn.digitaloceanspaces.com).

Final words

Using django-storages, you can replace your file system with cloud-based storage such as Spaces Object Storage at DigitalOcean relatively easily.

DigitalOcean gives $200 in credit over 60 days to everyone who uses my affiliate link to sign up. If you are planning a website with lots of media files, it's worth a try.


Cover photo by gdtography

2024-05-02

Renewing Let's Encrypt Certificates with NGINX Unit

Recently, I moved the DjangoTricks website and started PyBazaar on servers with Nginx Unit. One thing that was left undone was SSL certificate renewals. Let's Encrypt has special certbot parameters for renewing certificates for websites on Apache or Nginx servers, but they don't work out of the box with the Nginx Unit. In this blog post, I will tell you how to do that.

The certificate bundle

Nginx Unit doesn't use the fullchain.pem and privkey.pem generated by certbot directly from the location where they were generated. Instead, one has to create a bundle (like bundle1.pem) by concatenating them and then uploading it to the Nginx Unit configuration endpoint.
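
The concatenation itself is simple. Here is a sketch of it in Python, assuming a hypothetical build_bundle() helper; restricting the file permissions is my own precaution, not something Unit requires:

```python
from pathlib import Path


def build_bundle(fullchain_path, privkey_path, bundle_path):
    """Concatenate the certificate chain and the private key into one PEM file."""
    chain = Path(fullchain_path).read_text()
    key = Path(privkey_path).read_text()
    # Make sure the chain ends with a newline so that the PEM blocks don't run together.
    if not chain.endswith("\n"):
        chain += "\n"
    bundle = Path(bundle_path)
    bundle.write_text(chain + key)
    # The bundle contains the private key, so restrict access to the owner.
    bundle.chmod(0o600)
    return bundle
```

The order matters here in the same way as in the cat command of the script: the certificate chain comes first, then the private key.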

The bash script

For that, I created a bash script:

#!/usr/bin/env bash
SECONDS=0
CRON_LOG_FILE=/var/webapps/pybazaar/logs/renew_certificate.log

echo "=== Renewing Letsencrypt Certificate ===" > ${CRON_LOG_FILE}
date >> ${CRON_LOG_FILE}

echo "Renewing certificate..." >> ${CRON_LOG_FILE}
certbot --renew-by-default certonly -n --webroot -w /var/www/letsencrypt/ -m hello@pybazaar.com --agree-tos --no-verify-ssl -d pybazaar.com -d www.pybazaar.com

echo "Creating bundle..." >> ${CRON_LOG_FILE}
cat /etc/letsencrypt/live/pybazaar.com/fullchain.pem /etc/letsencrypt/live/pybazaar.com/privkey.pem > /var/webapps/pybazaar/unit-config/bundle1.pem

echo "Temporarily switching the Unit configuration to a dummy one..." >> ${CRON_LOG_FILE}
curl -X PUT --data-binary @/var/webapps/pybazaar/unit-config/unit-config-pre.json --unix-socket /var/run/control.unit.sock http://localhost/config

echo "Deleting old certificate from Nginx Unit..." >> ${CRON_LOG_FILE}
curl -X DELETE --unix-socket /var/run/control.unit.sock http://localhost/certificates/certbot1

echo "Installing new certificate to Nginx Unit..." >> ${CRON_LOG_FILE}
curl -X PUT --data-binary @/var/webapps/pybazaar/unit-config/bundle1.pem --unix-socket /var/run/control.unit.sock http://localhost/certificates/certbot1

echo "Switching the Unit configuration to the correct one..." >> ${CRON_LOG_FILE}
curl -X PUT --data-binary @/var/webapps/pybazaar/unit-config/unit-config.json --unix-socket /var/run/control.unit.sock http://localhost/config

echo "Restarting Unit..." >> ${CRON_LOG_FILE}
service unit restart

echo "Finished." >> ${CRON_LOG_FILE}
duration=$SECONDS
echo "$(($duration / 60)) minutes and $(($duration % 60)) seconds elapsed." >> ${CRON_LOG_FILE}

Once you have adapted the script, you can run it manually as a root user to test it:

$ chmod +x renew_certificate.sh
$ ./renew_certificate.sh

Note that the certbot command will try to validate your website's URL by attempting to reach a temporary file that it will create on http://example.com/.well-known/acme-challenge/, so make sure that this location is accessible and serving the static files.

For more details about the Nginx Unit, check my previous blog post.

The cron job

If everything works as expected, you can add it to the root user's cron jobs to be executed weekly.

Export the current root cron jobs to a crontab.txt:

$ crontab -l > crontab.txt

Then edit it and add the weekly script to update the SSL certificate:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
SHELL=/bin/bash
MAILTO=""
@weekly /var/webapps/pybazaar/unit-config/renew_certificate.sh

Then run the following as the root user to apply it:

$ crontab crontab.txt

The good thing about not editing the cron job with crontab -e is that you can choose the editor and even put the crontab.txt under Git version control.

Happy web development with WSGI or ASGI!


Cover picture by Gotta Be Worth It

2024-05-01

Generating Fake Django Model Instances with Factory Boy

As you might know, I am developing PyBazaar, a Python Developer Marketplace. For a project of that scope, I need to create hundreds or thousands of data entries to ensure that everything works as expected. Factory Boy is a tool that allows me to create model instances in batches, and this blog post is about it.

The benefits of using Factory Boy

By creating a bunch of fake entries, I can achieve the following:

  • Work on list and detail representation and styling.
  • Work on and try functionality like filters, sorting, and pagination.
  • Check and improve performance with loads of data entries.
  • Create dummy data for unit or functional tests.

Factory Boy seemed like a pretty complex package at first, so I want to simplify things and introduce you to all the parts necessary for creating fake model instances.

Model preparation

At PyBazaar, I have users with profiles, job offers, and resources that can be faked in batch. The related categories are predefined and don't need to be faked.

To make it possible to distinguish between real and fake entries, I added a new boolean field is_fake to all those models that I can create in batch:

# For testing and debugging
is_fake = models.BooleanField(_("Fake"), default=False)

Here is what the list of profiles can look like when created with Factory Boy:

The setup

The installation is pretty straightforward:

(venv)$ pip install factory-boy==3.3.0

And then in each app where you need to create fake entries, create a file factories.py with factory classes, e.g.:

import random
import factory
from pybazaar.apps.accounts.models import User

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User

    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    
    # ...
    
    is_fake = True

For factory classes, I also add custom class methods delete_fake() and recreate_batch() so that I can quickly create entries or delete them:

class ProfileFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Profile

    user = factory.SubFactory(UserFactory)

    # ...
        
    is_fake = True

    @classmethod
    def recreate_batch(cls, size, **kwargs):
        cls.delete_fake()
        cls.create_batch(size=size, **kwargs)

    @classmethod
    def delete_fake(cls):
        for profile in Profile.objects.filter(is_fake=True):
            profile.delete()
        for user in User.objects.filter(is_fake=True):
            user.delete()

Factory class attributes tell the system what values to assign to the models when creating instances. Let's explore multiple cases that we can use as values.

Assigning a static value

If it's a simple static value, you can just assign it. It will be the same for all fake entries:

is_fake = True
publishing_status = Profile.PublishingStatusChoices.PUBLISHED

Assigning a value from a list

If it's a value from a list, use the Iterator class:

title = factory.Iterator([
    "Developer",
    "Software Engineer",
    "Programmer",
])

experience_level = factory.Iterator(
    Profile.ExperienceLevelChoices.values
)

Assigning generated value of a certain type

Factory Boy uses the Faker package to create fake names, paragraphs, or locations. You can use those as follows:

first_name = factory.Faker("first_name")
last_name = factory.Faker("last_name")
summary = factory.Faker("paragraph")
city = factory.Faker("city")
state = factory.Faker("state")
country = factory.Faker("country_code")

Assigning an instance

If it's a foreign key and you want a random value, use this:

resource_type = factory.LazyAttribute(
    lambda o: ResourceType.objects.order_by("?").first()
)

Assigning a value from a function

Similarly, you can assign a value from a function:

description = factory.LazyAttribute(
    lambda o: generate_quill_content()
)

Assigning a random value

Or a random value:

is_available_for_work = factory.LazyAttribute(
    lambda o: random.choice([True, False])
)

Assigning a value based on attributes or methods of the model instance

Once you define attributes like first_name or last_name, you can set other values depending on those:

username = factory.LazyAttribute(
    lambda o: f"{o.first_name}_{o.last_name}".lower()
)
email = factory.LazyAttribute(
    lambda o: f"{o.first_name}_{o.last_name}@example.com".lower()
)
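
Generated names can contain spaces, apostrophes, or accented letters, and Faker can repeat a name, which matters if usernames must be unique (an assumption about the User model). A hypothetical helper like the following could normalize the value:

```python
import re
import unicodedata


def make_username(first_name, last_name, number):
    """Build a lowercase ASCII username with a numeric suffix for uniqueness."""
    raw = f"{first_name}_{last_name}".lower()
    # Strip accents: "josé" becomes "jose".
    raw = unicodedata.normalize("NFKD", raw).encode("ascii", "ignore").decode("ascii")
    # Turn whitespace and hyphens into underscores, then drop anything else unusual.
    raw = re.sub(r"[\s-]+", "_", raw)
    raw = re.sub(r"[^a-z0-9_]", "", raw)
    # The number keeps usernames distinct when a name is repeated.
    return f"{raw}_{number}"


print(make_username("Mary Ann", "O'Connor", 7))  # mary_ann_oconnor_7
```

Inside a factory, such a helper could be called from factory.LazyAttributeSequence, which passes both the object and a counter to the function.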

Assigning a password

There is a special django.Password class for generating password values:

password = factory.django.Password("Pa$$w0rd")

Assigning dummy images

Here is how to create and assign a dummy single-color image:

avatar = factory.django.ImageField(
    width=200, height=200, color="rgb(2,132,199)"
)

Having two factories depending on each other

As we have profiles depending on users, we can define the dependency with the SubFactory class:

class ProfileFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Profile

    user = factory.SubFactory(UserFactory)

Then, when creating profiles, the users will be created automatically for them, too.

Attaching many-to-many relations

In Django, many-to-many relationships must be added after creating a model instance. We can achieve that with the PostGeneration class:

def attach_categories(obj, create, extracted, **kwargs):
    obj.specializations.add(
        *list(Specialization.objects.order_by("?")[: random.randint(3, 7)])
    )

class ProfileFactory(factory.django.DjangoModelFactory):
    # ...
    do_afterwards = factory.PostGeneration(attach_categories)

What you name this attribute doesn't matter, as long as it doesn't clash with other field names or attributes.

A complete example

So the final factories.py file could look like this:

import random
import factory
import json
from pybazaar.apps.accounts.models import User
from pybazaar.apps.profiles.models import Profile
from pybazaar.apps.categories.models import Specialization

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User

    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")
    username = factory.LazyAttribute(
        lambda o: f"{o.first_name}_{o.last_name}".lower()
    )
    email = factory.LazyAttribute(
        lambda o: f"{o.first_name}_{o.last_name}@example.com".lower()
    )
    password = factory.django.Password("Pa$$w0rd")
    is_fake = True

def generate_quill_content():
    return json.dumps(
        {
            "delta": '',
            "html": "<p>Hey there</p>",
        }
    )

def attach_categories(obj, create, extracted, **kwargs):
    obj.specializations.add(
        *list(Specialization.objects.order_by("?")[: random.randint(3, 7)])
    )

class ProfileFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Profile

    user = factory.SubFactory(UserFactory)
    title = factory.Iterator(
        [
            "Developer",
            "Software Engineer",
            "Programmer",
        ]
    )
    avatar = factory.django.ImageField(
        width=200, height=200, color="rgb(2,132,199)"
    )
    bio = factory.LazyAttribute(lambda o: generate_quill_content())
    city = factory.Faker("city")
    state = factory.Faker("state")
    country = factory.Faker("country_code")
    is_available_for_work = factory.LazyAttribute(
        lambda o: random.choice([True, False])
    )
    experience_level = factory.Iterator(
        Profile.ExperienceLevelChoices.values
    )
    publishing_status = Profile.PublishingStatusChoices.PUBLISHED
    is_fake = True

    do_afterwards = factory.PostGeneration(attach_categories)

    @classmethod
    def recreate_batch(cls, size, **kwargs):
        cls.delete_fake()
        cls.create_batch(size=size, **kwargs)

    @classmethod
    def delete_fake(cls):
        for profile in Profile.objects.filter(is_fake=True):
            profile.delete()
        for user in User.objects.filter(is_fake=True):
            user.delete()

Creating fake entries

Lastly, I can create the fake entries from the Django shell as follows:

>>> from pybazaar.apps.profiles.factories import ProfileFactory
>>> ProfileFactory.recreate_batch(100)

And later, when I don't need those anymore:

>>> from pybazaar.apps.profiles.factories import ProfileFactory
>>> ProfileFactory.delete_fake()

Whenever I add new fields to the models, I can easily tweak the factories and recreate the whole bunch of models in one step.

Final words

Factory Boy doesn't guarantee data validation and integrity. For example, city, state, and country will be three separate random values that don't match a real location. However, that is sufficient to test your website's basic look and feel or performance.
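
If matching locations matter for your tests, one option is to pick all three values together from a predefined list. This is just a sketch with hypothetical sample data and a hypothetical pick_location() helper:

```python
import random

# Hypothetical sample data; extend it with locations relevant to your project.
LOCATIONS = [
    {"city": "Berlin", "state": "Berlin", "country": "DE"},
    {"city": "Austin", "state": "Texas", "country": "US"},
    {"city": "Vilnius", "state": "Vilnius County", "country": "LT"},
]


def pick_location(rng=random):
    """Return one city/state/country combination that belongs together."""
    # Copy the entry so that callers can't modify the shared sample data.
    return dict(rng.choice(LOCATIONS))
```

In a factory, the result could be stored once per instance, for example as a declaration under class Params using factory.LazyFunction(pick_location), with city, state, and country reading from it via factory.LazyAttribute.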


Cover image by Google DeepMind

2024-02-24

Django Project on NGINX Unit

Recently, I learned about the NGINX Unit and decided to try it on my DjangoTricks website. Unit is a web server developed by people from NGINX, with pluggable support for Python (WSGI and ASGI), Ruby, Node.js, PHP, and a few other languages. I wanted to see whether it's really easy to set it up, have it locally on my Mac and the remote Ubuntu server, and try out the ASGI features of Django, allowing real-time communication. Also, I wanted to see whether Django is faster with Unit than with NGINX and Gunicorn. This article is about my findings.

My observations

The Unit service uses HTTP requests to read and update its configuration. The configuration is a single JSON document: you can upload it as a whole from the command line on the same computer, or modify individual values by addressing their keys in the JSON structure.

Normally, the docs suggest using the curl command to update the configuration. However, as I am using Ansible to deploy my Django websites, I wanted to create a script I could later copy to other projects. I used Google Gemini to convert bash commands from the documentation to Ansible directives and corrected its mistakes.

The trickiest part for me was to figure out how to use Let's Encrypt certificates in the simplest way possible. The docs are extensive and comprehensible, but sometimes, they dig into technical details that are unnecessary for a common Django developer.

Also, it's worth mentioning that the Unit plugin version must match your Python version in the virtual environment. It was unexpected for me when Brew installed Python 3.12 with unit-python3 and then required my project to use Python 3.12 instead of Python 3.10 (which I used for the DjangoTricks website). So I had to recreate my virtual environment and probably will have problems later with pip-compile-multi when I prepare packages for the production server, still running Python 3.10.

Below are the instructions I used to set up the NGINX Unit with my existing DjangoTricks website on Ubuntu 22.04. For simplicity, I am writing plain Terminal commands instead of analogous Ansible directives.

1. Install Unit service to your server

Follow the installation instructions from the documentation to install unit, unit-dev, unit-python3.10, and whatever other plugins you want. Make sure the service is running.

2. Prepare Let's Encrypt certificates

Create a temporary JSON configuration file /var/webapps/djangotricks/unit-config/unit-config-pre.json, which will allow Let's Encrypt certbot to access the .well-known directory for domain confirmation:

{
  "listeners": {
    "*:80": {
      "pass": "routes/acme"
    }
  },
  "routes": {
    "acme": [
      {
        "match": {
          "uri": "/.well-known/acme-challenge/*"
        },
        "action": {
          "share": "/var/www/letsencrypt/$uri"
        }
      }
    ]
  }
}

Install it to Unit:

$ curl -X PUT --data-binary @/var/webapps/djangotricks/unit-config/unit-config-pre.json \
--unix-socket /var/run/control.unit.sock http://localhost/config

If you make any mistakes in the configuration, it will be rejected with an error message and not executed.
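
Syntax errors such as a trailing comma can be caught locally before uploading. The following sketch uses a hypothetical check_unit_config() helper; it only verifies that the file is valid JSON, not that Unit will accept its keys and values:

```python
import json


def check_unit_config(text):
    """Return (True, "") for valid JSON, or (False, reason) with the error position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return False, f"line {error.lineno}, column {error.colno}: {error.msg}"
    return True, ""
```

You could call it with the contents of unit-config-pre.json and only run the curl command if the first returned value is True.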

Create Let's Encrypt certificates:

$ certbot certonly -n --webroot -w /var/www/letsencrypt/ -m hello@djangotricks.com \
--agree-tos --no-verify-ssl -d djangotricks.com -d www.djangotricks.com

Create a bundle that is required by the NGINX Unit:

cat /etc/letsencrypt/live/djangotricks.com/fullchain.pem \
/etc/letsencrypt/live/djangotricks.com/privkey.pem > \
/var/webapps/djangotricks/unit-config/bundle1.pem

Install certificate to NGINX Unit as certbot1:

curl -X PUT --data-binary @/var/webapps/djangotricks/unit-config/bundle1.pem \
--unix-socket /var/run/control.unit.sock http://localhost/certificates/certbot1

3. Install Django project configuration

Create a JSON configuration file /var/webapps/djangotricks/unit-config/unit-config.json which will use your SSL certificate and will serve your Django project:

{
  "listeners": {
    "*:80": {
      "pass": "routes/main"
    },
    "*:443": {
      "pass": "routes/main",
      "tls": {
        "certificate": "certbot1"
      }
    }
  },
  "routes": {
    "main": [
      {
        "match": {
          "host": [
            "djangotricks.com",
            "www.djangotricks.com"
          ],
          "uri": "/.well-known/acme-challenge/*"
        },
        "action": {
          "share": "/var/www/letsencrypt/$uri"
        }
      },
      {
        "match": {
          "host": [
            "djangotricks.com",
            "www.djangotricks.com"
          ]
        },
        "action": {
          "pass": "applications/django"
        }
      },
      {
        "action": {
          "return": 444
        }
      }
    ]
  },
  "applications": {
    "django": {
      "type": "python",
      "path": "/var/webapps/djangotricks/project/djangotricks",
      "home": "/var/webapps/djangotricks/venv/",
      "module": "djangotricks.wsgi",
      "environment": {
        "DJANGO_SETTINGS_MODULE": "djangotricks.settings.production"
      },
      "user": "djangotricks",
      "group": "users"
    }
  }
}

In this configuration, requests to /.well-known/acme-challenge/ are served from the Let's Encrypt directory for certificate validation, and all other requests are passed to the Django project if the domain used is correct. In other cases, the status "444 - No Response" is returned. (This prevents access by hackers who point their domains to your IP address.)

In the NGINX Unit, switching between WSGI and ASGI is literally a matter of changing one letter from "w" to "a" in the line about the Django application module, from:

"module": "djangotricks.wsgi",

to:

"module": "djangotricks.asgi",

I could have easily served the static files in this configuration here, too, but my STATIC_URL contains a dynamic part to force retrieval of new files from the server instead of the browser cache. So, I used WhiteNoise to serve the static files.

For redirection from djangotricks.com to www.djangotricks.com, I also chose to use PREPEND_WWW = True setting instead of Unit directives.

Finally, install it to Unit (it will overwrite the previous configuration):

$ curl -X PUT --data-binary @/var/webapps/djangotricks/unit-config/unit-config.json \
--unix-socket /var/run/control.unit.sock http://localhost/config

How it performed

DjangoTricks is a pretty small website; therefore, I couldn't do extensive benchmarks, but I checked two cases: how a filtered list view performs with NGINX and Gunicorn vs. NGINX Unit, and how you can replace NGINX, Gunicorn, and Huey background tasks with ASGI requests using NGINX Unit.

First of all, the page at https://www.djangotricks.com/tricks/?categories=development&technologies=django-4-2 returned the HTML result in 139 ms on average on NGINX with Gunicorn, compared with 140 ms on NGINX Unit using WSGI and 149 ms on NGINX Unit using ASGI. So, NGINX Unit with WSGI is 0.72% slower than NGINX with Gunicorn, and NGINX Unit with ASGI is 7.19% slower.
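
For reference, the percentages follow from the averages like this (percent_slower() is just an illustrative helper name):

```python
def percent_slower(baseline_ms, measured_ms):
    """Return how many percent slower measured_ms is compared to baseline_ms."""
    return (measured_ms - baseline_ms) / baseline_ms * 100


print(round(percent_slower(139, 140), 2))  # 0.72 (Unit with WSGI vs. Gunicorn)
print(round(percent_slower(139, 149), 2))  # 7.19 (Unit with ASGI vs. Gunicorn)
```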

However, I also compared two implementations of https://www.djangotricks.com/detect-django-version/: one using background tasks with continuous Ajax requests that poll until the result is retrieved, and one using asynchronous checking with ASGI. The average time dropped from 6.62 s to 0.75 s. Of course, this depends on the timeout of the continuous Ajax request, but generally, a real-time ASGI setup can improve the user experience significantly.

UPDATE on 2024-09-15: After using ASGI with the NGINX Unit for a while, I noticed that it crashed several times, and the server had to be restarted. It's still unclear whether the issue was due to NGINX Unit instability, Django's ASGI implementation, or simply heavy load. So use ASGI at your own risk.

Final words

Although NGINX Unit with Python is slightly (unnoticeably) slower than NGINX with Gunicorn, it allows Django developers to use asynchronous requests and implement real-time user experience. Also, you could probably have a Django website and Matomo analytics or WordPress blog on the same server. The NGINX Unit configuration is relatively easy to understand, and you can script the process for reusability.


Cover Image by Volker Meyer.