DjangoTricks: Administration

Showing posts with label Administration. Show all posts

2024-02-24

Django Project on NGINX Unit

Recently, I learned about the NGINX Unit and decided to try it on my DjangoTricks website. Unit is a web server developed by people from NGINX, with pluggable support for Python (WSGI and ASGI), Ruby, Node.js, PHP, and a few other languages. I wanted to see whether it's really easy to set it up, have it locally on my Mac and the remote Ubuntu server, and try out the ASGI features of Django, allowing real-time communication. Also, I wanted to see whether Django is faster with Unit than with NGINX and Gunicorn. This article is about my findings.

My observations

Unit service uses HTTP requests to read and update its configuration. The configuration is a single JSON file that you can upload to the Unit service via a command line from the same computer or modify its values by keys in the JSON structure.

Normally, the docs suggest using the curl command to update the configuration. However, as I am using Ansible to deploy my Django websites, I wanted to create a script I could later copy to other projects. I used Google Gemini to convert bash commands from the documentation to Ansible directives and corrected its mistakes.

The trickiest part for me was to figure out how to use Let's Encrypt certificates in the simplest way possible. The docs are extensive and comprehensible, but sometimes, they dig into technical details that are unnecessary for a common Django developer.

Also, it's worth mentioning that the Unit plugin version must match your Python version in the virtual environment. It was unexpected for me when Brew installed Python 3.12 with unit-python3 and then required my project to use Python 3.12 instead of Python 3.10 (which I used for the DjangoTricks website). So I had to recreate my virtual environment and probably will have problems later with pip-compile-multi when I prepare packages for the production server, still running Python 3.10.

Below are the instructions I used to set up the NGINX Unit with my existing DjangoTricks website on Ubuntu 22.04. For simplicity, I am writing plain Terminal commands instead of analogous Ansible directives.

1. Install Unit service to your server

Follow the installation instructions from documentation to install unit, unit-dev, unit-python3.10, and whatever other plugins you want. Make sure the service is running.

2. Prepare Let's Encrypt certificates

Create a temporary JSON configuration file /var/webapps/djangotricks/unit-config/unit-config-pre.json, which will allow Let's Encrypt certbot to access the .well-known directory for domain confirmation:

{
  "listeners": {
    "*:80": {
      "pass": "routes/acme"
    }
  },
  "routes": {
    "acme": [
      {
        "match": {
          "uri": "/.well-known/acme-challenge/*"
        },
        "action": {
          "share": "/var/www/letsencrypt/$uri"
        }
      }
    ]
  }
}

Install it to Unit:

$ curl -X PUT --data-binary @/var/webapps/djangotricks/unit-config/unit-config-pre.json \
--unix-socket /var/run/control.unit.sock http://localhost/config

If you make any mistakes in the configuration, it will be rejected with an error message and not executed.

Create Let's Encrypt certificates:

$ certbot certonly -n --webroot -w /var/www/letsencrypt/ -m hello@djangotricks.com \
--agree-tos --no-verify-ssl -d djangotricks.com -d www.djangotricks.com

Create a bundle that is required by the NGINX Unit:

cat /etc/letsencrypt/live/djangotricks.com/fullchain.pem \
/etc/letsencrypt/live/djangotricks.com/privkey.pem > \
/var/webapps/djangotricks/unit-config/bundle1.pem

Install certificate to NGINX Unit as certbot1:

curl -X PUT --data-binary @/var/webapps/djangotricks/unit-config/bundle1.pem \
--unix-socket /var/run/control.unit.sock http://localhost/certificates/certbot1

3. Install Django project configuration

Create a JSON configuration file /var/webapps/djangotricks/unit-config/unit-config.json which will use your SSL certificate and will serve your Django project:

{
  "listeners": {
    "*:80": {
      "pass": "routes/main"
    },
    "*:443": {
      "pass": "routes/main",
      "tls": {
        "certificate": "certbot1"
      }
    }
  },
  "routes": {
    "main": [
      {
        "match": {
          "host": [
            "djangotricks.com",
            "www.djangotricks.com"
          ],
          "uri": "/.well-known/acme-challenge/*"
        },
        "action": {
          "share": "/var/www/letsencrypt/$uri"
        }
      },
      {
        "match": {
          "host": [
            "djangotricks.com",
            "www.djangotricks.com"
          ],
        },
        "action": {
          "pass": "applications/django"
        }
      },
      {
        "action": {
          "return": 444
        }
      }
    ]
  },
  "applications": {
    "django": {
      "type": "python",
      "path": "/var/webapps/djangotricks/project/djangotricks",
      "home": "/var/webapps/djangotricks/venv/",
      "module": "djangotricks.wsgi",
      "environment": {
        "DJANGO_SETTINGS_MODULE": "djangotricks.settings.production"
      },
      "user": "djangotricks",
      "group": "users"
    }
  }
}

In this configuration, HTTP requests can only be used for certification validation, and HTTPS requests point to the Django project if the domain used is correct. In other cases, the status "444 - No Response" is returned. (It's for preventing access for hackers who point their domains to your IP address).

In the NGINX Unit, switching between WSGI and ASGI is literally a matter of changing one letter from "w" to "a" in the line about the Django application module, from:

"module": "djangotricks.wsgi",

to:

"module": "djangotricks.asgi",

I could have easily served the static files in this configuration here, too, but my STATIC_URL contains a dynamic part to force retrieval of new files from the server instead of the browser cache. So, I used WhiteNoise to serve the static files.

For redirection from djangotricks.com to www.djangotricks.com, I also chose to use PREPEND_WWW = True setting instead of Unit directives.

And here, finally, installing it to Unit (it will overwrite the previous configuration):

$ curl -X PUT --data-binary @/var/webapps/djangotricks/unit-config/unit-config.json \
--unix-socket /var/run/control.unit.sock http://localhost/config

How it performed

DjangoTricks is a pretty small website; therefore, I couldn't do extensive benchmarks, but I checked two cases: how a filtered list view performs with NGINX and Gunicorn vs. NGINX Unit, and how you can replace NGINX, Gunicorn, and Huey background tasks with ASGI requests using NGINX Unit.

First of all, the https://www.djangotricks.com/tricks/?categories=development&technologies=django-4-2 returned the HTML result on average in 139 ms on NGINX with Gunicorn, whereas it was on average 140 ms with NGINX Unit using WSGI and 149 ms with NGINX Unit using ASGI. So, the NGINX Unit with WSGI is 0.72% slower than NGINX with Gunicorn, and the NGINX Unit with ASGI is 7.19% slower than NGINX with Gunicorn.

However, when I checked https://www.djangotricks.com/detect-django-version/ how it performs with background tasks and continuous Ajax requests until the result is retrieved vs. asynchronous checking using ASGI, I went on average from 6.62 s to 0.75 s. Of course, it depends on the timeout of the continuous Ajax request, but generally, a real-time ASGI setup can improve the user experience significantly.

UPDATE on 2024-09-15: After using ASGI with the NGINX Unit for a while, I noticed that it crashed several times, and the server had to be restarted. It's still unclear whether the issue was due to NGINX Unit instability, Django's ASGI implementation, or simply heavy load. So use ASGI at your own risk.

Final words

Although NGINX Unit with Python is slightly (unnoticeably) slower than NGINX with Gunicorn, it allows Django developers to use asynchronous requests and implement real-time user experience. Also, you could probably have a Django website and Matomo analytics or WordPress blog on the same server. The NGINX Unit configuration is relatively easy to understand, and you can script the process for reusability.

Cover Image by Volker Meyer.

2019-10-24

Things I want to remember about SSH

SSH, short for Secure Shell, is a protocol for secure network communications. It is widely used for executing commands on remote servers, and for file uploads or downloads. If you are working with Django, use Git version control, or administrate servers, you surely are using SSH. In this post, I want to share some technical details about it.

Secure Shell is using private and public key pairs. You can either use automatically generated private and public keys combined with a password, or manually generated private and public keys. In the latter case, you need to keep your private key on your computer and upload the public key to the remote server.

Creating a pair of SSH keys manually

If you are using GitHub, Bitbucket, DigitalOcean, or some other service, you might have seen the possibility to upload public SSH keys for direct access to remote servers.

Here is how you usually create the SSH keys on the computer from which you want to establish a secure connection (your local machine or one of your servers that has access to other servers or services). In the Terminal you would execute these commands:

$ ssh-keygen
$ ssh-agent /usr/local/bin/bash
$ ssh-add ~/.ssh/id_rsa

The id_rsa is the name of the default SSH private key. The public key would be id_rsa.pub. And by default they both will be located under ~/.ssh/.

When running ssh-keygen you can choose different key names and even add a passphrase. For instance, you could have github_id_rsa and github_id_rsa.pub keys for communication with GitHub. My recommendation would be for each new service to create a new private-public key pair so that in case you need to transfer your computer's data to a different machine, you could selectively transfer the access to the remote servers.

Also, if you are not using the passphrase for the SSH key pair, I would recommend having your disk encrypted and a secure user password for your computer. If your laptop gets stolen, the thief wouldn't be able to get to your remote servers without knowing your computer's password.

Creating an access to a remote server by SSH key

In the case of GitHub, Bitbucket, and other online services with SSH communication, you usually have to copy the contents of the public key into a text field in a web form.

If you want to create a secure communication by manually generated private-public keys with a server where your Django project is deployed, you should append the contents of the public key to the ~/.ssh/authorized_keys file on the remote server.

To get the content of the public key in the Terminal, you can use:

$ cat ~/.ssh/id_rsa.pub

Then copy the output to the clipboard.

Or on macOS you can run pbcopy as follows:

$ pbcopy < ~/.ssh/id_rsa.pub

To append the contents of the public key to the remote server, you can do this:

$  echo "...pasted public key...">>~/.ssh/authorized_keys

Creating authorization at a remote server by password

If you want to establish an SSH connection with a password and automatically generated private-public keys, you would need to edit /etc/ssh/sshd_config and ensure these two settings:

PasswordAuthentication yes
PermitEmptyPasswords no

After the change, you would restart the ssh server with the following command:

$ sudo service ssh restart

Also, make sure that the user you are connecting with has a password:

$ sudo passwd the_user

Connecting to a remote server

The default way to connect via SSH to a remote server with a password is executing the following in the Terminal:

$ ssh the_user@example.com

To connect with a private key, you would execute this:

$ ssh -i ~/.ssh/examplecom_id_rsa the_user@example.com

Next, let's see how we can simplify this using some local SSH configuration.

Configuring local SSH client

Edit ~/.ssh/config and add the following lines for each SSH connection that you want to define:

Host examplecom
    HostName example.com
    User the_user
    IdentityFile ~/.ssh/examplecom_id_rsa

If the domain of the website is not yet pointing to the IP address of the server, you can also connect by IP address:

Host examplecom
    HostName 1.2.3.4
    User the_user
    IdentityFile ~/.ssh/examplecom_id_rsa

The following allows you to login to your remote servers by manually generated private-public key with just these lines:

$ ssh examplecom

To request for password instead of using the manually generated keys, you would need to modify the snippet as follows:

Host examplecom
    HostName example.com
    User the_user
    PubkeyAuthentication=no

When you connect via SSH and wait don't type anything for 30 minutes or so, the connection gets lost. But you can require your client to connect to the server every 4 minutes or so by adding the following lines to the beginning of the ~/.ssh/config on your local computer:

Host *
    ServerAliveInterval 240

Uploading and downloading files using SSH connection

Typically, Secure Shell allows you to execute terminal commands on the remote server using bash, zsh, sh, or another shell. But very often, you also need to transfer files securely to and from the server. For that, you have these options: scp command, rsync command, or FTP client with SFTP support.

scp

The scp stands for Secure Copy.

This is how you would copy the secrets.json file from the remote server to your local development environment:

$ scp the_user@example.com:~/src/myproject/myproject/settings/secrets.json ./myproject/settings/secrets.json

Here is an example of the same, but with custom ~/.ssh/config configuration:

$ scp examplecom:~/src/myproject/myproject/settings/secrets.json ./myproject/settings/secrets.json

To copy the file from the local computer to the remote server, you would switch the places of source and target:

$ scp ./myproject/settings/secrets.json examplecom:~/src/myproject/myproject/settings/secrets.json

rsync

To synchronize directories on the server and locally, you can use the rsync command. This is how to do it for downloading the media/ directory (note that the trailing slashes matter):

$ rsync --archive --compress --partial --progress the_user@example.com:~/src/myproject/myproject/media/ ./myproject/media/

Here is an example of the same with a custom ~/.ssh/config configuration:

$ rsync --archive --compress --partial --progress examplecom:~/src/myproject/myproject/media/ ./myproject/media/

To upload the media/ directory to the remote server, you would again switch places for the source and target:

$ rsync --archive --compress --partial --progress ./myproject/media/ examplecom:~/src/myproject/myproject/media/

sftp

FTP clients like Transmit allow you to have SFTP connections either by username and password or by username and private key. You can even generate the private-public keys directly in the app there.

SFTP works like FTP, but all communication is encrypted there.

The final words

Use only encrypted connections for your network communications, encrypt your hard disk if you use manually generated private-public keys, and use strong passwords.

Be safe!

Cover photo by Jason D.

2017-03-01

Tracking the Results of Cron Jobs

Every Django website needs some automatic background tasks to execute regularly. The outdated sessions need to be cleaned up, search index needs to be updated, some data needs to be imported from RSS feeds or APIs, backups need to be created, you name it. Usually, if not all the time, those regular tasks are being set as cron jobs. However, when some task is run in the background, by default, you don't get any feedback whether it was successfully completed, or whether it crashed on the way. In this post I will show you how I handle the results of cron jobs.

In a Django project, all those tasks are usually implemented as management commands. For each such command I write a short bash script, that will call the management command with specific parameters and will print the verbose output to a log file.

Let's say my project structure is like this on a remote server:

/home/myproject
├── bin
├── include
├── lib
├── public_html
├── backups
├── project
│   └── myproject
├── scripts
└── logs

A virtual environment is created in the home directory of myproject linux user. The Django project itself is kept under project directory. The scripts directory is for my bash scripts. And the logs directory is for the verbose output of the bash scripts.

For example, for the default clearsessions command that removes outdated sessions, I would create scripts/cleanup.sh bash script as follows:

#!/usr/bin/env bash
SECONDS=0
PROJECT_PATH=/home/myproject
CRON_LOG_FILE=${PROJECT_PATH}/logs/cleanup.log

echo "Cleaning up the database" > ${CRON_LOG_FILE}
date >> ${CRON_LOG_FILE}

cd ${PROJECT_PATH}
source bin/activate
cd project/myproject    
python manage.py clearsessions --verbosity=2 --traceback >> ${CRON_LOG_FILE}  2>&1

echo "Finished." >> ${CRON_LOG_FILE}
duration=$SECONDS
echo "$(($duration / 60)) minutes and $(($duration % 60)) seconds elapsed." >> ${CRON_LOG_FILE}

To run this command every night at 1 AM, you could create a file myproject_crontab with the following content:

MAILTO=""
00 01 * * * /home/myproject/scripts/cleanup.sh

Then register the cron jobs with:

$ crontab myproject_crontab

By such a bash script, I can track:

At what time the script was last executed.
What is the verbose output of the management command.
If the management command broke, what was in the traceback.
Whether the command finished executing or hung up.
How long it took to run the command.

In addition, this gives me information whether the crontab was registered and whether the cron service was running at all. As I get the total time of execution in minutes and seconds, I can decide how often I can call the cron job regularly so that it doesn't clash with another cron job.

When you have multiple Django management commands, you can group them thematically into single bash script, or you can wrap them into individual bash scripts. After putting them into the crontab, the only thing left is manually checking the logs from time to time.

If you have any suggestions how I could even improve this setup, I would be glad to hear your opinion in the comments.

Here is the Gist of the scripts in this post. To see some examples of custom Django management commands, you can check Chapter 9, Data Import and Export in my book Web Development with Django Cookbook - Second Edition.

Cover photo by Redd Angelo

2016-05-21

Deploying a Django Website on Heroku

Once you have a working project, you have to host it somewhere. One of the most popular deployment platforms nowadays is Heroku. Heroku belongs to a Platform as a Service (PaaS) category of cloud computing services. Every Django project you host on Heroku is running inside a smart container in a fully managed runtime environment. Your project can scale horizontally (adding more computing machines) and you pay for what you use starting with a free tier. Moreover, you won't need much of system administrator's skills to do the deployment - once you do the initial setup, the further deployment is as simple as pushing Git repository to a special heroku remote.

However, there are some gotchas to know before choosing Heroku for your Django project:

One uses PostgreSQL database with your project. MySQL is not an option.
You cannot store your static and media files on Heroku. One should use Amazon S3 or some other storage for that.
There is no mailing server associated with Heroku. One can use third-party SendGrid plugin with additional costs, GMail SMTP server with sent email amount limitations, or some other SMTP server.
The Django project must be version-controlled under Git.
Heroku works with Python 2.7. Python 3 is not yet supported.

Recently I deployed a small Django project on Heroku. To have a quick reference for the future, I summed up the process here providing instructions how to do that for future reference.

1. Install Heroku Toolbelt

To connect your shell with Heroku, type:

$ heroku login

When asked, enter your Heroku account's email and password.

2. Prepare Pip Requirements

Activate your project's virtual environment and install Python packages required for Heroku:

(myproject_env)$ pip install django-toolbelt

This will install django, psycopg2, gunicorn, dj-database-url, static3, and dj-static to your virtual environment.

Install boto and Django Storages to be able to store static and media files on an S3 bucket:

(myproject_env)$ pip install boto
(myproject_env)$ pip install django-storages

Go to your project's directory and create the pip requirements that Heroku will use in the cloud for your project:

(myproject_env)$ pip freeze -l > requirements.txt

3. Create Heroku-specific Files

You will need two files to tell Heroku what Python version to use and how to start a webserver.

In your project's root directory create a file named runtime.txt with the following content:

python-2.7.11

Then at the same location create a file named Procfile with the following content:

web: gunicorn myproject.wsgi --log-file -

4. Configure the Settings

As mentioned in the "Web Development with Django Cookbook - Second Edition", we keep the developmnent and production settings in separate files both importing the common settings from a base file.

Basically we have myproject/conf/base.py with the settings common for all environments.

Then myproject/conf/dev.py contains the local database and dummy email configuration as follows:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from .base import *

DATABASES = {
    "default": {
        "CONN_MAX_AGE": 0,
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "localhost",
        "NAME": "myproject",
        "PASSWORD": "",
        "PORT": "",
        "USER": "postgres"
    }
}

EMAIL_BACKEND = "django.core.mail.backends.console.EmailBackend"

Lastly for the production settings we need myproject/conf/prod.py with special database configuration, non-debug mode, and unrestrictive allowed hosts as follows:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from .base import *
import dj_database_url

DATABASES = {
    "default": dj_database_url.config()
}

ALLOWED_HOSTS = ["*"]

DEBUG = False

Now let's open myproject/settings.py and add the following content:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from .conf.dev import *

Finally, open the myproject/wsgi.py and change the location of the DJANGO_SETTINGS_MODULE there:

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.conf.prod")

5. Set Up Amazon S3 for Static and Media Files

Create an Amazon S3 bucket myproject.media at the AWS Console (web interface for Amazon Web Services). Go to the properties of the bucket, expand "Permissions" section, click on the "add bucket policy" button and enter the following:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::myproject.media/*"
        }
    ]
}

This ensures that files on the S3 bucket will be accessible publicly without any API keys.

Go back to your Django project and add storages to the INSTALLED_APPS in myproject/conf/base.py:

INSTALLED_APPS = [
    # ...
    "storages",
]

Media files and static files will be stored on different paths under S3 bucket. To implement that, we need to create two Python classes under a new file myproject/s3utils.py as follows:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
from storages.backends.s3boto import S3BotoStorage

class StaticS3BotoStorage(S3BotoStorage):
    """
    Storage for static files.
    """

    def __init__(self, *args, **kwargs):
        kwargs['location'] = 'static'
        super(StaticS3BotoStorage, self).__init__(*args, **kwargs)


class MediaS3BotoStorage(S3BotoStorage):
    """
    Storage for uploaded media files.
    """

    def __init__(self, *args, **kwargs):
        kwargs['location'] = 'media'
        super(MediaS3BotoStorage, self).__init__(*args, **kwargs)

Finally, let's edit the myproject/conf/base.py and add AWS settings:

AWS_S3_SECURE_URLS = False       # use http instead of https
AWS_QUERYSTRING_AUTH = False                # don't add complex authentication-related query parameters for requests
AWS_S3_ACCESS_KEY_ID = "..."                # Your S3 Access Key
AWS_S3_SECRET_ACCESS_KEY = "..."            # Your S3 Secret
AWS_STORAGE_BUCKET_NAME = "myproject.media"
AWS_S3_HOST = "s3-eu-west-1.amazonaws.com"  # Change to the media center you chose when creating the bucket

STATICFILES_STORAGE = "myproject.s3utils.StaticS3BotoStorage"
DEFAULT_FILE_STORAGE = "myproject.s3utils.MediaS3BotoStorage"

# the next monkey patch is necessary to allow dots in the bucket names
import ssl
if hasattr(ssl, '_create_unverified_context'):
   ssl._create_default_https_context = ssl._create_unverified_context

Collect static files to the S3 bucket:

(myproject_env)$ python manage.py collectstatic --noinput

6. Set Up Gmail to Send Emails

Open myproject/conf/prod.py and add the following settings:

EMAIL_USE_TLS = True
EMAIL_HOST = "smtp.gmail.com"
EMAIL_HOST_USER = "myproject@gmail.com"
EMAIL_HOST_PASSWORD = "mygmailpassword"
EMAIL_PORT = 587

7. Push to Heroku

Commit and push all the changes to your Git origin remote. Personally I prefer using SourceTree to do that, but you can also do that in the command line, PyCharm, or another software.

In your project directory type the following:

(myproject_env)$ heroku create my-unique-project

This will create a Git remote called "heroku", and a new Heroku project "my-unique-project" which can be later accessed at http://my-unique-project.herokuapp.com.

Push the changes to heroku remote:

(myproject_env)$ git push heroku master

8. Transfer Your Local Postgres Database To Heroku

Create local database dump:

(myproject_env)$ PGPASSWORD=mypassword pg_dump -Fc --no-acl --no-owner -h localhost -U myuser mydb > mydb.dump

Upload the database dump temporarily to some server, for example, S3 bucket: http://myproject.media.s3-eu-west-1.amazonaws.com/mydb.dump. Then import that dump into the Heroku database:

(myproject_env)$ heroku pg:backups restore 'http://myproject.media.s3-eu-west-1.amazonaws.com/mydb.dump' DATABASE_URL

Remove the database dump from S3 server.

9. Set Environment Variables

If your Git repository is not private, put your secret values in environment variables rather than in the Git repository directly.

(myproject_env)$ heroku config:set AWS_S3_ACCESS_KEY_ID=ABCDEFG123
$ heroku config:set AWS_S3_SECRET_ACCESS_KEY=aBcDeFg123

To read out the environment variables you can type:

(myproject_env)$ heroku config

To read out the environment variables in the Python code open myproject/conf/base.py and type:

import os
AWS_S3_ACCESS_KEY_ID = os.environ.get("AWS_S3_ACCESS_KEY_ID", "")
AWS_S3_SECRET_ACCESS_KEY = os.environ.get("AWS_S3_SECRET_ACCESS_KEY", "")

10. Set DNS Settings

Open your domain settings and set CNAME to "my-unique-project.herokuapp.com".

At last, you are done! Drop in the comments if I missed some part. For the new updates, see the next section.

*. Update Production

Push the changes to heroku remote:

(myproject_env)$ git push heroku master

If you have changed something in the static files, collect them again:

(myproject_env)$ python manage.py collectstatic --noinput

Collecting static files to S3 bucket takes quite a long time, so I do not recommend to do that automatically every time when you want to deploy to Heroku.

2013-12-07

How to Store Your Media Files in Amazon S3 Bucket

In this article, I will show you how to use Amazon Simple Storage Service (S3) to store your media files in the cloud. S3 is known and widely used for its scalability, reliability, and relatively cheap price. It is free to join and you only pay the hosting and bandwidth costs as you use it. The service is provided by Amazon.com. S3 tends to be attractive for start-up companies looking to minimize costs.

S3 uses a concept of buckets which is like a storage database. Each bucket has its own url. Inside the buckets you have folders and under that you have files. In fact, directories don't actually exist within S3 buckets. The entire file structure is actually just one flat single-level container of files. The illusion of directories is actually created based on having the file names like dirA/dirB/file.

If you want to browse the files in a folder-like structure, you can use Transmit FTP client on Mac OS X. It supports S3 services. Amazon browser-based console also has interface for browsing or uploading files.

OK. Now let's have a look how to set up a Django project which will use S3 for media files.

1. Create a bucket

At first you will need to create a bucket at S3 and make it accessible for all visitors. Login to your Amazon Web Services console. Click on "Services" in the menu, then on the "S3". Click on the button "Create bucket" and enter your bucket name like "mywebsite.media" or even better without dots like "mywebsitemedia". Choose a region there which is the closest to your target audience, for example, if your website is for Europeans, choose "Ireland". Go to the properties of the bucket, expand "Permissions" section, click on the "add bucket policy" button and enter the following:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::mywebsite.media/*"
        }
    ]
}

2. Install boto and django-storages

Amazon Web Services provide a python library called boto for accessing the API. There is a django app called django-storages which allows to use AWS S3 as the main storage. So your next step is to activate your virtual environment and install latest versions of boto and django-storages.

pip install boto==2.19.0
pip install django-storages==1.1.8

3. Set up django-storages for your project

Add the following to the settings.py:

INSTALLED_APPS = [
    # ...
    'storages'
    # ...
]
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_S3_SECURE_URLS = False       # use http instead of https
AWS_QUERYSTRING_AUTH = False     # don't add complex authentication-related query parameters for requests
AWS_S3_ACCESS_KEY_ID = '...'     # enter your access key id
AWS_S3_SECRET_ACCESS_KEY = '...' # enter your secret access key
AWS_STORAGE_BUCKET_NAME = 'mywebsite.media'

# the next monkey patch is necessary if you use dots in the bucket name
import ssl
if hasattr(ssl, '_create_unverified_context'):
   ssl._create_default_https_context = ssl._create_unverified_context

4. Create your models with FileField or ImageField

Let's create a Profile model with avatar field.

def upload_avatar_to(instance, filename):
    import os
    from django.utils.timezone import now
    filename_base, filename_ext = os.path.splitext(filename)
    return 'profiles/%s%s' % (
        now().strftime("%Y%m%d%H%M%S"),
        filename_ext.lower(),
    )

class Profile(models.Model):
    # ...
    avatar = models.ImageField(_("Avatar"), upload_to=upload_avatar_to, blank=True)
    # ...

Whenever you save an instance of the Profile with the new avatar picture, avatar will be uploaded to S3 bucket. To show it in a template, you will need something like <img src="{{ profile.avatar.url }}" alt="" /> where the image source will look like "http://mywebsite.media.s3.amazonaws.com/profiles/20140214203012.jpg".

5. Use the storage to manipulate file versions

If you need to create a thumbnail version of your image, it's probably best to overwrite the save method of the model and trigger the generation of the thumbs there. Let's add some methods to the Profile class:

class Profile(models.Model):
    # ...

    def save(self, *args, **kwargs):
        super(Profile, self).save(*args, **kwargs)
        self.create_avatar_thumb()

    def create_avatar_thumb(self):
        import os
        from PIL import Image
        from django.core.files.storage import default_storage as storage
        if not self.avatar:
            return ""
        file_path = self.avatar.name
        filename_base, filename_ext = os.path.splitext(file_path)
        thumb_file_path = "%s_thumb.jpg" % filename_base
        if storage.exists(thumb_file_path):
            return "exists"
        try:
            # resize the original image and return url path of the thumbnail
            f = storage.open(file_path, 'r')
            image = Image.open(f)
            width, height = image.size

            if width > height:
                delta = width - height
                left = int(delta/2)
                upper = 0
                right = height + left
                lower = height
            else:
                delta = height - width
                left = 0
                upper = int(delta/2)
                right = width
                lower = width + upper

            image = image.crop((left, upper, right, lower))
            image = image.resize((50, 50), Image.ANTIALIAS)

            f_thumb = storage.open(thumb_file_path, "w")
            image.save(f_thumb, "JPEG")
            f_thumb.close()
            return "success"
        except:
            return "error"

    def get_avatar_thumb_url(self):
        import os
        from django.core.files.storage import default_storage as storage
        if not self.avatar:
            return ""
        file_path = self.avatar.name
        filename_base, filename_ext = os.path.splitext(file_path)
        thumb_file_path = "%s_thumb.jpg" % filename_base
        if storage.exists(thumb_file_path):
            return storage.url(thumb_file_path)
        return ""

As you might have guessed, the avatar can be placed in the template using something like

{% if profile.get_avatar_thumb_url %}
    <img src="{{ profile.get_avatar_thumb_url }}" alt="" />
{% endif %}

where the image source will look like "http://mywebsite.media.s3.amazonaws.com/profiles/20140214203012_thumb.jpg".

Conclusion

When you have a basic overview about Amazon Simple Storage Service, it is quite easy to use it in Django projects with existing third-party libraries. For flexibility, if you need to modify uploaded files, storage object should be used instead of the default os methods. That way, you can simply switch to the default file storage for local development.

2012-11-07

How to create a virtual environment that also uses global modules

Last weekend I upgraded my Mac to Mountain Lion. I installed Python 2.7 with PIL and MySQLdb and virtualenv using MacPorts.

When I created and activated a virtual environment with

cd myproject
virtualenv .
. bin/activate

the PIL and MySQLdb modules were unrecognizible in that environment. What I needed was just different Django versions in my virtual environments, but MySQLdb and PIL libraries had to be shared.

After a quick search I found an additional parameter for the creation of virtual environment:

virtualenv --system-site-packages .

With this parameter I could use my global modules within the virtual environment.

2009-10-17

Weather App Tutorial. Part 4 of 5. Template Tag

If you followed the first parts of the tutorial, you should have basic understanding how to create an app with models, set up administration, and retrieve data from third-party services. This part is about displaying collected data in any template using custom template tag.

At first, we need to create a directory templatetags containing an empty __init__.py file in the climate_change directory.


mkdir -p climate_change/templatetags
touch climate_change/templatetags/__init__.py

I will call the template library weather. So I have to create a file weather.py in climate_change/templatetags and define and register the template tag in that file. The template tag get_current_weather should display the current weather for a chosen location. To define what location you choose, you could refer to id, name or location_id, but none of them is appropriate for this reason. id and location_id are not remember-able and not informative enough, whereas the name might be changed to translate the city to another language or to add some more specifics and this change would detach the template tag from the location. For those reasons, it is best to create a new field sysname for the location model which would have a unique non-changeable value as a textual humanized identifier for templates.

But wait! It's such a pain to add new fields and modify database schema... Not, if you are using south! Let's quickly install it and then put it under INSTALLED_APPS.

easy_install south


#...
INSTALLED_APPS = (
    # django core
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
    "django.contrib.admin",
    # third-party
    "south",
    # project-specific
    "climate_change",
    )
#...

Note that south was installed in the virtual environment not spoiling the global python namespace. Now we will syncronize the database to create south_history table, create the initial migration for climate_change and apply it.


# create the missing database table from south app
python manage.py syncdb
# create initial migration for climate_change app which will be used by new projects
python manage.py startmigration climate_change --initial
# fake this migration for this project
python manage.py migrate climate_change --fake

Now we can finally add the new sysname field, create migration for it and apply it.


#...
class Location(models.Model):
    sysname = models.SlugField(
        _("system name"),
        max_length=200,
        unique=True,
        blank=True,
        default="",
        help_text=_("Do not change this value"),
        )
    name = models.CharField(_("name"), max_length=200)
    location_id = models.CharField(
        _("location ID"),
        max_length=20,
        help_text=_("Location IDs can be retrieved from URLs of weather "
            "at specific cities at Yahoo! Weather, e.g. GMXX0008 from "
            "http://weather.yahoo.com/forecast/GMXX0008.html"),
        )
#...


# create a new migration called "add_sysname"
python manage.py startmigration climate_change add_sysname --auto
# apply it to the database
python manage.py migrate climate_change

I had to set the default value to empty string because otherwise south throws exception when I use sqlite3. Anyway, after running those commands, I started the built in webserver again and added the sysname "berlin" to the record of Berlin's location.

We can get back to weather.py and add the template tag there


# -*- coding: UTF-8 -*-
from django.db import models
from django import template
from django.template import loader

register = template.Library()

### TAGS ###

def do_get_current_weather(parser, token):
    """
    Returns the latest known weather information.

    Usage::

        {% get_current_weather in <location_sysname> [using <template_path>] [as <var_name>] %}
    
    Examples::

        {% get_current_weather in "berlin" using "climate_change/custom_weather.html" %}

        {% get_current_weather in "london" as current_weather %}
        var sCurrentWeather = "{{ current_weather|escapejs }}";

    """    
    bits = token.split_contents()
    tag_name = bits.pop(0)
    template_path = ""
    var_name = ""
    location_sysname = ""
    try:
        while bits:
            first_word = bits.pop(0)
            second_word = bits.pop(0)
            if first_word == "in":
                location_sysname = second_word
            elif first_word == "using":
                template_path = second_word
            elif first_word == "as":
                var_name = second_word
                    
    except ValueError:
        raise template.TemplateSyntaxError, "get_current_weather tag requires a following syntax: {% get_current_weather [using <template_path>] [as <var_name>] %}"
    return CurrentWeatherNode(tag_name, location_sysname, template_path, var_name)

class CurrentWeatherNode(template.Node):
    def __init__(self, tag_name, location_sysname, template_path, var_name):
        self.tag_name = tag_name
        self.location_sysname = location_sysname
        self.template_path = template_path
        self.var_name = var_name
    def render(self, context):
        location_sysname = template.resolve_variable(
            self.location_sysname,
            context,
            )
        template_path = ""
        if self.template_path:
            template_path = template.resolve_variable(
                self.template_path,
                context,
                )
        context.push()
        WeatherLog = models.get_model("climate_change", "WeatherLog")
        logs = WeatherLog.objects.filter(
            location__sysname=location_sysname,
            ).order_by("-timestamp")
        if logs:
            context['weather'] = logs[0]
        output = loader.render_to_string(
            [template_path, "climate_change/current_weather.html"],
            context,
            )
        context.pop()
        if self.var_name:
            context[self.var_name] = output
            return ""
        else:
            return output


register.tag("get_current_weather", do_get_current_weather)

### FILTERS ###

# none at the moment

As you might see from the code, the template tag is using a template which can be redefined by the template designer. We still need the default template itself, so I will create a directory templates/climate_change and a file current_weather.html with this content:


{% load i18n %}
<div class="current-wheather">
    <h3>{{ weather.location.name }}</h3>
    <dl>
        <dt>{% trans "Temperature" %}:</dt>
        <dd>{{ weather.temperature }}° C</dd>
        <dt>{% trans "Humidity" %}:</dt>
        <dd>{{ weather.humidity }} %</dd>
        <dt>{% trans "Wind speed" %}:</dt>
        <dd>{{ weather.wind_speed }} km/h</dd>
        <dt>{% trans "Visibility" %}:</dt>
        <dd>{{ weather.visibility }} km</dd>
    </dl>
</div>

How can I test the template tag? I will need a new page which will include it. So I will add a rule in urls.py to redirect root url to index.html which will extend from base.html.

So the base.html looks like this:


{% block doctype %}>!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN""http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">{% endblock %}

{% load i18n %}

<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <title>{% block title %}simple document{% endblock %}</title>
    {% block extra_head %}{% endblock %}
</head>
<body>
    <div id="header">{% block header %}{% endblock %}</div>
    <div id="content">{% block content %}{% endblock %}</div>
    <div id="footer">{% block footer %}{% endblock %}</div>
</body>
</html>

The index.html looks like this:


{% extends "base.html" %}
{% load i18n weather %}
{% block content %}
    {% get_current_weather in "berlin" %}
{% endblock %}

And also we'll need an extension in the urls.py:


from django.conf.urls.defaults import *
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns("",
    (
        r"^$",
        "django.views.generic.simple.direct_to_template",
        {'template': "index.html"},
        ),
    (r"^admin/", include(admin.site.urls)),
    )

When I run the development server and go to http://127.0.0.1:8000/, I see this:

It's time for the graphs! The end of this tutorial will be published here tomorrow.

2009-10-14

Weather App Tutorial. Part 1 of 5. Preparation

It is Blog Action Day tomorrow so I decided to participate in it as a blogger again. The theme for this year is "Climate Change". After a little brainstorm, I came up with an idea to write a tutorial how to build an app which regularly checks local weather and lets you compare different weather features for months of different years.

I'll use Yahoo! Weather to check the current weather. From all information that it provides, we'll be mostly interested in temperature, humidity, wind speed, and visibility in the current location. The app will check the weather regularly, will allow you to show the current weather, and also provide a graph comparing average monthly weathers throughout years.

So let's start with the new project. I have quite a clean computer and want to do the app the nice way. So first of all, I will install virtualenv to be able to install third-party python libraries in a closed environment which will only be used for this project (I have already installed setuptools and django).


# install virtualenv
sudo easy_install virtualenv

I created a directory Projects in my home directory and cd to it.

Let's create a virtual environment and start the new project and app.


# create virtual environment "climate_change_env"
virtualenv climate_change_env
cd climate_change_env/
# activate the environment
source bin/activate

Since now I see (climate_change_env) as a prefix to each command line. I'll type deactivate at some point later to get out of this virtual environment.


# create django project "blog_action_day_2009"
django-admin.py startproject blog_action_day_2009
cd blog_action_day_2009/
# create django app "climate_change"
django-admin.py startapp climate_change
# create folders for templates and media
mkdir templates media

To get started quickly, I will use sqlite3 database now. As I am using python 2.5, I don't need to install sqlite3 module, because it's already there. So I open the settings.py and define those settings:

import os
PROJECT_DIR = os.path.dirname(__file__)
# ...
DATABASE_ENGINE = "sqlite3"
DATABASE_NAME = "blog_action_day_2009.sqlite3"
# ...
MEDIA_ROOT = os.path.join(PROJECT_DIR, "media")
MEDIA_URL = "/media/"
ADMIN_MEDIA_PREFIX = "/admin/media/"
# ...
TEMPLATE_DIRS = [
    os.path.join(PROJECT_DIR, "templates"),
    ]
# ...
INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
    "django.contrib.admin",
    "climate_change",
    )

Now we'll need those parts:

Models for locations and imported weather information.
Management command for importing data from Yahoo! Weather.
Template tag for displaying latest weather.
View for displaying a graph.

But I'll continue about that tomorrow.

To end today's post, let's watch a video about a regular guy who talks about weather when he has nothing to say:

2008-09-13

Gotchas about Fixtures for Initial Data

One part of Django testing environment is fixtures. They are pre-prepared datasets to fill into the database before running tests. Django uses a separate database for running tests, but the fixtures from the files initial_data.* are also loaded into the main database when you synchronize it.

I make fixtures for my application like this:

python manage.py dumpdata --format=json --indent=4 myapp > apps/myapp/fixtures/initial_data.json

The initial data is read out from apps/myapp/fixtures/initial_data.json and written to the main or the test database when I synchronize the database

python manage.py syncdb

or when I run the tests

python manage.py test myapp

Fixtures are great for deploying standard data like predefined categories, lists of countries and languages, default flatpages, default navigation, default user groups, and so on. However, you should be very cautious with them while developing.

When I create new models, it's common practice to me to sync db so that the new database tables are created and the database schema reflects the models. Just after creation of new tables all fixtures called initial_data from all applications will be loaded. N.B. The fixtures from initial_data overwrite all existing data while synchronizing database. So if you have some important data that differs from the defaults, better make some database backup before syncing or use sql command to print out the SQL statements and execute them for the database manually:

python manage.py sql myapp_with_new_models

You might have pre_save signal handlers or custom save methods (check an example below) which should recognize newly created objects and do something special with them, i.e. prepare PDF reports, generate images, send emails, index for global text search, or something else. Usually in such cases I checked the existence of the primary key: the object is new if it has no primary key. But this is wrong when you use fixtures, because fixtures come with primary keys. N.B. The object is new only if there is no object in the database which primary key equals to the primary key of the current object.

class MyModel(models.Model):
    ...
    def save(self, *args, **kwargs):
        is_new = True
        pk = self._get_pk_val()
        model = type(self)
        if pk and model._default_manager.filter(pk=pk):
            is_new = False
        # something before saving
        super(model, self).save(*args, **kwargs)
        # something after saving

aka

class MyModel(models.Model):
    ...
    def save(self, *args, **kwargs):
        is_new = True
        if self.id and MyModel.objects.filter(id=self.id):
            is_new = False
        # something before saving
        super(MyModel, self).save(*args, **kwargs)
        # something after saving

Another alternative for storing default data would be custom sql located at apps/myapp/sql/mymodel.sql, but I haven't tried that yet and probably won't.

BTW, happy programmer day!

2008-09-03

A Note on Python Paths

This time I decided to share some knowledge about Python paths which seemed a little bit confusing to me in the beginning of diving into Python. I am working with Django in different platforms like Mac OS X, Windows, and Linux, therefore the common patterns how to activate new python modules in all of those environments should be familiar to me.

Python modules are either *.py files or directories containing __init__.py. When defining paths to python modules, you will usually need to deal with the latter ones. A module is meant to be under python path if you can run python and import that module.

For example, if you can run the following, then django is under your python path.

python
>>> import django

Stay tuned to get deeper into python paths.

Installing modules

If a module is installable, usually all you need to do is to extract its setup directory, cd to it, and run

python setup.py install

This will copy the module into the site-packages directory of the current python installation. It might be that you have multiple Python versions on your computer. According to django documentation, you can find the currently used site-packages by

python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()"

Or you can use PEAK EasyInstall for installing python modules even faster.

But sometimes you will need the latest and greatest versions of your modules directly from version control system. To make them accessible from python you should either check them out directly to site-packages (very messy and inflexible) or keep them somewhere else and do some additional magic.

Sym-linking

You can create symbolic links (symlinks) in unix-based systems like Linux or Mac OS X. A symlink is like a shortcut to a file or directory. If you create a symlink in site-packages which points to a python module which is located somewhere else, it will work as if the module was copied into site-packages.

To create a symlink, type the following in a console/terminal:

ln -s <source> <target>

For example, if you want python to access django which is under /Library/Subversion/django_src/trunk/django, you need to write something like this (considering that your site-packages are at /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/)

ln -s /Library/Subversion/django_src/trunk/django /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django

To delete the symlink, simply remove it (this won't delete the original module):

rm /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django

But as I've already mentioned, this works only in unix-based environments and you can't use shortcuts in Windows for the same purpose.

`*.pth` files

Python supports *.pth files which contain the paths to the parent directories of your python modules one per line. Those files should be placed in site-packages and can be called whatever you want them to call. For example, you can create a file my-project.pth and write

/Library/Subversion/django_src/trunk
/Library/Subversion/myproject_src/trunk

C:\Subversion\django_src\trunk
C:\Subversion\myproject\trunk

into it. Then django and your project files will be importable in python.

However, you might have no permissions to create files under site-packages or you might need to activate different locations of python modules for different projects.

`PYTHONPATH` variable

The other way is to set additional paths for python just before running the externally kept modules. This is done by setting the python paths to the environment variable PYTHONPATH. Note again that python paths point not to the modules themselves, but to their parent directories!

The syntax slightly differs among different platforms.

Linux and Mac OS X:

# checking value
echo $PYTHONPATH
# setting value
export PYTHONPATH="/Library/Subversion/django_src/trunk"
# appending to the existing value
export PYTHONPATH="$PYTHONPATH;/Library/Subversion/django_src/trunk"

Windows:

# checking value
echo %PYTHONPATH%
# setting value
set PYTHONPATH="C:\\Subversion\\django_src\\trunk"
# appending to the existing value
set PYTHONPATH="%PYTHONPATH%;C:\\Subversion\\django_src\\trunk"

Multiple paths can be separated by a colon (";").

PYTHONPATH can be used in scripts and webserver configuration files, but it is not very comfortable in daily use.

Adding paths to `sys.path`

For the projects that you develop and which should run as standalone applications, you can set the required python paths relatively inside your python code.

Note that all python paths which you set in the PYTHONPATH variable or *.pth files as well as the path of default python libraries and the path of site-packages get listed in python variable sys.path. When you import a module, it is loaded from the first location which contains the required module. So if you have two paths to different django versions in your python paths and you import django, the django version from the first location will be used.

You can read the list of loaded python paths like this:

>>> import sys
>>> sys.path

You can also freely modify it, for example:

>>> import sys
>>> sys.path.append("/Library/Subversion/django_src/trunk")
>>> import django

And this is an example, how to get and use paths relative to the currently loaded file:

import os, sys

SVN_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
DJANGO_PATH = os.path.join(SVN_PATH, "django_src", "trunk")
PROJECT_PATH = os.path.join(SVN_PATH, "myproject", "trunk")

sys.path += [DJANGO_PATH, PROJECT_PATH]

I hope this introduction was useful for developers and made the big picture about the paths clearer.

Some more related information can be found at the official python documentation.