2008-09-13

Gotchas about Fixtures for Initial Data

One part of Django testing environment is fixtures. They are pre-prepared datasets to fill into the database before running tests. Django uses a separate database for running tests, but the fixtures from the files initial_data.* are also loaded into the main database when you synchronize it.

I make fixtures for my application like this:
python manage.py dumpdata --format=json --indent=4 myapp > apps/myapp/fixtures/initial_data.json 


The initial data is read out from apps/myapp/fixtures/initial_data.json and written to the main or the test database when I synchronize the database
python manage.py syncdb

or when I run the tests
python manage.py test myapp


Fixtures are great for deploying standard data like predefined categories, lists of countries and languages, default flatpages, default navigation, default user groups, and so on. However, you should be very cautious with them while developing.

When I create new models, it's common practice to me to sync db so that the new database tables are created and the database schema reflects the models. Just after creation of new tables all fixtures called initial_data from all applications will be loaded. N.B. The fixtures from initial_data overwrite all existing data while synchronizing database. So if you have some important data that differs from the defaults, better make some database backup before syncing or use sql command to print out the SQL statements and execute them for the database manually:
python manage.py sql myapp_with_new_models


You might have pre_save signal handlers or custom save methods (check an example below) which should recognize newly created objects and do something special with them, i.e. prepare PDF reports, generate images, send emails, index for global text search, or something else. Usually in such cases I checked the existence of the primary key: the object is new if it has no primary key. But this is wrong when you use fixtures, because fixtures come with primary keys. N.B. The object is new only if there is no object in the database which primary key equals to the primary key of the current object.

class MyModel(models.Model):
...
def save(self, *args, **kwargs):
is_new = True
pk = self._get_pk_val()
model = type(self)
if pk and model._default_manager.filter(pk=pk):
is_new = False
# something before saving
super(model, self).save(*args, **kwargs)
# something after saving


aka

class MyModel(models.Model):
...
def save(self, *args, **kwargs):
is_new = True
if self.id and MyModel.objects.filter(id=self.id):
is_new = False
# something before saving
super(MyModel, self).save(*args, **kwargs)
# something after saving


Another alternative for storing default data would be custom sql located at apps/myapp/sql/mymodel.sql, but I haven't tried that yet and probably won't.

BTW, happy programmer day!

2008-09-03

A Note on Python Paths

This time I decided to share some knowledge about Python paths which seemed a little bit confusing to me in the beginning of diving into Python. I am working with Django in different platforms like Mac OS X, Windows, and Linux, therefore the common patterns how to activate new python modules in all of those environments should be familiar to me.

Python modules are either *.py files or directories containing __init__.py. When defining paths to python modules, you will usually need to deal with the latter ones. A module is meant to be under python path if you can run python and import that module.

For example, if you can run the following, then django is under your python path.
python
>>> import django


Stay tuned to get deeper into python paths.

Installing modules



If a module is installable, usually all you need to do is to extract its setup directory, cd to it, and run
python setup.py install

This will copy the module into the site-packages directory of the current python installation. It might be that you have multiple Python versions on your computer. According to django documentation, you can find the currently used site-packages by
python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()"

Or you can use PEAK EasyInstall for installing python modules even faster.

But sometimes you will need the latest and greatest versions of your modules directly from version control system. To make them accessible from python you should either check them out directly to site-packages (very messy and inflexible) or keep them somewhere else and do some additional magic.

Sym-linking



You can create symbolic links (symlinks) in unix-based systems like Linux or Mac OS X. A symlink is like a shortcut to a file or directory. If you create a symlink in site-packages which points to a python module which is located somewhere else, it will work as if the module was copied into site-packages.

To create a symlink, type the following in a console/terminal:
ln -s <source> <target>

For example, if you want python to access django which is under /Library/Subversion/django_src/trunk/django, you need to write something like this (considering that your site-packages are at /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/)
ln -s /Library/Subversion/django_src/trunk/django /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django

To delete the symlink, simply remove it (this won't delete the original module):
rm /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/django


But as I've already mentioned, this works only in unix-based environments and you can't use shortcuts in Windows for the same purpose.

*.pth files



Python supports *.pth files which contain the paths to the parent directories of your python modules one per line. Those files should be placed in site-packages and can be called whatever you want them to call. For example, you can create a file my-project.pth and write
/Library/Subversion/django_src/trunk
/Library/Subversion/myproject_src/trunk

or
C:\Subversion\django_src\trunk
C:\Subversion\myproject\trunk

into it. Then django and your project files will be importable in python.

However, you might have no permissions to create files under site-packages or you might need to activate different locations of python modules for different projects.

PYTHONPATH variable



The other way is to set additional paths for python just before running the externally kept modules. This is done by setting the python paths to the environment variable PYTHONPATH. Note again that python paths point not to the modules themselves, but to their parent directories!

The syntax slightly differs among different platforms.

Linux and Mac OS X:
# checking value
echo $PYTHONPATH
# setting value
export PYTHONPATH="/Library/Subversion/django_src/trunk"
# appending to the existing value
export PYTHONPATH="$PYTHONPATH;/Library/Subversion/django_src/trunk"

Windows:
# checking value
echo %PYTHONPATH%
# setting value
set PYTHONPATH="C:\\Subversion\\django_src\\trunk"
# appending to the existing value
set PYTHONPATH="%PYTHONPATH%;C:\\Subversion\\django_src\\trunk"


Multiple paths can be separated by a colon (";").

PYTHONPATH can be used in scripts and webserver configuration files, but it is not very comfortable in daily use.

Adding paths to sys.path



For the projects that you develop and which should run as standalone applications, you can set the required python paths relatively inside your python code.

Note that all python paths which you set in the PYTHONPATH variable or *.pth files as well as the path of default python libraries and the path of site-packages get listed in python variable sys.path. When you import a module, it is loaded from the first location which contains the required module. So if you have two paths to different django versions in your python paths and you import django, the django version from the first location will be used.

You can read the list of loaded python paths like this:
>>> import sys
>>> sys.path


You can also freely modify it, for example:
>>> import sys
>>> sys.path.append("/Library/Subversion/django_src/trunk")
>>> import django


And this is an example, how to get and use paths relative to the currently loaded file:
import os, sys

SVN_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
DJANGO_PATH = os.path.join(SVN_PATH, "django_src", "trunk")
PROJECT_PATH = os.path.join(SVN_PATH, "myproject", "trunk")

sys.path += [DJANGO_PATH, PROJECT_PATH]



I hope this introduction was useful for developers and made the big picture about the paths clearer.

Some more related information can be found at the official python documentation.