2013-12-18

How to Export Your Data as CSV, XLS, or XLSX

There are times, when you need to export the data from your database to different formats. For example, you want to create some diagrams in Office program for a presentation. In this post I will show you how to create admin actions which export selected items as files for a spreadsheet application (like MS Excel, OpenOffice Calc, LibreOffice Calc, Gnumeric, or Numbers). I will cover the mostly used formats: Comma Separated Values (CSV), Binary Excel (XLS), and Office Open XML (XLSX).

First of all, have a look at the model we will be dealing with. It's a simple model with title, description, and - of course - the id.

# models.py
from django.db import models
class MyModel(models.Model):
    title = models.CharField(max_length=100)
    description = models.TextField(blank=True)

In the admininstration options, we'll define three admin actions: export_csv, export_xls, and export_xlsx.

# admin.py
from django.contrib import admin
from models import MyModel

# ... export functions will go here ...

class MyModelAdmin(admin.ModelAdmin):
    actions = [export_csv, export_xls, export_xlsx]

admin.site.register(MyModel, MyModelAdmin)

Now let's create functions for each of those actions!

Comma-Separated Values Format

CSV is the most common import and export format for spreadsheets and databases. It's a textual format which one could easily create or parse himself, but there is also a python built-in library csv for handy data manipulation.

def export_csv(modeladmin, request, queryset):
    import csv
    from django.utils.encoding import smart_str
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=mymodel.csv'
    writer = csv.writer(response, csv.excel)
    response.write(u'\ufeff'.encode('utf8')) # BOM (optional...Excel needs it to open UTF-8 file properly)
    writer.writerow([
        smart_str(u"ID"),
        smart_str(u"Title"),
        smart_str(u"Description"),
    ])
    for obj in queryset:
        writer.writerow([
            smart_str(obj.pk),
            smart_str(obj.title),
            smart_str(obj.description),
        ])
    return response
export_csv.short_description = u"Export CSV"

As you can see, HttpResponse is a file-like object and we used it to write data to.

Excel Binary File Format

XLS is the main spreadsheet format which holds data in worksheets, charts, and macros. We are going to use xlwt library to create a spreadsheet. There is analogous library xlrd to read XLS files. Note, that this format allows to have only 256 columns.

def export_xls(modeladmin, request, queryset):
    import xlwt
    response = HttpResponse(content_type='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename=mymodel.xls'
    wb = xlwt.Workbook(encoding='utf-8')
    ws = wb.add_sheet("MyModel")
    
    row_num = 0
    
    columns = [
        (u"ID", 2000),
        (u"Title", 6000),
        (u"Description", 8000),
    ]

    font_style = xlwt.XFStyle()
    font_style.font.bold = True

    for col_num in xrange(len(columns)):
        ws.write(row_num, col_num, columns[col_num][0], font_style)
        # set column width
        ws.col(col_num).width = columns[col_num][1]

    font_style = xlwt.XFStyle()
    font_style.alignment.wrap = 1
    
    for obj in queryset:
        row_num += 1
        row = [
            obj.pk,
            obj.title,
            obj.description,
        ]
        for col_num in xrange(len(row)):
            ws.write(row_num, col_num, row[col_num], font_style)
            
    wb.save(response)
    return response
    
export_xls.short_description = u"Export XLS"

Here we created one worksheet, filled it with data, marked the first row in bold, and made the lines in the other cells wrapped. Also we set the width for each column. We'll do the same in the next format too.

Office Open XML Format

XLSX (a.k.a. OOXML or OpenXML) is a zipped, XML-based file format developed by Microsoft. It is fully supported by Microsoft Office 2007 and newer versions. OpenOffice 4.0, for example, can only read it. There is a python library openpyxl for reading and writing those files. This format is great when you need more than 256 columns and text formatting options.

def export_xlsx(modeladmin, request, queryset):
    import openpyxl
    from openpyxl.cell import get_column_letter
    response = HttpResponse(content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
    response['Content-Disposition'] = 'attachment; filename=mymodel.xlsx'
    wb = openpyxl.Workbook()
    ws = wb.get_active_sheet()
    ws.title = "MyModel"

    row_num = 0

    columns = [
        (u"ID", 15),
        (u"Title", 70),
        (u"Description", 70),
    ]

    for col_num in xrange(len(columns)):
        c = ws.cell(row=row_num + 1, column=col_num + 1)
        c.value = columns[col_num][0]
        c.style.font.bold = True
        # set column width
        ws.column_dimensions[get_column_letter(col_num+1)].width = columns[col_num][1]

    for obj in queryset:
        row_num += 1
        row = [
            obj.pk,
            obj.title,
            obj.description,
        ]
        for col_num in xrange(len(row)):
            c = ws.cell(row=row_num + 1, column=col_num + 1)
            c.value = row[col_num]
            c.style.alignment.wrap_text = True

    wb.save(response)
    return response

export_xlsx.short_description = u"Export XLSX"

Conclusion

So whenever you need to get your Django project data to some spreadsheet application, there are several ways to do that. If you are planning to import the data to some other database, CSV is probably the best, as it is simple, straightforward, and requires no third-party libraries. However, if you need your data with nice formatting and maybe some statistical formulas, you should export XLS or XLSX format. The maximum amount of columns in XLS format is limited to 256, whereas XLSX format allows more columns, but is not fully supported by all spreadsheet applications.

2013-12-07

How to Store Your Media Files in Amazon S3 Bucket

In this article, I will show you how to use Amazon Simple Storage Service (S3) to store your media files in the cloud. S3 is known and widely used for its scalability, reliability, and relatively cheap price. It is free to join and you only pay the hosting and bandwidth costs as you use it. The service is provided by Amazon.com. S3 tends to be attractive for start-up companies looking to minimize costs.

S3 uses a concept of buckets which is like a storage database. Each bucket has its own url. Inside the buckets you have folders and under that you have files. In fact, directories don't actually exist within S3 buckets. The entire file structure is actually just one flat single-level container of files. The illusion of directories is actually created based on having the file names like dirA/dirB/file.

If you want to browse the files in a folder-like structure, you can use Transmit FTP client on Mac OS X. It supports S3 services. Amazon browser-based console also has interface for browsing or uploading files.

OK. Now let's have a look how to set up a Django project which will use S3 for media files.

1. Create a bucket

At first you will need to create a bucket at S3 and make it accessible for all visitors. Login to your Amazon Web Services console. Click on "Services" in the menu, then on the "S3". Click on the button "Create bucket" and enter your bucket name like "mywebsite.media" or even better without dots like "mywebsitemedia". Choose a region there which is the closest to your target audience, for example, if your website is for Europeans, choose "Ireland". Go to the properties of the bucket, expand "Permissions" section, click on the "add bucket policy" button and enter the following:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::mywebsite.media/*"
        }
    ]
}

2. Install boto and django-storages

Amazon Web Services provide a python library called boto for accessing the API. There is a django app called django-storages which allows to use AWS S3 as the main storage. So your next step is to activate your virtual environment and install latest versions of boto and django-storages.

pip install boto==2.19.0
pip install django-storages==1.1.8

3. Set up django-storages for your project

Add the following to the settings.py:

INSTALLED_APPS = [
    # ...
    'storages'
    # ...
]
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_S3_SECURE_URLS = False       # use http instead of https
AWS_QUERYSTRING_AUTH = False     # don't add complex authentication-related query parameters for requests
AWS_S3_ACCESS_KEY_ID = '...'     # enter your access key id
AWS_S3_SECRET_ACCESS_KEY = '...' # enter your secret access key
AWS_STORAGE_BUCKET_NAME = 'mywebsite.media'

# the next monkey patch is necessary if you use dots in the bucket name
import ssl
if hasattr(ssl, '_create_unverified_context'):
   ssl._create_default_https_context = ssl._create_unverified_context

4. Create your models with FileField or ImageField

Let's create a Profile model with avatar field.

def upload_avatar_to(instance, filename):
    import os
    from django.utils.timezone import now
    filename_base, filename_ext = os.path.splitext(filename)
    return 'profiles/%s%s' % (
        now().strftime("%Y%m%d%H%M%S"),
        filename_ext.lower(),
    )

class Profile(models.Model):
    # ...
    avatar = models.ImageField(_("Avatar"), upload_to=upload_avatar_to, blank=True)
    # ...

Whenever you save an instance of the Profile with the new avatar picture, avatar will be uploaded to S3 bucket. To show it in a template, you will need something like <img src="{{ profile.avatar.url }}" alt="" /> where the image source will look like "http://mywebsite.media.s3.amazonaws.com/profiles/20140214203012.jpg".

5. Use the storage to manipulate file versions

If you need to create a thumbnail version of your image, it's probably best to overwrite the save method of the model and trigger the generation of the thumbs there. Let's add some methods to the Profile class:

class Profile(models.Model):
    # ...

    def save(self, *args, **kwargs):
        super(Profile, self).save(*args, **kwargs)
        self.create_avatar_thumb()

    def create_avatar_thumb(self):
        import os
        from PIL import Image
        from django.core.files.storage import default_storage as storage
        if not self.avatar:
            return ""
        file_path = self.avatar.name
        filename_base, filename_ext = os.path.splitext(file_path)
        thumb_file_path = "%s_thumb.jpg" % filename_base
        if storage.exists(thumb_file_path):
            return "exists"
        try:
            # resize the original image and return url path of the thumbnail
            f = storage.open(file_path, 'r')
            image = Image.open(f)
            width, height = image.size

            if width > height:
                delta = width - height
                left = int(delta/2)
                upper = 0
                right = height + left
                lower = height
            else:
                delta = height - width
                left = 0
                upper = int(delta/2)
                right = width
                lower = width + upper

            image = image.crop((left, upper, right, lower))
            image = image.resize((50, 50), Image.ANTIALIAS)

            f_thumb = storage.open(thumb_file_path, "w")
            image.save(f_thumb, "JPEG")
            f_thumb.close()
            return "success"
        except:
            return "error"

    def get_avatar_thumb_url(self):
        import os
        from django.core.files.storage import default_storage as storage
        if not self.avatar:
            return ""
        file_path = self.avatar.name
        filename_base, filename_ext = os.path.splitext(file_path)
        thumb_file_path = "%s_thumb.jpg" % filename_base
        if storage.exists(thumb_file_path):
            return storage.url(thumb_file_path)
        return ""

As you might have guessed, the avatar can be placed in the template using something like

{% if profile.get_avatar_thumb_url %}
    <img src="{{ profile.get_avatar_thumb_url }}" alt="" />
{% endif %}

where the image source will look like "http://mywebsite.media.s3.amazonaws.com/profiles/20140214203012_thumb.jpg".

Conclusion

When you have a basic overview about Amazon Simple Storage Service, it is quite easy to use it in Django projects with existing third-party libraries. For flexibility, if you need to modify uploaded files, storage object should be used instead of the default os methods. That way, you can simply switch to the default file storage for local development.

2013-09-22

How to Create a File Upload Widget

Ajaxified file uploads are becoming de facto standard on the web. People want to see what they chose right after selecting a file instead of during submit of the form. Also if a form has validation errors, nobody wants to select the files again; the selection made should be still available in the form. Therefore, we need a solution that works by Ajax.

Luckily there is a Django app that can help. It's called django-ajax-uploader. Ajax uploader allows to create asynchronous single and multiple file uploads. All uploads are done into the uploads directory. In addition to the default functionality we need a possibility to show the preview of the temporary uploaded file, translate all messages and buttons, delete a file, move a file to a specific directory. I created a demo project which implements all that.

If you have a ModelForm with a FileField or ImageField, you need to not show this field in the form, but instead to add a hidden field for uploaded file path. When you choose a file, it will be uploaded to a temporary uploads directory and its path will be set in the hidden field. Also you need a hidden BooleanField for the state of file deletion. When you submit the form, the file will be moved from the temporary location to the correct location defined by the FileField or ImageField.

In the given example, besides django-ajax-uploader, I am using django-crispy-forms with bootstrap 3.0.0 support. It allows you to define the structure for the forms without creating repetitive HTML markup.

This would be a quick workflow of integrating ajax uploads into your project:

  • Install django-ajax-uploader and django-crispy-forms.
  • Put 'crispy_forms' and 'ajaxuploader' into INSTALLED_APPS.
  • Define CRISPY_TEMPLATE_PACK = 'bootstrap3' in the settings.
  • Define the url rule for ajax uploader
    from ajaxuploader.views import AjaxFileUploader
    uploader = AjaxFileUploader()
    urlpatterns = patterns('',
        # ...
        url(r'^helper/ajax-upload/$', uploader, name="ajax_uploader"),
        # ...
    )
    
  • Create a form for your model which will have the ajax uploader.
  • Create the view which will handle your model and the chosen files and call it in your urls.
  • Create the template with appropriate javascript.

For a working example, please check image uploads project.

2013-09-05

7 Powerful Features of PyCharm Editor I Enjoy Using Daily

Not long ago I noticed PyCharm editor's advertisement at Stack Overflow. As it was targeted to Python and Django developers, I got very curious. I tried the 30-day trial and after years of using jEdit, I finally switched to this new editor. Here is a list of features I am using every day and enjoy very much.

1. Separation by project

Every project can be opened in a different window. PyCharm supports and recognizes virtual environments and python paths. In addition, you can mark some directories as source roots to put them under python path. This is useful when you have some dynamical modifications of python paths. When you open a file of a project, it will be shown in a tab. You can split the window vertically or horizontally to organize tabs in groups. It is especially useful, when you need one file for reference (like models.py) when you are editing another file (like detail template). You can open additional files which don't belong to the project for reference too; their tabs will be marked in different color.

2. Syntax highlighting

This is the core feature of any coder's text editor. PyCharm supports syntax highlighting for all different formats that might be used in Django development: Python, HTML templates, CSS, JavaScript, SQL, and many more. As expected, Django template comments are grayed out. Anywhere in the comments "TODO:" and "FIXME:" lines are marked in different color. At the bottom of the editor you can browse through different files with those TODO lines. PyCharm also marks unused imports and undefined variables with the different color. This editor also encourages PEP 8 standard which makes the coding style uniform and easier to understand and modify for different developers.

3. Browsing

Like many other IDEs, PyCharm has a side-bar file browser in a tree structure. For the clear list view, the *.pyc files are not shown. Edited files are marked in different color. You can drag and drop files to different directories there. You can create new files or new python packages (directories with __init__.py files) quickly. Version control systems like Git, Subversion, Mercurial and CVS are supported (although for this purpose, I am using SourceTree). Also one feature I like is "Scroll from Source" which scrolls and activates the file of the active tab. This is very useful when you need to open another file of the same Django app. By middle-clicking on a variable/class/function, you get to the definition/implementation of it. By ⌘⇧O you can search for specific file. Analogously by ⌘O you can go to specific class implementation. When you have simple views, you can go from the view to the template by clicking on an icon and back (unfortunately, this didn't work in a more complex situation, where I used wrapping view functions).

4. Autocompletion

PyCharm tracks python paths and autocompletes almost everywhere: variables, classes, or functions defined or imported in the current file, imports themselves, Django templates, Django shell, Django management commands. Just don't forget to fire Ctrl + Space for autocompletion.

5. Editing shortcuts

Different shortcuts make the coding faster. Press ⌥→ to move the cursor right by one word. Press ⌥⇧→ to select one word from the right. Press ⌥↑ to select one block of code (multiple times to select containing blocks, e.g. from method declaration to class). Press ⇥ to indent the block of code or ⇧⇥ to unindent it. Press ⌘/ to comment out or uncomment a block of code. Press ⌘R to activate the replace dialog. (These where examples of keyboard shortcuts on Mac OS X. Shortcuts on Windows and Linux will/may differ).

6. Code refactoring.

PyCharm has a powerful feature of renaming classes, functions, variables, or modules. I tried it a couple of times in straightforward situations, but I am still a little bit suspicious about its accuracy for large projects. Although it is worth mentioning that in complex situations it shows you the occurrences of classes, functions, variables, or modules in a list for you to confirm.

7. Django management commands

Running schema migration previously required a lot of typing in the Terminal, i.e. changing the directory to the specific virtual environment, activating the virtual environment, changing directory to Django project, running

python manage.py schemamigration appname modificationdone --auto
Now it is as fast as pressing ⌥R, typing "sc", pressing ⇧⏎, typing "appname modificationdone --auto", pressing ⏎ just in the PyCharm IDE. Running development server or restarting can now be done just with one click. Django shell can also be used from within the IDE. So you don't get distracted by different windows and user interfaces while developing.

I am still quite new to PyCharm and I am discovering new useful features every day. In my opinion, the price of PyCharm is worth every Euro, because my coding, I would say, is now 30 to 40% more effective. Finally, there is some official video introduction to the editor from the developers "JetBrains":