2019-02-15

How to Export Data to XLSX Files

A while ago I wrote an article about exporting data to different spreadsheet formats. As recently I was reimplementing export to Excel for the 1st things 1st project, I noticed that the API changed a little, so it's time to blog about that again.

For Excel export I am using the XLSX file format which is a zipped XML-based format for spreadsheets with formatting support. XLSX files can be opened with Microsoft Excel, Apache OpenOffice, Apple Numbers, LibreOffice, Google Drive, and a handful of other applications. For building the XLSX file I am using openpyxl library.

Installing openpyxl

You can install openpyxl to your virtual environment the usual way with pip:

(venv) pip install openpyxl==2.6.0

Simplest Export View

To create a function exporting data from a QuerySet to XLSX file, you would need to create a view that returns a response with a special content type and file content as an attachment. Plug that view to URL rules and then link it from an export button in a template.

Probably the simplest view that generates XLSX file out of Django QuerySet would be this:

# movies/views.py
from datetime import datetime
from datetime import timedelta
from openpyxl import Workbook
from django.http import HttpResponse

from .models import MovieCategory, Movie

def export_movies_to_xlsx(request):
    """
    Downloads all movies as Excel file with a single worksheet
    """
    movie_queryset = Movie.objects.all()
    
    response = HttpResponse(
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )
    response['Content-Disposition'] = 'attachment; filename={date}-movies.xlsx'.format(
        date=datetime.now().strftime('%Y-%m-%d'),
    )
    workbook = Workbook()
    
    # Get active worksheet/tab
    worksheet = workbook.active
    worksheet.title = 'Movies'

    # Define the titles for columns
    columns = [
        'ID',
        'Title',
        'Description',
        'Length',
        'Rating',
        'Price',
    ]
    row_num = 1

    # Assign the titles for each cell of the header
    for col_num, column_title in enumerate(columns, 1):
        cell = worksheet.cell(row=row_num, column=col_num)
        cell.value = column_title

    # Iterate through all movies
    for movie in movie_queryset:
        row_num += 1
        
        # Define the data for each cell in the row 
        row = [
            movie.pk,
            movie.title,
            movie.description,
            movie.length_in_minutes,
            movie.rating,
            movie.price,
        ]
        
        # Assign the data for each cell of the row 
        for col_num, cell_value in enumerate(row, 1):
            cell = worksheet.cell(row=row_num, column=col_num)
            cell.value = cell_value

    workbook.save(response)

    return response

If you try this, you will notice, that there is no special formatting in it, all columns are of the same width, the value types are barely recognized, the header is displayed the same as the content. This is enough for further data export to CSV or manipulation with pandas. But if you want to present the data for the user in a friendly way, you need to add some magic.

Creating More Worksheets

By default, each Excel file has one worksheet represented as a tab. You can access it with:

worksheet = workbook.active
worksheet.title = 'The New Tab Title'

If you want to create tabs dynamically with data from the database of Python structures, you can at first delete the current tab and add the others with:

workbook.remove(workbook.active)

for index, category in enumerate(category_queryset):
    worksheet = workbook.create_sheet(
        title=category.title,
        index=index,
    )

Although not all spreadsheet applications support this, you can set the background color of the worksheet tab with:

worksheet.sheet_properties.tabColor = 'f7f7f9'

Working with Cells

Each cell can be accessed by its 1-based indexes for the rows and for the columns:

top_left_cell = worksheet.cell(row=1, column=1)
top_left_cell.value = "This is good!"

Styles and formatting are applied to individual cells instead of rows or columns. There are several styling categories with multiple configurations for each of them. You can find some available options from the documentation, but even more by exploring the source code.

from openpyxl.styles import Font, Alignment, Border, Side, PatternFill

top_left_cell.font = Font(name='Calibri', bold=True)
top_left_cell.alignment = Alignment(horizontal='center')
top_left_cell.border = Border(
    bottom=Side(border_style='medium', color='FF000000'),
)
top_left_cell.fill = PatternFill(
    start_color='f7f7f9',
    end_color='f7f7f9',
    fill_type='solid',
)

If you are planning to have multiple styled elements, instantiate the font, alignment, border, fill options upfront and then assign the instances to the cell attributes. Otherwise, you can get into memory issues when you have a lot of data entries.

Setting Column Widths

If you want to have some wider or narrower width for some of your columns, you can do this by modifying column dimensions. They are accessed by column letter which can be retrieved using a utility function:

from openpyxl.utils import get_column_letter

column_letter = get_column_letter(col_num)
column_dimensions = worksheet.column_dimensions[column_letter]
column_dimensions.width = 40

The units here are some relative points depending on the width of the letters in the specified font. I would suggest playing around with the width value until you find what works for you.

When defining column width is not enough, you might want to wrap text into multiple lines so that everything can be read by people without problems. This can be done with the alignment setting for the cell as follows:

from openpyxl.styles import Alignment

wrapped_alignment = Alignment(vertical='top', wrap_text=True)
cell.alignment = wrapped_alignment

Data Formatting

Excel automatically detects text or number types and aligns text to the left and numbers to the right. If necessary that can be overwritten.

There are some gotchas on how to format cells when you need a percentage, prices, or time durations.

Percentage

For percentage, you have to pass the number in float format from 0.0 till 1.0 and style should be 'Percent' as follows:

cell.value = 0.75
cell.style = 'Percent'

Currency

For currency, you need values of Decimal format, the style should be 'Currency', and you will need a special number format for currency other than American dollars, for example:

from decimal import Decimal
cell.value = Decimal('14.99')
cell.style = 'Currency'
cell.number_format = '#,##0.00 €'

Durations

For time duration, you have to pass timedelta as the value and define special number format:

from datetime import timedelta

cell.value = timedelta(minutes=90)
cell.number_format = '[h]:mm;@'

This number format ensures that your duration can be greater than '23:59', for example, '140:00'.

Freezing Rows and Columns

In Excel, you can freeze rows and columns so that they stay fixed when you scroll the content vertically or horizontally. That's similar to position: fixed in CSS.

To freeze the rows and columns, locate the top-left cell that is below the row that you want to freeze and is on the right from the column that you want to freeze. For example, if you want to freeze one row and one column, the cell would be 'B2'. Then run this:

worksheet.freeze_panes = worksheet['B2']

Fully Customized Export View

So having the knowledge of this article now we can build a view that creates separate sheets for each movie category. Each sheet would list movies of the category with titles, descriptions, length in hours and minutes, rating in percent, and price in Euros. The tabs, as well as the headers, can have different background colors for each movie category. Cells would be well formatted. Titles and descriptions would use multiple lines to fully fit into the cells.

# movies/views.py
from datetime import datetime
from datetime import timedelta
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill
from openpyxl.utils import get_column_letter
from django.http import HttpResponse

from .models import MovieCategory, Movie

def export_movies_to_xlsx(request):
    """
    Downloads all movies as Excel file with a worksheet for each movie category
    """
    category_queryset = MovieCategory.objects.all()
    
    response = HttpResponse(
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )
    response['Content-Disposition'] = 'attachment; filename={date}-movies.xlsx'.format(
        date=datetime.now().strftime('%Y-%m-%d'),
    )
    workbook = Workbook()
    
    # Delete the default worksheet
    workbook.remove(workbook.active)

    # Define some styles and formatting that will be later used for cells
    header_font = Font(name='Calibri', bold=True)
    centered_alignment = Alignment(horizontal='center')
    border_bottom = Border(
        bottom=Side(border_style='medium', color='FF000000'),
    )
    wrapped_alignment = Alignment(
        vertical='top',
        wrap_text=True
    )

    # Define the column titles and widths
    columns = [
        ('ID', 8),
        ('Title', 40),
        ('Description', 80),
        ('Length', 15),
        ('Rating', 15),
        ('Price', 15),
    ]
    
    # Iterate through movie categories
    for category_index, category in enumerate(category_queryset):
        # Create a worksheet/tab with the title of the category
        worksheet = workbook.create_sheet(
            title=category.title,
            index=category_index,
        )
        # Define the background color of the header cells
        fill = PatternFill(
            start_color=category.html_color,
            end_color=category.html_color,
            fill_type='solid',
        )
        row_num = 1
        
        # Assign values, styles, and formatting for each cell in the header
        for col_num, (column_title, column_width) in enumerate(columns, 1):
            cell = worksheet.cell(row=row_num, column=col_num)
            cell.value = column_title
            cell.font = header_font
            cell.border = border_bottom
            cell.alignment = centered_alignment
            cell.fill = fill
            # set column width
            column_letter = get_column_letter(col_num)
            column_dimensions = worksheet.column_dimensions[column_letter]
            column_dimensions.width = column_width

        # Iterate through all movies of a category
        for movie in category.movie_set.all():
            row_num += 1
            
            # Define data and formats for each cell in the row
            row = [
                (movie.pk, 'Normal'),
                (movie.title, 'Normal'),
                (movie.description, 'Normal'),
                (timedelta(minutes=movie.length_in_minutes), 'Normal'),
                (movie.rating / 100, 'Percent'),
                (movie.price, 'Currency'),
            ]
            
            # Assign values, styles, and formatting for each cell in the row
            for col_num, (cell_value, cell_format) in enumerate(row, 1):
                cell = worksheet.cell(row=row_num, column=col_num)
                cell.value = cell_value
                cell.style = cell_format
                if cell_format == 'Currency':
                    cell.number_format = '#,##0.00 €'
                if col_num == 4:
                    cell.number_format = '[h]:mm;@'
                cell.alignment = wrapped_alignment

        # freeze the first row
        worksheet.freeze_panes = worksheet['A2']
        
        # set tab color
        worksheet.sheet_properties.tabColor = category.html_color

    workbook.save(response)

    return response

The Takeaways

  • Spreadsheet data can be used for further mathematical processing with pandas.
  • XLSX file format allows quite a bunch of formatting options that can make your spreadsheet data more presentable and user-friendly.
  • To see Excel export in action, go to 1st things 1st, log in as a demo user, and navigate to project results where you can export them as XLSX. Feedback is always welcome.

Cover photo by Tim Evans.

No comments:

Post a Comment