2019-10-24

Things I want to remember about SSH

SSH, short for Secure Shell, is a protocol for secure network communications. It is widely used for executing commands on remote servers, and for file uploads or downloads. If you are working with Django, use Git version control, or administrate servers, you surely are using SSH. In this post, I want to share some technical details about it.

Secure Shell is using private and public key pairs. You can either use automatically generated private and public keys combined with a password, or manually generated private and public keys. In the latter case, you need to keep your private key on your computer and upload the public key to the remote server.

Creating a pair of SSH keys manually

If you are using GitHub, Bitbucket, DigitalOcean, or some other service, you might have seen the possibility to upload public SSH keys for direct access to remote servers.

Here is how you usually create the SSH keys on the computer from which you want to establish a secure connection (your local machine or one of your servers that has access to other servers or services). In the Terminal you would execute these commands:

$ ssh-keygen
$ ssh-agent /usr/local/bin/bash
$ ssh-add ~/.ssh/id_rsa

The id_rsa is the name of the default SSH private key. The public key would be id_rsa.pub. And by default they both will be located under ~/.ssh/.

When running ssh-keygen you can choose different key names and even add a passphrase. For instance, you could have github_id_rsa and github_id_rsa.pub keys for communication with GitHub. My recommendation would be for each new service to create a new private-public key pair so that in case you need to transfer your computer's data to a different machine, you could selectively transfer the access to the remote servers.

Also, if you are not using the passphrase for the SSH key pair, I would recommend having your disk encrypted and a secure user password for your computer. If your laptop gets stolen, the thief wouldn't be able to get to your remote servers without knowing your computer's password.

Creating an access to a remote server by SSH key

In the case of GitHub, Bitbucket, and other online services with SSH communication, you usually have to copy the contents of the public key into a text field in a web form.

If you want to create a secure communication by manually generated private-public keys with a server where your Django project is deployed, you should append the contents of the public key to the ~/.ssh/authorized_keys file on the remote server.

To get the content of the public key in the Terminal, you can use:

$ cat ~/.ssh/id_rsa.pub

Then copy the output to the clipboard.

Or on macOS you can run pbcopy as follows:

$ pbcopy < ~/.ssh/id_rsa.pub 

To append the contents of the public key to the remote server, you can do this:

$  echo "...pasted public key...">>~/.ssh/authorized_keys

Creating authorization at a remote server by password

If you want to establish an SSH connection with a password and automatically generated private-public keys, you would need to edit /etc/ssh/sshd_config and ensure these two settings:

PasswordAuthentication yes
PermitEmptyPasswords no

After the change, you would restart the ssh server with the following command:

$ sudo service ssh restart

Also, make sure that the user you are connecting with has a password:

$ sudo passwd the_user

Connecting to a remote server

The default way to connect via SSH to a remote server with a password is executing the following in the Terminal:

$ ssh the_user@example.com

To connect with a private key, you would execute this:

$ ssh -i ~/.ssh/examplecom_id_rsa the_user@example.com

Next, let's see how we can simplify this using some local SSH configuration.

Configuring local SSH client

Edit ~/.ssh/config and add the following lines for each SSH connection that you want to define:

Host examplecom
    HostName example.com
    User the_user
    IdentityFile ~/.ssh/examplecom_id_rsa

If the domain of the website is not yet pointing to the IP address of the server, you can also connect by IP address:

Host examplecom
    HostName 1.2.3.4
    User the_user
    IdentityFile ~/.ssh/examplecom_id_rsa

The following allows you to login to your remote servers by manually generated private-public key with just these lines:

$ ssh examplecom

To request for password instead of using the manually generated keys, you would need to modify the snippet as follows:

Host examplecom
    HostName example.com
    User the_user
    PubkeyAuthentication=no

When you connect via SSH and wait don't type anything for 30 minutes or so, the connection gets lost. But you can require your client to connect to the server every 4 minutes or so by adding the following lines to the beginning of the ~/.ssh/config on your local computer:

Host *
    ServerAliveInterval 240

Uploading and downloading files using SSH connection

Typically, Secure Shell allows you to execute terminal commands on the remote server using bash, zsh, sh, or another shell. But very often, you also need to transfer files securely to and from the server. For that, you have these options: scp command, rsync command, or FTP client with SFTP support.

scp

The scp stands for Secure Copy.

This is how you would copy the secrets.json file from the remote server to your local development environment:

$ scp the_user@example.com:~/src/myproject/myproject/settings/secrets.json ./myproject/settings/secrets.json

Here is an example of the same, but with custom ~/.ssh/config configuration:

$ scp examplecom:~/src/myproject/myproject/settings/secrets.json ./myproject/settings/secrets.json

To copy the file from the local computer to the remote server, you would switch the places of source and target:

$ scp ./myproject/settings/secrets.json examplecom:~/src/myproject/myproject/settings/secrets.json

rsync

To synchronize directories on the server and locally, you can use the rsync command. This is how to do it for downloading the media/ directory (note that the trailing slashes matter):

$ rsync --archive --compress --partial --progress the_user@example.com:~/src/myproject/myproject/media/ ./myproject/media/

Here is an example of the same with a custom ~/.ssh/config configuration:

$ rsync --archive --compress --partial --progress examplecom:~/src/myproject/myproject/media/ ./myproject/media/

To upload the media/ directory to the remote server, you would again switch places for the source and target:

$ rsync --archive --compress --partial --progress ./myproject/media/ examplecom:~/src/myproject/myproject/media/

sftp

FTP clients like Transmit allow you to have SFTP connections either by username and password or by username and private key. You can even generate the private-public keys directly in the app there.

SFTP works like FTP, but all communication is encrypted there.

The final words

Use only encrypted connections for your network communications, encrypt your hard disk if you use manually generated private-public keys, and use strong passwords.

Be safe!


Cover photo by Jason D.

2019-10-08

Working with Dates and Times in the Forms

HTML5 comes with a bunch of new types for the input fields that are rendered as rich native widgets. Browsers even restrict invalid values and validate the input immediately. Let's explore how we could make use of them in Django forms.

We will be using an Exhibition model with models.DateField, models.TimeField, and models.DateTimeField:

# exhibitions/models.py
from django.db import models
from django.utils.translation import gettext_lazy as _

class Exhibition(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    start = models.DateField(_("Start"))
    end = models.DateField(_("End"), blank=True, null=True)
    opening = models.TimeField(_("Opening every day"))
    closing = models.TimeField(_("Closing every day"))
    vernissage = models.DateTimeField(_("Vernissage"), blank=True, null=True)
    finissage = models.DateTimeField(_("Finissage"), blank=True, null=True)

    class Meta:
        verbose_name = _("Exhibition")
        verbose_name_plural = _("Exhibitions")

    def __str__(self):
        return self.title

Here is a quick model form for the Exhibition model:

# exhibitions/forms.py
from django import forms
from .models import Exhibition

class ExhibitionForm(forms.ModelForm):
    class Meta:
        model = Exhibition
        fields = "__all__"

If we now open a Django shell and create an instance of the model form with some initial values, then print the form as HTML to the console, we will notice, that all date and time fields are rendered as <input type="text" /> and the values for the dates are in a local format, not the ISO standard YYYY-MM-DD:

(venv)$ python manage.py shell
>>> from exhibitions.forms import ExhibitionForm
>>> from datetime import datetime, date, time
>>> form = ExhibitionForm(initial={
...     "start": date(2020, 1, 1),
...     "end": date(2020, 3, 31),
...     "opening": time(11, 0),
...     "closing": time(20, 0),
...     "vernissage": datetime(2019, 12, 27, 19, 0),
...     "finissage": datetime(2020, 4, 1, 19, 0),
>>> })
>>> print(form.as_p())
<p><label for="id_title">Title:</label> <input type="text" name="title" maxlength="200" required id="id_title"></p>
<p><label for="id_start">Start:</label> <input type="text" name="start" value="01.01.2020" required id="id_start"></p>
<p><label for="id_end">End:</label> <input type="text" name="end" value="31.03.2020" id="id_end"></p>
<p><label for="id_opening">Opening every day:</label> <input type="text" name="opening" value="11:00:00" required id="id_opening"></p>
<p><label for="id_closing">Closing every day:</label> <input type="text" name="closing" value="20:00:00" required id="id_closing"></p>
<p><label for="id_vernissage">Vernissage:</label> <input type="text" name="vernissage" value="27.12.2019 19:00:00" id="id_vernissage"></p>
<p><label for="id_finissage">Finissage:</label> <input type="text" name="finissage" value="01.04.2020 19:00:00" id="id_finissage"></p>

Let's modify the model form and customize the date and time inputs. We will extend and use forms.DateInput, forms.TimeInput, and forms.DateTimeInput widgets. We want to show date inputs as <input type="date" />, time inputs as <input type="time" />, and date-time inputs as <input type="datetime-local" />. In addition, the format for the dates should be based on ISO standard.

# exhibitions/forms.py
from django import forms
from .models import Exhibition


class DateInput(forms.DateInput):
    input_type = "date"

    def __init__(self, **kwargs):
        kwargs["format"] = "%Y-%m-%d"
        super().__init__(**kwargs)


class TimeInput(forms.TimeInput):
    input_type = "time"


class DateTimeInput(forms.DateTimeInput):
    input_type = "datetime-local"

    def __init__(self, **kwargs):
        kwargs["format"] = "%Y-%m-%dT%H:%M"
        super().__init__(**kwargs)


class ExhibitionForm(forms.ModelForm):
    class Meta:
        model = Exhibition
        fields = "__all__"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.fields["start"].widget = DateInput()
        self.fields["end"].widget = DateInput()
        self.fields["opening"].widget = TimeInput()
        self.fields["closing"].widget = TimeInput()
        self.fields["vernissage"].widget = DateTimeInput()
        self.fields["vernissage"].input_formats = ["%Y-%m-%dT%H:%M", "%Y-%m-%d %H:%M"]
        self.fields["finissage"].widget = DateTimeInput()
        self.fields["finissage"].input_formats = ["%Y-%m-%dT%H:%M", "%Y-%m-%d %H:%M"]

Let's see now in the Django shell if that worked as expected:

(venv)$ python manage.py shell
>>> from exhibitions.forms import ExhibitionForm
>>> from datetime import datetime, date, time
>>> form = ExhibitionForm(initial={
...     "start": date(2020, 1, 1),
...     "end": date(2020, 3, 31),
...     "opening": time(11, 0),
...     "closing": time(20, 0),
...     "vernissage": datetime(2019, 12, 27, 19, 0),
...     "finissage": datetime(2020, 4, 1, 19, 0),
>>> })
>>> print(form.as_p())
<p><label for="id_title">Title:</label> <input type="text" name="title" maxlength="200" required id="id_title"></p>
<p><label for="id_start">Start:</label> <input type="date" name="start" value="2020-01-01" required id="id_start"></p>
<p><label for="id_end">End:</label> <input type="date" name="end" value="2020-03-31" id="id_end"></p>
<p><label for="id_opening">Opening every day:</label> <input type="time" name="opening" value="11:00:00" required id="id_opening"></p>
<p><label for="id_closing">Closing every day:</label> <input type="time" name="closing" value="20:00:00" required id="id_closing"></p>
<p><label for="id_vernissage">Vernissage:</label> <input type="datetime-local" name="vernissage" value="2019-12-27T19:00" id="id_vernissage"></p>
<p><label for="id_finissage">Finissage:</label> <input type="datetime-local" name="finissage" value="2020-04-01T19:00" id="id_finissage"></p>

The same way you can also create widgets for other HTML5 input types: color, email, month, number, range, tel, url, week, and alike.

Happy coding!


Cover photo by Eric Rothermel.

2019-05-04

My 5 Favorite Talks at DjangoCon Europe 2019

Django people at the DjangoCon Europe 2019

This year DjangoCon Europe happened in Copenhagen, Denmark, at a very creative and special place AFUK - Academy for Untamed Creativity. Surrounded by artistic souls, we learned more about web technologies, got to know each other at ecologic reusable disposable cups of coffee, enjoyed delicious authentic food, and socialized with Django-branded beverages. As always, there was also a party with people drinking IPA (not to be confused with API). And at the weekend Django developers were solving bugs for Django and related open-source software at the coding sprints.

Here I would like to present you with the top 5 talks that I liked most of all.

Django and Web Security Headers

Adam Johnson (@AdamChainz) was talking about special response headers that tell browsers to treat data of the website more securely and which Django settings are responsible for those headers.

Summary by rixx

Docs or it didn't Happen!

Mikey Ariel (@ThatDocsLady) was talking about the necessity of documentation and what to write there.

Summary by rixx

Pushing the ORM to its Limits

Sigurd Ljødal (@sigurdlj) was talking about advanced Django ORM use cases.

Summary by rixx

Logging Rethought 2: The Actions of Frank Taylor Jr.

Markus Holtermann (@m_holtermann) was talking about logging structured data to log files instead of the traditional plain text messages.

Summary by rixx

Maintaining a Django Codebase after 10k Commits

Joachim Jablon (@Ewjoachim) and Stéphane Angel (@twidi) were talking about the best practices developing large-scale Django projects.

Summary by rixx

Honorable Mentions

More Information

You can see all talk descriptions, video records and some slides at the official conference website.

Amazing conference photos were taken by Bartek Pawlik.

I was also really astonished how effective were the talk summaries written by rixx almost in real time.


Cover photo by Bartek Pawlik.

2019-04-27

Improving Page Speed with Incremental Loading

Improving Page Speed with Incremental Loading

Summary: you can use django-include-by-ajax to improve the performance and usability of your website by forcing some parts of the Django website page to be loaded and shown before other parts of the page.


Web browsers load and render traditional HTML pages from top to down, from left to right and as a developer you have little control over what will be shown first, second, and last. However, sometimes you need a different loading sequence to improve user experience and usability. Let's examine a couple of cases when it is advantageous to have primary content showing up immediately and secondary content loading in a moment.

Case 1. Above the Fold vs. Below the Fold

People want speed. 47% of visitors expect the website to be loaded in less than 2 seconds. If the website takes more than 3 seconds to show up, it's a big chance, that you will lose 40% of visitors. If you sell something on your website, every one-second delay causes 7% fewer visitors becoming buyers.

One technique to improve the perception of the speed of the website is to display the visible part of the screen as soon as possible, and then load the rest of the website in another go. Usually, the website pages are long vertically scrollable areas. The part of it that fits in the screen is called "above the fold" and the part underneath is called "below the fold".

Primary content above the fold and secondary content below the fold

It is recommended to load the part above the fold in 6 requests, including all your HTML, CSS, JavaScript, images and fonts. It's only 6 requests for a reason - that's the maximal number of requests that most browsers keep to the same HTTP/1.1 server at the same time. With HTTP/2 there is no such limitation.

You can only achieve this minimal load if you bundle and minimize your CSS and JavaScript to single files, and use only a couple of images and fonts. Going one step further you can split your CSS and JavaScript into parts that are used above the fold, and the ones that are used below the fold.

Case 2. Main Content vs. Navigation

For the users to have best user experience and smooth loading, you could display the content of articles or blog post first, and then load and display the website navigation in the header, sidebars, or footer.

Content is primary and the navigation is secondary

If the visitor navigated to a specific page of your website, they most likely want to see the content of that page rather than navigate out to other pages.

If you have extensive nested navigation, you can also save some milliseconds of its loading at the first request, by skipping it there, but loading it by Ajax at the next go.

Additionally, if visitor disables JavaScript in their browser, they will still be able to read the content.

Case 3. Own Content vs. Third-party Content

Wouldn't you agree, websites that show ads before their own content are pretty annoying? One way to improve the user experience is to show the main content at first and show the ads or third-party widgets after several seconds.

Own content is primary and third-party widgets are secondary

The primary content will be correctly indexed by search engines, whereas the included widgets might be skipped, depending on implementation, which we'll examine next.

Solution 1. Iframes

One way to load the delayed secondary content is to use iframes.

Pros:

  • Works without JavaScript.

Cons:

  • For each iframed section, you need a separate HTML with custom CSS.
  • You have to predefine and hardcode all heights of each secondary section, so it wouldn't work well with increased or decreased font size or different amounts of content.
  • You cannot have interactive elements like tooltips that would go outside the boundaries of the iframe.

Solution 2. API Calls by Ajax

The page would load with empty placeholders for the secondary content and then some JavaScript function would load content for the missing sections in HTML, JSON, or XML format by Ajax, parse them, and include into the placeholders. This approach has been used by Facebook.

Pros:

  • You can use the same global CSS for everything.
  • The amount of content is flexible, so the designs would look good with different variations.

Cons:

  • For each secondary section, you need to define a separate API endpoint.
  • There are many extra requests (unless you use GraphQL for that).

Solution 3. A Second Request to the Same Page with Specific Query Parameters

The page loads with empty placeholders for the secondary content. A JavaScript function uses Ajax to load the HTML of the same page this time containing all rendered primary and secondary content. Then another JavaScript function goes through all placeholders and fills the content from the second load.

Pros:

  • You can use the same global CSS for everything.
  • The amount of content is flexible, so the designs could look good with different variations.
  • Each page uses a single data endpoint.
  • Only one extra request is necessary for the full HTML.

Cons:

  • If there is a lot of primary content and not so much of secondary content, it might take too long to load and parse the secondary content.

Implementation for a Django Website using django-include-by-ajax

You can implement the third solution in a Django website using my open-source Django app django-include-by-ajax. It is meant to be understandable and simple to use for frontend Django developers, who don't touch Python code but need to work on the layouts and styling.

The idea is that instead of including different sections of a template with the {% include template_name %} template tag, you do the same using {% include_by_ajax template_name %} template tag. This template tag renders as an empty placeholder unless you access the page from a search crawler or if you access the page with a specific query parameter. Otherwise, it works more-or-less the same as the {% include %} template tag.

By adding jQuery and one jQuery-based JavaScript file to your page template, you enable the magic that does all the loading and parsing. Since version 1.0, CSS and JavaScript files can also be included in those delayed sections.

You can see django-include-by-ajax in action at the start page of my personal project 1st things 1st. There I use the above-the-fold case with the visible content coming to the screen almost immediately and the offscreen content loading in several more seconds.

Installation

It should be trivial to convert any standard heavy website page to a page loading the secondary content dynamically. There are mainly these steps to follow:

  1. Install the app with your Python package manager:

    (venv)$ pip install django-include-by-ajax==1.0.0
  2. Put the app into the INSTALLED_APPS in your Django project settings:

    # settings.py
    INSTALLED_APPS = [
        # ...
        # Third-party apps
        'include_by_ajax',
        # ...
    ]
  3. Put these in your base.html:

    {% load staticfiles %}
    <script src="https://code.jquery.com/jquery-3.4.0.min.js" crossorigin="anonymous"></script>
    <script src="{% static 'include_by_ajax/js/include_by_ajax.min.js' %}" defer></script>

    It should also work with older or newer jQuery versions.

  4. Use the new template tag in any template where you need it:

    {% load include_by_ajax_tags %}
    {% include_by_ajax "blog/includes/latest_blog_posts.html" %}

    You can even define the content for the placeholder that will be shown while the main content is loading:

    {% load include_by_ajax_tags %}
    {% include_by_ajax "blog/includes/latest_blog_posts.html" placeholder_template_name="utils/loading.html" %}
  5. If you need some JavaScript action to be called after all content is loaded, you can use the custom include_by_ajax_all_loaded event on the document like this:

    $(document).on('include_by_ajax_all_loaded', function() {
        console.log('Now all placeholders are loaded and replaced with content. Hurray!');
    });

I would be glad if you tried it on some of your projects and see how it improved user experience and loading times. If you use it in production, please mention it in the comments of this blog post.

More Information

The app on Github: A Django App Providing the {% include_by_ajax %} Template Tag

The practical usage example: Strategic planner - Prioritizer - 1st things 1st.

An article on performance and usability: 5 Reasons Visitors Leave Your Website


Cover photo by Thomas Tucker.

Thanks to Adam Johnson for reviewing this post.

2019-02-15

How to Export Data to XLSX Files

A while ago I wrote an article about exporting data to different spreadsheet formats. As recently I was reimplementing export to Excel for the 1st things 1st project, I noticed that the API changed a little, so it's time to blog about that again.

For Excel export I am using the XLSX file format which is a zipped XML-based format for spreadsheets with formatting support. XLSX files can be opened with Microsoft Excel, Apache OpenOffice, Apple Numbers, LibreOffice, Google Drive, and a handful of other applications. For building the XLSX file I am using openpyxl library.

Installing openpyxl

You can install openpyxl to your virtual environment the usual way with pip:

(venv) pip install openpyxl==2.6.0

Simplest Export View

To create a function exporting data from a QuerySet to XLSX file, you would need to create a view that returns a response with a special content type and file content as an attachment. Plug that view to URL rules and then link it from an export button in a template.

Probably the simplest view that generates XLSX file out of Django QuerySet would be this:

# movies/views.py
from datetime import datetime
from datetime import timedelta
from openpyxl import Workbook
from django.http import HttpResponse

from .models import MovieCategory, Movie

def export_movies_to_xlsx(request):
    """
    Downloads all movies as Excel file with a single worksheet
    """
    movie_queryset = Movie.objects.all()
    
    response = HttpResponse(
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )
    response['Content-Disposition'] = 'attachment; filename={date}-movies.xlsx'.format(
        date=datetime.now().strftime('%Y-%m-%d'),
    )
    workbook = Workbook()
    
    # Get active worksheet/tab
    worksheet = workbook.active
    worksheet.title = 'Movies'

    # Define the titles for columns
    columns = [
        'ID',
        'Title',
        'Description',
        'Length',
        'Rating',
        'Price',
    ]
    row_num = 1

    # Assign the titles for each cell of the header
    for col_num, column_title in enumerate(columns, 1):
        cell = worksheet.cell(row=row_num, column=col_num)
        cell.value = column_title

    # Iterate through all movies
    for movie in movie_queryset:
        row_num += 1
        
        # Define the data for each cell in the row 
        row = [
            movie.pk,
            movie.title,
            movie.description,
            movie.length_in_minutes,
            movie.rating,
            movie.price,
        ]
        
        # Assign the data for each cell of the row 
        for col_num, cell_value in enumerate(row, 1):
            cell = worksheet.cell(row=row_num, column=col_num)
            cell.value = cell_value

    workbook.save(response)

    return response

If you try this, you will notice, that there is no special formatting in it, all columns are of the same width, the value types are barely recognized, the header is displayed the same as the content. This is enough for further data export to CSV or manipulation with pandas. But if you want to present the data for the user in a friendly way, you need to add some magic.

Creating More Worksheets

By default, each Excel file has one worksheet represented as a tab. You can access it with:

worksheet = workbook.active
worksheet.title = 'The New Tab Title'

If you want to create tabs dynamically with data from the database of Python structures, you can at first delete the current tab and add the others with:

workbook.remove(workbook.active)

for index, category in enumerate(category_queryset):
    worksheet = workbook.create_sheet(
        title=category.title,
        index=index,
    )

Although not all spreadsheet applications support this, you can set the background color of the worksheet tab with:

worksheet.sheet_properties.tabColor = 'f7f7f9'

Working with Cells

Each cell can be accessed by its 1-based indexes for the rows and for the columns:

top_left_cell = worksheet.cell(row=1, column=1)
top_left_cell.value = "This is good!"

Styles and formatting are applied to individual cells instead of rows or columns. There are several styling categories with multiple configurations for each of them. You can find some available options from the documentation, but even more by exploring the source code.

from openpyxl.styles import Font, Alignment, Border, Side, PatternFill

top_left_cell.font = Font(name='Calibri', bold=True)
top_left_cell.alignment = Alignment(horizontal='center')
top_left_cell.border = Border(
    bottom=Side(border_style='medium', color='FF000000'),
)
top_left_cell.fill = PatternFill(
    start_color='f7f7f9',
    end_color='f7f7f9',
    fill_type='solid',
)

If you are planning to have multiple styled elements, instantiate the font, alignment, border, fill options upfront and then assign the instances to the cell attributes. Otherwise, you can get into memory issues when you have a lot of data entries.

Setting Column Widths

If you want to have some wider or narrower width for some of your columns, you can do this by modifying column dimensions. They are accessed by column letter which can be retrieved using a utility function:

from openpyxl.utils import get_column_letter

column_letter = get_column_letter(col_num)
column_dimensions = worksheet.column_dimensions[column_letter]
column_dimensions.width = 40

The units here are some relative points depending on the width of the letters in the specified font. I would suggest playing around with the width value until you find what works for you.

When defining column width is not enough, you might want to wrap text into multiple lines so that everything can be read by people without problems. This can be done with the alignment setting for the cell as follows:

from openpyxl.styles import Alignment

wrapped_alignment = Alignment(vertical='top', wrap_text=True)
cell.alignment = wrapped_alignment

Data Formatting

Excel automatically detects text or number types and aligns text to the left and numbers to the right. If necessary that can be overwritten.

There are some gotchas on how to format cells when you need a percentage, prices, or time durations.

Percentage

For percentage, you have to pass the number in float format from 0.0 till 1.0 and style should be 'Percent' as follows:

cell.value = 0.75
cell.style = 'Percent'

Currency

For currency, you need values of Decimal format, the style should be 'Currency', and you will need a special number format for currency other than American dollars, for example:

from decimal import Decimal
cell.value = Decimal('14.99')
cell.style = 'Currency'
cell.number_format = '#,##0.00 €'

Durations

For time duration, you have to pass timedelta as the value and define special number format:

from datetime import timedelta

cell.value = timedelta(minutes=90)
cell.number_format = '[h]:mm;@'

This number format ensures that your duration can be greater than '23:59', for example, '140:00'.

Freezing Rows and Columns

In Excel, you can freeze rows and columns so that they stay fixed when you scroll the content vertically or horizontally. That's similar to position: fixed in CSS.

To freeze the rows and columns, locate the top-left cell that is below the row that you want to freeze and is on the right from the column that you want to freeze. For example, if you want to freeze one row and one column, the cell would be 'B2'. Then run this:

worksheet.freeze_panes = worksheet['B2']

Fully Customized Export View

So having the knowledge of this article now we can build a view that creates separate sheets for each movie category. Each sheet would list movies of the category with titles, descriptions, length in hours and minutes, rating in percent, and price in Euros. The tabs, as well as the headers, can have different background colors for each movie category. Cells would be well formatted. Titles and descriptions would use multiple lines to fully fit into the cells.

# movies/views.py
from datetime import datetime
from datetime import timedelta
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill
from openpyxl.utils import get_column_letter
from django.http import HttpResponse

from .models import MovieCategory, Movie

def export_movies_to_xlsx(request):
    """
    Downloads all movies as Excel file with a worksheet for each movie category
    """
    category_queryset = MovieCategory.objects.all()
    
    response = HttpResponse(
        content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
    )
    response['Content-Disposition'] = 'attachment; filename={date}-movies.xlsx'.format(
        date=datetime.now().strftime('%Y-%m-%d'),
    )
    workbook = Workbook()
    
    # Delete the default worksheet
    workbook.remove(workbook.active)

    # Define some styles and formatting that will be later used for cells
    header_font = Font(name='Calibri', bold=True)
    centered_alignment = Alignment(horizontal='center')
    border_bottom = Border(
        bottom=Side(border_style='medium', color='FF000000'),
    )
    wrapped_alignment = Alignment(
        vertical='top',
        wrap_text=True
    )

    # Define the column titles and widths
    columns = [
        ('ID', 8),
        ('Title', 40),
        ('Description', 80),
        ('Length', 15),
        ('Rating', 15),
        ('Price', 15),
    ]
    
    # Iterate through movie categories
    for category_index, category in enumerate(category_queryset):
        # Create a worksheet/tab with the title of the category
        worksheet = workbook.create_sheet(
            title=category.title,
            index=category_index,
        )
        # Define the background color of the header cells
        fill = PatternFill(
            start_color=category.html_color,
            end_color=category.html_color,
            fill_type='solid',
        )
        row_num = 1
        
        # Assign values, styles, and formatting for each cell in the header
        for col_num, (column_title, column_width) in enumerate(columns, 1):
            cell = worksheet.cell(row=row_num, column=col_num)
            cell.value = column_title
            cell.font = header_font
            cell.border = border_bottom
            cell.alignment = centered_alignment
            cell.fill = fill
            # set column width
            column_letter = get_column_letter(col_num)
            column_dimensions = worksheet.column_dimensions[column_letter]
            column_dimensions.width = column_width

        # Iterate through all movies of a category
        for movie in category.movie_set.all():
            row_num += 1
            
            # Define data and formats for each cell in the row
            row = [
                (movie.pk, 'Normal'),
                (movie.title, 'Normal'),
                (movie.description, 'Normal'),
                (timedelta(minutes=movie.length_in_minutes), 'Normal'),
                (movie.rating / 100, 'Percent'),
                (movie.price, 'Currency'),
            ]
            
            # Assign values, styles, and formatting for each cell in the row
            for col_num, (cell_value, cell_format) in enumerate(row, 1):
                cell = worksheet.cell(row=row_num, column=col_num)
                cell.value = cell_value
                cell.style = cell_format
                if cell_format == 'Currency':
                    cell.number_format = '#,##0.00 €'
                if col_num == 4:
                    cell.number_format = '[h]:mm;@'
                cell.alignment = wrapped_alignment

        # freeze the first row
        worksheet.freeze_panes = worksheet['A2']
        
        # set tab color
        worksheet.sheet_properties.tabColor = category.html_color

    workbook.save(response)

    return response

The Takeaways

  • Spreadsheet data can be used for further mathematical processing with pandas.
  • XLSX file format allows quite a bunch of formatting options that can make your spreadsheet data more presentable and user-friendly.
  • To see Excel export in action, go to 1st things 1st, log in as a demo user, and navigate to project results where you can export them as XLSX. Feedback is always welcome.

Cover photo by Tim Evans.

2019-02-02

Equivalents in Python and JavaScript. Bonus

From time to time I google for the right syntax how to process lists and dictionaries in Python or arrays and objects in JavaScript. So I decided to extend my series of equivalents with those functions. After all, it's me too, who will be using the information I provide here.

All truthful elements

Sometimes we need to check from a list of conditions if all of them are true, or from a list of elements if all of them are not empty.

This can be checked with the following in Python:

items = [1, 2, 3]
all_truthy = all(items)
# True

And here is an equivalent in JavaScript:

items = [1, 2, 3];
all_truthy = items.every(Boolean);
// true

Any truthful elements

Similarly, we can check if at least one of the conditions is true, or there is at least one non-empty element in a list.

It Python we would do that with:

items = [0, 1, 2, 3]
some_truthy = any(items)
# True

And in JavaScript we would check it like this:

items = [0, 1, 2, 3];
some_truthy = items.some(Boolean);
// true

Iterate through each element and its index

Here is an example of how to iterate through a list of items and also check their indices in Python. It is useful for verbose console output when creating different command line tools that process data:

items = ['a', 'b', 'c', 'd']
for index, element in enumerate(items):
    print(f'{index}: {element};')

In JavaScript an analogous way to do the same would be using the forEach() method. The usual for loop is also an option, but I find the forEach() more elegant and clear.

items = ['a', 'b', 'c', 'd'];
items.forEach(function(element, index) {
    console.log(`${index}: ${element};`);
});

Map elements to the results of a function

To process all elements of a list, you can either iterate through them with the for loop and create a new list with modifications, or you can do that in one step by mapping the list items to a modification function. In Python this can be done with the map() function:

items = [0, 1, 2, 3]
all_doubled = list(map(lambda x: 2 * x, items))
# [0, 2, 4, 6]

In JavaScript the map() is a method of an array:

items = [0, 1, 2, 3];
all_doubled = items.map(x => 2 * x);
// [0, 2, 4, 6]

Filter elements by a function

When you need to search for some elements in a list or array and want to avoid for loops, you can use the filtering functionality. In Python that is doable with the filter() function that accepts the filtering function and the list and returns a new filtered list.

items = [0, 1, 2, 3]
only_even = list(filter(lambda x: x % 2 == 0, items))
# [0, 2]

In JavaScript there is a filter() method of the array for that.

items = [0, 1, 2, 3];
only_even = items.filter(x => x % 2 === 0);
// [0, 2]

In both cases, the filtering function checks each item if it is matching the filter criteria and returns true in that case.

Reduce elements by a function to a single value

When you want to apply some function to a list of items to get a single result in one go, you can use the reduce function. It works for summing, multiplying, ORing, ANDing, or checking maximums and minimums.

In Python there is a reduce() function for that.

from functools import reduce
items = [1, 2, 3, 4]
total = reduce(lambda total, current: total + current, items)
# 10

In JavaScript there is a reduce() method of the array.

items = [1, 2, 3, 4];
total = items.reduce((total, current) => total + current);
// 10

Merge dictionaries

There are multiple ways to merge dictionaries in Python or objects in JavaScript. But these are probably the simplest ones.

In Python it's decomposing dictionaries to tuples of keys and arrays, joining them, and creating a new dictionary.

d1 = {'a': 'A', 'b': 'B'}
d2 = {'a': 'AAA', 'c': 'CCC'}
merged = dict(list(d1.items()) + list(d2.items()))
# {'a': 'AAA', 'b': 'B', 'c': 'CCC'}

Analogously, in JavaScript it's spreading two objects into a new object:

d1 = {a: 'A', b: 'B'}
d2 = {a: 'AAA', c: 'CCC'}
merged = {...d1, ...d2};
// {a: 'AAA', b: 'B', c: 'CCC'}

The Takeaways

  • In both languages, you can traverse through lists of items without explicitly incrementing and referencing an index.
  • For processing list items, you don't necessarily need a loop. The dedicated methods or functions all() / every(), any() / some(), map(), filter(), and reduce() are there to help you.
  • In both languages, you can merge multiple dictionaries into one. If the same key appears in several dictionaries, the latest one will be used in the merged dictionary.

Of course, I also updated the cheat sheet with the full list of equivalents in Python and JavaScript that you saw here described. This cheat sheet helps me with a good overview next to my laptop, so I believe that it would be helpful to you too. The new revision 10 is with syntax highlighting, so it makes it even better to explore and understand.

Get the Ultimate Cheat Sheet of Equivalents in Python and JavaScript

Use it for good!


Cover photo by Darren Chan.

2019-01-11

How to Create PDF Documents with Django in 2019

If you've read my Web Development with Django Cookbook, you might remember a recipe for creating PDF documents using Pisa xhtml2pdf. Well, this library does its job, but it supports only a subset of HTML and CSS features. For example, for multi-column layouts, you have to use tables, like it's 1994.

I needed some fresh and flexible option to generate donation receipts for the donation platform www.make-impact.org and reports for the strategic planner 1st things 1st I have been building. After a quick research I found another much more suitable library. It's called WeasyPrint. In this article, I will tell you how to use it with Django and what's valuable in it.

Features

WeasyPrint uses HTML and CSS 2.1 to create pixel-perfect, or let's rather say point-perfect, PDF documents. WeasyPrint doesn't use WebKit or Gecko but has its own rendering engine. As a proof that it works correctly, it passes the famous among web developers Acid2 test which was created back in the days before HTML5 to check how compatible browsers are with CSS 2 standards.

All supported features (and unsupported exceptions) are listed in the documentation. But my absolute favorites are these:

  • Layouts with floated elements. You don't have to use tables anymore if you want to have the recipient address on the left side and the sender information on the right side in a letter, or if you want to have the main content and the side notes in an exercise book. Just use floated elements.
  • Working links. The generated document can have clickable links to external URLs and internal anchors. You can straightforwardly create a clickable table of contents or a banner that leads back to your website.
  • Support for web fonts. With the wide variety of embeddable web fonts, your documents don't need to look boring anymore. Why not write titles in elegant cursive or in bold western letters?
  • Background images. By default, when you print an HTML page, all foreground images get printed, but the backgrounds are skipped. When you generate a PDF document for printing, you can show background images anywhere, even in the margins of the printed page.
  • SVG kept as vector images. When you have diagrams and graphics in a PDF document, you usually want to preserve the quality of the lines. Even if they look good on the screen, raster images might be not what you want, because on a printed page the resolution will differ and the quality can be lost. WeasyPrint keeps SVG images as vector images, so you have the highest possible quality in the prints.

Important Notes

WeasyPrint needs Python 3.4 or newer. That's great for new Django projects, but might be an obstacle if you want to integrate it into an existing website running on Python 2.7. Can it be the main argumentation for you to upgrade your old Django projects to the new Python version?

WeasyPrint is dependent on several OS libraries: Pango, GdkPixbuf, Cairo, and Libffi. In the documentation, there are understandable one-line instructions how to install them on different operating systems. You can have a problem only if you don't have full control of the server where you are going to deploy your project.

If you need some basic headers and footers for all pages, you can use @page CSS selector for that. If you need extended headers and footers for each page, it's best to combine the PDF document out of separate HTML documents for each page. Examples follow below.

The fun fact, Emojis are drawn using some weird raster single-color font. I don't recommend using them in your PDFs unless you replace them with SVG images.

Show Me the Code

A technical article is always more valuable when it has some quick code snippets to copy and paste. Here you go!

Simple PDF View

This snippet generates a donation receipt and shows it directly in the browser. Should the PDF be downloadable immediately, change content disposition from inline to attachment.

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.http import HttpResponse
from django.template.loader import render_to_string
from django.utils.text import slugify
from django.contrib.auth.decorators import login_required

from weasyprint import HTML
from weasyprint.fonts import FontConfiguration

from .models import Donation

@login_required
def donation_receipt(request, donation_id):
    donation = get_object_or_404(Donation, pk=donation_id, user=request.user)
    response = HttpResponse(content_type="application/pdf")
    response['Content-Disposition'] = "inline; filename={date}-{name}-donation-receipt.pdf".format(
        date=donation.created.strftime('%Y-%m-%d'),
        name=slugify(donation.donor_name),
    )
    html = render_to_string("donations/receipt_pdf.html", {
        'donation': donation,
    })

    font_config = FontConfiguration()
    HTML(string=html).write_pdf(response, font_config=font_config)
    return response

Page Configuration Using CSS

Your PDF document can have a footer with an image and text on every page, using background-image and content properties:

{% load staticfiles i18n %}
<link href="https://fonts.googleapis.com/css?family=Playfair+Display:400,400i,700,700i,900" rel="stylesheet" />
<style>
@page {
    size: "A4";
    margin: 2.5cm 1.5cm 3.5cm 1.5cm;
    @bottom-center {
        background: url({% static 'site/img/logo-pdf.svg' %}) no-repeat center top;
        background-size: auto 1.5cm;
        padding-top: 1.8cm;
        content: "{% trans "Donation made via www.make-impact.org" %}";
        font: 10pt "Playfair Display";
        text-align: center;
        vertical-align: top;
    }
}
</style>

Pagination

You can show page numbers in the footer using CSS as follows.

@page {
    margin: 3cm 2cm;
    @top-center {
        content: "Documentation";
    }
    @bottom-right {
        content: "Page " counter(page) " of " counter(pages); 
    }
}

Horizontal Page Layout

You can rotate the page to horizontal layout with size: landscape.

@page {
    size: landscape;
}

HTML-based Footer

Another option to show an image and text in the header or footer on every page is to use an HTML element with position: fixed. This way you have more flexibility about formatting, but the element on all your pages will have the same content.

<style>
footer {
    position: fixed;
    bottom: -2.5cm;
    width: 100%;
    text-align: center;
    font-size: 10pt;
}
footer img {
    height: 1.5cm;
}
</style>
<footer>
    {% with website_url="https://www.make-impact.org" %}
        <a href="{{ website_url }}">
            <img alt="" src="{% static 'site/img/logo-contoured.svg' %}" />
        </a><br />
        {% blocktrans %}Donation made via <a href="{{ website_url }}">www.make-impact.org</a>{% endblocktrans %}
    {% endwith %}
</footer>

Document Rendering from Page to Page

When you need to have a document with complex unique headers and footers, it is best to render each page as a separate HTML document and then to combine them into one. This is how to do that:

def letter_pdf(request, letter_id):
    letter = get_object_or_404(Letter, pk=letter_id)
    response = HttpResponse(content_type='application/pdf')
    response['Content-Disposition'] = (
        'inline; '
        f'filename={letter.created:%Y-%m-%d}-letter.pdf'
    )
    COMPONENTS = [
        'letters/pdf/cover.html',
        'letters/pdf/page01.html',
        'letters/pdf/page02.html',
        'letters/pdf/page03.html',
    ]
    documents = []
    font_config = FontConfiguration()
    for template_name in COMPONENTS:
        html = render_to_string(template_name, {
            'letter': letter,
        })
        document = HTML(string=html).render(font_config=font_config)
        documents.append(document)

    all_pages = [page for document in documents for page in document.pages]
    documents[0].copy(all_pages).write_pdf(response)

    return response

Final Thoughts

I believe that WeasyPrint could be used not only for invoices, tickets, or booking confirmations but also for online magazines and small booklets. If you want to see PDF rendering with WeasyPrint in action, make a donation to your chosen organization at www.make-impact.org (when it's ready) and download the donation receipt. Or check the demo account at my.1st-things-1st.com and find the button to download the results of a prioritization project as PDF document.


Cover photo by Daniel Korpai.