2018-06-30

Equivalents in Python and JavaScript. Part 1

Although Python and JavaScript are quite different languages, there are some analogies which full stack Python developers should know when developing web projects. In this series of 4 parts, I will explore what is similar in each of those languages and what are the common ways to solve common problems. This is not meant to be a reference and I will skip the basics like primitive variable types, conditions, and loops. But I will dig into more complex structures and data operations using both, Python and JavaScript. Also, I will try to focus on the practical use cases. This series should be interesting for the developers of Django, Flask, or another Python framework who want to get a grasp of traditional and modern vanilla JavaScript. On the other hand, it will be useful for the front-enders who want to better understand how the backend is working and maybe even start their own Django website.

Parsing integers

We'll begin with integer parsing.

In Python that's straightforward:

number = int(text)

But in JavaScript you have to explain what number system you expect: decimal, octal, hexadecimal, or binary:

number = parseInt(text, 10);

To use the "normal" decimal number system we are passing number 10 as the second parameter of the parseInt() function. 8 goes for octal, 16 for hexadecimal, or 2 – for binary. If the second parameter is missing, the number in text starts with zero, and you are using a slightly older browser, the number in the text will be interpreted as octal. For example,

parseInt('012') == 10  // in some older browsers
parseInt('012', 10) == 12

And that can really mess up your calculations.

Conditional assignment

For conditional assignment, Python and JavaScript have different syntaxes, but conditional assignments are quite popular in both languages. That's popular, because it's just a single statement to have a condition checking, the true-case value, and the false-case value.

Since Python 2.7 you can write conditional assignments like this:

value = 'ADULT' if age >= 18 else 'CHILD'

In JavaScript conditional assignments are done using ternary operator ?:, similar to the ones in C, C++, C#, Java, Ruby, PHP, Perl, Swift, and ActionScript:

value = age >= 18? 'ADULT': 'CHILD';

Object attribute value by attribute name

The normal way to access an object's attribute is by the dot notation in both, Python and JavaScript:

obj.color = 'YELLOW'

But what if you want to refer to an attribute by its name saved as a string? For example, the attribute name could be coming from a list of attributes or the attribute name is combined from two strings like 'title_' + lang_code.

For that reason, in Python, there are functions getattr() and setattr(). I use them a lot.

attribute = 'color'
value = getattr(obj, attribute, 'GREEN')
setattr(obj, attribute, value)

In JavaScript you can treat an object like a dictionary and pass the attribute name in square brackets:

attribute = 'color';
value = obj[attribute] || 'GREEN';
obj[attribute] = value;

To retrieve a default value when an object has no such attribute, in Python, getattr() has the third parameter. In JavaScript, if obj attribute doesn't exist, it will return the undefined value. Then it can be OR-ed with the default value that you want to assign. That's a common practice in JavaScript that you can find in many JavaScript libraries and frameworks.

Dictionary value by key

This is similar to the previous one. The normal way to assign a dictionary's value by key in both languages is using the square brackets:

dictionary = {}
dictionary['color'] = 'YELLOW'

To read a value in Python you can use the square-bracket notation, but it will fail on non-existing keys with KeyError. The more flexible way is to use the get() method which returns None for non-existing keys. Also you can pass an optional default value as the second parameter:

key = 'color'
value = dictionary.get(key, 'GREEN')

In JavaScript you would use the same trick as with object attributes, because dictionaries and objects are the same there:

key = 'color';
value = dictionary[key] || 'GREEN';

Slicing lists and strings

Python has the slice [:] operator to get parts of lists, tuples, and similar more complex structures, for example Django QuerySets:

items = [1, 2, 3, 4, 5]
first_two = items[:2]      # [1, 2]
last_two = items[-2:]      # [4, 5]
middle_three = items[1:4]  # [2, 3, 4]

In JavaScript arrays have the slice() method with the same effect and similar usage:

items = [1, 2, 3, 4, 5];
first_two = items.slice(0, 2);     // [1, 2] 
last_two = items.slice(-2);        // [4, 5]
middle_three = items.slice(1, 4);  // [2, 3, 4]

But don't mix it up with the splice() method which modifies the original array!


The [:] slice operator in Python also works for strings:

text = 'ABCDE'
first_two = text[:2]      # 'AB'
last_two = text[-2:]      # 'DE'
middle_three = text[1:4]  # 'BCD'

In JavaScript strings just like arrays have the slice() method:

text = 'ABCDE';
first_two = text.slice(0, 2);    // 'AB'
last_two = text.slice(-2);       // 'DE'
middle_three = text.slice(1, 4); // 'BCD'

Operations with list items

In programming it is very common to collect and analyze sequences of elements. In Python that is usually done with lists and in JavaScript with arrays. They have similar syntax and operations, but different method names to add and remove values.

This is how to concatenate two lists, add one value to the end, add one value to the beginning, get and remove a value from the beginning, get and remove a value from the end, and delete a certain value by index in Python:

items1 = ['A']
items2 = ['B']
items = items1 + items2  # items == ['A', 'B']
items.append('C')        # ['A', 'B', 'C']
items.insert(0, 'D')     # ['D', 'A', 'B', 'C']
first = items.pop(0)     # ['A', 'B', 'C']
last = items.pop()       # ['A', 'B']
items.delete(0)          # ['B']

This is how to do exactly the same with arrays in JavaScript:

items1 = ['A'];
items2 = ['B'];
items = items1.concat(items2);  // items === ['A', 'B']
items.push('C');                // ['A', 'B', 'C']
items.unshift('D');             // ['D', 'A', 'B', 'C']
first = items.shift();          // ['A', 'B', 'C']
last = items.pop();             // ['A', 'B']
items.splice(0, 1);             // ['B']

Joining lists of strings

It is very common after having a list or array of strings, to combine them into one string by a separator like comma or new line.

In Python that is done by the join() method of a string where you pass the list or tuple. Although it might feel unnatural, you start with the separator there. But I can assure that you get used to it after several times of usage.

items = ['A', 'B', 'C']
text = ', '.join(items)  # 'A, B, C'

In JavaScript the array has the join() method where you pass the separator:

items = ['A', 'B', 'C'];
text = items.join(', ');  // 'A, B, C'

The Takeaways

  • List and tuples in Python are similar to arrays in JavaScript.
  • Dictionaries in Python are similar to objects in JavaScript.
  • Strings in Python are similar to strings in JavaScript.
  • Numbers in JavaScript should be parsed with care.
  • Single-line conditional assignments exist in both languages.
  • Joining sequences of strings in Python is confusing, but you can quickly get used to it.

I compiled the whole list of equivalents of Python and JavaScript to a cheat sheet that you can print out and use for good. Side by side it compares traditional Python 2.7 and JavaScript based on ECMAScript 5 standard, as well as newer Python 3.6 and JavaScript based on ECMAScript 6 standard with such goodies as string interpolation, lambdas, generators, classes, etc.

Get the Ultimate Cheat Sheet of Equivalents in Python and JavaScript

In the next part of the series, we will have a look at JSON creation and parsing, operations with regular expressions, and error handling. Stay tuned!


Cover photo by Benjamin Hung.

2018-06-17

Data Filtering in a Django Website using Elasticsearch

In my Web Development with Django Cookbook section Forms and Views there is a recipe Filtering object lists. It shows you how to filter a Django QuerySet dynamically by different filter parameters selected in a form. From practice, the approach is working well, but with lots of data and complex nested filters, the performance might get slow. You know - because of all those INNER JOINS in SQL, the page might take even 12 seconds to load. And that is not preferable behavior. I know that I could denormalize the database or play with indices to optimize SQL. But I found a better way to increase the loading speed. Recently we started using Elasticsearch for one of the projects and its data filtering performance seems to be enormously faster: in our case, it increased from 2 to 16 times depending on which query parameters you choose.

What is Elasticsearch?

Elasticsearch is java-based search engine which stores data in JSON format and allows you to query it using special JSON-based query language. Using elasticsearch-dsl and django-elasticsearch-dsl, I can bind my Django models to Elasticsearch indexes and rewrite my object list views to use Elasticsearch queries instead of Django ORM. The API of Elasticsearch DSL is chainable like with Django QuerySets or jQuery functions, and we'll have a look at it soon.

The Setup

At first, let's install Elasticsearch server. Elasticsearch is quite a complex system, but it comes with convenient configuration defaults.

On macOS you can install and start the server with Homebrew:

$ brew install elasticsearch
$ brew services start elasticsearch

For other platforms, the installation instructions are also quite clear.

Then in your Django project's virtual environment install django-elasticsearch-dsl. I guess, "DSL" stands for "domain specific language".

With pipenv it would be the following from the project's directory:

$ pipenv install django-elasticsearch-dsl

If you are using just pip and virtual environment, then you would do this with your project's environment activated.

(venv)$ pip install django-elasticsearch-dsl

This, in turn, will install related lower level client libraries: elasticsearch-dsl and elasticsearch-py.

In the Django project settings, add 'django_elasticsearch_dsl' to INSTALLED_APPS.

Finally, add the lines defining default connection configuration there:

ELASTICSEARCH_DSL={
    'default': {
        'hosts': 'localhost:9200'
    },
}

Elasticsearch Documents for Django Models

For the illustration how to use Elasticsearch with Django, I'll create Author and Book models, and then I will create Elasticsearch index document for the books.

models.py

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)
    isbn = models.CharField(_("ISBN"), blank=True, max_length=20)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title

Nothing fancy here. Just an Author model with fields id, first_name, last_name, author_name, and a Book model with fields id, title, authors, publishing_date, and isbn. Let's go to the documents.

documents.py

In the same directory of your app, create documents.py with the following content:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django_elasticsearch_dsl import DocType, Index, fields
from .models import Author, Book

# Name of the Elasticsearch index
search_index = Index('library')
# See Elasticsearch Indices API reference for available settings
search_index.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@search_index.doc_type
class BookDocument(DocType):
    authors = fields.NestedField(properties={
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'author_name': fields.TextField(),
        'pk': fields.IntegerField(),
    }, include_in_root=True)

    isbn = fields.KeywordField(
        index='not_analyzed',
    )

    class Meta:
        model = Book # The model associated with this DocType

        # The fields of the model you want to be indexed in Elasticsearch
        fields = [
            'title',
            'publishing_date',
        ]
        related_models = [Author]

    def get_instances_from_related(self, related_instance):
        """If related_models is set, define how to retrieve the Book instance(s) from the related model."""
        if isinstance(related_instance, Author):
            return related_instance.book_set.all()

Here we defined a BookDocument which will have fields: title, publishing_date, authors, and isbn.

The authors will be a list of nested dictionaries at the BookDocument. The isbn will be a KeywordField which means that it will be not tokenized, lowercased, nor otherwise processed and handled the whole as is.

The values for those document fields will be read from the Book model.

Using signals, the document will be automatically updated either when a Book instance or Author instance is added, changed, or deleted. In the method get_instances_from_related(), we tell the search engine which books to update when an author is updated.

Building the Index

When the index document is ready, let's build the index at the server:

(venv)$ python manage.py search_index --rebuild

Django QuerySets vs. Elasticsearch Queries

The concepts of SQL and Elasticsearch queries are quite different. One is working with relational tables and the other works with dictionaries. One is using queries that are kind of human-readable logical sentences and another is using nested JSON structures. One is using the content verbosely and another does string processing in the background and gives search relevance for each result.

Even when there are lots of differences, I will try to draw analogies between Django ORM and elasticsearch-dsl API as close as possible.

1. Query definition

Django QuerySet:

queryset = MyModel.objects.all()

Elasticsearch query:

search = MyModelDocument.search()

2. Count

Django QuerySet:

queryset = queryset.count()

Elasticsearch query:

search = search.count()

3. Iteration

Django QuerySet:

for item in queryset:
    print(item.title)

Elasticsearch query:

for item in search:
    print(item.title)

4. To see the generated query:

Django QuerySet:

>>> queryset.query

Elasticsearch query:

>>> search.to_dict()

5. Filter by single field containing a value

Django QuerySet:

queryset = queryset.filter(my_field__icontains=value)

Elasticsearch query:

search = search.filter('match_phrase', my_field=value)

6. Filter by single field equal to a value

Django QuerySet:

queryset = queryset.filter(my_field__exact=value)

Elasticsearch query:

search = search.filter('match', my_field=value)

If a field type is a string, not a number, it has to be defined as KeywordField in the index document:

my_field = fields.KeywordField()

7. Filter with either of the conditions (OR)

Django QuerySet:

from django.db import models
queryset = queryset.filter(
    models.Q(my_field=value) |
    models.Q(my_field2=value2)
)

Elasticsearch query:

from elasticsearch_dsl.query import Q
search = search.query(
    Q('match', my_field=value) |
    Q('match', my_field2=value2)
)

8. Filter with all of the conditions (AND)

Django QuerySet:

from django.db import models
queryset = queryset.filter(
    models.Q(my_field=value) &
    models.Q(my_field2=value2)
)

Elasticsearch query:

from elasticsearch_dsl.query import Q
search = search.query(
    Q('match', my_field=value) & 
    Q('match', my_field2=value2)
)

9. Filter by values less than or equal to certain value

Django QuerySet:

from datetime import datetime

queryset = queryset.filter(
    published_at__lte=datetime.now(),
)

Elasticsearch query:

from datetime import datetime

search = search.filter(
    'range',
    published_at={'lte': datetime.now()}
)

10. Filter by a value in a nested field

Django QuerySet:

queryset = queryset.filter(
    category__pk=category_id,
)

Elasticsearch query:

search = search.filter(
    'nested', 
    path='category', 
    query('match', category__pk=category_id)
)

11. Filter by one of many values in a related model

Django QuerySet:

queryset = queryset.filter(
    category__pk__in=category_ids,
)

Elasticsearch query:

from django.utils.six.moves import reduce
from elasticsearch_dsl.query import Q

search = search.query(
    reduce(operator.ior, [
        Q(
            'nested', 
            path='category', 
            query('match', category__pk=category_id),
        )
        for category_id in category_ids
    ])
)

Here the reduce() function combines a list of Q() conditions using the bitwise OR operator (|).

12. Ordering

Django QuerySet:

queryset = queryset.order_by('-my_field', 'my_field2')

Elasticsearch query:

search = search.sort('-my_field', 'my_field2')

13. Creating query dynamically

Django QuerySet:

import operator
from django.utils.six.moves import reduce

filters = []
if value1:
    filters.append(models.Q(
        my_field1=value1,
    ))
if value2:
    filters.append(models.Q(
        my_field2=value2,
    ))
queryset = queryset.filter(
    reduce(operator.iand, filters)
)

Elasticsearch query:

import operator
from django.utils.six.moves import reduce
from elasticsearch_dsl.query import Q

queries = []
if value1:
    queries.append(Q(
        'match',
        my_field1=value1,
    ))
if value2:
    queries.append(Q(
        'match',
        my_field2=value2,
    ))
search = search.query(
    reduce(operator.iand, queries)
)

14. Pagination

Django QuerySet:

from django.core.paginator import (
    Paginator, Page, EmptyPage, PageNotAnInteger
)

paginator = Paginator(queryset, paginate_by)
page_number = request.GET.get('page')
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    page = paginator.page(1)
except EmptyPage:
    page = paginator.page(paginator.num_pages)

Elasticsearch query:

from django.core.paginator import (
    Paginator, Page, EmptyPage, PageNotAnInteger
)
from django.utils.functional import LazyObject

class SearchResults(LazyObject):
    def __init__(self, search_object):
        self._wrapped = search_object

    def __len__(self):
        return self._wrapped.count()

    def __getitem__(self, index):
        search_results = self._wrapped[index]
        if isinstance(index, slice):
            search_results = list(search_results)
        return search_results

search_results = SearchResults(search)

paginator = Paginator(search_results, paginate_by)
page_number = request.GET.get('page')
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    page = paginator.page(1)
except EmptyPage:
    page = paginator.page(paginator.num_pages)

ElasticSearch doesn't work with Django's pagination by default. Therefore, we have to wrap the search query with lazy SearchResults class to provide the necessary functionality.

Example

I built an example with books written about Django. You can download it from Github and test it.

Takeaways

  • Filtering with Elasticsearch is much faster than with SQL databases.
  • But it comes at the cost of additional deployment and support time.
  • If you have multiple websites using Elasticsearch on the same server, configure a new cluster and node for each of those websites.
  • Django ORM can be in a way mapped to Elasticsearch DSL.
  • I summarized the comparison of Django ORM and Elasticsearch DSL, mentioned in this article, into a cheat sheet. You can get it for a symbolic fee. Print it on a single sheet of paper and use it as a reference for your developments.

Buy Django ORM vs. Elasticsearch DSL Cheat Sheet


Cover photo by Karl Fredrickson.