2018-07-09

Equivalents in Python and JavaScript. Part 4

In the last three parts of the series of articles about analogies in Python and JavaScript, we explored lots of interesting concepts like serializing to JSON, error handling, using regular expressions, string interpolation, generators, lambdas, and many more. This time we will delve into function arguments, creating classes, using class inheritance, and defining getters and setters of class properties.

Function arguments

Python is very flexible with argument handling for functions: you can set default values there, allow a flexible amount of positional or keyword arguments (*args and **kwargs). When you pass values to a function, you can define by name to which argument that value should be assigned. All that in a way is now possible in JavaScript too.

Default values for function arguments in Python can be defined like this:

from pprint import pprint

def report(post_id, reason='not-relevant'):
    pprint({'post_id': post_id, 'reason': reason})
    
report(42)
report(post_id=24, reason='spam')

In JavaScript that can be achieved similarly:

function report(post_id, reason='not-relevant') {
    console.log({post_id: post_id, reason: reason});
}

report(42);
report(post_id=24, reason='spam');

Positional arguments in Python can be accepted using the * operator like this:

from pprint import pprint

def add_tags(post_id, *tags):
    pprint({'post_id': post_id, 'tags': tags})
    
add_tags(42, 'python', 'javascript', 'django')

In JavaScript positional arguments can be accepted using the ... operator:

function add_tags(post_id, ...tags) {
    console.log({post_id: post_id, tags: tags});
}

add_tags(42, 'python', 'javascript', 'django');    

Keyword arguments are often used in Python when you want to allow a flexible amount of options:

from pprint import pprint

def create_post(**options):
    pprint(options)

create_post(
    title='Hello, World!', 
    content='This is our first post.',
    is_published=True,
)
create_post(
    title='Hello again!',
    content='This is our second post.',
)

A common practice to pass multiple optional arguments to a JavaScript function is through a dictionary object, for example, options.

function create_post(options) {
    console.log(options);
}

create_post({
    'title': 'Hello, World!', 
    'content': 'This is our first post.',
    'is_published': true
});
create_post({
    'title': 'Hello again!', 
    'content': 'This is our second post.'
});

Classes and inheritance

Python is an object-oriented language. Since ECMAScript 6 standard support, it's also possible to write object-oriented code in JavaScript without hacks and weird prototype syntax.

In Python you would create a class with the constructor and a method to represent its instances textually like this:

class Post(object):
    def __init__(self, id, title):
        self.id = id
        self.title = title
        
    def __str__(self):
        return self.title

post = Post(42, 'Hello, World!')
isinstance(post, Post) == True
print(post)  # Hello, World!

In JavaScript to create a class with the constructor and a method to represent its instances textually, you would write:

class Post {
    constructor (id, title) {
        this.id = id;
        this.title = title;
    }
    toString() {
        return this.title;
    }
}

post = new Post(42, 'Hello, World!');
post instanceof Post === true;
console.log(post.toString());  // Hello, World!

Now we can create two classes Article and Link in Python that will extend the Post class. Here you can also see how we are using super to call methods from the base Post class.

class Article(Post):
    def __init__(self, id, title, content):
        super(Article, self).__init__(id, title)
        self.content = content

class Link(Post):
    def __init__(self, id, title, url):
        super(Link, self).__init__(id, title)
        self.url = url
        
    def __str__(self):
        return '{} ({})'.format(
            super(Link, self).__str__(),
            self.url,
        )
           
article = Article(1, 'Hello, World!', 'This is my first article.')
link = Link(2, 'DjangoTricks', 'https://djangotricks.blogspot.com')
isinstance(article, Post) == True
isinstance(link, Post) == True
print(link)
# DjangoTricks (https://djangotricks.blogspot.com)

In JavaScript the same is also doable by the following code:

class Article extends Post {
    constructor (id, title, content) {
        super(id, title);
        this.content = content;
    }
}

class Link extends Post {
    constructor (id, title, url) {
        super(id, title);
        this.url = url;
    }
    toString() {
        return super.toString() + ' (' + this.url + ')';
    }
}

article = new Article(1, 'Hello, World!', 'This is my first article.');
link = new Link(2, 'DjangoTricks', 'https://djangotricks.blogspot.com');
article instanceof Post === true;
link instanceof Post === true;
console.log(link.toString());
// DjangoTricks (https://djangotricks.blogspot.com)

Class properties: getters and setters

In object oriented programming, classes can have attributes, methods, and properties. Properties are a mixture of attributes and methods. You deal with them as attributes, but in the background they call special getter and setter methods to process data somehow before setting or returning to the caller.

The basic wireframe for getters and setters of the slug property in Python would be like this:

class Post(object):
    def __init__(self, id, title):
        self.id = id
        self.title = title
        self._slug = ''
        
    @property
    def slug(self):
        return self._slug
        
    @slug.setter
    def slug(self, value):
        self._slug = value
            
post = new Post(1, 'Hello, World!')
post.slug = 'hello-world'
print(post.slug)

In JavaScript getters and setters for the slug property can be defined as:

class Post {
    constructor (id, title) {
        this.id = id;
        this.title = title;
        this._slug = '';
    }
    
    set slug(value) {
        this._slug = value;
    }
    
    get slug() {
        return this._slug;
    }
}

post = new Post(1, 'Hello, World!');
post.slug = 'hello-world';
console.log(post.slug);

The Takeaways

  • In both languages, you can define default argument values for functions.
  • In both languages, you can pass a flexible amount of positional or keyword arguments for functions.
  • In both languages, object-oriented programming is possible.

As you might have noticed, I am offering a cheat sheet with the full list of equivalents in Python and JavaScript that you saw here described. At least for me, it is much more convenient to have some printed sheet of paper with valuable information next to my laptop, rather than switching among windows or tabs and scrolling to get the right piece of snippet. So I encourage you to get this cheat sheet and improve your programming!

Get the Ultimate Cheat Sheet of Equivalents in Python and JavaScript

Use it for good!


Cover photo by Andre Benz

2018-07-06

Equivalents in Python and JavaScript. Part 3

This is the 3rd part of 4-article series about analogies in Python and JavaScript. In the previous parts we covered a large part of the traditional Python 2.7 and JavaScript based on the ECMAScript 5 standard. This time we will start looking into Python 3.6 and JavaScript based on the ECMAScript 6 standard. ECMAScript 6 standard is pretty new and supported only the newest versions of browsers. For older browsers you will need Babel to compile your next-generation JavaScript code to the cross-browser-compatible equivalents. It opens the door to so many interesting things to explore. We will start from string interpolation, unpacking lists, lambda functions, iterations without indexes, generators, and sets!

Variables in strings

The old and inefficient way to build strings with values from variables is this concatenation:

name = 'World'
value = 'Hello, ' + name + '!\nWelcome!'

This can get very sparse and difficult to read. Also it is very easy to miss whitespaces in the sentence around variables.

Since Python version 3.6 and JavaScript based on the ECMAScript 6 standard, you can use so called string interpolation. These are string templates which are filled in with values from variables.

In Python they are also called f-string, because their notation starts with letter "f":

name = 'World'
value = f"""Hello, {name}!
Welcome!"""

price = 14.9
value = f'Price: {price:.2f} €'  # 'Price: 14.90 €'

In JavaScript string templates start and end with backticks:

name = 'World';
value = `Hello, ${name}!
Welcome!`;

price = 14.9;
value = `Price ${price.toFixed(2)} €`;  // 'Price: 14.90 €'

Note that string templates can be of a single line as well as of multiple lines. For f-strings in Python you can pass the format for variables, but you can't call methods of a variable unless they are properties and call getter methods.

Unpacking lists

Python and now JavaScript has an interesting feature to assign items of sequences into separate variables. For example, we can read the three values of a list into variables a, b, and c with the following syntax:

[a, b, c] = [1, 2, 3]

For tuples the parenthesis can be omitted. The following is a very popular way to swap values of two variables in Python:

a = 1
b = 2
a, b = b, a  # swap values

With the next generation JavaScript this can also be achieved:

a = 1;
b = 2;
[a, b] = [b, a];  // swap values

In Python 3.6 if we have an unknown number of items in a list or tuple, we can assign them to a tuple of several values while also unpacking the rest to a list:

first, second, *the_rest = [1, 2, 3, 4]
# first == 1
# second == 2
# the_rest == [3, 4]

This can also be done with JavaScript (ECMAScript 6):

[first, second, ...the_rest] = [1, 2, 3, 4];
// first === 1
// last === 2
// the_rest === [3, 4]

Lambda functions

Python and JavaScript have a very neat functionality to create functions in a single line. These functions are called lambdas. Lambdas are very simple functions that take one or more arguments and return some calculated value. Usually lambdas are used when you need to pass a function to another function as a callback or as a function to manipulate every separate elements in a sequence.

In Python, you would define a lambda using the lambda keyword, like this:

sum = lambda x, y: x + y
square = lambda x: x ** 2

In JavaScript lambdas use the => notation. If there are more than one arguments, they have to be in parenthesis:

sum = (x, y) => x + y;
square = x => Math.pow(x, 2);

Iteration without indexes

Many programming languages allow iterating through a sequence only by using indexes and incrementing their values. Then to get an item at some position, you would read it from an array, for example:

for (i=0; i<items.length; i++) {
    console.log(items[i]);
}

This is not a nice syntax and is very technical - it doesn't read naturally. What we really want is just to grab each value from the list. And Python has a very neat possibility just to iterate through the elements:

for item in ['A', 'B', 'C']:
    print(item)

In the modern JavaScript this is also possible with the for..of operator:

for (let item of ['A', 'B', 'C']) {
    console.log(item);
}

You can also iterate through a string characters in Python:

for character in 'ABC':
    print(character)

And in the modern JavaScript:

for (let character of 'ABC') {
    console.log(character);
}

Generators

Python and modern JavaScript has a possibility to define special functions through which you can iterate. With each iteration they return the next generated value in a sequence. These functions are called generators. With generators you can get numbers in a range, lines from a file, data from different paginated API calls, fibonacci numbers, and any other dynamicly generated sequences.

Technically generators are like normal functions, but instead of returning a value, they yield a value. This value will be returned for one iteration. And this generation happens as long as the end of the function is reached.

To illustrate that, the following Python code will create a generator countdown() which will return numbers from the given one back to 1, (like 10, 9, 8, ..., 1):

def countdown(counter):
    while counter > 0:
        yield counter
        counter -= 1
        
for counter in countdown(10):
    print(counter)

Exactly the same can be achieved in modern JavaScript, but notice the asterisk at the function statement. It defines that it's a generator:

function* countdown(counter) {
    while (counter > 0) {
        yield counter;
        counter--;
    }
}
for (let counter of countdown(10)) {
    console.log(counter);
}

Sets

We already had a look at lists, tuples and arrays. But here is another type of data - sets. Sets are groups of elements that ensure that each element there exists only once. Set theory also specifies set operations like union, intersection, and difference, but we won't cover them here today.

This is how to create a set, add elements to it, check if a value exists, check the total amount of elements in a set, and iterate through its values, and remove a value using Python:

s = set(['A'])
s.add('B'); s.add('C')
'A' in s
len(s) == 3
for elem in s:
    print(elem)
s.remove('C')

Here is how to achieve the same in modern JavaScript:

s = new Set(['A']);
s.add('B').add('C');
s.has('A') === true;
s.size === 3;
for (let elem of s.values()) {
    console.log(elem);
}
s.delete('C')

The Takeaways

  • String interpolation or literal templates allows you to have much cleaner and nicer code even with a possibility to have multiple lines of text.
  • You can iterate through elements in a sequence or group without using indexes.
  • Use generators when you have a sequence of nearly unlimited elements.
  • Use sets when you want to ensure fast check if data exists in a collection.
  • Use lambdas when you need short and clear single-line functions.

As you know from the previous parts, I am offering a cheat sheet with the whole list of equivalents in Python and JavaScript, both, time honored and future proof. To have something printed in front of your eyes is much more convenient than switching among windows or scrolling up and down until you find what you exactly were searching for. So I suggest you to get the cheat sheet and use it for good!

Get the Ultimate Cheat Sheet of Equivalents in Python and JavaScript

In the next and last part of the series, we will have a look at function arguments, classes, inheritance, and properties. Stay tuned!


Cover photo by Alex Knight

2018-07-03

Equivalents in Python and JavaScript. Part 2

Last time we started a new series of articles about analogies in Python and JavaScript. We had a look at lists, arrays, dictionaries, objects, and strings, conditional assignments, and parsing integers. This time we will go through more interesting and more complex things like serializing dictionaries and lists to JSON, operations with regular expressions, as well as raising and catching errors.

JSON

When working with APIs it is very usual to serialize objects to JSON format and be able to parse JSON strings.

In Python it is done with the json module like this:

import json
json_data = json.dumps(dictionary, indent=4)
dictionary = json.loads(json_data)

Here we'll indent the nested elements in the JSON string by 4 spaces.

In JavaScript there is a JSON object that has methods to create and parse JSON strings:

json_data = JSON.stringify(dictionary, null, 4);
dictionary = JSON.parse(json_data);

Splitting strings by regular expressions

Regular expressions are multi-tool that once you master, you can accomplish lots of things.

In the last article, we saw how one can join lists of strings into a single string. But how can you split a long string into lists of strings? What if the delimiter can be not a single character as the comma, but a range of possible variations? This can be done with regular expressions and the split() method.

In Python, the split() method belongs to the regular expression pattern object. This is how you could split a text string into sentences by punctuation marks:

import re

# One or more characters of "!?." followed by whitespace
delimiter = re.compile(r'[!?\.]+\s*')

text = "Hello!!! What's new? Follow me."
sentences = delimiter.split(text)
# sentences == ['Hello', "What's new", 'Follow me', '']

In JavaScript the split() method belongs to the string:

// One or more characters of "!?." followed by whitespace
delimiter = /[!?\.]+\s*/;

text = "Hello!!! What's new? Follow me.";
sentences = text.split(delimiter)
// sentences === ["Hello", "What's new", "Follow me", ""]

Matching regular expression patterns in strings

Regular expressions are often used to validate data from the forms.

For example, to validate if the entered email address is correct, you would need to match it against a regular expression pattern. In Python that would look like this:

import re

# name, "@", and domain
pattern = re.compile(r'([\w.+\-]+)@([\w\-]+\.[\w\-.]+)')

match = pattern.match('hi@example.com')
# match.group(0) == 'hi@example.com'
# match.group(1) == 'hi'
# match.group(2) == 'example.com'

If the text matches the pattern, it returns a match object with the group() method to read the whole matched string, or separate captures of the pattern that were defined with the parenthesis. 0 means getting the whole string, 1 means getting the match in the first group, 2 means getting the match in the second group, and so on. If the text doesn't match the pattern, the None value will be returned.

In JavaScript the match() method belongs to the string and it returns either a match object, or null. Pretty similar:

// name, "@", and domain
pattern = /([\w.+\-]+)@([\w\-]+\.[\w\-.]+)/;

match = 'hi@example.com'.match(pattern);
// match[0] === 'hi@example.com'
// match[1] === 'hi'
// match[2] === 'example.com'

The match object in JavaScript acts as an array. Its value at the zeroth position is the whole matched string. The other indexes correspond to the captures of the pattern defined with the parenthesis.


Moreover, sometimes you need to search if a specific value exists in a string and at which letter position it will be found. That can be done with the search() method.

In Python this method belongs to the regular expression pattern and it returns the match object. The match object has the start() method telling at which letter position the match starts:

text = 'Say hi at hi@example.com'
first_match = pattern.search(text)
if first_match:
    start = first_match.start()  # start == 10

In JavaScript the search() method belongs to the string and it returns just an integer telling at which letter position the match starts. If nothing is found, -1 is returned:

text = 'Say hi at hi@example.com';
first_match = text.search(pattern);
if (first_match > -1) {
    start = first_match;  // start === 10
}

Replacing patterns in strings using regular expressions

Replacing with regular expressions usually happen when cleaning up data, or adding additional features. For example, we could take some text and make all email addresses clickable.

Python developers would use the sub() method of the regular expression pattern:

html = pattern.sub(
    r'<a href="mailto:\g<0>">\g<0></a>',
    'Say hi at hi@example.com',
)
# html == 'Say hi at <a href="mailto:hi@example.com">hi@example.com</a>'

JavaScript developers would use the replace() method of the string:

html = 'Say hi at hi@example.com'.replace(
    pattern, 
    '<a href="mailto:$&">$&</a>',
);
// html === 'Say hi at <a href="mailto:hi@example.com">hi@example.com</a>'

In Python the captures, also called as "backreferences", are accessible in the replacement string as \g<0>, \g<1>, \g<2>, etc. In JavaScript the same is accessible as $&, $1, $2, etc. Backreferences are usually used to wrap some strings or to switch places of different pieces of text.


It is also possible to replace a match with a function call. This can be used to do replacements within replacements or to count or collect some features of a text. For example, using replacements with function calls in JavaScript, I once wrote a fully functional HTML syntax highlighter.

Here let's change all email addresses in a text to UPPERCASE.

In Python, the replacement function receives the match object. We can use its group() method to do something with the matched text and return a text as a replacement:

text = pattern.sub(
    lambda match: match.group(0).upper(), 
    'Say hi at hi@example.com',
)
# text == 'Say hi at HI@EXAMPLE.COM'

In JavaScript the replacement function receives the whole match string, the first capture, the second capture, etc. We can do what we need with those values and then return some string as a replacement:

text = 'Say hi at hi@example.com'.replace(
    pattern,
    function(match, p1, p2) {
        return match.toUpperCase();
    }
);
// text === 'Say hi at HI@EXAMPLE.COM'

Error handling

Contrary to Python, client-side JavaScript normally isn't used for saving or reading files or connecting to remote databases. So try..catch blocks are quite rare in JavaScript compared to try..except analogy in Python.

Anyway, error handling can be used with custom user errors implemented and raised in JavaScript libraries and caught in the main code.

The following example in Python shows how to define a custom exception class MyException, how to raise it in a function, and how to catch it and handle in a try..except..finally block:

class MyException(Exception):
    def __init__(self, message):
        self.message = message
        
    def __str__(self):
        return self.message
        
def proceed():
    raise MyException('Error happened!')

try:
    proceed()
except MyException as err:
    print('Sorry! {}'.format(err))
finally:
    print('Finishing')    

The following example in JavaScript does exactly the same: here we define a MyException class, throw it in a function, and catch it and handle in the try..catch..finally block.

function MyException(message) {
   this.message = message;
   this.toString = function() {
       return this.message;
   }
}

function proceed() {
    throw new MyException('Error happened!');
}

try {
    proceed();
} catch (err) {
    if (err instanceof MyException) {
        console.log('Sorry! ' + err);
    }
} finally {
    console.log('Finishing');
}

The MyException class in both languages has a parameter message and a method to represent itself as a string using the value of the message.

Of course, exceptions should be raised/thrown just in the case of errors. And you define what is an error in your module design.

The Takeaways

  • Serialization to JSON is quite straightforward in both, Python and JavaScript.
  • Regular expressions can be used as multi-tools when working with textual data.
  • You can do replacements with function calls in both languages.
  • For more sophisticated software design you can use custom error classes.

As I mentioned last time, you can grab a side-by-side comparison of Python and JavaScript that I compiled for you (and my future self). Side by side you will see features from traditional list, array, dictionary, object, and string handling to modern string interpolation, lambdas, generators, sets, classes, and everything else. Use it for good.

Get the Ultimate Cheat Sheet of Equivalents in Python and JavaScript

In the next part of the series, we will have a look at textual templates, list unpacking, lambda functions, iteration without indexes, generators, and sets. Stay tuned!


Cover photo by Benjamin Hung.

2018-06-30

Equivalents in Python and JavaScript. Part 1

Although Python and JavaScript are quite different languages, there are some analogies which full stack Python developers should know when developing web projects. In this series of 4 parts, I will explore what is similar in each of those languages and what are the common ways to solve common problems. This is not meant to be a reference and I will skip the basics like primitive variable types, conditions, and loops. But I will dig into more complex structures and data operations using both, Python and JavaScript. Also, I will try to focus on the practical use cases. This series should be interesting for the developers of Django, Flask, or another Python framework who want to get a grasp of traditional and modern vanilla JavaScript. On the other hand, it will be useful for the front-enders who want to better understand how the backend is working and maybe even start their own Django website.

Parsing integers

We'll begin with integer parsing.

In Python that's straightforward:

number = int(text)

But in JavaScript you have to explain what number system you expect: decimal, octal, hexadecimal, or binary:

number = parseInt(text, 10);

To use the "normal" decimal number system we are passing number 10 as the second parameter of the parseInt() function. 8 goes for octal, 16 for hexadecimal, or 2 – for binary. If the second parameter is missing, the number in text starts with zero, and you are using a slightly older browser, the number in the text will be interpreted as octal. For example,

parseInt('012') == 10  // in some older browsers
parseInt('012', 10) == 12

And that can really mess up your calculations.

Conditional assignment

For conditional assignment, Python and JavaScript have different syntaxes, but conditional assignments are quite popular in both languages. That's popular, because it's just a single statement to have a condition checking, the true-case value, and the false-case value.

Since Python 2.7 you can write conditional assignments like this:

value = 'ADULT' if age >= 18 else 'CHILD'

In JavaScript conditional assignments are done using ternary operator ?:, similar to the ones in C, C++, C#, Java, Ruby, PHP, Perl, Swift, and ActionScript:

value = age >= 18? 'ADULT': 'CHILD';

Object attribute value by attribute name

The normal way to access an object's attribute is by the dot notation in both, Python and JavaScript:

obj.color = 'YELLOW'

But what if you want to refer to an attribute by its name saved as a string? For example, the attribute name could be coming from a list of attributes or the attribute name is combined from two strings like 'title_' + lang_code.

For that reason, in Python, there are functions getattr() and setattr(). I use them a lot.

attribute = 'color'
value = getattr(obj, attribute, 'GREEN')
setattr(obj, attribute, value)

In JavaScript you can treat an object like a dictionary and pass the attribute name in square brackets:

attribute = 'color';
value = obj[attribute] || 'GREEN';
obj[attribute] = value;

To retrieve a default value when an object has no such attribute, in Python, getattr() has the third parameter. In JavaScript, if obj attribute doesn't exist, it will return the undefined value. Then it can be OR-ed with the default value that you want to assign. That's a common practice in JavaScript that you can find in many JavaScript libraries and frameworks.

Dictionary value by key

This is similar to the previous one. The normal way to assign a dictionary's value by key in both languages is using the square brackets:

dictionary = {}
dictionary['color'] = 'YELLOW'

To read a value in Python you can use the square-bracket notation, but it will fail on non-existing keys with KeyError. The more flexible way is to use the get() method which returns None for non-existing keys. Also you can pass an optional default value as the second parameter:

key = 'color'
value = dictionary.get(key, 'GREEN')

In JavaScript you would use the same trick as with object attributes, because dictionaries and objects are the same there:

key = 'color';
value = dictionary[key] || 'GREEN';

Slicing lists and strings

Python has the slice [:] operator to get parts of lists, tuples, and similar more complex structures, for example Django QuerySets:

items = [1, 2, 3, 4, 5]
first_two = items[:2]      # [1, 2]
last_two = items[-2:]      # [4, 5]
middle_three = items[1:4]  # [2, 3, 4]

In JavaScript arrays have the slice() method with the same effect and similar usage:

items = [1, 2, 3, 4, 5];
first_two = items.slice(0, 2);     // [1, 2] 
last_two = items.slice(-2);        // [4, 5]
middle_three = items.slice(1, 4);  // [2, 3, 4]

But don't mix it up with the splice() method which modifies the original array!


The [:] slice operator in Python also works for strings:

text = 'ABCDE'
first_two = text[:2]      # 'AB'
last_two = text[-2:]      # 'DE'
middle_three = text[1:4]  # 'BCD'

In JavaScript strings just like arrays have the slice() method:

text = 'ABCDE';
first_two = text.slice(0, 2);    // 'AB'
last_two = text.slice(-2);       // 'DE'
middle_three = text.slice(1, 4); // 'BCD'

Operations with list items

In programming it is very common to collect and analyze sequences of elements. In Python that is usually done with lists and in JavaScript with arrays. They have similar syntax and operations, but different method names to add and remove values.

This is how to concatenate two lists, add one value to the end, add one value to the beginning, get and remove a value from the beginning, get and remove a value from the end, and delete a certain value by index in Python:

items1 = ['A']
items2 = ['B']
items = items1 + items2  # items == ['A', 'B']
items.append('C')        # ['A', 'B', 'C']
items.insert(0, 'D')     # ['D', 'A', 'B', 'C']
first = items.pop(0)     # ['A', 'B', 'C']
last = items.pop()       # ['A', 'B']
items.delete(0)          # ['B']

This is how to do exactly the same with arrays in JavaScript:

items1 = ['A'];
items2 = ['B'];
items = items1.concat(items2);  // items === ['A', 'B']
items.push('C');                // ['A', 'B', 'C']
items.unshift('D');             // ['D', 'A', 'B', 'C']
first = items.shift();          // ['A', 'B', 'C']
last = items.pop();             // ['A', 'B']
items.splice(0, 1);             // ['B']

Joining lists of strings

It is very common after having a list or array of strings, to combine them into one string by a separator like comma or new line.

In Python that is done by the join() method of a string where you pass the list or tuple. Although it might feel unnatural, you start with the separator there. But I can assure that you get used to it after several times of usage.

items = ['A', 'B', 'C']
text = ', '.join(items)  # 'A, B, C'

In JavaScript the array has the join() method where you pass the separator:

items = ['A', 'B', 'C'];
text = items.join(', ');  // 'A, B, C'

The Takeaways

  • List and tuples in Python are similar to arrays in JavaScript.
  • Dictionaries in Python are similar to objects in JavaScript.
  • Strings in Python are similar to strings in JavaScript.
  • Numbers in JavaScript should be parsed with care.
  • Single-line conditional assignments exist in both languages.
  • Joining sequences of strings in Python is confusing, but you can quickly get used to it.

I compiled the whole list of equivalents of Python and JavaScript to a cheat sheet that you can print out and use for good. Side by side it compares traditional Python 2.7 and JavaScript based on ECMAScript 5 standard, as well as newer Python 3.6 and JavaScript based on ECMAScript 6 standard with such goodies as string interpolation, lambdas, generators, classes, etc.

Get the Ultimate Cheat Sheet of Equivalents in Python and JavaScript

In the next part of the series, we will have a look at JSON creation and parsing, operations with regular expressions, and error handling. Stay tuned!


Cover photo by Benjamin Hung.

2018-06-17

Data Filtering in a Django Website using Elasticsearch

In my Web Development with Django Cookbook section Forms and Views there is a recipe Filtering object lists. It shows you how to filter a Django QuerySet dynamically by different filter parameters selected in a form. From practice, the approach is working well, but with lots of data and complex nested filters, the performance might get slow. You know - because of all those INNER JOINS in SQL, the page might take even 12 seconds to load. And that is not preferable behavior. I know that I could denormalize the database or play with indices to optimize SQL. But I found a better way to increase the loading speed. Recently we started using Elasticsearch for one of the projects and its data filtering performance seems to be enormously faster: in our case, it increased from 2 to 16 times depending on which query parameters you choose.

What is Elasticsearch?

Elasticsearch is java-based search engine which stores data in JSON format and allows you to query it using special JSON-based query language. Using elasticsearch-dsl and django-elasticsearch-dsl, I can bind my Django models to Elasticsearch indexes and rewrite my object list views to use Elasticsearch queries instead of Django ORM. The API of Elasticsearch DSL is chainable like with Django QuerySets or jQuery functions, and we'll have a look at it soon.

The Setup

At first, let's install Elasticsearch server. Elasticsearch is quite a complex system, but it comes with convenient configuration defaults.

On macOS you can install and start the server with Homebrew:

$ brew install elasticsearch
$ brew services start elasticsearch

For other platforms, the installation instructions are also quite clear.

Then in your Django project's virtual environment install django-elasticsearch-dsl. I guess, "DSL" stands for "domain specific language".

With pipenv it would be the following from the project's directory:

$ pipenv install django-elasticsearch-dsl

If you are using just pip and virtual environment, then you would do this with your project's environment activated.

(venv)$ pip install django-elasticsearch-dsl

This, in turn, will install related lower level client libraries: elasticsearch-dsl and elasticsearch-py.

In the Django project settings, add 'django_elasticsearch_dsl' to INSTALLED_APPS.

Finally, add the lines defining default connection configuration there:

ELASTICSEARCH_DSL={
    'default': {
        'hosts': 'localhost:9200'
    },
}

Elasticsearch Documents for Django Models

For the illustration how to use Elasticsearch with Django, I'll create Author and Book models, and then I will create Elasticsearch index document for the books.

models.py

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)
    isbn = models.CharField(_("ISBN"), blank=True, max_length=20)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title

Nothing fancy here. Just an Author model with fields id, first_name, last_name, author_name, and a Book model with fields id, title, authors, publishing_date, and isbn. Let's go to the documents.

documents.py

In the same directory of your app, create documents.py with the following content:

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django_elasticsearch_dsl import DocType, Index, fields
from .models import Author, Book

# Name of the Elasticsearch index
search_index = Index('library')
# See Elasticsearch Indices API reference for available settings
search_index.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@search_index.doc_type
class BookDocument(DocType):
    authors = fields.NestedField(properties={
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'author_name': fields.TextField(),
        'pk': fields.IntegerField(),
    }, include_in_root=True)

    isbn = fields.KeywordField(
        index='not_analyzed',
    )

    class Meta:
        model = Book # The model associated with this DocType

        # The fields of the model you want to be indexed in Elasticsearch
        fields = [
            'title',
            'publishing_date',
        ]
        related_models = [Author]

    def get_instances_from_related(self, related_instance):
        """If related_models is set, define how to retrieve the Book instance(s) from the related model."""
        if isinstance(related_instance, Author):
            return related_instance.book_set.all()

Here we defined a BookDocument which will have fields: title, publishing_date, authors, and isbn.

The authors will be a list of nested dictionaries at the BookDocument. The isbn will be a KeywordField which means that it will be not tokenized, lowercased, nor otherwise processed and handled the whole as is.

The values for those document fields will be read from the Book model.

Using signals, the document will be automatically updated either when a Book instance or Author instance is added, changed, or deleted. In the method get_instances_from_related(), we tell the search engine which books to update when an author is updated.

Building the Index

When the index document is ready, let's build the index at the server:

(venv)$ python manage.py search_index --rebuild

Django QuerySets vs. Elasticsearch Queries

The concepts of SQL and Elasticsearch queries are quite different. One is working with relational tables and the other works with dictionaries. One is using queries that are kind of human-readable logical sentences and another is using nested JSON structures. One is using the content verbosely and another does string processing in the background and gives search relevance for each result.

Even when there are lots of differences, I will try to draw analogies between Django ORM and elasticsearch-dsl API as close as possible.

1. Query definition

Django QuerySet:

queryset = MyModel.objects.all()

Elasticsearch query:

search = MyModelDocument.search()

2. Count

Django QuerySet:

queryset = queryset.count()

Elasticsearch query:

search = search.count()

3. Iteration

Django QuerySet:

for item in queryset:
    print(item.title)

Elasticsearch query:

for item in search:
    print(item.title)

4. To see the generated query:

Django QuerySet:

>>> queryset.query

Elasticsearch query:

>>> search.to_dict()

5. Filter by single field containing a value

Django QuerySet:

queryset = queryset.filter(my_field__icontains=value)

Elasticsearch query:

search = search.filter('match_phrase', my_field=value)

6. Filter by single field equal to a value

Django QuerySet:

queryset = queryset.filter(my_field__exact=value)

Elasticsearch query:

search = search.filter('match', my_field=value)

If a field type is a string, not a number, it has to be defined as KeywordField in the index document:

my_field = fields.KeywordField()

7. Filter with either of the conditions (OR)

Django QuerySet:

from django.db import models
queryset = queryset.filter(
    models.Q(my_field=value) |
    models.Q(my_field2=value2)
)

Elasticsearch query:

from elasticsearch_dsl.query import Q
search = search.query(
    Q('match', my_field=value) |
    Q('match', my_field2=value2)
)

8. Filter with all of the conditions (AND)

Django QuerySet:

from django.db import models
queryset = queryset.filter(
    models.Q(my_field=value) &
    models.Q(my_field2=value2)
)

Elasticsearch query:

from elasticsearch_dsl.query import Q
search = search.query(
    Q('match', my_field=value) & 
    Q('match', my_field2=value2)
)

9. Filter by values less than or equal to certain value

Django QuerySet:

from datetime import datetime

queryset = queryset.filter(
    published_at__lte=datetime.now(),
)

Elasticsearch query:

from datetime import datetime

search = search.filter(
    'range',
    published_at={'lte': datetime.now()}
)

10. Filter by a value in a nested field

Django QuerySet:

queryset = queryset.filter(
    category__pk=category_id,
)

Elasticsearch query:

search = search.filter(
    'nested', 
    path='category', 
    query('match', category__pk=category_id)
)

11. Filter by one of many values in a related model

Django QuerySet:

queryset = queryset.filter(
    category__pk__in=category_ids,
)

Elasticsearch query:

from django.utils.six.moves import reduce
from elasticsearch_dsl.query import Q

search = search.query(
    reduce(operator.ior, [
        Q(
            'nested', 
            path='category', 
            query('match', category__pk=category_id),
        )
        for category_id in category_ids
    ])
)

Here the reduce() function combines a list of Q() conditions using the bitwise OR operator (|).

12. Ordering

Django QuerySet:

queryset = queryset.order_by('-my_field', 'my_field2')

Elasticsearch query:

search = search.sort('-my_field', 'my_field2')

13. Creating query dynamically

Django QuerySet:

import operator
from django.utils.six.moves import reduce

filters = []
if value1:
    filters.append(models.Q(
        my_field1=value1,
    ))
if value2:
    filters.append(models.Q(
        my_field2=value2,
    ))
queryset = queryset.filter(
    reduce(operator.iand, filters)
)

Elasticsearch query:

import operator
from django.utils.six.moves import reduce
from elasticsearch_dsl.query import Q

queries = []
if value1:
    queries.append(Q(
        'match',
        my_field1=value1,
    ))
if value2:
    queries.append(Q(
        'match',
        my_field2=value2,
    ))
search = search.query(
    reduce(operator.iand, queries)
)

14. Pagination

Django QuerySet:

from django.core.paginator import (
    Paginator, Page, EmptyPage, PageNotAnInteger
)

paginator = Paginator(queryset, paginate_by)
page_number = request.GET.get('page')
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    page = paginator.page(1)
except EmptyPage:
    page = paginator.page(paginator.num_pages)

Elasticsearch query:

from django.core.paginator import (
    Paginator, Page, EmptyPage, PageNotAnInteger
)
from django.utils.functional import LazyObject

class SearchResults(LazyObject):
    def __init__(self, search_object):
        self._wrapped = search_object

    def __len__(self):
        return self._wrapped.count()

    def __getitem__(self, index):
        search_results = self._wrapped[index]
        if isinstance(index, slice):
            search_results = list(search_results)
        return search_results

search_results = SearchResults(search)

paginator = Paginator(search_results, paginate_by)
page_number = request.GET.get('page')
try:
    page = paginator.page(page_number)
except PageNotAnInteger:
    page = paginator.page(1)
except EmptyPage:
    page = paginator.page(paginator.num_pages)

ElasticSearch doesn't work with Django's pagination by default. Therefore, we have to wrap the search query with lazy SearchResults class to provide the necessary functionality.

Example

I built an example with books written about Django. You can download it from Github and test it.

Takeaways

  • Filtering with Elasticsearch is much faster than with SQL databases.
  • But it comes at the cost of additional deployment and support time.
  • If you have multiple websites using Elasticsearch on the same server, configure a new cluster and node for each of those websites.
  • Django ORM can be in a way mapped to Elasticsearch DSL.
  • I summarized the comparison of Django ORM and Elasticsearch DSL, mentioned in this article, into a cheat sheet. You can get it for a symbolic fee. Print it on a single sheet of paper and use it as a reference for your developments.

Buy Django ORM vs. Elasticsearch DSL Cheat Sheet


Cover photo by Karl Fredrickson.

2018-05-27

QuerySet Filters on Many-to-many Relations

Django ORM (Object-relational mapping) makes querying the database so intuitive, that at some point you might forget that SQL is being used in the background.

This year at the DjangoCon Europe Katie McLaughlin was giving a talk and mentioned one thing that affects the SQL query generated by Django ORM, depending on how you call the QuerySet or manager methods. This particularity is especially relevant when you are creating your QuerySets dynamically. Here it is. When you have a many-to-many relationship, and you try to filter objects by the fields of the related model, every new filter() method of a QuerySet creates a new INNER JOIN clause. I won't discuss whether that's a Django bug or a feature, but these are my observations about it.

The Books and Authors Example

Let's create an app with books and authors, where each book can be written by multiple authors.

# -*- coding: UTF-8 -*-
from __future__ import unicode_literals

from django.db import models
from django.utils.translation import ugettext_lazy as _
from django.utils.encoding import python_2_unicode_compatible


@python_2_unicode_compatible
class Author(models.Model):
    first_name = models.CharField(_("First name"), max_length=200)
    last_name = models.CharField(_("Last name"), max_length=200)
    author_name = models.CharField(_("Author name"), max_length=200)

    class Meta:
        verbose_name = _("Author")
        verbose_name_plural = _("Authors")
        ordering = ("author_name",)

    def __str__(self):
        return self.author_name


@python_2_unicode_compatible
class Book(models.Model):
    title = models.CharField(_("Title"), max_length=200)
    authors = models.ManyToManyField(Author, verbose_name=_("Authors"))
    publishing_date = models.DateField(_("Publishing date"), blank=True, null=True)

    class Meta:
        verbose_name = _("Book")
        verbose_name_plural = _("Books")
        ordering = ("title",)

    def __str__(self):
        return self.title

The similar app with sample data can be found in this repository.

Inefficient Filter

With the above models, you could define the following QuerySet to select books which author is me, Aidas Bendoraitis.

queryset = Book.objects.filter(
    authors__first_name='Aidas',
).filter(
    authors__last_name='Bendoraitis',
)

We can check what SQL query it would generate with str(queryset.query) (or queryset.query.__str__()).

The output would be something like this:

SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
INNER JOIN `libraryapp_book_authors` T4 ON ( `libraryapp_book`.`id` = T4.`book_id` )
INNER JOIN `libraryapp_author` T5 ON ( T4.`author_id` = T5.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND T5.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;

Did you notice, that the database table libraryapp_author was attached through the libraryapp_book_authors table to the libraryapp_book table TWICE where just ONCE would be enough?

Efficient Filter

On the other hand, if you are defining query expressions in the same filter() method like this:

queryset = Book.objects.filter(
    authors__first_name='Aidas',
    authors__last_name='Bendoraitis',
)

The generated SQL query will be much shorter and (theoretically) would perform faster:

SELECT `libraryapp_book`.`id`, `libraryapp_book`.`title`, `libraryapp_book`.`publishing_date`
FROM `libraryapp_book`
INNER JOIN `libraryapp_book_authors` ON ( `libraryapp_book`.`id` = `libraryapp_book_authors`.`book_id` )
INNER JOIN `libraryapp_author` ON ( `libraryapp_book_authors`.`author_id` = `libraryapp_author`.`id` )
WHERE (`libraryapp_author`.`first_name` = 'Aidas' AND `libraryapp_author`.`last_name` = 'Bendoraitis')
ORDER BY `libraryapp_book`.`title` ASC;

The same SQL query can be achieved using the Q() objects:

queryset = Book.objects.filter(
    models.Q(authors__first_name='Aidas') &
    models.Q(authors__last_name='Bendoraitis')
)

The Q() objects add a lot of flexibility to filters allowing to OR, AND, and negate query expressions.

Dynamic Filtering

So to have faster performance, when creating QuerySets dynamically, DON'T use filter() multiple times:

queryset = Book.objects.all()
if first_name:
    queryset = queryset.filter(
        authors__first_name=first_name,
    )
if last_name:
    queryset = queryset.filter(
        authors__last_name=last_name,
    )

DO this instead:

filters = models.Q()
if first_name:
    filters &= models.Q(
        authors__first_name=first_name,
    )
if last_name:
    filters &= models.Q(
        authors__last_name=last_name,
    )
queryset = Book.objects.filter(filters)

Here the empty Q() doesn't have any impact for the generated SQL query, so you don't need the complexity of creating a list of filters and then joining all of them with the bitwise AND operator, like this:

import operator
from django.utils.six.moves import reduce

filters = []
if first_name:
    filters.append(models.Q(
        authors__first_name=first_name,
    ))
if last_name:
    filters.append(models.Q(
        authors__last_name=last_name,
    ))
queryset = Book.objects.filter(reduce(operator.iand, filters))

Profiling

In DEBUG mode, you can check how long the previously executed SQL queries took by checking django.db.connection.queries:

>>> from django.db import connection
>>> connection.queries
[{'sql': 'SELECT …', 'time': '0.001'}, {'sql': 'SELECT …', 'time': '0.004'}]

The Takeaways

  • When querying many-to-many relationships, avoid using multiple filter() methods, make use of Q() objects instead.
  • You can check the SQL query of a QuerySet with str(queryset.query).
  • Check the performance of recently executed SQL queries with django.db.connection.queries.
  • With small datasets, the performance difference is not so obvious. For your specific cases you should do the benchmarks yourself.

Cover photo by Tobias Fischer.

2017-12-08

From MySQL to PostgreSQL

In this article I will guide you through the steps I had to take to migrate Django projects from MySQL to PostgreSQL.

MySQL database has proven to be a good start for small and medium scale projects. It is known and used widely and has good documentation. Also there are great clients for easy management, like phpMyAdmin (web), HeidiSQL (windows), or Sequel Pro (macOS). However, in my professional life there were unfortunate moments, when databases from different projects crashed because of large queries or file system errors. Also I had database integrity errors which appeared in the MySQL databases throughout years because of different bugs at the application level.

When one thinks about scaling a project, they have to choose something more suitable. It should be something that is fast, reliable, and well supports ANSI standards of relational databases. Something that most top Django developers use. And the database of choice for most professionals happens to be PostgreSQL. PostgreSQL enables using several vendor-specific features that were not possible with MySQL, e.g. multidimensional arrays, JSON fields, key-value pair fields, special case-insensitive text fields, range fields, special indexes, full-text search, etc. For a newcomer, the best client that I know - pgAdmin (macOS, linux, windows) - might seem too complex at first, compared with MySQL clients, but as you have Django administration and handy ORM, you probably won't need to inspect the database in raw format too often.

So what does it take to migrate from MySQL to PostgreSQL? We will do that in a few steps and we will be using pgloader to help us with data migration. I learned about this tool from Louise Grandjonc, who was giving a presentation about PostgreSQL query optimization at DjangoCon Europe 2017.

One prerequisite for the migration are passing tests. You need to have functional tests to check if all pages are functioning correctly and unit tests to check at least the most critical or complex classes, methods, and functions.

1. Prepare your MySQL database

Make sure that your production MySQL database migration state is up to date:

(env)$ python manage.py migrate --settings=settings.production

Then create a local copy of your production MySQL database. We are going to use it for the migration.

2. Install pgloader

As I mentioned before, for the database migration we will use a tool called pgloader (version 3.4.1 or higher). This tool was programmed by Dimitri Fontaine and is available as an open source project on GitHub. You can compile the required version from the source. Or if you are using macOS, you can install it with Homebrew:

$ brew update
$ brew install pgloader

Note that PostgreSQL will also be installed as a dependency.

3. Create a new PostgreSQL user and database

Unlike with MySQL, creating new database users and databases with PostgreSQL usually happen in the shell rather than in the database client.

Let's create a user and database with the same name myproject.

$ createuser --createdb --password myproject
$ createdb --username=myproject myproject

The --createdb parameter will enable privilege to create databases. The --password parameter will offer to enter a password. The --username parameter will set the owner of the created database.

4. Create the schema

Link the project to this new PostgreSQL database in the settings, for example:

DATABASES = {
    'postgresql': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': get_secret("DATABASE_NAME"),
        'USER': get_secret("DATABASE_USER"),
        'PASSWORD': get_secret("DATABASE_PASSWORD"),
    },
}
DATABASES['default'] = DATABASES['postgresql']

Here the custom get_secret() function returns sensitive information from environment variables or a text file that is not tracked under version control. Its implementation is up to you.

Run the migrations to create tables and foreign key constraints in the new PostgreSQL database:

(env)$ python manage.py migrate --settings=settings.local

5. Create the data migration script

The pgloader uses configuration files with the settings defining how to deal with migrations. Let's create the configuration file myproject.load with the following content:

LOAD DATABASE
    FROM mysql://mysql_username:mysql_password@localhost/mysql_dbname
    INTO postgresql:///myproject
    WITH truncate, data only, disable triggers, preserve index names, include no drop, reset sequences
    ALTER SCHEMA 'mysql_dbname' RENAME TO 'public'
;

6. Run data migration

Now it's time to copy the data:

$ pgloader myproject.load

Typically you will get a bunch of warnings about type conversions. These can probably be ignored, because the script will take its best guess how to convert data when importing. If in addition you get errors about duplicated data or tables with foreign keys to missing entries, you will need to fix the issues at MySQL database and then repeat the process. In that case, clean up the MySQL database, update your local copy, recreate PostgreSQL database with dropdb and createdb commands, run Django migrations to create the database schema, and copy the data again.

7. Adapt the code

When the database is successfully migrated, we should run Django project tests and fix all PostgreSQL-specific problems in the project's code. The code running Django ORM will run smoothly, but very likely there will be issues with raw SQL, QuerySet's extra() method, and type conversions.

Typically, these are the differences that you might have to keep in mind:

  • String values in PostgreSQL queries are always quoted with 'single quotes'.

  • PostgreSQL doesn't convert types when comparing values automatically as MySQL does. If you use any raw SQL, you will need to do some casting before comparison like CAST(blog_post.id AS text) = likes_like.object_id or blog_post.id::text = likes_like.object_id. The latter double-colon syntax is not understood by MySQL, so if you want to support both databases, you will need to write vendor-specific cases for each database management system.

  • PostgreSQL is case-sensitive for string comparisons, so in the QuerySet filters you will need to use *__iexact lookup instead *__exact and *__icontains lookup instead of *__contains.

  • When ordering, convert the column to lowercase with the Lower() function:

    from django.db import models
    posts = Post.objects.order_by(models.Lower('title'))
  • When using *__in lookup, make sure that the type of the listed elements matches the type of the model field. For example, you may have a Like model with generic relation, i.e. content_type and object_id fields that together combine a generic foreign key to any model instance. The object_id field is usually of a string type, but the IDs of related models might be integers, strings, or UUIDs. If you then want to get the liked Posts which primary keys are integers, you would need to convert the object_id values to integers before assigning them to the pk__in lookup in the filter:

    liked_ids = map(int, Like.objects.filter(
        user=request.user,
        content_type=ContentType.objects.get_for_model(Post)
    ).values("object_id", flat=True))
    
    liked_posts = Post.objects.filter(pk__in=liked_ids)

8. Repeat the process for production

When you are sure that the migration process is fluent and all Django tests pass, you can take your production website down, repeat the migration process locally with the latest production data, copy the migrated local database to production server, update the production code, install new dependencies, and take the website back online.

To create a database dump you can use command:

$ pg_dump --format=c --compress=9 --file=myproject.backup myproject

To restore or create the database from dump use commands:

$ dropdb --username=pgsql myproject
$ createdb --username=myproject myproject
$ pg_restore --dbname=myproject --role=myproject --schema=public myproject.backup

I might probably miss some points and there are some ways to automate the upgrade process for production, but you got the idea.

Conclusion

PostgreSQL is more restrictive than MySQL, but it provides greater performance, more stability, and better compliance with standards. In addition, in PostgreSQL there is a bunch of features that were not available in MySQL. If you are lucky, you can switch your project from MySQL to PostgreSQL in one day.


Cover photo by Casey Allen

2017-09-27

Numbers in Translatable Strings

Sentences in websites like "You've got 1 messages." or "Messages you've got: 5" sound unnatural and not human-friendly. But the GNU gettext tool used with Django for translations has an option to define different pluralization depending on the number which goes together with the counted noun. Things get even more interesting with certain languages which have not just singular and plural like English, German, French, or Spanish, but more plural forms or just a single one.

Tell me the background

Let's talk about grammar. Most languages have two plural forms for counted elements: one for singular, like "1 thing", and one for plural, like "n things". However, certain languages have either just one form for singular and plural, or multiple plural forms depending on the number of elements that go with them.

For example, my mother tongue Lithuanian is a Baltic language coming from Indo-European language family keeping archaic features from ancient Sanskrit. Lithuanian has 3 plural forms. When one counts apples in Lithuanian, they say "1 obuolys", "2-9 obuoliai", "10-20 obuol", "21 obuolys", "22-29 obuoliai", "30 obuol", "31 obuolys", "32-39 obuoliai", etc.

The second most widespread language on the web after English is Russian. Russian is an Eastern Slavic language from Indo-European language family officially used as the main language in Russia, Belarus, Kazakhstan, Kyrgyzstan and some smaller countries. Russian is using a special Cyrillic alphabet and it has 3 plural forms too. When one counts apples in Russian, they say "1 яблоко", "2-4 яблока", "5-20 яблок", "21 яблоко", "22-24 яблока", "25-30 яблок", etc.

Arabic is the 5th most spoken language in the world. It is written from right to left and Arabic language is an interesting example having even 6 plural forms. When counting apples, they would say:

‫"0 تفاحة"، "تفاح واحدة"، "تفاحتين"، "3-10 التفاح"، "11-99 التفاح"، "100-102 التفاح"

OK OK, with apples starting from 3 it's all the same, but theoretically it differs with other words or in different contexts.

On the contrary, Japanese - East Asian language with 125 million speakers - has just one plural form. No matter, whether it's 1 apple or 100 apples, they will be counted using the same words: "りんご1個" or "りんご100個".

By the way, please correct me if there are any translation mistakes in my examples.

Show me some code

If you want to localize your Django website, you will need to do quite a bunch of things:

  1. Add the LANGUAGES setting in your settings:

    LANGUAGES = [
        ('ar', _('Arabic')),
        ('en', _('English')),
        ('ja', _('Japanese')),
        ('lt', _('Lithuanian')),
        ('ru', _('Russian')),
    ]
  2. Add 'django.middleware.locale.LocaleMiddleware' to your MIDDLEWARE list in the settings.

  3. Create a directory locale in your project directory with subdirectories called after each language code for which you need translations, e.g. ar, ja, lt, ru.

  4. Add LOCALE_PATHS in the settings to define where the translations will be localed:

    LOCALE_PATHS = [
        os.path.join(BASE_DIR, 'locale'),
    ]
  5. Use i18n_patterns() for your translatable URLs to prefix all paths with language code:

    from django.conf.urls import url
    from django.conf.urls.i18n import i18n_patterns
    
    from notifications.views import notification_list
    
    urlpatterns = i18n_patterns(
        url(r'^$', notification_list),
    )
  6. Use gettext() and its flavors in Python code and {% trans %} and {% blocktrans %} template tags in Django templates to define translatable strings.

  7. Use ungettext() in Python code to create translatable strings with counted elements:

    # using the new-style Python string format:
    notification = ungettext(
        "You've got {n} message.",
        "You've got {n} messages.",
        message_count,
    ).format(n=message_count)
    
    # using the old-style Python string format
    notification = ungettext(
        "You've got %(n)d message.",
        "You've got %(n)d messages.",
        message_count,
    ) % {'n': message_count}
  8. Use {% blocktrans %} with count to create translatable strings with counted elements in Django templates:

    {% load i18n %}
    
    {# will create the old-style Python string #}
    {% blocktrans trimmed count n=message_count %}
        You've got {{ n }} message.
    {% plural %}
        You've got {{ n }} messages.
    {% endblocktrans %}
  9. Run makemessages management command to collect translatable strings:

    (myenv)$ python manage.py makemessages --all
  10. Translate the English terms into other languages in the locale/*/LC_MESSAGES/django.po files.

  11. Compile translations into django.mo files using the compilemessages management command:

    (myenv)$ python manage.py compilemessages
  12. Restart the webserver to reload the translations.

So what about the plural forms?

As you might know, the most common translation in the *.po file looks like this:

    #: templates/base.html
    #, fuzzy
    msgid "My Original String"
    msgstr "My Translated String"

Very long strings are broken into multiple lines using the Pythonic concatenation without any joining symbol:

    msgstr ""
    "Very very very very very very ve"
    "ry very very very very very very"
    " very very very very long string."

Just before the msgid you see some comments where the string is being used, in what context, whether it is "fuzzy", i.e. not yet active, or what kind of format it is using for variables: old-style "python-format" like %(variable)s or new-style "python-brace-format" like {variable}.

The first msgid is an empty string which translation has some meta information about the translation file: language, translation timestamps, author information, contacts, version, etc. One piece of the meta information is the plural forms for that language. For example, Lithuanian part looks like this:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

as in:

#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: 1.0.0\n"
"Report-Msgid-Bugs-To: admin@example.com\n"
"POT-Creation-Date: 2017-09-18 01:12+0000\n"
"PO-Revision-Date: 2017-12-12 17:20+0000\n"
"Last-Translator: Vardenis Pavardenis <vardenis@example.com>\n"
"Language-Team: Lithuanian <lt@example.com>\n"
"Language: Lithuanian\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n"
"%100<10 || n%100>=20) ? 1 : 2);\n"

It is using JavaScript-like syntax to define how many plural forms the language has, and what conditions define which type of the plural form each count gets.

Then the plurals are defined like this:

#: notifications/templates/notifications/notification_list.html:2
#, python-format
msgid "You've got %(n)s message."
msgid_plural "You've got %(n)s messages."
msgstr[0] "Jūs gavote %(n)s žinutę."
msgstr[1] "Jūs gavote %(n)s žinutes."
msgstr[2] "Jūs gavote %(n)s žinučių."

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "Jūs gavote {n} žinutę."
msgstr[1] "Jūs gavote {n} žinutes."
msgstr[2] "Jūs gavote {n} žinučių."

Let's have a look at the other languages mentioned before. The Russian language would have plural forms defined like this:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

Then then translations for each of the 3 forms would go like this:

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "У вас есть {n} сообщение."
msgstr[1] "У вас есть {n} сообщения."
msgstr[2] "У вас есть {n} сообщений."

You would define 6 plural forms for the Arabic language:

"Plural-Forms: nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);\n"

And the translations for Arabic would look like this:

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "لديك {n} رسائل."
msgstr[1] "لديك رسالة واحدة."
msgstr[2] "لديك رسالتان."
msgstr[3] "لديك {n} رسائل."
msgstr[4] "لديك {n} رسالة."
msgstr[5] "لديك {n} رسالة."

The Japanese language would have just one plural form defined:

"Plural-Forms: nplurals=1; plural=0;\n"

And it would have just one translation:

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "あなたはメッセージが{n}つを持っています。"

Tips to take away

  • Use the new-style Python format for variables whenever possible, because it is more understandable and less error prone for not-so-technical translators and it looks cleaner in the Python code.
  • Note that {% blocktrans %} template tag produces the old-style Python format for variables, whereas in Python code you can decide for yourself which format to use.
  • For the first entry msgstr[0], which usually represents singular form, don't replace the first {n} with 1 in the translation, because in many languages it also means 21, 31, 41, 101, etc. Let the variable be passed.
  • You can look up for the plural forms of a certain language at translatehouse.org. But the latest versions of Django also include some kind of plural forms, although they don't always match the conditions from the mentioned list.
  • If you want to edit plural forms more human-friendly than in a text editor, you can use the Poedit translation editor with graphical user interface. It shows the numbering cases listed, so you don't need reverse-engineer the conditions and guess about the leftovers in the else case.
  • Unfortunately, it is not possible to have multiple translatable counted objects in the same sentence using gettext. For example, "There are 5 apples, 3 pears, and 1 orange on the table" with changeable numbers is not a valid translatable sentence if you want to keep the counted elements human-friendly. To work around, you need to formulate three different translatable sentences.

Cover photo by Romain Vignes

2017-07-10

Domain Name for Django Development Server

Isn't it strange that browsing the web you usually access the websites by domain names, however, while developing a Django website, you usually access it through IP address? Wouldn't it be handy to navigate through your local website by domain name too? Let's have a look what possibilities there are to access the local development server by a domain name.

Access via IP Address

You probably know the following line by heart since the first day of developing with Django and can type it with closed eyes?

(myenv)$ python manage.py runserver

When you run a management command runserver, it starts a lightweight Django development server which by default listens to HTTP requests on your local machine's port 8000, whereas by default, HTTP websites are running on the 80 and HTTPS websites are running on 443. Enter http://127.0.0.1:8000 in a browser and you can click through your Django project.

Note that this is a local address and it is not accessible from other devices in the network. Other people accessing the same address from their computers will see what is provided by web servers on their own machines, if any web server is running there at all.

Each device in a local network has its own Internet Protocol (IP) address. There are two versions of IP addresses: IPv4, typically formed from 4 decimal numbers separated by dots (e.g. 197.160.2.1), and IPv6, formed from hexadecimal numbers separated by colons (e.g. [fe80::200:f8ff:fe21:67cf]). The IP address can be set automatically and generated dynamically when you connect to the network, or you can set it manually and make it static. For example, the printer in the network will usually have a static address, whereas a mobile phone or tablet will have a dynamically attached IP addresses.

If you want to access a responsive website on your computer from another device in the network, I recommend you to set the IP address manually in the network settings. It is much more convenient to have an address that doesn't change every time you connect to the same network - you can bookmark it or use in different configuration files. Just don't let it clash with the IP addresses of other devices in the network.

Then run the local development server passing IP address 0.0.0.0 and port 8000:

(myenv)$ python manage.py runserver 0.0.0.0:8000

The 0.0.0.0 is a special case. It allows you to access the website through any IP address that is assigned to your computer: 0.0.0.0 or 127.0.0.1, or the one that is set in your network settings. To access the website through any of those addresses, you will have to list those IP addresses in your Django setting ALLOWED_HOSTS.

Moreover, this allows you to check the website you are building through your computer's IP address, e.g. http://197.160.2.7:8000, not only from your computer, but from any smartphone, tablet, or another computer in the same local network. Also through the same IP address you can access the website from a virtual machine. For example, by installing Windows in Parallels Desktop on a Mac, you can test how Django websites behave in Opera, Microsoft Edge, or Internet Explorer.

Domain Names for Local Host

Sometimes you want to address the website you are developing using a unique host name. This is necessary either when you have subdomains which lead to different parts of the website (e.g. http://aidas.example.com should show my profile), or when you need to test social authentication (e.g. using Python Social Auth).

One of the ways to deal with that is configuring a hosts file, which allows to map host names to IP addresses manually. Unfortunately, the hosts file doesn't support wildcard entries, such as <anything>.example.com, so for any new subdomain, you will need to modify the file as a Super User on Unix-based operating systems or as System Administrator on Windows.

A better way is to use a wildcard domain name that points to the IP of local host: 127.0.0.1. You can either set it up yourself at a domain provider, or use one of the available services.

For example, localtest.me by Scott Forsyth allows you to have unlimited wildcard entries pointing to local host. So all of the following domains would show a website at local host:

http://localtest.me:8000
http://myproject.localtest.me:8000
http://aidas.myproject.localtest.me:8000

Whichever domains you need to make work, don't forget to add them to ALLOWED_HOSTS in the Django project settings.

This enables to use authentication at Facebook or payments by PayPal (except the Instant Payment Notification which we'll cover a little later).

Also you can test subdomain resolution. For example, Django context processor might parse the subdomain and add some context variables, or a middleware might parse the subdomain and rewrite the path or redirect to a specific view.

Unfortunately, you can't test the website from an iPhone or iPad, using such address. And setting up your own domain's Address Record (A record) to the static IP of a computer in a local network is too inconvenient.

Domain Names for Local IP

There is another service - xip.io provided by Basecamp which allows you to use a wild card domain entries pointing to specific IP address.

Supposing that your computer's IP address is 197.160.2.7, all of the following domains would show a website on your computer's local web server:

http://197.160.2.7.xip.io:8000
http://myproject.197.160.2.7.xip.io:8000
http://aidas.myproject.197.160.2.7.xip.io:8000

Add them to ALLOWED_HOSTS in the project settings and you can check the website from any capable device in the local network.

Unless you are using the standard port 80, you will always have to add the port number. Also your website will be shown unsecured under HTTP, not HTTPS, and in some cases you will need to test the Django website under secure conditions, for example, when creating a Facebook canvas app or working with payments.

Tunnelling

Sometimes you want to demonstrate your fresh website to other participants at a hackathon. Or you want to share your website temporarily with the interested colleagues or friends. Or you need to test services that use Webhooks - HTTP callbacks, that post data to your server on specific events, like Instant Payment Notification at PayPal or notifications about sent SMS messages at twilio.

One way to do that is to have a remote staging website and to deploy to it very often to test the development results. For that you need a specific domain and server, and probably some automation for deployment. Also you will need to log all activities and edit log files in Terminal - no ability to make use of handy visual PyCharm debugging with breakpoints.

This is quite inconvenient. Luckily, alternatives to this method exist.

Tunnels are systems making your local host open to the public Internet. Tunnels have a frontend - that's the server by which the website will be accessed, and backend - that's your own development machine. By creating a tunnel, you open access through a firewall from a frontend server to local servers running on specified ports.

The best known open source tunnelling systems are ngrok.com, localtunnel.me, and pagekite.net. Let's have a look at each of them.

ngrok.com

Although it is not under active development now - the last commit was more than a year ago - ngrok is the most popular one. At the time of writing, it has 10573 GitHub stars. The tool was coded in the go programming language.

The ngrok is a freemium service giving you one persistent session and one randomly generated subdomain for free, but if you want to customize the setup or even install it on your own servers, you have to pay an annual fee.

To start a tunnel for a local Django project, you would type the following in the Terminal:

$ ngrok http 8000

Then anybody on the Internet could access your http://127.0.0.1:8000 entering something like https://92832de0.ngrok.io in their browser's address bar.

The default ngrok configuration would also start a special website running at http://localhost:4040 that would show the details of the traffic to and from your Django website.

If you are a paying customer and want to have a custom subdomain for your website, you can start the tunnel typing this in the Terminal:

$ ngrok http -subdomain=myproject 8000

This would create a domain like https://myproject.ngrok.io that would show the content of the Django project on your local host.

Using Canonical Name Records (CNAME records) in DNS configuration, it is also possible to create tunnels within ngrok under custom domain names like https://dev.example.com, and even wildcard entries like https://<anything>.dev.example.com.

To restrict access only to specific users, you can also use the Basic authentication with the following command:

$ ngrok http -auth="username:password" 8000

localtunnel.me

This service was created overnight at a hackathon and then published and maintained as it proved to be a useful tool. Localtunnel.me doesn't require any user account, and it creates a temporary access to your localhost under a randomly generated subdomain like https://nkfmosjsgh.localtunnel.me or a custom subdomain like https://myproject.localtunnel.me if it is available. When you close the tunnel, the address is not saved for you for future usage.

Localtunnel is free and open source. If you want or need, you can install the frontend part on your own server, so called "on premise".

To start a tunnel you would normally type the following in the Terminal:

$ lt --port 8000

If you need a custom domain, you can also type this instead:

$ lt --port 8000 --subdomain myproject

Localtunnel is meant to be relatively simple for quick temporary access. Therefore, CNAME configuration and wildcard subdomains are not possible.

Still this project is under active development. It was programmed in Node JS and by the time of writing it received 4832 GitHub Stars.

pagekite.net

Pagekite is open source, python based, pay-what-you-want solution. Comparing to the previous projects, it has only 368 GitHub Stars, but is also worth giving a try.

You can start a tunnel with Pagekite, by entering a command with your private user's domain name in the Terminal:

$ pagekite.py 8000 myuser.pagekite.me

This will open a secure access to your local Django project from https://myuser.pagekite.me.

For each project you can then have a separate project's address, like https://myproject-myuser.pagekite.me which can be created starting the tunnel like this:

$ pagekite.py 8000 myproject-myuser.pagekite.me

With Pagekite you can have custom domains like https://dev.example.com for your tunnel using CNAME setting in the domain configuration. It's possible to expose non-web services, for example SSH or Minecraft server, too.

The Basic authentication is available using a command like this:

$ pagekite.py 8000 myproject-myuser.pagekite.me +password/username=password

Django Project Configuration

If you want to use tunnelling with your Django project, you will have to do a couple of modifications here and there:

  • Change the URL configuration to show static and media files even in non DEBUG mode:

    # urls.py
    # ...
    import re
    from django.views.static import serve
    
    if settings.STATIC_URL.startswith("/"):
        urlpatterns += [
            url(
                r'^{STATIC_URL}(?P<path>.*)$'.format(STATIC_URL=re.escape(settings.STATIC_URL.lstrip('/'))),
                serve,
                # {'document_root': settings.STATIC_ROOT},
            ),
        ]
    if settings.MEDIA_URL.startswith("/"):
        urlpatterns += [
            url(
                r'^{MEDIA_URL}(?P<path>.*)$'.format(MEDIA_URL=re.escape(settings.MEDIA_URL.lstrip('/'))),
                serve,
                {'document_root': settings.MEDIA_ROOT},
            ),
        ]
    

    If you want the static files to get recognized from various apps automatically, omit the {'document_root': settings.STATIC_ROOT}. Otherwise you will have to run collectstatic management command every time you change a CSS, JavaScript, or styling image file.

  • Have separate settings for the exposed access.

    # settings.local_exposed
    from .local import *
    DEBUG = False
    ALLOWED_HOSTS = [...]  # enter the domains of your tunnel's frontend
    SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
    

    To use those settings run the following in your virtual environment:

    (myenv)$ python manage.py runserver --settings=settings.local_exposed --insecure

    Here the --insecure directive forces automatic static file recognition from different places in your project even in non DEBUG mode. Leave it out, if you are serving the static files collected by collectstatic command.

Security Recommendations

This list of security recommendations is by no means complete. Use tunnelling at your own risk.

  • Don't keep tunnels running all the time. When not in need, close the connection.
  • Share the frontend URL only with trusted people. If you make the URL easy to remember or guess, set the Basic authentication for the tunnel's frontend.
  • Switch off the DEBUG mode in your Django project.
  • Have frequent backups of your project's code, media files, and database.
  • Don't use production data for development.
  • Don't use sensitive data for testing: no real passwords or API tokens of live system, use sandbox credentials for PayPal or Stripe payments, etc.
  • If you don't trust the tunnelling services, you can set up a tunnelling frontend on your own servers.

Do you see any other security issues about using tunnelling with Django development server? Then please share your thoughts in the comments.

Final Thoughts

When you are developing a responsive website with Django and need to check how it works on a mobile device, you can run the development server with 0.0.0.0:8000 and access it on your Wifi network through the IP address of your computer, or you can use xip.io to analogically check it by a domain name.

When you need to check subdomain resolution, you can use the hosts file, configure your private subdomain pointing directly to your local IP, or use localtest.me, xip.io, or one of the tunnelling services.

When you want to debug Webhooks in order to get notified about executed payments, received messages, or completed serverless processes, you can use ngrok.com, localtunnel.me, pagekite.net or some other tunnelling service. Or of course you can set a staging website with logging, but that makes a lot of hassle debugging.

Perhaps you know some other interesting solutions how to deal with domains and local development server. If you do, don't hesitate to share your tips in the comments.


Cover photo by Michael D Beckwith