2017-09-27

Numbers in Translatable Strings

Sentences on websites like "You've got 1 messages." or "Messages you've got: 5" sound unnatural and are not human-friendly. Fortunately, the GNU gettext tool that Django uses for translations lets you define different plural forms depending on the number that accompanies the counted noun. Things get even more interesting with languages that have not just singular and plural, like English, German, French, or Spanish, but more plural forms, or just a single one.

Tell me the background

Let's talk about grammar. Many languages have two forms for counted elements: one for singular, like "1 thing", and one for plural, like "n things". However, certain languages have either a single form for both singular and plural, or several plural forms that depend on the number of elements being counted.

For example, my mother tongue, Lithuanian, is a Baltic language of the Indo-European language family that retains archaic features shared with ancient Sanskrit. Lithuanian has 3 plural forms. When one counts apples in Lithuanian, they say "1 obuolys", "2-9 obuoliai", "10-20 obuolių", "21 obuolys", "22-29 obuoliai", "30 obuolių", "31 obuolys", "32-39 obuoliai", etc.

The second most widespread language on the web after English is Russian. Russian is an East Slavic language of the Indo-European language family, used as an official language in Russia, Belarus, Kazakhstan, Kyrgyzstan, and some smaller countries. Russian uses the Cyrillic alphabet, and it has 3 plural forms too. When one counts apples in Russian, they say "1 яблоко", "2-4 яблока", "5-20 яблок", "21 яблоко", "22-24 яблока", "25-30 яблок", etc.

Arabic is the 5th most spoken language in the world. It is written from right to left, and it is an interesting example because it has as many as 6 plural forms. When counting apples, one would say:

‫"0 تفاحة"، "تفاح واحدة"، "تفاحتين"، "3-10 التفاح"، "11-99 التفاح"، "100-102 التفاح"

OK OK, with apples starting from 3 it's all the same, but theoretically it differs with other words or in different contexts.

Japanese, on the other hand, an East Asian language with 125 million speakers, has just one plural form. No matter whether it's 1 apple or 100 apples, they are counted using the same words: "りんご1個" or "りんご100個".

By the way, please correct me if there are any translation mistakes in my examples.

Show me some code

If you want to localize your Django website, you will need to do quite a bunch of things:

  1. Add the LANGUAGES setting in your settings:

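    # Lazy translation: the language names are translated when displayed,
    # not when the settings module is imported.
    from django.utils.translation import ugettext_lazy as _

    LANGUAGES = [
        ('ar', _('Arabic')),
        ('en', _('English')),
        ('ja', _('Japanese')),
        ('lt', _('Lithuanian')),
        ('ru', _('Russian')),
    ]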
  2. Add 'django.middleware.locale.LocaleMiddleware' to your MIDDLEWARE list in the settings.

  3. Create a directory locale in your project directory with subdirectories named after each language code for which you need translations, e.g. ar, ja, lt, ru.

  4. Add LOCALE_PATHS in the settings to define where the translations will be located:

    import os

    LOCALE_PATHS = [
        os.path.join(BASE_DIR, 'locale'),
    ]
  5. Use i18n_patterns() for your translatable URLs to prefix all paths with language code:

    from django.conf.urls import url
    from django.conf.urls.i18n import i18n_patterns
    
    from notifications.views import notification_list
    
    urlpatterns = i18n_patterns(
        url(r'^$', notification_list),
    )
  6. Use gettext() and its flavors in Python code and {% trans %} and {% blocktrans %} template tags in Django templates to define translatable strings.

  7. Use ungettext() in Python code to create translatable strings with counted elements:

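    # ungettext is the Django 1.x name; newer Django versions call it ngettext
    from django.utils.translation import ungettext

    # using the new-style Python string format: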
    notification = ungettext(
        "You've got {n} message.",
        "You've got {n} messages.",
        message_count,
    ).format(n=message_count)
    
    # using the old-style Python string format
    notification = ungettext(
        "You've got %(n)d message.",
        "You've got %(n)d messages.",
        message_count,
    ) % {'n': message_count}
  8. Use {% blocktrans %} with count to create translatable strings with counted elements in Django templates:

    {% load i18n %}
    
    {# will create the old-style Python string #}
    {% blocktrans trimmed count n=message_count %}
        You've got {{ n }} message.
    {% plural %}
        You've got {{ n }} messages.
    {% endblocktrans %}
  9. Run makemessages management command to collect translatable strings:

    (myenv)$ python manage.py makemessages --all
  10. Translate the English terms into other languages in the locale/*/LC_MESSAGES/django.po files.

  11. Compile translations into django.mo files using the compilemessages management command:

    (myenv)$ python manage.py compilemessages
  12. Restart the webserver to reload the translations.
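
Django's translation functions are built on top of Python's standard gettext module, so the singular/plural choice made by ungettext() can be tried out without a Django project. The following is a minimal sketch, assuming no compiled translations are loaded, so the stdlib fallback rule (singular only when the count is exactly 1) applies. The helper name notification_text is made up for illustration.

```python
import gettext

# NullTranslations is the stdlib fallback used when no .mo catalog is loaded.
# Its ngettext() applies the English-style rule: singular only for n == 1.
fallback = gettext.NullTranslations()


def notification_text(message_count):
    """Hypothetical helper: build the counted notification string."""
    template = fallback.ngettext(
        "You've got {n} message.",
        "You've got {n} messages.",
        message_count,
    )
    return template.format(n=message_count)


for count in (0, 1, 2, 21):
    print(notification_text(count))
```

With only the fallback in place, 21 gets the plural string. Once a compiled Lithuanian or Russian catalog is active, the same call would pick the first form for 21 instead.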

So what about the plural forms?

As you might know, a typical translation entry in a *.po file looks like this:

    #: templates/base.html
    #, fuzzy
    msgid "My Original String"
    msgstr "My Translated String"

Very long strings are broken into multiple lines, which are joined the same way as adjacent string literals in Python, without any joining symbol:

    msgstr ""
    "Very very very very very very ve"
    "ry very very very very very very"
    " very very very very long string."

Just before the msgid you see comments that tell where the string is used, in what context, whether it is "fuzzy" (i.e. not yet active), and what kind of format it uses for variables: old-style "python-format" like %(variable)s or new-style "python-brace-format" like {variable}.

The first msgid is an empty string whose translation holds meta information about the translation file: language, translation timestamps, author information, contacts, version, etc. One piece of that meta information is the plural forms for the language. For example, the Lithuanian part looks like this:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

as in:

#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: 1.0.0\n"
"Report-Msgid-Bugs-To: admin@example.com\n"
"POT-Creation-Date: 2017-09-18 01:12+0000\n"
"PO-Revision-Date: 2017-12-12 17:20+0000\n"
"Last-Translator: Vardenis Pavardenis <vardenis@example.com>\n"
"Language-Team: Lithuanian <lt@example.com>\n"
"Language: lt\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n"
"%100<10 || n%100>=20) ? 1 : 2);\n"

It uses a C-like expression syntax (it will also look familiar from JavaScript) to define how many plural forms the language has and which plural form each count gets.
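
To make the expression less cryptic, here is a hand-written Python translation of the Lithuanian rule. It is only a sketch: the function name lithuanian_plural_index is my own, and the apple forms (obuolys, obuoliai, obuolių) are the nominative singular, nominative plural, and genitive plural used when counting.

```python
def lithuanian_plural_index(n):
    """Python rendering of the C-like Plural-Forms expression for Lithuanian."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-9, 22-29, ...
    return 2  # 0, 10-20, 30, ...


# Apple forms ordered by plural index.
APPLE_FORMS = ["obuolys", "obuoliai", "obuolių"]

for n in (1, 2, 9, 10, 11, 20, 21, 22, 30):
    print(n, APPLE_FORMS[lithuanian_plural_index(n)])
```

Each nested "condition ? value : ..." in the header becomes one if statement, and the final value after the last colon is the "else" case.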

Then the plurals are defined like this:

#: notifications/templates/notifications/notification_list.html:2
#, python-format
msgid "You've got %(n)s message."
msgid_plural "You've got %(n)s messages."
msgstr[0] "Jūs gavote %(n)s žinutę."
msgstr[1] "Jūs gavote %(n)s žinutes."
msgstr[2] "Jūs gavote %(n)s žinučių."

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "Jūs gavote {n} žinutę."
msgstr[1] "Jūs gavote {n} žinutes."
msgstr[2] "Jūs gavote {n} žinučių."

Let's have a look at the other languages mentioned before. The Russian language would have plural forms defined like this:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

Then the translations for each of the 3 forms would go like this:

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "У вас есть {n} сообщение."
msgstr[1] "У вас есть {n} сообщения."
msgstr[2] "У вас есть {n} сообщений."
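
Putting the rule and the three msgstr entries together, the lookup that gettext performs for Russian can be sketched roughly as follows. The names russian_plural_index, MSGSTR, and translate_count are hypothetical. Real code would simply call ungettext() and let the compiled catalog do this work.

```python
def russian_plural_index(n):
    """Python rendering of the Russian Plural-Forms expression."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# The three translations from the .po entry, indexed like msgstr[0..2].
MSGSTR = [
    "У вас есть {n} сообщение.",
    "У вас есть {n} сообщения.",
    "У вас есть {n} сообщений.",
]


def translate_count(n):
    """Pick the msgstr for this count and fill in the number."""
    return MSGSTR[russian_plural_index(n)].format(n=n)


for n in (1, 3, 5, 11, 21, 24, 25):
    print(translate_count(n))
```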

You would define 6 plural forms for the Arabic language:

"Plural-Forms: nplurals=6; plural=(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5);\n"

And the translations for Arabic would look like this:

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "لديك {n} رسائل."
msgstr[1] "لديك رسالة واحدة."
msgstr[2] "لديك رسالتان."
msgstr[3] "لديك {n} رسائل."
msgstr[4] "لديك {n} رسالة."
msgstr[5] "لديك {n} رسالة."
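
One way to check which of the six entries a given count selects is to let Python evaluate the expression. The sketch below relies on gettext.c2py(), a helper in Python's standard gettext module that is not part of its documented API, so treat it as an exploration aid rather than something to depend on in production code. The variable names are mine.

```python
import gettext

# The plural expression copied from the Arabic Plural-Forms header.
ARABIC_PLURAL = (
    "(n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : "
    "n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5)"
)

# c2py() is an undocumented stdlib helper that turns the C-like expression
# into a Python function of n.
arabic_plural_index = gettext.c2py(ARABIC_PLURAL)

for n in (0, 1, 2, 3, 10, 11, 99, 100, 102, 103):
    print(n, "-> msgstr[%d]" % arabic_plural_index(n))
```

Note how 100, 101, and 102 fall through to the last case, while 103 goes back to the same form as 3.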

The Japanese language would have just one plural form defined:

"Plural-Forms: nplurals=1; plural=0;\n"

And it would have just one translation:

#: notifications/views.py:11
#, python-brace-format
msgid "You've got {n} message."
msgid_plural "You've got {n} messages."
msgstr[0] "あなたはメッセージが{n}つを持っています。"
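
Because every language declares a different nplurals, a quick sanity check is to compare that number with the count of msgstr[...] entries a translator has provided. The following sketch parses a Plural-Forms value by hand. The function names and the simple regular expression are my own assumptions and only cover header values shaped like the ones in this post (without the trailing \n escape).

```python
import re


def parse_plural_forms(header_value):
    """Split 'nplurals=N; plural=EXPR;' into (N, EXPR)."""
    match = re.match(r"\s*nplurals=(\d+);\s*plural=(.+?);?\s*$", header_value)
    if match is None:
        raise ValueError("Unrecognized Plural-Forms value: %r" % header_value)
    return int(match.group(1)), match.group(2)


def has_all_forms(header_value, msgstrs):
    """Return True if the number of translations matches nplurals."""
    nplurals, _expression = parse_plural_forms(header_value)
    return len(msgstrs) == nplurals


japanese = "nplurals=1; plural=0;"
print(parse_plural_forms(japanese))
print(has_all_forms(japanese, ["あなたはメッセージが{n}つを持っています。"]))
```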

Tips to take away

  • Use the new-style Python format for variables whenever possible, because it is easier to understand and less error-prone for not-so-technical translators, and it looks cleaner in the Python code.
  • Note that the {% blocktrans %} template tag produces the old-style Python format for variables, whereas in Python code you can decide for yourself which format to use.
  • For the first entry msgstr[0], which usually represents the singular form, don't replace the {n} with 1 in the translation, because in many languages that form also applies to 21, 31, 41, 101, etc. Let the variable be passed.
  • You can look up the plural forms of a certain language at translatehouse.org. Recent versions of Django also include plural forms of their own, although they don't always match the conditions from the mentioned list.
  • If you want to edit plural forms in a more human-friendly way than in a text editor, you can use the Poedit translation editor with its graphical user interface. It lists the numbers covered by each case, so you don't need to reverse-engineer the conditions and guess about the leftovers in the else case.
  • Unfortunately, it is not possible to have multiple translatable counted objects in the same sentence using gettext. For example, "There are 5 apples, 3 pears, and 1 orange on the table" with changeable numbers is not a valid translatable sentence if you want to keep the counted elements human-friendly. To work around this, you need to formulate three different translatable sentences.
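
The workaround from the last tip could look roughly like this. It is a sketch using the stdlib gettext fallback (English rules only) so that it runs without a Django project. In a Django view you would call ungettext() the same way. The function name describe_table and the exact sentences are illustrative.

```python
import gettext

# Fallback translations: English-style singular/plural, no catalog needed.
translations = gettext.NullTranslations()


def describe_table(apples, pears, oranges):
    """Build one separately translatable sentence per counted noun."""
    sentences = [
        translations.ngettext(
            "There is {n} apple on the table.",
            "There are {n} apples on the table.",
            apples,
        ).format(n=apples),
        translations.ngettext(
            "There is {n} pear on the table.",
            "There are {n} pears on the table.",
            pears,
        ).format(n=pears),
        translations.ngettext(
            "There is {n} orange on the table.",
            "There are {n} oranges on the table.",
            oranges,
        ).format(n=oranges),
    ]
    return " ".join(sentences)


print(describe_table(5, 3, 1))
```

Each sentence carries exactly one counted noun, so every language can apply its own plural rule to it independently.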
