From 3ee8da73d904a64ada1719945e7b8467e6c0c60a Mon Sep 17 00:00:00 2001
From: Leandro Lucarella
Date: Wed, 18 Nov 2020 12:39:11 +0100
Subject: [PATCH 1/1] blog: Publish post about LANGUAGE

---
 .../2020/11/18-language-broken-for-en.rst | 71 +++++++++++++++++++
 source/blog/posts/db                      |  1 +
 2 files changed, 72 insertions(+)
 create mode 100644 source/blog/posts/2020/11/18-language-broken-for-en.rst

diff --git a/source/blog/posts/2020/11/18-language-broken-for-en.rst b/source/blog/posts/2020/11/18-language-broken-for-en.rst
new file mode 100644
index 0000000..e93fdb0
--- /dev/null
+++ b/source/blog/posts/2020/11/18-language-broken-for-en.rst
@@ -0,0 +1,71 @@
+Title: The LANGUAGE variable is broken for English as main language
Tags: en, linux, language, gettext, lang

The ``LANGUAGE`` environment variable can `accept multiple fallback
languages`__ (at least if your commands use ``gettext``), so if your main
``LANG`` is, say, ``es``, but you also speak ``fr``, then you can use
``LANGUAGE=es:fr``.

__ https://www.gnu.org/software/gettext/manual/gettext.html#The-LANGUAGE-variable

But what happens when your main ``LANG`` is ``en``, so that, for example, your
``LANGUAGE`` looks like ``en:es:de``? You'll notice that some messages that
were in perfect English before you enabled the multi-language fallback now
seem to be shown randomly in ``es`` or ``de``.

Well, it is not random. Since English tends to be the de facto language for the
original strings in a program, it looks like almost **nobody** provides an
``en`` *translation*, so when the fallback is active, almost no program will
show its messages in English.

For example, this is my Debian testing system with roughly 3.5K packages
installed:

.. code:: console

    $ dpkg -l |wc -l
    3522
    $ ls /usr/share/locale/en/LC_MESSAGES/ | wc -l
    12

Only 12 message catalogs exist for plain English. ``en_GB`` does a bit better:

.. code:: console

    $ ls /usr/share/locale/en_GB/LC_MESSAGES/ | wc -l
    732

That's 732 catalogs, which is still fewer than both ``es`` and ``de``:

.. code:: console

    $ ls /usr/share/locale/es/LC_MESSAGES/ | wc -l
    821
    $ ls /usr/share/locale/de/LC_MESSAGES/ | wc -l
    820

The weird thing is that packages as basic as ``psmisc`` (which provides, for
example, ``killall``) and ``coreutils`` (which provides, for example, ``ls``)
don't have an ``en`` catalog, and ``psmisc`` doesn't provide an ``es`` one
either. This is why at some point it seemed like a random locale was being
used. I had something like ``LANGUAGE=en_GB:en_US:en:es:de`` and I use KDE as
my desktop environment. KDE seems to be correctly *translated* to ``en_GB``, so
I was seeing most of my desktop in English as expected, but when using
``killall``, I got errors in German, and when using ``ls``, I got errors in
Spanish.

If you don't provide other fallback languages, gettext automatically falls back
to the ``C`` locale, that is, the original strings embedded in the source code,
which are usually in English. This is why, as long as you don't list any
fallback languages other than English, everything works in English as expected.
Of course, if you put ``C`` in your fallback list before any non-English
language, those languages will be ignored, because the ``C`` locale is always
available, so that's not an option either.

I find it very curious that this issue has almost zero visibility. At least my
searches didn't turn up any useful results. I had to figure it all out by
myself, like in the good old pre-Stack Overflow times...

.. note:: I know this is not a typical use case: since almost all software uses
   English for the ``C`` locale, it hardly makes sense in practice to use
   fallback languages if your main language is English. But in theory it could
   happen, and providing an ``en`` translation is trivial.
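
To make the fallback order described above concrete, here is a minimal shell sketch of the catalog lookup. It is a simplified model under stated assumptions, not gettext's actual implementation: the ``pick_catalog`` function and the ``demo`` text domain are hypothetical, and real gettext also tries variants of each entry (such as the name without its codeset or territory part), which this sketch ignores. It walks a colon-separated, ``LANGUAGE``-style list, stops at ``C``, and otherwise picks the first language that has a ``.mo`` file for the domain under ``LOCALEDIR/LANG/LC_MESSAGES/``.

```shell
#!/bin/sh
# Hypothetical helper: a simplified model of gettext's LANGUAGE fallback
# (an assumption, not gettext's real code). It walks a colon-separated
# language list and prints the first entry that has a catalog for the
# text domain. Real gettext also tries variants of each name (codeset,
# territory, ...), which this sketch ignores.
#
# Usage: pick_catalog DOMAIN LOCALEDIR LANG1:LANG2:...
pick_catalog() {
    domain=$1
    localedir=$2
    old_ifs=$IFS
    IFS=:  # split the language list on colons
    for lang in $3; do
        # "C" means "use the untranslated strings compiled into the
        # program", so nothing after it is ever consulted.
        if [ "$lang" = C ]; then
            break
        fi
        if [ -e "$localedir/$lang/LC_MESSAGES/$domain.mo" ]; then
            IFS=$old_ifs
            echo "$lang"
            return 0
        fi
    done
    IFS=$old_ifs
    # No catalog found (or C reached): the original strings are used.
    echo C
}

# Demo on a throwaway locale tree: the hypothetical "demo" domain ships
# es and de catalogs (empty files are enough for this existence check),
# but no English one.
tmp=$(mktemp -d)
mkdir -p "$tmp/es/LC_MESSAGES" "$tmp/de/LC_MESSAGES"
touch "$tmp/es/LC_MESSAGES/demo.mo" "$tmp/de/LC_MESSAGES/demo.mo"

pick_catalog demo "$tmp" en_GB:en_US:en:es:de    # prints: es
pick_catalog demo "$tmp" en_GB:en_US:en:C:es:de  # prints: C
pick_catalog demo "$tmp" en                      # prints: C

rm -r "$tmp"
```

In the demo, the first call prints ``es`` even though three English variants come first, which mirrors the ``ls``-in-Spanish surprise; the second prints ``C`` because the ``C`` entry ends the search before ``es`` or ``de`` are reached.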
diff --git a/source/blog/posts/db b/source/blog/posts/db
index 73ab855..c3694c2 100644
--- a/source/blog/posts/db
+++ b/source/blog/posts/db
@@ -429,3 +429,4 @@
 2015/09/26-día-de-la-condena-errada.rst, 1443299538.0, 1443299538.0
 2016/04/02-simplicity.rst, 1459551507.0, 1459551507.0
 2016/05/12-elephone-p9000.rst, 1605697852.0, 1605697852.0
+2020/11/18-language-broken-for-en.rst, 1605699495.0, 1605699495.0
-- 
2.43.0