Localization - Specifics

Modern technical writing - Andrew Etter 2016

Localization, the process of translating documentation into other languages, is a nightmare. If you ever think you need to do it, interface with management and perform a careful cost-benefit analysis, because the process is expensive, time-consuming, error-prone, and tedious. Once you've arrived at what you believe is an accurate estimate for company costs, triple it. Now you have a realistic estimate.

If you try to keep all translations of the documentation in sync at all times, you can't publish very often, which leads to lower-quality documentation. To fix a bug in your native tongue, you update the content, build, and publish. To fix a bug in a translated copy of the documentation, you update the content, manually send it to a translation company, wait days or weeks to receive the updated content, build, and publish.

Because publishing in a single language is so simple, you can work with increased velocity. You can improve large chunks of text, reorganize pages, fix awkward wordings, and generally treat everything as a working draft. Writing content for translation means meticulous review of new content, because any inaccuracy has to go through costly revisions. It means rarely refactoring old content because of the sheer expense involved in translating it into eight different languages again[7]. It means delaying software releases because it's Carnival in Brazil[8] and the Portuguese translations won't be ready for another week.

Lest you doubt me, consider this common open source localization workflow:

1. Write a script that calls gettext, a translation tool, on each of your lightweight markup files. This operation produces a collection of POT files. What POT stands for doesn't really matter. Conceptually, POT files are just line by line splits of your source files into two strings: the original one, and an empty one for the translator to fill in. They look like this:

       msgid "My name is Andrew."
       msgstr ""

       msgid "Another paragraph."
       msgstr ""

2. Still using gettext, generate sets of PO files from the POT files, one set per language. Send the PO files to the translation company, who will insert translated content into the empty strings. PO files look nearly identical to POT files:

       msgid "My name is Andrew."
       msgstr "Je m'appelle Andrew."

       msgid "Another paragraph."
       msgstr "Un autre paragraphe."

3. When the translation company returns the PO files, commit them to version control.

4. Then you (sadly, this is not a joke) convert the PO files to MO files, again with gettext. MO files are binary compilations of PO files. Because PO files can be rather large, you compile them to MO files to improve the speed at which a machine can process them. You don't need to commit the MO files to version control.

5. Build and publish the translated help system from the MO files.

6. Repeat these steps each time you need to update the translated help system. Even though it might feel as if you're overwriting the translated PO files with new, blank ones, the PO files will retain their existing translations. The translation company can then fill in the gaps created by new or modified content:

       msgid "My name is Andrew."
       msgstr "Je m'appelle Andrew."

       msgid "A new section."
       msgstr ""

       msgid "Another paragraph."
       msgstr "Un autre paragraphe."
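To see why the PO files keep their existing translations in the final step, here is a minimal Python sketch of the merge. It is not how gettext itself is implemented: the function names (`make_template`, `parse_po`, `merge`, `format_po`) are hypothetical, paragraphs are assumed to map one-to-one onto `msgid` entries, and escaping and multi-line strings are ignored. The real `msgmerge` tool also does fuzzy matching and keeps obsolete entries, which this sketch leaves out.

```python
import re


def make_template(markup: str) -> list[str]:
    """Split a markup file into paragraphs; each one becomes a msgid.

    Assumption: blank lines separate paragraphs, and no escaping is needed.
    """
    chunks = re.split(r"\n\s*\n", markup)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def parse_po(text: str) -> dict[str, str]:
    """Read single-line msgid/msgstr pairs from PO text into a dict."""
    entries: dict[str, str] = {}
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('msgid "') and line.endswith('"'):
            msgid = line[len('msgid "'):-1]
        elif line.startswith('msgstr "') and line.endswith('"') and msgid is not None:
            entries[msgid] = line[len('msgstr "'):-1]
            msgid = None
    return entries


def merge(template_ids: list[str], old: dict[str, str]) -> dict[str, str]:
    """Keep the old translation for every msgid that still exists verbatim."""
    return {msgid: old.get(msgid, "") for msgid in template_ids}


def format_po(entries: dict[str, str]) -> str:
    """Write the entries back out in PO layout."""
    blocks = [f'msgid "{k}"\nmsgstr "{v}"' for k, v in entries.items()]
    return "\n\n".join(blocks) + "\n"


old_po = '''msgid "My name is Andrew."
msgstr "Je m'appelle Andrew."

msgid "Another paragraph."
msgstr "Un autre paragraphe."
'''

new_source = "My name is Andrew.\n\nA new section.\n\nAnother paragraph.\n"

merged = merge(make_template(new_source), parse_po(old_po))
print(format_po(merged))
```

Only "A new section." comes out with an empty `msgstr`, so that is the only string the translation company has to touch. The sketch also makes the cost of refactoring visible: change one character in an existing paragraph and its `msgid` no longer matches, so the old translation is not carried over.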

You don't have to follow this workflow[9]. In fact, your static site generator might not even support building from MO files. I've only provided it as an example of the complexities involved in the localization process. I'm not being melodramatic; it really is a nightmare.
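If "binary compilation" sounds abstract, the following Python sketch packs a handful of translated strings into the GNU MO layout and reads them back with the standard library's `gettext.GNUTranslations`. In practice you would run gettext's `msgfmt` instead; `compile_mo` is a hypothetical helper written for illustration, and it assumes UTF-8 strings with no plural forms or message contexts.

```python
import gettext
import io
import struct


def compile_mo(entries: dict[str, str]) -> bytes:
    """Pack msgid/msgstr pairs into the little-endian GNU MO layout."""
    # The empty msgid is the header entry; it declares the charset.
    catalog = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    # Untranslated entries (empty msgstr) are left out of the binary.
    catalog.update({k: v for k, v in entries.items() if v})
    keys = sorted(catalog, key=lambda k: k.encode("utf-8"))

    ids = b""
    strs = b""
    spans = []  # (id offset, id length, str offset, str length)
    for key in keys:
        raw_id = key.encode("utf-8")
        raw_str = catalog[key].encode("utf-8")
        spans.append((len(ids), len(raw_id), len(strs), len(raw_str)))
        ids += raw_id + b"\0"
        strs += raw_str + b"\0"

    id_table = 7 * 4                      # the header is seven 32-bit fields
    str_table = id_table + 8 * len(keys)  # each table row is (length, offset)
    id_start = str_table + 8 * len(keys)
    str_start = id_start + len(ids)

    # Magic number, format version, entry count, table offsets, no hash table.
    out = struct.pack("<7I", 0x950412DE, 0, len(keys), id_table, str_table, 0, 0)
    for id_off, id_len, _, _ in spans:
        out += struct.pack("<2I", id_len, id_start + id_off)
    for _, _, str_off, str_len in spans:
        out += struct.pack("<2I", str_len, str_start + str_off)
    return out + ids + strs


po_entries = {
    "My name is Andrew.": "Je m'appelle Andrew.",
    "A new section.": "",
    "Another paragraph.": "Un autre paragraphe.",
}

translations = gettext.GNUTranslations(io.BytesIO(compile_mo(po_entries)))
print(translations.gettext("My name is Andrew."))  # the French translation
print(translations.gettext("A new section."))      # untranslated: the original comes back
```

The MO data is derived entirely from the PO files, which is why only the PO files need to live in version control.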

For a method that uses common tools (i.e., Git), you might try:

1. Send your lightweight markup files to the translation company as-is. Tag the latest commit in your repository to mark this point in time.

2. When the translated files return, run a linter on the files, because invariably the translation company will have messed up the whitespace on your documentation. Then commit the translated files to version control.

3. Build and publish the translated help system.

4. When you want to update the translated help system, make a new branch and use git rebase to squash all work between now and your tag into a single commit. Then send the diff to the translation company, along with the latest lightweight markup files for both languages.

5. Delete the temporary branch and mark the latest commit in the repository with a new tag.
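The bookkeeping in the steps above can be scripted. The Python sketch below builds the two Git commands involved and runs them with `subprocess`. It swaps one detail: rather than squashing on a temporary branch with git rebase, it asks `git diff` for everything between the tag and `HEAD`, which yields the same combined diff. The tag names, the `docs` path, and the helper names are all hypothetical.

```python
import subprocess


def tag_cmd(tag: str) -> list[str]:
    """Command that marks the commit whose files went to the translators."""
    return ["git", "tag", tag]


def diff_since_cmd(tag: str, path: str = ".") -> list[str]:
    """Command that shows every change under `path` since the tagged commit."""
    return ["git", "diff", tag, "HEAD", "--", path]


def run(cmd: list[str], repo: str = ".") -> str:
    """Run a command inside the repository and return what it printed."""
    result = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Assumption: an earlier hand-off to the French translators was tagged
    # "sent-fr-1", and the lightweight markup files live under docs/.
    patch = run(diff_since_cmd("sent-fr-1", "docs"))
    with open("changes-for-translators.diff", "w", encoding="utf-8") as handle:
        handle.write(patch)
    # Mark the new hand-off point for the next round.
    run(tag_cmd("sent-fr-2"))
```

Keeping one tag series per language (for example `sent-fr-N`, `sent-pt-N`) lets each translation lag behind the source by a different amount.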

Remember, you have to repeat this process for each language. Truly a nightmare. Some specialized software applications purport to simplify the translation process—and to be fair, I'm sure they do help a bit—but until computers can accurately translate between languages, localization will remain messy and expensive.