Internationalisation wishlist 2017

This is a document of the "bag of issues" type, created by Wikimedia i18n aficionados Nikerabbit and Nemo_bis. Its purpose is to lay out mostly independent projects that would make Wikimedia's i18n infrastructure even more awesome, and in some cases keep it from falling off the train of i18n progress.

Previous editions

 * Internationalisation wishlist 2013
 * Internationalisation wishlist 2014

Visual page translation
The wiki page translation feature of the Translate extension does not currently work with Visual Editor because of the special tags it uses. This is specifically about editing the pages that serve as the source for translations, not about the translation process itself. The work can be divided into three steps:
 * 1) Migrate the special tag handling to a more standard way of handling tags in the parser. This needs some changes to the PHP parser so that it can produce the wanted output.
 * 2) Add support to Parsoid and Visual Editor so that editing page contents preserves the structures that page translation adds to keep track of the content.
 * 3) Add to Visual Editor some visual aid for marking the parts of the page that can (or cannot) be translated.

This is a difficult project because of the complexity of wikitext parsing and because it spans multiple products: Translate, the MediaWiki core parser, Parsoid and Visual Editor.

Translation of non-prose MediaWiki strings
Magic words, special page aliases and namespaces should be translatable with a web interface to:
 * allow translators to change or update translations easily and quickly, without having to know about order of precedence, allowed characters and so on, while the interface still reports any mistakes;
 * keep translations in a data format which is resilient to mistakes (no fatals due to data errors) and can be easily exported to the repositories (without worrying about removing translations which should be kept for backwards compatibility), such as a JSON format stored in ContentHandler pages on translatewiki.net;
 * ideally, export such updates as part of the usual scripts to follow the usual continuous translation model and reduce breakage.
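To make the "resilient to mistakes" requirement concrete, here is a minimal sketch of a loader that merges translated magic words and collects problems instead of failing. The JSON shape (a "magicwords" object mapping each magic word ID to a list of synonyms), the forbidden character set and the function name are all hypothetical assumptions; the sketch also ignores the case-sensitivity flag that MediaWiki's $magicWords arrays carry.

```python
import json
from typing import Dict, List, Tuple

# Assumption (hypothetical rule): characters a magic word translation must never contain.
FORBIDDEN_CHARS = set("{}[]|<>")


def merge_magic_words(
    text: str, existing: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Merge translated magic words into existing ones; report problems instead of failing."""
    merged = {key: list(words) for key, words in existing.items()}
    problems: List[str] = []
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # A broken page never breaks the export: keep what we already had.
        return merged, [f"invalid JSON: {err}"]
    words = data.get("magicwords") if isinstance(data, dict) else None
    if not isinstance(words, dict):
        return merged, ["missing 'magicwords' object"]
    for key, synonyms in words.items():
        if not isinstance(synonyms, list) or not all(isinstance(s, str) for s in synonyms):
            problems.append(f"{key}: expected a list of strings")
            continue
        good = [s for s in synonyms if s.strip() and not FORBIDDEN_CHARS & set(s)]
        if len(good) < len(synonyms):
            problems.append(f"{key}: dropped {len(synonyms) - len(good)} invalid synonym(s)")
        if not good:
            continue
        # Keep previously exported synonyms after the new ones, for backwards compatibility.
        old = merged.get(key, [])
        merged[key] = good + [s for s in old if s not in good]
    return merged, problems
```

Bad entries are dropped and reported to the translator, while older synonyms are never removed, so the export step does not have to worry about backwards compatibility.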

Activity reporting and engagement
Project administrators/coordinators (project contacts on translatewiki.net) should have a clear sense not only of what work is going on, but also of which translators/languages may need additional effort (or, conversely, are going especially well), so that they can contact translators where needed. This may require detailed reporting, if not an interface to send notifications semi-automatically in certain cases (such as translators whose activity has dropped sharply in a language which needs more translations).
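As an illustration of the semi-automatic case above, the following sketch flags translators whose activity fell sharply in a language that still needs work. The Activity record, the two-period comparison and both thresholds (50% drop, 80% completion) are hypothetical choices, not existing Translate features.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Activity:
    translator: str
    language: str
    previous: int  # translations made in an earlier period (hypothetical metric)
    recent: int  # translations made in the most recent period


def translators_to_contact(
    activities: List[Activity],
    completion: Dict[str, float],
    drop_ratio: float = 0.5,
    needs_work_below: float = 0.8,
) -> List[Tuple[str, str]]:
    """Flag translators whose activity fell sharply in a language that still needs work."""
    flagged = []
    for a in activities:
        if completion.get(a.language, 0.0) >= needs_work_below:
            continue  # the language is already well translated
        if a.previous > 0 and a.recent <= a.previous * drop_ratio:
            flagged.append((a.translator, a.language))
    return flagged
```

The output is only a list of candidates; deciding whether and how to contact them stays with the project contact.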

Thanking translators is still best done manually, but project contacts need to know whom to thank (knowing which new languages were exported may also help them, at times, to adjust their configuration so that those languages are actually used). The ability to easily communicate with "your own" translators can help project administrators build a sense of community and make them feel they're still in charge of the project even though they've merged with a larger wiki/community.

Translators should be able to stay on top of new translation work easily, e.g. by subscribing to feeds and notifications for the projects they are interested in, so that they learn when there are new messages in the source language or requests for translation updates (which no longer trigger edits and hence escape enotifwatchlist). They are also interested in knowing how they rank against others; the tools we currently have for this are a monthly rank on the main page, a contribution count with a Babel template and total "ranks" in the language statistics.

Relevant statistics
Sometimes projects want to know more about the workload for translators and so on. Translate offers a lot of reporting, but one simple feature we're currently lacking is the ability to count translations by number of words rather than by number of messages.
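To show why counting by words can give a different picture than counting by messages, here is a minimal sketch. Treating $1-style parameters and HTML tags as non-words is an assumption, and counting runs of word characters does not work for languages written without spaces (a whole Chinese or Thai phrase counts as one "word"), so a real implementation would need a proper segmenter.

```python
import re
from typing import Dict, Set, Tuple

# Assumption: $1-style parameters and HTML tags are not words to translate.
NON_WORDS = re.compile(r"\$\d+|<[^<>]+>")
WORD = re.compile(r"\w+")


def count_words(message: str) -> int:
    """Count runs of word characters after stripping parameters and tags."""
    return len(WORD.findall(NON_WORDS.sub(" ", message)))


def word_progress(messages: Dict[str, str], translated: Set[str]) -> Tuple[int, int]:
    """Return (translated words, total words), measured in the source messages."""
    total = sum(count_words(text) for text in messages.values())
    done = sum(count_words(text) for key, text in messages.items() if key in translated)
    return done, total
```

With the source messages "Delete $1 pages", "&lt;b&gt;Hello&lt;/b&gt; world" and "Save", translating only "Save" is one message out of three, but just one word out of five.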

A reliable way for system administrators or wiki administrators to force hard updates of statistics and all caches may also be welcome, to easily overcome any problem with caches, the job queue or other components (compare T145295).

See also the "statistics" column at https://phabricator.wikimedia.org/tag/mediawiki-extensions-translate/

Better insertables
Right-to-left (RTL) languages should be supported in variable handling in general, including insertables, tags and syntax.

Insertables should perhaps be easier to control, so that project contacts have more visibility into them without diving into the configuration code?
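One way to give project contacts that visibility could be to keep insertable definitions as a small, readable table of patterns. The sketch below is hypothetical (the pattern names and the configuration shape are not the Translate extension's actual configuration); it extracts the insertables of a source message in the order they appear, which is what a translator would be offered.

```python
import re
from typing import List, Tuple

# Hypothetical, human-readable insertables configuration: one regex per kind.
INSERTABLE_PATTERNS = {
    "parameter": r"\$\d+",
    "printf": r"%(?:\d+\$)?[sd]",
    "html": r"</?[a-zA-Z][^<>]*>",
}


def find_insertables(source: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs for every insertable, in source order."""
    found = []
    for kind, pattern in INSERTABLE_PATTERNS.items():
        for match in re.finditer(pattern, source):
            found.append((match.start(), kind, match.group(0)))
    # Sort by position so the suggestions read in the same order as the message.
    return [(kind, text) for _, kind, text in sorted(found)]
```

For example, "Hello &lt;b&gt;$1&lt;/b&gt;, you have %d messages" yields the tag, the parameter, the closing tag and the printf placeholder, in that order. Direction handling for RTL text is not covered here.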

Librarization of MediaWiki i18n
We should have a reference library which embeds all our learnings and best practices on i18n handling and l10n formats, and promote and use it widely in PHP and JavaScript projects. The library should also try to unify diverse custom formats, such as the date formats of moment.js and others (compare T31235).
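As a small illustration of what unifying formats involves, the sketch below translates a subset of moment.js date tokens into PHP date() format characters. The token subset and the function name are assumptions; unsupported letters raise an error rather than being passed through silently, and moment.js [bracketed] literals become backslash-escaped characters as PHP expects.

```python
from typing import List

# A subset of moment.js tokens mapped to PHP date() format characters.
MOMENT_TO_PHP = {
    "YYYY": "Y", "YY": "y",
    "MM": "m", "M": "n",
    "DD": "d", "D": "j",
    "HH": "H", "H": "G",
    "hh": "h", "h": "g",
    "mm": "i", "ss": "s",
}
# Try longer tokens first so "YYYY" is not read as "YY" twice.
TOKENS = sorted(MOMENT_TO_PHP, key=len, reverse=True)


def moment_to_php(fmt: str) -> str:
    out: List[str] = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "[":  # moment.js literal text
            end = fmt.find("]", i)
            if end == -1:
                raise ValueError("unterminated literal")
            for ch in fmt[i + 1:end]:
                out.append("\\" + ch if ch.isalpha() else ch)  # PHP escapes letters
            i = end + 1
            continue
        for token in TOKENS:
            if fmt.startswith(token, i):
                out.append(MOMENT_TO_PHP[token])
                i += len(token)
                break
        else:
            ch = fmt[i]
            if ch.isalpha():
                raise ValueError(f"unsupported token at position {i}: {ch!r}")
            out.append(ch)  # punctuation and spaces are copied as-is
            i += 1
    return "".join(out)
```

For instance, "YYYY-MM-DD HH:mm:ss" becomes "Y-m-d H:i:s".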

Currently, we have a sort of conflict between our own PHP and JavaScript libraries and even many Wikimedia projects in PHP end up using custom solutions. We don't have recommendations for important languages like Python, which are "stuck" with Gettext (or custom formats like pywikibot?).

Extract our PHP message parsing code to a library
There are many PHP projects that would benefit from a high-quality i18n library. MediaWiki has many excellent features, such as extensive handling of parameters and parameter types. It has some drawbacks, though, such as not being able to support nested constructions. See also https://github.com/Nikerabbit/monkey-i18n
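To give an idea of the kind of parameter handling such a library would expose, here is a minimal sketch of formatting a MediaWiki-style message with $N parameters and {{PLURAL:$N|...}}. It is not MediaWiki's implementation or API: the function names are hypothetical, the default plural rule is English-only (real rules would come from CLDR data) and, like the limitation mentioned above, it does not support nested constructions.

```python
import re
from typing import Callable, Sequence

PARAM = re.compile(r"\$(\d+)")
# No nested braces allowed inside PLURAL, mirroring the nesting limitation.
PLURAL = re.compile(r"\{\{PLURAL:\$(\d+)\|([^{}]*)\}\}")


def english_plural_index(n: int) -> int:
    """Assumed English rule: form 0 for one, form 1 otherwise."""
    return 0 if n == 1 else 1


def format_message(
    template: str,
    params: Sequence[object],
    plural_index: Callable[[int], int] = english_plural_index,
) -> str:
    def plural(match: "re.Match[str]") -> str:
        value = int(params[int(match.group(1)) - 1])
        forms = match.group(2).split("|")
        # Fall back to the last form if fewer forms are given than the rule expects.
        return forms[min(plural_index(value), len(forms) - 1)]

    text = PLURAL.sub(plural, template)  # pick plural forms first
    # Then substitute $N in a single pass, so parameter values are never re-expanded.
    return PARAM.sub(lambda m: str(params[int(m.group(1)) - 1]), text)
```

For example, "$1 deleted {{PLURAL:$2|one page|$2 pages}}." with the parameters "Alice" and 3 gives "Alice deleted 3 pages.".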

At translatewiki.net we have multiple PHP projects. The licence (GPL-2.0+) might be a problem if they want to reuse code from MediaWiki.

File format support
Some more file format work?

Language addition process
Adding a new language on translatewiki.net (Translatewiki.net languages) requires many decisions and checks (e.g. ISO status, names in Wikipedia/CLDR/request, jquery.uls) and changes in various repositories. It's also not clear to translators what the status of their request is, and sometimes data is forgotten. Only core staff can help (in practice just a single person), since full access to configuration and repositories is needed.

We suggest writing good documentation for the process and clear criteria that anyone can apply, leaving only +2 and oversight to administrators. Thanks to more active code review tracking, patches there are slightly less likely to get stuck.

Handle translation of multiple branches
Software translated in translatewiki.net uses only the master branch for import and export. This means that once a stable branch is created, it stops receiving translation updates. It should be possible to translate, import and export multiple branches simultaneously. When translating, messages that are the same across branches should only be translated once.

Branch support has two benefits:
 * 1) software that is branched but not yet released can receive translation updates
 * 2) software that is already released can ship minor updates with the latest translations
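The "translate once" requirement can be sketched by keying translation units on the pair (message key, source text) rather than on the branch. This is an assumed design, not how Translate currently stores messages, and the branch names and messages are hypothetical: a message whose source is unchanged between branches maps to a single unit, while a changed source becomes a separate unit.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Unit = Tuple[str, str]  # (message key, source text)


def translation_units(branches: Dict[str, Dict[str, str]]) -> Dict[Unit, List[str]]:
    """Group identical (key, source) pairs so each is translated only once."""
    units: Dict[Unit, List[str]] = defaultdict(list)
    for branch, messages in sorted(branches.items()):
        for key, source in messages.items():
            units[(key, source)].append(branch)
    return dict(units)


def export_branches(
    branches: Dict[str, Dict[str, str]], translations: Dict[Unit, str]
) -> Dict[str, Dict[str, str]]:
    """Build per-branch exports from the shared translations."""
    return {
        branch: {k: translations[(k, s)] for k, s in messages.items() if (k, s) in translations}
        for branch, messages in branches.items()
    }
```

With this keying, one translation of an unchanged message is exported to every branch that contains it.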

Track exports in translatewiki.net
It is often unclear to people when their translations will appear in the software. With some more integration of repository scripts, it should be possible to record, as metadata on translation revisions, the commits or branches in which they are included. Different kinds of summaries can then be built on this data, such as "these translations of yours are still waiting to be exported".
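A minimal sketch of such metadata and of the "still waiting" summary follows. The revision record, the list of exporting commits and both function names are hypothetical; in practice the repository scripts would call something like record_export after each export commit.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class TranslationRevision:
    rev_id: int
    author: str
    key: str
    # Hypothetical metadata: commits that included this revision.
    exported_in: List[str] = field(default_factory=list)


def record_export(revisions: Iterable[TranslationRevision], rev_ids: Set[int], commit: str) -> None:
    """Mark the given revisions as included in an export commit."""
    for rev in revisions:
        if rev.rev_id in rev_ids and commit not in rev.exported_in:
            rev.exported_in.append(commit)


def pending_exports(revisions: Iterable[TranslationRevision], author: str) -> List[str]:
    """Message keys of this author's translations that no export has included yet."""
    return [r.key for r in revisions if r.author == author and not r.exported_in]
```

Summaries by branch or by release could be built the same way if the metadata also records the branch of each commit.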

As an extension, we could try to hook into the Wikimedia LocalisationUpdate process and the release processes of different projects to also record when translations are deployed. This is likely much more complicated.

Move export thresholds to message groups
It would be helpful to alert users when translations are not being exported because they do not meet the export threshold. This information should be accessible to the Translate extension, but it is currently specified in the repository management configuration. If it were moved to the message group configuration, we would avoid duplication and simplify repository management for exports.
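Once the threshold lives in the message group configuration, generating such alerts becomes straightforward. In this sketch the MessageGroup fields, the group ID and the threshold semantics (the minimum fraction of messages translated) are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MessageGroup:
    group_id: str
    export_threshold: float  # hypothetical: minimum fraction translated before export
    total: int  # number of messages in the group


def below_threshold(group: MessageGroup, translated_counts: Dict[str, int]) -> List[str]:
    """Warn about languages whose translations will not be exported yet."""
    warnings = []
    for lang, count in sorted(translated_counts.items()):
        ratio = count / group.total if group.total else 0.0
        if ratio < group.export_threshold:
            warnings.append(
                f"{group.group_id}/{lang}: {ratio:.0%} translated, "
                f"needs {group.export_threshold:.0%} to be exported"
            )
    return warnings
```

The same configuration value could then be read by the export scripts, so the threshold is defined in only one place.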