Translatable modules/Technical implementation

From mediawiki.org

This page proposes a technical implementation plan for translatable modules, based on the previously proposed solutions and on engineering considerations. The author(s) believe this is the best solution, but feedback is sought on that as well as on the open questions. This solution most closely resembles "JSON .tab file in the Data namespace on Commons" from the already proposed solutions, but it goes deeper and makes some changes, such as suggesting that translations be stored by default on subpage(s) of modules instead of in the Data namespace.

Resourcing

Affected components: Translate extension, Scribunto extension, On-wiki modules

Initial implementation: Language team

Code steward: Language team

Stakeholders: See Translatable modules/Stakeholders

Motivation

For the motivation, see Translatable modules.

For the requirements, see Translatable modules/Principles.

Proposed solution: Message bundles on module subpage(s)

The Translate extension will provide a “message bundle” content handler. This handler will enforce a JSON file format that is a subset of the banana message format. For the initial implementation, there are strict restrictions on message key length and format. At a later stage, those restrictions could be relaxed by using Translate’s StringMangler, which encodes special characters so that they are safe to use in MediaWiki page names.

Examples of restrictions:

  • Maximum key length:
    • Initial: 50 bytes
    • Relaxed: 255 bytes minus the length of the page name minus 35 bytes (for the language code suffix and delimiters)
  • Key format:
    • Initial: Allowed characters in keys: [a-zA-Z0-9-_]
    • Relaxed: Disallowed characters: /, space, <, >, ", '
  • Translations cannot contain keys not present in the source language
  • Translations must be strings
  • Page must be valid JSON
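As a rough illustration, the restrictions listed above could be checked by a validator along the following lines. This is a minimal Python sketch, not Translate's implementation: the function names are hypothetical, a translation page is assumed to be validated against the set of source-language keys, and skipping banana's "@metadata" entry is an assumption about what the subset allows.

```python
from __future__ import annotations

import json
import re

# Initial restrictions from the proposal.
INITIAL_MAX_KEY_BYTES = 50
INITIAL_KEY_RE = re.compile(r"[a-zA-Z0-9_-]+")


def relaxed_max_key_bytes(page_name: str) -> int:
    """Relaxed limit: 255 bytes minus the page name length minus 35 bytes
    reserved for the language code suffix and delimiters."""
    return 255 - len(page_name.encode("utf-8")) - 35


def validate_page(text: str, source_keys: set[str] | None = None) -> list[str]:
    """Return a list of errors; an empty list means the page is valid.

    Hypothetical helper: pass source_keys=None for the source-language page,
    or the set of source keys when validating a translation.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    errors: list[str] = []
    for key, value in data.items():
        if key == "@metadata":
            # Assumption: banana's metadata block is allowed and not validated here.
            continue
        if len(key.encode("utf-8")) > INITIAL_MAX_KEY_BYTES:
            errors.append(f"key too long: {key}")
        if not INITIAL_KEY_RE.fullmatch(key):
            errors.append(f"invalid characters in key: {key}")
        if not isinstance(value, str):
            errors.append(f"value of {key} must be a string")
        if source_keys is not None and key not in source_keys:
            errors.append(f"key not in source language: {key}")
    return errors
```

Returning all errors at once, rather than stopping at the first one, would suit the proposal's idea of showing warnings that explain how a page is invalid.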

Open question: all translations on one page, or translations on language subpages

All on one page:

  • Easier to do mass changes and changes across languages (depending on the editor we provide)
  • Pages can become huge, with a long history (not yet determined whether this will be an issue)
  • Can add a message and its documentation in one go
  • Increased JSON encoding/decoding costs (not yet determined whether this will be an issue)

Subpages:

  • Can watch changes to one language only
  • Easier to prevent direct editing of translations (if wanted, probably not necessary)
  • Fewer edit conflicts

Open question: extend the format

The current banana format has some known drawbacks:

  • message documentation is separated from message content
  • no support for tags such as optional or outdated

The latter could be implemented the way Translate does it internally:

{
    "key": "!!FUZZY!!Outdated translation"
}

The !!FUZZY!! prefix is stripped on read, and the outdated status is kept separately.
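A minimal Python sketch of that read/write convention follows. The function names are hypothetical and this is not Translate's own code; it only shows that the marker can be stripped on read while the outdated flag travels alongside the text.

```python
from __future__ import annotations

FUZZY_PREFIX = "!!FUZZY!!"


def read_translation(raw: str) -> tuple[str, bool]:
    """Strip the marker on read; return (text, is_outdated)."""
    if raw.startswith(FUZZY_PREFIX):
        return raw[len(FUZZY_PREFIX):], True
    return raw, False


def write_translation(text: str, outdated: bool) -> str:
    """Re-add the marker when storing an outdated translation."""
    return FUZZY_PREFIX + text if outdated else text
```

For example, read_translation("!!FUZZY!!Outdated translation") yields the text "Outdated translation" together with the flag True.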

Alternatively, both issues could be addressed with an extended format such as:

{
    "en": {
        "key": {
            "value": "This is a message",
            "doc": "This is its documentation",
            "tags": [ "optional" ]
        },
        "another-key": "Short format is still supported for convenience"
    },
    "fi": {
        "key": {
            "value": "Outdated translation",
            "tags": [ "outdated" ]
        }
    }
}
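Because the extended format allows both a short form ("key": "text") and a long form with value, doc and tags, a reader would likely normalize every entry into one shape first. The following is a minimal Python sketch under that assumption; the Message type and normalize function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    value: str
    doc: str | None = None
    tags: list[str] = field(default_factory=list)


def normalize(bundle: dict) -> dict[str, dict[str, Message]]:
    """Expand both the short form ("key": "text") and the long form
    ("key": {"value": ..., "doc": ..., "tags": [...]}) into Message objects."""
    result: dict[str, dict[str, Message]] = {}
    for lang, messages in bundle.items():
        result[lang] = {}
        for key, entry in messages.items():
            if isinstance(entry, str):
                result[lang][key] = Message(value=entry)
            else:
                result[lang][key] = Message(
                    value=entry["value"],
                    doc=entry.get("doc"),
                    tags=list(entry.get("tags", [])),
                )
    return result
```

With the example above, the Finnish "key" entry comes out with the tag "outdated", and "another-key" becomes a Message with no documentation and no tags.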

Location of message bundles on the wiki

The content handler gives full freedom to place message bundles wherever it makes sense on the wiki. For module-specific messages, we propose a naming pattern such as Module:Name/translations.json, for standardization and automatic setup. An equivalent convention for gadgets would be easy to add in the future. Shared message bundles can be placed, for example, in the Data namespace on Commons.

Workflow

All such pages are automatically available through Special:Translate. Unlike with translatable pages, there is no separate confirmation step restricted to translation admins. To combat vandalism, it is suggested that at least direct editing of message bundle pages be restricted to trusted users, using existing mechanisms.

The source language of the bundle is the page language. When a message bundle is updated, the following happens.

If the page is valid:

  • an entry is added to Translate’s revtag table (so that Translate knows which message bundles are available)
  • a diff is computed against the previous version to identify new, changed, and deleted definitions and translations. Jobs are created as follows:
    • for new and changed definitions, corresponding pages in the Translations namespace are created or updated
    • for changed definitions, corresponding translations are marked as fuzzy
    • for new and changed translations, corresponding pages in the Translations namespace are created or updated
    • for removed definitions and translations, corresponding pages in the Translations namespace are deleted
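The diff-and-jobs step above could be sketched roughly as follows. This is a minimal Python model, assuming the bundle is held as {language code: {key: text}}; the job names ("update-page", "delete-page", "mark-fuzzy") and function names are hypothetical and do not correspond to actual Translate job classes.

```python
from __future__ import annotations


def diff_messages(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify keys as added, changed or removed between two revisions."""
    return {
        "added": sorted(k for k in new if k not in old),
        "changed": sorted(k for k in new if k in old and old[k] != new[k]),
        "removed": sorted(k for k in old if k not in new),
    }


def plan_jobs(
    old: dict[str, dict[str, str]],
    new: dict[str, dict[str, str]],
    source_lang: str,
) -> list[tuple[str, str, str]]:
    """Return hypothetical (action, language, key) jobs for a bundle update."""
    jobs: list[tuple[str, str, str]] = []
    for lang in sorted(set(old) | set(new)):
        diff = diff_messages(old.get(lang, {}), new.get(lang, {}))
        # New and changed definitions/translations: create or update pages.
        for key in diff["added"] + diff["changed"]:
            jobs.append(("update-page", lang, key))
        # Removed definitions/translations: delete pages.
        for key in diff["removed"]:
            jobs.append(("delete-page", lang, key))

    # Changed definitions: mark existing translations of those keys as fuzzy.
    source_diff = diff_messages(old.get(source_lang, {}), new.get(source_lang, {}))
    for lang in sorted(new):
        if lang == source_lang:
            continue
        for key in source_diff["changed"]:
            if key in new[lang]:
                jobs.append(("mark-fuzzy", lang, key))
    return jobs
```

Treating definitions and translations uniformly in the first loop keeps the logic small; only the fuzzy marking needs to know which language is the source.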

If the page is invalid:

  • the bundle is removed from the revtag table and is not available for translation
  • pages in the Translations namespace are not changed, but they become uneditable
  • warnings are shown on the bundle page explaining how it is invalid
  • the module stops working or only produces “placeholder message translations” like qqx

In most cases, hooks should prevent bundle pages from becoming invalid in the first place. Only “valid” messages in the Translations namespace can be edited. When such a page is created, updated, or deleted, it spawns a job to update the corresponding message bundle page.

Essentially, there is a two-way sync between the bundle page(s) and the individual translation pages.
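The reverse direction of this sync, from a single translation page back to the bundle, might look roughly like the following minimal Python sketch. It assumes the same {language code: {key: text}} model; apply_page_change is a hypothetical name, and a real implementation would run as a job and save a new revision of the bundle page.

```python
from __future__ import annotations

import copy


def apply_page_change(
    bundle: dict[str, dict[str, str]],
    source_lang: str,
    lang: str,
    key: str,
    new_text: str | None,
) -> dict[str, dict[str, str]]:
    """Return an updated copy of the bundle after the Translations-namespace
    page for (lang, key) was created/updated (new_text is a string) or
    deleted (new_text is None)."""
    if key not in bundle.get(source_lang, {}):
        # Only "valid" messages, i.e. keys defined in the source language, can be edited.
        raise ValueError(f"{key!r} is not defined in the source language")
    updated = copy.deepcopy(bundle)
    messages = updated.setdefault(lang, {})
    if new_text is None:
        messages.pop(key, None)
        if not messages:
            del updated[lang]
    else:
        messages[key] = new_text
    return updated
```

Since bundle edits also trigger jobs that update translation pages, such a sync would presumably need a way to recognize its own edits so that the two directions do not keep re-triggering each other.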

Open question: Renaming message bundles

Do we want to support renaming message bundles? That could require moving thousands of pages (as with translatable pages, where the process has proven to be very fragile). For the initial implementation, we could disallow moving these pages. A workaround would be to copy the message bundle contents manually. This would lose the history, and also the fuzzy status unless that status is reflected in the bundle page itself (see the open question above about the format).

Use

Messages could be used through a new mw.messagebundle Lua interface. mw.messagebundle.new( pagename ) would create a new bundle instance. pagename would default to the current module or page + /messages.json (whatever is feasible and whatever the naming convention turns out to be). The instance would have a method get( key, params... ) that creates a new mw.message instance using mw.message.newRawMessage.
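To make the intended lookup concrete, here is a hypothetical Python model of a bundle instance; it is not the proposed Lua interface itself. Assumptions: the language is passed explicitly (the Lua get would take only key and params), a missing translation falls back to the source language, the default suffix is /messages.json even though the convention is unsettled, and only $1, $2, ... are substituted, whereas the real method would return an mw.message object that handles full message syntax.

```python
from __future__ import annotations

import re


class MessageBundle:
    """Hypothetical Python model of a proposed mw.messagebundle instance."""

    def __init__(self, messages: dict[str, dict[str, str]], source_lang: str):
        self.messages = messages  # language code -> {key: text}
        self.source_lang = source_lang

    @staticmethod
    def default_page_name(current_page: str) -> str:
        # Assumption: the proposal has not settled on a suffix.
        return current_page + "/messages.json"

    def get(self, key: str, lang: str, *params: str) -> str:
        # Assumption: fall back to the source language if no translation exists.
        text = self.messages.get(lang, {}).get(key)
        if text is None:
            text = self.messages[self.source_lang][key]

        # Simplified stand-in for mw.message parameter handling:
        # replace $1, $2, ... and leave unknown placeholders untouched.
        def substitute(match: re.Match) -> str:
            index = int(match.group(1))
            return params[index - 1] if 1 <= index <= len(params) else match.group(0)

        return re.sub(r"\$(\d+)", substitute, text)
```

For example, with an English "Hello, $1!" and a Finnish "Hei, $1!", get("hello", "fi", "Maija") gives "Hei, Maija!", and a request for a language without a translation falls back to the English text.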

In the future, this could incorporate local wiki overrides in one of two ways:

  • In the get method, check whether a prefix + key message exists in the MediaWiki namespace.
  • In the message bundle constructor, check for presence of override file. This depends on how global gadgets show up in a remote wiki.
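Both options come down to a lookup order in which a local message wins over the bundle's own text. The following minimal Python sketch shows one function per option; all names are hypothetical, the MediaWiki namespace is modelled as a plain dictionary, and ignoring override keys that do not exist in the base bundle is an assumption.

```python
from __future__ import annotations


def get_with_local_override(
    key: str,
    bundle_messages: dict[str, str],
    mediawiki_namespace: dict[str, str],
    prefix: str,
) -> str:
    """Option 1: prefer a local MediaWiki:<prefix><key> message if one exists.

    mediawiki_namespace maps page titles without the "MediaWiki:" namespace
    prefix to their text (hypothetical stand-in for a page lookup).
    """
    local = mediawiki_namespace.get(prefix + key)
    return local if local is not None else bundle_messages[key]


def merge_overrides(base: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:
    """Option 2: merge an override file into the base messages once, in the
    bundle constructor. Keys absent from the base are ignored (assumption)."""
    merged = dict(base)
    for key, text in overrides.items():
        if key in base:
            merged[key] = text
    return merged
```

Option 1 costs a lookup per message, while option 2 pays the cost once per bundle but depends on knowing where the override file lives.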

Evaluation against requirements

Translation experience must be similar to that of core, extensions, and pages: By using the Translate extension, translators get essentially the same translation experience regardless of what kind of content they translate. The Translate extension can be, and is, used by many third-party wikis.

All multilingual wikis: Translate extension is already installed on most if not all multilingual wikis. There are no known blockers for expanding the use of Translate extension to new wikis.

Transition goal: Migration to the new format is straightforward: it is easy to produce a JSON file in the correct format that can be copy-pasted to a suitable page. Migrating to the new helpers will require more effort.

Standardization: This will be implemented in the Translate and Scribunto extensions. No local modules will be needed.

Wiki principles: This approach embraces the “everything is a page” philosophy. There are no inherent restrictions on editing these pages, other than following the required format. This may be difficult for non-technical users, but most of them will be translators using Special:Translate rather than editing these pages directly. Module developers will need to edit these files directly. They will get help from validation and syntax highlighting, and potentially from a form-based editor in the future that avoids incorrect syntax.

History is easily available for both the bundle and individual translations.

Finding messages to translate: This is similar to translatable pages: people can send links, and there is search and statistics. Scalability issues with regard to the number of message groups on the statistics page and in the message group selector are anticipated and will be addressed when they become apparent.

Performance and updating: Performance concerns with this solution:

  • parsing big JSON blobs: likely not an issue
  • growth in the number of message groups: likely to become an issue, though not in the near future

This proposal does not cover whether there would be any kind of caching on top of message bundles. However, it does enable near real-time updating, unlike solutions that would involve translatewiki.net.

Editing raw messages: Initially, raw editing is possible by editing the JSON directly. In the future it is possible to implement a nicer editor for it.

Local customization of messages: Not solved by the current proposal. In theory, it is easy to “merge” overrides into the base translations. The difficult part is deciding where those overrides should be placed so that modules would automatically pick them up. This topic only becomes relevant once implementation of global modules begins.

Message syntax and parameters: This will be handled by the wikitext parser.

Translation memory: Translation memory will be usable in this proposal, and shared across all public Wikimedia wikis.

Sharing messages across modules: Message bundles will be identified either by their page name or by an ID (in order to support local customization). There are no restrictions on which bundles a module can request, but global modules will likely need to declare them somehow.

Importing and exporting: I did not find any code in TemplateStyles that would do the described behavior.

Left for future considerations

As a summary:

  • Potential performance bottlenecks with regard to the number of message groups
  • Implementing a nicer editor for message bundle pages
  • Support for local customizations when support for global modules is added
  • Expanding support for gadgets
  • Using an MCR (Multi-Content Revisions) slot so that messages can be updated in the same edit as the module itself