Help:Extension:Translate/Components

The Translate extension can be extended in many ways. The most common ways to extend Translate are to add support for new file formats (link to section) or new message groups (link to section). Sometimes it is also useful to write new message checkers (link to section) or to extend Translate via hooks (link to section). In some cases, the existing WebAPI alone may be enough.

In addition to the concepts already mentioned, Translate has many other important concepts and classes that are useful to understand when hacking on Translate. This page aims to document all components of Translate comprehensively.

WebAPI

 * In-depth documentation at API

In addition to hooks and interfaces that can only be used from PHP code, the WebAPI provides access to much of the information and many of the actions related to message groups and translation. It is based on the MediaWiki API framework and supports many output formats, such as JSON and XML.
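
Because the WebAPI is an ordinary MediaWiki API, any HTTP client can use it. The Python sketch below only assembles a request URL; it assumes the messagegroups query module (meta=messagegroups) described in the in-depth API documentation, and the endpoint URL is a placeholder. Check module names and parameters against that documentation before relying on them.

```python
from urllib.parse import urlencode


def build_api_url(api_endpoint: str, params: dict) -> str:
    """Build a GET URL for a MediaWiki API request.

    format=json and formatversion=2 are standard MediaWiki API parameters;
    the module-specific parameters are supplied by the caller.
    """
    query = {"format": "json", "formatversion": "2", **params}
    return api_endpoint + "?" + urlencode(query)


# Placeholder endpoint: replace with the api.php URL of your wiki.
url = build_api_url(
    "https://example.org/w/api.php",
    {"action": "query", "meta": "messagegroups"},
)
print(url)
# https://example.org/w/api.php?format=json&formatversion=2&action=query&meta=messagegroups
```

Fetching and decoding the response (for example with urllib.request and json) is left out so the sketch does not depend on a live wiki.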

File format support (FFS)

 * In-depth documentation at X
 * List of supported file formats at X
 * Tutorial for writing FFS classes at X

The Translate extension supports translating non-wiki content, such as software interface messages, via File format support (FFS) classes. These classes implement the FFS interface and abstract away parsing and generating file contents. The FFS classes are used by the FileBasedMessageGroup class, which is configured via the YAML configuration files.
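
To illustrate the division of labour, here is a Python analogue of an FFS class. The names FileFormatSupport, KeyValueFFS, read and write are hypothetical and do not mirror the actual PHP interface; the point is that the message group only sees a mapping of message keys to strings, while the FFS class hides the file syntax (here a toy "key = value" format).

```python
from abc import ABC, abstractmethod


class FileFormatSupport(ABC):
    """Hypothetical FFS-like interface: parse file contents into
    messages and generate file contents from messages."""

    @abstractmethod
    def read(self, contents: str) -> dict[str, str]:
        ...

    @abstractmethod
    def write(self, messages: dict[str, str]) -> str:
        ...


class KeyValueFFS(FileFormatSupport):
    """Toy format: one 'key = value' pair per line; '#' starts a comment."""

    def read(self, contents: str) -> dict[str, str]:
        messages = {}
        for line in contents.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if sep:  # ignore lines without a separator
                messages[key.strip()] = value.strip()
        return messages

    def write(self, messages: dict[str, str]) -> str:
        return "".join(f"{key} = {value}\n" for key, value in messages.items())


ffs = KeyValueFFS()
parsed = ffs.read("# comment\ngreeting = Hello\nfarewell = Goodbye\n")
print(parsed)  # {'greeting': 'Hello', 'farewell': 'Goodbye'}
print(ffs.write(parsed) == "greeting = Hello\nfarewell = Goodbye\n")  # True
```

Keeping parsing and generation in one class makes round trips (import, translate, export) easy to test in isolation from the rest of the group code.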

Message groups
Message groups bring together a collection of messages. Each group has a description and determines the namespace where its messages are stored, how they can be exported (usually via FFS), how the definitions are loaded (for example from wiki pages or via FFS), and more.

Message group hierarchy
The two base classes are MessageGroupBase ! and MessageGroupOld !. The message group classes include:
 * AggregateMessageGroup
 * FileBasedMessageGroup
 * SingleFileBasedMessageGroup
 * CoreMessageGroup
 * CoreMostUsedMessageGroup
 * ExtensionMessageGroup
 * AliasMessageGroup
 * WikiMessageGroup
 * RecentMessageGroup
 * SvgMessageGroup !!
 * WikiPageMessageGroup !!!
 * WorkflowStatesMessageGroup

! MessageGroupBase is the more recent base class. Groups extending it are usually defined via the YAML configuration, while those extending MessageGroupOld are defined in other ways. The Core and Extension message groups are for MediaWiki and are waiting to be migrated to FileBasedMessageGroup.

!! Provided by the TranslateSvg extension.

!!! Provides the page translation feature.

Main functions
Basic info. Each message group has an id that is unique within the wiki, a label (short name), and a short description that includes important information such as licensing and a link to a page with more information.

Namespace. All message definitions and translations are placed in a namespace given by the message group. File-based message groups allow users to set the namespace, while others are hard-coded to NS_TRANSLATIONS (page translation) or NS_MEDIAWIKI.

Checkers. Message groups can provide a list of message checkers that warn the translator about potential problems. Common checks look for missing or unknown variables and invalid HTML.
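
The idea behind a missing/unknown variable check fits in a few lines. This Python sketch is not one of Translate's PHP checkers: it assumes MediaWiki-style numbered parameters ($1, $2, ...), whereas other groups use other variable syntaxes, and the function name and warning texts are hypothetical.

```python
import re

# Assumption: variables look like MediaWiki numbered parameters ($1, $2, ...).
PARAM_RE = re.compile(r"\$\d+")


def check_variables(definition: str, translation: str) -> list[str]:
    """Return warnings for variables that are missing from the translation
    or that the translation uses but the definition does not."""
    expected = set(PARAM_RE.findall(definition))
    found = set(PARAM_RE.findall(translation))
    warnings = []
    for name in sorted(expected - found):
        warnings.append(f"missing variable {name}")
    for name in sorted(found - expected):
        warnings.append(f"unknown variable {name}")
    return warnings


print(check_variables("Hello $1, you have $2 new messages",
                      "Hallo $1, $3 neue Nachrichten"))
# ['missing variable $2', 'unknown variable $3']
```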

Mangler. Each group has a message key mangler. It is mostly relevant to file-based message groups, where the format of message keys cannot be enforced. See mangler section below.
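
As an illustration of why a mangler is needed, keys taken from external files may contain characters that are not allowed in wiki page titles (such as # [ ] { } | < >), or may clash with keys of other groups. The Python sketch below is a hypothetical prefix-and-escape mangler; the class name and the "=XX" escaping scheme are assumptions and not necessarily what Translate does. The escape character itself is also escaped so that unmangling is unambiguous.

```python
import re

# Characters this toy mangler escapes; "=" is the escape character itself,
# so it must be escaped too for unmangling to be unambiguous.
ESCAPED = set("#[]{}|<>=")
ESCAPE_RE = re.compile(r"=([0-9A-F]{2})")


class PrefixMangler:
    """Hypothetical mangler: adds a group prefix and escapes characters
    that are not allowed in wiki page titles."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def mangle(self, key: str) -> str:
        escaped = "".join(f"={ord(c):02X}" if c in ESCAPED else c for c in key)
        return self.prefix + escaped

    def unmangle(self, key: str) -> str:
        if not key.startswith(self.prefix):
            raise ValueError(f"key {key!r} lacks prefix {self.prefix!r}")
        body = key[len(self.prefix):]
        return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), body)


m = PrefixMangler("myapp-")
print(m.mangle("menu[file]"))              # myapp-menu=5Bfile=5D
print(m.unmangle(m.mangle("menu[file]")))  # menu[file]
```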

Message loading. There are several methods for loading message definitions. The load method usually reads the messages from files via an FFS class or from a database table. getDefinitions is a shortcut for loading the messages in the source language (see below).

Source language. All of the message definitions are expected to be in a single language. Only dynamic message groups like RecentMessageGroup can work around this limitation.

Tags. Messages can have tags. The ignored and optional tags are commonly used. In addition, the message collection provides its own tags.

Workflow states. Message groups can override the global workflow states.

Languages. By default, messages can be translated into all known languages. A message group can restrict translation to a subset of languages, which disables translation into all other languages.
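
Several of these functions can be tied together in a much simplified Python model of a message group. This is a hypothetical sketch, not the PHP MessageGroup interface: the class and field names are assumptions, messages are held in memory instead of being read via FFS or from the database, and checkers, manglers, tags and workflow states are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimpleMessageGroup:
    """Hypothetical, much simplified model of a message group."""

    group_id: str         # unique within the wiki
    label: str            # short name
    description: str      # licensing, link to more information, ...
    namespace: str        # where definitions and translations are stored
    source_language: str  # language of all message definitions
    messages: dict = field(default_factory=dict)  # language -> {key: text}
    languages: Optional[set] = None  # None means "all known languages"

    def load(self, language: str) -> dict:
        """Return the messages of one language (from memory here; a real
        group might read files via an FFS class or query a database)."""
        return dict(self.messages.get(language, {}))

    def get_definitions(self) -> dict:
        """Shortcut: load the messages in the source language."""
        return self.load(self.source_language)

    def translatable_languages(self, known: set) -> set:
        """All known languages, or only the configured subset of them."""
        if self.languages is None:
            return set(known)
        return known & self.languages


group = SimpleMessageGroup(
    group_id="demo", label="Demo", description="Example group",
    namespace="NS_DEMO", source_language="en",
    messages={"en": {"hello": "Hello"}}, languages={"de", "fi"},
)
print(group.get_definitions())                                  # {'hello': 'Hello'}
print(sorted(group.translatable_languages({"de", "fi", "fr"})))  # ['de', 'fi']
```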

Font finder
Rendering bitmap graphics requires a font that is suitable for the language in question. The FCFontFinder class was written to solve this problem. It calls the  command from the fontconfig package (so it does not work on Windows) to find a suitable font. For this to be useful, many additional fonts should be installed on the server. It can return either the path to the font file or the name of the font, whichever is more suitable.

Message group cache
The messages of file-based message groups are stored in CDB files. Each language of each group has its own cache CDB file. There are two reasons for the cache files.

First, they provide constant and efficient access to message data and avoid the potentially expensive parsing of files in various places. For example, the list of message keys for each group can be loaded efficiently when rebuilding the message index.

The second reason is that the cache files are used together with the translations in the wiki to process external message changes. Having a snapshot of the state of translations in both the files and the wiki (hopefully consistent at that point) allows us to automatically deduce whether something has changed in the wiki or externally and to make intelligent choices, leaving only real conflicts (messages changed both externally and in the wiki since the last snapshot) to be resolved by the translation administrator.
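
The reasoning described above is essentially a three-way comparison for each message, with the cache snapshot as the common ancestor. The Python function below is our own illustration of that reasoning, not Translate's actual external-change processing code; the function name and result labels are hypothetical, and None stands for a message that is absent in that place.

```python
from typing import Optional


def classify_change(snapshot: Optional[str], file_value: Optional[str],
                    wiki_value: Optional[str]) -> str:
    """Compare one message across the last snapshot (cache), the external
    file and the wiki."""
    file_changed = file_value != snapshot
    wiki_changed = wiki_value != snapshot
    if not file_changed and not wiki_changed:
        return "unchanged"
    if file_changed and not wiki_changed:
        return "external change"  # safe to import from the file
    if wiki_changed and not file_changed:
        return "wiki change"      # keep the wiki version
    if file_value == wiki_value:
        return "same change"      # both sides agree, nothing to resolve
    return "conflict"             # left to the translation administrator


print(classify_change("Save", "Save all", "Save"))   # external change
print(classify_change("Save", "Store", "Keep"))      # conflict
```

Only the last case needs a human decision, which is why a snapshot taken while files and wiki were consistent is so valuable.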

Message index
The message index is a reverse map of all known messages. It efficiently answers two questions: is this a known message, and which groups does it belong to? It needs to be fast for both single and multiple message key lookups. Several backends are implemented, each with different trade-offs.
 * A serialized file is fast to parse, but does not provide random access and becomes very memory-inefficient as the number of keys grows.
 * A CDB file takes more disk space but provides random access and reasonably fast lookups; loading everything into memory is slower, though.
 * The database backend provides efficient random access and full loads at the expense of slightly slower individual lookups. It also does not need to write any files, which avoids file permission problems.
 * A memory backend (memcached, APC) is also provided, which can be a useful alternative to the database backend in multi-server setups to reduce database contention.
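
Independently of the backend, the data structure is a map from message key to group ids. The Python sketch below builds it in memory with a full, non-incremental rebuild and supports single and batched lookups; the function names and the plain-key format are assumptions, and it does not model the storage trade-offs of the real backends.

```python
from collections import defaultdict


def build_message_index(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Full rebuild: map each message key to the ids of all groups
    that contain it."""
    index = defaultdict(list)
    for group_id, keys in groups.items():
        for key in keys:
            index[key].append(group_id)
    return dict(index)


def lookup(index: dict, key: str) -> list[str]:
    """Single lookup: groups of a key, or [] if the message is unknown."""
    return index.get(key, [])


def lookup_many(index: dict, keys: list[str]) -> dict[str, list[str]]:
    """Multiple lookups in one call, as a backend might batch them."""
    return {key: index.get(key, []) for key in keys}


index = build_message_index({
    "core": ["mainpage", "edit"],
    "ext-foo": ["foo-title", "edit"],
})
print(lookup(index, "edit"))     # ['core', 'ext-foo']
print(lookup(index, "missing"))  # []
```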

The message index does not support incremental rebuilds, so rebuilding it becomes increasingly resource-intensive as the number of message groups and message keys grows. Depending on the message group, a rebuild might involve parsing files or doing database queries and loading the definitions, which can take a lot of memory. Message index rebuilds are triggered in various places in Translate, and by default they are executed immediately during the request. When rebuilding becomes too slow, it can be deferred via the MediaWiki job queue and run outside web requests.
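
The choice between running the rebuild immediately and deferring it can be pictured with a simple queue. This Python sketch is a generic illustration with hypothetical function names; it is not the MediaWiki job queue API, which has its own job classes and runners.

```python
from collections import deque
from typing import Callable


def request_rebuild(rebuild: Callable[[], None], defer: bool,
                    job_queue: deque) -> None:
    """Run the index rebuild now, or push it to a queue for later."""
    if defer:
        job_queue.append(rebuild)  # processed later, outside the web request
    else:
        rebuild()                  # executed immediately during the request


def run_jobs(job_queue: deque) -> int:
    """Drain the queue, e.g. from a background runner; returns jobs run."""
    count = 0
    while job_queue:
        job_queue.popleft()()
        count += 1
    return count


calls = []
queue = deque()
request_rebuild(lambda: calls.append("immediate"), defer=False, job_queue=queue)
request_rebuild(lambda: calls.append("deferred"), defer=True, job_queue=queue)
print(calls)            # ['immediate']
print(run_jobs(queue))  # 1
print(calls)            # ['immediate', 'deferred']
```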

Ttmserver (translation memory)

 * In-depth documentation at Translation memories.
 * Blog post on the history of this feature at Niklas' blog.

Ttmserver is the name of the translation memory interface. It supports multiple backends for inserting and querying translation suggestions. The code is located under the  directory.
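
The insert/query contract of a translation memory backend can be sketched as follows. This Python class is hypothetical: it stores pairs in memory and scores matches with difflib.SequenceMatcher from the standard library, which is a stand-in and does not reflect how any actual Ttmserver backend stores or scores suggestions. The 0.75 threshold is likewise an arbitrary assumption.

```python
from difflib import SequenceMatcher


class InMemoryTranslationMemory:
    """Hypothetical translation memory backend: stores (source, translation)
    pairs per language and returns fuzzy matches above a threshold."""

    def __init__(self):
        self.entries = {}  # language -> list of (source, translation)

    def insert(self, language: str, source: str, translation: str) -> None:
        self.entries.setdefault(language, []).append((source, translation))

    def query(self, language: str, text: str, threshold: float = 0.75):
        """Return (score, source, translation) tuples, best match first."""
        suggestions = []
        for source, translation in self.entries.get(language, []):
            score = SequenceMatcher(None, text, source).ratio()
            if score >= threshold:
                suggestions.append((score, source, translation))
        suggestions.sort(key=lambda s: s[0], reverse=True)
        return suggestions


tm = InMemoryTranslationMemory()
tm.insert("fi", "Save changes", "Tallenna muutokset")
tm.insert("fi", "Delete page", "Poista sivu")
for score, source, translation in tm.query("fi", "Save change"):
    print(f"{score:.2f} {source} -> {translation}")
# 0.96 Save changes -> Tallenna muutokset
```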

Repository layout
Files in the root of the repository include:
 * Standard MediaWiki extension files like, translations, and some documentation files like   and  , the latter of which includes change notes.
 * Major Translate classes like  and  , and some miscellaneous utilities not yet moved under utils.

The rest of the code is in subdirectories. Each major part has its own subdirectory:
 * for WebAPI code
 * for file format support code
 * for message groups
 * for command line scripts
 * for page translation code
 * for translation memory code
 * for all special pages
 * for all PHP unit tests

Most other code is under utils. There are also some folders for non-code content:
 * for miscellaneous data files
 * for bundled library dependencies
 * for all css, scripts and images
 * for all SQL table definitions