Help:Extension:Translate/Components

From mediawiki.org

The Translate extension is extensible in many ways. The most common ways to extend Translate are to add support for new file formats (see File format support) or new message groups (see Message groups). Sometimes it is also useful to write new message checkers (see Message checkers) or to extend Translate via hooks. Sometimes you can get by using only the existing web API.

In addition to the concepts already mentioned, there are many more important concepts and classes in Translate that are useful to understand when hacking Translate. This page aims to comprehensively detail all components of Translate.

Primary extensible components

WebAPI

  • In-depth documentation about API

In addition to hooks and interfaces that can only be used from PHP code, the WebAPI provides access to much of the information and many of the actions related to message groups and translating. It is based on the MediaWiki API framework, which supports many output formats such as JSON and XML.
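As a minimal sketch of what a WebAPI request looks like, the following builds a query URL that asks for the list of message groups as JSON. The wiki URL is a placeholder, and the module name messagegroups is an assumption that should be checked against the API help pages of your own wiki.

```php
<?php
// Hypothetical wiki endpoint; replace with the api.php URL of your own wiki.
$endpoint = 'https://wiki.example.org/w/api.php';

// Standard MediaWiki API parameters. The module name "messagegroups" is an
// assumption to verify against the API help of your installation.
$params = [
	'action' => 'query',
	'meta' => 'messagegroups',
	'format' => 'json',
];

// Build the query string with an explicit "&" separator.
$url = $endpoint . '?' . http_build_query( $params, '', '&' );
echo $url . "\n";
```

The resulting URL can be opened in a browser or fetched with any HTTP client. Changing the format parameter selects another output format.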

File format support (FFS)

The Translate extension supports translating non-wiki content, such as software interface messages, via File format support (FFS) classes. These classes implement the FFS interface and abstract away the parsing and generating of file contents. The FFS classes are used by the FileBasedMessageGroup class via the YAML configuration files.

Message groups

Message groups bring together a collection of messages. They come in various types: translatable pages, SVG files or software interface messages stored in various file formats. Each message group instance has a unique identifier, name and description. In the code, message groups are primarily referenced by their identifier, while the MessageGroups class can be used to get the instance for a given identifier. Message groups can also control many aspects of the translation process, such as the allowed translation languages and the message group workflow states. Usually these behaviors fall back to the global defaults.

The two primary ways to register message groups to Translate are the TranslatePostInitGroups hook and YAML configuration.

Translation aids (helpers)

Translation aids are little modules that provide helpful and necessary information for the translator when translating. Different aids can provide suggestions from translation memory and machine translation, documentation about the message or even such a basic thing as the message definition – the text that needs to be translated.

Translate comes with many aid classes. Currently there is no hook to add new classes. Each class that extends the TranslationAid class only needs to implement one method called getData. It should return the information in a structured format (nested arrays), which is then exposed via the ApiQueryTranslationAids WebAPI module. In addition to the aid class, changes are needed to actually use the provided data in the translation editor(s).

Machine translation services are a special case of translation aids. See the next section.

Web services

Adding more machine translation services can easily be done by extending the TranslationWebService class. See the webservices subdirectory for examples. You will need some basic information to implement such a class:

  • URL for the service
  • What language pairs are supported
  • Whether they use language codes that differ from the codes used in MediaWiki
  • Whether the service needs an API key

When you have this information, it is straightforward to write the mapCode, doPairs and doRequest methods. You should use the TranslationWebServiceException to signal errors. The errors are automatically logged and tracked, and if the service goes down, it will automatically be suspended to avoid unnecessary requests to it. The suggestions will automatically be displayed in the translation editor via the MachineTranslationAid class and the ApiQueryTranslationAids WebAPI module. See also $wgTranslateTranslationServices to see how those services are registered.
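The language code mapping step can be sketched as follows. This is a self-contained illustration, not the real TranslationWebService base class: the class name ExampleCodeMapper and the code pairs in the map are made up for the example, and a real service class would take them from the documentation of the service in question.

```php
<?php
// Hypothetical stand-in for illustration; it does not extend the real
// TranslationWebService class and only shows the idea behind mapCode.
class ExampleCodeMapper {
	/** @var array MediaWiki language code => service language code (made-up pairs) */
	private $codeMap = [
		'zh-hans' => 'zh-CN',
		'zh-hant' => 'zh-TW',
		'nb' => 'no',
	];

	/**
	 * Map a MediaWiki language code to the code the service expects.
	 * Codes without an entry in the map are passed through unchanged.
	 */
	public function mapCode( string $code ): string {
		return $this->codeMap[$code] ?? $code;
	}
}

$mapper = new ExampleCodeMapper();
echo $mapper->mapCode( 'nb' ) . "\n";
echo $mapper->mapCode( 'fi' ) . "\n";
```

Keeping the exceptions in a single map means that only the codes which actually differ need to be listed.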

Message checkers

We use computers to catch simple errors in translations, like unbalanced parentheses or failing to use a variable placeholder. These checkers can emit warnings that are displayed in the translation editor and updated as the translator types. Any warning present in a saved translation will also mark the translation as outdated (fuzzy in jargon). Each message group determines which checks it uses.
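To illustrate the kind of checks meant here, the following is a minimal, self-contained sketch of two of them. The function checkTranslation is hypothetical and does not use the checker interface of Translate; it also assumes MediaWiki-style placeholders such as $1 and $2.

```php
<?php
/**
 * Return a list of warnings for a translation.
 * Hypothetical helper for illustration; it is not part of Translate.
 */
function checkTranslation( string $definition, string $translation ): array {
	$warnings = [];

	// Compare only the number of opening and closing parentheses;
	// their order is not checked in this sketch.
	if ( substr_count( $translation, '(' ) !== substr_count( $translation, ')' ) ) {
		$warnings[] = 'unbalanced parentheses';
	}

	// Every $1, $2, ... placeholder used in the definition should also
	// appear in the translation.
	preg_match_all( '/\$\d+/', $definition, $matches );
	foreach ( array_unique( $matches[0] ) as $placeholder ) {
		// The lookahead stops $1 from matching inside $10.
		$pattern = '/' . preg_quote( $placeholder, '/' ) . '(?!\d)/';
		if ( !preg_match( $pattern, $translation ) ) {
			$warnings[] = "missing placeholder $placeholder";
		}
	}

	return $warnings;
}

print_r( checkTranslation( 'Hello $1 (you have $2 messages)', 'Hei $1 (sinulla on viestejä' ) );
```

For the example call, both checks fail: the translation has an unclosed parenthesis and does not use $2.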

Other core components

Message collection

Message collection provides access to the list of messages for a message group. It is used to load a set of messages for a certain group in a certain language. It provides paging and filtering functionality.

There is currently a limitation that all messages in a collection must be in the same namespace. This prevents the creation of aggregate groups that include groups which have messages in different namespaces.

Here is a short example of how to use message collection to load all Finnish translations of the group core and print the first ten of them:

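// Look up the message group instance by its identifier.
$group = MessageGroups::getGroup( 'core' );
// Create a collection of the group's messages for Finnish.
$collection = $group->initCollection( 'fi' );
// Drop messages that are marked as ignored.
$collection->filter( 'ignored' );
// Keep only messages that have a translation.
$collection->filter( 'translated', false );
$collection->loadTranslations();
// Limit the collection to the first ten messages.
$collection->slice( 0, 10 );
foreach ( $collection->keys() as $mkey => $title ) {
	echo $title->getPrefixedText() . ': ';
	echo $collection[$mkey]->translation() . "\n\n";
}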

Message

Utility classes

Font finder

When rendering bitmap graphics, suitable fonts are needed for each language or script. To solve this problem, the FCFontFinder class was written. It uses the fc-match command of the package fontconfig (so this doesn't work on Windows) to find a suitable font. Many additional fonts should be installed on the server to make this useful. It can either return a path to a font file or the name of the font, whichever is more suitable.
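As a rough sketch of the underlying idea, the following builds an fc-match command line that asks fontconfig for the file of a font covering a given language. It is not the code of FCFontFinder: the variable names are made up, and the exact pattern and format string used by the class may differ.

```php
<?php
// Language code to find a font for (example value).
$code = 'fi';

// A fontconfig pattern that asks for a font covering the given language.
$pattern = ':lang=' . $code;

// Ask fc-match to print only the path of the best matching font file.
$cmd = 'fc-match --format=' . escapeshellarg( '%{file}' )
	. ' ' . escapeshellarg( $pattern );

echo $cmd . "\n";

// On a server with fontconfig installed, the command could be run with
// shell_exec( $cmd ) to get the font file path.
```

Using '%{family}' in place of '%{file}' would return the font name instead of the path.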

Message group cache

The messages of file-based message groups are stored in CDB files. Each language of each group has its own CDB cache file. There are two reasons for the cache files.

First, they provide constant and efficient access to message data, avoiding the potentially expensive parsing of files in various places. For example, the list of message keys for each group can be loaded efficiently when rebuilding a message index.

Second, the cache files are used together with the translations in the wiki to process external message changes. Having a snapshot of the state of translations in files and wiki (hopefully consistent at that point) allows us to automatically deduce whether something has been changed in the wiki or externally and make intelligent choices, leaving only real conflicts (messages changed both externally and on the wiki since the last snapshot) to be resolved by the translation administrator.
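The comparison against the snapshot can be sketched as a three-way check. The function below is a hypothetical illustration of that reasoning, not the code Translate uses; the function name and the returned labels are made up.

```php
<?php
/**
 * Classify a message by comparing the cached snapshot with the current
 * file content and the current wiki content.
 * Hypothetical helper; names and return values are illustrative only.
 */
function classifyChange( string $snapshot, string $file, string $wiki ): string {
	$fileChanged = $file !== $snapshot;
	$wikiChanged = $wiki !== $snapshot;

	if ( !$fileChanged && !$wikiChanged ) {
		return 'unchanged';
	}
	if ( $fileChanged && !$wikiChanged ) {
		// Only the file differs from the snapshot.
		return 'external change';
	}
	if ( !$fileChanged && $wikiChanged ) {
		// Only the wiki differs from the snapshot.
		return 'wiki change';
	}
	// Both sides changed since the snapshot: needs a human decision.
	return 'conflict';
}

echo classifyChange( 'Save', 'Save page', 'Save' ) . "\n";
echo classifyChange( 'Save', 'Save page', 'Store' ) . "\n";
```

Without the snapshot, only the file and wiki versions could be compared, and it would be impossible to tell which side had changed.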

Message group utilities

Message index

Message index is a reverse map of all known messages. It provides efficient answers to the questions "is this a known message?" and "which groups does this message belong to?". It needs to be fast for both single and multiple message key lookups. Several backends are implemented, with different trade-offs.

  • A serialized file is fast to parse, but does not provide random access and is very memory inefficient when the number of keys grows.
  • A CDB file takes more disk space, but provides random access and reasonably fast lookups, while loading everything into memory is slower.
  • The database backend provides efficient random access and full loading at the expense of slightly slower individual lookups. It also doesn't need to write to any files, which avoids permission problems.
  • A memory backend (memcached, APC) is also provided, which could be a useful alternative to the database backend in multi-server setups to reduce database contention.
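The reverse map itself can be sketched with plain PHP arrays. The group identifiers and message keys below are made-up sample data, and the in-memory array only stands in for whichever backend is actually used.

```php
<?php
// Hypothetical data: group id => message keys defined by that group.
$groups = [
	'core' => [ 'mainpage', 'sidebar' ],
	'ext-example' => [ 'example-desc', 'mainpage' ],
];

// Build the reverse map: message key => list of group ids.
$index = [];
foreach ( $groups as $groupId => $keys ) {
	foreach ( $keys as $key ) {
		$index[$key][] = $groupId;
	}
}

// "Is this a known message?"
var_dump( isset( $index['sidebar'] ) );

// "Which groups does this message belong to?"
print_r( $index['mainpage'] );
```

Note that the whole map is rebuilt from all groups in one pass, which is the cost that grows with the number of groups and keys.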

Message index does not support incremental rebuilds. Thus rebuilding the index becomes relatively resource intensive as the number of message groups and message keys increases. Depending on the message group, this might involve parsing files or doing database queries and loading the definitions, which can take a lot of memory. The message index rebuilding is triggered in various places in Translate, and by default it is executed immediately during the request. As it gets slower, it can be delayed via the MediaWiki job queue and run outside of web requests.

Message table

Metadata table

Revtag

Stats code

String matcher/mangler

Ttmserver (translation memory)

Ttmserver is the name of the translation memory interface. It supports multiple backends for inserting and querying translation suggestions. The code is located under the ttmserver directory.

Misc stuff: RC integration, preferences, toolbox, jobs

Repository layout

Files in the root of the repository include:

  • Standard MediaWiki extension files like Translate.php, translations and some documentation files like hooks.txt and README, which includes change notes.
  • Major Translate classes like MessageCollection and Message and some miscellaneous utilities not yet moved under utils.

The rest of the code is under subdirectories. Major parts each have their own subdirectory:

  • api - for WebAPI code
  • ffs - for file format support code
  • messagegroups - for message groups
  • scripts - for command line scripts
  • tag - for page translation code
  • ttmserver - for translation memory code
  • specials - for all special pages
  • tests - for all PHP unit tests

Most of the remaining code is under utils. There are also some additional folders for non-code files:

  • data - for miscellaneous data files
  • libs - for bundled library dependencies
  • resources - for all css, scripts and images
  • sql - for all SQL table definitions