Help:Extension:Translate/Components

From MediaWiki.org

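The Translate extension can be extended in many ways. The most common ways to extend Translate are probably adding support for a new file format or adding a new message group. Sometimes it is also useful to write a new message checker, or to hook into Translate from another extension. In some cases, using the existing web API is all that is needed.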

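In addition to the primary extension points mentioned above, Translate contains many other important components and classes that are useful when hacking on the extension. This page aims to explain all the components of Translate in an accessible way.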

Primary extensible components

WebAPI

  • Detailed documentation of the API

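In addition to hooks and interfaces that are only usable from PHP code, the WebAPI provides access to a lot of message group and translation related information and actions. It is based on the MediaWiki API framework and supports multiple output formats such as json and xml.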
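
As a minimal sketch of what a WebAPI request looks like, the following builds a query URL for listing message groups as JSON. The endpoint URL is a placeholder assumption; the sketch only constructs the URL and does not send the request.

```php
<?php
// Hypothetical endpoint: replace with the api.php URL of your own wiki.
$endpoint = 'https://example.org/w/api.php';

// Ask the query framework for message group information in JSON format.
$params = [
	'action' => 'query',
	'meta' => 'messagegroups',
	'format' => 'json',
];

// http_build_query() URL-encodes the parameters and joins them with "&".
$url = $endpoint . '?' . http_build_query( $params, '', '&' );

echo $url . "\n";
```

Changing the format parameter to xml should select the XML output instead.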

File format support (FFS)

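The Translate extension can also be used to translate non-wiki content, such as software user interface messages. This is handled by the file format support (FFS) classes. These classes implement the FFS interface and abstract the parsing and generation of file contents. FFS classes are used by the FileBasedMessageGroup class through the YAML configuration files.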
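
To illustrate the idea of abstracting parsing and generation, here is a small self-contained sketch for an imaginary key=value file format. The interface and class names are hypothetical and this is not the real FFS interface of Translate, which has its own method names and takes more context.

```php
<?php
// Illustrative only: this is NOT the real FFS interface of Translate.
interface SimpleFormatSketch {
	/** Parse file contents into a map of message key => text. */
	public function parse( string $contents ): array;

	/** Generate file contents from a map of message key => text. */
	public function generate( array $messages ): string;
}

class KeyValueFormatSketch implements SimpleFormatSketch {
	public function parse( string $contents ): array {
		$messages = [];
		foreach ( explode( "\n", $contents ) as $line ) {
			$line = trim( $line );
			// Skip blank lines, "#" comments and lines without a separator.
			if ( $line === '' || $line[0] === '#' || strpos( $line, '=' ) === false ) {
				continue;
			}
			// Split on the first "=" only, so values may contain "=".
			[ $key, $value ] = explode( '=', $line, 2 );
			$messages[trim( $key )] = trim( $value );
		}
		return $messages;
	}

	public function generate( array $messages ): string {
		$lines = [];
		foreach ( $messages as $key => $value ) {
			$lines[] = "$key=$value";
		}
		return implode( "\n", $lines ) . "\n";
	}
}

$format = new KeyValueFormatSketch();
$messages = $format->parse( "# comment\nhello = Hello\nbye=Good=bye\n" );
echo $format->generate( $messages );
```

Parsing the generated output again yields the same message map, which is the property a file format class needs for round trips between files and the wiki.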

Message groups

Message groups bring together a collection of messages. They come in various types: translatable pages, SVG files or software interface messages stored in various file formats. Each message group instance has a unique identifier, name and description. In the code, message groups are primarily referenced by their identifier, and the MessageGroups class can be used to get the instance for a given id. Message groups can also control many aspects of the translation process, such as the allowed translation languages and the message group workflow states. Usually these behaviors fall back to the global defaults.

The two primary ways to register message groups to Translate are the TranslatePostInitGroups hook and YAML configuration.
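
The hook route can be sketched roughly as follows. This is a simulation in plain PHP rather than code that runs inside MediaWiki: the group class is a stand-in, and the handler signature (group list, dependencies, autoload map, all by reference) is an assumption that should be checked against hooks.txt.

```php
<?php
// Stand-in for a real message group class; hypothetical, for illustration.
class ExampleGroupSketch {
	private $id;

	public function __construct( string $id ) {
		$this->id = $id;
	}

	public function getId(): string {
		return $this->id;
	}
}

// In a real wiki this global array is provided by MediaWiki.
$wgHooks = [];

// Assumed handler signature: groups, dependencies and autoload map.
$wgHooks['TranslatePostInitGroups'][] = function ( array &$list, array &$deps, array &$autoload ) {
	$group = new ExampleGroupSketch( 'example-group' );
	// Groups are registered under their unique identifier.
	$list[$group->getId()] = $group;
	return true;
};

// Simulate what Translate would do when it runs the hook.
$list = [];
$deps = [];
$autoload = [];
foreach ( $wgHooks['TranslatePostInitGroups'] as $handler ) {
	$handler( $list, $deps, $autoload );
}

echo implode( ', ', array_keys( $list ) ) . "\n";
```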

Translation aids (helpers)

Translation aids are little modules that provide helpful and necessary information for the translator when translating. Different aids can provide suggestions from translation memory and machine translation, documentation about the message or even such a basic thing as the message definition – the text that needs to be translated.

Translate comes with many aid classes. Currently there is no hook to add new classes. Each class that extends the TranslationAid class only needs to implement one method called getData. It should return the information in structured format (nested arrays), which is then exposed via ApiQueryTranslationAids WebAPI module. In addition to the aid class, changes are needed to actually use the provided data in the translation editor(s).
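
The shape of such a class can be sketched as below. The base class here is a simplified stand-in, not the real TranslationAid (which receives more context), and the array keys are illustrative assumptions.

```php
<?php
// Simplified stand-in for the base class; the real TranslationAid differs.
abstract class AidSketch {
	protected $definition;
	protected $sourceLanguage;

	public function __construct( string $definition, string $sourceLanguage ) {
		$this->definition = $definition;
		$this->sourceLanguage = $sourceLanguage;
	}

	/** Return the aid's information as nested arrays. */
	abstract public function getData(): array;
}

// A hypothetical aid exposing the message definition itself.
class DefinitionAidSketch extends AidSketch {
	public function getData(): array {
		return [
			'value' => $this->definition,
			'language' => $this->sourceLanguage,
			// Nested data is fine as long as it is plain arrays and scalars.
			'meta' => [ 'length' => strlen( $this->definition ) ],
		];
	}
}

$aid = new DefinitionAidSketch( 'Save page', 'en' );
// Nested arrays serialize directly to the API's structured output formats.
echo json_encode( $aid->getData() ) . "\n";
```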

One special case of translation aids are machine translation services. See the next section.

Web services

Adding more machine translation services can easily be done by extending the TranslationWebService class. See the webservices subdirectory for examples. You will need some basic information to implement such a class:

  • URL for the service
  • What language pairs are supported
  • Whether they use language codes that differ from the codes used in MediaWiki
  • Whether the service needs an API key

When you have this information, it is straightforward to write the mapCode, doPairs and doRequest methods. You should use the TranslationWebServiceException to signal errors. The errors are automatically logged and tracked, and if the service goes down, it will automatically be suspended to avoid unnecessary requests to it. The suggestions will automatically be displayed in the translation editor via the MachineTranslationAid class and the ApiQueryTranslationAids WebAPI module. See also $wgTranslateTranslationServices to see how those services are registered.
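
The pieces of information listed above map onto code roughly like this. Everything in the sketch is hypothetical: the service URL, the language pairs, the code mapping, and the class and method names other than mapCode. It does not extend the real TranslationWebService class and only builds a request URL instead of sending a request.

```php
<?php
// Stand-in for TranslationWebServiceException; hypothetical.
class ServiceErrorSketch extends Exception {
}

class ExampleServiceSketch {
	// Codes where the imaginary service differs from MediaWiki.
	private const CODE_MAP = [ 'zh-hans' => 'zh-CN', 'nb' => 'no' ];

	// Supported pairs, expressed in the service's own codes.
	private const PAIRS = [ 'en' => [ 'fi', 'no', 'zh-CN' ] ];

	/** Convert a MediaWiki language code to the service's code. */
	public function mapCode( string $code ): string {
		return self::CODE_MAP[$code] ?? $code;
	}

	public function isSupported( string $from, string $to ): bool {
		$targets = self::PAIRS[$this->mapCode( $from )] ?? [];
		return in_array( $this->mapCode( $to ), $targets, true );
	}

	/** Build the request URL, signalling unsupported pairs with an exception. */
	public function buildRequestUrl( string $text, string $from, string $to ): string {
		if ( !$this->isSupported( $from, $to ) ) {
			throw new ServiceErrorSketch( "Unsupported language pair: $from-$to" );
		}
		return 'https://mt.example.org/translate?' . http_build_query( [
			'q' => $text,
			'source' => $this->mapCode( $from ),
			'target' => $this->mapCode( $to ),
		], '', '&' );
	}
}

$service = new ExampleServiceSketch();
echo $service->buildRequestUrl( 'Save page', 'en', 'nb' ) . "\n";
```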

Message checkers

We use computers to catch simple errors in translations, such as unbalanced parentheses or a missing variable placeholder. These checkers can emit warnings, which are displayed in the translation editor and updated continuously. Any warning present in a saved translation also marks the translation as outdated (fuzzy in jargon). Each message group determines which checks it uses.
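
The two checks mentioned above can be sketched as plain functions. These are illustrative only and are not the checker classes shipped with Translate; the function names are made up, and the placeholder check assumes MediaWiki-style numbered placeholders.

```php
<?php
/** Return true if every "(" has a matching ")" in the right order. */
function hasBalancedParentheses( string $text ): bool {
	$depth = 0;
	for ( $i = 0; $i < strlen( $text ); $i++ ) {
		if ( $text[$i] === '(' ) {
			$depth++;
		} elseif ( $text[$i] === ')' ) {
			$depth--;
			// A closing parenthesis without an opening one.
			if ( $depth < 0 ) {
				return false;
			}
		}
	}
	return $depth === 0;
}

/** Return placeholders like $1 used in the definition but not in the translation. */
function missingPlaceholders( string $definition, string $translation ): array {
	preg_match_all( '/\$\d+/', $definition, $inDefinition );
	preg_match_all( '/\$\d+/', $translation, $inTranslation );
	return array_values(
		array_diff( array_unique( $inDefinition[0] ), $inTranslation[0] )
	);
}

var_dump( hasBalancedParentheses( 'Muokkaa (osio)' ) );
print_r( missingPlaceholders( '$1 edited $2', '$1 muokkasi' ) );
```

A checker built on these would turn a false result or a non-empty list into a warning for the translation editor.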

Other core components

Message collection

Message collection provides access to the list of messages for a message group. It is used to load a set of messages for a certain group in a certain language. It provides paging and filtering functionality.

There is currently a limitation that all messages in a collection must be in the same namespace. This prevents the creation of aggregate groups that include groups which have messages in different namespaces.

Here is a short example of how to use message collection to load all Finnish translations of the group core and print the first ten of them:

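// Look up the message group by its identifier.
$group = MessageGroups::getGroup( 'core' );
// Create a collection of the group's messages for Finnish.
$collection = $group->initCollection( 'fi' );
// Remove messages that are marked as ignored.
$collection->filter( 'ignored' );
// Keep only messages that have a translation.
$collection->filter( 'translated', false );
$collection->loadTranslations();
// Paging: keep only the first ten messages.
$collection->slice( 0, 10 );
foreach ( $collection->keys() as $mkey => $title ) {
	echo $title->getPrefixedText() . ': ';
	echo $collection[$mkey]->translation() . "\n\n";
}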

Message

Utility classes

Font finder

When rendering bitmap graphics, suitable fonts are needed for each language or script. To solve this problem, the FCFontFinder class was written. It uses the fc-match command of the package fontconfig (so this doesn't work on Windows) to find a suitable font. Many additional fonts should be installed on the server to make this useful. It can either return a path to a font file or the name of the font, whichever is more suitable.
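
As a rough sketch of the underlying idea (not the actual FCFontFinder code), the following builds an fc-match command line for a language code. The function name is hypothetical. Running the command requires fontconfig, so the sketch only prints it.

```php
<?php
/**
 * Build an fc-match command line that asks fontconfig for the best
 * matching font for a language. $property selects what is printed:
 * "file" for the path to the font file, "family" for the font name.
 */
function buildFontMatchCommand( string $languageCode, string $property = 'file' ): string {
	$format = '%{' . $property . '}';
	// escapeshellarg() protects against shell metacharacters in the input.
	return 'fc-match -f ' . escapeshellarg( $format )
		. ' ' . escapeshellarg( ':lang=' . $languageCode );
}

$command = buildFontMatchCommand( 'fi' );
echo $command . "\n";

// To actually run it on a server with fontconfig installed:
// $fontPath = trim( (string)shell_exec( $command ) );
```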

Message group cache

The messages of file-based message groups are stored in CDB files. Each language of each group has its own CDB cache file. There are two reasons for having the cache files.

First, they provide consistent and efficient access to message data, avoiding potentially expensive parsing of files in various places. For example, the list of message keys for each group can be loaded efficiently when rebuilding the message index.

The second reason is that the cache files are used together with the translations in the wiki to process external message changes. Having a snapshot of the state of translations in the files and in the wiki (hopefully consistent at that point) allows us to automatically deduce whether something has been changed in the wiki or externally and to make intelligent choices, leaving only real conflicts (messages changed both externally and on the wiki since the last snapshot) to be resolved by the translation administrator.
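
The snapshot comparison described above is essentially a three-way comparison. A minimal sketch, with a made-up function name and made-up result labels, and not the actual logic used by Translate:

```php
<?php
/**
 * Compare the current file and wiki texts of one message against the
 * cached snapshot. null means the message or translation does not exist.
 */
function classifyChange( ?string $snapshot, ?string $file, ?string $wiki ): string {
	$fileChanged = $file !== $snapshot;
	$wikiChanged = $wiki !== $snapshot;

	if ( !$fileChanged && !$wikiChanged ) {
		return 'unchanged';
	}
	if ( $fileChanged && !$wikiChanged ) {
		// Only the file differs: the external change can be imported.
		return 'external-change';
	}
	if ( !$fileChanged && $wikiChanged ) {
		// Only the wiki differs: a normal translation edit.
		return 'wiki-change';
	}
	// Both changed: a real conflict only if they disagree.
	return $file === $wiki ? 'same-change' : 'conflict';
}

echo classifyChange( 'Tallenna', 'Tallenna sivu', 'Tallenna' ) . "\n";
echo classifyChange( 'Tallenna', 'Tallenna sivu', 'Talleta' ) . "\n";
```

Only the last case needs a human decision; the others can be resolved automatically.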

Message group utilities

Message index

Message index is a reverse map of all known messages. It provides efficient answers to two questions: is this a known message, and which groups does this message belong to? It needs to be fast for both single and multiple message key lookups. Several backends are implemented, with different trade-offs.

  • A serialized file is fast to parse, but doesn't provide random access and is very memory inefficient when the number of keys grows.
  • A CDB file takes more disk space, but provides random access and reasonably fast lookups, while loading everything into memory is slower.
  • The database backend provides efficient random access and full loads at the expense of slightly slower individual lookups. It also doesn't need to write to any files, which avoids permission problems.
  • Memory backends (memcached, apc) are also provided, and could be a useful alternative to the database backend in multi-server setups to reduce database contention.
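
Independent of the backend, the reverse map itself can be sketched in a few lines. The function name and the group and key names below are made up for illustration; the real message index also has to deal with namespaces and storage.

```php
<?php
/**
 * Build a reverse map from message key to the ids of the groups that
 * contain it, given a map of group id => list of message keys.
 */
function buildMessageIndex( array $groups ): array {
	$index = [];
	foreach ( $groups as $groupId => $keys ) {
		foreach ( $keys as $key ) {
			$index[$key][] = $groupId;
		}
	}
	return $index;
}

$index = buildMessageIndex( [
	'core' => [ 'mainpage', 'edit' ],
	'ext-example' => [ 'edit', 'example-desc' ],
] );

// Is this a known message?
var_dump( isset( $index['edit'] ) );
// Which groups does it belong to?
echo implode( ', ', $index['edit'] ) . "\n";
```

Because the map is derived from the full list of groups, adding or changing one group means rebuilding the whole map in this sketch.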

Message index does not support incremental rebuilds. Rebuilding the index therefore gets relatively resource intensive as the number of message groups and message keys increases. Depending on the message group, this might involve parsing files or doing database queries and loading the definitions, which can take a lot of memory. Message index rebuilding is triggered in various places in Translate, and by default it is executed immediately during the request. If it gets too slow, it can be deferred to the MediaWiki job queue and run outside of web requests.

Message table

Metadata table

Revtag

Stats code

String matcher/mangler

Ttmserver (translation memory)

Ttmserver is the name of the translation memory interface. It supports multiple backends for inserting and querying translation suggestions. The code is located under the ttmserver directory.

Misc stuff: RC integration, preferences, toolbox, jobs

Repository layout

Files in the root of the repository include:

  • Standard MediaWiki extension files like Translate.php, translations, and some documentation files like hooks.txt and README, which includes change notes.
  • Major translate classes like MessageCollection and Message and some misc utilities not yet moved under utils.

The rest of the code is under subdirectories. Major parts each have their own subdirectory:

  • api for WebAPI code
  • ffs for file format support code
  • messagegroups for message groups
  • scripts for command line scripts
  • tag for page translation code
  • ttmserver for translation memory code
  • specials for all special pages
  • tests for all PHP unit tests

Most of the remaining code is under utils. There are also some additional folders for non-code files:

  • data for miscellaneous data files
  • libs for bundled library dependencies
  • resources for all css, scripts and images
  • sql for all SQL table definitions