Localisation

From MediaWiki.org
For the Wikimedia Foundation localisation projects, see Language portal.

This page gives a technical description of MediaWiki's internationalisation and localisation (i18n and L10n) system, and gives hints that coders should be aware of. Our mantra is that i18n must not be an afterthought: it is an essential component of your software from its earliest phases.

Translation resources

translatewiki.net

translatewiki.net supports in-wiki translation of all the messages of core MediaWiki and of the extensions. If you would like to have nothing to do with all the technicalities in this page about editing files, Git, creating patches, and so forth, go directly to translatewiki.net.

All translation of MediaWiki user interface messages should go through translatewiki.net and should not be committed directly to the code. Only the English messages and their initial documentation must be written in the source code.

Core MediaWiki and extensions must use system messages for any text displayed in the user interface. For an example of how to do this, please see Manual:Special pages. If the extension is well written, it will probably be included in translatewiki.net within a few days, after its staff notice it on Gerrit. If it is not noticed, contact them. If it is too unstable to be translated, say so in the code or commit message and contact them if necessary.

See also Overview of the localisation system and What can be localised.

Finding messages

Manual:System message explains how to find a particular string you want to translate. In particular, note the qqx trick, which was introduced in MediaWiki 1.18.

I18n mailing list

You can subscribe to the i18n list. At the moment it is low-traffic.

Code structure

First, you have a Language object in Language.php. This object contains all the localisable message strings, as well as other important language-specific settings and custom behavior (uppercasing, lowercasing, printing dates, formatting numbers, direction, custom grammar rules, etc.).

The object is constructed from two sources: subclassed versions of itself (classes) and Message files (messages).

There's also the MessageCache class, which handles input of text via the MediaWiki namespace. Most internationalization is nowadays done via Message objects and by using the wfMessage() shortcut function, which is defined in includes/GlobalFunctions.php. Legacy code might still be using the old wfMsg*() functions, which are now considered deprecated in favor of the above-mentioned Message objects.

General use (for developers)

See also Manual:Messages API.

Language objects

There are several ways to get a language object. You can use the globals $wgLang and $wgContLang for the user interface language and the content language respectively. For an arbitrary language, you can construct an object with Language::factory( 'en' ), replacing en with the code of the language. You can also use wfGetLangObj( $code ) if $code could already be a language object. The list of codes is in languages/Names.php.

Language objects are needed for language-specific operations, most often number, time and date formatting, but also for constructing lists and other things. There are multiple layers of caching and merging with fallback languages, but the details are irrelevant in normal use.

Using messages

MediaWiki uses a central repository of messages which are referenced by keys in the code. This is different from, for example, Gettext, which just extracts the translatable strings from the source files. The key-based system makes some things easier, like refining the original texts and tracking changes to messages. The drawback is of course that the list of used messages and the list of source texts for those keys can get out of sync. In practice this isn't a big problem, and the only significant problem is that sometimes extra messages that are not used anymore still stay up for translation.

To make message keys more manageable and easy to find, always write them completely and don't rely too much on creating them dynamically. You may concatenate parts of message keys if you feel that it gives your code better structure, but put a comment nearby with a list of the possible resulting keys. For example:

// Messages that can be used here:
// * myextension-connection-success
// * myextension-connection-warning
// * myextension-connection-error
$text = wfMessage( 'myextension-connection-' . $status )->parse();

The detailed use of message functions in PHP and JavaScript is on Manual:Messages API.
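
When there are only a few possible keys, an explicit lookup table is one way to keep every key searchable in the source. The following is a minimal sketch in plain PHP; the helper name myExtensionConnectionMessageKey() is hypothetical and not part of MediaWiki.

```php
<?php
// Hypothetical helper: maps a connection status to a complete message key.
function myExtensionConnectionMessageKey( $status ) {
	// Every key is written out in full, so a plain-text search finds it.
	$keys = array(
		'success' => 'myextension-connection-success',
		'warning' => 'myextension-connection-warning',
		'error' => 'myextension-connection-error',
	);
	if ( !isset( $keys[$status] ) ) {
		throw new InvalidArgumentException( "Unknown connection status: $status" );
	}
	return $keys[$status];
}

echo myExtensionConnectionMessageKey( 'error' ), "\n";
```

The returned key can then be passed to wfMessage() in the usual way. An unknown status fails loudly instead of silently producing a key that has no message.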

Adding new messages

See also: Localisation file format
  1. Decide a name (key) for the message. Try to follow global or local conventions for naming. For extensions, use a standard prefix, preferably the extension name in lower case, followed by a hyphen ("-"). Try to stick to lower case letters, numbers and dashes in message names; most other characters are either less practical or do not work at all. See also Manual:Coding conventions#System messages.
  2. Make sure that you are using suitable handling for the message (parsing, {{-replacement, escaping for HTML, etc.)
    • If your message is part of core, add it to languages/i18n/en.json.
    • If your message is in an extension, add it to the i18n/en.json file or the en.json file in the appropriate subdirectory. If an extension has a lot of messages, you may create subdirectories under i18n and list them in the $wgMessagesDirs variable.
  3. Take a pause and consider the wording of the message. Is it as clear as possible? Can it be misunderstood? Ask for comments from other developers or localizers if possible. Follow the #internationalization hints.
  4. Add documentation to qqq.json in the same directory. Read more about message documentation.
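
As a sketch of the registration step mentioned in the list above, an extension's setup file could point MediaWiki at its message directories as follows. The group names and the i18n/api subdirectory are hypothetical; only the $wgMessagesDirs variable itself comes from the text above.

```php
<?php
// In MyExtension.php (the extension's setup file).
// Main messages: i18n/en.json, i18n/qqq.json, and the translations next to them.
$wgMessagesDirs['MyExtension'] = dirname( __FILE__ ) . '/i18n';

// Hypothetical second group of messages kept in a subdirectory under i18n.
$wgMessagesDirs['MyExtensionApi'] = dirname( __FILE__ ) . '/i18n/api';
```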

Messages that should not be translated

Ignored messages are those which should exist only in the English messages file. They are messages that should not need translation because they reference other messages or language-neutral features, e.g. a message consisting of '{{SITENAME}}'. To flag such a message:

Optional messages may be translated only if changed in the target language. To flag such a message:

Removing existing messages

Remove it from en.json. Don't bother with other languages, not even qqq. Updates from translatewiki.net will handle those automatically.

Changing existing messages

  1. Consider updating the message documentation (see Adding new messages).
  2. Change the message key if old translations are not suitable for the new meaning. This also includes changes in message handling (parsing, escaping, parameters, etc.). Improving the phrasing of a message without technical changes is usually not a reason for changing a key. At translatewiki.net, the translations will be marked as outdated so that they can be targeted by translators. If in doubt, ask in #mediawiki-i18n or in the support page at translatewiki.net.
  3. If the extension is supported by translatewiki, please only change the English source message and/or key. If needed, translatewiki.net staff will take care of updating the translations, marking them as outdated, cleaning up the file or renaming keys where possible. This also applies when you're only changing things like HTML tags that you could change in other languages without speaking those languages. Most of these actions will take place in translatewiki.net and will reach Git with about one day of delay.

Localizing namespaces and special page aliases

Namespaces and special page names (e.g. RecentChanges in Special:RecentChanges) are also translatable.

Namespaces

To allow custom namespaces introduced by your extension to be translated, create a MyExtension.namespaces.php file that looks like this:

<?php
/**
 * Translations of the namespaces introduced by MyExtension.
 *
 * @file
 */
 
$namespaceNames = array();
 
// For wikis where the MyExtension extension is not installed.
if( !defined( 'NS_MYEXTENSION' ) ) {
	define( 'NS_MYEXTENSION', 2510 );
}
 
if( !defined( 'NS_MYEXTENSION_TALK' ) ) {
	define( 'NS_MYEXTENSION_TALK', 2511 );
}
 
/** English */
$namespaceNames['en'] = array(
	NS_MYEXTENSION => 'MyNamespace',
	NS_MYEXTENSION_TALK => 'MyNamespace_talk',
);
 
/** Finnish (Suomi) */
$namespaceNames['fi'] = array(
	NS_MYEXTENSION => 'Nimiavaruuteni',
	NS_MYEXTENSION_TALK => 'Keskustelu_nimiavaruudestani',
);

Then load the namespace translation file in MyExtension.php via $wgExtensionMessagesFiles['MyExtensionNamespaces'] = dirname( __FILE__ ) . '/MyExtension.namespaces.php';

When a user installs MyExtension on their Finnish (fi) wiki, the custom namespace will be translated into Finnish magically, and the user doesn't need to do a thing!

Also remember to register your extension's namespace(s) on the extension default namespaces page.

Special page aliases

Create a new file for the special page aliases in this format:

<?php
/**
 * Aliases for the MyExtension extension.
 *
 * @file
 * @ingroup Extensions
 */
 
$aliases = array();
 
/** English */
$aliases['en'] = array(
	'MyExtension' => array( 'MyExtension' )
);
 
/** Finnish (Suomi) */
$aliases['fi'] = array(
	'MyExtension' => array( 'Lisäosani' )
);

Then load it in the extension's setup file like this: $wgExtensionMessagesFiles['MyExtensionAlias'] = dirname( __FILE__ ) . '/MyExtension.alias.php';

When your special page code uses either SpecialPage::getTitleFor( 'MyExtension' ) or $this->getTitle() (in the class that provides Special:MyExtension), the localized alias will be used, if it's available.

Message parameters

Some messages take parameters. They are represented by $1, $2, $3, … in the (static) message texts, and replaced at run time. Typical parameter values are numbers ("Delete 3 versions?"), or user names ("Page last edited by $1"), page names, links and so on, or sometimes other messages. They can be of arbitrary complexity.
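
The following plain-PHP sketch illustrates the idea of run-time substitution. It is a toy stand-in, not MediaWiki's implementation (in MediaWiki, parameters are passed through the Message object), and the function name and message texts are hypothetical. It also shows why numbered parameters matter: a translation is free to use them in a different order.

```php
<?php
// Toy stand-in for parameter substitution; not MediaWiki's implementation.
function substituteParams( $text, array $params ) {
	$replacements = array();
	foreach ( array_values( $params ) as $i => $value ) {
		// The first parameter is $1, the second $2, and so on.
		$replacements['$' . ( $i + 1 )] = (string)$value;
	}
	// strtr() tries the longest placeholder first, so "$10" is not mistaken for "$1".
	return strtr( $text, $replacements );
}

// Hypothetical message texts: the second one uses the parameters in reverse order.
$original = 'Page last edited by $1 on $2.';
$reordered = 'On $2, $1 last edited the page.';

echo substituteParams( $original, array( 'Alice', '3 May' ) ), "\n";
echo substituteParams( $reordered, array( 'Alice', '3 May' ) ), "\n";
```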

Switches in messages…

See also Manual:Messages API#Notes about gender, grammar, plural.

Parameter values sometimes influence the exact wording or the grammatical variations in messages. Rather than resorting to ugly constructs like "$1 (sub)page(s) of his/her userpage", we make switches depending on values known at run time. The (static) message text then supplies each of the possible choices in a list, preceded by the name of the switch and a reference to the value making a difference. This closely resembles the way parser functions are called in MediaWiki. Several types of switches are available. These only work if you do full parsing or {{-transformation for the messages.

…on numbers via PLURAL

See also Manual:Messages API#Notes about gender, grammar, plural.

MediaWiki supports plurals, which makes for a nicer-looking product. For example:

'undelete_short' => 'Undelete {{PLURAL:$1|one edit|$1 edits}}',

An explicit plural form for a specific number can be given with the following syntax:

'Box has {{PLURAL:$1|one egg|$1 eggs|12=a dozen eggs}}.'
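
To make the selection order concrete, here is a toy resolver in plain PHP that handles the English case only. It is a sketch under stated assumptions, not MediaWiki's PLURAL implementation: the function name is hypothetical, and real plural rules differ per language.

```php
<?php
// Toy resolver for the English case only; not MediaWiki's PLURAL implementation.
function resolveEnglishPlural( $count, array $forms ) {
	$regular = array();
	foreach ( $forms as $form ) {
		if ( preg_match( '/^(\d+)=(.*)$/s', $form, $match ) ) {
			// An explicit form such as "12=a dozen eggs" wins for that exact number.
			if ( (int)$match[1] === (int)$count ) {
				return str_replace( '$1', (string)$count, $match[2] );
			}
		} else {
			$regular[] = $form;
		}
	}
	if ( $regular === array() ) {
		return '';
	}
	// English has two regular forms: singular for exactly 1, plural otherwise.
	$index = ( (int)$count === 1 ) ? 0 : 1;
	// Fall back to the last form if fewer forms were supplied.
	$index = min( $index, count( $regular ) - 1 );
	return str_replace( '$1', (string)$count, $regular[$index] );
}

$forms = array( 'one egg', '$1 eggs', '12=a dozen eggs' );
foreach ( array( 1, 5, 12 ) as $count ) {
	echo 'Box has ', resolveEnglishPlural( $count, $forms ), ".\n";
}
```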

Be aware of PLURAL use on all numbers

See also Plural.

When a number has to be inserted into a message text, be aware that some languages will have to use PLURAL on it even if always larger than 1. The reason is that PLURAL in languages other than English can make very different and complex distinctions, comparable to English 1st, 2nd, 3rd, 4th, … 11th, 12th, 13th, … 21st, 22nd, 23rd, … etc.

Do not try to supply three different messages for cases like 0, 1, more items counted. Rather let one message take them all, and leave it to translators and PLURAL to properly treat possible differences of presenting them in their respective languages.

Always include the number as a parameter if possible. Always add {{PLURAL:}} syntax to the source messages if possible, even if it makes no sense in English. The syntax guides translators.

Fractional numbers are supported, but the plural rules may not be complete.

Pass the number of list items as parameters to messages talking about lists

Don't assume that there's only singular and plural. Many languages have more than two forms, which depend on the actual number, and their grammar varies with the number of items when they describe a list that is visible to readers. Thus, whenever your code computes a list, include count( $list ) as a parameter to headlines, lead-ins, footers and other messages about the list, even if the count is not used in English. There is a neutral way to talk about invisible lists, so you can have links to lists on extra pages without having to count items in advance.

…on user names via GENDER

See also Manual:Messages API#Notes about gender, grammar, plural.

'foobar-edit-review' => 'Please review {{GENDER:$1|his|her|their}} edits.'

If you refer to a user in a message, pass the user name as parameter to the message and add a mention in the message documentation that gender is supported. If it is likely that GENDER will be used in translations for languages with gender inflections, add it explicitly in the English language source message.

Users have grammatical genders

See also Gender.

When a message talks about a user, or relates to a user, or addresses a user directly, the user name should be passed to the message as a parameter. Thus languages having to, or wanting to, use proper gender dependent grammar, can do so. This should be done even when the user name is not intended to appear in the message, such as in "inform the user on his/her talk page", which is better made "inform the user on {{GENDER:$1|his|her|their}} talk page" in English as well.

This doesn't mean that you're encouraged to "sexualize" messages' language: please use gender-neutral language where this can be done with clarity and precision.

…on use context inside sentences via GRAMMAR

See also Manual:Messages API#Notes about gender, grammar, plural.

Grammatical transformations for agglutinative languages are also available. One example is Finnish, where it was an absolute necessity to make language files site-independent, i.e. to remove the Wikipedia references. In Finnish, "about Wikipedia" becomes "Tietoja Wikipediasta" and "you can upload it to Wikipedia" becomes "Voit tallentaa tiedoston Wikipediaan". Suffixes are added depending on how the word is used, plus minor modifications to the base. There is a long list of exceptions, but since only a few words needed to be translated, such as the site name, we didn't need to include it.

MediaWiki has grammatical transformation functions for over 20 languages. Some of these are just dictionaries for Wikimedia site names, but others have simple algorithms which will fail for all but the most common cases.

Even before MediaWiki had arbitrary grammatical transformation, it had a nominative/genitive distinction for month names. This distinction is necessary for some languages if you wish to substitute month names into sentences.

Filtering special characters in parameters and messages

The other (much simpler) issue with parameter substitution is HTML escaping. Despite being much simpler, MediaWiki does a pretty poor job of it.
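
As a plain-PHP illustration of the concern (not MediaWiki's own code path), the sketch below substitutes a hostile parameter value into a message and escapes the complete string once, just before output. The message text and the parameter value are hypothetical. Within MediaWiki, the Message class offers the escaped() output method for this purpose.

```php
<?php
// Hypothetical message text and a hostile parameter value.
$text = 'Page last edited by $1';
$userName = '<script>alert(1)</script>';

// Substitute first, then escape the complete string once for HTML output.
$html = htmlspecialchars( strtr( $text, array( '$1' => $userName ) ), ENT_QUOTES, 'UTF-8' );

echo $html, "\n";
```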

Message documentation

There is a pseudo-language code qqq for message documentation. It is one of the ISO 639 codes reserved for private use. There we do not keep translations of each message, but collect English sentences about each message: telling us where it is used, giving hints about how to translate it, enumerating and describing its parameters, linking to related messages, and so on. In translatewiki.net, these hints are shown to translators when they edit messages.

Programmers must document each message. Message documentation is an essential resource, not just for translators, but for all the maintainers of the module. Whenever a message is added to the software, a corresponding qqq entry must be added as well; revisions which don't do so are marked fixme until the documentation is added (if it's only a couple of messages, it might be easier for the developer to add it via the translatewiki.net translation interface).
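
A small check can catch messages that were added without documentation. The sketch below is plain PHP, not an existing MediaWiki tool; the function name and the sample messages are hypothetical, and the inline JSON stands in for the decoded contents of an extension's i18n/en.json and i18n/qqq.json files.

```php
<?php
// Hypothetical helper: returns the message keys in en.json that lack a qqq.json entry.
function findUndocumentedMessages( array $english, array $documentation ) {
	$missing = array();
	foreach ( array_keys( $english ) as $key ) {
		// "@metadata" holds information about the file, not a message.
		if ( $key === '@metadata' ) {
			continue;
		}
		if ( !isset( $documentation[$key] ) || trim( $documentation[$key] ) === '' ) {
			$missing[] = $key;
		}
	}
	return $missing;
}

// Inline samples standing in for the decoded en.json and qqq.json.
$english = json_decode(
	'{"@metadata": {"authors": []}, "myextension-desc": "Adds a sample special page.", "myextension-connection-error": "The connection failed."}',
	true
);
$documentation = json_decode(
	'{"@metadata": {"authors": []}, "myextension-desc": "Description of the extension."}',
	true
);

print_r( findUndocumentedMessages( $english, $documentation ) );
```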

Useful information that should be in the documentation includes:

  1. Message handling (parsing, escaping, plain text).
  2. Type of parameters with example values.
  3. Where the message is used (pages, locations in the user interface).
  4. How the message is used where it is used (a page title, button text, etc.)
  5. What other messages are used together with this message, or which other messages this message refers to.
  6. Anything else that could be understood when the message is seen in context, but not when the message is displayed alone (which is the case when it is being translated).
  7. If applicable, notes about grammar. For example, "Open" in English can be both a verb and an adjective. In many other languages the words are different and it's impossible to guess how to translate them without documentation.
  8. Adjectives that describe things, such as "disabled", "open" or "blocked", must always say what they are describing. In many languages adjectives must have the gender of the noun that they describe. It may also happen that different kinds of things need different adjectives.
  9. If the message has special properties, for example, if it is a page name, or if it should not be a direct translation, but adapted to the culture or the project.
  10. Whether the message appears near other messages, for example in a list or a menu. The wording or the grammatical features of the words should probably be similar to the messages nearby. Also, items in a list may have to be properly related to the heading of the list.
  11. Parts of the message that must not be translated, such as generic namespace names, URLs or tags.
  12. Explanations of potentially unclear words, for example abbreviations, like "CTA", or specific jargon, like "oversight" and "stub". (It's best to avoid such words in the first place!)
  13. Screenshots are very helpful. Don't crop - an image of the full screen in which the message appears gives complete context and can be reused in several messages.

A few other hints:

  • Remember that very, very often translators translate the messages without actually using the software.
  • Don't use developer jargon like "nav" or "comps".
  • Consider writing a glossary of the technical terms that are used in your module. If you do it, link to it from the messages.

You can link to other messages by using {{msg-mw|message key}}. Please do this if parts of the messages come from other messages (if it cannot be avoided), or if some messages are shown together or in same context.

translatewiki.net provides some default templates for documentation:

  • {{doc-action|[...]}} for action- messages
  • {{doc-right|[...]}} for right- messages
  • {{doc-group|[...]|[...]}} for messages around user groups (group, member, page, js and css)
  • {{doc-accesskey|[...]}} for accesskey- messages

Have a look at the template pages for more information.

Internationalization hints

Besides documentation, translators ask developers to consider some hints that make their work easier and more efficient and that allow a real, good localisation for all languages. Even if only adding or editing messages in English, one should be aware of the needs of all languages. Each message is translated into more than 300 languages and this should be done in the best possible way. Correct implementation of these hints will very often help you write better messages in English, too.

These are the main places where you can find the assistance of experienced and knowledgeable people regarding i18n:

Please do ask there!

Use #Message parameters and switches properly

That's a prerequisite of a correct wording for your messages.

Avoid message reuse

The translators encourage avoiding message reuse. This may seem counter-intuitive, because copying and duplicating code is usually a bad practice, but in system messages it is often needed. Although two concepts can be expressed with the same word in English, this doesn't mean they can be expressed with the same word in every language. "OK" is a good example: in English this is used for a generic button label, but in some languages they prefer to use a button label related to the operation which will be performed by the button. Another example is practically any adjective: a word like "multiple" changes according to gender in many languages, so you cannot reuse it to describe several different things, and you must create several separate messages.

If you are adding multiple identical messages, please add message documentation to describe the differences in their contexts. Don't worry about the extra work for translators. Translation memory helps a lot in these while keeping the flexibility to have different translations if needed.

Avoid patchwork messages

Languages have varying word orders, and complex grammatical and syntactic rules. Messages put together from lots of pieces of text, possibly with some indirection, are very hard, if not impossible, to translate. (MediaWiki internationalization people call such messages "lego".)

It is better to make each message a complete sentence, with a full stop at the end. Several sentences can usually be combined into a text block much more easily, if needed. When you want to combine several strings, use parameters; translators can then order them correctly.

Messages quoting each other

An exception to the rule may be messages referring to one another: Enter the original author's name in the field labelled "{{int:name}}" and click "{{int:proceed}}" when done. This remains safe when a wiki operator alters the messages "name" or "proceed". Without the int hack, operators would have to be aware of all related messages needing adjustment when they alter one.

Separate times from dates in sentences

Some languages have to insert something between a date and a time which grammatically depends on other words in a sentence. Thus they will not be able to use date/time combined. Others may find the combination convenient, thus it is usually the best choice to supply three parameter values (date/time, date, time) in such cases.
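
The three-parameter approach can be sketched in plain PHP as follows. The message texts and the language code xx are hypothetical, and PHP's DateTime stands in for the formatting that a Language object would do inside MediaWiki. Here $1 is the combined date and time, $2 the date alone and $3 the time alone.

```php
<?php
// Hypothetical message texts: "en" uses the combined value, "xx" needs date and time apart.
$messages = array(
	'en' => 'Last edited on $1.',
	'xx' => 'Last edited on $2 at exactly $3.',
);

$timestamp = new DateTime( '2014-05-03 14:30:00', new DateTimeZone( 'UTC' ) );

// Always supply all three values; each language picks the ones it needs.
$params = array(
	'$1' => $timestamp->format( 'j F Y, H:i' ),
	'$2' => $timestamp->format( 'j F Y' ),
	'$3' => $timestamp->format( 'H:i' ),
);

foreach ( $messages as $code => $text ) {
	echo $code, ': ', strtr( $text, $params ), "\n";
}
```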

Avoid {{SITENAME}} in messages

{{SITENAME}} has several disadvantages. It can be anything (acronym, word, short phrase, etc.) and, depending on language, may need {{GRAMMAR}} on each occurrence. No matter what, very likely in most wiki languages, each message having {{SITENAME}} will need review for each new wiki installed. When there is no general GRAMMAR program for a language, as is almost always the case, sysops will have to add or amend PHP code so as to get {{GRAMMAR}} for {{SITENAME}} working. This requires both more skill and more understanding than otherwise. It is more convenient to have generic references like "this wiki". This does not keep installations from altering these messages to use {{SITENAME}}, but at least they don't have to, and they can postpone message adaptation until the wiki is already running and used.

Avoid references to screen layout and positions

What is rendered where depends on skins. Most often screen layouts of languages written from left to right are mirrored compared to those used for languages written from right to left, but not always, and for some languages and wikis, not entirely. Handheld devices, narrow windows, and so on show blocks underneath each other that appear side by side on large displays. Since user-selected and user-written JavaScript gadgets can, and do, hide parts or move things around in unpredictable ways, there is no reliable way of knowing the actual screen layout.

It is wrong to tie layout information to languages, since the user language may not be the wiki language, and layout is taken from wiki languages, not user languages, unless wiki operators choose to use their home-made layout anyway. Acoustic screen readers and other auxiliary devices do not even have a concept of layout. So, you cannot refer to layout positions in the majority of cases.

We do not currently have a way to branch on wiki directionality (bug 28997).

The upcoming browser support for East and North Asian top-down writing[1] will make screen layouts even more unpredictable.

Have message elements before and after input fields

This rule has not become de facto standard in MediaWiki development

While English allows efficient use of prompting in the form "item colon space input-field", many other languages don't. Even in English, you often want to use "Distance: ___ feet" rather than "Distance (in feet): ___". Leaving <textarea> aside, just think of each and every input field following the "Distance: ___ feet" pattern. So:

  • give it two messages, even if the 2nd one is most often empty in English, or
  • allow the placement of inputs via $i parameters.
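
The second option in the list above can be sketched in plain PHP like this. The message text and the field markup are hypothetical, and the substitution is a toy stand-in for what MediaWiki's message handling would do; the point is only that the translator, not the code, decides where the input field goes.

```php
<?php
// Hypothetical message text: the translator decides where the input field ($1) goes.
$message = 'Distance: $1 feet';

// Markup generated by the code itself, so it is trusted and not escaped.
$input = '<input type="text" name="distance" size="5">';

// Escape the translatable text first, then drop in the input markup.
$html = strtr( htmlspecialchars( $message, ENT_QUOTES, 'UTF-8' ), array( '$1' => $input ) );

echo $html, "\n";
```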

Avoid untranslated HTML markup in messages

HTML markup not requiring translation, such as enclosing <div>s, rulers above or below, and similar, should usually not be part of messages. It unnecessarily burdens translators, increases message file size, and risks being accidentally altered in the translation process.

Messages are usually longer than you think!

Skimming foreign language message files, you find messages almost never shorter than Chinese ones, rarely shorter than English ones, and most usually much longer than English ones.

Especially in forms, in front of input fields, English messages tend to be terse and short. That is often not kept in translations. Especially genuinely un-technical third world languages, vernacular, medieval, or ancient languages require multiple words or even complete sentences to explain foreign, or technical, prompts. E.g. "TSV file:" may have to be translated as: "Please type a name here which denotes a collection of computer data that is comprised of a sequentially organized series of typewritten lines which themselves are organized as a series of informational fields each, where said fields of information are fenced, and the fences between them are single signs of the kind that slips a typewriter carriage forward to the next predefined position each. Here we go: _____ (thank you)". This is admittedly an extreme example, but you get the idea. Imagine this sentence in a column in a form where each word occupies a line of its own, and the input field is vertically centered in the next column. :-(

Avoid using very close, similar, or identical words to denote different things, or concepts

For example, pages may have older revisions (of a specific date, time, and edit), comprising past versions of said page. The words "revision" and "version" can be used interchangeably. A problem arises when versioned pages are revised, and the revision, i.e. the process of revising them, is being mentioned, too. This may not pose a serious problem when the two meanings of "revision" have different translations. Do not rely on that, however. It is better to avoid the use of "revision" in the sense of "version" altogether in such cases, so that it cannot be misinterpreted.

Basic words may have unforeseen connotations, or not exist at all

There are some words that are hard to translate because of their very specific use in MediaWiki. Some may not be translated at all. For example "namespace", and "apartment", translate the same in Kölsch. There is no word "user" relating to "to use something" in several languages. Sticking to Kölsch, they say "corroborator and participant" in one word since any reference to "use" would too strongly imply "abuse" as well. The term "wiki farm" is translated as "stable full of wikis", since a single crop farm would be a contradiction in terms in the language, and not understood, etc.

Expect untranslated words

This rule has not yet become a de facto standard in MediaWiki development.

It is not uncommon that computerese English is not translated and taken as loanwords, or foreign words. In the latter case, technically correct translations mark them as belonging to another language, usually with appropriate HTML markup, such as <span lang="en" xml:lang="en"></span>. Make sure that your message output handler passes it along unmolested.

Permit explanatory inline markup

This rule has yet to become de facto standard in MediaWiki development

Sometimes there are abbreviations, technical terms, or generally ambiguous words in target languages that may not be immediately understood by newcomers, but are obvious to experienced computer users. So as not to create lengthy explanations causing screen clutter, it may be advisable to have explanations as annotations shown by browsers when you move the mouse over them, such as in:

mḍwwer 90° <abbr title="Ĝks (ṫ-ṫijah) Ĝaqarib s-Saĝa">ĜĜS</abbr>

giving:

mḍwwer 90° ĜĜS

explaining the abbreviation for "counter clockwise" when needed. Thus make sure your output handler accepts them, even if the original message does not use them.

Symbols, colons, brackets, etc. are parts of messages

Many symbols are translated, too. Some scripts have other kinds of brackets than the Latin script has. A colon may not be appropriate after a label or input prompt in some languages. Having those symbols included in messages leads to better and less Anglo-centric translations, and incidentally reduces code clutter.

Do not expect symbols and punctuation to survive translation

Languages written from right to left (as opposed to English) usually swap arrow symbols being presented with "next" and "previous" links, and their placement relative to a message text may, or may not, be inverted as well. Ellipsis may be translated to "etc." or to words. Question marks, exclamation marks, colons do appear at other places than at the end of sentences, or not at all, or twice. As a consequence, always include all of those in your messages, never insert them programmatically.

Use full stops

Do terminate normal sentences with full stops. This is often the only indicator for a translator to know that they are not headlines or list items, which may need to be translated differently.

Link anchors

Wikicode of links

Link anchors can be put into messages in several technical ways:

  1. via wikitext: … [[a wiki page|anchor]] …
  2. via wikitext: … [some-url anchor] …
  3. the anchor text is a message in the MediaWiki namespace. Avoid it!

The latter is often hard or impossible for translators to handle; avoid patchwork messages here, too. Make sure that "some-url" does not contain spaces.

Use meaningful link anchors

Care for your wording. Link anchors play an important role in search engine assessment of pages, both the linking ones and the ones linked to. Make sure that the anchor describes the target page well. Do avoid commonplace and generic words! For example, "Click here" is an absolute no-go,[1] since target pages are never about "click here". Do not put that in sentences around links either, because "here" is not the place to click. Use precise words telling what a user will get to when following the link, such as "You can upload a file if you wish."

Avoid jargon and slang

Avoid developer and power user jargon in messages. Try to use simple language whenever possible.

One sentence per line

Try to put each sentence, or similar block, on its own line. This helps when comparing messages across languages, and may be used as a hint for segmentation and alignment in translation memories.

Be aware of whitespace and line breaks

MediaWiki's localized messages usually get edited within the wiki, either by admins on live wikis or by the translators on translatewiki.net. You should be aware of how whitespace, especially at the beginning or end of your message, will affect editors:

  • Newlines at the beginning or end of a message are fragile, and will be frequently removed by accident. Start and end your message with active text; if you need a newline or paragraph break around it, your surrounding code should deal with adding it to the returned text.
  • Spaces at the beginning or end of a message are also likely to end up being removed during editing, and should be avoided. If a space is required for output, usually your code should be appending it or else you should be using a non-breaking space such as &nbsp; (in which case check your escaping settings!)
  • Try to use literal newlines rather than "\n" characters in the message files; while \n works in double-quoted strings, the file will be formatted more consistently if you stay literal.
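The first two points above can be sketched in a few lines of Python. The helper name as_paragraph() is hypothetical; the idea is only that the message itself starts and ends with active text, and the calling code adds whatever whitespace the output needs.

```python
def as_paragraph(message_text):
    """Wrap a message in paragraph breaks that the message itself does not carry.

    Any stray leading or trailing whitespace left by an editor is dropped first,
    so the output does not depend on fragile whitespace inside the message.
    """
    return "\n\n" + message_text.strip() + "\n\n"
```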

Use standard capitalization

Capitalization gives hints to translators as to what they are translating, such as single words, list or menu items, phrases, or full sentences. Correct (standard) capitalization may also play a role in search engine assessment of your pages. If you really need to emphasise something with capitals, use CSS styles to do so. For instance, the HTML attributes style="text-transform:uppercase" (uppercase) or style="font-variant:small-caps" (Small Caps) will do. Since these may be adjusted to something else during translation, most specifically for non-Latin scripts, they need to be part of the messages and must not be added programmatically.

Always remember that many writing systems don't have capital letters at all, and some of those that do have them, use them differently from English.

Emphasis

In normal text, emphasis such as boldface or italics should be part of the message text. Local conventions on emphasis vary; some Asian scripts in particular have their own. Translators must be able to adjust emphasis to their target languages and regions. Try to use "em" and "strong" in your user interface to allow mark-up on a per-language or per-script basis, even though, as of the end of 2013, this is not yet being done.

Overview of the localisation system

Update of localisation

As said above, translation happens on translatewiki.net and other systems are discouraged. Here's a little background on the process that gets translations for MediaWiki live on a Wikimedia wiki.

Developers add or change system messages.

Users make translations on translatewiki.net.

Automated tools export these messages, build new versions of message files incorporating these messages, for both core and extensions, and commit them to git.

Wikimedia projects and any other wikis can benefit immediately and automatically from localisation work thanks to the LocalisationUpdate extension.[2] This compares the latest English messages to the English messages in production. If the English messages are the same, the production translations are updated and made available to users.

Once translations are in the version control system, the Wikimedia Foundation has a daily job that updates a checkout or clone of the extension repository.

Because changes on translatewiki.net are pushed to the code daily as well, each change to a message can potentially be applied to all existing MediaWiki installations within a couple of days, without any manual intervention or traumatic code update.
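The comparison step attributed to LocalisationUpdate above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the extension's actual code: the function name and the dictionary-based data layout are invented.

```python
def update_translations(production_en, latest_en, production_tr, latest_tr):
    """Return the translations production should use for one language.

    A newer translation is accepted only when the English text it was written
    against is identical to the English text running in production.
    """
    updated = dict(production_tr)
    for key, new_text in latest_tr.items():
        if key in production_en and production_en[key] == latest_en.get(key):
            updated[key] = new_text
    return updated
```

For example, if the English text of a message has changed upstream but production still runs the old code, the new translation of that message is held back, because it may no longer match what the deployed code expects.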

As you can see this is a multi-step process. Over time, we have found out that many things can go wrong. If you think the process is broken, please make sure to report it on our Support page, or create a new bug in Bugzilla. Always be sure to describe a precise observation.

Message sources

Code looks up system messages from these sources:

  • The MediaWiki namespace. This allows wikis to adapt, or override, all of their messages, when standard messages do not fit or are not desired (see #Old local translation system).
    • MediaWiki:Message-name is the default message,
    • MediaWiki:Message-name/language-code is the message to be used when a user has selected a language other than the wiki's default language.
  • From message files:
    • Core MediaWiki itself and most currently maintained extensions use a file per language, named zxx.json, where zxx is the language code for the language.
    • Some older extensions use a combined message file holding all messages in all languages, usually named MyExtensionName.i18n.php.
    • Many Wikimedia Foundation wikis access some messages from the WikimediaMessages extension, allowing them to standardize messages across WMF wikis without imposing them on every MediaWiki installation.
    • A few extensions use other techniques.

Caching

System messages are one of the more significant components of MediaWiki, primarily because they are used in every web request. The PHP message files are large, since they store thousands of message keys and values. Loading such a file (and possibly several, if the user's language differs from the content language) has a large memory and performance cost. An aggressive, layered caching system is used to reduce this impact.

MediaWiki has lots of caching mechanisms built in, which make the code somewhat more difficult to understand. Since 1.16 there is a new caching system, which caches messages either in .cdb files or in the database. Customised messages are cached in the filesystem and in memcached (or alternative), depending on the configuration.

The table below gives an overview of the settings involved:

Location of cache storage, by $wgLocalisationCacheConf 'store' value (columns) and $wgCacheDirectory (rows):

  $wgCacheDirectory    'store' => 'db'     'store' => 'detect' (default)   'store' => 'files'
  = false (default)    l10n cache table    l10n cache table                error (undefined path)
  = path               l10n cache table    local filesystem                local filesystem
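The same decision table can be written as a small function. This is a Python sketch for illustration only. The function name and return strings are invented, and MediaWiki's real logic lives in PHP and may handle more store types.

```python
def resolve_l10n_store(store="detect", cache_directory=None):
    """Return where the localisation cache ends up for the given settings.

    `store` mirrors the 'store' key of $wgLocalisationCacheConf and
    `cache_directory` mirrors $wgCacheDirectory (None stands for false).
    """
    if store == "db":
        return "l10n cache table"
    if store == "detect":
        # 'detect' prefers the filesystem, but only if a directory is configured.
        return "local filesystem" if cache_directory else "l10n cache table"
    if store == "files":
        if not cache_directory:
            raise ValueError("'files' store requires a cache directory")
        return "local filesystem"
    raise ValueError(f"store type not covered by this sketch: {store!r}")
```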

Function backtrace

To depict the layers of caching more clearly, here is a function backtrace of the methods called when retrieving a message. See the sections below for an explanation of each layer.

  • Message::fetchMessage()
  • MessageCache::get()
  • Language::getMessage()
  • LocalisationCache::getSubitem()
  • LCStore::get()

MessageCache

The MessageCache class is the top level of caching for messages. It is called from the Message class and returns the final raw contents of a message. This layer handles the following logic:

  • Checking for message overrides in the database
  • Caching overridden messages in Memcached, or whatever $wgMessageCacheType is set to
  • Resolving the remainder of the language fallback sequence

The last bullet is important. Language fallbacks allow MediaWiki to fall back on another language if the original does not have a message being asked for. As mentioned in the next section, most of the language fallback resolution occurs at a lower level. However, only the MessageCache layer checks the database for overridden messages. Thus integrating overridden messages from the database into the fallback chain is done here. If not using the database, this entire layer can be disabled.
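As a rough illustration of how database overrides can be woven into a fallback chain, consider the Python sketch below. It assumes one plausible ordering (for each language in the chain, check the override first, then the file-based message); the actual order and data structures used by MessageCache may differ, and all names here are hypothetical.

```python
def get_message(key, lang, fallbacks, overrides, file_messages):
    """Look up a message through the language fallback chain.

    fallbacks:     maps a language code to its list of fallback codes
    overrides:     maps (language code, key) to text customised on the wiki
    file_messages: maps a language code to its messages from message files
    """
    # Requested language first, then its fallbacks, then English as last resort.
    chain = list(dict.fromkeys([lang, *fallbacks.get(lang, []), "en"]))
    for code in chain:
        if (code, key) in overrides:
            return overrides[(code, key)]
        if key in file_messages.get(code, {}):
            return file_messages[code][key]
    return None
```

In this sketch, an override saved for a fallback language wins over the shipped text of that same language, which is the kind of integration a lower layer that never sees the database cannot perform.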

LocalisationCache

See LocalisationCache

LCStore

The LCStore class is merely a back-end implementation used by the LocalisationCache class for actually caching and retrieving messages. Like the BagOStuff class, which is used for general caching in MediaWiki, there are a number of different cache types (configured using $wgLocalisationCacheConf):

  • "db" (used by default when $wgCacheDirectory is not set) - Caches messages in the database
  • "file" (used by default when $wgCacheDirectory is set) - Uses CDB to cache messages in a local file
  • "accel" - Uses APC or another opcode cache to store the data

The "file" option is used by the Wikimedia Foundation and is recommended because it is faster than going to the database and more reliable than the APC cache, especially since APC is incompatible with PHP versions 5.5 or later.

License

Any edits made to the language must be licensed under the terms of the GNU General Public License (and GFDL?) to be included in the MediaWiki software.

Old local translation system

With MediaWiki 1.3.0 a new system was set up for localizing MediaWiki. Instead of editing the language file and asking developers to apply the change, users can edit the interface strings directly from their wikis. This is the system in use as of August 2005. People can find the message they want to translate in Special:AllMessages and then edit the relevant string in the MediaWiki: namespace. Once edited, these changes are live. There is no more need to request an update, and wait for developers to check and update the file.

The system is great for Wikipedia projects; however a side effect is that the MediaWiki language files shipped with the software are no longer quite up-to-date, and it is harder for developers to keep the files on meta in sync with the real language files.

As the default language files do not provide enough translated material, we face two problems:

  1. New Wikimedia projects created in a language which has not been updated for a long time, need a total re-translation of the interface.
  2. Other users of MediaWiki (including Wikimedia projects in the same language) are left with untranslated interfaces. This is especially unfortunate for the smaller languages which don't have many translators.

This is not such a big issue anymore, because translatewiki.net is advertised prominently and used for almost all translations. Local translations still happen sometimes, but they are strongly discouraged. Local messages mostly have to be deleted, moving the relevant translations to translatewiki.net and leaving only the site-specific customisation on the wiki; there is a huge backlog, especially in older projects, and this tool helps with cleanup.

Keeping messages centralized and in sync

English messages are very rarely out of sync with the code. Experience has shown that it's convenient to have all the English messages in the same place. Revising the English text can be done without reference to the code, just like translation can. Programmers sometimes make very poor choices for the default text.

Appendix

What can be localised

So many things are localisable in MediaWiki that not all of them are directly available on translatewiki.net: see translatewiki:Translating:MediaWiki. If something requires developer intervention in the code, you can request it on Bugzilla, or ask at translatewiki:Support if you don't know exactly what to do.

Graph of languages fallback
  • Fallback (that is, other more closely related language(s) to use when a translation is not available, instead of the default fallback, which is English)
  • Directionality (left to right or right to left, RTL)
  • Direction mark character depending on RTL
  • Arrow depending on RTL
  • Languages where italics cannot be used
  • Number formatting (commafy i.e. adding or not digits separators; transform digits; transform separators)[3]
  • Truncate (multibyte)
  • Grammar conversions for inflected languages
  • Plural transformations
  • Formatting expiry times[clarification needed]
  • Segmenting for diffs (Chinese)
  • Convert to variants of language (between different orthographies, or scripts)
  • Language specific user preference options
  • Link trails and link prefix, e.g.: [[foo]]bar These are letters that can be glued after/before the closing/opening brackets of a wiki link, but appear rendered on the screen as if part of the link (that is, clickable and in the same color). By default the link trail is "a-z"; you may want to add the accented or non-Latin letters used by your language to the list.
  • Language code (preferably chosen according to the latest RFC in BCP 47, currently RFC 5646, with its associated IANA database; avoid deprecated, grandfathered and private-use codes: look at what they mean in ISO 639, and avoid codes assigned to collections or families of languages in ISO 639-5, as well as ISO 639 codes that were not imported into the IANA database for BCP 47)
  • Type of emphasizing
  • The Cite extension has a special page file per language, cite_text-zxx for language code zxx.
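The link trail item in the list above lends itself to a small illustration. The Python sketch below is not MediaWiki's parser; the function name is hypothetical and the character class is passed in as a plain string. It splits the text following a closing ]] into the part that is rendered as if inside the link and the remainder.

```python
import re


def split_link_trail(text_after_link, trail_chars="a-z"):
    """Split text following ']]' into (link trail, remaining text).

    `trail_chars` is the body of a regex character class; the default "a-z"
    matches the default link trail described above.
    """
    pattern = re.compile(rf"^([{trail_chars}]+)(.*)$", re.DOTALL)
    match = pattern.match(text_after_link)
    if match is None:
        return "", text_after_link
    return match.group(1), match.group(2)
```

With the default setting, an accented letter directly after the link ends the trail at once, which is why a language may want to extend the set, for example with trail_chars="a-zåäö".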

Neat functionality:

  • I18N sprintfDate
  • Roman numeral formatting

Namespace name aliases

Namespace name aliases are additional names which can be used to address existing namespaces. They are rarely needed, but when they are needed and missing, the result is usually havoc in existing wikis.

You need namespace name aliases:

  1. When a language has variants, and these variants spell some namespaces differently, and you want editors to be able to use the variant spellings. Variants are selectable in the user preferences. Users always see their selected variant, except in wikitext, but when editing or searching, an arbitrary variant can be used.
  2. When a wiki exists, and its language, fallback language(s), or localization is changed, and some namespace names change with it. So as not to break links already present in the wiki that use the old namespace names, you need to add each of the changed previous namespace names to the namespace name aliases when, or before, the change is made.

The generic English namespace names are always present as namespace name aliases in all localizations, so you need not, and should not, add those.
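How aliases keep old links working can be sketched as a lookup table. The Python below is illustrative only: the function names, the sample namespace names and the simplified normalisation (case folding, spaces to underscores) are assumptions, not MediaWiki's implementation.

```python
def normalize(name):
    """Simplified normalisation, assumed for this sketch only."""
    return name.strip().replace(" ", "_").lower()


def build_namespace_lookup(english_names, localized_names, aliases):
    """Map every accepted namespace name to its numeric namespace id.

    english_names and localized_names map id -> name; aliases maps name -> id.
    The generic English names are always included, so they never need to be
    listed as aliases.
    """
    lookup = {}
    for ns_id, name in english_names.items():
        lookup[normalize(name)] = ns_id
    for ns_id, name in localized_names.items():
        lookup[normalize(name)] = ns_id
    for name, ns_id in aliases.items():
        lookup[normalize(name)] = ns_id
    return lookup
```

If a wiki renames a namespace, adding the previous name to the aliases mapping means existing links that use it still resolve to the same id.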

Aliases can't be translated on translatewiki.net, but can be requested there or on bugzilla: see translatewiki:Translating:MediaWiki#Namespace name aliases.

Regional settings

Some linguistic settings vary across geographies; MediaWiki doesn't have a concept of region, it only has languages. These settings need to be set once as a language's default, then individual wikis can change them as they wish in their configuration.

Time and date formats

Times and dates are shown on special pages and the like. The default time and date format is used for signatures, so it should be the most used and most widely understood format for users of that language. Anonymous users also see the default format. Registered users can choose other formats in their preferences.

If you are familiar with the format string of PHP's date() function, you can try to construct formats yourself. MediaWiki uses a similar format string, with some extra features. If you don't understand the previous sentence, that's OK: you can provide a list of examples for developers instead.
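For readers who have not met such format strings, the Python sketch below interprets a handful of the codes used by PHP's date() function (H, i, j, d, n, m, F, Y). It is a toy formatter for illustration: it ignores escaping and MediaWiki's extra features, and the function name and sample format are assumptions.

```python
from datetime import datetime

ENGLISH_MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(fmt, moment, month_names=ENGLISH_MONTHS):
    """Expand a small subset of PHP date()-style codes; other characters pass through."""
    codes = {
        "H": f"{moment.hour:02d}",         # 24-hour clock, leading zero
        "i": f"{moment.minute:02d}",       # minutes, leading zero
        "j": str(moment.day),              # day of month, no leading zero
        "d": f"{moment.day:02d}",          # day of month, leading zero
        "n": str(moment.month),            # month number, no leading zero
        "m": f"{moment.month:02d}",        # month number, leading zero
        "F": month_names[moment.month - 1],  # full month name
        "Y": f"{moment.year:04d}",         # four-digit year
    }
    return "".join(codes.get(char, char) for char in fmt)
```

Passing a translated list as month_names shows why a language needs both a format string and localised names: the order and punctuation come from the format, the words from the translation.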

Edit window toolbar buttons

Not to be confused with WikiEditor's "advanced toolbar", which has similar features.

When a wiki page is being edited, and a user has allowed it in their Special:Preferences, a set of icons is displayed above the text area where one can edit. The toolbar buttons can be set [2] but there are no messages for it. What we need is a set of properly sized .png files. Plenty of samples can be found in commons:Category:ButtonToolbar, and there is an empty button image to start off from.

Note, this can only be done when your language is already enabled in MediaWiki, which usually means a good portion of its messages have been translated; otherwise you must just wait, and have it done later.

Missing

This page lacks a section on the changes in the i18n system related to extensions. The format was standardized and messages are now loaded automatically. See Message sources.

References

  1. http://dev.w3.org/csswg/css3-writing-modes/
  2. Which works through the localisation cache and for instance on Wikimedia projects updates it daily; see also the technical details about the specific implementation.
  3. These are configured by language in the respective language/classes/LanguageXx.php or language/messages/MessagesXx.php files.

See also