Help:Extension:Translate/Page translation administration

What. The page translation feature allows controlled translation of wiki pages into other languages. This means that the content of each translation will usually be equivalent to the source page. This is in contrast to, for example, the different language versions of articles in different Wikipedias, which are fully independent of each other. It is assumed that pages are only translated from one primary language into other languages, but translators can also take advantage of translations in other languages if they exist.

Why. Without any help, translating more than a few pages into other languages becomes a time-waster at best and an unmaintainable mess at worst. With the page translation feature you can avoid the mess and bring structure to the translation process. The core idea is that the source text is segmented into smaller units, each of which is translated individually. Because the source text is segmented into units, all changes can be isolated, and translators only need to update the translations of units whose source text has changed. It also enables translators to work on units of manageable size, to share the work between multiple translators, and to continue the translation in later sessions, because they don't need to do it all at once.

Who. This page elaborates on the page translation tutorial by providing deeper insight into how the system works and suggesting best practices for a wide variety of cases. This page is intended for page translation administrators and, more generally, for everyone who edits the source text of translatable pages, even if they don't have access to the administrative features for approving changes for translation. Development-oriented topics, including known issues and future plans, are documented on the page translation reference page.

Life of a translatable page
Roles. Multiple people are involved in the process of writing and translating a wiki page: the initial writer creates a page, someone corrects spelling errors, a page translation administrator marks the page for translation, translators translate, someone makes changes to the page, a page translation administrator marks those changes for translation and translators update translations. Those roles may overlap more or less, but the ultimate responsibility for a hassle-free translation rests with the page translation administrator. The administrator decides when the page is ready for translation for the first time, ensures that the segmentation serves a purpose and approves (or corrects) changes.

Preparation. To have something translated, you have to write it first. If you have already done translation without the Translate extension, see the section about migrating translations below. If you want many translations quickly, it is crucial for the source text to be in good shape. Before marking a page for translation, ask someone else to proofread it and, if possible, ask a language specialist to make the text clearer and more concise. Difficult vocabulary and hard-to-understand sentences are a showstopper for many volunteer translators. Markup can also cause problems for translators, but as a translation administrator you can avoid those issues; see the section about handling markup below. Naturally, the changes you make to the source text force an update of all existing translations, so it is better to wait until the contents of the page have stabilized. On the other hand, changes do happen often, and the system handles them well, so check out the section about handling changes below.

Tagging. When the text is otherwise ready for translation, anyone can mark the translatable parts by wrapping them inside <translate> tags and adding the <languages /> bar to the page. The latter adds a list of all translations of the page, with their completion and up-to-date percentages. There is no other indication that translations exist. See below how to actually do the tagging. The system will detect when the tags are placed on the translatable page, and the page will then have a link to mark it for translation. It will also complain and prevent saving if you, for example, forgot to add a closing tag. The translatable page will also be listed on Special:PageTranslation as ready for marking.
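
As a minimal sketch of what a tagged page might look like before it is marked, assuming a short made-up page about birds (the sentences are purely illustrative), the languages bar goes at the top and the translatable text sits inside one pair of translate tags:

```wikitext
<languages />
<translate>
Birds are animals which have feathers.

Most birds can fly.
</translate>
```

In this sketch the tags are on their own lines and the two paragraphs are separated by an empty line, so each paragraph would be offered to translators separately once the page is marked.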

Marking. After the tagging, a translation administrator marks the page for translation. The interface is explained in /Page translation example. The translation administrator's responsibility is to make sure that the segmentation makes sense and that the tagging has been done properly. A page can be marked again if it has changed since the previous time it was marked. See below how to make changes that cause minimal disruption. Marking the page starts a background process that uses MediaWiki's job queue. This process goes through each translation subpage and regenerates it. Changes in the template will be reflected, and outdated translations will be marked as such. The translation interface is updated immediately. Translating new units may not work until the background process has also updated the message index, a mapping from translation unit pages to message keys and on to the message groups (in this case translatable pages) they belong to.

Changes. Users can keep making changes to the translatable page source. The changes are visible to users viewing the source language version, but translations are made against the latest marked version. Translation pages therefore also show the old version and can be 100% up to date even if the source page has unmarked changes. There is also a translation subpage with the language code of the source language. This page is not linked from the interface, but it is handy when you want to, for example, include the page (when translating templates) or export it (translation pages don't contain the extra tags and other markup used to mark the translatable units). You can easily see whether there are unmarked changes when viewing the translatable page in the source language: a notice at the top says that you can translate this page and also links to the changes, if there are any.

Discouraging. Some translatable pages have content that is only interesting for a certain period of time, for example announcements and regular status updates like the Wikimedia monthly highlights. You can keep those pages around with their translations, but hide them from the translation interface. This does not prevent further translation of the pages, but it greatly reduces the chance that a user accidentally starts translating the page. Discouraging, and its reverse, encouraging, is done from Special:TranslatablePages.

Prioritizing languages. You can also define a list of languages that you specifically want translations into. When translating into a language that is not in the priority list, translators are given a notice. You can also prevent translation into other languages, say, if you need to use the translations in other systems that can only handle a limited number of languages. The page will then appear as if it were discouraged for the languages not in the priority list.

Grouping. It is possible to group related pages together, say, the documentation of one MediaWiki extension. These aggregate groups work like all other message groups. They have their own statistics and contain all the messages of the subgroups (translatable pages). This functionality is currently in Special:AggregateGroups. Aggregate message groups are collapsed by default in Special:LanguageStats.

Moving. You can move translatable pages like you would move any other page. When moving, you can choose whether you want to move any non-translation subpages too. Moving also uses a background job to move the many related pages. While the move is in progress, it is not possible to translate the page. Completion is noted in the page translation log.

Deleting. Like moving, deleting is accessed from the normal place. You can either delete the whole translatable page, or just one translation of it. To delete one translation, go to the translation subpage and then access delete. Like moving, a background process will delete the pages over time. Completion is noted in the page translation log.

It is also possible to unmark a page. First you need to remove all <translate> tags from the page. Then you can use Special:PageTranslation (section "broken pages") or follow the link on the translatable page to do it. This removes any page translation related structures, but leaves all the existing pages in place, freely editable. This is not recommended.

Anatomy of translatable page
A translatable page will produce many pages when translated.
 * Page (the source page or translatable page)
 * Page/<language code> (the translation pages, also including a copy of the source page without markup, which acts like the other translations)
 * Translations:Page/<translation unit identifier>/<language code> (all the individually translated units)

In addition, the translatable page template and the source text of each translation unit are stored in the database. The system keeps track of which versions of the source page contain translate tags and which of those versions have been marked for translation. Page translation features are accessed via Special:PageTranslation.

Each time the translation of a unit is updated, the system also regenerates the corresponding translation page. This results in two edits. The translation unit edit is hidden by default in recent changes and can be shown by choosing 'show translations' from the translation filter. Actions other than editing on the source pages will not trigger regeneration of the translation page.

Segmentation and markup
General principles:
 * 1) All text intended for translation must be wrapped inside translate tags. There can be multiple tag pairs in one page.
 * 2) Everything outside those tags will not change in any translated version. This static text, together with the placeholders for translatable units, is called the translatable page template.
 * 3) Too much markup in the text makes it difficult for translators to translate. Place the translate tags in a more fine-grained way when there is a lot of markup.
 * 4) The text inside translate tags is split into units wherever there are one or more empty lines.
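
The principles above can be illustrated with a small sketch. The page text is made up for illustration; under the rules just listed, it would yield three translation units, while the two lines outside the tags would be part of the translatable page template:

```wikitext
This line is outside the tags, so it stays the same in every translation.

<translate>
This paragraph becomes the first translation unit.

This paragraph is separated by an empty line, so it becomes the second unit.
</translate>

This untranslated line sits between the two tag pairs.

<translate>
A second tag pair adds a third unit.
</translate>
```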

Restrictions. The page translation feature places some restrictions on the text. There should not be any markup that spans two or more sections. In other words, each paragraph should be standalone and complete in isolation. This is currently not enforced by the software, but violating it will cause invalid rendering of the page, with the severity depending on whether the resulting HTML is fixed by tidy or not.
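
A hedged sketch of this restriction, using a made-up div with a hypothetical class name: in the first variant the element opens in one unit and closes in another, which is the situation to avoid; in the second the element stays in the template and each unit is complete on its own.

```wikitext
<!-- Problematic: the div opens in the first unit and closes in the second -->
<translate>
<div class="note">This note starts in one unit

and ends in another.</div>
</translate>

<!-- Safer: the div is outside the tags, each unit stands alone -->
<div class="note">
<translate>
This note starts in one unit

and continues in another.
</translate>
</div>
```

The explanatory comments are placed outside the translate tags so that they would not end up inside any unit.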

Parsing order. Beware: the translate tags work differently from other tags, because they do not go through the parser. This should not usually cause problems, but it may if you are trying something fancy. In more detail, they are parsed before any other tags like <pre> or <source>, but after <nowiki>.

Tag placing. The extension has simple whitespace handling: whitespace is preserved, except if a starting or ending tag is the only thing on a line. In that case the trailing newline character after the starting tag is eaten, and similarly the preceding newline character before the closing tag. This only means that the tags don't cause extra new lines in the rendered version of the page. If possible, try to put the tags on their own lines, with no empty lines between the content and the tags. Sometimes this is not possible, for example if you want to translate some content surrounded by markup, but not the markup itself. This is fine too.
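
Both placements can be sketched as follows. The template name Note is hypothetical and only stands in for any markup that surrounds translatable content:

```wikitext
<translate>
This paragraph has the tags on their own lines, with no empty lines next to them.
</translate>

{{Note|<translate>Only this parameter value is offered for translation.</translate>}}
```

In the second case the tags are inline because the template call itself should not become part of the unit.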

Variables. It is possible to use variables similar to template variables. The syntax for this is <tvar|name>contents</>. For translators these show up only as $name, and they are automatically replaced by the value in translation pages. Variables can be used to hide untranslatable content in the middle of a translation unit. They also work for things like numbers that need to be updated: you can update the number in all translations by re-marking the page, without invalidating translations.
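
A small sketch of a variable in use, written in the tvar form with its `</>` closing marker. The sentence, the variable name count and the number are all made up for illustration:

```wikitext
<translate>
The wiki has <tvar|count>1,234</> translatable pages.
</translate>
```

Translators would be expected to see this unit as "The wiki has $count translatable pages.", so the number can change later without touching their text.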

Markup examples
Below are listed some alternatives and suggested ways to handle different kinds of wiki markup.

Categories. Categories can be added in two ways: in the translatable page template or in translation units. If you have the categories in the template, all translations will end up in the same category. If you put them up for translation, you should teach the users a naming scheme. The schemes listed here are independent of the technical means used to adopt them.
 * All translations in same category (good if only few languages, bad if many)
 * Category name not translated (can be put as is in the translation template)
 * Category page name not translated (just like the page names)
 * One category for each language
 * Page translation could be used for the category itself and the categories would be linked together and the headers would be translated (but not the name of the category in links and such)

Examples:
 * No translation: Category:Cars
 * Translation by adding language suffix: Category:Cars/fi (recommended)

Headers. Headers can in principle be tied to the following paragraph, but it is better to keep them separate. This way someone can quickly translate the table of contents before going into the contents. When tagging headers, it is important to include the header markup inside the tags, or MediaWiki will no longer identify them properly, for example when trying to edit a specific section of the source page. The markup also immediately gives the translator context: they are translating a header.

Wrong: == <translate>Culture</translate> ==

Correct: <translate>== Culture ==</translate>

Suggested segmentation:

<translate>
== Culture ==

Lorem ipsum dolor.
</translate>

Images. Images that contain language-specific content, such as text, should have the full image syntax inside a unit. For other images you can tag only the description, optionally adding a hint to the message documentation after the page has been marked.

Links. Links can be included in the paragraph they are inside.

<translate>Helsinki is capital of Finland.</translate>

Lists. Lists can get long, so you might want to split them into multiple parts, with for example five items in each, as follows. Do so only if the items are sufficiently independent to be translated separately in all languages. Don't create "lego messages": avoid splitting a single sentence into multiple units, or separating logically dependent parts which may affect each other (with regard to punctuation or the style of the list, for instance).

<translate>
* General principles
* Headings
* Images
* Tables
* Categories
</translate>
<translate>
* Links
* Templates
</translate>

Numbers. With numbers and other non-linguistic elements, you may want to pull the actual number out of the translation and make it a variable. This has multiple benefits:
 * You can update the number without invalidating translations.
 * Translation memory can work better when the changing number is ignored.

<translate>Income this month <tvar|income>{{formatnum:...}}</> EUR</translate>

Note that this prevents the translators from localising the number by doing currency conversion. The formatnum call makes sure the number is formatted correctly in the target language.

Templates. Templates have varying functions and purposes, so the best solution depends on what the template is for. If the template is not part of a longer paragraph, it should be left out of the units, unless it has parameters that need to be translated. If the template has no linguistic content itself, you don't need to do anything for the template itself.

For an example of a template translated with page translation, see Template:Extension-Translate. To use this template, you need another template similar to Template:Translatable_navigation_template, because you can no longer include the template directly by its name. This is not yet provided by the Translate extension itself, but it is planned.

Another way is to use the unstructured element translation to translate the template, but then the language of the template will follow the user's interface language, not the language of the page they are viewing.
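
The image guidance above can be sketched like this. Both file names and captions are hypothetical; the first image is assumed to contain no text, the second is assumed to contain English labels:

```wikitext
<!-- No text in the image: only the caption is translatable -->
[[File:Example bird.jpg|thumb|<translate>A bird sitting on a branch</translate>]]

<!-- Text in the image: the whole image syntax is inside the unit,
     so a translation can point to a localised file -->
<translate>
[[File:Bird anatomy diagram.png|thumb|Parts of a bird]]
</translate>
```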

Changing the source text
General principles:
 * Avoid changes
 * Make the changes as isolated as possible
 * Do not add unit markers yourself

Unit markers. When a page is marked for translation, the system updates the translatable page source and adds a unique identifier for each translation unit. See the example below. These markers are crucial for the system, which uses them to track changes to each translation unit. You should never add unit markers yourself. The markers are always on the line before the unit or, if the unit starts with a header, after the first header on the same line. The reason for the different placement with headers is to keep section editing working as expected.

<translate>
== Birds == <!--T:1-->
Birds are animals which....

<!--T:2-->
Birds can fly and...
</translate>

Changing unit text. Changing is the most common operation for translation units. You can fix a spelling mistake, correct grammar or make other changes to the unit. When re-marking, you will see the difference in the unit text. The same difference is also shown to translators when they update their translations. For simple spelling fixes, you can opt out of invalidating existing translations. The translators will still see the difference if they ever update the translation for any reason. If you make changes that alter the meaning considerably, you might want to remove the unit marker to prevent outdated information in translations. In this case translators have to translate the message from scratch, although translation memory can help.
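
As an illustrative sketch with made-up text: in the source below, the first unit only had its wording polished and keeps its marker, while the second unit was rewritten so that its meaning changed, and its marker (assumed to have been T:2) was removed by the editor so that the unit is treated as new at the next marking.

```wikitext
<translate>
<!--T:1-->
Birds are animals with feathers and wings.

Most birds can fly, but penguins and ostriches cannot.
</translate>
```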

Adding new text. You can freely add new text inside translate tags. Make sure that there is one empty line between adjacent units, so that the system will see it as a new unit. You can also add translate tags around the new text, if it is not inside existing translate tags. Again, do not add unit markers yourself; the system will do it.
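
A sketch of adding a paragraph between two existing units, with made-up text. The new paragraph in the middle has empty lines on both sides and no marker; the existing markers are left untouched:

```wikitext
<translate>
<!--T:1-->
Birds are animals which have feathers.

Many birds migrate in the autumn.

<!--T:2-->
Birds can fly and build nests.
</translate>
```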

Deleting text. You can delete whole units, and if you do so, also remove the unit marker.

Splitting units. You can split existing units by adding an empty line in the middle of a unit, or by placing translate tags so that they split the unit. You can either keep the unit marker with the first unit or remove it altogether. In the first case, when the page is re-marked, the old translation remains visible but is marked as outdated. The new unit will appear in the source language and will repeat content already covered by the latter half of the old translation, which belongs to the new unit. If you removed the unit marker, both units will appear in the source language when the page is re-marked.
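
A before-and-after sketch of a split that keeps the marker with the first half. The text and the marker number are made up, and the explanatory comments sit outside the translate tags so they would not become part of a unit:

```wikitext
<!-- Before the split -->
<translate>
<!--T:3-->
Birds eat seeds and insects. Some birds also eat fish.
</translate>

<!-- After the split, keeping the marker with the first half -->
<translate>
<!--T:3-->
Birds eat seeds and insects.

Some birds also eat fish.
</translate>
```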

Merging units. If you merge units, it is recommended that you remove the unit markers. If you only keep the first unit marker, the translation page will not show the text of the latter unit until the translation is updated.

Moving units. You can move units around without invalidating translations - just keep the unit marker with the rest of the unit.

Reviewing changes. When marking a new version for translation, wikitext differences are shown for each section. New and deleted sections are also listed. The person who marks can then do the final review, before creating new work for translators.

Before marking the new version of the page for translation, ensure that the best practices are followed, especially that translators get a new section if the content has changed. Also make sure that there are no unnecessary changes, so that translators' time is not wasted. If the source page is getting many changes, it may be worthwhile to wait for it to stabilize, and only after that push the work to translators.

The software does not check whether a previously used section id has been made unused and then taken into use again. Such messages will be shown to translators with a difference, like a changed message. Unused unit translations are not deleted automatically, but that should not cause trouble.

Migrating to page translation
If you have been translating pages before using the page translation system, you might want to migrate the pages to the new system, at least the ones for which you still expect new translations and want statistics. You will probably have existing language switching templates and maybe different page naming conventions.

You can start the migration by cleaning up, tagging and marking the source page. You can keep the existing language switching templates while you migrate the old translations. If your pages follow the language code subpage naming convention, they will be replaced with the source text, but you can still access the translations from the page history.

This is manual work: you have to open the old translation page and copy and paste translations from there into the correct translation units in the new system, using the translation interface. For this you need to know roughly which part of the translation matches which part of the old text (and hope they match). You might want to consider marking all the migrated translations fuzzy by prepending the string !!FUZZY!! to the translations, and have a translator look at them. Once the translations are migrated, you can delete the old translation pages if they are not using the same naming convention. Once all pages are migrated, you can also remove the old language navigation templates.
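
As a sketch of the fuzzy marking described above, this is what might be pasted into the translation interface for one unit. The Finnish sentence is a made-up migrated translation, used only for illustration:

```wikitext
!!FUZZY!!Linnut ovat eläimiä, joilla on höyhenet.
```

The prefix goes at the very start of the translation text, directly followed by the old translation.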