Content translation/Documentation/Comparison with the Translate extension

The Language team works on two major MediaWiki extensions, both of which are used for translation: Content Translation and Translate. This document explains the differences between them and why both extensions are needed.

Content Translation extension
Content Translation is used for creating Wikipedia articles by translating an existing article about the same topic from another language. See a short video about Content Translation.

In the screenshot you can see the translation of the article Beit Hakerem, Jerusalem from English into Japanese. Machine translation using Google Translate is enabled, but the result is not yet edited, and probably contains errors in grammar and vocabulary. The paragraph is marked in yellow to attract the translator’s attention to the need to correct these errors.

Translate extension
Translate is used for three kinds of tasks:

 * 1) Translating the user interface of software: This is used on the translatewiki.net website for translating the user interface of several free software projects: MediaWiki, MediaWiki extensions, Wikimedia mobile apps, the Pageviews tool, and also several projects not related to Wikipedia, such as Etherpad and OpenStreetMap. In this capacity, it is integrated with the Gerrit source code management system: all the new and updated translatable user interface messages (strings) in English are semi-automatically copied from Gerrit to translatewiki, and all the translated strings are semi-automatically copied from translatewiki back to Gerrit.
 * 2) Translating wiki pages on multilingual wikis and community wikis: Commons, Wikidata, Meta, mediawiki.org, and several others.
 * 3) Translating CentralNotice banners for fundraising, article writing campaigns, and other purposes.

The screenshot shows translating the weekly Tech News on meta.wikimedia.org from English to Hebrew. The text is divided into short sentences. Some technical parts that don’t need to be translated are packed as variables. The Suggestions in the right-hand sidebar show translation memory: a past translation of a similar sentence.
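The translation memory suggestions mentioned in the screenshot description can be illustrated with a small sketch. It is not how the Translate extension implements translation memory; it only shows the general idea of ranking past translations by how similar their source sentence is to the one being translated. The function name `suggest`, the `memory` structure, and the similarity threshold are all hypothetical, and the similarity measure is Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def suggest(source, memory, threshold=0.7):
    """Return (score, translation) pairs for similar past sentences, best first.

    `memory` is a list of (past_source, past_translation) tuples.
    The names and the threshold are illustrative assumptions.
    """
    scored = []
    for past_source, past_translation in memory:
        # ratio() returns a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, source.lower(), past_source.lower()).ratio()
        if score >= threshold:
            scored.append((score, past_translation))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


memory = [
    ("Not all changes will affect you.", "past translation A"),
    ("Translations are available.", "past translation B"),
]
suggestions = suggest("Not all changes will affect you!", memory)
```

In this sketch, a sentence that differs from a past source sentence only in punctuation scores close to 1.0 and is offered as a suggestion, while unrelated sentences fall below the threshold and are not shown.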

The second task of Translate, “translating wiki pages”, may sound very similar to translating Wikipedia articles. However, there is a significant difference between the two. The pages for which the Translate extension is used tend to be tightly structured and stable. Some examples:


 * Newsletters, which are written for a few days, translated, sent out, and never modified again. For example, the weekly Tech News.
 * Software user manuals, which are written over time, but eventually become stable. For example, the VisualEditor user guide on mediawiki.org.
 * Community policies, for example the Deletion policy on Commons.
 * Legal documents, such as the Terms of use.

Pages that are translatable using the Translate extension have to be prepared for translation: all the parts that have to be translated must be marked with XML-like <translate> tags. The tags indicate which pages can be translated, divide the long text into small units that are easy to translate one by one, and help translators skip the parts that don’t have to be translated (images, templates, tables, numbers, code examples, etc.). The division into small units also lets the software track changes and flag outdated units, so that translators can update them separately from the parts that weren’t modified.
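The division into units and the tracking of outdated units can be sketched in a few lines. This is a simplified illustration, not the extension's actual code: it assumes that units are the blank-line-separated paragraphs inside <translate> tags and that a unit is identified by its position on the page. The function names `split_into_units` and `outdated_units` are hypothetical.

```python
import re

# Text between <translate> and </translate>; everything else is skipped.
TRANSLATE_BLOCK = re.compile(r"<translate>(.*?)</translate>", re.DOTALL)


def split_into_units(wikitext):
    """Return the paragraphs found inside <translate> blocks, in page order."""
    units = []
    for block in TRANSLATE_BLOCK.findall(wikitext):
        # Assumption: a blank line separates one unit from the next.
        for paragraph in re.split(r"\n\s*\n", block):
            paragraph = paragraph.strip()
            if paragraph:
                units.append(paragraph)
    return units


def outdated_units(translated_from, current_units):
    """Return positions of units that are new or changed since translation.

    `translated_from` holds the source text each existing translation
    was based on, by position (a simplification for this sketch).
    """
    return [
        index
        for index, text in enumerate(current_units)
        if index >= len(translated_from) or translated_from[index] != text
    ]


old_page = (
    "<languages/>\n<translate>\nFirst paragraph.\n\nSecond paragraph.\n"
    "</translate>\n[[File:Example.png]]\n<translate>\nThird paragraph.\n</translate>"
)
new_page = old_page.replace("Second paragraph.", "Second paragraph, revised.")

old_units = split_into_units(old_page)
new_units = split_into_units(new_page)
changed = outdated_units(old_units, new_units)
```

Here the image between the two <translate> blocks never reaches the translator, and only the revised second unit is reported as outdated, so the other translations can stay as they are.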

Adding such tags to Wikipedia articles, however, would be extremely inconvenient for Wikipedia editors. Unlike user manuals or legal documents, Wikipedia articles change frequently and unexpectedly, both in their text and their structure. For editors who mostly edit in one language, seeing <translate> tags everywhere in the text would make editing harder.

To demonstrate this, here is what the wikitext source of the same Tech News section looks like:

<languages/> <translate> Latest [[<tvar|technews>m:Special:MyLanguage/Tech/News</>|tech news]] from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. [[<tvar|more-transl>m:Special:MyLanguage/Tech/News/2020/27</>|Translations]] are available.</translate>

This has a very large amount of markup, such as the <languages/> and <translate> tags and more, which gets in the way of editing the text. Therefore, the Content Translation extension was developed as a distinct product for translating Wikipedia articles. No technical preparation is needed to translate an article using this extension, and the workflow of the editors who write the source article is not affected in any way. Content Translation also focuses on creating the first version of a translated article rather than keeping the article’s translation up to date with the source article. This is a trade-off: it would be quite useful to see how a Wikipedia article changed since it was first translated, but tracking changes the way the Translate extension does would not scale well.

In addition, Content Translation recognizes that Wikipedias have different writing styles across the different languages. It doesn’t force the translators to stick strictly to the content and the structure of the source article. This is different from the Translate extension, which strongly encourages precise translation and forces identical page structure.

And finally, Content Translation gets translators to use only rich-text WYSIWYG editing, using VisualEditor as a component. Editing in wiki syntax is not allowed. This is done for two reasons: to make it generally easier for new Wikipedia users who are not familiar with wikitext, Wikipedia’s markup language, and to make it easy to adapt content such as images, links, and templates semi-automatically from the source article to the target article. In contrast, the Translate extension uses only wikitext source editing, because it is targeted at more experienced editors and needs very precise and fine-grained formatting.