Content translation/Documentation/Comparison with the Translate extension

From mediawiki.org

The Language team works on two major MediaWiki extensions, both of which are used for translation: Content Translation and Translate. This document explains the differences between them and why both extensions are needed.

Summary table

| Content Translation | Translate |
| --- | --- |
| Translate any Wikipedia article, without any technical preparation | Translate wiki pages prepared for translation on multilingual sites, such as Commons and mediawiki.org, and software user interface on translatewiki.net |
| Content adaptation encouraged | Content structure enforced |
| No support for updating existing content | Status tracking and surfacing of outdated content needing update |
| No collaboration | Collaboration possible and explicit proofreading (review) |
| Side-by-side editor with automatic adaptation | Unit-focused editor with translation memory and message documentation |
| Visual editing | Wiki syntax editing |

Content Translation extension

Content Translation is used to create Wikipedia articles[1] by translating an existing article on the same topic from another language. See a short video about Content Translation.

[Screenshot: Beit Hakerem Content Translation demo]

The screenshot shows the article Beit Hakerem, Jerusalem being translated from English into Japanese. Machine translation using Google Translate is enabled, but the result has not yet been edited and probably contains errors in grammar and vocabulary. The paragraph is highlighted in yellow to draw the translator’s attention to the need to correct these errors.

Translate extension

Translate is used for three kinds of tasks:

  1. Translating the user interface of software: This is done on the translatewiki.net website for the user interfaces of several free software projects: MediaWiki, MediaWiki extensions, Wikimedia mobile apps, the Pageviews tool, and also several projects not related to Wikipedia, such as Etherpad and OpenStreetMap. In this capacity, it is integrated with the Gerrit source code management system: all new and updated translatable user interface messages (strings) in English are semi-automatically copied from Gerrit to translatewiki.net, and all translated strings are semi-automatically copied from translatewiki.net back to Gerrit.
  2. Translating wiki pages on multilingual wikis and community wikis: Commons, Wikidata, Meta, mediawiki.org, and several others.
  3. Translating CentralNotice banners for fundraising, article writing campaigns, and other purposes.
[Screenshot: Tech News translation demo]

The screenshot shows the weekly Tech News on meta.wikimedia.org being translated from English to Hebrew. The text is divided into short sentences. Some technical parts that don’t need to be translated are wrapped as variables (known as <tvar>): $list, $contribute, and $feedback. The suggestions in the right-hand sidebar come from translation memory: a past translation of a similar sentence.
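The variable mechanism can be illustrated with a minimal Python sketch. It assumes the `<tvar|name>value</>` syntax used in the Tech News wikitext shown further below and demonstrates only the idea: the translator sees `$name` placeholders, and the original values are put back when the translation is rendered. The function names `to_translator_view` and `render_translation` are hypothetical; this is not the Translate extension's actual code.

```python
import re

# Hypothetical parser for the <tvar|name>value</> syntax seen in the sample
# wikitext on this page; the real extension is written in PHP and may differ.
TVAR = re.compile(r"<tvar\|([^>]+)>(.*?)</>", re.DOTALL)


def to_translator_view(source_unit: str) -> tuple[str, dict[str, str]]:
    """Replace each <tvar|name>value</> with $name and collect name -> value."""
    variables: dict[str, str] = {}

    def hide(match: re.Match) -> str:
        variables[match.group(1)] = match.group(2)
        return "$" + match.group(1)

    return TVAR.sub(hide, source_unit), variables


def render_translation(translated_unit: str, variables: dict[str, str]) -> str:
    """Put the untranslated values back in place of the $name placeholders."""
    # Substitute longer names first so "$list" is not replaced inside "$listing".
    for name in sorted(variables, key=len, reverse=True):
        translated_unit = translated_unit.replace("$" + name, variables[name])
    return translated_unit


view, variables = to_translator_view(
    "[[<tvar|list>Global message delivery/Targets/Tech ambassadors</>|Subscribe]]"
)
print(view)  # [[$list|Subscribe]]
print(render_translation("[[$list|Abonnieren]]", variables))
# [[Global message delivery/Targets/Tech ambassadors|Abonnieren]]
```

Because the link target is hidden behind `$list`, the translator only ever touches the visible label, so the target page name cannot be mistranslated by accident.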

The second task of Translate, “translating wiki pages”, may sound very similar to translating Wikipedia articles. However, there is a significant difference between the two. The pages for which the Translate extension is used tend to be tightly structured and stable. Some examples:

Pages that are translatable using the Translate extension have to be prepared for translation: all the parts that have to be translated must be marked with XML-like <translate></translate> tags. This markup indicates which pages can be translated, divides long text into small units that are easy to translate one by one, and helps translators skip all the parts that don’t need translation (images, templates, tables, numbers, code examples, etc.). Because the page is divided into small units, the software can also track which units changed in the source and surface the outdated ones, so translators can update just those translations, separately from the parts that weren’t modified.
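The unit-based change tracking described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the extension's implementation: it assumes units are marked with `<!--T:n-->` comments inside `<translate>` sections (as in the Tech News wikitext shown below), and it detects a change by comparing a hash of each unit's current source text with the hash recorded when that unit was translated. The names `split_units`, `fingerprint`, `Translation`, and `outdated_units` are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical, simplified unit parser: a unit starts at a <!--T:n--> marker and
# ends at the next marker or at the closing </translate> tag.
UNIT = re.compile(r"<!--T:(\d+)-->\s*(.*?)(?=<!--T:\d+-->|</translate>)", re.DOTALL)


def split_units(wikitext: str) -> dict[str, str]:
    """Return {unit_id: source_text} for every marked translation unit."""
    return {m.group(1): m.group(2).strip() for m in UNIT.finditer(wikitext)}


def fingerprint(text: str) -> str:
    """Hash of a unit's source text, used here to detect later edits (an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class Translation:
    text: str
    source_hash: str  # fingerprint of the source unit at translation time


def outdated_units(source: dict[str, str], translations: dict[str, Translation]) -> list[str]:
    """Ids of translated units whose source text changed since they were translated."""
    return sorted(
        (uid for uid, text in source.items()
         if uid in translations and translations[uid].source_hash != fingerprint(text)),
        key=int,
    )


page_v1 = "<translate><!--T:1-->\nFirst sentence.\n\n<!--T:2-->\nSecond sentence.</translate>"
units_v1 = split_units(page_v1)
translations = {uid: Translation(f"[translated] {t}", fingerprint(t)) for uid, t in units_v1.items()}

# Edit only the second sentence of the source page.
page_v2 = page_v1.replace("Second sentence.", "Second sentence, revised.")
print(outdated_units(split_units(page_v2), translations))  # ['2']
```

Only unit 2 is reported, so a translator would revisit that single unit and leave the translation of unit 1 untouched.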

Adding such tags to Wikipedia articles, however, would be extremely cumbersome for Wikipedia editors. Unlike user manuals or legal documents, Wikipedia articles change frequently and unexpectedly, in both their text and their structure. For editors who mostly work in one language, seeing <translate></translate> tags everywhere in the text would make ordinary editing harder.

To demonstrate this, here is what the wikitext source of the same section looks like:

{{Tech header|<translate><!--T:1-->
The Tech News weekly summaries help you monitor recent software changes likely to impact you and your fellow Wikimedians. [[<tvar|list>Global message delivery/Targets/Tech ambassadors</>|Subscribe]], [[<tvar|contribute>Special:MyLanguage/Tech/News#contribute</>|contribute]] and [[<tvar|feedback>Talk:Tech/News</>|give feedback]].</translate>}}
{{Deadline|timeanddate=https://www.timeanddate.com/countdown/generic?iso=20200629T09&msg={{URLENCODE:Publication of Wikimedia Tech News}}}}
{{Tech news nav}}
<languages/>
<section begin="tech-newsletter-content"/><div class="plainlinks">
<translate><!--T:2-->
Latest '''[[<tvar|technews>m:Special:MyLanguage/Tech/News</>|tech news]]''' from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. [[<tvar|more-transl>m:Special:MyLanguage/Tech/News/2020/27</>|Translations]] are available.</translate>

This contains a very large amount of markup: <translate>, <languages/>, <tvar>, </>, <!--T:1-->, and more, all of which gets in the way of editing the text. Therefore, the Content Translation extension was developed as a separate product for translating Wikipedia articles. No technical preparation is needed to translate an article using this extension, and the workflow of the editors who write the source article is not affected in any way. Content Translation also focuses on creating the first version of a translated article rather than on keeping the translation up to date with the source article. This is a trade-off: it would be quite useful to see how a Wikipedia article has changed since it was first translated, but doing so the way the Translate extension does it does not scale well.[5]

In addition, Content Translation recognizes that Wikipedias in different languages have different writing styles. It doesn’t force translators to stick strictly to the content and structure of the source article. This is different from the Translate extension, which strongly encourages precise translation and enforces an identical page structure.

Finally, Content Translation requires translators to use rich-text WYSIWYG editing, with VisualEditor as a component. Editing in wiki syntax is not available.[6] This is done for two reasons: to make translation easier for new Wikipedia users who are not familiar with wikitext, Wikipedia’s markup language, and to make it easy to adapt content such as images, links, and templates semi-automatically from the source article to the target article. In contrast, the Translate extension uses only wikitext source editing, because it targets more experienced editors and needs very precise, fine-grained formatting.

Footnotes

  1. In theory, it could also be used for pages on wiki sites other than Wikipedia, such as Wikivoyage, but this is not done at the moment.
  2. Most notably by community relations specialists (liaisons) and product managers.
  3. Sometimes written by product managers, designers, community relations specialists, or technical writers, but often also by volunteer editors.
  4. Written by volunteer community members.
  5. The “Section translation” feature in Content Translation, in development as of June 2020, aims to help update or extend articles that were already translated.
  6. Although it is occasionally requested.