Content translation/Machine Translation

Machine Translation systems in use

 * Apertium - Apertium is a free/open-source platform for developing rule-based machine translation systems. More information about this project can be found on the Apertium wiki.
 * List of languages supported in Content Translation via Apertium


 * Yandex - Yandex.Translate is provided by Yandex – a Russian internet services company. Yandex.Translate can be seamlessly used for translating Wikipedia articles within Content Translation via a publicly available API without compromising Wikipedia's policy of attribution of rights, privacy of our users and brand representation. Read more to know about this service.
 * List of languages supported in Content Translation via Yandex

There is no machine translation for my language. How is Content Translation useful to me and my wiki?
By itself Content Translation is not a machine translation tool. Its primary focus is to help people create translated wiki pages as efficiently as possible. It includes tools that are tightly integrated with MediaWiki and its usual content creation and editing workflow: display of the source and the translation side-by-side; adaptation of links, categories, images and text formatting; publishing to different namespaces; interlanguage links. These features alone are meant to make translating articles by hand easier.

This is not just theory. Content Translation was enabled in the French Wikipedia on March 31, 2015, and by June 7 it had been used to create 500 articles, even though machine translation was not available.

The fact is that machine translation is not available for the majority of languages in which there are Wikipedias, so most language pairs will only be able to use Content Translation as a tool to translate articles manually with the above adaptation tools. If you want to help create a machine translation engine for your language, see How can I improve machine translation support for my language?

Machine translation to my language is bad, and it's easier to translate manually. How is Content Translation useful to me and my wiki?
As written in the previous answer, Content Translation is not by itself a machine translation tool, but a tool to create translated wiki pages. It is designed to be useful even without machine translation.

Machine translation works quite well in some languages, and then it can make the translators' work even more efficient. Machine translation support for a language pair is enabled after testing and approval from people who know the language well.

If machine translation support for your language is enabled, but you don't want to use it, you can disable it and still enjoy the other tools, such as link, category, and image adaptation, as well as dictionaries (if available for your language).

How are you integrating machine translations?
For languages in which machine translation is supported in Content Translation, the machine translation is filled in automatically when you click a paragraph in the translation area.

Initially we're using the Apertium engine, which is free software and can be installed and maintained on our own servers. At a later point we may use Moses and other engines. We have recently added Yandex for limited use.

What languages are being handled by Yandex? Are there plans to add more?
Yandex is available at present only for English to Russian translations, for users who will be creating pages for the Russian Wikipedia through Content Translation. Although Yandex provides translation capability for nearly 60 languages, we do not have any immediate plans to activate it for other language pairs. However, we are open to requests from the Wikipedia communities if they would like Yandex to be made available for their languages.

How is using Yandex different than using Apertium?
As a user of Content Translation you will not notice any difference in the translation interface: the Yandex machine translation system displays the translated content in the same way Apertium currently does for its 45 supported language pairs.

How is the machine translation being done if I choose Yandex?
Yandex provides a free-to-use API key that allows websites and other services to use their translation system. Content Translation also uses a unique API key to access this service on Yandex’s server. When a user starts translating an article, the HTML content of each section of the source article is sent to the Yandex server, and a translated version is obtained and displayed in the corresponding translation column of Content Translation. Links and references are adapted as usual and users can modify the content as required.

This process continues for all the sections of the article being translated. For better performance, the translations for consecutive sections are pre-fetched. The user can save the unpublished translation (to work on it again at a later time) or publish the article in the usual manner. The article is published on Wikipedia like any other normal article with appropriate attribution and licenses.
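The section-by-section flow with pre-fetching described above can be illustrated with a minimal sketch. It is not the actual Content Translation code: the class `SectionTranslator`, the `lookahead` parameter and the `fake_translate` function are hypothetical names introduced only for illustration, and the placeholder engine simply upper-cases its input instead of contacting a real machine translation server.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class SectionTranslator:
    """Translate article sections on demand and pre-fetch the following ones."""

    def __init__(
        self,
        sections: List[str],
        translate: Callable[[str], str],
        lookahead: int = 1,
    ) -> None:
        self.sections = sections
        self.translate = translate  # hypothetical stand-in for the MT service call
        self.lookahead = lookahead  # assumed number of sections to pre-fetch
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: Dict[int, Future] = {}

    def _request(self, index: int) -> None:
        # Start a request only for valid sections that were not requested yet.
        if 0 <= index < len(self.sections) and index not in self._pending:
            self._pending[index] = self._pool.submit(
                self.translate, self.sections[index]
            )

    def get(self, index: int) -> str:
        """Return the translation of one section; pre-fetch the next ones."""
        self._request(index)
        for offset in range(1, self.lookahead + 1):
            self._request(index + offset)
        return self._pending[index].result()

    def close(self) -> None:
        # Wait for outstanding requests and release the worker threads.
        self._pool.shutdown(wait=True)


def fake_translate(html: str) -> str:
    # Placeholder engine: a real deployment would send the HTML to the MT server.
    return html.upper()


translator = SectionTranslator(
    ["<p>one</p>", "<p>two</p>", "<p>three</p>"], fake_translate
)
first = translator.get(0)  # translates section 0 and pre-fetches section 1
translator.close()
```

With `lookahead=1`, asking for the first section also starts the request for the second one in the background, so it is likely to be ready by the time the user reaches it. The third section is not requested until the user moves further down.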

You can view a diagram of the process (translation/Technical_Architecture#Machine_Translation).

Yandex is not based on open source software. Why are we using it?
Content Translation evolved from a long-standing need to bridge the gap in the amount of content between Wikipedias in different languages. Like all other software used on Wikimedia sites, Content Translation is also open source. In this particular case as well, we are using an open source client to interact with the external service and import freely licensed content in order to help users expand our free knowledge.

To use Yandex’s machine translation system we are not adding any proprietary software in the Content Translation code, or on the Wikimedia websites and servers. The service is free of charge and available for everyone.

Only the freely available Wikipedia article content (in segments) is sent to the Yandex service and the obtained translated content is freely usable on Wikipedia pages. The translated content can be modified by users and this data is also available publicly under a free license through the Content Translation API. This is a valuable resource made available for the community to develop open source translation services for those languages where they don't exist yet.

After studying the implications carefully, we found that the fact that the content was previously stored in a closed source service does not limit the freedom of our knowledge or our software, now or in the future. We have taken special care to ensure that the content provided is freely licensed and complies with Wikipedia policies. This includes a long process of legal and technical evaluation and compliance. The summary of the terms of use is also available.

From user feedback we have seen that machine translation support is really helpful for users, and we want to support all languages in the best way. Guided by the principles of the Wikimedia Foundation’s resolution to support free and open source software, we will prioritise the integration of open source services whenever they are available for a language. Apertium has been a critical part of Content Translation since its inception, but currently it only provides machine translation for 45 of the numerous possible language combinations that Wikipedia can support.

Should I be worried about my personal information when using Yandex?
Irrespective of the service being used, you can be sure that only Wikipedia content from existing articles is sent and only freely licensed content will be added back to the translation. No personal information is sent, and communication with those services happens on the server side, so they are isolated from the user's device. Please refer to this diagram for more details.

What if Yandex is the only machine translation tool available and I don’t want to use it?
Machine Translation is an optional feature in Content Translation that you can easily disable at will. If more machine translation systems are added for your languages, you can choose to enable MT again and select the MT service of your choice.

Will the content translated by Yandex be free for use in Wikipedia?
Yes. The content received from Yandex is otherwise freely available on the Yandex web translation platform. Content Translation receives it via an API key to make it seamlessly available on the translation interface. This content can be modified by the users (if necessary) and used in Wikipedia articles under free licenses.

Can this content be used for improving machine translation systems in general?
Yes. Translations made in Content Translation are saved in our database. This information will be made publicly available for anyone to use as translation examples to improve their translation services (university research groups, open source projects, commercial companies, anyone!). The content can be accessed via the Content Translation API. Please note that only information related to translated text is publicly available. This includes the source and translated text, the source and target language, and an identifier for the segment of text.
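As a rough illustration of how such translation examples could be used, the sketch below models one published segment and collects sentence pairs for a single language pair. The field names and the `parallel_corpus` helper are hypothetical: they mirror the kinds of information listed above, not the actual response format of the Content Translation API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TranslationSegment:
    """One translated segment; field names are assumptions for illustration."""

    segment_id: str
    source_language: str
    target_language: str
    source_text: str
    translated_text: str


def parallel_corpus(
    segments: Iterable[TranslationSegment],
    source_language: str,
    target_language: str,
) -> List[Tuple[str, str]]:
    """Collect (source, translation) pairs for one language pair."""
    return [
        (segment.source_text, segment.translated_text)
        for segment in segments
        if segment.source_language == source_language
        and segment.target_language == target_language
    ]


# Made-up sample data, used only to show the shape of the records.
samples = [
    TranslationSegment("s1", "en", "ru", "Hello", "Привет"),
    TranslationSegment("s2", "en", "fr", "Hello", "Bonjour"),
]
pairs = parallel_corpus(samples, "en", "ru")
```

Pairs collected this way are the usual starting point for training or evaluating a translation engine for a given language pair.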

How can I improve machine translation support for my language?
Contribute to an existing Apertium pair, or create a new one!

Get in contact with the Apertium community via IRC or in many other ways.

Complete FAQ