Since 4 November 2015, the Yandex machine translation system has been available to users of Content Translation. Because Yandex.Translate is popular among Russian Wikipedia users, its inclusion was a requested feature. It was initially introduced for translating Wikipedia pages from English to Russian, but the service has since been extended to all languages that Yandex provides.
Yandex.Translate is provided by Yandex, a Russian internet services company. Since late 2014, the Wikimedia Foundation's Legal team and Yandex have worked together on an agreement that allows Yandex.Translate to be used without compromising Wikipedia's attribution policies, our users' privacy, or brand representation. The terms of the agreement are summarized below, and we are happy to answer any questions you may have about this service.
- No personal information is sent to Yandex. The MT system is accessed via a publicly available API using a key. Freely licensed article content is sent to Yandex servers from Wikimedia Foundation servers. There is no direct communication between the user and external services, and no personal information (IP address, username) is sent to Yandex servers. The client that contacts Yandex servers is open source, and you can check it here. No part of Yandex's service or code will become part of the Wikimedia infrastructure or the Content Translation codebase.
- Information is returned from Yandex under a free license. When the Yandex service is used, Yandex provides a translated version of Wikipedia content under a free license. Users can modify it and publish it as part of Wikipedia without conflicting with existing policies. The content translated by Yandex, together with the users' modifications, will be available under the same license used for the rest of the articles in Wikipedia.
- Benefits the wider open source translation community. Translations obtained from Yandex and user modifications will be publicly available. The post-edited translations are of special interest to the translation research community, which can use this resource to create new translation services for languages that do not yet have open source machine translation. This will help developers create and improve machine translation systems.
- Users can disable it. Automatic translation is an optional tool in Content Translation, and users can disable it if they do not find it useful. Although many users from the Russian community requested this service, each user decides individually whether to use it.
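The privacy arrangement described in the first term above amounts to a server-side proxy: the user's browser talks only to Wikimedia servers, which forward article content, but never user identifiers, to the external service. The following is a minimal Python sketch of that idea under an assumed request shape. `IncomingRequest`, `build_mt_payload` and the payload field names are hypothetical and do not represent the actual Content Translation or Yandex API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomingRequest:
    """Hypothetical request that reaches the Wikimedia server from a user."""
    username: str
    ip_address: str
    source_lang: str
    target_lang: str
    section_html: str


def build_mt_payload(req: IncomingRequest, api_key: str) -> dict:
    """Build the body of the server-side request to the MT service.

    Only the server's own API key, the freely licensed section HTML and the
    language pair are copied; the username and IP address never leave the
    server. Field names are illustrative assumptions, not the real Yandex API.
    """
    return {
        "key": api_key,
        "text": req.section_html,
        "source": req.source_lang,
        "target": req.target_lang,
    }


# Usage: the outgoing payload carries no user identifiers.
req = IncomingRequest("ExampleUser", "203.0.113.7", "en", "ru", "<p>Hello</p>")
payload = build_mt_payload(req, api_key="server-side-key")
print(sorted(payload))
```

Building the outgoing payload from an explicit allow-list of fields, rather than copying the incoming request and deleting user fields, means that a field added to the request later cannot leak by accident.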
Summary of terms of Yandex agreement
- To license their Yandex.Translate API key for free to the Wikimedia Foundation to allow volunteers on Wikimedia sites to translate articles
- To allow volunteers to translate up to ten million characters per day (much more than their publicly available option)
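- To give Wikimedia statistical data on the number of characters in the requests sent
- To provide Yandex with the versions of the translation tool's output that volunteers have edited, to help improve the tool
- Currently, on request, only the source text, its language and the target language are provided to Yandex
- To make the translations that translators publish, whether or not they used machine translation, available as parallel corpora through the Content Translation API. These APIs are being developed iteratively, and the results will be made widely available, not only to Yandex
- All content is licensed under CC BY-SA 3.0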
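The agreement expresses its usage limit in characters per day. The following sketch shows one way a client could keep a per-day character tally against that limit. `DailyCharacterBudget` and `try_consume` are hypothetical names, and counting characters as Python string length is an assumption. The agreement does not say how Yandex counts characters, nor that Content Translation enforces the limit this way.

```python
from collections import defaultdict
from datetime import date

# Daily limit stated in the agreement summary.
DAILY_CHARACTER_LIMIT = 10_000_000


class DailyCharacterBudget:
    """Hypothetical per-day tally of characters sent to the MT service."""

    def __init__(self, limit: int = DAILY_CHARACTER_LIMIT) -> None:
        self.limit = limit
        self.used = defaultdict(int)  # maps a date to characters sent that day

    def try_consume(self, text: str, day: date) -> bool:
        """Count len(text) against `day`; refuse if the limit would be exceeded."""
        needed = len(text)
        if self.used[day] + needed > self.limit:
            return False
        self.used[day] += needed
        return True


# Usage with a tiny limit so the cut-off is easy to see.
budget = DailyCharacterBudget(limit=10)
day = date(2015, 11, 4)
first = budget.try_consume("Hello", day)    # 5 characters: accepted
second = budget.try_consume("World!", day)  # 5 + 6 > 10: refused
print(first, second, budget.used[day])
```

A tally like this also yields the per-day character counts that the statistics term refers to. Here, however, the counts are kept on the client side rather than reported by Yandex.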
This section covers current questions about Yandex. Other information is available on the Content Translation FAQ page.
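Yandex currently supports more than 70 languages. As more languages become available, we will consider adding them to Content Translation. Note: Yandex machine translation is not available when creating articles on English Wikipedia.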
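As a Content Translation user, you should not notice any difference in the translation interface: for the languages it supports, Yandex displays translated content in the same format as the Apertium machine translation system.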
Yandex provides a free-to-use API key that allows websites and other services to use its translation system. Content Translation uses its own unique API key to access this service on Yandex's servers. When a user starts translating an article, the HTML content of each section of the source article is sent to the Yandex server, and the translated version it returns is displayed in the corresponding translation column of Content Translation. Links and references are adapted as usual, and users can modify the content as required.
This process continues for all sections of the article being translated. For better performance, translations of the following sections are prefetched. The user can save the unpublished translation to continue working on it later, or publish the article in the usual manner. The article is published on Wikipedia like any other article, with the appropriate attribution and licenses.
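Content Translation translates an article one section at a time and prefetches translations for the sections that follow. The following Python sketch illustrates that pattern with `concurrent.futures`. It is not the actual Content Translation implementation. `machine_translate` is a hypothetical stub that stands in for the server-side call to Yandex. Prefetching exactly one section ahead is also an assumption, because the page does not say how many sections are fetched in advance.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def machine_translate(section_html: str, source: str, target: str) -> str:
    """Hypothetical stand-in for the server-side call to the MT service."""
    return f"[{source}->{target}] {section_html}"


class SectionTranslator:
    """Translates sections on demand and prefetches the following ones."""

    def __init__(self, sections, source, target, prefetch=1):
        self.sections = sections
        self.source = source
        self.target = target
        self.prefetch = prefetch  # how many later sections to start early (assumed)
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: dict[int, Future] = {}

    def _schedule(self, index: int) -> None:
        # Start a background translation unless one already exists for this section.
        if 0 <= index < len(self.sections) and index not in self._pending:
            self._pending[index] = self._pool.submit(
                machine_translate, self.sections[index], self.source, self.target
            )

    def translate(self, index: int) -> str:
        self._schedule(index)
        # Prefetch the next sections so they are ready when the user reaches them.
        for ahead in range(1, self.prefetch + 1):
            self._schedule(index + ahead)
        return self._pending[index].result()  # blocks until this section is done

    def close(self) -> None:
        self._pool.shutdown(wait=True)


translator = SectionTranslator(["<p>One</p>", "<p>Two</p>", "<p>Three</p>"], "en", "ru")
first = translator.translate(0)  # also starts translating section 1 in the background
translator.close()
print(first)
```

Caching each `Future` by section index means that the section the user opens next is usually already finished, and that no section is ever requested twice.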
Yandex is not based on open source software. Why are we using it?
Content Translation grew out of a long-standing need to bridge the gap in the amount of content between Wikipedias in different languages. Like all other software used on Wikimedia sites, Content Translation is open source. Here too, we use an open source client to interact with the external service and import freely licensed content to help users expand our free knowledge.
Using Yandex's machine translation system does not add any proprietary software to the Content Translation code or to Wikimedia websites and servers. The service is free of charge and available to everyone.
Only freely available Wikipedia article content (in segments) is sent to the Yandex service, and the translated content it returns can be freely used on Wikipedia pages. Users can modify the translated content, and this data is also publicly available under a free license through the Content Translation API. This is a valuable resource that the community can use to develop open source translation services for languages that do not have them yet.
User feedback shows that machine translation support is very helpful, and we want to support all languages as well as possible. Guided by the principles of the Wikimedia Foundation's resolution in support of free and open source software, we will prioritise integrating open source services whenever they are available for a language. Apertium has been a critical part of Content Translation since its inception, but it currently provides machine translation for only 45 of the many language combinations that Wikipedia can support.
Should I be worried about my personal information when using Yandex?
Whichever service is used, you can be sure that only Wikipedia content from existing articles is sent and only freely licensed content is added back to the translation. No personal information is sent, and communication with these services happens on the server side, so it is isolated from the user's device. Please refer to this diagram for more details.
What if Yandex is the only machine translation tool available and I don't want to use it?
Machine translation is an optional feature in Content Translation that you can easily disable at any time. If more machine translation systems are added for your languages, you can enable machine translation again and select the service of your choice.
Will the content translated by Yandex be free for use in Wikipedia?
Yes. The same content is freely available on Yandex's web translation platform; Content Translation receives it through an API key so that it appears seamlessly in the translation interface. Users can modify this content if necessary and use it in Wikipedia articles under free licenses.
Can this content be used for improving machine translation systems in general?
Yes. Translations made in Content Translation are saved in our database. This information will be made publicly available for anyone to use as translation examples to improve their translation services, from university research groups and open source projects to commercial companies. The content can be accessed through the Content Translation API. Please note that only information related to the translated text is publicly available: the source and translated text, the source and target languages, and an identifier for each segment of text.
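The publicly available data described above amounts to one aligned record per text segment. The following Python sketch models such a record and serializes it as JSON Lines, which is one common format for parallel corpora. The field names, the identifier format and the choice of JSON Lines are all assumptions, because the actual Content Translation API response format is not described here. Note that the record deliberately has no user-related field.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ParallelSegment:
    """One aligned segment; field names are hypothetical, not the real API schema."""
    segment_id: str
    source_language: str
    target_language: str
    source_text: str
    translated_text: str


def to_jsonl(segments) -> str:
    """Serialize segments as JSON Lines, one record per line."""
    # ensure_ascii=False keeps non-Latin scripts readable in the output.
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in segments)


segments = [
    ParallelSegment("seg-1", "en", "ru", "Hello, world.", "Привет, мир."),
]
print(to_jsonl(segments))
```

Writing one self-contained record per line lets researchers stream very large corpora without loading them into memory at once.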