Content translation/Machine Translation
Machine Translation systems in use
- Apertium - Apertium is a free/open-source platform for developing rule-based machine translation systems. More information about this project can be found on the Apertium wiki.
- Yandex - Yandex.Translate is provided by Yandex, a Russian internet services company. It can be used to translate Wikipedia articles within Content Translation through a publicly available API, without compromising Wikipedia's policies on attribution of rights, the privacy of our users, or brand representation. Read more about this service.
- Youdao - Youdao is provided by NetEase, an internet services company from China. It can be used to translate Wikipedia articles within Content Translation through a publicly available API, without compromising Wikipedia's policies on attribution of rights, the privacy of our users, or brand representation. Read more about this service.
- Language pairs and status for all 3 systems (manually maintained; updated each week or after a deployment)
Extending machine translation support in Content Translation
Machine translation support is an important part of the workflow of the Content Translation tool. Even with limited language coverage, machine translation has been widely adopted, and users have reported increased efficiency on several occasions. Initially, Apertium was the only machine translation (MT) system available with Content Translation, serving more than 30 languages. On 4 November 2015, the Yandex machine translation system was also made available, initially for users of Russian Wikipedia and later extended to Armenian, Albanian, Bashkir, Persian, Polish, and Uzbek.
Both Apertium and Yandex have been steadily adding support for more languages, which increases the efficiency gains that the Content Translation tool can offer. Content Translation incorporates these changes in turn, so that more languages can benefit from the integrated machine translation services. The following sections outline the general process that the WMF Language team follows to update Content Translation settings and provide machine translation for languages as they become available. This is an ongoing process and is open to change according to the individual needs of each wiki where the tool is used.
Multi-step enablement and feedback process
Machine translation is an optional feature in Content Translation. Users can choose between the available machine translation systems, or disable machine translation entirely, at any time using the drop-down selection box in Content Translation.
- Enabling machine translation for a language: An available machine translation system for a particular language or language pair is enabled as a non-default option. Users can choose it through the drop-down menu in the interface.
- Notification: Content Translation users are notified about newly added machine translation support for the language they are translating to, and can choose to try it using the drop-down menu. As part of the regular updates from the development team, this change is also communicated to the local village pump of the wiki for that language.
- Feedback and monitoring: After machine translation is enabled, the development team gathers feedback through the usual communication channels (talk page, Phabricator, etc.) and collects relevant data to better assess users' needs and identify any anomalies or bugs.
- Advancement or reassessment: Depending on usage trends, benefits, bugs, and other issues or concerns, the next step is to enable the MT system as the default option for a language, so that users do not have to select it every time they start translating. The option to not use MT remains available. However, if deficiencies or bugs prevent normal usage of Content Translation, the service is re-examined or suspended according to the individual needs of the wiki.
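
The steps above can be read as a small state model for each language pair: a provider is first offered as an opt-in choice, may later become the default, and can be suspended if problems appear. The Python sketch below illustrates that logic only. It is not the Content Translation implementation, and every name in it (`LanguagePairMT`, `NO_MT`, `enable`, `promote_to_default`, `suspend`, `resolve`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical marker for a user who explicitly turns machine translation off.
NO_MT = "no-mt"


@dataclass
class LanguagePairMT:
    """Illustrative MT settings for one language pair (not the real config)."""
    providers: List[str] = field(default_factory=list)
    default: Optional[str] = None  # None: MT is used only if the user picks it

    def enable(self, provider: str) -> None:
        # Step 1: offer the provider as a non-default option.
        if provider not in self.providers:
            self.providers.append(provider)

    def promote_to_default(self, provider: str) -> None:
        # Step 4 (advancement): make an already enabled provider the default.
        if provider not in self.providers:
            raise ValueError(f"{provider} is not enabled for this pair")
        self.default = provider

    def suspend(self, provider: str) -> None:
        # Step 4 (reassessment): withdraw a provider that causes problems.
        if provider in self.providers:
            self.providers.remove(provider)
        if self.default == provider:
            self.default = None

    def resolve(self, user_choice: Optional[str] = None) -> Optional[str]:
        # The user's explicit choice always wins, including opting out.
        if user_choice == NO_MT:
            return None
        if user_choice in self.providers:
            return user_choice
        return self.default


pair = LanguagePairMT()
pair.enable("Yandex")
print(pair.resolve())          # no default yet, so no MT unless selected
print(pair.resolve("Yandex"))  # the user opts in through the drop-down menu
pair.promote_to_default("Yandex")
print(pair.resolve())          # the provider now applies without extra steps
print(pair.resolve(NO_MT))     # opting out stays possible
```

In this sketch the user's selection is checked before the default, which mirrors the statement that the option to not use MT remains available even after a provider becomes the default.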