Content translation/Machine Translation/Youdao/ja

The Youdao machine translation system has been available for use with Content Translation on Wikipedia since 31 October 2016.

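Youdao is provided by NetEase, an internet services company from China. The Wikimedia Foundation's Legal team worked with NetEase to reach an agreement that allows the use of Youdao without compromising Wikipedia's policy on attribution of rights, user privacy, or brand representation. A summary of the terms of the agreement is presented below, and we welcome any questions you may have about this service. The same text has also been posted for the Chinese Wikipedia community.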

Key features
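 * Youdao does not collect any personal information. The machine translation system is accessed through a publicly available API key (the linked page is in Chinese). Article content (freely licensed) is sent to the Youdao servers from Wikimedia servers. No direct communication takes place between the user and the external service, and no personal information (IP address or username) is sent to the Youdao servers along with the Wikipedia content. The client that contacts the Youdao servers is open source and publicly available. No Youdao service or code becomes part of the Wikimedia infrastructure or the Content Translation codebase.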
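 * Information returned from Youdao is freely licensed. The translated version of Wikipedia content provided through the Youdao service remains freely licensed. Users can modify it and publish it on Wikipedia without conflicting with existing policies. Content that results from Youdao translation and user modification is available under the same license as any other Wikipedia article.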
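 * It benefits the wider open source translation community. Translations obtained from Youdao and modified by users are made publicly available. Post-edited translations are of particular interest to the translation research community, because they are a resource for developing new translation services and for supporting languages that have no open source machine translation yet. This helps developers create and improve machine translation systems.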
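 * Automatic translation is an optional tool in Content Translation. Users can disable it if, for any reason, they do not find it useful. Through their preferences, users decide whether to use this service, another service provided for their language, or no machine translation service at all.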



Known issues
The Youdao machine translation service does not translate rich text such as HTML. It accepts plain text and returns a plain text translation. Content Translation usually re-applies markup on top of this kind of output, but for Chinese this is not possible because splitting sentences into words is non-trivial. Hence, translators receive the translations in plain text only, and all links, references etc. have to be added manually during translation.
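
As a rough illustration of what "plain text only" means in practice, the following minimal sketch reduces an HTML fragment to its text before it would be handed to a plain-text translation service. It is not the Content Translation implementation; the class and function names are hypothetical and only the Python standard library is used.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Tags and attributes (links, references, formatting) are dropped here.
        self.parts.append(data)


def to_plain_text(html_fragment):
    """Return the text of html_fragment with markup removed and whitespace collapsed."""
    extractor = PlainTextExtractor()
    extractor.feed(html_fragment)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


sample = '<p>The <a href="./Moon">Moon</a> orbits <b>Earth</b>.</p>'
print(to_plain_text(sample))
```

The link target and the bold markup are lost in this step, which is why they have to be restored by hand in the translated text.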

Youdao's obligations

 * To license their API key for free to the Wikimedia Foundation to allow volunteers on Wikimedia sites to translate articles
 * To allow volunteers to translate up to four thousand characters per request and ten million characters per day (much more than their publicly available option)
 * To give Wikimedia statistical data on the quantity of characters in the requests sent
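
The two limits above (four thousand characters per request, ten million characters per day) can be pictured with a small sketch. This is only an assumption about how a client might respect them; the names `split_into_requests` and `DailyQuota` are hypothetical, and a real client would more likely split at sentence or section boundaries rather than at fixed offsets.

```python
MAX_CHARS_PER_REQUEST = 4_000       # per-request limit from the agreement
MAX_CHARS_PER_DAY = 10_000_000      # per-day limit from the agreement


def split_into_requests(text, limit=MAX_CHARS_PER_REQUEST):
    """Split text into consecutive chunks of at most `limit` characters."""
    return [text[start:start + limit] for start in range(0, len(text), limit)]


class DailyQuota:
    """Hypothetical counter for characters sent during one day."""

    def __init__(self, limit=MAX_CHARS_PER_DAY):
        self.limit = limit
        self.used = 0

    def try_consume(self, n_chars):
        """Reserve n_chars if the daily limit allows it; return True on success."""
        if self.used + n_chars > self.limit:
            return False
        self.used += n_chars
        return True


quota = DailyQuota()
for chunk in split_into_requests("a" * 9_000):
    if quota.try_consume(len(chunk)):
        print(f"would send {len(chunk)} characters")
```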

Wikimedia Foundation’s obligations

 * To provide the volunteer-edited versions of the text translated by the translation tool so that Youdao can improve their tool
 * No personal data of translators will be shared.
 * Currently, just the original content to translate, its language, and translation target language are sent in the request to Youdao.
 * The translations published by translators, with or without the help of machine translation services, will be provided in the form of parallel corpora by content translation APIs. These APIs will be developed incrementally and results will be freely available for everyone, not just Youdao.
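
To make the data-sharing point concrete, here is a minimal sketch of a request that carries only the three items listed above: the content to translate, its language, and the target language. The field names are hypothetical and are not Youdao's actual API parameters; the point is only that no user-related field exists in the structure.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TranslationRequest:
    """Hypothetical request shape: content plus source and target language only."""

    content: str
    source_language: str
    target_language: str


def build_payload(content, source_language, target_language):
    """Return the dictionary that would be sent to the translation service."""
    request = TranslationRequest(content, source_language, target_language)
    return asdict(request)


payload = build_payload("The Moon orbits Earth.", "en", "zh")
print(sorted(payload))
```

Because the payload is built from a fixed set of fields, user names or IP addresses cannot be attached to it by accident.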

Important notes

 * All content will remain licensed under CC BY-SA 3.0
 * Youdao is not requiring any “branding” on Wikimedia Sites outside of listing Youdao as a translation tool option in the translation interface drop-down menu
 * There is no exchange of personal information of users
 * The agreement is limited to 1 year, at which time we can reevaluate our needs
 * We are free to terminate the agreement for any reason, at any time (with 30 days' notice)
 * Agreement is governed by US law

Questions about this service
We have addressed some immediate questions about Youdao in this section. More information is also available on the Content Translation FAQ page.

What languages are being handled by Youdao? Are there plans to add more?
Youdao is currently available only for translations between Chinese and English, French, Japanese, Korean, Portuguese, Russian and Spanish, for users creating pages through Content Translation. As Youdao's language coverage expands, we will consider enabling the additional languages for Content Translation. Please note: Youdao machine translation is not available when creating pages from Chinese to English.

How is using Youdao different than using other machine translation systems?
As a user of Content Translation you will not notice any difference in the translation interface, because Youdao's machine translation system displays the translated content in the same way as Apertium or Yandex. However, because Youdao does not currently support rich text, links, references and similar markup have to be adapted manually.

How is the machine translation being done if I choose Youdao?
Youdao provides a free-to-use API key that allows websites and other services to use their translation system. Content Translation uses a unique API key to access this service on Youdao's servers. When a user starts translating an article, the content of each section of the source article is sent to the Youdao server, and the translated version is obtained and displayed in the corresponding translation column of Content Translation. Because Youdao returns plain text, links and references have to be re-added manually, and users can modify the content as required.

This process continues for all the sections of the article being translated. For better performance, the translations for consecutive sections are pre-fetched. The user can save the unpublished translation (to work on it again at a later time) or publish the article in the usual manner. The article is published on Wikipedia like any other normal article with appropriate attribution and licenses.
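
The pre-fetching of consecutive sections can be sketched as a small cache that requests the next section while the current one is being shown. This is a simplified, synchronous illustration under stated assumptions, not the actual Content Translation code: `PrefetchingTranslator`, `fetch` and `lookahead` are hypothetical names, and a real client would issue the requests asynchronously.

```python
class PrefetchingTranslator:
    """Hypothetical cache that translates a section and pre-fetches the next ones."""

    def __init__(self, sections, fetch, lookahead=1):
        self.sections = sections    # source sections, in article order
        self.fetch = fetch          # callable: source text -> translated text
        self.lookahead = lookahead  # how many following sections to pre-fetch
        self.cache = {}

    def _ensure(self, index):
        # Fetch a section only once, and only if it exists.
        if 0 <= index < len(self.sections) and index not in self.cache:
            self.cache[index] = self.fetch(self.sections[index])

    def translation_for(self, index):
        """Return the translation of section `index`, pre-fetching what follows."""
        self._ensure(index)
        for offset in range(1, self.lookahead + 1):
            self._ensure(index + offset)
        return self.cache[index]


requested = []


def fake_fetch(text):
    # Stand-in for the call to the translation service.
    requested.append(text)
    return text.upper()


translator = PrefetchingTranslator(["intro", "history", "usage"], fake_fetch)
print(translator.translation_for(0), requested)
```

When the user moves on to the second section, its translation is already in the cache, and only the third section triggers a new request.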

You can view a diagram of the process.

Youdao is not based on open source software. Why are we using it?
Content Translation evolved from a long-standing need to bridge the gap in the amount of content between Wikipedias in different languages. Like all other software used on Wikimedia sites, Content Translation is also open source. In this particular case as well, we are using an open source client to interact with the external service and import freely licensed content in order to help users expand our free knowledge.

Similar to Yandex, with Youdao's machine translation system we are not adding any proprietary software in the Content Translation code, or on the Wikimedia websites and servers. The service is free of charge and available for everyone.

Only the freely available Wikipedia article content (in segments) is sent to the Youdao service and the obtained translated content is also freely usable on Wikipedia pages. The translated content can be modified by users and this data also maintains its free license and is available publicly through the Content Translation API. This is a valuable resource made available for the community to develop open source translation services for those languages where they don't exist yet.

After studying the implications carefully, we found that the fact that the content was previously processed by a closed source service does not limit the freedom of our knowledge or our software, now or in the future. We have taken special care to make sure that the translated content maintains its free license and complies with Wikipedia policies. This included a long process of legal and technical evaluation and compliance. The summary of the terms of use is also available.

From user feedback we have seen that machine translation support is really helpful for users, and we want to support all languages in the best way. Guided by the principles of the Wikimedia Foundation's resolution to support free and open source software, we will prioritise the integration of open source services whenever they are available for a language. Apertium has been a critical part of Content Translation since its inception, and currently provides machine translations for nearly 70 of the numerous possible language combinations that Wikipedia can support. Adding Yandex in November 2015 has helped a large group of users of nearly 70 more languages, who were previously unable to use this facility with Content Translation.

Should I be worried about my personal information when using Youdao?
Irrespective of the service being used, you can be assured that only Wikipedia content from existing articles is sent and only freely licensed content is added back to the translation. No personal information is collected, and communication with those services happens on the server side, so they are isolated from the user's device. Please refer to this diagram for more details.

What if Youdao is the only machine translation tool available and I don't want to use it?
Machine Translation is an optional feature in Content Translation that you can easily disable at will. If more machine translation systems are added for your languages, you can choose to enable MT again and select the MT service of your choice.

Will the content translated by Youdao be free for use in Wikipedia?
Yes. The Wikipedia content translated by Youdao is otherwise freely available on the Youdao web translation platform. Content Translation receives it via an API key to make it seamlessly available in the translation interface. This content can be modified by users (if necessary) and used in other Wikipedia articles under free licenses.

Can this content be used for improving machine translation systems in general?
Yes. Translations made in Content Translation are saved in our database. This information will be made publicly available for anyone to use as translation examples to improve their translation services (university research groups, open source projects, commercial companies, anyone!). The content can be accessed via the Content Translation API. Please note that only information related to the translated text is publicly available. This includes the source and translated text, the source and target language, and an identifier for the segment of text.
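
As an illustration of the kind of record described above, the following sketch models one entry of a parallel corpus with exactly the listed items: source and translated text, source and target language, and a segment identifier. The field names and the JSON Lines output are assumptions made for this example and do not describe the actual format returned by the Content Translation API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ParallelSegment:
    """Hypothetical corpus entry; note that it has no user-related fields."""

    segment_id: str
    source_language: str
    target_language: str
    source_text: str
    translated_text: str


def to_json_lines(segments):
    """Serialise segments as one JSON object per line."""
    return "\n".join(
        json.dumps(asdict(segment), ensure_ascii=False, sort_keys=True)
        for segment in segments
    )


segment = ParallelSegment("s1", "en", "zh", "The Moon orbits Earth.", "月球绕地球运行。")
print(to_json_lines([segment]))
```

A research group could read such lines back with `json.loads` and use the text pairs as training or evaluation examples.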