MinT

MinT (Machine in Translation) is a machine translation service based on open-source neural machine translation models. The service is hosted in the Wikimedia Foundation infrastructure, and it runs translation models that have been released by other organizations with an open-source license. An open machine translation service can be a key piece of the essential infrastructure of the ecosystem of free knowledge. This page captures the initiatives to scale the service and make this infrastructure more widely available.

To try MinT, you can use it where it is integrated into projects such as Content Translation and Translatewiki.net, or try it directly in the test instance.



About MinT
MinT is designed to provide translations from multiple machine translation models. Initially, it uses the following models:


 * NLLB-200. The latest model from the No Language Left Behind project, developed by the Meta research team. This model supports translation across 200 languages, including many that are not supported by other vendors.
 * OpusMT. The OPUS (Open Parallel Corpus) project from the University of Helsinki compiles freely licensed multilingual content to train the OpusMT translation models. Anyone can easily help improve the translation quality by participating in the different projects that contribute data to OPUS. For example, when using Content Translation to create translations of Wikipedia articles, the data on published translations will be incorporated as a new resource to improve the translation quality for the next version of the model. Another quick way to contribute is to provide sentence translations with Tatoeba.
 * IndicTrans2. The IndicTrans2 project provides translation models to support over 20 Indic languages. These models were developed by AI4Bharat@IIT Madras, a research group at the Indian Institute of Technology Madras.
 * Softcatalà. Softcatalà is a non-profit organization with the goal to improve the use of Catalan in digital products. As part of the Softcatalà Translation project, translation models used in their translator service to translate 10 languages to and from Catalan have been released.

MinT supports over 200 languages, with more than 50 languages not supported by other services (including 27 languages for which there is no Wikipedia yet). You can read more about the initial release of MinT and check some frequently asked questions in the summary page for the service.



Technical details
The translation models have been optimized for performance using the OpenNMT CTranslate2 library in order to avoid the need for GPU acceleration. This makes it easier for organizations and individuals to build and run their own instances. For more details you can check the source code, the API spec, and a test instance.

MinT provides a platform to run multiple translation models. In order to support different initiatives, aspects such as sentence segmentation, language detection, pre/post-processing of contents, and rich format support have been developed on top of the plain-text based models.
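
As a rough illustration of this layering, the sketch below wraps plain-text translation functions with sentence segmentation, model selection by language pair, and post-processing. All names in it (`Pipeline`, `translate_text`, `split_sentences`, `tidy`, and the toy `fake_model`) are hypothetical and are not taken from the MinT source code; the real service is considerably more elaborate.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A "model" here is any function that translates one plain-text sentence.
# Real models (NLLB-200, OpusMT, ...) would sit behind this interface.
Translator = Callable[[str], str]


def split_sentences(text: str) -> List[str]:
    """Naive segmentation: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tidy(text: str) -> str:
    """Post-processing: collapse repeated spaces, drop spaces before a full stop."""
    text = re.sub(r"\s{2,}", " ", text)
    return re.sub(r"\s+\.", ".", text).strip()


@dataclass
class Pipeline:
    # Maps (source, target) language codes to a plain-text model.
    models: Dict[Tuple[str, str], Translator]

    def translate_text(self, text: str, source: str, target: str) -> str:
        model = self.models.get((source, target))
        if model is None:
            raise ValueError(f"no model for {source} -> {target}")
        # Translate sentence by sentence, then clean up each result.
        return " ".join(tidy(model(s)) for s in split_sentences(text))


def fake_model(sentence: str) -> str:
    """Stand-in for a real model: upper-cases text and adds a stray space."""
    return sentence.upper().replace(".", " .")


pipeline = Pipeline(models={("en", "xx"): fake_model})
print(pipeline.translate_text("Hello world. How are you?", "en", "xx"))
```

Keeping the model behind a simple function interface is one way to let several models share the same segmentation and clean-up steps; it is a design assumption of this sketch, not a description of how MinT is built.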



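Participate
To provide feedback, post on the discussion page. Planned improvements are tracked in Phabricator, where you can propose ideas for improvement, report issues, check the progress of existing tasks, and share your perspective on them. You can also review the completed work in the status updates below.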



MinT for translators
Translation is a common way to contribute in the Wikimedia ecosystem for multilingual users. Machine translation can provide a useful initial translation for users to review and improve. The Language team has developed tools to support translations in their workflows that can integrate different machine translation services to speed up their processes. Once MinT was available, integrating it with these tools was a logical next step to amplify their impact. MinT is available in the following projects:

 * Content Translation. Content Translation provides guidance to create a translation of a Wikipedia article into another language. Content Translation integrates several translation services to provide an initial translation.
 * Localization infrastructure. The Translate extension provides the infrastructure used to translate our software and multilingual pages. Communities of translators use it on Translatewiki.net, Wikimedia Meta-wiki, Mediawiki.org and more.



MinT for Wikipedia readers
The number of topics and the amount of information a reader can learn about from Wikipedia depend on the languages they speak. Machine translation can help people to learn more about their topics of interest when the content is not available in their language.

This initiative explores how to surface the machine translation support from MinT in Wikipedia articles in a way that:


 * Allows readers to learn more about their topics of interest from content in other languages.
 * Clearly differentiates automatically generated content from community-created content.
 * Encourages readers to contribute to community-created content when possible.

At the moment the Language team is working on the design and research aspects of the project to identify the best ways to surface MinT on Wikipedia and the technical explorations for the service to work in this context.

MinT more widely available
Working on the previous initiatives will help to polish and solidify the system. For now, the MinT API is only available for Wikimedia products. As the system gets ready, we'll consider a wider exposure. Providing a service that can be used by communities in innovative ways can be a very powerful tool. New initiatives to make MinT more widely available will be captured here in the future. Meanwhile, feel free to configure your own MinT instance to experiment with it.




 * Completed initial design exploration to illustrate 5 concepts on how to surface machine-translated contents from other languages for Wikipedia articles
 * Completed the enablement of MinT in Content Translation for Ligurian, where the community requested further clarifications about MinT, and for the last set of 14 languages that could be supported with the NLLB-200 model.
 * Enabled MinT for translatable pages on test wiki
 * Expanded exposure of MinT with the enablement of Content Translation mobile and desktop experiences as default in 7 Wikipedias supported by MinT (Cherokee, Tongan, Hungarian, Kazakh, Kyrgyz, Minangkabau, and Sardinian).
 * Completed the validation for all languages supported by the translation models used by MinT as part of the final QA for enabling the new translation service.
 * Santhosh presented at the 10th Workshop on Asian Translation, emphasizing the need for machine translation to be universal, free, and available in more languages. The message was well received by the attendees.
 * Research planning started with an initial draft of the research brief for MinT on Wikipedia
 * Continued technical explorations for applying machine translation beyond plain text (which is what the underlying models provide) to support the Wikipedia context. A new, improved approach to sentence segmentation (with a demo page to try) identifies more accurately where a sentence ends in different languages, and prefers not to split in case of doubt. Not splitting is preferred in machine translation because it avoids fragmenting the context of a translation, for example by misinterpreting the dot of an abbreviation as a full stop.
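
The "avoid splitting in case of doubt" preference described in the last item can be sketched as follows. This is a toy, English-oriented illustration and not the segmenter used by MinT: the abbreviation list, the `segment` function, and the uppercase heuristic are all assumptions made for the example, and a real segmenter would need per-language data.

```python
import re
from typing import List

# Illustrative abbreviation list (assumption); real data would be per language.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "e.g", "i.e", "etc", "vs"}

# Candidate boundary: sentence-ending punctuation followed by whitespace.
BOUNDARY = re.compile(r"([.!?])\s+")


def segment(text: str) -> List[str]:
    """Split text into sentences, keeping text together when unsure."""
    sentences: List[str] = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end(1)  # position just after the punctuation mark
        next_char = text[match.end():match.end() + 1]
        words = text[start:end - 1].split()
        last_word = words[-1].lower() if words else ""
        doubtful = match.group(1) == "." and (
            last_word in ABBREVIATIONS   # "Dr." is probably not a sentence end
            or len(last_word) == 1       # single-letter initial such as "J."
            or not next_char.isupper()   # a new sentence should start uppercase
        )
        if doubtful:
            continue  # in case of doubt, do not split
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


for sentence in segment("Dr. Smith arrived at 5 p.m. yesterday. He was late."):
    print(sentence)
```

The sketch errs on the side of longer segments: a missed boundary still gives the model the full context, whereas a wrong split at "Dr." would hand it two fragments.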


 * Successful exploration for the use of MinT to translate structured formats such as HTML, SVG and markdown.
 * Completed the deprecation of Youdao, an external translation service that was failing for a long time.
 * Continued design exploration for MinT on Wikipedia with new and updated workflows based on feedback.
 * Identified languages which can benefit the most from new OpusMT models
 * Set MinT as the default translation service in Content Translation for Zulu.


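 * Enabled MinT for machine translation in 75 new languages (while collecting feedback from the communities): 62 languages where the mobile translation experience is provided, and 13 languages where, based on machine translation (MT) usage report data and/or community feedback, the quality of other translation services was not optimal.
 * Verified the previous deployment: identified the issues that prevented MinT from being enabled for Bhojpuri and Latvian. Both were caused by mismatches between the language codes used by Wikipedia and those used by MinT and its underlying translation models.
 * Initial design explorations and prototypes considered multiple ways to integrate MinT into Wikipedia.
 * Improved MinT with post-processing of translations that removes the extra space immediately after the full stop at the end of a sentence, improving support for languages that use the Arabic script.
 * Completed the integration of the IndicTrans2 models (for Indic languages) and checked whether all 23 languages they support can be enabled.
 * Made an initial evaluation of activity in the Wikipedia communities where MinT was first adopted, aiming to identify tentative pilot wikis for future research and early adoption.
 * Introduced MinT on translatewiki.net, which is used for the localization of Wikimedia and other projects.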
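
The language code mismatch reported above for Bhojpuri and Latvian can be pictured with a small lookup. The sketch below is a hypothetical illustration, not MinT's configuration: the table `WIKI_TO_MODEL_CODE`, the function `resolve_model_code`, the model key `nllb200`, and the FLORES-200 style codes used as examples are assumptions and have not been checked against the service.

```python
from typing import Dict, Optional, Set

# Illustrative overrides from wiki language codes to model-specific codes.
# These entries are examples only (assumption), not MinT's real mapping.
WIKI_TO_MODEL_CODE: Dict[str, Dict[str, str]] = {
    "nllb200": {"bh": "bho_Deva", "lv": "lvs_Latn"},
}


def resolve_model_code(model: str, wiki_code: str, supported: Set[str]) -> Optional[str]:
    """Return the model's code for a wiki language, or None if unsupported."""
    code = WIKI_TO_MODEL_CODE.get(model, {}).get(wiki_code, wiki_code)
    return code if code in supported else None


# Codes the hypothetical model claims to support.
supported_codes = {"bho_Deva", "lvs_Latn", "fra_Latn"}

print(resolve_model_code("nllb200", "lv", supported_codes))  # mapped explicitly
print(resolve_model_code("nllb200", "fr", supported_codes))  # no mapping: lookup fails
```

Without an explicit entry, the wiki code is passed through unchanged and fails the support check, which is the kind of silent mismatch that can keep a language from being enabled.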