MinT/zh

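MinT (Machine in Translation) is a machine translation service based on open-source neural machine translation models. The service is hosted on Wikimedia Foundation infrastructure and runs translation models that other organizations have released under open-source licenses. An open machine translation service can be a key piece of the essential infrastructure for an ecosystem of free knowledge. This page collects the initiatives to expand the reach of the service.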

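You can try MinT in individual projects such as Translatewiki.net and in projects where Content Translation is installed, or you can use the test instance directly.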



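About MinT
MinT supports over 200 languages, including more than 50 languages that other comparable services do not support (27 of which have no Wikipedia yet). You can learn more about the initial release of MinT and check some frequently asked questions on the service's intro page.

MinT uses multiple machine translation models to translate between languages. The initial version uses the following models: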
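 * NLLB-200. The latest model from the No Language Left Behind project by the Meta research team. It supports translation across 200 languages, including many that other comparable models do not cover.
 * OpusMT. The OPUS (Open Parallel Corpus) project at the University of Helsinki compiles freely licensed multilingual content, which is used to train the OpusMT translation models. Anyone can improve translation quality by contributing to the projects that supply data to OPUS. For example, when Wikipedia article translations are created with Content Translation, the data from the published translations becomes a new resource for improving quality in the next version of the models. Contributing example sentences to Tatoeba also helps improve translation quality.
 * IndicTrans2. The IndicTrans2 project provides translation models that support over 20 Indic languages. The models were developed by the AI4Bharat lab at the Indian Institute of Technology Madras.
 * Softcatalà. Softcatalà is a non-profit organization that aims to improve the use of Catalan in digital products. The translation models behind the organization's translator service support translation between Catalan and 10 other languages. They are part of the Softcatalà translation project and have been publicly released.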



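Technical details
The translation models have been optimized for performance with the OpenNMT CTranslate2 library to reduce the need for GPU acceleration. This makes it easier for organizations and individuals to set up and run their own instances. For more details, see the source code, the API specification, and the test instance.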

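MinT provides a platform for running multiple translation models. To support different use cases, aspects such as sentence segmentation, language detection, pre- and post-processing of content, and rich-text support were developed on top of the plain-text translation that the models provide.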
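The layering described above can be illustrated with a minimal sketch. It is not MinT's actual code: the function names are hypothetical, the segmentation is deliberately naive, and `fake_model` is a stand-in that only upper-cases its input where a real model call would go.

```python
import re
from typing import Callable, List


def segment(text: str) -> List[str]:
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def postprocess(sentence: str) -> str:
    """Collapse repeated whitespace that may be left over after translation."""
    return re.sub(r"\s+", " ", sentence).strip()


def translate_text(text: str, translate_sentence: Callable[[str], str]) -> str:
    """Segment the text, translate sentence by sentence, then clean up."""
    return " ".join(postprocess(translate_sentence(s)) for s in segment(text))


def fake_model(sentence: str) -> str:
    """Hypothetical stand-in for a model call; it only upper-cases the input."""
    return sentence.upper()


print(translate_text("Hello world.  How are you?", fake_model))
```

Keeping the model behind a plain callable is one way to let several models share the same surrounding steps.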



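Get involved
Feel free to share your feedback on the talk page. Planned improvements are tracked in Phabricator, where you can propose improvements, report issues, follow the progress of the work, and share your perspective. You can also check the status updates below.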



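MinT for translators
In the Wikimedia ecosystem, multilingual users often contribute by translating. Machine translation can give them a useful initial translation that is ready for use once it has been reviewed and improved. The translation tools developed by the Language team bring translations from various machine translation services into the editing workflow to make translating more efficient. With MinT available, integrating it with these tools is a natural step to support translators further. The following projects support MinT: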


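 * Content Translation. Content Translation guides users through translating a Wikipedia article into another language. It integrates multiple translation services to provide an initial translation suggestion.
 * Localization infrastructure. The Translate extension provides the infrastructure for translating our software and multilingual pages. Translator communities use it on sites such as Translatewiki.net, Wikimedia Meta-Wiki, and Mediawiki.org.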

MinT for Wikipedia readers
The number of topics and the amount of information a reader can learn about from Wikipedia depend on the languages they speak. Machine translation can help people learn more about their topics of interest when the content is not available in their language.

This initiative explores how to surface the machine translation support from MinT in Wikipedia articles in a way that:


 * Allows readers to learn more about their topics of interest from other languages.
 * Clearly differentiates automatically generated content from community-created content.
 * Encourages readers to contribute to community-created content when possible.

At the moment, the Language team is working on the design and research aspects of the project to identify the best ways to surface MinT on Wikipedia, and on the technical explorations needed for the service to work in this context.

MinT more widely available
Working on the previous initiatives will help to polish and solidify the system. For now, the MinT API is only available for Wikimedia products. As the system gets ready, we'll consider a wider exposure. Providing a service that can be used by communities in innovative ways can be a very powerful tool. New initiatives to make MinT more widely available will be captured here in the future. Meanwhile, feel free to configure your own MinT instance to experiment with it.

Status updates


 * Completed the initial design exploration to illustrate 5 concepts for surfacing machine-translated content from other languages in Wikipedia articles
 * Completed the enablement of MinT in Content Translation for Ligurian, where the community had requested further clarification about MinT, and for the last set of 14 languages that could be supported with the NLLB-200 model.
 * Enabled MinT for translatable pages on the test wiki
 * Expanded exposure of MinT with the enablement of Content Translation mobile and desktop experiences as default in 7 Wikipedias supported by MinT (Cherokee, Tongan, Hungarian, Kazakh, Kyrgyz, Minangkabau, and Sardinian).
 * Completed the validation for all languages supported by the translation models used by MinT as part of the final QA for enabling the new translation service.
 * Santhosh presented at the 10th Workshop on Asian Translation, emphasizing the need for machine translation to be universal, free, and available in more languages, a message that was well received by the attendees.


 * Research planning started with an initial draft of the research brief for MinT on Wikipedia
 * Continuing technical explorations for applying machine translation beyond the plain text that the underlying models support, to fit the Wikipedia context. A new, improved approach to sentence segmentation (with a demo page to try) identifies more accurately where a sentence ends in different languages and prefers not to split in case of doubt. This is preferable for machine translation because it avoids fragmenting the context of a translation, for example by misinterpreting the dot of an abbreviation as a full stop.
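The "do not split in case of doubt" preference mentioned above can be sketched as follows. This is an illustration of the idea only, not the approach used by MinT: the abbreviation list is hypothetical and tiny, and the uppercase check suits only scripts that distinguish letter case.

```python
import re
from typing import List

# Hypothetical abbreviation list; a real segmenter needs per-language data.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc.", "vs."}


def conservative_segment(text: str) -> List[str]:
    """Split text into sentences, skipping every boundary that looks doubtful."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        last_word = candidate.split()[-1].lower()
        next_char = text[match.end():match.end() + 1]
        # Doubtful: the dot closes a known abbreviation, or the next word does
        # not start with an uppercase letter or a digit. When in doubt, do not split.
        if last_word in ABBREVIATIONS or not (next_char.isupper() or next_char.isdigit()):
            continue
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(conservative_segment("Dr. Smith arrived. He was late."))
```

A missed boundary produces a longer input for the model, while a wrong split removes context from both halves, which is why the sketch errs toward not splitting.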


 * Successful exploration for the use of MinT to translate structured formats such as HTML, SVG and markdown.
 * Completed the deprecation of Youdao, an external translation service that was failing for a long time.
 * Continued design exploration for MinT on Wikipedia with new and updated workflows based on feedback.
 * Identified languages which can benefit the most from new OpusMT models
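Translating structured formats, as explored in the list above, means keeping the markup intact while only the text changes. The sketch below shows one simple way to do that for HTML with Python's standard `html.parser`. It is an assumption-laden illustration, not MinT's implementation: the class and function names are hypothetical, comments and doctype declarations are dropped, and each text node is translated in isolation.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable, List


class MarkupPreservingTranslator(HTMLParser):
    """Re-emit tags unchanged and pass only text nodes to the translator."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            # Character references were decoded by the parser, so re-escape.
            self.parts.append(escape(self.translate(data), quote=False))
        else:
            self.parts.append(data)


def translate_html(markup: str, translate: Callable[[str], str]) -> str:
    parser = MarkupPreservingTranslator(translate)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# str.upper stands in for a real translation function.
print(translate_html("<p>Hello <b>world</b></p>", str.upper))
```

Translating node by node splits a sentence wherever inline markup occurs, so a production approach would more likely translate whole sentences and then map the inline tags back onto the result.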


 * Made MinT the default translation service for Zulu in Content Translation


 * Enabled machine translation with MinT (and communicated with the communities) for 75 new languages: 62 languages where the mobile translation experience is available, and 13 languages where translation quality from other services may not be ideal based on the MT usage report data and/or community feedback.
 * Validated previous enablements: identified issues with Bhojpuri and Latvian, where MinT was not available due to mismatches between the language codes used by Wikipedias, MinT, and the underlying translation models.
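Language code mismatches like the ones noted above are commonly handled with an explicit mapping between the code a wiki uses and the code a model expects. The sketch below illustrates that idea; the mapping entries and the function name are assumptions made for the example, not MinT's actual configuration.

```python
from typing import Dict, Optional, Set

# Illustrative assumptions only: wiki language code -> code expected by a model.
WIKI_TO_MODEL_CODE: Dict[str, str] = {
    "bh": "bho_Deva",  # assumed: the wiki code differs from the model's Bhojpuri code
    "lv": "lvs_Latn",  # assumed: the model uses a code for Standard Latvian
}


def resolve_model_code(wiki_code: str, supported: Set[str]) -> Optional[str]:
    """Return the model code for a wiki code, or None if the model lacks it."""
    candidate = WIKI_TO_MODEL_CODE.get(wiki_code, wiki_code)
    return candidate if candidate in supported else None


supported_by_model = {"bho_Deva", "lvs_Latn", "es"}
print(resolve_model_code("bh", supported_by_model))
print(resolve_model_code("xx", supported_by_model))
```

Returning `None` for unknown codes makes a missing mapping visible during validation instead of silently failing at translation time.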


 * Initial design explorations and prototypes on ways we could integrate MinT in Wikipedia
 * Improved MinT translation post-processing to better support languages using the Arabic script by avoiding extra spaces after full stops.
 * Completed the integration of the IndicTrans2 model by verifying the enablement of all 23 of its supported languages.
 * Initial analysis of activity for Wikipedia communities that are supported with MinT for the first time to identify potential pilot wikis for future research and as early adopters.
 * Enablement of MinT on translatewiki.net for use in the localization of Wikimedia and other open projects.
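The post-processing fix for Arabic-script languages mentioned in the list above could look roughly like the sketch below. It assumes the problem is a run of several spaces after a sentence-final mark. The function name and the set of marks are hypothetical, and the actual MinT fix may work differently.

```python
import re

# Sentence-final marks considered here (an assumption for this sketch):
# ASCII full stop, Arabic full stop (U+06D4), and Arabic question mark (U+061F).
_EXTRA_SPACE = re.compile("([.\u06D4\u061F])[ \t\u00A0]{2,}")


def collapse_spaces_after_full_stop(text: str) -> str:
    """Replace two or more spaces after a sentence-final mark with one space."""
    return _EXTRA_SPACE.sub(r"\1 ", text)


print(collapse_spaces_after_full_stop("First sentence.   Second sentence."))
```

Single spaces are left untouched, so the function can be applied repeatedly without changing an already clean translation.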