Talk:MinT

Welcome to the MinT page
You can use this page to start a discussion with others about how to improve MinT. Thank you! UOzurumba (WMF) (talk) 21:16, 25 September 2023 (UTC)

dead link for 'The IndicTrans2 project'
''IndicTrans2. The IndicTrans2 project'' leads to -> https://ai4bharat.iitm.ac.in/indic-trans2 -> Oops! That page can’t be found.

Can someone fix the link, please? Thanks. -- Christian 🇫🇷 FR (talk) 16:42, 28 September 2023 (UTC)


 * Hello Christian 🇫🇷 FR,
 * I checked the link and it seems to be working fine now. Thank you! UOzurumba (WMF) (talk) 19:54, 4 October 2023 (UTC)
 * It works for me now too ✅. We can leave it unchanged. Thanks for the acknowledgement. Christian 🇫🇷 FR (talk) 06:25, 5 October 2023 (UTC)

MinT should match the translation database we have contributed to, but it does not
Please un-deploy MinT for en-ja translation, or give me a "stop" button so that I can work without it. I need an option to turn MinT off; where can I do that? Are we sure the problem stems from MinT being a combination/collaboration of two systems? Are there any language pairs for which it outputs acceptable translations? When are we going to import the translation database from the previous system? That thesaurus is very precious, as the circle of translators has spent so many hours building it.

Again, I am talking about the en-ja language pair, where it is not practical to keep using MinT, whether the subject is technical or not (details below). CX2 has been disabled for ja users; as far as vocabulary is concerned, however, it worked much better.

At the moment, as long as MinT is not turned off for en-ja translation of Tech News:
 * I have to open past issues and copy and paste the correct expressions;
 * Tech News has many set phrases and expressions specific to it, such as recurring update notices and so forth;
 * Wiki markup is not only ignored but replaced with the wrong characters: the bold markup (''') is turned into plain quotation marks. Why is such a basic error present?
 * I do not feel confident, as so many sentences have to be copied and pasted manually from past issues, which does not sound in line with our attribution policies to me as a Wikipedian.

If we need to invest in training the new MT system and its dictionary, is that paid for out of the translators' pockets? Do we keep using MinT with bitterness on our tongues until it becomes usable?

I appreciate how an MT system takes care of matching against the translation database, which is exactly why, AFAIK, a low-quality system should be turned off for certain language pairs. In my personal experience, I *need* to ignore MinT's suggestions approx. 85% of the time, for the following reasons:
 * 40% because it does not parse grammar correctly and inserts symbols I need to delete manually;
 * 35% because its dictionary does not match Wikimedia-specific terminology, which translators had taught the previous system;
 * 20% because it replaces wiki markup wrongly; as above, the bold markup (''') needs to stay as is, but MinT replaces it with plain quotation marks;
 * 5% because I can't trust its dictionary: for the country name Belarus, for example, MinT outputs Belgium. /: What kind of bug can cause such a basic error?

MinT is below my expectations as an en-ja translator. Too bad I will not enjoy MT assistance any more, whereas the old system pampered its users, me included, by cutting working time by almost 40%.

FYI, my use cases:
 * 1) Given the design of Tech News, translating from scratch is wasteful: recurring information should keep the same sentence format, which assures readers of the /ja pages that translators understand what we are doing.
 * 2) On ESEAP issues, the original English text is actually itself a translation from the poster's native language, which means that much guesswork is involved in supplying a secondary translation; looking into Wikidata often helps me match unusual terminology to organization names or wiki teams.

Crossing my fingers that other language pairs are not affected this badly. Cheers, --Omtecho Omotecho (talk) 06:17, 30 September 2023 (UTC)


 * Thanks for the feedback, @Omotecho.
 * MinT is a new initiative still in active development. It is not replacing any previous system: suggestions from Translation Memory or other services like Apertium are still shown when they are available. Translation memory suggestions (previous translations by editors of similar messages) are given priority and shown above the machine translation ones (in this example MinT suggestions are shown at the bottom of the list).
 * MinT uses different machine learning models to produce the translations. I'll provide more detail on some of the types of issues you are experiencing:
 * Translation models support plain-text translation, and we are building support for more complex formats such as HTML and Wikitext on top of them. For example, improvements to support Wikitext are captured in this ticket. The issues with Wikitext can result in both (a) markup not showing correctly in the result and (b) content being wrongly translated because markup gets in the way (e.g., resulting in a sentence being cut in half and each half being translated independently, which leads to wrong translations). As Wikitext support is improved, these issues should reduce significantly.
 * For machine learning models the quality of the translation depends on the amount and quality of the training data. By providing more examples of good translations, the models can be improved. Currently, translating Wikipedia articles with Content Translation or contributing to Tatoeba are two easy ways to generate more quality data to improve the models. We also plan to integrate localization data from the Translate extension (more details in this ticket). In addition, contributing more Wikipedia-specific data will result in translations that align better with the community expectations.
 * As I mentioned, MinT is in active development and it has room for improvement, but to polish a system that supports over 200 languages, it is very useful to expose it to the communities in ways that let them help make it better.
 * Thanks! Pginer-WMF (talk) 08:37, 5 October 2023 (UTC)
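
For readers wondering how markup can survive a plain-text translation model, here is a minimal sketch of one common approach: mask the markup with sentinel characters before translation and restore it afterwards, so the sentence is translated as a whole rather than cut at the markup. This is not MinT's actual implementation. The function names, the sentinel characters, and the assumption that the translator passes the sentinels through unchanged are all hypothetical, and only bold markup (''') is handled.

```python
import re
from typing import Callable

# Wikitext bold span: '''text'''. Other markup is out of scope for this sketch.
BOLD = re.compile(r"'''(.+?)'''")

# Hypothetical sentinel characters, assumed to pass through the translator untouched.
OPEN, CLOSE = "\u27e6", "\u27e7"


def mask_bold(text: str) -> str:
    """Replace each '''bold''' span with sentinel-delimited text."""
    return BOLD.sub(lambda m: f"{OPEN}{m.group(1)}{CLOSE}", text)


def unmask_bold(text: str) -> str:
    """Turn the sentinels back into wikitext bold markup."""
    return text.replace(OPEN, "'''").replace(CLOSE, "'''")


def translate_wikitext(text: str, translate: Callable[[str], str]) -> str:
    """Translate the whole sentence at once, keeping bold markup in place."""
    return unmask_bold(translate(mask_bold(text)))


# Stand-in "translator" for demonstration only: it just upper-cases the text.
print(translate_wikitext("The '''new''' feature", str.upper))
```

Because the sentence is sent to the translator in one piece, the markup neither splits it in half nor gets rewritten as ordinary quotation marks, provided the assumption about the sentinels holds for the model in use.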