Topic on Talk:Content translation

Add Santali language to FLORES Supported languages

Rocky 734 (talkcontribs)

Is it possible to use machine translation for Santali Wikipedia in Content Translation using NLLB-200? I have read their blog and found that Santali is among the 204 supported languages (https://github.com/facebookresearch/flores/blob/main/flores200/README.md).

Getting at least one machine translation tool, even one with only minor, experimental, or intermediate suggestions, would be a great help, as until now there has been no machine translation tool for the Santali language. It would help us translate more articles easily using the Content Translation tool.

There is no objection in our community to adding machine translation to the Content Translation tool.

Pginer-WMF (talkcontribs)

Thanks for the feedback, @Rocky 734.

Santali is not supported right now, but we are exploring how to support more languages (especially those lacking any machine translation support).

Not all languages supported by NLLB-200 are available right now in Content Translation. The researchers developing NLLB-200 created an API to make their research available for the translation of Wikipedia articles in a small set of languages for evaluation purposes (more details in this page).

Now that the translation models have been released under an open license, we are exploring ways to expand support to more languages in Wikipedia translation tools. Hearing about the need and interest from the Santali community is really useful for us in planning the next steps.

Thanks!

Rocky 734 (talkcontribs)

Thanks for the reply, @Pginer-WMF. It is clear to me now that not all languages are available in Content Translation. Playing with this new website to translate text from eng_latin to sat_Beng increased my confidence that the model is quite accurate and fair. (Note that sat_Beng is the wrong script code; it should be sat_Olck: https://github.com/facebookresearch/fairseq/pull/4576)


I also remember another project: as a non-programmer, I tried to improve the eng-sat pair in Apertium (https://beta.apertium.org/index.sat.html#?dir=eng-sat&q=blue%20house.%0A). It is still in beta; I hope it can be integrated with the CT tool in the future. :-)


Website for testing NLLB: https://huggingface.co/spaces/Narrativaai/NLLB-Translator

Screenshot for NLLB: https://snipboard.io/5HLE60.jpg

Pginer-WMF (talkcontribs)

Hi @Rocky 734,

The Wikimedia Machine Learning team is exploring how to create an instance running the NLLB-200 models, which would make it possible to support more of the languages that the model supports but that are not yet available through the current API, such as Santali. You can check this ticket for more details and to track progress. I mentioned the case of Santali to make sure it is captured in the ticket.


Thanks!

Rocky 734 (talkcontribs)

I apologize for the delay in my response. Thank you so much for the update.
