Topic on Talk:Content translation

Enabling Yandex for other wikipedias

Halibutt (talkcontribs)

(Re: Content translation/Machine Translation/Yandex)

TLDR: How do we turn it on at Polish Wikipedia?

As there is no Apertium engine for the Polish language (and I doubt there ever will be), I recently tested the Yandex translation service (it's been one of the workhorses of translationwiki for a while now). It seems to do a much better job of translating between Slavic languages than Google Translate. This is probably because it uses Russian as the middleman rather than English (read: when translating from Polish to Czech, GT first translates the text to English, and then to Czech). Since both Polish and Czech are Slavic languages, using Russian as the middleman is not a bad idea.
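The "middleman" idea above can be sketched in a few lines. This is only an illustration of pivot translation in general, not a description of how Yandex or Google Translate actually work internally; `toyEngine`, `translateDirect` and `translateViaPivot` are hypothetical names, and the lookup table stands in for a real translation engine.

```javascript
// Hypothetical stand-in for a real MT engine: one tiny dictionary per
// language pair, keyed by "source-target".
const toyEngine = {
  'pl-ru': { kot: 'кот' },
  'ru-cs': { 'кот': 'kočka' },
};

// Translate a single word directly between two languages, or throw if the
// toy engine has no entry for that pair or word.
function translateDirect(text, src, dst) {
  const table = toyEngine[`${src}-${dst}`];
  if (!table || !Object.prototype.hasOwnProperty.call(table, text)) {
    throw new Error(`no ${src}-${dst} translation for "${text}"`);
  }
  return table[text];
}

// Translate from src to dst in two hops through a pivot language.
// If the pivot is already the source or the target, a single hop is enough.
function translateViaPivot(text, src, dst, pivot) {
  if (src === pivot || dst === pivot) {
    return translateDirect(text, src, dst);
  }
  const intermediate = translateDirect(text, src, pivot);
  return translateDirect(intermediate, pivot, dst);
}
```

Calling `translateViaPivot('kot', 'pl', 'cs', 'ru')` goes Polish to Russian to Czech. Anything the pivot language cannot express (grammatical gender, for example, when the pivot is English) is lost at the first hop, which is the concern raised here.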

So, I tested the quality of machine translation, and the quality of RU => PL and PL => RU is excellent. In Wiki terms this means that the translator would only have to do some style edits here and there, add some missing words and such. In other words - stuff we do with articles written by our fellow native speakers every day :) With Yandex it's possible to translate between Russian and Polish without knowing the language you're translating from (though it certainly helps, and one has to be extra careful with proper names and words from third languages used in the original text).

Translation between Polish and Slovak, Czech or Serbian is definitely acceptable (I haven't tested all the Slavic languages yet, but I guess the results would be similar). Even EN <=> PL seems ok, though I would have to test it more. This means: the person translating a Wikipedia article using Yandex would get a fairly good overview of the content, but would most likely have to either reorder each sentence or write it again, using proper Polish (in my case). Again, extra care should be taken when translating proper names and foreign words, like Latin or English terms used in a Czech or Slovak text (Yandex apparently does a poor job of translating them; this is not a big deal though, as most of them would either be wikilinked (thus highlighted) or at least italicised in the original text).

The quality of translation between different language groups (say, DE <=> PL or FR <=> SK) is similar to that offered by Google Translate. Meaning: kind of useful, but the Wikipedian using it would have to re-type the text from scratch anyway. The machine translation would only serve as a guide, helping him or her to get the gist of the original text.

All in all, I would love to see this engine implemented at Polish wiki as an experimental feature. Is it possible at this stage? What would we have to do in order to make it work? Also, pinging @Tar Lócesilion & @Ency to join the fun :)

Runab WMF (talkcontribs)

@Halibutt Thank you for the detailed testing. It's very helpful information. We don't have immediate plans to activate Yandex for more languages, as this is the first experiment with the tool. However, we want to record all requests so that we can identify which other languages may be served better with it. I am adding your observations to our tracker ticket so that we can appropriately prioritize for the next stage. Thank you.

Endo999 (talkcontribs)

The GoogleTrans gadget's new Content Translation integration supports all the language pairs that Google Translate does, which includes the Slavic languages. It doesn't use Yandex, but Google Translate instead. I have translated 16 small to medium size FRWIKI articles into ENWIKI, and the translation seems to be getting better. However, I guess you would know better for the Slavic languages than I.

Since the GoogleTrans gadget is public domain you could make a private version of it that calls Yandex instead of Google. It probably isn't too hard to do. The new Content Translation integration feature will replace the text in the Content Translation system and keep the HTML markup (like what happens with Apertium).

You can look at for help on this feature.

Halibutt (talkcontribs)

@Endo999, thanks for the tip, I didn't know that tool. Testing it right now.

@Runab WMF, thanks for the info. @KartikMistry told me that it's possible to use different translation engines within the Content Translation system, can't wait to test them. As I said, Google Translate is hardly an option for Slavic languages, as English is a poor middleman, with all its emphatic inversions, lack of important parts of grammar (gender!) and whatnot. The way Apertium does it would be perfect, except there is little chance the dictionaries for other languages would ever progress past the beta stage. Yandex indeed looks promising, can't wait to test it live.

Endo999 (talkcontribs)

In response to Halibutt, I have prepared a version of the GoogleTrans gadget that calls the Yandex translation system. This version is called YandexTrans and can be loaded by adding the following line to your common.js file:

mw.loader.load('//en.wikipedia.org/w/index.php?title=User:Endo999/YandexTrans.js&action=raw&ctype=text/javascript');

If you are interested in Cyrillic MT then please give this a go and tell me how it goes. I've only got a free API key from Yandex on this one. I don't know the daily limits but you can get a Yandex API key yourself and replace mine at the top of the code for your own private version.
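For anyone making a private version with their own key, here is a minimal sketch of how a request to Yandex might be built. It is not taken from YandexTrans.js: the function name `buildYandexUrl` is hypothetical, and the endpoint and the `key`, `text` and `lang` parameters are assumptions based on the Yandex Translate v1.5 JSON API, so check them against the current Yandex documentation before relying on them.

```javascript
// Assumption: the Yandex Translate v1.5 JSON endpoint. Verify against the
// current Yandex documentation; it may have changed or been retired.
const YANDEX_ENDPOINT = 'https://translate.yandex.net/api/v1.5/tr.json/translate';

// Build the request URL for translating `text` from `src` to `dst`.
// `apiKey` is your own key, kept in one place so it is easy to swap out.
function buildYandexUrl(apiKey, text, src, dst) {
  const params = new URLSearchParams({
    key: apiKey,
    text: text,
    lang: `${src}-${dst}`, // language pair, e.g. "pl-ru"
  });
  return `${YANDEX_ENDPOINT}?${params.toString()}`;
}
```

Keeping the key as a single argument (or a single constant at the top of the script) means that replacing the shared key with your own is a one-line change.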

Help at: General help on GoogleTrans content translation feature is at:

Halibutt (talkcontribs)

@Endo999, thanks a lot. Sadly, I've had my plate full recently (three jobs plus a small kid at home) and haven't had much time to check it. I'll try to take a look this weekend. Thanks!
