Topic on Talk:Content translation

Limiting the use of the translator

Joutbis (talkcontribs)


For the last few weeks there has been a debate in the Catalan Wikipedia regarding the use of this translator. Right now, there is a bug in the generation of references (or templates) which nullifies the only "working feature" the tool had left. All the "cite web" references become the string "error in title or URL". Nothing remains of the original template. You can see an example in w:ca:Usuari:Oriololmo/Crisi_del_PSOE_de_2016.

Something we are seeing lately is the translation of internal links, where the link correctly points to an existing page, but the displayed text remains in the original language. Sometimes the link even points to the original Wikipedia.

The quality of the language generated is certainly awful. It's hard to convey that in another language, but I will give you some examples:

  • "ARE mRNA" becomes "PLOUGH mRNA"
  • "Your NO counts" becomes "Your NO explains"
  • "This article is a stub [the term for short articles]" becomes "This article is a stub [like in the cigarettes]"
  • "The New York Times" becomes "The New York You swindle"
  • "The Time magazine" becomes "The he swindles magazine"
  • "Van Morrison" becomes "Van [as in the car type] Morrison"
  • "More Dirty Debutantes" becomes "Live in Dirty Debutantes"
  • "Who You Really Are" becomes "Who You Really let him Plough"

The English-Catalan translator is just painfully wrong, and many people don't bother to correct the errors. Catalan is quite similar to Spanish, so many people just stick to translating from this language. But the thing is, most of the examples above are from Spanish-Catalan translations, which people just don't double-check due to the mentioned similarities.

Sure, it is just a tool, and people are supposed to proofread the translations, but when this kind of nonsense is in many translations, you begin to wonder about its purpose. Make no mistake, I wish we had an automatic translator which worked 100% of the time, and I get it, this is a tool which is being developed, but this kind of stuff just can't make it into Wikipedia in its current state. And again, some users *do* bother to correct those, mainly experienced ones. But we've also had way too many 'one-hit wonders' just come in, click on the paragraphs, publish the translation and leave Wikipedia. Heck, we've even had non-Catalan speakers translate into Catalan!

So, there has been an effort to move these kinds of translations to the user namespace. But if we have to devote time to this, it's time we are taking away from other work.

A user has come up with a proposal to temporarily disable the translator until the references/template bug is fixed, and, once it is solved, to limit the usage of this translator to certain users. That could be either Autopatrolled users or a list of users who have proved they can properly use this tool, in a similar fashion to Auto Wiki Browser.

Joutbis (talkcontribs)

OK, I reported two bugs and made a request:

  • references get lost
  • internal links are messed up and not translated
  • please let us restrict the tool somehow.

And not even an answer? Please, do something about it.

Amire80 (talkcontribs)

I actually know Catalan, so you can write in Catalan :)

Machine translation is never perfect. It must always be fixed. Users are not supposed to publish articles with machine translation without correction. If people do it, it's the same as vandalism, and can be deleted if needed. This is especially true for "one-hit wonders".

For problems with citations and templates—can you please give me examples of particular articles where this happened?

Townie (talkcontribs)

Sure, here you go: 624 articles to choose from.

Joutbis (talkcontribs)

As for the link problem, see w:Ca:Volcà de Colima and look for "Pompeya" and "Herculano". The link has the Catalan word and links to the Catalan entry, but the spelling is Spanish. Some administrator may move it to user-space soon, so watch out. You also have the article on top of this thread, with over 200 references, and almost all of them wrong.

If you understand Catalan, please check out the original discussion in the Catalan wikipedia, to see what's worrying us.

Mind you, it's not a matter of one-hit wonders and vandals. Brilliant wikipedians, with several featured articles under their belt, make mediocre articles when using the content translator. The overall quality inevitably goes south.

Endo999 (talkcontribs)

Google Translation is actually getting better for many of the main language pairs now, due to its shift to a deep learning (neural net) paradigm. I can say that since they did this they are getting better grammar in the French to English translations.

However, no machine language translation can rest by itself. It needs a person fluent in the destination language to massage it into the correct destination language grammar and meaning.

I have made a suggestion before, that people take out their own translation API keys and upload them to their preferences in their Wikipedia accounts. Thereupon, Wikipedia uses the translation engine of choice for the translator, if they wish to use machine translation. Since Google is probably the best service for many language pairs, this would allow Wikipedia to have the translator pay for this pay-for-use service. This would get around the strict Open Source policy of Wikipedia. Apertium is a noble attempt at Open Source translation, but most people would say it's not as good as Google translation, and not likely to be in the future either.

I think that attempts to unduly limit the use of machine translation in articles are actually attempts to slow the rate of translation between wikis. Already, translations into the enwiki are 1/10 that of translations into the cawiki. Wikipedia is about the increase of knowledge sharing, not limiting it.

Joutbis (talkcontribs)

I agree that Wikipedia is about knowledge sharing, and I have myself done quite a few translations from English and French into Catalan. But I don't see what the incomprehensible babble of Catalan words that the content translator is generating right now is doing for knowledge sharing. On the contrary, it tends to create large articles that no one can understand. If non-wikipedians see one of these articles, they will reach the conclusion that Wikipedia is no use, and they will be less likely to try it again. So wikipedians have to either chase and delete these articles, or try to fix them (going back to the original source to try to make sense of them, and spending a long time). I think our efforts would be better invested in creating better articles.

Perhaps translations into English are really getting better. Good for you. Into Catalan, they are still awful. Google, and non-Google.

This is not about limiting knowledge sharing. No way. It's about a computer application that's not fit for production use: the language is incomprehensible, and there are at least two important format errors. Other wikipedias may not have these problems, and that's great for them. But we want to have the choice and decide when and how we deploy this tool.

Pesky Catalans demanding to vote. Damn, it's becoming a pattern! :-)

Joutbis (talkcontribs)

Really, I don't see the point in trying Google Translate. If it's what you get in the interactive version at, it's not really worth it. It may be better than Apertium, if you say so, but it's still very far from acceptable.

I have seen a new development in the behavior of the content translator: please see the last reference in w:Ca:Eli Lieb. There's a whole blob of HTML code (four nested divs!!!), and I don't know what it really means, but it does look like someone is testing software in a production environment.

There are probably a few more examples, like w:Ca:España (diari), although in this one the editor had the presence of mind to edit all the junk out. He is in a select minority, mind you.

The thing is: given the irregular status of machine translation across different languages (some may have acceptable quality translators, some definitely don't), can you please give the individual wikipedias the choice whether to incorporate content translation or not?

Halibutt (talkcontribs)

@Townie, judging by the first article that appeared there (Abadia territorial de Santa Maria di Grottaferrata), the error is quite simple to fix: the correct template is there in the code (Ref-Web), but the names of the fields are left in the original language ("titolo=" instead of "títol", and so on).
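
To illustrate the kind of fix described above, here is a minimal Python sketch that renames named template parameters from the source language to the target language. It is not how Content Translation itself works. The function name `rename_params` and the dictionary `PARAM_MAP` are hypothetical; only the "titolo" to "títol" pair comes from this thread, and the "autore" entry is an assumed example.

```python
import re

# Hypothetical mapping from Italian parameter names to Catalan ones.
# Only "titolo" -> "títol" is taken from the thread; "autore" is an assumed example.
PARAM_MAP = {
    "titolo": "títol",
    "autore": "autor",
}

def rename_params(template_text, mapping):
    """Rename named parameters ("|name=") in a template call, leaving values untouched."""
    def swap(match):
        name = match.group(2)
        # Keep the original name when no translation is known.
        return match.group(1) + mapping.get(name, name) + match.group(3)
    # "|", optional spaces, a parameter name, optional spaces, "="
    return re.sub(r"(\|\s*)([^=|{}\s]+)(\s*=)", swap, template_text)

example = "{{Ref-web|titolo=Abbazia|url=http://example.org}}"
result = rename_params(example, PARAM_MAP)
print(result)
```

Parameters that are missing from the mapping (such as `url` here) are passed through unchanged, so a partial mapping does no harm. A real fix would also have to handle nested templates and unnamed parameters, which this sketch ignores.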

Joutbis (talkcontribs)

This particular article was translated in 2015, and yes, it has a problem with the template. But this is not what this thread is about. The current problem is that the tool outputs just the string "error in title or URL", not the original template, not a translated template. And it has been going on for months. Please check the examples.

There are other errors reported in this thread, if anybody cares to look into them. But what we ask is for a way to limit the use of the content translation, because it's not ready for production, at least in Catalan.

Amire80 (talkcontribs)

Content Translation indeed has bugs, but it has been used to translate over 160,000 articles, and the number of articles that were subsequently deleted is very low in comparison to the number of articles that were created without Content Translation. Catalan translators were among the most prolific (and helpful with testing and bug fixing).

For the issue of templates, there are several things that the communities can do to improve the situation. I suggest reading the page Content translation/Templates, which has documentation for translators and for template maintainers. In particular, improving TemplateData coverage will be very helpful. I did a sample edit at the "Oficial" template, which was the one that created nested <div>s, and this should be done at more commonly used templates.

Joutbis (talkcontribs)

I haven't seen the stats. Perhaps the articles don't get deleted because they get moved to userspace, or because some other people devote time to fix them, or because they remain in a sorry state, and the quality of the Catalan Wikipedia degrades as a result.

Thanks for giving a hint on how to fix the template thing. But this thread mentions quite a few more bugs: the HTML blobs, the non-translated links, and the generally unintelligible language. It's not ready for production use, at least for Catalan. Get over it.

I'm asking again: please give us some way to limit the use of this tool. Some options: allow the tool only to a group of users defined by the admins, create translated articles in userspace, whatever.