Topic on Talk:Content translation

Loophole and possible solution!

Nguyentrongphu (talkcontribs)

CT has produced tens of thousands of badly translated articles on Wikipedia Vi. The community voted to restrict CT publication to 95%/5% (one has to edit at least 5% of the content to publish). However, someone found a loophole and has been exploiting it to mass-produce badly translated articles: one can copy content from CT and paste it into the visual editor without editing anything. One possible solution is to disable the "copy" function in CT, at least on Wikipedia Vi.

Bluetpp (talkcontribs)

FYI, the Vietnamese Wikipedia community is currently voting on removing the admin and bureaucrat rights of a user who used this loophole to create thousands of badly translated articles (using machine translation in CT without reviewing it, and skipping the 95/5 limit by copying from CT to the visual editor). My community also doubts the effectiveness of publishing limits; if somebody wants to publish anyway, they can still do it by simply copying and pasting.

Hence the solution Nguyentrongphu proposed.

@Pginer-WMF: Hi Pau, it's I who emailed you some days ago. While our community is considering changing the limit again, can you also look closely into this problem? Thank you.

Pginer-WMF (talkcontribs)

Thanks for sharing your experience. Feedback from editors is very useful for us to improve the tool.

For the specific case described, it seems that the tool communicated clearly that the content was not ready to be published, and the user intentionally ignored that information and decided to break the rules. Unfortunately, content creation tools can always be misused in some way. Even if we limited copying content (which would negatively affect many legitimate users), it would still be possible to inspect the HTML, or to go directly to the Google Translate website and copy a machine translation from there without using Content Translation at all.

The limits are intended to raise awareness among well-intentioned users about the issue of publishing content without proper review. We plan to improve the way limits work, but I don't think any tool can guarantee that the content created will always be good. Treating this case as vandalism, with the usual tools to block users and revert content creation, seems the most appropriate approach to me.

Overall, Content Translation seems to be properly used on Vietnamese Wikipedia. During the whole of last year (2020), about 6500 articles were created with the tool, and only 413 were deleted (6%). To put that in perspective, about 53600 articles were created without the tool and 18300 were deleted (34%). Users who use the tool as intended are contributing content by reusing the efforts from other languages to create articles that are much less likely to be deleted. There may be exceptions, and we need to investigate and explore ways to reduce them, but I think it is good to also acknowledge the positive aspects the tool brings and to be careful not to affect those when dealing with the exceptions.
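
The two percentages can be checked with a few lines of Python. This is only a sanity check of the arithmetic, using the approximate 2020 counts quoted in this comment; the variable and function names are illustrative.

```python
# Approximate 2020 counts for Vietnamese Wikipedia, as quoted in the comment.
ct_created, ct_deleted = 6500, 413            # articles created with Content Translation
other_created, other_deleted = 53600, 18300   # articles created without the tool


def deletion_rate(created: int, deleted: int) -> float:
    """Return the share of created articles that were later deleted, in percent."""
    return 100.0 * deleted / created


ct_rate = deletion_rate(ct_created, ct_deleted)
other_rate = deletion_rate(other_created, other_deleted)

print(f"With Content Translation: {ct_rate:.1f}% deleted")   # 6.4%
print(f"Without the tool: {other_rate:.1f}% deleted")        # 34.1%
```

Note that this only counts articles recorded as created with the tool; it says nothing about articles created in other ways.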


Thanks!

Nguyentrongphu (talkcontribs)

User:Pginer-WMF The 6% deletion rate doesn't reflect the real problem on Wikipedia Vi. For 2-3 years, many tens of thousands of badly translated articles went undetected. They looked good on the surface, but one has to read carefully to notice the bad quality. Only recently (mid-2020) was the issue brought to light in the Wikipedia Vi community. Plus, some people have been exploiting this loophole for many years, and articles created using the loophole are not counted as CT-created articles, so the figure doesn't capture the real number of deletions resulting from CT. One user alone was caught creating over 5000 badly translated articles using the loophole. Almost all of his articles have not been deleted yet because he has been an admin and people trusted him.


Using Google Translate (GG) directly is not the same as using CT (or the loophole). It is much easier to catch badly translated articles made with GG translation: for example, they have problems with interlinks, refs, templates, appearance (they look un-wikified) and so on. There is a reason why English Wikipedia disabled CT entirely even though people can still use GG translation to write on English Wikipedia.


Is it possible to disable the "copy" function within CT if the Wikipedia Vi community reaches consensus on it? Thank you!

Pginer-WMF (talkcontribs)

Copying and pasting is a fundamental editing tool, and simply disabling it would negatively impact many legitimate editors. Before considering such a drastic measure, I'd like to have a better understanding of the problem so that we can explore alternative solutions and keep polishing the idea.

You mentioned that tens of thousands of articles were created by copying from Content Translation. I'd like to hear a bit more about:

  1. Given that those articles are not tagged in any special way, how do you know that the content comes from Content Translation?
  2. Given that these users seem to be aware that this is not the intended way to use the tool (and the tool warns them that their content will be reviewed by the community and is likely to be deleted), what do you think motivates them to do so?

Once we have a better understanding of the process, I think it is worth exploring some solutions that can help in the problematic cases without affecting legitimate editors. For example, instead of preventing copying completely, we can consider supporting a special tag that is added to the content when it is copied (only when it has not been edited enough according to the limits), so that an article created just by pasting it could be easily detected by edit filters.
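
A minimal sketch of that idea, to make the discussion concrete. Everything here is an assumption for illustration: the marker string, the function names, and the way the edited fraction is passed in are hypothetical and do not describe how Content Translation or edit filters actually work. The 5% threshold mirrors the 95/5 limit mentioned earlier in this thread.

```python
# Hypothetical marker; not an existing Content Translation or MediaWiki feature.
COPY_MARKER = "<!-- unreviewed-translation-copy -->"


def copy_from_translation(text: str, edited_fraction: float, min_edited: float = 0.05) -> str:
    """Return the text that would be placed on the clipboard.

    If the translator edited less than `min_edited` of the machine output,
    the copied text is prefixed with a marker; otherwise it is left unchanged.
    """
    if edited_fraction < min_edited:
        return COPY_MARKER + text
    return text


def is_unreviewed_paste(new_page_text: str) -> bool:
    """Filter-style check: does a newly saved page still carry the marker?"""
    return COPY_MARKER in new_page_text


# Usage: an unedited machine translation is flagged, a revised one is not.
pasted = copy_from_translation("Machine-translated paragraph.", edited_fraction=0.0)
revised = copy_from_translation("Carefully revised paragraph.", edited_fraction=0.30)
print(is_unreviewed_paste(pasted))   # True
print(is_unreviewed_paste(revised))  # False
```

A marker like this would not stop someone determined to remove it, but it would let the community find unreviewed pastes without blocking copying for everyone.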

I understand that a few users misusing a tool generates a lot of clean-up work that is time consuming. However, I think it is important to find the right balance if we still want Wikipedia to be the encyclopedia that anyone can edit.

Nguyentrongphu (talkcontribs)

Pginer-WMF

  1. Someone created dozens of replica articles in sandbox pages and compared them to his articles. They look identical. Copying directly from GG translation would produce a very different-looking article.
  2. Some people are motivated by money. The Polish embassy in Vietnam has held a contest twice (first in 2019, then in 2020). The winner was determined by the number of articles created on Polish topics. The first-prize winner of the second contest received 2000 US dollars. That is a lot of money in Vietnam (equivalent to about 9 months of salary at an average income). So yes, Wikipedia Vi was flooded with thousands of badly translated articles from people who wanted to win the competition. Other people are motivated by a personal belief in quantity over quality. The same crat who is facing a removal vote right now was responsible for artificially increasing the "depth" of Wikipedia Vi by using bots to welcome millions of IPs. Some people took notice and blanked it out. In the past, he also received funds from Meta to create contests with monetary prizes. He was accused of embezzlement after giving out the prizes almost a year late.

I agree with your less radical solution. We just need a way to control the influx of badly translated articles. Any content copied from CT must be marked with a tag saying it was copied from CT (even if only a single word is copied).
