Talk:Wikipedia article translation metrics/How to detect translated articles

From mediawiki.org

Bibliography

This page could use a bibliography. --Nemo 10:12, 26 January 2015 (UTC)

Simpler hints

I think it would be best to start with some simpler comparisons of markup and other non-linguistic features. For instance, if an article is created with "ref name" markup, links, image links and/or ISBN links that are all or mostly contained in an older (interlanguage-linked) article, the older article is probably the source. You can calculate the overlap between the two pages for each of those factors, multiply the overlaps together and then find a threshold.
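The overlap-and-multiply idea in this comment could be sketched roughly as follows. This is only an illustration under assumptions: each page's features are taken to be already extracted into sets, link targets are assumed to be normalised to something comparable across languages, and the names `containment` and `product_score` and the sample data are hypothetical. The threshold would still need to be tuned on known translations.

```python
from math import prod

def containment(new_items, old_items):
    """Fraction of the new page's items that also occur in the older page."""
    new_items, old_items = set(new_items), set(old_items)
    if not new_items:
        return None  # feature absent from the new page: no evidence either way
    return len(new_items & old_items) / len(new_items)

def product_score(new_page, old_page):
    """Multiply the per-feature overlaps, skipping features the new page lacks."""
    overlaps = [containment(items, old_page.get(feature, ()))
                for feature, items in new_page.items()]
    overlaps = [o for o in overlaps if o is not None]
    return prod(overlaps) if overlaps else 0.0

# Hypothetical feature sets for a newly created page and an older linked page.
new_page = {
    "ref_names": {"smith2010", "jones2012"},
    "links": {"Paris", "France", "Seine", "Louvre"},
    "images": {"Eiffel.jpg"},
    "isbns": {"9780306406157"},
}
old_page = {
    "ref_names": {"smith2010", "jones2012", "lee2001"},
    "links": {"Paris", "France", "Seine", "Europe"},
    "images": {"Eiffel.jpg", "Map.png"},
    "isbns": {"9780306406157"},
}

score = product_score(new_page, old_page)  # 1.0 * 0.75 * 1.0 * 1.0
print(score)
```

The overlap is measured relative to the new page (how much of it is already in the older page), since the older page may contain material that was never carried over.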

Other, less certain hints are given by the structure of the article, i.e. the number and tree structure of the sections and the amount of text in each of them. --Nemo 12:17, 26 January 2015 (UTC)
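One possible way to compare section structure is sketched below. It assumes each page has already been reduced to a list of (heading level, text length) pairs in document order; that representation and the name `structure_similarity` are assumptions made for the example, not something proposed in this thread.

```python
from difflib import SequenceMatcher

def structure_similarity(sections_a, sections_b):
    """Compare two pages given as lists of (heading_level, text_length) pairs.

    Returns (tree_sim, size_sim), both between 0 and 1.
    """
    levels_a = [level for level, _ in sections_a]
    levels_b = [level for level, _ in sections_b]
    # The sequence of heading levels encodes the section tree in document order.
    tree_sim = SequenceMatcher(None, levels_a, levels_b).ratio()

    def shares(sections):
        # Use each section's share of the page text, so that one language
        # being generally more verbose than the other matters less.
        total = sum(length for _, length in sections) or 1
        return [length / total for _, length in sections]

    # Position-by-position overlap of the two text distributions.
    size_sim = sum(min(x, y) for x, y in zip(shares(sections_a), shares(sections_b)))
    return tree_sim, size_sim

# Hypothetical pages: the second has the same layout with twice the text.
page_a = [(2, 100), (3, 50), (2, 50)]
page_b = [(2, 200), (3, 100), (2, 100)]
print(structure_similarity(page_a, page_b))
```

Because translations often drop or merge sections, both values are better treated as weak evidence alongside the markup overlaps than as a test on their own.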

I agree, I will probably start with non-linguistic features. I'm not sure about multiplying, because then it is an all-or-nothing function (if the editor didn't translate one factor, the final score will be zero). I might just add the percentages of markup found in the target page instead of multiplying.
Another interesting direction is to see whether there is markup that is not in the original content. I think it would serve as a hint that the article was not translated from the other language. Livnetata (talk) 14:59, 28 January 2015 (UTC)
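The two ideas in this comment, an additive score and a check for markup that is absent from the candidate source, might look roughly like this. It is a sketch under assumptions: features are taken to be pre-extracted sets, the percentages are averaged rather than summed so the score stays between 0 and 1, and the names `average_score` and `novel_fraction` and the sample data are hypothetical.

```python
def containment(new_items, old_items):
    """Fraction of the new page's items that also occur in the older page."""
    new_items, old_items = set(new_items), set(old_items)
    if not new_items:
        return None  # feature absent from the new page
    return len(new_items & old_items) / len(new_items)

def average_score(new_page, old_page):
    """Mean of the per-feature overlaps; one zero no longer zeroes the score."""
    overlaps = [containment(items, old_page.get(feature, ()))
                for feature, items in new_page.items()]
    overlaps = [o for o in overlaps if o is not None]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0

def novel_fraction(new_page, old_page):
    """Share of the new page's markup items that the older page does not have."""
    total = sum(len(set(items)) for items in new_page.values())
    novel = sum(len(set(items) - set(old_page.get(feature, ())))
                for feature, items in new_page.items())
    return novel / total if total else 0.0

# Hypothetical case: the references were not carried over at all.
new_page = {"ref_names": {"a", "b"}, "links": {"x", "y"}, "images": {"i.jpg"}}
old_page = {"ref_names": set(), "links": {"x"}, "images": {"i.jpg"}}

print(average_score(new_page, old_page))   # overlaps are 0.0, 0.5 and 1.0
print(novel_fraction(new_page, old_page))  # 3 of the 5 items are new
```

With this data a product of the overlaps would be zero, while the average still reflects the links and images that match. A high `novel_fraction` would count against the page being a translation of this particular source.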