Talk:Wikipedia article translation metrics/How to detect translated articles

Simpler hints
I think it would be best to start with some simpler comparisons based on markup and non-linguistic features. For instance, if an article is created with "ref name" markup, links, image links and/or ISBN links that are all or mostly contained in an older (interlanguage-linked) article, the older article is probably the source. You can calculate the overlap between the two pages for each of those factors, multiply the overlaps together and then find a threshold above which the new page is likely a translation.
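A rough sketch of how such an overlap score could be computed, in Python. Everything here is an assumption made for illustration: the regular expressions only approximate wikitext syntax (quoted ref names, the English "File:"/"Image:" prefixes, plain "ISBN" mentions), the names `PATTERNS`, `extract`, `containment` and `overlap_score` are hypothetical, and wikilinks are left out because their targets would first have to be mapped between languages. Overlap is taken as the share of the new page's items that also appear in the older page, and factors with no items in the new page are skipped so they do not force the product to zero.

```python
import re

# Rough, illustrative patterns; real wikitext has many more variants.
PATTERNS = {
    # Only quoted names, e.g. <ref name="smith2010">
    "ref_names": re.compile(r'<ref\s+name\s*=\s*"([^"]+)"'),
    # Only the English namespace prefixes; localized ones would need adding.
    "images": re.compile(r'\[\[(?:File|Image):([^\]|]+)'),
    "isbns": re.compile(r'ISBN\s+([0-9Xx-]{10,17})'),
}


def extract(wikitext):
    """Return a dict mapping each feature name to the set of items found."""
    features = {}
    for name, pattern in PATTERNS.items():
        items = {m.strip() for m in pattern.findall(wikitext)}
        if name == "isbns":
            # Ignore hyphenation differences between the two pages.
            items = {i.replace("-", "").upper() for i in items}
        features[name] = items
    return features


def containment(new_items, old_items):
    """Share of the new page's items also present in the older page."""
    if not new_items:
        return None  # nothing to compare for this factor
    return len(new_items & old_items) / len(new_items)


def overlap_score(new_text, old_text):
    """Multiply the per-factor overlaps; None if no factor was usable."""
    new, old = extract(new_text), extract(old_text)
    score, used = 1.0, 0
    for name in PATTERNS:
        c = containment(new[name], old[name])
        if c is None:
            continue
        score *= c
        used += 1
    return score if used else None
```

The threshold itself would have to be tuned on pages already known to be translations; this sketch does not suggest a value.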

Other, less certain hints are given by the structure of the article, i.e. the number and tree structure of the sections and the amount of text in each of them. --Nemo 12:17, 26 January 2015 (UTC)
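The structural hint could be sketched along these lines, again in Python and again only as an illustration under stated assumptions. The names `HEADING`, `section_profile` and `structure_similarity` are hypothetical. The sequence of heading levels stands in for the section tree and is compared with `difflib.SequenceMatcher`. The share of text per section is compared only when both pages have the same number of sections, which is a simplification chosen here rather than something proposed above.

```python
import re
from difflib import SequenceMatcher

# Matches wikitext headings such as "== History ==" or "=== Early life ===".
HEADING = re.compile(r'^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$', re.MULTILINE)


def section_profile(wikitext):
    """Return (levels, shares): heading levels in document order
    (the lead counts as level 1) and each section's share of the text."""
    levels = [1]
    lengths = []
    pos = 0
    for m in HEADING.finditer(wikitext):
        lengths.append(len(wikitext[pos:m.start()].strip()))
        levels.append(len(m.group(1)))
        pos = m.end()
    lengths.append(len(wikitext[pos:].strip()))
    total = sum(lengths) or 1  # avoid dividing by zero on empty pages
    return levels, [n / total for n in lengths]


def structure_similarity(text_a, text_b):
    """Return (tree_similarity, text_similarity).

    tree_similarity compares the sequences of heading levels.
    text_similarity compares per-section text shares and is None
    when the two pages have a different number of sections.
    """
    levels_a, shares_a = section_profile(text_a)
    levels_b, shares_b = section_profile(text_b)
    tree = SequenceMatcher(None, levels_a, levels_b).ratio()
    if len(shares_a) != len(shares_b):
        return tree, None
    # 1 minus half the summed absolute difference between the two distributions.
    text = 1 - sum(abs(x - y) for x, y in zip(shares_a, shares_b)) / 2
    return tree, text
```

Because text length varies between languages, relative shares are used instead of absolute character counts; whether that is robust enough would need checking on real article pairs.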