Thread:Talk:Requests for comment/Database field for checksum of page text/Similarity measure/reply (7)

I am trying to understand the use case. You yourself state that "Not sure how useful it is to start to describe use cases, it will only be a few I know about…", so I understand what you want to do, but I don't understand why you want to do it. As an alternative, I am suggesting an approach that uses Lucene, Mahout and the XML dump files. The code for the checksums (bug 21860) has been checked in and should be part of MW 1.19. If you are interested in the real contributions of editors, have a look at a project I have been working on called DiffDB (https://github.com/whym/diffindexer). Best, Diederik Drdee 21:53, 16 November 2011 (UTC)