Thread:Talk:Requests for comment/Database field for checksum of page text/Similarity measure/reply (6)

Sorry, but Lucene and Mahout aren't about this at all. Pointing to the XML dump is just saying "we don't want to consider making any analysis available from any Wikimedia site". At least take the time to read the bug report; just reiterating the same question is a waste of time.