Thread:Talk:Requests for comment/Database field for checksum of page text/Similarity measure/reply (2)

Not sure if a Lucene index can be used for this at all; it's about measuring similarity between revisions, that is, how much two versions have changed. Right now every project that needs this kind of data downloads the full revision text in order to calculate the fingerprints, which is a very dumb approach as it ties up the server for a long time. It is better to calculate this on the server and transfer only the fingerprints. Calculating a fingerprint for a revision is comparable in cost to calculating a complex digest.
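
To make the idea concrete, here is a minimal sketch of what such a fingerprint could look like. It assumes a MinHash signature over word shingles, which is only one possible choice and not something settled in this discussion; the function names `shingles`, `fingerprint` and `similarity` and the parameters `k` and `num_hashes` are hypothetical. The point is that the server would store or return only the short signature, and a client could compare two revisions without ever fetching their text.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a revision text."""
    words = text.split()
    if len(words) < k:
        # Text shorter than one shingle: use it whole (or nothing if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, num_hashes=64, k=3):
    """Compute a MinHash signature (assumed fingerprint scheme, see lead-in).

    Each position holds the smallest salted 64-bit hash over all shingles.
    """
    sh = shingles(text, k)
    if not sh:
        return [0] * num_hashes
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # one salted hash function per position
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in sh
        ))
    return signature


def similarity(fp_a, fp_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(a == b for a, b in zip(fp_a, fp_b))
    return matches / len(fp_a)


if __name__ == "__main__":
    old = "the quick brown fox jumps over the lazy dog near the river bank"
    new = "the quick brown fox jumps over the sleepy dog near the river bank"
    print(similarity(fingerprint(old), fingerprint(new)))
```

With these assumed defaults a signature is 64 integers of 8 bytes each, so about 512 bytes per revision regardless of how long the revision is. Computing it takes `num_hashes` hash passes over the shingles, which is heavier than a single checksum but the same kind of work.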