Thread:Talk:Requests for comment/Database field for checksum of page text/Similarity measure/reply

Dear Jeblad,

I think a quicker way of getting this would be to create a Lucene index and maybe use Mahout to find similar documents. Given the size of the Wikipedia corpus, I am not sure we can do this in (almost) real time. What kind of use cases do you have in mind?

Drdee 04:51, 15 November 2011 (UTC)