Topic on Talk:Requests for comment/Database field for checksum of page text

Hash in revision table or text table?

Krinkle

Right now the proposal title names the revision table, and the committed patch implements the field there. Why, though? In my opinion it makes more sense in the text table (which the introduction paragraph of the proposal also mentions as the target table). It's the hash of the text, not of the revision metadata. There can (and should) be multiple revisions with the same text hash. Right now MediaWiki only reuses a text-table row if a revision is a direct revert of an earlier revision (using the "rollback" feature). If a normal undo takes place, or if there were multiple editors between the vandalism and the revert so the user had to dig back manually and save an old revision, then MediaWiki stores a second copy of the text. Anyway, I just wanted to bring this up. Do we want it in the text table?
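
To illustrate the point about duplicate text rows, here is a minimal sketch of a text table keyed by content hash, in which any revision whose text already exists reuses the stored row, whether it came from rollback, undo or a manual revert. The `TextStore` class, its fields and the choice of SHA-1 are hypothetical and are not MediaWiki code.

```python
import hashlib
from dataclasses import dataclass, field


def text_hash(text: str) -> str:
    """Hex SHA-1 of the UTF-8 revision text (hypothetical choice of hash)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class TextStore:
    """Hypothetical content-addressed text table: one row per distinct text."""
    rows: dict = field(default_factory=dict)     # text_id -> text
    by_hash: dict = field(default_factory=dict)  # hash -> text_id

    def save(self, text: str) -> int:
        h = text_hash(text)
        existing = self.by_hash.get(h)
        # Reuse the row only if the stored text really matches, so a hash
        # collision can never make two different texts share a row.
        if existing is not None and self.rows[existing] == text:
            return existing
        text_id = len(self.rows) + 1
        self.rows[text_id] = text
        self.by_hash.setdefault(h, text_id)
        return text_id
```

With this scheme, saving "Good text", then "Vandalism", then "Good text" again (a manual revert) yields only two stored rows, and the first and third revisions point at the same one.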

Dantman

No. Some of the use cases for this are aimed at tasks that involve looking at a large number of revisions and comparing them for equality. Doing those today means extracting the full text of every one of those revisions. Our 'text table' is really only the default location for text; an external store can be used instead, so IIRC we don't join on the text table. Putting the checksum alongside the text sounds like it would leave us right back where we started, making hundreds of requests to the text storage.
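
The equality-comparison use case can be sketched as follows: with a checksum stored on each revision row, identity reverts in a page history can be found by comparing checksums alone, without fetching any text. The input format (chronological `(rev_id, checksum)` pairs) and the function name are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def find_identity_reverts(
    revisions: Iterable[Tuple[int, str]],
) -> List[Tuple[int, int]]:
    """Return (rev_id, reverted_to_rev_id) pairs.

    `revisions` is assumed to be (rev_id, checksum) in chronological order,
    as could be read from a checksum column in the revision table; no
    revision text is fetched.
    """
    last_seen = {}  # checksum -> most recent rev_id with that checksum
    reverts = []
    previous = None
    for rev_id, checksum in revisions:
        # A checksum seen before, but not on the immediately preceding
        # revision (which would be a null edit), marks an identity revert.
        if checksum in last_seen and checksum != previous:
            reverts.append((rev_id, last_seen[checksum]))
        last_seen[checksum] = rev_id
        previous = checksum
    return reverts
```

For a history whose checksums run `aaa, bbb, aaa, ccc, ddd, aaa`, revision 3 is a revert to revision 1 and revision 6 a revert to revision 3, found from the metadata alone.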

Drdee

I think the revision table makes most sense because that way:

  • it will be easily exposable in the XML dump files
  • it will be easily exposable in the API
  • it will be available on toolserver

Jeblad

I'm using something like this in a number of gadgets. For it to be efficient, you should not have to query anything that implies processing the text, either on the server or on the client; you only want the hash of the text. The hash should, however, be calculated only from the text, not from any other metadata.
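
A minimal sketch of a text-only checksum, assuming SHA-1 rendered in base 36 and zero-padded to 31 characters (an assumed encoding; nothing in this thread fixes it). The hypothetical `revision_checksum` helper reads only the text, so two revisions with different users or timestamps but identical text get the same checksum.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(text: str) -> str:
    """SHA-1 of the UTF-8 text, in base 36, zero-padded to 31 characters."""
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    # 36**31 > 2**160, so any SHA-1 value fits in 31 base-36 digits.
    return "".join(reversed(out)).rjust(31, "0")


def revision_checksum(revision: dict) -> str:
    """Hypothetical helper: only revision["text"] feeds the hash;
    user, timestamp, comment and other metadata are ignored."""
    return base36_sha1(revision["text"])
```

Because metadata never enters the hash, a gadget can compare two revisions by checksum without caring who saved them or when.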

Drdee

My understanding of the current code is that it will be put in the revision table and that it will only be based on the actual text. Drdee 04:43, 15 November 2011 (UTC)
