
Topic on User talk:Brooke Vibber/Compacting the revision table round 2

On hashes and comments

Jynus (talkcontribs)

I have argued against using hashes as identifiers (not against hashes in general). Hashes are not arbitrary ids: they are completely coupled to the content, and they have many disadvantages compared to auto-increment values when used as internal identifiers. I have commented extensively on that here: https://phabricator.wikimedia.org/T158724#3063877

That doesn't mean we shouldn't use hashes. We can have a comment table with (auto_increment id PK, hash, blob) and an index on the hash for quick lookup (a "fake hash index"). Then we do `SELECT * FROM comments WHERE hash = hash($text) AND blob = $text`, which keeps the same efficiency (it still uses the index on hash) while comparing the full blob only for rows whose hash matches, normally just one. Fast, and collision-resistant.
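To make the lookup concrete, here is a minimal sketch of the get-or-create pattern described above. It is only an illustration under assumptions: it uses Python's built-in `sqlite3` with an in-memory database instead of MySQL/InnoDB, SHA-1 as the hash, and hypothetical names (`comment`, `comment_hash`, `comment_text`, `get_or_create_comment_id`) that are not part of any existing schema.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory stand-in for the proposed comment table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comment ("
        " comment_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # arbitrary internal id
        " comment_hash BLOB NOT NULL,"                    # derived from the text
        " comment_text BLOB NOT NULL)"                    # the full comment
    )
    # Plain, non-unique index: colliding hashes are allowed to coexist.
    conn.execute("CREATE INDEX comment_hash_idx ON comment (comment_hash)")
    return conn


def text_hash(text):
    """Hash used only to narrow the search; SHA-1 is an assumption."""
    return hashlib.sha1(text.encode("utf-8")).digest()


def get_or_create_comment_id(conn, text, hash_fn=text_hash):
    """Return the id of an existing identical comment, or insert a new row."""
    h = hash_fn(text)
    raw = text.encode("utf-8")
    # The index on comment_hash finds candidate rows; the full-text comparison
    # then rejects any row that merely shares the hash.
    row = conn.execute(
        "SELECT comment_id FROM comment"
        " WHERE comment_hash = ? AND comment_text = ?",
        (h, raw),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO comment (comment_hash, comment_text) VALUES (?, ?)",
        (h, raw),
    )
    return cur.lastrowid
```

Other tables would then reference `comment_id`, never the hash, so the hash function could be changed later without touching those references. The optional `hash_fn` parameter exists only so a deliberately bad hash can be passed in to check that colliding texts still get separate rows.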

BTW, we can create this table ASAP, and it would get a really good compression ratio with InnoDB.
