Topic on User talk:Brooke Vibber/Compacting the revision table round 2

Proposed comment table

MZMcBride (talkcontribs)

Is comment.comment_id intended to match to revision.rev_id? If so, can this be made explicit in the proposal? If not, how are comments intended to be matched to revisions?

It looks like you're currently proposing keeping comment.comment_text as varbinary(767). (I'm not sure Wikimedia wikis are using this larger value on bigger, older wikis.) Several people have requested a blob or a pointer to a blob instead so that they could write Git-like commit messages for edits. I guess we're not planning to support that? Related task: phabricator:T6715.

Would this comment table eventually take over for logging.log_comment?

Brooke Vibber (talkcontribs)

It's a separate id, allowing both for reuse of comment rows and use of the comment table by other things like uploads and logs as well, though I haven't rolled that into the update plans yet.

We definitely should update it to a blob so that we can enforce and change the size limit on the PHP side.

Halfak (WMF) (talkcontribs)

When would we re-use comment rows? It seems that there's a 1:1 relationship between revisions and comments.

Anomie (talkcontribs)

Leaving the comment empty is common. On talk pages, variations on "reply" as the comment are pretty common too, so multiple replies in the same section may end up using the same comment. Bots and scripts often use the same comment for many similar edits.

Roan Kattouw (WMF) (talkcontribs)

But how would we detect that a comment was reused? Their IDs are auto-increment IDs, not hashes, so we couldn't efficiently find out if the comment that the user just submitted is already in the comment table somewhere. When multiple revisions / log entries / etc with the same comment are created at the same time, we would know, but when does that ever happen? Maybe for null revisions associated with log entries (like page moves and protections), I suppose?

Anomie (talkcontribs)

Presumably if we decided to reuse comments we'd add an appropriate index (possibly on a hash column) to the comment table so we could do a query for it.
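
As a rough illustration of the lookup described above, here is a minimal sketch using Python's built-in sqlite3 module rather than MediaWiki's actual database layer. The `comment_hash` column, the index name, the choice of SHA-1, and the helper `get_or_create_comment_id` are all assumptions made for this example; none of them is part of the proposal.

```python
import hashlib
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical comment table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        'CREATE TABLE "comment" ('
        " comment_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " comment_hash BLOB NOT NULL,"  # assumed column, not in the proposal
        " comment_text BLOB NOT NULL)"
    )
    # Index on the hash so a reuse lookup does not scan the whole table.
    db.execute('CREATE INDEX comment_hash_idx ON "comment" (comment_hash)')
    return db


def get_or_create_comment_id(db, text):
    """Return the id of an existing row with this text, or insert a new row."""
    data = text.encode("utf-8")
    digest = hashlib.sha1(data).digest()
    # Compare the full text as well, to guard against hash collisions.
    row = db.execute(
        'SELECT comment_id FROM "comment" '
        "WHERE comment_hash = ? AND comment_text = ?",
        (digest, data),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = db.execute(
        'INSERT INTO "comment" (comment_hash, comment_text) VALUES (?, ?)',
        (digest, data),
    )
    return cur.lastrowid


db = make_db()
first = get_or_create_comment_id(db, "reply")
second = get_or_create_comment_id(db, "reply")  # reuses the existing row
empty = get_or_create_comment_id(db, "")        # a different comment
```

With this shape, the auto-increment id stays the primary key and the hash only serves as a lookup aid, so repeated comments such as "reply" or the empty comment map to a single row.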

Anomie (talkcontribs)

revision.rev_comment_id points to comment.comment_id.
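
To make the relationship concrete, here is a small sketch of resolving a revision's comment through that pointer, again using Python's sqlite3 module as a stand-in. Only `rev_id`, `rev_comment_id`, `comment_id`, and `comment_text` come from the discussion; the column types, the sample rows, and the helper `comment_for_revision` are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE "comment" (
        comment_id   INTEGER PRIMARY KEY,
        comment_text BLOB NOT NULL
    );
    CREATE TABLE revision (
        rev_id         INTEGER PRIMARY KEY,
        rev_comment_id INTEGER NOT NULL REFERENCES "comment" (comment_id)
    );
    """
)

# Sample data (made up): two revisions share one comment row.
db.execute('INSERT INTO "comment" VALUES (1, ?)', (b"reply",))
db.executemany("INSERT INTO revision VALUES (?, ?)", [(101, 1), (102, 1)])


def comment_for_revision(db, rev_id):
    """Follow revision.rev_comment_id to comment.comment_id and return the text."""
    row = db.execute(
        "SELECT c.comment_text FROM revision r "
        'JOIN "comment" c ON c.comment_id = r.rev_comment_id '
        "WHERE r.rev_id = ?",
        (rev_id,),
    ).fetchone()
    return None if row is None else row[0].decode("utf-8")
```

Because the link lives on the revision side, several revisions can point at the same comment row, and other tables could carry a similar pointer column of their own.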
