Topic on Talk:Requests for comment/Content model storage

Separate table for content meta-data

2
Duesentrieb (talkcontribs)

Instead of adding columns to various tables (page, revision, archive), I suggest to create a separate table that holds meta-data about revision content (at least the model and format, but we will also want things like the role/slot, blob address, and hash there later, for multi-content-revision support, see Phab:T107595). The table would have at least these fields:

CREATE TABLE /*_*/content (
  cnt_revision INT NOT NULL,
  cnt_model INT NOT NULL,
  cnt_format INT NOT NULL,
  PRIMARY KEY (cnt_revision)
) /*$wgDBTableOptions*/;

This table can then be used to acquire the model and format for a given revision by joining cnt_revision against page_current, rev_id, or ar_rev_id.

If we want to support multiple content "slots" per revision (as per Phab:T107595), cnt_revision would no longer be sufficient to identify the desired content. A cnt_role field would be added to identify the role the content plays in the revision (e.g. main, style, categories, meta, blame, etc). cnt_role would reference a content_role table defined in the same way as content_model and content_format. cnt_revision and cnt_role form a unique key. The table would then look like this:

CREATE TABLE /*_*/content (
  cnt_revision INT NOT NULL,
  cnt_role INT NOT NULL,
  cnt_model INT NOT NULL,
  cnt_format INT NOT NULL,
  -- more fields to add for multi-content-revision support:
  -- cnt_address, cnt_hash, cnt_logical_size, cnt_is_primary, etc
  PRIMARY KEY (cnt_revision, cnt_role)
) /*$wgDBTableOptions*/;

CREATE TABLE /*_*/content_role (
  cr_id smallint NOT NULL PRIMARY KEY AUTO_INCREMENT,
  cr_role VARBINARY(32) NOT NULL
) /*$wgDBTableOptions*/;

CREATE INDEX /*i*/cr_role ON /*_*/content_role (cr_role);

When joining against page_current, rev_id, etc., cnt_role will then have to be fixed (e.g. to the "main" role) to allow a unique match per revision.

Legoktm (talkcontribs)

My only concern with splitting to another table is making sure that the tables stay in sync, and if they get out of sync, handling failure gracefully. While it mostly should work properly with our transaction handling, we occasionally have random freak accidents where page_latest isn't updated properly or something, so it's not 100% perfect.Using the same table (revision/archive) would neatly avoid that problem.

Reply to "Separate table for content meta-data"