Revision refactor

Background

TL;DR:

major refactor of the 'revision' table and some other core tables
taking some wide string fields & indexes out of the 'revision' table to make it easier to work with (comments, user/IP actors, content models/formats)
moving some of those string fields into their own tables so they are reusable in other places like logging (comments, user/IP actors)
prepping the schema to allow multiple content objects with distinct roles per revision ("multi-content revisions", MCR)
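To make the "reusable in other places" point concrete, here is a minimal sketch using Python's sqlite3 module. It is only an illustration: the table names come from these notes, but every column name and the sample data are assumptions, not the agreed schema.

```python
import sqlite3

# Hypothetical, simplified schema: every column name here is illustrative.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE actor    (actor_id INTEGER PRIMARY KEY, actor_text TEXT UNIQUE);
CREATE TABLE comment  (comment_id INTEGER PRIMARY KEY, comment_text TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_actor INTEGER, rev_comment INTEGER);
CREATE TABLE logging  (log_id INTEGER PRIMARY KEY, log_actor INTEGER, log_comment INTEGER);

INSERT INTO actor    VALUES (1, '192.0.2.7');
INSERT INTO comment  VALUES (1, 'protect page after edit war');
-- one revision row and one log row share the same actor and comment rows
INSERT INTO revision VALUES (10, 1, 1);
INSERT INTO logging  VALUES (20, 1, 1);
""")

# The wide strings are fetched through joins instead of living in 'revision'.
rev_row = db.execute("""
    SELECT actor_text, comment_text FROM revision
    JOIN actor ON actor_id = rev_actor
    JOIN comment ON comment_id = rev_comment
    WHERE rev_id = 10
""").fetchone()

log_row = db.execute("""
    SELECT actor_text, comment_text FROM logging
    JOIN actor ON actor_id = log_actor
    JOIN comment ON comment_id = log_comment
    WHERE log_id = 20
""").fetchone()
```

Both lookups resolve to the same stored strings, so the 'revision' and 'logging' rows themselves only carry small integer keys.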

Current rough work plan for the revision refactor; this will be updated with phab links:

Finish up schema redesign work:

update comment table per last week’s consensus
decide on keeping/killing content_format
decide on keeping/killing content & rev sha1 hashes
decide on slot role being in slots vs being in content (feels cleaner to keep it in slots, and relatively low cost)
double-check indexes on actor
use actor, comment tables for logging and images
update and clean up stray bits
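A rough sketch of the "slot role in slots" option from the list above, again with sqlite3. The column names, role names ('main', 'meta') and blob addresses are all made up for illustration; nothing here is a decided design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Hypothetical columns: the role lives on the slot row, not on the content row.
CREATE TABLE content (content_id INTEGER PRIMARY KEY, content_address TEXT);
CREATE TABLE slots (
    slot_revision INTEGER,
    slot_role     TEXT,
    slot_content  INTEGER,
    PRIMARY KEY (slot_revision, slot_role)
);

INSERT INTO content VALUES (1, 'blob:main-v1'), (2, 'blob:meta-v1'), (3, 'blob:main-v2');
-- revision 100 has two slots; revision 101 changes only 'main'
-- and points its 'meta' slot at the existing content row
INSERT INTO slots VALUES (100, 'main', 1), (100, 'meta', 2),
                         (101, 'main', 3), (101, 'meta', 2);
""")

def slots_for(rev_id):
    """Return {role: content address} for one revision."""
    rows = db.execute(
        """SELECT slot_role, content_address FROM slots
           JOIN content ON content_id = slot_content
           WHERE slot_revision = ? ORDER BY slot_role""",
        (rev_id,),
    )
    return dict(rows.fetchall())
```

In this layout a content row carries no role of its own, so an unchanged slot can simply point at the earlier content row; whether that reuse is actually wanted is one of the open questions above.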

Finish up the proof-of-concept SQL updater:

Start on real updater & transition:

describe updater in doc in more detail
create a schema batch-updater class that can update revision, logging, etc. rows in-place during transition mode
split the patch SQL into two pieces (one that adds new tables/fields, one that removes old fields)
create a proper installer/updater module that uses the batch-updater class for the middle part between the start and finish SQL patches
- this will reduce the separation of code paths for small-site/3rd-party and large-site/wikimedia conversions
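The three steps above (add-new patch, batch-updater in the middle, remove-old patch) could look roughly like this. It is a sketch under stated assumptions: the class name BatchUpdater, the column names, the UNIQUE constraint used for de-duplication, and the use of SQLite are all illustrative, not the real updater.

```python
import sqlite3

# Step 1: patch that only adds new tables/fields.
ADD_NEW = """
CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, comment_text TEXT UNIQUE);
ALTER TABLE revision ADD COLUMN rev_comment_id INTEGER;
"""

# Step 3: patch that removes the old field. A table rebuild stands in for
# DROP COLUMN so the sketch also runs on older SQLite builds.
REMOVE_OLD = """
CREATE TABLE revision_new AS SELECT rev_id, rev_comment_id FROM revision;
DROP TABLE revision;
ALTER TABLE revision_new RENAME TO revision;
"""

class BatchUpdater:
    """Step 2 (hypothetical): migrate rows in place, a small batch at a time."""

    def __init__(self, db, batch_size=2):
        self.db = db
        self.batch_size = batch_size

    def run_batch(self):
        # Pick up rows that have not been migrated yet.
        rows = self.db.execute(
            "SELECT rev_id, rev_comment FROM revision "
            "WHERE rev_comment_id IS NULL ORDER BY rev_id LIMIT ?",
            (self.batch_size,),
        ).fetchall()
        for rev_id, text in rows:
            self.db.execute(
                "INSERT OR IGNORE INTO comment (comment_text) VALUES (?)", (text,))
            (comment_id,) = self.db.execute(
                "SELECT comment_id FROM comment WHERE comment_text = ?", (text,)
            ).fetchone()
            self.db.execute(
                "UPDATE revision SET rev_comment_id = ? WHERE rev_id = ?",
                (comment_id, rev_id))
        self.db.commit()  # commit per batch so the site can keep running
        return len(rows)

    def run(self):
        batches = 0
        while self.run_batch():
            batches += 1
        return batches

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_comment TEXT NOT NULL);
INSERT INTO revision VALUES (1, 'create'), (2, 'fix typo'), (3, 'fix typo'),
                            (4, 'revert'), (5, 'create');
""")
db.executescript(ADD_NEW)
batches = BatchUpdater(db, batch_size=2).run()
db.executescript(REMOVE_OLD)
```

A small site could run all three steps back to back from the installer, while a large site could run the middle step over a long period; both would go through the same class, which is the code-path point made above.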

Globals:

Updating the Revision class:

in constructor, accept initialized data from actor, comment table columns when available
lazy-load actor_text, comment_text, content as necessary
add the new columns/tables to the various APIs that expose internals, such as the list of columns to fetch for a manual Revision lookup (depending on transition switch)
join and fetch those columns when available (depending on transition switch)
insert those new columns & tables when available (depending on transition switch)
start looking at how to build a new, MCR-friendly, future-friendly API for fetching, storing, and querying revisions
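A minimal sketch of the constructor and lazy-loading items above, in Python rather than the real PHP class. FakeDb, the row keys and the use_new_schema flag are hypothetical stand-ins for the database handle, the column names and the transition switch.

```python
class FakeDb:
    """Stand-in for a database handle; counts lookups so laziness is visible."""

    def __init__(self, comments):
        self.comments = comments
        self.lookups = 0

    def fetch_comment_text(self, comment_id):
        self.lookups += 1
        return self.comments[comment_id]


class Revision:
    def __init__(self, row, db, use_new_schema):
        self._row = row
        self._db = db
        self._use_new_schema = use_new_schema  # the "transition switch"
        # Accept pre-joined data when the caller already fetched it.
        self._comment_text = row.get("comment_text")

    def get_comment(self):
        if self._comment_text is None:
            if self._use_new_schema:
                # Lazy-load from the comment table on first access only.
                self._comment_text = self._db.fetch_comment_text(
                    self._row["rev_comment_id"])
            else:
                # Old schema: the string still lives on the revision row.
                self._comment_text = self._row["rev_comment"]
        return self._comment_text


db = FakeDb({7: "fix typo"})
lazy = Revision({"rev_id": 1, "rev_comment_id": 7}, db, use_new_schema=True)
joined = Revision({"rev_id": 2, "rev_comment_id": 7, "comment_text": "fix typo"},
                  db, use_new_schema=True)
legacy = Revision({"rev_id": 3, "rev_comment": "old style"}, db, use_new_schema=False)
```

Calling lazy.get_comment() twice triggers a single lookup, while joined and legacy never touch the comment table at all.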

Updating the Logging class:

todo: investigate in more detail what needs fixing
update to work with actor & comment tables
- with lazy-loading
- depending on transition switch

Updating page deletion:

todo: poke around in the non-Revision bits of page deletion to handle the new schemas

Updating recent changes:

Other things to prep:

do we need reversion info in revision? thinking a separate tracking table is best.

Updating xml import:

Updating xml export:

Updating editing:

Updating other core:

Updating extensions:

audit extension repos for direct revision table usage, see how much fun this will be
set up a todo list and prioritize them
consider an extension.json compatibility check
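For the audit item above, a first pass could be a plain text scan for old column names. This is a rough sketch only: the column list is an assumed sample of fields being moved out of 'revision', the audit() helper is hypothetical, and a text match will produce both false positives and misses.

```python
import re

# Assumed sample of legacy 'revision' columns affected by the refactor.
LEGACY_COLUMNS = (
    "rev_comment", "rev_user_text", "rev_user",
    "rev_content_model", "rev_content_format",
)
PATTERN = re.compile(r"\b(" + "|".join(LEGACY_COLUMNS) + r")\b")

def audit(sources):
    """sources maps file path -> file text; returns {path: sorted column hits}."""
    report = {}
    for path, text in sources.items():
        hits = sorted(set(PATTERN.findall(text)))
        if hits:
            report[path] = hits
    return report

# Made-up extension files, for illustration only.
sample = {
    "ExtA/Hooks.php": "$dbr->select( 'revision', [ 'rev_id', 'rev_user_text', 'rev_comment' ] );",
    "ExtB/Api.php": "$rev = Revision::newFromId( $id );",
}
report = audit(sample)
```

Files that only go through the Revision class do not show up, which is roughly the split wanted for the todo list: direct table users first.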

Revision API cleanliness thoughts:

Revision should be a mostly ‘dumb’ object, with services to fetch and store
replace various Revision static getters with service that takes a $db
the new Revision([]) && $rev->insertOn($db) pattern is godawful
- replace it with a store interface that applies modifications on a prior revision or emptiness
replace text-related fetchers & compressors/decompressors with a good interface to Content fetching
the revision deletion/visibility features lead to odd APIs for fetching metadata that accept an audience and user param
- consider changing these to one interface that requests a view object for a given user-or-public-or-raw, then just use that view object
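One way to picture the store interface and the view object described above, sketched in Python. All names (RevisionRecord, RevisionStore, RevisionView, the "public"/"raw" audience strings) are hypothetical, a dict stands in for the database, and the per-user audience is left out to keep it short.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class RevisionRecord:
    """A 'dumb' value object: no database access of its own."""
    rev_id: Optional[int]
    parent_id: Optional[int]
    text: str
    comment: str
    comment_hidden: bool = False

class RevisionStore:
    """Hypothetical store service; a dict stands in for the database."""

    def __init__(self):
        self._rows = {}

    def save(self, prior, **changes):
        # Apply modifications on top of a prior revision, or on emptiness.
        base = prior if prior is not None else RevisionRecord(None, None, "", "")
        new_id = len(self._rows) + 1
        record = replace(base, **changes, rev_id=new_id, parent_id=base.rev_id)
        self._rows[new_id] = record
        return record

    def view(self, rev_id, audience):
        return RevisionView(self._rows[rev_id], audience)

class RevisionView:
    """Bound to one audience, so getters need no audience/user params."""

    def __init__(self, record, audience):
        if audience not in ("public", "raw"):
            raise ValueError(audience)
        self._record = record
        self._audience = audience

    def comment(self):
        if self._record.comment_hidden and self._audience != "raw":
            return None
        return self._record.comment

store = RevisionStore()
first = store.save(None, text="hello", comment="create page")
second = store.save(first, text="hello world",
                    comment="leaked password", comment_hidden=True)
```

Here store.view(2, "public").comment() gives None while store.view(2, "raw").comment() gives the stored string, and no caller has to pass an audience to each getter.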

Content API thoughts:

keep them clean. either do something very simple now that can be extended for the full MCR world, or think about it before exposing a public API

Comment API thoughts:

if we’re going to reuse comments in multiple places, and give them optional data params, then encapsulating sounds wise.
consider adapting or replacing the machine-readable metadata from logging?
feed a comment object with its context into the comment-rendering functions
- context is page, target page, actor, potentially other things like target section and logging params :D
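A small sketch of feeding a comment object plus its context into a rendering function. The class names, fields and HTML output are assumptions for illustration; the only behavior borrowed from existing practice is that a leading /* Section */ marker turns into a section link, which needs the page from the context.

```python
import re
from dataclasses import dataclass, field
from html import escape
from typing import Optional

@dataclass
class CommentContext:
    page: str
    actor: str
    target_page: Optional[str] = None

@dataclass
class Comment:
    text: str
    data: dict = field(default_factory=dict)  # optional machine-readable params

def render_comment(comment, context):
    """Render to HTML; the section link depends on the context's page."""
    match = re.match(r"/\*\s*(.+?)\s*\*/\s*(.*)", comment.text)
    if match:
        section, rest = match.groups()
        href = f"{context.page}#{section}".replace(" ", "_")
        return f'<a href="{escape(href)}">{escape(section)}</a>: {escape(rest)}'
    return escape(comment.text)

ctx = CommentContext(page="Main Page", actor="Alice")
with_section = render_comment(Comment("/* History */ fix <b>date</b>"), ctx)
plain = render_comment(Comment("typo & cleanup"), ctx)
```

Because the renderer receives the whole object, later additions such as logging params can travel in Comment.data or CommentContext without changing the function signature.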

Actor API thoughts: