Revision refactor
Background
Todo: copy over the relevant core information from User:Brooke Vibber/Compacting the revision table round 2
TL;DR:
- a major refactor of the 'revision' table and some other core tables
- moving some wide string fields and indexes out of the 'revision' table to make it easier to work with (comments, user/IP actors, content models/formats)
- moving some string fields out of the 'revision' table so they can be reused in other places such as logging (comments, user/IP actors)
- prepping schema to allow multiple content objects with distinct roles per revision ("multi-content revisions")
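To make the normalization concrete, here is a minimal sketch using SQLite from Python's standard library (Python for brevity; MediaWiki itself is PHP). The table and column names (comment_hash, actor_user, rev_comment_id, and so on) are assumptions for illustration, not the final MediaWiki schema. Each revision row stores two small integer references instead of wide strings, and identical comments or actors are stored only once.

```python
import sqlite3
import zlib

# Illustrative schema: names are assumptions loosely following the plan above,
# not the final MediaWiki DDL.
SCHEMA = """
CREATE TABLE comment (
    comment_id   INTEGER PRIMARY KEY,
    comment_hash INTEGER NOT NULL,   -- narrow index instead of indexing wide text
    comment_text TEXT NOT NULL
);
CREATE INDEX comment_hash_idx ON comment (comment_hash);
CREATE TABLE actor (
    actor_id   INTEGER PRIMARY KEY,
    actor_user INTEGER UNIQUE,       -- NULL for IP (anonymous) actors
    actor_name TEXT NOT NULL UNIQUE  -- user name or IP address
);
CREATE TABLE revision (
    rev_id         INTEGER PRIMARY KEY,
    rev_page       INTEGER NOT NULL,
    rev_actor      INTEGER NOT NULL REFERENCES actor (actor_id),
    rev_comment_id INTEGER NOT NULL REFERENCES comment (comment_id)
);
"""

def acquire_comment_id(db, text):
    """Return the id of an identical existing comment, inserting one if needed."""
    h = zlib.crc32(text.encode("utf-8"))  # narrows the lookup; text is still compared exactly
    row = db.execute(
        "SELECT comment_id FROM comment WHERE comment_hash = ? AND comment_text = ?",
        (h, text),
    ).fetchone()
    if row:
        return row[0]
    return db.execute(
        "INSERT INTO comment (comment_hash, comment_text) VALUES (?, ?)", (h, text)
    ).lastrowid

def acquire_actor_id(db, name, user_id=None):
    """Return the actor id for a user name or IP, inserting a row on first use."""
    row = db.execute("SELECT actor_id FROM actor WHERE actor_name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return db.execute(
        "INSERT INTO actor (actor_user, actor_name) VALUES (?, ?)", (user_id, name)
    ).lastrowid

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
edits = [("Alice", 1, "typo"), ("192.0.2.7", None, "typo"), ("Alice", 1, "expand lead")]
for rev_id, (name, user_id, summary) in enumerate(edits, start=1):
    db.execute(
        "INSERT INTO revision VALUES (?, ?, ?, ?)",
        (rev_id, 42, acquire_actor_id(db, name, user_id), acquire_comment_id(db, summary)),
    )
```

The two "typo" summaries share one comment row, and the same comment and actor rows could be referenced from logging just as easily as from revision.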
Work plan
Current rough work plan for the revision refactor; this will be updated with phab links:
Finish up schema redesign work:
- update comment table per last week's consensus
- decide on keeping/killing content_format
- decide on keeping/killing content & rev sha1 hashes
- decide on slot role being in slots vs being in content (feels cleaner to keep it in slots, and relatively low cost)
- double-check indexes on actor
- use actor, comment tables for logging and images
- update and clean up stray bits
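One way to picture the slot-role choice is a minimal SQLite sketch in which the role lives on the slots row and content rows carry no role. The column names, the "tt:" address strings, and the second role "notes" are hypothetical. Because the role is not part of the content row, a revision that edits only the main slot can point its other slots at the previous revision's content rows unchanged.

```python
import sqlite3

# Sketch of a multi-content revision layout with the slot role stored on the
# slots row (the option the plan leans towards). All names are illustrative.
SCHEMA = """
CREATE TABLE content (
    content_id      INTEGER PRIMARY KEY,
    content_model   TEXT NOT NULL,
    content_address TEXT NOT NULL   -- where the blob lives (illustrative format)
);
CREATE TABLE slots (
    slot_revision_id INTEGER NOT NULL,
    slot_role        TEXT NOT NULL,
    slot_content_id  INTEGER NOT NULL REFERENCES content (content_id),
    PRIMARY KEY (slot_revision_id, slot_role)
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

def insert_content(model, address):
    return db.execute(
        "INSERT INTO content (content_model, content_address) VALUES (?, ?)",
        (model, address),
    ).lastrowid

def save_revision(rev_id, slots):
    """slots maps role -> content_id; unchanged slots just reuse old content rows."""
    db.executemany(
        "INSERT INTO slots VALUES (?, ?, ?)",
        [(rev_id, role, cid) for role, cid in slots.items()],
    )

main_v1 = insert_content("wikitext", "tt:1")
notes_v1 = insert_content("json", "tt:2")  # "notes" is a hypothetical second role
save_revision(1, {"main": main_v1, "notes": notes_v1})

# Revision 2 edits only the main slot; the notes slot points at the same content row.
main_v2 = insert_content("wikitext", "tt:3")
save_revision(2, {"main": main_v2, "notes": notes_v1})

rows = db.execute(
    "SELECT slot_role, content_model, content_address FROM slots "
    "JOIN content ON slot_content_id = content_id "
    "WHERE slot_revision_id = ? ORDER BY slot_role",
    (2,),
).fetchall()
```

The cost of keeping the role in slots is one extra small column per slot row; the gain is that content rows never need to be duplicated just to change their role.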
Finish up the proof-of-concept SQL updater:
- update the updater patch SQL with the changes above
Start on real updater & transition:
- describe updater in doc in more detail
- create a schema batch-updater class that can update revision, logging, etc. rows in place during transition mode
- split the patch SQL into two pieces (one that adds new tables/fields, one that removes old fields)
- create a proper installer/updater module that uses the batch-updater class for the middle step between the start and finish SQL patches
- this will reduce the divergence between the code paths for small-site/third-party and large-site/Wikimedia conversions
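A minimal sketch of the start/middle/finish flow, assuming SQLite and a single old text column rev_comment being replaced by rev_comment_id (both names hypothetical). The middle phase walks the table in primary-key order in small batches and commits after each one, so it can be resumed and can run alongside live traffic in transition mode. For brevity the comment table here uses a UNIQUE index on the full text; a real schema would index a hash instead.

```python
import sqlite3

BATCH_SIZE = 2  # tiny for the demo; a real run would use much larger batches

def start_patch(db):
    # Phase 1 ("start" SQL): add the new table and column next to the old one.
    db.executescript("""
        CREATE TABLE comment (
            comment_id   INTEGER PRIMARY KEY,
            comment_text TEXT NOT NULL UNIQUE
        );
        ALTER TABLE revision ADD COLUMN rev_comment_id INTEGER;
    """)

def migrate_batch(db, after_id):
    """Phase 2: migrate one batch of rows in place; return the last rev_id seen, or None."""
    rows = db.execute(
        "SELECT rev_id, rev_comment FROM revision WHERE rev_id > ? ORDER BY rev_id LIMIT ?",
        (after_id, BATCH_SIZE),
    ).fetchall()
    for rev_id, text in rows:
        db.execute("INSERT OR IGNORE INTO comment (comment_text) VALUES (?)", (text,))
        (cid,) = db.execute(
            "SELECT comment_id FROM comment WHERE comment_text = ?", (text,)
        ).fetchone()
        db.execute("UPDATE revision SET rev_comment_id = ? WHERE rev_id = ?", (cid, rev_id))
    db.commit()  # keep each batch's transaction short
    return rows[-1][0] if rows else None

def run_updater(db):
    last = 0
    while (last := migrate_batch(db, last)) is not None:
        pass
    # Phase 3 ("finish" SQL) would drop revision.rev_comment once nothing reads it.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_comment TEXT NOT NULL)")
db.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(1, "typo"), (2, "expand lead"), (3, "typo"), (5, "revert")],
)
start_patch(db)
run_updater(db)
```

Keying each batch on the last rev_id seen (rather than using OFFSET) keeps every batch a short index range scan. The same loop could run inside the installer for small sites or as a long-running maintenance job on large ones.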
Globals:
- create a global config var for the transition state (old, transition, new)
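A sketch of how the transition-state switch might be consumed, assuming the three stages listed above and the rule that transition mode writes both schemas but reads the new one with a fallback to the old. The enum, the helper names, and the row keys are hypothetical. The fallback matters because, while the batch updater is still running, some rows have not been migrated yet.

```python
from enum import Enum

class MigrationStage(Enum):
    """Hypothetical values for the global transition-state config var."""
    OLD = "old"                # read and write only the old columns
    TRANSITION = "transition"  # write both; read new, falling back to old
    NEW = "new"                # read and write only the new tables/columns

def writes_old(stage):
    return stage in (MigrationStage.OLD, MigrationStage.TRANSITION)

def writes_new(stage):
    return stage in (MigrationStage.TRANSITION, MigrationStage.NEW)

def read_comment(stage, row):
    """Pick the comment text from a fetched row according to the stage."""
    if stage is MigrationStage.OLD:
        return row["rev_comment"]
    new = row.get("comment_text")  # present when the query joined the comment table
    if new is None and stage is MigrationStage.TRANSITION:
        return row["rev_comment"]  # row not yet migrated by the batch updater
    return new
```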
Updating the Revision class:
- in the constructor, accept initialized data from actor and comment table columns when available
- lazy-load actor_text, comment_text, and content as necessary
- add the new columns/tables to the various internals-exposed APIs, such as the list of columns that must be fetched for a manual Revision lookup (depending on transition switch)
- join and fetch those columns when available (depending on transition switch)
- insert those new columns & tables when available (depending on transition switch)
- start looking at how to build a new, MCR-friendly, future-friendly API for fetching, storing, and querying revisions
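A sketch of two of the Revision tasks above, in Python for brevity: choosing which fields and joins a manual lookup needs for each transition stage, and lazy-loading comment text only when the query did not already join it. The stage strings, column names, and loader callback are assumptions for illustration. LEFT JOIN is used so that unmigrated rows still come back during transition.

```python
def select_fields(stage):
    """Columns and joins a manual revision lookup needs for a given stage (illustrative)."""
    fields = ["rev_id", "rev_page", "rev_timestamp"]
    joins = []
    if stage in ("old", "transition"):
        fields += ["rev_user", "rev_user_text", "rev_comment"]
    if stage in ("transition", "new"):
        fields += ["rev_actor", "comment_text", "actor_name"]
        joins += [
            "LEFT JOIN comment ON comment_id = rev_comment_id",
            "LEFT JOIN actor ON actor_id = rev_actor",
        ]
    return fields, joins

class Revision:
    """Mostly-dumb revision value with lazily loaded comment text (sketch)."""

    def __init__(self, rev_id, comment_id, comment_text=None, comment_loader=None):
        self.rev_id = rev_id
        self._comment_id = comment_id
        self._comment_text = comment_text      # pre-filled when the query joined it
        self._comment_loader = comment_loader  # otherwise called on first access only

    @property
    def comment_text(self):
        if self._comment_text is None and self._comment_loader is not None:
            self._comment_text = self._comment_loader(self._comment_id)
        return self._comment_text

calls = []

def load_comment(comment_id):
    # Stand-in for a single-row query against the comment table.
    calls.append(comment_id)
    return {7: "typo"}[comment_id]

rev = Revision(1, 7, comment_loader=load_comment)
first = rev.comment_text
second = rev.comment_text  # cached: the loader is not called again
```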
Updating the Logging class:
- todo: investigate in more detail what needs fixing
- update to work with actor & comment tables
- with lazy-loading
- depending on transition switch
Updating page deletion:
- todo: poke around in the non-Revision bits of page deletion to handle the new schemas
Updating recent changes:
- put this off for now, use the existing summary table
Other things to prep:
- do we need reversion info in revision? A separate tracking table seems best.
Updating xml import:
- either prep this for MCR or just handle the single content items for now
- update Revision API usage if necessary
Updating xml export:
- either prep this for MCR or just start thinking about it for later
Updating editing:
- can continue to use high-level article edit API for now
- start thinking about this for MCR though
Updating other core:
- audit other internal code that touches revision
- check API modules that expose revision queries, etc., and update them as needed
- Looking at core modules that directly query the 'revision' table:
- ApiQueryRevisionsBase and its 4 subclasses will need significant work.
- ApiQueryContributors will need minor query adjustments to use rev_actor
- ApiQueryRecentChanges might need to drop or replace rcprop=sha1
- ApiQueryUserContributions will need updates for rev_actor and rev_comment, plus some thought on how to handle ucprop=size|sizediff with the loss of rev_len.
- check Special pages that expose revision queries, etc., and update them as needed
- check maintenance scripts that expose revision queries, etc., and update them as needed
- set up a todo list and smash 'em all down…
Updating extensions:
- audit extension repos for direct revision table usage, see how much fun this will be
- set up a todo list and prioritize them
- consider an extension.json compatibility check
Revision API cleanliness thoughts:
- Revision should be a mostly "dumb" object, with services to fetch and store it
- replace various Revision static getters with service that takes a $db
- the new Revision([]) && $rev->insertOn($db) pattern is godawful
- replace it with a store interface that applies modifications to a prior revision (or to an empty starting state)
- replace text-related fetchers & compressors/decompressors with a good interface to Content fetching
- the revision deletion/visibility features lead to odd APIs for fetching metadata that accept an audience and user param
- consider changing these to one interface that requests a view object for a given user-or-public-or-raw, then just use that view object
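A rough sketch of the API shape described above, in Python for brevity: an immutable revision record, a store that builds a new revision by applying slot changes to a parent (or to an empty starting state), and a view object bound to an audience so callers stop passing audience and user parameters to every getter. RevisionRecord, InMemoryRevisionStore, Audience, and view_for are hypothetical names, and the in-memory store stands in for a service that would take a $db.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional

class Audience(Enum):
    PUBLIC = "public"  # hide deleted fields from everyone
    FOR_USER = "user"  # hide them unless the given user may see them
    RAW = "raw"        # no checks; internal use only

@dataclass(frozen=True)
class RevisionRecord:
    rev_id: Optional[int]
    parent_id: Optional[int]
    slots: Dict[str, str]  # role -> content (inline text, for the sketch)
    comment: str
    actor: str
    comment_hidden: bool = False
    actor_hidden: bool = False

@dataclass(frozen=True)
class RevisionView:
    """Read-only view of one revision for one audience."""
    record: RevisionRecord
    can_see_hidden: bool

    @property
    def comment(self):
        return None if self.record.comment_hidden and not self.can_see_hidden else self.record.comment

    @property
    def actor(self):
        return None if self.record.actor_hidden and not self.can_see_hidden else self.record.actor

def view_for(record, audience, user_may_see_hidden=False):
    if audience is Audience.RAW:
        return RevisionView(record, True)
    if audience is Audience.FOR_USER:
        return RevisionView(record, user_may_see_hidden)
    return RevisionView(record, False)

class InMemoryRevisionStore:
    """Stand-in for a store service; assigns ids instead of the caller doing insertOn()."""

    def __init__(self):
        self._rows = {}

    def save(self, parent, slot_changes, comment, actor):
        """Apply slot changes on top of `parent` (None = empty page) and store the result."""
        slots = dict(parent.slots) if parent else {}
        slots.update(slot_changes)
        rev = RevisionRecord(len(self._rows) + 1, parent.rev_id if parent else None,
                             slots, comment, actor)
        self._rows[rev.rev_id] = rev
        return rev

    def get(self, rev_id):
        return self._rows[rev_id]

store = InMemoryRevisionStore()
r1 = store.save(None, {"main": "Hello"}, "create", "Alice")
r2 = store.save(r1, {"main": "Hello, world"}, "expand", "192.0.2.7")
hidden = replace(r2, actor_hidden=True)  # as revision deletion might produce
public = view_for(hidden, Audience.PUBLIC)
raw = view_for(hidden, Audience.RAW)
```

Because records are immutable and the store copies the parent's slots, saving r2 leaves r1 untouched, which is the property the current construct-then-insertOn pattern lacks.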
Content API thoughts:
- keep 'em clean: either do something very simple now that can be extended for the full MCR world, or think it through before exposing a public API
Comment API thoughts:
- if we're going to reuse comments in multiple places and give them optional data params, then encapsulating them sounds wise.
- consider adapting or replacing the machine-readable metadata from logging?
- feed a comment object with its context into the comment-rendering functions
- context is page, target page, actor, potentially other things like target section and logging params :D
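A small sketch of feeding a comment plus its context into rendering, in Python for brevity. Comment, CommentContext, format_comment, and the /wiki/ URL layout are hypothetical; the point is only that the renderer receives one context object instead of a growing list of loose parameters, and new context fields (such as logging params) can be added without changing every call site.

```python
from dataclasses import dataclass
from html import escape
from typing import Any, Dict, Optional
from urllib.parse import quote

@dataclass(frozen=True)
class Comment:
    text: str
    data: Optional[Dict[str, Any]] = None  # optional machine-readable params

@dataclass(frozen=True)
class CommentContext:
    page: str
    actor: str
    target_page: Optional[str] = None
    target_section: Optional[str] = None

def format_comment(comment, ctx):
    """Render a comment to HTML using its context (much simplified)."""
    out = escape(comment.text)
    if ctx.target_section:
        anchor = quote(ctx.target_section.replace(" ", "_"))
        href = f"/wiki/{quote(ctx.page.replace(' ', '_'))}#{anchor}"
        link = f'<a href="{href}">→{escape(ctx.target_section)}</a>'
        out = f"{link} {out}" if out else link
    return out

out = format_comment(
    Comment("fix typo"),
    CommentContext(page="Main Page", actor="Alice", target_section="History"),
)
```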
Actor API thoughts:
- need a consistent way to pass around Actor info refs too maybe?
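If actor info refs were passed around as a small value type, it might look like this minimal sketch. ActorRef and its fields are hypothetical, mirroring the idea of an actor row that covers both registered users and IP editors.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ActorRef:
    """Lightweight, hashable reference to an actor (hypothetical)."""
    actor_id: Optional[int]  # None if not yet inserted into the actor table
    user_id: Optional[int]   # None for IP (anonymous) editors
    name: str                # user name or IP address

    @property
    def is_registered(self):
        return self.user_id is not None

alice = ActorRef(actor_id=3, user_id=1, name="Alice")
anon = ActorRef(actor_id=4, user_id=None, name="192.0.2.7")
```

Being frozen, instances compare and hash by value, so they can be used as dictionary keys or deduplicated in sets when batching lookups.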