User:Daniel Kinzler (WMDE)/MCR-PageUpdater/Notes-2018-04

This page is for discussing the architecture of PageUpdater and related classes as implemented in https://gerrit.wikimedia.org/r/c/405015/.

Anomie's Proposal
Anomie made the following proposal in response to a comment by Gerö, which I have slightly adapted to allow easier discussion on this page:

There are several pieces of code that need to exist:


 * (CAN) Validate the ability of a user to edit a page.
 * (EXP) Take minimally-processed user input and turn it into some "page update" data structure. IMO this should include expanding section edits and detecting and resolving edit conflicts.
 * (PUP) Directly create a "page update" data structure from code, which might skip some of the features of the previous bullet.
 * (VAL) Validate a "page update" data structure for things like allowed slots, correct content models for each slot, and so on.
 * (RVR) Turn a "page update" data structure into an unsaved RevisionRecord, which at least probably includes PST.
 * (SVE) Save a RevisionRecord. Or not, if it turns out to be a null edit.
 * (POT) Produce a ParserOutput and possibly other data from a RevisionRecord (saved or unsaved) or "page update" (although that could just do (RVR) then proceed), plus a ParserOptions. This includes whatever merging of per-slot ParserOutputs might be required, and optionally checking/updating ParserCache.
 * (DUP) Schedule update jobs from the data produced in (POT).

(POT) might include flags to indicate whether we want HTML, data for (DUP), or both. Current MediaWiki only has the flag for HTML, and ParserCache only stores POs with both.

The use cases include:


 * ApiStashEdit: Does (EXP), (VAL), (RVR), (POT), and then saves data into memcached.
 * EditPage/ApiEditPage: Does (CAN), (EXP), and (VAL). Then gets the data for (RVR) and (POT) either directly or from ApiStashEdit's stash, and does (SVE) and (DUP).
 * Various backend code: Does (PUP), probably (VAL), (RVR), (SVE), (POT), and (DUP). This includes anything making an internal call to ApiEditPage.
 * Dummy edits: Somehow or other saves a RevisionRecord, then does (POT) and (DUP).
 * Purge: Loads the existing RevisionRecord and does (POT) and (DUP).
 * Page view: Loads the existing RevisionRecord and does (POT).
 * Page preview: Does (EXP), probably (VAL), (RVR), and (POT). The ParserOptions it uses for (POT) needs to use setupFakeRevision with the revision from (RVR).
 * ApiParse: Depending on the options, either loads an existing RevisionRecord or does more or less (EXP), probably (VAL), and (RVR). Then does (POT), possibly with setupFakeRevision.

In the current patch, PageUpdater handles (EXP) partially, (PUP), (RVR) partially, and (SVE). PageMetaDataUpdater handles (VAL) more-or-less, (RVR) partially, (POT) for canonical parser options only, and (DUP). RevisionSlotsUpdate from the parent patch handles another bit of (EXP). Ie772a4a adds RevisionRenderer which handles (POT) more generally. Nothing handles (CAN), and (EXP) is incomplete even after combining all the parts.

If I understand Gergő's proposal, he's having a PageUpdate for (PUP) plus either (POT) or just the data produced by (POT); putting (VAL), (RVR), and (SVE) except for null-edit determination in "PageUpdater"; and (POT) maybe and (DUP) in "PageSecondaryDataUpdater". (CAN), (EXP), and null-edit determination would presumably be in another abstraction layer above this.

After writing all this out, here's another proposal:


 * Have PageUpdate for collecting and holding the data for (EXP) and (PUP). Like Gergő suggested but without the (POT)-related bits.
 * Have some "Rendering" object, but unlike Ie772a4a do not have ParserOutput implement that. Probably it needs only a few methods: getRevision, getParserOptions, getParserOutput, and getParserOutputForSlot( $role ). The "ForSlot" method could be lazy.
 * Have a RevisionRenderer service that does (POT), but unlike Ie772a4a it probably needs only 1 or 2 public methods: renderRevision( $revision, $parserOptions, $generateHtml ) and maybe renderRevisionForPreview( $revision, $parserOptions ).
 * Have a PageUpdater service with several methods
 * PageUpdate resolveUpdate( $pageUpdate ) // the section and edit conflict parts of (EXP)
 * Status checkPermissions( $pageUpdate ) // (CAN), has all the permission and block checks from EditPage
 * Status validateUpdate( $pageUpdate ) // (VAL)
 * RevisionRecord makeRevisionForUpdate( $pageUpdate ) // (RVR)
 * Status saveRevisionIfChanged( $revisionRecord ) // (SVE)
 * Status scheduleSecondaryUpdates( $rendering ) // (DUP)
 * (maybe) Rendering renderRevision( $revisionRecord ) // call-through to RevisionRenderer with canonical $parserOptions
 * Status stashRendering( $pageUpdate, $renering ) // ApiStashEdit
 * Rendering|null unstashRendering( $pageUpdate ) // ApiStashEdit
 * Status makeEdit( $pageUpdate ) // does (VAL), (RVR)+(SVE)+(POT) or unstash+(SVE), and (DUP)