User:Daniel Kinzler (WMDE)/MCR-PageUpdater/Notes-2018-04

This page is for discussing the architecture of PageUpdater and related classes as implemented in https://gerrit.wikimedia.org/r/c/405015/.

Anomie's Proposal
Anomie made the following proposal in response to a comment by Gerö, which I have slightly adapted to allow easier discussion on this page:

There are several pieces of code that need to exist:


 * (CAN) Validate the ability of a user to edit a page.
 * (EXP) Take minimally-processed user input and turn it into some "page update" data structure. IMO this should include expanding section edits and detecting and resolving edit conflicts.
 * (PUP) Directly create a "page update" data structure from code, which might skip some of the features of the previous bullet.
 * (VAL) Validate a "page update" data structure for things like allowed slots, correct content models for each slot, and so on.
 * (RVR) Turn a "page update" data structure into an unsaved RevisionRecord, which at least probably includes PST.
 * (SVE) Save a RevisionRecord. Or not, if it turns out to be a null edit.
 * (POT) Produce a ParserOutput and possibly other data from a RevisionRecord (saved or unsaved) or "page update" (although that could just do (RVR) then proceed), plus a ParserOptions. This includes whatever merging of per-slot ParserOutputs might be required, and optionally checking/updating ParserCache.
 * (DUP) Schedule update jobs from the data produced in (POT).

(POT) might include flags to indicate whether we want HTML, data for (DUP), or both. Current MediaWiki only has the flag for HTML, and ParserCache only stores POs with both.

The use cases include:


 * ApiStashEdit: Does (EXP), (VAL), (RVR), (POT), and then saves data into memcached.
 * EditPage/ApiEditPage: Does (CAN), (EXP), and (VAL). Then gets the data for (RVR) and (POT) either directly or from ApiStashEdit's stash, and does (SVE) and (DUP).
 * Various backend code: Does (PUP), probably (VAL), (RVR), (SVE), (POT), and (DUP). This includes anything making an internal call to ApiEditPage.
 * Dummy edits: Somehow or other saves a RevisionRecord, then does (POT) and (DUP).
 * Purge: Loads the existing RevisionRecord and does (POT) and (DUP).
 * Page view: Loads the existing RevisionRecord and does (POT).
 * Page preview: Does (EXP), probably (VAL), (RVR), and (POT). The ParserOptions it uses for (POT) needs to use setupFakeRevision with the revision from (RVR).
 * ApiParse: Depending on the options, either loads an existing RevisionRecord or does more or less (EXP), probably (VAL), and (RVR). Then does (POT), possibly with setupFakeRevision.

In the current patch, PageUpdater handles (EXP) partially, (PUP), (RVR) partially, and (SVE). PageMetaDataUpdater handles (VAL) more-or-less, (RVR) partially, (POT) for canonical parser options only, and (DUP). RevisionSlotsUpdate from the parent patch handles another bit of (EXP). Ie772a4a adds RevisionRenderer which handles (POT) more generally. Nothing handles (CAN), and (EXP) is incomplete even after combining all the parts.

If I understand Gergő's proposal, he's having a PageUpdate for (PUP) plus either (POT) or just the data produced by (POT); putting (VAL), (RVR), and (SVE) except for null-edit determination in "PageUpdater"; and (POT) maybe and (DUP) in "PageSecondaryDataUpdater". (CAN), (EXP), and null-edit determination would presumably be in another abstraction layer above this.

After writing all this out, here's another proposal:


 * Have PageUpdate for collecting and holding the data for (EXP) and (PUP). Like Gergő suggested but without the (POT)-related bits.
 * Have some "Rendering" object, but unlike Ie772a4a do not have ParserOutput implement that. Probably it needs only a few methods: getRevision, getParserOptions, getParserOutput, and getParserOutputForSlot( $role ). The "ForSlot" method could be lazy.
 * Have a RevisionRenderer service that does (POT), but unlike Ie772a4a it probably needs only 1 or 2 public methods: renderRevision( $revision, $parserOptions, $generateHtml ) and maybe renderRevisionForPreview( $revision, $parserOptions ).
 * Have a PageUpdater service with several methods
 * PageUpdate resolveUpdate( $pageUpdate ) // the section and edit conflict parts of (EXP)
 * Status checkPermissions( $pageUpdate ) // (CAN), has all the permission and block checks from EditPage
 * Status validateUpdate( $pageUpdate ) // (VAL)
 * RevisionRecord makeRevisionForUpdate( $pageUpdate ) // (RVR)
 * Status saveRevisionIfChanged( $revisionRecord ) // (SVE)
 * Status scheduleSecondaryUpdates( $rendering ) // (DUP)
 * (maybe) Rendering renderRevision( $revisionRecord ) // call-through to RevisionRenderer with canonical $parserOptions
 * Status stashRendering( $pageUpdate, $renering ) // ApiStashEdit
 * Rendering|null unstashRendering( $pageUpdate ) // ApiStashEdit
 * Status makeEdit( $pageUpdate ) // does (VAL), (RVR)+(SVE)+(POT) or unstash+(SVE), and (DUP)

Daniel's Response
> ''EditPage/ApiEditPage: Does (CAN), (EXP), and (VAL). Then gets the data for (RVR) and (POT) either directly or from ApiStashEdit's stash, and does (SVE) and (DUP).''

(RVR) and (POT) and (SVE) and (DUP) are all done in WikiPage, and are also needed by other use cases.

> In the current patch, PageUpdater handles (EXP) partially (PUP), (RVR) partially, and (SVE).

What bit of (EXP) does this handle? And what part of (PUP) and (RVR) does it not handle?

> PageMetaDataUpdater handles (VAL) more-or-less, (RVR) partially, (POT) for canonical parser options only, and (DUP).

''> RevisionSlotsUpdate from the parent patch handles another bit of (EXP). Ie772a4a adds RevisionRenderer which handles (POT) more generally.''

What bit of (EXP) does this handle?

> Nothing handles (CAN), and (EXP) is incomplete even after combining all the parts.

That is by design - in my opinion, that logic belongs in the controller/interactor layer that performs an "edit", as opposed to creating a revision.

> If I understand Gergő's proposal, he's having a PageUpdate for (PUP) plus either (POT) or just the data produced by (POT);

> putting (VAL), (RVR), and (SVE) except for null-edit determination in "PageUpdater";

I went down that path a couple of times in code experiments, trying to introduce a PageStore. It turned out to require a much more extensive refactoring than I thought feasible. A mutable, builder-like PageUpdate may make it easier, but it still means that pretty much all the code needs to be re-written, instead of being moved around.

> and (POT) maybe and (DUP) in "PageSecondaryDataUpdater".

Or "SecondaryPageDataUpdater"? I kind of like that name.

> (CAN), (EXP), and null-edit determination would presumably be in another abstraction layer above this.

I agree that this makes sense, since it properly separates the idea of "making an edit" from "creating a revision".

> ''Have PageUpdate for collecting and holding the data for (EXP) and (PUP). Like Gergő suggested but without the (POT)-related bits.''

Do I understand correctly that PageUpdate would not have application logic and would not know about services, but would only have setters which are used only by the PageUpdater? That's possible, but I'm not sure I see the point. Though I'm generally in favor of the value/service split.

> ''Have some "Rendering" object, but unlike Ie772a4a do not have ParserOutput implement that. Probably it needs only a few methods: getRevision, getParserOptions, getParserOutput, and getParserOutputForSlot( $role ).''

That's the SlotRenderingProvider in my patch.

> Have a RevisionRenderer service that does (POT), but unlike Ie772a4a it probably needs only 1 or 2 public methods: renderRevision( $revision, $parserOptions, $generateHtml ) and maybe renderRevisionForPreview( $revision, $parserOptions ).

That was my original intention, but I found no way to make that work without making things more obscure.

> Have a PageUpdater service with several methods


 * PageUpdate resolveUpdate( $pageUpdate ) // the section and edit conflict parts of (EXP)
 * It seems to me like the input to this method can't be a PageUpdate. You'd need something to represent a partial PageUpdate for just one section of one slot, etc.
 * Status checkPermissions( $pageUpdate ) // (CAN), has all the permission and block checks from EditPage
 * User permission checks don't belong into the storage layer IMHO. The abstraction for "perform an edit" should be separate from the abstraction for "create a revision".
 * Status validateUpdate( $pageUpdate ) // (VAL)
 * "validate" could mean anything... anyway, why should this be public?
 * RevisionRecord makeRevisionForUpdate( $pageUpdate ) // (RVR)
 * Why should this be public?
 * Status saveRevisionIfChanged( $revisionRecord ) // (SVE)
 * If this does not involve (DUP), it should probably be in RevisionStore. Perhaps PageUpdate should just have a isChange method.
 * Status scheduleSecondaryUpdates( $rendering ) // (DUP)
 * Since this is needed in several places where no revision is created, this should be in a separate class (or at least, in a separate interface).
 * (maybe) Rendering renderRevision( $revisionRecord ) // call-through to RevisionRenderer with canonical $parserOptions
 * Should exist, but in a separate service.
 * Status stashRendering( $pageUpdate, $renering ) // ApiStashEdit
 * Rendering|null unstashRendering( $pageUpdate ) // ApiStashEdit
 * An EditStash (perhaps also a local EditCache?) should be a separate service.
 * Status makeEdit( $pageUpdate ) // does (VAL), (RVR)+(SVE)+(POT) or unstash+(SVE), and (DUP)
 * Ideally, this would be the only public method in the Page