User:Daniel Kinzler (WMDE)/MCR-PageUpdater

This page describes how functionality related to WikiPage::doEditContent is to be refactored.

Situation on MW master, as of 2018-05-30

 * WikiPage::doEditContent is used to create new revisions of a page.
 * WikiPage::doEditUpdates can be used to force derived data to be updated, but it does not by itself re-parse the page.
 * WikiPage::prepareContentForEdit is used by hook handlers to access PST content and ParserOutput of edits-in-progress.
 * EditPage is responsible for checking permissions, tokens, rate limits, etc. It also handles automatic conflict resolution (3-way-merge) and section edits.

Situation as of If610c68f49

 * PageUpdater::saveRevision is used to create new revisions of a page. It still relies on WikiPage::insertOn and WikiPage::updateRevisionOn for updating the page table. It uses a DerivedPageDataUpdater to manage state while the edit is in progress, and for performing any updates of secondary or cached data.
 * DerivedPageDataUpdater::prepareContent essentially replaces WikiPage::prepareContentForEdit: it constructs PST content and (conceptually) provides parsed output (in reality, ParserOutput is constructed on demand, not immediately).
 * DerivedPageDataUpdater::prepareUpdate and DerivedPageDataUpdater::doUpdates replace WikiPage::doEditUpdates. Resources (especially ParserOutput) generated by prepareContent are re-used when possible.
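
The on-demand construction and re-use of parsed output described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not MediaWiki code: LazyOutput, toy_render and render_count are hypothetical names, and the "parser" is a trivial stand-in.

```python
from typing import Callable, Optional


class LazyOutput:
    """Holds prepared content and renders it only when first asked (hypothetical helper)."""

    def __init__(self, content: str, render: Callable[[str], str]) -> None:
        self.content = content
        self._render = render
        self._output: Optional[str] = None
        self.render_count = 0  # only here to make the laziness observable

    def get_output(self) -> str:
        # Render on first access, then re-use the cached result.
        if self._output is None:
            self._output = self._render(self.content)
            self.render_count += 1
        return self._output


def toy_render(text: str) -> str:
    # Stand-in for a real parser (assumption for this sketch).
    return "<p>" + text + "</p>"


# Analogous to prepareContent: content is prepared, nothing is parsed yet.
prepared = LazyOutput("Hello", toy_render)
# A hook handler asks for output while the edit is in progress.
first = prepared.get_output()
# The later update of derived data re-uses the same output.
second = prepared.get_output()
```

In this sketch the renderer runs once, no matter how many consumers ask for the output between preparing the content and performing the updates.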

Conceptual Break-Down
Conceptually, we want to expose the following interfaces to application logic and extensions:


 * saveRevision(RevisionSlotsUpdate) aka doEditContent to update a page by creating a new revision (or not, in case of a null-edit).
 * we may also want a version of saveRevision that directly takes a RevisionRecord, leaving PST to the caller.
 * saveDummyRevision aka null revision: like saveRevision, but with no (changed) content.
 * edit for user-initiated edits, covering the functionality of EditPage abstracted from all UI code: checking permissions, tokens, rate limits, etc. It also handles automatic conflict resolution (3-way-merge) and section edits.
 * stashEdit for stashing pre-parsed content to be used by edit.
 * preview takes the same input as edit and returns rendered output for a new, unsaved revision.
 * preview could make use of stashed output, and could also write into the stash.
 * renderRevision takes a (saved?) RevisionRecord and returns output for it. Similar to preview. Used for plain views, history views, diff views, etc.
 * purge to re-generate any derived data. Re-parsing is optional.
 * we may want to expose a separate updateSecondaryData method. This would be used e.g. when importing or undeleting the newest revision. On the other hand, we could just trigger a purge in that case.
 * we may want to expose a separate updateParserCache method that doesn't trigger the other updates.
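
The difference between saveRevision with unchanged content (a null-edit) and saveDummyRevision (a null revision) can be sketched as follows. This is a toy in-memory model in Python, not the proposed API: ToyPage, Revision, save_revision and save_dummy_revision are hypothetical names, and slots are reduced to plain strings keyed by role.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Revision:
    rev_id: int
    slots: Dict[str, str]
    comment: str


@dataclass
class ToyPage:
    revisions: List[Revision] = field(default_factory=list)

    def latest(self) -> Optional[Revision]:
        return self.revisions[-1] if self.revisions else None

    def _append(self, slots: Dict[str, str], comment: str) -> Revision:
        rev = Revision(len(self.revisions) + 1, dict(slots), comment)
        self.revisions.append(rev)
        return rev

    def save_revision(self, changed_slots: Dict[str, str], comment: str) -> Optional[Revision]:
        """Create a new revision, or return None if nothing changed (null-edit)."""
        parent = self.latest()
        slots = dict(parent.slots) if parent else {}
        slots.update(changed_slots)
        if parent is not None and slots == parent.slots:
            return None  # null-edit: no new revision is created
        return self._append(slots, comment)

    def save_dummy_revision(self, comment: str) -> Revision:
        """Always create a revision, inheriting all content from the parent."""
        parent = self.latest()
        if parent is None:
            raise ValueError("a dummy revision needs an existing parent revision")
        return self._append(parent.slots, comment)


page = ToyPage()
r1 = page.save_revision({"main": "Hello"}, "create page")
r2 = page.save_revision({"main": "Hello"}, "same content again")
r3 = page.save_dummy_revision("page protected")
```

Here r2 is None because the content did not change, while r3 is a new revision that carries the same slots as r1.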

Internally, we need the following functionality:


 * check user permissions, tokens, limits, etc.
 * construct Content objects from user input, applying section manipulation and conflict resolution in the process.
 * stash pre-computed PST content and rendered output, so they can be re-used when the page is saved.
 * construct an unsaved RevisionRecord from a RevisionSlotsUpdate, applying PST and slot inheritance (and making use of cached data). This component should also provide access to rendered output for the full revision (as well as individual slots). Such output should be created on-demand, and should be cached aggressively.
 * produce rendered output of a revision, by combining the output of individual slots.
 * save revisions in the database.
 * update the page table.
 * write secondary data to the database.
 * invalidate derived artifacts.
 * access PST content and rendered content from callbacks while an edit is in progress.
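
As an illustration of constructing the slots of an unsaved revision from an update, with PST applied to changed slots and unchanged slots inherited from the parent, consider the following sketch. It is Python written for illustration only: SlotsUpdate, toy_pst and build_unsaved_slots are hypothetical names, content is reduced to plain strings, and the transform shown is a stand-in for real pre-save transformation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class SlotsUpdate:
    """Toy stand-in for a slots update: changed content per role, plus removed roles."""
    modified: Dict[str, str] = field(default_factory=dict)
    removed: Set[str] = field(default_factory=set)


def toy_pst(text: str, user: str) -> str:
    # Stand-in for PST (assumption): expand a signature marker, trim trailing whitespace.
    return text.replace("~~~~", user).rstrip()


def build_unsaved_slots(
    parent_slots: Dict[str, str],
    update: SlotsUpdate,
    user: str,
    pst: Callable[[str, str], str] = toy_pst,
) -> Dict[str, str]:
    # Inherit every parent slot that is not explicitly removed.
    slots = {role: text for role, text in parent_slots.items() if role not in update.removed}
    # Apply PST only to slots that carry new content; inherited slots stay untouched.
    for role, text in update.modified.items():
        slots[role] = pst(text, user)
    return slots


parent = {"main": "Old text", "data": "{}"}
update = SlotsUpdate(modified={"main": "New text by ~~~~  "})
new_slots = build_unsaved_slots(parent, update, "Alice")
```

The "data" slot is inherited unchanged from the parent, while the "main" slot holds the transformed new content.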

Further refactoring
The required refactoring is expected to result in the following components being defined:


 * An EditController that provides the edit interface for user-initiated edits. It should check permissions, tokens, rate limits, etc., handle automatic conflict resolution (3-way-merge) and section edits, and then hand control to a PageUpdater for the actual edit.
 * PageUpdater provides CAS (compare-and-swap) semantics that allow the caller to safely create content based on the parent revision. It uses an EditRevisionFactory to construct an unsaved RevisionRecord, along with a RenderedRevision object for later use by DerivedPageDataUpdater. It further contains the logic for saving the new revision using RevisionStore and then updating the page table using a PageStore. Finally, it uses a DerivedPageDataUpdater to update the parser cache, links tables, invalidate affected artifacts, etc.
 * EditRevisionFactory (better name pending) builds an unsaved RevisionRecord from a RevisionSlotsUpdate, along with a RenderedRevision, for use by edit filter extensions and later by DerivedPageDataUpdater. It applies pre-save-transform (PST) to all content and inherits unchanged content from the parent revision. It tries to take PST content and rendered content from the edit stash.
 * DerivedPageDataUpdater takes a (saved) RevisionRecord and a RenderedRevision, and takes care of updating any secondary data and cached artifacts (optionally, recursively). It may re-use or re-render ParserOutput as appropriate.
 * DerivedPageDataUpdater should use the RenderedRevision instance that was created by the EditRevisionFactory.
 * the job of discarding ParserOutput objects that depend on the revision ID may be handled by RenderedRevision. Perhaps it could have a setRevisionId or updateRevision method for this.
 * RenderedRevision wraps a RevisionRecord and provides (lazy, cached) access to ParserOutput for each slot, as well as a combined ParserOutput across all slots. It uses a RevisionRenderer to produce the combined output. The RenderedRevision is bound to a specific user and ParserOptions object.
 * RenderedRevision should not know RevisionRenderer directly to avoid circular dependencies. We can make RevisionRenderer implement a ParserOutputCombiner interface to isolate them.
 * RevisionRenderer acts as a factory for RenderedRevision, and contains logic for combining the ParserOutput of multiple slots, with the help of the appropriate SlotRoleHandlers which it gets from a SlotRoleRegistry.
 * PageStore is a stateless service that can load PageRecords from the page table, and can update the page table when pages are created, updated, or deleted.
 * RevisionStore is a stateless service for loading and storing RevisionRecord objects.
 * EditStash is a stateless service that caches pre-computed PST content and rendered content for a given RevisionSlotsUpdate. The cache is keyed on the hash of the RevisionSlotsUpdate and is bound to a specific user / session (and anything else that may impact PST).
 * SlotRoleHandler defines the behavior of a slot. For now, it defines what HTML represents the slot output, and where it is placed.
 * SlotRoleRegistry is a stateless service that provides access to SlotRoleHandler instances.
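
The CAS semantics intended for PageUpdater can be sketched roughly as follows. This is a single-threaded toy in Python, not the actual design: ToyPageStore, ToyPageUpdater, EditConflict and the method names are hypothetical, and a real implementation would have to make the compare and the swap atomic at the database level.

```python
from typing import Optional


class EditConflict(Exception):
    """Raised when the page changed after the caller looked at the parent revision."""


class ToyPageStore:
    """Toy stand-in for the page table: tracks only the latest revision."""

    def __init__(self) -> None:
        self.latest_rev_id = 0
        self.text = ""


class ToyPageUpdater:
    def __init__(self, store: ToyPageStore) -> None:
        self._store = store
        self._parent_id: Optional[int] = None

    def grab_parent_revision(self) -> int:
        # Remember which revision the caller bases the new content on.
        self._parent_id = self._store.latest_rev_id
        return self._parent_id

    def save_revision(self, text: str) -> int:
        if self._parent_id is None:
            self.grab_parent_revision()
        # Compare: the remembered parent must still be the latest revision.
        if self._store.latest_rev_id != self._parent_id:
            raise EditConflict("parent revision is no longer the latest")
        # Swap: record the new revision as the latest one.
        self._store.latest_rev_id += 1
        self._store.text = text
        return self._store.latest_rev_id


store = ToyPageStore()
a = ToyPageUpdater(store)
b = ToyPageUpdater(store)
a.grab_parent_revision()
b.grab_parent_revision()
a_rev = a.save_revision("from A")
try:
    b.save_revision("from B")
    conflict = False
except EditConflict:
    conflict = True
```

Both updaters start from the same parent, so only the first save succeeds; the second one fails instead of silently overwriting content it never saw.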

We further want to remove any need to call WikiPage::prepareContentForEdit, by instead passing RevisionRecord and/or RenderedRevision to hook handlers / listeners.