User:Daniel Kinzler (WMDE)/MCR-PageUpdater

From mediawiki.org

This page describes how functionality related to WikiPage::doEditContent is to be refactored.

Pre-MCR situation

  • WikiPage::doEditContent is used to create new revisions of a page.
  • WikiPage::doEditUpdates can be used to force derived data to be updated, but it does not by itself re-parse the page.
  • WikiPage::prepareContentForEdit is used by hook handlers to access PST content and ParserOutput of edits-in-progress.
  • EditPage is responsible for checking permissions, tokens, rate limits, etc. It also handles automatic conflict resolution (3-way-merge) and section edits.

Situation on master, per If610c68f49

  • PageUpdater::saveRevision() is used to create new revisions of a page. It still relies on WikiPage::insertOn and WikiPage::updateRevisionOn for updating the page table. It uses a DerivedPageDataUpdater to manage state while the edit is in progress, and for performing any updates of secondary or cached data.
  • DerivedPageDataUpdater::prepareContent() essentially replaces WikiPage::prepareContentForEdit(): it constructs PST content and (conceptually) provides parsed output (in reality, ParserOutput is constructed on demand, not immediately).
  • DerivedPageDataUpdater::prepareUpdate() and DerivedPageDataUpdater::doUpdates() replace WikiPage::doEditUpdates(). Resources (especially ParserOutput) generated by prepareContent() are re-used when possible.
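
The life cycle described above (prepare content, build output on demand, then run updates while re-using that output) can be illustrated with a small model. This is a minimal sketch in Python, used only as neutral notation; the class `DerivedDataUpdaterSketch`, its method names, and the string-based "parsing" are hypothetical stand-ins and not the actual DerivedPageDataUpdater API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DerivedDataUpdaterSketch:
    """Hypothetical model of the prepare/update life cycle (not the real API)."""

    parse_count: int = 0
    updates_done: List[str] = field(default_factory=list)
    _pst_text: Optional[str] = None
    _output: Optional[str] = None
    _revision_id: Optional[int] = None

    def prepare_content(self, raw_text: str) -> None:
        # Stand-in for PST: this sketch only strips trailing whitespace.
        self._pst_text = raw_text.rstrip()
        # Rendered output is deliberately NOT built here; it is lazy.
        self._output = None

    def get_output(self) -> str:
        if self._pst_text is None:
            raise RuntimeError("prepare_content() must be called first")
        if self._output is None:
            # "Parsing" happens on first access only.
            self.parse_count += 1
            self._output = f"<p>{self._pst_text}</p>"
        return self._output

    def prepare_update(self, revision_id: int) -> None:
        # Called once the revision has been saved and has an ID.
        self._revision_id = revision_id

    def do_updates(self) -> None:
        if self._revision_id is None:
            raise RuntimeError("prepare_update() must be called first")
        # Re-uses the output from get_output() if it already exists.
        self.get_output()
        self.updates_done.append(f"links:{self._revision_id}")
        self.updates_done.append(f"parser_cache:{self._revision_id}")
```

In this model, a hook that calls `get_output()` before the save pays for the parse once, and `do_updates()` then re-uses the same result instead of parsing again.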

Conceptual Break-Down

Edit Data Flow

Conceptually, we want to expose the following interfaces to application logic and extensions:

  • saveRevision(RevisionSlotsUpdate) aka doEditContent() to update a page by creating a new revision (or not, in case of a null-edit).
    • we may also want a version of saveRevision that directly takes a RevisionRecord, leaving PST to the caller.
  • saveDummyRevision() aka null revision: like saveRevision(), but with no (changed) content.
  • edit() for user-initiated edits, covering the functionality of EditPage abstracted from all UI code: checking permissions, tokens, rate limits, etc. It also handles automatic conflict resolution (3-way-merge) and section edits.
  • stashEdit() for stashing pre-parsed content to be used by edit().
  • preview() takes the same input as edit() and returns rendered output for a new, unsaved revision.
    • Could make use of stashed output. Could also write into stash.
  • renderRevision() takes a (saved or unsaved) RevisionRecord and returns output for it. Similar to preview(). Used for plain views, history views, diff views, etc.
  • purge() to re-generate any derived data. Re-parsing is optional.
    • we may want to expose a separate updateSecondaryData() method. This would be used e.g. when importing or undeleting the newest revision. On the other hand, we could just trigger a purge in that case.
    • we may want to expose a separate updateParserCache() method that doesn't trigger the other updates.
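
The difference between saveRevision(), a null-edit, and saveDummyRevision() can be made concrete with a toy model. The following is a minimal sketch under stated assumptions: `PageSketch`, `Revision`, and `slots_hash` are hypothetical names, slot content is modelled as plain strings, and a null-edit is detected by comparing a hash of the resulting slots against the parent revision.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    rev_id: int
    slots: Dict[str, str]


def slots_hash(slots: Dict[str, str]) -> str:
    """Hash all slots in a stable (role-sorted) order."""
    h = hashlib.sha1()
    for role in sorted(slots):
        h.update(role.encode("utf-8") + b"\0")
        h.update(slots[role].encode("utf-8") + b"\0")
    return h.hexdigest()


@dataclass
class PageSketch:
    revisions: List[Revision] = field(default_factory=list)

    def latest(self) -> Optional[Revision]:
        return self.revisions[-1] if self.revisions else None

    def save_revision(self, changed_slots: Dict[str, str]) -> Optional[Revision]:
        """Create a new revision, or return None for a null-edit."""
        parent = self.latest()
        merged = dict(parent.slots) if parent else {}
        merged.update(changed_slots)
        if parent is not None and slots_hash(merged) == slots_hash(parent.slots):
            return None  # null-edit: nothing changed, no new revision
        rev = Revision(len(self.revisions) + 1, merged)
        self.revisions.append(rev)
        return rev

    def save_dummy_revision(self) -> Revision:
        """Create a new revision that carries the parent's content unchanged."""
        parent = self.latest()
        if parent is None:
            raise ValueError("cannot create a dummy revision on an empty page")
        rev = Revision(len(self.revisions) + 1, dict(parent.slots))
        self.revisions.append(rev)
        return rev
```

Saving the same content twice leaves the history untouched in this model, while `save_dummy_revision()` always adds a history entry even though the content is identical.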

Internally, we need the following functionality:

  • [Permission Checks] check user permissions, tokens, limits, etc
  • [Sections, Edit Conflicts] construct Content objects from user input, applying section manipulation and conflict resolution in the process.
  • [Save to Stash] stash pre-computed PST and rendered output, so it can be re-used when the page is saved.
  • [PST, make rev] construct an unsaved RevisionRecord from RevisionSlotsUpdate, while applying PST and slot inheritance (and making use of cached data). This component should also provide access to rendered output for the full revision (as well as individual slots). Such output should be created on-demand, and should be cached aggressively.
  • [Render Revision] produce rendered output of a revision, by combining the output of individual slots.
  • [Save Revision] save revisions in the database.
  • [Save Revision] update the page table.
  • [Secondary Updates] write secondary data to the database.
  • [Secondary Updates] invalidate derived artifacts.
  • [Edit Filter] access PST content and rendered content from callbacks while an edit is in progress.
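
The "[PST, make rev]" step, which combines slot inheritance with PST, can be sketched as follows. This is an illustration only: `SlotsUpdateSketch` is a hypothetical simplification of RevisionSlotsUpdate (modelled here as a set of modified slots plus a set of removed roles, which is an assumption), and `pre_save_transform` is a trivial stand-in for the real transform.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SlotsUpdateSketch:
    """Hypothetical simplification of RevisionSlotsUpdate."""

    modified: Dict[str, str] = field(default_factory=dict)
    removed: Set[str] = field(default_factory=set)


def pre_save_transform(text: str) -> str:
    # Stand-in for PST: normalise line endings, strip trailing whitespace.
    return text.replace("\r\n", "\n").rstrip()


def make_unsaved_slots(
    parent_slots: Dict[str, str], update: SlotsUpdateSketch
) -> Dict[str, str]:
    """Build the slots of a new, unsaved revision."""
    # Inherit every parent slot that is not explicitly removed.
    slots = {
        role: text
        for role, text in parent_slots.items()
        if role not in update.removed
    }
    # Modified slots get PST applied; inherited slots are taken as-is,
    # since they were already transformed when they were first saved.
    for role, text in update.modified.items():
        slots[role] = pre_save_transform(text)
    return slots
```

For example, updating only the main slot of a two-slot page yields a revision whose second slot is carried over from the parent untouched.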

Further refactoring

The required refactoring is expected to result in the following components:

  • An EditController that provides the edit() interface for user-initiated edits. It should check permissions, tokens, rate limits, etc., handle automatic conflict resolution (3-way-merge) and section edits, and then hand control to a PageUpdater for the actual edit. EditController also offers a preview() method and a stashEdit() method, which re-use the code for section editing.
  • PageUpdater provides CAS semantics that allow the caller to safely create content based on the parent revision. It uses an EditRevisionFactory to construct an unsaved RevisionRecord, and then creates a RenderedRevision object for later use by DerivedPageDataUpdater by calling RevisionRenderer. It calls any edit filter and similar hooks, and it further contains the logic for saving the new revision using RevisionStore and then updating the page table using a PageStore. Finally, it uses a DerivedPageDataUpdater to update the parser cache, links tables, invalidate affected artifacts, etc. (this is equivalent to purging). Dummy revisions (and also null edits) can be handled as a special case of this.
  • [PST, make rev] EditRevisionFactory (better name pending) builds an unsaved RevisionRecord from a RevisionSlotsUpdate. It applies pre-save transform (PST) on all content, and inherits unchanged content from the parent revision. It tries to take PST content and rendered content from the edit stash.
  • [Secondary Updates] DerivedPageDataUpdater takes a (saved) RevisionRecord and a RenderedRevision, and takes care of scheduling updates of any secondary data and cached artifacts (optionally, recursively). It may re-use or re-render ParserOutput as appropriate, using a RevisionRenderer.
    • DerivedPageDataUpdater should use the RenderedRevision instance that was created by the EditRevisionFactory.
    • the job of discarding ParserOutput objects that depend on the revision ID may be handled by RenderedRevision. Perhaps it could have a setRevisionId() or updateRevision() method for this.
    • DerivedPageDataUpdater should also take care of updating the ParserCache, and of purging similar caches associated with page content like the MessageCache or ModuleCache.
    • TBD: DerivedPageDataUpdater should also take care of site stats; for this purpose, it needs access to isRedirect() and isCountable() methods for the new content, and also for the old content! This information could be provided by RenderedRevision for the new content. For the old content a RenderedRevision could also be used (taking ParserOutput from ParserCache, or re-constructing it), but the relevant information could be taken from the database tables (page, redirect, pagelinks, etc.) as well. Perhaps DerivedPageDataUpdater could construct a PageStatusUpdate object representing this.
  • [Render Revision][Edit Filter] RenderedRevision wraps a RevisionRecord and provides (lazy, cached) access to ParserOutput for each slot, as well as a combined ParserOutput across all slots. It uses a RevisionRenderer to produce the combined output. The RenderedRevision is bound to a specific user and ParserOptions object.
    • RenderedRevision should not know RevisionRenderer directly to avoid circular dependencies. One option is to let RevisionRenderer provide a callback to RenderedRevision.
  • [Render Revision] RevisionRenderer acts as a factory for RenderedRevision, and contains logic for combining the ParserOutput of multiple slots, with the help of the appropriate SlotRoleHandlers which it gets from a SlotRoleRegistry.
  • [Save Revision] PageStore is a stateless service that can load PageRecords from the page table, and can update the page table when pages are created, updated, or deleted. PageRecords are immutable value objects representing entries in the page table.
  • [Save Revision] RevisionStore is a stateless service for loading and storing RevisionRecord objects.
  • [Save to Stash] EditStash is a stateless service that caches pre-computed PST content and rendered content for a given RevisionSlotsUpdate. The cache is keyed on the hash of the RevisionSlotsUpdate and is bound to a specific user / session (and anything else that may impact PST).
    • TBD: The information stashed by EditStash could be a (possibly incomplete) RenderedRevision. An in-process cache implementing the same interface could then be used to ensure re-use of ParserOutput during edits, instead of the in-place caching approach that DerivedPageDataUpdater is currently using. This would remove the need to pass around RenderedRevision objects everywhere they could possibly be needed.
  • [Render Revision] SlotRoleHandler defines the behavior of a slot. For now, it defines what HTML represents the slot output, and where it is placed.
  • [Render Revision] SlotRoleRegistry is a stateless service that provides access to SlotRoleHandler instances.
  • ViewAction loads a RevisionRecord from a RevisionStore and uses a RevisionRenderer to generate HTML (subject to caching in ParserCache), and serves it to the client.
  • PreviewAction uses an EditRevisionFactory to construct an unsaved RevisionRecord and passes it to RevisionRenderer to generate HTML (subject to using the EditStash), and serves it to the client. This uses EditController::preview() internally.
  • PurgeAction loads a RevisionRecord from a RevisionStore and uses a DerivedPageDataUpdater to trigger regeneration of any derived data or cached artifacts.
  • ApiStashEdit uses an EditRevisionFactory to construct an unsaved RevisionRecord and passes it to RevisionRenderer to generate ParserOutput, then stashes the rendered output and PST content for later use for preview or edit. This uses EditController::stashEdit() internally.
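
One way to read the relationship between RenderedRevision and RevisionRenderer in the list above is that the renderer acts as a factory and hands callbacks to the rendered-revision object, so that the latter never references the renderer directly. The sketch below illustrates that idea; the class names, method names, the HTML wrapper, and the "main slot first" ordering are all assumptions made for illustration, not the proposed API.

```python
from typing import Callable, Dict, Optional


class RenderedRevisionSketch:
    """Lazy, cached access to per-slot and combined output (hypothetical)."""

    def __init__(
        self,
        slots: Dict[str, str],
        render_slot: Callable[[str, str], str],
        combine: Callable[[Dict[str, str]], str],
    ) -> None:
        self._slots = slots
        self._render_slot = render_slot  # callback, not a renderer reference
        self._combine = combine
        self._slot_output: Dict[str, str] = {}
        self._combined: Optional[str] = None

    def get_slot_output(self, role: str) -> str:
        if role not in self._slot_output:
            self._slot_output[role] = self._render_slot(role, self._slots[role])
        return self._slot_output[role]

    def get_combined_output(self) -> str:
        if self._combined is None:
            outputs = {role: self.get_slot_output(role) for role in self._slots}
            self._combined = self._combine(outputs)
        return self._combined


class RevisionRendererSketch:
    """Factory for RenderedRevisionSketch; owns the combination logic."""

    def __init__(self) -> None:
        self.render_calls = 0

    def _render_slot(self, role: str, text: str) -> str:
        self.render_calls += 1
        return f'<div class="slot-{role}">{text}</div>'

    def _combine(self, outputs: Dict[str, str]) -> str:
        # Assumption for this sketch: main slot first, the rest by role name.
        roles = sorted(outputs, key=lambda r: (r != "main", r))
        return "\n".join(outputs[r] for r in roles)

    def get_rendered_revision(self, slots: Dict[str, str]) -> RenderedRevisionSketch:
        return RenderedRevisionSketch(slots, self._render_slot, self._combine)
```

Because slot output is cached inside the rendered-revision object, asking for the combined output after a single slot has already been rendered only renders the remaining slots.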

We further want to remove any need to call WikiPage::prepareContentForEdit, by instead passing RevisionRecord and/or RenderedRevision to hook handlers / listeners.
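
Passing the prepared objects to listeners, rather than having listeners call back into the page object, might look roughly like this. It is a sketch under stated assumptions: `LazyOutput`, `run_edit_filters`, and the two example filters are hypothetical, and PST content is modelled as a plain string.

```python
from typing import Callable, List, Optional


class LazyOutput:
    """Rendered output that is only computed when a listener asks for it."""

    def __init__(self, render: Callable[[], str]) -> None:
        self._render = render
        self._value: Optional[str] = None
        self.render_count = 0

    def get(self) -> str:
        if self._value is None:
            self.render_count += 1
            self._value = self._render()
        return self._value


# A filter receives the PST text and the lazy output, and returns an
# error message to reject the edit, or None to accept it.
EditFilter = Callable[[str, LazyOutput], Optional[str]]


def run_edit_filters(
    pst_text: str, output: LazyOutput, filters: List[EditFilter]
) -> List[str]:
    errors = []
    for edit_filter in filters:
        message = edit_filter(pst_text, output)
        if message is not None:
            errors.append(message)
    return errors


def length_filter(pst_text: str, output: LazyOutput) -> Optional[str]:
    # Needs only the PST text, so it never triggers rendering.
    return "too long" if len(pst_text) > 20 else None


def link_filter(pst_text: str, output: LazyOutput) -> Optional[str]:
    # Needs the rendered output; the first call pays for rendering.
    return "external link" if "href=" in output.get() else None
```

A listener that inspects only the PST text costs nothing extra, and several listeners that need rendered output share a single render.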