User:Daniel Kinzler (WMDE)/MCR-PO

While refactoring the MediaWiki storage layer for MCR, questions have arisen regarding the handling of multiple slots in ParserOutput.

This document is intended for discussing the requirements and constraints, and the possible solutions.

Please comment and discuss inline.

Baseline
Goal: provide an MCR-capable interface for updating pages, with minimal refactoring of adjacent code. The implementation doesn't need to fully support MCR yet.


 * Use ContentHandler::getParserOutput to generate output for each slot
 * Use Content::getSecondaryDataUpdates to get data updates for each slot
 * Update links tables with information from all slots. In the future, links tables may also record which slot references what resource.
 * Store a single ParserOutput for the revision in the ParserCache. For now, this can simply be the ParserOutput of the main slot, but eventually it will have to contain all information from the ParserOutputs of all slots. The code composing that combined output may differ between "kinds" of pages, and may be defined by an extension. The mechanism for that doesn't need to be decided at this stage, though.
 * The use of ApiStashEdit can be restricted to the main-slot-only case for now.

This implies that during saving we need access to the POs of individual slots as well as to a combined PO. It does not imply anything about the structure of the PO that gets written to the ParserCache or shown in a preview.
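The baseline save flow can be sketched as follows. This is a minimal Python illustration, not MediaWiki's PHP code: `SlotOutput`, `render_slot` and `save_revision` are hypothetical names, `render_slot` merely stands in for ContentHandler::getParserOutput, and treating every `[[Target]]` as a link is a simplification. The point is that per-slot outputs, links from all slots, and the (for now main-slot-only) cache entry are all available at save time.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SlotOutput:
    """Hypothetical stand-in for a per-slot ParserOutput: HTML plus outgoing links."""
    html: str
    links: set[str] = field(default_factory=set)


def render_slot(role: str, text: str) -> SlotOutput:
    """Hypothetical renderer playing the part of ContentHandler::getParserOutput.

    For illustration only, every [[Target]] in the text is treated as a link.
    """
    links = set(re.findall(r"\[\[([^\]]+)\]\]", text))
    return SlotOutput(html=f"<div class='slot-{role}'>{text}</div>", links=links)


def save_revision(slots: dict[str, str]):
    """Render each slot, collect links from all slots, and pick what to cache."""
    # Per-slot outputs stay available to the save logic (e.g. for data updates).
    per_slot = {role: render_slot(role, text) for role, text in slots.items()}
    # The links tables are updated with information from all slots.
    all_links = set().union(*(po.links for po in per_slot.values()))
    # For now, the ParserCache entry is simply the main slot's output.
    cached = per_slot["main"]
    return per_slot, all_links, cached
```

For example, saving `{"main": "See [[Foo]]", "doc": "Docs for [[Bar]]"}` records both `Foo` and `Bar` for the links tables, while only the main slot's output would be cached.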

Multi-Slot View
Model use case: Structured Data on Commons (SDoC). Single-slot viewing should also be possible.


 * Allow the rendering of one slot to depend on the content of all slots. (Possible restriction: only allow the main slot to depend on other slots. This would allow the main slot to depend on the rendering of the other slots without the risk of circular dependencies.) This requires ContentHandler::getParserOutput to have access to all slots.
 * Allow Article::view (or a replacement of that method) to show the content of each slot, rendered individually, in separate sections of the page. In particular, allow a rendering of the structured data to be shown that is independent of the wikitext, rendered in the user's interface language. That rendering also serves as the editing interface for structured data, just like on a Wikidata page.
 * The code composing the combined output, when constructing and/or using the PO, will differ between "kinds" of pages, and may be defined by an extension. In the SDoC case, it will be specific to the File namespace and will be supplied by the WikibaseMediaInfo extension.
 * Note that for output in the user language, the ParserCache needs to be split by user language (which is already the case for File pages on Commons).
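The possible restriction mentioned above (only the main slot may depend on other slots) can be illustrated with a short Python sketch. All names here are hypothetical, including the slot role `mediainfo` and the `demo_render` callback; it only shows that, under the restriction, non-main slots can be rendered independently first, and the main slot can then use their rendering without any risk of a cycle.

```python
MAIN = "main"


def check_dependencies(deps: dict[str, set[str]]) -> None:
    """Enforce the possible restriction: only the main slot may depend on other slots."""
    for role, needed in deps.items():
        others = needed - {role}
        if others and role != MAIN:
            raise ValueError(f"slot {role!r} may not depend on {sorted(others)}")


def render_page(slots, deps, render):
    """Render non-main slots independently, then the main slot with their HTML as context."""
    check_dependencies(deps)
    html = {}
    # Under the restriction, non-main slots depend on nothing, so they can go first.
    for role, text in slots.items():
        if role != MAIN:
            html[role] = render(role, text, {})
    # The main slot may see the rendering of the slots it declared as dependencies.
    if MAIN in slots:
        context = {r: html[r] for r in deps.get(MAIN, set()) if r in html}
        html[MAIN] = render(MAIN, slots[MAIN], context)
    return html


def demo_render(role, text, context):
    """Hypothetical renderer: appends the HTML of any slots it was given."""
    extra = "".join(f"[{r}:{h}]" for r, h in sorted(context.items()))
    return f"<p>{text}{extra}</p>"
```

Each slot's HTML is also kept individually in the returned dict, which is what showing slots in separate sections of the page would need.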

Multi-Slot Editing
Model use case: Atomic editing of a Lua module and its documentation. Single-slot editing should also be possible.


 * Allow EditPage to show one editing area (for text based content: one text area) for each slot. Also provide a single-slot editing mode.
 * Allow previews to be rendered for all slots or individual slots.
 * When parsing/rendering after save, re-use the cached rendering of unchanged slots instead of regenerating it needlessly. However, if the rendering of an unchanged slot depends on a slot that was touched, it needs to be re-rendered anyway.

This implies that ApiStashEdit, ApiParse, and ApiEditPage all accept input for multiple slots at once.
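Re-using the cached rendering of unchanged slots, while still re-rendering slots that depend on a touched slot, could work roughly like this. This is a hypothetical Python sketch (`slots_to_rerender`, `rerender` and the plain-dict cache are illustrative assumptions, not MediaWiki code). It follows dependencies transitively, which only matters if the "main slot only" restriction is not adopted.

```python
def slots_to_rerender(changed: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Changed slots, plus any slot that (transitively) depends on one of them."""
    dirty = set(changed)
    grew = True
    while grew:
        grew = False
        for role, needed in deps.items():
            if role not in dirty and needed & dirty:
                dirty.add(role)
                grew = True
    return dirty


def rerender(slots, changed, deps, cache, render):
    """Re-use cached per-slot HTML where possible; render everything else."""
    dirty = slots_to_rerender(changed, deps)
    result = {}
    for role, text in slots.items():
        if role in dirty or role not in cache:
            result[role] = render(role, text)
        else:
            # Unchanged, and not depending on a touched slot: keep the cached HTML.
            result[role] = cache[role]
    return result
```

With `deps = {"main": {"doc"}}`, editing only the documentation slot marks both `doc` and `main` for re-rendering, whereas editing only `main` leaves the cached `doc` output in place.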

Proposed Changes to ParserOutput

 * Each ParserOutput can record which slots it depends on
 * The combined canonical PO has to return aggregated information of all slots from all getters
 * The combined canonical PO should not contain duplicate information: it is serialized and stored in the ParserCache, so its size matters.
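The three requirements above can be sketched in Python as a metadata merge. `SlotOutput` and `combine` are hypothetical names, and only links, categories and slot dependencies are modelled; HTML composition is left to the two options described next. Using sets means an entry reported by several slots is stored only once in the combined output.

```python
from dataclasses import dataclass, field


@dataclass
class SlotOutput:
    """Hypothetical per-slot output: metadata plus the slots it was rendered from."""
    links: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)
    depends_on: set[str] = field(default_factory=set)


def combine(outputs: dict[str, SlotOutput]) -> SlotOutput:
    """Aggregate metadata from all slots; sets keep each entry only once."""
    combined = SlotOutput()
    for role, po in outputs.items():
        combined.links |= po.links
        combined.categories |= po.categories
        # The combined output depends on every slot and on whatever those slots depend on.
        combined.depends_on |= po.depends_on | {role}
    return combined
```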

Option 1, compose output just in time:
 * The combined ParserOutput for all slots would have to know the ParserOutput objects of each slot.
 * Its getters would aggregate information from all the per-slot POs on the fly.
 * It would also have to know how to combine the HTML from all slots' ParserOutputs into a single output (or return null from getText), by substituting placeholders in an HTML "template". Note that this template may be defined or modified by an extension, and may depend on the page type, the namespace, or some other property of the page.
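Option 1 could look roughly like the following Python sketch. `LazyCombinedOutput` and the `{{slot:ROLE}}` placeholder syntax are hypothetical choices for illustration; MediaWiki does not define such a template format. The combined object keeps the per-slot outputs, so getters aggregate on demand and individual slot outputs remain accessible.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SlotOutput:
    """Hypothetical per-slot output."""
    html: str
    links: set[str] = field(default_factory=set)


class LazyCombinedOutput:
    """Option 1 sketch: keeps the per-slot outputs and aggregates on demand."""

    def __init__(self, slot_outputs: dict[str, SlotOutput], template: str):
        self._slots = slot_outputs
        # Hypothetical template syntax: {{slot:ROLE}} marks where a slot's HTML goes.
        self._template = template

    def get_links(self) -> set[str]:
        # Aggregated on the fly from every slot's output.
        return set().union(*(po.links for po in self._slots.values()))

    def get_text(self) -> str:
        def substitute(match: re.Match) -> str:
            po = self._slots.get(match.group(1))
            return po.html if po else ""
        return re.sub(r"\{\{slot:([a-z0-9_-]+)\}\}", substitute, self._template)

    def get_slot_output(self, role: str) -> SlotOutput:
        # Individual outputs remain available, e.g. for per-slot processing.
        return self._slots[role]
```

The trade-off is that the serialized object carries every per-slot output, which has to be weighed against the ParserCache size concern noted above.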

Option 2, compose output right away:
 * The combined ParserOutput is populated with the aggregated content of the per-slot POs.
 * The code that constructs the combined PO also constructs the combined HTML right away.
 * This means that during saving, the POs of unchanged slots cannot be re-used.
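For comparison, Option 2 might be sketched like this in Python. Again, `EagerCombinedOutput`, `build_combined` and the `{{slot:ROLE}}` placeholders are hypothetical. HTML and metadata are composed once at construction time and the per-slot outputs are discarded, which is why they cannot be re-used on a later save.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SlotOutput:
    """Hypothetical per-slot output."""
    html: str
    links: set[str] = field(default_factory=set)


@dataclass
class EagerCombinedOutput:
    """Option 2 sketch: only the already-combined HTML and metadata are kept."""
    html: str
    links: set[str]

    def get_text(self) -> str:
        return self.html

    def get_links(self) -> set[str]:
        return set(self.links)


def build_combined(slot_outputs: dict[str, SlotOutput], template: str) -> EagerCombinedOutput:
    """Compose HTML and aggregate metadata once, at construction time."""
    def substitute(match: re.Match) -> str:
        po = slot_outputs.get(match.group(1))
        return po.html if po else ""
    html = re.sub(r"\{\{slot:([a-z0-9_-]+)\}\}", substitute, template)
    links = set().union(*(po.links for po in slot_outputs.values()))
    # The per-slot outputs are not stored, so they cannot be re-used on a later save.
    return EagerCombinedOutput(html=html, links=links)
```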

Note that options 1 and 2 offer largely the same interface to calling code. Only the PO itself and the code that constructs it need to be aware of the distinction.

This means that we can start with option 2 and switch to option 1 later when the need arises, for instance to optimize rendering during save, or for other purposes. For example, AbuseFilter may want to apply different rules to different slots, so it would have to process the POs of the individual slots separately.