Multi-Content Revisions/Page Update Controller

From MediaWiki.org
This page is part of the MCR proposal. Status: draft, comments welcome. The interface is already pretty stable, but the details of implementation are still unclear.

Updating a revision is a complex process with intricate requirements regarding transactional logic and deferred updates. To honor these requirements, stateful "interactor" objects are defined in addition to the stateless storage service:

/**
 * Controller for updating a wiki page. This is a high level controller,
 * which performs permission checks and secondary data updates.
 *
 * @license GPL 2+
 * @author Daniel Kinzler
 */
interface PageUpdateController {

	/**
	 * Sets the Content of a primary slot for revision creation.
	 *
	 * @param string $slotName
	 * @param Content $content
	 */
	public function setSlotContent( $slotName, Content $content );

	/**
	 * Marks the given slot for removal in the new revision.
	 * 
	 * @param string $slotName
	 */
	public function removeSlot( $slotName );

	/**
	 * Saves a new revision of the page.
	 *
	 * @param string $summary
	 * @param User $user
	 * @param int $flags
	 * @param array $tags
	 *
	 * @throws StorageException
	 * @return Revision
	 */
	public function save( $summary, User $user, $flags = 0, $tags = [] );

	/**
	 * Aborts the page update.
	 */
	public function abort();

	/**
	 * Aborts the page update if save() was not yet called.
	 *
	 * Implementations should call this from the destructor.
	 */
	public function cleanup();
}

(Code experiment: https://gerrit.wikimedia.org/r/#/c/217710/)

  • PageUpdateController shall be used to create new revisions when a page is edited (or created).
  • Application code can acquire a PageUpdateController from a factory. As a first step, WikiPage can act as that factory.
  • The logic for updating the page and revision tables will move from WikiPage (and Revision) to PageUpdateController. The save() method will take the place of WikiPage::doEditContent.
  • PageUpdateController will rely on the RevisionSlotStore to store content meta-data, and on BlobStore (or ContentStore) to store the actual content.
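
The intended life cycle (acquire a controller from a factory, set slot content, then save or abort) can be illustrated with a minimal in-memory sketch. This is not the proposed implementation: SketchPage, SketchUpdateController and the factory method newUpdateController() are hypothetical names, and Content, User, flags and tags are simplified away (content is a plain string) so that the example runs on its own.

```php
<?php
// Hypothetical stand-in for a page; in the proposal, WikiPage plays this role.
class SketchPage {
	/** @var array[] Saved revisions, each with a summary and a slot map */
	public $revisions = [];

	// Factory method; the name is an assumption, not part of the proposal.
	public function newUpdateController() {
		return new SketchUpdateController( $this );
	}
}

// Stateful controller that collects slot changes until save() or abort().
class SketchUpdateController {
	private $page;
	private $slots = [];
	private $finished = false;

	public function __construct( SketchPage $page ) {
		$this->page = $page;
	}

	public function setSlotContent( $slotName, $content ) {
		$this->slots[$slotName] = $content;
	}

	public function save( $summary ) {
		if ( $this->finished ) {
			throw new LogicException( 'Update was already saved or aborted' );
		}
		$this->page->revisions[] = [ 'summary' => $summary, 'slots' => $this->slots ];
		$this->finished = true;
		// The position in the list serves as a simplified revision number.
		return count( $this->page->revisions );
	}

	public function abort() {
		$this->slots = [];
		$this->finished = true;
	}

	// Aborts only if save() or abort() was not called before.
	public function cleanup() {
		if ( !$this->finished ) {
			$this->abort();
		}
	}

	public function __destruct() {
		$this->cleanup();
	}
}

$page = new SketchPage();
$update = $page->newUpdateController();
$update->setSlotContent( 'main', 'Hello world' );
$revId = $update->save( 'Create page' );

// An update that is dropped without save() leaves no revision behind.
$abandoned = $page->newUpdateController();
$abandoned->setSlotContent( 'main', 'Never saved' );
unset( $abandoned ); // the destructor runs cleanup()
```

Calling cleanup() from the destructor means an update that goes out of scope is aborted rather than left half-finished, which is the behavior the interface documentation asks of implementations.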

In order to support updating derived content, a RevisionUpdateController could be defined on the same level as PageUpdateController. There is no need to implement RevisionUpdateController initially, but the concept should be kept in mind.

Page Update Process

  • When a page is edited, the content of at least one slot is updated. It does not matter whether a slot with the same role existed in the previous revision.
  • One edit (user interaction) creates one revision, regardless of how many slots were updated (see also Content Meta-Data).
  • Unchanged content of slots is re-used from previous revisions (see Content Meta-Data Data Model). E.g.:
    • Revision 1 has two slots, A and B, with content ( A1, B1 )
    • Now, slot A is edited, but slot B is untouched. Then revision 2 is ( A2, B1 ). That is, slot B in revision 2 has the same content as slot B of revision 1.
  • SecondaryDataUpdates are created and executed for all content objects of a revision. If the content of a slot is not modified by a new revision, it may be possible to omit or optimize the SecondaryDataUpdates for that content object.
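
The slot re-use rule above can be sketched as a pure function over slot maps. This is only an illustration under simplifying assumptions: slots are represented as a map from slot name to a content identifier, and the function names deriveSlots() and changedSlots() are hypothetical.

```php
<?php
/**
 * Computes the slots of a new revision from its parent revision.
 *
 * @param array $parentSlots Map of slot name => content identifier
 * @param array $updatedSlots Map of slot name => new content identifier
 * @param string[] $removedSlots Names of slots to drop in the new revision
 * @return array Map of slot name => content identifier
 */
function deriveSlots( array $parentSlots, array $updatedSlots, array $removedSlots = [] ) {
	// Start from the parent revision, dropping slots marked for removal.
	$slots = array_diff_key( $parentSlots, array_flip( $removedSlots ) );
	// Updated slots override inherited ones; new slot names are added.
	return array_replace( $slots, $updatedSlots );
}

/**
 * Lists the slots whose content differs from the parent revision.
 * Only these would strictly need fresh secondary data updates.
 */
function changedSlots( array $parentSlots, array $newSlots ) {
	$changed = [];
	foreach ( $newSlots as $name => $contentId ) {
		if ( !array_key_exists( $name, $parentSlots ) || $parentSlots[$name] !== $contentId ) {
			$changed[] = $name;
		}
	}
	return $changed;
}

// Revision 1 is ( A1, B1 ); editing only slot A yields ( A2, B1 ).
$rev1 = [ 'A' => 'A1', 'B' => 'B1' ];
$rev2 = deriveSlots( $rev1, [ 'A' => 'A2' ] );
$needUpdates = changedSlots( $rev1, $rev2 );
```

Here $rev2 keeps the identifier B1 for slot B, so no content is copied, and $needUpdates contains only slot A.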

If support for derived content is desired, derived content can be recalculated when a revision is saved. Such "materialized" derived content could be saved along with the primary content, and it could provide its own secondary data updates like LinksUpdate. Note that derived content of a revision can be updated later without creating a new revision.

Challenges

  • Maintaining backwards compatibility for hooks may be a major challenge. For the initial implementation, the PageUpdateController will have to know the WikiPage instance, so it can provide it to hooks as a parameter.
  • SecondaryDataUpdates (and other secondary updates, deferred updates, queued jobs, etc) need to be handled for all Content objects.
  • A transaction context should be maintained across the saving of all slots, see Transaction Management. The UpdateController can either control a transaction context, or can inherit a transaction context from the factory that creates it.
  • Refactoring the code that deals with "preparing" the content for saving and the amorphous "edit" object returned from prepareContentForEdit() seems tricky. Allowing the same optimizations to function when preparing multiple Content objects for saving at the same time needs thought.
  • It may be useful to have PageUpdateControllers on two levels: a high level PageUpdateController which does permission checks, generates derived content, and takes care of secondary data updates; and a low level PageUpdateController which does nothing but store Content objects and update meta-data.
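
The two-level split mentioned in the last point could look roughly like the following sketch. It rests on several assumptions: the class names LowLevelSlotWriter and HighLevelPageUpdater are hypothetical, the permission check is reduced to a callback, storage is an in-memory array, and secondary data updates are merely recorded by slot name.

```php
<?php
// Low level: stores slot content and nothing else (in-memory stand-in).
class LowLevelSlotWriter {
	public $stored = [];

	public function save( array $slots ) {
		$this->stored[] = $slots;
		return count( $this->stored );
	}
}

// High level: checks permissions, delegates storage, then handles
// secondary updates for every slot of the new revision.
class HighLevelPageUpdater {
	private $writer;
	private $canEdit;
	public $secondaryUpdatesRun = [];

	public function __construct( LowLevelSlotWriter $writer, callable $canEdit ) {
		$this->writer = $writer;
		$this->canEdit = $canEdit;
	}

	public function save( array $slots, $userName ) {
		if ( !call_user_func( $this->canEdit, $userName ) ) {
			throw new RuntimeException( "User $userName may not edit this page" );
		}
		$revId = $this->writer->save( $slots );
		foreach ( array_keys( $slots ) as $slotName ) {
			// A real implementation would schedule SecondaryDataUpdates here.
			$this->secondaryUpdatesRun[] = $slotName;
		}
		return $revId;
	}
}

$writer = new LowLevelSlotWriter();
$updater = new HighLevelPageUpdater( $writer, function ( $userName ) {
	return $userName !== 'BlockedUser';
} );

$revId = $updater->save( [ 'main' => 'Hello', 'data' => '{}' ], 'Alice' );

$denied = false;
try {
	$updater->save( [ 'main' => 'Spam' ], 'BlockedUser' );
} catch ( RuntimeException $e ) {
	$denied = true;
}
```

Keeping the low-level writer free of policy would let maintenance code store content directly, while user-facing edits always pass through the high-level checks.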

Support for derived content poses additional challenges:

  • Which code knows what derived content shall be generated upon saving a new revision?
  • The idea of derived content is tightly bound to dependency tracking: when storing derived content, we should also store what it was derived from. And when the base content changes, we need to update the derived content. Thus, dependency tracking should be implemented on the level of slots, not pages: the wikitext of a page doesn't depend on the templates it uses, but the rendered HTML of the page does.
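
Slot-level dependency tracking could be sketched as follows. The class SlotDependencyTracker, its method names, and the slot names "main" and "html" are assumptions made for illustration; a real implementation would persist the dependencies instead of keeping them in memory.

```php
<?php
// Hypothetical tracker that records dependencies per (page, slot) pair
// rather than per page.
class SlotDependencyTracker {
	/** @var array Map of base slot key => list of derived slot keys */
	private $dependents = [];

	// Page titles cannot contain "#", so it is safe as a separator.
	private static function key( $page, $slot ) {
		return "$page#$slot";
	}

	public function recordDependency( $derivedPage, $derivedSlot, $basePage, $baseSlot ) {
		$base = self::key( $basePage, $baseSlot );
		$this->dependents[$base][] = self::key( $derivedPage, $derivedSlot );
	}

	// Returns the derived slots that need updating when the base slot changes.
	public function getStaleSlots( $page, $slot ) {
		$base = self::key( $page, $slot );
		if ( !isset( $this->dependents[$base] ) ) {
			return [];
		}
		return array_values( array_unique( $this->dependents[$base] ) );
	}
}

$tracker = new SlotDependencyTracker();
// The rendered HTML of "Foo" is derived from its own wikitext and from
// the wikitext of a template it uses.
$tracker->recordDependency( 'Foo', 'html', 'Foo', 'main' );
$tracker->recordDependency( 'Foo', 'html', 'Template:Bar', 'main' );

// Editing the template makes the HTML slot of "Foo" stale, but not its wikitext.
$stale = $tracker->getStaleSlots( 'Template:Bar', 'main' );
$none = $tracker->getStaleSlots( 'Foo', 'html' );
```

Because the wikitext slot of "Foo" never appears as a derived slot, a template edit marks only the HTML slot for regeneration, matching the distinction drawn above.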