Multi-Content Revisions

From mediawiki.org

Multi-Content Revisions (or MCR) refers to the ability of MediaWiki to store multiple content objects in a single revision of a page.

What does MCR do?

MCR provides a way to store content in multiple slots on a page. The content may all be of the same kind (use the same content model), or be of different kinds. Slots can be thought of as being like attachments to an email.

Slots may be changed separately or together (atomically). Every change to a slot is recorded as an edit to the page, and will show as such in the page’s history. If a slot is not touched by a given edit, it stays unchanged (that is, the slot’s content is inherited from the parent revision).
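The inheritance rule can be illustrated with a toy model. This is only a sketch in plain PHP, not MediaWiki code: revisions are represented as arrays mapping a slot role to its content, and applyEdit() as well as the "metadata" role are hypothetical names chosen for the illustration.

```php
<?php
// Toy model only: a revision is an array mapping slot role => content.
// None of these names are MediaWiki APIs.

/**
 * Build the slots of a new revision from the slots of its parent.
 *
 * @param array<string,string> $parentSlots  slots of the parent revision
 * @param array<string,string> $changedSlots slots touched by this edit
 * @return array<string,string> slots of the new revision
 */
function applyEdit( array $parentSlots, array $changedSlots ): array {
	// Touched slots take the new content; untouched slots are inherited.
	return array_merge( $parentSlots, $changedSlots );
}

// Page creation: both slots are written in one edit.
$rev1 = applyEdit( [], [ 'main' => 'Hello', 'metadata' => '{"lang":"en"}' ] );
// This edit only touches the "metadata" slot.
$rev2 = applyEdit( $rev1, [ 'metadata' => '{"lang":"de"}' ] );
// This edit changes both slots together, as a single revision.
$rev3 = applyEdit( $rev2, [ 'main' => 'Hallo', 'metadata' => '{"lang":"de","v":2}' ] );

echo $rev2['main'], "\n"; // prints "Hello", inherited from $rev1
```

Each call corresponds to one entry in the page history, regardless of how many slots it touches.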

MCR allows additional data to be integrated with page content in a way that makes it just work with page moves, protection, watching, deletion, diffing, re-rendering, caching, etc.

Is MCR complete?

The storage mechanism for MCR is complete and has been in production since 2019. The migration of the database schema on Wikimedia systems was completed in 2020, and support for the old schema was removed in the 1.35 release.

The original vision for MCR included an easy way for extensions to define where the additional content would be shown on the page, and how it would be edited. As of 2020, this part of the vision has not been implemented, since it was not needed for the initial use case (Structured Data on Commons). A generalized editing mechanism also seemed conceptually questionable, especially for content models that are not text-based and require an interactive user interface for editing.

How does MCR scale?

Since MCR allows more kinds of content to be stored on a page, one might expect it to require additional edits to be recorded in the database. Typically, however, this is not the case: the information recorded in the extra slots would otherwise have been either embedded in the primary content (the wikitext) or placed on an associated page (typically a subpage). Changing this information causes an edit to be recorded in the revision table in either case.

In some cases, MCR can even reduce the number of edits: since multiple slots can be updated in a single edit, changing a page together with information that is currently managed on an associated page (subpage) requires two edits without MCR, but only a single one with MCR.

One additional space requirement is for tracking the association between content objects and revisions in the slots table: if a page has three slots, there will be three times as many rows in the slots table for that page as in the revision table. For this reason, the information recorded in the slots table is kept to a minimum (about 25 bytes per row).
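To get a feel for the numbers, the overhead can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation in plain PHP; the function names and the example page (1,000 revisions, three slots in every revision) are made up for illustration, and the row size is the approximate figure quoted above.

```php
<?php
// Rough estimate only; 25 bytes per row is the approximate figure
// quoted in the text, and the example page is hypothetical.

function slotRowCount( int $revisions, int $slotsPerRevision ): int {
	// One row in the slots table per slot per revision.
	return $revisions * $slotsPerRevision;
}

function slotTableBytes( int $revisions, int $slotsPerRevision, int $bytesPerRow = 25 ): int {
	return slotRowCount( $revisions, $slotsPerRevision ) * $bytesPerRow;
}

// A hypothetical page with 1,000 revisions and three slots in each revision:
echo slotRowCount( 1000, 3 ), "\n";   // 3000 rows
echo slotTableBytes( 1000, 3 ), "\n"; // 75000 bytes, roughly 73 KiB
```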

It is worth noting that the initial use case for MCR (Structured Data on Commons) led to a significant increase in the number of edits. This is, however, due to the data model used for the additional content (Wikibase), which favors high granularity of edits. The effect would have been the same if the data had been stored on separate pages; it is unrelated to MCR.

Does MCR support structured data?

MCR just provides a way to manage multiple content objects per page; it does not know or care how this content is structured (what content model it uses). MCR would manage binary audio data, for example, as happily as wikitext or JSON.

However, MCR enables data that was previously embedded in the primary content (typically wikitext) to be managed separately. This allows such data to be stored in a more suitable form, such as JSON. MCR provides a place to store this data and integrates it with the update and rendering mechanism, but it does not provide a user interface for interacting with the data.

What could MCR be used for in the future?

MCR is designed to remove the need to embed structured data in wikitext. One example of this is the way TemplateData places metadata about template parameters on the template page using a special syntax. Instead, this information could be stored in a separate slot, in a machine-readable form such as JSON. This would enable the creation of a specialized API and a dedicated user interface for displaying and manipulating this information. As of 2020, MCR doesn't directly help with creating that API or UI; it just removes the complexities of extracting and replacing structured data embedded in wikitext.

Another example is categories: wikitext uses a special syntax to place pages in categories. The complex nature of the wikitext syntax makes it hard to reliably extract or change these categories. If the community decides that this should change, MCR could be used to store categories apart from the wikitext (but still as part of the same page), as a data structure that can easily be manipulated. Provided the user interface allows it, changing the text and the categories can still be done in a single edit.

However, changing the way categories are managed faces some challenges in practice, due to the need to transition from the traditional system to the new system, and because of the way that templates can dynamically construct categories. So while MCR makes it simple to manage category data apart from wikitext, it doesn’t help with transitioning towards that new system, nor does it help with creating an editing interface for this new kind of data.

How can I use MCR in my Extension?

Extensions that want to use MCR to attach additional content to pages need to define a slot role for their purpose in the SlotRoleRegistry. This is presently done via a service manipulator from a handler for the MediaWikiServices hook:

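// Handler for the MediaWikiServices hook: registers the extension's slot role.
function onMediaWikiServices( $services ) {
	$services->addServiceManipulator(
		'SlotRoleRegistry',
		function (
			\MediaWiki\Revision\SlotRoleRegistry $registry
		) {
			// Bind the extension's slot role to the content model it uses.
			$registry->defineRoleWithModel( SLOT_ROLE_MYEXTENSION, CONTENT_MODEL_MYEXTENSION );
		}
	);
}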

In the example, SLOT_ROLE_MYEXTENSION and CONTENT_MODEL_MYEXTENSION are constants defined by the extension. SLOT_ROLE_MYEXTENSION is the name of the slot (the slot role), and CONTENT_MODEL_MYEXTENSION is the name of the content model to use with this role. The ContentHandler registered for that content model defines how the content of the slot is rendered and how diffs are generated. The slot role could also use an existing content model, such as CONTENT_MODEL_WIKITEXT.

Once the slot role is registered, it can be used internally in the MediaWiki storage layer, using PageUpdater::setContent() to write, and RevisionRecord::getContent() to read. By default, the slot will be shown when visiting pages that have content in this slot. Display can be controlled using the $layout parameter of defineRoleWithModel().
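Writing to and reading from the slot might look roughly like the following. This is a minimal sketch, not a definitive implementation: it is meant to run inside MediaWiki rather than as a standalone script, saveMyExtensionSlot() and loadMyExtensionSlot() are hypothetical helper names, the edit summary is a placeholder, and the unqualified class names (WikiPage, User, Title, Content, CommentStoreComment) are assumed to be available globally, which may differ between MediaWiki versions.

```php
<?php
// Sketch only: intended to be called from extension code running inside
// MediaWiki (for example an API module), not as a standalone script.

use MediaWiki\MediaWikiServices;
use MediaWiki\Revision\RevisionRecord;

/**
 * Write $content to the extension's slot in a new revision of $wikiPage.
 * Hypothetical helper; $content must use the model registered for the role.
 */
function saveMyExtensionSlot( WikiPage $wikiPage, User $user, Content $content ): bool {
	$updater = $wikiPage->newPageUpdater( $user );
	// Only this slot is set; other slots are inherited from the parent revision.
	$updater->setContent( SLOT_ROLE_MYEXTENSION, $content );
	$updater->saveRevision(
		CommentStoreComment::newUnsavedComment( 'Update extension data' )
	);
	return $updater->wasSuccessful();
}

/**
 * Read the extension's slot from the latest revision of $title.
 * Returns null if the page or the slot does not exist. Hypothetical helper.
 */
function loadMyExtensionSlot( Title $title ): ?Content {
	$revision = MediaWikiServices::getInstance()
		->getRevisionLookup()
		->getRevisionByTitle( $title );
	if ( $revision === null || !$revision->hasSlot( SLOT_ROLE_MYEXTENSION ) ) {
		return null;
	}
	// FOR_PUBLIC applies the usual visibility checks for suppressed content.
	return $revision->getContent( SLOT_ROLE_MYEXTENSION, RevisionRecord::FOR_PUBLIC );
}
```

Checking hasSlot() first avoids the exception that getContent() raises for a role the revision does not contain.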

Note that as of 2020, no API or UI will automatically become available to interact with content in the slot. Extension authors will have to implement their own API modules and UI components based on the storage layer functionality described above. Integration with the EditPage that would allow the extra slot to be updated along with the primary page in the same edit is possible in theory, but not directly supported by the hooks offered by EditPage.

Further Information