Multi-Content Revisions/Glossary

From mediawiki.org
This page is part of the MCR proposal. Status: stable.

Some central terms used in the context of the Multi-Content Revisions proposal:

  • page: a named location on the wiki, with versioned content. A page has a sequence of revisions (the page's "history"), and may have multiple streams.
  • revision: one version of a page as created by an edit. A revision belongs to exactly one page, and has at most one parent revision. A revision always has a main slot, and optionally one additional slot for each defined content role.
  • edit: a user action that creates a revision. Note that the number of slots touched by an edit is generally smaller than the number of slots that belong to the resulting revision, since slots that remain unchanged still count as belonging to the new revision.
  • slot: the association of a content object (a document) with a specific role in a given revision.
    • main slot: the slot with the "main" role. This is always defined for any revision, and is always a primary slot. It will be used by default in places where legacy code expects only one content object to be associated with a revision.
    • derived or virtual slots: slots whose content is derived from the primary user-supplied content. Derived slots store that content; virtual slots generate it on demand.
  • stream: a sequence of slots that have the same role and are all associated with revisions of the same page. If revisions of some page have a "main" and a "foo" slot, that page is said to have a "main" and a "foo" stream. Streams can also be thought of as the revision history of a specific document or content object.
  • role: the purpose a given content object has for a revision. Besides the "main" role, which must always be present, there may be roles for content that represents the categories of a page, or quality information, or a blame map, etc.
  • content: a self-contained document, representing the information associated with a slot.
    • primary content: content that is user-created and cannot be re-created from other slots (e.g. the article text). The primary slots of a revision can be enumerated.
    • derived content: content that was derived from content in other slots and is not user-editable (e.g. a blame map). It may be useful to provide a generic mechanism for associating derived content with a revision. It is not required for the derived slots of a revision to be enumerable. Some may be completely virtual, that is, derived from other slots on demand, and not stored in the database at all. Non-virtual derived content that is stored along with the primary content may be referred to as materialized derived content.
    • content object: a document of some sort; the content as a logical unit, with the model known but the format undefined; in PHP, an object compatible with the Content interface.
    • content blob: the serialized form of a content object, without any meta-information. Cannot be read without knowing format and model. Cannot be used without knowing the role.
    • content model: the schema or interpretation used to construct a content object from a raw decoding of a content blob. Examples are geo-json and wikidata-item, both of which use text/json as their serialization format.
    • content format: the serialization format (as a MIME type). This is sufficient for decoding a content blob to some raw data structure, but is not sufficient to interpret that structure ("text/json" says nothing about how the data shall be interpreted).
    • content meta-data: information about the content of a slot, such as content format, model, and hash.
  • meta-data: information identifying a revision and its associated content. This includes all information necessary to find, load, deserialize, and interpret the content objects associated with a revision. Meta-data does not include any information extracted or derived from content; this would be considered derived content, see above. Types of meta-data relevant in the context of MCR are:
    • revision meta-data: page, user, timestamp, comment, etc. Stored in the revision table.
    • content meta-data: address, model, format, length, hash, etc. Stored in the content table.
    • slot meta-data: role, revision, and content id. Stored in the slots table.
  • blob: a potentially large binary string stored in an arbitrary location, accessible via an address (read: URL). Blobs are immutable.
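
The relationships between these terms can be illustrated with a small in-memory sketch. This is not the MCR implementation or its database schema; all class and function names (ContentMeta, Slot, Revision, store_content, make_revision, stream, load_content), the "mem://" address scheme, and the use of "text/plain" for the main slot are assumptions made purely for illustration. It shows an edit that touches only one slot, the unchanged slot carried over to the new revision, and a stream read back out of the page history.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical blob store: address -> immutable bytes.
BLOBS: Dict[str, bytes] = {}


@dataclass(frozen=True)
class ContentMeta:
    """Content meta-data: everything needed to find and decode a blob."""
    address: str
    model: str
    format: str
    length: int
    sha1: str


@dataclass(frozen=True)
class Slot:
    """Slot meta-data: ties content to a role in one revision."""
    role: str
    revision_id: int
    content: ContentMeta


@dataclass
class Revision:
    rev_id: int
    page: str
    parent_id: Optional[int]
    slots: Dict[str, Slot] = field(default_factory=dict)


def store_content(obj, model: str, fmt: str) -> ContentMeta:
    """Serialize a content object into a blob and return its meta-data."""
    if fmt == "text/json":
        blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    elif fmt == "text/plain":
        blob = obj.encode("utf-8")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    sha1 = hashlib.sha1(blob).hexdigest()
    address = "mem://" + sha1  # assumed address scheme
    BLOBS.setdefault(address, blob)  # blobs are never overwritten
    return ContentMeta(address, model, fmt, len(blob), sha1)


def make_revision(rev_id: int, page: str, parent: Optional[Revision],
                  changed: Dict[str, ContentMeta]) -> Revision:
    """Model an edit: touched slots get new content, the rest are inherited."""
    slots: Dict[str, Slot] = {}
    if parent is not None:
        for role, slot in parent.slots.items():
            slots[role] = Slot(role, rev_id, slot.content)
    for role, meta in changed.items():
        slots[role] = Slot(role, rev_id, meta)
    if "main" not in slots:
        raise ValueError("every revision needs a main slot")
    return Revision(rev_id, page, parent.rev_id if parent else None, slots)


def stream(history: List[Revision], role: str) -> List[Slot]:
    """All slots with the given role across the revisions of one page."""
    return [rev.slots[role] for rev in history if role in rev.slots]


def load_content(meta: ContentMeta):
    """Decode a blob using its format; the model says how to interpret it."""
    blob = BLOBS[meta.address]
    text = blob.decode("utf-8")
    return json.loads(text) if meta.format == "text/json" else text


# Revision 1 defines a "main" and a "foo" slot; revision 2 edits only "main".
r1 = make_revision(1, "Berlin", None, {
    "main": store_content("Berlin is a city.", "wikitext", "text/plain"),
    "foo": store_content({"type": "Point", "coordinates": [13.4, 52.5]},
                         "geo-json", "text/json"),
})
r2 = make_revision(2, "Berlin", r1, {
    "main": store_content("Berlin is the capital of Germany.",
                          "wikitext", "text/plain"),
})
history = [r1, r2]
```

In this sketch the second edit touches one slot, yet the resulting revision still has two: the "foo" slot of revision 2 points at the same content as in revision 1. Calling stream(history, "foo") returns one slot per revision, which corresponds to the "foo" stream of the page.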