Page metadata

From mediawiki.org
Request for comment (RFC)
Component: General
Creation date:
Author(s): Nathan Larson
Document status: in draft

As the MediaWiki software has grown, more and more metadata have been added to pages:

  • Categories
  • Interwiki links
  • Templates
  • Status notices such as copyright violation, under construction, protection, and so on
  • External links
  • ...

These data are not part of the page itself, but rather information about the page. Thus, moving at least some of these data out of the page might be beneficial.

The decisions to be made are whether to implement this as core functionality and, if so, which of the schema options and user interface options below to use.

Advantages

The article's wikitext remains clearer:

  • Inexperienced users can edit more freely. (Really? Or will separate metadata add to what they have to learn?)
  • Parsing the text is simpler. (How? Doesn't this add complexity, since the parser has to combine another set of data with the page text?)
  • Recent changes can be classified by type of change, so users not interested in certain kinds of metadata changes need not look at them. Certain types of changes, e.g. to categories, can also be followed much more easily.
  • Changes to metadata can be consistently checked before they are committed (e.g. that the categories exist). Of course, this can also be done with the current system.
  • Page description metadata can serve many purposes, such as generating HTML heads and "Did you know" or featured article summaries. Extension:Blurb provides something similar, but it has to parse the page anew each time (see also User:Leucosticte/AutoBlurb for a more general discussion of the issues involved). Special:AllPages could have an option to list not only page names but also brief descriptions, and Extension:CategoryGallery could display not only all the images in a category but also their descriptions.
  • All metadata can be kept in a separate cache, further reducing overall load. (Is this true?)
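The filtering and pre-commit checking advantages can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the Change record, its "kind" values and the functions validate and filter_changes are all hypothetical names, and the set of existing categories is passed in rather than read from a real database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

@dataclass(frozen=True)
class Change:
    """Hypothetical recent-changes entry tagged with the kind of data it touched."""
    title: str
    kind: str                    # e.g. "text", "category", "interwiki", "template"
    added: Tuple[str, ...] = ()  # values this change adds, e.g. category names

def validate(change: Change, existing_categories: Set[str]) -> List[str]:
    """Return problems found; an empty list means the change can be committed."""
    problems = []
    if change.kind == "category":
        for name in change.added:
            if name not in existing_categories:
                problems.append(f"Category:{name} does not exist")
    return problems

def filter_changes(changes: Iterable[Change], wanted: Set[str]) -> List[Change]:
    """Keep only the kinds of change a user has chosen to follow."""
    return [c for c in changes if c.kind in wanted]
```

A user following only category changes would call filter_changes(changes, {"category"}), and the save path would refuse a change whenever validate returns a non-empty list.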

Disadvantages and open questions

  • The division of the current data into 'text data' and 'metadata' is somewhat arbitrary, and thus potentially confusing.
  • Maintaining an overall revision history becomes more complicated, since the article history is split between 'text data' and 'metadata' histories. (See the "metadata table option with page_metadata field" section below for how this could be handled.)

Schema options for metadata storage

The most recent metadata are combined with the most recent article text to generate the current version of the article. Options for storing metadata:

rev_metadata field option

The metadata can be stored in revision.rev_metadata.

  • Downside: this could require storing the same voluminous metadata with every revision, which is wasteful if the metadata didn't change.
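The storage cost of this option can be made concrete with a toy SQLite table. This is a sketch under assumptions: MediaWiki's revision table has no rev_metadata column, so the column here models the proposal, the schema is heavily simplified, and serializing the metadata as JSON is an arbitrary choice for illustration.

```python
import json
import sqlite3

# Hypothetical schema: MediaWiki's revision table has no rev_metadata column;
# this models the proposed field, with the metadata serialized as JSON.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER NOT NULL,"
    " rev_text TEXT NOT NULL,"
    " rev_metadata TEXT NOT NULL)"
)

metadata = json.dumps({"categories": ["Birds"], "interwiki": {"de": "Foo"}})

# Three text-only edits: the metadata never changes, but every row stores it.
for text in ["Draft 1", "Draft 2", "Draft 3"]:
    conn.execute(
        "INSERT INTO revision (rev_page, rev_text, rev_metadata) VALUES (?, ?, ?)",
        (1, text, metadata),
    )

total, distinct = conn.execute(
    "SELECT COUNT(rev_metadata), COUNT(DISTINCT rev_metadata) FROM revision"
).fetchone()
print(f"{total} stored copies, {distinct} distinct")  # 3 stored copies, 1 distinct
```

The gap between the two counts is the duplication the downside refers to; it grows with every text-only edit.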

metadata table option with page_metadata field

The metadata can be stored in a separate table, metadata, with metadata_id as its primary key. Each time the metadata is revised, a new metadata row is created, and page.page_metadata is updated with the new metadata_id.

  • What about older versions of metadata and article data? How should those be combined? Probably, when a revision is saved, revision.rev_metadata should be populated with the current page_metadata (i.e. the most recent metadata_id for that page). Then, when old revisions are viewed, the software will know which metadata to combine with the archived text.
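The save and view flow described above can be sketched with an in-memory SQLite database. This is a minimal sketch, assuming a drastically simplified schema: page.page_metadata, revision.rev_metadata and the metadata table are the proposal rather than existing MediaWiki columns, and save_metadata, save_text and view_revision are hypothetical helper names.

```python
import sqlite3

# Hypothetical schema: page.page_metadata, revision.rev_metadata and the
# metadata table are the proposal, not existing MediaWiki columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_metadata INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_text TEXT, rev_metadata INTEGER);
CREATE TABLE metadata (metadata_id INTEGER PRIMARY KEY, metadata_blob TEXT);
""")

def save_metadata(page_id: int, blob: str) -> int:
    """A metadata edit adds a new metadata row and repoints page.page_metadata."""
    cur = conn.execute("INSERT INTO metadata (metadata_blob) VALUES (?)", (blob,))
    conn.execute("UPDATE page SET page_metadata = ? WHERE page_id = ?",
                 (cur.lastrowid, page_id))
    return cur.lastrowid

def save_text(page_id: int, text: str) -> int:
    """A text revision copies the page's current metadata_id into rev_metadata."""
    (current,) = conn.execute(
        "SELECT page_metadata FROM page WHERE page_id = ?", (page_id,)).fetchone()
    cur = conn.execute(
        "INSERT INTO revision (rev_page, rev_text, rev_metadata) VALUES (?, ?, ?)",
        (page_id, text, current))
    return cur.lastrowid

def view_revision(rev_id: int):
    """Combine an archived text with the metadata current when it was saved."""
    return conn.execute(
        "SELECT r.rev_text, m.metadata_blob FROM revision r "
        "LEFT JOIN metadata m ON m.metadata_id = r.rev_metadata "
        "WHERE r.rev_id = ?", (rev_id,)).fetchone()

conn.execute("INSERT INTO page (page_id, page_title) VALUES (1, 'Foo')")
save_metadata(1, "[[Category:Birds]]")
old = save_text(1, "First draft")
save_metadata(1, "[[Category:Songbirds]]")
new = save_text(1, "Second draft")
print(view_revision(old))  # ('First draft', '[[Category:Birds]]')
print(view_revision(new))  # ('Second draft', '[[Category:Songbirds]]')
```

Unlike the rev_metadata-only option, each revision row stores only an integer, and a metadata blob is written once per metadata change rather than once per revision.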

Separate pages for metadata

When Extension:ExplicitDescription is installed, clicking on the red "edit" link on the edit screen for Foo takes the user to the MediaWiki:Desc-0-Foo edit screen.

Keep page metadata in a dedicated namespace or in subpages (Metadata:Foo or Foo/metadata for the Foo article); see Extension:ExplicitDescription. The metadata would then be stored in the revision table like the content of any other page. The revision.rev_metadata field of each article revision could hold the rev_id (see Manual:Revision table) of the metadata revision pertaining to that article revision.
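A toy model of this option follows. It is a sketch only: the Wiki class stands in for the real page and revision tables, and metadata_title, is_metadata_page and the "Metadata:" / "/metadata" conventions are hypothetical, taken from the naming examples above. Metadata pages are saved like any other page; each article revision records the rev_id of the metadata revision that was current at save time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

def metadata_title(title: str, use_namespace: bool = True) -> str:
    """Title of the page holding metadata for `title` (hypothetical convention)."""
    return f"Metadata:{title}" if use_namespace else f"{title}/metadata"

def is_metadata_page(title: str) -> bool:
    return title.startswith("Metadata:") or title.endswith("/metadata")

@dataclass
class Revision:
    rev_id: int
    title: str
    text: str
    rev_metadata: Optional[int] = None  # rev_id of the metadata page revision

class Wiki:
    """Toy stand-in for the page and revision tables (hypothetical)."""

    def __init__(self, use_namespace: bool = True):
        self.use_namespace = use_namespace
        self.revisions: List[Revision] = []
        self.latest: Dict[str, int] = {}  # title -> latest rev_id

    def save(self, title: str, text: str) -> Revision:
        rev = Revision(rev_id=len(self.revisions) + 1, title=title, text=text)
        if not is_metadata_page(title):
            # Record which metadata revision was current when this text was saved.
            rev.rev_metadata = self.latest.get(metadata_title(title, self.use_namespace))
        self.revisions.append(rev)
        self.latest[title] = rev.rev_id
        return rev

    def view(self, rev_id: int) -> Tuple[str, Optional[str]]:
        """Return (article text, metadata text) as of that revision."""
        rev = self.revisions[rev_id - 1]
        if rev.rev_metadata is None:
            return rev.text, None
        return rev.text, self.revisions[rev.rev_metadata - 1].text

wiki = Wiki()
wiki.save("Metadata:Foo", "description=A bird")
old = wiki.save("Foo", "Foo is a bird.")
wiki.save("Metadata:Foo", "description=A songbird")
new = wiki.save("Foo", "Foo is a songbird.")
```

Because metadata edits are ordinary page revisions here, they show up in page histories and recent changes without any new table.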

Wikidata option

Store metadata in Wikidata. Users will have to use Wikidata anyway for interlanguage links and such, so this would not be a big extra burden. An InstantData feature (analogous to InstantCommons) may need to be set up to give non-WMF wikis local repositories for data pulled from Wikidata.
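To illustrate what pulling page metadata from Wikidata could involve, the sketch below builds a request to the Wikibase wbgetentities API module and reads a description and interlanguage targets out of a decoded reply. It is hedged: no network call is made, the helper names are hypothetical, and the nested dictionary layout assumed by extract_metadata should be checked against the live API's output before relying on it.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"

def entity_request_url(title: str, site: str = "enwiki", lang: str = "en") -> str:
    """URL asking Wikidata for the item linked to `title` on `site`."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "descriptions|sitelinks",
        "languages": lang,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"

def extract_metadata(response: dict, lang: str = "en") -> dict:
    """Pull a description and interlanguage targets out of a decoded JSON reply."""
    for entity in response.get("entities", {}).values():
        if "missing" in entity:
            continue  # no item is linked to this title
        description = entity.get("descriptions", {}).get(lang, {}).get("value")
        links = {site: link["title"] for site, link in entity.get("sitelinks", {}).items()}
        return {"description": description, "interlanguage": links}
    return {}
```

An InstantData layer, as suggested above, could cache the output of extract_metadata locally instead of querying Wikidata on every page view.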

User interface options

Additional edit screen inputboxes

A screenshot showing Extension:Advanced Meta's Keywords and Description edit screen inputboxes.

Have separate edit screen inputboxes for various types of metadata; see Extension:Advanced Meta for an example of this.

Metadata tags or parserfunctions

Use tags or parserfunctions in the page text to explicitly mark metadata that should be stored separately when the page is saved. For example:

<metadesc> PageDescription </metadesc>

See Extension:MetaDescriptionTag for more examples. Extension:BedellPenDragon is another extension that relies on parserfunctions to add metadata to page_props.
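The save-time step for a tag like the one above can be sketched with a regular expression. This is an assumption-laden illustration, not how Extension:MetaDescriptionTag or Extension:BedellPenDragon actually work: a real implementation would register a parser hook rather than scan raw wikitext, and the property name "metadesc" and the helper names are hypothetical. The returned rows only mimic the shape of page_props (pp_page, pp_propname, pp_value).

```python
import re
from typing import List, Optional, Tuple

# <metadesc> ... </metadesc>, ignoring whitespace just inside the tags.
METADESC = re.compile(r"<metadesc>\s*(.*?)\s*</metadesc>", re.DOTALL | re.IGNORECASE)

def extract_metadesc(wikitext: str) -> Tuple[str, Optional[str]]:
    """Return (wikitext with all <metadesc> tags removed, first description or None)."""
    match = METADESC.search(wikitext)
    description = match.group(1) if match else None
    return METADESC.sub("", wikitext), description

def page_props_rows(page_id: int, wikitext: str) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Rows shaped like page_props (pp_page, pp_propname, pp_value) to write on save."""
    text, description = extract_metadesc(wikitext)
    rows = [(page_id, "metadesc", description)] if description is not None else []
    return text, rows
```

If a page contains several such tags, only the first description is kept and all tags are stripped from the displayed text; whether later tags should instead override earlier ones is a policy choice.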

Automated description extraction

Similar to what Extension:Description2 does: strip out sitenotices and similar material and put the article lead into the page description database field, unless metadata overrides this description. There could be a short and a long version of the article lead: the short version would be one sentence, and the long version would be the first paragraph.
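The extraction described above might look roughly like the following sketch. It makes simplifying assumptions that are not from Extension:Description2: it works on raw wikitext rather than parsed HTML, treats templates as a stand-in for notices, handles only common link and bold/italic markup, and takes the first sentence as the short version. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

def strip_markup(wikitext: str) -> str:
    """Very rough cleanup: drops comments, templates (e.g. notices), category
    links and bold/italic quotes, and unwraps ordinary links to their label."""
    text = re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)
    previous = None
    while previous != text:  # repeat so nested templates are peeled from the inside
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    text = re.sub(r"\[\[Category:[^\]]*\]\]", "", text)
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    return re.sub(r"'{2,}", "", text)

def lead_descriptions(wikitext: str, override: Optional[str] = None) -> Tuple[str, str]:
    """Return (short, long) descriptions: first sentence and first paragraph."""
    if override:
        return override, override  # explicit metadata wins
    paragraphs = [p.strip() for p in strip_markup(wikitext).split("\n\n") if p.strip()]
    if not paragraphs:
        return "", ""
    long_desc = paragraphs[0]
    match = re.match(r"(.+?[.!?])(?:\s|$)", long_desc, re.DOTALL)
    short_desc = match.group(1) if match else long_desc
    return short_desc, long_desc
```

The sentence split is naive: an abbreviation such as "e.g." in the lead would cut the short version early, which is one argument for letting explicit metadata override the extracted text.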
