Document status: early draft
As the MediaWiki software has grown, more and more metadata have been added to pages:
- Interwiki links
- Copyright violation, under construction, protection, and so on
- External links
These data are not part of the page itself, but rather information about the page. Thus moving at least some of these data out of the page might be beneficial.
If metadata are moved out, the wiki markup of the article remains clearer, and several other advantages may follow:
- Inexperienced users can edit more freely (really? Or will separate metadata add more complications to what they have to learn?)
- Parsing the text is simpler (how? doesn't this add more complications, since the parser has to combine another set of data with the page text?)
- Recent changes can be identified by the type of change, so users not interested in certain types of metadata changes need not look at them. Certain types of changes, e.g. to categories, also become much easier to follow.
- Changes to metadata can be consistently checked before committing (e.g. categories must exist). Of course, this can be done using the current system too.
- Page description metadata can be used for a multitude of purposes, such as generating HTML heads or "Did you know" and featured article summaries, similar to what Extension:Blurb provides by parsing the page anew each time (see also User:Leucosticte/AutoBlurb for a more general discussion of the issues involved). Special:AllPages could have an option to list not only the page name but also the brief description. Extension:CategoryGallery could display not only all the images in a category but also the image descriptions.
- All metadata can be kept in a separate cache, reducing overall load. (Is this true?)
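As a rough illustration of the points about typed changes and pre-commit checks, the following Python sketch models metadata as a structured record. The `PageMetadata` class, its field names, and both helper functions are hypothetical and are not part of MediaWiki; the sketch only shows that structured metadata make such checks straightforward.

```python
from dataclasses import dataclass, field


@dataclass
class PageMetadata:
    # Hypothetical record; the field names are illustrative only.
    categories: list = field(default_factory=list)
    interwiki: list = field(default_factory=list)
    description: str = ""


def changed_fields(old, new):
    """Return the names of the metadata fields that differ.

    The result could serve as the 'type of change' shown in recent changes.
    """
    names = ("categories", "interwiki", "description")
    return [name for name in names if getattr(old, name) != getattr(new, name)]


def missing_categories(meta, existing_categories):
    """Return categories used in the metadata that do not exist yet."""
    existing = set(existing_categories)
    return [c for c in meta.categories if c not in existing]


old = PageMetadata(categories=["Physics"])
new = PageMetadata(categories=["Physics", "Optics"], description="About light.")

change_types = changed_fields(old, new)            # which kinds of metadata changed
problems = missing_categories(new, {"Physics"})    # non-empty list: reject or warn
```

A save handler could refuse the commit, or show a warning, whenever `problems` is non-empty.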
Disadvantages and open questions
- The separation of the current data into 'text data' and 'meta data' is somewhat arbitrary, thus potentially confusing.
- Maintaining an overall revision history becomes more complicated, since the article history is spread over 'text-data' and 'meta-data' histories. (See the section "metadata table option with page_metadata field" below for an explanation of how this would be done.)
Schema options for metadata storage
The most recent metadata are combined with the most recent article text to generate the complete article version. Options for storing metadata:
rev_metadata field option
The metadata can be stored in revision.rev_metadata.
- Downside: this could require storing the same voluminous metadata with every revision, which is wasteful if the metadata didn't change.
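The downside can be made concrete with a small in-memory SQLite sketch in Python. The `revision` table here is a heavily simplified stand-in for the real one: `rev_text` and the JSON encoding of the metadata are assumptions made for illustration, and `rev_metadata` is the proposed field, not an existing column.

```python
import sqlite3


def build_rev_metadata_demo():
    """Create a toy revision table where each row carries its own metadata."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY,"
        " rev_page INTEGER NOT NULL,"
        " rev_text TEXT NOT NULL,"      # simplified: assumed inline text column
        " rev_metadata TEXT NOT NULL)"  # proposed field holding the metadata
    )
    metadata = '{"categories": ["Physics"], "interwiki": ["de:Licht"]}'
    # Three text edits; the metadata never change but are stored three times.
    for text in (
        "Light.",
        "Light is radiation.",
        "Light is electromagnetic radiation.",
    ):
        db.execute(
            "INSERT INTO revision (rev_page, rev_text, rev_metadata)"
            " VALUES (1, ?, ?)",
            (text, metadata),
        )
    return db


db = build_rev_metadata_demo()
total, distinct = db.execute(
    "SELECT COUNT(rev_metadata), COUNT(DISTINCT rev_metadata) FROM revision"
).fetchone()
```

Here `total` counts the stored copies and `distinct` the number of different metadata values; the gap between the two is the redundancy this option incurs.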
metadata table option with page_metadata field
The metadata can be stored in a separate table, metadata, with metadata_id as its primary key. Each time the metadata are revised, a new metadata row is created, and page.page_metadata is updated with the new metadata_id.
- What about older versions of metadata and article data? How should those be combined? Probably when a revision is saved, revision.rev_metadata should be populated with the current page_metadata (i.e. the most recent metadata_id for that page). Then, when an old revision is viewed, the software knows which metadata to combine with the archived text.
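The scheme described above can be sketched with an in-memory SQLite database in Python. This is only one possible reading of the proposal: the `metadata_blob` and `rev_text` columns, the string encoding of the metadata, and the three helper functions are assumptions made for illustration, and the tables are reduced to the columns needed here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE metadata (
    metadata_id   INTEGER PRIMARY KEY,
    metadata_blob TEXT NOT NULL          -- assumed column for the metadata
);
CREATE TABLE page (
    page_id       INTEGER PRIMARY KEY,
    page_title    TEXT NOT NULL,
    page_metadata INTEGER                -- most recent metadata_id
);
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER NOT NULL,
    rev_text      TEXT NOT NULL,         -- simplified inline text
    rev_metadata  INTEGER                -- metadata_id current at save time
);
"""


def save_metadata(db, page_id, blob):
    """Add a new metadata row and point the page at it."""
    cur = db.execute("INSERT INTO metadata (metadata_blob) VALUES (?)", (blob,))
    db.execute(
        "UPDATE page SET page_metadata = ? WHERE page_id = ?",
        (cur.lastrowid, page_id),
    )
    return cur.lastrowid


def save_revision(db, page_id, text):
    """Save a text revision, snapshotting the page's current metadata_id."""
    (current,) = db.execute(
        "SELECT page_metadata FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    cur = db.execute(
        "INSERT INTO revision (rev_page, rev_text, rev_metadata) VALUES (?, ?, ?)",
        (page_id, text, current),
    )
    return cur.lastrowid


def view_revision(db, rev_id):
    """Return (text, metadata) as they were when the revision was saved."""
    return db.execute(
        "SELECT r.rev_text, m.metadata_blob FROM revision r"
        " LEFT JOIN metadata m ON m.metadata_id = r.rev_metadata"
        " WHERE r.rev_id = ?",
        (rev_id,),
    ).fetchone()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO page (page_id, page_title) VALUES (1, 'Foo')")

save_metadata(db, 1, "categories=Physics")
old_rev = save_revision(db, 1, "First draft.")
save_metadata(db, 1, "categories=Physics|Optics")
new_rev = save_revision(db, 1, "Second draft.")
```

Viewing `old_rev` pairs the archived text with the metadata that were current when it was saved, while unchanged metadata are stored only once no matter how many text revisions reference them.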
Separate pages for metadata
Have a namespace or subpages for page metadata (Metadata:Foo or Foo/metadata for the Foo article); see Extension:ExplicitDescription. Thus, the metadata would be stored as ordinary page revisions in the revision table. The revision.rev_metadata field could hold the rev_id (see Manual:Revision table#rev_id) of the metadata revision pertaining to that revision.
Alternatively, store metadata in Wikidata. Users are going to have to use Wikidata anyway for interlanguage links and such, so this is no big deal. An InstantData mechanism (analogous to InstantCommons) may need to be set up so that non-WMF wikis have local repositories for data pulled from Wikidata.
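The two naming conventions for a separate metadata page can be sketched as a small Python helper. The function name and the `style` parameter are hypothetical; only the Metadata:Foo and Foo/metadata patterns come from the proposal.

```python
def metadata_title(title, style="namespace"):
    """Map an article title to the title of its metadata page.

    style="namespace" gives "Metadata:Foo"; style="subpage" gives "Foo/metadata".
    """
    if style == "namespace":
        return "Metadata:" + title
    if style == "subpage":
        return title + "/metadata"
    raise ValueError("unknown style: " + repr(style))


namespace_form = metadata_title("Foo")
subpage_form = metadata_title("Foo", style="subpage")
```

The sketch does not handle titles that already carry a namespace prefix (for example a page in the Help namespace), which a real implementation of the namespace variant would have to decide how to map.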
User interface options
Additional edit screen inputboxes
Have separate edit screen inputboxes for various types of metadata; see Extension:Advanced Meta for an example of this.
<metadesc> PageDescription </metadesc>
Automated description extraction
Similar to what Extension:Description2 does: strip out sitenotices and such and put the article lead in the page description database field, unless there is metadata overriding this description. There could be short versions and long versions of the article lead. The short version would be one sentence. The long version would be the first paragraph.
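A minimal Python sketch of the short and long descriptions follows. It is not how Extension:Description2 works; it assumes the text has already been stripped of sitenotices and wiki markup, and it uses a deliberately naive sentence splitter that would break on abbreviations such as "e.g.". All function names are hypothetical.

```python
import re


def first_paragraph(text):
    """Return the first non-empty paragraph with whitespace normalized."""
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())
        if block:
            return block
    return ""


def first_sentence(paragraph):
    """Return the text up to the first '.', '!' or '?' followed by a space or the end."""
    match = re.search(r"(.+?[.!?])(?:\s|$)", paragraph)
    return match.group(1) if match else paragraph


def describe(text, override=None):
    """Build short and long descriptions; explicit metadata win if present."""
    if override:
        return {"short": override, "long": override}
    long_desc = first_paragraph(text)
    return {"short": first_sentence(long_desc), "long": long_desc}


lead = "Light is radiation. It travels fast.\n\nSecond paragraph."
auto = describe(lead)
manual = describe(lead, override="Custom description.")
```

The result of `describe` could be written to the page description field whenever the page is saved, so that consumers do not need to parse the page anew each time.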
- bugzilla:23016: Create new extension implementing article importance and quality (and/or other configurable attributes) as database fields (village pump (technical) discussion)
- bugzilla:53508: Move invisible page properties from the DOM (Document Object Model) to dedicated metadata