Proposed Database Schema Changes

From mediawiki.org

This page contains very old proposals for database schema changes. New schema change proposals should use the #schema-change project on Wikimedia's Phabricator to discuss and propose schema changes for more visibility

Archive: /October 2004, /August 2006, /November 2006

table for auto-suggest page title search[edit]

An OpenSearch standard allows browsers to get a list of suggestions when they type in the search box (upper right corner). A query SELECT title FROM page WHERE title='...%' ORDER BY title LIMIT 10 returns a list of titles that begin with what the user typed, but the list is alphabetical, and not very relevant when only a few characters have been typed.

To improve relevancy, the list can be sorted by how many other pages link to the title, for example: SELECT title FROM page WHERE title='...%' ORDER BY backlinkscount LIMIT 10.

  • Add a backlinkscount field to the page table, and create an index title, backlinkscount DESC.
    Pros: efficient space - just one more integer field. Cons: Counter update will happen in the same table.
  • Or, add a new table with title and backlinkscount fields, and also create an index title, backlinkscount DESC.
    Pros: When generating a backlinkcount, it might be faster to create a new table, and once complete, swap it in place. The replication is (IIUC) much more efficient. Cons: Duplication of titles. Pages might be out of date.

recentchanges table: add "Change size" field[edit]

  • Bug 6277 - add change in length (delta) to the recentchanges table
Tables: add rc_change signed integer field to recentchanges table
Indexes: need?

This field would contain the size of the change (delta) between two subsequent revisions (either positive or negative).

The field would allow for quick vandalism detection, as considerable number of vandals either erase large amount of content, or inject large quantities of bogus text.

Interwiki Links table[edit]

Bug: not filed

Currently, there is a table langlinks that contains interwiki links from all articles to the titles in all other languages of the same family. langlinks contains these fields:

+----------+-----------------+------+-----+---------+-------+
| Field    | Type            | Null | Key | Default | Extra |
+----------+-----------------+------+-----+---------+-------+
| ll_from  | int(8) unsigned | NO   | PRI | 0       |       |
| ll_lang  | varchar(10)     | NO   | PRI | NULL    |       |
| ll_title | varchar(255)    | NO   |     | NULL    |       |
+----------+-----------------+------+-----+---------+-------+
Drawbacks
The table does not contain two important types of interwiki links:
  • the inline langlink [[:he:Einstein]] is a link to a page in another language embedded in the text
  • the cross-family link [[meta:Langlinks table]] is a link to the page in another site (i.e. Meta)
Proposed solutions
  • Introduce a new table with identical structure as langlinks, containing both the inline language links and interwikies, but without the language links shown on the left hand side. It can later be determined if it's a language link by cross-referencing the interwiki table's iw_local field.
  • Introduce a "type" field into the langlinks table, determining the type of the link - 0=language, 1=inline language, 2=interwiki, etc

Index optimization[edit]

  • Bug 6440 - Change indexes on pagelinks, templatelinks, and imagelinks tables to considerably optimize several queries.
Changes
pagelinks: pl_namespace index: add pl_from
templatelinks: tl_namespace index: add tl_from
imagelinks: il_to index: add il_from

The additional field will allow multi-page queries to limit results and have an easy way to continue the query without using filesort.

New field in table page for languages with language variants[edit]

Bug: not filed

Add new field page_no_title_convert to table page to complete the implementation of the __NOTITLECONVERT__ magic word. This magic prevents conversion of page titles into different variants (e.g. cyrillic, latin, ...). The additional field enables a Parser::replaceLinkHolders() function to figure out if the link text can be converted to different variants, without fetching the article text (and checking for the magic word).

Full patch (of directory maintenance/) is available at svn diff -r16523:16525.

Changes
 ALTER TABLE /*$wgDBprefix*/page
 ADD page_no_title_convert bool NOT NULL default 0;


user_options to own table[edit]

The user_options field in the user table is not properly normalized. It is proper to have a user_options table with the fields useroption_id (PK), useroption_user (foreign key to user_id), useroption_key (option key), useroption_value (option value). This format would allow extension writers to do things like associate specific user options with a user and then search for all users with a specific option value. Space requirements might even be less because it would be possible to prune rows that correspond to default values, as opposed to storing a serialized array in the row as is currently done.

Archive table add primary key[edit]

  • Bug 15441 — Some tables lack unique or primary keys, may allow confusing duplicate data
  • Bug 39675 — Add primary key and log fields to archive table; perhaps also use original page_id when page is restored?

The archive table doesn't have a primary key currently; this is generally annoying, and means that we refer to deleted revisions with a (namespace,title,timestamp) tuple which isn't guaranteed to be unique. (Dupe timestamps can occur in a page's history due to funny bugs or history merging from two formerly separate pages; very quick consecutive operations may also produce dupes on second-resolution timestamps.)

Adding an auto-increment primary key should be fairly easy to do offline [via slaves + master switch].

Additional indexes may also be wise, for instance to pull items by name to maintain user names or other such operations.

Other fields that might be useful on archive could be:

  • Yes Done original page_id, to maintain across restores — ar_page_id
  • timestamp of deletion action
    • maybe the user and reason fields also?

See also Requests for comment/Page deletion.

See Also[edit]