Multi-Content Revisions/Schema Migration

From mediawiki.org

This page provides an overview of mappings from the old to the new schema.

Note that during the initial phase of the migration, the new tables are populated, but the old tables stay in use, and are not modified in any way. See also the migration plan.

slots

Rows in the slots table associate a revision row with any number of content rows. When migrating from the old schema, each revision has exactly one content row, which must be inserted before the corresponding slot row.

Field | From revision | From archive
slot_revision | rev_id | ar_rev_id. If ar_rev_id is not set, a new ID has to be allocated from revision.rev_id. This could be done in a separate step beforehand.
slot_role | role_id corresponding to role_name = "main". | role_id corresponding to role_name = "main".
slot_content | content_id | content_id
slot_inherited | 0. Can be 1 for null-revisions, but that's just nice-to-have. | 0. Can be 1 for null-revisions, but that's just nice-to-have.

content

Rows in the content table hold metadata about individual content objects. A single content row can be re-used by multiple slot rows, but for the initial import, we can assume that there is one slot per revision, and one content row per slot.

Field | From revision | From archive
content_id | auto-increment | auto-increment
content_size | rev_len. Should eventually be calculated via Content::getSize() if NULL. | ar_len. Should eventually be calculated via Content::getSize() if NULL.
content_sha1 | rev_sha1. Should eventually be calculated from the blob if NULL. | ar_sha1. Should eventually be calculated from the blob if NULL.
content_model | model_id corresponding to rev_content_model, or ContentHandler::getDefaultModelFor() if NULL. | model_id corresponding to ar_content_model, or ContentHandler::getDefaultModelFor() if NULL.
content_address | CONCAT( "tt:", rev_text_id ). Later, this can become CONCAT( "ex:", old_text ) where old_flags contains "external". | CONCAT( "tt:", ar_text_id ). If ar_text_id is not set, a row in the text table must be created. This could be done in a separate step beforehand.
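
As a rough illustration of the content mapping, the following Python sketch builds one content row from the old fields. The function name build_content_row, the model_ids dictionary and the default_model parameter are assumptions made for this example; default_model merely stands in for the result of ContentHandler::getDefaultModelFor(), and NULL sizes and hashes are passed through for later backfilling.

```python
from typing import Dict, Optional


def build_content_row(
    length: Optional[int],
    sha1: Optional[str],
    content_model: Optional[str],
    text_id: Optional[int],
    model_ids: Dict[str, int],
    default_model: str = "wikitext",
) -> dict:
    """Map rev_* or ar_* fields to a content row (content_id is auto-increment)."""
    if text_id is None:
        # Archive rows without ar_text_id need a text row created first.
        raise ValueError("text row must be created before migration")
    # A NULL content model falls back to the default model.
    model_name = content_model if content_model is not None else default_model
    return {
        "content_size": length,   # may be None; to be calculated later
        "content_sha1": sha1,     # may be None; to be calculated later
        "content_model": model_ids[model_name],
        "content_address": f"tt:{text_id}",
    }


# Hypothetical model name table contents.
models = {"wikitext": 1, "json": 2}
row = build_content_row(120, "abc123", None, 55, models)
```

Raising an error for a missing text ID reflects the assumption that text rows are created in a separate step before this mapping runs.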

Name Tables

All name tables (roles, models, formats, namespaces, etc.) are populated on demand: if the ID for a name is looked up during a write operation and no row is found, a row for that name should be inserted. Since the tables are small, their content can be cached in memory, so the database only needs to be checked when a name is not found in the cache.

Field | Type | Ref | Status | Comment
xxx_id | smallint unsigned | | keep |
xxx_name | varbinary(255) | | keep | The canonical (human readable) name. Normalization rules and character set restrictions may vary.

The table must allow lookups in both directions. Both columns are unique. The id is auto-incrementing.

The mapping defined by the table may be cached aggressively. The mapping should update automatically when attempting to look up the id for an unknown name.
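
The lookup behaviour described above could look roughly like the following Python sketch, which uses an in-memory SQLite database in place of the real storage layer. The NameTable class, its method names and the slot_roles table name are hypothetical and chosen only for this example; the column types are also simplified compared with the table definition above.

```python
import sqlite3


class NameTable:
    """Cached two-way mapping between names and auto-increment IDs."""

    def __init__(self, conn: sqlite3.Connection, table: str, prefix: str):
        self.conn = conn
        self.table = table
        self.id_col = f"{prefix}_id"
        self.name_col = f"{prefix}_name"
        self.ids_by_name: dict = {}

    def _reload(self) -> None:
        # The tables are small, so reloading the whole mapping is cheap.
        rows = self.conn.execute(
            f"SELECT {self.name_col}, {self.id_col} FROM {self.table}"
        )
        self.ids_by_name = dict(rows)

    def acquire_id(self, name: str) -> int:
        """Return the ID for a name, inserting a row if it is unknown."""
        if name in self.ids_by_name:
            return self.ids_by_name[name]
        self._reload()  # cache miss: check the database first
        if name not in self.ids_by_name:
            self.conn.execute(
                f"INSERT OR IGNORE INTO {self.table} ({self.name_col}) VALUES (?)",
                (name,),
            )
            self.conn.commit()
            self._reload()
        return self.ids_by_name[name]

    def get_name(self, row_id: int) -> str:
        """Reverse lookup from ID to name, served from the cache."""
        for name, known_id in self.ids_by_name.items():
            if known_id == row_id:
                return name
        self._reload()
        return {v: k for k, v in self.ids_by_name.items()}[row_id]


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE slot_roles ("
        "role_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "role_name TEXT NOT NULL UNIQUE)"
    )
    return conn


conn = make_demo_db()
roles = NameTable(conn, "slot_roles", "role")
main_id = roles.acquire_id("main")  # inserted on first use
```

The UNIQUE constraint together with INSERT OR IGNORE is one way to keep two writers from creating duplicate rows for the same name; a production implementation would need to consider transactions and cache invalidation across processes.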