Manual:Schema changes

This help page describes how to build schema change patches for MediaWiki core and its extensions. It is aimed at developers who need to change the database layout as part of their work.

Glossary

 * Schema - The current database layout of MediaWiki.
 * Schema change - An atomic part of a schema migration that is added through a commit. For example, "Adding table foo", "Dropping column bar from table baz" and so on.
 * Database management system (DBMS) - The underlying technology handling the MediaWiki database. MediaWiki core supports MySQL, SQLite and PostgreSQL. Extensions can add support for others.
 * Data definition language (DDL) - Syntax that defines the schema and schema changes (it can differ between DBMSes). For example, "ALTER TABLE" or "DROP COLUMN". DDL statements are saved as ".sql" files.
 * Database Abstraction Layer (DBAL) - The bridge between the DBMS-independent definitions of the schema and schema changes and the actual DDLs.

Overview
Each schema change needs to handle two parts. First, new installations need to get the new schema instead of the old one. Second, existing installations must be able to upgrade to the new schema. For the first part, we fix the schema DDL files (saved as tables.sql). For the second part, we provide "ALTER TABLE" patches and wire them into the updater logic.

We are in the middle of a migration from one dedicated DDL file per DBMS to a single abstracted schema. Depending on the table, you might change several raw SQL files, or change only one JSON file and generate the SQL files using a maintenance script.

Manual (deprecated)
In this method, which was used until 2020, you make a schema change as follows:


 * 1) Change tables.sql in two different places (maintenance/tables.sql for MySQL and maintenance/postgres/tables.sql for PostgreSQL).
 * 2) Make a schema change DDL file as the upgrade path of current installations for MySQL and put the file in maintenance/archives/.
    * If other DBMS types don't work with that patch, you need to make a dedicated patch for them. For example, SQLite supports only a limited form of ALTER TABLE, so for many changes you need to make a temporary table, copy the data, drop the old table and rename the new table to the old name.
 * 3) Wire these DDL files (from step 2) into MysqlUpdater, SqliteUpdater and PostgresUpdater.
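
The temporary-table procedure described above for SQLite can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the patch format MediaWiki uses. The table baz and its columns (baz_id, baz_name, bar) are hypothetical names borrowed from the glossary example "Dropping column bar from table baz".

```python
import sqlite3


def drop_column_by_rebuild(conn):
    """Drop the hypothetical column `bar` from table `baz` by rebuilding the table."""
    # 1. Make a temporary table with the new layout (without `bar`).
    conn.execute(
        "CREATE TABLE baz_tmp (baz_id INTEGER PRIMARY KEY, baz_name BLOB NOT NULL)"
    )
    # 2. Copy the data that survives the change.
    conn.execute(
        "INSERT INTO baz_tmp (baz_id, baz_name) SELECT baz_id, baz_name FROM baz"
    )
    # 3. Drop the old table.
    conn.execute("DROP TABLE baz")
    # 4. Rename the new table to the old name.
    conn.execute("ALTER TABLE baz_tmp RENAME TO baz")
    conn.commit()


# Set up an in-memory database with the old layout and one row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baz (baz_id INTEGER PRIMARY KEY, baz_name BLOB NOT NULL, bar INTEGER)"
)
conn.execute("INSERT INTO baz (baz_name, bar) VALUES (?, ?)", (b"example", 1))
conn.commit()

drop_column_by_rebuild(conn)

# Inspect the resulting layout and data.
columns = [row[1] for row in conn.execute("PRAGMA table_info(baz)")]
rows = conn.execute("SELECT baz_id, baz_name FROM baz").fetchall()
print(columns)
print(rows)
```

A real patch would also have to recreate any indexes on the table, since they are dropped together with the old table.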

Examples

 * Dropping a column - Note that we used to support five DBMSes instead of three
 * Changing indexes
 * Adding a new table

Automatically generated
You can find the abstract schema for all of MediaWiki core's tables in "maintenance/tables.json". This abstraction uses the [ https://github.com/doctrine/dbal Doctrine DBAL library] to generate the DDL files.


 * 1) Change the tables.json structure.
 * 2) Run the maintenance script to generate the three DDL files.
 * 3) Create an abstract schema change .json file (see below) and put it in the maintenance/abstractSchemaChanges/ directory.
 * 4) Build the schema patches using the maintenance script.
 * 5) Wire the generated patches into MysqlUpdater, SqliteUpdater and PostgresUpdater.
 * 6) Do not forget to add your changes and the automatically generated DDL files to git when making the patch.

Example patches

 * Renaming four indexes in logging table

Abstract schema change
To make a schema change, create a JSON file containing snapshots of the table's abstract schema before and after the change (one schema change per table, please). Then run a maintenance script in a similar manner; it diffs the two snapshots and automatically generates the schema change DDL files.

Example abstract schema change
The two tables are identical except that the type of "actor_user" has changed from "integer" to "bigint". The reason for diffing snapshots instead of abstracting the change itself is that SQLite does not support ALTER TABLE in most cases, so DBAL needs to know the full schema to build a schema change DDL file that uses temporary tables.
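
The before/after idea can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the way Doctrine DBAL or the maintenance script computes the diff. The key names ("comment", "before", "after", "columns", "options", "pk"), the trimmed column list and the helper changed_columns are assumptions made for this sketch; check the existing files in maintenance/abstractSchemaChanges/ for the authoritative layout.

```python
import copy
import json

# Hypothetical, trimmed snapshot of the table before the change.
before = {
    "name": "actor",
    "columns": [
        {
            "name": "actor_id",
            "type": "bigint",
            "options": {"unsigned": True, "notnull": True, "autoincrement": True},
        },
        {
            "name": "actor_user",
            "type": "integer",
            "options": {"unsigned": True, "notnull": False},
        },
    ],
    "indexes": [],
    "pk": ["actor_id"],
}

# The "after" snapshot is a full copy with only the column type changed.
after = copy.deepcopy(before)
after["columns"][1]["type"] = "bigint"


def changed_columns(before, after):
    """Return {column name: (old type, new type)} for columns whose definition differs."""
    old = {col["name"]: col for col in before["columns"]}
    new = {col["name"]: col for col in after["columns"]}
    return {
        name: (old[name]["type"], new[name]["type"])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }


# One file holds both snapshots; the comment text is illustrative.
schema_change = {
    "comment": "Change actor_user from integer to bigint",
    "before": before,
    "after": after,
}
print(json.dumps(schema_change, indent=4))

diff = changed_columns(before, after)
print(diff)
```

Because both snapshots carry the whole table, a generator has enough information to rebuild the table from scratch on a DBMS that cannot alter the column in place.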

Best practices in choosing the data type

 * For timestamps, use the mwtimestamp data type (T42626).
 * Instead of VARCHAR or CHAR, use VARBINARY or BINARY (otherwise you have to deal with encodings in databases).
 * Unless you actually need negative values, use UNSIGNED for any integer type to double its positive capacity.
 * Using ENUM is highly discouraged (T119173).
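
The capacity gain from UNSIGNED can be checked with a little arithmetic. The sketch below assumes a 32-bit integer column stored in two's complement; the helper integer_range is a hypothetical name used only for this illustration.

```python
def integer_range(bits, unsigned):
    """Return the (minimum, maximum) value of an integer column of the given width."""
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Largest value a 32-bit column can hold, signed versus unsigned.
signed_max = integer_range(32, unsigned=False)[1]
unsigned_max = integer_range(32, unsigned=True)[1]
print(signed_max, unsigned_max)
```

The unsigned maximum is twice the signed maximum plus one, because the bit otherwise used for the sign is available for the value.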