Manual:Schema changes

From mediawiki.org

This help page describes how to build schema change patches for MediaWiki core and its extensions. It is intended for developers who need to change the database layout as part of their work.

Glossary

  • Schema: The current database layout of MediaWiki.
  • Schema change: An atomic part of a schema migration that is added through a commit. For example, "Adding table foo", "Dropping column bar from table baz", and so on.
  • Database management system (DBMS): The underlying technology handling the MediaWiki database. MediaWiki core supports MySQL, SQLite and PostgreSQL. Extensions can add support for more.
  • Data definition language (DDL): Syntax that defines the schema and schema changes (it can differ between DBMSes). For example, "ALTER TABLE" or "DROP COLUMN". DDL statements are saved as ".sql" files.
  • Database Abstraction Layer (DBAL): The bridge between the DBMS-independent definitions of the database schema and schema changes and the actual DDL.

Overview

Each schema change has to handle two cases. First, new installations need to get the new schema instead of the old one. Second, existing installations must be able to upgrade to the new schema. For the first case, we update the schema DDL files (saved as tables.sql); for the second, we provide "ALTER TABLE" patches and wire them into the updater logic.

We are in the middle of a migration from one dedicated DDL file per DBMS to a single abstract schema. Depending on the table, you might have to change several raw SQL files, or change only one JSON file and generate the SQL files with a maintenance script.

Manual

In this method, which was used until 2020, making a schema change involves the following steps:

  1. Change tables.sql in two different places (maintenance/tables.sql for MySQL and maintenance/postgres/tables.sql for PostgreSQL).
  2. Make a schema change DDL file for MySQL as the upgrade path for existing installations and put it in maintenance/archives/.
    • If the patch does not work with other DBMS types, you need to make a dedicated patch for them. For example, SQLite supports only a limited subset of ALTER TABLE, so in most cases you need to make a temporary table, copy the data, drop the old table and rename the new table to the old name.
  3. Wire these DDL files (from step 2) into MysqlUpdater, SqliteUpdater and PostgresUpdater.
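
The SQLite workaround from step 2 (temporary table, copy, drop, rename) can be sketched as follows. This is a minimal illustration, not a real MediaWiki patch: the table "baz" and its columns are hypothetical names borrowed from the glossary example "Dropping column bar from table baz", and the Python wrapper exists only to make the SQL runnable against an in-memory database.

```python
import sqlite3

# Hypothetical table and column names (not real MediaWiki tables).
# The four statements mirror the steps described above:
# create a temporary table, copy the data, drop the old table, rename.
REBUILD_SQL = """
CREATE TABLE baz_tmp (
  baz_id INTEGER PRIMARY KEY NOT NULL,
  baz_name TEXT NOT NULL
);
INSERT INTO baz_tmp (baz_id, baz_name)
  SELECT baz_id, baz_name FROM baz;
DROP TABLE baz;
ALTER TABLE baz_tmp RENAME TO baz;
"""


def drop_bar_column(conn: sqlite3.Connection) -> None:
    """Rebuild "baz" without the "bar" column inside one transaction."""
    conn.executescript("BEGIN;" + REBUILD_SQL + "COMMIT;")


# Set up the old layout with some data, then apply the rebuild.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE baz (
  baz_id INTEGER PRIMARY KEY NOT NULL,
  baz_name TEXT NOT NULL,
  bar INTEGER
);
INSERT INTO baz (baz_name, bar) VALUES ('first', 1), ('second', 2);
""")
drop_bar_column(conn)

columns = [row[1] for row in conn.execute("PRAGMA table_info(baz)")]
rows = conn.execute("SELECT baz_id, baz_name FROM baz ORDER BY baz_id").fetchall()
print(columns)  # ['baz_id', 'baz_name']
print(rows)     # [(1, 'first'), (2, 'second')]
```

In an actual patch, only the SQL statements would go into the SQLite-specific .sql file; indexes on the old table would also have to be recreated on the new one, which this sketch leaves out.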

Examples

Automatically generated

Work is under way to improve this process, and the first step is to overhaul the schemas. The abstract schema lives in "maintenance/tables.json". This file does not contain all tables yet; for tables that are not yet abstracted, you need to follow the manual method described above. The abstraction uses the Doctrine DBAL library to generate the DDL files.

If the table exists in "tables.json":

  1. Change the table's structure in tables.json.
  2. Run the maintenance script to generate the three DDL files:
    php maintenance/generateSchemaSql.php --json maintenance/tables.json --sql maintenance/tables-generated.sql --type=mysql
    php maintenance/generateSchemaSql.php --json maintenance/tables.json --sql maintenance/sqlite/tables-generated.sql --type=sqlite
    php maintenance/generateSchemaSql.php --json maintenance/tables.json --sql maintenance/postgres/tables-generated.sql --type=postgres
    
  3. Create an abstract schema change .json file (see below) and put it in the maintenance/abstractSchemaChanges/ directory.
  4. Build the schema patches using the maintenance script, for example:
    php maintenance/generateSchemaChangeSql.php --json maintenance/abstractSchemaChanges/patch-logging-rename-indexes.json --sql maintenance/archives/patch-logging-rename-indexes.sql --type=mysql
    php maintenance/generateSchemaChangeSql.php --json maintenance/abstractSchemaChanges/patch-logging-rename-indexes.json --sql maintenance/sqlite/archives/patch-logging-rename-indexes.sql --type=sqlite
    php maintenance/generateSchemaChangeSql.php --json maintenance/abstractSchemaChanges/patch-logging-rename-indexes.json --sql maintenance/postgres/archives/patch-logging-rename-indexes.sql --type=postgres
    
  5. Add them to MysqlUpdater, SqliteUpdater and PostgresUpdater.
  6. Do not forget to add both your changes and the automatically generated DDL files to git when making the patch.
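
Because step 4 repeats the same command once per DBMS type, it can help to generate the three command lines from the patch name. The sketch below only builds and prints the commands, using the script name, flags and directories exactly as they appear in the example above; the helper name schema_change_commands is made up for illustration and is not part of MediaWiki.

```python
# Output directory for each DBMS type, as used in the example commands above.
ARCHIVE_DIRS = {
    "mysql": "maintenance/archives",
    "sqlite": "maintenance/sqlite/archives",
    "postgres": "maintenance/postgres/archives",
}


def schema_change_commands(patch_name):
    """Return one generateSchemaChangeSql.php command (as an argument list) per DBMS type."""
    json_path = f"maintenance/abstractSchemaChanges/{patch_name}.json"
    return [
        [
            "php", "maintenance/generateSchemaChangeSql.php",
            "--json", json_path,
            "--sql", f"{sql_dir}/{patch_name}.sql",
            f"--type={db_type}",
        ]
        for db_type, sql_dir in ARCHIVE_DIRS.items()
    ]


for command in schema_change_commands("patch-logging-rename-indexes"):
    print(" ".join(command))
```

To actually run the commands from the MediaWiki root directory, each argument list could be passed to subprocess.run(command, check=True) instead of being printed.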

Example patches

Example abstract schema

[
	{
		"name": "actor",
		"comment": "The \"actor\" table associates user names or IP addresses with integers for the benefit of other tables that need to refer to either logged-in or logged-out users. If something can only ever be done by logged-in users, it can refer to the user table directly.",
		"columns": [
			{
				"name": "actor_id",
				"comment": "Unique ID to identify each actor",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": true, "autoincrement": true }
			},
			{
				"name": "actor_user",
				"comment": "Key to user.user_id, or NULL for anonymous edits",
				"type": "integer",
				"options": { "unsigned": true, "notnull": false }
			},
			{
				"name": "actor_name",
				"comment": "Text username or IP address",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [
			{ "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
			{ "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
		],
		"pk": [ "actor_id" ]
	}
]
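
When editing a table definition like the one above by hand, a quick consistency check can catch typos before the generator script runs. The following is a rough sketch that assumes only the keys visible in the example ("columns", "indexes", "pk"); the function find_unknown_columns is hypothetical and not part of MediaWiki or Doctrine DBAL.

```python
import json


def find_unknown_columns(table):
    """List index and primary key entries that refer to columns the table does not define."""
    known = {column["name"] for column in table["columns"]}
    problems = []
    for index in table.get("indexes", []):
        for name in index["columns"]:
            if name not in known:
                problems.append(f"index {index['name']}: unknown column {name}")
    for name in table.get("pk", []):
        if name not in known:
            problems.append(f"pk: unknown column {name}")
    return problems


# A trimmed copy of the "actor" example, with one deliberately broken index.
tables = json.loads("""
[
  {
    "name": "actor",
    "columns": [
      { "name": "actor_id", "type": "bigint", "options": { "notnull": true } },
      { "name": "actor_user", "type": "integer", "options": { "notnull": false } }
    ],
    "indexes": [
      { "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
      { "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
    ],
    "pk": [ "actor_id" ]
  }
]
""")

for table in tables:
    for problem in find_unknown_columns(table):
        print(f"{table['name']}: {problem}")
# prints "actor: index actor_name: unknown column actor_name"
```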

Notes

Common Doctrine DBAL types and their equivalents

Doctrine DBAL/Abstract schema | MySQL                                             | SQLite       | Postgres
------------------------------+---------------------------------------------------+--------------+------------------------------------
bigint                        | BIGINT                                            | BIGINT       | BIGSERIAL (if autoincrement)/BIGINT
binary                        | BINARY                                            | BLOB         | BYTEA
blob                          | TINYBLOB/BLOB/MEDIUMBLOB/LONGBLOB (based on size) | BLOB         | BYTEA
mwtinyint                     | TINYINT(1)                                        | SMALLINT     | SMALLINT
integer                       | INT                                               | INTEGER      | INT
smallint                      | SMALLINT                                          | SMALLINT     | SMALLINT
string                        | CHAR/VARCHAR                                      | CHAR/VARCHAR | CHAR/VARCHAR
text                          | TINYTEXT/TEXT/MEDIUMTEXT/LONGTEXT (based on size) | CLOB         | TEXT

Abstract schema change

To make a schema change, create a JSON file containing snapshots of the table's abstract schema before and after the change (one schema change per table, please). Then run the maintenance script in a similar manner; it diffs the two snapshots and automatically generates the schema change DDL files.

Example abstract schema change

{
	"before": {
		"name": "actor",
		"columns": [
			{
				"name": "actor_id",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": true, "autoincrement": true }
			},
			{
				"name": "actor_user",
				"type": "integer",
				"options": { "unsigned": true, "notnull": false }
			},
			{
				"name": "actor_name",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [
			{ "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
			{ "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
		],
		"pk": [ "actor_id" ]
	},
	"after": {
		"name": "actor",
		"columns": [
			{
				"name": "actor_id",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": true, "autoincrement": true }
			},
			{
				"name": "actor_user",
				"type": "bigint",
				"options": { "unsigned": true, "notnull": false }
			},
			{
				"name": "actor_name",
				"type": "binary",
				"options": { "length": 255, "notnull": true }
			}
		],
		"indexes": [
			{ "name": "actor_user", "columns": [ "actor_user" ], "unique": true },
			{ "name": "actor_name", "columns": [ "actor_name" ], "unique": true }
		],
		"pk": [ "actor_id" ]
	}
}

The two snapshots are identical except that the type of "actor_user" has changed from "integer" to "bigint". The reason for diffing two snapshots instead of describing the change itself is that SQLite does not support ALTER TABLE in most cases, so DBAL needs to know the full schema in order to build a schema change DDL file that uses temporary tables.
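
Since the "before" and "after" snapshots are mostly identical, it is easy to lose track of what a change file actually modifies. The sketch below compares the column lists of the two snapshots and reports the differences. It is only a review aid under the assumption that the file follows the "before"/"after" layout shown above; it is not how Doctrine DBAL computes its diff, and the function name summarize_column_changes is invented for this example.

```python
def summarize_column_changes(change):
    """Compare the "before" and "after" column definitions of a schema change, keyed by column name."""
    before = {col["name"]: col for col in change["before"]["columns"]}
    after = {col["name"]: col for col in change["after"]["columns"]}
    return {
        "added": sorted(after.keys() - before.keys()),
        "dropped": sorted(before.keys() - after.keys()),
        "modified": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }


# A trimmed version of the "actor" schema change from the example above.
change = {
    "before": {
        "name": "actor",
        "columns": [
            {"name": "actor_id", "type": "bigint",
             "options": {"unsigned": True, "notnull": True, "autoincrement": True}},
            {"name": "actor_user", "type": "integer",
             "options": {"unsigned": True, "notnull": False}},
        ],
    },
    "after": {
        "name": "actor",
        "columns": [
            {"name": "actor_id", "type": "bigint",
             "options": {"unsigned": True, "notnull": True, "autoincrement": True}},
            {"name": "actor_user", "type": "bigint",
             "options": {"unsigned": True, "notnull": False}},
        ],
    },
}

print(summarize_column_changes(change))
# {'added': [], 'dropped': [], 'modified': ['actor_user']}
```

For a real change file, the dictionary would be loaded with json.load from the file in maintenance/abstractSchemaChanges/ instead of being written inline.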