Requests for comment/Abstract table definitions

MediaWiki supports multiple database engines. Our query system is basically entirely abstract and works on any database engine.

Our table creation and updating code on the other hand is a mess. We write incompatible raw sql forcing us to write the same thing multiple times in different ways for different database engines. As a result we have a bad track of it being hard to run MediaWiki on other database engines and because extensions that define their own tables also have to write custom extra code to support other databases very few extensions with custom data tables support anything besides MySQL.

To fix that this is a proposal for an abstract language for defining tables, indexes, alters, etc... that; Can be sanely written and read. Is not raw sql (we'll probably PEG parse it) and can be used to build the database for any database engine we support. And can be used by extensions too to define their database pieces abstractly so the extension will work in other databases. It also uses abstractly defined types instead of relying on mysql types so that in the future we can start using data types that actually suit the database engine best (eg: Being able to use BOOL in some database engines instead of an INT type just because we use a TINYINT in MySQL).

Examples
table hitcounter { @ENGINE(HEAP) @MAX_ROWS(25000)

hc_id uint }

table logging(log_) { log_id rowid, log_type string(32) default(''), log_action string(32) default(''), log_timestamp timestamptz default('19700101000000'), log_user reference(user.user_id) default(0), log_user_text string(255) default(''), log_namespace int default(0), log_title string(255) default(''), log_page uint nullable, log_comment unicode(255) default(''), log_params string, log_deleted uint(tiny) default(0), }

table page(page_) { id rowid, namespace int, title string(255), restrictions string(tiny), counter uint(big) default(0), is_redirect bool default(false), is_new bool default(false), random udouble, touched timestamptz, latest reference(revision.rev_id), len int, }

table revision(rev_) { @MAX_ROWS(10000000) @AVG_ROW_LENGTH(1024)

id rowid, page reference(page.page_id), text_id uint, comment string(tiny), user reference(user.user_id), user_text string(255), timestamp timestamptz, minor_edit bool(old), deleted bool(old), len int nullable, parent_id uint nullable, sha1 string(32), }

table text(old_) { id rowid, text string(medium), flags string(tiny), }

table pagelinks(pl_) { from reference(page.page_id) default(0), namespace int, title string(255) default(''), }

Syntax
table tablename { ... } table "rawtablename" { ... } table table(t_) { # @AT rules define table options that end up at the end of the CREATE TABLE definition # normally without these the table options in LocalSettings.php are just used # Most databases besides MySQL just ignore these @ENGINE(MyISAM)
 * 1) A table is defined with one of these blocks
 * 2) tablename will be prefixed and converted with tableName
 * 1) A table name can also be specified in quotes to stop it from being prefixed and converted
 * 1) A column name prefix can be defined by doing this you can use things like `id` instead of `rev_id`

# A column definition always starts with a column name followed by a type # rowid is a special type referring to an auto-incremented unsigned integer set as the primary key # rowid does not need to be followed by "primary" id rowid, # Defaults are specified by a default('') data string(255) default(''), # All columns are NOT NULL by default, you must specify nullable to remove the NOT NULL constraint extra string(32) nullable, # Simple single column indexes can be defined with "primary", "unique" and "indexed" key string(20) unique, } type typename ('...', '...');
 * 1) A named enumerated type. This is here so pgsql can use CREATE TYPE in some cases. Other databases like MySQL
 * 2) just remember the values and use an ENUM.

ToDo

 * For REFERENCES we need to figure out how to declare all the different things our PostgreSQL does after it:
 * The pgsql code uses a lot of strange code and procedures. We need to pick out what we want to expose abstractly and what we leave as custom pgsql code.
 * indexes
 * Alter table
 * The pgsql code uses a lot of strange code and procedures. We need to pick out what we want to expose abstractly and what we leave as custom pgsql code.
 * indexes
 * Alter table
 * Alter table