Requests for comment/Abstract table definitions

MediaWiki supports multiple database engines. Our query system is basically entirely abstract and works on any database engine.

Our table creation and updating code on the other hand is a mess. We write incompatible raw sql forcing us to write the same thing multiple times in different ways for different database engines. As a result we have a bad track of it being hard to run MediaWiki on other database engines and because extensions that define their own tables also have to write custom extra code to support other databases very few extensions with custom data tables support anything besides MySQL.

To fix that this is a proposal for an abstract language for defining tables, indexes, alters, etc... that; Can be sanely written and read. Is not raw sql (we'll probably PEG parse it) and can be used to build the database for any database engine we support. And can be used by extensions too to define their database pieces abstractly so the extension will work in other databases. It also uses abstractly defined types instead of relying on mysql types so that in the future we can start using data types that actually suit the database engine best (eg: Being able to use BOOL in some database engines instead of an INT type just because we use a TINYINT in MySQL).

Examples
table externallinks(el) { from reference(page.page_id) default(0), to string, index string,

@index from (from, to(40)); @index to (to(60), from); @index index (index(60)); }

table hitcounter { @ENGINE(HEAP) @MAX_ROWS(25000)

hc_id uint }

table logging(log) { log_id rowid, log_type string(32) default(''), log_action string(32) default(''), log_timestamp timestamptz default('19700101000000'), log_user reference(user.user_id) default(0), log_user_text string(255) default(''), log_namespace int default(0), log_title string(255) default(''), log_page uint nullable, log_comment unicode(255) default(''), log_params string, log_deleted uint(tiny) default(0),

@index "type_time" (log_type, log_timestamp); @index "user_time" (log_user, log_timestamp); @index "page_time" (log_namespace, log_title, log_timestamp); @index "times" (log_timestamp); @index "log_user_type_time" (log_user, log_type, log_timestamp); @index "log_page_id_time" (log_page,log_timestamp); @index "type_action" (log_type, log_action, log_timestamp); }

table page(page) { id rowid, namespace int, title string(255), restrictions string(tiny), counter uint(big) default(0), is_redirect bool default(false), is_new bool default(false), random udouble, touched timestamptz, latest reference(revision.rev_id), len int,

@unique "name_title" (page_namespace,page_title); @index "page_random" (page_random); @index "page_len" (page_len); @index "page_redirect_namespace_len" (page_is_redirect, page_namespace, page_len); }

table revision(rev) { @MAX_ROWS(10000000) @AVG_ROW_LENGTH(1024)

id rowid, page reference(page.page_id), text_id uint, comment string(tiny), user reference(user.user_id), user_text string(255), timestamp timestamptz, minor_edit bool(old), deleted bool(old), len int nullable, parent_id uint nullable, sha1 string(32),

@unique "rev_page_id" (page, id); @index "rev_timestamp" (timestamp); @index "page_timestamp" (page, timestamp); @index "user_timestamp" (user, timestamp); @index "usertext_timestamp" (user_text, timestamp); @index "page_user_timestamp" (page, user, timestamp); }

table text(old) { id rowid, text string(medium), flags string(tiny), }

table pagelinks(pl) { from reference(page.page_id) default(0), namespace int, title string(255) default(''),

@unique from (from, namespace, title); @unique namespace (namespace,title,from); }

Syntax
table tablename { ... } table "rawtablename" { ... } table table(t_) { # @AT rules define table options that end up at the end of the CREATE TABLE definition # normally without these the table options in LocalSettings.php are just used # Most databases besides MySQL just ignore these @ENGINE(MyISAM)
 * 1) A table is defined with one of these blocks
 * 2) tablename will be prefixed and converted with tableName
 * 1) A table name can also be specified in quotes to stop it from being prefixed and converted
 * 1) A column name prefix can be defined by doing this you can use things like `id` instead of `rev_id`

# A column definition always starts with a column name followed by a type # rowid is a special type referring to an auto-incremented unsigned integer set as the primary key # rowid does not need to be followed by "primary" id rowid, # Defaults are specified by a default('') data string(255) default(''), # All columns are NOT NULL by default, you must specify nullable to remove the NOT NULL constraint extra string(32) nullable, # Simple single column indexes can be defined with "primary", "unique" and "indexed" (- maybe we'll drop unique and indexed) key string(20) unique,

# Unique indexes can be defined with @unique. @unique key (key); # Normal indexes can be defined with @index. @index id_data (id, data(32)); # Normally the index name is prefixed with the prefix but you can use quotes just like with table names to stop that @index "table_idextra" (id, extra); } type typename ('...', '...'); @index[tablename] "t_id" ("t_id");
 * 1) A named enumerated type. This is here so pgsql can use CREATE TYPE in some cases. Other databases like MySQL
 * 2) just remember the values and use an ENUM.
 * 1) An index defined outside of a table definition.
 * 2) Becase there is no prefix here quotes must be used to avoid ambiguity.
 * 3) We could drop the @, do something like `index name on tablename (...);` but on second thought using the same syntax for
 * 4) indexes in and outside of table definitions with a very simple [tablename] for outside makes this nice and simple to understand.

alter table foo(f) { # Drop the id column -id, # Modify data's type ~data bytes, # Add a new extra column after data +extra bytes after(data), # Rename ts to timestamp and also make it nullable ~ts -> time timestamp nullable,
 * 1) Alter a table in many ways

# Drop the key index -@index key; # Make key the primary key +@primary (key); # Add an index on extra +@index extra (extra(80)); }

update foo { f_timestamp = NULL } where { f_timestamp = '0' };
 * 1) Occasionally migrations need to update or insert a bit of data for that we have some extra syntax
 * 2) These just become simple $db->update and $db->insert calls.

insert[ignore] foo { key = 'foo', data = '...' };
 * 1) A single key based insert uses the same ='s syntax as an update

insert foo (key, data) values (	('bar', '...'),	('baz', '...') );
 * 1) If you have a lot of data you can use this syntax

rename table foo -> bar;
 * 1) Rename a table

drop table bar;
 * 1) Drop a table

ToDo

 * For REFERENCES we need to figure out how to declare all the different things our PostgreSQL does after it:
 * The pgsql code uses a lot of strange code and procedures. We need to pick out what we want to expose abstractly and what we leave as custom pgsql code.
 * indexes
 * Alter table
 * Find a better way to handle the schema oddities we have than creating special types just for them. ie: Stick with a standard set of types but add some overrides for oddities.
 * Rethink the string/unicode types. Rather than the idea of varbinary vs. varchar these types were based on, there are actually strong differences between varbinary, varchar binary, and varchar.
 * Instead of debating between "@index ..." and something like "index name on tablename ..." perhaps new indexes can be done inside of an alter table block even though adding an index is done with a different sql command.
 * Alter table
 * Find a better way to handle the schema oddities we have than creating special types just for them. ie: Stick with a standard set of types but add some overrides for oddities.
 * Rethink the string/unicode types. Rather than the idea of varbinary vs. varchar these types were based on, there are actually strong differences between varbinary, varchar binary, and varchar.
 * Instead of debating between "@index ..." and something like "index name on tablename ..." perhaps new indexes can be done inside of an alter table block even though adding an index is done with a different sql command.