Manual:Database access

This article provides an overview of database access and general database issues in MediaWiki.

Database layout
For information about the MediaWiki database layout, such as a description of the tables and their contents, please see Manual:Database layout and maintenance/tables.sql.

Database Abstraction Layer
MediaWiki provides a database abstraction layer. Unless you are working on the abstraction layer, you should never directly call PHP's database functions (such as  or  .)

The abstraction layer is accessed by using the  function. For more detailed documentation on, see the entry on   in the GlobalFunctions.php file reference.

Typically,  is called with a single parameter, which can the   (for read queries) or   (for write queries and read queries that need to have absolutely newest information) constant. The distinction between master and slave is important in a multi-database environment, such as Wikimedia. This function will return you a  object that you can use to access the database. See the section below for what you can do with this Database object.

To make a read query, something like this usually suffices:

""

For a write query, use something like:

""

We use the convention $dbr for read and $dbw for write to help you keep track of whether the database object is a slave (read-only) or a master (read/write). If you write to a slave, the world will explode. Or to be precise, a subsequent write query which succeeded on the master may fail when replicated to the slave due to a unique key collision. Replication on the slave will stop and it may take hours to repair the database and get it back online. Setting read_only in my.cnf on the slave will avoid this scenario, but given the dire consequences, we prefer to have as many checks as possible.

Wrapper functions
We provide a query function for raw SQL, but the wrapper functions like select and insert are usually more convenient. They can take care of things like table prefixes and escaping for you under some circumstances. If you really need to make your own SQL, please read the documentation for tableName and addQuotes. You will need both of them.

Another important reason to use the high level methods rather than constructing your own queries is to ensure that your code will run properly regardless of the database type. Currently there is MySQL and reasonable support for SQLite and PostgreSQL, also somewhat limited for Oracle and DB2, but there could be other databases in the future such as MSSQL or Firebird.

In the following, the available wrapper functions are listed. For a detailed description of the parameters of the wrapper functions, please refer to class DatabaseBase's docs. Particularly see  for an explanation of the ,  ,  ,  ,  , and   parameters that are used by many of the other wrapper functions.

Wrapper function: select
The select function provides the MediaWiki interface for a SELECT statement. The components of the SELECT statement are coded as parameters of the select function. An example is This example corresponds to the query Arguments are either single values (such as 'category' and 'cat_pages > 0') or arrays, if more than one value is passed for an argument position (such as array('cat_pages > 0', $myNextCond)). If you pass in strings, you must manually use DatabaseBase::addQuotes on your values as you construct the string, as the wrapper will not do this for you. The array construction for $conds is somewhat limited; it can only do equality relationships (i.e. WHERE key = 'value').

Basic query optimization
MediaWiki developers who need to write DB queries should have some understanding of databases and the performance issues associated with them. Patches containing unacceptably slow features will not be accepted. Unindexed queries are generally not welcome in MediaWiki, except in special pages derived from QueryPage. It's a common pitfall for new developers to submit code containing SQL queries which examine huge numbers of rows. Remember that COUNT(*) is O(N), counting rows in a table is like counting beans in a bucket.

Replication
The largest installation of MediaWiki, Wikimedia, uses a large set of slave MySQL servers replicating writes made to a master MySQL server. It is important to understand the issues associated with this setup if you want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task depends on whether or not replication is in use. Due to our unabashed Wikipedia-centrism, we often just use the replication-friendly version, but if you like, you can use $wgLoadBalancer->getServerCount > 1 to check to see if replication is in use.

Lag
Lag primarily occurs when large write queries are sent to the master. Writes on the master are executed in parallel, but they are executed in serial when they are replicated to the slaves. The master writes the query to the binlog when the transaction is committed. The slaves poll the binlog and start executing the query as soon as it appears. They can service reads while they are performing a write query, but will not read anything more from the binlog and thus will perform no more writes. This means that if the write query runs for a long time, the slaves will lag behind the master for the time it takes for the write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will stop sending reads to a slave when it is lagged by more than 30 seconds. If the load ratios are set incorrectly, or if there is too much load generally, this may lead to a slave permanently hovering around 30 seconds lag.

If all slaves are lagged by more than 30 seconds, MediaWiki will stop writing to the database. All edits and other write operations will be refused, with an error returned to the user. This gives the slaves a chance to catch up. Before we had this mechanism, the slaves would regularly lag by several minutes, making review of recent edits difficult.

In addition to this, MediaWiki attempts to ensure that the user sees events occuring on the wiki in chronological order. A few seconds of lag can be tolerated, as long as the user sees a consistent picture from subsequent requests. This is done by saving the master binlog position in the session, and then at the start of each request, waiting for the slave to catch up to that position before doing any reads from it. If this wait times out, reads are allowed anyway, but the request is considered to be in "lagged slave mode". Lagged slave mode can be checked by calling $wgLoadBalancer->getLaggedSlaveMode. The only practical consequence at present is a warning displayed in the page footer.

Lag avoidance
To avoid excessive lag, queries which write large numbers of rows should be split up, generally to write one row at a time. Multi-row INSERT ... SELECT queries are the worst offenders and should be avoided altogether. Instead do the select first and then the insert.

Working with lag
Despite our best efforts, it's not practical to guarantee a low-lag environment. Replication lag will usually be less than one second, but may occasionally be up to 30 seconds. For scalability, it's very important to keep load on the master low, so simply sending all your queries to the master is not the answer. So when you have a genuine need for up-to-date data, the following approach is advised:


 * 1) Do a quick query to the master for a sequence number or timestamp
 * 2) Run the full query on the slave and check if it matches the data you got from the master
 * 3) If it doesn't, run the full query on the master

To avoid swamping the master every time the slaves lag, use of this approach should be kept to a minimum. In most cases you should just read from the slave and let the user deal with the delay.

Lock contention
Due to the high write rate on Wikipedia (and some other wikis), MediaWiki developers need to be very careful to structure their writes to avoid long-lasting locks. By default, MediaWiki opens a transaction at the first query, and commits it before the output is sent. Locks will be held from the time when the query is done until the commit. So you can reduce lock time by doing as much processing as possible before you do your write queries. Update operations which do not require database access can be delayed until after the commit by adding an object to $wgPostCommitUpdateList.

Often this approach is not good enough, and it becomes necessary to enclose small groups of queries in their own transaction. Use the following syntax:

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They are poorly implemented in InnoDB and will cause regular deadlock errors.

It's also surprisingly easy to cripple the wiki with lock contention. If you must use them, define a new flag for $wgAntiLockFlags which allows them to be turned off, because we'll almost certainly need to do so on the Wikimedia cluster.

Instead of locking reads, combine your existence checks into your write queries, by using an appropriate condition in the WHERE clause of an UPDATE, or by using unique indexes in combination with INSERT IGNORE. Then use the affected row count to see if the query succeeded.

Database schema
Don't forget about indexes when designing databases, things may work smoothly on your test wiki with a dozen of pages, but will bring a real wiki to a halt. See above for details.

For naming conventions, see Manual:Coding conventions.

SQLite compatibility
When writing MySQL table definitions or upgrade patches, it is important to remember that SQLite shares MySQL's schema, but that works only if definitions are written in a specific way:
 * Primary keys must be declared within main table declaration, but normal keys should be added separately with CREATE INDEX:

However, primary keys spanning over more than one field should be included in the main table definition:


 * Don't add more than one column per statement:

You can run basic compatibility checks with php sqlite.php --check-syntax filename.sql, or, if you need to test an update patch, php sqlite.php --check-syntax tables.sql filename.sql, assuming that you're in $IP/maintenance/.