Database transactions

MediaWiki uses database transactions both to ensure consistency in the database and to improve performance.

Some general information about database transactions can be found here:


 * On Wikipedia, see Database transaction and ACID
 * For MySQL, see the transaction statement and the InnoDB transaction model

First, we should distinguish two types of methods:
 * Those with outer transaction scope: methods that are structurally guaranteed to have no callers up the stack that perform transaction operations. Methods with this scope include the execute method of Maintenance scripts, the run method of Job classes, and the doUpdate method of DeferrableUpdate classes. Since no caller further up the call stack will have any transaction activity, methods with outer transaction scope are free to start and end transactions (given some caveats described below). Callers down the stack do not have outer scope and are expected to respect this fact.
 * Those with unclear/inner transaction scope: methods that are not clearly guaranteed to have outer transaction scope. This covers most methods in MediaWiki core and extensions, such as those of model/abstraction classes, utility classes, DAO objects, hook handlers, business/control logic classes, and so on. These methods are not free to start/end transactions and must only use transaction semantics that support nesting. If they need to do some updates after commit, they must register a post-commit callback method.

MediaWiki uses transactions in a few ways:
 * 1) Using "traditional" begin/commit pairs to protect critical sections and be certain they are committed. Nested transactions are not supported. This should only ever be used from callers that have outer transaction scope and that affect only one database (accounting for any possible hook handlers too). Valid uses include callbacks to onTransactionIdle or AutoCommitUpdate where only one DB is updated and no hooks are fired. Always match each begin with a commit.
 * 2) Using startAtomic/endAtomic pairs to protect critical sections without knowing when they will commit. Nested sections are fully supported. These can be used anywhere, but must be properly nested (e.g. do not open a section and then return without closing it). In maintenance scripts, a COMMIT happens once no atomic sections remain open. If DBO_TRX is set, however, the atomic sections join the main DBO_TRX transaction round.
 * 3) Using a pivoted transaction round if DBO_TRX is enabled (this is the default for web requests, but not for maintenance mode or unit tests). The first write on each database connection without a transaction triggers a BEGIN, and a COMMIT happens at the end of the request for all database connections with writes pending. If multiple databases with DBO_TRX set were written to, they all perform their commit step in rapid succession at the end of the request. This maximizes cross-DB transaction atomicity. Note that optimistic concurrency control (REPEATABLE READ or SERIALIZABLE in PostgreSQL) might undermine this somewhat, since a serialization failure can occur on a proper subset of the commits even if all the writes appeared to succeed. In any case, DBO_TRX reduces the number of commits, which can help site performance (by reducing fsync calls), and means that all writes in the request are typically either committed or rolled back together.
 * 4) If at any point an exception is thrown and not caught by anything else, MWExceptionHandler will catch it and roll back all database connections with transactions. This is very useful when combined with DBO_TRX.
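The DBO_TRX behaviour in points 3 and 4 can be sketched as a toy Python model. This is not MediaWiki code; the class and method names only mirror the concepts described above (implicit BEGIN on first write, a commit step for all connections at request end, and a rollback-everything path for uncaught exceptions).

```python
# Toy model of a DBO_TRX transaction round (not MediaWiki code).

class Connection:
    def __init__(self, name):
        self.name = name
        self.in_trx = False
        self.pending = []    # writes buffered in the open transaction
        self.committed = []  # writes made durable by COMMIT

    def write(self, row):
        if not self.in_trx:
            self.in_trx = True  # implicit BEGIN on the first write
        self.pending.append(row)

class LBFactory:
    def __init__(self):
        self.connections = []

    def commit_master_changes(self):
        # Commit all connections with writes pending, in rapid succession.
        for conn in self.connections:
            if conn.in_trx:
                conn.committed.extend(conn.pending)
                conn.pending.clear()
                conn.in_trx = False

    def rollback_master_changes(self):
        # The uncaught-exception path: discard pending writes everywhere.
        for conn in self.connections:
            conn.pending.clear()
            conn.in_trx = False

lb = LBFactory()
db1, db2 = Connection("wiki1"), Connection("wiki2")
lb.connections += [db1, db2]

db1.write("rev 1")
db2.write("link 1")
assert db1.in_trx and db2.in_trx  # both transactions open, nothing durable yet
assert db1.committed == []

lb.commit_master_changes()        # end of request: all connections commit together
assert db1.committed == ["rev 1"] and db2.committed == ["link 1"]
```

Committing everything in one place at the end of the request is what makes the "all writes are committed or rolled back together" property (mostly) hold across databases.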

Various misuses of transactions will cause exceptions or warnings, for example:
 * Nesting begin with another begin or commit will throw an exception.
 * Calling begin or commit when an atomic section is active will throw an exception.
 * Calling commit when no transaction is open will raise a warning.
 * startAtomic/endAtomic expect __METHOD__ as an argument, and the name must match at each level of atomic section nesting. If it does not match, an exception is thrown.

In some cases, code may want to know that data is committed before continuing to the next steps. One way to do this is to put the next steps in a callback to onTransactionIdle, AtomicSectionUpdate, or AutoCommitUpdate. The latter two are DeferredUpdates, which differ somewhat between Maintenance and web/job request mode:
 * In web requests and jobs (including jobs in CLI mode), these updates run after the main transaction round commits. Each update is wrapped in its own transaction round, though AutoCommitUpdate disables DBO_TRX on the specified database handle, committing each query on the fly. If deferred updates trigger further deferred updates, the extra transaction rounds are simply added.
 * In Maintenance scripts, these updates run after any transaction on the local (e.g. "current wiki") database commits (or immediately if there isn't one). Deferred updates cannot simply be deferred until no transactions are active, as that might lead to OOMs for long-running scripts where some (possibly "foreign wiki") database always has an active transaction (this would otherwise be ideal). This is why the mechanism is oddly tied only to the local database master. Regardless, since Maintenance::execute has outer transaction scope and DBO_TRX is off for maintenance scripts, it doesn't usually make sense to call DeferredUpdates::addUpdate directly from the execute method, since the code could just run immediately.

Alternatively, any method with outer transaction scope can call commitMasterChanges on the LBFactory singleton to flush all active transactions on all databases. This also ensures that all pending updates are committed before the next lines of code are executed.
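The key property of a post-commit callback such as onTransactionIdle is that it runs immediately when no transaction is open, and otherwise only after the transaction commits. A toy Python model of that behaviour (not MediaWiki code; names are illustrative):

```python
# Toy model of onTransactionIdle-style post-commit callbacks.

class Database:
    def __init__(self):
        self.in_trx = False
        self.idle_callbacks = []  # callbacks deferred until after COMMIT
        self.log = []

    def begin(self):
        self.in_trx = True

    def on_transaction_idle(self, cb):
        if self.in_trx:
            self.idle_callbacks.append(cb)  # defer until the commit happens
        else:
            cb()                            # no transaction open: run right away

    def commit(self):
        self.in_trx = False
        self.log.append("COMMIT")
        callbacks, self.idle_callbacks = self.idle_callbacks, []
        for cb in callbacks:
            cb()

db = Database()
ran = []

db.on_transaction_idle(lambda: ran.append("immediate"))
assert ran == ["immediate"]  # no transaction open, so it ran immediately

db.begin()
db.on_transaction_idle(lambda: ran.append("after commit"))
assert ran == ["immediate"]  # still deferred while the transaction is open
db.commit()
assert ran == ["immediate", "after commit"]
```

This is why such callbacks are safe from inner transaction scope: the caller does not need to know whether a transaction is currently open, only that its code will run once the data is committed.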

The use of rollback should be strongly avoided, since it discards everything the previously executed code did before the rollback. It is particularly dangerous because other databases might have related changes, and it is easy to forget to roll them back too. Instead, simply throwing an uncaught exception is enough to trigger a rollback of all databases. This is how rollback is normally used: as a fail-safe that aborts everything, returns to the initial state, and errors out. If directly calling rollback is truly needed, always use rollbackMasterChanges on the LBFactory singleton to make sure all databases are reverted to the initial state of any transaction round.
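A toy Python sketch (not MediaWiki code) of why a single-database rollback is risky: a related pending change on another database survives, whereas a rollbackMasterChanges-style operation reverts every connection.

```python
# Toy model contrasting per-connection rollback with rolling back all
# connections in a transaction round.

class Conn:
    def __init__(self):
        self.pending = []  # uncommitted writes in the open transaction

    def write(self, row):
        self.pending.append(row)

    def rollback(self):
        self.pending.clear()

def rollback_master_changes(conns):
    # Revert every connection to the start of the transaction round.
    for conn in conns:
        conn.rollback()

main_db, foreign_db = Conn(), Conn()
main_db.write("page row")
foreign_db.write("matching link row")

main_db.rollback()  # direct rollback on one DB only...
assert foreign_db.pending == ["matching link row"]  # ...leaves the other half of the change behind

main_db.write("page row")
rollback_master_changes([main_db, foreign_db])  # the fail-safe path reverts both
assert main_db.pending == [] and foreign_db.pending == []
```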

For some databases, the transaction lifecycle can be traced in the debug log when $wgDebugDBTransactions is set to true.
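The startAtomic/endAtomic rules described above (sections nest, the __METHOD__ names must match in LIFO order, and a real COMMIT happens only when the outermost section ends) can be modelled in a few lines of toy Python. This is not MediaWiki's implementation; it models the no-DBO_TRX maintenance-script case.

```python
# Toy model of startAtomic/endAtomic nesting and name matching.

class AtomicDatabase:
    def __init__(self):
        self.sections = []  # stack of open atomic section names (__METHOD__)
        self.log = []

    def start_atomic(self, fname):
        if not self.sections:
            self.log.append("BEGIN")   # only the outermost section starts a transaction
        self.sections.append(fname)

    def end_atomic(self, fname):
        if not self.sections or self.sections[-1] != fname:
            raise RuntimeError(f"endAtomic: mismatched section name {fname!r}")
        self.sections.pop()
        if not self.sections:
            self.log.append("COMMIT")  # only the outermost end commits

db = AtomicDatabase()
db.start_atomic("WikiPage::doEdit")
db.start_atomic("User::saveSettings")  # nested section: no extra BEGIN
db.end_atomic("User::saveSettings")
db.end_atomic("WikiPage::doEdit")      # outermost end: COMMIT
assert db.log == ["BEGIN", "COMMIT"]

try:
    db.start_atomic("A")
    db.end_atomic("B")                 # mismatched name throws
except RuntimeError:
    pass
else:
    raise AssertionError("expected a mismatch error")
```

Because only the outermost pair touches the real transaction, code with inner transaction scope can wrap its critical section in startAtomic/endAtomic without caring who else is in the call stack.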

This page is the result of a conversation on the wikitech-l mailing list and subsequent discussion on Bugzilla. Some relevant discussions are:


 * Nested database transactions
 * Can we kill DBO_TRX? It seems evil!
 * Transaction warning: WikiPage::doDeleteArticleReal
 * Transaction warning: WikiPage::doEdit (User::loadFromDatabase) (TranslateMetadata::get)

In one mail, Tim Starling explained the reasoning behind the DBO_TRX system. Here is a redacted version of his explanation:

DBO_TRX provides the following benefits:
 * It provides improved consistency of write operations for code which is not transaction-aware, for example rollback-on-error.
 * It provides a snapshot for consistent reads, which improves application correctness when concurrent writes are occurring.

DBO_TRX was introduced when we switched over to InnoDB, along with the introduction of Database::begin and Database::commit. [...]

Initially, I set up a scheme where transactions were "nested", in the sense that begin incremented the transaction level and commit decremented it. When it was decremented to zero, an actual COMMIT was issued. So you would have a call sequence like:
 * begin -- sends BEGIN
 * begin -- does nothing
 * commit -- does nothing
 * commit -- sends COMMIT

This scheme soon proved to be inappropriate, since it turned out that the most important thing for performance and correctness is for an application to be able to commit the current transaction after some particular query has completed. Database::immediateCommit was introduced to support this use case -- its function was to immediately reduce the transaction level to zero and commit the underlying transaction.

When it became obvious that every Database::commit call should really be Database::immediateCommit, I changed the semantics, effectively renaming Database::immediateCommit to Database::commit. I removed the idea of nested transactions in favour of a model of cooperative transaction length management:
 * Database::begin became effectively a no-op for web requests and was sometimes omitted for brevity.
 * Database::commit should be called after completion of a sequence of write operations where atomicity is desired, or at the earliest opportunity when contended locks are held.

[...]

When transactions are too long, you hit performance problems due to lock contention. When transactions are too short, you hit consistency problems when requests fail. The scheme I introduced favours performance over consistency. It resolves conflicts between callers and callees by using the shortest transaction time. I think it was an appropriate choice for Wikipedia, both then and now, and I think it is probably appropriate for many other medium to high traffic wikis.

Savepoints were not available at the time the scheme was introduced. But they are a refinement of the abandoned transaction nesting scheme, not a refinement of the current scheme, which is optimised for reducing lock contention. In terms of performance, perhaps it would be feasible to use short transactions with an explicit begin, with savepoints for nesting. But then you would lose the consistency benefits of DBO_TRX that I mentioned at the start of this post.

-- Tim Starling
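The abandoned "nested" scheme Tim describes, where begin increments a level, commit decrements it, and only the outermost commit is real, can be sketched in a few lines of Python (an illustrative model, not the historical MediaWiki code):

```python
# Toy model of the abandoned transaction-level nesting scheme.

class NestingDatabase:
    def __init__(self):
        self.level = 0  # transaction nesting depth
        self.log = []

    def begin(self):
        if self.level == 0:
            self.log.append("BEGIN")   # only the outermost begin is real
        self.level += 1

    def commit(self):
        self.level -= 1
        if self.level == 0:
            self.log.append("COMMIT")  # only the outermost commit is real

db = NestingDatabase()
db.begin()   # sends BEGIN
db.begin()   # does nothing
db.commit()  # does nothing
db.commit()  # sends COMMIT
assert db.log == ["BEGIN", "COMMIT"]
```

The problem Tim points out follows directly from this model: an inner caller that needs its work durable right now cannot force a COMMIT, because the outer caller still holds the transaction open.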