Toolserver:Database access

One of the key features of a Toolserver account is access to a replicated copy of the Wikimedia databases. The databases are organised into six clusters: s1, s2, s3, s4, s5 and s6.

Overview
There are six clusters of databases, defined by the Wikimedia Foundation. The Toolserver replicates the clusters as they are. In addition, all servers also have a copy of the Commons database, though Commons is properly a part of s4.

Each cluster is stored on a database host called sql-sX, e.g. s1 is stored on sql-s1. See Toolserver database for how to check on which cluster a wiki is located.

By default, your database credentials are stored in. The code examples below show you how to read that file in, rather than putting passwords in your script directly, which is a security risk.


 * DNS names
 * You can also connect to a host using a DNS name: . This has the advantage of not requiring a query to the   database or knowing in advance which cluster to connect to.


 * Fast servers
 * The servers, for the most part, are real-time copies of the live databases, but at times there may be replag. To alleviate this problem, each cluster also has a fast server, which disallows queries longer than 60 seconds.  This server is much less likely to be lagged.  To connect to the fast server, append -fast to the server name, e.g. sql-s1-fast. To use the fast server with a DNS hostname, replace   with  : arwiki-p.fastdb.toolserver.org.


 * Database names
 * Each Wikimedia project has a specific database name. The general pattern is langcodeproject name_p. The full list is available here (though without the _p suffix).

Command-line / script access
To connect to a specific database (for example, enwiki_p) via a command-line interface, use: This will automatically look up the correct server and connect to it.

If you are connecting via a script or program, there are two methods of connecting to the appropriate database:
 * 1) Connect to the toolserver database (on "sql-toolserver", but also available on any one of the database servers) and run a query like:   This will output an integer that corresponds to the appropriate server. For example, the query just mentioned would output 1 indicating that enwiki_p is located on sql-s1.
 * 2) Use the DNS aliases, for example: enwiki-p.db.toolserver.org or enwiki-p.fastdb.toolserver.org.  These are configured to automatically resolve to the correct server for the specified database. Note that "_p" must be converted to "-p" in the DNS aliases.  Using db.toolserver.org connects to a standard database server, while fastdb.toolserver.org connects to a fast database server.

Batch access
If you plan to run long queries on the database, e.g. for scheduled report jobs, we recommend using batch job scheduling to access the database. This will ensure that the database doesn't become overloaded.

Program access
In most programming languages, it will be sufficient to tell MySQL to use the database credentials found in $HOME/.my.cnf

Below are various examples in a few common programming languages.

Python

 * See also: Django

phpMyAdmin
To access phpMyAdmin on the Toolserver, visit https://phpmyadmin.toolserver.org and use your Toolserver username and your LDAP password for the first dialog. When you see the phpMyAdmin login page use your username and your MySQL password (which can be found in the .my.cnf file in your home directory).

Database schema
The current MediaWiki database schema is available in the MediaWiki svn tree at /trunk/phase3/maintenance/tables.sql. A (possibly out of date) description of the schema is available at mw:Manual:Database layout.

Page text access
FIXME

General caveats
FIXME

Toolserver database
The toolserver (meta-)database is a database called "toolserver", containing information about other databases on the toolserver. It can be read by anyone and is available on all database servers. The host name of the primary copy is sql-toolserver</tt>.

The toolserver database is updated automatically on a daily basis. It is maintained by User:Duesentrieb, but please report any problems to JIRA.

Tables in the toolserver database:

wiki
Information about the wikis for which the Toolserver has data.

So, for example, to get SQL server, database name and the API URLs of the 10 largest Wikipedias, you could do this:

namespace (deprecated)
This table contains the primary namespace names for the wikis in the wiki table. Use of this table is deprecated, use the namespacename table instead, which covers all valid names for each namespace in each wiki, not just the primary names.

namespacename
This table contains the all valid namespace names for the wikis in the wiki table. Use of this table is preferred over the namespace table, since it covers all valid names for each namespace in each wiki, not just the primary names.

This can be used to build wiki-syntax links from information from the database, e.g.:

language
This table contains language names, by code, in English and the respective language.

For example, to get the languages of the top 10 Wikipedias:

User databases
By default, once a user is created an empty  database is created as well. Users can create additional databases by following the  pattern. For example, if user dab wanted to create a table named, he would name it.

You can allow other users access to your databases by adding the  suffix to the end of the database names. For example, if you wanted to share the previously created image database, you could rename it:

Program access
Simply use  as the hostname rather than   as in the examples above.

Database status files
There are status-files about each cluster (and the user databases) at /var/www/status_sclusternumber or  http ://toolserver.org/status_sclusternumber</tt> (e.g., http://toolserver.org/status_s1). In both cases, clusternumber can be an integer from 1 to 6 inclusive.

These files are simple ASCII text files. Lines beginning with " " are comment lines and can safely be ignored. A non-comment-line " " indicates that everything is alright and no known maintenance or service interruption is scheduled.

LOAD DATA LOCAL
By default, MySQL does not allow the LOAD DATA LOCAL command to be used for security reasons. If you want to allow this, you can edit your $HOME/.my.cnf</tt> file to add the local-infile</tt> option:

[client] local-infile user = jsmith ...

For more information, see the MySQL manual.

(This is not required on nightshade</tt>, because Debian's MySQL package enables local-infile by default.)