Manual:Database access/zh

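This article provides an overview of database access and general database issues in MediaWiki.

When coding in MediaWiki, you will normally access the database only through MediaWiki's functions for the purpose.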

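Database layout
Information about the MediaWiki database layout, such as descriptions of the tables and their contents, is documented separately.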

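Using sql.php
MediaWiki provides a maintenance script to access the database. Run the sql.php script from the maintenance directory.

You can then type database queries. Alternatively, you can provide a filename, and MediaWiki will execute that file, substituting any MW special variables as appropriate. For more information, see the documentation for sql.php.

This works for all database backends. However, the prompt is not as full-featured as the command-line clients that ship with your database.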

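Using the mysql command-line client
In LocalSettings.php you will find your wiki's MySQL username and password.

Over SSH, log in with the mysql command-line client.

Substitute the values from LocalSettings.php. You will then be prompted for the password, after which you will see the client's prompt.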

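Database abstraction layer
MediaWiki provides a database abstraction layer. Unless you are working on the abstraction layer itself, you should never call PHP's native database functions directly.

The abstraction layer is accessed through a database class. An instance of this class can be obtained by calling a method on an injected service (preferred) or through a global function. The global function is being phased out and should not be used in new code. The handle is usually requested with one of two constants: one for read queries, and one for write queries and for read queries that need the absolutely newest information. The distinction between master and replica is important in a multi-database environment such as Wikimedia. See the Wrapper functions section below for how to use the returned database object.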

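The wrapper for a select query result is an array whose keys are integers starting at 1. A read query is made on a handle obtained for reading, and a write query on a handle obtained for writing.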

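By convention, separate variable names are used for the read handle and the write handle, to help you keep track of whether a database object is a replica (read-only) or the master (read/write). If you write to a replica, you will run into problems. Specifically, a subsequent write that succeeds on the master may fail when it is replicated to the replica, because of a unique key collision. Replication on that replica will then stop, and it may take hours to repair the database and bring it back online. Setting read_only in my.cnf on the replica avoids this scenario, but given the dire consequences, we prefer to have as many checks in the code as possible.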

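Wrapper functions
We provide a query function for raw SQL, but wrapper functions such as select and insert are usually more convenient. In some circumstances they take care of things like table prefixes and escaping. If you really need to write your own SQL, read the documentation for tableName and addQuotes. You will need both of them. Keep in mind that failing to use addQuotes properly can introduce severe security holes into your wiki.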

Another important reason to use the high-level methods rather than constructing your own queries is to ensure that your code runs properly regardless of the database type. Currently, the best support is for MySQL/MariaDB. SQLite is also well supported, although it is much slower than MySQL or MariaDB. PostgreSQL is supported, but less stably than MySQL. MediaWiki has experimental support for Oracle and MSSQL.

The available wrapper functions are listed below. For a detailed description of their parameters, refer to the documentation of the database class, in particular its explanation of the parameters that many of the wrapper functions share.

Wrapper function: select
The select function provides the MediaWiki interface for a SELECT statement. The components of the SELECT statement are coded as parameters of the select function. JOINs are also possible, and table aliases can be used in queries.

Arguments are either single values (such as 'category' and 'cat_pages > 0') or arrays, if more than one value is passed for an argument position (such as array('cat_pages > 0', $myNextCond)). If you pass strings as the third or fifth argument, you must manually apply Database::addQuotes to your values as you construct the string, because the wrapper will not do this for you. The values for table names (1st argument) and field names (2nd argument) must not be user controlled. The array construction for $conds is somewhat limited; it can only express simple relationships such as equality (i.e. WHERE key = 'value').

You can access individual rows of the result using a foreach loop. Once you have a row object, you can read a specific field from it by name.

A full example of this kind would put an alphabetical list of categories, with the number of entries in each category, in the variable $output. If you are outputting HTML, make sure to escape values from the database first.
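The category listing described above can be sketched outside MediaWiki. The following is a standalone illustration using Python's sqlite3 module, not the MediaWiki select wrapper; the column name cat_title, the sample rows, and the helper list_categories are assumptions made for the sketch.

```python
import html
import sqlite3

# In-memory stand-in for a wiki database (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read fields by column name
conn.execute(
    "CREATE TABLE category (cat_title TEXT PRIMARY KEY, cat_pages INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO category (cat_title, cat_pages) VALUES (?, ?)",
    [("Physics", 12), ("Art & Design", 3), ("Empty", 0)],
)


def list_categories(db):
    """Build an alphabetical, HTML-escaped list of non-empty categories."""
    rows = db.execute(
        "SELECT cat_title, cat_pages FROM category "
        "WHERE cat_pages > 0 ORDER BY cat_title ASC"
    )
    lines = []
    for row in rows:
        # Escape database values before they are placed in HTML output.
        title = html.escape(row["cat_title"])
        lines.append(f"Category {title} contains {row['cat_pages']} entries.")
    return "\n".join(lines)


output = list_categories(conn)
print(output)
```

The loop mirrors the pattern in the text: iterate over the result, read fields from each row by name, and escape every value before it reaches the page.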

Convenience functions
For compatibility with PostgreSQL, insert IDs are obtained using nextSequenceValue and insertId. The parameter for nextSequenceValue is the name of the sequence defined in the PostgreSQL schema. It always follows the format x_y_seq, where x is the table name (e.g. page) and y is the primary key (e.g. page_id), giving for example page_page_id_seq.

For some other useful functions, e.g. affectedRows, numRows, etc., see Manual:Database.php.

Basic query optimization
MediaWiki developers who need to write DB queries should have some understanding of databases and the performance issues associated with them. Patches containing unacceptably slow features will not be accepted. Unindexed queries are generally not welcome in MediaWiki, except in special pages derived from QueryPage. It's a common pitfall for new developers to submit code containing SQL queries that examine huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in a table is like counting beans in a bucket.
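One way to see whether a query is indexed is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in; the table demo_log, its columns, and the helper query_plan are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_log ("
    "log_id INTEGER PRIMARY KEY, log_user TEXT, log_action TEXT)"
)
conn.executemany(
    "INSERT INTO demo_log (log_user, log_action) VALUES (?, ?)",
    [(f"user{i % 50}", "edit") for i in range(1000)],
)


def query_plan(db, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


sql = "SELECT log_action FROM demo_log WHERE log_user = ?"

# Without an index, every row of the table has to be examined.
plan_without_index = query_plan(conn, sql, ("user7",))

# With an index on the filtered column, only matching rows are visited.
conn.execute("CREATE INDEX demo_log_user_idx ON demo_log (log_user)")
plan_with_index = query_plan(conn, sql, ("user7",))

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of the whole table, while the second reports a search that uses the new index. MySQL offers a comparable EXPLAIN statement for the same purpose.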

Replication
Large installations of MediaWiki, such as Wikipedia, use a large set of replica MySQL servers that replicate writes made to a master MySQL server. It is important to understand the issues associated with this setup if you want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task depends on whether or not replication is in use. Due to our unabashed Wikipedia-centrism, we often just use the replication-friendly version, but if you like, you can check whether replication is in use.

Lag
Lag primarily occurs when large write queries are sent to the master. Writes on the master are executed in parallel, but they are executed in serial when they are replicated to the replicas. The master writes the query to the binlog when the transaction is committed. The replicas poll the binlog and start executing the query as soon as it appears. They can service reads while they are performing a write query, but will not read anything more from the binlog and thus will perform no more writes. This means that if the write query runs for a long time, the replicas will lag behind the master for the time it takes for the write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will stop sending reads to a replica when it is lagged by more than 30 seconds. If the load ratios are set incorrectly, or if there is too much load generally, this may lead to a replica permanently hovering around 30 seconds lag.

If all replicas are lagged by more than 30 seconds, MediaWiki will stop writing to the database. All edits and other write operations will be refused, with an error returned to the user. This gives the replicas a chance to catch up. Before we had this mechanism, the replicas would regularly lag by several minutes, making review of recent edits difficult.
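The two rules above (skip lagged replicas for reads, refuse writes when every replica is lagged) can be modelled in a few lines. This is a simplified sketch, not MediaWiki's load balancer; the function names and the dictionary of lag values are hypothetical, and treating a setup without replicas as writable is an assumption.

```python
MAX_LAG_SECONDS = 30  # threshold quoted in the text


def usable_replicas(lag_by_replica, max_lag=MAX_LAG_SECONDS):
    """Names of replicas that may still receive reads, sorted for stable output."""
    return sorted(
        name for name, lag in lag_by_replica.items() if lag <= max_lag
    )


def writes_allowed(lag_by_replica, max_lag=MAX_LAG_SECONDS):
    """Refuse writes only when replicas exist and all of them are lagged."""
    if not lag_by_replica:
        # Assumption: with no replicas there is no replication lag to respect.
        return True
    return len(usable_replicas(lag_by_replica, max_lag)) > 0


lags = {"db1": 2, "db2": 45, "db3": 30}
print(usable_replicas(lags))
print(writes_allowed({"db1": 31, "db2": 45}))
```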

In addition to this, MediaWiki attempts to ensure that the user sees events occurring on the wiki in chronological order. A few seconds of lag can be tolerated, as long as the user sees a consistent picture from subsequent requests. This is done by saving the master binlog position in the session, and then at the start of each request, waiting for the replica to catch up to that position before doing any reads from it. If this wait times out, reads are allowed anyway, but the request is considered to be in "lagged replica mode". Code can check whether the request is in lagged replica mode. The only practical consequence at present is a warning displayed in the page footer.
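The wait-with-timeout behaviour can be sketched as follows. This is an illustration of the idea only: positions are modelled as plain integers, whereas a real binlog position is more structured, and wait_for_position and FakeReplica are hypothetical names.

```python
import time


def wait_for_position(get_replica_position, wanted_position,
                      timeout=1.0, poll_interval=0.01):
    """Wait until the replica reaches the saved master position.

    Returns True when the replica has caught up, and False when the wait
    timed out, in which case the request would continue in lagged mode.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_replica_position() >= wanted_position:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


class FakeReplica:
    """Test double whose position advances by a fixed step on every poll."""

    def __init__(self, position, step):
        self.position = position
        self.step = step

    def current_position(self):
        self.position += self.step
        return self.position


saved_position = 120  # position stored in the session after the last write
caught_up = wait_for_position(FakeReplica(100, 5).current_position, saved_position)
lagged = not wait_for_position(
    FakeReplica(100, 0).current_position, saved_position, timeout=0.05
)
print(caught_up, lagged)
```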

Shell users can check replication lag from the command line; other users can check it through the API.

Databases often have their own monitoring systems in place as well; see, for instance, MariaDB (Wikimedia) and wikitech:Help:Toolforge/Database (Wikimedia Cloud VPS).

Lag avoidance
To avoid excessive lag, queries that write large numbers of rows should be split up, generally to write one row at a time. Multi-row INSERT ... SELECT queries are the worst offenders and should be avoided altogether. Instead do the select first and then the insert.

Even small writes can cause lag if they are done at a very high speed and replication is unable to keep up. This most commonly happens in maintenance scripts. To prevent it, you should wait for the replicas to catch up after every few hundred writes. Most scripts make the exact number configurable.
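The batching pattern can be sketched independently of any database. In the sketch below, write_row and wait_for_replication are hypothetical callbacks supplied by the caller, and the default batch size of 200 is an arbitrary choice within "a few hundred".

```python
def write_in_batches(rows, write_row, wait_for_replication, batch_size=200):
    """Write rows one at a time, pausing for replication after each batch."""
    written = 0
    for row in rows:
        write_row(row)
        written += 1
        if written % batch_size == 0:
            # Give the replicas a chance to catch up before continuing.
            wait_for_replication()
    return written


# Usage with stand-ins: record where each pause happened.
stored = []
pauses = []
total = write_in_batches(
    range(450),
    write_row=stored.append,
    wait_for_replication=lambda: pauses.append(len(stored)),
    batch_size=200,
)
print(total, pauses)
```

Keeping the batch size a parameter matches the convention mentioned above of making the exact number configurable.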

Working with lag
Despite our best efforts, it's not practical to guarantee a low-lag environment. Replication lag will usually be less than one second, but may occasionally be up to 30 seconds. For scalability, it's very important to keep load on the master low, so simply sending all your queries to the master is not the answer. So when you have a genuine need for up-to-date data, the following approach is advised:


 * 1) Do a quick query to the master for a sequence number or timestamp
 * 2) Run the full query on the replica and check if it matches the data you got from the master
 * 3) If it doesn't, run the full query on the master
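The three steps above can be sketched with two dictionaries standing in for the master and a replica. This is a toy model under stated assumptions: each record carries a version number that plays the role of the sequence number or timestamp, and read_fresh is a hypothetical helper.

```python
def read_fresh(key, master, replica):
    """Return (text, source) using the replica whenever it is up to date."""
    # 1) Quick query to the master for the sequence number only.
    master_version = master[key]["version"]

    # 2) Full query on the replica, compared against the master's version.
    row = replica.get(key)
    if row is not None and row["version"] == master_version:
        return row["text"], "replica"

    # 3) The replica is behind, so run the full query on the master.
    return master[key]["text"], "master"


master = {"Main_Page": {"version": 7, "text": "new text"}}
lagged_replica = {"Main_Page": {"version": 6, "text": "old text"}}
current_replica = {"Main_Page": {"version": 7, "text": "new text"}}

print(read_fresh("Main_Page", master, lagged_replica))
print(read_fresh("Main_Page", master, current_replica))
```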

To avoid swamping the master every time the replicas lag, use of this approach should be kept to a minimum. In most cases you should just read from the replica and let the user deal with the delay.

Lock contention
Due to the high write rate on Wikipedia (and some other wikis), MediaWiki developers need to be very careful to structure their writes to avoid long-lasting locks. By default, MediaWiki opens a transaction at the first query, and commits it before the output is sent. Locks will be held from the time when the query is done until the commit. So you can reduce lock time by doing as much processing as possible before you do your write queries. Update operations which do not require database access can be delayed until after the commit by adding an object to the list of post-commit updates.
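The advice to finish processing before writing can be illustrated with Python's sqlite3 module. This is a generic sketch rather than MediaWiki code; the table page_summary and the functions expensive_summary and save_summary are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_summary (page_id INTEGER PRIMARY KEY, summary TEXT NOT NULL)"
)
conn.commit()


def expensive_summary(text):
    """Stand-in for slow processing that needs no database access."""
    return " ".join(text.split())[:40]


def save_summary(db, page_id, text):
    # Do the slow work first, while no write lock is held.
    summary = expensive_summary(text)
    # Then keep the write transaction as short as possible.
    with db:  # commits on success, rolls back if an exception is raised
        db.execute(
            "INSERT OR REPLACE INTO page_summary (page_id, summary) VALUES (?, ?)",
            (page_id, summary),
        )
    return summary


saved = save_summary(conn, 1, "  Hello    world  ")
stored = conn.execute(
    "SELECT summary FROM page_summary WHERE page_id = 1"
).fetchone()[0]
print(saved, stored)
```

Reversing the order, so that the write comes first and the processing second, would hold the lock for the whole duration of the processing.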

Often this approach is not good enough, and it becomes necessary to enclose small groups of queries in their own transaction.

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They are poorly implemented in InnoDB and will cause regular deadlock errors. It's also surprisingly easy to cripple the wiki with lock contention.

Instead of locking reads, combine your existence checks into your write queries, by using an appropriate condition in the WHERE clause of an UPDATE, or by using unique indexes in combination with INSERT IGNORE. Then use the affected row count to see if the query succeeded.
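Both techniques can be tried out with Python's sqlite3 module. The sketch below is an illustration under assumptions: the tables job and watch and the helpers claim_job and add_watch are hypothetical, and SQLite spells the statement INSERT OR IGNORE where MySQL uses INSERT IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (job_id INTEGER PRIMARY KEY, job_owner TEXT)")
conn.execute("CREATE TABLE watch (w_user TEXT NOT NULL, w_page TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX watch_user_page ON watch (w_user, w_page)")
conn.execute("INSERT INTO job (job_id, job_owner) VALUES (1, NULL)")
conn.commit()


def claim_job(db, job_id, owner):
    """Claim a job only if nobody owns it; the check lives in the WHERE clause."""
    with db:
        cur = db.execute(
            "UPDATE job SET job_owner = ? WHERE job_id = ? AND job_owner IS NULL",
            (owner, job_id),
        )
    # One affected row means the claim succeeded, zero means someone was first.
    return cur.rowcount == 1


def add_watch(db, user, page):
    """Insert a row unless the unique index says it already exists."""
    with db:
        cur = db.execute(
            "INSERT OR IGNORE INTO watch (w_user, w_page) VALUES (?, ?)",
            (user, page),
        )
    return cur.rowcount == 1


print(claim_job(conn, 1, "alice"), claim_job(conn, 1, "bob"))
print(add_watch(conn, "alice", "Main_Page"), add_watch(conn, "alice", "Main_Page"))
```

In both helpers no separate SELECT is needed, so there is no window between the existence check and the write in which another request could interfere.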

Database schema
Don't forget about indexes when designing databases: things may work smoothly on your test wiki with a dozen pages, but will bring a real wiki to a halt. See above for details.

For naming conventions, see Manual:Coding conventions/Database.

SQLite compatibility
When writing MySQL table definitions or upgrade patches, it is important to remember that SQLite shares MySQL's schema, but that works only if definitions are written in a specific way:


 * Primary keys must be declared within the main table declaration, but normal keys should be added separately with CREATE INDEX. However, primary keys spanning more than one field should be included in the main table definition.
 * Don't add more than one column per statement.
 * Set explicit defaults when adding NOT NULL columns.
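The rules in the list above can be exercised directly against SQLite. The sketch below runs the statements through Python's sqlite3 module; the tables foo and foo_link and their columns are hypothetical, and the statements are written in SQLite's own dialect rather than taken from a MediaWiki schema file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-column primary key inside the table definition,
# ordinary key added separately with CREATE INDEX.
conn.execute(
    "CREATE TABLE foo (foo_id INTEGER NOT NULL PRIMARY KEY, foo_text TEXT NOT NULL)"
)
conn.execute("CREATE INDEX foo_text_idx ON foo (foo_text)")

# A primary key spanning several fields stays in the table definition.
conn.execute(
    "CREATE TABLE foo_link ("
    "fl_from INTEGER NOT NULL, fl_to INTEGER NOT NULL, "
    "PRIMARY KEY (fl_from, fl_to))"
)
conn.execute("INSERT INTO foo (foo_id, foo_text) VALUES (1, 'existing row')")

# One column per ALTER TABLE statement, each NOT NULL column with a default.
conn.execute("ALTER TABLE foo ADD COLUMN foo_flags INTEGER NOT NULL DEFAULT 0")
conn.execute("ALTER TABLE foo ADD COLUMN foo_note TEXT NOT NULL DEFAULT ''")


def add_not_null_without_default(db):
    """Return SQLite's error message for a NOT NULL column with no default."""
    try:
        db.execute("ALTER TABLE foo ADD COLUMN foo_bad INTEGER NOT NULL")
    except sqlite3.OperationalError as error:
        return str(error)
    return None


rejected = add_not_null_without_default(conn)
existing = conn.execute("SELECT foo_flags, foo_note FROM foo").fetchone()
print(rejected)
print(existing)
```

The existing row picks up the declared defaults, while the column without a default is rejected outright.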

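Basic compatibility checks can be run from the maintenance directory.

Alternatively, if you need to test an update patch, run the check in both of the following ways:

 * Once using the new tables.sql.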
 * Since DB patches update the tables.sql file as well, for this one you should pass in the pre-commit version of tables.sql (the file with the full DB definition). Otherwise, you can get an error if you e.g. drop an index (since it already doesn't exist in tables.sql because you just removed it).

The above assumes you're in $IP/maintenance/; otherwise, pass the full path of the file. For extension patches, use the extension's equivalent of these files.

See also

 * If an extension requires changes to the database when MediaWiki is updated, that can be done with a hook provided for the purpose. Users can then update their wiki by running the updater.