API:Query

The query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.

Introduction and guidelines
The query module has many submodules (called query modules), each with a different function. There are three types of query modules:
 * Meta information about the wiki and the logged-in user
 * Properties of pages, including page revisions and content
 * Lists of pages that match certain criteria

You should use multiple query modules together to get what you need in one request, e.g. prop=info|revisions&list=backlinks|embeddedin|imageusage&meta=userinfo is a call to six modules in one request.

Unlike meta and list query modules, all property query modules work on a set of pages that you specify using the titles, pageids, revids, or generator parameters. Use one of the first three if you know the pages' titles, page IDs, or revision IDs. Do not ask for one page at a time; this is very inefficient and consumes extra resources and bandwidth. Instead, request information about multiple pages at once by combining their titles or IDs with the pipe symbol (|), e.g. titles=PageA|PageB|PageC.
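As a rough illustration of this batching advice, the sketch below builds one set of request parameters that covers several pages and several query modules. The helper name `build_query_params` and the page titles are hypothetical; only the parameter names come from the text above.

```python
from urllib.parse import urlencode


def build_query_params(titles, props=(), lists=(), metas=()):
    """Combine several pages and query modules into one parameter dict."""
    params = {"action": "query", "format": "json"}
    if titles:
        # Multiple titles are joined with the pipe symbol.
        params["titles"] = "|".join(titles)
    if props:
        params["prop"] = "|".join(props)
    if lists:
        params["list"] = "|".join(lists)
    if metas:
        params["meta"] = "|".join(metas)
    return params


params = build_query_params(
    ["PageA", "PageB", "PageC"],
    props=["info", "revisions"],
    metas=["userinfo"],
)
# Keep the pipe readable in the printed query string.
query_string = urlencode(params, safe="|")
print("api.php?" + query_string)
```

One request built this way replaces three separate per-page requests.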

Use generator if you want to get data about a set of pages that would be the result of another API call. For example, if you want to get data about pages in a certain category, instead of querying list=categorymembers and then querying again with pageids set to all the returned pages, you should combine the two API calls into one by specifying generator=categorymembers in place of the list parameter. More details are in Generators below.

If you're querying Wikimedia wikis and requesting results as format=json (or php), then specify formatversion=2. The original result format was designed around XML; the new structure is easier to process (and defaults to UTF-8). However, it is still subject to change in MediaWiki 1.26.

Lastly, you should always request the new "continue" syntax to iterate over results. To use it, always pass an empty continue parameter, and check whether the result contains a continue section. If it does, merge its returned values with your original request and call the API again. Repeat until there is no continue section in the result. More details are in Continuing queries below.

Sample query
Before we get into the nitty-gritty, here's a useful sample query that simply gets the wiki markup (content) of a page:

api.php?action=query&prop=revisions&rvprop=content&format=jsonfm&titles=Main%20Page

This means fetch (action=query) the content (rvprop=content) of the most recent revision of Main Page (titles=Main%20Page) in JSON with whitespace to make it easier to read (format=jsonfm).
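For readers who build requests in code, here is a small sketch that reproduces the same query string from a dictionary of parameters using Python's standard library. The variable names are illustrative only.

```python
from urllib.parse import quote, urlencode

# Same parameters as the sample query, in the same order.
sample_params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "format": "jsonfm",
    "titles": "Main Page",
}

# quote_via=quote encodes the space as %20 instead of "+".
sample_url = "api.php?" + urlencode(sample_params, quote_via=quote)
print(sample_url)
```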

Alternatively, you can use action=raw as a parameter to index.php to get the content of a page: index.php?title=Main%20Page&action=raw

Specifying pages
You can specify pages in the following ways:
 * By name using the titles parameter, e.g. titles=Foo|Bar|Main_Page
 * By page ID using the pageids parameter, e.g. pageids=123|456
 * By revision ID using the revids parameter, e.g. revids=478198|54872
 * Most query modules will convert revision ID to the corresponding page ID. Only prop=revisions actually uses the revision ID itself.
 * Using a generator

Specifying titles through the query string (either through titles or pageids) is limited to 50 titles per query (or 500 for those with the apihighlimits right, usually bots and sysops).
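One way to stay within this limit is to split a long list of titles into groups before building requests. The following is a minimal sketch; the function name `batch_titles` and the generated page names are hypothetical.

```python
def batch_titles(titles, limit=50):
    """Yield pipe-joined groups of at most `limit` titles."""
    for start in range(0, len(titles), limit):
        yield "|".join(titles[start:start + limit])


# 120 made-up titles split into groups of 50, 50 and 20.
titles = [f"Page {n}" for n in range(120)]
batches = list(batch_titles(titles))
print(len(batches))
```

Each yielded string can be used directly as the value of the titles parameter. Clients with the higher limit could pass limit=500 instead.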

Title normalization
Title normalization converts page titles to their canonical form. This means capitalizing the first character, replacing underscores with spaces, and changing namespace to the localized form defined for that wiki. Title normalization is done automatically, regardless of which query modules are used. However, any trailing line breaks in page titles (\n) will cause odd behavior and they should be stripped out first.

Missing and invalid titles
Titles that don't exist or are invalid still appear in the pages section, but they have the missing or invalid attribute set. In output formats that support numeric array keys (such as JSON and PHP serialized), missing and invalid titles will have unique, negative page IDs. Query modules will just ignore missing or invalid titles, as they can't do anything useful with them. Titles in the Special: and Media: namespaces cannot be queried. If any such titles are found in the titles parameter or passed to a module by a generator, a warning will be issued.
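A client usually wants to separate these entries from real pages. The sketch below does so for a decoded JSON result. The sample data is hand-written for illustration, not real API output, and the function names are hypothetical. It accepts both an object keyed by page ID and a plain list of pages.

```python
def is_flagged(page, key):
    """True if the attribute is present and not explicitly false."""
    return page.get(key, False) is not False


def classify_pages(query):
    """Split the pages section into found, missing and invalid titles."""
    pages = query.get("pages", {})
    if isinstance(pages, dict):
        pages = pages.values()
    found, missing, invalid = [], [], []
    for page in pages:
        title = page.get("title")
        if is_flagged(page, "invalid"):
            invalid.append(title)
        elif is_flagged(page, "missing"):
            missing.append(title)
        else:
            found.append(title)
    return found, missing, invalid


# Hand-written sample: negative IDs mark missing and invalid titles.
sample_query = {
    "pages": {
        "-1": {"ns": 0, "title": "No such page", "missing": ""},
        "-2": {"title": "<bad>", "invalid": ""},
        "11": {"pageid": 11, "ns": 0, "title": "PageA"},
    }
}
found, missing, invalid = classify_pages(sample_query)
print(found, missing, invalid)
```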

Resolving redirects
Redirects can be resolved automatically, so that the target of a redirect is returned instead of the given title. When present, redirect entries in the result always contain from and to attributes, and may contain a tofragment attribute for those redirects that point to specific sections.

Both normalization and redirection may take place. In the case of multiple redirects, all redirects will be resolved, and in the case of a circular redirect, there might not be a page in the pages section (see also below). Redirect resolution cannot be used in combination with the revids parameter or with a generator generating revids; doing so will produce a warning and will not resolve redirects for the specified revids.
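To map a title you asked for back to the page that was actually returned, a client can follow these mappings in order. The sketch below assumes the decoded result carries a `normalized` list and a `redirects` list whose entries have the from, to and optional tofragment attributes described above. The section name `normalized` is not stated in this text and is an assumption, and the sample data is hand-written.

```python
def resolve_title(title, query):
    """Follow normalization, then redirects, and return (title, fragment)."""
    normalized = {e["from"]: e["to"] for e in query.get("normalized", [])}
    redirects = {e["from"]: e for e in query.get("redirects", [])}

    title = normalized.get(title, title)
    fragment = None
    seen = set()
    # Follow redirect chains, stopping if a circular redirect is detected.
    while title in redirects and title not in seen:
        seen.add(title)
        entry = redirects[title]
        fragment = entry.get("tofragment")
        title = entry["to"]
    return title, fragment


# Hand-written sample data for illustration only.
sample = {
    "normalized": [{"from": "main_page", "to": "Main page"}],
    "redirects": [{"from": "Main page", "to": "Main Page"}],
}
resolved = resolve_title("main_page", sample)
print(resolved)
```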

The examples below show how the redirects parameter works.

Limits
See here for more information on limits.

Continuing queries
Very often you will not get all the data you want in one API query. When that happens the API result indicates there is more data.

When more data matches the query than fits in one result, the API result includes a continue element. If you want further data, add its values to the original request to get the next set of results. Keep doing this until an API result does not have a continue element, which indicates that no more data matches the query.

Here is Python code showing how to iterate over query results (using the Python requests library). Note that you should not manipulate or depend on any specifics of the values returned inside the continue element, as they may change.
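A minimal sketch of that loop follows. To stay self-contained it uses the standard library's urllib in place of requests, and the endpoint URL, the User-Agent string, and the function names are assumptions made for illustration. The fetch step is a parameter so the loop can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint and placeholder User-Agent; replace both for real use.
API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleQueryScript/0.1 (placeholder contact)"


def fetch_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urlencode(params)
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return json.load(response)


def query(request, fetch=fetch_json):
    """Yield the 'query' section of each result until nothing remains."""
    base = dict(request, action="query", format="json")
    # Start with the empty continue parameter.
    last_continue = {"continue": ""}
    while True:
        # Copy the original request and merge in the last continue values.
        params = dict(base)
        params.update(last_continue)
        result = fetch(params)
        if "error" in result:
            raise RuntimeError(result["error"])
        if "warnings" in result:
            print(result["warnings"])
        if "query" in result:
            yield result["query"]
        if "continue" not in result:
            break
        # Treat the continue values as opaque and pass them back unchanged.
        last_continue = result["continue"]
```

A caller would then write something like `for part in query({"list": "allcategories"}): ...` and process each part as it arrives.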

batchcomplete
When you make an API request using a generator together with properties, the API result may signal to continue because there are more properties to retrieve for the pages so far, because there are more pages from the generator, or both. From version 1.25 onwards, the API returns a batchcomplete element to indicate that all data for the current "batch" of pages has been returned. This can be useful to avoid building a combined result set for thousands of pages when using a generator together with prop modules that may themselves need continuation.
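As an illustration of how a client might use this signal, the sketch below groups a sequence of decoded results into complete batches, so that each batch can be processed and discarded before the next one starts. The function name and the sample results are hypothetical and heavily simplified.

```python
def group_batches(results):
    """Group successive API results into lists, one list per complete batch."""
    pending = []
    for result in results:
        if "query" in result:
            pending.append(result["query"])
        # The presence of batchcomplete closes the current batch.
        if "batchcomplete" in result:
            yield pending
            pending = []
    if pending:
        # Results that ended without a batchcomplete marker.
        yield pending


# Hand-written sample: the first batch needs two requests, the second one.
sample_results = [
    {"query": {"part": 1}, "continue": {"continue": "||"}},
    {"query": {"part": 2}, "batchcomplete": "", "continue": {"continue": "-||"}},
    {"query": {"part": 3}, "batchcomplete": ""},
]
grouped = list(group_batches(sample_results))
print([len(batch) for batch in grouped])
```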

Backwards compatibility of continue
From MediaWiki 1.21 to 1.25, it was required to specify continue= (i.e. with an empty string as the value) in the initial request to get continuation data in the format described above. Without it, API results would indicate that there is additional data by returning a query-continue element, explained in Raw query continue. Prior to 1.21, that raw continuation was the only option.

If your application needs to use the raw continuation in MediaWiki 1.26 or later, you must specify rawcontinue to request it.

Getting a list of page IDs
When not using the new JSON format (formatversion=2), the result page set in JSON is returned as an object keyed by page ID, which can be difficult to iterate over properly in JavaScript. The indexpageids parameter returns these page IDs as an array for easier iteration. Note that the ordering of these page IDs still does not necessarily correspond to the ordering of the input (whether given directly or via a generator).
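The sketch below shows one way a client could walk the page set using such an array. It assumes the array appears under a `pageids` key next to `pages` in the query section, with the IDs given as strings. The sample data is hand-written and the function name is hypothetical.

```python
def iter_pages(query):
    """Yield pages in the order given by the page ID array, if present."""
    pages = query["pages"]
    # Fall back to sorted keys when no page ID array was requested.
    for page_id in query.get("pageids", sorted(pages)):
        yield pages[page_id]


# Hand-written sample data for illustration only.
sample_query = {
    "pageids": ["22", "11"],
    "pages": {
        "11": {"pageid": 11, "title": "PageA"},
        "22": {"pageid": 22, "title": "PageB"},
    },
}
ordered_titles = [page["title"] for page in iter_pages(sample_query)]
print(ordered_titles)
```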

Exporting pages
You can export pages through the API with the export parameter. If the export parameter is set, an XML dump of all pages in the pages element will be added to the result. The export parameter only gives a result when used with specified titles (generator, titles, pageids or revids). Note that the XML dump will be wrapped in the requested format; if that format is XML, characters like < and > will be encoded as entities (&lt; and &gt;). If the exportnowrap parameter is also set, only the XML dump (not wrapped in an API result) will be returned.

See also: Importing pages

Generators
With generators, you can use the output of a list instead of the titles parameter. The output of the list must be a list of pages, whose titles are automatically used instead of the titles, pageids or revids parameter. Other query modules will treat generated pages as if they were given in a parameter. Only one generator is allowed.

Some property modules can also be used as a generator. Unlike list modules, however, you are required to specify the titles, pageids or revids for the generator to work on. For example, if you wanted to load all pages that are linked to from the main page, you would use generator=links&titles=Main%20Page. Other query modules will then ignore the given titles and instead use the titles from the generator.

Parameters passed to a generator must be prefixed with a g. For instance, when using generator=allpages, use gaplimit instead of aplimit.
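The prefix rule is mechanical, so it can be applied in code when turning a list query into a generator query. The helper below is a hypothetical sketch; the parameter names in the example match the allpages examples later on this page.

```python
def as_generator(module, module_params):
    """Rewrite a module's parameters for use with generator=<module>."""
    params = {"generator": module}
    for name, value in module_params.items():
        # Every parameter passed to the generator gets a "g" prefix.
        params["g" + name] = value
    return params


generator_params = as_generator("allpages", {"aplimit": 4, "apfrom": "T"})
print(generator_params)
```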

Note also that generators only pass page titles to the 'real' query and do not output any information themselves. Setting parameters that only control what the generator module itself would output will therefore have no effect.

Generators and redirects
Here, we use prop=links as a generator. This query will get all the links from all the pages that are linked from Title. For this example, assume that Title has links to TitleA and TitleB. TitleB is a redirect to TitleC. TitleA links to TitleA1, TitleA2, and TitleA3; TitleC links to TitleC1 and TitleC2. Redirects are resolved because the redirects parameter is set.

The query will execute the following steps:
 * 1) Resolve redirects for titles in the titles parameter
 * 2) For all the titles in the titles parameter, get the list of pages they link to
 * 3) Resolve redirects in that list
 * 4) Run the prop=links query on that list of titles
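The four steps can be traced with a small offline simulation of the example above. This is not how the API is implemented. It is a sketch that models the links and the redirect from the example as plain dictionaries and returns a flat list, whereas the real API reports links per page.

```python
# Link structure and redirect taken from the example above.
LINKS = {
    "Title": ["TitleA", "TitleB"],
    "TitleA": ["TitleA1", "TitleA2", "TitleA3"],
    "TitleC": ["TitleC1", "TitleC2"],
}
REDIRECTS = {"TitleB": "TitleC"}


def resolve(titles):
    """Replace each redirect title with its target."""
    return [REDIRECTS.get(title, title) for title in titles]


def links_from(titles):
    """Collect the link targets of every given title."""
    return [target for title in titles for target in LINKS.get(title, [])]


def generator_links(titles):
    start = resolve(titles)          # 1) resolve redirects in the input
    generated = links_from(start)    # 2) generator: pages the input links to
    generated = resolve(generated)   # 3) resolve redirects in that list
    return links_from(generated)     # 4) run prop=links on the result


final_links = generator_links(["Title"])
print(final_links)
```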

Generators and continuation
You can continue queries using a generator the same way as other queries. In the first call to the API, the generator will create a batch of titles to work on. Each subsequent continuation will give you only data from that batch until you have all of it, at which point the batchcomplete property will be set. This enables you to process that batch before continuing with the rest of the query, if you wish. The next continuation will then create a new batch from the generator, and so on. If you use rawcontinue, please read API:Raw Query Continue to understand which parameters you have to include in the continuation queries. If instead you use continue, you simply pass all parameters back, as you do for queries without a generator. Please note that for generators used together with a non-query module, the  format will always be used.

More generator examples

 * Show info about 4 pages starting at the letter "T"
 * https://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=4&gapfrom=T&prop=info


 * Show content of first 2 non-redirect pages beginning at "Re"
 * https://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content

Possible warnings

 * No support for special pages has been implemented
 * Thrown if a title in the Special: or Media: namespace is given
 * Redirect resolution cannot be used together with the revids= parameter. Any redirects the revids= point to have not been resolved.
 * Note that this can also be caused by a generator that generates revids