API:Query

From MediaWiki.org
This page is part of the MediaWiki action API documentation.

The action=query module allows you to get information about a wiki and the data stored in it, such as the wikitext of a particular page, the links and categories of a set of pages, or the token you need to change wiki content.

Introduction and guidelines

The query module has many submodules (called query modules), each with a different function. There are three types of query modules:

  • Meta information about the wiki and the logged-in user
  • Properties of pages, including page revisions and content
  • Lists of pages that match certain criteria

You should use multiple query modules together to get what you need in one request, e.g. prop=info|revisions&list=backlinks|embeddedin|allimages&meta=userinfo is a call to six modules in one request.

Unlike meta and list query modules, all property query modules work on a set of pages that you specify using either titles, pageids, revids, or generator parameters. Use one of the first three if you know the pages' titles, page ids, or revision ids. Do not ask for one page at a time – this is very inefficient, and consumes lots of extra resources and bandwidth. Instead you should request information about multiple pages by combining their titles or ids with the '|' pipe symbol: titles=PageA|PageB|PageC.
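As a rough sketch of the advice above, the helper below joins several module names and page titles with the pipe symbol so that they travel in a single request. The function name build_query_params is hypothetical; only the parameter names (prop, list, meta, titles) come from this page.

```python
def build_query_params(titles=(), prop=(), list_=(), meta=()):
    """Build parameters for one action=query request (hypothetical helper)."""
    params = {"action": "query", "format": "json"}
    # Multi-value parameters are joined with the '|' pipe symbol.
    for name, values in (("titles", titles), ("prop", prop),
                         ("list", list_), ("meta", meta)):
        if values:
            params[name] = "|".join(values)
    return params


params = build_query_params(
    titles=["PageA", "PageB", "PageC"],
    prop=["info", "revisions"],
    meta=["userinfo"],
)
print(params["titles"])  # PageA|PageB|PageC
```

Parameters that were not requested are left out entirely, so the same helper works for requests that use only one kind of module.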

Use generator if you want to get data about a set of pages that would be the result of another API call. For example, if you want to get data about pages in a certain category, instead of querying list=categorymembers and then querying again with pageids set to all the returned pages, you should combine the two API calls into one by specifying generator=categorymembers in place of the list parameter. More details are in #Generators below.

If you're querying Wikimedia wikis and requesting results as format=json (or php), then specify formatversion=2. The original result format was designed around XML; the new structure is easier to process (and defaults to utf8). However, it is still subject to change in MediaWiki 1.26.

Lastly, you should always request the new "continue" syntax to iterate over results. To use it, always pass an empty continue= parameter, and check whether the result contains a continue section. If it does, merge its returned values with your original request and call the API again. Repeat until the result no longer has a continue section. More details are in #Continuing queries below.

Sample query

Before we get into the nitty-gritty, here's a useful sample query that simply gets the wiki markup (content) of a page:

api.php?action=query&prop=revisions&rvprop=content&format=jsonfm&titles=Main%20Page

This means fetch (action=query) the content (rvprop=content) of the most recent revision of Main Page (titles=Main%20Page) in JSON with whitespace to make it easier to read (format=jsonfm).
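In a program you would normally ask for format=json rather than jsonfm, which is meant for reading in a browser. The following is a minimal sketch of pulling the wikitext out of a decoded response; it assumes the original JSON layout (no formatversion=2), where pages are keyed by page ID and the revision text sits under the '*' key. The function name and the sample data are illustrative, not real API output.

```python
def extract_wikitext(result):
    """Return {title: wikitext} from a decoded prop=revisions response.

    Assumes the original JSON layout (no formatversion=2), in which pages
    are keyed by page ID and the revision text is stored under '*'.
    """
    contents = {}
    for page in result.get("query", {}).get("pages", {}).values():
        revisions = page.get("revisions")
        if revisions:
            contents[page["title"]] = revisions[0]["*"]
    return contents


# Hand-written stand-in for a decoded response (not real API output).
sample = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "Main Page",
                "revisions": [{"*": "Welcome to the wiki."}],
            }
        }
    }
}
print(extract_wikitext(sample))  # {'Main Page': 'Welcome to the wiki.'}
```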

Alternatively, you can use action=raw as a parameter to index.php to get the content of a page: index.php?title=Main%20Page&action=raw

Specifying pages

You can specify pages in the following ways:

  • By name using the titles parameter, e.g. titles=Foo|Bar|Main_Page
  • By page ID using the pageids parameter, e.g. pageids=123|456|75915
  • By revision ID using the revids parameter, e.g. revids=478198|54872|54894545
    • Most query modules will convert revision ID to the corresponding page ID. Only prop=revisions actually uses the revision ID itself.
  • Using a generator

Specifying pages through the query string (through titles, pageids or revids) is limited to 50 values per query (or 500 for those with the apihighlimits right, usually bots and sysops).
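If you have more pages than the limit allows, one option is to split them into batches and send one request per batch. A minimal sketch, assuming the default limit of 50 (the helper name chunked is hypothetical):

```python
def chunked(values, size=50):
    """Yield successive lists of at most `size` items."""
    values = list(values)
    for start in range(0, len(values), size):
        yield values[start:start + size]


titles = ["Page %d" % n for n in range(120)]
# One pipe-joined 'titles' value per request.
batches = ["|".join(batch) for batch in chunked(titles)]
print(len(batches))  # 3
```

Clients with the apihighlimits right could pass size=500 instead.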

Title normalization

Title normalization converts page titles to their canonical form. This means capitalizing the first character, replacing underscores with spaces, and changing namespace to the localized form defined for that wiki. Title normalization is done automatically, regardless of which query modules are used. However, any trailing line breaks in page titles (\n) will cause odd behavior and they should be stripped out first.
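Because normalization happens on the server, the only clean-up you need to do yourself is removing stray line breaks. The sketch below shows one way to do that for titles read from a text file; the helper name clean_titles is hypothetical, and capitalization and namespace handling are deliberately left to the API.

```python
def clean_titles(raw_titles):
    """Strip surrounding whitespace and line breaks; drop empty entries."""
    cleaned = []
    for title in raw_titles:
        title = title.strip()
        if title:
            cleaned.append(title)
    return cleaned


# Titles read from a text file often keep their trailing "\n".
lines = ["Main_Page\n", "project:sandbox\r\n", "\n"]
print(clean_titles(lines))  # ['Main_Page', 'project:sandbox']
```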

Capitalization, localization, "_" → " " (space), "Project" → "Wikipedia", ...

Missing and invalid titles

Titles that don't exist or are invalid still appear in the <pages> section, but they have the missing="" or invalid="" attribute set. In output formats that support numeric array keys (such as JSON and PHP serialized), missing and invalid titles will have unique, negative page IDs. Query modules will just ignore missing or invalid titles, as they can't do anything useful with them. The titles in the Special: and Media: namespaces cannot be queried. If any such titles are found in the titles= parameter or passed to a module by a generator, a warning will be issued.
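A client usually wants to separate these cases before processing. The following sketch sorts the pages of a decoded JSON response into existing, missing and invalid titles. It assumes the original JSON layout, where the missing and invalid attributes appear as keys with empty-string values; the function name and sample data are hand-written for illustration.

```python
def partition_pages(result):
    """Split the pages of a decoded response into existing, missing, invalid.

    Assumes the original JSON layout, where pages are keyed by page ID and
    the 'missing' / 'invalid' flags appear as keys with empty-string values.
    """
    existing, missing, invalid = [], [], []
    for page in result["query"]["pages"].values():
        if "invalid" in page:
            invalid.append(page.get("title"))
        elif "missing" in page:
            missing.append(page["title"])
        else:
            existing.append(page["title"])
    return existing, missing, invalid


# Hand-written stand-in for a decoded response (not real API output).
sample = {
    "query": {
        "pages": {
            "-1": {"ns": 0, "title": "Doesntexist", "missing": ""},
            "-2": {"title": "Talk:", "invalid": ""},
            "15580374": {"pageid": 15580374, "ns": 0, "title": "Main Page"},
        }
    }
}
print(partition_pages(sample))
# (['Main Page'], ['Doesntexist'], ['Talk:'])
```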

A missing title, an invalid one and an existing one in JSON

Resolving redirects

Redirects can be resolved automatically by setting the redirects parameter, so that the target of a redirect is returned instead of the given title. Resolved redirects are reported in the result; each entry always contains from and to attributes and may contain a tofragment attribute for redirects that point to a specific section.

Both normalization and redirection may take place. In the case of multiple redirects, all redirects will be resolved, and in case of a circular redirect, there might not be a page in the 'pages' section (see also below). Redirect resolution cannot be used in combination with the revids= parameter or with a generator generating revids; doing that will produce a warning and will not resolve redirects for the specified revids.
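To find out which page a requested title ended up as, you can apply the reported mappings in order: normalization first, then redirects. The sketch below assumes the decoded JSON result carries these mappings as lists of from/to entries under the keys normalized and redirects; the function name and the sample data are illustrative only.

```python
def final_titles(result, requested):
    """Map each requested title to the title it ends up as in 'pages'.

    Applies the 'normalized' mapping first, then follows the 'redirects'
    mapping, giving None if a title repeats (circular redirect).
    """
    query = result.get("query", {})
    normalized = {n["from"]: n["to"] for n in query.get("normalized", [])}
    redirects = {r["from"]: r["to"] for r in query.get("redirects", [])}
    resolved = {}
    for title in requested:
        current = normalized.get(title, title)
        seen = {current}
        while current in redirects:
            current = redirects[current]
            if current in seen:  # circular redirect: no final target
                current = None
                break
            seen.add(current)
        resolved[title] = current
    return resolved


# Hand-written stand-in for a decoded response (not real API output).
sample = {
    "query": {
        "normalized": [{"from": "page1", "to": "Page1"}],
        "redirects": [
            {"from": "Main page", "to": "Main Page"},
            {"from": "Page1", "to": "Page2"},
            {"from": "Page2", "to": "Page3"},
            {"from": "Page3", "to": "Page1"},
        ],
    }
}
print(final_titles(sample, ["Main page", "page1"]))
# {'Main page': 'Main Page', 'page1': None}
```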

The examples below show how the redirects parameter works.

Using "redirects" parameter. "Main page" is a redirect to "Main Page"

Same request but without the "redirects" parameter.

Without "redirects" you may want to use prop=info to obtain redirect status.

Request with a section link. "Wikipedia:!--" is a redirect to "Wikipedia:Manual of Style#Invisible comments"

Here is a case of a circular redirect: Page1 → Page2 → Page3 → Page1. Also, in this example a non-normalized name 'page1' is used.

Limits

Most list queries return 10 items by default. See here for more information on limits.

Continuing queries

MediaWiki version: 1.26

Very often you will not get all the data you want in one API query. When that happens the API result indicates there is more data.

Example of a query needing continuation

Because there are more data matching the query, the API result includes a continue element. If you want further data, you would add its values (in the example, continue=-|| and accontinue=List_of_19th_century_baseball_players) to the original request to get the next set of results. You continue to do this until an API result does not have a continue element, indicating there are no more data matching the query.

Here is Python code showing how to iterate over query results (using the Python requests library). Note that you should not manipulate or depend on any specifics of the values returned inside the continue element, as they may change.

import requests


def query(request):
    """Yield the 'query' section of every result, following continuation."""
    # Work on a copy so the caller's dictionary is not modified.
    request = dict(request)
    request['action'] = 'query'
    request['format'] = 'json'
    last_continue = {}
    while True:
        # Clone the original request.
        req = request.copy()
        # Add the values returned in the 'continue' section of the last result.
        req.update(last_continue)
        # Call the API.
        result = requests.get('https://en.wikipedia.org/w/api.php', params=req).json()
        if 'error' in result:
            raise RuntimeError(result['error'])
        if 'warnings' in result:
            print(result['warnings'])
        if 'query' in result:
            yield result['query']
        if 'continue' not in result:
            break
        last_continue = result['continue']


for result in query({'generator': 'allpages', 'prop': 'links'}):
    pass  # process result data

batchcomplete

When you make an API request using a generator together with properties, the API result may signal to continue because there are more properties to retrieve for the pages so far, or because there are more pages from the generator, or both. From version 1.25 onwards, the API returns a batchcomplete element to indicate that all data for the current "batch" of pages has been returned. This can be useful to avoid building a combined result set for thousands of pages when using a generator together with prop modules that may themselves need continuation.
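One way to use batchcomplete is to merge partial results until the element appears, process that batch, and then start afresh. The sketch below works on already-decoded responses so that it can be tried without a network connection. The function name, the merge strategy and the sample responses (including the placeholder continue values) are assumptions for illustration, using the original JSON layout where pages are keyed by page ID.

```python
def iter_batches(results):
    """Group raw API results into complete batches of pages.

    `results` is any iterable of decoded responses from successive
    continuation requests. Pages are merged by page ID until a response
    carries the batchcomplete element, then the merged batch is yielded.
    """
    batch = {}
    for result in results:
        pages = result.get("query", {}).get("pages", {})
        for page_id, page in pages.items():
            merged = batch.setdefault(page_id, {})
            for key, value in page.items():
                if isinstance(value, list):
                    # Property lists (links, categories, ...) arrive in pieces.
                    merged.setdefault(key, []).extend(value)
                else:
                    merged[key] = value
        if "batchcomplete" in result:
            yield batch
            batch = {}
    if batch:
        yield batch  # trailing data that never saw batchcomplete


# Hand-written stand-ins for three successive responses (not real output).
responses = [
    {"continue": {"plcontinue": "placeholder"},
     "query": {"pages": {"1": {"title": "A", "links": [{"title": "A1"}]}}}},
    {"batchcomplete": "", "continue": {"gapcontinue": "placeholder"},
     "query": {"pages": {"1": {"title": "A", "links": [{"title": "A2"}]}}}},
    {"batchcomplete": "",
     "query": {"pages": {"2": {"title": "B", "links": [{"title": "B1"}]}}}},
]
for batch in iter_batches(responses):
    for page in batch.values():
        print(page["title"], [link["title"] for link in page["links"]])
# A ['A1', 'A2']
# B ['B1']
```

Processing one batch at a time keeps memory use bounded by the size of a generator batch rather than by the size of the whole result set.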

Backwards compatibility of continue

MediaWiki version: 1.9

From MediaWiki 1.21 to 1.25, it was required to specify continue= (i.e. with an empty string as the value) in the initial request to get continuation data in the format described above. Without doing that, API results would indicate there is additional data by returning a query-continue element, explained in Raw query continue. Prior to 1.21, that raw continuation was the only option.

If your application needs to use the raw continuation in MediaWiki 1.26 or later, you must specify rawcontinue= to request it.

Getting a list of page IDs

When not using the new JSON formatversion=2, the result page set in JSON is returned as an object keyed by page ID, which can be difficult to properly iterate over in JavaScript. The indexpageids parameter returns these page IDs as an array for easier iteration. Note that the ordering of these page IDs still does not necessarily correspond to the ordering of the input (whether directly or via a generator).

Getting a list of all page IDs
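The same idea can be sketched in Python: walk the pageids array and look each ID up in the pages object. This assumes the request set indexpageids and used the original JSON layout; the function name and the sample data are illustrative, not real API output.

```python
def pages_in_listed_order(result):
    """Return page objects in the order given by the 'pageids' array.

    Assumes the request included indexpageids and used the original JSON
    layout, where 'pages' is an object keyed by page ID strings.
    """
    query = result["query"]
    return [query["pages"][page_id] for page_id in query["pageids"]]


# Hand-written stand-in for a decoded response (not real API output).
sample = {
    "query": {
        "pageids": ["736", "15580374"],
        "pages": {
            "15580374": {"pageid": 15580374, "title": "Main Page"},
            "736": {"pageid": 736, "title": "Albert Einstein"},
        },
    }
}
print([page["title"] for page in pages_in_listed_order(sample)])
# ['Albert Einstein', 'Main Page']
```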

Exporting pages

You can export pages through the API with the export parameter. If the export parameter is set, an XML dump of all pages in the <pages> element will be added to the result. The export parameter only gives a result when pages are specified (through a generator, titles, pageids or revids). Note that the XML dump will be wrapped in the requested format; if that format is XML, characters like < and > will be encoded as entities (&lt; and &gt;). If the exportnowrap parameter is also set, only the XML dump (not wrapped in an API result) will be returned.


Exporting the contents of API

Exporting all templates used in API

See also: Importing pages

Generators

With generators, you can use the output of a list instead of the titles parameter. The output of the list must be a list of pages, whose titles are automatically used instead of the titles, pageids or revids parameter. Other query modules will treat generated pages as if they were given in a parameter. Only one generator is allowed.

Some property modules can also be used as a generator. Unlike list modules, however, you are required to specify the titles, pageids or revids for the generator to work on. For example, if you wanted to load all pages that are linked to from the main page, you would use generator=links&titles=Main%20Page. Other query modules will then ignore the given titles and instead use the titles from the generator.

Parameters passed to a generator must be prefixed with a g. For instance, when using generator=backlinks, use gbltitle instead of bltitle.

It should also be noted that generators only pass page titles to the 'real' query, and do not output any information themselves. Setting parameters like gcmprop will therefore have no effect.
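The prefixing rule is mechanical, so it is easy to automate. A minimal sketch, assuming you already have the parameters you would pass to the module when used as a list or property (the helper name as_generator is hypothetical):

```python
def as_generator(module, params):
    """Turn parameters for a list/prop module into generator form.

    `module` is the module name (e.g. 'backlinks'); `params` holds that
    module's own parameters (e.g. {'bltitle': 'Main Page'}).
    """
    generator_params = {"generator": module}
    for name, value in params.items():
        # Every generator parameter gets a 'g' prefix.
        generator_params["g" + name] = value
    return generator_params


print(as_generator("backlinks", {"bltitle": "Main Page", "bllimit": 5}))
# {'generator': 'backlinks', 'gbltitle': 'Main Page', 'gbllimit': 5}
```

Parameters of the other modules in the same request (for example rvprop for prop=revisions) keep their normal names.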

Using list=allpages as generator

Get links and categories for the first three pages in the main namespace starting with "Ba"

Generators and redirects

Here, we use prop=links as a generator. This query will get all the links from all the pages that are linked from Title. For this example, assume that Title has links to TitleA and TitleB. TitleB is a redirect to TitleC. TitleA links to TitleA1, TitleA2 and TitleA3; TitleC links to TitleC1 and TitleC2. Redirects are resolved because the redirects parameter is set.

The query will execute the following steps:

  1. Resolve redirects for titles in the titles parameter
  2. For all the titles in the titles parameter, get the list of pages they link to
  3. Resolve redirects in that list
  4. Run the prop=links query on that list of titles
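The four steps can be traced locally with toy data. The sketch below is only a simulation of the order of operations, using the page names from the example above; the dictionaries and function names are made up for illustration and nothing is fetched from a wiki.

```python
# Toy data mirroring the example in the text (not fetched from a wiki).
LINKS = {
    "Title": ["TitleA", "TitleB"],
    "TitleA": ["TitleA1", "TitleA2", "TitleA3"],
    "TitleC": ["TitleC1", "TitleC2"],
}
REDIRECTS = {"TitleB": "TitleC"}


def resolve(titles):
    """Replace each redirect with its target, keeping order, dropping repeats."""
    resolved = []
    for title in titles:
        target = REDIRECTS.get(title, title)
        if target not in resolved:
            resolved.append(target)
    return resolved


def links_via_generator(titles):
    """Simulate generator=links with prop=links and redirects set."""
    start = resolve(titles)                                   # step 1
    generated = [link for title in start
                 for link in LINKS.get(title, [])]            # step 2
    generated = resolve(generated)                            # step 3
    return {title: LINKS.get(title, []) for title in generated}  # step 4


print(links_via_generator(["Title"]))
# {'TitleA': ['TitleA1', 'TitleA2', 'TitleA3'], 'TitleC': ['TitleC1', 'TitleC2']}
```

Note that TitleB does not appear in the result: it was replaced by its target TitleC in step 3, before the links were looked up.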

Using redirect resolution with generators

Generators and continuation

You can continue queries using a generator the same way as other queries. In the first call to the API, the generator will create a batch of titles to work on. Each subsequent continuation will give you only data from that batch until you have all of it, at which point the batchcomplete property will be set. This enables you to process that batch before continuing with the rest of the query, if you wish. The next continuation will then create a new batch from the generator and so on. If you use rawcontinue, please read API:Raw Query Continue to understand which parameters you have to include in the continuation queries. If instead you use continue, you simply pass all parameters back, as you do for queries without a generator. Please note that for generators used together with a non-query module, the continue format will always be used.

More generator examples

Show info about 4 pages starting at the letter "T"
https://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=4&gapfrom=T&prop=info
Show content of first 2 non-redirect pages beginning at "Re"
https://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=2&gapfilterredir=nonredirects&gapfrom=Re&prop=revisions&rvprop=content
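If you build such URLs in code, the standard library can do the encoding for you. The sketch below reproduces the two example URLs above; the helper name api_url is hypothetical, and the endpoint is the one used in the examples.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def api_url(**params):
    """Return a full action=query URL for the given parameters."""
    query = {"action": "query"}
    query.update(params)
    return API_URL + "?" + urlencode(query)


# Info about 4 pages starting at the letter "T".
print(api_url(generator="allpages", gaplimit=4, gapfrom="T", prop="info"))

# Content of the first 2 non-redirect pages beginning at "Re".
print(api_url(generator="allpages", gaplimit=2,
              gapfilterredir="nonredirects", gapfrom="Re",
              prop="revisions", rvprop="content"))
```

urlencode percent-encodes characters such as spaces and the pipe symbol, so multi-value parameters like prop="info|revisions" can be passed as plain strings.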

Page types

Page type              Example                Used in the given page(s)  Which pages have it   List all in the wiki
Page link              [[Page]]               prop=links                 list=backlinks        list=alllinks
Template transclusion  {{Template}}           prop=templates             list=embeddedin       list=alltransclusions
Categories             [[category:Cat]]       prop=categories            list=categorymembers  list=allcategories
Images                 [[file:image.png]]     prop=images                list=imageusage       list=allimages
Language links         [[ru:Page]]            prop=langlinks             list=langbacklinks
Interwiki links        [[meta:Page]]          prop=iwlinks               list=iwbacklinks
URLs                   https://mediawiki.org  prop=extlinks              list=exturlusage

Possible warnings

  • No support for special pages has been implemented
    • Thrown if a title in the Special: or Media: namespace is given
  • Redirect resolution cannot be used together with the revids= parameter. Any redirects the revids= point to have not been resolved.
    • Note that this can also be caused by a generator that generates revids




action=query

  • This module requires read rights.
  • Source: MediaWiki
  • License: GPL-2.0+

Fetch data from and about MediaWiki.

All data modifications will first have to use query to acquire a token to prevent abuse from malicious sites.

Parameters:
prop

Which properties to get for the queried pages.

categories
List all categories the pages belong to.
categoryinfo
Returns information about the given categories.
contributors
Get the list of logged-in contributors and the count of anonymous contributors to a page.
deletedrevisions
Get deleted revision information.
duplicatefiles
List all files that are duplicates of the given files based on hash values.
extlinks
Returns all external URLs (not interwikis) from the given pages.
extracts
Returns plain-text or limited HTML extracts of the given pages.
fileusage
Find all pages that use the given files.
globalusage
Returns global image usage for a certain image.
imageinfo
Returns file information and upload history.
images
Returns all files contained on the given pages.
info
Get basic page information.
iwlinks
Returns all interwiki links from the given pages.
langlinks
Returns all interlanguage links from the given pages.
links
Returns all links from the given pages.
linkshere
Find all pages that link to the given pages.
mapdata
Returns all map data from the given pages.
pageimages
Returns information about images on the page, such as thumbnail and presence of photos.
pageprops
Get various page properties defined in the page content.
pageterms
Get the Wikidata terms (typically labels, descriptions and aliases) associated with a page via a sitelink. On the entity page itself, the terms are used directly. Caveat: On a repo wiki, this module only works directly on entity pages, not on pages connected to an entity via a sitelink. This may change in the future.
pageviews
Shows per-page pageview data (the number of daily pageviews for each of the last pvipdays days).
redirects
Returns all redirects to the given pages.
references
Return a data representation of references associated with the given pages.
revisions
Get revision information.
stashimageinfo
Returns file information for stashed files.
templates
Returns all pages transcluded on the given pages.
transcludedin
Find all pages that transclude the given pages.
transcodestatus
Get transcode status for a given file page.
videoinfo
Extends imageinfo to include video source (derivatives) information
wbentityusage
Returns all entity IDs used in the given pages.
flowinfo
Deprecated. Get basic Flow information about a page.
Values (separate with | or alternative): categories, categoryinfo, contributors, deletedrevisions, duplicatefiles, extlinks, extracts, fileusage, globalusage, imageinfo, images, info, iwlinks, langlinks, links, linkshere, mapdata, pageimages, pageprops, pageterms, pageviews, redirects, references, revisions, stashimageinfo, templates, transcludedin, transcodestatus, videoinfo, wbentityusage, flowinfo
list

Which lists to get.

abusefilters
Show details of the abuse filters.
abuselog
Show events that were caught by one of the abuse filters.
allcategories
Enumerate all categories.
alldeletedrevisions
List all deleted revisions by a user or in a namespace.
allfileusages
List all file usages, including non-existing.
allimages
Enumerate all images sequentially.
alllinks
Enumerate all links that point to a given namespace.
allpages
Enumerate all pages sequentially in a given namespace.
allredirects
List all redirects to a namespace.
allrevisions
List all revisions.
alltransclusions
List all transclusions (pages embedded using {{x}}), including non-existing.
allusers
Enumerate all registered users.
backlinks
Find all pages that link to the given page.
betafeatures
List all BetaFeatures
blocks
List all blocked users and IP addresses.
categorymembers
List all pages in a given category.
centralnoticelogs
Get a log of campaign configuration changes.
checkuser
Check which IP addresses are used by a given username or which usernames are used by a given IP address.
checkuserlog
Get entries from the CheckUser log.
codecomments
List comments on revisions in CodeReview.
codepaths
Get a list of 10 paths in a given repository, based on the input path prefix.
coderevisions
List revisions in CodeReview.
codetags
Get a list of tags applied to revisions in a given repository.
embeddedin
Find all pages that embed (transclude) the given title.
extdistbranches
Returns the list of branches for a repository supported by ExtensionDistributor
extdistrepos
Returns the list of repositories supported by ExtensionDistributor
exturlusage
Enumerate pages that contain a given URL.
filearchive
Enumerate all deleted files sequentially.
gadgetcategories
Returns a list of gadget categories.
gadgets
Returns a list of gadgets used on this wiki.
globalallusers
Enumerate all global users.
globalblocks
List all globally blocked IP addresses.
globalgroups
Enumerate all global groups.
imageusage
Find all pages that use the given image title.
iwbacklinks
Find all pages that link to the given interwiki link.
langbacklinks
Find all pages that link to the given language link.
linterrors
Get a list of lint errors
logevents
Get events from logs.
messagecollection
Query MessageCollection about translations.
mmsites
Serve autocomplete requests for the site field in MassMessage.
mostviewed
Lists the most viewed pages (based on last day's pageview count).
mystashedfiles
Get a list of files in the current user's upload stash.
pagepropnames
List all page property names in use on the wiki.
pageswithprop
List all pages using a given page property.
prefixsearch
Perform a prefix search for page titles.
protectedtitles
List all titles protected from creation.
querypage
Get a list provided by a QueryPage-based special page.
random
Get a set of random pages.
recentchanges
Enumerate recent changes.
search
Perform a full text search.
tags
List change tags.
threads
Show details of LiquidThreads threads.
usercontribs
Get all edits by a user.
users
Get information about a list of users.
watchlist
Get recent changes to pages in the current user's watchlist.
watchlistraw
Get all pages on the current user's watchlist.
wblistentityusage
Returns all pages that use the given entity IDs.
wikisets
Enumerate all wiki sets.
deletedrevs
Deprecated. List deleted revisions.
Values (separate with | or alternative): abusefilters, abuselog, allcategories, alldeletedrevisions, allfileusages, allimages, alllinks, allpages, allredirects, allrevisions, alltransclusions, allusers, backlinks, betafeatures, blocks, categorymembers, centralnoticelogs, checkuser, checkuserlog, codecomments, codepaths, coderevisions, codetags, embeddedin, extdistbranches, extdistrepos, exturlusage, filearchive, gadgetcategories, gadgets, globalallusers, globalblocks, globalgroups, imageusage, iwbacklinks, langbacklinks, linterrors, logevents, messagecollection, mmsites, mostviewed, mystashedfiles, pagepropnames, pageswithprop, prefixsearch, protectedtitles, querypage, random, recentchanges, search, tags, threads, usercontribs, users, watchlist, watchlistraw, wblistentityusage, wikisets, deletedrevs
Maximum number of values is 50 (500 for bots).
meta

Which metadata to get.

allmessages
Return messages from this site.
authmanagerinfo
Retrieve information about the current authentication status.
babel
Get information about what languages the user knows
featureusage
Get a summary of logged API feature usages for a user agent.
filerepoinfo
Return meta information about image repositories configured on the wiki.
globalrenamestatus
Show information about global renames that are in progress.
globaluserinfo
Show information about a global user.
languagestats
Query language stats.
linterstats
Get number of lint errors in each category
messagegroups
Return information about message groups.
messagegroupstats
Query message group stats.
messagetranslations
Query all translations for a single message.
notifications
Get notifications waiting for the current user.
oath
Check to see if two-factor authentication (OATH) is enabled for a user.
siteinfo
Return general information about the site.
siteviews
Shows sitewide pageview data (daily pageview totals for each of the last pvisdays days).
tokens
Gets tokens for data-modifying actions.
unreadnotificationpages
Get pages for which there are unread notifications for the current user.
userinfo
Get information about the current user.
wikibase
Get information about the Wikibase client and the associated Wikibase repository.
Values (separate with | or alternative): allmessages, authmanagerinfo, babel, featureusage, filerepoinfo, globalrenamestatus, globaluserinfo, languagestats, linterstats, messagegroups, messagegroupstats, messagetranslations, notifications, oath, siteinfo, siteviews, tokens, unreadnotificationpages, userinfo, wikibase
indexpageids

Include an additional pageids section listing all returned page IDs.

Type: boolean
export

Export the current revisions of all given or generated pages.

Type: boolean
exportnowrap

Return the export XML without wrapping it in an XML result (same format as Special:Export). Can only be used with query+export.

Type: boolean
iwurl

Whether to get the full URL if the title is an interwiki link.

Type: boolean
continue

When more results are available, use this to continue.

rawcontinue

Return raw query-continue data for continuation.

Type: boolean
titles

A list of titles to work on.

Separate values with | or alternative. Maximum number of values is 50 (500 for bots).
pageids

A list of page IDs to work on.

Type: list of integers
Separate values with | or alternative. Maximum number of values is 50 (500 for bots).
revids

A list of revision IDs to work on.

Type: list of integers
Separate values with | or alternative. Maximum number of values is 50 (500 for bots).
generator

Get the list of pages to work on by executing the specified query module.

Note: Generator parameter names must be prefixed with a "g", see examples.

allcategories
Enumerate all categories.
alldeletedrevisions
List all deleted revisions by a user or in a namespace.
allfileusages
List all file usages, including non-existing.
allimages
Enumerate all images sequentially.
alllinks
Enumerate all links that point to a given namespace.
allpages
Enumerate all pages sequentially in a given namespace.
allredirects
List all redirects to a namespace.
allrevisions
List all revisions.
alltransclusions
List all transclusions (pages embedded using {{x}}), including non-existing.
backlinks
Find all pages that link to the given page.
categories
List all categories the pages belong to.
categorymembers
List all pages in a given category.
deletedrevisions
Get deleted revision information.
duplicatefiles
List all files that are duplicates of the given files based on hash values.
embeddedin
Find all pages that embed (transclude) the given title.
exturlusage
Enumerate pages that contain a given URL.
fileusage
Find all pages that use the given files.
images
Returns all files contained on the given pages.
imageusage
Find all pages that use the given image title.
iwbacklinks
Find all pages that link to the given interwiki link.
langbacklinks
Find all pages that link to the given language link.
links
Returns all links from the given pages.
linkshere
Find all pages that link to the given pages.
messagecollection
Query MessageCollection about translations.
mostviewed
Lists the most viewed pages (based on last day's pageview count).
pageswithprop
List all pages using a given page property.
prefixsearch
Perform a prefix search for page titles.
protectedtitles
List all titles protected from creation.
querypage
Get a list provided by a QueryPage-based special page.
random
Get a set of random pages.
recentchanges
Enumerate recent changes.
redirects
Returns all redirects to the given pages.
revisions
Get revision information.
search
Perform a full text search.
templates
Returns all pages transcluded on the given pages.
transcludedin
Find all pages that transclude the given pages.
watchlist
Get recent changes to pages in the current user's watchlist.
watchlistraw
Get all pages on the current user's watchlist.
wblistentityusage
Returns all pages that use the given entity IDs.
One of the following values: allcategories, alldeletedrevisions, allfileusages, allimages, alllinks, allpages, allredirects, allrevisions, alltransclusions, backlinks, categories, categorymembers, deletedrevisions, duplicatefiles, embeddedin, exturlusage, fileusage, images, imageusage, iwbacklinks, langbacklinks, links, linkshere, messagecollection, mostviewed, pageswithprop, prefixsearch, protectedtitles, querypage, random, recentchanges, redirects, revisions, search, templates, transcludedin, watchlist, watchlistraw, wblistentityusage
redirects

Automatically resolve redirects in query+titles, query+pageids, and query+revids, and in pages returned by query+generator.

Type: boolean
converttitles

Convert titles to other variants if necessary. Only works if the wiki's content language supports variant conversion. Languages that support variant conversion include en, gan, iu, kk, ku, shi, sr, tg, uz and zh.

Type: boolean