API talk:Query

From MediaWiki.org

Is it possible to query for pages that contain a template?

I am trying (and failing) to formulate a query that will return pages that contain Template:Persondata. Does anyone have an idea how this can be achieved?

first pages of WP have a swear word

I visited http://en.wikipedia.org/w/api.php?action=query&generator=allpages&gaplimit=4 and was greeted with a swear word. Jidanni 20:26, 19 October 2007 (UTC)

wikipedia:!!!Fuck You!!! is about a music album. This is just another Wikipedia article, nothing the API can do about that. --Catrope 20:39, 19 October 2007 (UTC)

prop=info

I think prop=info should be documented here. Can anyone fix the problem? --87.6.112.238 16:39, 1 December 2007 (UTC)

It's documented here, and also linked from this page ("Page information"). --Catrope 14:43, 3 December 2007 (UTC)

Suggestion: start and limit versus start and until

Having to specify a starting name or title plus a limit means that, whenever the number of items returned does not meet the caller's needs, yet another query has to be started from the last name returned by the previous one.

To make programming easier, we could also allow callers to give the name or title of the last item wanted. This can sometimes, but not always, be achieved using "prefixindex". --Purodha Blissenbach 14:14, 11 March 2008 (UTC)
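For illustration, the client-side loop that this suggestion wants to avoid can be sketched as follows: keep issuing start+limit requests and stop once the wanted last title has been passed. `fetch_batch` is a hypothetical stand-in for one API request (it slices a sorted in-memory list), and plain Python string comparison stands in for the wiki's own sort order; neither is a MediaWiki function.

```python
from bisect import bisect_left
from typing import Callable, List


def fetch_batch(titles: List[str], start: str, limit: int) -> List[str]:
    """Hypothetical stand-in for one API request: up to `limit` titles >= `start`.

    `titles` must be sorted; a real client would send start/limit to the wiki.
    """
    i = bisect_left(titles, start)
    return titles[i:i + limit]


def titles_between(fetch: Callable[[str, int], List[str]],
                   start: str, until: str, limit: int) -> List[str]:
    """Emulate 'start and until' on top of 'start and limit' requests."""
    if limit < 1:
        raise ValueError("limit must be positive")
    result: List[str] = []
    while True:
        batch = fetch(start, limit)
        for title in batch:
            if title > until:       # past the last wanted title: stop
                return result
            result.append(title)
        if len(batch) < limit:      # short batch: nothing more to fetch
            return result
        # Resume just after the last title seen ("\0" sorts right after it).
        start = batch[-1] + "\0"
```

Every extra round trip in this loop is what an "until" parameter would save.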

Is there a non-obsolete alternative?

Suggestion: mention in the article that "api.php is a non-obsolete alternative to query.php".

I don't think that's necessary. You're supposed to get here through API, where that's said already. --Catrope 20:01, 28 March 2008 (UTC)

limit = max?

Hi, would it be possible to have a special "max" value for limit parameters? It would simply tell the API to use the maximum value allowed for the user. This would be useful, for example, for tools that can be run by regular users, bots, or sysops. --NicoV 18:26, 7 July 2008 (UTC)

That feature has existed for ages. It could be that it's not documented here, I'll check. --Catrope 20:37, 7 July 2008 (UTC)
Oops, thanks for pointing that out and adding a link to the other documentation :) --NicoV 18:56, 8 July 2008 (UTC)

Limits confusion

There seem to be at least two kinds of limits which this documentation is not clear about.

It seems the former does not have features such as "max", the "limits" element, or "query-continue".

Can this be clarified by somebody who knows? In particular, I'd like my code to know deterministically how many titles it can query on each site, regardless of whether it is run as a bot or as a normal user. — Hippietrail 16:35, 2 August 2008 (UTC)

You're right that there's no query-continue for the limit on titles= (a limit that applies to all multivalue parameters, i.e. parameters that allow multiple values separated by the | character); excess titles are simply ignored. To determine which limit applies to you, run a userinfo query to see whether you have the apihighlimits right. --Catrope 22:10, 3 August 2008 (UTC)
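Catrope's advice can be sketched as a small helper that reads a meta=userinfo response (requested with uiprop=rights) and picks a batch size for titles=. The numbers 50 and 500 are assumptions here (the usual normal and apihighlimits values for multivalue parameters as far as I know); the limit for each parameter is also stated in the wiki's own api.php help.

```python
from typing import List
from urllib.parse import urlencode

# Assumed values: 50 values per multivalue parameter normally, 500 with apihighlimits.
NORMAL_MULTIVALUE_LIMIT = 50
HIGH_MULTIVALUE_LIMIT = 500


def userinfo_url(api: str) -> str:
    """Build a meta=userinfo request that lists the current user's rights."""
    params = {"action": "query", "meta": "userinfo",
              "uiprop": "rights", "format": "json"}
    return api + "?" + urlencode(params)


def titles_per_request(userinfo_response: dict) -> int:
    """Pick the titles= batch size from a parsed userinfo JSON response."""
    rights = (userinfo_response.get("query", {})
              .get("userinfo", {})
              .get("rights", []))
    if "apihighlimits" in rights:
        return HIGH_MULTIVALUE_LIMIT
    return NORMAL_MULTIVALUE_LIMIT


def title_batches(titles: List[str], size: int) -> List[str]:
    """Split titles into '|'-joined values that fit the multivalue limit."""
    return ["|".join(titles[i:i + size]) for i in range(0, len(titles), size)]
```

Because excess titles are silently ignored, batching on the client side is the only way to be sure every title is queried.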

Tokens cannot be obtained through JSON Callback mode

I understand why this limitation exists, but, please, it would be nice if it was written in the documentation! I lost a day of experimentation for my degree thesis with various communication methods, only to find out (from examining the source) that tokens cannot be obtained when there is a JSON callback. MediaWiki always returned the

"warnings":{"info":{"*":"Unrecognized value for parameter 'intoken': edit"}}

error. I hope that this discussion will help others googling this to find info on the issue. --Bobo italy 11:04, 24 February 2009 (UTC)

Added here. --Catrope 19:43, 24 February 2009 (UTC)

continue and Generators

I am missing information about how to continue a query when I use a generator. It is not right to add both continue parameters to the query URL. I must first add the continue parameter from the non-generator module, and once only one continue is left, I can use that. Is that right? And why can the continue element appear at the top or at the end of the query result? Thanks for the information 80.143.85.173 14:00, 23 May 2009 (UTC)

You're right, you should first continue the 'regular' module, then the generator; I'll add this information. The query-continue element sometimes being on top and sometimes on the bottom may be weird, but not very relevant. --Catrope 21:54, 23 May 2009 (UTC)
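The rule described here ("continue the regular module first, then the generator") can be sketched for the old query-continue format as below. The function and parameter names are illustrative, not part of MediaWiki; the caller is assumed to know which module it used as the generator. Newer MediaWiki versions also offer a simpler continue element, so this applies mainly to the legacy format discussed in this thread.

```python
from typing import Dict, Optional, Tuple

Params = Dict[str, str]


def next_request(base: Params, gen_cont: Params, query_continue: Dict[str, Params],
                 generator: str) -> Optional[Tuple[Params, Params]]:
    """Compute the next request from an old-style "query-continue" element.

    base:           the original request parameters, without continue values
    gen_cont:       generator continue parameters used in the request just made
    query_continue: the response's "query-continue" element, keyed by module
    generator:      name of the module used as generator, e.g. "allpages"

    Returns (params, gen_cont) for the next request, or None when done.
    """
    regular: Params = {}
    for module, values in query_continue.items():
        if module != generator:
            regular.update(values)
    if regular:
        # Same batch of generated pages: keep the old generator position and
        # continue only the regular (prop) modules.
        return {**base, **gen_cont, **regular}, gen_cont
    if generator in query_continue:
        # Regular modules are done for this batch: advance the generator.
        new_gen = dict(query_continue[generator])
        return {**base, **new_gen}, new_gen
    return None
```

The key design point is that the generator's new continue value is ignored until the regular modules have nothing left to continue.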

Can a generator generate titles for a list query?

I want to identify articles in a category that aren't in a list. This would be trivial, except that the list might link to an article in the category via a redirect. What it boils down to is this: I need a list of redirects to articles in a given category.

The obvious solution is to use generator=categorymembers with list=backlinks. But this doesn't work for me. The documentation doesn't say this can't be done, but I note that there is not a single example in the documentation of a generator that passes generated titles to a list query: every single generator example passes the generated titles to a prop query.

Can this be done? If so, please add an example to the documentation; if not, please update the documentation to explicitly say this is not possible.

Hesperian 02:13, 1 July 2009 (UTC)

To me this is pretty obvious from the docs on generators, which say that the generated pages are substituted for the titles parameter, whereas list=backlinks uses the bltitle parameter. Also, the latter accepts only one title while a generator may generate more than one title. In short: no, this cannot be done in a single request using generators, you'd have to run a separate list=backlinks query for each title. --Catrope 11:31, 1 July 2009 (UTC)
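Following Catrope's answer, a sketch that builds one list=backlinks request per title. Assumptions: the titles come from a separate categorymembers query, and blfilterredir=redirects is used so that only redirects are returned, which is what the original question asks for.

```python
from typing import Iterable, List
from urllib.parse import urlencode


def redirect_backlink_urls(api: str, titles: Iterable[str]) -> List[str]:
    """One list=backlinks request per title, asking only for redirects."""
    urls = []
    for title in titles:
        params = {
            "action": "query",
            "list": "backlinks",
            "bltitle": title,              # list=backlinks takes a single title
            "blfilterredir": "redirects",  # only pages that redirect to it
            "bllimit": "max",
            "format": "json",
        }
        urls.append(api + "?" + urlencode(params))
    return urls
```

The number of requests grows with the size of the category, which is the cost of not being able to feed a generator into a list module.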

Character issues with meta=siteinfo

After receiving errors in the Collection extension indicating that the SiteInfo could not be retrieved, I narrowed the problem down to the MediaWiki API calls made by the Collection extension. The errors occurred when using the mwserve renderer, which creates a PDF from a collection of pages.

The API call that caused the problem is in ApiQuerySiteinfo.php, in the function appendGeneralInfoOriginal. The offending code is below:

$mainPage = Title::newFromText(wfMsgForContent('mainpage'));
$data['mainpage'] = $mainPage->getPrefixedText();
$data['base'] = $mainPage->getFullUrl();

The line that causes problems is the one that calls $mainPage->getFullUrl() to get the full URL. Commenting out only that line makes the API command run successfully and print XML output to the screen.

The result of running the following API query:

api.php?action=query&meta=siteinfo&format=xmlfm

presents the following output:



According to the Joomla Forum, this is the result of a Byte Order Mark that is not interpreted correctly.

Yes, "ï»¿" is the Byte Order Mark (BOM) of the Unicode Standard. Specifically, it is the hex bytes EF BB BF, which form the UTF-8 representation of the BOM, misinterpreted as ISO 8859/1 text instead of UTF-8.

Probably what it means is that you are using a text editor that is saving files in UTF-8 with the BOM, when it should be saving without the BOM. It could be PHP files that have the BOM, in which case they'd appear as literal text on your page. Or it could be translated text you pasted into Joomla! edit windows.
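The byte-level explanation above is easy to check: encoding U+FEFF as UTF-8 and decoding the result as ISO 8859-1 (Python's "latin-1" codec) gives three stray characters. Printing their Unicode names instead of the characters themselves keeps the output readable on any console.

```python
import unicodedata

BOM = "\ufeff"                          # U+FEFF BYTE ORDER MARK
utf8_bytes = BOM.encode("utf-8")        # its UTF-8 form: EF BB BF
misread = utf8_bytes.decode("latin-1")  # the same bytes read as ISO 8859-1

print(utf8_bytes.hex(" "))  # bytes.hex with a separator needs Python 3.8+
for ch in misread:
    print(unicodedata.name(ch))
```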

I do not know why others are not having the same problem. I have verified that MySQL is using UTF-8. I experimented with $wgShowHostnames, with no effect. As a test, I changed $wgDBmysql5 from 'true' to 'false', also with no effect (NOTE: changing this setting is not recommended, and the setting is actually supposed to force names to UTF-8). I also verified that no PHP settings for Apache had been changed in mbstring or other areas.

I have spent considerable time researching various threads on the web, and none provided help or a solution. My (pathetic) resolution was finally to hard-code the URL into the API file. With that, the calls from the Collection extension work fine and the PDF output is produced correctly.

--David, 9/30/2009 3:06pm (-5)

Apparently, you've edited some file (LocalSettings.php?) with Windows Notepad or some other screwy text editor that added a BOM to it. You'll have to find and remove it. I recommend using something like Notepad++ in the future. 62.140.253.6 19:17, 30 September 2009 (UTC)
Hi, I have the same issue of a BOM character at the beginning of the api.php page. I asked the staff maintaining the wiki; they scanned all the wiki's files and removed the BOM characters, but it still shows on the api.php page. Are there other files it could come from? 89.3.152.107 15:11, 3 June 2011 (UTC)
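A minimal sketch for finding the culprit: walk the wiki's directory tree and report every file whose first three bytes are a UTF-8 BOM, not just LocalSettings.php. The `*.php` pattern is an assumption (included extension and skin files are worth checking too); `strip_bom` rewrites a file in place, so back up first.

```python
from pathlib import Path
from typing import List

UTF8_BOM = b"\xef\xbb\xbf"


def files_with_bom(root: str, pattern: str = "*.php") -> List[Path]:
    """Return files under `root` matching `pattern` that start with a UTF-8 BOM."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            with path.open("rb") as fh:
                if fh.read(3) == UTF8_BOM:
                    hits.append(path)
    return hits


def strip_bom(path: Path) -> None:
    """Remove a leading UTF-8 BOM from `path`, leaving other files untouched."""
    data = path.read_bytes()
    if data.startswith(UTF8_BOM):
        path.write_bytes(data[len(UTF8_BOM):])
```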

Query 101

I just spent way too long trying to figure out how to do something that should be very simple, namely fetch the current content of a page. The syntax for accomplishing this is still pretty perverse, but now it's been added as an example at the top. Jpatokal 23:51, 18 November 2009 (UTC)
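For reference, a sketch of that request using prop=revisions with rvprop=content, plus a helper that pulls the wikitext out of the JSON. It assumes a current MediaWiki: formatversion=2 and rvslots=main are newer than this 2009 thread (rvslots needs MediaWiki 1.32 or later, as far as I know), and older wikis return the content in a different shape.

```python
from typing import Optional
from urllib.parse import urlencode


def content_url(api: str, title: str) -> str:
    """Request the latest revision's wikitext for one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",        # assumption: MediaWiki with revision slots
        "titles": title,
        "format": "json",
        "formatversion": "2",     # pages come back as a list
    }
    return api + "?" + urlencode(params)


def extract_content(response: dict) -> Optional[str]:
    """Wikitext of the first page's latest revision, or None if the page is missing."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    return pages[0]["revisions"][0]["slots"]["main"]["content"]
```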

How do I access Special:WantedPages?

I need to access the list of pages not yet created in my application. Why are the special pages undocumented or unsupported? — Preceding unsigned comment added by 24.75.44.162 (talkcontribs) 19:13, 20 January 2011

There's a new module in MediaWiki 1.17 (not released yet, but an in-development version is available through SVN) called list=querypage. This includes wantedpages.
You can see more information about this at /w/api.php of any wiki running the development version MediaWiki 1.17. An example of a public wiki running this is TranslateWiki.net
http://translatewiki.net/w/api.php?action=query&list=querypage&qppage=Wantedpages
Krinkle 22:37, 27 January 2011 (UTC)

Page IDs

Can someone confirm that page IDs always stay the same (even if the page has been moved)? Which ID is associated with the redirect page? --Smihael 11:34, 21 August 2011 (UTC)

If a redirect page was created by a move, it is a new page with a new ID. Rich Farmbrough 21:28, 14 April 2016 (UTC).

Continuing queries

The "Continuing queries" section states that "when using a generator" the API can return two query-continue values. In fact, this can happen even without using a generator; for example, http://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=categories%7Clinks returns query-continues for both categories and links. In this case, what is the algorithm for correctly continuing the query? --R'n'B 15:40, 15 November 2011 (UTC)

Suggestion: add a new category for generators

Hi,

I suggest creating a new category for API documentation pages about queries that can be used as generators:

Category:MediaWiki API generators

--DavidL (talk) 22:19, 13 April 2012 (UTC)

Number of results

Hey Wikipedia enthusiasts,

For another way of browsing Wikipedia content, I thought it would be a good idea to retrieve the number of items in a requested list (e.g. prop=links or list=backlinks). At the moment, as far as I know, only prop=categoryinfo returns counts, in the attributes "pages", "files" and "subcats". It is not possible to get the number of parent categories, external links, etc., and prop=info for articles and pages in other namespaces doesn't deliver any such data. Is it somehow possible to get those numbers? I can't see a way, and it seems strange to me that only one type of request (categoryinfo, as described) delivers counts while the others don't. At the moment the only way to get the number of elements returned by a request is to iterate through all replies (continue), but this is a stupid and resource-wasting method when not all elements are needed. Or are there plans to enhance the API in future development? —The preceding unsigned comment was added by 89.182.35.255 (talkcontribs) 04:16, 7 July 2013 (UTC)

categoryinfo returns those attributes because those counts are actually part of the category info. There has been some talk about having a way to tell "list=backlinks&bllimit=200" that you're only interested in the count so it would then return a number 0–200 or "201+". But there will never be a way to get it to tell you there are 103927 backlinks, as the query to do so is simply too expensive, and it's unlikely for this to be included in prop=info for the same reason. BJorsch (WMF) (talk) 13:38, 8 July 2013 (UTC)

Better management of redirects?

Hi,

I was wondering whether it would be possible to enhance the way the redirects parameter currently works. Currently, when you specify this parameter, the list of pages contains only the final pages (not the redirects themselves). Would it be possible to also include the redirect pages (with the current redirects parameter, or by adding an optional value to it)?

--NicoV (talk) 07:28, 15 July 2013 (UTC)

The mapping of the redirects to targets is already included in the response. Anomie (talk) 14:27, 15 July 2013 (UTC)
Yes, but what I'm missing is all the information related to the redirect pages themselves: you only know their title and which page they redirect to, but you don't have the other information about the redirect page (attributes of the page element in the response, page properties, ...)
For example, I need to retrieve all internal links in a page, and for links to redirect pages I also want the target.
  • If I make a request for links without the redirects parameter, I then have to make a second request for each redirect page (identified through the redirect attribute in the answer) to find out which page it redirects to.
  • If I make a request for links with the redirects parameter, I'm missing some information about the redirect pages themselves, and I don't know whether the page contains only links to the redirect page (let's call it A (redirect)), or also links to the target page (A (target)).
--NicoV (talk) 14:38, 15 July 2013 (UTC)
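The from/to mapping Anomie mentions can be turned into a lookup table, sketched below. Assumptions: the JSON response was made with redirects=1 and contains a "redirects" list of {"from", "to"} objects, and the link titles come from a separate request made without redirects. This only recovers titles and targets; it does not supply the other page attributes NicoV is asking for.

```python
from typing import Dict, List


def redirect_map(response: dict) -> Dict[str, str]:
    """Map redirect title -> target title from a response made with redirects=1."""
    return {r["from"]: r["to"]
            for r in response.get("query", {}).get("redirects", [])}


def resolve_links(link_titles: List[str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Pair each link title with its final target (itself if not a redirect)."""
    return {title: mapping.get(title, title) for title in link_titles}
```

With the unresolved link titles kept as keys, checking whether a page links to both A (redirect) and A (target) becomes a simple membership test.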

How does the search API for MediaWiki work?

I am using the MediaWiki search API, i.e. http://en.wikipedia.org/w/api.php?action=query&list=search&format=json&srsearch=Taj+Mahal+Agra, in one of my projects.

When I search for Taj Mahal Agra, my search results do not include Taj Mahal, although it is one of the most popular places in India.

In fact, no actual search results are returned; instead, some suggested results are returned, and it is not even possible to tell that the returned results are suggestions rather than actual results.

I expected the API to return actual search results, with at least Taj Mahal as the first element of the response.

My concern is that I need to search with the string "Taj Mahal Agra", not just "Taj Mahal".

Please advise.

Thank you in advance. — Preceding unsigned comment added by Krishdamani (talkcontribs) 11:16, 8 January 2015 (UTC)

When I try that link, it gives the same results as https://en.wikipedia.org/wiki/Special:Search?search=Taj+Mahal+Agra&ns0=1&limit=10, which is the expected behavior. Anomie (talk) 14:12, 8 January 2015 (UTC)
Anomie, thanks for replying. Yes, you are right that the results are the same. But what I meant is that, since Taj Mahal is one of the most popular places of interest in India, it should show the article for Taj Mahal. Please advise.--Krishdamani (talk) 06:14, 9 January 2015 (UTC)
This is a duplicate topic of the one at API talk:Properties. Should we merge them, and perhaps move the discussion to API:Search, where it's probably the most appropriate? Robin Hood  (talk) 06:48, 9 January 2015 (UTC)
RobinHood70, yes, it is the same one, so we can merge it. I also wanted to post it to the discussion at API:Search, but I didn't find any way to post there, so I posted it here. Would you be fine moving it to the discussion at API:Search? Otherwise, please let me know how I can do that and I will do it. Sorry for the inconvenience. --Krishdamani (talk) 08:05, 9 January 2015 (UTC)
Since the API is working correctly, API:Search isn't the place for it either. You'd probably want to go to Talk:Search. Anomie (talk) 14:04, 9 January 2015 (UTC)
Adding a redirect from Taj Mahal, Agra solved this particular case. Rich Farmbrough 21:36, 14 April 2016 (UTC).

Suggestion to migrate to a search standard

I've been exploring your "query" action for a little while now and I'm finding it's quite limited in power. It doesn't support negations, for example. You might want to find all articles for the category "Physics" that don't have the category "Chemistry" - that sort of thing.

I think the generators are a good compositional feature and you get a lot of mileage out of them, but they're quite limited in that you can only apply them once and they're not a general solution for all types of query composition. For example, you might want to find a random article in a given category. This could be some kind of composition between the categorymembers list and the random list, but that's not how generators work.

I'm kind of wondering whether it would make sense to look at whether it would be possible to migrate this "query" action over to use a standard search technology like Elasticsearch. Articles are effectively documents that have a number of attributes on them. Using a search standard would give you a lot of power for free without having to roll your own implementations of these features. — Preceding unsigned comment added by 82.23.61.36 (talkcontribs) 00:33, 12 August 2015 (UTC)

You can already use list=search as a generator if that's what you want, which (combined with the CirrusSearch extension) uses Elasticsearch. The general query action also allows what are basically SQL queries against the underlying database, rather than the fuzzier searches available from a search engine. "Composition" of these SQL queries isn't generally provided, since it can easily lead to poor performance and other issues. Anomie (talk) 12:27, 12 August 2015 (UTC)

Number of sections and headlines

I've got two questions concerning sections:

  1. Is there a query which returns the number of sections of an article?
  2. Is there a query which returns the headlines of the sections of an article?

--jobu0101 (talk) 21:59, 14 February 2016 (UTC)

There are no direct methods of doing that other than loading the page content and parsing it. If you're really desperate, you can get there indirectly by using a revisions query and iterating rvsection from 1 upwards until it gives you an error, but that would be extremely inefficient and resource-intensive. Robin Hood  (talk) 04:05, 15 February 2016 (UTC)
Parsing it is the way to go, in particular using action=parse with the sections prop. Anomie (talk) 14:37, 16 February 2016 (UTC)
I had a feeling I was forgetting something. Thanks, Anomie! Robin Hood  (talk) 20:57, 16 February 2016 (UTC)

@Anomie: Thank you, that's great: [1]. --jobu0101 (talk) 21:46, 19 February 2016 (UTC)
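Anomie's suggestion answers both questions at once, as this sketch shows: action=parse with prop=sections returns a "sections" list, so its length is the number of sections and each entry's "line" field is the headline. Assumptions: JSON output in the default format, and that "line" may contain inline HTML for formatted headings.

```python
from typing import List
from urllib.parse import urlencode


def sections_url(api: str, title: str) -> str:
    """action=parse request that returns only the section list of a page."""
    params = {"action": "parse", "page": title, "prop": "sections",
              "format": "json"}
    return api + "?" + urlencode(params)


def headlines(response: dict) -> List[str]:
    """Headline text of each section, in page order."""
    return [s["line"] for s in response.get("parse", {}).get("sections", [])]
```

The number of sections is then simply `len(headlines(response))`.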

Query if title is blocked

Sometimes certain non-existent articles are blocked from creation. I'm looking for a way to query whether a title is blocked. --jobu0101 (talk) 21:48, 19 February 2016 (UTC)

If you're looking for one specific title, you can use action=query&prop=info&inprop=protection&titles=PageTitle. For a listing of all protected titles, you can use action=query&list=protectedtitles. See API:Protectedtitles for the full list of parameters. Robin Hood  (talk) 23:08, 19 February 2016 (UTC)
Thank you very much! The first solution is the one I was looking for. But I also like the approach of the second one. Unfortunately, judging by [2], there doesn't seem to be a way to show only pages with "restrictiontypes": ["create"]. --jobu0101 (talk) 01:36, 20 February 2016 (UTC)
The entire list is for create-protected pages. Pages that exist and are protected are found using API:Allpages. Robin Hood  (talk) 05:40, 20 February 2016 (UTC)
Oh, you're right. Thanks for mentioning. --jobu0101 (talk) 09:31, 20 February 2016 (UTC)
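A sketch of the first approach: request prop=info with inprop=protection and keep the titles whose protection list contains an entry of type "create". It assumes JSON with formatversion=2 (pages as a list) and protection entries with "type" and "level" fields; several titles can be checked in one request by joining them with "|".

```python
from typing import Dict
from urllib.parse import urlencode


def protection_url(api: str, titles: str) -> str:
    """prop=info request with protection details; titles is '|'-separated."""
    params = {"action": "query", "prop": "info", "inprop": "protection",
              "titles": titles, "format": "json", "formatversion": "2"}
    return api + "?" + urlencode(params)


def create_protection(response: dict) -> Dict[str, str]:
    """Map title -> protection level for titles that are create-protected."""
    found: Dict[str, str] = {}
    for page in response.get("query", {}).get("pages", []):
        for prot in page.get("protection", []):
            if prot.get("type") == "create":
                found[page["title"]] = prot["level"]
    return found
```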

Request for Revisions API

See here. --jobu0101 (talk) 09:07, 4 March 2016 (UTC)

AllPages not giving 'all pages'

On my Main Page, where I describe how many articles I have up, the Statistics|NUMBEROFARTICLES markup doesn't give me what I want. On All Pages it says I have 5 "content pages"... From some Googling, I think a page ONLY counts as a 'content' page if it links to another wiki or has a category; is that right? I'd like it to count ALL pages added by people. There are definitely more than 5 articles on AllPages, so why aren't they showing?

--Dog1994 (talk) 06:54, 14 April 2016 (UTC)

This is because we wanted to exclude redirects, and possibly tiny stubs. See the manual for detail and various fixes. Rich Farmbrough 21:40, 14 April 2016 (UTC).

What links here?

How can I get the pages that link to a certain page using the API? Is there also a query to get just the number of pages that link to it? --jobu0101 (talk) 19:25, 1 June 2016 (UTC)

See API:Backlinks. I don't think you can just get a count other than by the obvious method of getting the actual results and then counting them. Robin Hood  (talk) 22:28, 1 June 2016 (UTC)
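The "obvious method" can be sketched as a continuation loop over list=backlinks that only counts the results. `fetch` is a hypothetical callable that performs one request and returns the parsed JSON; the loop assumes the newer continue element (merged into the next request's parameters). The optional cap lets the caller stop early when it only needs to know whether there are more than N backlinks.

```python
from typing import Callable, Dict, Optional


def count_backlinks(fetch: Callable[[Dict[str, str]], dict], title: str,
                    cap: Optional[int] = None) -> int:
    """Count pages linking to `title` by paging through list=backlinks."""
    params = {"action": "query", "list": "backlinks", "bltitle": title,
              "bllimit": "max", "format": "json"}
    count = 0
    while True:
        response = fetch(params)
        count += len(response.get("query", {}).get("backlinks", []))
        if cap is not None and count > cap:
            return count          # stop early: caller only needs "more than cap"
        cont = response.get("continue")
        if not cont:
            return count
        params = {**params, **cont}   # carry the continue values forward
```

Every page of results still has to be downloaded, which is why this gets expensive for heavily linked pages.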

Documentation of page protections

Where can I find documentation of all possible page protections, like autoconfirmed, editeditorprotected and so on? --jobu0101 (talk) 21:50, 22 September 2016 (UTC)

@RobinHood70: You helped me several times. Maybe you know how to help me in this concern, too. --jobu0101 (talk) 15:34, 1 October 2016 (UTC)
As far as I know, it's just the group name (which you can find through action=query&meta=siteinfo&siprop=usergroups). In some parts of the API, "all" means everyone can take whatever action, but there isn't actually a group named "all". See API:Protect, for example. Robin Hood  (talk) 19:26, 5 October 2016 (UTC)
@RobinHood70: Thanks. I was asking because I read at de:Hilfe:Seitenschutz#Individueller_Seitenschutz that there are only three possibilities: autoconfirmed, editeditorprotected and sysop. But then I encountered editprotected (see [3]). So I was wondering what that might be. --jobu0101 (talk) 14:50, 22 November 2016 (UTC)
The possible page protections are defined by $wgRestrictionLevels. The levels are supposed to correspond to user rights, but for backwards compatibility "sysop" and "autoconfirmed" are accepted as aliases for the "editprotected" and "editsemiprotected" rights, respectively. Anomie (talk) 14:39, 23 November 2016 (UTC)
Thanks for pointing that out. --jobu0101 (talk) 20:47, 26 November 2016 (UTC)
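Anomie's explanation can be sketched as a mapping from protection level to the user right it stands for, with "sysop" and "autoconfirmed" as the backwards-compatible aliases. This is a simplified illustration, not MediaWiki's own permission check: it assumes the empty level means "unprotected" and ignores blocks, cascading protection and every other check the wiki performs.

```python
from typing import Set

# Backwards-compatible aliases: protection level -> user right.
LEGACY_LEVEL_ALIASES = {"sysop": "editprotected",
                        "autoconfirmed": "editsemiprotected"}


def right_for_level(level: str) -> str:
    """Other levels (e.g. editeditorprotected) already name the right itself."""
    return LEGACY_LEVEL_ALIASES.get(level, level)


def may_edit(level: str, user_rights: Set[str]) -> bool:
    """Simplified check: does a user with these rights pass this protection level?"""
    if level == "":              # the empty level means "not protected"
        return True
    return right_for_level(level) in user_rights
```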

On the order of titles taken out of generator

I used `list=search` to generate a list of search results, like this, and the page named Test is listed as the first result, which indicates that it is a close match. However, when I used `generator=search` to get this list of titles, like this, the order was different from the former one and somewhat random.

My question is whether `generator` always retains the same order as `list`. From my observation, it might not.

--CXuesong (talk) 13:04, 6 March 2017 (UTC)

A generator processes titles in a particular order, but the titles in the pages property of a query response are not ordered. So, for example, the first call to generator=search will always return the first 10 pages from the search but those 10 probably won't be in the search result order like they are from list=search. For generator=search, however, you may use the index field to sort the pages on the client side if necessary. Anomie (talk) 14:37, 6 March 2017 (UTC)
Wow. I hadn't noticed that before. Thank you, Anomie! Frankly, I had never thought about the ordering issue when taking results out of a generator until it came to search, let alone index, which looks like a special property that exists only in search results.

--CXuesong (talk) 09:32, 7 March 2017 (UTC)
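Client-side sorting by the index field, as Anomie suggests, might look like the sketch below. It assumes each generated page carries an integer "index" (as described above for generator=search) and accepts both JSON layouts: an object keyed by page ID (the default) or a list (formatversion=2).

```python
from typing import List


def titles_in_search_order(response: dict) -> List[str]:
    """Titles from a generator=search response, sorted by their "index" field."""
    pages = response.get("query", {}).get("pages", {})
    # Default JSON gives an object keyed by page ID; formatversion=2 gives a list.
    items = pages.values() if isinstance(pages, dict) else pages
    return [p["title"] for p in sorted(items, key=lambda p: p["index"])]
```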

automatically generated documentation for query -> references

The given example with "Albert Einstein" returns an error, even on en.wikipedia (https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=references&titles=Albert+Einstein).

"code": "citestoragedisabled",
"info": "Cite extension reference storage is not enabled.". --Xoristzatziki (talk) 06:12, 15 July 2017 (UTC)
  • Yes, this is because of the wiki's settings; when reference storage is enabled, it should work. --wargo (talk) 11:41, 15 July 2017 (UTC)