API talk:Main page

Suggested API 27 April 2006
I was not able to find a list of suggested API functions for retrieving Information via SOAP/WHATEVER. If you find one that already exists, please delete this page. If you consider meta inappropriete for this kind of page, please delete it. Or move it to my user name subpage. If you feel offended by the pseudo-syntax or naming of these functions, alter them unless they are beyond repair. Thanks.

Articles

 * article_mw ( string lemma )

article_mw returns a single article from a wiki in the original Mediawiki syntax


 * article_xml ( string lemma )

article_xml returns a single article from a wiki in xml output


 * article_section_mw ( string lemma, int section)

returns a single section from a single article from a wiki in the original mediawiki syntax. section is the same as section in the edit option

Search

 * list of articles that contain string query
 * size of an article (chars, bytes, words)
 * list of authors of an article (ugly to compute when it comes to articles with 1000+ revisions)

About save
Hi everybody. I noticed two things about the "save" action:


 * 1) shouldn't the request include the time of the last revision of the article? otherwise, how can the server detect an edit conflict?
 * 2) I think the new content of the page should be passed as a POST request, rather than being embedded in the URL.

Thanks for your attention Paolo Liberatore 10:55, 15 September 2006 (UTC)
 * Hi Paolo. POST & GET are treated the same, thus it really doesn't matter (except in login, where I migth force data to be sent in the post to avoid leaving logging traces, etc. Here i list them in the GET format just for simplicity sake. As for the saving, you are correct, the edit token will contain the revid of the article being submitted. I just haven't gotten around to spescifying the saving part. Thanks for checking! --Yurik 14:37, 15 September 2006 (UTC)

Auto merging/edit conflicts
I do have some concerns about the merge case. I think the merge process should be made explicit, which means the client should be able to retrieve the would-be-merged result before the save. Which means there should be a conflict detection query. Maybe a session model would be helpful. Use case A (very rough idea): This would allow to build more editor-like (or browser-like) client apps. --Ligulem 09:12, 17 September 2006 (UTC)
 * 1) The client could say to the server: "Give me the content of page P" (Get(P) returning content)
 * 2) then "Show me the text of page P that would result if I would save it with content C1 right now (and please tell me if this would need a merge and if yes whether the merge would be successful or not )" (Mergecheck(C1) returning mergeresult)
 * 3) then "I want to save page P with content C2" (Save(P,C2)).


 * Please take a look at the parameters for the submit request. Is that what you were looking for? --Yurik 13:36, 18 September 2006 (UTC)


 * Looks good. I think that's it. (sorry for the late reply) --Ligulem 12:06, 22 September 2006 (UTC)

possible category problem
Hi, using the API on large categories gives me less results than it should, see en:Category:Uncategorized from September 2006 for example. My program using queri API reports 4712 articles, when there are actually over 5000 articles in that category. I have tested my software, and I don't see that it is doing anything wrong, and indeed works perfectly on smaller categories, though I still don't rule out that I am at fault. I would be greatful if someone could use their queri API software on that category and see how many results they are given. thanks Martin 09:57, 17 September 2006 (UTC)


 * Using the C# example from the en user manual gives even less results, so I think something somewhere is going wrong with large cats. Martin 09:51, 18 September 2006 (UTC)


 * Hi Martin, I will take a look in the next few days. Thanks for reporting. --Yurik 13:35, 18 September 2006 (UTC)
 * Yurik fixed a related bug in queryapi.php a few days ago. Phe 15:33, 18 October 2006 (UTC)

Protected pages
Hi. I have another request... would it be possible to add  or something similar to retrieve the list of protected articles? I could not find any other way (other than downloading ) for doing so. Thanks! Paolo Liberatore 12:46, 3 October 2006 (UTC)
 * I am not sure the list of protected articles is available from the database. If it is, I can certainly expose it. --Yurik 16:24, 3 October 2006 (UTC)
 * That would be great! Thank you. Paolo Liberatore 16:16, 10 October 2006 (UTC)
 * As an alternative that could be possibly be simpler to implement, the entire record of an article from the page table could be retrived via . Paolo Liberatore 17:52, 6 November 2006 (UTC)
 * I just noticed there is a page_restrictions field in the page table. I will have to find out what it might contain (the Page table is not being very descriptive, and afterwards add that to prop=info. Filtering by that field is problematic because its a tiny blob (i don't think mysql can handle blob indexing, but i might be wrong). --Yurik 22:31, 6 November 2006 (UTC)
 * Here is my reading of the source code. The point where page_restrictions is interpreted is . There,   reads this field, parses it and stores the result in the   array. This array is then used by   to return an arrary containing the groups that are allowed to perform a given action. As far as I can see, this array is such that   is an array that contains the groups that are allowed to perform the edit action, etc. Except that, if this array is empty, no restriction (other than default ones) apply.
 * In the database dump, the page_restrictions field appears to be either empty or something like  or just  . The explanation of the table says that this is a comma-separated list, but the comments in Title.php mention that is an old format (I think that the value 'sysop' is actually in this format; no page in wikien current has a comma in the page_restrictions field). Paolo Liberatore 15:52, 11 November 2006 (UTC)

User preferences options
I tried to download User.php in order to view the uioptions, but I cannot locate the file starting from http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/BotQuery/. Is it within svn.mediawiki.org ?

Secondly, I compared the some uioptions identifiers to the WP user preferences identifiers (). They do not match (e.g., timecorrection vs wpHourDiff). It would be nice to use the same identifiers.

Regards, Sherbrooke 12:48, 8 October 2006 (UTC)


 * User.php is located in /phase3/includes. The useroption name is the same as used internally in the database, and is constant across all wikies. --Yurik 19:19, 8 October 2006 (UTC)

Category belonging to an article
Actually query.php can't get category in a redirect but there is sensible use of such categorized redirect e.g. original book's titles is a redirect to the local book's title and can be categorized. It'll nice if api.php solve this problem. Phe 15:37, 18 October 2006 (UTC)
 * Afaics fixing it in query.php seems not difficult, in genPageCategoryLinks use $this->existingPageIds instead of $this->nonRedirPageIds but I'm unsure if it'll break existing bot... Phe 15:42, 18 October 2006 (UTC)

API?
What about telling the unfamiliar what API actually is? The page doesn't make much sense without knowing that beforehand… Jon Harald Søby 17:42, 30 October 2006 (UTC)
 * Added links. Feel free to add/change the page to make the information better. Thx. --Yurik 19:38, 31 October 2006 (UTC)

How to retrieve the html format?
I am interested in retrieving a formatted html version of the content of an article, with the links parsed and pointing to wikimedia pages, but without tabs, menu options, etc. The article as you can see it inside wikipedia.--Opinador 11:26, 5 November 2006 (UTC)
 * The easiest way would probably be to add action=render to the regular wiki request: http://en.wikipedia.org/w/index.php?title=Main_Page&action=render. I plan to add it to the API, but it will not happen soon - currently the internal wiki code is not very well suited for this kind of requests. --Yurik 16:17, 6 November 2006 (UTC)
 * Thank you, I think is enough for me at this moment. But probably it will be useful to have in API this aproach, or at least to have a parameter to force expand templates before returning the page results.--Opinador 10:26, 7 November 2006 (UTC)

Diffs
Are there any plans to provide raw diffs between revisions? This would be particularly helpful for vandalism fighting bots who check text changes for specific keywords and other patterns. The advantage would be that diffs require much less bandwidth and therefore increase bot response time. What do you think? Sebmol 15:24, 21 November 2006 (UTC)

Count
Would it be possible to add an option which woud simply return the number of articles which satisfied the supplied conditions? This would be especially helpful counting large categories, like en:Category:Living people. HTH HAND —Phil | Talk 15:59, 29 November 2006 (UTC)
 * I second that emotion. Zocky 23:26, 9 December 2006 (UTC)

version/image info
I think it would be useful to get information about the sofware version (i.e. core version, extensions, hooks/functions installed) and images (i.e. checksum would useful to see if an image has changed). -Sanbeg 21:08, 7 December 2006 (UTC)

Login / lg
How is the password verified by the server? How is the password sent? Plaintext / encrypted? What encryption is used?

Perhaps someone more knowledgable can fill in this table: Thanks a lot in advance, Shinobu 04:49, 20 December 2006 (UTC)


 * From what I can tell, HTTPS is not supported. Also, since the authentication is form-based, the only option for sending the password is plain text. Using status code 401, HTTP itself supports Basic and Digest authentication as standard schemes, although Digest is not used all that often. In the case of Basic, it is effectively plain text; for Digest, it is an MD5 digest of the password, so it is effectively encrypted. All of this is moot because MediaWiki doesn't do this.


 * The problem is that if anything but plaintext is received by the server, you are limited in how the passwords can be stored in the database. For instance, if you use digest authentication based on MD5, the version of the password has to be based on the MD5 hash of the password, since it can't check against another hashing/crypting scheme without reprocessing the original password. In the case of MediaWiki, this would actually work, since it's database value is based on an MD5 hash of the plain text, but I'm not sure that there is consistent browser support for Digest authentication. The other issue is that using form-based authentication allows you to have a pretty HTML form for posting and let's you implement stuff like "Remember me", since HTTP authentication doesn't support this (regardless of whether you use Basic or Digest). Hope that helps. Mike Dillon 16:38, 22 December 2006 (UTC)


 * What I said about Digest authentication is a little inaccurate; since it isn't relevant to what MediaWiki does, you can just ignore my correction if you aren't interested.


 * The real problem is that without access to a plaintext versions of the password on the server, the server can't reproduce the expected response in the challenge/response protocol. This is because the response is based on the MD5 hash of not just the password, but other information as well. Since MediaWiki only stores the MD5 hash of the password, it can't generate the expected hash when the other components of the HA1 hash are factored in (see Digest authentication for more details).


 * There are further complications if MediaWiki has been configured with the  setting, since it actually double-hashes to get the value that's stored in the database in that case. Since MD5 is a one-way hash, there is no way for the server to get back to the original password. All it can do is verify that what the user sent as the password can be re-hashed to the same value. Mike Dillon 04:42, 24 December 2006 (UTC)

So in summary, the password is always sent in plaintext?

@From what I can tell, HTTPS is not supported: I think https is supported: Main Page (https), so I should think that only URLs are passed unencrypted, right? POST data and responses should be encrypted. Unfortunately, it doesn't seem to know that I already signed in on http, so apparently when you use the https login you have to do everything in https. Question is, is API supported using https?

@form-based, the only option for sending the password is plain text: One could calculate the MD5 hash using JavaScript, or one could use https. Considering I've already sent my password in plaintext, it should at least be possible to verify and/or change it without doing so again. Sending the password in plaintext again and again seems even less wise than doing it once.

I'm pretty new at this, but sending passwords in plaintext seems pretty dumb to a layman. Not warning about it even more so. Shinobu 08:03, 24 December 2006 (UTC)


 * I didn't know about secure.wikimedia.org. I'm not sure that you can log in to that server and get a cookie that will work on a non HTTP version, though. I just tried to use: https://secure.wikimedia.org/wikipedia/en/w/api.php?action=login&format=xml&lgname=Username&lgpassword=Password&lgdomain=en.wikipedia.org and I got a cookie in "secure.wikimedia.org", which means that all traffic must go through that domain instead of en.wikipedia.org. Also, since the server sets the "secure" flag on the cookie, normal HTTP libraries won't send it to the plain HTTP URLs on secure.wikimedia.org, so you have to take the SSL overhead on every request to stay logged in. If you aren't trying to use the cookies returned, you should be able to use the lgtoken value returned in the API response against en.wikipedia.org, but I haven't tested that. I didn't look into what is done on the server side with the login token.


 * As for what is encrypted when using HTTPS, everything is encrypted, including the URL. All that can be seen by sniffing is that you're sending data to the server's IP address using SSL. You can't see any headers or the HTTP request body or URL, since they're part of the SSL payload.


 * Regarding the use of Javascript to send an MD5 hash, it could be done, but the MediaWiki software would have to support it and you'd have to find Javascript code to do the hashing.


 * I'm not sure what you mean about sending the password again and again. It should only be sent to the login action. After that, you either use the cookies returned or the login token in the API response to send what is effectively a session id, not your password. Mike Dillon 20:31, 24 December 2006 (UTC)

Regarding encryption on secure.wikimedia.org, I just saw this post on English Wikipedia's technical village pump. The poster claims that the "null" cipher is possibly being used for the HTTPS server and if so even that is not encrypted... I just checked it and saw "AES 256", so I'm not sure whether it is encrypted or not. The browser said it was AES 256 and the stream looked encrypted in a packet capture, so this is mainly just an FYI. I'm not sure if there is any documentation on the cipher used by the HTTPS server. Mike Dillon 00:23, 27 December 2006 (UTC)


 * There have been some additions to the aforementioned discussion from Brion VIBBER. The null encryption thing is not true, but he mentions some other quirks of the secure.wikimedia.org server. See en:Wikipedia:Village pump (technical). Mike Dillon 15:33, 27 December 2006 (UTC)

@you'd have to find Javascript code to do the hashing: There is a module to do this. w:User:Lupin/md5-2.2alpha.js Although one should be careful about what to hash, because otherwise the hash could be sniffed and used instead, defeating the purpose of sending a hash.

As much as I understand this to be a complicated issue, I still think it needs to be fixed. Shinobu 19:06, 28 December 2006 (UTC)


 * I just tried the following:
 * Log in using API to secure.wikimedia.org
 * Note the lgusername, lgtoken, and lguserid fields in the response
 * Request a watchlist against en.wikipedia.org using those values
 * Unfortunately, I received . As I said before, I haven't looked into how the tokens work to see what the problem might be, but I would think this should work. Of course, at that point your session can be hijacked since you're sending the token over the wire unencrypted, but there's really no way to avoid this except for using SSL for every request. At least nobody can steal your password...


 * I also noticed that I get the same token back for subsequent login requests. Mike Dillon 22:13, 30 December 2006 (UTC)


 * I am also trying to the exact same thing as Mike Dillon said above. I've gone through the same steps with the same lack of result. I'm passing lgusername, lgtoken and lguserid in POST, and verifying this in my app before it goes out. I'm extremely new the API, but trying getting it rapidly except for this hiccup. Any progress here? Eddieroger 05:58, 13 February 2007 (UTC)

Select category members by timestamp
Hello, is it possible to retrieve cm with a parameter that lists only starting from a certain timestamp? This should be possible, since a timestamp is stored in the categorylinks database table. Is this the correct place to ask, or should I file a bug in bugzilla? Bryan 12:18, 24 December 2006 (UTC)


 * Similarly get all templated embedding pages by timestamp would be useful for monitoring when a template gets used because it contains structured data that can be reflected on other websites.82.6.99.21 15:14, 9 February 2007 (UTC)

Formatting bug
When I call the API with "action=opensearch", it will only return results in the JSON format. It doesn't matter what the search term is or what I pass to the format argument. Forgive me if this isn't the proper place to report a bug like this. 68.253.13.166 20:14, 3 February 2007 (UTC)

feedwatchlist bug
Even after successfully logging in with "action=login", if we don't use cookies but rather pass the lgusername, lguserid and lgtoken returned as parameters along with "action=feedwatchlist" then we get a HTTP 500 error ( tried this with http://en.wikipedia.org/w/api.php just now). The documentation says that both parameters and cookies should work. Jyotirmoyb 05:52, 20 February 2007 (UTC)
 * If you are trying to reproduce this bug with a browser, make sure to clear the cookies set by a successful login. Jyotirmoyb 05:55, 20 February 2007 (UTC)
 * Is it normal behaviour that watchlist topics appear approx. 1 day late? Thanks, 84.160.6.41 21:05, 9 March 2007 (UTC)

YAML Formatted Errors
I'm working on writing some methods to access the API, requesting my results in YAML format. It works fine when I send a proper request, which is most of the time, but occasionally I do something wrong and get the helpful error message. However, on having my code process the YAML result, I'm getting an error. Specifically, it reports "ArgumentError: syntax error on line 12, col 2: ` For more details see API'" Is this because the result is not properly formatted YAML, or a Ruby error? I don't know very much about YAML, learning as I go along, but I like the format so far. Is anyone else having this issue, and is it an issue at all? Thanks. Eddie Roger 04:21, 21 February 2007 (UTC)

JSON Callback
It would be really nice if the api had a JSON callback, à la Yahoo Web Services. This way, any website could make an AJAX request to Wikipedia! --Bkkbrad 17:16, 26 February 2007 (UTC)

mw-plusminus data
MediaWiki recently add mw-plusminus tags to Special:Recentchanges and Special:Watchlist, which display the number of characters added/removed by the edit. I was wondering if it may be possible for api.php to both retrieve this info and somehow append the data to query requests that involve page histories, recentchanges, etc. I would most like to see this available in usercontribs queries, as that data is currently not retrievable through any other medium (save for pulling up each edit one at a time and compaing the added/removed chars). Thanks. AmiDaniel 23:55, 26 February 2007 (UTC)

Query Limits
Hi all, another suggestion from me. On the API it is stated that the query limit of 500 items is waived for bot-flagged accounts to the MediaWiki max query of 5000. Would it be possible to extend this same waiver to logged-in admin accounts? I have multiple instances in some of my more recent scripts that query, quite literally, thousands of items (backlinks, contribs, etc.) that are primarily only used by myself and other administrators on enwiki. While having to break these queries up into units of 500 causes a rather dramatic performance decrease on the user's end, I can't imagine it's particularly friendly on the servers either (or rather, submitting two or three requests of 5000 each would likely be less harmful to the servers than twenty or thirty requests of 500 each). Generally, admins are considered similarly well-trusted not to abuse such abilities, and they are also small in numbers on all projects. Also, the MediaWiki software itself allows similar queries of upto 5000, and I most presume that loading this data from MediaWiki is far more burdenous on the servers than submitting the same query through API. This clearly more a political matter than a technical one, but I'd like to know if there are any objections to implementing such a change. Thanks. AmiDaniel 05:05, 1 March 2007 (UTC)