Requests for comment/API roadmap

From MediaWiki.org
Request for comment
API roadmap
Component General
Creation date 2013-01-14
Author(s) Anomie, Yurik
Document status awaiting feedback
Implementation status ongoing

Background

MediaWiki API has been steadily growing and adding features, and even though it provides most of the desired functionality, it has some areas in which it could be improved.

This RFC serves as an announcement of proposed breaking changes and a request for feedback on major new features. It's also Brad's todo list for after Wikimania.

Proposals for implementation

Deprecation process

Discussion at the Architecture Summit in January 2014 was generally favorable to deprecating major features, as long as we give people enough time to update. Minor changes will continue to be announced on the mediawiki-api-announce mailing list.

When it is possible for the new version of the feature to coexist with the old (e.g. prop=imageinfo and prop=fileinfo):

  1. The new feature will be implemented.
  2. The deprecation will be announced:
    • A message will be sent to the mediawiki-api-announce mailing list.
    • The old feature will report deprecation warnings.
    • Uses of the deprecated feature will be logged on the server (currently in WMF's case this is on fluorine where deployers have access).
  3. After a suitable timeframe (e.g. if the deprecation was in MediaWiki 1.24, during the 1.25 development cycle), usage of the deprecated feature on WMF wikis will be evaluated and the deprecated feature may be removed.

When it is not possible for the new version to coexist with the old (e.g. changing format=json):

  1. The new feature will be implemented, but must be explicitly requested by clients via a query parameter.
  2. The deprecation will be announced:
    • A message will be sent to the mediawiki-api-announce mailing list.
    • Deprecation warnings will be output when the parameter to request the new version is not given.
    • Uses of the deprecated feature will be logged privately.
  3. After a suitable timeframe, the new version will become the default and the old removed. The "request the new version" parameter will be silently ignored.
  4. The "request the new version" parameter will at some point be removed, leading to "unrecognized parameter" warnings.

When the default for a behavior is to be changed but the old behavior is not being removed (e.g. changing the default continuation to be the new easy-to-use style rather than the current query-continue):

  1. If not already present, a request parameter will be added to specifically request the old behavior.
  2. The change will be announced:
    • A message will be sent to the mediawiki-api-announce mailing list.
    • Deprecation warnings will be output when neither the select-new-version nor the select-old-version flag is used, and such uses will also be logged.
  3. After a suitable timeframe, the new version will become the default.
  4. Any flag to select the new version explicitly may at some point be removed, leading to "unrecognized parameter" warnings.
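During a transition window, a client can scan its responses for these warnings. The sketch below assumes the legacy format=json layout, where each module's warnings appear under response["warnings"][module]["*"] as one concatenated string; matching on the word "deprecated" is a heuristic, not a documented contract.

```python
from typing import Any


def collect_warnings(response: dict[str, Any]) -> dict[str, str]:
    """Return {module: warning text} from a legacy format=json API response.

    Assumes the pre-json2 layout: response["warnings"][module]["*"].
    """
    collected = {}
    for module, body in response.get("warnings", {}).items():
        text = body.get("*", "") if isinstance(body, dict) else str(body)
        collected[module] = text
    return collected


def deprecated_modules(response: dict[str, Any]) -> list[str]:
    """List modules whose warning text mentions deprecation (text heuristic)."""
    return sorted(
        module
        for module, text in collect_warnings(response).items()
        if "deprecated" in text.lower()
    )
```

A bot could call deprecated_modules() on every response and log the hits, so deprecations surface long before the removal step.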

Gerrit changes:

Comments

  • Suggestion from Wikimania: Add a way for developers to register for deprecation warnings generated by a particular user agent. Should be run by Legal for privacy issues. Anomie (talk) 16:18, 6 August 2014 (UTC)

Remove obsolete output formats

The following output formats will be deprecated and removed:

  • wddx / wddxfm
  • yaml / yamlfm - it's identical to json anyway
  • txt / txtfm
  • dbg / dbgfm
  • dump / dumpfm

The following output formats will remain: json / jsonfm, xml / xmlfm, php / phpfm, rawfm, none.

JSON will be the preferred output format.

Gerrit change: gerrit:154098

Comments

  • +1 Legoktm (talk) 22:29, 16 July 2014 (UTC)
  • +1 Addshore (talk) 10:59, 20 July 2014 (UTC)
  • -1 because txt/txtfm is very useful to me as a readable version of the PHP format. If txt/txtfm is kept, I will +1. Cyberpower678 (talk) 18:59, 29 July 2014 (UTC)
  • -1 I remember submitting a patch to add txt format, as debugging the output from the serialized PHP format was extremely irritating and time consuming. I can see getting rid of dbg and dump, as they're mostly extensions of that, though. As Cyberpower said, consider this a +1 if txt is kept. Soxred93 (talk) 01:09, 30 July 2014 (UTC)
    • @Cyberpower678, X!: What advantage does txt/txtfm have over jsonfm, besides that you personally might be more familiar with PHP's print_r output than JSON? Anomie (talk) 11:37, 30 July 2014 (UTC)
    • Indeed, Jsonfm is extremely easy to debug with :) Addshore (talk) 07:46, 1 August 2014 (UTC)
      • I end up misreading JSON, because of the quotes and semicolons bunched together like that. It's an issue I have in general when looking for something in a piece of text. TXT is far easier for me to read and search through than JSON, because the array keys are surrounded by brackets, which stand out more, and each value is pointed to with "=>", which also stands out more. It's more of a readability issue than not understanding how to read it. Also, I don't see why we need to remove it. It's not like you have a sloppy custom TXT generator; it's simply PHP's print_r. But if you guys are still going to remove it, I suppose I could get used to it, but I would really prefer to use PHP's print_r because it's easier for me to read. Cyberpower678 (talk) 11:44, 2 August 2014 (UTC)
  • +1 -FASTILY 09:53, 31 July 2014 (UTC)
  • +0 An advantage of txt/txtfm (and xml/xmlfm) is its human readability. Also, most browsers will attempt to display xml. Sidenote: in php 5.3, at least, json has a slight performance advantage over php's own serialization. - Amgine (talk) 01:27, 2 August 2014 (UTC)
    JSON with appropriate whitespace is as human-readable, IMO. And all the "fm" formats are served as an HTML page, so browser display isn't much of an issue. Anomie (talk) 16:07, 6 August 2014 (UTC)

JSON output as default

The API currently defaults to xmlfm when no format parameter is given. This will be changed to jsonfm.

Note that this will not affect modules that use their own custom output formatters. Also, action=help will be getting its own custom output formatter (see below).

As no client should be trying to parse the *fm formats, this probably won't follow a deprecation process. It'll just be done once action=help is rewritten.
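Because the change only affects requests that omit the format parameter, a client that always names a machine-readable format explicitly is unaffected. A minimal sketch; the endpoint constant is a placeholder for whichever wiki you target.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the wiki you are targeting.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def api_url(**params: str) -> str:
    """Build an api.php URL that always names its output format explicitly."""
    params.setdefault("format", "json")  # never rely on the server default
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

For example, api_url(action="query", meta="siteinfo") yields a URL ending in format=json, while an explicit format=xml is left untouched.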

Comments

Changes to JSON output format

The existing JSON format suffers from a number of shortcomings that make it more difficult to use than necessary. Many of these are inherited from the underlying data structure being designed for the XML format. Thus, format=json2 will be created with the following differences and the existing format=json will be deprecated and eventually removed:

  • The existing 'utf8' option will be the default.
    • A new 'ascii' request parameter will be introduced for clients who need all non-ASCII codepoints escaped.
  • Anything using '*' as a key will be renamed to something more natural. In some cases this may result in something like "query.page[1].foo['*']" becoming simply "query.page[1].foo", and in others something like "query.page[1].foo.content".
  • Boolean result properties will use boolean true as the value, rather than the empty string. Whether a property will be present with a boolean false value or will continue to be entirely absent from the result when false will be determined on a case-by-case basis.
    • Result parameters that are already being returned as booleans may accidentally change to the empty-string style in the format=json output.
  • Page lists will be returned as arrays rather than objects with page_ids as keys. This will make it easier for clients to iterate over the results.
    • The 'indexpageids' parameter will be removed.
  • The JSON formatter currently has a tendency to return values that are normally objects as arrays when empty (bug 10887). This will be easily fixable.
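Until json2 is available, clients typically undo two of these quirks themselves. The sketch below turns the legacy query.pages object (keyed by page ID, with negative IDs for missing pages) into a list and converts empty-string flags to True; the set of boolean keys is an illustrative assumption, not an exhaustive list.

```python
from typing import Any

# Boolean-style properties to convert; illustrative, not exhaustive.
LEGACY_BOOLEAN_KEYS = {"missing", "redirect", "new", "invalid"}


def normalize_pages(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn legacy query.pages (an object keyed by page ID) into a list.

    Empty-string flags on each page become True; the input is not modified.
    """
    pages = result.get("query", {}).get("pages", {})
    normalized = []
    for page in pages.values():
        page = dict(page)  # shallow copy so the caller's data stays untouched
        for key in page.keys() & LEGACY_BOOLEAN_KEYS:
            if page[key] == "":
                page[key] = True
        normalized.append(page)
    return normalized
```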

On the MediaWiki code side, developers will see the following changes:

  • If anything is currently returning boolean values as actual booleans rather than the API-standard empty string, that code will need to be changed to preserve this behavior in the legacy format=json output. The exact code change is yet to be decided, but it will be something along the lines of passing such boolean values through some method on ApiResult.
  • There will be a way to explicitly tag a PHP array in the result as "array" or "object", much like how ApiResult::setIndexedTagName is used for the XML format.
    • Ambiguous cases, such as empty arrays or some kinds of arrays with integer keys, might throw an error if not explicitly tagged. If this would bother you, comment!
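The array-versus-object decision can be modeled outside PHP to show which cases are ambiguous. This is a hypothetical Python model of the proposed tagging rule, not ApiResult's API: a PHP array is represented as an ordered dict, the tag strings "array" and "object" are assumptions, and inference refuses empty maps and sparse integer keys.

```python
from typing import Any, Optional


class AmbiguousContainerError(ValueError):
    """Raised when a PHP-style array cannot be classified without a tag."""


def to_json_container(php_array: dict[Any, Any], tag: Optional[str] = None) -> Any:
    """Convert a PHP-style ordered map to a JSON-ready list or dict.

    tag is "array", "object", or None to infer. Inference refuses the
    ambiguous cases: empty maps, and integer keys that are not 0..n-1.
    """
    if tag == "array":
        return list(php_array.values())
    if tag == "object":
        return {str(key): value for key, value in php_array.items()}
    if tag is not None:
        raise ValueError(f"unknown tag: {tag!r}")
    if not php_array:
        raise AmbiguousContainerError("empty array needs an explicit tag")
    keys = list(php_array)
    if keys == list(range(len(keys))):
        return list(php_array.values())  # sequential keys: a JSON array
    if all(isinstance(key, int) for key in keys):
        raise AmbiguousContainerError("sparse integer keys need an explicit tag")
    return {str(key): value for key, value in php_array.items()}
```

Raising instead of guessing is what would surface bug 10887-style surprises during development rather than in client code.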

Comments

  • Note this proposal has been modified slightly: due to the fact that the changes here would subtly break clients that weren't updated during the whole transition period, it seems better to make them break cleanly by having "format=json" simply fail after the transition period. I'm not terribly fond of the name "json2", but I can't think of anything better. Anomie (talk) 11:27, 7 August 2014 (UTC)
  • As a suggestion - how about "newjson", "betterjson", or maybe even "sanejson"? :P -FASTILY 20:25, 9 August 2014 (UTC)
The problem with those is: what happens when the next version of JSON rolls around? The naming starts to get silly when you've got "newerjson", "newestjson", "evennewerjson", "latestandgreatestjson", etc. :) – RobinHood70 talk 23:58, 9 August 2014 (UTC)

Changes to PHP output format

The changes described above for the JSON output format will also be applied to the PHP output format, where applicable.

Comments

  • From a code perspective, are we just going to have one class that handles preparing an array for formatting, and then the subclasses just do something like return FormatJson::encode( $stuff ) or return serialize( $stuff )? Legoktm (talk) 22:36, 16 July 2014 (UTC)
    • That's already basically how it works, I'm not planning on changing it. Anomie (talk) 20:00, 31 July 2014 (UTC)
  • For the same reason the JSON changes changed to format=json2, this'll probably not happen. Or else be format=php2. Anomie (talk) 20:14, 9 August 2014 (UTC)

Changes to XML output format

Changes here will mostly be on the back-end; the actual data output to clients is intended to remain the same wherever possible. However, clients should be prepared for the following:

  • Result structure may no longer match the JSON format.
  • Tag and attribute names may be encoded when not conforming to XML requirements.
  • Result structure may change depending on the specific query. For example, passing both rvprop=content and rvdiffto=prev to prop=revisions will currently omit the diff from the result (bug 55371) (it should be throwing an error, but that's another bug). In the future, it's likely that this will return the content as the value of the <rev> node when rvdiffto is not supplied and as the value of a <content> subnode of the <rev> node when it is.

For example, bug 43221 was fixed by changing the names of attributes such as "4::foo" to fit XML's restrictions. In the future, this would be fixed by either encoding the name (e.g. "_4.3A..3A.foo") or by changing the structure of output in only the XML format (e.g. <attribute name="4::foo">).
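The name-encoding option can be sketched as follows. The scheme (prefix "_", each disallowed character written as its hexadecimal code point between dots, literal dots escaped first) is inferred from the "4::foo" example above; the ASCII-only character classes are a simplifying assumption, since real XML names allow many more Unicode characters.

```python
import re

# ASCII-only approximation of XML name characters. ':' is deliberately
# excluded so encoded names never look namespace-qualified.
_NAME_START = "A-Za-z_"
_NAME_CHAR = "A-Za-z0-9_.\\-"
_VALID_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")
_INVALID_CHAR = re.compile(f"[^{_NAME_CHAR}]")


def encode_xml_name(name: str) -> str:
    """Encode an arbitrary result key as a usable XML tag or attribute name."""
    if name == "":
        return "_"
    if _VALID_NAME.fullmatch(name):
        return name  # already a valid name; leave it alone
    # Escape literal dots first so the ".XX." escapes stay unambiguous.
    escaped = name.replace(".", ".2E.")
    return "_" + _INVALID_CHAR.sub(lambda m: ".%X." % ord(m.group(0)), escaped)
```

With this scheme, encode_xml_name("4::foo") gives "_4.3A..3A.foo", matching the example in the text.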

On the MediaWiki code side, developers will see the following changes:

  • The XML formatter will no longer die if ApiResult::setIndexedTagName() is forgotten. Instead, it will act as if that were called with something generic (e.g. ApiResult::setIndexedTagName( $array, 'item' )).
  • The XML formatter will no longer (be supposed to) raise an error when a node has both node content (ApiResult::setContent) and non-scalar attributes. Instead, it will simply shove the intended node content into a subnode.
  • Anything that's hard-coding '*' instead of using ApiResult::setContent is going to break.
  • There may be additional ApiResult calls required in some cases.

Comments

Changes to pretty-printed HTML formats

The pretty-printed HTML formats (jsonfm, xmlfm, phpfm, rawfm) will likely lose the automatic linking of links and various other bits of fanciness. They will gain a hook to allow for syntax highlighting via extensions such as Extension:SyntaxHighlight_GeSHi.

Comments

The geshi extension uses CSS provided by ResourceLoader to style highlighted syntax. Are you thinking of adding ResourceLoader support to api.php? TBH, I don't really see the point of adding syntax highlighting... Legoktm (talk) 23:04, 16 July 2014 (UTC)

Just as an alternative idea: add an index.php special page that api.php output can be posted to, then add a link in the api.php introduction paragraph to this "syntax highlighted" output. This avoids mixing api.php and index.php more than desired. TheDJ (talk) 13:36, 17 July 2014 (UTC)

Why remove the auto-linking feature? --Ricordisamoa 23:37, 25 July 2014 (UTC)

Because it would probably get in the way of proper syntax highlighting, seems like it's only useful for action=help which is going to be redone, and has been the source of bugs like bug 61362. Anomie (talk) 13:34, 28 July 2014 (UTC)

HTMLizing action=help

The output from action=help is currently a plain-text document wrapped in the usual API output formatting, with a few links and bolding added in post-processing when viewed via xmlfm. This will be changed to output an HTML document intended for viewing in a browser.

The default view of api.php will provide only general information and documentation of the main module (i.e. the bits of the current page above the "*** Modules ***" line and the credits at the bottom), with the various module names in the documentation for the 'format' and 'action' parameters linking to documentation for those modules. The documentation for action=query will similarly document only the query module itself, with links to documentation for the various 'prop', 'list', and 'meta' modules. There will be an option to output documentation for all modules on one page, likely 'all=1'.

At the same time, the various bits of text in the API help should be made localizable.

The possibility of including a version of Special:ApiSandbox on the help pages is also under consideration, although that may be left for a later iteration.

If anyone is actually using action=help from a client, please comment about your use cases if they wouldn't be satisfied by this proposal.

Comments

  • Why don't we just turn the help into a special page? Legoktm (talk) 22:34, 16 July 2014 (UTC)
    Forgive me if I'm wrong (I haven't looked at MW code in a while), but it seems like it'd be good practice to keep the API-specific code out of the main software, i.e. in the main api.php page. I can't remember how much mixing and mashing there is, though. Soxred93 (talk) 01:09, 30 July 2014 (UTC)
    Well, we already have Special:ApiSandbox...I was just thinking of a static version of that page. Legoktm (talk) 21:21, 30 July 2014 (UTC)

Internationalizing API warnings and errors

API warnings and errors are currently returned in English (bug 35074), and multiple warnings are concatenated into a single text string.

The error codes will generally not change; this option will only control the human-readable messages.

The plan is for an error-language option with the following possibilities:

  • 'none' returns the message key and parameters, no human-readable message
  • 'user' uses the language in $wgLang to generate a human-readable message
  • A language code uses the specified language

The non-'none' options would have an additional option to specify whether the message should be returned as HTML ($msg->parse()), wikitext ($msg->text()), or wikitext ignoring site-local customizations ($msg->useDatabase( false )->text()).

Errors and warnings will both be returned as arrays of objects, each object having a code, the source module (maybe not for errors?), and message data as above.
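A client consuming the proposed arrays might model each entry as below. This is a sketch under assumptions: the field names (code, module, key, params, text) are hypothetical stand-ins for the "code, source module, and message data" the proposal describes, with key/params used when error-language is 'none' and text otherwise.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ApiMessage:
    """One entry of the proposed errors/warnings arrays (field names assumed)."""
    code: str
    module: Optional[str] = None  # source module; may be absent for errors
    key: Optional[str] = None  # message key, as returned with error-language 'none'
    params: list[Any] = field(default_factory=list)
    text: Optional[str] = None  # human-readable text in the requested language


def describe(msg: ApiMessage) -> str:
    """Prefer the human-readable text; fall back to the raw key and params."""
    where = f"[{msg.module}] " if msg.module else ""
    if msg.text is not None:
        return f"{where}{msg.code}: {msg.text}"
    args = ", ".join(map(str, msg.params))
    return f"{where}{msg.code}: {msg.key}({args})"
```

Since the code stays stable across languages, program logic should branch on code and treat text purely as display.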

During the transition period, omitting the error-language option will produce backwards-compatible output. After, 'user' will likely be the default.

On the code side, this will entail a major reworking of the various error and warning methods in ApiBase.

Comments

  • Is there a reason we can't just provide both html and wikitext error messages, and let the user pick whichever one they want? Legoktm (talk) 23:12, 16 July 2014 (UTC)
    I'd rather not clutter the response with useless repetition of every message in three different formats, when the client already knows which one it wants when making the request. Anomie (talk) 10:59, 19 July 2014 (UTC)
    Makes sense. +1 Legoktm (talk) 19:38, 30 July 2014 (UTC)
  • Wikibase has already started adding i18n support for some of its api error messages, I am sure the team would appreciate some sort of response in core :) Addshore (talk) 07:55, 1 August 2014 (UTC)

Support uselang

Some bits of the API support 'uselang' since the underlying MediaWiki methods support it. Once errors and help can be localized, it would make sense for it to be officially listed.

Comments

  • I'm not sure I agree with this. What's an example use case aside from localized error messages which was covered above? Legoktm (talk) 23:13, 16 July 2014 (UTC)
    • The parsing-related actions, mostly. And silly things like action=watch returning a UI message instead of letting the UI handle it (which reminds me, I should put that on the "to be deprecated" list). Anomie (talk) 11:02, 19 July 2014 (UTC)

Removal of long-deprecated parameters

Analysis will be done to determine whether anyone is still using the following:

  • "watch" and "unwatch" parameters that have been replaced with "watchlist".
  • "sessionkey" parameter to action=upload and prop=stashimageinfo that was replaced with "filekey".
  • "toponly" parameter to list=usercontribs.
  • "querymodules" parameter to action=help, replaced with extended syntax for the "modules" parameter.
    • action=paraminfo will get the same treatment.
  • "title" parameter to action=watch.
  • "url" parameter to prop=langlinks and prop=iwlinks.
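Clients still sending some of these parameters can migrate with a small rewrite step before each request. This sketch covers only the watch/unwatch and sessionkey cases listed above and assumes MediaWiki's boolean-parameter convention (presence means true); check the exact watchlist values against action=help on your wiki.

```python
def upgrade_params(params: dict[str, str]) -> dict[str, str]:
    """Rewrite a few long-deprecated request parameters to their replacements.

    Other parameters pass through unchanged; the input dict is not modified.
    """
    upgraded = dict(params)
    if "watch" in upgraded:  # boolean flag: presence alone means "watch"
        del upgraded["watch"]
        upgraded.setdefault("watchlist", "watch")
    if "unwatch" in upgraded:
        del upgraded["unwatch"]
        upgraded.setdefault("watchlist", "unwatch")
    if "sessionkey" in upgraded:
        upgraded.setdefault("filekey", upgraded.pop("sessionkey"))
    return upgraded
```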

Additional deprecated parameters may also be considered.

Gerrit changes:

Comments

  • +1 seems reasonable -FASTILY 09:53, 31 July 2014 (UTC)
  • -1 for llurl of prop=langlinks. This should always be implemented consistently with prop=iwlinks, which currently only uses iwurl=1 and still has no iwprop=url replacement. The input/output structure is currently the same, which should be kept until iwlinks also implements the new prop=url feature. Merlissimo (talk) 11:44, 20 August 2014 (UTC)
    I note that llurl is already deprecated, and has been since February 2014. But I have no problem with deprecating prop=iwlinks&iwurl too. In both cases the result format is not changing. Anomie (talk) 13:31, 20 August 2014 (UTC)

Simplified continuation as default for action=query

Currently, this must be requested by passing an empty 'continue' parameter in the initial request. This will be changed to be the default, and the current query-continue may be requested with a 'rawcontinue' parameter.
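With simplified continuation, the client merges whatever key-value pairs come back under "continue" into its original request and repeats until no "continue" is returned. A minimal sketch; fetch is a stand-in for your HTTP layer (any callable that sends the parameters to api.php and returns decoded JSON), and the explicit empty 'continue' opts in before the default changes.

```python
from typing import Any, Callable, Iterator

Fetch = Callable[[dict[str, str]], dict[str, Any]]


def query_all(fetch: Fetch, params: dict[str, str]) -> Iterator[dict[str, Any]]:
    """Yield each response's "query" section, following simplified continuation."""
    last_continue: dict[str, str] = {"continue": ""}  # opt in explicitly
    while True:
        # Always start from the original parameters, then overlay the
        # continuation values exactly as the server returned them.
        request = {**params, **last_continue}
        result = fetch(request)
        if "query" in result:
            yield result["query"]
        if "continue" not in result:
            break
        last_continue = result["continue"]
```

The client never has to interpret the continuation values, which is the main advantage over query-continue.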

Gerrit changes:

Comments

  • Wasn't query-continue supposed to be phased out anyhow? -FASTILY 09:53, 31 July 2014 (UTC)
    • Yuri wanted to, but I as an API user find the new method lacking in flexibility. So I'd prefer to keep it as an advanced option. Anomie (talk) 14:17, 31 July 2014 (UTC)

Simplified continuation should indicate "pause points"

With the hard-to-understand query-continue continuation, it's easy for the client to know when it has a complete set of results for the current batch from the generator, so it can pause and process that batch before requesting more.

The simplified continuation should support this sort of batching without having to parse the 'continue' parameter; a 'batchcomplete' boolean property in the result should suffice.
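With such a flag, a client can accumulate page data across continuation responses and process it only at pause points. The sketch assumes legacy format=json, where query.pages is keyed by page ID and the flag is present (as an empty string) only when the batch is complete; merging list properties by concatenation is a simplifying assumption.

```python
from typing import Any, Iterable, Iterator


def complete_batches(results: Iterable[dict[str, Any]]) -> Iterator[dict[str, dict[str, Any]]]:
    """Merge page data across continuation responses until 'batchcomplete'."""
    batch: dict[str, dict[str, Any]] = {}
    for result in results:
        for page_id, page in result.get("query", {}).get("pages", {}).items():
            merged = batch.setdefault(page_id, {})
            for key, value in page.items():
                existing = merged.get(key)
                if isinstance(existing, list) and isinstance(value, list):
                    existing.extend(value)  # e.g. more links for the same page
                else:
                    # Copy lists so later extends don't mutate the input.
                    merged[key] = list(value) if isinstance(value, list) else value
        if "batchcomplete" in result:
            yield batch
            batch = {}
```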

Gerrit change: gerrit:152359

Comments

  • +1 Protonk (talk) 22:06, 22 July 2014 (UTC)
  • Not sure about this. The continue parsing code is needed anyway. Adding this boolean seems redundant. Rich Farmbrough 00:18, 5 August 2014 (UTC).
    How is continue parsing code needed with the simplified continuation? The point of that method is the client should just merge the key-value pairs returned in the "continue" result property with the original query. Anomie (talk) 09:18, 6 August 2014 (UTC)

Query item count

People sometimes request count(*) functionality for various modules, and even though there is plenty of justification for it, a fundamental database limitation has always stopped us: counting all items is an O(N) table traversal. As a result, clients can only iterate over all the data and count it locally, which wastes both server resources and bandwidth.

It would be relatively simple to allow modules to return a count capped at the relevant limit. For example, if foolimit=100, the result in "count" mode would be a number from 0 to 100, or "101+" if there are more items.
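The capped count stays cheap because it never needs to look past limit + 1 rows. A sketch of that logic in Python; reading one row past the limit (analogous to SQL LIMIT limit+1) is an assumption about how a module might implement it.

```python
from itertools import islice
from typing import Iterable, Union


def capped_count(rows: Iterable[object], limit: int) -> Union[int, str]:
    """Count at most limit rows, reporting "<limit+1>+" when more exist.

    Reading one row past the limit is enough to know the true count exceeds
    it, so the cost is bounded by the limit rather than by N.
    """
    seen = sum(1 for _ in islice(rows, limit + 1))
    return seen if seen <= limit else f"{limit + 1}+"
```

For example, capped_count(range(50), 100) returns 50, while capped_count(range(10**6), 100) returns "101+" after reading only 101 rows.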

Comments

Rewrite prop=imageinfo from scratch as prop=fileinfo

The code is a mess, the limit semantics make no sense, and we have several other options that don't really fit non-images.

The best thing to do here is probably to just write a prop=fileinfo module from scratch so we don't have to worry about backwards compatibility, and then deprecate prop=imageinfo.

Current plans:

  • Going to ask Flow to re-prefix their prop=flowinfo module, so fileinfo can have "fi".
  • Right now, iilimit specifies the max number of revisions to return per file, which is inconsistent with the rest of the API and isn't particularly sane. For fileinfo, filimit will limit the number of file-info objects returned per result, and a separate "fioldversions" parameter (default 0, values integers or 'all') will specify the max number of revisions to be returned per file.
  • fistart/fiend may result in the info for the current revision not being returned.
  • iiprop has three different metadata properties. There really should be only one, and if possible it should be key-value pairs rather than a list of objects with key and value properties.
  • There will be no equivalent to iiurlwidth or iiurlheight. Instead there will only be fiparams which will be roughly equivalent to iiurlparam (but multi-valued).
  • prop=stashimageinfo is very odd, it's a prop module but doesn't use any titles. It would make sense to me for prop=fileinfo to have a fifilekeys parameter instead of having a whole separate module for this.
  • prop=videoinfo really isn't needed either. Instead we should make it possible for extensions to add additional info to the fileinfo response.
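Until fileinfo returns metadata as key-value pairs, clients can flatten the current list-of-objects shape themselves. A sketch assuming entries shaped like {"name": ..., "value": ...}, with the assumption that nested lists of the same shape should be flattened recursively.

```python
from typing import Any


def metadata_to_dict(entries: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten [{"name": k, "value": v}, ...] into {k: v}.

    Non-empty nested lists of name/value objects are flattened recursively;
    any other value is kept as-is.
    """
    flat = {}
    for entry in entries:
        value = entry.get("value")
        if isinstance(value, list) and value and all(
            isinstance(item, dict) and "name" in item for item in value
        ):
            value = metadata_to_dict(value)
        flat[entry["name"]] = value
    return flat
```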

Comments

  • Having just implemented the client side of this for my bot, you have my absolute support! If there's anything you can do to convert iiprop=metadata|commonmetadata|extmetadata into something a bit more consistent, that would be ideal. – RobinHood70 talk 01:49, 30 July 2014 (UTC)
  • +1 Yes please. -FASTILY 09:53, 31 July 2014 (UTC)

Clean up log event parameter handling in list=logevents

The current method is a big switch in ApiQueryLogEvents that specially formats certain log event types. This is ugly, and won't work at all for extensions. IMO, this logic should go in the LogFormatter subclasses, as after all it's a matter of formatting.

On the client side, we should probably regularize all the parameters under a "params" node, instead of having some dumped into the main object (possibly conflicting with other properties!) and some in a subarray named for the type. This'll allow us to clean up some of the legacy param naming at the same time.

We should also probably explicitly note that BC breaks may occur for log events currently using the "legacy" format, and maybe add a parameter to explicitly request the legacy format for all events from list=logevents.

Comments

Proposals that may be dropped

Change defaults for "prop" parameters

Many query modules take a 'prop' parameter to specify which bits of information the client actually wants. Defaults for these parameters may be cut back or eliminated entirely. Or the prop parameter may be made required with no default.

Comments

  • I'm not sure if this one is really worth the trouble. Anomie (talk) 19:13, 16 July 2014 (UTC)
  • This strikes me as a case of breaking things for the sake of breaking them. I can't see any problem with leaving it the way it is. --Carnildo (talk) 01:44, 30 July 2014 (UTC)

Allow paging the "titles" parameter

If too many titles/pageids/revids are given to the query module (or generator), it should page through them rather than erroring out or issuing a warning and ignoring some. This way the client does not need to worry about passing too many titles; the query will simply treat them just like a generator, returning an appropriate continuation value.
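Without server-side paging, the client-side alternative discussed in the comments is simple batching. A sketch; 50 titles per request is the usual limit for ordinary accounts (500 with the apihighlimits right), but the right value for a given wiki and account is an assumption the caller should confirm.

```python
from typing import Iterator, Sequence


def title_batches(titles: Sequence[str], batch_size: int = 50) -> Iterator[str]:
    """Yield pipe-joined groups of titles, each small enough for one request."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(titles), batch_size):
        yield "|".join(titles[start:start + batch_size])
```

Each yielded string is a ready-made value for the titles parameter, and nothing has to be stored on the server between requests.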

Comments

  • Do we really want the client to be passing us 10000 titles just for us to tell them to retry 9950 of them? The client can as easily handle that on the client side, and save bandwidth in the process. Anomie (talk) 19:13, 16 July 2014 (UTC)
  • Anomie has a point here. I don't see a good way of submitting large numbers of titles and having them continue without having to resubmit those titles with each request, which is a waste of bandwidth. That said, it would be really nice at the client side to not have to worry about splitting page collections into smaller groupings or submitting requests every nth page or whatever approach you want to take. If a way can be found to submit a title list once and then page through it, that'd be great. Otherwise, yeah, I agree with dropping this idea. – RobinHood70 talk 16:45, 17 July 2014 (UTC)
    Despite the API not being anywhere near "level 3 REST", I'd like to preserve the REST principle of avoiding server-stored request state (i.e. a remembered list of titles to be processed). Anomie (talk) 11:07, 19 July 2014 (UTC)
    Agreed. Something that preserves state could leave the doors wide open to a DoS attack that would bring servers to their knees, so not a good idea. – RobinHood70 talk 17:34, 19 July 2014 (UTC)

Extension:SiteMatrix should create a query submodule

The action added by Extension:SiteMatrix, action=sitematrix, should really be a query submodule, meta=sitematrix. In addition, its output structure could be improved.

Further, this action seems to serve much the same purpose as meta=siteinfo&siprop=interwikimap. They could be merged somehow.

Comments

  • Actually replacing meta=siteinfo&siprop=interwikimap isn't really feasible unless we can make the output entirely compatible. And doing so would be facilitated by the following proposal. Anomie (talk) 04:31, 12 September 2013 (UTC)

meta=siteinfo should be split up

Many of the options available to meta=siteinfo's siprop should be split into their own meta submodules. This is mainly a matter of interface cleanliness.

Comments

  • Support Support -- as long as there is some sort of versioning or backwards compatibility. -- MarkAHershberger(talk) 18:15, 21 October 2013 (UTC)
  • +1 -FASTILY 09:53, 31 July 2014 (UTC)
  • +1 Addshore (talk) 07:58, 1 August 2014 (UTC)

Module prefix limiting

Core modules should use two-letter prefixes and extension modules should use three-letter prefixes (with 'g' prohibited as the first character). The intent here is to avoid collisions between extensions and new core modules.

Comments

  • Seems unduly limiting; some core modules already use longer prefixes, and it does nothing to prevent collisions between extensions. Anomie (talk) 04:31, 12 September 2013 (UTC)

Embed the action in the URL

To facilitate directing particular actions to different API processing clusters, it would be advantageous to include the action in the URL even for POST requests. Embedding it in the PATH_INFO may make it easier to do this,[citation needed] but may not be possible on all hosts. As an alternative, the API could simply require that action be present in $_GET rather than $_POST.
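Under the $_GET alternative, a client would keep action in the query string and send everything else in the POST body. A sketch using only the standard library; the endpoint is a placeholder, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://example.org/w/api.php"  # placeholder endpoint


def build_post(action: str, params: dict[str, str]) -> Request:
    """Prepare (but do not send) a POST with 'action' in the URL, not the body."""
    url = f"{API_ENDPOINT}?{urlencode({'action': action})}"
    body = urlencode(params).encode("utf-8")  # remaining parameters go in the body
    return Request(url, data=body, method="POST")
```

A front-end proxy can then route on the URL alone, without parsing request bodies.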

Comments

Completed items

Removal of certain data from action=paraminfo

The data returned by action=paraminfo includes two items that appear to be at best incomplete and seem to have almost no possible uses:

  • 'props' is supposed to contain some sort of data structure indicating which result properties correspond to which request parameters. But the format of this data isn't even specified and the existing examples seem to be ad-hoc without any real consistency.
    • The intended use of this data appears to be for automatically generating objects with property accessors to wrap access to the MediaWiki API. But given the lack of any specification as to the data structure, I expect this has at most one actual user.
  • 'errors' is supposed to contain a list of possible errors that the API module can return. But the lists are incomplete, and in some cases cannot ever be complete since additional errors can be raised by extension hooks in code far removed from anything related to the API.
    • I imagine the intended use of this data is again for automatically generating strongly-typed errors in some library trying to wrap the MediaWiki API. But since the data is incomplete, any such library is already going to need a generic fallback. It would probably be best for it to use that fallback in all cases.
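Such a fallback can be small: handle the few error codes the program can actually act on, and raise a generic exception for everything else. A sketch assuming the legacy {"error": {"code": ..., "info": ...}} response shape; the handler table is the caller's own choice.

```python
from typing import Any, Callable


class ApiError(Exception):
    """Generic API error carrying the machine-readable code."""

    def __init__(self, code: str, info: str):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def check(result: dict[str, Any], handlers: dict[str, Callable[[ApiError], Any]]) -> Any:
    """Return result unchanged, or route its error to a handler.

    Codes without a handler raise ApiError, so the table never needs to
    mirror a complete list of possible errors.
    """
    error = result.get("error")
    if error is None:
        return result
    exc = ApiError(error.get("code", "unknown"), error.get("info", ""))
    handler = handlers.get(exc.code)
    if handler is None:
        raise exc  # generic fallback for anything not handled specially
    return handler(exc)
```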

Gerrit change: gerrit:152760

Comments

  • As an API user, seeing a list of possible errors that might occur is nice, so I can think about what might go wrong in a program flow. As a developer, if something has a hook, it's completely impossible to document what might happen. I think something like "This is a list of more common errors that might occur, but not a complete list" would be nice. Result properties are useless. Legoktm (talk) 23:07, 16 July 2014 (UTC)
    OTOH, as an API user myself I don't much see the point to an incomplete list of errors unless there's something the program can do about the error automatically besides logging it for human attention and/or moving on to the next thing. At which point it's probably better done in the human-curated documentation (improvement of which is also planned for next quarter). But I agree that explicitly marking it as an incomplete list would be better than what we have now, which probably encourages some people to try to handle every one individually. Anomie (talk) 10:53, 19 July 2014 (UTC)
  • As a developer of mw apis for extensions the manually maintained list of possible errors is ugly and annoying to maintain. I know for a fact that over the past year many of the error lists returned by some of the modules would likely have been wrong. Addshore (talk) 07:53, 1 August 2014 (UTC)


Token handling

API modules that perform changes must use tokens for CSRF protection. Currently there are multiple ways to retrieve a token: action=tokens, action=query&prop=info&intoken=..., action=query&prop=revisions&rvtoken=..., action=query&list=users&ustoken=..., action=query&list=recentchanges&rctoken=.... Formerly some modules would implement their own "gettoken" parameter, although now only action=login does anything like this. Further, some modules have their own "type" of token and others use the generic "edit" token type, and which is required for a particular module is not always clear. And it's not possible to fetch both the token and the data of the page to be acted on at the same time.

The following changes will be made to token handling:

  • All existing methods of retrieving tokens will be deprecated.
  • A new meta=tokens will be added to action=query. It will work just like action=tokens does, but by virtue of being a submodule of action=query you can combine it with e.g. prop=revisions to fetch both the edit token and the page content being edited.
  • The help for every 'token' parameter will clearly indicate which token type is needed. The type will also be included in action=paraminfo.
  • Many of the existing token types will be merged into a single 'csrf' type, as they're already all the same token.
  • All tokens will be static, not varying based on the target of the action.
    • All tokens in core and WMF-deployed extensions are already static except for action=rollback, which depends on the title and user being rolled back, and action=userrights, which depends on the user. These actions will accept both a new static token and the non-static token used in the web UI. The web UI will continue to accept only the existing tokens.
  • The old token-fetching methods will remain available for now, but will return deprecation warnings.
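A minimal Python sketch of the single request that the meta=tokens bullet makes possible: fetching a CSRF token and the current page content together. It assumes meta=tokens takes a `type` parameter like action=tokens does; `rvprop`, `titles` and `format` are standard action=query/prop=revisions parameters, and `edit_prep_query` and the title "Sandbox" are hypothetical examples. The sketch only builds the query string; sending it to a wiki's api.php is left out.

```python
from urllib.parse import urlencode


def edit_prep_query(title: str) -> dict:
    """Parameters for one action=query request returning both a CSRF token
    (via meta=tokens, assumed to accept type=csrf) and the page's wikitext."""
    return {
        "action": "query",
        "format": "json",
        "meta": "tokens",                # token retrieval as a query submodule
        "type": "csrf",                  # assumption: same parameter name as action=tokens
        "prop": "revisions",             # page data in the same round trip
        "rvprop": "content|timestamp",
        "titles": title,
    }


# Query string a client would append to api.php (the "|" is percent-encoded).
query_string = urlencode(edit_prep_query("Sandbox"))
print(query_string)
```

Previously the same data took either two requests or a module-specific option such as prop=info&intoken=...; with meta=tokens, any action=query combination can carry the token along.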

On the code side, the token-related methods in ApiBase will be changing. For most extensions, it's just a matter of changing needsToken() from returning true to returning 'csrf'; extensions using custom salts will either need to add those salts (using a new hook) or convert to 'csrf'. Provision is made for extensions that need to maintain backward compatibility (BC) with earlier versions of MediaWiki.
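Client code that requests tokens by their old per-action type names will need a similar update. Below is a hedged Python sketch of a lookup a bot framework might use while migrating. Which legacy types actually merge into 'csrf' is an assumption here; the authoritative type for each module is whatever its token parameter's help and action=paraminfo report. `new_token_type` is a hypothetical name, and types not in the table (for example 'rollback' or 'userrights', which the proposal treats separately) pass through unchanged.

```python
# Assumed mapping of legacy token types that share the same underlying token;
# verify against action=paraminfo on the target wiki before relying on it.
LEGACY_TO_NEW = {
    "edit": "csrf", "delete": "csrf", "protect": "csrf", "move": "csrf",
    "block": "csrf", "unblock": "csrf", "email": "csrf", "import": "csrf",
    "options": "csrf",
}


def new_token_type(legacy_type: str) -> str:
    """Translate a legacy token type name; unknown or unmerged types pass through."""
    return LEGACY_TO_NEW.get(legacy_type, legacy_type)
```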

Gerrit change: gerrit:153110 (plus 153085–153109 to update various extensions)

Comments

  • Yes please. Also just noting that action=createaccount has its own token handling logic similar to action=login. Legoktm (talk) 22:28, 16 July 2014 (UTC)
  • Both login and account creation will continue to need a special token to avoid login CSRF. Otherwise I think this is all good. — Preceding unsigned comment added by CSteipp (talk • contribs) 23:49, 16 July 2014
  • Love it! Addshore (talk) 11:03, 20 July 2014 (UTC)
  • What will happen with the starttimestamp currently emitted by prop=info&intoken=...? Since this needs to be updated per edit to check for page deletion since the edit started, I'd recommend just moving it to the general section of the info. – RobinHood70 talk 02:10, 27 July 2014 (UTC)
    • Either that or a "meta=timestamp", most likely. While the needed timestamp is available from meta=siteinfo, that's a lot of extra junk to query just to get the timestamp. Anomie (talk) 13:41, 28 July 2014 (UTC)
  • YES +OVER 9000!!! Cyberpower678 (talk) 19:29, 29 July 2014 (UTC)
  • Yep. Tokens was easily one of the worst parts when designing Peachy. Soxred93 (talk) 01:09, 30 July 2014 (UTC)
    I can write a new token function into Peachy 2 @X!:. Cyberpower678 (talk) 11:48, 2 August 2014 (UTC)
  • +1 -FASTILY 09:53, 31 July 2014 (UTC)
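For context on RobinHood70's question about starttimestamp, here is a minimal Python sketch of the action=edit parameters involved. `basetimestamp` (the timestamp of the revision the edit is based on, for edit-conflict detection) and `starttimestamp` (when editing began, so the server can detect a deletion in the meantime) are existing action=edit parameters. Where `start_ts` comes from is the open question in the thread: prop=info&intoken=..., the time reported by meta=siteinfo, or a possible future "meta=timestamp". `edit_params` and all argument values are hypothetical.

```python
def edit_params(title: str, text: str, token: str,
                base_ts: str, start_ts: str) -> dict:
    """Build action=edit POST parameters for a conflict- and deletion-aware edit."""
    return {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "token": token,              # CSRF token, e.g. obtained via meta=tokens
        "basetimestamp": base_ts,    # revision the edit was based on
        "starttimestamp": start_ts,  # server time when editing started
    }
```

Since `start_ts` must be fetched fresh for every edit, bundling it into the same request as the token and page content saves a round trip.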

General discussion

  • I support basically everything in this. Please drop the 'may be dropped' ones. --Krenair (talk • contribs) 20:56, 16 July 2014 (UTC)
  • I'm excited to see this move forward. I think it would be helpful if this list was prioritized, or at least ordered. Some of these are extremely easy to implement (like dropping deprecated parameters), and could be done independently of this RfC IMO. Legoktm (talk) 23:17, 16 July 2014 (UTC)
    • I feel good that I actually got things written down and will get to work on them in a focused way! ;) This RFC serves as "notify people of upcoming changes", "chance for people to object to upcoming changes", and "todo list" all at once, which is my excuse for the inclusion of easy stuff on the list. But to me it's not "seeking approval before beginning work on any of it". If you look back at the history, you'll see some of the stuff on earlier versions of this page actually has already been done (e.g. adding generator support to various actions); other easy stuff is welcome to be done similarly. Anomie (talk) 11:23, 19 July 2014 (UTC)
    • As for the "dropping deprecated parameters", the main blocker there is checking how many people still use them to determine how far in the future $DATE should be in the "These long-deprecated parameters are finally going away on $DATE" announcement. Anomie (talk) 11:23, 19 July 2014 (UTC)
  • Another idea that just occurred to me is that pretty much anything that outputs a page set gives you both the namespace ID and the full title, including the namespace. Unless there's a good reason for this that I'm not thinking of, I think you could get rid of that redundancy. – RobinHood70 talk 22:33, 19 July 2014 (UTC)
  • I can't say whether any of the proposals are good or bad. I was flagged to this conversation after a dire warning of cat/dog cohabitation and 90% of this I do not understand. Any chance of a digested version/how to test for support in an API consumer and if patching will be needed? Hasteur (talk) 20:57, 28 July 2014 (UTC)
    • Not much chance of a digested version, unless someone else volunteers to write it. But each section here will likely be individually implemented (it won't be a massive change-everything-in-one-huge-change), and there will be announcements to mediawiki-api-announce when each change is about to be deployed. Anomie (talk) 11:45, 30 July 2014 (UTC)
    Point of clarification here, then: when you say "transition period" throughout this page, are you talking strictly about the period between when you start modifications and when API2 is complete? If so, I'm a heck of a lot less worried (read: couldn't care less) about being able to detect whether I should be using "sane" or not, because I'm obviously not going to program against an evolving API. I assumed it would be a big rollout, and that by "transition period", you meant a major version or two for people to adjust. If that's not the case, then all I ask is that siteinfo contain some kind of easy API version identifier once the transition is final and the new API/changes to the JSON output become the default behaviour. – RobinHood70 talk 03:02, 5 August 2014 (UTC)
    I mean the period of time during which the #Deprecation process is being followed. Anomie (talk) 09:15, 6 August 2014 (UTC)
  • Good and bad. But generally good. I'm waiting to make the changes to Peachy. Cyberpower678 (talk) 19:37, 29 July 2014 (UTC)
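On Hasteur's question of how an API consumer might test for support, one simple approach is sketched below in Python: read the `generator` string that meta=siteinfo returns in its general section (for example "MediaWiki 1.24wmf16") and compare versions. The assumption that meta=tokens first ships in MediaWiki 1.24 is flagged in the code. Checking for the module directly, for instance through action=paraminfo, is more robust than parsing version strings. Both function names are hypothetical.

```python
import re


def mediawiki_version(generator: str) -> tuple:
    """Parse siteinfo's generator string, e.g. 'MediaWiki 1.24wmf16' -> (1, 24)."""
    match = re.match(r"MediaWiki (\d+)\.(\d+)", generator)
    if match is None:
        raise ValueError(f"unrecognised generator string: {generator!r}")
    return int(match.group(1)), int(match.group(2))


def has_meta_tokens(generator: str) -> bool:
    # Assumption: meta=tokens is available from MediaWiki 1.24 onward.
    return mediawiki_version(generator) >= (1, 24)
```

A client could call `has_meta_tokens(...)` once per wiki and fall back to the deprecated token-fetching methods while they still exist.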