Talk:Requests for comment/API roadmap

About this board

For older discussions, see Archive.

Previous page history was archived for backup purposes at Talk:Requests for comment/API roadmap/LQT Archive 1 on 2015-07-10.


Twitter

3 comments • 21:13, 12 October 2015

Nemo bis

Tuju> I would recommend doing it like Twitter does: they version their protocols in the URLs and keep each URL working, regardless of changes to their DB layout. Hence they won't break applications.

12:02, 19 August 2014
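A minimal Python sketch of the URL-versioning approach described above, under stated assumptions: the page record, the `/1/page` and `/2/page` paths, and both response shapes are hypothetical. Each version keeps its own serializer, so when the internal layout changes only the serializers need updating, while the output of each URL stays the same.

```python
# Hypothetical internal record; its layout may change over time.
PAGE = {"page_id": 42, "title": "Main Page", "latest_rev": 1001}

def render_v1(page: dict) -> dict:
    # The v1 shape is frozen: clients written against /1/ keep working.
    return {"id": page["page_id"], "title": page["title"]}

def render_v2(page: dict) -> dict:
    # v2 adds a field without touching v1.
    return {"id": page["page_id"], "title": page["title"], "revision": page["latest_rev"]}

RENDERERS = {"1": render_v1, "2": render_v2}

def handle(path: str) -> dict:
    """Dispatch a path such as '/2/page' to the serializer for that version."""
    version, _, resource = path.lstrip("/").partition("/")
    if version not in RENDERERS or resource != "page":
        raise KeyError(f"unknown endpoint: {path}")
    return RENDERERS[version](PAGE)
```

If the storage layer renamed `page_id`, only the two `render_*` functions would change; clients of `/1/page` would see no difference.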

Addshore

+1, sounds great!

21:13, 12 October 2015

This post was hidden by Addshore.

Deprecation codes

One comment • 13:41, 24 October 2014

John Vandenberg

It would be really handy if the API deprecation messages used an identifier of some sort that clients can use to 'understand' what these messages are. This one, for example, doesn't even use the word 'deprecate' or 'deprecation':

Formatting of continuation data will be changing soon. To continue using the current formatting, use the 'rawcontinue' parameter. To begin using the new format, pass an empty string for 'continue' in the initial query.

Using i18n codes for API warnings should be a high priority, as not everyone can understand English, and clients do not want to show uncoded English warnings to non-English-speaking users.

13:41, 24 October 2014
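To illustrate what coded warnings would buy clients, a minimal Python sketch. The warning structure (a `code` plus an English `text`), the code string `deprecated-continue-format`, and the client-side message catalogue are all hypothetical, not the API's actual output.

```python
# Hypothetical client-side catalogue: code -> localized message, per language.
MESSAGES = {
    "de": {"deprecated-continue-format": "Das Format der Fortsetzungsdaten ändert sich bald."},
}

def localize(warning: dict, lang: str) -> str:
    """Prefer a localized message keyed by the warning's code; fall back to English."""
    return MESSAGES.get(lang, {}).get(warning["code"], warning["text"])

def is_deprecation(warning: dict) -> bool:
    # With a stable code, detection is a prefix test instead of parsing English prose.
    return warning["code"].startswith("deprecated-")

# A hypothetical coded version of the continuation warning quoted above.
warning = {
    "code": "deprecated-continue-format",
    "text": "Formatting of continuation data will be changing soon.",
}
```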


Errors should use reasonable HTTP response codes

5 comments • 13:14, 24 October 2014

Sharihareswara (WMF)

It would be great if API errors used appropriate HTTP error status codes rather than returning 200.

03:50, 18 March 2014

Sharihareswara (WMF)

I see that we previously WONTFIXed this request, but now that we're overhauling the API, I think we should give it another look.

03:54, 18 March 2014

Anomie

My reasoning in that bug still applies: an HTTP error indicates that something went wrong with the HTTP request, for example that the target resource wasn't found or couldn't be executed. As far as the API is concerned, that's the transport layer. If the API request is able to be processed but the result is an API error, that's reported at the application layer instead.

Say the API did return an HTTP 400 or 500 response code for an API error. How does the client determine that this is an API error rather than a Varnish timeout or the like? I don't much like "blindly try to parse the body; if it succeeds, it's an API error".

Also, say the API did return an HTTP 4xx response code for an API error. People would probably expect action=delete to return a 404 if the page to be deleted isn't found. But then what happens with action=query, where there may be multiple titles, some of which might be found and others not? Or look at action=watch: before gerrit:53964 you could have made the case for it to return 404, but now it's like action=query.

13:16, 18 March 2014
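A minimal Python sketch of the client-side logic this argument implies, assuming JSON output: any non-200 status is treated as a transport problem, an `error` object in a 200 response as an API error, and a multi-title query as a possibly partial success. The `{"error": {"code": ..., "info": ...}}` object and the per-page `missing` marker are assumed to follow the classic format=json output; the function itself is hypothetical.

```python
import json

def classify(status: int, body: str):
    """Return (kind, detail) for an API response."""
    if status != 200:
        # Reported by the HTTP layer (e.g. a proxy timeout), not by the API.
        return ("transport-error", status)
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return ("transport-error", "unparseable body")
    if not isinstance(data, dict):
        return ("transport-error", "unexpected body")
    if "error" in data:
        # Application-layer failure, reported with HTTP 200.
        return ("api-error", data["error"].get("code"))
    # One request, several titles: some may exist and some may not,
    # which a single HTTP status code could not express.
    pages = data.get("query", {}).get("pages", {})
    missing = sorted(p["title"] for p in pages.values() if "missing" in p)
    return ("ok", missing)
```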

RobinHood70

I agree with Anomie's reasoning. An API error is not an HTTP error, and should not be reported as one.

20:13, 8 April 2014

John Vandenberg

If the error is related to application layer data, HTTP error codes are wrong, of course.

However, IIRC the MWAPI emits server errors with HTTP 200 and a response that includes an error code like internal_api_error_ExceptionFooBar. Those are server errors, and could (or should) have an HTTP 50x code, because the application failed while attempting to complete processing of the request, and all bets are off as to which parts of the request were performed and committed to the database.

The current approach isn't _wrong_, as 50x codes are optional, but it's worth reconsidering using them for the cases they actually apply to.

13:14, 24 October 2014
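The proposal above can be sketched as a simple rule in Python: reserve a 5xx status for `internal_api_error_*` codes (the prefix quoted above) and keep 200 for ordinary application-layer errors. The function name and the choice of 500 specifically are assumptions.

```python
from typing import Optional

def http_status_for(error_code: Optional[str] = None) -> int:
    """Hypothetical policy: 5xx only for internal failures, 200 otherwise."""
    if error_code is None:
        return 200
    if error_code.startswith("internal_api_error_"):
        # The server failed mid-request; partial writes are possible.
        return 500
    # Application-layer errors (bad parameters, missing rights, ...)
    # stay at 200 and are described in the response body.
    return 200
```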


Architecture Summit notes

One comment • 04:36, 14 March 2014

Sharihareswara (WMF)

Please see Talk:Architecture Summit 2014/Storage services#API versioning and additional notes on that page.

04:36, 14 March 2014

Follow-up action items meeting September 11

3 comments • 22:18, 6 March 2014

Drdee

The action items from the September 11 meeting are:

* Wikia makes their RFC public, ASAP :) - Federico
** Separate RfC re RESTful API?
** Prototype Parsoid REST API - Gabriel (Done)
* Find motivating use case re flags versus versions - Yuri
* Restructure current RFC - Brad/Yuri ?
* Sumana to post this etherpad onwiki, email mediawiki-api & wikitech-l

I have marked items as Done based on my understanding of the current status; please feel free to edit.

21:44, 17 December 2013

GWicke

The REST storage service and public content API are now discussed in these two closely related RFCs: Storage service and Content API.

Wikia has released a REST API that covers their immediate needs: . They also have an API team that might work on a more general REST API. I hope that we can collaborate with them on the REST API.

23:57, 17 December 2013

GWicke

Here is a full copy of the etherpad before it disappears:

API roadmap conversation, Sept 11 2013 at WMF office
* Attendees: Yuri, Max, Yuvi, Erik B, Brad, Sumana, Subbu, Gabriel, RobLa, Roan, Federico, Tim

== REASONS / JUSTIFICATIONS ==

Current proposal: https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap

* Change output format - structured warnings / errors, localization    
** Kill XML specifically :( (it's 25% of non-OpenSearch traffic but it's a mess and needs to die)
* Split traffic between server pools depending on action
** Change URL to e.g. api.php/query?...
*** Why make the URL longer?

== Discussion ==

* Module refactoring - https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap#Modules_refactoring

Drawbacks to versioning modules, versus individual flags:
* Making promises we can't keep: we say action=foo~3 isn't going to change, but then some security issue or core change comes along and we have to break it anyway.
* Code rot: "foo~3" implies an entirely separate module, the code for which will easily rot.
** Yes, the version *could* be treated as a feature flag within the module. Then you have this vaguely-named flag that doesn't indicate what it does besides "version".
* Say we make "foo~3", then "foo~4". If a client wants something introduced in ~4, they have to accept ~3 as well.
** Encouraging people to upgrade to the latest version is often a benefit
*** But forcing them to upgrade many features for one feature?
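To make the last drawback concrete, a Python sketch with made-up feature names (none of them are real API changes): under module versioning each version bundles all earlier changes, whereas flags let a client opt into exactly the change it needs.

```python
# Hypothetical behaviour changes, cumulative per module version.
VERSION_FEATURES = {
    2: {"structured-warnings"},
    3: {"structured-warnings", "new-continuation"},
    4: {"structured-warnings", "new-continuation", "content-key"},
}

def features_for_version(version: int) -> set:
    # Asking for ~4 means accepting everything that arrived in ~2 and ~3 too.
    return VERSION_FEATURES[version]

def features_for_flags(flags: str) -> set:
    # Flags: the client names only the changes it actually wants, e.g. "a|b".
    return {f for f in flags.split("|") if f}
```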

URL change won't help with caching yet -> REST content API
* query param order random, cannot be purged
* don't want to wrap HTML in JSON
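One way a cache layer could make URLs deterministic is to sort the query parameters before using the URL as a cache key, so that equivalent requests map to a single entry that can be purged. A minimal Python sketch using only the standard library; note that it also reorders repeated keys, which would be wrong for any parameter whose value order matters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Sort query parameters so equivalent requests share one cache key."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment))
```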

https://www.mediawiki.org/wiki/Talk:Requests_for_comment/API_roadmap#Clean_up_formats_23045


Wikia's requirements: work with SDKs their ecology likes
* REST
** What kind of REST? it means something different to everyone!
** Cacheable as much as possible -> no query params, deterministic URL so purgeable
** Representations?  State Transformations? 
*** Content types, not everything should be wrapped in JSON
**** +1
** Discoverability - API results include URLs to possible state transformations, related resources, etc.

Yuri: How will we proceed in changing the API?
* Sumana advises: consult existing API usability research, just as we consult users & MW developers

How do we change defaults?

Star versus underscore is so JS can do "foo._" instead of "foo['*']"
> Avoid underscores in JS identifiers per conventions; maybe use "content" instead of "*" / "_" (also more descriptive)

Idealist vs Pragmatism - Do you want something beautiful?  Or something that continues to work?
Why can't it do both?
* The argument is to find specific use cases for each individual change; an overall beautiful API is not definable as individual little pieces but as an overarching design methodology

==NEXT==
* Wikia makes their RFC public, ASAP :) - Federico
** Separate RfC re RESTful API?
** Prototype Parsoid REST API - Gabriel
* Find motivating use case re flags versus versions - Yuri
* Restructure current RFC - Brad/Yuri ?
* Sumana to post this etherpad onwiki, email mediawiki-api & wikitech-l

22:18, 6 March 2014

Meeting notes on etherpad

One comment • 18:19, 17 September 2013

GWicke

Last week we met in the office and discussed this RFC. The discussion notes are on the etherpad.

18:19, 17 September 2013

PATH_INFO

7 comments • 13:22, 29 April 2013

Anomie

It seems to me that using PATH_INFO is going to make things more complicated for clients: instead of taking, at a low level, just an assoc/dict/hash/etc. of query parameters, they also have to take a PATH_INFO value. And while that's not much of a complication (if nothing else, a magic key could be extracted from the assoc/dict/hash/etc.), what is the benefit?

13:20, 25 April 2013
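The client-side difference can be sketched in Python: the same parameter dict produces either a query-only URL or, after popping `action` out as the "magic key", a PATH_INFO URL. `example.org` is a placeholder and the function is hypothetical.

```python
from urllib.parse import urlencode

def build_urls(base: str, params: dict) -> tuple:
    """Return the same request as a query-only URL and as a PATH_INFO URL."""
    query_style = f"{base}?{urlencode(params)}"
    rest = dict(params)
    action = rest.pop("action")  # the "magic key" extracted from the dict
    path_style = f"{base}/{action}?{urlencode(rest)}"
    return query_style, path_style
```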

Yurik

I agree that it will make "action" a special parameter (or it could be extracted from the dict), but there are several benefits:

* Easier partitioning of the server farm, e.g. a cluster dedicated to certain actions like parsing (requested by the Parsoid team)
* Webserver access log files will contain the action even for POST requests
* No need to introduce api2.php just yet - we can determine the new version by request style
* Future core version changes can be done in the style api.php/action/2?...
* Shorter URL

Edited 18:49, 25 April 2013
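Benefit #1 (partitioning) can be sketched as a routing rule in Python: because the action sits in the path, a front end can choose a backend pool without reading a POST body. The pool names and the mapping are hypothetical; only the `api.php/<action>` URL shape comes from the proposal.

```python
# Hypothetical mapping from action to a dedicated backend pool.
POOLS = {"parse": "parser-pool"}
DEFAULT_POOL = "api-pool"

def pick_pool(path: str) -> str:
    """Choose a pool from a request path (without query string), e.g. '/w/api.php/parse'."""
    _, _, path_info = path.partition("api.php")
    action = path_info.strip("/").split("/", 1)[0]  # '' when there is no PATH_INFO
    return POOLS.get(action, DEFAULT_POOL)
```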

Anomie

Partitioning, OK. Versioning could just as well be done with action=foo/2. api.log already contains the action for POST requests; I guess you're talking about the webserver access.log? Shorter URL and "api2.php", meh.

14:50, 25 April 2013

Yurik

Anomie, the core value is #1; everything else is a side benefit :) As for logs, unless this is a very, very recent change, I don't see the action for POST requests in the logs.

Edited 15:30, 25 April 2013

Anomie

Are you looking in the api.log (on fluorine), or in webserver access logs?

13:23, 26 April 2013

Yurik

I'm looking at the api log files that are rsynced to stats1.

20:05, 26 April 2013

Anomie

I don't know what's in that one.

13:22, 29 April 2013

CORS and third-party web apps

6 comments • 07:57, 15 March 2013

Brooke Vibber

Currently, web applications using client-side JavaScript can only access our API via JSONP (wrapped in a function call and run through a <script> tag). This is a bit nasty for several reasons:

* different URLs -> breaks potential shared caching with other apps that use the same queries over JSON
* harder to get progress feedback or detect errors
* can't do POST requests at all
* authentication is disabled to prevent CSRF from stealing a web user's credentials

This has practical limitations for some mobile platforms as well -- for instance our Wikipedia app for Firefox OS is a web app hosted on bits.wikimedia.org. Since an XHR can't access *.wikipedia.org/w/api.php from there, it has to use either JSONP or a server-side proxy (icky, hides IPs, no load balancing, etc). Since a proxy is icky and hard to scale, we're using JSONP for now... but this won't work once we try to add login and editing features, since auth isn't available.

If we had CORS headers set up to allow non-authenticated (no cookies) access via XHR from all third-party domains, and we could authenticate without cookies (can we use a token for this? I... think so), that would be helpful.

Not sure if that's doable on the current API or not. :D

00:38, 26 January 2013
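A minimal Python sketch of the first JSONP drawback: the client-chosen callback name becomes part of both the URL and the response body, so two apps making the same query cannot share a cached response, while plain JSON requests can. `example.org` and the helper names are placeholders.

```python
import json
from urllib.parse import urlencode

BASE = "https://example.org/w/api.php"  # placeholder endpoint

def request_url(params: dict, callback=None) -> str:
    """Build the request URL; JSONP adds the caller's callback name."""
    if callback is not None:
        params = {**params, "callback": callback}
    return f"{BASE}?{urlencode(params)}"

def jsonp_body(callback: str, data: dict) -> str:
    # JSONP wraps the JSON in a call to the named function, run via <script>.
    return f"{callback}({json.dumps(data)})"
```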

Yurik

There is some sort of CORS handling in the API, but I will need to look further into it to get a better understanding of how it is setup.

04:25, 30 January 2013

Anomie

Basically, it's three parts:

* The client adds an "origin" parameter to the request to indicate the origin and explicitly request CORS.
* The browser adds an "Origin" HTTP header, also indicating the origin.
* The MediaWiki configuration has $wgCrossSiteAJAXdomains and $wgCrossSiteAJAXdomainExceptions to determine whether to allow the cross-domain request.

First, the "origin" parameter must match one of the values in the "Origin" header, or the request fails.

Second, the "origin" parameter must match one of the patterns in $wgCrossSiteAJAXdomains and not match any pattern in $wgCrossSiteAJAXdomainExceptions. These are currently set to allow various WMF wikis (but bits.wikimedia.org is not in the list).

If both checks pass, then the appropriate CORS headers are returned to instruct the browser to allow the request, including cookies.

I guess the basic idea behind this proposed non-cookie authentication method would be that it works just like cookies except that it's handled by the client code rather than the browser?

16:04, 30 January 2013
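The two checks described above can be sketched in Python as follows. This illustrates the described logic and is not MediaWiki's code: the function name is hypothetical, and it assumes the patterns use shell-style `*`/`?` wildcards matched against the origin's host name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def cors_headers(origin_param, origin_header, allowed, exceptions):
    """Return CORS response headers, or {} if the cross-site request is refused."""
    if origin_param is None:
        return {}  # no origin parameter: the client did not ask for CORS
    # Check 1: the parameter must match one of the Origin header's values.
    if origin_param not in origin_header.split():
        return {}
    # Check 2: match an allowed pattern and no exception (assumed: on the host).
    host = urlsplit(origin_param).hostname or ""
    if not any(fnmatchcase(host, pat) for pat in allowed):
        return {}
    if any(fnmatchcase(host, pat) for pat in exceptions):
        return {}
    # Both checks passed: let the browser send the request, including cookies.
    return {
        "Access-Control-Allow-Origin": origin_param,
        "Access-Control-Allow-Credentials": "true",
    }
```

Brooke's cookie-less proposal would be a separate branch returning `Access-Control-Allow-Origin: *` without `Access-Control-Allow-Credentials`, since browsers refuse the `*` wildcard on credentialed requests.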

Dantman

Yeah, it would be nice to drop JSONP for CORS. We'll have to disable anonymous editing over CORS, though, so that the API can't be turned into a kind of mass spam attack that could come from absolutely any innocent IP in the world without people's knowledge.

As for auth via tokens: this would basically be where OAuth (or ;) something like OAuth) fits in.

21:31, 11 March 2013

Brooke Vibber

Technically speaking, I think anonymous editing via action=edit already allows that kind of attack. *cough*

23:43, 14 March 2013

Dantman

*facepalm* right it can, and I had a private bug about fixing that.

07:57, 15 March 2013

Clean up formats

19 comments • 17:10, 4 March 2013

Anomie

format=yaml can be removed, since it's now identical to JSON. format=txt and format=dump seem entirely pointless, and format=dbg seems redundant with format=php for real use and format=rawfm for debugging.

Now for the controversial part: format=xml seems to be a major source of problems, since it needs special handling all over the place. If we keep it at all, it would be very nice to change it to something that doesn't need magic "_element" and "*" members and won't cause bugs like bug 43221 (for the last, if nothing else define some sort of reversible encoding for invalid names). This would also allow us to get rid of format=rawfm, since we won't have any more magic elements.

19:08, 23 January 2013
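One possible "reversible encoding for invalid names", sketched in Python. The `_XXXXXX_` escape (six uppercase hex digits of the code point) is a hypothetical scheme chosen for this example, and it conservatively treats only ASCII letters, digits, `.` and `-` as safe. Every other character, including `_` itself, is escaped, so decoding is unambiguous. It does not handle the empty key.

```python
import re

_NAME_CHAR = re.compile(r"[A-Za-z0-9.\-]")
_ESCAPE = re.compile(r"_([0-9A-F]{6})_")

def encode_name(name: str) -> str:
    """Map a non-empty string to an XML-safe element name, reversibly."""
    out = []
    for i, ch in enumerate(name):
        # Letters are fine anywhere; digits, '.' and '-' only after the first char.
        if _NAME_CHAR.fullmatch(ch) and (i > 0 or ch.isalpha()):
            out.append(ch)
        else:
            # Everything else, including '_', becomes _XXXXXX_ (hex code point).
            out.append("_%06X_" % ord(ch))
    return "".join(out)

def decode_name(encoded: str) -> str:
    """Undo encode_name; raw '_' never appears outside an escape."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), encoded)
```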

Brooke Vibber

YES YES YES. Kill XML, let us just export an associative array and have it go straight to a JSON object.

00:11, 26 January 2013

Brooke Vibber

Multiple formats support is awkward and, especially with XML, is just plain weird.

I'd strongly like to kill all formats except for JSON. JSON is widely supported, simple, doesn't have weird-ass attributes and text contents, and generally should be a good default.

Kill XML with fire, please please please!

The other formats are basically equivalent to JSON (YAML was actually replaced with JSON because valid JSON is valid YAML!) and there's not much benefit to their existence.

00:10, 26 January 2013

Amgine

Serialized PHP is faster in PHP, easier for those of us coding in PHP, and only slightly less efficient in bandwidth. There is a rather large code base of tools using PHP serialize.

17:41, 27 January 2013

Brooke Vibber

Note that software using PHP serialization now should be able to update to JSON by simply changing the format parameter and switching from 'unserialize' to 'json_decode'. There _shouldn't_ be differences in the decoded data format, that I know of.

I threw together a quick benchmark:

On 2000 items of RecentChanges data, file sizes:

138K rc.json
134K rc.xml
187K rc.phpser

and speed per iteration:

$ php test.php 
Benchmarking xml... 4.436 ms
Benchmarking json-objects... 4.846 ms
Benchmarking json-assoc... 4.312 ms
Benchmarking php... 2.776 ms

So yes, on ~140-190KB of tightly-packed RC data you might save 2 milliseconds of low-level parse time. I'm not convinced this is a significant savings.

19:46, 27 January 2013

Amgine

@brion: generation comparison?

19:58, 27 January 2013

Amgine

That's actually a much stronger argument for JSON, alas...

$ php bench.php
Benchmarking json-objects... 7.774 ms
Benchmarking json-assoc... 7.720 ms
Benchmarking php... 12.301 ms

So, I'm wrong on the speed, and I apologize for that one.

Edited 20:27, 27 January 2013

Yurik

For everyone's enjoyment, I present to you the formatting usage stats. XML gets about 500 reqs/min (down from ~1000 three months ago), JSON ~2100, PHP has been growing to about 200 now, YAML dropped from 1.3/min to sporadic, DBG (?!?) is consistently used at about 1.3/min, RAW frequently spikes up to 30!!!, TXT averages 3, but the real kicker: xmlfm gets 50 reqs per minute... FML!!! Need to track it down and kill it with vengeance.

Edited 05:05, 29 January 2013

Amgine

While the numbers are interesting, they may not tell the complete tale. XML is a good example: there are three very sharp downward steps, suggesting that three very-high-volume but specific tools have stopped using that format. Contrariwise, there's an informal but general increasing trend in PHP, suggesting a diversity of tools are using that format. Translated, this suggests a wider range of projects might be broken by removal of php as a format, while a smaller number of projects might be broken by removal of json as a format.

Yes, I know you're not suggesting eliminating php as a format.

But you are suggesting the content API should shut out the more diverse community of projects which are already using the API.

19:14, 27 January 2013

Yurik

Amgine, I think we definitely should keep the current multi-format API model for query/action modules, with a possible drop of WDDX and YAML, but on the content side we should make it uniform to take advantage of caching. If the difference between using PHP and JSON is simply replacing one built-in method with another, it shouldn't be that big of a deal. And yes, we can make the content data model HTTP-error-coded and possibly even non-structured and blob-based, removing the need for the JSON vs PHP vs XML debate altogether :)

In other words: keep using the current API, figure out what content (e.g. HTML) you need for some task, and then download the blob with a different content-API call. There shouldn't even be a need for an API library. At most there will be a simple JSON structure to separate TOC entries/sections, depending on the call.

Edited 20:44, 27 January 2013

Amgine

21:24, 28 January 2013

Duplicatebug

xmlfm is the default, so testing or building new queries in the browser will use this format, as will viewing the help page.

In my opinion you should not drop XML, because not all programming languages have native JSON or PHP-format support, for example Java (at least in 1.6). Having to add a new JAR can be a blocker for this.

11:22, 3 February 2013

Anomie

How would you feel about changing the XML format to something that would be less likely to cause issues? Something closer to an XML-format property list or WDDX. Or maybe just keeping WDDX as the XML format?

14:57, 4 February 2013

Duplicatebug

Where does the XML format cause problems? All XML-related things should be in ApiFormatXml, where nobody else sees them. As for bug 43221: property names with "::" are also bad in JSON. If JSON had an attribute name for content (like text or _continue), the XML wrapper could produce a text node out of that; then nobody would need ApiResult::setContent, if that is the problem.

11:43, 10 February 2013

Svick

I think the “special handling” Anomie was talking about is that you need to call setIndexedTagName every time you want to return a (numerical) array from the API. (There could be other situations that require special handling for XML that I haven't encountered yet.)

13:39, 10 February 2013

Anomie

There's also how other formats have to deal with a key named "*" so XML can do its "text content with properties" thing.

14:13, 11 February 2013
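A minimal Python sketch of the two kinds of special handling mentioned in this thread, using `xml.etree.ElementTree`: list items need an invented element name (the job `setIndexedTagName` does), and text content needs a designated key. Here that key is a hypothetical `content` instead of `"*"`, in line with Duplicatebug's suggestion; mapping scalars to attributes and using one item name for all lists are further simplifying assumptions.

```python
import xml.etree.ElementTree as ET

CONTENT_KEY = "content"  # hypothetical designated key for text content

def to_xml(tag: str, value, item_tag: str = "item") -> ET.Element:
    """Convert a JSON-like value to XML; lists need an element name per item."""
    el = ET.Element(tag)
    if isinstance(value, dict):
        for key, sub in value.items():
            if key == CONTENT_KEY:
                el.text = str(sub)  # becomes a text node
            elif isinstance(sub, (dict, list)):
                el.append(to_xml(key, sub, item_tag))
            else:
                el.set(key, str(sub))  # scalars become attributes
    elif isinstance(value, list):
        for sub in value:
            # JSON arrays carry no element names; XML must invent one.
            el.append(to_xml(item_tag, sub, item_tag))
    else:
        el.text = str(value)
    return el
```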

Anomie

Property names with "::" are fine in json, any string is allowed in a key (see RFC 4627). In JavaScript they can't be accessed with the foo.bar notation, but foo['bar'] still works.

14:23, 11 February 2013

Duplicatebug

Yes, "*" is also fine in JSON, but you must write foo['*']. In another thread, some people said they do not want to write the string (bracket) notation and want the object (dot) notation. So it makes no sense to have other parameters that need the string notation and break this. Then you could just as well keep "*".

17:32, 3 March 2013

Anomie

The use of "*" is gratuitous, we can easily pick something more sensible. The use of "::" as keys in the API for things that use "::" as keys in MediaWiki core is not gratuitous.
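Until such a change is made, a client can apply the rename itself. A minimal Python sketch, assuming "content" as the replacement name (as suggested in the meeting notes): it recursively renames only the `"*"` key and leaves keys containing "::" untouched, since JSON allows any string as a key. If a dict already had a `content` key, the renamed value would overwrite it.

```python
import json

def rename_star(value, new_key: str = "content"):
    """Recursively rename the "*" key; all other keys, including "::" ones, are kept."""
    if isinstance(value, dict):
        # Note: a pre-existing new_key in the same dict would be overwritten.
        return {(new_key if k == "*" else k): rename_star(v, new_key) for k, v in value.items()}
    if isinstance(value, list):
        return [rename_star(v, new_key) for v in value]
    return value

if __name__ == "__main__":
    raw = '{"text": {"*": "<p>Hi</p>"}, "Ext::Key": 1}'
    print(rename_star(json.loads(raw)))
```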
