Topic on Talk:Requests for comment/API roadmap

Anomie (talkcontribs)

yaml format can be removed, since it's now identical to json. format=txt and format=dump seem entirely pointless, and format=dbg seems redundant to format=php for real use and format=rawfm for debugging.

Now for the controversial part: format=xml seems to be a major source of problems, since it needs special handling all over the place. If we keep it at all, it would be very nice to change it to something that doesn't need magic "_element" and "*" members and won't cause bugs like bug 43221 (for the last, if nothing else define some sort of reversible encoding for invalid names). This would also allow us to get rid of format=rawfm, since we won't have any more magic elements.
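
To make the "reversible encoding for invalid names" idea concrete, here is a minimal sketch in Python. The `_HEX_` escape scheme and the function names `encode_name` and `decode_name` are assumptions made up for illustration, not anything MediaWiki defines. The sketch is also stricter than XML requires: it only lets ASCII letters, digits, "-" and "." through unescaped.

```python
import re

# Hypothetical scheme: every character that is not safe in an XML name,
# including "_" itself, is written as _HEX_ (its code point in hex).
_ESCAPE = re.compile(r"_([0-9A-F]+)_")


def _is_safe(ch: str, first: bool) -> bool:
    """ASCII letters are always safe; digits, '-' and '.' only after the first character."""
    if ch.isascii() and ch.isalpha():
        return True
    if first:
        return False
    return ch.isascii() and (ch.isdigit() or ch in "-.")


def encode_name(key: str) -> str:
    """Turn an arbitrary non-empty key into a string usable as an XML name."""
    if not key:
        raise ValueError("empty keys are not handled by this sketch")
    return "".join(
        ch if _is_safe(ch, i == 0) else "_%X_" % ord(ch)
        for i, ch in enumerate(key)
    )


def decode_name(name: str) -> str:
    """Reverse encode_name by expanding every _HEX_ escape."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


for key in ["Foo::bar", "*", "_element", "9lives"]:
    encoded = encode_name(key)
    print(key, "->", encoded, "->", decode_name(encoded))
```

Because a literal underscore is always escaped too, every underscore in the encoded form belongs to an escape sequence, which is what makes the decoding unambiguous.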

Brooke Vibber (talkcontribs)

YES YES YES. Kill XML, let us just export an associative array and have it go straight to a JSON object.

Brooke Vibber (talkcontribs)

Multiple formats support is awkward and, especially with XML, is just plain weird.

I'd strongly like to kill all formats except for JSON. JSON is widely supported, simple, doesn't have weird-ass attributes and text contents, and generally should be a good default.

Kill XML with fire, please please please!

The other formats are basically equivalent to JSON (YAML was actually replaced with JSON because valid JSON is valid YAML!) and there's not much benefit to their existence.

Amgine (talkcontribs)

Serialized php is faster in php, and easier for those of us coding in php, and only slightly less efficient in bandwidth. There is a rather large code base of tools using php serialize.

Brooke Vibber (talkcontribs)

Note that software using PHP serialization now should be able to update to JSON by simply changing the format parameter and switching from 'unserialize' to 'json_decode'. There _shouldn't_ be differences in the decoded data format, that I know of.

I threw together a quick benchmark:

On 2000 items of RecentChanges data, file size:

138K rc.json
134K rc.xml
187K rc.phpser

and speed per iteration:

$ php test.php 
Benchmarking xml... 4.436 ms
Benchmarking json-objects... 4.846 ms
Benchmarking json-assoc... 4.312 ms
Benchmarking php... 2.776 ms

So yes, on ~140-190KB of tightly-packed RC data you might save 2 milliseconds of low-level parse time. I'm not convinced this is a significant savings.
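
For anyone who wants to repeat this kind of measurement, here is a rough sketch of a parse-time benchmark. It is written in Python, not PHP, and it is not the test.php used above. The helper names `make_rows` and `ms_per_parse` and the sample values are invented for illustration, so the timings it prints are not comparable to the figures in this thread.

```python
import json
import timeit


def make_rows(n: int = 2000) -> list:
    """Build synthetic RecentChanges-like rows (invented sample values)."""
    return [
        {
            "type": "edit",
            "ns": 0,
            "title": f"Page {i}",
            "rcid": i,
            "pageid": i,
            "revid": i + 1,
            "old_revid": i,
            "timestamp": "2013-01-01T00:00:00Z",
        }
        for i in range(n)
    ]


def ms_per_parse(payload: str, iterations: int = 20) -> float:
    """Average wall-clock time of one json.loads call, in milliseconds."""
    total = timeit.timeit(lambda: json.loads(payload), number=iterations)
    return total / iterations * 1000.0


payload = json.dumps({"query": {"recentchanges": make_rows()}})
print(f"{len(payload) / 1024:.0f}K payload, {ms_per_parse(payload):.3f} ms per parse")
```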

Amgine (talkcontribs)

@brion: generation comparison?

Amgine (talkcontribs)

That's actually a much stronger argument for JSON, alas...

$ php bench.php
Benchmarking json-objects... 7.774 ms
Benchmarking json-assoc... 7.720 ms
Benchmarking php... 12.301 ms

So, I'm wrong on the speed, and I apologize for that one.

Yurik (talkcontribs)

For everyone's enjoyment, I present to you the formatting usage stats. XML gets about 500 reqs/min (down from ~1000 three months ago), JSON ~2100, PHP has been growing to about 200 now, YAML dropped from 1.3/min to sporadic, DBG (?!?) is consistently used at about 1.3/min, RAW frequently spikes to up to 30!!!, TXT averages 3, but the real kicker: xmlfm gets 50 reqs per minute... FML!!! Need to track and kill it with vengeance.

Amgine (talkcontribs)

While the numbers are interesting, they may not tell the complete tale. The xml is a good example: there are three very sharp downward steps, suggesting three very high volume but specific tools have stopped using that format. Contrariwise, there's an informal but general increasing trend in PHP, suggesting a diversity of tools are using that format. Translated, this suggests a wider range of projects might be broken by removal of php as a format, while a smaller number of projects might be broken by removal of json as a format.

Yes, I know you're not suggesting eliminating php as a format.

But you are suggesting the content API should shut out the more diverse community of projects which are already using the API.

Yurik (talkcontribs)

Amgine, I think we definitely should keep the current multi-format API model for query/action modules, with a possible drop of WDDX and YAML, but on the content side we should make it uniform to take advantage of caching. If the difference between using PHP and JSON is simply replacing one built-in method with another, it shouldn't be that big of a deal. And yes, we can make the content data model HTTP-error-coded and possibly even non-structured blob based, removing the need for the JSON vs PHP vs XML debate altogether :)

In other words - keep using the current API, figure out what content (e.g. html) you need for some task, and then download the blob with a different content-api call. There shouldn't even be a need for an API library. At most there will be a simple json structure to separate TOC entries/sections - depending on the call.

Amgine (talkcontribs)

<mind essplodes>

Duplicatebug (talkcontribs)

xmlfm is the default, so testing or building new queries in the browser will use this format, or the requests are for the help page.

In my opinion you should not drop xml, because not all programming languages have native support for the json or php formats, for example Java (at least in 1.6). Adding a new jar can be a blocker for this.

Anomie (talkcontribs)

How would you feel about changing the XML format to something that would be less likely to cause issues? Something closer to an XML-format property list or WDDX. Or maybe just keeping WDDX as the XML format?

Duplicatebug (talkcontribs)

Where does the xml format cause problems? All xml-related things should be in ApiFormatXml, where nobody sees them. bug 43221: property names with :: are also bad in json. If json has an attribute name reserved for content (like text or _continue), the xml wrapper can produce a text node out of that, and then nobody needs ApiResult::setContent, if that is the problem.

Svick (talkcontribs)

I think the “special handling” Anomie was talking about is that you need to call setIndexedTagName every time you want to return a (numerical) array from the API. (There could be other situations that require special handling for XML that I haven't encountered yet.)

Anomie (talkcontribs)

There's also how other formats have to deal with a key named "*" so XML can do its "text content with properties" thing.
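
To illustrate the two magic members for readers who have not met them, here is a deliberately simplified sketch in Python. It is not the ApiFormatXml implementation. The function `to_element`, the sample data, and the use of integer dict keys to stand in for a PHP numeric array are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def to_element(tag: str, data: dict) -> ET.Element:
    """Convert a result dict to XML, honouring the "_element" and "*" members."""
    elem = ET.Element(tag)
    for key, value in data.items():
        if key == "_element":
            continue  # magic: only names the indexed children, never output itself
        if key == "*":
            elem.text = str(value)  # magic: becomes the element's text content
        elif isinstance(value, dict):
            # Integer keys have no usable tag name, so "_element" must supply one.
            # In this sketch a missing "_element" simply raises KeyError.
            child_tag = data["_element"] if isinstance(key, int) else key
            elem.append(to_element(child_tag, value))
        else:
            elem.set(str(key), str(value))  # scalars become attributes
    return elem


# Integer keys mimic a PHP numeric array that carries an indexed tag name.
result = {
    0: {"ns": 0, "title": "Main Page", "*": "some text"},
    1: {"ns": 1, "title": "Talk:Main Page"},
    "_element": "rc",
}
xml = ET.tostring(to_element("recentchanges", result), encoding="unicode")
print(xml)
```

A JSON consumer of the same data gets the "*" key passed through unchanged, even though it only exists so that the XML output has somewhere to put text content.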

Anomie (talkcontribs)

Property names with "::" are fine in json, any string is allowed in a key (see RFC 4627). In JavaScript they can't be accessed with the foo.bar notation, but foo['bar'] still works.
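
A quick check of that claim, sketched in Python rather than JavaScript (the sample payload is invented). Python dicts are always accessed with subscript notation, which corresponds to the foo['bar'] form:

```python
import json

# Keys containing "::" or consisting of "*" are ordinary JSON strings.
data = json.loads('{"Foo::bar": 1, "*": "text content", "plain": true}')

print(data["Foo::bar"], data["*"], data["plain"])

# The keys also survive a round trip through the encoder unchanged.
assert json.loads(json.dumps(data)) == data
```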

Duplicatebug (talkcontribs)

Yes, * is also fine in json, but you must write foo['*']. In another thread some people did not want to write the string notation and wanted the object notation. So it makes no sense to have other params in string notation and break this. Then you can keep * as well.

Anomie (talkcontribs)

The use of "*" is gratuitous, we can easily pick something more sensible. The use of "::" as keys in the API for things that use "::" as keys in MediaWiki core is not gratuitous.

Reply to "Clean up formats"