API:JSON version 2

From mediawiki.org
MediaWiki version: 1.25

format=json suffers from a number of shortcomings that make it more difficult to use than necessary. Many of these arise because XML was the original output format and the underlying data structure of API responses was designed around this.

To address this, after discussion MediaWiki 1.25 introduces a new JSON response format. It is not the default: you only get results in the new format if you specify formatversion=2, and it applies only to the json and php formats (and their human-readable jsonfm and phpfm variants).
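
As a minimal sketch, a client opts in by adding formatversion=2 to its request parameters. The endpoint URL and the helper name build_query_url below are placeholders chosen for illustration, and the snippet only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the api.php URL of your own wiki.
API_URL = "https://example.org/w/api.php"


def build_query_url(titles, formatversion=2):
    """Build an action=query URL that asks for the given JSON format version."""
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "format": "json",
        # Omitting this parameter yields the old ("version 1") format.
        "formatversion": str(formatversion),
    }
    return API_URL + "?" + urlencode(params)


url = build_query_url(["Main Page"])
print(url)
```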

Maintaining backwards compatibility

From the mediawiki-api-announce mailing list:

Responses to clients that do not specify formatversion=2 should be backwards-compatible, with some caveats:

  • Modules that previously output raw booleans in JSON may now output those properties using the convention that has always been the standard for version 1: empty-string for true and absent key for false. Client code that acts on these booleans will likely break or warn if it doesn't test for an absent key. Please report instances of this in Phabricator so the API module can be fixed, tagging the report with #MediaWiki-API and the tag for the relevant extension.
  • format=xml will now reversibly mangle tag and attribute names that are not valid XML, instead of just outputting invalid XML.
  • Previously announced breaking changes to log entry parameter formatting took effect at about the same time, although they are not actually part of this general result formatting change.
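
The boolean caveat above can be handled defensively on the client side. The following is a small sketch, not an official client recipe; the helper name v1_flag and the sample page object are made up for illustration. Note that a plain truthiness test such as `if page.get("missing")` is wrong for version 1 output, because the empty string that means "true" is falsy in Python.

```python
def v1_flag(obj, key):
    """Read a boolean property from a "version 1" API result object.

    Version 1 convention: key present (normally as "") means true,
    key absent means false. A real JSON boolean is honoured as-is,
    in case a module emits raw booleans.
    """
    if key not in obj:
        return False
    value = obj[key]
    if isinstance(value, bool):
        return value
    return True


# Illustrative page object, not taken from a real response.
page_v1 = {"title": "Example", "missing": ""}
print(v1_flag(page_v1, "missing"))
print(v1_flag(page_v1, "redirect"))
```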

API module implementers: ensuring backwards compatibility

The general theme is that the ApiResult arrays now carry more metadata. The API core code uses it to apply a backwards-compatible transformation for clients that don't request formatversion=2, and optional transformations so that JSON output needn't be limited by the restrictions of XML. At the same time, ApiResult and ApiFormatXml are easier for developers to use.

To ensure backwards compatibility – i.e. clients that don't request formatversion=2 get the same results as in previous releases – developers of API modules may need to update code.

  • Several ApiResult methods have been deprecated. If your extension is maintained in Gerrit, these should have already been taken care of for you (except for T95168 where work is ongoing), but new code will need to avoid the deprecated methods.
    • You should not use the deprecated methods getIsRawMode() and setRawMode(). Raw mode used to indicate that a result printer wanted metadata keys such as _element; now all printers need to handle "raw mode" data.
  • All ApiResult methods that operate on a passed-in array (rather than internal data) are now static, and static versions of all relevant data- and metadata-manipulation methods are available. This should reduce the need for passing ApiResult instances around just to be able to set metadata.
  • Properties that begin with an underscore are reserved for API metadata (following the lead of existing _element and _subelements), and are stripped from output. You can indicate that a property beginning with an underscore is not metadata using ApiResult::setPreserveKeysList().
  • You can tag PHP arrays with "array types" to indicate whether they should be output as arrays or hashes. This is particularly useful to fix T12887.
  • The "*" property is deprecated in favor of a properly-named property and special metadata to identify it for XML format and for back-transformation. Use ApiResult::setContentValue() instead of ApiResult::setContent() and all the details are handled for you.
  • ApiFormatXml will no longer throw an exception if you forget to call ApiResult::setIndexedTagName()!
  • ApiFormatXml will now reversibly mangle tag and attribute names that are not valid XML, instead of irreversibly mangling spaces and outputting invalid XML for other invalid names.
  • ApiResult will now validate data added (e.g. adding resources or non-finite floats will throw an exception) and auto-convert objects. The ApiSerializable interface can be used to control object conversion, if __toString() or cast-to-array is inappropriate.
  • You can now add actual booleans to ApiResult, and the API will automatically convert them in responses to "version 1" clients to the old convention for boolean result parameters (empty-string for true and absent for false) for backwards compatibility. However, this means that if you were violating this convention by returning a boolean "someKey": true or false, then existing clients will probably break! If your API module does this then you need to use the new ApiResult::META_BC_BOOLS metadata property to prevent this conversion for "version 1" clients. You should check your API module code for setting boolean values in ApiResult; also if you insert external data structures such as JSON into ApiResult, you may be returning true or false values without realizing it.
  • Modules outputting as {"key":{"*":"value"}} to avoid large strings in XML attributes can now output as {"key":"value"} while still maintaining <container><key>value</key></container> in XML format, using ApiResult::META_BC_SUBELEMENTS. New code should use ApiResult::setSubelementsList() instead.
  • Modules outputting hashes as [{"name":"key1","*":"value1"},{"name":"key2","*":"value2"}] (due to the keys being invalid for XML) can now output as {"key1":"value1","key2":"value2"} in JSON while maintaining <container><item name="key1">value1</item><item name="key2">value2</item></container> in XML format, using ApiResult::setArrayType() with the array type (META_TYPE) 'kvp' or 'BCkvp'.
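
To make the boolean back-transformation described above concrete, here is a conceptual sketch in Python of what "version 1" clients end up seeing. It is not MediaWiki's implementation: the function name to_v1_bools is invented, and the keep_bools argument is only a loose, global stand-in for the per-array ApiResult::META_BC_BOOLS metadata.

```python
def to_v1_bools(data, keep_bools=()):
    """Return a copy of a dict with booleans in the version 1 convention.

    true  -> key present with an empty string
    false -> key omitted
    Keys listed in keep_bools keep their real boolean values.
    """
    out = {}
    for key, value in data.items():
        if isinstance(value, bool) and key not in keep_bools:
            if value:
                out[key] = ""
            # A false value is dropped entirely.
        elif isinstance(value, dict):
            out[key] = to_v1_bools(value, keep_bools)
        elif isinstance(value, list):
            out[key] = [
                to_v1_bools(item, keep_bools) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            out[key] = value
    return out


# Illustrative data only.
result = {"title": "Example", "new": True, "minor": False}
print(to_v1_bools(result))
print(to_v1_bools(result, keep_bools=("minor",)))
```

The second call shows why a module that always returned "minor": false as a real boolean needs to opt out of the conversion: without the opt-out, the key silently disappears for existing clients.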

Most of the extension changes that this work necessitated are in Gerrit change set I7b372... and topic:api-cleanup-PS25.

Changes to XML format

format=xml does not have a new results format, but there are some changes to XML results:

From API/Architecture work/Planning#Changes to XML output format

Changes here will mostly be on the back-end; the actual data output to clients is intended to remain the same wherever possible. However, clients should be prepared for the following:

  • Result structure may no longer match the JSON format.
  • Tag and attribute names may be encoded when not conforming to XML requirements.
  • Result structure may change depending on the specific query. For example, passing both rvprop=content and rvdiffto=prev to prop=revisions would previously omit the diff from the result (bug 55371; it should be throwing an error, but that's another bug). Now the content is returned as the value of the <rev> node when rvdiffto is not supplied, and as the value of a <content> subnode of the <rev> node when it is.
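
A client that reads revision text from XML can tolerate both layouts described in the last point. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name rev_content and the two sample documents (including their attributes) are made up for illustration and are not real API output.

```python
import xml.etree.ElementTree as ET


def rev_content(rev):
    """Return revision text from a <rev> element, whichever layout was used."""
    content = rev.find("content")
    if content is not None:
        # Content was moved into a <content> subnode.
        return content.text or ""
    # Content is the value of the <rev> node itself.
    return rev.text or ""


# Illustrative documents only.
plain = ET.fromstring('<rev contentformat="text/x-wiki">Hello</rev>')
nested = ET.fromstring(
    '<rev><diff from="1" to="2">...</diff><content>Hello</content></rev>'
)

print(rev_content(plain))
print(rev_content(nested))
```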

For example, bug 43221 was fixed by changing the names of attributes such as "4::foo" to fit XML's restrictions. In the future, this would be fixed by either encoding the name (e.g. "_4.3A..3A.foo") or by changing the structure of output in only the XML format (e.g. <attribute name="4::foo">).
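
The name encoding mentioned above is given only as an example, so the following is one possible reversible scheme that reproduces "_4.3A..3A.foo" for "4::foo", not necessarily what ApiFormatXml does. The rules are assumptions for this sketch: an ASCII-only notion of a valid name, a leading underscore marking an encoded name, and every other character (including "." itself) written as its uppercase hexadecimal code point between dots.

```python
import re

# Assumption: a simplified, ASCII-only test for names that need no encoding.
_PLAIN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")
_ESCAPE = re.compile(r"\.([0-9A-F]+)\.")


def mangle(name):
    """Encode a name reversibly if it is not a plain name."""
    if _PLAIN.fullmatch(name):
        return name
    # Each unsafe character becomes ".<HEX code point>."
    return "_" + _UNSAFE.sub(lambda m: ".%X." % ord(m.group()), name)


def unmangle(name):
    """Invert mangle(); names without a leading underscore are unchanged."""
    if not name.startswith("_"):
        return name
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name[1:])


encoded = mangle("4::foo")
print(encoded)
print(unmangle(encoded))
```

Escaping the dot itself is what keeps the scheme reversible: after encoding, every "." in the output belongs to an escape sequence, so decoding is unambiguous.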

On the MediaWiki code side, developers will see the following changes:

  • The XML formatter will no longer die if ApiResult::setIndexedTagName() is forgotten. Instead, it will act as if that were called with something generic (e.g. ApiResult::setIndexedTagName( $array, '_v' )).
  • The XML formatter will no longer (be supposed to) raise an error when a node has both node content (ApiResult::setContent()) and non-scalar attributes. Instead, it will simply shove the intended node content into a subnode.
  • Anything that's hard-coding '*' should be updated to use ApiResult::setContentValue().
  • Additional metadata is available to hint at improved XML output.

The future of "version 1" JSON response format

In some future release, the old format=json output will be deprecated, and it may eventually be removed.

Using the new JSON results format

You can use formatversion=2 in your requests in MediaWiki 1.25, but do note that the output formatting isn't entirely stable yet and might change in MediaWiki 1.26. [needs update]

Changes to JSON output format

From the mediawiki-api-announce mailing list:

With formatversion=2, we can make some useful changes:

  • Return booleans as boolean true instead of empty-string. Where appropriate,[note 1] return boolean false instead of omitting the property.
  • Return empty objects in JSON as {}, rather than [].
  • Have action=query's "pages" be an array, instead of an object with page ids as keys that can be difficult to iterate.
  • Provide useful property names instead of '*'.
  • Eliminate useless indirection, e.g. {"text":"..."} instead of {"text":{"*":"..."}} and {"key1":"value1","key2":"value2"} instead of [{"name":"key1","*":"value1"},{"name":"key2","*":"value2"}].
  • The existing utf8 option is the default. A new ascii request parameter has been introduced for clients who need all non-ASCII codepoints escaped.
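
Client code that has to work with both format versions during a migration can smooth over two of these differences with small helpers. This is a sketch only: the helper names iter_pages and collapse_pairs and the sample responses are made up, and real responses contain more fields.

```python
def iter_pages(response):
    """Yield page objects from an action=query response in either format."""
    pages = response.get("query", {}).get("pages", [])
    if isinstance(pages, dict):
        # Version 1: an object keyed by page id.
        yield from pages.values()
    else:
        # Version 2: a plain array.
        yield from pages


def collapse_pairs(items, name_key="name", value_key="*"):
    """Turn [{"name": k, "*": v}, ...] into {k: v, ...}."""
    return {item[name_key]: item[value_key] for item in items}


# Illustrative responses only.
v1 = {"query": {"pages": {"1": {"pageid": 1, "title": "Example"}}}}
v2 = {"query": {"pages": [{"pageid": 1, "title": "Example"}]}}

print([page["title"] for page in iter_pages(v1)])
print([page["title"] for page in iter_pages(v2)])
print(collapse_pairs([{"name": "key1", "*": "value1"},
                      {"name": "key2", "*": "value2"}]))
```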

If you see missed opportunities to make the above changes in existing formatversion=2 output, or if there are other changes that would make API output easier to use in JSON, please let MediaWiki API developers know! Phabricator would be ideal (tag with #MediaWiki-Action-API, and the appropriate extension's tag if applicable), or reply on the mediawiki-api mailing list.

  1. Where the property is usually false, it's sometimes just bloat to include it.