Requests for comment/API roadmap

Background
MediaWiki API has been steadily growing and adding features, and even though it provides most of the desired functionality, I feel it is necessary to discuss our future plans for growth, versioning, and overall development strategy.

Proposals
api2.php ? agent=MyProgramVer42 & action=query~2 & ...

api2.php
Setting a new versioned entry point allows us
 * change overall output structure
 * reduce tolerance for incorrect requests
 * require the new 'agent' parameter in case the HTTP's useragent string is missing or begins with 'mozilla' or 'opera' to help us contact the broken client's author. This parameter must be part of the URL's query string even for POST requests.
 * allow new structure for warnings, e.g.

PHP implementation [expand]

action~2 versioning
In addition to the versioned entry point, each action module could add its own versioning, which would allow:
 * remove previously added feature / parameter / behavior
 * change parameter naming
 * change parameter defaults
 * change default output format

PHP implementation [expand]

query~2 submodule versioning
Modular nature of the action=query allows us to version the individual props, lists, and meta submodules. The proposed rules are:
 * query~2 supports all nonversioned submodules that do not override default output (like watchlistraw)
 * query~2 will not allow any extension (non-core) submodules with less than 3-letter prefixes or beginning with letter 'g' (reserved for generator use).
 * a single query may not combine multiple versions of the same submodule: list=allpages|allpages~2 is invalid
 * each submodule may declare minimum query version required: list~2 may only work under query~2 or higher, but not under query.
 * the output of each submodule~n is placed under the 'query' tag without the version number: { 'query': {'submodule':...} }

Easy continue
Easy continue allows significant client simplification. Easy continue is an API guarantee to the client that by simply adding all items in the 'continue' section to the next query the client will receive all available data, without accidentally skipping some values due to a 'limit' parameters or the generator paging. This change can be made available in all query versions, and made default in action=query~2. Client logic (in python) [expand] A client library could have this code (uses python requests lib):

Query incomplete pages
Notify client if not all properties have finished populating the 'page' element, and the client should merge it with the result of the subsequent api call. E.g. action=query~2&titles=Page1|Page2&prop=links could get this result, in which 'Page2' does not have all containing links, and should be merged with the result of the next call. This change can be made for all query versions. Client logic (in python) [expand]

API Cleanup

 * request rewriting (aliasing)
 * facility for renaming modules and parameters, or any other parameter manipulations.


 * all query extensions should use 3+ letter prefixes to avoid conflicts with the core
 * rename prop=templates into prop=transclusions - more accurate name. Prefix can stay the same.
 * list=watchlistraw should place its results under 'query' element just like all other modules.
 * list=watchlistraw</tt> and possibly list=watchlist</tt> should be renamed to clarify their meaning. watchlistraw is a list of monitored pages. watchlist is the list of changes done to the monitored pages. Looking for better naming.
 * query~2 would always use the easy continue unless the client sets 'legacycontinue' parameter
 * deprecate converttitles - any language-specific normalization should be added to 'normalized' element
 * format=json</tt> will output pages as a list, not as a dictionary: 'pages': [ {}, {}, {} ]</tt> instead of 'pages': { '1':{}, '2':{}, '3':{} }</tt>
 * Remove indexpageids=''</tt> - won't be needed after the JSON formatter change
 * Any naming changes for the Page types table to make them more consistent?
 * Any query module that has prop</tt> parameter will always require it, when used in production (not format=xmlfm/jsonfm/...)

Continuation

 * Fix all modules to use continue</tt> instead of overwriting one of the original parameters
 * Both start</tt> and continue</tt> are used by AllImages</tt>, CategoryMembers</tt>, Deletedrevs</tt>, ImageInfo</tt>, UserContributions</tt>
 * from</tt> is used by <tt>AllMessages</tt>, <tt>AllUsers</tt>
 * <tt>offset</tt>	is used by <tt>ExternalLinks</tt>, <tt>ExtLinksUsage</tt>, <tt>QueryPage</tt>, <tt>Search</tt>
 * <tt>prop</tt> is used by <tt>Siteinfo</tt>
 * <tt>start</tt> is used by <tt>Blocks</tt>, <tt>LogEvents</tt>, <tt>ProtectedTitles</tt>, <tt>RecentChanges</tt>, <tt>Watchlist</tt>
 * <tt>users</tt> is used by <tt>Users</tt>

Changes under discussion

 * per module <tt>flags</tt> parameter should replace all the boolean flags of that module:
 * query - <tt>redirects=&export=&indexpageids=''</tt> should be replaced with <tt>flags=redirects|export|indexpageids</tt>
 * imageinfo - <tt>iilocalonly</tt> should be replaced with <tt>iiflags=localonly|...</tt>, etc

Justification

 * Allows clients to avoid updating every time API changes
 * Reduces the cost of making a breaking change
 * Organize feature changes - if the client asks for ver X, API guarantees the capabilities of X and result in format X.
 * Recommended API usage is shown as the latest version. If API default behavior (v1) changes to be optional (v2), new developers will work with the new default (recommended) way from the start.
 * No clutter with ever expanding list of additional parameters - with versioning, new parameters could replace old ones, or change their meaning, or be removed completely without breaking any clients.
 * Ability to obsolete capabilities in a structured way: MW supports API requests with version X+, but will give standard warning for anything below the latest version Y. No need to parse warning messages to see if specific feature change applies.

Requirements
API versioning must solve these real life scenarios:
 * Client must identify itself to the host in order for us to notify developer of incorrect/suboptimal usage.
 * Client relies on the specific output format, and needs to always get the same.
 * Client wants to use feature X. How does it check if it is available.
 * Client has to be notified that feature Y is obsolete (Unsure of this)
 * Updates to the core must not change API output and behavior, except the obsolete notification
 * An extension may add functionality to the API, and might be updated independently from the core.

Per-module versioning
api.php ? action=query~2 & ... Each API module will have its own versioning number. This approach provides independence between different components, and solves the version conflict problem described in the other section.

Global Version Number
api2.php ? action=query & ...

One global version is good to deal with overall compatibility issues, such as changing the result data format: structured warnings, json only format, etc.

Despite this, even though having one main version parameter approach seems more reasonable at a first glance, it seems to be a dead end with respect to the modular expandable nature of our API. Any extension can add an API module, but no one can ever guarantee that an extension will be upgraded at the same time as the core.

Let's say there is a server with the core ver 1102 and an extension ver 1206. Later MW 1309 is released, with the new API behavior. Independently an extension 1309 is also released to support the new features.

On the server, admins update the core right away, but decide to delay extension deployment because they might want to do additional testing, or for any other reason. So now the core 1309 is working with extension 1206. This is totally fine so far - the core frequently runs with the older extensions.

After MW update, but before extension is put in production (which could be a long time), API client is created against the API version 1309. The developers use one global version number to access API - 1309 - to test and develop the new program, and they assume that the extension results will stay the same - they might not know that the new extension is pending. But later, at a whim of an admin, the extension is finally added to production, at which point the client fails - what was suppose to be a firm contract - "you ask for ver X you get ver X despite upgrades" has been broken.

To avoid any kind of surprises like this, there should never be any "default" fallback behavior in the API. The developer may query which versions are available, and choose to work with the one available, but will do so explicitly. See Defaults section below.