Requests for comment/API roadmap

From MediaWiki.org

Completed

All implemented features from this RFC have been moved to the archive page.

  • Simplified query continuation

Background

The MediaWiki API has been growing steadily and adding features. Although it provides most of the desired functionality, I (Yurik) feel we need to discuss our future plans for growth, versioning, and overall development strategy.

Justification

  • Allows clients to avoid updating every time the API changes
  • Reduces the cost of making a breaking change
  • Organizes feature changes: if the client asks for version X, the API guarantees the capabilities of X and returns results in format X.
  • The latest version shows the recommended API usage. If a default behavior in v1 becomes optional in v2, new developers will work with the new (recommended) default from the start.
  • No clutter from an ever-expanding list of additional parameters: with versioning, new parameters could replace old ones, change their meaning, or be removed completely without breaking any clients.
  • Makes it possible to obsolete capabilities in a structured way: MediaWiki supports API requests with version X and above, but gives a standard warning for anything below the latest version Y. Clients do not need to parse warning messages to see whether a specific feature change applies.
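The structured obsolescence idea can be sketched as a small server-side check. This is a minimal sketch under stated assumptions. The bounds X = 1 and Y = 3 are hypothetical. The codes "unsupportedversion" and "deprecatedversion" are made-up placeholders, not existing API error codes. The point is that the warning is a fixed code, so clients compare codes instead of parsing text:

```javascript
// Hypothetical bounds: the oldest version still served (X) and the latest (Y).
const MIN_SUPPORTED = 1;
const LATEST = 3;

// Classify a requested version. The error and warning codes are hypothetical
// placeholders; they are fixed strings, so clients never parse message text.
function checkVersion(requested) {
  if (!Number.isInteger(requested) || requested < MIN_SUPPORTED || requested > LATEST) {
    return { ok: false, error: "unsupportedversion" };
  }
  if (requested < LATEST) {
    // Still served, but older than the latest version Y: standard warning.
    return { ok: true, warning: "deprecatedversion", latest: LATEST };
  }
  return { ok: true };
}
```

A client that asked for version 2 would keep working. It would also see the same machine-readable warning, whichever feature changed between 2 and 3.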

Requirements

API versioning must solve these real life scenarios:

  • Client must identify itself to the host so that we can notify the developer of incorrect or suboptimal usage.
  • Client relies on a specific output format and needs to always get the same one.
  • Client wants to use feature X. How does it check whether it is available?
  • Client has to be notified that feature Y is obsolete (unsure of this).
  • Updates to the core must not change API output or behavior, except for the obsolescence notification.
  • An extension may add functionality to the API, and might be updated independently of the core.
  • Minimalism: all API capabilities should return only the data requested, to minimize bandwidth and improve speed.

General API Proposals

 // The action is the first path segment after the slash, allowing Varnish to route requests based on the action
 var request = new XMLHttpRequest();
 request.open("GET", "api.php/query~2?...");
 // Send the agent in a header rather than the URL to avoid cache fragmentation.
 // JavaScript clients use X-User-Agent because of browser restrictions on setting User-Agent.
 request.setRequestHeader("X-User-Agent", "MyProgram/4.2 UsingLib/2.8");
 request.send();

New entry point

At first, a new entry point (api2.php) seemed the best way to proceed, but Gabriel proposed using PHP's built-in $_SERVER["PATH_INFO"] value, which would allow us to convert api.php?action=query&... into api.php/query?.... This eliminates the need for api2.php.
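Here is a sketch of how the action and an optional version could be split out of the path. It assumes PATH_INFO holds something like "/query~2" for a request to api.php/query~2?.... It also assumes that an action without a "~n" suffix means version 1; the RFC does not say this. The real server code would be PHP, so this JavaScript is only illustrative:

```javascript
// Split a PATH_INFO value such as "/query~2" into { action, version }.
// Assumption (not from the RFC): a missing "~n" suffix means version 1.
function parsePathInfo(pathInfo) {
  const match = /^\/([a-z][a-z0-9-]*)(?:~([1-9][0-9]*))?\/?$/.exec(pathInfo);
  if (match === null) {
    return null; // not a recognizable "/action" or "/action~n" path
  }
  return {
    action: match[1],
    version: match[2] === undefined ? 1 : Number(match[2]),
  };
}
```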

A new entry point would allow us to:

  • change the overall output structure
  • reduce tolerance for incorrect requests
  • require the new 'agent' parameter if the HTTP User-Agent string is missing or begins with 'mozilla' or 'opera', to help us contact the author of a broken client. This parameter must be part of the URL's query string even for POST requests.
  • allow a new structure for warnings (see Errors and Warnings Localization below)
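The 'agent' rule could be checked roughly as follows. This sketch assumes the server can see the raw User-Agent header and the URL's query string. The parameter is read only from the query string, never from a POST body, as the proposal requires. The error code "missingagent" is a hypothetical placeholder:

```javascript
// Decide whether a request must carry ?agent=..., and whether it does.
// userAgent: raw HTTP User-Agent header (may be undefined or empty).
// queryString: the URL's query string only; POST bodies are deliberately ignored.
function checkAgent(userAgent, queryString) {
  const ua = (userAgent || "").trim().toLowerCase();
  const needsAgent = ua === "" || ua.startsWith("mozilla") || ua.startsWith("opera");
  if (!needsAgent) {
    return { ok: true, agent: userAgent }; // a non-browser client already identified itself
  }
  const agent = new URLSearchParams(queryString).get("agent");
  if (agent === null || agent.trim() === "") {
    return { ok: false, error: "missingagent" }; // hypothetical error code
  }
  return { ok: true, agent: agent };
}
```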

action~2 versioning

In addition to the versioned entry point, each action module could add its own versioning, which would allow it to:

  • remove a previously added feature, parameter, or behavior
  • rename parameters
  • change parameter defaults
  • change the default output format
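One way to implement per-module versioning is to rewrite each incoming request into a single internal form before the module runs. This would combine the request-rewriting (aliasing) facility listed under Cleanup with per-version defaults. The rule table below is entirely hypothetical. It describes a module "example" whose version 2 renamed oldname to newname and raised the default limit from 10 to 50:

```javascript
// Hypothetical rules for an action module "example". Internally the module only
// understands the canonical (latest) parameter names.
const RULES = {
  1: { aliases: { oldname: "newname" }, defaults: { limit: "10" } },
  2: { aliases: {}, defaults: { limit: "50" } },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Translate client parameters for `version` into canonical internal parameters.
function rewriteRequest(version, params) {
  const rules = RULES[version];
  if (rules === undefined) {
    throw new Error("unsupported version: " + version);
  }
  const out = {};
  for (const [name, value] of Object.entries(params)) {
    // v1 names are mapped to v2 names; in v2, "oldname" passes through unchanged
    // and would then be reported as an unrecognized parameter.
    const canonical = has(rules.aliases, name) ? rules.aliases[name] : name;
    out[canonical] = value;
  }
  for (const [name, value] of Object.entries(rules.defaults)) {
    if (!has(out, name)) {
      out[name] = value; // version-specific default
    }
  }
  return out;
}
```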

Cleanup

  • Request rewriting (aliasing) facility for renaming modules and parameters, or any other parameter manipulations.
  • Individual module name and parameter changes are here.
  • action=watch should perhaps not return UI messages that vary with the user language.
  • JSON formatter: replace {'*': 'text'} with {'_': 'text'}
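The JSON formatter change amounts to renaming one key throughout the result tree. This sketch assumes the result is plain parsed JSON made of objects, arrays, and scalars. It does not handle the unlikely case of an object that already has both '*' and '_' keys:

```javascript
// Recursively rename every "*" key (used for element text content) to "_".
function renameStarKeys(value) {
  if (Array.isArray(value)) {
    return value.map(renameStarKeys);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key === "*" ? "_" : key] = renameStarKeys(child);
    }
    return out;
  }
  return value; // strings, numbers, booleans, null are left as-is
}
```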

Modules refactoring

The following modules might duplicate functionality, appear closely related enough to be merged into one, or have features that should be moved out into a different or new module.

  • action=sitematrix extension
  • meta=siteinfo should be broken up into many separate meta modules, requested as meta=A|B|C. Its parts don't seem to share much in common, and this approach would be cleaner to use, as well as more modular and extensible. Example: meta=namespaces|usergroups. Whether a module is a meta or a list could be decided by whether ordinary users can influence it. For example, a new user can be added to the wiki, so users form a list, whereas user groups are set up by administrators, hence usergroups is a meta.

Tokens

This section needs improvement. It will describe the API token infrastructure, both client usage and internal practices.

  • remove base::getToken(), possibly replacing it with getTokenSalt()
  • main::setupModule: can $gettoken be false?

generator support for other actions

Partially complete

It seems that in some cases, actions like watch, delete, undelete, purge, rollback, patrol, etc. need some of the action=query functionality to create a list of pages to work on. I think it would make sense to provide ApiPageSet functionality (titles, pageids, revids, redirects, normalization, and most importantly all generators) to all other relevant actions besides query. This will reduce the load on the master (one DB commit instead of many), reduce the number of module-specific parameters because modules will reuse the generator and page-set features, and allow much greater flexibility in generating a list of pages, since most modules currently do not support any generator options. This feature has already been merged.

 // Add all pages in the category 'MediaWikiAPI' to my watchlist
 // (generator parameters take a 'g' prefix, and cmtitle needs the Category: namespace)
 api.php/watch ? generator=categorymembers & gcmtitle=Category:MediaWikiAPI
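In the current API, a module used as a generator takes its own parameters with an extra 'g' prefix, so cmtitle becomes gcmtitle. A small client helper that builds such a query string might look like the sketch below. The path form api.php/watch is this RFC's proposal, not today's entry point. The helper name is hypothetical:

```javascript
// Build a query string that runs `generator` with its own parameters,
// adding the "g" prefix that generator parameters carry. Hypothetical helper.
function generatorQuery(generator, generatorParams, extraParams = {}) {
  const search = new URLSearchParams({ generator: generator });
  for (const [name, value] of Object.entries(generatorParams)) {
    search.append("g" + name, value);
  }
  for (const [name, value] of Object.entries(extraParams)) {
    search.append(name, value);
  }
  return search.toString(); // form-urlencoded, so ":" becomes "%3A"
}

// Usage: "api.php/watch?" + generatorQuery("categorymembers", { cmtitle: "Category:MediaWikiAPI" })
```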

Help screen cleanup

The main help screen is very long and hard to read, and is in dire need of cleanup.

  • action=help with no parameters should output just the list of modules with their descriptions
  • Clicking a module name should bring up that module's full page: action=help & modules=name
  • The main page should have a link to show the unified screen as it is now (useful for some text searches)

Errors and Warnings Localization

MediaWiki uses the very good translatewiki.net tool for all its translation needs, and I think we should use it too, instead of having each module provide its own list of warning and error messages. We could introduce a global lang=code parameter specifying the language the user needs messages in. In case of an error or a warning, the API will translate the message into the requested language, or into the wiki's default language if lang is empty. Optionally, we might use the HTTP Accept-Language header instead.

'warnings': {
    'unknownparams': "Parameters 'param1' and 'param2' were not recognized",
    'deprecated': "Parameters 'api', 'query', and 'param' are deprecated",
}

If the lang parameter is not provided (or is set to a magic keyword such as 'none'; TBD), the warnings are returned as arrays of the affected parameters:

'warnings': {
    'unknownparams': ['param1', 'param2'],
    'deprecated': ['api','query','param'],
}

According to Manual:Messages API, message parsing can happen at both the PHP and the JavaScript level. We need an expert opinion on whether this can be structured so that final string generation (the more expensive part) is done in the browser when available, rather than on the server.
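If warnings arrive in the array form, a client can build the final strings itself. This sketch assumes hypothetical English templates keyed by warning code, with $1 standing for the formatted parameter list. MediaWiki messages do use $1-style placeholders, but these particular templates are invented to reproduce the example output:

```javascript
// Hypothetical English templates; $1 is replaced by the quoted parameter list.
const TEMPLATES = {
  unknownparams: "Parameters $1 were not recognized",
  deprecated: "Parameters $1 are deprecated",
};

// ['a'] -> "'a'"; ['a','b'] -> "'a' and 'b'"; ['a','b','c'] -> "'a', 'b', and 'c'"
function formatList(items) {
  const quoted = items.map((item) => "'" + item + "'");
  if (quoted.length <= 1) {
    return quoted.join("");
  }
  if (quoted.length === 2) {
    return quoted[0] + " and " + quoted[1];
  }
  return quoted.slice(0, -1).join(", ") + ", and " + quoted[quoted.length - 1];
}

// Turn { code: [params...] } into { code: "localized text" }.
function renderWarnings(warnings) {
  const out = {};
  for (const [code, params] of Object.entries(warnings)) {
    const template = TEMPLATES[code] || code + ": $1"; // fallback for unknown codes
    // A replacer function avoids "$&"-style patterns in parameters being expanded.
    out[code] = template.replace("$1", () => formatList(params));
  }
  return out;
}
```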

Query Proposals

query~2 submodule versioning

The modular nature of action=query allows us to version the individual prop, list, and meta submodules. The proposed rules are:

  • query~2 supports all unversioned submodules that do not override the default output (like watchlistraw)
  • query~2 will not allow any extension (non-core) submodules with prefixes shorter than 3 letters or beginning with the letter 'g' (reserved for generator use).
  • There needs to be a LocalSettings-configurable extension prefix override in case two extensions have the same prefix.
    • Or conflicting extensions could just be fixed.
  • a single query may not combine multiple versions of the same submodule: list=allpages|allpages~2 is invalid
  • each submodule may declare the minimum query version it requires: list~2 may only work under query~2 or higher, not under query.
  • the output of each submodule~n is placed under the root without the version number: {'submodule':..., 'pages':..., 'normalized':..., 'continue':...}
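Two of these rules can be checked before any submodule runs: no mixing of versions of one submodule, and a minimum query version per submodule. The sketch below assumes a hypothetical registry of minimum query versions. The entry allpages~2 does not exist today:

```javascript
// Hypothetical registry: minimum query version required by each submodule version.
const MIN_QUERY_VERSION = new Map([
  ["allpages", 1],
  ["allpages~2", 2],
  ["backlinks", 1],
]);

// Validate a value such as "allpages|backlinks" for a query of version queryVersion.
// Returns a list of error strings (empty if the combination is valid).
function validateSubmodules(value, queryVersion) {
  const errors = [];
  const seenBase = new Set();
  for (const name of value.split("|")) {
    const base = name.split("~")[0]; // "allpages~2" -> "allpages"
    if (seenBase.has(base)) {
      errors.push("submodule " + base + " given more than once");
    }
    seenBase.add(base);
    const min = MIN_QUERY_VERSION.get(name);
    if (min === undefined) {
      errors.push("unknown submodule " + name);
    } else if (queryVersion < min) {
      errors.push(name + " requires query~" + min);
    }
  }
  return errors;
}
```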

Query incomplete pages

Notify the client when not all properties have finished populating a 'page' element, so that the client merges it with the result of the subsequent API call. For example, action=query~2&titles=Page1|Page2&prop=links could return the result below, in which 'Page2' does not yet have all of its links and should be merged with the result of the next call. This change can be made for all query versions.

'pages': [
    {
        'id': 42,
        'title': 'Page1',
        'links': [...]
    },
    {
        'id': 84,
        'title': 'Page2',
        'links': [...],
        'incomplete': '',
    },
]
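On the client side, handling the 'incomplete' marker means merging page entries by id across successive responses. This sketch assumes each response carries 'pages' as a list of objects with an 'id', as in the example above. It also assumes that array-valued properties such as 'links' should be concatenated:

```javascript
// Merge the 'pages' lists of successive responses into one list keyed by page id.
// Array properties (e.g. 'links') are concatenated; other properties are overwritten.
function mergePages(responses) {
  const byId = new Map(); // a Map keeps first-seen order of page ids
  for (const response of responses) {
    for (const page of response.pages) {
      const merged = byId.get(page.id) || {};
      for (const [key, value] of Object.entries(page)) {
        if (key === "incomplete") {
          continue; // the marker only says "more data follows"
        }
        if (Array.isArray(value) && Array.isArray(merged[key])) {
          merged[key] = merged[key].concat(value);
        } else {
          merged[key] = value;
        }
      }
      byId.set(page.id, merged);
    }
  }
  return Array.from(byId.values());
}
```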

Cleanup

  • Individual module cleanup is here.
  • Move all items from 'query' to root in the result. The 'pages', 'normalized', 'continue', and all list/meta elements will be under the root element.
  • All query extensions should use 3+ letter prefixes to avoid conflicts with the core
  • query~2 would always use the simplified continue unless the client sets the 'legacycontinue' parameter
  • Remove indexpageids=; it won't be needed after the JSON formatter change
  • Any query module that has a prop parameter will always require it when used in production (not format=xmlfm/jsonfm/...)
  • Add the MediaWiki version, with possible Git URL(s), to meta=siteinfo (same as the Special:Version page)
  • When using aplimit=max, the limits section should use parameter name, not module name: {'limits': {'aplimit': 500}} instead of {'allpages': 500}
  • Allow query to page through its input if too many titles/pageids/revids are given. This way the client does not need to worry about passing too many titles (currently the API only sets a textual warning): the query will simply treat them like a generator, processing the first N pages and indicating that the next query should skip the first N values.
  • format=json will output pages as a list, not as a dictionary: 'pages': [ {}, {}, {} ] instead of 'pages': { '1':{}, '2':{}, '3':{} }
  • Replace 'image' with 'file' in module names, parameters, and output. See this list. Modules with 'image' in their output: action=upload|parse, meta=siteinfo. Modules with 'image' in parameters: action=delete and maybe more. Error message cleanup is harder, because the API messages would then differ from those in core and the GUI.
  • Attributes should always be lower case; e.g. action=block currently returns userID. See this list.
  • Consistent property names: some modules use reason, others comment or description. Some modules use pagesize, others pagelen. See this list.
  • Remove deprecated parameters such as watch/unwatch for action=edit, and aliases such as dimensions/size. See this list.
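In the current dictionary form, JavaScript clients cannot rely on key order. Integer-like keys such as '1' and '2' are always enumerated in ascending numeric order, whatever order the server sent. That is why a separate page-id list like indexpageids is needed today. The sketch below shows the conversion a client performs now. It assumes indexpageids, when present, gives the intended order. With a list, the array carries the order itself:

```javascript
// Convert { '1': {...}, '2': {...} } into a list, using the order given by
// indexpageids when present, else JavaScript's own key order (numeric for ids).
function pagesToList(pages, indexpageids) {
  const ids = indexpageids || Object.keys(pages);
  return ids.map((id) => pages[id]);
}
```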

Multi-writing support (Seeking comments)

  • Many wikis use multiple writing systems and can auto-convert from one to another. The current boolean flag converttitles uses the following logic, and might need to be changed to allow for variant requests (normalize the title to variant X) or possibly other methods:

 if ( $numberOfVariant > 1 && !$titleObj->exists() ) {
     // Only try other script variants when the title as given does not exist
     $wgContLang->findVariantLink( $title, $titleObj );
 }

Continuation

  • Fix all modules to use continue instead of overwriting one of the original parameters:
    • Both start and continue are used by AllImages, CategoryMembers, Deletedrevs, ImageInfo, UserContributions
    • from is used by AllMessages, AllUsers
    • offset is used by ExternalLinks, ExtLinksUsage, QueryPage, Search
    • prop is used by Siteinfo
    • start is used by Blocks, LogEvents, ProtectedTitles, RecentChanges, Watchlist
    • users is used by Users
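Once every module continues through a dedicated value, a client needs only one generic loop. With the simplified continuation already in the API, the client copies every key in the response's 'continue' object into the next request unchanged. It starts with an empty continue= to opt in. For testability, this sketch uses a synchronous `send` function that you supply. A real client would make asynchronous HTTP calls instead:

```javascript
// Run a query to completion with simplified continuation. `send` is any function
// that takes a parameter object and returns a parsed response (an assumption made
// here so the loop can be shown without networking code).
function queryAll(send, params) {
  const results = [];
  let request = Object.assign({}, params, { continue: "" }); // opt in to simplified continue
  for (;;) {
    const response = send(request);
    results.push(response);
    if (response.continue === undefined) {
      return results; // no more data
    }
    // Copy all continuation values verbatim; the client never interprets them.
    request = Object.assign({}, params, response.continue);
  }
}
```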

Query item count

We get many requests to implement count(*) functionality for various modules, and even though there is plenty of justification for it, a fundamental database limitation has always stopped us: counting all items is an O(N) table traversal. As a result, clients can only iterate over all the data and count it locally, which wastes both server resources and bandwidth.

Now, correct me if I am wrong, but it seems that the client frequently just needs to know whether the count is above a certain threshold, e.g. has a user made more than 10 edits in the last year, or does this page have more than one page linking to it? We could easily implement this with a count parameter:

? action=query~2 & ... & count=backlinks|links & bllimit=100 & pllimit=100

If the module name is listed in the count parameter, the resulting element is replaced with the count.

{
  'pages': [
    { 'title': 'PageTitle',
      'links': '42',  // <-- instead of a list, total links count is 42
    } ],
  'backlinks': '100+' // <-- instead of a list, the bllimit was set to 100, but more than 100 exists
}

I believe API users would be happy with this compromise. It would save a lot of bandwidth, and clients that really do need the exact count could still iterate over all items. The implementation is fairly straightforward: ApiQuery.php would replace the output of any list or prop whose module name appears in count with its count, and would also allow the module to optimize its internal SQL.
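The threshold form stays cheap because a module never needs to read more than limit + 1 rows; in SQL that is a LIMIT of limit + 1. This sketch shows the counting logic over any iterable of rows. It assumes counts are returned as strings, as in the example:

```javascript
// Count at most limit + 1 items; report "N" if the total fits, "limit+" otherwise.
function cappedCount(rows, limit) {
  let n = 0;
  for (const _row of rows) {
    n += 1;
    if (n > limit) {
      return limit + "+"; // stop early: more than `limit` rows exist
    }
  }
  return String(n);
}
```

For example, cappedCount(rows, 100) reads at most 101 rows, however many exist.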

Changes under discussion

  • A per-module flags parameter should replace all the boolean flags of that module:
    • query: redirects=&export=&indexpageids= should be replaced with flags=redirects|export|indexpageids
    • imageinfo: iilocalonly should be replaced with iiflags=localonly|..., etc.
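The switch could be done by request rewriting, so that old-style requests keep working during a transition. In this hypothetical sketch, old boolean parameters are folded into one flags value with the module's prefix. It follows MediaWiki's existing convention that a boolean parameter is true whenever it is present:

```javascript
// Fold old-style boolean parameters (present = true) into a single prefixed flags
// parameter, e.g. iilocalonly= becomes iiflags=localonly. Hypothetical helper.
function foldFlags(params, prefix, flagNames) {
  const out = {};
  const flags = [];
  for (const [name, value] of Object.entries(params)) {
    const flag = name.startsWith(prefix) ? name.slice(prefix.length) : null;
    if (flag !== null && flagNames.includes(flag)) {
      flags.push(flag); // presence alone means "true", whatever the value
    } else {
      out[name] = value;
    }
  }
  if (flags.length > 0) {
    out[prefix + "flags"] = flags.join("|");
  }
  return out;
}
```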

Client Libraries

MediaWiki should maintain simple default libraries in several popular languages for basic API usage. The libraries should implement features that are either the absolute minimum or unlikely to be implemented by library writers, but important to the servers.

  • Request throttling
  • Token management / login
  • Agent strings
  • Error & warning handling
  • Rudimentary request/response: {'action':'query', 'list':'allpages'} → {'allpages':[...]}
  • Query continue and changed revisions detection
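Request throttling is the item on this list that matters most to the servers. Here is a minimal sketch of a client-side throttle that spaces requests at least a fixed interval apart. The interval is the caller's choice; no particular value comes from the RFC. It computes the wait from a clock value you pass in, so the logic can be checked without real timers:

```javascript
// Enforce a minimum interval between requests. `now` is a millisecond timestamp.
class Throttle {
  constructor(minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
    this.nextAllowed = -Infinity; // the first request may go immediately
  }

  // Reserve the next slot; returns how many ms the caller should wait before sending.
  reserve(now) {
    const start = Math.max(now, this.nextAllowed);
    this.nextAllowed = start + this.minIntervalMs;
    return start - now;
  }
}

// Usage with real timers (sendRequest is hypothetical):
// const wait = throttle.reserve(Date.now());
// setTimeout(() => sendRequest(params), wait);
```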

Less is more: the less functionality we define as "must have", the less we will have to maintain. There should be little action-specific functionality, possibly except for query and edit, and even there only the bare minimum.

Initial language support:
  • Python (ver 3)
  • JavaScript
  • .NET
  • Java

Content API

This section is for proposals related to a purely content-requesting API. Unlike the other parts of the API, it has heavy caching requirements, so I think it would be highly beneficial to diverge from the overall API model here in order to take full advantage of Squid or other types of caching.

  • REST-style
  • JSON-only output
  • Minimum number of parameters
  • URL rewriting
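To get the most out of caching, each piece of content should map to exactly one URL; otherwise equivalent requests fragment the cache. Below is a sketch of a canonical URL builder. The path layout "/content/<title>/<part>" is purely hypothetical. The title normalization (spaces to underscores, runs collapsed, first letter capitalized) follows MediaWiki's usual title form. First-letter capitalization is a per-wiki setting, so it is an assumption here:

```javascript
// Build one canonical URL per piece of content so caches see a single key.
// The "/content/<title>/<part>" layout is hypothetical.
function contentUrl(title, part) {
  let key = title.trim().replace(/[ _]+/g, "_");     // spaces -> one underscore
  key = key.charAt(0).toUpperCase() + key.slice(1);  // assumes first-letter capitalization
  const segments = ["content", encodeURIComponent(key)];
  if (part !== undefined) {
    segments.push(encodeURIComponent(part));
  }
  return "/" + segments.join("/");
}
```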

Use Cases (PLEASE EXPAND)

Please help us plan it by adding your usage scenario.

  • (mobile) Get HTML of one page/header/section/TOC/paragraph (possibility of formatting layout consistently)
  • (mobile) Get Tables/Pictures of one page separately
  • (mobile) Get Pictures with possibility to scale item
  • Embed page header/section into another site (iframe or extension)

Other use cases / wish list

  • (Parsoid) Expand a batch of templates and extension hooks with complete isolation between each member of the batch. The actions would be very similar to a combination of action=expandtemplates and action=parse, but without the parser involvement. For this, we need 1) ideally generic batching support, 2) a dedicated end point for template expansion and 3) an end point to call a tag extension hook directly. Batching is needed to amortize the per-request overheads (HTTP connection setup, PHP startup costs). Parsoid expands all templates in a page in parallel, which is not very efficient when done with one API request per template.
  • An output format that returns results as wikitext would be useful. Sometimes I want to build a maintenance list to be ported into Wikipedia:AutoWikiBrowser, or simply to upload and work with. Currently it is next to impossible for those who are not hackers, but can at least work out API queries, to get a list with active links. 12:52, 23 February 2013 (UTC) (Billinghurst)
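Until such a format exists, the wikitext is simple to produce from any API title list. The sketch below turns titles into a bulleted list of internal links. Category and file titles get a leading colon, so the line links to the page instead of categorizing the list or embedding the file. Only the English namespace names are checked; localized names are not handled:

```javascript
// Turn page titles (e.g. from list=categorymembers) into a wikitext bullet list of
// links. Only English "Category:", "File:" and "Image:" prefixes are recognized.
function toWikitextList(titles) {
  return titles
    .map((title) => {
      const needsColon = /^(category|file|image):/i.test(title);
      return "* [[" + (needsColon ? ":" : "") + title + "]]";
    })
    .join("\n");
}
```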

See also