API:Implementation Strategy

From mediawiki.org

This page explains the implementation of the MediaWiki API machinery in core. If you want to provide an API in your code for clients to consume, read API:Extensions.

File/Module Structure

  • api.php is the entry point, located in the wiki root. See API:Main page#The endpoint.
  • includes/api will contain all files related to the API, but none of them will be allowed as entry points.
  • All API classes are derived from a common abstract class ApiBase. The base class provides common functionality such as parameter parsing, profiling, and error handling.
  • ApiMain is the main class instantiated by api.php. It determines which module to execute based on the action=XXX parameter. ApiMain also creates an instance of the ApiResult class, which contains the output data array and related helper functions. Lastly, ApiMain instantiates the formatting class that outputs the data from ApiResult to the client in XML, JSON, PHP, or another format.
  • Any module derived from ApiBase receives a reference to the ApiMain instance during instantiation, so that during execution the module can get shared resources such as the result object.
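
The dispatch flow described above can be sketched roughly as follows. This is a minimal illustration, not the core implementation: SketchResult, SketchBase, SketchHello, and SketchMain are hypothetical stand-ins for ApiResult, ApiBase, a module, and ApiMain, and JSON is hard-coded in place of a separate formatting class.

```php
<?php
// Hypothetical stand-ins for ApiResult, ApiBase and ApiMain.
// These are NOT the real MediaWiki classes.

class SketchResult {
	/** @var array Output data collected from the executed module */
	private $data = [];

	public function addValue( string $name, $value ): void {
		$this->data[$name] = $value;
	}

	public function getData(): array {
		return $this->data;
	}
}

abstract class SketchBase {
	protected $main;
	protected $moduleName;

	// Every module receives a reference to the main object.
	public function __construct( SketchMain $main, string $moduleName ) {
		$this->main = $main;
		$this->moduleName = $moduleName;
	}

	// Shared resources are fetched through the main object.
	protected function getResult(): SketchResult {
		return $this->main->getResult();
	}

	abstract public function execute(): void;
}

class SketchHello extends SketchBase {
	public function execute(): void {
		$this->getResult()->addValue( $this->moduleName, 'Hello, world' );
	}
}

class SketchMain {
	/** @var array Map of action name to module class (assumed registry) */
	private $modules = [ 'hello' => SketchHello::class ];
	private $params;
	private $result;

	public function __construct( array $params ) {
		$this->params = $params;
		$this->result = new SketchResult();
	}

	public function getResult(): SketchResult {
		return $this->result;
	}

	public function execute(): string {
		// Pick the module from the action parameter.
		$action = $this->params['action'] ?? '';
		if ( !isset( $this->modules[$action] ) ) {
			throw new InvalidArgumentException( "Unknown action: $action" );
		}
		$class = $this->modules[$action];
		$module = new $class( $this, $action );
		$module->execute();
		// Stand-in for the formatting class: always print JSON.
		return json_encode( $this->result->getData() );
	}
}
```

Calling `( new SketchMain( [ 'action' => 'hello' ] ) )->execute()` runs the hypothetical hello module and returns its data as a JSON string.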

Query modules

  • ApiQuery behaves similarly to ApiMain in that it executes submodules. Each submodule derives from ApiQueryBase (except ApiQuery itself, which is a top-level module). During instantiation, submodules receive a reference to the ApiQuery instance.
  • All extension query modules should use a prefix of three or more letters. The core modules use two-letter prefixes.
  • ApiQuery execution plan:
    1. Get shared query parameters list/prop/meta to determine needed submodules.
    2. Create an ApiPageSet object and populate it from the titles/pageids/revids parameters. The pageset object contains the list of pages or revisions that query modules will work with.
    3. If requested, a generator module is executed to create another PageSet. This is similar to piping streams in UNIX: the given pages are the input to the generator, which produces another set of pages for all other modules to work on.
  • Requirements for query continuation:
    • The SQL query must be totally ordered. In other words, the query must use every column of some unique key, either as a constant in the WHERE clause or in the ORDER BY clause.
      • In MySQL, this is an exclusive or, to the point where querying Foo and Bar must order by title but not namespace (namespace is constant 0), Foo and Talk:Foo must order by namespace but not title (title is constant "Foo"), and Foo and Talk:Bar must order by both namespace and title.
    • The SQL query must not filesort.
    • The value given to setContinueEnumParameter() must include all the columns in the ORDER BY clause.
    • When continuing, a single compound condition should be added to the WHERE clause. If the query has ORDER BY column_0, column_1, column_2, this condition should look something like this:
(column_0 > value_0 OR (column_0 = value_0 AND
 (column_1 > value_1 OR (column_1 = value_1 AND
  (column_2 >= value_2)
 ))
))

Of course, swap ">" for "<" if your ORDER BY columns are using DESC. Be sure to avoid SQL injection in the values.
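
As a rough sketch of how such a condition could be generated for any number of ORDER BY columns, the helper below builds the nested expression with `?` placeholders and returns the values separately so they can be bound as parameters rather than concatenated into the SQL. The function name buildContinueCondition() is hypothetical and is not part of MediaWiki. The sketch assumes that column names come from trusted code and that all columns share one sort direction.

```php
<?php
/**
 * Build the nested continuation condition for ORDER BY $columns.
 * Hypothetical helper for illustration; not a MediaWiki function.
 * Column names must come from trusted code. Values are returned
 * separately so the caller can bind them as query parameters.
 *
 * @param string[] $columns ORDER BY columns, in order
 * @param array $values Continue values, one per column
 * @param bool $descending True if the columns are sorted DESC
 * @return array [ string $sql, array $boundValues ]
 */
function buildContinueCondition( array $columns, array $values, bool $descending = false ): array {
	if ( $columns === [] || count( $columns ) !== count( $values ) ) {
		throw new InvalidArgumentException( 'Need one value per ORDER BY column' );
	}
	$op = $descending ? '<' : '>';
	$last = count( $columns ) - 1;
	// Innermost term: the last column uses an inclusive comparison,
	// matching the shape of the condition shown above.
	$sql = "({$columns[$last]} {$op}= ?)";
	$bound = [ $values[$last] ];
	// Wrap outwards, one column at a time.
	for ( $i = $last - 1; $i >= 0; $i-- ) {
		$sql = "({$columns[$i]} {$op} ? OR ({$columns[$i]} = ? AND $sql))";
		// Each outer column appears twice, so its value is bound twice.
		array_unshift( $bound, $values[$i], $values[$i] );
	}
	return [ $sql, $bound ];
}
```

For three columns this produces the same nesting as the condition above, with each value replaced by a placeholder.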

Internal data structures

  • The Query API has had a very successful structure: one global nested array() that is passed around. Various modules add pieces of data at many different points of that array until, finally, it is rendered for the client by one of the printers (output modules). For the API, we suggest wrapping this array in a class with helper functions to append individual leaf nodes.
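
A wrapper of the kind suggested above might look roughly like this. ResultSketch is a hypothetical class written for illustration and is not the real ApiResult. Its addValue() helper walks a path of keys, creates intermediate arrays as needed, and sets one leaf value.

```php
<?php
// Hypothetical wrapper around one nested array; not the real ApiResult.
class ResultSketch {
	/** @var array The single nested output array */
	private $data = [];

	/**
	 * Set a leaf value at $path/$name, creating intermediate arrays
	 * as needed.
	 *
	 * @param string[] $path Keys leading to the parent node
	 * @param string $name Key of the leaf within the parent node
	 * @param mixed $value Leaf value
	 */
	public function addValue( array $path, string $name, $value ): void {
		$node = &$this->data;
		foreach ( $path as $key ) {
			if ( !isset( $node[$key] ) ) {
				$node[$key] = [];
			} elseif ( !is_array( $node[$key] ) ) {
				// Refuse to turn an existing leaf into a branch.
				throw new RuntimeException( "Path element '$key' is already a leaf" );
			}
			$node = &$node[$key];
		}
		$node[$name] = $value;
	}

	/** Return the whole array, e.g. for a printer to render. */
	public function getData(): array {
		return $this->data;
	}
}
```

Modules would then call something like `$result->addValue( [ 'query', 'general' ], 'sitename', 'Example' )` instead of writing into the shared array directly.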

Error/status reporting

For now we decided to include error information inside the same structured output as the normal result (option #2).

For the result, we may either use the standard HTTP error codes, or always return properly formatted data:

Using HTTP code
void header( string reason_phrase [, bool replace [, int http_response_code]] )

header() can be used to set the return status of the operation. We can define all possible values of the reason_phrase, so for a failed login we may return code=403 and phrase="BadPassword", whereas for any success we would simply return the response without altering the header.

Pros: It's a standard. The client always has to deal with HTTP errors, so using the HTTP code for the result would remove any separate error handling the client would have to perform. Since the client may request data in multiple formats, an invalid format parameter would still be properly handled, as it would simply be another HTTP error code.

Cons: ...

Include error information inside a proper response

This method would always return a properly formatted response object, but the error status/description will be the only values inside that object. This is similar to the way the current Query API returns status codes.

Pros: HTTP error codes are used only for networking issues, not for the data (logical errors). We are not tied to the existing HTTP error codes.

Cons: If the data format parameter is not properly specified, what is the format of the output data? The application has to parse the object to learn of an error (perf?). Error checking code will have to exist at both the connection and data parsing levels.
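
To make the second option concrete, here is a minimal sketch of both sides, assuming JSON output and an "error" object with "code" and "info" fields. The function names buildErrorBody() and extractError() are hypothetical, and the 'unparseable' code is invented for this example.

```php
<?php
// Hypothetical helpers for illustration; these are not MediaWiki functions.

/** Server side: wrap an error in a normally formatted response body. */
function buildErrorBody( string $code, string $info ): string {
	return json_encode( [ 'error' => [ 'code' => $code, 'info' => $info ] ] );
}

/**
 * Client side: parse the body and return the error, or null on success.
 * The client must parse the body before it can know whether the
 * request failed.
 */
function extractError( string $body ): ?array {
	$decoded = json_decode( $body, true );
	if ( !is_array( $decoded ) ) {
		// Malformed body: report it as a client-side parse failure.
		return [ 'code' => 'unparseable', 'info' => 'Response was not valid JSON' ];
	}
	return $decoded['error'] ?? null;
}
```

A client would still need separate handling for connection-level failures, which is the duplication noted in the cons above.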

Boilerplate code

Simple API module
<?php

class Api<module name> extends ApiBase {
	public function __construct( $main, $action ) {
		parent::__construct( $main, $action );
	}

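	public function execute() {
		// Module logic goes here. Typically: read the validated parameters
		// with $this->extractRequestParams() and add output to $this->getResult().
	}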

	public function getAllowedParams() {
		return array(
			'<parameter name>' => array(
				ApiBase::PARAM_TYPE => array( 'foo', 'bar', 'baz' ),
			),
		);
	}

	public function getParamDescription() {
		return array(
			'<parameter name>' => '<parameter description>',
		);
	}

	public function getDescription() {
		return '<Module description here>';
	}

	public function getExamples() {
		return array(
			'api.php?action=<module name>&<parameter name>=foo'
		);
	}

	public function getHelpUrls() {
		return '';
	}
}