User:DKinzler (WMF)/Client Software Guidelines

This document provides guidance for client software that is accessing Wikimedia APIs.

Introduction
Norms for maintainers See also:
 * How to find APIs
 * Where to find documentation and specs
 * common data types
 * error formats
 * paging
 * Scope and applicability
 * Who is the target audience
 * What happens if I'm not following the guidelines.
 * breakage
 * blocks
 * MUST subscribe to mailing list (api-announce?)
 * MAY make code available for code search


 * REST and HATEOAS.

MUST specify a meaningful User-Agent header
Client software interacting with our APIs must follow the Wikimedia User-Agent policy, which requires that clients send a User-Agent header containing information that allows us to identify the software and provides a way to contact the maintainer in case of issues. The header must have the following form:

.

In case a User-Agent header cannot be set (e.g. because the client code is executing in a web browser which sets its own User-Agent header), client code should set the Api-User-Agent header instead.

NOTE: Clients that do not follow this requirement may be blocked or severely rate-limited.
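A minimal sketch of how a client might assemble an identifying header value. The layout used here (client name and version, contact information in parentheses, then an optional library token) is an assumption based on the Wikimedia User-Agent policy; the policy itself remains the authoritative reference. The function names, the tool name and the contact address are hypothetical.

```python
from typing import Dict, Optional


def build_user_agent(name: str, version: str, contact: str,
                     library: Optional[str] = None) -> str:
    """Build a header value of the assumed form 'name/version (contact) library'."""
    value = f"{name}/{version} ({contact})"
    if library:
        # Optional token naming the HTTP library or framework in use.
        value += f" {library}"
    return value


def identifying_headers(user_agent: str,
                        can_set_user_agent: bool = True) -> Dict[str, str]:
    """Pick the header to carry the identification.

    Assumption: when the runtime (e.g. a web browser) controls User-Agent,
    the Api-User-Agent header is used instead.
    """
    header = "User-Agent" if can_set_user_agent else "Api-User-Agent"
    return {header: user_agent}


ua = build_user_agent("ExampleTool", "1.2", "tool-maintainer@example.org",
                      library="python-urllib/3")
print(identifying_headers(ua))
print(identifying_headers(ua, can_set_user_agent=False))
```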

MUST inform users about errors
Errors received by the client MUST be surfaced to the user if they could not be resolved, e.g. by retrying (see #MUST delay retries). In particular, the user MUST be alerted about unresolved server-side issues (status code 5xx) and about unexpected failures to perform the request (status code 4xx). Clients should use the information contained in the response body to provide further information to the user [TBD reference relevant error payload spec] (see also ). [TBD: talk about localization].

Clients SHOULD take particular care to provide detailed information to the user when a request has failed because (e.g. with a status 403) the user lacks the necessary privileges (perhaps because of a user or IP block). [TBD: reference spec for block info data structures]

MUST surface warnings to maintainers
[NEW!] Warnings that do not prevent the request from completing MUST be surfaced to maintainers and system administrators in an appropriate way, such as:


 * Making an entry in a log file
 * Failing automated tests
 * Outputting a message on the command line when in verbose mode
 * Displaying a warning when in development mode

Warnings that must be brought to the attention of software maintainers include:

 * Deprecation and Sunset response headers.
 * [TBD: X-WMF-Warning or X-API-Warning headers]
 * [TBD: warning section in response body, e.g. in the action API]

This requirement is intended to ensure that software developers are informed about issues with their client in a timely and non-disruptive way, so they can take action before things break.

NOTE: HTTP client libraries may hide errors and warnings from the client code by transparently following redirects. For instance, a deprecated API endpoint may use a 308 redirect to direct the client to the new endpoint, while providing a deprecation header. To allow the client software to process this header and make the developer aware of the issue, automatic resolution of redirects has to be disabled in the underlying library.
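One way to surface such warnings is to inspect the headers of every response, including redirect responses, and write a log entry when a warning header is present. The sketch below only covers the Deprecation and Sunset headers named above; the function name, the logger name and the sample header values are hypothetical, and the headers are assumed to be available as a plain mapping.

```python
import logging
from typing import List, Mapping

logger = logging.getLogger("api_client.warnings")

# Response headers that signal a warning to maintainers (compared lower-case).
WARNING_HEADERS = ("deprecation", "sunset")


def collect_warnings(headers: Mapping[str, str]) -> List[str]:
    """Return one message per warning header found, and log each of them."""
    lowered = {name.lower(): value for name, value in headers.items()}
    found = [f"{name}: {lowered[name]}"
             for name in WARNING_HEADERS if name in lowered]
    for message in found:
        logger.warning("API warning header received: %s", message)
    return found


# Headers as they might appear on a 308 response from a deprecated endpoint.
sample = {"Location": "/v2/page", "Deprecation": "@1735689600",
          "Sunset": "Wed, 31 Dec 2031 23:59:59 GMT"}
print(collect_warnings(sample))
```

A test suite could call the same function and fail when the returned list is non-empty, which covers the "failing automated tests" option.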

SHOULD gracefully handle HTML content when receiving errors
[NEW!] Client code SHOULD be prepared to process HTML responses when receiving 4xx or 5xx status codes, even if the address they are accessing is documented to return a machine readable format such as JSON. Clients should use the Content-Type header to detect HTML responses.

The reason is that, while API endpoints should be designed to return machine-readable error descriptions, intermediate layers, such as proxies and caches, will often generate HTML responses when something goes wrong. The client should make an effort to process the HTML response in a meaningful way.
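As an illustration, a client could check the Content-Type header and, for HTML, reduce the page to its visible text before showing it to the user. This is only a sketch using the Python standard library; the function name and the 200-character limit are arbitrary choices, and parsing of the machine-readable error payload is left out because its format is not specified here.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def describe_error(status: int, content_type: Optional[str], body: str) -> str:
    """Produce a short human-readable description of an error response."""
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "text/html":
        parser = _TextExtractor()
        parser.feed(body)
        parser.close()
        return f"HTTP {status}: {' '.join(parser.chunks)[:200]}"
    # Anything else is passed through (truncated) for further handling.
    return f"HTTP {status}: {body[:200]}"


page = ("<html><head><title>Error</title><style>b{}</style></head>"
        "<body><h1>Service Unavailable</h1></body></html>")
print(describe_error(503, "text/html; charset=utf-8", page))
```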

MUST NOT use APIs marked as non-public
Client software MUST NOT access APIs that are indicated to be non-public (restricted, private, or internal). Such APIs are reserved for use by software that is controlled by Wikimedia and its affiliates. This also applies to undocumented APIs.

APIs may be indicated to be non-public by their documentation, or by markers in the address used to access them [TBD: reference URL design guides]. The same API endpoint may be public when accessed through one address, and non-public when accessed through another.

EXAMPLE: If Wikimedia builds a feature specifically for Wikipedia, this may involve creating an API that serves data to populate the user interface components used by that feature. While there is nothing special or secret about that data, the way it is bundled and structured is specific to the user interface component (backend-for-frontend pattern), and there is no plan to keep it stable. To avoid surprises, third party clients may not access this API.

EXAMPLE: Wikimedia may set up a server cluster optimized for serving the Wikipedia mobile app. The APIs served by this server cluster would be the same as the ones offered to the general public, but access to this cluster would be reserved for use by the Wikipedia app, to guarantee operational stability.

NOTE: Accessing APIs that are marked as experimental is acceptable, but they should not be relied upon. They may change or vanish without warning.

NOTE: [NEW!] Clients that do not follow this requirement may be blocked or severely rate-limited.

MUST NOT start to use deprecated APIs
Client software MUST NOT be written to access APIs that are already documented to be deprecated. Existing software may continue to access deprecated APIs, but MUST surface the associated warnings to the developer.

See also: and.

SHOULD use the latest version of the API
Client software should be kept up to date with the current version of the API. For this purpose, the maintainers of the software should subscribe to the relevant communication channels [TBD], and implement tests that will warn them about their software using deprecated APIs (see #MUST surface warnings to maintainers).

MUST NOT obscure the identity of the user
Client software that makes authenticated requests MUST authenticate as the user who is actually controlling the activity. It is not acceptable to use the software maintainer's credentials when making API calls on behalf of others. Unauthenticated requests on behalf of others are permitted but not recommended for write operations or expensive or high volume queries.

In the case of a web application, one suitable mechanism for performing authenticated requests on behalf of others is OAuth2.

EXAMPLE: Suppose Joanne creates a web page on Toolforge that allows people to post messages to multiple users. This is implemented by calling APIs that edit the respective user talk pages. These API calls must not be made using Joanne's credentials. Instead, the users who wish to post the messages must first authorize Joanne's tool to act on their behalf using OAuth. The API calls must then be made using the relevant OAuth tokens. This way, the edits to the talk pages are attributed to the users who actually controlled them, rather than to Joanne who wrote the tool.

NOTE: Clients that do not follow this requirement may be blocked.

MUST delay retries
Software that is sending requests to our APIs MAY retry requests upon receiving errors, if the nature of the error indicates that it is likely to be transient, such as:


 * HTTP status 503 ("service unavailable")
 * HTTP status 504 ("gateway timeout")
 * HTTP status 429 ("too many requests")
 * HTTP status 404 ("not found") when received right after creating the respective resource. In that case, the error is assumed to be due to stale data on the server side.

Clients MUST NOT automatically retry requests after the following errors, even if they may be transient:


 * HTTP status 403 ("access denied"). See.

Software that implements retry logic MUST ensure that retries do not happen too quickly or too often. Specifically:


 * Clients SHOULD follow any instructions about rate limits and retries provided in the response. In particular, clients SHOULD implement support for the Retry-After header, and delay any retry at least as long as specified by that header.
 * If the response does not provide information about the time to wait before sending the next retry, or if the client does not implement support for processing this information, then the client MUST NOT retry more than six times per minute.
 * When not using delay information provided in the response, the PREFERRED way to determine the wait time is to apply exponential backoff, starting with a one second delay, and doubling the delay time with every attempt.
 * Clients that are not using any of the strategies described above MUST apply a fixed delay of ten seconds between all retry attempts.
 * The delay MUST be measured from the time the last response was received until the next request is sent.
 * Clients MUST NOT send more than ten retries.

NOTE: [NEW!] Clients that do not follow this requirement may be blocked or severely rate-limited.
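The retry rules above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `send` is a hypothetical callable that performs one request and returns the status code together with the raw Retry-After value (or None), only the delta-seconds form of Retry-After is parsed (an HTTP-date value falls back to exponential backoff), and the context-dependent retry after a 404 is not covered.

```python
import time
from typing import Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 503, 504}  # transient errors listed above
MAX_RETRIES = 10                      # never send more than ten retries


def retry_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait before retry number `attempt` (counting from 1)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The server specified a delay in seconds: wait at least that long.
        return float(retry_after.strip())
    # Exponential backoff: 1s, 2s, 4s, 8s, ...
    return float(2 ** (attempt - 1))


def send_with_retries(send: Callable[[], Tuple[int, Optional[str]]],
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send`, retrying transient failures; return the final status."""
    status, retry_after = send()
    attempt = 0
    while status in RETRYABLE_STATUSES and attempt < MAX_RETRIES:
        attempt += 1
        # The wait starts once the response has been received and ends
        # right before the next request is sent.
        sleep(retry_delay(attempt, retry_after))
        status, retry_after = send()
    return status


# Usage with canned responses instead of real network traffic.
responses = iter([(503, None), (429, "5"), (200, None)])
waits = []
final_status = send_with_retries(lambda: next(responses), sleep=waits.append)
print(final_status, waits)
```

With a one second starting delay, the doubling schedule yields five retries within the first minute, which stays below the limit of six per minute.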

Implement correct semantics of data types
Client software must take care to interpret data types and structures in the way documented in their specification [TBD: reference inventory of standard data types].

This is particularly important for data types prone to subtle misinterpretation, such as:


 * date/time (time zones, year zero, etc)
 * intervals (open vs closed, end value vs size)
 * lists and sets (significant vs insignificant order)
 * maps (case sensitivity of keys)
 * language codes (IANA vs WMF)
 * very small or very large numbers (float precision)
 * text (character sets, Unicode normal form)
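Two of these pitfalls can be illustrated with the Python standard library. The sketch assumes that text should be compared in Unicode normal form NFC and that timestamps arrive in ISO 8601 format with an explicit UTC offset; neither assumption is confirmed by a specification here, and both function names are hypothetical.

```python
import unicodedata
from datetime import datetime, timezone


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Guessing a time zone would silently shift the value.
        raise ValueError("timestamp has no time zone information")
    return parsed.astimezone(timezone.utc)


composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent
print(composed == decomposed, same_text(composed, decomposed))
print(to_utc("2024-03-10T12:00:00+02:00"))
```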

proper parameter encoding
Clients must use the UTF-8 encoding when making HTTP requests. Similarly, the response body must be interpreted as UTF-8, unless the Content-Type header specifies a different character set.

HTTP headers in the request and responses are ASCII only, see RFC 7230.

All parts of the request URL use UTF-8, with percent-encoding used to escape special characters when they occur in parameter values [TBD: ref path parameters, query parameters].

The following characters MUST be escaped when they occur in path parameters, since they hold special meaning per the URL spec:


 * the question mark (?) as %3F
 * the percent sign (%) as %25
 * the hash sign (#) as %23

In addition, the following characters must also be escaped, because they may carry meaning in the context of REST endpoint routing:


 * the slash sign (/) as %2F
 * the pipe character (|) as %7C (MAYBE, for custom verbs?) [TBD]
 * the ampersand character (&) as %26 (MAYBE, for consistency?) [TBD]

The following characters SHOULD NOT be escaped:


 * the colon [TBD]

EXAMPLE: [TBD: correct way to do this in MW, php, python, JS]
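For Python, one possible approach (a sketch, not an agreed recommendation) is `urllib.parse.quote` with the colon as the only additional safe character. This escapes the question mark, percent sign, hash sign, slash, pipe and ampersand, and encodes non-ASCII characters as percent-encoded UTF-8. Note that the pipe and ampersand are still marked as undecided above; this sketch simply escapes them. The function name and the example path are hypothetical.

```python
from urllib.parse import quote


def encode_path_parameter(value: str) -> str:
    """Percent-encode a value for use as a single path parameter.

    With safe=":" the slash is escaped too (by default quote() leaves it
    alone), while the colon is left as-is.
    """
    return quote(value, safe=":")


print(encode_path_parameter("AC/DC?"))
print(encode_path_parameter("50%|a&b#c"))
print(encode_path_parameter("Talk:Zürich"))

# Building a request path from an encoded parameter (hypothetical route).
path = "/v1/page/" + encode_path_parameter("Talk:AC/DC")
print(path)
```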

NOTE: the set of characters that must be escaped in query parameter values and segment ID is slightly different! [TBD]
 * query parameters
 * path parameters...

minimize the number of requests
Client code should be designed to minimize the number of requests it sends to the server. The simplest way to achieve this is to avoid requesting information that is not actually needed. Beyond that, some APIs may support features that allow the number of requests to be reduced, such as:


 * Batch requests [TBD: reference the corresponding API design guide].
 * Property expansion [TBD: reference the corresponding API design guide].
 * Use streaming [TBD] instead of polling when possible.

Another way to reduce the number of requests is to avoid unnecessary redirects by ensuring that the request URL is normalized as much as possible. In particular:

 * Do not include trailing slashes or double slashes in the URL
 * Use the canonical form of resource identifiers in the URL

When available, clients should also make use of HTTP features that may reduce the number of requests, such as keep-alive and pipelining.
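The slash-related part of the URL normalization described above could be sketched like this. The function name and the example host are hypothetical. Canonical forms of resource identifiers are specific to each API and are not handled here, and percent-encoded slashes (%2F) inside parameters are left untouched because the path is not decoded.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse repeated slashes in the path and drop a trailing slash."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    if len(path) > 1:
        # Keep a bare "/" root path as it is.
        path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


print(normalize_url("https://api.example.org/v1//page/Earth/"))
print(normalize_url("https://api.example.org/v1/page/Earth?lang=en"))
```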

NOTE: Strategies to reduce the number of requests can be at odds with the goal of minimizing the amount of data transferred. The goal should be to keep both at a minimum, rather than to optimize one at the expense of the other. See #minimize the amount of data transferred.

NOTE: Clients that cause an excessive number of requests may be rate limited or even blocked completely.

minimize the amount of data transferred
Client code should be designed to minimize the amount of data requested from the server. The simplest way to achieve this is to avoid requesting information that is not actually needed. Beyond that, some APIs may support features that allow the amount of data to be reduced, such as:

 * filtering of collections [TBD: reference the respective API design guide]
 * filtering of fields [TBD: reference the respective API design guide]

Furthermore, some features should be avoided in order to reduce the amount of traffic:

 * Property expansion, when not needed [TBD: reference the respective API design guide]

Clients should also make use of HTTP features that may reduce the amount of traffic, such as transparent compression.

NOTE: Strategies to reduce the amount of data can be at odds with the goal of minimizing the number of requests. The goal should be to keep both at a minimum, rather than to optimize one at the expense of the other. See #minimize the number of requests.

NOTE: Clients that cause an excessive amount of traffic may be rate limited or even blocked completely.

use an officially supported client library
TBD...

SHOULD follow redirects

 * ...unless...
 * use correct semantics for 301, 302, 303, and 308, etc

MUST comply with robots.txt when scraping [??]
[do we need this here? robots.txt isn't really about APIs...]

SHOULD support compression
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

https://developer.mozilla.org/en-US/docs/Web/HTTP/Compression#end-to-end_compression
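Many HTTP client libraries negotiate and decode compression transparently. Where that is not the case, the client side of the exchange might look roughly like the sketch below, which advertises gzip in the Accept-Encoding request header and decodes the body according to the Content-Encoding response header. Only gzip is handled, and the function name is hypothetical.

```python
import gzip
from typing import Optional

# Request header announcing that gzip-compressed responses are accepted.
REQUEST_HEADERS = {"Accept-Encoding": "gzip"}


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the content coding named in the Content-Encoding response header."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    # Fail loudly rather than handing compressed bytes to the caller.
    raise ValueError(f"unsupported content encoding: {encoding}")


compressed = gzip.compress(b'{"title": "Earth"}')
print(decode_body(compressed, "gzip"))
```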

MUST follow HTTP standards
Clients that interact with our APIs must follow the relevant HTTP standards, most importantly RFC 9110. This can for the most part be achieved by using a good HTTP client library.

More resources:


 * https://developer.mozilla.org/en-US/docs/Web/HTTP/Resources_and_specifications

SHOULD be designed to be robust against changes and failures
Clients should follow the Robustness Principle: "be conservative in what you do, be liberal in what you accept from others". In practice, this means that failures of the network and of the server should be handled gracefully, and assumptions about the behavior of the server should be kept to a minimum.

See also:

[TBD: reference doc about what changes to the response body structure are considered non-breaking (adding fields, etc - check wikidata stable interface policy)]
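One common way to keep assumptions about the response to a minimum is to read only the fields the client actually needs, ignore unknown fields, and treat optional fields as possibly absent. The sketch below illustrates this for a made-up response shape; the field names `title` and `description` and the `PageSummary` type are hypothetical and do not describe any particular Wikimedia API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSummary:
    title: str
    description: Optional[str] = None


def parse_page_summary(raw: str) -> PageSummary:
    """Extract the fields the client needs, tolerating additions and omissions."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("title"), str):
        raise ValueError("response lacks the required 'title' field")
    description = data.get("description")
    return PageSummary(
        title=data["title"],
        # A missing or oddly typed optional field degrades to None.
        description=description if isinstance(description, str) else None,
    )


# A field added later by the server ("revision") does not break the client.
print(parse_page_summary('{"title": "Earth", "revision": 42}'))
```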

MAY authenticate

 * authentication methods
 * csrf tokens
 * SHOULD use OAuth when acting on behalf of others
 * SHOULD auth when editing?

MAY request localization

 * content, errors

MUST NOT rely on undocumented or deprecated behavior
????