User:DKinzler (WMF)/Client Software Guidelines

This document provides guidance for client software that is accessing Wikimedia APIs.

Introduction
Norms for maintainers

See also:
 * How to find APIs
 * Where to find documentation and specs
 * common data types
 * error formats
 * paging
 * Scope and applicability
 * Who is the target audience
 * What happens if I'm not following the guidelines?
 * breakage
 * blocks
 * MUST subscribe to mailing list (api-announce?)
 * MAY make code available for code search


 * REST and HATEOAS.

SHOULD minimize the number of requests
Client code should be designed to minimize the number of requests it sends to the server. The simplest way to achieve this is to avoid requesting data that is not actually needed. Beyond that, some APIs may support features that allow the number of requests to be reduced, such as:


 * Batch requests [TBD: reference the corresponding API design guide].
 * Property expansion [TBD: reference the corresponding API design guide].

Another way to reduce the number of requests is to avoid unnecessary redirects by ensuring that the request URL is normalized as much as possible. In particular:


 * Do not include trailing slashes or double slashes in the URL
 * Use the canonical form of resource identifiers in the URL
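
The normalization rules above can be sketched as a small helper; this is a minimal illustration using only the standard library, and the example URL is a placeholder rather than a real endpoint.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Collapse duplicate slashes and strip any trailing slash from the path."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)  # no double slashes
    if len(path) > 1:
        path = path.rstrip("/")               # no trailing slash
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(normalize_url("https://example.org/w/rest.php//v1/page/Foo/"))
# → https://example.org/w/rest.php/v1/page/Foo
```

Canonicalizing resource identifiers (e.g. page titles) is API-specific and is not covered by this sketch.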

NOTE: Please take care that you don't end up requesting a lot of unnecessary data in order to avoid requests, for instance by requesting all properties to be expanded even when their values are not actually needed.

NOTE: Clients that cause an excessive number of requests may be rate limited or even blocked completely.

SHOULD minimize the amount of data transferred
Client code should avoid transferring data it does not need:
 * filter unneeded entities
 * exclude fields if not needed
 * avoid expansion if not needed
 * support compression

NOTE: Clients that cause an excessive amount of traffic may be rate limited or even blocked completely.
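
The practices above can be combined when assembling a request. This is a hypothetical sketch: the `fields` query parameter stands in for whatever field-selection mechanism a given API offers, so check the API's documentation before relying on it.

```python
def build_lean_request(fields=None, compress=True):
    """Assemble query parameters and headers for a minimal-data request.

    `fields` is a hypothetical field-selection parameter; the actual
    mechanism (if any) is defined by the specific API.
    """
    params = {}
    if fields:
        params["fields"] = ",".join(sorted(fields))  # request only needed fields
    headers = {}
    if compress:
        headers["Accept-Encoding"] = "gzip"          # allow a compressed response
    return params, headers

params, headers = build_lean_request(fields={"id", "title"})
```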

MUST specify a meaningful User-Agent header
Client software interacting with our APIs must follow the Wikimedia User-Agent policy, which requires that clients send a User-Agent header containing information that identifies the software and provides a way to contact the maintainer in case of issues. The header must have the form described in the policy. Parts that are not applicable can be omitted.

In case a User-Agent header cannot be set (e.g. because the client code is executing in a web browser, which sets its own User-Agent header), client code should set the Api-User-Agent header instead.

NOTE: Clients that do not follow this requirement may be blocked or severely rate-limited. This will likely happen without warning, because if you are not providing contact information in the User-Agent string, we don't have a way to warn you.
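
As an illustration of the general shape (tool name/version, contact information, then libraries used), the following sketch assembles a header value; the tool name, contact address, and library versions are placeholders.

```python
def make_user_agent(tool, version, contact, libraries=()):
    """Build a User-Agent value: tool/version (contact) library/version ..."""
    parts = [f"{tool}/{version} ({contact})"]
    parts += [f"{name}/{ver}" for name, ver in libraries]
    return " ".join(parts)

# All values below are placeholders for illustration.
ua = make_user_agent("MyWikiTool", "1.2", "mytool@example.org",
                     libraries=[("python-requests", "2.31")])
print(ua)  # → MyWikiTool/1.2 (mytool@example.org) python-requests/2.31
```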

SHOULD follow redirects

 * ...unless...
 * use correct semantics for 301, 302, 303, and 308, etc

MUST surface user blocks
When the API refuses to perform an action because the user is blocked or doesn't have the necessary privileges (e.g. with a status 403), the client software MUST notify the user of this fact. It SHOULD make use of any relevant information supplied in the response body to provide the user with details about the issue. [TBD: reference spec for block info data structures] [TBD: talk about localization]

MUST surface errors and warnings
Inform the user about:


 * server errors (5xx)
 * client errors (4xx) as appropriate

See spec for error document structure!

Inform the developer (and possibly also the user) about:

...automated tests...
 * Deprecation headers
 * Sunset headers
 * X-WMF-Warning headers

MUST gracefully handle HTML content when receiving errors
Client code must be prepared to process HTML responses when receiving 4xx or 5xx status codes, even if the address it is requesting data from is specified to return a machine-readable format such as JSON.

The reason is that, while API endpoints should be designed to return machine-readable error descriptions, intermediate layers that process the HTTP request, such as proxies and caches, will often generate HTML bodies when something goes wrong. The client should make an effort to process the HTML response in a meaningful way.
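
One way to handle this, sketched below under the assumption that JSON bodies carry the error details: branch on the Content-Type and fall back to salvaging a plain-text summary from an HTML body.

```python
import html
import json
import re

def parse_error_body(content_type: str, body: bytes) -> dict:
    """Return a structured error, degrading gracefully for HTML error pages."""
    if "json" in content_type:
        return json.loads(body)
    # Proxies and caches often emit HTML error pages; strip the tags
    # and keep a short text summary for display or logging.
    text = re.sub(r"<[^>]+>", " ", body.decode("utf-8", "replace"))
    return {"message": html.unescape(" ".join(text.split()))[:200]}

err = parse_error_body("text/html",
                       b"<html><body><h1>502 Bad Gateway</h1></body></html>")
print(err["message"])  # → 502 Bad Gateway
```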

MUST delay retries
Software that is sending requests to our APIs MAY retry requests upon receiving errors, if the nature of the error indicates that it is likely to be transient. This is typically the case for status code 503, but also plain 500, and particularly 429. Other error codes may be interpreted as transient depending on the context: for instance, attempts to access a newly created resource may temporarily result in a 404 response due to replication lag.

Software that implements retry logic MUST ensure that retries do not happen too quickly and too often. Specifically:

 * If the response contained a Retry-After header, the client MUST wait for the specified time until retrying.
 * If the response body contains more details about the applicable rate limit, the client SHOULD use that information to determine how long it should wait until it makes another request.
 * Otherwise, the client SHOULD apply an exponential backoff strategy, starting with a delay of one second for the first retry, and doubling the wait time for each subsequent retry.
 * If exponential backoff is not implemented and no Retry-After header is received, the client should wait at least five seconds before retrying.
 * Clients MUST NOT retry a request more than ten times.
 * Clients SHOULD make an effort to avoid sending retries from multiple parallel threads or processes independently.

NOTE: Clients that do not follow this requirement may be blocked or severely rate-limited.
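
The delay rules above can be sketched as a single function. Note the simplifying assumption: only the delay-in-seconds form of Retry-After is handled here; the header may also carry an HTTP-date, which a full implementation would need to parse.

```python
MAX_RETRIES = 10

def retry_delay(attempt, retry_after=None):
    """Seconds to wait before retry number `attempt` (1-based), or None to give up."""
    if attempt > MAX_RETRIES:
        return None                      # MUST NOT retry more than ten times
    if retry_after is not None:
        return float(retry_after)        # MUST honor Retry-After when present
    return 1.0 * 2 ** (attempt - 1)      # exponential backoff: 1s, 2s, 4s, ...

print(retry_delay(3))  # → 4.0
```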

MUST NOT use APIs marked as restricted
Client software MUST NOT access APIs that are documented to be restricted (or private, or internal). Such APIs are reserved for use by software that is controlled by Wikimedia and its affiliates. This also applies to undocumented APIs.

NOTE: Clients that do not follow this requirement may be blocked or severely rate-limited.

NOTE: Accessing APIs that are marked as experimental is acceptable, but they should not be relied upon. They may change or vanish without warning.

MUST NOT start to use deprecated APIs
Client software MUST NOT be written to access APIs that are already documented to be deprecated. Existing software may continue to access deprecated APIs, but MUST surface the associated warnings to the developer.

See also: [TBD]

SHOULD use the latest version of the API
Client software should be kept up to date with the current version of the API. For this purpose, the maintainers of the software should subscribe to the relevant communication channels [TBD], and implement tests that will warn them about their software using deprecated APIs (see [TBD]).

MUST NOT make authenticated requests on behalf of others
Client software that makes authenticated requests MUST authenticate as the user who is actually controlling the activity. It is not acceptable to use the software maintainer's credentials when making API calls on behalf of others.

In the case of a web application, one suitable mechanism for achieving this is OAuth2.
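
As a minimal sketch of the request side of this: once a user has authorized the tool via OAuth 2.0, requests on that user's behalf carry that user's own access token. The token value below is a placeholder; a real one is obtained through the per-user authorization flow.

```python
def auth_headers(user_access_token: str) -> dict:
    """Headers for a request made on behalf of the user who owns the token."""
    # Each user gets their own token; never substitute the maintainer's
    # credentials here.
    return {"Authorization": f"Bearer {user_access_token}"}

headers = auth_headers("USER_SPECIFIC_TOKEN")  # placeholder value
```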

EXAMPLE: Suppose Joanne creates a web page on Toolforge that allows people to post messages to multiple users. This is implemented by calling APIs that edit the respective user talk pages. These API calls must not be made using Joanne's credentials. Instead, the users who wish to post the messages must first authorize Joanne's tool to act on their behalf using OAuth. The API calls must then be made using the relevant OAuth tokens. This way, the edits to the talk pages are attributed to the users who actually controlled them, rather than to Joanne, who wrote the tool.

NOTE: Clients that do not follow this requirement may be blocked.

MUST implement correct semantics of data types
Client software must take care to interpret data types and structures in the way documented in their specification [TBD: reference inventory of standard data types].

This is particularly important for data types prone to subtle misinterpretation, such as:


 * date/time (time zones, year zero, etc)
 * intervals (open vs closed, end value vs size)
 * lists and sets (significant vs insignificant order)
 * maps (case sensitivity of keys)
 * language codes (IANA vs WMF)
 * very small or very large numbers (float precision)
 * text (character sets, unicode normal form)
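
As one concrete illustration of the date/time pitfall: a sketch that parses an ISO 8601 timestamp into a timezone-aware value, assuming (as labeled in the code) that naive timestamps mean UTC rather than local time.

```python
from datetime import datetime, timezone

def parse_api_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware datetime."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"        # older Pythons reject a bare 'Z'
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        # Assumption: naive timestamps from the API mean UTC, not local time.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts

ts = parse_api_timestamp("2024-05-01T12:00:00Z")
```

Comparing an aware timestamp like this against a naive `datetime.now()` raises an exception in Python, which surfaces the time-zone mismatch instead of silently producing wrong results.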

MUST comply with robots.txt when scraping [??]
[do we need this here? robots.txt isn't really about APIs...]

SHOULD support compression
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

https://developer.mozilla.org/en-US/docs/Web/HTTP/Compression#end-to-end_compression
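
High-level HTTP libraries usually decompress responses transparently, but when working closer to the wire the client has to honor Content-Encoding itself. A minimal sketch for the gzip case:

```python
import gzip

def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo Content-Encoding: gzip; low-level clients may not do this for you."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body  # identity encoding: pass through unchanged

# Simulate a compressed response body.
compressed = gzip.compress(b'{"ok": true}')
print(decode_body(compressed, "gzip"))  # → b'{"ok": true}'
```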

MUST follow HTTP standards
Clients that interact with our APIs must follow the relevant HTTP standards, most importantly RFC 9110. This can for the most part be achieved by using a good HTTP client library.

More resources:


 * https://developer.mozilla.org/en-US/docs/Web/HTTP/Resources_and_specifications

SHOULD be designed to be robust against changes and failures
Client code should follow the Robustness Principle: "be conservative in what you do, be liberal in what you accept from others". In practice, this means that failures of the network and of the server should be handled gracefully, and assumptions about the behavior of the server should be kept to a minimum.

See also:

MAY authenticate

 * authentication methods
 * csrf tokens
 * SHOULD use OAuth when acting on behalf of others
 * SHOULD auth when editing?

MAY request localization

 * content, errors

MUST NOT rely on undocumented or deprecated behavior
????