User:DKinzler (WMF)/Client Software Guidelines

This document provides guidance for client software that is accessing Wikimedia REST APIs.

General Guidance
Norms for maintainers. See also:
 * How to find APIs
 * Where to find documentation and specs
 * Common data types
 * Error formats
 * Paging
 * Who is the target audience
 * What happens if the guidelines are not followed
 * MUST subscribe to the announcement mailing list (api-announce?)
 * MAY make code available for code search


 * REST and HATEOAS.

MUST follow HTTP standards
Clients that interact with our APIs must follow the relevant HTTP standards, most importantly RFC 9110. For the most part, this can be achieved by using a good HTTP client library.

More resources:


 * https://developer.mozilla.org/en-US/docs/Web/HTTP/Resources_and_specifications

SHOULD be designed to be robust against changes and failures
Client code should follow the Robustness Principle: "be conservative in what you do, be liberal in what you accept from others". In practice, this means that failures of the network and of the server should be handled gracefully, and assumptions about the behavior of the server should be kept to a minimum.
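For illustration, a minimal sketch of this attitude in Python (the function name and timeout value are hypothetical, not part of any API): a failed request becomes an expected return value rather than an unhandled crash, and the caller decides how to degrade.

```python
import urllib.error
import urllib.request
from typing import Optional

def fetch_or_none(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Fetch a URL, treating network and server failures as expected
    outcomes rather than crashes; the caller decides how to degrade."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, TimeoutError):
        return None
```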

See also: [TBD]

SHOULD minimize the number of requests
Client code should be designed to minimize the number of requests it sends to the server. The simplest way to achieve this is to avoid requesting data that is not actually needed. Beyond that, some APIs may support features that allow the number of requests to be reduced, such as:


 * Batch requests [TBD: reference the corresponding API design guide].
 * Property expansion [TBD: reference the corresponding API design guide].

Another way to reduce the number of requests is to avoid unnecessary redirects by ensuring that the request URL is normalized as much as possible. In particular:


 * Do not include trailing slashes or double slashes in the URL
 * Use the canonical form of resource identifiers in the URL
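The slash-related rules above can be sketched as follows (the example URL is hypothetical; what counts as the canonical form of a resource identifier is API-specific and not handled here):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Collapse duplicate slashes and strip a trailing slash from the
    path, so the request matches the canonical resource URL and does
    not trigger a redirect. Only the path is touched; the "//" after
    the scheme is URL syntax and must stay."""
    parts = urlsplit(url)
    path = parts.path
    while "//" in path:
        path = path.replace("//", "/")
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]
    return urlunsplit(parts._replace(path=path))
```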

NOTE: Please take care not to end up requesting large amounts of unnecessary data in order to avoid requests, for instance by always requesting all properties to be expanded even when their values are not actually needed.

SHOULD minimize the amount of data transferred

 * filter unneeded entities
 * exclude fields if not needed
 * avoid expansion if not needed

SHOULD support compression
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

https://developer.mozilla.org/en-US/docs/Web/HTTP/Compression#end-to-end_compression

MUST specify a meaningful User-Agent header
Client software interacting with our APIs must follow the Wikimedia User-Agent policy, which requires that clients send a User-Agent header containing information that allows us to identify the software and provides a way to contact the maintainer in case of issues. The header must have the following form:

<client name>/<version> (<contact information>) <library name>/<version>

Parts that are not applicable can be omitted.

In case a User-Agent header cannot be set (e.g. because the client code is executing in a web browser, which sets its own User-Agent header), client code should set the Api-User-Agent header instead.

NOTE: Clients that do not follow this policy are likely to be blocked or severely rate limited without warning, because if you are not providing contact information in the User-Agent string, we don't have a way to warn you.
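For illustration, a minimal way to attach such a header to every request (tool name, version, and contact details are hypothetical; substitute your own):

```python
import urllib.request

# Hypothetical tool name, version, and contact details; substitute your own.
USER_AGENT = "ExampleWikiTool/1.4 (https://example.org/wikitool; wikitool@example.org)"

def api_request(url: str) -> urllib.request.Request:
    """Build a request that carries the mandatory User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```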

MUST comply with robots.txt when scraping [??]
[do we need this here? robots.txt isn't really about APIs...]

SHOULD follow redirects

 * ...unless...
 * use correct semantics for 301, 302, 303, and 308, etc

MUST surface user blocks
When the API refuses to perform an action because the user is blocked or doesn't have the necessary privileges (e.g. with a status 403), the client software MUST notify the user of this fact. It SHOULD make use of any relevant information supplied in the response body to provide the user with details about the issue. [TBD: reference spec for block info data structures] [TBD: talk about localization]

MUST surface errors and warnings
Inform the user about:


 * server errors (5xx)
 * client errors (4xx) as appropriate

See the specification for the error document structure [TBD: reference].

Inform the developer (and possibly also the user) about:


 * Deprecation headers
 * Sunset headers
 * X-WMF-Warning headers

MUST gracefully handle HTML content when receiving errors
Client code must be prepared to process HTML responses when receiving 4xx or 5xx status codes, even if the endpoint it is requesting data from is specified to return a machine-readable format such as JSON.

The reason is that, while API endpoints should be designed to return machine-readable error descriptions, intermediate layers that process the HTTP request, such as proxies and caches, will often generate HTML bodies when something goes wrong. The client should make an effort to process the HTML response in a meaningful way.
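A sketch of such defensive handling (the JSON field names "detail" and "message" are illustrative assumptions; consult the error document spec for the actual structure):

```python
import json
import re

def error_message(content_type: str, body: str) -> str:
    """Extract a human-readable message from a 4xx/5xx response body.
    The JSON field names "detail" and "message" are illustrative; see
    the error document spec for the actual structure."""
    if content_type.partition(";")[0].strip() == "application/json":
        try:
            doc = json.loads(body)
            if isinstance(doc, dict):
                return str(doc.get("detail") or doc.get("message") or body)
        except ValueError:
            pass  # advertised as JSON, but was not; fall through to HTML
    # Proxies and caches often emit HTML; salvage the <title> if present.
    m = re.search(r"<title>(.*?)</title>", body, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else body.strip()
```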

MUST delay retries
Software that is sending requests to our APIs MAY retry requests upon receiving errors, if the nature of the error indicates that it is likely to be transient. This is typically the case for status code 503, but also plain 500, and particularly 429. Other error codes may be interpreted as transient depending on the context: for instance, attempts to access a newly created resource may temporarily result in a 404 response due to replication lag.

Software that implements retry logic MUST ensure that retries do not happen too quickly or too often. Specifically:


 * If the response contained a Retry-After header, the client MUST wait for the specified time until retrying.
 * If the response body contains more details about the applicable rate limit, the client SHOULD use that information to determine how long it should wait until it makes another request.
 * Otherwise, the client SHOULD apply an exponential backoff strategy, starting with a delay of one second for the first retry, and doubling the wait time for each subsequent retry.
 * If exponential backoff is not implemented and no Retry-After header is received, the client should wait at least five seconds before retrying.
 * Clients MUST NOT retry a request more than ten times.
 * Clients SHOULD make an effort to avoid sending retries from multiple parallel threads or processes independently.

MUST NOT use APIs marked as restricted
Client software MUST NOT access APIs that are undocumented, or are documented to be restricted (or private or internal). Such APIs are reserved for use by client software that is controlled by Wikimedia.

NOTE: Accessing APIs that are marked as experimental is acceptable, but they should not be relied upon. They may change or vanish without warning.

MUST NOT start to use deprecated APIs
Client software MUST NOT be written to access APIs that are documented to be deprecated. Existing software may continue to access deprecated APIs, but MUST surface the associated warnings to the developer.

See also: [TBD].

MUST NOT rely on undocumented or deprecated behavior
Client software MUST NOT depend on server behavior that is not part of the documented API contract, even if the current implementation happens to exhibit it. Such behavior may change at any time without notice.

SHOULD use the latest version of the API
Client software should be kept up to date with the current version of the API. For this purpose, the maintainers of the software should subscribe to the relevant communication channels [TBD], and implement tests that will warn them when their software uses deprecated APIs (see above).

MUST implement the documented semantics of standard data types
Client software MUST take care to interpret data types and structures in the way documented in their specification [TBD: reference inventory of standard data types].

This is particularly important for data types prone to subtle misinterpretation, such as:


 * date/time (time zones, year zero, etc)
 * intervals (open vs closed, end value vs size)
 * lists and sets (significant vs insignificant order)
 * maps (case sensitivity of keys)
 * language codes (IANA vs WMF)
 * very small or very large numbers (float precision)
 * text (character sets, unicode normal form)
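As an example of the date/time pitfalls above, a hedged sketch of timestamp parsing (the sample timestamp is hypothetical; which time zone applies to a naive value is defined by the field's spec, and UTC is assumed here purely for illustration):

```python
from datetime import datetime, timezone

def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2024-05-01T12:30:00Z" into a
    timezone-aware datetime. Naive datetimes are a classic source of
    off-by-hours bugs whenever the local zone differs from UTC."""
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"  # fromisoformat < 3.11 rejects "Z"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        # The field's spec should say which zone applies; UTC is assumed
        # here purely for illustration.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```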

MUST NOT make authenticated requests on behalf of others
Client software that makes authenticated requests MUST authenticate as the user who is actually controlling the activity. It is not acceptable to use the software maintainer's credentials when making API calls on behalf of others.

In the case of a web application, one suitable mechanism for achieving this is OAuth2.
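For illustration, attaching a user-specific OAuth2 bearer token to a request might look like this (the token acquisition flow itself is out of scope here, and the URL is hypothetical):

```python
import urllib.request

def user_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a request authenticated as the acting user via an OAuth2
    bearer token (obtained e.g. through the authorization-code flow).
    The maintainer's own credentials must never be substituted here."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"})
```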

MAY authenticate

 * authentication methods
 * csrf tokens
 * SHOULD use OAuth when acting on behalf of others
 * SHOULD auth when editing?

MAY request localization

 * content, errors