User:DKinzler (WMF)/Client Software Guidelines

This document provides guidance for client software that is accessing Wikimedia APIs, such as bots and gadgets.

Terminology
Throughout this guideline, we distinguish between maintainers, operators, and users of client-side software. Depending on the kind of software, these may be the same person, or three very different groups of people:


 * A maintainer of the software is someone who has control over the program code. They can make changes to the software itself.
 * An operator of the software does not modify the software, but determines when, where, and how the software will be used. Of course, the operator may also be the maintainer at the same time, but that doesn't have to be the case: a developer who uses a component developed by someone else to create a web site would be operating that component, but they would not be maintaining it.
 * The users control a specific execution of the software, but are not in control of the source code or configuration, and may not even be aware that they are using the software. For example, when a user visits a web site, their browser may load client software that makes a call to the MediaWiki API on their behalf, without them knowing or caring about it. On the other hand, a user who runs a wiki bot from the command line on Toolforge would also be the operator of the software at the same time (and, if they wrote the bot, also the maintainer).

Best Practices
The best practices laid out here are designed to reduce the workload of the maintainers of client-side software and server-side software alike. Following these practices may require some additional effort initially, but will avoid nasty surprises and unplanned work down the road. It also makes life easier for the people who work on the server side. Client software that follows best practices is less likely to break unexpectedly, and less likely to be blocked by community admins or Wikimedia staff.

Stay Informed
Maintainers and operators of client-side code should follow announcements on the mediawiki-api-announce mailing list. Additional lists that may be useful for staying informed about ongoing developments and upcoming changes include mediawiki-api and wikitech-l.

Be Accountable
When problems arise, it's always best to talk about them before taking action. While the Wikimedia Foundation reserves the right [TBD: link to terms of service] to block any incoming requests in order to protect its operational integrity, Wikimedia will try to reach out to the operator of problematic clients in order to resolve problems, ideally before requests need to be blocked. Of course, this requires a way to contact the operator. To ensure this is possible, clients should adhere to the following practices:
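For example, a descriptive User-Agent header that names the tool, its version, and a way to reach its operator makes the client identifiable in server logs. A minimal sketch in Python using only the standard library (the function names and the example tool name are illustrative, not part of any Wikimedia library):

```python
import urllib.request

def build_user_agent(tool_name, version, contact):
    """Compose a User-Agent string that identifies the client and gives
    a way to contact its operator (a URL and/or an email address)."""
    return f"{tool_name}/{version} ({contact})"

def make_opener(tool_name, version, contact):
    """Return a urllib opener whose requests carry the descriptive User-Agent."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", build_user_agent(tool_name, version, contact))]
    return opener

# Example: make_opener("ExampleBot", "1.0", "https://example.org/bot; bot@example.org")
```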

Avoid Incompatibility
One common reason for unexpected breakage in client software is the use of APIs that were never intended to be stable interfaces in the first place. To avoid this, client software should apply the following practices:

Stable APIs may change, or be deprecated and removed. In this case, the server will try to warn any callers of deprecated functionality about the upcoming change. Such warnings do not prevent the request from succeeding, and they may not be relevant to the ultimate user of the software. They are directed at maintainers and operators, to inform them about a problem that may cause similar requests to fail in the future.

Maintainers of client side software should make sure that they are aware of such warnings by applying the following practices:
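As an illustration, the Action API attaches such warnings to the response body, grouped by module. A minimal sketch of extracting and logging them (the function name is illustrative; the exact warning structure can vary by format version):

```python
import logging

def surface_api_warnings(response_json):
    """Collect and log any warnings the MediaWiki Action API attached to a
    response, e.g. {"warnings": {"main": {"*": "Unrecognized parameter: foo"}}}.
    Returns the warning messages so callers can surface them further."""
    messages = []
    for module, body in response_json.get("warnings", {}).items():
        text = body.get("*", str(body)) if isinstance(body, dict) else str(body)
        messages.append(f"{module}: {text}")
        logging.warning("API warning from %s: %s", module, text)
    return messages
```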

Reduce Resource Usage
One common reason why the Wikimedia Foundation may block client software that is operated in good faith is that it is consuming an unreasonable amount of resources, typically by sending too many requests. In that case, the server will often try to instruct the client to slow down. To avoid putting undue stress on the servers, clients should apply the following practice:

One typical cause of clients unintentionally sending a large number of requests at a high rate is badly implemented retry logic. If you want to implement automatic retries, please apply the following practices:
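A common shape for well-behaved retry logic is exponential back-off with jitter, deferring to the server's Retry-After header when one is given. A sketch of the delay calculation (names and defaults are illustrative assumptions, not prescribed values):

```python
import random

def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    A server-provided Retry-After value always wins. Otherwise use
    exponential back-off capped at `cap` seconds, with jitter so that
    many clients do not retry in lock-step."""
    if retry_after is not None:
        return float(retry_after)
    delay = min(cap, base * (2 ** attempt))
    return delay * (0.5 + random.random() / 2)  # jitter: 50-100% of the delay
```

Retries should also be limited to a small maximum number of attempts, and only performed for transient failures (e.g. HTTP 429 or 5xx), never for client errors like 400 or 403.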

Keep the User Informed
To avoid unexpected problems, it is important to properly process any errors and warnings received in responses from the server. In general, errors and warnings should be surfaced to the people affected by them, and to the people who can remedy them. To this end, client code should apply the following practices:

Further Guidance

 * TBD: REST and HATEOAS.

Introduction
Norms for maintainers. See also:
 * How to find APIs
 * Where to find documentation and specs
 * common data types
 * error formats
 * paging
 * Scope and applicability
 * Who is the target audience
 * What happens if I'm not following the guidelines.
 * breakage
 * blocks
 * MUST subscribe to mailing list (api-announce?)
 * MAY make code available for code search



Cross-Site Requests
TBD, see API:Cross-site_requests

Implement correct semantics of data types
Client software must take care to interpret data types and structures in the way documented in their specification [TBD: reference inventory of standard data types].

This is particularly important for data types prone to subtle misinterpretation, such as:


 * date/time (time zones, year zero, etc)
 * intervals (open vs closed, end value vs size)
 * lists and sets (significant vs insignificant order)
 * maps (case sensitivity of keys)
 * language codes (IANA vs WMF)
 * very small or very large numbers (float precision)
 * text (character sets, unicode normal form)
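To illustrate one of these pitfalls: a UTC timestamp such as "2024-01-02T03:04:05Z" is easily misread in the local time zone if parsed naively. A sketch of handling it correctly in Python (the function name is illustrative):

```python
from datetime import datetime, timezone

def parse_api_timestamp(value):
    """Parse a MediaWiki-style ISO 8601 timestamp (e.g. "2024-01-02T03:04:05Z")
    into a timezone-aware UTC datetime. Converting the trailing 'Z' explicitly
    avoids silently interpreting the value in the local time zone."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)
```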

proper parameter encoding
Clients must use the UTF-8 encoding when making HTTP requests. Similarly, the response body must be interpreted as UTF-8, unless the Content-Type header specifies a different character set.

HTTP headers in the request and responses are ASCII only, see RFC 7230.

All parts of the request URL use UTF-8, with percent-encoding used to escape special characters when they occur in parameter values [TBD: ref path parameters, query parameters].

The following characters MUST be escaped when they occur in path parameters, since they hold special meaning per the URL spec:


 * the question mark (?) as %3F
 * the percent sign (%) as %25
 * the hash sign (#) as %23

In addition, the following characters must also be escaped, because they may carry meaning in the context of REST endpoint routing:


 * the slash sign (/) as %2F
 * the pipe character (|) as %7C (MAYBE, for custom verbs?) [TBD]
 * the ampersand character (&) as %26 (MAYBE, for consistency?) [TBD]

The following characters SHOULD NOT be escaped:


 * the colon [TBD]

EXAMPLE: [TBD: correct way to do this in MW, php, python, JS]

NOTE: the set of characters that must be escaped in query parameter values and path segment IDs is slightly different! [TBD]
 * query parameters
 * path parameters...
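As one possible approach in Python, urllib.parse.quote can produce the path-segment escaping described above (this is a sketch against the draft rules here, not a confirmed MediaWiki convention; note that safe=":" reflects the tentative "do not escape the colon" rule, which is still marked TBD):

```python
from urllib.parse import quote

def encode_path_segment(value):
    """Percent-encode a value for use as a single REST path segment.

    safe=":" keeps the colon literal (e.g. in "Talk:Foo") while escaping
    '/', '?', '#', '%', '|', '&', and everything else that could be
    misread as URL structure."""
    return quote(value, safe=":")
```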

minimize the number of requests
Client code should be designed to minimize the number of requests it sends to the server. The simplest way to achieve this is to avoid requesting information that is not actually needed. Beyond that, some APIs may support features that allow the number of requests to be reduced, such as:


 * Batch requests [TBD: reference the corresponding API design guide].
 * Property expansion [TBD: reference the corresponding API design guide].
 * Use streaming[TBD] instead of polling when possible.
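For example, the Action API lets a client look up many pages in one query by joining titles with "|", instead of sending one request per page. A sketch of building such a batched request URL (the helper name and the 50-title default are illustrative; higher per-request limits apply to some accounts):

```python
from urllib.parse import urlencode

def build_batch_query(api_url, titles, limit=50):
    """Build one Action API query URL covering many titles at once.

    Joining titles with '|' turns up to `limit` individual page lookups
    into a single request."""
    if len(titles) > limit:
        raise ValueError(f"at most {limit} titles per query")
    params = {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),
        "format": "json",
    }
    return api_url + "?" + urlencode(params)
```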

Another way to reduce the number of requests is to avoid unnecessary redirects by ensuring that the request URL is normalized as much as possible. In particular:

 * Do not include trailing slashes or double slashes in the URL
 * Use the canonical form of resource identifiers in the URL

When available, clients should also make use of HTTP features that may reduce the number of requests, such as keep-alive and pipelining.

NOTE: Strategies to reduce the number of requests can be at odds with the goal of minimizing the amount of data transferred. The goal should be to keep both at a minimum, rather than to optimize one at the expense of the other.

NOTE: Clients that cause an excessive number of requests may be rate limited or even blocked completely.

minimize the amount of data transferred
Client code should be designed to minimize the amount of data requested from the server. The simplest way to achieve this is to avoid requesting information that is not actually needed. Beyond that, some APIs may support features that allow the amount of data to be reduced, such as:

 * filtering of collections [TBD: reference the respective API design guide]
 * filtering of fields [TBD: reference the respective API design guide]

Furthermore, some features should be avoided in order to reduce the amount of traffic:

 * Property expansion, when not needed [TBD: reference the respective API design guide]

Clients should also make use of HTTP features that may reduce the amount of traffic, such as transparent compression.
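As an example of field filtering in the Action API, the rvprop parameter narrows each revision to the listed fields instead of the full default set. A sketch (the helper name and the chosen fields are illustrative):

```python
from urllib.parse import urlencode

def minimal_revision_params(titles):
    """Query parameters that fetch only the revision fields this client
    actually uses, rather than the full default field set."""
    return urlencode({
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids|timestamp",  # just the fields we need
        "titles": "|".join(titles),
        "format": "json",
    })
```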

NOTE: Strategies to reduce the amount of data can be at odds with the goal of minimizing the number of requests. The goal should be to keep both at a minimum, rather than to optimize one at the expense of the other.

NOTE: Clients that cause an excessive amount of traffic may be rate limited or even blocked completely.

use an officially supported client library
TODO: Adjust MediaWiki's action API client to set X-Api-User-Agent (optionally include Gadget!) and follow retry and error handling rules. It currently doesn't implement any of this. TBD: Ticket.

TODO: Adjust the action API client in the node service template to set a User-Agent (optionally include Gadget!) and follow retry and error handling rules. It currently doesn't implement any of this. TBD: Ticket.

TODO: Adjust pywikibot's action API client to set a good User-Agent (maybe including the user name?) and follow retry and error handling rules. Retry currently defaults to a minimum of 5 seconds with exponential back-off, up to 15 retries or a maximum wait of 120 seconds. Retry-After is honored. TBD: Ticket.

TBD...

SHOULD follow redirects

 * ...unless...
 * use correct semantics for 301, 302, 303, and 308, etc
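The main distinction between those status codes is whether the request method may change when the redirect is followed. A sketch of the decision, per RFC 9110 (the function name is illustrative):

```python
def redirect_method(status, method):
    """Choose the request method to use when following a redirect.

    Per RFC 9110: 303 always switches to GET; 301 and 302 permit the
    historical POST-to-GET switch that most clients perform; 307 and 308
    must preserve the original method."""
    if status == 303:
        return "GET"
    if status in (301, 302) and method == "POST":
        return "GET"
    return method
```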

MUST comply with robots.txt when scraping [??]
[do we need this here? robots.txt isn't really about APIs...]

SHOULD support compression
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

https://developer.mozilla.org/en-US/docs/Web/HTTP/Compression#end-to-end_compression
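Many HTTP client libraries handle this transparently; where one does not, the client can advertise gzip support and decompress the body itself. A standard-library sketch (function names are illustrative):

```python
import gzip
import urllib.request

def maybe_decompress(body, content_encoding):
    """Undo gzip Content-Encoding if the server applied it."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    return body

def fetch_compressed(url):
    """Fetch `url`, advertising gzip support to cut transfer size."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        return maybe_decompress(resp.read(), resp.headers.get("Content-Encoding"))
```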

MUST follow HTTP standards
Clients that interact with Wikimedia APIs must follow the relevant HTTP standards, most importantly RFC 9110. For the most part, this can be achieved by using a good HTTP client library.

More resources:


 * https://developer.mozilla.org/en-US/docs/Web/HTTP/Resources_and_specifications

SHOULD be designed to be robust against changes and failures
Client code should follow the Robustness Principle: "be conservative in what you do, be liberal in what you accept from others". In practice, this means that failures of the network and of the server should be handled gracefully, and assumptions about the behavior of the server should be kept to a minimum.

See also:

[TBD: reference doc about what changes to the response body structure are considered non-breaking (adding fields, etc - check wikidata stable interface policy)]
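In practice, robust response parsing means tolerating missing optional keys, ignoring unknown fields, and not hard-coding one response shape. A sketch of reading page titles defensively from a query response (the function name is illustrative):

```python
def extract_page_titles(response_json):
    """Read page titles from an Action API query response defensively:
    missing keys yield an empty result, unknown extra fields are ignored,
    and both the formatversion=1 page container (a dict keyed by page ID)
    and the formatversion=2 container (a list) are accepted."""
    pages = response_json.get("query", {}).get("pages", [])
    if isinstance(pages, dict):
        pages = pages.values()
    return [p["title"] for p in pages if isinstance(p, dict) and "title" in p]
```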

MAY authenticate

 * authentication methods
 * csrf tokens
 * SHOULD use OAuth when acting on behalf of others
 * SHOULD auth when editing?

MAY request localization

 * content, errors

MUST NOT rely on undocumented or deprecated behavior
????