User:DKinzler (WMF)/API URL Guidelines

From mediawiki.org
This is a personal brain dump and not aligned with anyone

This document provides guidelines for the URLs under which APIs are exposed on Wikimedia sites. It is intended primarily for use with REST APIs, but is designed to be applicable to other types of APIs as well.

Note that this guideline does not mandate any change to or replacement of the MediaWiki action API. See the bottom of this page for an explanation of how these guidelines relate to Existing APIs.

Changes over Previous Practice

Formalize the idea of routing prefixes (base URLs). In the past, API base URLs were dictated by the entry point (e.g. /w/rest.php for MediaWiki and /api/rest_v1 for RESTbase). Per this policy, a routing prefix comes with operational guarantees and constraints. It will often correspond to an API gateway. By looking at the routing prefix, it should be obvious whether the APIs under it are intended for public use or not.
We need a good name for this... "routing prefix" is too technical. "API root" is incorrect: it's not the root of an API (rather, it contains several component APIs, each of which has a root). It's not necessarily a gateway URL, though it's nice to think of it that way. We could call it an "entry point", but we really want to stop exposing the actual entry points, and hide the implementation details... -- DKinzler (WMF) (talk) 20:47, 12 July 2023 (UTC)
Existing examples:

Possible future examples:

Formalize the idea of component APIs. Previously, API endpoints may have been grouped informally by prefix, often identifying a resource type or implementation module. For instance, the /api/rest_v1/page/... endpoints provide access to wiki pages and are implemented by the content module in RESTbase. Per this policy, component APIs are defined as modeling conceptual domains and are expected to align with team boundaries. Versioning is shifted from the entry point to the level of individual components, to provide teams with control over the evolution of the APIs they own.
  • instead of /api/rest_v1/page/..., the page endpoints could be moved into a new content component using the path /api/content.v2/page/....

Guideline for Exposing New APIs

Teams that want to expose a new component API should use the following steps to determine the API's URL:
First, determine the kind of prefix to use. If the API operates on the content of individual wikis, it should use a per-wiki routing prefix, like https://{wiki}/api. If the API operates across wikis, it should use a cross-wiki routing prefix, like https://api.wikimedia.org/. The creation of a dedicated routing prefix may be considered only if the API has special operational needs or restrictions, and the team that maintains the API will maintain infrastructure designed to meet these needs.
We should mention the standard routes for non-public APIs as well, but we haven't agreed on them yet. I propose something like https://{wiki}/internal-api and https://internal-api.wikimedia.org. Using "internal" here in the same way we use it in PHP for things that are technically accessible, but shouldn't be accessed and are not stable. -- DKinzler (WMF) (talk) 21:50, 4 July 2023 (UTC)
Second, decide whether the API should be public for use by the community. Public APIs empower the community to build tools, and allow third parties access to our content. However, they come with the burden of upholding interface stability. If the API is intended only for use by clients controlled by Wikimedia and its affiliates, it should be exposed from an internal routing prefix that makes it obvious that it is not safe for use by others.
Third, determine the component name of the new API. The component name should be descriptive of the domain it models and it must be unique across all routing prefixes. Components should be defined in a way that allows a single team to maintain all API endpoints that belong to the component, and for all the endpoints to be updated together. Components are units of ownership, deployment and versioning.
Fourth, attach a version number or stability indicator to the component name, to allow the new API to evolve without breaking clients.
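The four steps above can be sketched in code. This is only an illustration: the function names are hypothetical, and the internal prefixes are the not-yet-agreed proposal from the comment above, not established routes.

```python
def choose_routing_prefix(per_wiki: bool, public: bool, wiki: str = "{wiki}") -> str:
    """Steps one and two: pick a default routing prefix (hypothetical helper).

    The "internal-api" prefixes are only a proposal in this document.
    """
    if per_wiki:
        # Per-wiki APIs live under a path prefix on the wiki's own domain.
        name = "api" if public else "internal-api"
        return f"https://{wiki}/{name}"
    # Cross-wiki APIs live under a shared domain.
    host = "api.wikimedia.org" if public else "internal-api.wikimedia.org"
    return f"https://{host}"


def component_url(prefix: str, component: str, version: str) -> str:
    """Steps three and four: append the versioned component name."""
    return f"{prefix.rstrip('/')}/{component}.{version}"


# A public per-wiki API with a component named "user-badges":
print(component_url(choose_routing_prefix(per_wiki=True, public=True), "user-badges", "v1"))
```

The sketch does not cover dedicated routing prefixes, which per the first step are an exception that needs its own justification.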
The (fictitious) editor engagement team wants to create an API endpoint that models barnstars. The team expects to create more endpoints around user badges in the future.

Since the API is going to be public for use by gadgets, and it interacts with wiki pages, it will be using the per-wiki https://{wiki}/api routing prefix.

The component will be called user-badges, because that is the domain/scope that the team is working on at the moment. The new endpoint will not be added to the (supposedly) existing user and page components, since these components are owned by different teams. Also, while there is conceptual overlap, the user-badges component has a somewhat different perspective on users and user pages.

So, once it has reached maturity, the new endpoint will be exposed as https://{wiki}/api/user-badges.v1/barnstars/{user}, as part of the "user badges API".

The (fictitious) GLAM support team wants to create an API that reports on the re-use of media uploaded by specific organisations. The team expects high-volume usage of some of the API's endpoints and plans to maintain a dedicated cluster for handling the demand.

Since the API is going to integrate information across all wikis, it will be using the cross-wiki routing prefix.

The component will be called 'glam-monitor', which is the name of the product that is going to be provided to GLAM partners. Since the API has special operational demands and the team plans to operate the infrastructure that handles it, the routing prefix (i.e. the domain name) will be dedicated to the component.

So, once it has reached maturity, the new API will use the URL https://glam-monitor-api.wikimedia.org/v1/, with https://glam-monitor-api.wikimedia.org/ being the routing prefix.

Norms for API URLs

The URLs of API endpoints accessible from outside the Wikimedia network SHOULD be constructed from a routing prefix, a component name including a version number, and the endpoint path, typically followed by a resource identifier.

Routing Prefixes

Wikimedia APIs are exposed from different routing prefixes. Each routing prefix exposes one or more component APIs.
Routing prefixes group together APIs based on operational properties. Each routing prefix comes with its own set of guarantees and restrictions, such as SLAs or authentication requirements. However, routing prefixes SHOULD NOT expose implementation details or concrete technologies.
Routing prefixes are typically used to route requests to a specific gateway or cluster, but it is entirely possible to route multiple prefixes to the same gateway and cluster, or to implement routing rules based on other criteria, such as the component name.
Requesting content from a routing prefix's base URL should return documentation about the APIs under that prefix, and a description of any operational restrictions and requirements that apply to them.
There are two kinds of routing prefixes, per-wiki and cross-wiki:
  • per-wiki APIs operate on content of a specific wiki site. They use URLs of the form https://{wiki-domain}/{prefix}/{component}/{endpoint}. The routing prefix is part of the path, so the same routing prefix is available on each of the different wiki domains.
  • cross-wiki APIs operate across wikis. They use URLs of the form https://{prefix}/{component}/{endpoint}. The routing prefix is at least a domain name, but may also include a path prefix.
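The two URL forms can be written out as small helper functions. This is a minimal sketch; the function names are made up for illustration, and the sample values are taken from example URLs that appear elsewhere on this page.

```python
def per_wiki_url(wiki_domain: str, prefix: str, component: str, endpoint: str) -> str:
    # The routing prefix is a path element on the wiki's own domain.
    return f"https://{wiki_domain}/{prefix}/{component}/{endpoint}"


def cross_wiki_url(prefix: str, component: str, endpoint: str) -> str:
    # The routing prefix is a domain name, optionally followed by a path.
    return f"https://{prefix}/{component}/{endpoint}"


print(per_wiki_url("en.wikipedia.org", "api", "rest_v1", "page/html/Earth"))
print(cross_wiki_url("api.wikimedia.org", "service/lw/inference/v1",
                     "models/enwiki-articlequality:predict"))
```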
Most public APIs SHOULD be exposed through the generic routing prefix:
The Parsoid API uses the generic per-wiki routing prefix, /api/:

https://en.wikipedia.org/api/rest_v1/page/html/Earth

APIs that have special operational requirements or guarantees SHOULD use a routing prefix that corresponds to these requirements or guarantees.
Some routing prefixes may be dedicated to a specific component API. In that case, the component name is synonymous with the routing prefix. In the URL, the component name is reduced to a version number:
  • Dedicated cross-wiki routing prefix: https://{prefix}/{version}/{endpoint}
  • Dedicated per-wiki routing prefix: https://{wiki-domain}/{prefix}/{version}/{endpoint}
Dedicated routing prefixes are typically used when the team that owns the component API also operates the infrastructure for running and accessing the services that implement that API.

The Content Translation API uses a dedicated routing prefix (and the cross-wiki URL pattern):
https://cxserver.wikimedia.org/v2/page/{sourcelanguage}/{targetlanguage}/{title}.

The routing prefix as well as the component is cxserver, the version is v2. The page part of the URL identifies the endpoint and is followed by the resource path, {sourcelanguage}/{targetlanguage}/{title}.
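The breakdown of a URL under a dedicated cross-wiki prefix can be sketched as follows. The helper name is hypothetical, and the code assumes a single-segment endpoint name, which holds for this example but may not hold for every API. The language codes and title in the usage line are sample values.

```python
from urllib.parse import urlsplit


def split_dedicated_url(url: str) -> dict:
    """Split a URL of the form https://{prefix}/{version}/{endpoint}/{resource...}.

    Illustrative only: assumes the endpoint is exactly one path segment.
    Raises ValueError if the path has fewer than two segments.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    version, endpoint, *resource = segments
    return {
        "prefix": parts.netloc,      # the dedicated routing prefix (domain name)
        "version": version,          # the component part, reduced to a version
        "endpoint": endpoint,
        "resource": "/".join(resource),
    }


print(split_dedicated_url("https://cxserver.wikimedia.org/v2/page/en/de/Earth"))
```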

Some routing prefixes may be reserved for use by client code that is controlled by Wikimedia and its affiliates. Such routing prefixes are referred to as internal, and they are not safe for third parties to use. Internal routing prefixes should have the string "internal" in their name, to make it immediately obvious which APIs are safe for third parties to use, and which are not.
The terms "internal" and "private" are used here in the way they are also used in languages like PHP and Java: "private" is technically inaccessible, while "internal" is restricted by convention and annotation. The stable interface policy for PHP also uses the terms this way.

However, we have historically been using the term "internal" for both, which causes confusion.

Calling such APIs "non-public" would be confusing as well, since they are accessible from outside the WMF network, though they are not intended for public use. The term "unstable" is generally used for interfaces that are not yet stable, but intended to become so....

We could also use the term "restricted". Or perhaps "bespoke", since such APIs tend to be bound to a specific client.
Internal APIs should not be confused with the APIs of private services, which are accessible only from within the WMF network.
Non-public APIs SHOULD be exposed through the default internal routing prefix, called internal-api, unless there is a specific reason to expose them through additional or different prefixes.
This is an ad-hoc proposal, needs alignment -- DKinzler (WMF) (talk) 20:05, 6 July 2023 (UTC)
Whether an API is intended for use by third parties should be obvious from looking at the URL, without needing to consult documentation.
The same component API MAY be exposed through multiple prefixes. Operational guarantees and restrictions may vary depending on which prefix is used to access the component.
The same component API MAY be considered public for third-party use on one prefix, but restricted to use by Wikimedia and affiliates on another prefix.
Modules exposed under the generic per-wiki routing prefix (/api/) are typically portable, meaning that they are defined by a component that can easily be installed and run by a third party. Other APIs are typically WMF-specific.
There should rarely be a need to introduce a new routing prefix.
Routing prefixes SHOULD contain the word "api", to avoid collisions with component names as well as paths and subdomains used for other purposes. See also Constraints on Names below.
Routing prefixes MUST be disjunct in the sense that it must not be possible to construct another existing routing prefix by appending a path element preceded by a slash ("/"). This restriction allows for simpler rules for routing requests and avoids naming conflicts.
Do we need to make an exception for https://api.wikimedia.org/api/? Or say that they are indeed in conflict, but we just live with it for now? -- DKinzler (WMF) (talk)

Component APIs

The {component} part of the URL represents a logical grouping of endpoints which together constitute the API of a functional component.
Components can be understood as conceptual domain models. However, it is more important that they reflect organizational reality: component boundaries should align with team boundaries to ensure cohesion and consistency, providing teams with complete control over the endpoints they maintain.
If two teams are running endpoints that are conceptually similar, they SHOULD still use different components. This may even mean using the team name as the component prefix.
If we end up having many ugly sounding, non-obvious component names, this indicates that something is wrong with the team structure. In that case, both the teams and the APIs should be re-structured, using the mechanisms described in the sections on Versioning and Deprecation.
Component APIs act as units of versioning and units of deployment: all endpoints under a component prefix SHOULD always be updated together, to ensure consistency between them. Because of this, all endpoints under a component prefix SHOULD be maintained by the same team. This provides teams with autonomy over the lifecycle of the APIs they own.
To allow component APIs to evolve without causing disruption to clients, the component name SHOULD contain a version number or stability indicator (see Versioning). The preferred way to include this information is by dividing the name into two parts separated by a dot ("."). The first part is the component's base name, which identifies the conceptual domain model. The second part is the version: {base-name}.{version}. Using other separators like slash, dash or underscore is acceptable but discouraged. Examples include core/v1 and rest_v1. See also Constraints on Names below.
Using "." between name and version is pretty, but not the status quo. Right now we use "/" mostly. And we have "_" in rest_v1. -- DKinzler (WMF) (talk) 11:26, 4 July 2023 (UTC)
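A minimal sketch of how a component name in the preferred dotted form could be taken apart. The type and function names are hypothetical, and the code assumes that the version part itself contains no dot. Names that use one of the discouraged separators, such as rest_v1, are returned unsplit.

```python
from typing import NamedTuple, Optional


class ComponentName(NamedTuple):
    base: str
    version: Optional[str]


def parse_component(name: str) -> ComponentName:
    """Split '{base-name}.{version}' at the last dot (illustrative helper)."""
    base, sep, version = name.rpartition(".")
    if not sep:
        # No dot found: the name does not use the preferred separator.
        return ComponentName(name, None)
    return ComponentName(base, version)


print(parse_component("content.v2"))
print(parse_component("rest_v1"))
```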
The version number is part of the component name. It follows that there cannot be two versions of the same component. An old component API may be deprecated in favor of a new component API that supersedes it, and the name of the new component API may differ from the name of the old one only in the version number. But they are still separate and independent APIs.
Under a dedicated prefix, the component name would be implicitly the same as the routing prefix. The component part of the URL is reduced to the version number.
The names of components MUST NOT end in "-api". This avoids collisions with routing prefixes. See also Constraints on Names below.
Or perhaps we want to follow the pattern established by Enterprise, i.e. api.{name}.wikimedia.org? It's not as nice when reading out aloud, but matches the way subdomains are typically used. -- DKinzler (WMF) (talk) 12:43, 11 July 2023 (UTC)
Component names MUST be unique across all routing prefixes, and the same component may be exposed through multiple prefixes. The same name MUST always refer to the same component.
Component names MUST be disjunct in the sense that it must not be possible to construct the name of another component by appending a path element separated by a slash ("/"). This SHOULD be true across all routing prefixes. This restriction allows for simpler rules for routing requests and avoids naming conflicts.
The following component names are in conflict (not disjunct):
  • /foo/bar/ with /foo/

The following components are not in conflict with each other:

  • /foo/ with /bar/foo/
  • /foo/bar/ with /foo/bla/
  • /foo-bar/ with /foo/
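The disjunctness rule amounts to checking whether one name is a leading path of the other, compared segment by segment. The following sketch (with hypothetical function names) reproduces the examples above. It treats two identical names as being in conflict, which is an assumption about how the check would be applied when registering a new name.

```python
from typing import List


def path_segments(name: str) -> List[str]:
    # Ignore leading and trailing slashes: "/foo/bar/" -> ["foo", "bar"].
    return [s for s in name.split("/") if s]


def in_conflict(a: str, b: str) -> bool:
    """True if one name can be built from the other by appending path elements."""
    shorter, longer = sorted((path_segments(a), path_segments(b)), key=len)
    return longer[:len(shorter)] == shorter


print(in_conflict("/foo/bar/", "/foo/"))      # conflict
print(in_conflict("/foo/", "/bar/foo/"))      # no conflict
print(in_conflict("/foo/bar/", "/foo/bla/"))  # no conflict
print(in_conflict("/foo-bar/", "/foo/"))      # no conflict
```

Comparing whole segments instead of raw string prefixes is what keeps /foo-bar/ and /foo/ from being reported as a conflict.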
The component name may be used in the service infrastructure to facilitate routing to a backing service. Typically, all requests to endpoints under a given component prefix will be routed to the same service.

Endpoint Paths

Component APIs MAY contain endpoints using different paradigms, such as REST and RPC.
For REST APIs, the pattern for resource URLs should be {prefix}/{component.version}/{endpoint}/{resource}. See the REST Resource Guidelines for details.
Some component APIs may by design define only a single endpoint. In this case, the URL of the endpoint may simply be the URL of the component API.

Constraints on Names

Routing prefixes and component names MUST match the following ABNF rule, per RFC 2234:

namechar = ALPHA / DIGIT
separator = "/" / "-" / "." / "_"
name = 1*namechar *( separator 1*namechar )
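For illustration, the rule can be expressed as a regular expression. This sketch reads the rule as allowing one or more alphanumeric characters after each separator, which is an assumption about the intent (names like foo-bar and rest_v1 need it). The names NAME_RE and is_valid_name are made up for this example.

```python
import re

# namechar = ALPHA / DIGIT; separator = "/" / "-" / "." / "_"
# A name is one or more runs of namechars, joined by single separators.
NAME_RE = re.compile(r"[A-Za-z0-9]+(?:[/\-._][A-Za-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """Check a routing prefix or component name against the naming rule."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ("content.v2", "service/lw/inference/v1", "rest_v1", "-api", "foo--bar", "foo/"):
    print(candidate, is_valid_name(candidate))
```

Under this reading, a name can neither start nor end with a separator, and two separators cannot be adjacent.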

Do we need an actual ABNF, or can this be defined less formally? -- DKinzler (WMF) (talk) 12:24, 11 July 2023 (UTC)
The names of components or routing prefixes MUST NOT be one of the reserved names, and the name of components or routing prefixes MUST NOT start with a reserved name separated by a non-alphanumeric character such as slash ("/") or dot (".") or hyphen ("-").
The following names are reserved:
  • w: used as MediaWiki's script path.
  • wiki: used as MediaWiki's article path.
  • test: reserved for testing. [TBD]
  • debug: reserved for testing. [TBD]
  • status: reserved for operational use. [TBD]
  • wmf: generic prefix/suffix for restricted (private) APIs [TBD]
  • api: generic prefix/suffix for public APIs
Furthermore, all language codes in the IANA Language Subtag Registry are reserved and must not coincide with routing prefixes or component names. This is to avoid problems for wikis that use variant paths.
For instance, the Serbian Wikipedia uses variant paths, so a routing prefix like "sr-el" or "sr-ec" would create a conflict.
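A minimal sketch of the reserved-name check, covering only the fixed list of reserved names above. The function name is hypothetical, the comparison assumes lower-case names, and language codes from the IANA registry are not checked here.

```python
import re

# The reserved names listed above; several are still marked [TBD] in this draft.
RESERVED = {"w", "wiki", "test", "debug", "status", "wmf", "api"}


def uses_reserved_name(name: str) -> bool:
    """True if the name is reserved, or starts with a reserved name that is
    followed by a non-alphanumeric character such as "/", "." or "-"."""
    # Take everything up to the first non-alphanumeric character.
    first = re.split(r"[^A-Za-z0-9]", name, maxsplit=1)[0]
    return first in RESERVED


for candidate in ("api", "wiki/foo", "test-tools", "internal-api", "content.v2", "apifoo"):
    print(candidate, uses_reserved_name(candidate))
```

A name like internal-api passes this check because the reserved word is a suffix, not a leading element.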

Existing APIs

This section explains how this guideline relates to pre-existing APIs.

RESTbase REST API (rest_v1)

RESTbase exposes endpoints under the prefix https://{wiki}/api/rest_v1/, where rest_v1 acts as a component name shared by all endpoints exposed by RESTbase.
The Parsoid API is an example of a RESTbase-style API:

https://en.wikipedia.org/api/rest_v1/page/html/Earth

Here, the component is rest_v1, the endpoint is page, and the resource path is html/Earth.

While RESTbase itself is deprecated and will be removed soon, the URLs of existing endpoints will continue to be supported for some time for backwards compatibility.
The fact that all APIs in RESTbase are exposed under a shared component name means they share a version number. This has been making it difficult to update endpoints, since it makes it effectively impossible to change the version number. A structure like /api/content.v1/page/html/Earth would allow for more autonomy, since the endpoints under the content component could be evolved independently of other endpoints.

MediaWiki REST API (rest.php)

The routing prefix of REST APIs exposed by MediaWiki is https://{wiki}/w/rest.php/. Most endpoints are currently exposed under https://{wiki}/w/rest.php/v1/, where the /v1/ acts as a component name.
The routing prefix exposes an implementation detail, namely the name of the PHP file that acts as an entry point for API requests. This makes it hard to re-implement components using a different stack, like node.js.
The lack of a component name makes /w/rest.php/ behave like a dedicated prefix. This has been making it difficult to update APIs, since it makes it effectively impossible to change the version number. A structure like /api/content.v1/page/html/Earth would allow for more flexibility, since endpoints under the content component could be evolved independently of other endpoints.

MediaWiki Action API (api.php)

The routing prefix of the MediaWiki Action API is https://{wiki}/w/api.php. It is an RPC style API. Individual procedures are addressed using the action parameter.
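To illustrate the RPC style: the procedure is selected by a query parameter, not by the path. The helper below is a hypothetical sketch that only builds the request URL; it uses the action, titles and format parameters as sample input.

```python
from urllib.parse import urlencode


def action_api_url(wiki: str, action: str, **params: str) -> str:
    """Build an Action API request URL (illustrative helper)."""
    # The procedure to call is named by the "action" parameter.
    query = urlencode({"action": action, **params})
    return f"https://{wiki}/w/api.php?{query}"


print(action_api_url("en.wikipedia.org", "query", titles="Earth", format="json"))
```

Note that the path is the same for every procedure, which is why there is no place in it for a versioned component name.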
The routing prefix exposes an implementation detail, namely the name of the PHP file that acts as an entry point for API requests. This makes it hard to re-implement components using a different stack, like node.js.
The lack of a versioned component (or action) name in the path makes it difficult to evolve API modules. A structure like https://{wiki}/action-api/query.v2, where query.v2 is the component (or action), would allow for more flexibility.

Wikimedia API Gateway (api.wikimedia.org)

The routing prefix of the Wikimedia API Gateway is https://api.wikimedia.org/. It serves as the default routing prefix for public cross-wiki APIs.
This is a URL for a resource in the Liftwing API:

https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict

This uses the default cross-wiki routing prefix, https://api.wikimedia.org/, followed by the component name and version service/lw/inference/v1, followed by an endpoint and resource path.

A component name that contains slashes like this may be taken to imply a hierarchy of sub-components. This kind of structure is not recommended, since it makes it harder to immediately identify the component and owner by looking at the URL.

Non-public cross-wiki APIs SHOULD be exposed through https://internal-api.wikimedia.org/, unless there is a specific reason to expose them through additional or different prefixes.
This is an ad-hoc proposal, needs alignment -- DKinzler (WMF) (talk) 20:05, 6 July 2023 (UTC)

The API Portal Wiki APIs

The per-wiki APIs exposed by the API Portal wiki are a somewhat strange case, as they reside under the https://api.wikimedia.org/ prefix, which is the routing prefix for cross-wiki APIs, but also the domain of the portal wiki. So the paths under the reserved prefixes /w/, /wiki/, and /api/ are not considered to be part of the https://api.wikimedia.org/ prefix. Instead, https://api.wikimedia.org/api/ is the default routing prefix for per-wiki APIs of the portal wiki.
This violates the "routing prefixes must be disjunct" requirement. Maybe we can make it a notable exception, and promise to never do it again? -- DKinzler (WMF) (talk) 20:36, 6 July 2023 (UTC)
https://api.wikimedia.org/core/v1/wikipedia/en/page/ is an endpoint in the cross-wiki core API.

https://api.wikimedia.org/api/rest_v1/page/html is an endpoint in the per-wiki API for the wiki at https://api.wikimedia.org/. It has nothing to do with the base URL of the default routing prefix, https://api.wikimedia.org/. Rather, its base URL is https://api.wikimedia.org/api/, using the per-wiki routing prefix /api/.
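The distinction can be sketched as a simple path check. The function name and the returned labels are invented for this illustration, and real routing may well be implemented differently.

```python
from urllib.parse import urlsplit

# Paths that belong to the portal wiki itself, per the reserved names.
PORTAL_WIKI_PATHS = ("/w/", "/wiki/", "/api/")


def classify_portal_url(url: str) -> str:
    """Say whether a URL on api.wikimedia.org belongs to the portal wiki
    or to the cross-wiki gateway (illustrative only)."""
    path = urlsplit(url).path
    if path.startswith(PORTAL_WIKI_PATHS):
        return "portal wiki (per-wiki)"
    return "gateway (cross-wiki)"


print(classify_portal_url("https://api.wikimedia.org/core/v1/wikipedia/en/page/"))
print(classify_portal_url("https://api.wikimedia.org/api/rest_v1/page/html"))
```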

Wikimedia Enterprise API (api.enterprise.wikimedia.com)

An example of a dedicated routing prefix (domain name) is the Wikimedia Enterprise gateway, as used in https://api.enterprise.wikimedia.com/v2/snapshots. The routing prefix is api.enterprise. This does not quite match the recommendation of the URL Guidelines: the name should be enterprise-api.

If we were to introduce per-wiki Wikimedia Enterprise endpoints, these could then be located under a dedicated routing prefix, such as /enterprise-api/:

https://en.wikipedia.org/enterprise-api/v3/snapshots/...

Content Translation API (cxserver.wikimedia.org)

The Content Translation API uses a dedicated routing prefix (domain name) to expose a cross-wiki API:

https://cxserver.wikimedia.org/v2/page/{sourcelanguage}/{targetlanguage}/{title}

This API does not quite match the recommendations of the API URL Guidelines: The prefix does not end in "-api".

Attic

The component's routing prefix (with the {path} part empty) SHOULD return documentation about the API it exposes.
Out of scope? -- DKinzler (WMF) (talk) 20:29, 10 July 2023 (UTC)
MediaWiki extensions and stand-alone backend services SHOULD each have a separate prefix. They may share a prefix with another service or extension only if both are maintained by the same team.
Out of scope? -- DKinzler (WMF) (talk) 20:29, 10 July 2023 (UTC)
Components should be deprecated as a whole. If a new version of a component is made available, it should contain all endpoints, including those that are unchanged from the previous version.
Out of scope? -- DKinzler (WMF) (talk) 20:29, 10 July 2023 (UTC)

How Others Do Versioning and Deprecation

GitHub

Amazon

Google

Meta (Facebook)

Twitter

Atlassian (Jira)

Further Reading