Core Platform Team/Initiative/Core REST API in Mediawiki/Design principles

These are some design principles we’re using for the MediaWiki REST API. They are not algorithmic, and some may conflict.

= Conceptual =


 * The developer comes first. The more we can make developers successful, the more they will use our API and empower their users. Client developers are closer to their users and can determine their needs better than we can.
 * Wikimedia Engineering Architecture Principles. We comply with the architecture principles, especially those related to APIs.
 * Middle of the pack. We want to have a familiar API experience that developers are used to and feel comfortable with. Design principles that are widely used by commercial and Open Source APIs should be used in our API. We should break no new ground, nor should we be way behind.
 * Guessable interface. If you know that a revision is at /revision/ and a page is available at /page/ you should be able to guess that a user will be at /user/.
 * The API should be specific for MediaWiki. API concepts and data types should be recognizable to users from the Web and mobile interfaces; for example, pages, users, and revisions. If we generalize past MediaWiki concepts to “digital assets” or “entities” or “agents” we have gone too far, and need to bring things back down to the specific.
 * You aren’t going to need it. We keep things simple and do the base case first. If we get client requests for more obscure variants, we work on those later.
 * Easy things should be easy, and hard things should be possible. We should design to make the mainline case obvious, and extra functionality possible.
 * Leave space for change. If we’re lucky, we will have a lot of people use this API. We will almost definitely change our ideas based on that usage. We should leave space in the design to allow those changes to happen. Versioning strategy should reflect this.
 * "A foolish consistency is the hobgoblin of little minds." (Ralph Waldo Emerson) Consistency is valuable when it helps our client developers guess at how the API works. Consistency is not an end in itself. We won’t force unrelated interfaces or data structures to be identical if it makes them harder to use.

= Schedule =


 * Get the minimum done first. We’re focusing on CRUD + Search + a little more in our first epic.
 * Reader stories before Contributor stories before Curator before Administrator. In terms of the personas in our user stories, we try to get the functionality for the broadest category of user done first, and narrower categories later. Developers looking at our API will be looking for functionality that they see in the Wikipedia Web interface. If the basics are not there, the API will seem incomplete, and they'll skip over it. We need to get the basic stuff covered early, and then we can elaborate to more specific activity. Especially if we are going to grow this API over a year or more (which seems likely), having the base functionality done first is going to get us the most testing from developers.
 * Functionality for new internal client development jumps the queue. Our client developers in Product need new APIs all the time. We don’t want them to have to duplicate endpoints or functionality that we’re already planning for the MediaWiki REST API. To keep them on their schedules, if we can, we will make new epics to support their work, even if it was originally scheduled for later.

= Identifiers =


 * /{type}/{id} for URLs.
 * /{type}/{id}/{property} for larger, more expensive, less-frequently-used properties, or properties with different access rights. For example, a page history, a page edit count, or user settings.
 * /{type}/{id}/{property}/{property id} if a sub-property’s identifier is not global. If the identifier is global, use a top-level type and ID URL. This format would be good for slots of a page, for example: /page/France/slot/main.
 * Use identifiers that developers have easy access to. For example, page title or user name.
 * One unique identifier should be enough to get the data you want. If the client developer has a unique identifier available, they shouldn’t have to put together a lot of other data to form a URL. So, if they have a revision ID 12345, they should be able to GET /revision/12345, instead of digging around somewhere for a page title to compose /page/France/revision/12345.
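
The URL patterns above can be sketched as a few small helpers. This is a minimal illustration, not the API's actual routing code: the function names are hypothetical, the "history" property name is assumed from the page history example, and percent-encoding every path segment is an assumption about how titles containing slashes would be handled.

```python
from urllib.parse import quote


def _segment(value) -> str:
    # Percent-encode one path segment so that characters such as "/" in a
    # page title cannot be mistaken for a path separator (assumed behavior).
    return quote(str(value), safe="")


def resource_url(type_name: str, identifier) -> str:
    """Top-level URL: a type followed by one unique identifier."""
    return f"/{_segment(type_name)}/{_segment(identifier)}"


def property_url(type_name: str, identifier, prop: str) -> str:
    """URL for a larger or more expensive property of a resource."""
    return f"{resource_url(type_name, identifier)}/{_segment(prop)}"


def sub_property_url(type_name: str, identifier, prop: str, prop_id) -> str:
    """URL for a sub-property whose identifier is only unique within its parent."""
    return f"{property_url(type_name, identifier, prop)}/{_segment(prop_id)}"


if __name__ == "__main__":
    print(resource_url("revision", 12345))
    print(property_url("page", "France", "history"))
    print(sub_property_url("page", "France", "slot", "main"))
```

Note that the revision helper needs only the revision ID, which matches the principle that one unique identifier should be enough to form the URL.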

= Operations =


 * CRUD using HTTP methods. Create, read, update, and delete map to POST, GET, PUT, and DELETE.
 * Support PUT to create if possible. In cases where the ID of a data type is not auto-generated, we should also support PUT to create.
 * Non-CRUD operations use POST /{type}/{id}/{verb}. For example, to protect a page, POST /page/{title}/protect.
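
As a rough sketch of how these operation rules combine, the helpers below pick an HTTP method and path for each kind of request. The function names are hypothetical, and two details are assumptions rather than statements from this page: the conventional POST/GET/PUT/DELETE mapping for CRUD, and creating with POST on the bare type URL when the server generates the ID.

```python
from typing import Optional, Tuple

# Assumed conventional mapping of CRUD operations to HTTP methods.
CRUD_METHODS = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def crud_request(operation: str, type_name: str,
                 identifier: Optional[str] = None) -> Tuple[str, str]:
    """Return the (method, path) pair for a CRUD operation."""
    if operation == "create":
        if identifier is None:
            # The server generates the ID, so there is no ID in the URL yet.
            return "POST", f"/{type_name}"
        # The client already knows the ID (for example a page title),
        # so PUT to the final URL can create the resource.
        return "PUT", f"/{type_name}/{identifier}"
    if identifier is None:
        raise ValueError(f"{operation} needs an identifier")
    return CRUD_METHODS[operation], f"/{type_name}/{identifier}"


def action_request(type_name: str, identifier: str, verb: str) -> Tuple[str, str]:
    """Return the (method, path) pair for a non-CRUD operation."""
    return "POST", f"/{type_name}/{identifier}/{verb}"


if __name__ == "__main__":
    print(crud_request("read", "revision", "12345"))
    print(crud_request("create", "page", "France"))
    print(action_request("page", "France", "protect"))
```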

= Data types =


 * (Almost) always use JSON for output. If there are formats like HTML or wikitext in output, wrap them in a JSON wrapper.
 * Support streaming HTML. For large documents, it makes sense to provide streaming HTML endpoints so that browsers or native HTML widgets can render output to the user faster. Some developers are not fetching page content to display it to a user online, so we should also provide simple JSON-wrapped alternatives.
 * Use Parsoid-style reversible HTML. We should output only Parsoid marked-up HTML.
 * Use JSON for input. POST and PUT bodies should be JSON.
 * Metadata should be in JSON properties. Not HTTP headers.
 * The result is the thing. As much as possible, we’re trying to map a URL to a thing. A revision is at /revision/{id}, so client code should be able to GET that URL and treat the response body as the revision itself.
 * Object composition. For properties in a result that are related, cluster them into a sub-object. For example, the user ID and name of the author of a revision.
 * Pre-chew the food. As an extension of the spirit of the API/FORMAT requirement, properties should not require additional processing on the client side to be useful. So, use an array rather than a string with comma-separated values.
 * Results should be objects. Numbers, strings, and arrays should be wrapped as JSON objects, per the OWASP recommendation, which protects older browsers from JSON hijacking of top-level arrays. (Remember, strings should be pre-chewed.)
 * Re-use schemas. The same property names and types should be used in different endpoints. For example, the properties of a revision item in a page history endpoint should be a subset of the properties in the revision endpoint.
 * Empty properties should have the value null. If a property has no value, we should still include the property name in the JSON output, with the value null.
 * Re-use segmented result set structure. A segmented (“paged”) result set should have URLs for fetching the next and previous segment, as well as the contents of the current segment.
 * Keep it light. We should include just enough data in a result to make it useful for “most” developers. But at WMF scale, even a few extra bytes can add up over time. Shift heavy or costly properties to their own URL. Even if a URL maps conceptually to a database row or MediaWiki internal class, we don’t need to dump the entire contents of that row or object out through the API.
 * Balance number of requests vs. size of the output. It takes a light touch. Neither principle is absolute.
 * Provide URLs. Where possible, provide the URL for a referenced object. For example, give the (API) URL for a page when referring to it from another page.
 * Snake-case for multi-word properties, not camel-case.
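
To make several of these data-type rules concrete, here is a minimal sketch of a revision result and a segmented result set. Every property name, the query parameters, and the URL layout are hypothetical and chosen only for illustration; the real endpoints may differ.

```python
import json
from urllib.parse import quote


def revision_result(rev_id, page_title, user_id, user_name, tags_csv, comment):
    """Build a JSON-ready revision object (property names are illustrative)."""
    return {
        "id": rev_id,
        # Provide URLs: include the API URL of the page this revision belongs to.
        "page": {
            "title": page_title,
            "url": "/page/" + quote(page_title, safe=""),
        },
        # Object composition: related author properties live in one sub-object.
        "user": {"id": user_id, "name": user_name},
        # Pre-chew the food: an array, not a comma-separated string.
        "tags": [tag for tag in tags_csv.split(",") if tag],
        # Empty properties stay in the output; None serializes to JSON null.
        # Snake-case is used for the multi-word property name.
        "edit_comment": comment,
    }


def segment(items, offset, limit, base_url):
    """Wrap one slice of a list with URLs for the neighboring segments."""
    has_next = offset + limit < len(items)
    has_previous = offset > 0
    return {
        # Results should be objects: the array is wrapped, never top-level.
        "items": items[offset:offset + limit],
        "next": f"{base_url}?offset={offset + limit}&limit={limit}" if has_next else None,
        "previous": f"{base_url}?offset={max(offset - limit, 0)}&limit={limit}" if has_previous else None,
    }


if __name__ == "__main__":
    rev = revision_result(12345, "France", 7, "Alice", "mobile,visual", None)
    print(json.dumps(rev, indent=2))
    print(json.dumps(segment(list(range(5)), 2, 2, "/page/France/history"), indent=2))
```

The segment helper reuses the same three-property structure for every list, so a client that can walk one segmented result set can walk them all.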

= Headers =


 * Headers are for programs, not programmers. If metadata is important for developers to include in code, it should be in the JSON structures, not in HTTP headers. HTTP headers should be useful at a lower level, like for HTTP client libraries or browser implementations.
 * Support client-side caching soon but not immediately. Client-side caching headers are a good first optimization. Including them in the first version of our endpoints is probably premature optimization.

= Extensions =


 * Let extensions expose extension functionality. We shouldn’t include extension functionality in core API code. The extensions should be able to expose those API endpoints themselves.
 * Set a good example. Our core code should set a good example for how extension API endpoints should work.
 * Extensions are on a different schedule. We should allow different versioning and release schedules for extensions.

= Errors =


 * HTTP codes for errors. We should use HTTP status codes in the 4xx or 5xx block for errors.
 * Empty results are not errors. An empty list or map should not have a 404 status code.
 * Error HTTP bodies are JSON. RFC 7807 has a good format for error bodies.
 * Error messages should be machine-readable and human-readable. RFC 7807 has a "detail" property that is good for human-readable content.
 * Error messages should be localized for the end user.
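
The error rules above might look like this in practice. The member names (type, title, status, detail) and the application/problem+json media type come from RFC 7807; the helper names, the example.org type URI, the "items" property, and the French message are hypothetical, and the caller is assumed to have already localized the detail text.

```python
import json


def problem_response(status, type_uri, title, detail):
    """Build an RFC 7807 style error response as (status, headers, body)."""
    body = {
        "type": type_uri,   # machine-readable identifier for the error class
        "title": title,     # short summary of the error class
        "status": status,   # repeats the HTTP status code in the body
        "detail": detail,   # human-readable, localized explanation
    }
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body, ensure_ascii=False)


def list_response(items):
    """An empty list is still a successful result, so the status stays 200."""
    headers = {"Content-Type": "application/json"}
    return 200, headers, json.dumps({"items": items})


if __name__ == "__main__":
    print(problem_response(
        404,
        "https://example.org/problems/page-not-found",  # hypothetical type URI
        "Page not found",
        "Aucune page ne porte le titre « Frnace ».",
    ))
    print(list_response([]))
```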