Topic on API talk:REST API

Doc review: conditional requests

3
Summary by APaskulin (WMF)

Docs updated

APaskulin (WMF) (talkcontribs)

Hi @NNikkhoui (WMF), Here's the section about conditional requests that I'd like to add to this doc once the page endpoints are moved into v1. Please review! (PS: Looks like the JSON spacing and syntax highlighting are weird in this preview, so disregard that.)

Conditional requests

Most MediaWiki REST API endpoints support conditional GET requests using ETags. ETags, short for entity tags, are unique identifiers assigned to different versions of the same resource. You can use an ETag value with the If-None-Match header to optimize your API calls when accessing the same resource multiple times.

Here's an example of an ETag returned by the get HTML endpoint:

# Retrieve information about the Pinnation page on English Wikipedia
$ curl --include https://en.wikipedia.org/w/rest.php/v1/page/Pinnation/bare

HTTP 200
content-type: application/json
etag: W/"917562775"
{
"id": 339742,
"key": "Pinnation",
"title": "Pinnation",
"latest": {
"id": 917562775,
"timestamp": "2019-09-24T11:43:46Z"
},
"content_model": "wikitext",
"license": {
"url": "//creativecommons.org/licenses/by-sa/3.0/",
"title": "Creative Commons Attribution-Share Alike 3.0"
},
"html_url": "https://en.wikipedia.org/w/rest.php/v1/page/Pinnation/html"
}

A conditional request allows you to shortcut subsequent requests for the same resource by comparing the ETag value provided with the If-None-Match header. If the resource has not changed since the last request, the API returns HTTP status 304 (Not Modified). If the resource has changed, the API returns HTTP status 200 (OK) with the latest data and a new ETag.

# Conditional request using If-None-Match
curl --include \
--header 'If-None-Match: W/"917562775"' \
https://en.wikipedia.org/w/rest.php/v1/page/Pinnation/bare

# Response: resource unchanged
HTTP 304
content-type: application/json
etag: W/"917562775"

# Response: resource changed
HTTP 200
content-type: application/json
etag: W/"537558444"
{
"id": 339742,
"key": "Pinnation",
"title": "Pinnation",
"latest": {
"id": 537558444,
"timestamp": "2020-01-25T20:03:40Z"
},
"content_model": "wikitext",
"license": {
"url": "//creativecommons.org/licenses/by-sa/3.0/",
"title": "Creative Commons Attribution-Share Alike 3.0"
},
"html_url": "https://en.wikipedia.org/w/rest.php/v1/page/Pinnation/html"
}

Conditional requests with If-Modified-Since

When present, ETags and If-None-Match headers are the preferred method for conditional requests. However, to improve API efficiency in certain cases, some endpoints don't support ETags. To make a conditional request to an endpoint that doesn't return an ETag, use the last-modified value with the If-Modified-Since header.

Here's an example of an endpoint that returns only a last-modified value:

# Get the number of edits to the Pinnation page on English Wikipedia
$ curl --include https://en.wikipedia.org/w/rest.php/v1/page/Pinnation/history/counts/edits

HTTP 200
content-type: application/json
last-modified: Tue, 24 Sep 2019 11:43:46 GMT
{"count":115,"limit":false}

To make a conditional request, include the last-modified value with the If-Modified-Since header. If the resource has not changed since the last request, the API returns HTTP status 304 (Not Modified). If the resource has changed, the API returns HTTP status 200 (OK) with the latest data and a new last-modified timestamp.

# Conditional request using If-Modified-Since
curl --include \
--header 'If-Modified-Since: Tue, 24 Sep 2019 11:43:46 GMT' \
https://en.wikipedia.org/w/rest.php/v1/page/Pinnation/history/counts/edits

# Response: resource unchanged
HTTP 304
content-type: application/json
last-modified: Tue, 24 Sep 2019 11:43:46 GMT

# Response: resource changed
HTTP 200
content-type: application/json
last-modified: Mon, 11 Jan 2020 06:51:35 GMT
{"count":117,"limit":false}
NNikkhoui (WMF) (talkcontribs)

It looks great! A few possible things to consider mentioning.

  • You could mentioned "max-age" header. "max-age" is a response header set set by the API, and without it none of the other conditional request headers will kick in. Max-age tells the browser to cache the response for X number of seconds, and then after that the other headers you mentioned above will kick in. The default max-age right now is supposed to be 60 seconds (according to some comments on https://phabricator.wikimedia.org/T238374) but i think some endpoints like PageSourceHandler have it set to 5 sec. For that, maybe its worth noting here the default value and then on a per-endpoint basis you could mention "This endpoint is an exception with max-age of X sec"?
  • We return 412 HTTP response sometimes too, if there was something wrong with the precondition e.g. you send an "If-Unmodified-Since" header in a POST and the last modified time of the resource you are changing is greater than the "If-Unmodified-Since" timestamp.
  • The "If-Match" and "If-Unmodified-Since" headers
    • The ConditionalHeaderUtil class supports these, so should be worth noting i think
  • "However, to improve API efficiency in certain cases" - I think the wording on this one could be ever so slightly more specific to say what exactly is more efficient? You could say that Etag are not used when the level of effort in determining the Etag is seen as more costly than the benefit it provides. For example, In the PageHistoryCountHandler, we would need to hit the database to get visibility settings of the page the user is searching for and work that value into the Etag. Since that requires a database hit, and the payload that comes back is pretty small (we're not saving much on sending data across the wire) it didn't seem worth it to use Etags in that case.


APaskulin (WMF) (talkcontribs)

Thank you! I've published an updated version to API:REST API/Conditional requests. Edits welcome!


Regarding max-age and database efficiency considerations, I was thinking that it might be helpful to create a REST API implementers' guide that includes this type of information. That would help us separate docs for people who want to use the API as opposed to people who are interested in the inner workings of the API. What do you think?