Wikimedia Apps/Team/RESTBase services for apps

The Reading Infrastructure team is developing a Node.js mobile content service backed by RESTBase to provide content in a form tailored to the needs of the mobile platforms.


 * Sample URL: https://en.wikipedia.org/api/rest_v1/page/mobile-sections-lead/Albert%20Einstein
 * OpenAPI/Swagger documentation: Mobile section of RESTBase, only MCS (on wmflabs instance)
 * Phabricator project: #mobile_content_service, shortcut: #mcs

General ideas & goals
The idea for the services mentioned here is to provide a layer of abstraction on top of various MediaWiki action API and existing RESTBase requests, custom-made for consumption by apps. In other words, they provide a Façade which makes it easy for apps to consume content from Wikipedia. The initial main goal is to improve page load performance.

We want to achieve that through the following approaches:
 * Standard endpoint structure instead of dealing with many query parameters that can be arranged differently; less cache fragmentation.
 * Reduce amount of payload by removing unneeded content.
 * Reduce the need for separate requests by aggregating information from multiple requests into fewer requests.
 * Flatten and trim JSON structures. (Again, remove unused data.)


 * Take advantage of Parsoid annotations to improve the quality of the transformations done.
 * Move DOM transformations of page content (currently done client-side) to the server.

Service usage
The service endpoints are used by the Android app. Android app users get to use them by default except for usages of zhwiki or when  is disabled in the app settings. In those two cases it falls back to using regular api.php endpoints, and some newer features which are only implemented for RESTBase users are automatically disabled. In the app developer settings you can check if RESTBase is enabled and change that if necessary.

The Wikipedia Android app uses this service to get an article's opening section, table of contents, description, lead image URL, and other article information in a single request, followed by another request for the remaining sections. Other endpoints are used to fetch content for link previews and term definitions. The Android app also uses the smart random and the aggregated feed endpoint to retrieve the data needed for the cards of the Explore feed which are not user specific.
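As a rough illustration of the two-request page load described above, the following Python sketch builds the lead and remaining URLs for a title. The production prefix is taken from the sample URL at the top of this page; the helper name `mobile_sections_urls` is made up for this example, and the sketch assumes the remaining route lives under the same prefix as the lead route.

```python
from urllib.parse import quote

# Prefix taken from the sample URL on this page (assumed to apply to both routes).
PROD_BASE = "https://en.wikipedia.org/api/rest_v1"


def mobile_sections_urls(title, base=PROD_BASE):
    """Return the (lead, remaining) request URLs for an article title.

    Hypothetical helper for illustration only; not part of any app code base.
    """
    # Percent-encode the title. safe="" also encodes "/" so that a title
    # containing a slash stays within a single path segment.
    encoded = quote(title, safe="")
    lead = f"{base}/page/mobile-sections-lead/{encoded}"
    remaining = f"{base}/page/mobile-sections-remaining/{encoded}"
    return lead, remaining


if __name__ == "__main__":
    for url in mobile_sections_urls("Albert Einstein"):
        print(url)
```

A client would request the first URL to render the opening section quickly and then request the second URL for the rest of the article.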

The iOS app team is working on using the aggregated feed endpoint for the iOS app. The web team is considering using RB/MCS in the future.

Routes
Production routes start with.

Local dev box routes start with (for MCS directly) or  (for RB).

/?doc (Swagger UI)
You can access the Swagger UI at the /?doc route. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS

.../page/mobile-sections-remaining/{title}
These three routes are used by the beta Android app when the  developer option is enabled (which is now enabled by default).

The output has a similar JSON structure to the PHP  module, except:
 * has a top-level object with two properties:  and  . This is an endpoint which gets the contents of the next two endpoints in one single request, which is useful for refreshing saved pages.
 * Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * Is used for the initial page load.
 * Instead of the  object to get the URL for the lead image it has a   property under  . This object contains a hashtable that maps common lead image widths in pixels (640, 800, 1024) to the URL of the lead image at that width.
 * If the article has a pronunciation, the  object has a   string with the fully qualified URL to the pronunciation file.
 * If the article uses one of the  templates, the   object has a   array with the fully qualified URLs to the parts of a recorded audio version of this article.
 * If there are Geo coordinates associated with the article then the  object will have the   and   of the place.
 * The  array includes the information needed to display the lead section and also to build the table of contents. Therefore, it has the section text of the lead section only and the rest of the sections don't include it.
 * Examples: Prod | Beta cluster | Labs | Local RB | Local MCS


 * Note that this route's  array does not include the lead section text since this was already retrieved as part of the lead response.
 * Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * For debugging: The HTML content comes from Parsoid. To make it more convenient to debug transformations at a high level here are the respective examples from Parsoid:
 * Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * For comparison, here are the equivalent  requests the Android app uses:
 * Examples: Lead | Remaining

Page Content Service
See Page Content Service. Eventually these endpoints will be separated to a different deploy package.

.../page/definition/{title}
This route provides a set of definitions pulled from the Wiktionary page for the term. (It does not provide the Wiktionary content in full.)

Currently used in the Wikipedia Beta Android app, where users can view a popup with definitions by highlighting a word in the app and choosing the "define" option from the context menu.

Available for English Wiktionary only; rollout to other languages pending based on user engagement.

Note: this endpoint is not for Wikipedia sites, only for Wiktionary.

Examples: Prod | Beta cluster (MCS and Parsoid not set up for some reason) | Labs | Local RB | Local MCS (Wiktionary entry of bar)

.../page/random/{format}
MCS provides the  format. All other formats ( and  ) are provided by RESTBase. See T132597 (Agree on feed endpoints).

This endpoint tries to provide more interesting pages in its result than a straight random MW API query. It prefers pages with a lead image, a Wikidata description, and a longer text extract.

Examples:
 * : Prod | Beta cluster | Labs | Local RB | Local MCS
 * : Prod | Beta cluster | Labs | Local RB | Local MCS
 * : Prod | Beta cluster | Labs | Local RB | Local MCS

.../feed/announcements
This endpoint is meant to provide information about surveys and fundraising announcements for the iOS and Android apps only. It is experimental to the extent that it might significantly change or even go away in the future, more likely than other experimental endpoints. Clients should be coded very defensively with regard to the structure and the existence of this endpoint. If a client gets 404s, an exponential backoff strategy may be advisable.
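One possible shape for such a backoff schedule is sketched below in Python. The function name and all default values (starting delay, factor, cap, number of attempts) are assumptions chosen for illustration; nothing here is prescribed by the service.

```python
def backoff_delays(base_seconds=1.0, factor=2.0, max_seconds=300.0, attempts=6):
    """Return the wait time (in seconds) before each retry.

    Each delay is `factor` times the previous one, capped at `max_seconds`.
    All defaults are illustrative assumptions, not values used by the apps.
    """
    delays = []
    delay = base_seconds
    for _ in range(attempts):
        delays.append(min(delay, max_seconds))
        delay *= factor
    return delays


if __name__ == "__main__":
    # A client would sleep for each delay in turn after receiving a 404,
    # and give up (or wait until the next app start) once the list is exhausted.
    print(backoff_delays(base_seconds=1, factor=2, max_seconds=10, attempts=5))
```

In practice a client might also add random jitter to each delay so that many devices do not retry at the same moment.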

Examples: Prod | Beta cluster | Labs | Local RB | Local MCS

See the announcement config spec: Wikimedia_Apps/Team/RESTBase_services_for_apps/Feed_announcement_config_spec

.../feed/featured/{yyyy}/{mm}/{dd}
This endpoint provides an aggregation of feed related microservices for one specific day. Note that year has to be exactly four digits, and month and day have to be two digits. Pad with 0 if needed. Earliest year supported is 2016. Example: 2016/07/01.
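The padding rules above can be sketched in Python as follows. The helper name `featured_feed_path` is made up for this example, and it returns only the route suffix, not a full URL.

```python
from datetime import date

EARLIEST_YEAR = 2016  # earliest year supported, as stated above


def featured_feed_path(day):
    """Build the featured feed route suffix for a given date.

    Hypothetical helper: pads the month and day to two digits and the
    year to four digits.
    """
    if day.year < EARLIEST_YEAR:
        raise ValueError(f"earliest supported year is {EARLIEST_YEAR}")
    return f"/feed/featured/{day.year:04d}/{day.month:02d}/{day.day:02d}"


if __name__ == "__main__":
    print(featured_feed_path(date(2016, 7, 1)))
```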

The response contains the following properties:
 * : featured article (WP languages supported: bg, cs, de, el, en, fa, fr, he, hu, ja, la, no, ur, vi)
 * : featured image of the day (from Wikimedia Commons)
 * : a list of the previous day's top read articles
 * : current news, irrespective of the day requested. This item is only available for a few wikis right now: da, de, el, en, es, fi, fr, he, ko, no, pl, pt, ru, sv, vi. See the latest list and implementation if you want to help us expand it to more languages.

While the other feed microservices are implemented in MCS, they are not exposed via RESTBase at this time. Some example URIs to invoke the microservices locally are in the README.md of the source repo.

Examples: Prod | Beta cluster | Labs | Local RB | Local MCS (Aggregated feed for February 6th, 2017)

For debugging: Local MCS routes of microservices: tfa | image | mostread | news (news in MCS directly is always current, not easy to get to historic content, recent versions of the aggregated RESTBase endpoint try to preserve historical news as much as possible)

.../feed/onthisday/{type}/{mm}/{dd}
This endpoint provides information about events which happened on a specific day and month of the year. Note that month and day have to be two digits. Pad with 0 if needed. Example: selected/07/01. Supported types of events and some examples:
 * all: all of the following. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * selected: a list of a few selected anniversaries which happen on the provided day and month; often the entries are curated for the current year. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * births: a list of birthdays which happened on the provided day and month. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * deaths: a list of deaths which happened on the provided day and month. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * holidays: a list of fixed holidays celebrated on the provided day and month. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS
 * events: a list of significant events which happened on the provided day and month and which are not covered by the other types yet. Examples: Prod | Beta cluster | Labs | Local RB | Local MCS

Supported languages: ar, de, en, es, fr, pt, ru, sv. See the entries in code for the latest list.
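As a small sketch of how a client might validate its input before calling this endpoint, the Python snippet below checks the event type and language against the lists on this page and pads the month and day. The helper name and the idea of checking the language client-side are assumptions for illustration; the language list is a snapshot and may be out of date.

```python
# Event types and languages as listed on this page (snapshot, may be outdated).
ON_THIS_DAY_TYPES = ("all", "selected", "births", "deaths", "holidays", "events")
ON_THIS_DAY_LANGS = ("ar", "de", "en", "es", "fr", "pt", "ru", "sv")


def on_this_day_path(event_type, month, day, lang="en"):
    """Build the onthisday route suffix (hypothetical helper).

    Raises ValueError for an unknown type, unsupported language,
    or an out-of-range month or day.
    """
    if event_type not in ON_THIS_DAY_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    if lang not in ON_THIS_DAY_LANGS:
        raise ValueError(f"language not supported: {lang!r}")
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month or day out of range")
    # Month and day must be exactly two digits, so pad with 0.
    return f"/feed/onthisday/{event_type}/{month:02d}/{day:02d}"


if __name__ == "__main__":
    print(on_this_day_path("selected", 7, 1))
```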

Route usage
We have a RESTBase dashboard in Grafana which shows request rates for all individual endpoints. You can choose all the endpoints related to mobile on that graph to get the metrics of how many client requests actually hit RESTBase. The requests are split into several categories:
 * internal - the request came from the WMF cluster or Labs
 * internal_update - it's an update request from Change-Propagation
 * external - the request came from an external user.

However, for external requests this represents only the cache misses, while the vast majority of the requests are served by Varnish.

There's also a Grafana dashboard specifically for the mobile-sections requests.

Source
The services are in the following Gerrit repos:
 * 1) source: (Gerrit) (diffusion) (GitHub mirror)
 * 2) deploy: (Gerrit) (diffusion) (GitHub mirror)

The second repo is for deployment purposes. The first repo contains the implementation of the service routes. The source repo is based on the ServiceTemplateNode project provided by the Services team.

The Swagger spec can be found in the source repo in the file. This spec must be updated when the output structure is changed since there are automated tests which verify that the output adheres to the spec.

Some settings for IntelliJ are documented for your convenience. You can also use other text editors or IDEs of course. Still, the discussion about run/debug configurations might be applicable for other IDEs, too.

Testing with the Android app
The Android app has a Developer settings screen which lets you change the backends used for both MW API calls and RESTBase/MCS calls. If you have a developer flavor of the Android apk then the Developer settings are already enabled. For other flavors you need to enable the Developer settings first. To do so go to,  , then tap seven times on the logo. Then you should get a Snackbar saying "You are now a developer!". Once enabled you can tap on the new icon in the top right of the toolbar of the Settings screen. For MCS development you'll want to change the  to use one of the options listed below. The dialog has example URIs in the hints which can be copied and pasted into the text input field.
 * Prod: use the production RESTBase servers (the default)
 * Beta cluster: there is no specific entry for it, but you can get it to work using the following settings:
 * RESTBaseUriFormat (default):
 * mediaWikiBaseUriSupportsLangCode (default): enabled
 * mediaWikiBaseUri (nondefault):
 * Labs: use the labs service
 * Dev: use a local developer machine which runs MCS and possibly RESTBase, too. Note that where it says host you want to replace that with the actual hostname or IP address of the host that's running your MCS services. Some of the features (link previews, aka. summary endpoint, and the aggregated feed) that are implemented directly in RESTBase don't work unless you are pointing to a local RESTBase installation. To use a local RESTBase installation change the port from the MCS port 6927 to the RB port 7231.
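For manual testing against a dev machine, the port switch described above can be captured in a small Python helper. This is only a sketch: the helper name is made up, and the `/{domain}/v1` path layout is inferred from the local test URIs shown elsewhere on this page, not from the app's URI format setting.

```python
MCS_PORT = 6927  # MCS port mentioned above
RB_PORT = 7231   # local RESTBase port mentioned above


def dev_base_uri(host, use_restbase=False, domain="en.wikipedia.org"):
    """Return the base URI of a local MCS or RESTBase instance.

    Hypothetical helper; the path layout is an assumption based on the
    local test URIs on this page.
    """
    port = RB_PORT if use_restbase else MCS_PORT
    return f"http://{host}:{port}/{domain}/v1"


if __name__ == "__main__":
    # Talk to MCS directly:
    print(dev_base_uri("localhost") + "/page/mobile-sections/Foobar")
    # Go through a local RESTBase (needed for summary and aggregated feed):
    print(dev_base_uri("localhost", use_restbase=True) + "/page/mobile-sections/Foobar")
```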

Setting up a local RESTBase instance
Since there are some interactions between RESTBase and MCS which make it desirable to also run RB locally sometimes (featured feed; hydration of summary data), here are some hints on how to configure RB to work with MCS.


 * 1) Clone RB
 * 2) In your new , change all occurrences of   to
 * 3) Under , change the host value for   from   to
 * 4) You may also want to consider adding the following at the top of the file to run RB in a debugger:
 * 5) Start MCS (if it isn't already running)
 * 6) Start RESTBase with   or   in the restbase root directory
 * 7) RESTBase listens on port   by default.
 * 8) Test URI: http://localhost:7231/en.wikipedia.org/v1/page/mobile-sections/Cassini%E2%80%93Huygens

Setting up a local Parsoid instance
To test new Parsoid patches:


 * 1) Clone Parsoid repo
 * 2) In your new , you may want to edit the   section. For example to hook it up with a few production Wikipedias:  You may also want to consider adding the following at the top of the file to run Parsoid in a debugger:
 * 3) Start Parsoid with   in the Parsoid root directory.
 * 4) Test URI example (note the  ): http://localhost:8000/en.wikipedia.org/v3/page/html/Foobar/798652007
 * 5) Now it's time to update the MCS config file.
 * 6) * If you want MCS to talk directly to Parsoid instead of going through RESTBase, then use these settings: change the uri in  of MCS' config.dev.yaml
 * 7) * If you want to go through a local RB instance, then use instead:
 * 8) Start MCS with   in the MCS folder
 * 9) Test URI: http://localhost:6927/en.wikipedia.org/v1/page/mobile-sections/Foobar/798652007

See also the Parsoid/Setup/RESTBase page.

Development on local machine
The README.md file in the repo has some great pointers on how to set up and use the service on a dev machine.

MW Vagrant
Enable the  role in MW Vagrant. The code is located under. To restart just the service without having to restart the whole Vagrant instance you can run: Since the Vagrant instance is self-contained you cannot access other servers. If you have a page called Foo in your Vagrant instance you can access it via the following command after sshing into the box: The log file is.

Deployment on labs machine
T91794 Deploy experimental version of mobile apps content service

The service on appservice.wmflabs.org is updated and restarted automatically a few minutes after code gets merged. Troubleshooting on labs machine:
 * Restart the service:
 * view logs:

Beta cluster
Similar to the beta instances on deployment-restbase0[12].deployment-prep.eqiad.wmflabs, there is now also an MCS instance, deployment-mcs01.deployment-prep.eqiad.wmflabs.

You can see examples for the various endpoints running in the Beta Cluster listed with each endpoint above. Here's just one example: https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/mobile-sections/Dog.

Setup notes
The service is deployed on several machines in Service Cluster B.

Deployment process
A description of the deployment process we follow.

Deployment schedule
Deployment calendar

Deployment logs
Every once in a while someone would like to know if patch XYZ has been deployed yet. Lately we note the deployment tag in the Phab task. In addition to that here are a couple of other options to find indications of that.
 * Look for mobileapps in the Server Admin Log (or directly on #wikimedia-operations) then look up the commit message of the mentioned SHA1 in the deploy repo. This option is great for better real-time notification that a new version of MCS got deployed to production.
 * Check out the tags in the source repo:  . This usually happens a bit later.