Wikimedia Apps/Team/RESTBase services for apps

The apps team has developed new RESTBase services based on node.js and various node modules. The Wikipedia Android app uses one of these to get an article's opening section, TOC, description, and lead image URL in one request.


 * Sample URL: https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Albert%20Einstein
 * API documentation: Mobile section of rest_v1 docs
 * Phabricator project: #mobile_content_service
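
As a rough sketch of how a client could assemble a URL like the sample above, the following Python snippet percent-encodes the title and appends it to the route. The helper name `build_page_url` and its parameters are made up for illustration. Only the URL layout is taken from the sample URL, and encoding slashes in titles is an assumption.

```python
from urllib.parse import quote


# Hypothetical helper; only the URL layout comes from the sample URL above.
def build_page_url(domain: str, title: str, route: str = "mobile-sections") -> str:
    # Assumption: the title must stay a single path segment, so "/" is
    # encoded as well (safe="" disables quote's default "/" exemption).
    encoded_title = quote(title, safe="")
    return f"https://{domain}/api/rest_v1/page/{route}/{encoded_title}"


print(build_page_url("en.wikipedia.org", "Albert Einstein"))
```
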

General ideas & goals
The idea for the services mentioned here is to provide a layer of abstraction on top of various MediaWiki api.php and other RESTBase requests, custom-made for consumption by apps. In other words, they provide a Façade which makes it easy for apps to consume content from Wikipedia. The initial main goal is to improve page load performance.

We want to achieve that through the following approaches:
 * Move DOM transformations of page content (currently done client-side) to the server.
 * Reduce the payload size by removing unneeded content and using Parsoid.
 * Reduce the need for separate requests by aggregating information from multiple requests into fewer requests.
 * Flatten and trim JSON structures. (Again, remove unused data.)

Another emerging/future goal is:
 * Take advantage of Parsoid annotations to improve the quality of the transformations done.
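
To illustrate what "flatten and trim" can mean in practice, here is a small Python sketch. It is not the service's implementation (the service itself is written for node.js). The input shape `RAW` and the selected fields are hypothetical, chosen only to show a nested structure being reduced to a flat object with empty values dropped.

```python
from typing import Any

# Hypothetical nested input, loosely modeled on a verbose API response.
RAW = {
    "query": {
        "pages": {
            "736": {
                "title": "Albert Einstein",
                "pageprops": {"wikibase_item": "Q937"},
                "contentmodel": "wikitext",  # not used by the client in this sketch
                "description": "",
            }
        }
    }
}


def flatten_and_trim(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a flat object holding only the fields the client uses."""
    # Take the first (here: only) page out of the nested wrapper objects.
    page = next(iter(raw["query"]["pages"].values()))
    flat = {
        "title": page.get("title"),
        "wikibase_item": page.get("pageprops", {}).get("wikibase_item"),
        "description": page.get("description"),
    }
    # Drop empty values so they are not sent to the client.
    return {key: value for key, value in flat.items() if value not in (None, "", [], {})}


print(flatten_and_trim(RAW))
```
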

Routes
All routes start with.

.../mobile-sections/{title}, .../mobile-sections-lead/{title}, .../mobile-sections-remaining/{title}
These three routes are going to be used by the alpha Android app when the Developer option  is enabled.

The Swagger spec can be found in the source repo. It is a good reference for the actual structure of the output. It also has to be updated whenever the structure changes significantly, since automated tests verify that the output adheres to the spec.

The output has a similar JSON structure to the PHP API, except:
 * .../mobile-sections/{title}: has a top-level object with two properties:  and  . This endpoint returns the contents of the next two endpoints in a single request, which is useful for refreshing saved pages.
 * .../mobile-sections-lead/{title}: is used for both the link preview and the initial page load.
 * Instead of the  object to get the URL for the lead image it has a   property under  . This object contains a hashtable of common lead image widths (640, 800, 1024), each pointing to the URL of the lead image at that width in pixels.
 * The  array is a JSON representation of the first infobox in the lead section.
 * The  object shows the text extracts we use for the link preview.
 * If the article has a pronunciation then the  object has a   string with the fully qualified URL to the pronunciation file.
 * If the article uses one of the  templates then the   object has a   array with the fully qualified URLs to the parts of a recorded audio version of this article.
 * If there are Geo coordinates associated with the article then the  object will have the   and   of the place.
 * The  array includes the information needed to display the lead section and also to build the table of contents. Therefore, it has the section text of the lead section only and the rest of the sections don't include it.
 * The  section contains a list of images and videos contained on the page and the metadata displayed in the app. Mainly needs license data. (It was originally used in the   response but since the link preview needs this info, too, it was moved to the   response.)
 * .../mobile-sections-remaining/{title}: Note that this route's  array does not include the lead section text since this was already retrieved as part of the lead response.
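
The following Python sketch shows how a client might work with two of the points above: picking a lead image URL from the width-to-URL hashtable, and combining the section list from the lead response with the section text from the remaining response. All field names used here (`id`, `text`, `line`) are placeholders for illustration and should be checked against the Swagger spec. Treating the width keys as strings is also an assumption.

```python
from typing import Optional


def pick_lead_image_url(urls: dict, target_width: int) -> Optional[str]:
    """Pick the smallest available width >= target_width, else the largest one."""
    if not urls:
        return None
    # JSON object keys are strings, so normalize the widths to integers.
    by_width = {int(width): url for width, url in urls.items()}
    candidates = [width for width in by_width if width >= target_width]
    chosen = min(candidates) if candidates else max(by_width)
    return by_width[chosen]


def merge_sections(lead_sections: list, remaining_sections: list) -> list:
    """Add section text from the remaining response to the lead's section list."""
    # Placeholder field names: "id" identifies a section, "text" holds its content.
    text_by_id = {section["id"]: section.get("text") for section in remaining_sections}
    merged = []
    for section in lead_sections:
        combined = dict(section)  # copy so the inputs are left untouched
        if "text" not in combined and section["id"] in text_by_id:
            combined["text"] = text_by_id[section["id"]]
        merged.append(combined)
    return merged
```

A client could call `pick_lead_image_url` with its screen width right after the lead response arrives, and call `merge_sections` once the remaining response has been loaded.
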

.../mobile-text/{title}
This route is meant for a new generation lite app; initially targeted for low-powered, older Android devices. The idea is, instead of using a WebView, to use native Android UI components to show the page contents.

This route currently just uses action=mobileview, but it will likely move to backend calls similar to those of the previous routes.

More at T90758.

Source
The services are in the following Gerrit repos:
 * 1) mediawiki/services/mobileapps
 * 2) mediawiki/services/mobileapps/deploy

The second repo is for deployment purposes. The first repo contains the implementation of the service routes. Both repos are based on the templates provided by our services team.

Development on local machine
The README.md file in the repo has some great pointers on how to set up and use the service on a dev machine.

Deployment on labs machine
T91794 Deploy experimental version of mobile apps content service

The service on appservice.wmflabs.org is updated and restarted automatically a few minutes after code gets merged. Here are some example endpoints:
 * http://appservice.wmflabs.org/en.m.wikipedia.org/v1/page/mobile-sections/Dilbert
 * http://appservice.wmflabs.org/en.m.wikipedia.org/v1/page/mobile-sections-lead/Dilbert
 * http://appservice.wmflabs.org/en.m.wikipedia.org/v1/page/mobile-sections-remaining/Dilbert

Troubleshooting on labs machine:
 * Restart the service:
 * View logs:

FYI: Beta cluster
This is more of an FYI since it concerns the RESTBase framework itself. There is a beta instance on deployment-restbase0[12].deployment-prep.eqiad.wmflabs. It would use the labs instance, appservice.wmflabs.org, to complete the mobile route requests.

Deployment on Production cluster
Some example endpoints:
 * https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Dilbert (everything; used to refresh saved pages)
 * https://en.wikipedia.org/api/rest_v1/page/mobile-sections-lead/Dilbert
 * https://en.wikipedia.org/api/rest_v1/page/mobile-sections-remaining/Dilbert
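
As a sketch of how the lead and remaining endpoints could be used together, the Python snippet below requests the lead first (so it can be shown right away) and the remaining sections afterwards. The function names, the `User-Agent` value, and the keys of the returned dictionary are assumptions made for this example. The fetcher is injectable so the request order can be checked without network access.

```python
import json
from typing import Callable
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the example endpoints above.
BASE = "https://en.wikipedia.org/api/rest_v1/page"


def fetch_json(url: str) -> dict:
    """Default fetcher; performs a real HTTP GET and parses the JSON body."""
    # Hypothetical User-Agent string; replace it with one identifying your client.
    request = Request(url, headers={"User-Agent": "example-client/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def load_page(title: str, fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Fetch the lead first, then the remaining sections."""
    encoded = quote(title, safe="")
    lead = fetch(f"{BASE}/mobile-sections-lead/{encoded}")
    remaining = fetch(f"{BASE}/mobile-sections-remaining/{encoded}")
    # The keys of this dictionary are this sketch's own choice.
    return {"lead": lead, "remaining": remaining}
```

Refreshing a saved page would instead need only a single request to the mobile-sections endpoint.
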

Endpoint Swagger docs: https://rest.wikimedia.org/en.wikipedia.org/v1/?doc#resource_Mobile

Documentation: Docker + Deployment notes from service template

Deployment procedures on SCB (Service Cluster B):
 * Note: The deployment preparation requires Docker to make sure the Node modules have the correct binaries. We've been exploring the option of using Boot2docker on Mac OS but ran into issues with the docker script. So, instead, I'm using Ubuntu 14.04 inside a VirtualBox VM. There is still an issue (T104304) with that, but it's progress compared to before.
 * Here are the detailed Ubuntu setup notes in case it helps someone.
 * Marko's deployment page

Deployments
Deployments log

Deployment calendar