Wikimedia Apps/Team/RESTBase services for apps

The apps team is in the process of developing and switching to new RESTBase services based on node.js and various node modules.

General ideas & goals
The idea for the services mentioned here is to provide a layer of abstraction on top of various MediaWiki api.php and other RESTBase requests, custom-made for consumption by apps. In other words, they provide a Façade which makes it easy for apps to consume content from Wikipedia. The initial main goal is to improve page load performance.

We want to achieve that through the following approaches:
 * Move DOM transformations of page content (currently done client-side) to the server.
 * Reduce the payload size by removing unneeded content and using Parsoid.
 * Reduce the need for separate requests by aggregating information from multiple requests into fewer requests.
 * Flatten and trim JSON structures. (Again, remove unused data.)

Another emerging/future goal is:
 * Take advantage of Parsoid annotations to improve the quality of the transformations done.
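As a rough illustration of the "flatten and trim" approach listed above, the sketch below unwraps a nested response and keeps only the fields a client would use. The input shape and every key name in it (`envelope`, `title`, `lastmodified`, `sections`, `id`, `text`) are made up for this example; they are not the service's actual schema.

```python
def flatten_and_trim(raw, keep=("title", "lastmodified")):
    """Return a flat dict with only the keys in `keep` plus trimmed sections.

    `raw` is a hypothetical nested response; all key names are illustrative.
    """
    inner = raw.get("envelope", {})  # unwrap the outer wrapper object
    flat = {key: inner[key] for key in keep if key in inner}
    # Keep only the per-section fields the client needs.
    flat["sections"] = [
        {"id": section["id"], "text": section["text"]}
        for section in inner.get("sections", [])
    ]
    return flat


raw = {
    "envelope": {
        "title": "Dilbert",
        "lastmodified": "2015-08-17T00:00:00Z",
        "unused": {"a": 1},
        "sections": [{"id": 0, "text": "<p>Lead</p>", "extra": True}],
    },
    "warnings": {},
}
trimmed = flatten_and_trim(raw)
```

The trimmed result drops the wrapper, the `warnings` object, and the unused per-section fields, which is the kind of payload reduction the goals describe.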

Phabricator project
The Phabricator project for this is: mobile_content_service

Routes
All routes start with.

.../mobile-html-sections-remaining/{title}
These three routes are going to be used by the alpha Android app when the experimental JSON page load strategy (a Developer option) is enabled.

The output has a JSON structure similar to that of the PHP API action=mobileview, except:
 * Combined route: has a top-level object with two properties.
 * Lead route: Instead of the object that action=mobileview uses for the lead image URL, it has a dedicated property containing a hashtable of common lead image widths (640, 800, 1024), each pointing to the URL of the lead image at that width in pixels. The array of sections includes the information needed to display the lead section and to build the table of contents. Therefore, only the lead section includes its section text; the other sections omit it (otherwise the whole page content would already be part of the lead response and no remaining request would be needed).
 * Remaining route: Note that this route's array of sections does not include the lead section text, since that is already part of the lead response. Some additional info is currently there as well, but it will most likely move to the lead section (T108373).

The Swagger spec can be found in the source repo. It is a good place to see the actual structure of the output. It also has to be updated whenever the structure changes significantly, since there are automated tests which verify that the output adheres to the spec.
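To make the lead/remaining split more concrete, here is a minimal client-side sketch. It assumes the width hashtable arrives as a JSON object keyed by width, and that sections can be matched by an `id` field with the content in a `text` field. Those key names, the helper names, and the example.org URLs are assumptions for illustration; the Swagger spec in the source repo is the authority on the real structure.

```python
def pick_lead_image_url(urls_by_width, target_width):
    """Pick the smallest available width >= target_width, else the largest."""
    by_width = {int(width): url for width, url in urls_by_width.items()}
    if not by_width:
        return None
    for width in sorted(by_width):
        if width >= target_width:
            return by_width[width]
    return by_width[max(by_width)]


def merge_sections(lead_sections, remaining_sections):
    """Fill in missing section text from the remaining response, by id."""
    text_by_id = {s["id"]: s["text"] for s in remaining_sections}
    merged = []
    for section in lead_sections:
        full = dict(section)  # copy so the inputs stay untouched
        if "text" not in full and full["id"] in text_by_id:
            full["text"] = text_by_id[full["id"]]
        merged.append(full)
    return merged


# Hypothetical data shaped like the description above.
urls = {
    "640": "https://example.org/640px-lead.jpg",
    "800": "https://example.org/800px-lead.jpg",
    "1024": "https://example.org/1024px-lead.jpg",
}
lead = [{"id": 0, "text": "<p>Lead</p>"}, {"id": 1, "line": "History"}]
remaining = [{"id": 1, "text": "<p>History</p>"}]

chosen = pick_lead_image_url(urls, 700)
page_sections = merge_sections(lead, remaining)
```

With this split, a client can render the lead section and the table of contents from the first response alone, then fill in the other sections once the second response arrives.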

The request flow diagram is similar to the one for the mobile-html route below, except we don't use Parsoid (yet) and the route has changed since the diagram was last updated.

.../mobile-html/{title}   (on hold, not actively developed and not public on production servers)
Since this route is currently on hold, we don't expose this option in the Developer options anymore.

This route was used by the alpha Android app when the experimental page load strategy was enabled.

The idea is to call webView.loadUrl on the service URL, and let it stream in the content.

When given an existing Wikipedia title, it responds with HTML page content with two embedded JSON data blocks for metadata:
 * The first block is at the beginning and should contain only what's needed to show things above the fold.
 * The second block is at the end of the payload, mainly for metadata we don't immediately need. Currently it contains some gallery metadata (images and videos).

This route executes multiple requests to backend services in parallel via promises:
 * The main HTML payload: this is the page content we get from Parsoid (RESTBase), plus some modifications.
 * General page metadata, which goes into an embedded JSON block: this data comes from the MediaWiki API, currently still via action=mobileview. (TODO: Will need to use different backend API requests to be leaner. There is no need to request page content here, too; we only need the JSON parts.)
 * Other page metadata, which also goes into an embedded JSON block: this data comes from the MediaWiki API, with up to two follow-up requests: one for images, another one for videos.
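The parallel request pattern described above can be sketched as follows. The service itself is built on node.js promises; this sketch uses Python's `asyncio.gather` as the analogous construct. The three `fetch_*` coroutines are hypothetical stand-ins that return canned data instead of calling Parsoid or the MediaWiki API, and the shape of the combined result is an assumption.

```python
import asyncio


async def fetch_html(title):
    """Stand-in for the Parsoid (RESTBase) page content request."""
    await asyncio.sleep(0)  # simulate I/O without touching the network
    return f"<html><body>{title}</body></html>"


async def fetch_page_metadata(title):
    """Stand-in for the general page metadata request."""
    await asyncio.sleep(0)
    return {"title": title}


async def fetch_gallery_metadata(title):
    """Stand-in for the other (gallery) metadata request."""
    await asyncio.sleep(0)
    return {"images": [], "videos": []}


async def build_page(title):
    # Start all three backend requests at once and wait for all of them,
    # much like Promise.all over three promises.
    html, page_meta, gallery_meta = await asyncio.gather(
        fetch_html(title),
        fetch_page_metadata(title),
        fetch_gallery_metadata(title),
    )
    return {"html": html, "page": page_meta, "gallery": gallery_meta}


result = asyncio.run(build_page("Dilbert"))
```

Because the requests are independent, the total latency is roughly that of the slowest backend call rather than the sum of all three.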

The HTML output also adds links to a CSS and a JavaScript file, which means CSS and JS are coming from the service now instead of being bundled directly with the app. This is easier to get working on the Android app since everything comes from the same source. For subsequent page loads CSS and JS should be cached on the client.

There are currently two alert dialogs triggered by the JS bundle when the DOM is done loading:
 * One generic DOMLoaded event to be sent over the JS bridge to notify the app that the DOM is loaded. This one is probably obsolete for the new model, and we should be able to get rid of it (theoretically; I haven't tested it fully yet).
 * One that includes the initial metadata: This is a small optimization to avoid having the app send a message over the JS bridge to get the initial metadata. We'll see if we want to keep it that way. It would be better to get the initial metadata to the app even sooner. I'm open to suggestions here.

.../mobile-text/{title}
This route is meant for a new generation lite app; initially targeted for low-powered, older Android devices. The idea is, instead of using a WebView, to use native Android UI components to show the page contents.

This route is currently just using action=mobileview, but it's foreseeable that it will use backend calls similar to those of the previous route.

More at T90758.

Source
The services are in the following Gerrit repos:
 * 1) mediawiki/services/mobileapps
 * 2) mediawiki/services/mobileapps/deploy

The second repo is for deployment purposes. The first repo contains the implementation of the service routes. Both repos are based on the templates provided by our services team.

Development on local machine
The README.md file in the repo has some great pointers on how to set up and use the service on a dev machine.

Deployment on labs machine
T91794 Deploy experimental version of mobile apps content service

The service on appservice.wmflabs.org is updated and restarted automatically a few minutes after code gets merged. Here are some example endpoints:
 * http://appservice.wmflabs.org/en.m.wikipedia.org/v1/page/mobile-html-sections/Dilbert
 * http://appservice.wmflabs.org/en.m.wikipedia.org/v1/page/mobile-html-sections-lead/Dilbert
 * http://appservice.wmflabs.org/en.m.wikipedia.org/v1/page/mobile-html-sections-remaining/Dilbert

Troubleshooting on labs machine:
 * Restart the service:
 * View logs:

FYI: Beta cluster
This is more of an FYI since this is for the RESTBase framework itself. There is a beta instance on deployment-restbase0[12].deployment-prep.eqiad.wmflabs. It would use the labs instance, appservice.wmflabs.org, to complete the mobile route requests.

Deployment on Production cluster
Some example endpoints:
 * https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections/Dilbert (everything; potentially used to refresh saved pages)
 * https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
 * https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert

Endpoint Swagger docs: https://rest.wikimedia.org/en.wikipedia.org/v1/?doc#!/Mobile/page_mobile_html_sections_lead__title__get
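For clients that build these URLs programmatically, a small helper along the following lines may be useful. The base URL and route names are taken from the examples above. The helper name, the replacement of spaces with underscores, and the percent-encoding of every reserved character (including "/") are assumptions about how titles should be passed, so check them against the endpoint documentation.

```python
from urllib.parse import quote

BASE = "https://en.wikipedia.org/api/rest_v1/page"
ROUTES = (
    "mobile-html-sections",
    "mobile-html-sections-lead",
    "mobile-html-sections-remaining",
)


def endpoint_url(route, title, base=BASE):
    """Build the URL for one of the three routes and a page title."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route}")
    # Assumption: titles use underscores for spaces and are fully
    # percent-encoded, so a "/" in a title does not split the path.
    encoded = quote(title.replace(" ", "_"), safe="")
    return f"{base}/{route}/{encoded}"


lead_url = endpoint_url("mobile-html-sections-lead", "Dilbert")
```

Passing a different `base` (for example the labs host prefix) lets the same helper target another deployment.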

The mobileapps service was first deployed to the SCA cluster on August 17, 2015 (T105538 and T92627).

Documentation: Docker + Deployment
 * Note: The deployment preparation requires the use of Docker to make sure the Node modules have the correct binaries. We've been exploring the option of using Boot2docker on Mac OS but ran into issues with the docker script. So, instead, I'm using Ubuntu 14.04 inside a VirtualBox VM. There is still an issue (T104304) with that, but it's progress from before.
 * Here are the detailed Ubuntu setup notes in case it helps someone.