Reading/Web/Projects/A frontend powered by Parsoid/Parsoid html size initial report

In order to inform what a fast initial response could look like for A frontend powered by Parsoid, we've performed some initial comparisons between the raw Parsoid HTML and an optimized version.

Set up
For the experiment we're running a middle man server with two endpoints. The first hit on the API may go to RESTBase and perform transformations, so we'll ensure the API is hit once, and after the response is cached on the middle man server we'll measure 5 runs for each endpoint.
 * raw: Serves the raw Parsoid HTML without transformations.
 * slim: Serves an optimized version of the HTML.

For the purposes of the demo, the optimized version performs the following transformations:
 * Removing attributes
 * Removing comments
 * Removing references
 * Removing tables
 * Removing images
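The actual transformations are performed by the middle man server (source linked below); the following is only an illustrative sketch, in Python, of the kind of stripping the slim endpoint does. The `ol class="references"` check is an assumption about how reference lists are marked up, not the server's real logic.

```python
from html.parser import HTMLParser

# Subtrees dropped entirely in this sketch.
DROPPED_TAGS = {"table", "img", "figure"}
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

class SlimHTML(HTMLParser):
    """Re-emit HTML while stripping attributes, comments, tables,
    images and (assumed) reference lists."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # >0 while inside a dropped subtree

    def _drops(self, tag, attrs):
        # Hypothetical marker for reference lists in the Parsoid output.
        return tag in DROPPED_TAGS or (
            tag == "ol" and ("class", "references") in attrs)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or self._drops(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        self.out.append(f"<{tag}>")  # attributes stripped here

    def handle_startendtag(self, tag, attrs):
        if not (self.skip_depth or self._drops(tag, attrs)):
            self.out.append(f"<{tag}/>")

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_comment(self, data):
        pass  # comments are dropped

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

def slim(html):
    parser = SlimHTML()
    parser.feed(html)
    return "".join(parser.out)
```

For example, `slim('<p data-mw="x">Hi<table><tr><td>1</td></tr></table> there</p>')` drops the table and the `data-mw` attribute, yielding `<p>Hi there</p>`.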

Each run will be done with browser caching disabled.

For measuring we'll use two devices, both connected to the same WiFi network:
 * MacBook Air with OS X 10.10 and Chrome 46
 * Nexus 5 with Android 6 and Chrome Mobile 46

For measuring we'll use the Chrome developer tools, connected to the desktop browser and, via remote debugging, to the mobile browser.

Since this is initial research, we'll measure one of our heaviest articles: Barack Obama.

Glossary
Aggregated Time (ms): Time from initial connection to get the HTML to the DOMContentLoaded event.

HTML Size (kB): Size of the HTML document downloaded.

Loading Time (ms): Loading time as aggregated in the Timeline tab of the developer tools for the Aggregated Time period.
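As a rough companion to these definitions, a script along the following lines could automate the size measurement over HTTP. The `fetch` parameter exists only to make the sketch testable; note that a bare HTTP fetch cannot reproduce Aggregated Time, which was read from the developer tools up to the DOMContentLoaded event.

```python
import time
import urllib.request

def measure(url, runs=5, fetch=None):
    """Fetch `url` repeatedly, returning (elapsed ms, size kB) per run.

    This approximates the report's HTML Size; Aggregated Time includes
    browser-side work that a plain HTTP fetch cannot capture.
    """
    fetch = fetch or (lambda u: urllib.request.urlopen(u).read())
    fetch(url)  # first hit warms the middle man's cache, as in the set-up
    results = []
    for _ in range(runs):
        start = time.monotonic()
        body = fetch(url)
        elapsed_ms = (time.monotonic() - start) * 1000
        results.append((elapsed_ms, len(body) / 1024))
    return results

# e.g. measure("http://reading-web-research.wmflabs.org/api/slim/html/Barack%20Obama")
```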

What are we trying to find out
Anecdotal experience shows that loading Parsoid HTML on a mobile phone blocks the browser for some time. We're trying to get clear insight into what is happening and how to avoid it, since the purpose of the experiment is to serve content quickly to users on bad connections.

Data
See the original Google Docs spreadsheet for the data, also replicated below:

Desktop vs Phone
There is a huge difference between targeting desktop users and mobile users. Even a modern Nexus 5 with the latest Chrome (at the time of writing) is five times slower loading and rendering the same content over the same connection.

Render times (not represented in the data) were also an order of magnitude worse; anecdotally, rendering the raw content took about 4 s on the mobile device. This will need further study.

Payload size
By stripping certain parts of the content, we can make one of the biggest articles on English Wikipedia more than five times smaller (1500 kB to 268 kB). After this, several options arise for surfacing the remaining content, such as immediately triggering a load of the remaining content, or deferring it to user actions; either way, the benefits are clear.

On fast WiFi, stripping improves loading time by 5x. Further tests are required to verify the impact of payload size on worse connections, but we foresee it being the biggest factor on 2G and similar connections.
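To make that foresight concrete, here is a back-of-the-envelope estimate of idealized transfer times for the two payload sizes from the data. The nominal link speeds are assumptions, and latency, TCP slow start and compression are ignored.

```python
# Payload sizes from the report; link speeds are assumed nominal rates.
PAYLOADS_KB = {"raw": 1500, "slim": 268}
LINK_KBPS = {"2G (EDGE)": 237, "3G": 1000, "fast WiFi": 30000}

def transfer_seconds(size_kb, speed_kbps):
    # kB -> kilobits, divided by link speed in kbit/s.
    return size_kb * 8 / speed_kbps

for link, kbps in LINK_KBPS.items():
    raw = transfer_seconds(PAYLOADS_KB["raw"], kbps)
    slim = transfer_seconds(PAYLOADS_KB["slim"], kbps)
    print(f"{link}: raw ~{raw:.1f} s, slim ~{slim:.1f} s")
```

On the assumed 2G rate the raw payload alone would take on the order of 50 s to transfer, versus under 10 s for the slim version, which is why payload size would dominate there.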

Perception
On the mobile phone, the slim version feels instant. Extremely fast.

More research
The subject warrants more research:
 * Using a wider range of devices, network conditions, and a bigger sample of articles.
 * Taking rendering times into account, in addition to payload size and loading time.
 * Measuring different variations of the slim version (what each transformation contributes to the overall improvement).

Links

 * Source code for the middle man server at the moment of data gathering: https://github.com/joakin/loot/tree/5d6ee62885cd1cd4324b1f40e99b7d418b66c811
 * Deployed version links
 * Raw: http://reading-web-research.wmflabs.org/api/raw/html/Barack%20Obama
 * Slim: http://reading-web-research.wmflabs.org/api/slim/html/Barack%20Obama

Authors

 * Reading Web Team
 * Sam Smith
 * Joaquin Oltra Hernandez