Reading/Web/Projects/Barack Obama in under 15 seconds on 2G

The problem
On a 2G connection, the Wikipedia experience is terribly slow, which leads third parties to transcode our pages. We need to make this faster. This was highlighted as an issue back in 2012 but still hasn't been solved. It's time to fix that.

2G usage in the world

 * First, let's look at what kinds of connections people use around the world. Almost no one is on 2G in North America, but in the rest of the world the situation is different.
 * It becomes really interesting when you look at the same numbers scaled by market size.

Next 5 years
This graph shows what the growth in smartphones will look like: the growth will happen where there are a lot of 2G users.

Proxy browsers?
But what about "proxy" browsers? A proxy browser is a browser where the content is prepared on the server side, such as Opera Mini, UC or Chrome with data reduction turned on. These browsers are mostly used where connections are slow, and they make the browsing experience better. The numbers I've found don't add up perfectly, but it seems these browsers still account for only a fraction of internet traffic. The first image shows percentage by OS, where proxy browsers are in the **others** category.

And this chart shows the share held by each individual proxy browser.

This shows that proxy browser users are only a fraction of all browser users.

All the graphs are taken from Tim Kadlec's talk Better By Proxy and Maximiliano Firtman's talk Extreme Web Performance for mobile devices.

Conclusion
The majority of mobile users in the world use a 2G connection, and the growth will happen in the countries where 2G usage is highest. Proxy browser usage is low, so it doesn't solve the problem. We need a site that is fast on 2G.

The how
The main issue is that we serve mobile users content they may never use, including non-essential content and images. We believe that by switching the mobile web site to Parsoid we will gain more control over the content we display to users. We believe we can initially serve just the HTML of the lead section and load the rest of the content, images included, via JavaScript, without ruining the existing experience for users who are fortunate enough to have a good connection. Switching to Parsoid will also allow us to create a more modern Wikipedia experience.
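The idea of serving only the lead section first can be sketched as follows. This is a minimal illustration, not the Parsoid API or the planned implementation: it assumes the lead section is everything before the first `<h2>` heading, and `split_lead_section` and the sample article are hypothetical.

```python
import re
from typing import Tuple

# Assumption: the lead section ends where the first <h2> heading begins.
_FIRST_HEADING = re.compile(r"<h2[\s>]", re.IGNORECASE)


def split_lead_section(html: str) -> Tuple[str, str]:
    """Return (lead_html, remainder_html) for an article body."""
    match = _FIRST_HEADING.search(html)
    if match is None:
        # No section headings: the whole article is the lead.
        return html, ""
    return html[:match.start()], html[match.start():]


# Hypothetical article body, for illustration only.
article = (
    "<p>Lead paragraph.</p>"
    "<h2>Early life</h2><p>Section text.</p>"
    "<h2>Career</h2><p>More text.</p>"
)
lead, rest = split_lead_section(article)
deferred = len(rest.encode("utf-8")) / len(article.encode("utf-8"))
print(f"initial payload: {len(lead)} bytes, deferred: {deferred:.0%} of the article")
```

The lead would be sent in the initial response and the remainder fetched later by JavaScript, so concatenating the two parts always reproduces the full article.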

Test 1
We took the top 5 pages from an arbitrary week in September and duplicated them on a MediaWiki instance. For each page we created a second copy containing only the lead section, and tested the speed of both versions on 2G and 3G connections.

Analysing the data we collected, we found that the following was true for this small sample of pages:


 * A median 40% decrease in total bytes shipped to the user (page size), and a 31% decrease in the number of bytes needed to enable the first view of the content
 * A median 22% decrease in time to first render on a 3G connection
 * A median 31% decrease in time to first render on a 2G connection
 * A median 12% decrease in time to fully load the content on a 3G connection
 * A median 36% decrease in time to fully load the content on a 2G connection
 * On the largest article in the sample, enwiki:Serena Williams:
   * There was a 47% decrease in time to first render on 3G, and 36% on 2G. This equated to a saving of 4.9 seconds (14s to 9s).
   * There was a 55% decrease in the time to fully load on 2G (32% on 3G). This equated to a saving of 152.611s on the 2G connection.
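For readers who want to reproduce figures of this kind, a median percentage decrease can be computed roughly as below. The timings are hypothetical placeholders, not the measured data, and the function names are made up for this sketch.

```python
from statistics import median


def percent_decrease(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


def median_percent_decrease(pairs):
    """Median of the per-page percentage decreases."""
    return median(percent_decrease(before, after) for before, after in pairs)


# Hypothetical (full page, lead only) timings in seconds for five pages.
sample = [(10.0, 8.0), (20.0, 15.0), (14.0, 9.0), (30.0, 24.0), (12.0, 6.0)]
print(round(median_percent_decrease(sample), 1))
```

Note that this takes the median of the per-page decreases, which is not the same as the decrease between the two median timings; with a sample of only five pages the two can differ noticeably.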

Conclusions
 * Although unsurprising, reducing the size of the HTML shows promise: it should significantly improve first paint and, judging by the lower full load time, time to first interaction.

Next steps:
 * We need to test on a much larger sample to get a sense of the overall impact of these changes and the median improvement we can expect to see.
 * The entire article should still be available.
 * We'll need to analyse where savings can be made by identifying content whose loading can be deferred.
 * We'll need to measure whether simply loading the lead section will satisfy a certain percentage of users (and what percentage that is)

Success criteria

 * Barack Obama loads in under 15 seconds on a 2G connection
 * The content served initially has value (e.g. lead section)
 * First paint time increases across the site (counter-intuitively, due to more traffic from slower connections)
 * Overall page visits for site increase
 * Mobile web service does not deteriorate, e.g. there is no increase in outages
 * Bytes per page view drops for casual readers
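One way to make the 15 second target concrete is to turn it into a rough byte budget. The sketch below uses a deliberately simple model (a fixed number of set-up round trips plus raw transfer time). The 40 kbps bandwidth, 500 ms round-trip time and 4 round trips are illustrative assumptions, not measurements from this project.

```python
def transfer_time_s(payload_bytes, bandwidth_kbps, rtt_ms, round_trips=4):
    """Rough delivery time: set-up round trips plus raw transfer time."""
    setup = round_trips * rtt_ms / 1000
    transfer = payload_bytes * 8 / (bandwidth_kbps * 1000)
    return setup + transfer


def byte_budget(deadline_s, bandwidth_kbps, rtt_ms, round_trips=4):
    """Largest payload (in bytes) that fits the deadline under the same model."""
    available = deadline_s - round_trips * rtt_ms / 1000
    return max(0, int(available * bandwidth_kbps * 1000 / 8))


# Assumed 2G profile, for illustration only.
BANDWIDTH_KBPS = 40
RTT_MS = 500

budget = byte_budget(15, BANDWIDTH_KBPS, RTT_MS)
print(f"budget for a 15 s load: {budget} bytes")
print(f"check: {transfer_time_s(budget, BANDWIDTH_KBPS, RTT_MS)} s")
```

The model ignores TCP slow start, packet loss and rendering time, so a real budget would be smaller. It only shows how the success criterion translates into a limit on the bytes served initially.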

Join the conversation
This task is being tracked on Phabricator and we'd love to hear your thoughts.