Reading/Web/Projects/Performance/Stripping references from page in beta cluster

From mediawiki.org

Hypothesis

Certain content does not need to be shipped to the user up front, and sometimes not at all. A good example is the list of references. If a mobile user never clicks on a superscript reference link or opens the references section, they never make use of the HTML required to render it. We can thus remove this HTML from the initial page load and lazy load it if and when it is needed.

Prediction

Previous experiments on the Barack Obama article had shown that removing references had a significant impact on the fully loaded time, at the cost of a small increase in TTFB (time to first byte). First render was unlikely to be affected by such a change.

Method

MobileFrontend has a library called MobileFormatter which extends the HtmlFormatter in core. We used this to strip any elements in the HTML with the class references. On various good quality articles the size of this HTML is significant, e.g. it accounts for 50% of all HTML in the Barack Obama article.
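To make the stripping step concrete, here is a minimal sketch in Python. It is not MobileFormatter or HtmlFormatter (those are PHP) and does not reproduce their API. The names ReferenceStripper and strip_references are hypothetical, and the sample HTML is made up for illustration. It only shows the general idea of dropping every element whose class list contains references and measuring how many characters that saves.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ReferenceStripper(HTMLParser):
    """Hypothetical helper: copies HTML to self.out, skipping any element
    whose class list contains strip_class."""

    def __init__(self, strip_class="references"):
        super().__init__(convert_charrefs=False)
        self.strip_class = strip_class
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a stripped element

    def _has_strip_class(self, attrs):
        return self.strip_class in (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # nested element inside a stripped one
        elif self._has_strip_class(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # start skipping
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.skip_depth and not self._has_strip_class(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")


def strip_references(html, strip_class="references"):
    parser = ReferenceStripper(strip_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Made-up sample markup, not taken from a real article.
sample = ('<p>Text<sup>[1]</sup></p>'
          '<ol class="references"><li>Ref <b>one</b><br></li></ol>'
          '<p>After</p>')
stripped = strip_references(sample)
print(stripped)
print("characters saved:", len(sample) - len(stripped))
```

A real implementation would also need to keep the stripped HTML somewhere so that it can be lazy loaded later. This sketch simply discards it.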

Due to a performance-related change that went out the same day, which stripped srcset attributes from image tags in the page, we had to establish a new baseline. The configuration on the beta cluster was first updated to remove references. Later the change was reverted to retain the references list.

A script was used to calculate the median and average of each metric over a 5 day period before the revert and over a period of the same length after the revert, for a single article (Barack Obama) on an emulated 2G connection.
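The kind of before/after summary described here can be sketched as follows. This is not the script that was used (that was a Node script). The function names and the sample values are hypothetical and only illustrate how an average and a median can tell different stories when a period contains an outlier.

```python
from statistics import mean, median


def summarize(samples):
    """Average and median of one metric over one period."""
    return {"avg": mean(samples), "median": median(samples)}


def compare(before, after):
    """Summarize both periods and report before minus after for each statistic."""
    b, a = summarize(before), summarize(after)
    return {
        stat: {"before": b[stat], "after": a[stat], "delta": b[stat] - a[stat]}
        for stat in b
    }


# Made-up fully loaded times in milliseconds, NOT measurements from the experiment.
before_revert = [20000, 21000, 19000, 30000]
after_revert = [22000, 24000, 23000, 40000]

report = compare(before_revert, after_revert)
for stat, row in report.items():
    print(stat, row)
```

In the made-up data a single slow run pulls the average well above the median in each period, which is one reason to report both.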

The commands used to measure the impact of the change were:

node wptreporter.js "webpagetest.enwiki-bc-mobile-2gslow.anonymous.Barack_Obama.us-east-1.Google_Chrome-emulateMobile.firstView" 23 02 2016 00 54 "" 6
node wptreporter.js "webpagetest.enwiki-bc-mobile-beta-2gslow.anonymous.Barack_Obama.us-east-1.Google_Chrome-emulateMobile.firstView" 23 02 2016 00 54 "" 6

Results

The re-addition of the HTML for references seemed to worsen performance.

Stripping references has a positive impact on the fully loaded time for 2G connections on large articles such as Barack Obama.

Note that, because the change we are measuring is the re-addition of references, a negative percentage decrease is a positive result. Fully loaded time was better without references, as you might expect, but TTFB and render time were not impacted. Savings in bytes were high.

| Property | With references (avg) | Without references (avg) | Delta (avg) | % decrease (avg) | With references (median) | Without references (median) | Delta (median) | % decrease (median) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| html.bytes | 153134.8 | 64910.7 | 88224.1 | 57.6% | 155053.0 | 64911.0 | 90142 | 58.1% |
| TTFB.median | 3914.3 | 3913.6 | 0.7 | 0% | 3912.0 | 3911.5 | 0.5 | 0% |
| render.median | 5806.9 | 5846.0 | -39.1 | -0.67% | 5884.0 | 5883.5 | 1.5 | 0.0% |
| fullyLoaded.median | 21198.1 | 20311.3 | 886.8 | 4.1% | 21247.0 | 20439.5 | 807.5 | 3.8% |
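The page does not state how the delta and percentage columns were derived. The sketch below assumes the delta is the first column minus the second and the percentage is that delta relative to the first column, which reproduces the html.bytes row. The function name is hypothetical.

```python
def delta_and_percent_decrease(first, second):
    """Assumed formula: delta = first - second, percentage relative to first."""
    delta = first - second
    return delta, 100.0 * delta / first


# html.bytes, average columns: with references vs. without references.
avg_delta, avg_pct = delta_and_percent_decrease(153134.8, 64910.7)

# html.bytes, median columns.
med_delta, med_pct = delta_and_percent_decrease(155053.0, 64911.0)

print(round(avg_delta, 1), round(avg_pct, 1))
print(round(med_delta, 1), round(med_pct, 1))
```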

The impact in beta was much more noticeable but followed the same trend.

| Property | Without references (avg) | With references (avg) | Delta (avg) | % decrease (avg) | Without references (median) | With references (median) | Delta (median) | % decrease (median) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| html.bytes | 64847.0 | 155112.8 | -90265.8 | -139.20% | 64782.0 | 155117.0 | -90335.0 | -139.44% |
| TTFB.median | 6583.8 | 7723.7 | -1139.9 | -17.31% | 3922.0 | 3930.5 | -8.5 | -0.22% |
| render.median | 8853.0 | 11110.8 | -2257.8 | -25.50% | 5891.0 | 5994.0 | -103.0 | -1.75% |
| fullyLoaded.median | 22967.3 | 26497.8 | -3530.4 | -15.37% | 18906.0 | 24556.0 | -5650.0 | -29.88% |
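The beta figures list the without-references value first, so their percentages appear to be relative to the smaller number and are not directly comparable with percentages computed relative to the with-references value. As a sketch (the formula is an assumption that matches the listed values, and the function name is hypothetical), the html.bytes average can be expressed both ways:

```python
def percent_decrease(baseline, other):
    """Percentage decrease going from baseline to other, relative to baseline."""
    return 100.0 * (baseline - other) / baseline


without_refs, with_refs = 64847.0, 155112.8  # beta html.bytes averages

# Baseline is the page without references: re-adding them is a negative "decrease".
as_listed = percent_decrease(without_refs, with_refs)

# Baseline is the page with references: stripping them is a positive decrease.
rebased = percent_decrease(with_refs, without_refs)

print(round(as_listed, 2), round(rebased, 1))
```

Read this way, the -139.20% listed for beta corresponds to a saving of roughly 58% of the HTML when references are stripped.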

During the experiment, the best fully loaded time we saw in stable was 18.91s.

Analysis

The improvements in fully loaded time were rather small. Still, for users who never view references, stripping them provides a big saving in bytes.

Conclusions

Removing references from the HTML brings unmistakable byte savings but does not look like it will improve time to first render.

The impact on fully loaded time is positive and, although not large, gets us closer to the 15 second mark for fully loading Barack Obama and other large pages.

Beta, even when run alongside stable, does not seem to correlate with stable with regard to fully loaded time.

Next steps

We should aim to lazy load references in stable.

First, we need to verify that this does not affect anything else in the cluster, given the additional storage it requires.