Lazy loading of references on Russian Wikipedia


On 1 September 2016, lazy loading of references was disabled on Russian Wikipedia, after lazy loading of both images and references had been enabled in July. Lazy loading of images, which had proved beneficial, remained enabled. Whereas previously all reference views were routed via the API, references are now served in the page HTML.

The results suggest that very few users need references during a page view. In the week after the experiment ended, an additional 338 GB of HTML was shipped to Russian Wikipedia mobile readers, while bytes shipped via the API decreased by only 0.11 GB.

In the worst case, disabling lazy loading of references increased page load time by roughly 2 seconds and first paint by roughly 0.5 seconds.

What we noticed

Impact on performance

In progress [ToDo: Normalise sample size]

Fully loaded time, first paint and DOM interactive time were inspected before and after the experiment was disabled.

select * from NavigationTiming_15485142 where wiki = 'ruwiki' and event_mobileMode = 'stable' and event_action ='view' and timestamp > 20160822000000 and timestamp < 20160909000000
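
The query above returns raw events, while the tables that follow report a median and a 95th percentile. As a minimal sketch of how such summary values could be derived, the hypothetical `percentile` helper below reads one timing value per line and interpolates linearly between the two nearest ranks. The interpolation method and the sample values are assumptions for illustration; the source does not say how its percentiles were computed.

```shell
# percentile P: read one numeric value per line on stdin and print the P-th
# percentile, interpolating linearly between the two nearest ranks.
# (Hypothetical helper; the method used for the report is not documented.)
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = (p / 100) * (NR - 1) + 1   # 1-based fractional rank
      lo = int(rank)
      hi = (lo < NR) ? lo + 1 : lo
      printf "%.2f\n", v[lo] + (rank - lo) * (v[hi] - v[lo])
    }'
}

# Made-up sample of five load times in milliseconds.
printf '%s\n' 1200 3400 2100 9800 2700 | percentile 50   # prints 2700.00
printf '%s\n' 1200 3400 2100 9800 2700 | percentile 95   # prints 8520.00
```

With real data, the relevant timing column of the query result would be piped into the helper once per group (for example with and without lazy loaded references).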

Fully loaded

Label                                    Sample size  95th percentile (ms)  Median (ms)
With lazy loaded references              35582        14868.5               2692.0
Without lazy loaded references           39506        16447.75              3067.0
With lazy loaded references (anons)      35555        14899.3               2693.0
Without lazy loaded references (anons)   39467        16450.0               3067.0
With lazy loaded references (http2)      26872        11518.15              2359.0
Without lazy loaded references (http2)   30016        13009.25              2697.0
With lazy loaded references (http1)      8710         23583.05              4146.5
Without lazy loaded references (http1)   9490         26780.95              4767.5

First paint

Label                            Sample size  95th percentile (ms)  Median (ms)
With lazy loaded references      20217        6927.6                1427.0
Without lazy loaded references   22440        7469.15               1502.0

DomInteractive

Label                            Sample size  95th percentile (ms)  Median (ms)
With lazy loaded references      35582        7181.0                1081.0
Without lazy loaded references   39506        7580.5                1121.0
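
The worst-case figures quoted in the summary can be checked against the 95th percentile columns of the tables above. This is a minimal sketch, assuming the values are milliseconds and that the first, untitled table is the fully loaded time; `delta_seconds` is a hypothetical helper name.

```shell
# delta_seconds BEFORE_MS AFTER_MS: print the increase in seconds, rounded to
# two decimal places. (Hypothetical helper.)
delta_seconds() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2f\n", (after - before) / 1000 }'
}

# 95th percentiles: with lazy loaded references -> without.
echo "fully loaded:         $(delta_seconds 14868.5 16447.75) s"   # 1.58 s
echo "fully loaded (http1): $(delta_seconds 23583.05 26780.95) s"  # 3.20 s
echo "first paint:          $(delta_seconds 6927.6 7469.15) s"     # 0.54 s
echo "DomInteractive:       $(delta_seconds 7181.0 7580.5) s"      # 0.40 s
```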

Impact on bytes shipped

The following Hive query was run once per day over all page views for the Russian mobile site, with $i set to the day of the month (and month set to 9 for the September days):

use wmf; select month, day, sum(response_size) from webrequest where year = 2016 and month = 8 and day = $i and uri_host = 'ru.m.wikipedia.org' and uri_path rlike '^/wiki/([^:])+$' and content_type rlike '^text/html' and agent_type = 'user' and http_status = '200' group by month, day;
Month  Day  Total bytes shipped  Total bytes shipped (GB)
8      23   195433081582         195.4330816
8      24   196452275069         196.4522751
8      25   194610708845         194.6107088
8      26   193316844255         193.3168443
8      27   204064575497         204.0645755
8      28   213554827616         213.5548276
8      29   193135615810         193.1356158
8      30   197245837879         197.2458379
8      31   184946052684         184.9460527
9      1    181258661113         181.2586611
9      2    218129458634         218.1294586
9      3    253029949643         253.0299496
9      4    271154670250         271.1546703
9      5    244388334543         244.3883345
9      6    247460404402         247.4604044
9      7    247948861959         247.948862
9      8    249034788519         249.0347885

Week                                                  Bytes shipped (GB)
24th-30th August (with lazy loaded references)        1392.380685
2nd-8th September (without lazy loaded references)    1731.146468
Bytes increase                                        338.765783
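
The weekly totals above are sums of the daily figures. The sketch below reproduces that arithmetic from the GB column of the daily table; `daily_html_gb` and `week_total` are hypothetical helper names, and the rows are copied from the table rather than re-queried.

```shell
# Daily HTML gigabytes (month day GB), copied from the table above.
daily_html_gb() {
  cat <<'EOF'
8 24 196.4522751
8 25 194.6107088
8 26 193.3168443
8 27 204.0645755
8 28 213.5548276
8 29 193.1356158
8 30 197.2458379
9 2 218.1294586
9 3 253.0299496
9 4 271.1546703
9 5 244.3883345
9 6 247.4604044
9 7 247.948862
9 8 249.0347885
EOF
}

# week_total MONTH FIRST_DAY LAST_DAY: sum the GB column for the given days.
week_total() {
  awk -v m="$1" -v first="$2" -v last="$3" \
    '$1 == m && $2 >= first && $2 <= last { sum += $3 }
     END { printf "%.2f\n", sum }'
}

with_lazy=$(daily_html_gb | week_total 8 24 30)    # 1392.38
without_lazy=$(daily_html_gb | week_total 9 2 8)   # 1731.15
awk -v a="$with_lazy" -v b="$without_lazy" \
  'BEGIN { printf "increase: %.2f GB (%.1f%%)\n", b - a, (b - a) / a * 100 }'
# prints: increase: 338.77 GB (24.3%)
```

Expressed relatively, the HTML shipped grew by about a quarter once references were served inline again.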

We also had to consider the load on the API used to retrieve references. The bytes shipped by the references API before and after the change were measured using the following queries:

# August: one query per day, 23rd to 31st
for i in $(seq 23 31);
do
 hive -e "use wmf; select month, day, sum(response_size) from webrequest where year = 2016 and month = 8 and day = $i and uri_host = 'ru.m.wikipedia.org' and uri_path like '%api.php%' and uri_query like '%action=mobileview%sections=references%' and http_status = '200' group by month, day;" > "ru-8-$i.tsv"
done
# September: one query per day, 1st to 9th (note month = 9)
for i in $(seq 1 9);
do
 hive -e "use wmf; select month, day, sum(response_size) from webrequest where year = 2016 and month = 9 and day = $i and uri_host = 'ru.m.wikipedia.org' and uri_path like '%api.php%' and uri_query like '%action=mobileview%sections=references%' and http_status = '200' group by month, day;" > "ru-9-$i.tsv"
done
Month  Day  API: total bytes shipped  MB           GB
8      23   1015320157                1015.320157  1.015320157
8      24   1003732951                1003.732951  1.003732951
8      25   991807250                 991.80725    0.99180725
8      26   998057985                 998.057985   0.998057985
8      27   1042861398                1042.861398  1.042861398
8      28   1124165529                1124.165529  1.124165529
8      29   1014439791                1014.439791  1.014439791
8      30   1067402736                1067.402736  1.067402736
8      31   955704418                 955.704418   0.955704418
9      1    1012481681                1012.481681  1.012481681
9      2    1008896513                1008.896513  1.008896513
9      3    984832141                 984.832141   0.984832141
9      4    982142387                 982.142387   0.982142387
9      5    954347415                 954.347415   0.954347415
9      6    1046599717                1046.599717  1.046599717
9      7    1142749069                1142.749069  1.142749069
9      8    1011144808                1011.144808  1.011144808

Week                                                  Bytes shipped (GB)
24th-30th August (with lazy loaded references)        7.24246764
2nd-8th September (without lazy loaded references)    7.13071205
Bytes increase                                        -0.11175559
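
To put the API figures in proportion, the sketch below compares them with the 338.765783 GB weekly increase in HTML reported earlier. `share_pct` is a hypothetical helper name; the inputs are the weekly totals from this page.

```shell
# share_pct PART WHOLE: print PART as a percentage of WHOLE (two decimals).
# (Hypothetical helper.)
share_pct() {
  awk -v part="$1" -v whole="$2" \
    'BEGIN { printf "%.2f\n", part / whole * 100 }'
}

html_increase_gb=338.765783   # extra HTML shipped, 2nd-8th vs 24th-30th
api_with_lazy_gb=7.24246764   # references API bytes, 24th-30th
api_decrease_gb=0.11175559    # drop in references API bytes afterwards

# All references API bytes in the lazy loading week vs the extra HTML.
echo "API week total vs extra HTML: $(share_pct "$api_with_lazy_gb" "$html_increase_gb")%"  # 2.14%
# API bytes saved by disabling lazy loading vs the extra HTML.
echo "API saving vs extra HTML:     $(share_pct "$api_decrease_gb" "$html_increase_gb")%"   # 0.03%
```

Even the entire week of references API traffic amounts to only about 2% of the additional HTML shipped once references were inlined again.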

Questions and answers

Q: Can we attribute this change to lazy loaded references?

A: No known changes rolled out during this period that we would expect to cause such a large increase in HTML bytes shipped.

As with the analysis for lazy loading images, it is possible that the Russian Wikipedia project did a lot of editing that week to reduce the size of pages, but it would take an implausible amount of editing to account for this many bytes.

One theory is that a change in traffic during the experiment contributed to this increase in bytes shipped, but the traffic graph shows that this is not the case.

An additional 338 GB of page HTML shipped on mobile is significant. Had a core change caused it, desktop traffic would have been affected as well and ops would have noticed, so it is reasonably safe to say that no core change caused this. We could analyse desktop traffic for the same period if we lack that confidence, but I feel it would be a lot of effort for little gain.

Traffic on Russian Wikipedia was stable during the experiment.