Page Previews/2017-18 A/B Tests

From mediawiki.org

Page Previews (formerly known as Hovercards) is a new software feature on the desktop version of Wikimedia sites. It is designed to reduce the cost of exploring a link and to promote learning, by allowing readers to gain context on the article they are reading or to look up an unfamiliar event, idea, object, or term without navigating away from their original topic.

Page Previews consist of a small card that opens in the vicinity of a link once the reader hovers the cursor over the link for more than 500ms. (They replace the small existing popups on some browsers that show only the title of the linked page.) They were inspired by the existing “Navigation Popups” gadget popular among editors. (As both features have overlapping functionality, editors can choose to enable each one separately. Current Navpopups users will not be shown Page Previews without having disabled Navpopups first.)

Page Previews were first introduced as a beta feature in 2014. Over several months in 2017, following various earlier tests (qualitative and quantitative), Page Previews were rolled out to anonymous users on all Wikipedias except German and English. To confirm that the feature is working as intended (after numerous improvements and instrumentation fixes following the earlier tests), we evaluated two A/B tests that were run on the German and English Wikipedias, the first in October/November 2017 and the second from December 2017 to February 2018. There, the feature was activated for a small percentage of anonymous users, and data was collected measuring their interactions with the feature and with links in general, compared with a control sample of equal size.

Research Questions

Page Previews are designed to reduce the cost of exploring a link, as well as to promote learning, by allowing readers to gain context on the article they are reading or to quickly check the definition of an unfamiliar event, idea, object, or term without navigating away from their original topic.

Through our analysis, we wanted to study user behavior towards the Page Previews feature to answer the following questions:

  • How often do people use the feature to preview other pages? We set a threshold of 1000ms (one second) for a preview card to stay open before we counted it as viewed.
  • How often do people disable the feature? It can be disabled by clicking on a settings icon at the bottom of each preview. A high disable rate would indicate that users do not want the feature; a low rate would suggest that users like it and would continue using it.

We also wanted to study how the introduction of Page Previews changes reading behavior in general. Since reading the content of a linked page no longer requires users to navigate away from the article they are currently viewing, we expected:

  • the number of pageviews per browser session (also referred to as session depth) to decrease - readers will choose to view some links using the previews feature rather than opening them, which would result in a full pageview
  • the total number of distinct pages interacted with per session (via either pageviews or seen previews) to increase - since readers no longer have to open a new page to view information from it, they are more likely to explore a larger variety of topics
  • part of the decrease in pageviews to come from reduced usage of the back button - a reader who, in the absence of previews, would have opened the linked article for context or a definition might then have gone back to the article containing the link, which registers as another pageview

Impact on fundraising: since we are hypothesizing that implementing Page Previews will result in a decrease in desktop pageviews overall, we measured the impact of the feature on donations coming from desktop banners.

Results

Figure: Change in the total number of distinct pages interacted with per session (in the second A/B test)

Pageviews and Page Interactions, effects on fundraising

As expected, pageviews per session decreased slightly with previews enabled:

  • -4% on enwiki, -3% on dewiki in the first A/B test (October/November)
  • -5% on enwiki, -3% on dewiki in the second A/B test (December-February)

However, the total number of distinct pages interacted with per session (via either pageviews or seen previews) increased:

  • +22% on enwiki, +20% on dewiki in the first A/B test (October/November)
  • +21% on enwiki, +21% on dewiki in the second A/B test (December-February)

As expected, usage of the back button decreased. We estimated it as the difference between the total number of pageviews and the number of unique pageviews within each session. The effect was considerably smaller on the German Wikipedia than on the English Wikipedia:

  • -8% on enwiki, -2% on dewiki in the first A/B test (October/November)
  • -9% on enwiki, -3% on dewiki in the second A/B test (December-February)
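
The three session-level metrics reported above can be illustrated with a minimal Python sketch. The event tuples, the field names, and the `session_metrics` function are hypothetical and do not reflect the actual EventLogging schema or the database queries linked from the overview task. The sketch only shows how pageviews per session, distinct pages interacted with, and the back-button estimate (total pageviews minus unique pageviews within a session) relate to one another, using the 1000ms threshold for a seen preview.

```python
from collections import defaultdict

# Threshold stated in the report: a preview counts as "seen" if it stayed open this long.
SEEN_THRESHOLD_MS = 1000


def session_metrics(events):
    """Compute per-session metrics from (session_id, kind, title, duration_ms) tuples.

    kind is "pageview" or "preview"; duration_ms is only used for previews.
    This event layout is an assumption made for illustration only.
    """
    sessions = set()
    pageviews = defaultdict(int)          # total pageviews per session
    viewed_pages = defaultdict(set)       # unique pages opened per session
    interacted_pages = defaultdict(set)   # pages opened OR previewed per session

    for session_id, kind, title, duration_ms in events:
        sessions.add(session_id)
        if kind == "pageview":
            pageviews[session_id] += 1
            viewed_pages[session_id].add(title)
            interacted_pages[session_id].add(title)
        elif kind == "preview" and duration_ms >= SEEN_THRESHOLD_MS:
            interacted_pages[session_id].add(title)

    return {
        sid: {
            "pageviews": pageviews[sid],
            "distinct_pages_interacted": len(interacted_pages[sid]),
            # Repeat views of the same page are taken as a proxy for back-button use.
            "back_button_estimate": pageviews[sid] - len(viewed_pages[sid]),
        }
        for sid in sessions
    }


# Hypothetical session: open A, preview B (seen), preview C (too short),
# open B, then return to A.
example_events = [
    ("s1", "pageview", "A", 0),
    ("s1", "preview", "B", 1500),
    ("s1", "preview", "C", 400),
    ("s1", "pageview", "B", 0),
    ("s1", "pageview", "A", 0),
]
example_result = session_metrics(example_events)
```

In this made-up session there are three pageviews, two distinct pages interacted with (A and B; the short preview of C is not counted), and a back-button estimate of one.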

Impact on fundraising: No negative impact was observed. In fact, although banner impressions were reduced, our test results suggest that the implementation of Page Previews may even have a positive impact on donation rates.

Feature usage: average previews seen per pageview, disable rates

The average number of preview cards viewed (for at least 1000ms) per pageview was:

  • 0.26 on both wikis in the first A/B test (October/November)
  • 0.27 on both wikis in the second A/B test (December-February)

Disable rates were very low:

  • The feature was disabled in around 0.01% of sessions on both English and German Wikipedia in the first A/B test (October/November)
  • The feature was disabled in around 0.01% of sessions on both English and German Wikipedia in the second A/B test (December-February)

To ensure that these rates were not artificially low due to usability issues, we also confirmed in a separate qualitative test that users were indeed able to find and operate the disable functionality if they wanted to disable Page Previews.


Next steps

Since Page Previews are an important new way of reading Wikipedia content (based on the above rate of 0.26 previews seen per pageview, there will be more than 50 million previews seen per day after full deployment), we are currently working on a separate instrumentation that will count Page Previews in the same way we currently count regular pageviews. This will eventually enable editors to know how often their articles are being viewed in this way, and the Foundation and the movement to better assess overall quantitative trends in Wikipedia readership.

Data sources

Details about the EventLogging instrumentation can be found on the schema page; see also this visual overview of the different kinds of events that were logged.

The first experiment was launched on Oct 18, 2017, taking a day or more to reach the full event rate because of caching, and was stopped on November 15. To account for the strong weekly seasonality of reader behavior on desktop, the analysis above is limited to the three weeks of Oct 23 to Nov 12.

A second experiment was launched in December 2017, mainly for the purpose of gathering data for a newly added instrumentation field (time to first interaction, informing work on performance improvements), but we also used the opportunity to evaluate the same metrics as in the first test. The data from this experiment used above covers the eight weeks from December 21, 2017 to February 14, 2018.

The detailed database queries used to derive the above numbers can be found via the overview task: phab:T182314

Sampling rates and method: in both experiments, sampling was done per browser session, with probability 1.5% for test and control group each on enwiki, and 4% each on dewiki. See here for more detail.
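
As an illustration of per-session sampling with two equal-sized groups, here is a minimal Python sketch. It assumes that a single uniform random draw is made once when a browser session starts and that the outcome is kept for the rest of that session. The function name `assign_bucket`, the constants, and this particular mechanism are assumptions for illustration, not the deployed sampling code.

```python
import random

# Per-group sampling probabilities stated in the report.
ENWIKI_RATE = 0.015
DEWIKI_RATE = 0.04


def assign_bucket(rate, rng=random):
    """Assign a new browser session to "test", "control", or None (not sampled).

    rate is the probability for EACH group, so a session is sampled into
    the experiment with total probability 2 * rate.
    """
    if not 0 <= rate <= 0.5:
        raise ValueError("rate must be between 0 and 0.5")
    draw = rng.random()  # uniform in [0, 1)
    if draw < rate:
        return "test"
    if draw < 2 * rate:
        return "control"
    return None
```

Using one draw and two adjacent intervals of equal width gives both groups the same probability and makes them mutually exclusive, which matches the "control sample of equal size" design described above.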

Due to the very large samples used (e.g. 53 million sessions on enwiki in the second A/B test; chosen to be large enough to allow the analysis of other aspects of the feature not covered in this report), uncertainties due to random variation are negligible in the above results. There are some other sources of minor limitations, uncertainties and caveats though, including:

  • Longer-term seasonal variations in readers’ use of Wikipedia
  • Truncation effects for session-based data, due to some browser sessions starting before the beginning of the experiment, or ending after the experiment. In particular, this means that per-session numbers are slightly lower for the first, shorter experiment.
  • Very rare collisions (birthday problem) of the various pseudorandom tokens used in the instrumentation
  • The pageview definition used here may differ slightly from the standard one, in particular in that it counts views of special pages.
  • The data is limited to sendBeacon-capable browsers.
  • The result for the total number of pages interacted with is based on a query that makes the slightly simplifying assumption that a browser session won’t contain pages with the same name in different namespaces.
  • The average number of previews seen per pageview should normally be calculated based on data that is sampled per pageview, but in order to answer the other research questions, this instrumentation used sampling by browser session instead. This can introduce inaccuracies because the numbers of previews in each pageview may not be statistically independent within one session; however, we assume that this error is very small, considering the small average session lengths.
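
To give a sense of why token collisions (the birthday problem mentioned in the list above) are expected to be very rare, the following sketch uses the standard approximation P ≈ 1 - exp(-n(n-1)/(2N)) for n uniformly random tokens drawn from N possible values. The report does not state the token length, so the 64-bit size used in the example is purely hypothetical, as is the function name `collision_probability`.

```python
import math


def collision_probability(n_tokens, token_bits):
    """Approximate probability of at least one collision among n_tokens
    uniformly random tokens of token_bits bits each (birthday approximation)."""
    space = 2 ** token_bits
    expected_colliding_pairs = n_tokens * (n_tokens - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_colliding_pairs)


# 53 million sessions is the enwiki figure from the report;
# 64 bits is a HYPOTHETICAL token size chosen only for illustration.
p_hypothetical = collision_probability(53_000_000, 64)
```

Under that assumed token size, the probability of any collision at all stays far below one percent, whereas a much shorter token (for example 32 bits) would make collisions practically certain at this sample size.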

Past data issues: During the prior work on this instrumentation, we encountered various small and large bugs that affected data quality, including an upstream bug in the Firefox browser (which we filed with Mozilla, who have since fixed it) and an error in the code that implemented the sampling logic, which also affected several other EventLogging schemas. All of these have been resolved. But due to the complex nature of this instrumentation, we can’t guarantee with absolute certainty that no further bugs remain. (During a more detailed analysis of interaction timings, we observed some patterns in the data at high resolution - milliseconds - that very likely have to do with quantization of browsers’ internal timers, but have not been fully explained yet. We do not expect this to affect the timing data as used for the results in this report.)