Reading/Web/Desktop Improvements/AB testing and Opt Out possibilities

From mediawiki.org

We would like to explore the feasibility of providing opt-out and A/B testing capabilities to logged-out users throughout the course of the desktop improvements project.

User stories

As a product team, we would like to A/B test changes from the desktop improvements project (such as changing the location of the language switcher) on all users (logged-in and logged-out), so that we can determine whether each change has any positive or negative impact. At this time, these changes would involve sending different HTML to different users from the server side.

As a product team, we would like to allow all users (logged-in and logged-out) to opt out of the new experience, so that we can measure the retention rate for the new treatment.

Research

Logged-in users

A/B testing logged-in users is straightforward. We can do it on the server side, and no caching-layer changes are needed, because all requests from logged-in users bypass the cache.

The only work required is the backend (PHP) work, plus keeping track of which user is assigned to which treatment. While this is not hard to do, code to segment logged-in users into treatments does not exist yet.
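A minimal sketch of the missing segmentation step, assuming each logged-in user has a stable numeric user ID. It is written in Python for illustration only (MediaWiki itself is PHP), and the function name, experiment name and treatment labels are hypothetical. Deriving the bucket from a hash of the experiment name and user ID means the assignment can be recomputed on every request instead of being stored.

```python
import hashlib
from typing import Sequence

# Hypothetical treatment labels; the page does not define any.
TREATMENTS = ("control", "treatment")


def assign_bucket(user_id: int, experiment: str,
                  treatments: Sequence[str] = TREATMENTS) -> str:
    """Deterministically map a logged-in user to one treatment.

    Hashing the experiment name together with the user ID means a user always
    lands in the same bucket for a given experiment, with no assignment table
    to store, while different experiments get independent splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    return treatments[value % len(treatments)]
```

For example, `assign_bucket(42, "language-switcher")` returns the same label on every request, and passing three labels gives an A/B/C split. The assignment would still need to be recorded alongside any analytics events so results can be attributed to a treatment.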

Logged-out users

There are common problems with A/B testing logged-out users in the MediaWiki ecosystem. The first is that, since logged-out users have no account we could use to bucket them, assigning them to different treatments is only possible via cookies or localStorage. This is what we have routinely done in tests to date and, as long as tests are short-lived (weeks), it is not likely to be an issue.
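A minimal sketch of cookie-based assignment for a logged-out request, assuming the server can read the `Cookie` header and send a `Set-Cookie` header. The cookie name, treatment labels and 30-day lifetime are hypothetical choices for illustration; it uses Python's standard `http.cookies.SimpleCookie`.

```python
import random
from http.cookies import SimpleCookie
from typing import Optional, Tuple

# Hypothetical cookie name and treatment labels, for illustration only.
COOKIE_NAME = "abtest_bucket"
TREATMENTS = ("control", "treatment")
MAX_AGE_SECONDS = 30 * 24 * 3600  # assumed short-lived test (about 30 days)


def get_or_assign_bucket(cookie_header: str,
                         rng: random.Random) -> Tuple[str, Optional[str]]:
    """Return (bucket, Set-Cookie value or None) for a logged-out request.

    If the request already carries a valid bucket cookie, reuse it so the
    user stays in the same treatment. Otherwise pick a bucket at random and
    return a Set-Cookie value that pins the choice for later requests.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    if COOKIE_NAME in incoming and incoming[COOKIE_NAME].value in TREATMENTS:
        return incoming[COOKIE_NAME].value, None

    bucket = rng.choice(TREATMENTS)
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = bucket
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = MAX_AGE_SECONDS
    return bucket, outgoing[COOKIE_NAME].OutputString()
```

The assignment only lasts as long as the browser keeps the cookie, which is why this approach is acceptable for short tests but not for long-running ones.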

Segmenting users at the varnish layer

Without changing how MediaWiki composes a page, this case is complicated to test, because anonymous (logged-out) requests are cached in Varnish. Backend servers generate a response once, and all users browsing the same page see the same response, that is, the same HTML.

To A/B test logged-out users, we need some mechanism, e.g. a cookie, to either bypass the cache or split the cached responses. The main problems with A/B testing anonymous users are:

  • additional complexity on the Varnish side: Varnish is a caching system, not an application platform, so it is not designed for this use case
  • privacy concerns, as we would need to mark each unique user session with a token on every request
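To illustrate what "splitting the cached responses" means for the cache, here is a simplified Python model, not Varnish configuration: an unbounded cache keyed by URL alone versus by (URL, bucket). Deriving the bucket from `user_id % num_buckets` is a stand-in for a bucket cookie and, like the function name, is an assumption for this sketch. A real Varnish cache is bounded and has TTLs, so fragmentation there would also cause extra evictions that this model ignores.

```python
from typing import Iterable, Set, Tuple


def simulate_cache(requests: Iterable[Tuple[str, int]],
                   num_buckets: int) -> Tuple[float, int]:
    """Replay (url, user_id) requests against an unbounded cache.

    With num_buckets == 1 the cache key is effectively just the URL, so all
    anonymous users share one cached copy per page. With more buckets the key
    becomes (url, bucket), so every page is stored once per variant.
    Returns (hit_rate, number_of_cached_objects).
    """
    cache: Set[Tuple[str, int]] = set()
    hits = 0
    total = 0
    for url, user_id in requests:
        total += 1
        # Stand-in for the bucket a cookie would carry.
        key = (url, user_id % num_buckets)
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # miss: the backend renders the page and it is stored
    return (hits / total if total else 0.0), len(cache)
```

With ten anonymous requests for one page, a shared cache gives a 90% hit rate and one stored object, while a two-way split gives 80% and two objects. The number of cached objects grows with the number of treatments for every page in the test.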

There is another way we could test logged-out users: changing the default view for all users for a short time. For example, we could switch the default skin to the new one and monitor user behavior for a short period. After changing the skin, we would need to purge the entire Varnish cache for the given wiki. On small wikis this is not a problem, but on the English Wikipedia it requires more planning, as purging the Varnish cache can lead to service unavailability.

In 2016 we explored the idea of segmenting users for A/B testing at the Varnish layer. However, the idea was abandoned, mostly because of complexity at the caching layer and privacy issues with bucketing users on small wikis. For more information, please refer to [T135762].

See also: https://docs.google.com/document/d/1jRGjVAthJXoCovxyvXWyg07R1POb8zvD_n8IlJXrPVM/

Server-side A/B testing is an issue that has come up every few months for the past couple of years (https://phabricator.wikimedia.org/T135762), and it has proven to be a deep and unfruitful rabbit hole.

Segmenting users on the client side

If possible, we should do client-side A/B testing. This removes the complexity of dealing with the Varnish caching layer, but it requires changes to the way we deliver frontend code to users, as we would need to lazy-load the UI for each treatment.
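The decision logic for client-side segmentation can be sketched as follows. In the browser this would be JavaScript; Python is used here only to keep all examples in one language, with a dict standing in for localStorage. The storage key and module names are hypothetical. Every user receives the same cached HTML, and the stored bucket only decides which extra UI code to fetch after the page loads.

```python
import random
from typing import Dict, MutableMapping

# Hypothetical mapping from treatment to the UI module to lazy-load.
TREATMENT_MODULES: Dict[str, str] = {
    "control": "ui.languageSwitcher.legacy",
    "treatment": "ui.languageSwitcher.new",
}
STORAGE_KEY = "abtest:language-switcher"  # hypothetical storage key


def choose_module(storage: MutableMapping[str, str],
                  rng: random.Random) -> str:
    """Pick the UI module to lazy-load on the client.

    The cached HTML is identical for everyone, so the caching layer needs no
    changes; the bucket lives in client-side storage (a dict stands in for
    localStorage here) and only decides which additional code to load.
    """
    bucket = storage.get(STORAGE_KEY)
    if bucket not in TREATMENT_MODULES:
        # First visit, or storage was cleared: assign and remember a bucket.
        bucket = rng.choice(sorted(TREATMENT_MODULES))
        storage[STORAGE_KEY] = bucket
    return TREATMENT_MODULES[bucket]
```

The trade-off is that the treatment UI appears only after the extra code loads, so users may briefly see the default layout before it changes.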

Research Results

Is it possible to provide all logged-out users with an opt-out option for the proposed improvements, and what is the relative difficulty?

It is possible, although very difficult. The recommendation is to avoid this for the time being.

Does this change depending on how many wikis we are testing on? (That is, what is the maximum number of pages for which we can split the cache at any given time, and for how long?)

Testing on smaller wikis is not recommended, but it is possible.

Is it possible to perform per-user or per-session A/B tests on logged-out users, and what is the relative difficulty?

The difficulty is similar to A/B testing all users: we would need to write much the same code to support the A/B testing. It is also tricky to pin a user to the same group throughout the entire experiment. The only way to pin users is with cookies, and those might not persist across multiple sessions. A/B testing a single session is easier, although serving a different experience in each browsing session is something we should avoid.
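A small simulation can make the pinning problem concrete. It assumes, purely for illustration, that a bucket cookie survives from one session to the next with a fixed probability; when it does not, the user looks new and is re-bucketed at random. The survival probability is a model parameter, not measured data, and the function name is hypothetical.

```python
import random


def fraction_switching(num_users: int, sessions_per_user: int,
                       cookie_survives: float, num_buckets: int = 2,
                       seed: int = 0) -> float:
    """Estimate the share of users who see more than one treatment.

    Each user starts in a random bucket stored in a cookie. At each new
    session the cookie survives with probability `cookie_survives`;
    otherwise the user is re-bucketed at random.
    """
    rng = random.Random(seed)
    switched = 0
    for _ in range(num_users):
        bucket = rng.randrange(num_buckets)
        seen = {bucket}
        for _ in range(sessions_per_user - 1):
            if rng.random() >= cookie_survives:
                # Cookie lost: the user is indistinguishable from a new visitor.
                bucket = rng.randrange(num_buckets)
            seen.add(bucket)
        if len(seen) > 1:
            switched += 1
    return switched / num_users
```

If cookies always survive, nobody switches; if they never survive, most users with five sessions in a two-way test end up seeing both treatments, which contaminates a per-user comparison.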

Does this change depending on how many wikis we are testing on? (That is, what is the maximum number of pages for which we can split the cache at any given time, and for how long?)

To my knowledge, no, as long as we test on small wikis.

Is it possible to do A/B/C testing for logged-out users?

Yes, but it's not recommended.

Is it possible to do A/B/C testing for logged-in users?

Yes. This is the only recommended option, as all requests are served by the backend and we can reliably keep users in their assigned buckets throughout the entire experiment.