Reading/Web/Desktop Improvements/AB testing and Opt Out possibilities

We would like to explore the feasibility of providing opt-out and A/B testing capabilities to logged-out users throughout the course of the Desktop Improvements project.

User stories
As a product team, we would like to A/B test Desktop Improvements changes (such as changing the location of the language switcher) on all users (logged-in and logged-out), so that we can determine whether each change has a positive or negative impact. At this time, these changes would involve sending different HTML to different users from the server side.

As a product team, we would like to allow all users (logged-in and logged-out) to opt out of the new experience, so that we can measure the retention rate for the new treatment.

Logged-in users
A/B testing logged-in users is straightforward. We can easily do it on the server side, and no caching layer changes are needed, because all logged-in users' requests bypass the cache.

The only work required is on the backend (PHP).
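A minimal sketch of what deterministic server-side bucketing for logged-in users could look like, assuming the bucket is derived by hashing a stable user ID together with an experiment name. The function `bucket_for_user`, the experiment name, and the hashing scheme are hypothetical and not an existing MediaWiki API; the real implementation would live in the PHP backend, and Python is used here only for illustration.

```python
import hashlib
from typing import Sequence


def bucket_for_user(user_id: int, experiment: str, buckets: Sequence[str]) -> str:
    """Deterministically map a logged-in user to one of `buckets`.

    Hypothetical helper: hashing (experiment, user_id) means the same user
    always lands in the same bucket for a given experiment, without storing
    any state, while different experiments get independent splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer
    # and reduce it modulo the number of buckets.
    value = int.from_bytes(digest[:8], "big")
    return buckets[value % len(buckets)]


if __name__ == "__main__":
    groups = ["control", "treatment"]
    first = bucket_for_user(12345, "language-switcher", groups)
    again = bucket_for_user(12345, "language-switcher", groups)
    assert first == again  # the assignment is stable across requests
    print(first)
```

Because logged-in requests already skip the cache, the backend could call such a function on every page view and render the matching variant, and the user keeps the same group for the whole experiment.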

Logged-out users
There are several common problems with A/B testing logged-out users in the MediaWiki ecosystem. The first is that, since there is no logged-in user to identify, bucketing users into different treatments is only possible via a cookie or a localStorage setting.
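A minimal sketch of cookie-based bucketing for a logged-out user, assuming a hypothetical cookie named `abBucket`: if the request already carries a valid bucket, it is reused; otherwise one is picked at random and a `Set-Cookie` value is returned so that later requests land in the same group. It uses Python's standard `http.cookies` module; the cookie name and the helper are illustrative, not existing MediaWiki code.

```python
import random
from http.cookies import SimpleCookie
from typing import Optional, Sequence, Tuple

COOKIE_NAME = "abBucket"  # hypothetical cookie name


def assign_bucket(
    cookie_header: str,
    buckets: Sequence[str],
    rng: random.Random,
) -> Tuple[str, Optional[str]]:
    """Return (bucket, Set-Cookie value or None) for an anonymous request."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    existing = jar.get(COOKIE_NAME)
    if existing is not None and existing.value in buckets:
        # Already bucketed: keep the same group, no new cookie needed.
        return existing.value, None
    # No valid cookie yet: assign uniformly at random and ask the
    # browser to remember the choice.
    bucket = rng.choice(list(buckets))
    out = SimpleCookie()
    out[COOKIE_NAME] = bucket
    out[COOKIE_NAME]["path"] = "/"
    return bucket, out[COOKIE_NAME].OutputString()
```

Any response that depends on this cookie can no longer be shared verbatim between all anonymous users, which is where the caching problems start.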

Without changing how MediaWiki composes a page, this case is complicated to test, because anonymous requests are cached in Varnish. The backend servers generate a response, and all users browsing the same page see that same cached response, that is, the same HTML.

To A/B test logged-out users, we need some mechanism, e.g. a cookie, to either bypass the cache or split the cached responses. There are three main problems with A/B testing anonymous users:


 * privacy concerns, as we need to mark unique user sessions with some token across all requests
 * additional complexity on the Varnish side
 * a successful A/B test requires that a single user stays assigned to the same group for the whole experiment, which is not possible when A/B testing is done on the Varnish/backend side.
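To illustrate what splitting the cached responses means, here is a sketch of a cache key that includes the bucket, conceptually similar to varying on a cookie (Varnish itself is configured in VCL, not Python). The helpers `cache_key` and `cached_variants` are hypothetical. Every extra bucket multiplies the number of objects the cache would have to hold for each page.

```python
from typing import Optional, Sequence


def cache_key(url: str, bucket: Optional[str]) -> str:
    """Hypothetical cache key: the URL, plus the A/B bucket when present."""
    return url if bucket is None else f"{url}|bucket={bucket}"


def cached_variants(pages: Sequence[str], buckets: Sequence[str]) -> int:
    """Number of distinct cache objects needed to hold every page in every bucket."""
    keys = {cache_key(page, bucket) for page in pages for bucket in buckets}
    return len(keys)


if __name__ == "__main__":
    pages = ["/wiki/Apple", "/wiki/Banana", "/wiki/Cherry"]
    print(cached_variants(pages, ["control"]))               # 3
    print(cached_variants(pages, ["control", "treatment"]))  # 6
```

With two buckets, each page needs two cache objects instead of one, so each object receives roughly half the traffic it would otherwise get.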

There is an additional way we could test logged-out users: switch the default skin to the new one and monitor their behavior over a short period. After changing the skin, we would need to wipe the entire Varnish cache for the given wiki. On small wikis this is not a problem, but on English Wikipedia it requires more planning, as wiping the Varnish cache can lead to service unavailability.

In 2016 we already tried to build an A/B testing framework. However, the idea was abandoned, mostly because it is not possible to pin a given user to one group for the duration of the experiment, and because of the additional complexity on the caching layer. For more information, please refer to T135762.

See also https://docs.google.com/document/d/1jRGjVAthJXoCovxyvXWyg07R1POb8zvD_n8IlJXrPVM/

All previous attempts at server-side A/B testing have been part of a

> seasonal issue that's come up every few months for the past couple of years.

and each has proven to be a pretty deep and unfruitful rabbit hole.

If possible, we should do client-side A/B testing, as this removes the complexity of dealing with the Varnish caching layer.
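In a client-side approach, every logged-out user can receive the same cached HTML, and a script on the page decides which variant to show, for example by storing a bucket in `localStorage`. In a browser this would be JavaScript; the sketch below only models the decision logic in Python, with a dict standing in for `localStorage`. The storage key and function name are hypothetical.

```python
import random
from typing import MutableMapping, Sequence

STORAGE_KEY = "desktopImprovements-abBucket"  # hypothetical localStorage key


def client_side_variant(
    storage: MutableMapping[str, str],
    buckets: Sequence[str],
    rng: random.Random,
) -> str:
    """Pick the variant on the client; the cached server HTML never changes.

    `storage` stands in for the browser's localStorage: the bucket is chosen
    once, persisted, and reused on later page views in the same browser.
    """
    bucket = storage.get(STORAGE_KEY)
    if bucket not in buckets:
        # Missing or stale value: choose a bucket and persist it.
        bucket = rng.choice(list(buckets))
        storage[STORAGE_KEY] = bucket
    return bucket


if __name__ == "__main__":
    local_storage: dict = {}
    rng = random.Random(42)
    first = client_side_variant(local_storage, ["control", "treatment"], rng)
    second = client_side_variant(local_storage, ["control", "treatment"], rng)
    assert first == second  # the same browser keeps its bucket
```

Because the bucket lives only in the browser, the cache layer is untouched. The trade-offs are that the variant is typically applied after the page loads, and a user who clears storage or switches browsers is bucketed again.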

Research Results
Is it possible, and what is the relative difficulty, of providing all logged-out users with an opt-out option for the proposed improvements?

It is possible, although very difficult. Our recommendation is to avoid this for the time being.

'''Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)'''

Testing on smaller wikis is not recommended, but it is possible.

Is it possible, and what is the relative difficulty, of performing per-user or per-session A/B tests on logged-out users?

This is similar to the case above: we would need much the same code to support A/B testing. Also, it is tricky to pin a user to the same group throughout the entire experiment. The only way to pin users is with cookies, and those might not persist across multiple sessions. A/B testing a single session is easier, although serving a different experience in each browsing session is something we should avoid.
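The difference between a per-session and a per-user test largely comes down to the bucket cookie's lifetime. A cookie without `Expires`/`Max-Age` is a session cookie that browsers typically discard when the session ends, while one with `Max-Age` set to the experiment's length can survive across sessions, although users can still clear it at any time. A sketch using Python's `http.cookies`; the cookie name, the helper, and the 30-day duration are assumptions for illustration.

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "abBucket"  # hypothetical cookie name


def bucket_cookie(bucket: str, max_age_seconds: Optional[int]) -> str:
    """Build a Set-Cookie value for the bucket (hypothetical helper).

    max_age_seconds=None -> session cookie (per-session test).
    max_age_seconds=N    -> persistent cookie lasting N seconds (per-user test,
                            as long as the user does not clear cookies).
    """
    jar = SimpleCookie()
    jar[COOKIE_NAME] = bucket
    jar[COOKIE_NAME]["path"] = "/"
    if max_age_seconds is not None:
        jar[COOKIE_NAME]["max-age"] = max_age_seconds
    return jar[COOKIE_NAME].OutputString()


if __name__ == "__main__":
    thirty_days = 30 * 24 * 60 * 60
    print(bucket_cookie("treatment", None))         # session cookie
    print(bucket_cookie("treatment", thirty_days))  # lasts for the experiment
```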

'''Does this change based on how many wikis we are testing on? (i.e. what is the maximum number of pages that we can split the cache for at any given time, and for how long?)'''

To my knowledge, no, as long as we test on small wikis.

Is it possible to do A/B/C testing for logged-out users?

Yes, but it's not recommended.

Is it possible to do A/B/C testing for logged-in users?

Yes. This is the only recommended approach, as all requests are served by the backend and we can properly assign users to buckets throughout the entire experiment.
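Since logged-in assignments can be computed on the backend for every request, extending a test to three or more groups, or to uneven splits, is mostly a matter of mapping a hash of the user ID onto cumulative weights. A sketch under that assumption; the function and experiment names are hypothetical, Python is used for illustration only, and the 80/10/10 split is an arbitrary example.

```python
import hashlib
from typing import Sequence, Tuple


def weighted_bucket(
    user_id: int, experiment: str, weights: Sequence[Tuple[str, int]]
) -> str:
    """Deterministically assign a user to a bucket according to integer weights.

    Hypothetical helper. `weights` is e.g. [("control", 80), ("A", 10), ("B", 10)];
    the weights need not sum to 100, only to a positive total.
    """
    total = sum(w for _, w in weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # Walk the cumulative weights until we pass the user's point.
    cumulative = 0
    for name, weight in weights:
        cumulative += weight
        if point < cumulative:
            return name
    raise AssertionError("unreachable: point is always below the total weight")


if __name__ == "__main__":
    split = [("control", 80), ("A", 10), ("B", 10)]
    print(weighted_bucket(12345, "language-switcher", split))
```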