Wikipedia Zero/Accept-Language Aware Redirects

From mediawiki.org

This page tracks engineering work on Accept-Language-aware redirects for the mobile web experience on Wikipedia.

A patch has been implemented to enable Accept-Language-aware redirects from the "webroot" of "mdot" and "zerodot" Wikipedia for traffic originating from known zero-rated operator networks (e.g., where Wikipedia Zero header tagging is in force). By the "webroot" of "mdot" and "zerodot" we mean http://m.wikipedia.org/, https://m.wikipedia.org/, http://zero.wikipedia.org/, and https://zero.wikipedia.org/.
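To make the definition concrete, here is a small Python sketch that checks whether a URL is one of those four webroots. It is an illustration only. This page does not describe how the patch actually matches requests, and treating any path other than "/" (or an empty path) as a non-webroot request is an assumption.

```python
from urllib.parse import urlsplit

# Hosts whose bare root counts as the "webroot" of mdot and zerodot.
WEBROOT_HOSTS = {"m.wikipedia.org", "zero.wikipedia.org"}


def is_mobile_webroot(url: str) -> bool:
    """Return True if url is the root of mdot or zerodot over HTTP or HTTPS."""
    parts = urlsplit(url)
    return (
        parts.scheme in ("http", "https")
        # urlsplit().hostname is already lowercased (or None if absent).
        and (parts.hostname or "") in WEBROOT_HOSTS
        and parts.path in ("", "/")
    )


if __name__ == "__main__":
    for url in (
        "http://m.wikipedia.org/",
        "https://zero.wikipedia.org/",
        "https://en.m.wikipedia.org/",
        "https://m.wikipedia.org/wiki/Special:Random",
    ):
        print(url, is_mobile_webroot(url))
```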

Here's the patch: https://gerrit.wikimedia.org/r/#/c/169210/

If this goes well for the zero-rated experience (and so far it seems to), we'll look at applying it to the non-zero-rated mdot experience for Wikipedia as well. Until now, the non-zero-rated mdot Wikipedia experience has, as a general rule, redirected the user to the English main page at en.m.wikipedia.org. This happens even though about 47% of users visiting that webroot do not have English as their primary language (n.b., the www.wikipedia.org global homepage portal is actually more skewed toward English and, moreover, lists all Wikipedia languages). Even in the zero-rated context, the user's Accept-Language was not taken into account before the patch. Instead, the user was either shown a list of top languages to choose from (usually acceptable, but not ideal) or redirected to the main page of a single language (e.g., English or French). The patch aims to remove some of these cumbersome steps for the user.

Here's how the initial patch works:

To enable Accept-Language-aware redirects, partner tech managers and software engineers who edit articles in the Zero: namespace on https://zero.wikimedia.org/ need to do the following:

  1. Set the showZeroPage value to false.
  2. Add to the showLangs array the codes of the languages whose main pages users should be redirected to. Also ensure that X-CS tagging is enforced in the content accelerator configuration.
  3. The behavior that will result is as follows:

       Given the user is accessing Wikipedia
       When the user visits the "webroot" of "mdot" or "zerodot"
       And the operator is configured to not show the language landing page
       And the user's Accept-Language contains a language known to be zero-rated
       Then the user is redirected to the main page for the most relevant Accept-Language prefix

       Given the user is accessing Wikipedia
       When the user visits the "webroot" of "mdot" or "zerodot"
       And the operator is configured to show the language landing page
       And the user's Accept-Language contains a language known to be zero-rated
       Then the user is redirected to the landing page using the most relevant Accept-Language prefix
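As a rough illustration of the configuration and the two scenarios above, the Python sketch below reads a partner configuration and maps showZeroPage onto the kind of page the webroot redirect targets. It is not the patch's code. The JSON excerpt is hypothetical, and only the showZeroPage and showLangs field names come from this page. The returned labels ("main_page", "landing_page") are also made up for illustration.

```python
import json

# Hypothetical excerpt of a partner's Zero: article. Only the showZeroPage and
# showLangs field names come from this page; the real schema has more fields.
EXAMPLE_CONFIG = json.loads('{"showZeroPage": false, "showLangs": ["fr", "en"]}')


def webroot_destination(config: dict, lang: str) -> tuple[str, str]:
    """Map the two scenarios to a (page kind, language) pair.

    `lang` is assumed to already be the most relevant zero-rated
    Accept-Language prefix; picking it is a separate step.
    """
    if config["showZeroPage"]:
        # Second scenario: the operator shows the language landing page.
        return ("landing_page", lang)
    # First scenario: skip the landing page and go to that language's main page.
    return ("main_page", lang)


if __name__ == "__main__":
    print(webroot_destination(EXAMPLE_CONFIG, "fr"))  # ('main_page', 'fr')
```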

The patch prefers to redirect the user to a language that appears both in the user's Accept-Language header and in the showLangs field (in ranked order). Short-circuit logic performs a faster redirect when the two lists overlap. If they don't overlap but the user's language is zero-rated, the user is redirected to their highest-ranked language that is known to exist as a Wikipedia language. If the user's language is not zero-rated, the user is redirected to the first language in showLangs.
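The selection order described above can be sketched in Python as follows. This is a hedged approximation, not the patch's logic. The Accept-Language parsing follows the general HTTP header format: it honors q-values, drops the `*` wildcard, and reduces tags such as `pt-BR` to their primary subtag. The `zero_rated` and `wikipedia_langs` sets are hypothetical stand-ins for however the patch determines which languages are zero-rated and which exist as Wikipedias.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language prefixes from an Accept-Language header, best first.

    Tags are reduced to their primary subtag ("pt-BR" -> "pt"); q=0 entries
    and the "*" wildcard are dropped. Ties keep header order (sort is stable).
    """
    weighted = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0
        prefix = tag.strip().lower().split("-")[0]
        if prefix and prefix != "*" and q > 0:
            weighted.append((q, prefix))
    ranked: list[str] = []
    for _, prefix in sorted(weighted, key=lambda pair: -pair[0]):
        if prefix not in ranked:  # keep only the highest-weighted occurrence
            ranked.append(prefix)
    return ranked


def choose_language(ranked, show_langs, zero_rated, wikipedia_langs) -> str:
    # 1. Short-circuit: best-ranked user language that is also in showLangs.
    for lang in ranked:
        if lang in show_langs:
            return lang
    # 2. No overlap: best-ranked user language that is zero-rated and
    #    exists as a Wikipedia language.
    for lang in ranked:
        if lang in zero_rated and lang in wikipedia_langs:
            return lang
    # 3. Otherwise fall back to the first language in showLangs.
    return show_langs[0]


if __name__ == "__main__":
    ranked = parse_accept_language("sw, fr;q=0.8, en;q=0.5")
    print(ranked)  # ['sw', 'fr', 'en']
    print(choose_language(ranked, ["en", "fr"], {"en", "fr", "sw"}, {"en", "fr", "sw"}))  # fr
```

Reducing every tag to its primary subtag is a simplification. Some Wikipedia language codes (for example `zh-yue` or `be-tarask`) contain hyphens, so a production matcher would need to try the full tag before falling back to the prefix.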

Under the hood, the user is first redirected to http://<preferred_lang>.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess, and that page then redirects the user to the correct destination.
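A sketch of building that first-hop URL follows. It rests on two assumptions that this page does not state: the redirect keeps the scheme of the incoming request, and language codes are validated before they are placed in a hostname. `first_hop_url` and `LANG_CODE` are hypothetical names.

```python
import re

SPECIAL_PAGE = "Special:ZeroRatedMobileAccess"

# Hypothetical guard: the language code becomes part of a hostname, so only
# lowercase letters, digits, and inner hyphens (e.g. "zh-yue") are accepted.
LANG_CODE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def first_hop_url(lang: str, scheme: str = "http") -> str:
    """Build the intermediate redirect target on <lang>.m.wikipedia.org."""
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not LANG_CODE.fullmatch(lang):
        raise ValueError(f"invalid language code: {lang!r}")
    return f"{scheme}://{lang}.m.wikipedia.org/wiki/{SPECIAL_PAGE}"


if __name__ == "__main__":
    print(first_hop_url("sw"))
    # http://sw.m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess
```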

For the behavior when the user visits Special:ZeroRatedMobileAccess without any meaningful query parameters in the URL, see the makeRedirectInfo() method in PageRendering.php, around the check for ( $toUrl === null || $from === null ). It would be trivial to modify this part of the code to send the user to the main page for a given language, provided that language is zero-rated. However, it would be more efficient to send the user from the webroot directly to that language's main page. Doing so avoids a proliferation of unnecessary objects stored at the /wiki/Special:ZeroRatedMobileAccess endpoint (and its localized versions). It also confines redirect cache object growth to the webroot alone and minimizes harder-to-diagnose redirect bugs, should any ever occur.
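For comparison, the direct alternative can be sketched as below. When the landing page is off and the language's main page title is known, the webroot redirects straight to the main page in one hop. Otherwise it falls back to the special page. `MAIN_PAGE_TITLES` and `webroot_redirect` are hypothetical. A real implementation would need each wiki's configured main page title rather than a hard-coded table, and it would also have to apply the zero-rating check mentioned above.

```python
from urllib.parse import quote

# Hypothetical lookup of main page titles per language; a real implementation
# would need each wiki's configured main page title instead of this table.
MAIN_PAGE_TITLES = {"en": "Main_Page"}


def webroot_redirect(lang: str, show_zero_page: bool, scheme: str = "http") -> str:
    """Pick the Location for a webroot redirect, preferring a single hop."""
    base = f"{scheme}://{lang}.m.wikipedia.org/wiki/"
    title = MAIN_PAGE_TITLES.get(lang)
    if not show_zero_page and title is not None:
        # Direct path: one hop, so the only cached redirect lives at the webroot.
        return base + quote(title, safe=":")
    # Fallback: go through the special page, which redirects a second time.
    return base + "Special:ZeroRatedMobileAccess"


if __name__ == "__main__":
    print(webroot_redirect("en", show_zero_page=False))
    # http://en.m.wikipedia.org/wiki/Main_Page
```

The direct path saves a hop and keeps redirect cache objects at the webroot. The cost is that the webroot logic must know each language's main page title.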