Requests for comment/Zero architecture

This proposal was partially implemented. It's (allegedly) awaiting Wikimedia Foundation operations work on ESI which is blocked by a Varnish upgrade.

Historical architecture

 * Varnish used a set of IP blocks to determine the carrier and set headers
 * Very inefficient - linear search through all blocks on each mobile request


 * Zero extension shows carrier-specific banner based on the header
 * Zero and mobile extensions need to be decoupled


 * Zero Configuration was done on two en:MediaWiki:... pages
 * Partners could not make changes themselves - they had to go through managers and developers
 * Settings were hosted on English Wikipedia instead of meta
 * The process was error-prone; there was no validation of any sort

Current architecture (2013-12-19)

 * IP-to-carrier lookups are dynamic.
 * IP-to-proxy provider lookups are dynamic.
 * Whether Wikipedia Zero formatting is applicable is still determined to an extent by VCL hardcoding, though.
 * Zero extension shows carrier-specific banner based on the header
 * Zero and mobile extensions need to be decoupled


 * Zero configuration is mostly stored in access-controlled configuration blobs on meta-wiki.
 * Carriers familiar with JSON can in principle propose configuration changes, provided the Wikipedia Zero team has granted them access. The Wikipedia Zero team can then validate and approve such proposed configuration changes.
 * Data validation is pretty strong. The future system will need to do things like ARIN lookups for cross-referencing of IPs for basic "sanity checks".

Pipeline

 * Mark and Yuri talking with Brandon about a hashmapper to determine Wikipedia Zero applicability of incoming requests. This netmapper-like functionality would use data derived in a dynamic fashion from the Wikipedia Zero JSON configuration blobs. It's expected that this could be done in roughly January 2014.
 * Mark and Yuri talking with Brandon about backporting a Varnish patch. Likelihood of success to support ESI via this backport is low. An upgrade to Varnish 3.0.5 or higher may end up being required, which could take a while, probably post January 2014.

Partner configuration

 * DONE: Zero extension configurations will be stored as wiki pages in JSON format, one page per partner.
 * DONE: Config pages will reside on meta-wiki, in a dedicated namespace Zero:
 * DONE: Zero: namespace will be writable only by people in a dedicated security group
 * DONE: Custom content-handler will validate JSON structure on Save, and invalidate any related caches.
 * IN ESSENCE, DONE: Custom visualizer will show settings, duplicate/inefficient IP ranges, banner visualization, etc (TODO: more robust IP checking could be done)
 * NOT STARTED: As a further improvement, the config page could have a dedicated form editor to simplify changes by the partner (nice JSON editor in place, though).
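
The validate-on-save and duplicate-range checks listed above can be illustrated with a minimal Python sketch. It is not the extension's content handler (which is PHP): the `ips` key and the function name are assumptions made for illustration, and the real configuration format is not reproduced here.

```python
import ipaddress
import json


def validate_config(text):
    """Return a list of problems; an empty list means the blob passed these checks."""
    try:
        config = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return ["not valid JSON: %s" % exc]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    networks = []
    # "ips" is a hypothetical key holding the carrier's CIDR blocks.
    for raw in config.get("ips", []):
        try:
            networks.append(ipaddress.ip_network(raw))
        except ValueError:
            problems.append("bad IP range: %r" % raw)

    # Pairwise overlap check; fine for the small lists a single partner would have.
    for i, a in enumerate(networks):
        for b in networks[i + 1:]:
            if a.version == b.version and a.overlaps(b):
                problems.append("overlapping ranges: %s and %s" % (a, b))
    return problems
```

A save would be rejected whenever the returned list is non-empty; a visualizer could display the same messages next to the rendered settings.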

UPDATE 19-DECEMBER-2013: The following JSON configuration format has changed a little since actual implementation, but it has the same general characteristics.


 * TBD: PARTNER name might be localizable, and might be a link to custom URL
 * TBD: Allow HTML blob instead of banner+style

Zero extension

 * DONE: The extension will get the JSON blob from the meta page with an API call, process and verify it, and cache the result in memcached (Key = 'zero-config-' + ID). The ID will be taken from the X-CS header (set by Varnish).
 * DONE: Updating a Zero page will trigger a memcached reset for that ID
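
The caching behavior in the two items above can be sketched as follows. This is a Python illustration only: a plain dictionary stands in for memcached, and the `fetch` callable stands in for the API call to the meta page. The class and method names are assumptions; only the key format comes from the text.

```python
class ZeroConfigCache:
    """Read-through cache keyed by 'zero-config-' + carrier ID."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: carrier_id -> parsed config (stand-in for the API call)
        self._store = {}     # stand-in for memcached

    @staticmethod
    def key(carrier_id):
        return "zero-config-" + carrier_id

    def get(self, carrier_id):
        """Return the cached config, fetching it on a miss."""
        k = self.key(carrier_id)
        if k not in self._store:
            self._store[k] = self._fetch(carrier_id)
        return self._store[k]

    def invalidate(self, carrier_id):
        """Called when the carrier's Zero page is saved."""
        self._store.pop(self.key(carrier_id), None)
```

Because invalidation is per ID, editing one partner's page does not evict any other partner's cached configuration.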

Banners
Ideally, because banners are very small HTML blobs, they should be inserted using ESI (Edge Side Includes) by the Varnish caching servers. The only concern is the performance impact, which is currently unknown.

UPDATE 19-DECEMBER-2013: ESI support is in place in the ZeroRatedMobileAccess extension, but a Varnish bug is preventing use of this feature until Varnish is upgraded.

In case ESI proves to be too costly, we could implement banner insertion in a manner similar to the CentralNotice extension for JavaScript-capable clients, and use ESI only for the older ones. Varnish should be able to determine this based on the user agent. JavaScript does not have access to the response header's X-CS ID unless it was an AJAX call, so the client would make an AJAX HEAD or an empty-valued GET request to get the X-CS header, followed by a request to get carrier-specific JSON settings. We could even make a small API module to return only the needed HTML portion, without other settings like IP blocks (check if Varnish caches API calls). Both AJAX calls should be highly cacheable, hence only the HTML will be requested on subsequent calls.

Varnish

 * DONE: See related RT ticket


 * DONE (ACTUALLY MAKES AN API CALL FOR ONE SHOT) A (semi?-)automated script will iterate through all pages in Zero namespace, validate them, and extract the IP blocks.
 * DONE: A simple text file is created and uploaded to the servers
 * DONE: Varnish extension, similar to GeoIP, will load the config file and do an IP to ID lookup:
 * DONE (via netmapper invocation): fast search among IP ranges to get the carrier's ID (name of the page in the Zero namespace)
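
To illustrate why a sorted-range lookup beats the historical linear scan, here is a minimal Python sketch of an IP-to-ID lookup using binary search. It is not how netmapper is implemented; it assumes IPv4 only and non-overlapping blocks, and the carrier IDs and function names are hypothetical.

```python
import bisect
import ipaddress


def build_index(carrier_blocks):
    """Flatten {carrier_id: [cidr, ...]} into ranges sorted by start address."""
    ranges = []
    for carrier_id, cidrs in carrier_blocks.items():
        for cidr in cidrs:
            net = ipaddress.ip_network(cidr)
            ranges.append((int(net.network_address), int(net.broadcast_address), carrier_id))
    ranges.sort()
    starts = [r[0] for r in ranges]
    return starts, ranges


def lookup(index, ip):
    """Return the carrier ID whose block contains ip, or None."""
    starts, ranges = index
    addr = int(ipaddress.ip_address(ip))
    # Find the last range that starts at or before addr, then check its end.
    pos = bisect.bisect_right(starts, addr) - 1
    if pos >= 0 and ranges[pos][0] <= addr <= ranges[pos][1]:
        return ranges[pos][2]
    return None
```

Each lookup costs O(log n) comparisons instead of one comparison per configured block.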

Cache fragmentation improvement
See the Wikitech-l emails with ZERO Architecture in the subject line for more background.

Analytics Mingle story 444 may be a means of measuring JavaScript support in UAs, perhaps with a tracking pixel.

The Wikipedia Zero (Partners) software engineering team believes it can achieve performance and usability enhancements for the Wikipedia Zero experience with Edge Side Includes (ESI) and JavaScript DOM manipulation.

UPDATE 19-DECEMBER-2013: ESI support is in place in the ZeroRatedMobileAccess extension, but a bug is preventing use of this feature until Varnish is upgraded. Other than the banners, the HTML is the same for articles on URLs that are zero-rated by carriers (e.g., http://en.m.wikipedia.org/wiki/Main_Page has almost identical HTML for a carrier in Russia as compared with one in India). Results for things like "interstitial" URLs vary, though, and will continue to do so - these are generally located as URLs with Special:ZeroRatedMobileAccess (e.g., /wiki/Special:ZeroRatedMobileAccess?from=Main_Page&to=).

The problem
Wikipedia Zero webpages are served to users on mobile devices with participating mobile carriers. The number of Wikipedia Zero cached pages, in excess of non-Wikipedia Zero mobile-formatted pages, is roughly:

 cached_pages = 0
 foreach carrier c:
     cached_pages += c.one_or_two_subdomains_from_m_or_zero_subdomains * c.num_languages_supported

Carriers support Zero-rating of .zero.wikipedia.org, .m.wikipedia.org, or both. They also support up to ten customized free languages, otherwise they support all languages.
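
As a worked example of the estimate above, the following Python sketch evaluates the same sum for two made-up carriers. The carrier data and the dictionary keys are illustrative assumptions, not real partner figures.

```python
def extra_cached_pages(carriers):
    """Sum of (zero-rated subdomains) * (supported languages) over all carriers."""
    total = 0
    for c in carriers:
        total += c["subdomains"] * c["num_languages"]
    return total


# Hypothetical carriers: one zero-rates both subdomains in ten languages,
# the other zero-rates a single subdomain in three languages.
example = [
    {"subdomains": 2, "num_languages": 10},
    {"subdomains": 1, "num_languages": 3},
]
print(extra_cached_pages(example))  # 2*10 + 1*3 = 23
```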

The amplification of cached pages means more hits at the origin servers than wanted, meaning slower loading pages for Wikipedia Zero users. Furthermore, the current page caching scheme employed via Wikipedia Zero introduces a challenge to differentiating such Wikipedia Zero cached pages from non-Wikipedia Zero cached pages - a problem when the Wikipedia Zero team wants to purge the cache to modify aspects of the Wikipedia Zero experience without impacting other aspects of the Wikipedia mobile-formatted experience.

Coarse grained page elements
Wikipedia Zero pages are visually composed of several elements. For simplicity, here's a coarse grained breakdown excluding general mobile-formatted navigational and other JavaScript or CSS pieces:

Special:ZeroRatedMobileAccess:
 * Partner banners
 * Listed high-priority languages ("showLangs" and "langNameOverrides")
 * Language dropdown list composed of hyperlinks that differ based on Zero-rated supported languages ("whitelistedLangs")

Articles:
 * Partner banners
 * Article body
 * Read in Another Language section
 * General purpose footer (Legal, Privacy, and so on)

Partner banners are different depending on language (the language code in the domain name dictates the site which yields the language code) and the carrier (X-CS header).

Listed high-priority languages are different depending on the carrier (X-CS header yields "showLangs" and "langNameOverrides") and the subdomain (X-Subdomain of M or ZERO).

Language dropdown lists are different depending on the carrier (X-CS header yields "whitelistedLangs") and the subdomain (X-Subdomain of M or ZERO).

Article bodies are different depending on the URL (the entire URL, which includes the language code in the domain name), the carrier (X-CS header yields "whitelistedLangs"), and the subdomain (X-Subdomain of M or ZERO).

Read in Another Language sections are different depending on the URL (the entire URL, which includes the language code in the domain name), the carrier (X-CS header yields "whitelistedLangs"), and the subdomain (X-Subdomain of M or ZERO).

General purpose footers (Legal, Privacy, and so on) are different depending on language (specifically, the language code in the domain name).
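
The variance rules in the paragraphs above can be summarized as a lookup table. The Python sketch below derives a cache key for each element from only the request attributes it varies on. The element names, attribute names, and carrier IDs are assumptions chosen for illustration.

```python
# Which request attributes each page element varies on, per the text above.
VARIES_ON = {
    "banner": ("lang", "x_cs"),
    "high_priority_langs": ("x_cs", "x_subdomain"),
    "lang_dropdown": ("x_cs", "x_subdomain"),
    "article_body": ("url", "x_cs", "x_subdomain"),
    "read_in_another_language": ("url", "x_cs", "x_subdomain"),
    "footer": ("lang",),
}


def cache_key(element, request):
    """Build a cache key from only the attributes the element depends on."""
    return (element,) + tuple(request[dim] for dim in VARIES_ON[element])
```

Two requests for the same language from different carriers share one footer key but get distinct banner keys, which is why footers are the cheapest element to split out first.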

Cache duration
Currently, banners and footers can in effect end up being cached for up to 31 days on articles that have not been flushed from the Varnish cache. This is good for content that never changes, but is also inconvenient when things like banner wording actually need to be updated.

We don't see this as a problem if we can make cache flushes less painful.

Simplest to hardest
We recommended starting with the simplest possible conversion to ESI, and then, depending on what is learned, converting more and more elements to ESI in order of ascending complexity and risk. In the case of the actual articles, ESI may not be a good option, at least not yet, but use of a central redirector controller and JavaScript may serve as an equally good option.

In order to clearly distinguish Wikipedia Zero traffic from non-Wikipedia Zero traffic, an auxiliary header of X-WZ may need to be added within Varnish and used for coarse grained cache variance. Alternatively, the presence of an X-CS header may be a sufficient factor for basic cache variance differentiating Wikipedia Zero access from general mobile web access.

UPDATE 19-DECEMBER-2013: We ended up making the HTML identical across pages on mdot and across zerodot. Currently, only the banners are different, and once ESI is supported there wouldn't be caching on a per-carrier basis, but rather on an "is Wikipedia Zero supported" basis. Special:ZeroRatedMobileAccess URLs will continue to be different cached objects on a per-carrier basis.

Footers
Footers vary on the language alone. An ESI fragment of the following form would be easiest: http:// .wikipedia.org/wiki/Special:ZeroRatedMobileAccess?generatefooter=please (or if it is easier due to existing hook architecture, http:// .m.wikipedia.org/wiki/Special:ZeroRatedMobileAccess?generatefooter=please ). The backing code would only need to identify the language of the request in order to generate a suitable footer.

Footers can retain their 31 day cache lifetime. In the case that a footer needs to be flushed from cache, it will be easiest to issue a cache purge for all objects matching /wiki/Special:ZeroRatedMobileAccess?generatefooter=please and let the footers naturally be re-populated as inbound requests are made.
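
The targeted purge described above can be modeled in a few lines of Python. This is only a model of the matching rule, with a dictionary standing in for the Varnish object cache; it is not Varnish purge or ban syntax, and the function names are assumptions.

```python
from urllib.parse import parse_qs, urlsplit

FRAGMENT_PATH = "/wiki/Special:ZeroRatedMobileAccess"


def is_fragment(url, param):
    """True if url is a fragment request such as ...?generatefooter=please."""
    parts = urlsplit(url)
    return parts.path == FRAGMENT_PATH and parse_qs(parts.query).get(param) == ["please"]


def purge(cache, param):
    """Remove every cached fragment of one kind; return how many were dropped."""
    doomed = [url for url in cache if is_fragment(url, param)]
    for url in doomed:
        del cache[url]
    return len(doomed)
```

Calling `purge(cache, "generatefooter")` leaves cached articles and other fragment types untouched, which is the property that makes fragment-level flushes less painful than flushing whole pages.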

UPDATE 19-DECEMBER-2013: We ended up making the HTML for footers identical across pages on mdot and across zerodot.

Language dropdown lists
Language dropdown lists vary on X-Subdomain and X-CS. Special:ZeroRatedMobileAccess will already have access to both headers, and therefore all information necessary for the ESI fragment is already present.

http://en.wikipedia.org/wiki/Special:ZeroRatedMobileAccess?generatelangdropdown=please

Headers...

X-CS:

X-Subdomain: 

Dropdowns can retain their 31 day cache lifetime. In the case that a dropdown needs to be flushed from cache, it will be easiest to issue a cache purge for all objects matching /wiki/Special:ZeroRatedMobileAccess?generatelangdropdown=please and let the dropdowns naturally be re-populated as inbound requests are made.

As you'll see below, we may be able to actually simplify to one dropdown list, but this may still be a good place for use of ESI.

UPDATE 19-DECEMBER-2013: We ended up making the HTML for the dropdown list identical across pages on mdot and across zerodot.

Listed high priority languages
Listed high priority languages are similar to language dropdowns:

http://en.wikipedia.org/wiki/Special:ZeroRatedMobileAccess?generateshowedlangs=please

Headers...

X-CS:

X-Subdomain: 

Listed high priority languages can retain their 31 day cache lifetime. In the case that a high priority languages listing needs to be flushed from cache, it will be easiest to issue a cache purge for all objects matching /wiki/Special:ZeroRatedMobileAccess?generateshowedlangs=please and let the high priority languages listings naturally be re-populated as inbound requests are made.

UPDATE 19-DECEMBER-2013: Special:ZeroRatedMobileAccess (this is the "landing page") will continue to vary based upon the carrier on which the user is accessing the site.

Partner banners
Partner banners vary on language and X-CS. Banners should be requested in an ESI fragment with a URL that includes the language code and which includes an X-CS header.

ZeroRatedMobileAccess-generated pages already have access to the X-CS header, and the language will by definition also be known because of the enclosing page, and therefore all information necessary for the ESI fragment is already present.

http:// .wikipedia.org/wiki/Special:ZeroRatedMobileAccess?generatebanner=please

Header...

X-CS:

Banners can retain their 31 day cache lifetime. In the case that a banner needs to be flushed from cache, it will be easiest to issue a cache purge for all objects matching /wiki/Special:ZeroRatedMobileAccess?generatebanner=please and let the banners naturally be re-populated as inbound requests are made.
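
For illustration, a page that delegates its banner to an ESI fragment could be produced roughly as in the Python sketch below. The `<esi:include>` element is standard ESI markup; the helper names are assumptions. The host part of the fragment URL is omitted, so the include is relative to the enclosing page's host, and the X-CS header is assumed to be added by Varnish when it fetches the fragment.

```python
from html import escape

BANNER_FRAGMENT = "/wiki/Special:ZeroRatedMobileAccess?generatebanner=please"


def esi_include(src):
    """Return an ESI include element for the given fragment URL."""
    return '<esi:include src="%s"/>' % escape(src, quote=True)


def render_page(body_html):
    """Emit carrier-neutral page HTML with a placeholder where the banner goes."""
    return esi_include(BANNER_FRAGMENT) + "\n" + body_html
```

With this arrangement the enclosing page HTML is identical for every carrier, and only the small banner fragment is cached per language and X-CS value.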

Articles and Read in Another Language Sections
Currently, articles and their corresponding Read in Another Language sections are rewritten to support users upgrading their experience to (1) be able to see an image while browsing on a zero.wikipedia.org domain, (2) switch from the text and icons-only view of a zero.wikipedia.org subdomain to the image-rich m.wikipedia.org subdomain experience (which may or may not be zero-rated at a carrier), and (3) to visit external sites.

The rules for the server's rewriting of images and hyperlinks in articles and Read in Another Language sections are based upon whether the carrier supports zero.wikipedia.org, m.wikipedia.org, or both, as well as the array of languages supported freely by the carrier. We believe we can reduce the cache fragmentation by transitioning to one canonical mobile article (and Read in Another Language section) per language, relying upon JavaScript to rewrite hyperlinks for upgrade transitions. But this does not address non-JavaScript, insufficient JavaScript, or JavaScript-disabled browsing and the corresponding cache fragmentation that would still be present with the current server code.

We believe the simplest solution to reducing cache fragmentation for article and Read in Another Language content is to convert all article and Read in Another Language links that do not point directly to a same-origin article (specifically, hyperlinks that are non-redirects are strictly relative paths) served to Wikipedia Zero-participating UAs to one consistent redirect URL endpoint. The redirect URL endpoint would issue a 302 for redirects (301s may inadvertently send a user to the wrong page if the 301 is cached per spec, given that Vary: headers may be misinterpreted by a UA or impossible for the UA to actually handle because it won't have access to all request data in the network stream between server nodes).

For example, instead of a relative link to ?renderZeroRatedRedirect=true&returnto=http%3A%2F%2Fwww.mlb.com%2F to visit an external site, such a hyperlink would instead be set as /wiki/Special:ZeroRatedMobileAccess?redirect=http%3A%2F%2Fwww.mlb.com%2F. And a relative link such as /w/index.php?title=Major_League_Baseball&renderZeroRatedBanner=true&renderZeroRatedRedirect=true&returnto=%2F%2Fda.m.wikipedia.org%2Fwiki%2FMajor_League_Baseball would instead be written as /wiki/Special:ZeroRatedMobileAccess?redirect=%2F%2Fda.m.wikipedia.org%2Fwiki%2FMajor_League_Baseball.

The redirect endpoint would examine the content of the redirect parameter, and depending on carrier support would either render an appropriate dialogue for upgrading (or, for links accessed outside of Wikipedia Zero networks, for simply continuing) or immediately redirect the user to the same-origin resource. In case of the need to flush the redirect cache for Wikipedia Zero, requests of the form /wiki/Special:ZeroRatedMobileAccess?redirect could be flushed, and the cache could be repopulated as clicks are accumulated. Although the total number of cached objects could remain the same, the ability to flush the cache would be dramatically improved and the size of those cached redirect objects would be slight compared to full pages.
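
The link form used in the examples above amounts to percent-encoding the whole target and appending it to a fixed endpoint. A minimal Python sketch follows; the function name is an assumption, and this is not the extension's PHP code.

```python
from urllib.parse import quote

REDIRECT_ENDPOINT = "/wiki/Special:ZeroRatedMobileAccess?redirect="


def to_redirect_link(target):
    """Wrap an external or cross-domain target in the single redirect endpoint."""
    # safe="" makes quote() encode "/" and ":" too, matching the examples above.
    return REDIRECT_ENDPOINT + quote(target, safe="")


print(to_redirect_link("http://www.mlb.com/"))
print(to_redirect_link("//da.m.wikipedia.org/wiki/Major_League_Baseball"))
```

Because the rewritten link no longer depends on the carrier, the same article HTML can be served to every Wikipedia Zero user of a given language.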

For user agents that support JavaScript-driven link rewriting, such user agents could rewrite such URLs at page runtime by:
 * 1) Making a request to an endpoint to obtain JSON data providing carrier preferences on free languages, M versus ZERO support, and so forth. This endpoint could be something to the effect of http://en.wikipedia.org/wiki/Special:ZeroRatedMobileAccess?generateprefs=please . The X-CS header augmentation from Varnish should be present for such requests, making the lookup straightforward.
 * 2) Using the received preferences to walk the DOM and rewrite hyperlinks and set onClick actions for appropriate dialogs.
 * 3) Taking this concept to its logical extreme, it's actually conceivable that for UAs sufficiently supporting JavaScript, the article and Read in Another Language page content could actually be identical to the non-Wikipedia Zero mobile-formatted experience, further reducing the mobile-formatted cache. This said, the extra work involved in trying to shoehorn Wikipedia Zero for JS UAs into the exact same experience as general mobile users may be a later stage change involving more complicated use of ESI fragments. And to be clear, it doesn't make sense to try to shoehorn all of the mobile-formatted experience into Wikipedia Zero (i.e., hyperlinks should not have redirect parameters in them outside of the Wikipedia Zero experience).
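
The decision made in step 2 can be sketched independently of the DOM. The Python model below classifies a link as safe to follow directly or as needing an interstitial dialog, given carrier preferences. The production logic would run in JavaScript in the browser. The `sites` key and the treatment of an empty `whitelistedLangs` list as "all languages" are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def classify_link(href, page_host, prefs):
    """Return 'direct' for zero-rated targets, 'interstitial' otherwise."""
    host = urlsplit(href).netloc
    if not host or host == page_host:
        return "direct"  # relative or same-origin article link
    labels = host.split(".")
    is_mobile_wikipedia = (
        len(labels) == 4
        and labels[2:] == ["wikipedia", "org"]
        and labels[1] in ("m", "zero")
    )
    if is_mobile_wikipedia:
        lang, site = labels[0], labels[1]
        langs = prefs["whitelistedLangs"]
        if site in prefs["sites"] and (not langs or lang in langs):
            return "direct"
    return "interstitial"  # external site or a non-zero-rated subdomain/language
```

A DOM walker would call such a function for each hyperlink and attach a click handler only to the links classified as `interstitial`.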

Image tags still pose a challenge for HTML rewrite behavior in that they must be rewritten as interstitial links for users on subdomains of zero.wikipedia.org. One potential option is to replace each image tag with an ESI tag (the solution could possibly be coded to use configuration to determine whether or not to emit an ESI fragment). This would shrink the object cache size, and the delta for number of objects would depend on the number of images across accessed articles; unfortunately, Varnish ESIs suffer performance impacts around 5 or more objects per page. Another solution would involve Varnish rewriting image tags on the fly for outbound responses for Wikipedia Zero; unfortunately, this is viewed as misuse of Varnish where PHP would do. The simplest option is to have different page HTML between m.wikipedia.org and zero.wikipedia.org (i.e., varying on X-Subdomain).

Article hyperlinks on the same domain don't require redirect behavior through URL rewriting.

By standardizing all of the other hyperlinks, though, the object cache size can be shrunk down drastically in a given language on the Wikipedia Zero experience.

UPDATE 19-DECEMBER-2013: We went ahead with this style.

Order of work and dependencies
DONE, but pending Varnish upgrade for full effect: The ESI fragment generation code and the server-based redirector endpoint code (/wiki/Special:ZeroRatedMobileAccess?redirect=) can be worked on concurrently.

DONE: JavaScript rewriting of hyperlinks depends upon (1) the comprehensive change to redirect hyperlinks in HTML and (2) a well-functioning redirector endpoint.