Requests for comment/Unified Zero design

Background
To significantly reduce Varnish cache fragmentation and configuration complexity, the Zero team would like to unify the HTML served to the users of all Zero partners. The banner will be replaced by a <script> tag that includes a dynamically generated, carrier-specific document.write banner or, for browsers without script support, by a dynamically generated GIF image.

Earlier discussion and the options considered can be found at Requests_for_comment/Zero_Architecture and Requests_for_comment/Data-driven_Zero_Varnish_Configuration.

Problem
Wikipedia Zero webpages are served to users on mobile devices with participating mobile carriers. The number of Wikipedia Zero cached pages, in excess of non-Wikipedia Zero mobile-formatted pages, is roughly:

 cached_pages = 0
 foreach carrier c:
     cached_pages += c.one_or_two_subdomains_from_m_or_zero_subdomains * c.num_languages_supported

Each carrier zero-rates .zero.wikipedia.org, .m.wikipedia.org, or both. A carrier either whitelists up to ten customized free languages or supports all languages.
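As a rough illustration of the estimate above, the following Python sketch computes the number of extra cached variants for a set of carriers. The `Carrier` record, its field names, and the `total_languages` parameter are hypothetical and exist only for this example; they are not part of the Zero configuration format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Carrier:
    # Hypothetical record used only for this illustration.
    subdomains: FrozenSet[str]      # subset of {"m", "zero"}
    free_languages: Optional[int]   # None means "all languages"


def extra_cached_variants(carriers: Iterable[Carrier], total_languages: int) -> int:
    """Sum subdomains * languages over all carriers, as in the estimate above."""
    total = 0
    for c in carriers:
        # A carrier without a whitelist supports every language.
        languages = total_languages if c.free_languages is None else c.free_languages
        total += len(c.subdomains) * languages
    return total
```

For example, one carrier that zero-rates both subdomains for ten languages and another that zero-rates only the zero subdomain for all languages (assumed here to be 280) would contribute 2 * 10 + 1 * 280 = 300 extra variants.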

This amplification of cached pages causes more hits at the origin servers than wanted, which means slower page loads for Wikipedia Zero users. Furthermore, the current caching scheme used by Wikipedia Zero makes it hard to tell Wikipedia Zero cached pages apart from non-Wikipedia Zero cached pages. This is a problem when the Wikipedia Zero team wants to purge the cache to modify aspects of the Wikipedia Zero experience without affecting the rest of the Wikipedia mobile-formatted experience.

Banner generation
All HTML to the Zero partners will contain this at the top:

No-script banners
 * The no-script banner is rendered as a small, non-customizable image for the specific carrier in the format "free from ". The <img> tag will not set height or width.
 * For MDOT, if the banner should not be shown for the specific request (e.g. this language is not whitelisted), a 1px x 1px image is returned
 * ZERODOT request on partner network:
 ** banner response is RED WARNING if lang.zero not supported; the article content still comes back, though
 ** otherwise, it's the normal banner
 * ZERODOT request on non-partner network shows UNCACHED error with the IP address
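The no-script rules above can be read as a small decision table. The Python sketch below is one possible reading, not the actual banner endpoint: the `Banner` enum, the function name, and its parameters are hypothetical, and treating an MDOT request from a non-partner network as "banner should not be shown" is an assumption that the list above does not state.

```python
from enum import Enum


class Banner(Enum):
    NORMAL = "normal"                    # regular carrier banner image
    TRANSPARENT_PIXEL = "1x1"            # 1px x 1px image, nothing visible
    RED_WARNING = "red-warning"          # language not supported on ZERODOT
    UNCACHED_ERROR = "uncached-error"    # error image showing the IP address


def noscript_banner(subdomain: str, on_partner_network: bool,
                    language_supported: bool) -> Banner:
    """Pick the no-script banner for a request (hypothetical helper)."""
    if subdomain == "m":  # MDOT
        # Assumption: non-partner MDOT traffic is treated as "do not show".
        if on_partner_network and language_supported:
            return Banner.NORMAL
        return Banner.TRANSPARENT_PIXEL
    if subdomain == "zero":  # ZERODOT
        if not on_partner_network:
            return Banner.UNCACHED_ERROR
        return Banner.NORMAL if language_supported else Banner.RED_WARNING
    raise ValueError(f"unexpected subdomain: {subdomain!r}")
```

Keeping the choice in a single pure function makes it straightforward to check each case in the list independently of image rendering.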

JavaScript banners
 * The API will return a JavaScript code snippet that renders the partner-specific banner if required.


 * TBD: should we wrap the <span> tag in an <a> tag? Or should JavaScript that runs on document complete inject an onclick event handler for the "zero-rated-banner-text" id? Or should we attach onclick to the "zero-rated-banner" id to allow clicks anywhere on the banner? Would that interfere with clicking the dismiss button?

Varnish logic
For all mobile traffic (both ZERODOT and MDOT), set X-CS2 if the request comes from a carrier network (already implemented):
 * If HTTPS, use the top value of X-Forwarded-For
 * If the IP matches ANY proxy, use the top value of X-Forwarded-For
 * If the IP matches ANY carrier, set req.http.X-CS2 = carrier ID


 * In vcl_deliver, append req.http.X-CS2 to resp.http.X-Analytics
 * All traffic will continue to vary on X-CS, X-SUBDOMAIN, language, and the URL's path
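The carrier detection steps above can be modelled outside of VCL. The following Python sketch uses the standard `ipaddress` module and is only an illustration of the logic, not the deployed Varnish configuration. Several details are assumptions: "top value" of X-Forwarded-For is interpreted as the first entry, the function names are hypothetical, the `zero` key used in X-Analytics is a placeholder, and the carrier ID and networks in the usage note are made up.

```python
import ipaddress


def effective_client_ip(remote_ip, is_https, x_forwarded_for, proxy_networks):
    """Return the IP to match against carriers (hypothetical helper)."""
    addr = ipaddress.ip_address(remote_ip)
    behind_proxy = any(addr in net for net in proxy_networks)
    if (is_https or behind_proxy) and x_forwarded_for:
        # Assumption: "top value" means the first address in the header.
        return x_forwarded_for.split(",")[0].strip()
    return remote_ip


def carrier_for_ip(ip, carrier_networks):
    """Return the carrier ID whose networks contain ip, or None."""
    addr = ipaddress.ip_address(ip)
    for carrier_id, networks in carrier_networks.items():
        if any(addr in net for net in networks):
            return carrier_id
    return None


def append_to_x_analytics(existing, x_cs2):
    """Append the carrier ID to X-Analytics, as vcl_deliver would."""
    if x_cs2 is None:
        return existing
    pair = f"zero={x_cs2}"  # key name is an assumption for this sketch
    return f"{existing};{pair}" if existing else pair
```

With a proxy network of 10.0.0.0/8 and a made-up carrier "250-99" owning 192.0.2.0/24, a request from 10.1.2.3 carrying `X-Forwarded-For: 192.0.2.7, 10.1.2.3` would resolve to 192.0.2.7 and be tagged with that carrier.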

Note that we are redefining the meaning of X-CS. It used to hold the carrier's ID for all valid Zero traffic, and was unset if the request came from a non-Zero source or if the carrier did not explicitly whitelist it. With this change, X-CS will always be set if the traffic comes from a Zero carrier. However, X-CS will be set to the carrier ID only for calls to the zeroconfig API and to the Zero special page. For all other requests, X-CS will be a boolean on/off flag ("1" or unset), which greatly reduces cache fragmentation.
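To make the redefinition concrete, here is a minimal Python sketch of how the value used for cache variance could be derived. The function name and the `needs_carrier_id` flag are hypothetical; how a request is recognized as a zeroconfig API call or a Zero special page request is deliberately left to the caller.

```python
from typing import Optional


def x_cs_for_vary(carrier_id: Optional[str], needs_carrier_id: bool) -> Optional[str]:
    """Return the X-CS value a request would vary on under the new scheme."""
    if carrier_id is None:
        return None          # not from a Zero carrier: X-CS stays unset
    if needs_carrier_id:
        return carrier_id    # zeroconfig API or Zero special page
    return "1"               # every other request: plain on/off flag
```

Under this scheme, ordinary article requests from any number of carriers collapse to the same X-CS value, so they can share one cached copy per subdomain, language, and path instead of one copy per carrier.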

TBD

 * Image library i18n support for exotic characters
 * Image width/height JS/CSS and old HTML support
 * For smartphones, what is the best way to override banner just-in-time
 * Avoid FOUC - Flash of Unstyled Content
 * Prevent downloading of unneeded banner
 * Acceptable stats deviations from reality
 * Acceptability of "one" banner format per language
 * Whether it is okay on ZERODOT to show article content together with a red banner. This is still relatively low bandwidth and should be a rare occurrence.
 * Any impact on existing app APIs? Probably not, but this needs to be checked

Analytics
Current Zero analytics is based on the presence or absence of the X-CS ID in the X-Analytics response header. Since we are switching from accurately tagging only valid Zero traffic to tagging all of the carrier's traffic, X-Analytics will contain an X-CS value more frequently. This would include m.wikipedia.org and zero.wikipedia.org traffic for all languages, arriving via direct, proxied (to date, Opera and Nokia), and HTTPS connections. If the carrier is not zero-rating some of that traffic, we would be overcounting.

To prevent this, we could leave all of the complex per-carrier Varnish logic intact so that the X-Analytics header remains accurate. Varnish will be cleaned up once analytics is capable of further filtering (MapReduce routines?). This gives us a smooth migration path to the new design without affecting the statistics. It also means that if the Zero team updates the configuration but forgets to make the corresponding changes in Varnish, the statistics might be incorrect while the site's functionality remains correct.

Analytics API
To simplify how analytics post-processes traffic, Zero will provide an API that presents historical configurations, without IPs, in a concise manner.

There could be multiple configurations per carrier ID. For example, a carrier could have a WAP gateway with a separate set of external IPs that supports only a subset of functionality, such as HTTP only or only specific languages. To accommodate that, in addition to X-CS (the carrier's ID), Varnish will append another, optional value to X-Analytics: zerosfx=a, where 'a' is the suffix to be appended to the X-CS in order to look it up in the following API response. When producing graphs, we might want separate graphs per suffix, but mostly we would want summary graphs in which all suffixes are combined.

Lastly, please note that X-Analytics may contain an X-CS value while the API has no valid configuration for that time frame. In that case, analytics should ignore that X-CS and treat the request as regular, non-Zero traffic.
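The lookup described in this section could be sketched as follows in Python. This is an illustration under stated assumptions, not the real API: the shape of the historical configuration data (a mapping from carrier ID plus suffix to a list of time ranges), the `zero` key carrying the X-CS value in X-Analytics, and both function names are hypothetical. Only the `zerosfx` key and the "ignore if no valid configuration" rule come from the text above.

```python
from datetime import datetime


def parse_x_analytics(header):
    """Split a 'key=value;key=value' header into a dict."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def resolve_zero_config(fields, when, history):
    """Return the configuration valid at `when`, or None for non-Zero traffic.

    `history` maps carrier ID + suffix to (start, end, config) tuples;
    this layout is an assumption made for the sketch.
    """
    carrier_id = fields.get("zero")  # hypothetical key for the X-CS value
    if not carrier_id:
        return None
    key = carrier_id + fields.get("zerosfx", "")
    for start, end, config in history.get(key, []):
        if start <= when < end:
            return config
    # X-CS present but no valid configuration: treat as regular traffic.
    return None
```

A request tagged `zero=250-99;zerosfx=a` would be looked up under the key "250-99a". If no entry covers the request time, the function returns None and the request is counted as regular, non-Zero traffic.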