Requests for comment/Unified Zero design

Background
The Zero team would like to unify the HTML served to all Zero partners' users by replacing the HTML banner with an image banner. The image served will differ by carrier network. Newer JavaScript-capable browsers will replace the image with carrier-specific HTML.

Please review Requests_for_comment/Zero_Architecture and Requests_for_comment/Data-driven_Zero_Varnish_Configuration to become familiar with the challenge.

Problem
Wikipedia Zero webpages are served to users on mobile devices with participating mobile carriers. The number of Wikipedia Zero cached pages, in excess of non-Wikipedia Zero mobile-formatted pages, is roughly:

 cached_pages = 0
 foreach carrier c:
     cached_pages += c.one_or_two_subdomains_from_m_or_zero_subdomains * c.num_languages_supported

Carriers support Zero-rating of .zero.wikipedia.org, .m.wikipedia.org, or both. Each carrier either supports up to ten customized free languages or supports all languages.
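
As a rough illustration of the estimate above, the following Python sketch computes the count for a list of carriers. The `Carrier` class, its field names, and the function name are hypothetical and exist only for this example; the total number of languages is passed in as a parameter because the source does not state it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Carrier:
    """Hypothetical carrier record, for illustration only."""
    name: str
    subdomains: FrozenSet[str]                  # subset of {"m", "zero"}
    languages: Optional[FrozenSet[str]] = None  # None means "all languages"


def extra_cached_pages(carriers: Iterable[Carrier], total_languages: int) -> int:
    """Apply the rough estimate: subdomains * languages, summed over carriers."""
    total = 0
    for c in carriers:
        # A carrier without a custom language list supports every language.
        num_langs = total_languages if c.languages is None else len(c.languages)
        total += len(c.subdomains) * num_langs
    return total


carriers = [
    Carrier("carrier-a", frozenset({"m", "zero"}), frozenset({"en", "fr"})),
    Carrier("carrier-b", frozenset({"zero"})),  # all languages
]
print(extra_cached_pages(carriers, total_languages=10))
```

With the two made-up carriers above and an assumed total of 10 languages, the estimate is 2 * 2 + 1 * 10 = 14.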

The amplification of cached pages means more hits at the origin servers than wanted, and therefore slower page loads for Wikipedia Zero users. Furthermore, the current Wikipedia Zero caching scheme makes it hard to distinguish Wikipedia Zero cached pages from non-Wikipedia Zero cached pages. This is a problem when the Wikipedia Zero team wants to purge the cache to modify aspects of the Wikipedia Zero experience without affecting other aspects of the Wikipedia mobile-formatted experience.

Banner image generation
Each request that comes from a Zero partner will get the following HTML at the top:

No-script banner image notes:
 * NoScript banner is rendered as a small, non-customizable image for the specific carrier in format "free from ". The <img> tag will not set height or width.
 * For MDOT, if the banner should not be shown for the specific request (e.g. the language is not whitelisted), a 1px x 1px image is returned
 * ZERODOT request on partner network:
     * banner response is RED WARNING if lang.zero not supported; the article content still comes back, though
     * otherwise, it's the normal banner
 * ZERODOT request on non-partner network shows UNCACHED error with the IP address
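
The no-script banner rules above can be summarized as a small decision function. This is only a sketch in Python: the `Banner` enum, the function name, and the parameters are hypothetical, and the MDOT case on a non-partner network is not covered by the notes above, so the sketch returns `None` there as an assumption.

```python
from enum import Enum
from typing import Optional


class Banner(Enum):
    """Hypothetical names for the banner image variants listed above."""
    NORMAL = "normal"
    BLANK_1PX = "blank_1px"
    RED_WARNING = "red_warning"
    UNCACHED_ERROR = "uncached_error"


def choose_banner(subdomain: str, carrier_id: Optional[str],
                  lang_supported: bool) -> Optional[Banner]:
    """Pick the banner image for a request.

    subdomain      -- "zero" (ZERODOT) or "m" (MDOT)
    carrier_id     -- None when the request is not from a partner network
    lang_supported -- whether the carrier zero-rates this language/subdomain
    """
    if subdomain == "zero":
        if carrier_id is None:
            # Non-partner network: uncached error (shown with the IP address).
            return Banner.UNCACHED_ERROR
        return Banner.NORMAL if lang_supported else Banner.RED_WARNING
    if subdomain == "m":
        if carrier_id is None:
            # Assumption: the notes above do not describe this case.
            return None
        return Banner.NORMAL if lang_supported else Banner.BLANK_1PX
    raise ValueError(f"unexpected subdomain: {subdomain!r}")
```

Keeping the decision in one function makes it easier to check each bullet against a test case before the rules are moved into the image-serving code.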

JavaScript banner notes:
 * The API will return a JavaScript code snippet that renders the partner-specific banner if required.


 * TBD: should we wrap the <span> tag with an <a> tag? Or should JavaScript that runs on document complete inject an onclick event handler for the "zero-rated-banner-text" id? Or should we attach onclick to the "zero-rated-banner" id to allow clicks anywhere on the banner? Would that interfere with clicking the dismiss button?

Varnish logic
For all mobile traffic (both ZERODOT & MDOT), set X-CS2 if from a carrier network (already implemented):
 * If HTTPS, use top value of X-Forwarded-For
 * If IP matches ANY proxy, use top value of X-Forwarded-For
 * If IP matches ANY carrier, set req.http.X-CS2 = carrier ID


 * In vcl_deliver, append req.http.X-CS2 to resp.http.X-Analytics
 * All m. and zero. responses are varied on the X-CS and X-SUBDOMAIN headers
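
The tagging steps above run in Varnish; the Python sketch below only illustrates the logic. All function names and the network tables are hypothetical. Two things are assumptions not confirmed by this page: that the "top value" of X-Forwarded-For is its first (leftmost) entry, and that X-Analytics holds `key=value` pairs separated by `;` with a key named `zero`.

```python
import ipaddress
from typing import Optional


def effective_client_ip(remote_ip: str, is_https: bool,
                        x_forwarded_for: Optional[str], proxy_nets):
    """Return the address to match against carrier ranges."""
    addr = ipaddress.ip_address(remote_ip)
    via_proxy = is_https or any(addr in net for net in proxy_nets)
    if via_proxy and x_forwarded_for:
        # Assumption: "top value" means the first entry in the header.
        return ipaddress.ip_address(x_forwarded_for.split(",")[0].strip())
    return addr


def carrier_id_for(addr, carrier_nets) -> Optional[str]:
    """Return the carrier ID whose ranges contain addr, or None."""
    for carrier_id, nets in carrier_nets.items():
        if any(addr in net for net in nets):
            return carrier_id
    return None


def append_to_analytics(x_analytics: Optional[str],
                        carrier_id: Optional[str]) -> Optional[str]:
    """Append the carrier tag to X-Analytics (assumed 'k=v;k=v' format)."""
    if carrier_id is None:
        return x_analytics
    tag = f"zero={carrier_id}"  # hypothetical key name
    return f"{x_analytics};{tag}" if x_analytics else tag


# Made-up ranges taken from documentation address blocks.
proxy_nets = [ipaddress.ip_network("10.0.0.0/8")]
carrier_nets = {"carrier-a": [ipaddress.ip_network("203.0.113.0/24")]}

ip = effective_client_ip("10.1.2.3", False, "203.0.113.7, 10.1.2.3", proxy_nets)
print(carrier_id_for(ip, carrier_nets))
```

In this example the request arrives from a known proxy, so the first X-Forwarded-For entry is matched against the carrier ranges instead of the proxy address.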

TBD

 * image library i18n exotic character support
 * Image width/height JS/CSS and old HTML support
 * For smartphones, what is the best way to override banner just-in-time
 * Avoid FOUC - Flash of Unstyled Content
 * Prevent downloading of unneeded banner
 * Acceptable stats deviations from reality
 * Acceptability of "one" banner format per language
 * Is it okay on ZERODOT to show the article content but with a red banner? This is still relatively low bandwidth and should be a rare occurrence.
 * Any impacts on existing app APIs? Don't think so off top of head, but need to check

Analytics
Since we are switching from accurate tagging of only applicable traffic for whitelisted domains to tagging all of the carrier's traffic, X-ANALYTICS will contain an X-CS value more frequently. This means that we will not look at the total M. + ZERO. traffic, but rather at what the carrier is actually whitelisting. Since the difference between ZERO. traffic and combined ZERO. and M. traffic is negligible for most carriers, the change in the numbers is negligible as well. If a carrier only whitelists ZERO, we will need to look only at the ZERO subgraph, not the total. Also, if a carrier only whitelists common languages in M., the graph will be slightly inflated by non-whitelisted M. languages; but since they are not frequently used, the difference should not have a high impact. Alternatively, could the MapReduce routines be coded to count only eligible subdomain traffic?
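
One way to picture the alternative raised in the last question is a filter that counts a tagged request only if the carrier whitelists its subdomain and language. The Python sketch below is not the existing MapReduce code; the record fields, the whitelist layout, and the function names are all hypothetical.

```python
from collections import Counter


def is_eligible(record: dict, whitelist: dict) -> bool:
    """True if the carrier zero-rates this request's subdomain and language."""
    rules = whitelist.get(record["carrier"])
    if rules is None:
        return False
    if record["subdomain"] not in rules["subdomains"]:
        return False
    langs = rules["languages"]  # None means all languages are whitelisted
    return langs is None or record["lang"] in langs


def count_eligible(records, whitelist) -> Counter:
    """Count eligible requests per carrier, ignoring non-whitelisted traffic."""
    counts = Counter()
    for record in records:
        if is_eligible(record, whitelist):
            counts[record["carrier"]] += 1
    return counts


whitelist = {
    "carrier-a": {"subdomains": {"zero"}, "languages": None},
    "carrier-b": {"subdomains": {"m", "zero"}, "languages": {"en", "fr"}},
}
records = [
    {"carrier": "carrier-a", "subdomain": "zero", "lang": "en"},
    {"carrier": "carrier-a", "subdomain": "m", "lang": "en"},     # not whitelisted
    {"carrier": "carrier-b", "subdomain": "m", "lang": "fr"},
    {"carrier": "carrier-b", "subdomain": "m", "lang": "de"},     # not whitelisted
]
print(count_eligible(records, whitelist))
```

A filter like this would need the per-carrier whitelist to be available to the analytics job, which is a design cost to weigh against the small inflation described above.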