Requests for comment/Data-driven Zero Varnish configuration

This RFC might be superseded by the Unfragmented ZERO RFC.

To remove all carrier-specific settings from the Varnish config files, we propose using the netmapper plugin to map each client IP to a magic string. That string will contain all the information required to identify Zero traffic.

In the current Varnish code, netmapper maps the client IP to a carrier ID (X-CS). Carrier-specific validation follows, and if all checks pass, the X-CS header is set.

We propose to alter this system so that all carrier-specific code is removed from Varnish.

Various carrier configurations

 * How carriers identify traffic


 * IP-based - carrier whitelists the entire Wikimedia IP range, implies HTTPS support
 * IP-based, no images - carrier whitelists all IPs except those of upload.wikimedia.org (legacy contracts only); only zero.wikipedia.org is allowed (unless we implement a no-image m. site); implies HTTPS support
 * URL-based - all languages or a list of languages, on both m. and zero. or on just one of them; the carrier uses deep packet inspection (DPI) to whitelist matching traffic


 * Connections


 * Multiple direct gateways (several ranges of originating carrier IP addresses, each possibly having different settings)
 * Opera Mini
 * Nokia and other proxies
 * Some carriers support HTTPS, either via direct gateways (which must whitelist IPs, not URLs) or via a custom browser+proxy such as Opera Mini, which can whitelist URLs while also supporting HTTPS

Exposing configuration via API
The API returns results as a mapping between a set of IPs and a magic string that contains all the data required to decide whether a given request is Zero traffic. The magic string could take many forms, and its format is the biggest unknown at this point: on the one hand it has to cover every possible usage scenario, and on the other it must be concise and easy to parse.
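As a rough illustration of the IP-to-magic-string mapping, the Python sketch below assumes a hypothetical API response shaped as a JSON object whose keys are magic strings (in the space-separated format described in the next paragraph) and whose values are lists of CIDR ranges. Neither that shape nor the names `build_table` and `lookup` come from this RFC, and the sample ranges are documentation addresses. The sketch resolves a client IP by longest-prefix match, so a narrower range can override a broader one.

```python
import ipaddress
import json

# Hypothetical API response: magic string -> list of client IP ranges.
# The real response shape is not defined by this RFC.
SAMPLE_RESPONSE = """
{
  "250-99   *.m,*.zero": ["192.0.2.0/24"],
  "250-99 Opera ssl en.m": ["198.51.100.0/25", "2001:db8::/32"]
}
"""


def build_table(response_text):
    """Return (network, magic_string) pairs, most specific range first."""
    table = []
    for magic, cidrs in json.loads(response_text).items():
        for cidr in cidrs:
            table.append((ipaddress.ip_network(cidr), magic))
    # Longer prefixes first, so overlapping ranges resolve to the narrowest one.
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def lookup(table, client_ip):
    """Return the magic string for client_ip, or None if it is not a Zero IP."""
    addr = ipaddress.ip_address(client_ip)
    for network, magic in table:
        # Skip ranges of the other IP version (IPv4 vs IPv6).
        if addr.version == network.version and addr in network:
            return magic
    return None
```

With this table, `lookup(table, "192.0.2.10")` returns `"250-99   *.m,*.zero"`, while an IP outside every range returns `None`, meaning the request is not treated as Zero traffic.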

The magic string is a set of substrings separated by a space character: "<X-CS> <proxy> <ssl> <domain+languages> <...>"
 * X-CS - the carrier ID, in the format 250-99
 * proxy - either an empty string for a direct connection, or the name of the proxy, e.g. "Opera"
 * ssl - either an empty string for a non-SSL connection, or the keyword "ssl" if this connection could come via HTTPS
 * domain+languages - a comma-separated list of strings, one for each allowed language+domain pair (or a wildcard). Each entry is one of the following values:
 * "*" - all languages on all domains (IP whitelisting). Note that this includes desktop as well as sister sites (Wikiquote, Wikibooks, etc.), so the header may be set on every request. If given, it must be the only value present.
 * "*.m" - all languages on the m.wikipedia.org domain. If given, no specific languages for the m domain should be listed.
 * "*.zero" - all languages on the zero.wikipedia.org domain. If given, no specific languages for the zero domain should be listed.
 * "lang.subdomain" - a specific language on a wikipedia.org subdomain; for example, "en.m,fr.m,en.zero,fr.zero" would whitelist two languages on both m and zero.
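The format above can be sketched as a parser plus a Zero decision, again in Python and under stated assumptions: the names `ZeroConfig`, `parse_magic` and `is_zero` are hypothetical, any fields beyond the fourth (the `<...>` part) are ignored, and an HTTPS request is treated as Zero only when the `ssl` keyword is present. Splitting with `split(" ")` rather than `split()` matters here, because it keeps empty `proxy` and `ssl` fields as empty strings instead of collapsing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroConfig:
    carrier: str             # carrier ID (X-CS), e.g. "250-99"
    proxy: str               # "" for a direct connection, else e.g. "Opera"
    ssl: bool                # True if the connection may arrive via HTTPS
    rules: tuple[str, ...]   # "*", "*.<sub>" or "<lang>.<sub>" entries


def parse_magic(magic: str) -> ZeroConfig:
    # split(" ") keeps empty fields: "250-99   *.m" -> ["250-99", "", "", "*.m"]
    fields = magic.split(" ")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields in {magic!r}")
    carrier, proxy, ssl, langs = fields[:4]  # extra fields (<...>) ignored in this sketch
    if ssl not in ("", "ssl"):
        raise ValueError(f"unexpected ssl field {ssl!r}")
    rules = tuple(langs.split(","))
    if not all(rules):
        raise ValueError(f"empty domain+languages entry in {magic!r}")
    if "*" in rules and len(rules) != 1:
        raise ValueError('"*" must be the only value present')
    # A "*.<sub>" wildcard must not be combined with specific languages on <sub>.
    wildcards = {r[2:] for r in rules if r.startswith("*.")}
    for rule in rules:
        lang, _, sub = rule.partition(".")
        if lang != "*" and sub in wildcards:
            raise ValueError(f"{rule!r} conflicts with '*.{sub}'")
    return ZeroConfig(carrier, proxy, ssl == "ssl", rules)


def is_zero(config: ZeroConfig, host: str, https: bool) -> bool:
    # Assumption: HTTPS requests count only if the carrier supports SSL.
    if https and not config.ssl:
        return False
    if config.rules == ("*",):
        return True  # IP whitelisting: desktop and sister sites included
    # The remaining rules only cover "<lang>.<sub>.wikipedia.org" hosts.
    parts = host.lower().split(".")
    if len(parts) != 4 or parts[2:] != ["wikipedia", "org"]:
        return False
    lang, sub = parts[0], parts[1]
    return f"*.{sub}" in config.rules or f"{lang}.{sub}" in config.rules
```

For example, `parse_magic("250-99 Opera ssl en.m")` yields a config under which `en.m.wikipedia.org` is Zero even over HTTPS, while `en.zero.wikipedia.org` is not.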

Varnish implementation
Pseudo-logic for implementing the above spec in Varnish: