Requests for comment/CentralNotice Caching Overhaul - Frontend Proxy

Overview
The current CentralNotice architecture is unsatisfactory for several reasons:
 * 1) It uses far more cache objects than necessary (a worst case of about 200 GB of on-disk usage)
 * 2) We do not load the banner until the ResourceLoader controller has loaded, which results in a visible page bump

The proposed solution to these issues is to place a static JavaScript snippet in every content-namespace page that calls out to a special CentralNotice proxy. This proxy will map the request (whose parameter space may contain more than 30M combinations) into approximately 200 distinct mincut allocations via a lookup table.

The lookup table will contain details about the banners in that allocation, allowing the proxy, with the help of a random number generator, to make a direct request by banner name to the backend varnish cache. The backend varnish can then handle objects much more efficiently, as it only needs to cache each banner by three varying parameters.
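To make the cache-key reduction concrete, here is a hypothetical sketch of the backend request the proxy might construct. The exact parameter set is not fixed in this proposal; banner name, UI language, and login status are assumed here for illustration, and the `Special:BannerLoader` endpoint name is likewise illustrative:

```javascript
// Hypothetical sketch: each distinct (banner, language, login-status)
// tuple maps to exactly one cacheable object in the backend varnish,
// instead of one object per full client-state combination.
function backendBannerUrl(banner, uiLang, loggedIn) {
  return 'https://meta.wikimedia.org/w/index.php' +
    '?title=Special:BannerLoader' +
    '&banner=' + encodeURIComponent(banner) +
    '&uselang=' + encodeURIComponent(uiLang) +
    '&anonymous=' + (loggedIn ? 'false' : 'true');
}
```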



The JSONP Call in HEAD
A small snippet of JavaScript in the head of a page will call a new subdomain named banners.wikimedia.org (or similar). The request parameters will be a combination of static variables known at page generation time on a wiki and dynamic variables that will come from a cookie or local storage. Dynamic variables are required where the user's state has changed from the wiki defaults (e.g. they logged in, or they changed their UI language), and also so that we may bucket users for A/B testing. These variables are:
 * Static project name (e.g. wikipedia)
 * Static project content language
 * Static user login status (may be overridden by cookie)
 * Dynamic UI language
 * Dynamic bucket (1 through 4 currently)
 * Dynamic user login status
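The head snippet described above could be sketched roughly as follows. The endpoint, parameter names, and callback name are all assumptions for illustration; the real snippet would be finalized when the proxy API is defined:

```javascript
// Hypothetical sketch of the static <head> snippet's request builder.
// Static variables are baked in at page-generation time; dynamic ones
// (read from a cookie) override them when the user's state has changed.
function buildBannerRequest(staticVars, cookies) {
  var params = {
    project:  staticVars.project,               // e.g. 'wikipedia'
    language: cookies.uselang || staticVars.contentLanguage,
    loggedIn: cookies.loggedIn !== undefined
                ? cookies.loggedIn              // cookie overrides default
                : staticVars.loggedIn,
    bucket:   cookies.bucket || 1               // A/B bucket, 1 through 4
  };
  var query = Object.keys(params).map(function (k) {
    return k + '=' + encodeURIComponent(params[k]);
  }).join('&');
  // JSONP: the response calls back into the banner controller.
  return 'https://banners.wikimedia.org/banner?' + query +
    '&callback=mw.centralNotice.insertBanner';
}
```

In practice the snippet would inject this URL as a `<script>` element so the response executes as JSONP.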

Additional Variables added by Proxy
CentralNotice needs additional variables in order to know what banner is suitable to serve to a user. These are unique per client and should initially be calculated on the proxy and then passed down for use in a dynamic cookie. These are:
 * The user's device (e.g. desktop, iphone, android, ...)
 * The user's location as determined by IP address
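On the proxy, this first-request calculation might look like the following sketch. The GeoIP lookup is shown as a placeholder function and the cookie name is an assumption; the point is that once the cookie is set, later requests carry these values directly:

```javascript
// Hypothetical proxy-side sketch: on a request with no dynamic cookie,
// derive device and country and hand them back in a cookie so that
// subsequent requests can supply them without recomputation.
function clientVariables(req, geoLookup) {
  var ua = req.headers['user-agent'] || '';
  var device = /iphone/i.test(ua)  ? 'iphone'
             : /android/i.test(ua) ? 'android'
             : 'desktop';
  var country = geoLookup(req.ip);  // GeoIP implementation assumed
  return {
    device: device,
    country: country,
    cookie: 'centralnotice_client=' + device + '|' + country
  };
}
```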

Proxy Mincut Mapping
CentralNotice will routinely produce a mincut map of banner allocations and distribute it to the proxy servers. In the current plan this will be a series of three sets of tables: several offset lookup tables, a map table, and several map-line-to-banner-entry tables. Presuming the proxy is a Node.js server, string lookups will probably be the most efficient map format, and the lookup would look something like:


 * 1) For each variable, determine its offset in the map string
 * 2) For each map line, check whether that offset is set; if so, keep the map line for future lookups
 * 3) Repeat until all variables have been processed; either one or no viable map lines should remain

Simplified example:
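The steps above can be sketched as follows, under assumed data shapes: the offset tables map each variable value to a character position, and each map line carries a flag string plus its banner entries (all names and values here are hypothetical):

```javascript
// A minimal sketch of the mincut lookup. A map line stays viable only
// if every variable's offset position in its flag string is '1'.
function findAllocation(offsets, mapLines, vars) {
  var viable = mapLines;
  Object.keys(vars).forEach(function (name) {
    var pos = offsets[name][vars[name]];       // step 1: offset for this value
    viable = viable.filter(function (line) {
      return line.flags.charAt(pos) === '1';   // step 2: keep line if set
    });
  });
  // step 3: zero or one viable map line should remain
  return viable.length === 1 ? viable[0] : null;
}

// Hypothetical data: two variables, three map lines.
var offsets = {
  project:  { wikipedia: 0, wiktionary: 1 },
  language: { en: 2, de: 3 }
};
var mapLines = [
  { flags: '1010', banners: ['fundraiser_en'] },  // wikipedia + en
  { flags: '1001', banners: ['fundraiser_de'] },  // wikipedia + de
  { flags: '0110', banners: ['wikt_banner'] }     // wiktionary + en
];
```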

Random Banner Choice
Using a random number generator, the proxy selects a random banner from the allocation and requests it from the backend varnish cache. We optionally composite the request (e.g. feeding back the detected country/device) with the returned banner, which is set in a JavaScript variable ready for consumption by CentralNotice.
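Since allocations carry proportions rather than equal shares, the random choice would presumably be weight-based. A sketch, with field names assumed (the mincut step is taken to produce a list of `{name, weight}` entries):

```javascript
// Weighted random banner choice: walk the cumulative weights until the
// random draw falls inside a banner's slice.
function chooseBanner(allocation, rand) {
  var total = allocation.reduce(function (sum, b) {
    return sum + b.weight;
  }, 0);
  var r = rand() * total;   // rand() in [0, 1), e.g. Math.random
  for (var i = 0; i < allocation.length; i++) {
    r -= allocation[i].weight;
    if (r < 0) { return allocation[i].name; }
  }
  return allocation[allocation.length - 1].name;  // guard for rounding
}
```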

CentralNotice Provided Data
Ideally, all dynamic data will be provided by CentralNotice in a single JSON blob that is updated every time there is an allocation change (e.g. a banner is added to or removed from a campaign, or a campaign is enabled/disabled). The blobs will come with an expiry date, and the proxies should re-request the data from meta.wm.o once the expiry date has passed. The proxies should also be able to accept a purge command, which will have them re-request the data from Meta immediately.
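The expiry-plus-purge behavior could be sketched as below. The blob format (`expires` as epoch milliseconds plus a `data` payload) and the fetch function are assumptions; the proposal only specifies that blobs carry an expiry and that a purge forces a re-fetch:

```javascript
// Sketch of expiry-driven refresh on the proxy.
function AllocationStore(fetchBlob, now) {
  this.fetchBlob = fetchBlob;   // returns { expires: epochMs, data: ... }
  this.now = now || Date.now;
  this.blob = null;
}
AllocationStore.prototype.get = function () {
  if (!this.blob || this.now() >= this.blob.expires) {
    this.blob = this.fetchBlob();  // re-request from Meta once expired
  }
  return this.blob.data;
};
AllocationStore.prototype.purge = function () {
  this.blob = null;                // next get() re-fetches immediately
};
```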

Device Detection Regex
Currently this lives in a ResourceLoader JS file; it could easily be moved into a JSON blob for distribution. We simply look for telltale strings like 'iphone' or 'android' in the User-Agent string and classify the device accordingly.
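Distributed as data rather than code, the rules might look like this. The pattern list below is illustrative, not the actual CentralNotice rules:

```javascript
// Device-detection rules that could be shipped as a JSON blob instead
// of a ResourceLoader file: ordered substring checks against the UA.
var deviceRules = [
  { device: 'iphone',  pattern: 'iphone'  },
  { device: 'android', pattern: 'android' },
  { device: 'ipad',    pattern: 'ipad'    }
];

function detectDevice(ua) {
  ua = ua.toLowerCase();
  for (var i = 0; i < deviceRules.length; i++) {
    if (ua.indexOf(deviceRules[i].pattern) !== -1) {
      return deviceRules[i].device;
    }
  }
  return 'desktop';   // fallback when no mobile string matches
}
```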

Mincut Data
Currently we provide only a somewhat buggy version of this, which we can debug and place into a JSON file. See an example [//meta.wikimedia.org/wiki/Special:GlobalAllocation here].

Reporting
Right now we are using udp2log and a null endpoint to detect what banner was actually shown to the user; the endpoint is called after the banner controller has run. This will remain unchanged, and keeping it independent of the delivery mechanism becomes even more important in case JS is disabled on the client (we don't want false statistics).

We should, however, eventually migrate to EventLogging; that is not in the immediate scope of this work.

Software
We can use either Node.js or Varnish as the proxy. Because I think this will be simpler to implement in Node, I'd prefer to start there. However, if the performance is too poor I can certainly write VCL to do the same (though instead of distributing JSON blobs it will probably be XML, because Expat is what I'm familiar with in C).

Hardware
I have estimated that we see ~600 Mbps of peak CentralNotice traffic (~6,500 requests per second), based on current banner requests served and average banner size. Given that this requires redundancy and would greatly benefit from being located in the caching centers, I estimate that 4 servers (2 in eqiad, 2 in ams) with dual gigabit cards (and 16 GB of RAM if we run Varnish on board) would easily handle the load.
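As a sanity check on those figures, the implied average response size follows directly from them:

```javascript
// Back-of-envelope: 600 Mbps at 6,500 requests/second implies an
// average response of roughly 11 KB, which is plausible for a banner.
var peakMbps = 600;
var reqPerSec = 6500;
var bytesPerSec = peakMbps * 1e6 / 8;        // 75,000,000 bytes/s
var avgKB = bytesPerSec / reqPerSec / 1024;  // ~11.3 KB per response
```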