Topic on Talk:Requests for comment/CentralNotice Caching Overhaul - Frontend Proxy

IRC meeting 2013-10-02

Tim Starling

<gwicke> mwalker: re load, did you do further benchmarking on real hardware?
<mwalker> gwicke: no; not yet -- more on that later
<gwicke> k
<TimStarling> I am reading the CentralNotice one, I haven't seen it before
<TimStarling> btw RFCs should generally be subpages of https://www.mediawiki.org/wiki/Requests_for_comment/
<TimStarling> otherwise my scripts will get confused
<mwalker> yep yep - I started it as a brainstorming page on the extension
<mwalker> I can add a redirect to it -- would that make things work better?
<Elsie> Instead of moving it?
<TimStarling> and also having RFCs there makes sure that the page is clear about its purpose -- i.e. definitely an RFC rather than some other kind of design page
<mwalker> Elsie: I'm not opposed to moving it -- but I don't have those rights
<legoktm> mwalker: anyone can move a page
<Elsie> It's a wiki, man.
<legoktm> (if not, someone will just give you sysop rights)
<mwalker> oh hey -- moving is allowed -- don't know why I thought it wasn't
<TimStarling> so there will be a script tag in the header that provides a JSON blob
<TimStarling> how does the data from the JSON blob get into the actual page content?
<mwalker> the rest of the CentralNotice JS will be delivered via resourceloader
<TimStarling> but that part will be semi-static JS?
<mwalker> but having the banner content already available reduces the amount of time it takes to display -- and will also get rid of a round trip (to get the geoiplookup)
<Elsie> Where does the 200GB figure come from?
<mwalker> "but that part will be semi-static JS?" -- yes; the bit in the head will be as small, simple, and static as I can make it
<mwalker> "Where does the 200GB figure come from?" CN has a potential space of all projects, languages, countries, user states, buckets, and slots -- which comes to a large number which is then multiplied by the average size of a fundraising banner and varnish overhead
<mwalker> *trying to find my worksheet on that now
<TimStarling> presumably some of those dimensions would have to be fairly large
<TimStarling> is that what you mean by "worst case", that they are all as large as possible?
<mwalker> yes
<TimStarling> e.g. buckets and slots, there are not always a lot of those, right?
<mark> i wonder how many we've actually got cached right now
<mwalker> so right now the space is 14 projects * ~300 languages * ~200 countries * 3 device types * 30 slots * 4 buckets * 2 user states
<mark> not entirely trivial to figure out though
<mwalker> mark: most of them :(
<mwalker> the timeout is 15 minutes
<mwalker> and for everything but wikipedia they're empty
<TimStarling> ok
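
The worst-case key space quoted above can be checked with a few lines of arithmetic. This is only a sketch: the dimension sizes are the approximate figures given in the meeting, and `bytes_per_object` is a hypothetical parameter, since the average banner size and Varnish overhead are not stated in the log.

```python
from math import prod

# Dimension sizes as quoted in the meeting (languages and countries are approximate).
DIMENSIONS = {
    "projects": 14,
    "languages": 300,
    "countries": 200,
    "device_types": 3,
    "slots": 30,
    "buckets": 4,
    "user_states": 2,
}


def key_space(dimensions):
    """Number of distinct cache keys if every combination is populated."""
    return prod(dimensions.values())


def worst_case_bytes(dimensions, bytes_per_object):
    """Upper bound on cache size; bytes_per_object is a hypothetical input."""
    return key_space(dimensions) * bytes_per_object


TOTAL_KEYS = key_space(DIMENSIONS)
```

With these figures `TOTAL_KEYS` is 604,800,000 combinations, which is why reducing the request to a smaller number of cacheable objects matters.
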
<gwicke> mwalker: I believe that your performance estimation for node might be about right, but it would definitely be good to establish a baseline on a real machine
<gwicke> I'm getting about 7k req/s on my laptop with a trivial http server
<TimStarling> so what is the reason for using a separate domain name?
<TimStarling> connection setup is expensive
<mark> so, we don't need to do that
<mark> in the discussion page I argue it shouldn't be a separate (node.js) server, but should probably just use varnish and be a backend to or a plugin of that
<mark> and then we also have the option to do this on one of the existing host names/clusters
<TimStarling> it seems pretty similar to the mobile varnish stuff
<TimStarling> the problem is that it is dynamic in various annoying ways, right?
<TimStarling> and we want to resolve that to some smaller number of cacheable objects
<mwalker> TimStarling: it was a naive suggestion thinking that I could separate the infrastructure entirely from the rest of the site so that if it goes down nothing else suffers
<mwalker> but I agree with mark that bits varnish could 'pass' a request to bits.wm.o/banners or something to a backend
<MaxSem> separate LB and varnish boxes?
<mwalker> TimStarling: and yes; your summary of the problem is correct
<TimStarling> have you considered redirecting?
<mwalker> it costs a round trip; and additionally still has to be cached
<mwalker> the round trip is important because something a lot of users complain about is the 'page bump' that happens when a banner loads
<TimStarling> I mean, have the dynamic part redirect to the static part
<mark> ESI is one of the options on the table for that
<TimStarling> presumably the way to avoid a page bump is to load early
<TimStarling> right?
<mwalker> ya; that's the thought process
<TimStarling> then you could use document.write()
<TimStarling> like advertising, advertising usually uses document.write(), doesn't it?
<mwalker> potentially yes -- banners are fairly dynamic though so it's not the best solution
<TimStarling> what is the difference between this and what you have proposed?
<mwalker> not much; in my original proposal I had the proxy being the machine LVS directed to; in mark's suggestion LVS directs to bits.wm.o which will vcl_pass to a backend
<mark> bits or anything else
<mark> can be a separate cluster, but doesn't have to be
<TimStarling> you are saying that JS in the head will fetch some dynamic data
<mwalker> yes
<TimStarling> then a script on the client side will use this dynamic data to form a URL to request some static data
<TimStarling> then that static data will be used by another script to actually display the HTML
<TimStarling> is that fair?
<mwalker> no; everything dynamic is fetched in the first call; which is then used by a resourceloader script to actually display
<TimStarling> so the banner HTML is delivered in the first call?
<mwalker> yes -- that's the plan
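
The single-call flow described here (one response carrying both the dynamic data and the banner HTML, consumed later by a ResourceLoader script) might look roughly like the sketch below on the backend side. Everything in it is an assumption for illustration: the `BANNERS` table, the `bannerHtml` field, and the `exampleBannerData` variable are hypothetical names, not CentralNotice's actual data model.

```python
import json

# Hypothetical lookup table; real campaign selection has more dimensions
# (device type, slot, bucket, user state).
BANNERS = {
    ("wikipedia", "en", "US"): "<div class='banner'>Example</div>",
}


def build_blob(project, language, country):
    """Return the JSON a head <script> tag could carry for one request."""
    html = BANNERS.get((project, language, country))
    blob = json.dumps({"country": country, "bannerHtml": html}, sort_keys=True)
    # Escape "</" so banner markup cannot close the enclosing <script> tag.
    return blob.replace("</", "<\\/")


def head_script(blob):
    """Wrap the blob in a small static script; the variable name is illustrative."""
    return "<script>window.exampleBannerData = " + blob + ";</script>"
```

Including the country in the same blob reflects the point made earlier in the meeting that this call would also replace the separate geoiplookup round trip.
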
<TimStarling> ok, I agree with mark that doing this in varnish would be better than doing it in node.js
<TimStarling> I think a node.js frontend server would add quite a lot of complexity
<mwalker> do you have thoughts on the backend technology?
<gwicke> I see no reason why varnish can't do the front-end, with all requests being forwarded to a node backend
<mwalker> if the backend should be written as a VMOD to eventually move into the frontend -- or if it should be a node server?
<gwicke> should keep the backend simple
<mark> I think I prefer to have it as a backend
<TimStarling> I think we should continue this on the talk page
<mark> there's not much reason to integrate it into varnish itself
<mark> other than that it needs to see every request
<mwalker> ok -- I will update the RfC and we can continue with other topics
<mark> varnish can't cache anything there
