Requests for comment/Partial page caching

This is a request for comment about adding support for partial page caching (a.k.a. edge side includes). Please leave your comments below.

Proposal
Both Varnish and Squid support edge side includes (ESI). This proposal is to implement support for ESI within the MediaWiki software so that we can take advantage of this caching option where needed.

If implemented, MediaWiki could send out sections of a page like so:
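As an illustration only, a fragment could be emitted as an `<esi:include>` tag that the cache layer resolves separately from the surrounding page. The Python sketch below shows the shape of such markup. The helper name `esi_include` and the URL `/wiki/Special:BannerFragment` are hypothetical and do not come from MediaWiki.

```python
from html import escape
from urllib.parse import urlencode


def esi_include(base_url, params):
    """Return an <esi:include> tag whose src is base_url plus params.

    Hypothetical helper: the name and signature are assumptions.
    """
    src = base_url + "?" + urlencode(params)
    # Escape the URL so that "&" is valid inside the attribute value.
    return '<esi:include src="{}" />'.format(escape(src, quote=True))


# The surrounding page stays cacheable; only the fragment URL is resolved
# by the cache (Varnish or Squid) on each request.
page = (
    '<div id="banner">'
    + esi_include("/wiki/Special:BannerFragment", {"country": "ESI"})
    + "</div>"
)
print(page)
```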



This would allow the included content to be cached separately from the rest of the page. In other words, dynamic content could be included in a page without invalidating the cache of the rest of the page. This would let us do lots of cool things like:
 * Dynamically changing content based on the user's location
 * Dynamically changing content based on the user's browser (which would allow us to take advantage of more HTML5 features, as well as client-side SVG rendering)
 * Instant banner loading for CentralNotice
 * Partially cache pages for logged-in users to optimize page loading

Bugzilla 

Current Status in MediaWiki
The mobile team has been having great success with using ESI in Varnish via the mobile frontend.

There is also partial support for ESI already present in MediaWiki. The OutputPage object is aware of the 'Surrogate-Control' header, and ResourceLoader has an experimental ESI mode.

Moving Forward
As Fundraising understands it, Squid currently cannot deliver ESI content with the performance our normal operations require. However, we also understand that there are plans to move the entire site over to Varnish. Fundraising proposes to build the features listed in the requirements below as a bootstrap for the main site: Fundraising writes the proof-of-concept code (which will deliver banners through bits.wikimedia.org, a Varnish cluster), and that code is then evaluated for full-site deployment/migration.

Requirements

 * As few modifications to Varnish core as possible
 * Try to make this as generic as possible so that the VCL almost never has to be updated

To this end, I'm imagining a situation where:
 * We seem to need to resurrect X-Vary-Options for cookies and headers
 * A loop will be provided (via a C extension in the VCL) that reads the cookie values and adds them to the vary string
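The cookie loop described above could look roughly like the following. This is a Python sketch of the logic only, not VCL or the C extension itself. The function names and the `name=value;...` output format are assumptions made for illustration.

```python
def parse_cookies(cookie_header):
    """Parse a Cookie request header into a dict (simplified parsing)."""
    cookies = {}
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def cookie_vary_string(cookie_header, vary_cookies):
    """Build a vary string from only the cookies named in vary_cookies.

    Cookies not listed are ignored, so an unrelated cookie does not
    produce a separate cache object. Missing cookies contribute an
    empty value. The output format is an assumption.
    """
    cookies = parse_cookies(cookie_header)
    parts = []
    for name in sorted(vary_cookies):
        parts.append("{}={}".format(name, cookies.get(name, "")))
    return ";".join(parts)


print(cookie_vary_string("session=abc; bucket=2; loggedin=1",
                         ["loggedin", "bucket"]))
```

Sorting the cookie names keeps the vary string stable regardless of the order in which the browser sends cookies.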


 * I also see a need for the ability to vary on specific GET parameters (as opposed to creating a top-level cache object for every URL) -- that way we can purge an entire subgroup
   * So, in the query string, have &getvary&...&endgetvary, and anything in between is varied on
   * Also, potentially everything after &endgetvary is a comment/no-op field?
   * This would be used for analytics purposes, i.e. instead of having to add yet another header/udplog field.
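One possible reading of the &getvary&...&endgetvary idea, sketched in Python: only the fields between the two markers contribute to the cache key, and everything else is ignored for caching purposes. The function names and the exact key format are assumptions. How fields before &getvary should be treated is not specified above, so this sketch simply leaves them out of the key.

```python
def split_getvary(query):
    """Split a query string into (varied fields, other fields)."""
    varied, other = [], []
    inside = False
    for field in query.split("&"):
        if field == "getvary":
            inside = True
        elif field == "endgetvary":
            inside = False
        elif inside:
            varied.append(field)
        else:
            other.append(field)
    return varied, other


def cache_key(path, query):
    """Build a cache key from the path and the varied fields only."""
    varied, _ = split_getvary(query)
    # Sort so that parameter order does not create duplicate objects.
    return path + "?" + "&".join(sorted(varied))


# Two requests that differ only in the trailing comment field
# map to the same cache object.
a = cache_key("/banner", "getvary&slot=3&country=US&endgetvary&ref=email")
b = cache_key("/banner", "getvary&country=US&slot=3&endgetvary&ref=sitenotice")
print(a, a == b)
```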


 * We would also have the ability to replace known GET string options with dynamic values. Currently:
   * country=ESI - would be replaced by the GeoIP lookup ISO country code
   * randval=ESI-# (where # is a number) - would be replaced by a random number between 1 and #.
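The two substitutions listed above can be sketched as follows. This is Python for illustration only; in the proposal the replacement would happen in the cache layer. The function name is hypothetical, the GeoIP result is passed in as a plain argument rather than looked up, and treating the range 1 to # as inclusive at both ends is an assumption.

```python
import random
import re

# Matches the placeholder form "ESI-#", where # is a decimal number.
RANDVAL_PLACEHOLDER = re.compile(r"^ESI-(\d+)$")


def resolve_esi_params(params, country_code, rng=random):
    """Replace the known ESI placeholders in a dict of GET parameters.

    country_code stands in for the result of a GeoIP lookup.
    rng can be a seeded random.Random instance for repeatable results.
    """
    resolved = {}
    for name, value in params.items():
        match = RANDVAL_PLACEHOLDER.match(value)
        if name == "country" and value == "ESI":
            resolved[name] = country_code
        elif name == "randval" and match and int(match.group(1)) >= 1:
            upper = int(match.group(1))
            # randint includes both endpoints: 1 <= result <= upper.
            resolved[name] = str(rng.randint(1, upper))
        else:
            # Unknown options and malformed placeholders pass through.
            resolved[name] = value
    return resolved


print(resolve_esi_params(
    {"country": "ESI", "randval": "ESI-30", "lang": "en"}, "DE"))
```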

CentralNotice
CentralNotice serves content that varies on
 * Country (Served by GeoIP)
 * Project (JS/Static?)
 * Language (JS/Cookie?)
 * Logged In/Out (Cookie option)
 * Bucket (Cookie option)
 * Slot (Random number)

We will also shortly be varying on
 * Carrier (Mobile front end header)
 * Device Class (Mobile front end header)

Ideally, under Varnish, we would have an ESI fragment that fetches a JSON object containing the banner; the controller would then simply use it without having to make another server-side call.

Suggested API

 * In the OutputPage object we add a new method, includeFragment, which will inject either the ESI code or, if ESI is not available, the actual content. This function would merely accept an HTTP resource link (or title/link object) to something that could provide partial content.
 * Any vary options provided by the fragment would then have to be included in the page vary options if ESI is not available
 * We would also provide a new abstract class derived from SpecialPage, called FragmentProvider, which would know how to set the vary options correctly, interpret the special GET options, and recurse down the include path.
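The fallback behaviour of includeFragment described above can be sketched as follows. This is a Python stand-in for the PHP OutputPage class, not MediaWiki code. The constructor argument, the `provider` callable, and the attribute names are all assumptions made for illustration.

```python
from html import escape


class OutputPage:
    """Minimal stand-in for MediaWiki's OutputPage (illustration only)."""

    def __init__(self, esi_available):
        self.esi_available = esi_available
        self.vary = set()   # vary options for the page as a whole
        self.body = []      # chunks of output HTML

    def include_fragment(self, url, provider):
        """Inject an ESI include, or inline the fragment if ESI is off.

        provider is a hypothetical callable that takes the URL and
        returns (content, vary_options) for the fragment.
        """
        if self.esi_available:
            # The cache layer fetches and caches the fragment separately,
            # so the page's own vary options are left untouched.
            tag = '<esi:include src="{}" />'.format(escape(url, quote=True))
            self.body.append(tag)
        else:
            content, vary_options = provider(url)
            self.body.append(content)
            # Without ESI the page inherits the fragment's vary options.
            self.vary |= set(vary_options)


def banner_provider(url):
    """Hypothetical fragment provider returning fixed content."""
    return "<b>banner</b>", ["Cookie"]


with_esi = OutputPage(esi_available=True)
with_esi.include_fragment("/fragment/banner", banner_provider)

without_esi = OutputPage(esi_available=False)
without_esi.include_fragment("/fragment/banner", banner_provider)

print(with_esi.body, with_esi.vary)
print(without_esi.body, without_esi.vary)
```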

Speed

 * How long does Varnish take to serve an ESI page?
 * An ESI page with options?
 * I'm thinking that, to improve performance, we will also take advantage of Varnish's ability to serve stale content while it fetches new content from the backend, a.k.a. grace mode. A reasonable grace period here seems to be 60 seconds.
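The effect of a 60-second grace period can be illustrated with a small decision function. This is a simplified Python model of the behaviour, not Varnish code or configuration. The function name and the returned labels are assumptions.

```python
def cache_decision(age, ttl, grace=60):
    """Decide how a cached object of the given age (seconds) is served.

    "fresh": within its TTL, served directly from cache.
    "stale": past its TTL but within the grace period, served from
             cache while new content is fetched from the backend.
    "miss":  too old to serve; the request waits for the backend.
    """
    if age <= ttl:
        return "fresh"
    if age <= ttl + grace:
        return "stale"
    return "miss"


# With a 300-second TTL, an object that is 330 seconds old is still
# served (stale), while one that is 400 seconds old is not.
for age in (100, 330, 400):
    print(age, cache_decision(age, ttl=300))
```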

Useful Links

 * ESI Language Specification 1.0 (published as a W3C Note)
 * EnWiki page on ESI
 * Varnish documentation

Comments
Copied from email: There's a lot of definite pluses there... I think Gabriel Wicke actually helped with some preliminary ESI work back in '04 or '05 that we never quite were ready to deploy, I don't know if any of that infrastructure is still around. --Brion Vibber

Copied from email: We've had this idea basically forever, but never really implemented it. Squid supports ESI. I think Gabriel Wicke (now on our staff) has played with it at the time... in 2004 or so? Anyway, Varnish supports ESI too, but allegedly it's limited. Artur Bergman/Wikia has played with it too, but not sure how far they got. Perhaps it's more feasible with current day MediaWiki too. --Mark Bergsma

I support the principle, but some planning would have to happen before people start haphazardly implementing this. There are pros and cons to using ESI that differ on a case-by-case basis, so I can only really comment in a useful way on specific proposals. --Catrope 09:55, 24 November 2011 (UTC)