Requests for comment/Simplify thumbnail cache


TODO:
 * Add more data/options from meeting notes
 * Send link to Aaron & Faidon
 * Move to RFC namespace proper
 * Announce on wikitech-l, etc

Simplify media thumbnail cache and storage operations to increase reliability


 * Streamline the Varnish cache purge process
 * Remove the need to enumerate all generated thumbs
 * Reduce disk storage footprint of thumbnails

Things we are not happy about:


 * Issuing Varnish/Squid purge messages from PHP in response to a media file change or deletion requires enumerating all potentially cached thumbs
 * A single media file deletion can require a very large number of Varnish purge messages to clean up its thumbs
 * Swift has been configured somewhat awkwardly to support wildcard listing of stored thumbs for enumeration
 * Thumbs take up 60% (FIXME: triple check number with Faidon) of the on-disk storage footprint in Swift
 * The PHP layer has extra complexity to hash the thumb path into the right Swift collection (FIXME: proper term?)
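
To make the first two points concrete, here is a minimal sketch of the per-thumbnail purge pattern. The URL layout, the host `upload.example.org`, and the function name `purge_urls_for` are assumptions made for illustration, not the actual MediaWiki code.

```python
def purge_urls_for(original, stored_thumbs, base="https://upload.example.org/thumb"):
    """Return one purge URL per stored thumbnail of `original`.

    `stored_thumbs` stands in for the result of the wildcard listing
    against the thumbnail store (hypothetical input).
    """
    return [f"{base}/{original}/{thumb}" for thumb in stored_thumbs]


# Six cached sizes of one file means six purge messages for one delete.
thumbs = [f"{width}px-Example.jpg" for width in (120, 220, 320, 640, 800, 1024)]
urls = purge_urls_for("Example.jpg", thumbs)
print(len(urls))
```

The number of purge messages grows with the number of sizes that were ever requested, and the list itself has to be fetched from storage first.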

We could also discuss hashed image URLs and versioned image URLs as other approaches to the same problem.

Proposal #1
Treat thumbs as a CDN-only concern.
 * 1) Configure Varnish so that a single purge message drops all variants of a given media file's thumbnails
 * 2) Stop storing generated thumbnails in Swift
 * 3) (Re)generate individual thumbs in real time in response to cache misses
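
The intended behaviour of steps 1 and 3 can be sketched with a toy in-memory cache. This is only an illustration of the idea: the class `ThumbCache` and the helper `fake_scaler` are hypothetical, and how Varnish would actually group variants under one purge is left open here.

```python
class ThumbCache:
    """Toy stand-in for the CDN layer: variants are grouped under their original."""

    def __init__(self, scaler):
        self._scaler = scaler   # callable(original, width) -> thumbnail bytes
        self._variants = {}     # original name -> {width: thumbnail bytes}
        self.misses = 0

    def get(self, original, width):
        by_width = self._variants.setdefault(original, {})
        if width not in by_width:
            # Cache miss: scale from the original on demand (step 3).
            self.misses += 1
            by_width[width] = self._scaler(original, width)
        return by_width[width]

    def purge(self, original):
        # One purge drops every cached variant of the file (step 1).
        # Returns how many variants were dropped.
        return len(self._variants.pop(original, {}))


def fake_scaler(original, width):
    # Placeholder for the image scaler; returns a label instead of image data.
    return f"{width}px-{original}".encode()


cache = ThumbCache(fake_scaler)
for w in (120, 220, 320):
    cache.get("Example.jpg", w)       # three misses, three scaler calls
cache.get("Example.jpg", 120)         # served from cache
dropped = cache.purge("Example.jpg")  # single purge, no enumeration by the caller
cache.get("Example.jpg", 120)         # regenerated after the purge
```

Nothing is stored outside the cache in this sketch (step 2), so a purge needs no storage listing and no follow-up deletes.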

Pros:

 * Only one HTCP purge message needed
 * Simplifies PHP code by removing list generation and traversal
 * Reduces Swift load by eliminating the wildcard enumeration request
 * No need to delete superseded thumb files from Swift, which reduces Swift load further
 * Removes a potential point of failure for a delete/move operation on the base file
 * Lots of disk reclaimed from Swift
 * Reduces hardware cost of the Swift cluster
 * Reduces maintenance cost of the Swift cluster

Cons:

 * Increased utilization of image scalers
 * Faidon estimates that image scaler jobs would grow from the current ~50/s to ~300/s to handle the request volume
 * Increased latency for CDN misses
 * The ~250/s requests that are currently satisfied by Swift fetches of generated thumbs would instead require a fetch of the original media and a scaling transformation
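
The two estimates above line up if we assume that every request currently served from stored thumbs would become a scaler job. A quick check of that arithmetic (variable names are illustrative only):

```python
current_scaler_rate = 50   # ~jobs/s handled by the image scalers today
swift_served_rate = 250    # ~requests/s currently satisfied from stored thumbs

# Assumption: each request no longer served from storage becomes one scaler job.
projected_scaler_rate = current_scaler_rate + swift_served_rate
growth_factor = projected_scaler_rate / current_scaler_rate

print(projected_scaler_rate, growth_factor)
```

Under that assumption the scalers would see roughly six times their current load.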