Requests for comment/Simplify thumbnail cache

This is a request for comment about changing the thumbnail storage and caching pipeline for Wikimedia projects.

Background
Managing scaled media files (thumbnails) in the Wikimedia projects is a significant source of complexity for both software developers and operations engineers. The current implementation tightly couples backend storage with frontend caching, somewhat to the detriment of both systems. This topic has been discussed in the past but has not yet been resolved.

Problem

 * Issuing HTCP purge messages from PHP in response to media file change or deletion requires enumerating all potentially cached thumbnails
 * A large number of HTCP purge messages may be needed to clean up the thumbnails for a given media delete, to the point that the end-user request may time out
 * HTCP purges (?action=purge) are not idempotent; objects get deleted and the purges for thumbnails cannot be repeated to e.g. recover from multicast packet loss.
 * Thumbnails of all sizes are stored forever, until the original gets deleted. Unused & rarely used thumbnails are never cleaned up.
 * Swift has been configured somewhat awkwardly to support wildcard listing of stored thumbnails for enumeration
 * Swift has an extra layer of complexity to handle 404s in thumbs in a special & fragile way to fetch from imagescalers
 * Thumbnails take up 25-30% of the on disk storage footprint in Swift
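
To make the enumeration cost concrete, here is a minimal Python sketch of the "one purge per stored thumbnail" pattern described in the first two points. It is not the MediaWiki implementation: the function name, the base URL and the example file names are all hypothetical and only illustrate how the message count grows with pages times sizes.

```python
def thumb_purge_urls(base_url, original_path, stored_thumb_names):
    """Return one purge URL per stored thumbnail of a single original file.

    Hypothetical helper: the caller must first list every stored thumbnail,
    which is the enumeration step the list above complains about.
    """
    return [f"{base_url}/thumb/{original_path}/{name}" for name in stored_thumb_names]


# Hypothetical multi-page document: 3 pages, each cached at 2 widths.
names = [f"page{p}-{w}px-Example.pdf.jpg" for p in (1, 2, 3) for w in (120, 220)]
urls = thumb_purge_urls("https://upload.example.org/site", "a/ab/Example.pdf", names)
print(len(urls))  # 6: one purge message per page/size variant
```

A document with hundreds of pages and several common widths would need thousands of such messages for a single delete.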

Proposed solution
Treat thumbnails as a CDN-only concern

Rather than treating thumbnails as first-class assets that must be stored permanently and durably, treat them as a temporary work product that can be recreated as needed.


 * 1) Configure Varnish so that a single purge message drops all variants of a given media file's thumbnails
 * 2) Add a group of Varnish servers backed by spinning disk to store thumbnails in a large cache with TTL + LRU eviction
 * 3) Configure Varnish to pattern-match thumbnails and switch backend from Swift to imagescalers
 * 4) Configure MediaWiki imagescalers to stop storing generated thumbnails in Swift
 * 5) Generate individual thumbs in real-time in response to cache misses
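
The steps above can be sketched as a toy in-memory cache in Python. This is only an illustration of the intended behaviour (TTL plus LRU eviction, render on miss, one purge per original), not Varnish configuration. The class and function names are hypothetical, and the path layout `<prefix>/thumb/<shard>/<shard>/<name>/<variant>` is an assumption about how thumbnail URLs are shaped.

```python
import time
from collections import OrderedDict


def original_key(thumb_path):
    """Map '<prefix>/thumb/a/ab/Example.jpg/120px-Example.jpg' to 'a/ab/Example.jpg'."""
    _, _, rest = thumb_path.partition("/thumb/")
    original, _, _variant = rest.rpartition("/")
    return original


class ThumbCache:
    """Toy thumbnail cache: TTL + LRU eviction, purge by original file."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = OrderedDict()  # path -> (expires_at, body), least recent first
        self.groups = {}              # original key -> set of cached thumbnail paths

    def _drop(self, path):
        self.entries.pop(path, None)
        key = original_key(path)
        group = self.groups.get(key)
        if group is not None:
            group.discard(path)
            if not group:
                del self.groups[key]

    def get(self, path, render):
        hit = self.entries.get(path)
        if hit is not None and hit[0] > self.clock():
            self.entries.move_to_end(path)  # mark as most recently used
            return hit[1]
        if hit is not None:
            self._drop(path)                # entry outlived its TTL
        body = render(path)                 # cache miss: scale in real time
        self.entries[path] = (self.clock() + self.ttl, body)
        self.groups.setdefault(original_key(path), set()).add(path)
        while len(self.entries) > self.max_entries:
            self._drop(next(iter(self.entries)))  # evict least recently used
        return body

    def purge(self, original):
        """Drop every cached variant of one original with a single call."""
        paths = self.groups.pop(original, set())
        for path in paths:
            self.entries.pop(path, None)
        return len(paths)


renders = []


def fake_scaler(path):
    """Stand-in for an image scaler request; records what was rendered."""
    renders.append(path)
    return b"scaled:" + path.encode()


cache = ThumbCache(max_entries=100, ttl_seconds=3600)
for width in (120, 220, 640):
    cache.get(f"/site/thumb/a/ab/Example.jpg/{width}px-Example.jpg", fake_scaler)
dropped = cache.purge("a/ab/Example.jpg")
print(dropped)  # 3: all size variants removed by one purge
```

The key design point is the `groups` index: because every variant is filed under its original, the purge never needs an external listing of thumbnails.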

The implementation would be rolled out in phases, with multi-page media types such as multi-page TIFF, DjVu and PDF omitted from the initial deployment. Some media types may still need durable storage of generated thumbnails. TimedMediaHandler, for example, uses reference thumbnails to render other thumbnails for the same time offset of a video and should continue storing these thumbnails somewhere in Swift.
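
A phased rollout of this kind amounts to a routing rule: thumbnail requests go to the image scalers unless the original belongs to a type that is still excluded. The Python sketch below illustrates that decision only. The URL pattern, the backend labels and the extension list are assumptions, and matching on file extension is a simplification (it would also exclude single-page TIFFs and does not model TimedMediaHandler).

```python
import re

# Assumed layout: /<project>/<site>/thumb/<path of original>/<variant name>
THUMB_RE = re.compile(r"^/[^/]+/[^/]+/thumb/(?P<orig>.+)/[^/]+$")

# Types left on the existing Swift-backed path during the initial phase.
DEFERRED_EXTENSIONS = (".tif", ".tiff", ".djvu", ".pdf")


def choose_backend(path):
    """Return a hypothetical backend label for a request path."""
    match = THUMB_RE.match(path)
    if match is None:
        return "swift"          # originals and anything that is not a thumbnail
    if match.group("orig").lower().endswith(DEFERRED_EXTENSIONS):
        return "swift"          # multi-page types: not part of the first phase
    return "imagescaler"        # render on demand, do not store


print(choose_backend("/wikipedia/commons/thumb/a/ab/Example.jpg/120px-Example.jpg"))
print(choose_backend("/wikipedia/commons/thumb/a/ab/Example.pdf/page1-120px-Example.pdf.jpg"))
```

Later phases would shrink `DEFERRED_EXTENSIONS` as each media type is shown to be safe to render on demand.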

Benefits

 * Only one HTCP purge message needed
 * Simplifies PHP code by removing a list generation and traversal
 * Reduces Swift load, I/O pressure & hardware cost significantly by eliminating wildcard enumeration requests
 * No need to delete superseded thumbnail files from Swift
 * Reduces Swift I/O load significantly
 * Removes a potential point of failure for a delete/move operation on the base file
 * Lots of disk reclaimed from Swift
 * Reduces hardware cost of Swift clusters
 * Reduces maintenance cost of Swift clusters

Drawbacks

 * Use of  in this way is untested and thus carries unknown risks
 * Varnish currently tracks items mapped to the same hash key in a linked list. This could become a bottleneck for media such as multi-page TIFF, DjVu or PDF files that have page variants as well as size variants. Research would be needed to determine a reasonable upper limit for the number of variants to collapse into a single hash and/or to find a more efficient data structure to implement in Varnish itself.
 * Varnish 4 may include surrogate key/secondary hashing (Fastly & Varnish were working independently on this) but while the release is imminent, deployment at Wikimedia is probably months away.
 * Increased utilization of image scalers
   * The amount of increase is currently unknown and depends on the amount of additional storage added to the Varnish cluster. Faidon estimated that image scaler jobs would grow from the current ~75/s (avg) & ~110/s (max) to ~500-950/s to handle request volume with the current size of the Varnish cache.
 * Increased latency for CDN misses
   * The requests that are currently satisfied by Swift fetches of generated thumbnails would instead require a fetch of the original media and a scaling transformation.
 * May not be reasonable for media types that have high thumbnail generation costs or a potentially huge number of thumbnails
   * Specifically, there are: a) multi-page TIFF, PDF and DjVu files that tend to have a huge number of thumbnails, b) photographs or paintings, mostly TIFF, that are hundreds of megabytes in size.
   * Even with the current approach, image scalers and Swift can be DoS'ed by a single visit to Special:NewFiles on Commons (bug 65217).
 * Reduced hardware failure tolerance
   * Swift keeps 3 copies of each thumb distributed across the storage cluster to provide HA access to stored files.
   * Varnish uses persistent hashing of the URL to ensure the same backend Varnish is hit every time, resulting in a single node holding a thumb in cache. This makes storage in Varnish more susceptible to hardware failures, which in turn will mean increased imagescaler load with all of the above drawbacks.
   * The Varnish cluster in eqiad has 8 boxes, which means that ~12.5% of the cache resides in each one of them.
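
The failure-tolerance point can be illustrated with a small Python simulation. It uses rendezvous (highest-random-weight) hashing as a stand-in for whatever persistent hashing the Varnish director actually uses, which is an assumption, and the host names and URLs are made up. With 8 hosts and one copy per object, losing a host loses roughly one eighth of the cached thumbnails, all of which would then have to be re-rendered by the image scalers.

```python
import hashlib


def node_for(url, nodes):
    """Pick the host with the highest hash weight for this URL (rendezvous hashing)."""
    def weight(node):
        return hashlib.sha256(f"{node}|{url}".encode()).digest()
    return max(nodes, key=weight)


nodes = [f"cache{n}" for n in range(1, 9)]  # 8 hypothetical cache hosts
urls = [f"/site/thumb/a/ab/File{i}.jpg/120px-File{i}.jpg" for i in range(20000)]

before = {u: node_for(u, nodes) for u in urls}
survivors = nodes[1:]                       # the first host fails
after = {u: node_for(u, survivors) for u in urls}

lost = sum(1 for u in urls if before[u] == nodes[0])
moved = sum(1 for u in urls if before[u] != after[u])

print(lost / len(urls))  # close to 1/8 of all cached objects
print(moved == lost)     # True: only the failed host's objects are remapped
```

The remaining seven hosts keep their contents, so the extra scaler load after a single failure is bounded by the failed host's share rather than by the whole cache.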

Variations

 * Increase size of existing backend Varnish storage
   * Rather than add a new tier of Varnish servers, just increase the persistent storage capacity of the existing servers by adding more SSD drives and/or servers. This would likely have similar hardware costs to the solution of adding an additional tier of Varnish servers, but would reduce operational complexity by horizontally scaling an existing tier instead of introducing a new one.