Requests for comment/Standardized thumbnails sizes

Wikitext editors can have media rendered at any arbitrary size (via the thumbnail wikitext syntax). Each distinct size produces a hit to the server backend, which generates a thumbnail for that size. This request for comment is about having a consistent set of thumbnail sizes that are allowed to be rendered, and letting the client browser do the up- or downscaling.

Context
Our thumbnail system could use some improvements to scale out better and to stop generating every possible thumbnail size our users might request. For context, the size varies according to:


 * 1) MediaWiki default (220px on most Wikimedia wikis)
 * 2) per user preference picked from the list set by $wgThumbLimits
 * 3) Wikitext thumb width parameter (which can be anything)

The default value for $wgThumbLimits is:

$wgThumbLimits = array( 120, 150, 180, 200, 250, 300 );

A global default is set for users; they can override it by picking one of the widths from the $wgThumbLimits array.
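The three sources of a thumbnail width listed above can be sketched as a simple lookup. This is an illustration only: the function name, the `SITE_DEFAULT_WIDTH` constant, and the precedence order (explicit wikitext width first, then user preference, then site default) are assumptions made for the example, not a description of MediaWiki's actual code.

```python
from typing import Optional

# Default list quoted in the text above.
THUMB_LIMITS = [120, 150, 180, 200, 250, 300]

# Assumed site default; the text mentions 220px on most Wikimedia wikis.
SITE_DEFAULT_WIDTH = 220


def effective_width(wikitext_width: Optional[int] = None,
                    user_pref_index: Optional[int] = None) -> int:
    """Resolve a thumbnail width (hypothetical helper, assumed precedence)."""
    if wikitext_width is not None:
        # A free-form width in wikitext can be any value, e.g. 119.
        return wikitext_width
    if user_pref_index is not None:
        # The user preference is an index into the configured list.
        return THUMB_LIMITS[user_pref_index]
    return SITE_DEFAULT_WIDTH
```

Only the first branch can produce an unbounded number of distinct sizes, which is the case this RFC is concerned with.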

Problem
The thumb width parameter in wikitext is free form. When a user chooses a 119-pixel width, a request is made to the server to produce that thumbnail size. Rendering the thumbnail is CPU-intensive, especially when rendering large images. That also means that the thumbnails are hard to cache properly since we have to cache a copy of each of the sizes requested.

Mark Bergsma, Wikimedia caching expert, has extracted a list of thumbnail sizes requested on Wikimedia's thumbnail cache: /thumb sizes requested.

Serving only closest match
We could instead send the closest match (e.g. 220 pixels wide) and instruct the browser to scale it down to the requested size. The client-side scaling can be done by passing the width attribute to an img element.
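A minimal sketch of the idea, assuming an illustrative size list and a made-up thumbnail URL layout (neither is taken from MediaWiki): the server picks the closest available size, and the markup still carries the width the editor asked for, so the browser does the final scaling.

```python
from html import escape

# Illustrative list of allowed sizes; not a proposed standard.
PREFERRED = [120, 150, 180, 200, 220, 250, 300]


def closest_size(requested, sizes):
    """Return the available size nearest to the requested width.

    On a tie the larger size wins, so the browser scales down
    rather than up.
    """
    return min(sizes, key=lambda s: (abs(s - requested), -s))


def img_tag(base_url, requested, sizes):
    """Build an img element that serves a standard size at the requested width."""
    served = closest_size(requested, sizes)
    # Hypothetical URL scheme, used only for this example.
    src = f"{base_url}/{served}px-thumb.jpg"
    return f'<img src="{escape(src, quote=True)}" width="{requested}">'
```

For the 119-pixel request from the Problem section, `closest_size(119, PREFERRED)` returns 120, and the generated element keeps `width="119"`.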

To achieve that, we should choose a preferred number series so that each requested size is always within X% of an available size. The Wikipedia article Preferred number lists several guidelines for creating such a series. We would want to use a series based on powers of 2 and make sure that the sizes are divisible by 16, due to the way that most image compression algorithms work.
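To make the "within X%" criterion concrete, here is a rough sketch that builds one possible series (powers of 2 plus the halfway step between them, keeping only multiples of 16) and measures the worst relative mismatch between a requested width and the nearest available size. The function names and the choice of halfway steps are assumptions for illustration.

```python
def power_of_two_series(min_size=16, max_size=2048):
    """Powers of 2 plus 1.5x intermediate steps, keeping multiples of 16 only."""
    sizes = []
    base = min_size
    while base <= max_size:
        sizes.append(base)
        half_step = base * 3 // 2
        if half_step <= max_size and half_step % 16 == 0:
            sizes.append(half_step)
        base *= 2
    return sizes


def worst_case_mismatch(sizes, lo, hi):
    """Largest relative gap between any integer width in [lo, hi] and its nearest size."""
    worst = 0.0
    for requested in range(lo, hi + 1):
        nearest = min(sizes, key=lambda s: abs(s - requested))
        worst = max(worst, abs(nearest - requested) / requested)
    return worst


series = power_of_two_series()
x = worst_case_mismatch(series, 16, 2048)
```

With these parameters the worst case falls at 24 pixels, halfway between 16 and 32, where the mismatch is one third. A denser series at the small end would lower X at the cost of more sizes to store.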

Icons in templates and gadget interfaces also rely on thumbnails (e.g. the ~16px icon for featured articles in the langlinks list in the sidebar, and the various templates positioned at the top right of wiki pages, such as Coordinates, Good/Featured article, Wiktionary and so on), so we'd need a small size as well.

Anything used in CSS is also a concern, because there is no width or size restriction there. If the image is not served at the size specified in the URL, it will have consequences that are unacceptable for the user interface[1]. We'd need a long migration path to make sure all those cases are moved to one of the standard sizes, which can be very hard, as it would basically require gadgets to redesign their interfaces completely.

Other strategies
A possible solution to the second part of the problem (i.e. "the thumbnails are hard to cache properly since we have to cache a copy of each of the sizes") could be a better caching strategy than simply refusing to cache non-standard sizes.

Potential thumbnail strategies:


 * Serving
 * (A) serve arbitrary sizes as requested;
 * (B) serve only preferred sizes (rounding non-standard sizes according to some algorithm).


 * Storing
 * (1) store every requested thumbnail indefinitely;
 * (2) store only preferred-size thumbnails indefinitely.


 * Caching
 * (i) retain indefinitely (ad hoc deletion);
 * (ii) periodically delete the longest-unserved thumbnails (LRU), regenerating and recaching them if re-requested.

The status quo is A.1.i. The proposal seems to be to switch to B.2.i or possibly A.2.i (I think Brion implies that A.2 is "Tim's position").

But to prevent excessive thumbnail file storage, all that is needed is a better caching strategy. "Least Recently Used" seems like a sound but simple and non-disruptive basis for predicting which thumbnails will not be requested again.

So, instead of inconveniencing users by switching serving strategy or storing strategy, why not just switch caching strategy from (i) to (ii)?
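As a rough illustration of strategy (ii), the sketch below keeps rendered thumbnails in a bounded least-recently-used store and regenerates them on a miss. The class name, the `render` callback, and the capacity-based eviction are assumptions for the example; the text above describes periodic deletion, which a real deployment might implement quite differently.

```python
from collections import OrderedDict


class ThumbnailLRU:
    """Bounded thumbnail store that evicts the least recently served entry."""

    def __init__(self, capacity, render):
        self.capacity = capacity
        self.render = render          # hypothetical callback: (name, width) -> thumbnail
        self._store = OrderedDict()   # ordered from least to most recently served

    def get(self, name, width):
        key = (name, width)
        if key in self._store:
            # Cache hit: mark as most recently served.
            self._store.move_to_end(key)
            return self._store[key]
        # Cache miss: render (or re-render) and store the thumbnail.
        thumb = self.render(name, width)
        self._store[key] = thumb
        if len(self._store) > self.capacity:
            # Drop the entry that has gone unserved the longest.
            self._store.popitem(last=False)
        return thumb
```

Arbitrary sizes are still served as requested (strategy A); only rarely requested sizes pay the re-rendering cost.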

Preferred numbers
There are statistics on the most common thumbnail sizes as of February 2016.

Based on some power of 2: 16, 32, 48, 64, 96, 128, 192, 256, 384, 512, 768, 1024, 1536, 2048

Multiples of 80: 80, 160, 240, 320, 400, 480, 560, 640, 720, 800, 880, 960, 1040, 1120, 1200, 1280, 1360, 1440, 1520 ...

Maybe something a little less linear (based on 20): 80, 220, 380, 560, 760, 980, 1220, 1480, 1760, 2060, 2380, 2720, 3080

Or: 80, 220, 440, 760, 1200, 1780, 2520

Or based on 16: 16, 64, 160, 320, 560, 896, 1344, 1920
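The rules behind these less linear sequences are not spelled out here. One reading that happens to reproduce the listed numbers is sketched below: the first "based on 20" sequence uses a step that grows by 20 each time, and the "based on 16" sequence is 16 times the tetrahedral numbers. Both function names and both interpretations are assumptions.

```python
def growing_step_series(start, first_step, step_growth, count):
    """Each step is larger than the previous one by a fixed amount."""
    sizes = [start]
    step = first_step
    while len(sizes) < count:
        sizes.append(sizes[-1] + step)
        step += step_growth
    return sizes


def tetrahedral_series(unit, count):
    """unit times the tetrahedral numbers n(n+1)(n+2)/6 for n = 1..count."""
    return [unit * n * (n + 1) * (n + 2) // 6 for n in range(1, count + 1)]


based_on_20 = growing_step_series(start=80, first_step=140, step_growth=20, count=13)
based_on_16 = tetrahedral_series(unit=16, count=8)
```

Writing a sequence as a rule rather than a list makes it easier to tweak the density at either end and to compare candidates.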

Kaldari's Sequence™ (power of 2 for small thumbnails, MediaWiki defaults for medium thumbnails, multiples of 100 or 200 for large thumbnails): 32, 48, 64, 100, 120, 150, 180, 200, 220, 250, 300, 400, 500, 600, 800, 1000

There are many possible sequences on which to base this idea, and each could be tweaked further.

Currently used numbers
As of March 2015, thumbnails are pre-generated at upload according to the $wgUploadThumbnailRenderMap config variable. The values currently used are: 320, 640, 800, 1024, 1280, 1920, 2560, 2880
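If the pre-generated widths were used as the set of served sizes, one possible selection policy is to pick the smallest pre-rendered width that is at least as large as the request, so the browser only ever scales down. This is a sketch of that policy under stated assumptions, not a description of what MediaWiki does with these values; the function name is hypothetical.

```python
from bisect import bisect_left

# Values listed above; must stay sorted for the binary search to work.
PRERENDERED = [320, 640, 800, 1024, 1280, 1920, 2560, 2880]


def smallest_at_least(requested, sizes=PRERENDERED):
    """Smallest pre-rendered width >= requested, or the largest one available."""
    i = bisect_left(sizes, requested)
    return sizes[i] if i < len(sizes) else sizes[-1]
```

Under this policy a 220-pixel request would be served the 320-pixel file, which shows why this particular list has no good match for small icons and default-size thumbnails.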