Requests for comment/Media file request counts

Background
Since 2008 Wikimedia has collected pageview counts for most pages on nearly all wikis. A longstanding request from stakeholders (editors, researchers, GLAM advocates) has been to publish similar counts for media files: images, sound files and videos. A major obstacle to this was the existing traffic data collection software: Webstatscollector simply could not be scaled up further without incurring huge costs. Even page view counts were incomplete: there were no per-page mobile counts, just per-wiki ones.

In 2014 WMF engineers rolled out a new Hadoop-based infrastructure, which makes it possible to collect raw request counts for media files. So, a few months after the release of extended pageview counts (with mobile/zero data added), the time has come to produce similar data dumps for media files.

Problem
What will be the exact specifications of the new media file request dumps?

Definition of media files
Media files are all images, sound files and videos on the WMF upload servers (URL '..//upload.wikimedia.org/..'). These files are mostly embedded in articles, but can also be requested separately. Out of scope are therefore images which are served to users from e.g. the bits servers (e.g. navigation icons).

The primary (and only?) defining criterion is therefore the location:

Currently three folder hierarchies on the upload servers are included:
 *  //upload.wikimedia.org/ -project code- / [archive] / -language code- / [thumb] / @ / @@ / -image file name- 
 * e.g. http://upload.wikimedia.org//wikipedia/commons/4/4a/Commons-logo.svg


 *  //upload.wikimedia.org/math/ @ / @ / @ / -math image file name- 
 * e.g. http://upload.wikimedia.org/math/f/f/f/fffffd30a4febac3dab210ae1537419e.png


 *  //upload.wikimedia.org/ -project code- / -language code- /timeline/ -timeline image file name- 
 * e.g. https://upload.wikimedia.org/wikipedia/en/timeline/d2d8e00fd34c75c9be9ce74f63be3517.png

(!! may be incomplete, check)

Legend:
 * each @ stands for one character in range 0-9a-f
 * -xxx- generic description
 * [..] = optional segment
 * language code also includes strings like 'meta', 'commons' etc.
 * ignore spaces

(Open question: do we filter out other file paths, or perhaps certain extensions? e.g. http://upload.wikimedia.org/robots.txt)
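The three path families above can be sketched as regular expressions. This is only an illustration, not the production filter: the character classes for project and language codes, and the 32-hex-character filenames for math and timeline images (inferred from the examples), are assumptions.

```python
import re

# Hypothetical classifier for the three upload.wikimedia.org path families.
# Character classes for project/language codes are assumptions for illustration.
PATTERNS = {
    "regular": re.compile(
        r"^/(?P<project>[a-z]+)"    # project code, e.g. 'wikipedia'
        r"(/archive)?"              # optional 'archive' segment
        r"/(?P<lang>[a-z-]+)"       # language code ('en', 'commons', 'meta', ...)
        r"(/thumb)?"                # optional 'thumb' segment
        r"/[0-9a-f]/[0-9a-f]{2}"    # hash directories: @ / @@
        r"/.+$"                     # image file name
    ),
    "math": re.compile(
        # @ / @ / @ / -math image file name- (32 hex chars assumed, per example)
        r"^/math/[0-9a-f]/[0-9a-f]/[0-9a-f]/[0-9a-f]{32}\.png$"
    ),
    "timeline": re.compile(
        # -project code- / -language code- /timeline/ -timeline image file name-
        r"^/(?P<project>[a-z]+)/(?P<lang>[a-z-]+)/timeline/[0-9a-f]{32}\.png$"
    ),
}

def classify(path):
    """Return the pattern family a request path belongs to, or None if out of scope."""
    for name, pattern in PATTERNS.items():
        if pattern.match(path):
            return name
    return None
```

With this sketch, the three example URLs classify as "regular", "math" and "timeline" respectively, while a path like /robots.txt falls through to None, which is one way the filtering question above could be answered.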

Update frequency
The proposed update frequency is daily dump files. Reasons to choose daily updates instead of, say, hourly updates (as with page views) are:
 * Generation is more cost-effective than hourly updates (one daily job, to be scheduled at the most convenient time of day, in terms of overall system activity)
 * Requires less post-processing for aggregation.
 * More convenient to download (due to much smaller per-day size)
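The aggregation step the second bullet refers to can be illustrated as follows. The real pipeline runs on the Hadoop infrastructure and the dump format is not yet specified, so this is only a sketch of counting one day of raw request paths into per-file totals.

```python
from collections import Counter

def aggregate_daily(request_paths):
    """Count requests per media file path for one day of logs.

    `request_paths` is an iterable of request path strings. This stands in
    for the actual Hadoop job, purely to show the aggregation step.
    """
    return Counter(request_paths)

counts = aggregate_daily([
    "/wikipedia/commons/4/4a/Commons-logo.svg",
    "/wikipedia/commons/4/4a/Commons-logo.svg",
    "/math/f/f/f/fffffd30a4febac3dab210ae1537419e.png",
])
```

With daily dumps this counting happens once per day; hourly dumps would additionally require consumers to sum 24 such tables to obtain daily totals.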