Talk:Requests for comment/Media file request counts

NARA pilot
At the Zürich Hackathon in May 2014, a 'NARA' pilot was done to nail down relevant Hive queries for GLAM. That work is relevant here, but no one-to-one translation of requirements surfaced at that time. For one, a geo-breakdown of requests is not (yet) available. Breaking down the rows by country in the dataset proposed here would explode the file size.

Test runs
Incomplete examples from two test runs (ignore the extra whitespace used for the column-based layout):

 * all thumbs in one count

mediafile                                           req_total  req_int_full  req_int_thumbs  req_ext         bytes
/wikipedia/commons/4/4a/Commons-logo.svg             41569730           954        41556404    12372   59198663369
/wikipedia/en/4/4a/Commons-logo.svg                  41382929          1010        41373778     8141   66620759690
/wikipedia/commons/f/fa/Wikiquote-logo.svg           28020764           543        28016206     4015   47132189012
/wikipedia/en/b/bc/Wiki.png                          26643030      26341441            1021   300568  523179005167
/wikipedia/commons/2/23/Icons-mini-file_acrobat.gif  22808095      22521938          221214    64943    6803053997
/wikipedia/commons/4/4c/Wikisource-logo.svg          21153210           234        21149420     3556   39723888186
/wikipedia/en/9/99/Question_book-new.svg             16232299           929        16225236     6134   80921974777

(!! add req_unknown)

 * large and small thumbs counted separately

base_name                                               total  original  high_quality  low_quality
/wikipedia/commons/4/4a/Commons-logo.svg             46950570     13380         27129     46910061
/wikipedia/en/4/4a/Commons-logo.svg                  45997381      9261          3133     45984987
/wikipedia/commons/f/fa/Wikiquote-logo.svg           31759104      4587          1981     31752536
/wikipedia/en/b/bc/Wiki.png                          29846080  29844995             0         1085
/wikipedia/foundation/2/20/CloseWindow19x19.png      26264640  26264640             0            0
/wikipedia/commons/2/23/Icons-mini-file_acrobat.gif  24394138  24143535             0       250603
/wikipedia/commons/4/4c/Wikisource-logo.svg          23027261      3805          1811     23021645
/wikipedia/en/9/99/Question_book-new.svg             17954807      7092          1747     17945968

(!! no distinction made by referer in this test run for new UDF)
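
In the rows from the first test run, req_total appears to equal req_int_full + req_int_thumbs + req_ext. That relationship is inferred from the sample numbers, not stated anywhere as a rule. A minimal Python sketch to check it, assuming whitespace-separated columns in the order shown (the function names are hypothetical, made up for this illustration):

```python
# Two rows copied from the first test run above; columns are
# mediafile, req_total, req_int_full, req_int_thumbs, req_ext, bytes.
SAMPLE = """\
/wikipedia/commons/4/4a/Commons-logo.svg 41569730 954 41556404 12372 59198663369
/wikipedia/en/b/bc/Wiki.png 26643030 26341441 1021 300568 523179005167
"""

FIELDS = ("req_total", "req_int_full", "req_int_thumbs", "req_ext", "bytes")


def parse_row(line):
    """Split one whitespace-separated row into a dict of named fields."""
    mediafile, *numbers = line.split()
    row = {"mediafile": mediafile}
    row.update(zip(FIELDS, map(int, numbers)))
    return row


def total_matches(row):
    """Assumed invariant: the total is the sum of the three request columns."""
    parts = row["req_int_full"] + row["req_int_thumbs"] + row["req_ext"]
    return row["req_total"] == parts


rows = [parse_row(line) for line in SAMPLE.splitlines() if line.strip()]
for row in rows:
    print(row["mediafile"], total_matches(row))
```

If a req_unknown column is added as noted above, it would presumably have to be included in the sum as well.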

On status codes
About 206: this status code applies to timed files, i.e. video and sound files.
 * 'Partial content' gets sent when a browser was interrupted during a request, restarts it, and signals that it only needs a certain part of the content. For media files, this is often due to seeking in a video or audio stream.
 * When seeking in audio or video files, new requests for only part of the file are made. The varnishes respond with a 206 HTTP status code. Also, when a file is shown directly within a webpage, the OGV-Viewer makes one request for the file that is answered with a 200 HTTP status code and two requests for the file that are answered with a 206 HTTP status code. Since many 206s are just additional requests for the same file, it seems right not to count all 206s.
 * On closer inspection: the 200 need not be for the media file itself, but may be (and mostly is) for the thumbnail picture of the media file. So the GET for the real media file ends up being answered with a 206. But not all 206 responses are for requests for the full file. So we end up counting 206s for the first chunk of data (+/- 9k chunk). A few of those may be re-requests for the first chunk, if that chunk got interrupted by the user, but for video files 9k is just a few seconds; for sound files it may occur a bit more often. (paraphrased from comments by Chris) Erik Zachte (WMF) (talk) 17:32, 14 November 2014 (UTC)
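
The counting rule discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the request's Range header is available next to the status code, it treats "first chunk" as "range starts at byte 0", and the function name is_countable is hypothetical. None of that is confirmed as what the actual UDF does.

```python
import re

# Matches a simple single-range header such as "bytes=0-9215" or "bytes=9216-".
_RANGE_START = re.compile(r"^bytes=(\d+)-")


def is_countable(status, range_header=None):
    """Decide whether one request should add to a media file's count.

    Assumptions (not taken from the real implementation):
    - 200 responses are always counted;
    - 206 responses are counted only when the requested range starts
      at byte 0, i.e. the first chunk of the file;
    - every other status code is ignored.
    """
    if status == 200:
        return True
    if status == 206:
        match = _RANGE_START.match(range_header or "")
        return bool(match) and int(match.group(1)) == 0
    return False


print(is_countable(206, "bytes=0-9215"))  # first chunk of a video
print(is_countable(206, "bytes=9216-"))   # seek further into the file
```

As noted above, a re-request of the first chunk after an interruption would still be counted twice under this rule.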