Analytics/Pageviews/Webstatscollector

Webstatscollector is a legacy tool that takes sampled logs and generates page view counts for all Wikimedia projects. These page view counts are used in the monthly report card: http://reportcard.wmflabs.org/graphs/pageviews

Webstatscollector's Pageview Definition
The following diagram shows what webstatscollector counts as a page view: https://github.com/wikimedia/analytics-metrics/blob/master/pageviews/webstatscollector/pageview_definition.png

Webstatscollector has a limited definition of what constitutes a page view: it over-counts some actions and under-counts others. Research into page views is ongoing, and a new standard definition is available at https://meta.wikimedia.org/wiki/Research:Page_view. Research is also studying how the new pageview counts differ from webstatscollector's counts.

Vital Signs
For a brief period (December 2014), Vital Signs displayed pageview data using webstatscollector's definition. That definition was implemented in Hive on Refinery's Hadoop cluster, processing the raw webrequest logs.

Architecture
https://wikitech.wikimedia.org/wiki/Analytics/Webstatscollector