Analytics/Pageviews/Webstatscollector

Webstatscollector is a legacy tool that generates page view counts for all Wikimedia projects from sampled logs. These counts are used in the monthly report card: http://reportcard.wmflabs.org/graphs/pageviews
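
As a rough illustration of deriving per-project counts from sampled logs, the Python sketch below tallies sampled records by project and scales each tally by the sampling rate. The record shape (project, page title pairs), the function name `estimate_pageviews`, and the 1-in-1000 sampling rate are all assumptions made for this example; none of them is taken from webstatscollector's actual code or log format.

```python
from collections import Counter
from typing import Iterable, Tuple


def estimate_pageviews(
    sampled_records: Iterable[Tuple[str, str]],
    sampling_rate: int,
) -> Counter:
    """Estimate per-project page views from 1-in-N sampled records.

    sampled_records: hypothetical (project, page_title) pairs, one per
        sampled request that already passed the pageview definition.
    sampling_rate: N, meaning one request in every N was kept (assumed).
    """
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be a positive integer")

    # Count the sampled requests seen for each project.
    sampled_counts = Counter(project for project, _title in sampled_records)

    # Scale each count back up to estimate the unsampled total.
    return Counter(
        {project: n * sampling_rate for project, n in sampled_counts.items()}
    )


# Hypothetical sample: three records kept from a 1-in-1000 sampled log.
sample = [
    ("en.wikipedia", "Main_Page"),
    ("en.wikipedia", "Hadoop"),
    ("de.wikipedia", "Hauptseite"),
]
estimates = estimate_pageviews(sample, sampling_rate=1000)
```

Scaling by the sampling rate gives only an estimate, so projects with very little traffic may be missing from the result or have noisy counts.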

Webstatscollector's Pageview Definition
The following diagram shows what webstatscollector considers a page view: https://github.com/wikimedia/analytics-metrics/blob/master/pageviews/webstatscollector/pageview_definition.png

Webstatscollector has a limited definition of what constitutes a page view: it over-counts some actions and under-counts others. Research into page views is ongoing, and a new standard definition is available: https://meta.wikimedia.org/wiki/Research:Page_view. That research is also studying how the new pageview counts differ from webstatscollector's counts.

Vital Signs
For a brief period (December 2014), Vital Signs displayed pageview data using webstatscollector's definition. The definition was implemented on Refinery's Hadoop cluster using Hive, processing raw webrequest logs. To tell which definition is displayed in Vital Signs, click the "Daily Pageviews" title of the graph. The link takes you to the pageview definition used to generate the data in the graph.

Architecture
https://wikitech.wikimedia.org/wiki/Analytics/Webstatscollector

Storage
Storage involves Hive. See Analytics/Cluster/Hive for details.