Analytics/Data Processing

Vision
Provide a big data processing platform that produces reports and metrics and facilitates research, while complying with the privacy policy and the expectations of the MediaWiki community.

Roadmap

 1) Decommission varnishncsa
    * Benefits:
      * Reduced load on the Varnish servers (Ops has requested that we remove varnishncsa)
      * Less reliance on udp2log (we can start using lossless kafkatee)
 2) Total PageView Prototype (top-level metrics on total page views)
    * Benefits:
      * 2014-15 Q1 goal: produce metrics for executives
      * Gain experience towards fully implementing ETL and page view counting
 3) Replace Webstats Collector
    * Benefits:
      * No more packet loss (use kafkatee instead of udp2log)
      * Scalable and maintainable (use Hadoop on all data instead of sampled logs)
 4) Fully Dimensioned PageViews and ImageViews
    * Benefits:
      * Serves both executives and the community
      * Replaces reliance on Comscore data
 5) Wikipedia Zero filtering & Page Views
    * Benefits:
      * Reliable and robust generation of data for WP Zero reports
      * Replaces the temporary solution currently in use