Analytics/Kraken

Rationale
The Wiki Movement has a chronic need for analytics. We need it to understand our editors, to encourage growth, to engender diversity, to focus our resources, to improve our engineering efforts, and to measure our success. Analytics permeates nearly all our goals, yet our current capabilities are underdeveloped: we lack infrastructure to capture editor, visitor, clickstream, and device data in an easily accessible way; our efforts are distributed among different departments; our data is fragmented across different systems and databases; and our tools are ad hoc.

Rather than merely improve existing jobs and data pipelines, the Analytics Team aims to construct a Data Services Platform capable of mining intelligence from all datastreams of interest, providing this insight in real time, and exposing it via an API to power applications, mash up into websites, and stream to devices.

Timeline

 * Rough Teamwide Milestones
 * Analytics Team Roadmap

Docs

 * Cluster Dataflow Diagram
 * Request Logging: capturing the incoming firehose from our front-end servers.
 * System Recommendation
 * Distributed Logging Systems Research
 * Feature Comparison Spreadsheet
 * Pixel Service Endpoint
 * Hardware Planning
 * Notes
 * Test Cluster Setup Notes
 * Hadoop Setup Notes