Analytics/Hypercube

=Background=

We are sketching a vision for an API to query pageview data that we think is extensible and will allow us to address many use-cases. Rome wasn't built in one day and neither will this be.

We call the project Hypercube -- a hypercube is an n-dimensional cube. We intend to keep adding new dimensions over time, such as mobile, country, and browser family.

=Questions that the API should be able to answer=

This is a non-exhaustive list of questions that we have collected over time.


 * Top X articles by metric Y within a filter, aggregated by time series, for a given period
 * Daily metric Y for articles Z,Z1,Z2,...
 * Top X articles in categories A,B,C
 * Top X articles in namespace F,G,H
 * Total pageviews per namespace per project per time series
 * Total pageviews per namespace per project per time series, mobile only
 * Total pageviews per namespace per project per time series, broken down by country/region code

=Minimum Viable Prototype=

We will build the first prototype against the data stream generated by webstatscollector. This means that only the following dimensions will be available:
 * time
 * project
 * namespace (if we parse the article title)
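
Deriving the namespace dimension from the article title could look roughly like the sketch below. The set of namespace prefixes and the function name split_namespace are assumptions made for illustration; a real wiki defines its own, localized namespaces.

```python
# Hypothetical namespace prefixes, for illustration only.
# A real wiki defines its own (and localized) set.
KNOWN_NAMESPACES = {
    "Talk", "User", "User_talk", "Wikipedia",
    "File", "Template", "Help", "Category",
}


def split_namespace(title):
    """Return (namespace, page name); the main namespace is the empty string."""
    prefix, sep, rest = title.partition(":")
    # Only treat the prefix as a namespace if it is a known one, so that
    # titles that merely contain a colon stay in the main namespace.
    if sep and prefix in KNOWN_NAMESPACES:
        return prefix, rest
    return "", title
```

Titles that contain a colon but do not start with a known prefix, such as 2001:_A_Space_Odyssey, stay in the main namespace under this approach.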

Available Metrics (facts in data warehouse terminology):
 * hourly pageview count per article (excluding mobile and commons pageview counts)
 * hourly bytes sent per article

Initially, we will aggregate these metrics to daily values.
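
As a minimal sketch of that aggregation step, assume each hourly record has already been parsed into a tuple of (project, title, hour, pageviews, bytes sent). That tuple layout and the name aggregate_daily are assumptions, not the format webstatscollector emits.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily(hourly_rows):
    """Sum hourly (project, title, hour, pageviews, bytes_sent) rows per day."""
    daily = defaultdict(lambda: [0, 0])
    for project, title, hour, pageviews, bytes_sent in hourly_rows:
        # Truncate the hourly timestamp to its calendar date.
        key = (project, title, hour.date())
        daily[key][0] += pageviews
        daily[key][1] += bytes_sent
    return {key: tuple(totals) for key, totals in daily.items()}


# Made-up sample rows, for illustration only.
rows = [
    ("en", "Napoleon", datetime(2010, 10, 1, 0), 12, 3400),
    ("en", "Napoleon", datetime(2010, 10, 1, 1), 8, 2100),
    ("en", "Napoleon", datetime(2010, 10, 2, 0), 5, 1500),
]
daily = aggregate_daily(rows)
```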

We want to release a prototype as soon as possible, but it should still be useful to the community and viable from an architectural point of view.

=API=

The API will not be RESTful -- it will only support GET requests.

The proposed format for a query is:

GET https://hypercube.wikimedia.org/v1/json/ ?metric=pageviews &timeseries=daily &order=desc &filters=articles-category-or:A,B,C; articles:Main,Napoleon;project:enwiki &limit=10 &start-timestamp=2010-10-01 &end-timestamp=2011-10-31
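
A client could assemble such a query along the following lines. This is only a sketch: the helper build_query and its argument names are hypothetical, the filter syntax (name:value,value pairs joined by semicolons) is inferred from the example above, and the endpoint is the proposed one, not a live service.

```python
from urllib.parse import urlencode

# Proposed endpoint from this page; not a live service.
BASE_URL = "https://hypercube.wikimedia.org/v1/json/"


def build_query(metric, timeseries, filters, start, end, order="desc", limit=10):
    """Build a query URL; filters maps a filter name to a list of values."""
    # Join filters as name:v1,v2;name:v1 as in the proposed format.
    filter_str = ";".join(
        f"{name}:{','.join(values)}" for name, values in filters.items()
    )
    params = {
        "metric": metric,
        "timeseries": timeseries,
        "order": order,
        "filters": filter_str,
        "limit": limit,
        "start-timestamp": start,
        "end-timestamp": end,
    }
    # Keep the filter separators readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=":,;")


url = build_query(
    metric="pageviews",
    timeseries="daily",
    filters={
        "articles-category-or": ["A", "B", "C"],
        "articles": ["Main", "Napoleon"],
        "project": ["enwiki"],
    },
    start="2010-10-01",
    end="2011-10-31",
)
```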

Parameters
=Design Questions=


 * Should we convert the non-ASCII titles to Unicode?
 * Should we expose the bytes sent metric?
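
To make the first question concrete: if titles arrive percent-encoded as UTF-8 (an assumption about the input data), converting them could be as small as the sketch below. The name decode_title is hypothetical.

```python
from urllib.parse import unquote


def decode_title(raw_title):
    """Percent-decode a title as UTF-8, replacing undecodable bytes."""
    return unquote(raw_title, encoding="utf-8", errors="replace")
```

With this sketch, decode_title("Caf%C3%A9") returns "Café", and plain ASCII titles pass through unchanged.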

=How to Participate=

There are many ways to participate:


 * Chime in with your use cases on the Talk Page
 * Give feedback on the suggested API
 * Or, even better, help us build the hypercube by cloning the repo and submitting your patches!