Analytics/Epics/Pageview API

From mediawiki.org
For the documentation of the current pageview API, see: wikitech:Analytics/PageviewAPI.

Goals

Wikipedians need a reliable and accurate API for querying page views of articles. This epic describes the steps needed to build such an API. Initially, it focuses on the underlying back-end infrastructure (e.g. Kafka and Hadoop) required for this purpose. The epic is not yet complete and will be expanded with front-end requirements as the back-end work progresses.
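For illustration, the per-article endpoint of the Pageview API that was eventually built is documented at wikitech:Analytics/PageviewAPI. It is addressed by a URL of the form project/access/agent/article/granularity/start/end. The Python sketch below only assembles such a URL and makes no network request. The default parameter values and the YYYYMMDD timestamp format reflect that documentation as we understand it; treat them as assumptions and check the wikitech page for the authoritative parameter list.

```python
from datetime import date
from urllib.parse import quote

# Per-article pageview endpoint of the Wikimedia REST API
# (see wikitech:Analytics/PageviewAPI for the authoritative reference).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(project: str, article: str, start: date, end: date,
                    access: str = "all-access", agent: str = "all-agents",
                    granularity: str = "daily") -> str:
    """Build a per-article pageview query URL (no network access)."""
    # Titles use underscores instead of spaces; everything else is
    # percent-encoded, including '/', so "AC/DC" stays one path segment.
    title = quote(article.replace(" ", "_"), safe="")
    # Start and end dates are written as YYYYMMDD.
    return "/".join([BASE, project, access, agent, title, granularity,
                     start.strftime("%Y%m%d"), end.strftime("%Y%m%d")])


if __name__ == "__main__":
    print(per_article_url("en.wikipedia", "Albert Einstein",
                          date(2015, 10, 1), date(2015, 10, 31)))
```

Keeping URL construction separate from the HTTP call makes the query shape easy to test and reuse with any HTTP client.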

Detailed Tracking Links

TBD

Users

User | Description
Product Managers | The people who research, design, and iterate on the page view metrics
Researchers/Analytics Developers | The people who define the various page view metrics
Analytics Developers | The people who write the software that produces the metrics
Analytics Operators | The people who ensure the software keeps running and the data stays up to date
Management | WMF staff who make decisions based on the data
Community | The Wikipedians who use the data to assess their success and the health of the community and their pages
Readers | The people who read Wikipedia

Prioritized Use Cases

High Priority

  1. As a Wikipedian, I need an API that lets me query a variety of page view statistics
  2. As a Reader, I want any personally identifiable information (PII), such as my IP address and user agent, to be removed from my page view data
  3. As a Product Owner, I want page views to be geo-coded at the country level
  4. As a Product Owner (along with many other stakeholders), I want raw logs to be deleted within 90 days
  5. As a Product Owner, I want page views to conform to a community-reviewed definition
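Use cases 2 to 4 can be illustrated with a minimal Python sketch. It is not the production pipeline. It assumes a hypothetical raw log record with `ip`, `user_agent`, `title` and `ts` fields. It also assumes a hypothetical `lookup_country` callable that stands in for a real IP-to-country database. The sketch geo-codes each request while the IP address is still available, then emits a sanitized record that keeps only the country code, so the IP address and user agent are not carried forward. A separate check flags raw records that have reached the 90-day retention limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

# Raw logs must be deleted within 90 days (use case 4).
RETENTION = timedelta(days=90)


@dataclass
class RawRequest:
    # Hypothetical raw log record; field names are illustrative only.
    ip: str
    user_agent: str
    title: str
    ts: datetime


@dataclass(frozen=True)
class Pageview:
    # Sanitized record: no IP address and no user agent (use case 2).
    title: str
    country: str
    ts: datetime


def sanitize(req: RawRequest, lookup_country: Callable[[str], str]) -> Pageview:
    """Geo-code to country level (use case 3), then drop the PII fields."""
    # The IP address is used only for the country lookup and is not copied.
    return Pageview(title=req.title, country=lookup_country(req.ip), ts=req.ts)


def is_expired(req: RawRequest, now: datetime) -> bool:
    """True once a raw record is at or past the retention limit."""
    return now - req.ts >= RETENTION


if __name__ == "__main__":
    # Hypothetical lookup table; "--" marks an unknown country.
    geo = {"203.0.113.7": "FR"}
    req = RawRequest("203.0.113.7", "Mozilla/5.0", "Paris",
                     datetime(2015, 1, 1, tzinfo=timezone.utc))
    print(sanitize(req, lambda ip: geo.get(ip, "--")))
    print(is_expired(req, datetime.now(timezone.utc)))
```

Doing the country lookup before discarding the IP address is the key ordering constraint. After sanitization, country-level geo-coding is impossible to redo, which is the point.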

Later

Non-functional requirements

  1. Data should be updated daily, with hourly granularity
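One way to meet this requirement is to count page views per (hour, title, country) and publish each day's hourly buckets in a single daily batch. The Python sketch below assumes the input is already sanitized. Each record is taken to be a hypothetical `(timestamp, title, country)` tuple with a timezone-aware timestamp; these names are illustrative and do not belong to any existing pipeline.

```python
from collections import Counter
from datetime import date, datetime, timezone
from typing import Iterable, Tuple

# Hypothetical sanitized record: (timezone-aware timestamp, title, country).
Record = Tuple[datetime, str, str]


def hourly_counts(records: Iterable[Record], day: date) -> Counter:
    """Count page views for one UTC day, keyed by (hour, title, country)."""
    counts: Counter = Counter()
    for ts, title, country in records:
        # Normalize to UTC; timestamps must be timezone-aware.
        ts = ts.astimezone(timezone.utc)
        if ts.date() != day:
            continue  # belongs to a different day's batch
        # Truncate to the start of the hour to get hourly granularity.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, title, country)] += 1
    return counts
```

Running this once per day keeps the batch job simple. The hourly keys still preserve the finer granularity the requirement asks for.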

Additional information

Planning done together with tech-ops is documented here: List of tasks for backend work