Analytics/Archive/Editor Engagement Vital Signs

'''This page is archived! Find new documentation at https://wikitech.wikimedia.org/wiki/Analytics/Vital_Signs'''

Enjoy https://vital-signs.wmflabs.org/

= Goals =

The Editor Engagement program is a top strategic priority for the Foundation and has many individual initiatives. We do not, however, have a dashboard that provides consistent, explorable, and timely data on this program, which makes it difficult and time-consuming for implementors to understand the impact of their changes. A consistent dashboard would also help normalize results across products and other aspects of the program which are difficult to compare at this time.

The goal of this project is to implement such a dashboard. Research and Data will provide the definitions and SQL queries for the actual metrics.

= Detailed Tracking Links =

Development (Mingle)

Research (Trello)
 * repo

= Users =

= Prioritized Use Cases =

High Priority

 * 1) As a Product Manager I want a well-designed dashboard that I can use to explore the metrics listed in the New Users section
 * 2) As a Product Manager, Researcher and Analytics Developer (and just about everybody else) I need documentation about how these metrics are calculated
 * 3) As a Product Manager, I need these metrics calculated across all Wikis (this list is accessible via this API call: http://en.wikipedia.org/w/api.php?action=sitematrix)
 * 4) As a Researcher, I need to create a new table for reverts (as they happen)
 * 5) As a Product Manager and Researcher, I want historical data to be maintained in a format that can be queried easily (Warehouse)
 * 6) As a Product Manager, I need the data to be in tabular and graphed formats and I want to be able to download the data as a CSV or TSV
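Use case 3 above relies on the sitematrix API call to enumerate the wikis. The following is a minimal sketch of how that response could be turned into a list of database names, not the project's actual implementation. The function names, the `format=json` parameter, the User-Agent string, and the choice to skip wikis marked `closed` are assumptions made for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumption: same endpoint as in use case 3, with JSON output requested.
SITEMATRIX_URL = "https://en.wikipedia.org/w/api.php?action=sitematrix&format=json"


def extract_dbnames(sitematrix_response):
    """Return the sorted database names (e.g. 'enwiki') of all open wikis."""
    matrix = sitematrix_response["sitematrix"]
    dbnames = []
    for key, group in matrix.items():
        if key == "count":
            continue  # total number of sites, not a site entry
        # Language groups are dicts holding a "site" list; "specials" is a plain list.
        sites = group if key == "specials" else group.get("site", [])
        for site in sites:
            if "closed" not in site:  # assumption: closed wikis are not tracked
                dbnames.append(site["dbname"])
    return sorted(dbnames)


def fetch_dbnames(url=SITEMATRIX_URL):
    """Fetch the site matrix over HTTP (requires network access)."""
    request = Request(url, headers={"User-Agent": "eevs-example/0.1"})
    with urlopen(request) as response:
        return extract_dbnames(json.load(response))
```

Keeping the parsing in `extract_dbnames` separate from the HTTP call means the logic can be checked against a saved response without touching the network.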

Later

 * 1) As a Product Manager I want to be able to compare arbitrary data series in a graph
 * 2) As a Manager I want some graphs to be available to WMF only
 * 3) As a Product Manager, I want geographic and other arbitrary breakdowns (such as mobile/desktop)
 * 4) As a Product Manager, I want League Tables (such as daily top-X contributors by namespace)
 * 5) As a Product Manager, I want trends, moving averages, projections and other UI candy

Non functional requirements

 * 1) All dashboards should be updated daily.
 * 2) Sizing data from the size and scale section below should be used for capacity planning. We should double this number to allow for future growth. However, a storage system that allows storage and I/O capacity to be added dynamically is preferable.
 * 3) Dashboards should render within 2 seconds (this does not include query time)
 * 4) Once we have signoff on the basic issues described below, all issues with dashboards should be addressed within 2 "business" days of the problem reports.
 * 5) Data should be retained indefinitely

= Metrics =

The minimum granularity for this data should be daily with monthly rollups.
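As an illustration of daily granularity with monthly rollups, here is a minimal sketch that sums daily values into calendar months. The function name and the input shape (a dict keyed by `datetime.date`) are assumptions. Summing only suits additive metrics such as Edits or Page creations; metrics that count distinct users, such as Active editors, would have to be recomputed over the whole month rather than summed from daily values.

```python
from collections import defaultdict
from datetime import date


def monthly_rollup(daily_counts):
    """Sum daily values into (year, month) buckets."""
    totals = defaultdict(int)
    for day, value in daily_counts.items():
        totals[(day.year, day.month)] += value
    return dict(totals)


# Hypothetical daily edit counts, for illustration only.
daily_edits = {
    date(2014, 1, 30): 5,
    date(2014, 1, 31): 7,
    date(2014, 2, 1): 3,
}
rollup = monthly_rollup(daily_edits)
```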

Acquisition

 * Newly registered user

Activation

 * Live users (deferred pending instrumentation)
 * New editor

Productivity

 * Productive new editor

Retention

 * Surviving new editor

Community

 * Editors
 * Active editors
 * Very active editors
 * Anonymous editors
 * Bots

Content

 * Edits
 * Uploads
 * Page creations
 * Total pages

Curation

 * Reverts
 * Deletions

= Design =

Here are the functional requirements for the dashboards. We will engage the UX team for specific wireframes/prototypes.

The New User metrics that are high priority for this Epic are part of a hierarchy:

Global Aggregations
 * Project (e.g. enwiki)
 * Grouping (e.g. New Users)
 * Metrics (e.g. New Editors)

The Metrics levels of the hierarchy should contain all of the metrics listed in this document for the grouping.

For this release, just Wikipedia is fine for projects, although for subsequent versions, we should include Meta and Commons. We need to support all of the projects for the first release.

For this release, all of the data available for a metric will be graphed

Loading time is extremely important -- our users would like a consumer web experience (50th percentile under 1 second from a warm cache)
 * We need to explore technical solutions to ensure we can commit to this goal
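One way to check the loading-time goal is to compare the median (the 50th percentile) of measured warm-cache load times against the 1 second target. The sketch below assumes the measurements are already collected as a list of seconds; the function name and the sample values are hypothetical.

```python
from statistics import median


def meets_load_target(load_times_seconds, target_seconds=1.0):
    """Return True if the 50th percentile load time is under the target."""
    return median(load_times_seconds) < target_seconds


# Hypothetical warm-cache measurements, in seconds.
samples = [0.4, 0.6, 0.9, 1.5, 2.0]
ok = meets_load_target(samples)
```

Because the target is a median, up to half of the page loads can exceed 1 second and the goal is still met, which is why the sample above passes despite two slow loads.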

Labels should be linear (no special handling is needed)
 * X Axis should be the date, Y the value

Definitions should be linked to from the Metrics pages

No additional summaries or breakdowns are needed

No overlays or comparisons are needed -- one time series per graph

The graphs must render on webkit or gecko engines. The only browsers that we need to support are Safari, Chrome and Firefox.

Deprioritized Requirements
Resolution should be selectable from the following values (Day, Month, Year)

Timeframe should be selectable from a control and should be constrained to provide a balance between functionality and performance
 * The constraint is dependent on the performance of the underlying rendering libraries

These requirements are captured here for historical purposes but not needed for the initial release

Tabular data would be nice

Trends would be nice

Rolling averages would be nice

= Implementation Details =

Dashboard Technical Stack
A detailed description of our technology choices for the editor vital signs dashboard can be found here:

Dashboard Technical Stack

Backfilling
Load tests and backfilling of EEVS data

Size and update rate
For the metrics in phase 1, the following tables will need to be used.

These sizes/updates are from enwiki. They should be doubled to account for all of the wikis. We should double this number again to allow for growth.
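The sizing rule above works out to four times the enwiki figure: doubled once for all of the wikis, then doubled again for growth. A minimal sketch of that arithmetic follows; the function and parameter names are assumptions, and the input value is a placeholder rather than a measured size.

```python
def planned_capacity(enwiki_size, all_wikis_factor=2, growth_factor=2):
    """Scale an enwiki measurement to a planning figure for all wikis plus growth."""
    return enwiki_size * all_wikis_factor * growth_factor


# Placeholder: 100 units measured on enwiki gives a planning figure of 400 units.
capacity = planned_capacity(100)
```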