Analytics/Archive/Editor Engagement Vital Signs

From mediawiki.org

Enjoy https://vital-signs.wmflabs.org/

Goals

The Editor Engagement program is a top strategic priority for the Foundation and has many individual initiatives. We do not, however, have a dashboard that provides consistent, explorable, and timely data on this program, which makes it difficult and time-consuming for implementors to understand the impact of their changes. A consistent dashboard would also help normalize results across products and other aspects of the program which are difficult to compare at this time.

The goal of this project is implementing such a dashboard. Research and Data will provide the definitions and SQL queries for the actual metrics.

Detailed Tracking Links

Development (Mingle)

Research (Trello)

Users

User                  Description
Product Managers      The people who are researching, designing and iterating on the editor engagement features.
Researchers           The people who define the various metrics and use them to measure the performance of editor engagement features.
Analytics Developers  The people who write the software that produces the dashboards.
Analytics Operators   The people who ensure the software is running and the data is updated.
Communications        WMF personnel who use the data in monthly reports.
Management            WMF personnel who make decisions based on the data.
Community             The Wikipedians who look at the data to assess their success and the health of the community.

Prioritized Use Cases

High Priority

  1. As a Product Manager, I want a well-designed dashboard that I can use to explore the metrics listed in the New Users section
  2. As a Product Manager, Researcher and Analytics Developer (and just about everybody else), I need documentation about how these metrics are calculated
  3. As a Product Manager, I need these metrics calculated across all wikis (the list is accessible via this API call: http://en.wikipedia.org/w/api.php?action=sitematrix)
  4. As a Researcher, I need to create a new table for reverts (as they happen)
  5. As a Product Manager and Researcher, I want historical data to be maintained in a format that can be queried easily (Warehouse)
  6. As a Product Manager, I need the data to be in tabular and graphed formats, and I want to be able to download the data as a CSV or TSV
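The list of wikis mentioned in use case 3 could be pulled from the sitematrix API roughly as follows. This is a minimal sketch, not the project's actual code: the function names and User-Agent string are hypothetical, and the response layout (numbered language groups holding a "site" list, a "specials" list, and a "closed" key on closed wikis) is an assumption about the JSON the API returns. The sample response is made up so the parsing can be tried without network access.

```python
import json
import urllib.request

# Same endpoint as in use case 3, with JSON output requested.
SITEMATRIX_URL = "https://en.wikipedia.org/w/api.php?action=sitematrix&format=json"


def extract_dbnames(response, include_closed=False):
    """Return the sorted database names (e.g. 'enwiki') in a sitematrix response."""
    matrix = response["sitematrix"]
    dbnames = []
    for key, group in matrix.items():
        if key == "count":
            continue  # total number of sites, not a group of sites
        # Assumed layout: language groups are dicts with a "site" list,
        # while "specials" is a plain list of sites.
        sites = group if key == "specials" else group.get("site", [])
        for site in sites:
            if "closed" in site and not include_closed:
                continue
            dbnames.append(site["dbname"])
    return sorted(dbnames)


def fetch_dbnames(url=SITEMATRIX_URL):
    """Download the site matrix and return its database names (needs network)."""
    request = urllib.request.Request(url, headers={"User-Agent": "eevs-example/0.1"})
    with urllib.request.urlopen(request) as reply:
        return extract_dbnames(json.load(reply))


# Trimmed, made-up response in the assumed shape.
sample = {
    "sitematrix": {
        "count": 4,
        "0": {"code": "en", "name": "English", "site": [
            {"url": "https://en.wikipedia.org", "dbname": "enwiki", "code": "wiki"},
            {"url": "https://en.wiktionary.org", "dbname": "enwiktionary", "code": "wiktionary"},
        ]},
        "1": {"code": "xx", "name": "Example", "site": [
            {"url": "https://xx.wikipedia.org", "dbname": "xxwiki", "code": "wiki", "closed": ""},
        ]},
        "specials": [
            {"url": "https://commons.wikimedia.org", "dbname": "commonswiki", "code": "commons"},
        ],
    }
}

print(extract_dbnames(sample))
```

Skipping closed wikis by default is a design choice for this sketch; whether closed wikis belong on the dashboard is not stated in the use cases.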

Later

  1. As a Product Manager I want to be able to compare arbitrary data series in a graph
  2. As a Manager I want some graphs to be available to WMF only
  3. As a Product Manager, I want geographic and other arbitrary breakdowns (such as mobile/desktop)
  4. As a Product Manager, I want League Tables (such as daily top-X contributors by namespace)
  5. As a Product Manager, I want trends, moving averages, projections and other UI candy

Non-functional requirements

  1. All dashboards should be updated daily.
  2. Sizing data from the Size and update rate section below should be used for capacity planning. We should double this number to allow for future growth. Preferably, the storage system should allow storage and I/O capacity to be added dynamically.
  3. Dashboards should render within 2 seconds.
     • This does not include query time.
  4. Once we have signoff on the basic issues described below, all issues with dashboards should be addressed within 2 business days of the problem report.
  5. Data should be retained indefinitely.

Metrics

The minimum granularity for this data should be daily with monthly rollups.
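As an illustration of a monthly rollup over daily data, here is a minimal sketch. It assumes the metric is additive, so that a month's value is the sum of its days; that is not true of every metric (a count of distinct active editors, for instance, cannot be summed across days). The function name and the sample values are made up.

```python
from collections import defaultdict
from datetime import date


def monthly_rollup(daily):
    """Sum daily values into (year, month) buckets, ordered by month."""
    totals = defaultdict(int)
    for day, value in daily.items():
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


# Made-up daily counts for a hypothetical "New Editors" metric.
daily_new_editors = {
    date(2014, 1, 30): 12,
    date(2014, 1, 31): 8,
    date(2014, 2, 1): 5,
}

print(monthly_rollup(daily_new_editors))
# {(2014, 1): 20, (2014, 2): 5}
```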

New Users

Acquisition

Activation

Productivity

Retention

Community

Content

Curation

Design

Here are the functional requirements for the dashboards. We will engage the UX team for specific wireframes/prototypes.

The New User metrics that are high priority for this Epic are part of a hierarchy:

Global Aggregations

Project (e.g. enwiki)
Grouping (e.g. New Users)
Metrics (e.g. New Editors)

The Metrics levels of the hierarchy should contain all of the metrics listed in this document for the grouping.
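One way to picture the hierarchy is as nested mappings from project to grouping to metric to a daily time series. The sketch below is only an illustration: the structure, the function name, and all values are hypothetical, and it assumes that a global aggregation is the per-day sum of a metric across projects, which this document does not define.

```python
from collections import Counter
from datetime import date

# project -> grouping -> metric -> {day: value}; all values are made up.
dashboard = {
    "enwiki": {
        "New Users": {
            "New Editors": {date(2014, 1, 1): 10, date(2014, 1, 2): 12},
        },
    },
    "dewiki": {
        "New Users": {
            "New Editors": {date(2014, 1, 1): 3},
        },
    },
}


def global_aggregate(data, grouping, metric):
    """Sum one metric's daily series across all projects (assumed definition)."""
    total = Counter()
    for groupings in data.values():
        # Projects that lack the grouping or metric contribute nothing.
        series = groupings.get(grouping, {}).get(metric, {})
        total.update(series)  # Counter.update adds values for matching days
    return dict(sorted(total.items()))


print(global_aggregate(dashboard, "New Users", "New Editors"))
# {datetime.date(2014, 1, 1): 13, datetime.date(2014, 1, 2): 12}
```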

We need to support all of the projects for the first release. (This supersedes the earlier plan to cover just Wikipedia in this release and to add Meta and Commons in subsequent versions.)

For this release, all of the data available for a metric will be graphed

Loading time is extremely important -- our users would like a consumer web experience (50th percentile load time under 1 second from a warm cache)

  • We need to explore technical solutions to ensure we can commit to this goal
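The loading-time goal can be checked against measured page loads with a few lines of code. This is a minimal sketch: the function name and the sample timings are made up, and it treats the 50th percentile as the median of the samples.

```python
from statistics import median


def meets_load_goal(load_times_seconds, goal_seconds=1.0):
    """Return True if the median (50th percentile) load time is under the goal."""
    return median(load_times_seconds) < goal_seconds


# Made-up warm-cache load times, in seconds.
samples = [0.4, 0.7, 0.9, 1.3, 2.1]

print(median(samples), meets_load_goal(samples))
# 0.9 True
```

Note that a median goal tolerates slow outliers: two of the five sample loads above exceed 1 second and the goal is still met.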

Labels should be linear (no special handling is needed)

  • X Axis should be the date, Y the value

Definitions should be linked to from the Metrics pages

No additional summaries or breakdowns are needed

No overlays or comparisons are needed -- one time series per graph

The graphs must render on the WebKit and Gecko engines. The only browsers that we need to support are Safari, Chrome and Firefox.

Deprioritized Requirements

Resolution should be selectable from the following values (Day, Month, Year)

Timeframe should be selectable from a control and should be constrained to provide a balance between functionality and performance

  • The constraint is dependent on the performance of the underlying rendering libraries

These requirements are captured here for historical purposes but not needed for the initial release

Tabular data would be nice

Trends would be nice

Rolling averages would be nice

Implementation Details

Dashboard Technical Stack

A detailed description of our technology choices for the editor vital signs dashboard can be found here:

Dashboard Technical Stack


Backfilling

Load tests and backfilling of EEVS data

Size and update rate

For the metrics in phase 1, the following tables will need to be used.

These sizes/updates are from enwiki. They should be doubled to account for all of the wikis. We should double this number again to allow for growth.

Table            Size     Number of Records  Update Rate
Archive          10 GB    45 M               <1/sec
Logging          7.3 GB   50 M               <1/sec
Page             3.5 GB   30 M               <1/sec
Recentchanges    3 GB     8 M                2/sec
Revision         136 GB   500 M              2/sec
User             5 GB     20 M               <1/sec
User properties  5 GB     20 M               <1/sec
Total            170 GB   680 M              5/sec
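Applying the two doublings described above to the Total row gives the planning figures. The sketch below simply works through that arithmetic; the constant and function names are made up, and the inputs are the totals as printed in the table.

```python
# Totals for enwiki, taken from the Total row of the table.
ENWIKI_SIZE_GB = 170
ENWIKI_RECORDS_MILLIONS = 680

ALL_WIKIS_FACTOR = 2  # doubled to account for all of the wikis
GROWTH_FACTOR = 2     # doubled again to allow for growth


def planned_capacity(enwiki_value):
    """Scale an enwiki figure by both doubling factors."""
    return enwiki_value * ALL_WIKIS_FACTOR * GROWTH_FACTOR


print(planned_capacity(ENWIKI_SIZE_GB), "GB")
# 680 GB
print(planned_capacity(ENWIKI_RECORDS_MILLIONS), "M records")
# 2720 M records
```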