Analytics/Archive/Editor Engagement Vital Signs
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
The Editor Engagement program is a top strategic priority for the Foundation and has many individual initiatives. We do not, however, have a dashboard that provides consistent, explorable, and timely data on this program, which makes it difficult and time-consuming for implementors to understand the impact of their changes. A consistent dashboard would also help normalize results across products and other aspects of the program which are difficult to compare at this time.
The goal of this project is to implement such a dashboard. Research and Data will provide the definitions and SQL queries for the actual metrics.
Detailed Tracking Links
|Product Managers||The people who are researching, designing and iterating on the editor engagement features.|
|Researchers||The people who define the various metrics and use them to measure the performance of editor engagement features.|
|Analytics Developers||The people who write the software that produces the dashboards.|
|Analytics Operators||The people who ensure the software is running and the data is updated.|
|Communications||WMF personnel who use the data in monthly reports.|
|Management||WMF personnel who make decisions based on the results of the data.|
|Community||The Wikipedians who look at the data to assess their success and the health of the community.|
Prioritized Use Cases
- As a Product Manager I want a well designed dashboard that I can use to explore the metrics listed in the New Users section
- As a Product Manager, Researcher and Analytics Developer (and just about everybody else) I need documentation about how these metrics are calculated
- As a Product Manager, I need these metrics calculated across all Wikis (this list is accessible via this API call: http://en.wikipedia.org/w/api.php?action=sitematrix)
- As a Researcher, I need to create a new table for reverts (as they happen)
- As a Product Manager and Researcher, I want historical data to be maintained in a format that can be queried easily (Warehouse)
- As a Product Manager, I need the data to be in tabular and graphed formats and I want to be able to download the data as a CSV or TSV
- As a Product Manager I want to be able to compare arbitrary data series in a graph
- As a Manager I want some graphs to be available to WMF only
- As a Product Manager, I want geographic and other arbitrary breakdowns (such as mobile/desktop)
- As a Product Manager, I want League Tables (such as daily top-X contributors by namespace)
- As a Product Manager, I want trends, moving averages, projections and other UI candy
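The use case above that asks for metrics across all wikis points at the `sitematrix` API call as the source of the wiki list. The following is a minimal sketch of how a decoded response could be turned into a list of database names. The response layout it assumes (numeric keys for language groups, a `specials` list, a `count` entry, a `closed` marker on closed wikis) and the helper name `wiki_dbnames` are assumptions for illustration and should be checked against a live response. The sample data is hand-written, not real API output.

```python
from typing import Any


def wiki_dbnames(sitematrix: dict[str, Any], family: str = "wiki") -> list[str]:
    """Return the database names of open wikis in one project family.

    `sitematrix` is the value under the top-level "sitematrix" key of the
    decoded JSON response. The layout is an assumption (see lead-in).
    """
    names = []
    for key, group in sitematrix.items():
        if not key.isdigit():
            continue  # skip non-language entries such as "count" and "specials"
        for site in group.get("site", []):
            # Assumed: closed wikis carry a "closed" key; open ones do not.
            if site.get("code") == family and "closed" not in site:
                names.append(site["dbname"])
    return sorted(names)


# Hand-written sample in the assumed response shape (not real API output).
sample = {
    "count": 5,
    "0": {"code": "de", "name": "Deutsch", "site": [
        {"url": "https://de.wikipedia.org", "dbname": "dewiki", "code": "wiki"},
        {"url": "https://de.wiktionary.org", "dbname": "dewiktionary", "code": "wiktionary"},
    ]},
    "1": {"code": "en", "name": "English", "site": [
        {"url": "https://en.wikipedia.org", "dbname": "enwiki", "code": "wiki"},
    ]},
    "2": {"code": "aa", "name": "Afar", "site": [
        {"url": "https://aa.wikipedia.org", "dbname": "aawiki", "code": "wiki", "closed": True},
    ]},
    "specials": [
        {"url": "https://commons.wikimedia.org", "dbname": "commonswiki", "code": "commons"},
    ],
}

print(wiki_dbnames(sample))  # ['dewiki', 'enwiki']
```

Filtering on the family code keeps the sketch aligned with a Wikipedia-only first release; passing a different `family` value would select another project family.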
Non functional requirements
- All dashboards should be updated daily.
- Capacity planning should use the sizing data from the "Size and update rate" section below, doubled to allow for future growth. Preferably, the storage system should allow storage and I/O capacity to be added dynamically.
- Dashboards should render within 2 seconds
- This does not include query time
- Once we have signoff on the basic issues described below, all issues with dashboards should be addressed within 2 business days of the problem report.
- Data should be retained indefinitely
The minimum granularity for this data should be daily with monthly rollups.
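As an illustration of the daily-to-monthly rollup, here is a minimal sketch that sums daily values into calendar months. Summation is an assumption that only suits additive metrics such as edit counts. A metric that counts distinct users cannot be rolled up this way and would need to be recomputed from the underlying data. The function name `monthly_rollup` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_rollup(daily: dict[date, int]) -> dict[str, int]:
    """Sum daily values into 'YYYY-MM' buckets, ordered by month."""
    totals: dict[str, int] = defaultdict(int)
    for day, value in daily.items():
        totals[day.strftime("%Y-%m")] += value
    return dict(sorted(totals.items()))


# Made-up daily counts for illustration only.
daily_counts = {
    date(2013, 1, 30): 5,
    date(2013, 1, 31): 7,
    date(2013, 2, 1): 3,
}

print(monthly_rollup(daily_counts))  # {'2013-01': 12, '2013-02': 3}
```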
Here are the functional requirements for the dashboards. We will engage the UX team for specific wireframes/prototypes.
The New User metrics that are high priority for this Epic are part of a hierarchy:
- Project (e.g. enwiki)
- Grouping (e.g. New Users)
- Metrics (e.g. New Editors)
- Grouping (e.g. New Users)
The Metrics levels of the hierarchy should contain all of the metrics listed in this document for the grouping.
For this release, just Wikipedia is fine for projects, although for subsequent versions, we should include Meta and Commons
We need to support all of the projects for the first release
For this release, all of the data available for a metric will be graphed
Loading time is extremely important -- our users would like a consumer web experience (50th percentile < 1 second from a warm cache)
- We need to explore technical solutions to ensure we can commit to this goal
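The loading-time goal above is stated as a 50th percentile, which is the median of the measured load times. A minimal sketch of checking a set of measurements against that goal follows. The function name `meets_load_target` and the sample timings are hypothetical, and how load times would actually be collected is not covered here.

```python
from statistics import median


def meets_load_target(load_times_s: list[float], target_s: float = 1.0) -> bool:
    """Return True if the median (50th percentile) load time is under the target."""
    if not load_times_s:
        raise ValueError("need at least one measurement")
    return median(load_times_s) < target_s


# Made-up warm-cache load times in seconds.
print(meets_load_target([0.4, 0.6, 0.9, 1.5, 3.0]))  # True (median is 0.9)
print(meets_load_target([0.8, 1.2, 1.4]))            # False (median is 1.2)
```

Because the target is a median, a few slow page loads do not fail the check as long as at least half of the loads finish in under a second.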
Labels should be linear (no special handling is needed)
- X Axis should be the date, Y the value
Definitions should be linked to from the Metrics pages
No additional summaries or breakdowns are needed
No overlays or comparisons are needed -- one time series per graph
The graphs must render on the WebKit and Gecko engines. The only browsers that we need to support are Safari, Chrome and Firefox.
Resolution should be selectable from the following values (Day, Month, Year)
Timeframe should be selectable from a control and should be constrained to provide a balance between functionality and performance
- The constraint is dependent on the performance of the underlying rendering libraries
These requirements are captured here for historical purposes but are not needed for the initial release
Tabular data would be nice
Trends would be nice
Rolling averages would be nice
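For reference, a rolling average over a daily series could be computed as in the sketch below. The trailing 7-day window is an assumption; the requirements above do not specify a window size, and `rolling_average` is a hypothetical helper name.

```python
def rolling_average(values: list[float], window: int = 7) -> list[float]:
    """Trailing rolling mean; emits one value per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


print(rolling_average([1, 2, 3, 4, 5], window=3))  # [2.0, 3.0, 4.0]
```

The running sum keeps the cost linear in the length of the series, which matters if the averages are computed at request time rather than precomputed.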
Dashboard Technical Stack
A detailed description of our technology choices for the editor vital signs dashboard can be found here:
Size and update rate
For the metrics in phase 1, the following tables will need to be used.
These sizes/updates are from enwiki. They should be doubled to account for all of the wikis. We should double this number again to allow for growth.
|Table||Size||Number of Records||Update Rate|
|Archive||10 GB||45 M||<1/sec|
|Logging||7.3 GB||50 M||<1/sec|
|Page||3.5 GB||30 M||<1/sec|
|Recentchanges||3 GB||8 M||2/sec|
|Revision||136 GB||500 M||2/sec|
|User||5 GB||20 M||<1/sec|
|User properties||5 GB||20 M||<1/sec|
|Total||170 GB||680 M||5/sec|
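The sizing rule stated before the table (double the enwiki figures to cover all wikis, then double again for growth) works out as in the following sketch. The table sizes are copied from the rows above; the function and parameter names are hypothetical.

```python
# enwiki table sizes in GB, copied from the table above.
ENWIKI_TABLE_GB = {
    "Archive": 10,
    "Logging": 7.3,
    "Page": 3.5,
    "Recentchanges": 3,
    "Revision": 136,
    "User": 5,
    "User properties": 5,
}


def planned_capacity_gb(
    table_sizes_gb: dict[str, float],
    all_wikis_factor: float = 2,
    growth_factor: float = 2,
) -> float:
    """Apply the two doublings described in the text to the summed table sizes."""
    return sum(table_sizes_gb.values()) * all_wikis_factor * growth_factor


print(round(planned_capacity_gb(ENWIKI_TABLE_GB)))
```

The rows sum to roughly 170 GB, so the two doublings give a planning figure of about 680 GB.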