Wikimedia Product/Data dictionary/repo active editors

From mediawiki.org

The cchen.repo_active_editors table (available on Hive) contains active editors data, generated by aggregating wmf.editors_daily and neilpquinn.editor_month on Hive by month. It is stored in the Parquet columnar file format and partitioned by month.

This page describes the repo_active_editors data set, which is loaded from cchen.repo_active_editors on Hive through Presto and can be accessed via Superset.

Schema

| Field name | Data type | Description | Data example | Source schema | Source field |
|---|---|---|---|---|---|
| project | string | Project name from hostname | acewiki | wmf.editors_daily, neilpquinn.editor_month | project |
| project_family | string | Project family name | wikipedia | wmf.editors_daily, neilpquinn.editor_month | database_group |
| market | string | Global markets (see definition) | Global North | canonical_data.countries | economic_region |
| active_editors | bigint | Number of active editors (see definition) | 10000 | wmf.editors_daily, neilpquinn.editor_month | count(*) then aggregated by month |
| new_active_editors | bigint | Number of new active editors (see definition) | 5 | wmf.editors_daily, neilpquinn.editor_month | sum(cast(registration_month = month as int)) then aggregated by month |
| returning_active_editors | bigint | Number of returning active editors (see definition) | 49 | wmf.editors_daily, neilpquinn.editor_month | sum(cast(registration_month != month as int)) then aggregated by month |
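The source-field expressions for the three count columns can be illustrated with a small, self-contained sketch. It is not the actual pipeline: it uses an in-memory SQLite table as a stand-in for an editor-month source, the table name editor_month_demo, the user_name column and all rows are made up for illustration, and every row is assumed to already meet the active-editor definition.

```python
import sqlite3

# Hypothetical stand-in for an editor-month source table: one row per
# active editor per month. Only `month` and `registration_month` come from
# the expressions in the schema; the other names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE editor_month_demo ("
    "month TEXT, project TEXT, user_name TEXT, registration_month TEXT)"
)
conn.executemany(
    "INSERT INTO editor_month_demo VALUES (?, ?, ?, ?)",
    [
        ("2020-01", "acewiki", "A", "2020-01"),  # registered this month: new
        ("2020-01", "acewiki", "B", "2019-06"),  # registered earlier: returning
        ("2020-01", "acewiki", "C", "2018-03"),  # registered earlier: returning
        ("2020-02", "acewiki", "B", "2019-06"),  # registered earlier: returning
    ],
)

# count(*) gives active editors; the two cast-and-sum expressions split that
# count into editors who registered in the same month and everyone else.
query = """
SELECT
    month,
    project,
    COUNT(*) AS active_editors,
    SUM(CAST(registration_month = month AS INT)) AS new_active_editors,
    SUM(CAST(registration_month != month AS INT)) AS returning_active_editors
FROM editor_month_demo
GROUP BY month, project
ORDER BY month, project
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sketch new_active_editors and returning_active_editors always add up to active_editors, because each editor-month row satisfies exactly one of the two comparisons.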

Note: To provide unique editor counts at each level of aggregation, the project, market and project_family dimensions each include the value "All". A row with "All" in a dimension holds the count of editors across every value of that dimension.

  • To view active editors data by project, add a filter with market = "All".
  • To view active editors data by project family, add filters with market = "All" and project = "All".
  • To view active editors data by diversity markets, add filters with project_family = "All" and project = "All".
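The filters above can be sketched as queries. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the Presto table, the name repo_active_editors_demo, the month column name, the "Global South" value and every number are hypothetical, and the extra `!= 'All'` condition on the displayed dimension is an optional addition that hides the total rows.

```python
import sqlite3

# Hypothetical miniature of the table; all values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repo_active_editors_demo ("
    "month TEXT, project TEXT, project_family TEXT, market TEXT, "
    "active_editors INTEGER)"
)
conn.executemany(
    "INSERT INTO repo_active_editors_demo VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-01", "acewiki", "wikipedia", "All", 10),
        ("2020-01", "enwiki", "wikipedia", "All", 700),
        # Family total: a unique count, so it need not equal 10 + 700.
        ("2020-01", "All", "wikipedia", "All", 705),
        ("2020-01", "All", "wiktionary", "All", 50),
        ("2020-01", "All", "All", "Global North", 500),
        ("2020-01", "All", "All", "Global South", 260),
        ("2020-01", "All", "All", "All", 740),
    ],
)

# One filter per view, following the bullets. The trailing `!= 'All'`
# condition is an assumption of this sketch, used to drop total rows.
FILTERS = {
    "project": "market = 'All' AND project != 'All'",
    "project_family": (
        "market = 'All' AND project = 'All' AND project_family != 'All'"
    ),
    "market": (
        "project_family = 'All' AND project = 'All' AND market != 'All'"
    ),
}


def view_by(dimension):
    """Return (dimension value, active_editors) pairs for one month."""
    sql = (
        f"SELECT {dimension}, active_editors FROM repo_active_editors_demo "
        f"WHERE month = '2020-01' AND {FILTERS[dimension]} "
        f"ORDER BY {dimension}"
    )
    return conn.execute(sql).fetchall()


for dimension in FILTERS:
    print(dimension, view_by(dimension))
```

Reading the precomputed "All" rows, rather than summing the finer-grained rows, is what keeps an editor who appears in several groups from being counted more than once.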

Dashboards which use this table

Editors Dashboard

Known issues and changes