Wikimedia Product/Data dictionary/repo active editors
The cchen.repo_active_editors table (available on Hive) contains active editors data, generated by aggregating wmf.editors_daily and neilpquinn.editor_month on Hive by month. It is stored in the Parquet columnar file format and partitioned by month.
This page describes the data set repo_active_editors, which is loaded from cchen.repo_active_editors on Hive through Presto and can be accessed via Superset.
Schema
Field name | Data type | Description | Data example | Source schema | Source field |
---|---|---|---|---|---|
project | string | Project name from hostname | acewiki | wmf.editors_daily, neilpquinn.editor_month | project |
project_family | string | Project family name | wikipedia | wmf.editors_daily, neilpquinn.editor_month | database_group |
market | string | Global markets (see definition) | Global North | canonical_data.countries | economic_region |
active_editors | bigint | Number of active editors (see definition) | 10000 | wmf.editors_daily, neilpquinn.editor_month | count(*) then aggregated by month |
new_active_editors | bigint | Number of new active editors (see definition) | 5 | wmf.editors_daily, neilpquinn.editor_month | sum(cast(registration_month = month as int)) then aggregated by month |
returning_active_editors | bigint | Number of returning active editors (see definition) | 49 | wmf.editors_daily, neilpquinn.editor_month | sum(cast(registration_month != month as int)) then aggregated by month |
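The source-field expressions in the schema can be illustrated with a small sketch. It assumes one row per active editor per month; the table layout, the user_name column, and all sample rows are hypothetical, and an in-memory SQLite database stands in for Hive/Presto only so that the example runs anywhere. It is not the actual job that builds the table.

```python
import sqlite3

# Hypothetical editor-month rows: (month, project, user_name, registration_month).
# Only `month` and `registration_month` are named in the schema; the rest is assumed.
rows = [
    ("2020-01", "acewiki", "user_a", "2020-01"),  # registered this month -> new
    ("2020-01", "acewiki", "user_b", "2019-06"),  # registered earlier -> returning
    ("2020-01", "acewiki", "user_c", "2018-11"),
    ("2020-02", "acewiki", "user_b", "2019-06"),
    ("2020-02", "acewiki", "user_d", "2020-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE editor_month "
    "(month TEXT, project TEXT, user_name TEXT, registration_month TEXT)"
)
conn.executemany("INSERT INTO editor_month VALUES (?, ?, ?, ?)", rows)

# Same expressions as in the "Source field" column, grouped by month and project.
query = """
SELECT
    month,
    project,
    COUNT(*) AS active_editors,
    SUM(CAST(registration_month = month AS INT)) AS new_active_editors,
    SUM(CAST(registration_month != month AS INT)) AS returning_active_editors
FROM editor_month
GROUP BY month, project
ORDER BY month
"""
monthly_counts = conn.execute(query).fetchall()
for row in monthly_counts:
    print(row)
```

In this sketch, new and returning editors always add up to active_editors, because every row either matches its registration month or does not.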
Note: To provide unique editor counts at each level of the dimensions, project, market and project_family each include the value "All", which shows the total number of editors within that group.
- To view active editors data by project, add a filter with market = "All".
- To view active editors data by project family, add filters with market = "All" and project = "All".
- To view active editors data by diversity markets, add filters with project_family = "All" and project = "All".
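The three filters above can be sketched as plain SQL conditions. The rows, their values, and the view helper below are hypothetical, and SQLite again stands in for Presto purely to keep the sketch runnable; in Superset the same conditions would be applied as chart or dashboard filters.

```python
import sqlite3

# Hypothetical rows for a single month:
# (project, project_family, market, active_editors).
rows = [
    ("acewiki", "wikipedia", "All", 40),
    ("acewiki", "wikipedia", "Global South", 30),
    ("enwiki", "wikipedia", "All", 9000),
    ("All", "wikipedia", "All", 9500),
    ("All", "wikipedia", "Global North", 7000),
    ("All", "All", "Global North", 8000),
    ("All", "All", "Global South", 3000),
    ("All", "All", "All", 10000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repo_active_editors "
    "(project TEXT, project_family TEXT, market TEXT, active_editors INTEGER)"
)
conn.executemany("INSERT INTO repo_active_editors VALUES (?, ?, ?, ?)", rows)


def view(where_clause):
    """Return rows matching a fixed, trusted filter, smallest count first."""
    sql = (
        "SELECT project, project_family, market, active_editors "
        "FROM repo_active_editors "
        f"WHERE {where_clause} ORDER BY active_editors"
    )
    return conn.execute(sql).fetchall()


# SQL string literals use single quotes.
by_project = view("market = 'All'")
by_family = view("market = 'All' AND project = 'All'")
by_market = view("project_family = 'All' AND project = 'All'")

for name, result in [("project", by_project), ("family", by_family), ("market", by_market)]:
    print(name, result)
```

Under this assumed layout, each view still contains rows where the remaining dimension is itself "All"; those rows carry the overall total rather than a single project, family, or market, so a chart may need to exclude them. Whether the real table is laid out this way is not confirmed here.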