Wikimedia Product/Data dictionary/content pv

From mediawiki.org

The cchen.content_pv table (available on Hive) contains daily pageview data broken down by content topic. It is generated by aggregating wmf.pageview_hourly and joining the result with isaacj.article_topics_outlinks on Hive. It is stored in the Parquet columnar file format and partitioned by year, month, and day.
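The aggregate-and-join step described above can be pictured with a small in-memory sketch. This is not the actual job, which runs as a query on Hive; the `page_id` join key, the input field names, the inner-join behaviour, and the toy rows are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_content_pv(hourly_rows, article_topics):
    """Sum hourly view counts per (date, project, country_code, topic).

    hourly_rows: dicts with hypothetical keys year, month, day, project,
        country_code, page_id and view_count.
    article_topics: dict mapping a hypothetical page_id to a list of topics.
    """
    totals = defaultdict(int)
    for row in hourly_rows:
        date = (row["year"], row["month"], row["day"])
        # Assumed inner join: pages without a topic are dropped, and a page
        # with several topics contributes its views to each of them.
        for topic in article_topics.get(row["page_id"], []):
            key = (date, row["project"], row["country_code"], topic)
            totals[key] += row["view_count"]
    return dict(totals)


# Toy input, invented purely for illustration.
hourly = [
    {"year": 2021, "month": 5, "day": 29, "hour": 0, "project": "hu.wikipedia",
     "country_code": "AL", "page_id": 1, "view_count": 3},
    {"year": 2021, "month": 5, "day": 29, "hour": 1, "project": "hu.wikipedia",
     "country_code": "AL", "page_id": 1, "view_count": 2},
    {"year": 2021, "month": 5, "day": 29, "hour": 1, "project": "hu.wikipedia",
     "country_code": "AL", "page_id": 2, "view_count": 7},  # page 2 has no topic
]
topics = {1: ["Geography.Geographical"]}

daily = aggregate_content_pv(hourly, topics)
```

In this sketch the two hourly rows for page 1 collapse into a single daily row, and page 2 is excluded because it has no topic assigned.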

This page describes the content_pv dataset, which is loaded from cchen.content_pv on Hive through Presto and can be accessed via Superset.
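As a rough sketch of how the table might be queried through Presto, the helper below builds a query string that restricts the scan to a single daily partition. The function name is hypothetical, and it assumes the partition columns are exposed as integer columns named `year`, `month`, and `day`; the other column names come from the schema on this page.

```python
from datetime import date


def build_daily_topic_query(day, main_topic, table="cchen.content_pv"):
    """Return a SQL string summing pageviews per country for one day and topic.

    Assumes integer partition columns named year, month and day.
    """
    if not isinstance(day, date):
        raise TypeError("day must be a datetime.date")
    # Double any single quotes so the topic is a valid SQL string literal.
    topic = main_topic.replace("'", "''")
    return (
        "SELECT country, SUM(pageviews) AS total_pageviews\n"
        f"FROM {table}\n"
        f"WHERE year = {day.year} AND month = {day.month} AND day = {day.day}\n"
        f"  AND main_topic = '{topic}'\n"
        "GROUP BY country\n"
        "ORDER BY total_pageviews DESC"
    )


query = build_daily_topic_query(date(2021, 5, 29), "Geography")
```

Filtering on the partition columns rather than on the `date` field should let the engine skip partitions that are not needed, which is the usual reason for partitioning by year, month, and day.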

Schema

| Field name | Data type | Description | Data example | Source schema | Source field |
| --- | --- | --- | --- | --- | --- |
| date | timestamp | The date of pageviews | 2021-05-29 00:00:00.0 | wmf.pageview_hourly | event_timestamp |
| project | string | Project name from hostname | hu.wikipedia. | wmf.pageview_hourly | project |
| market | string | Global markets (see definition) | Global North | canonical_data.countries | economic_region |
| country | string | Country | Albania | canonical_data.countries | country |
| country_code | string | ISO code for country | AL | canonical_data.countries | country_code |
| topics | string | Topics related to certain articles using outlink-based model (refer to the taxonomy for detailed article topics) | Geography.Geographical | isaacj.article_topics_outlinks | topic |
| main_topic | string | Top level of the topic | Geography | cchen.topic_component | main_topic |
| sub_topic | string | Second level of the topic | Geographical | cchen.topic_component | sub_topic |
| pageviews | bigint | Number of pageviews | 10000 | wmf.pageview_hourly | count(1), then aggregated by year, month, and day |
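The relationship between `topics`, `main_topic`, and `sub_topic` in the schema can be illustrated with a short sketch. It assumes the two components are simply the first and second dot-separated levels of the topic label; how cchen.topic_component is actually built is not stated here, and the function name is hypothetical.

```python
def split_topic(topic):
    """Split a dotted topic label into (main_topic, sub_topic).

    Assumes levels are separated by "."; sub_topic is an empty string
    when the label has only one level.
    """
    parts = topic.split(".")
    main_topic = parts[0]
    sub_topic = parts[1] if len(parts) > 1 else ""
    return main_topic, sub_topic


# Using the example value from the schema:
# split_topic("Geography.Geographical") returns ("Geography", "Geographical")
main_topic, sub_topic = split_topic("Geography.Geographical")
```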

Dashboards which use this table

Pageview_Topics_Dashboard

Known issues and changes