Wikimedia Research/Datasets

The Research Team aims to publicly share the datasets used or created in its research projects. This page curates a list of those datasets to make them easier to find.

2021

 * COVID-19 Pandemic Wikipedia Readership figshare
 * Tracking Knowledge Propagation Across Wikipedia Languages zenodo
   * More details: Research:Exploration_on_content_propagation_across_Wikimedia_projects
 * Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia figshare
   * More details: Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia
 * Wikipedia Article Topics for All Languages (based on article outlinks) figshare
   * More details: Research:Language-Agnostic_Topic_Classification#Topic_Classification_of_Wikipedia_Articles

2020

 * Topics for each Wikipedia Article across Languages figshare
   * More details: Research:Language-Agnostic_Topic_Classification#Topic_Classification_of_Wikidata_Items