Wikimedia Research/Showcase

The Monthly Research & Data Showcase is a public showcase of recent research by the Wikimedia Foundation's Research and Data Team, other WMF researchers and occasionally guest presenters. The showcase is hosted at the Wikimedia Foundation on the third Wednesday of every month at 11:30 Pacific Time and is live-streamed on YouTube.

How to attend
We live stream our research showcase every month on YouTube. The link is announced a few minutes before the showcase starts via wiki-research-l, analytics-l and @WikiResearch. You can join the conversation and participate in Q&A after each presentation by connecting to our IRC channel on freenode:

October 2014
October 15, 2014 Video: Commons? YouTube?
 * Emotions under Discussion: Gender, Status and Communication in Wikipedia
 * By David Laniado: I will present a large-scale analysis of emotional expression and communication style of editors in Wikipedia discussions. The talk will focus especially on how emotion and dialogue differ depending on the status, gender, and communication network of the roughly 12,000 editors who have written at least 100 comments on English Wikipedia's article talk pages. The analysis is based on three predefined lexicon-based methods for quantifying emotions: ANEW, LIWC and SentiStrength. The results reveal significant differences in the emotional expression and communication style of editors according to their status and gender, and can help to address issues such as the gender gap and editor stagnation.


 * Wikipedia as a socio-technical system
 * By Halfak (WMF): Wikipedia is a socio-technical system. In this presentation, I'll explain how the integration of human collective behavior ("social") and information technology ("technical") has led to a phenomenon that, while massively productive, is poorly understood due to a lack of precedent. Based on my work in this area, I'll describe five critical functions that healthy, Wikipedia-like socio-technical systems must serve in order to continue to function: allocation, regulation, quality control, community management and reflection. Next, I'll argue that the Wikimedia Foundation's analytics strategy currently focuses on outcomes related to a relatively narrow aspect of system health and all but completely ignores productivity. Finally, I'll conclude with an overview of three classes of new projects that should provide critical opportunities to understand, both practically and academically, the maintenance of Wikipedia's socio-technical fitness.
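As a rough illustration of what "lexicon-based" emotion quantification in David Laniado's abstract means, the following Python sketch scores a comment by averaging the ratings of the words it shares with an ANEW-style lexicon (ANEW rates words for valence, arousal and dominance on a 1 to 9 scale). The tiny lexicon and its values are invented placeholders, not real ANEW data, and averaging matched words is a common simplification, not necessarily the method used in the talk; LIWC and SentiStrength produce different kinds of scores.

```python
import re
from statistics import mean

# Hypothetical placeholder ratings on a 1-9 scale (NOT real ANEW values):
# word -> (valence, arousal, dominance)
TOY_LEXICON = {
    "happy": (8.0, 6.0, 6.5),
    "angry": (2.5, 7.0, 5.0),
    "calm": (6.5, 2.5, 6.0),
}


def score_comment(text, lexicon=TOY_LEXICON):
    """Return the mean (valence, arousal, dominance) of lexicon words in text.

    Returns None when no word of the comment appears in the lexicon.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    matched = [lexicon[t] for t in tokens if t in lexicon]
    if not matched:
        return None  # nothing to score
    # zip(*matched) groups all valence values, then arousal, then dominance
    return tuple(mean(dim) for dim in zip(*matched))


print(score_comment("I am happy but also a bit angry"))  # (5.25, 6.5, 5.75)
```

Comments with no lexicon words are left unscored rather than given a neutral value, which keeps them from diluting per-editor averages.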

September 2014
September 17, 2014 The September showcase was canceled because of a conflict with other events scheduled by WMF. We will resume showcases in October.

August 2014
August 20, 2014 Video: Commons? YouTube
 * Everything You Know About Mobile Is WrW^Right: Editing and Reading Pattern Variation Between User Types


 * By Oliver Keyes: Using new geolocation tools, we look at reader and editor behaviour to understand how and when people access and contribute to our content. This is largely exploratory research, but it has potential implications for our A/B testing, for how we understand cultural divides between reader and editor groups from different countries, and for how we understand the differences between types of edits and the editors who make them.


 * Wikipedia Article Curation: Understanding Quality, Recommending Tasks


 * By Morten Warncke-Wang: In this talk we look at article curation in Wikipedia through the lens of task suggestions and article quality. The first part of the talk presents SuggestBot, the Wikipedia article recommender. SuggestBot connects contributors with articles similar to those they previously edited. In the second part of the talk, we discuss Wikipedia article quality using “actionable” features, features that contributors can easily act upon to improve article quality. We will first discuss these features’ ability to predict article quality, before returning to SuggestBot to show how these predictions and actionable features can be used to improve its suggestions.

July 2014
July 16, 2014 Video: Commons YouTube
 * Halfak's wiki research libraries (v0.0.1)


 * By Aaron Halfaker: Along with quantitative research comes data and analysis code. In this presentation, Aaron will introduce you to four Python libraries that capture code he uses regularly to get his wiki research done. MediaWiki Utilities is a general data processing library that includes connectors for the API and MySQL databases as well as an XML dump parser and revert detection. Wiki-Class is a machine learning library designed to train, test and deploy automatic quality assessment class detection for Wikipedia articles. MediaWiki-OAuth provides a simple interface for performing an OAuth handshake with a MediaWiki installation (e.g. Wikipedia). Deltas is an experimental text difference detection library that implements cutting-edge research to track changes to Wikipedia articles and attribute authorship of content.


 * Using Open Data and Stories to Broaden Crowd Content


 * By Nathan Matias: Nathan will share a series of research projects on gender diversity online and on designs for collaborative content creation that foster learning and community. He will also demo a prototype of a system that could leverage open data to attract and support new Wikipedia contributors.



June 2014
June 18, 2014 Video: Commons YouTube
 * MoodBar: lightweight socialization improves long-term editor retention
 * by Giovanni Luca Ciampaglia -- I will talk about MoodBar, an experimental feature deployed on the English Wikipedia from 2011 to 2013 to streamline the socialization of newcomers. I will present results from a natural experiment that measured the effect of MoodBar on the short-term engagement and long-term retention of newly registered users attempting to edit Wikipedia for the first time. Our results indicate that a mechanism to elicit lightweight feedback and to provide early mentoring to newcomers significantly improves their chances of becoming long-term contributors.


 * Active Editors' Survival Models
 * by Leila Zia -- I will present initial results from building prediction models for active editors' survival. A sample of these models, their performance, and the most important variables for predicting survival will be presented.



May 2014
May 21, 2014 Video: Commons YouTube
 * A bird's eye view of editor activation
 * by Dario Taraborelli -- In this talk I will give a high-level overview of data on new editor activation, presenting longitudinal data from the largest Wikipedias, a comparison between desktop and mobile registrations and the relative activation rates of different cohorts of newbies.


 * Collaboration patterns in Articles for Creation
 * by Aaron Halfaker -- Wikipedia needs to attract and retain newcomers while also increasing the quality of its content. Yet new Wikipedia users are disproportionately affected by the quality assurance mechanisms designed to thwart spammers and promoters. English Wikipedia’s Articles for Creation provides a protected space for newcomers to draft articles, which are reviewed against minimum quality guidelines before they are published. In this presentation, I'll describe a study of how this drafting process has affected the productivity of newcomers in Wikipedia. Using a mixed qualitative and quantitative approach, I'll show that the process's pre-publication review, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia, and I'll offer recommendations for system designers.



April 2014
April 16, 2014 Video: Commons YouTube
 * WikiProjects yesterday, today and tomorrow
 * by Jonathan Morgan -- In this talk I'll give an overview of some research on English Wikipedia WikiProjects: what kind of work they do, how they do it, and how they have changed over time.


 * Visualizing Wikipedia Communities using Gephi
 * by Haitham Shammaa -- I will introduce Gephi as a tool for generating visual representations of Wikimedia project communities. Gephi is open-source network analysis and visualization software; here it is used to generate graphs that represent users and their interactions, based on how frequently they send messages to each other on their talk pages.
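Haitham Shammaa's abstract describes graphs whose nodes are users and whose edges reflect how often users message each other on talk pages. A minimal Python sketch of preparing such data, assuming the messages have already been extracted as hypothetical (sender, recipient) pairs, counts messages per directed pair and writes an edge table with Source, Target and Weight columns, a layout that Gephi's spreadsheet (CSV) import should read as an edges table. Ignoring posts to one's own talk page is an assumption made here for illustration.

```python
import csv
import io
from collections import Counter


def edge_weights(messages):
    """Count messages per directed (sender, recipient) pair.

    Pairs where sender == recipient (posts to one's own talk page) are
    skipped; this is an illustrative assumption.
    """
    return Counter((s, r) for s, r in messages if s != r)


def to_gephi_csv(weights):
    """Serialize edge weights as CSV text with Source, Target, Weight columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


# Hypothetical sample data: (sender, recipient) talk-page messages
messages = [
    ("Alice", "Bob"),
    ("Alice", "Bob"),
    ("Bob", "Alice"),
    ("Carol", "Alice"),
    ("Carol", "Carol"),
]
print(to_gephi_csv(edge_weights(messages)), end="")
```

Saving this output to a .csv file and importing it as an edges table should give a directed, weighted graph; extracting the message pairs from talk pages in the first place is outside the scope of this sketch.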



March 2014
March 19, 2014 Video: Commons YouTube
 * Metrics standardization
 * by Dario Taraborelli -- In this talk I'll present the most recent updates on our work on metrics standardization and give a teaser of the Editor Engagement Vital Signs project.


 * Wikipedia: maintaining production efficiency
 * by Aaron Halfaker -- In Halfaker et al. (2013) we present data that show that several changes the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have ironically crippled the very growth they were designed to manage. Specifically, the restrictiveness of the encyclopedia's primary quality control mechanism and the algorithmic tools used to reject contributions are implicated as key causes of decreased newcomer retention.



February 2014
February 26, 2014 Video: Commons YouTube


 * Mobile session times
 * by Oliver Keyes -- A prerequisite to many pieces of interesting reader research is being able to accurately identify the length of users' 'sessions'. I will explain one potential way of doing it, how I've applied it to mobile readers, and what research this opens up.


 * Wikipedia article creation research
 * by Aaron Halfaker -- A brief overview of research examining trends in newcomer article creation across 10 languages, with a focus on the English and German Wikipedias. In wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. An in-depth analysis of Articles for Creation (AfC) suggests that while AfC's process seems to result in the publication of high-quality articles, it also dramatically reduces the rate at which good new articles are published.



January 2014
January 15, 2014
 * IP reliability tracking: by Oliver Keyes
 * The Wikipedia Adventure, quantitative and qualitative results from the pilot: by Jake Orlowitz (User:Ocaasi) -- We made a seven-mission, gamified, interactive onboarding tutorial to teach people how to edit Wikipedia in one hour. The journey involves badges, barnstars, challenges, and simulated interaction throughout a realistic quest to edit the article Earth. Game dynamics were used to create a sense of understanding, belonging, deep value identification, and technical proficiency. The use of games in open source and free culture online communities has great potential to drive participation. This talk will share the inspiration for taking a gamified approach, a review of the design highlights, and a discussion of quantitative and qualitative data and survey analysis.

December 2013
December 18, 2013


 * Metrics standardization: by Dario Taraborelli
 * On the nature of Anonymous Editors
 * by Aaron Halfaker -- A brief discussion and critique of the use of the term "anonymous" to refer to IP editors, and a presentation of research results suggesting that newly registered users who edit anonymously right before registering their accounts are highly productive.


 * Overview of Program Evaluation (beta) Reports
 * by Jaime Anstee -- A brief overview of the first round of reporting for programs, including a summary of the target measures along with strategies and challenges in metric standardization.