Wikimedia Research/Showcase

The Monthly Research & Data Showcase is a public showcase of recent research by the Wikimedia Foundation's Research and Data Team, other WMF researchers and occasionally guest presenters. The showcase is hosted at the Wikimedia Foundation every 3rd Wednesday of the month at 11:30 Pacific Time and live streamed on YouTube. The schedule may change; see the calendar below for a list of confirmed showcases.

How to attend
We live stream our research showcase every month on YouTube. The link is announced a few minutes before the showcase starts via wiki-research-l, analytics-l and @WikiResearch. You can join the conversation and participate in Q&A after each presentation by connecting to our IRC channel on freenode:

July 2015
July 29, 2015 Video:
 * VisualEditor's effect on newly registered users
 * By Aaron Halfaker
 * It's been nearly two years since we ran an initial study of VisualEditor's effect on newly registered editors. While most of the results of this study were positive (e.g. workload on Wikipedians did not increase), we still saw a significant decrease in newcomer productivity. In the meantime, the Editing team has made substantial improvements to performance and functionality. In this presentation, I'll report on the results of a new experiment designed to test the effects of enabling this improved VisualEditor software for newly registered users by default. I'll show what we learned from the experiment and discuss how some results have opened larger questions about what, exactly, is difficult about being a newcomer to English Wikipedia.

 * Wikipedia knowledge graph with DeepDive
 * By Juhana Kangaspunta and Thomas Palomares
 * Despite the tremendous amount of information present on Wikipedia, only a very small amount is structured. Most of the information is embedded in text, and extracting it is a non-trivial challenge. In this project, we try to populate Wikidata, a structured component of Wikipedia, using the DeepDive tool to extract relations embedded in the text. In the end, we extracted more than 140,000 relations with more than 90% average precision. This report is structured as follows: first, we present DeepDive and the data that we use for this project. Second, we clarify the relations we have focused on so far and explain the implementation and pipeline, including our model, features and extractors. Finally, we detail our results with a thorough precision and recall analysis.

May 2015
May 13, 2015 Video: YouTube
 * The people's classifier: Towards an open model for algorithmic infrastructure


 * By Aaron Halfaker
 * Recent research has implicated Wikipedia's algorithmic infrastructure in perpetuating social issues. However, these same algorithmic tools are critical to maintaining the efficiency of open projects like Wikipedia at scale. Rather than simply critiquing algorithmic wiki-tools and calling for less algorithmic infrastructure, I'll propose a different strategy -- an open approach to building this algorithmic infrastructure. In this presentation, I'll demo a set of services that are designed to open a critical part of Wikipedia's quality control infrastructure -- machine classifiers. I'll also discuss how this strategy unites critical/feminist HCI with more dominant narratives about efficiency and productivity.
 * Social transparency online


 * By Jennifer Marlow and Laura Dabbish
 * An emerging Internet trend is greater social transparency, such as the use of real names in social networking sites, feeds of friends' activities, traces of others' re-use of content, and visualizations of team interactions. There is a potential for this transparency to radically improve coordination, particularly in open collaboration settings like Wikipedia. In this talk, we will describe some of our research identifying how transparency influences collaborative performance in online work environments. First, we have been studying professional social networking communities. Social media allows individuals in these communities to create an interest network of people and digital artifacts, and get moment-by-moment updates about actions by those people or changes to those artifacts. It affords an unprecedented level of transparency about the actions of others over time. We will describe qualitative work examining how members of these communities use transparency to accomplish their goals. Second, we have been looking at the impact of making workflows transparent. In a series of field experiments we are investigating how socially transparent interfaces, and activity trace information in particular, influence perceptions and behavior towards others and evaluations of their work.

April 2015
April 30, 2015 Video: YouTube
 * Creating, remixing, and planning in open online communities
 * By Jeff Nickerson
 * Paradoxically, users in remixing communities don’t remix very much. But an analysis of one remix community, Thingiverse, shows that those who actively remix end up producing work that is in turn more likely to be remixed. What does this suggest about Wikipedia editing? Wikipedia allows more types of contribution, because creating and editing pages are done in a planning context: plans are discussed on particular loci, including project talk pages. Plans on project talk pages lead to both creation and editing; some editors specialize in making article changes and others, who tend to have more experience, focus on planning rather than acting. Contributions can happen at the level of the article and also at a series of meta levels. Some patterns of behavior – with respect to creating versus editing and acting versus planning – are likely to lead to more sustained engagement and to higher quality work. Experiments are proposed to test these conjectures.
 * Authority, power and culture on Wikipedia: The oral citations debate
 * By Heather Ford
 * In 2011, Wikimedia Foundation Advisory Board member Achal Prabhala was funded by the WMF to run a project called 'People are knowledge', or the Oral citations project. The goal of the project was to respond to the dearth of published material about topics of relevance to communities in the developing world. Although the majority of articles in languages other than English remain intact, the English editions of these articles have had their oral citations removed. I ask why this happened, what the policy implications are for oral citations generally, and what steps can be taken in the future to respond to the problem that this project (and more recent versions of it) set out to solve. This talk comes out of an ethnographic project in which I have interviewed some of the actors involved in the original oral citations project, including the majority of editors of the surr article that I trace in a chapter of my PhD.

March 2015
March 25, 2015 Video: YouTube
 * User Session Identification Based on Strong Regularities in Inter-activity Time
 * By Aaron Halfaker
 * Session identification is a common strategy used to develop metrics for web analytics and behavioral analyses of user-facing systems. Past work has argued that session identification strategies based on an inactivity threshold are inherently arbitrary, or has advocated that thresholds be set at about 30 minutes. In this work, we demonstrate a strong regularity in the temporal rhythms of user-initiated events across several different domains of online activity (incl. video gaming, search, page views and volunteer contributions). We describe a methodology for identifying clusters of user activity and argue that the regularity with which these activity clusters appear implies a good rule-of-thumb inactivity threshold of about 1 hour. We conclude with implications that these temporal rhythms may have for system design based on our observations and theories of goal-directed human activity.
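As a rough illustration of the inactivity-threshold idea in this abstract, the following Python sketch splits one user's event timestamps into sessions wherever the gap between consecutive events exceeds the cutoff. The function name `sessionize`, the input format (Unix timestamps in seconds) and the sample data are assumptions made for illustration; this is not the presenter's code, and it does not reproduce the clustering methodology used to arrive at the 1-hour figure.

```python
from typing import Iterable, List

# Rule-of-thumb inactivity threshold mentioned in the abstract, in seconds.
ONE_HOUR = 60 * 60


def sessionize(timestamps: Iterable[float], threshold: float = ONE_HOUR) -> List[List[float]]:
    """Group event timestamps into sessions.

    A new session starts whenever the gap since the previous event is
    strictly greater than `threshold`.
    """
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= threshold:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long (or first event): new session
    return sessions


# Hypothetical events for a single user, in seconds since some origin.
events = [0, 120, 900, 10_000, 10_300, 50_000]
for session in sessionize(events):
    print(len(session), "events spanning", session[-1] - session[0], "seconds")
```

The threshold is a parameter, so the same sketch can be used to compare the 30-minute convention with the 1-hour rule of thumb on the same data.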


 * Mining Missing Hyperlinks from Human Navigation Traces
 * By Bob West
 * Wikipedia relies crucially on the links between articles, but important links are often missing. In most prior work, the problem of detecting missing links is addressed by constructing a model of the existing link structure and then predicting the missing links based on this model. In this work we propose a novel method that does not rely on such a model of the static structure of existing links, but rather starts from data capturing how these links are used by people. The approach is guided by the intuition that the ultimate purpose of hyperlinks is to aid navigation, so we argue that the objective should be to suggest links that are likely to be clicked by users. In a nutshell, our algorithm suggests an as yet non-existent link from S to T for addition if users who open S are much more likely than random to later also open T. We show that this simple algorithm yields good link suggestions when run on data from the human-computation game Wikispeedia.net. Finally, we present preliminary results showing that the method also works "in the wild", i.e., on navigation data mined directly from Wikipedia's server logs.
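The "much more likely than random" criterion in the abstract above can be sketched in Python as a simple ratio test over navigation paths. Everything here is an assumption made for illustration: the function name `suggest_links`, the scoring (the share of paths containing S in which T is opened later, divided by the share of all paths containing T), the `min_ratio` and `min_support` cutoffs, and the toy data. The abstract does not specify the exact statistic, so this should not be read as the presenter's algorithm.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def suggest_links(
    paths: Iterable[Sequence[str]],
    existing_links: Set[Tuple[str, str]],
    min_ratio: float = 2.0,   # hypothetical cutoff for "much more likely than random"
    min_support: int = 2,     # hypothetical minimum number of supporting paths
) -> List[Tuple[str, str, float]]:
    """Suggest (source, target, ratio) for non-existent links S -> T."""
    paths = [list(p) for p in paths]
    n_paths = len(paths)
    opened = Counter()  # number of paths in which each page is opened
    later = Counter()   # number of paths in which T is opened after S

    for path in paths:
        opened.update(set(path))
        pairs = set()
        for i, source in enumerate(path):
            for target in path[i + 1:]:
                if target != source:
                    pairs.add((source, target))
        later.update(pairs)  # count each (S, T) pair once per path

    suggestions = []
    for (source, target), count in later.items():
        if count < min_support or (source, target) in existing_links:
            continue
        p_target_given_source = count / opened[source]
        p_target = opened[target] / n_paths
        ratio = p_target_given_source / p_target
        if ratio >= min_ratio:
            suggestions.append((source, target, ratio))
    return sorted(suggestions, key=lambda s: (-s[2], s[0], s[1]))


# Toy navigation paths and link set, invented for this example.
toy_paths = [
    ["Cat", "Mammal", "Whiskers"],
    ["Cat", "Animal", "Whiskers"],
    ["Dog", "Mammal", "Bone"],
    ["Dog", "Animal"],
    ["Tree", "Plant"],
    ["Tree", "Leaf"],
]
toy_links = {("Cat", "Mammal"), ("Cat", "Animal"), ("Dog", "Mammal"), ("Dog", "Animal")}
for source, target, ratio in suggest_links(toy_paths, toy_links):
    print(f"suggest {source} -> {target} (ratio {ratio:.2f})")
```

The support cutoff is there because a ratio computed from a single path is mostly noise; a real analysis would need a more careful treatment of rare pages.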

February 2015
February 18, 2015 Video: YouTube
 * Global South User Survey 2014
 * By Haitham Shammaa
 * Users' trends in the Global South have significantly changed over the past two years, and given the increase in interest in Global South communities and their activities, we wanted this survey to focus on understanding the statistics and needs of our users (both readers and editors) in the regions listed in the WMF's New Global South Strategy. This survey aims to provide a better understanding of the specific needs of local user communities in the Global South, as well as data that supports decision making in product and program development.


 * Ingesting Open Geodata: Observations from OpenStreetMap
 * By Alan McConchie
 * As Wikidata grapples with the challenges of ingesting external data sources such as Freebase, what lessons can we learn from other open knowledge projects that have had similar experiences? OpenStreetMap, often called "The Wikipedia of Maps", is a crowdsourced geospatial data project covering the entire world. Since the earliest years of the project, OSM has combined user contributions with existing data imported from external sources. Within the OSM community, these imports have been controversial; some core OSM contributors complain that imported data is lower quality than user-contributed data, or that it discourages the growth of local mapping communities. In this talk, I'll review the history of data imports in OSM, and describe how OSM's best practices have evolved over time in response to these critiques.

January 2015
January 14, 2015 Video: YouTube
 * Functional roles and career paths in Wikipedia


 * By Felipe Ortega
 * An understanding of participation dynamics within online production communities requires an examination of the roles assumed by participants. Recent studies have established that the organizational structure of such communities is not flat; rather, participants can take on a variety of well-defined functional roles. What is the nature of functional roles? How have they evolved? And how do participants assume these functions? Prior studies focused primarily on participants' activities, rather than functional roles. Further, extant conceptualizations of role transitions in production communities, such as the Reader to Leader framework, emphasize a single dimension: organizational power, overlooking distinctions between functions. In contrast, in this paper we empirically study the nature and structure of functional roles within Wikipedia, seeking to validate existing theoretical frameworks. The analysis sheds new light on the nature of functional roles, revealing the intricate "career paths" resulting from participants' role transitions.


 * Free Knowledge Beyond Wikipedia
 * A conversation facilitated by Benjamin Mako Hill
 * In some of my research with Leah Buechley, I've explored the way that increasing engagement and diversity in technology communities often means not just attacking systematic barriers to participation but also designing for new genres and types of engagement. I hope to facilitate a conversation about how WMF might engage new readers by supporting more non-encyclopedic production. I'd like to call out some examples from the new Wikimedia project proposals list, encourage folks to share entirely new ideas, and ask for ideas about how we could dramatically better support Wikipedia's sister projects.

December 2014
December 18, 2014 Video: YouTube
 * Mobile Madness: The Changing Face of Wikimedia Readers


 * By Oliver Keyes
 * A dive into the data we have around readership that investigates the rising popularity of the mobile web, countries and projects that are racing ahead of the pack, and what changes in user behaviour we can expect to see as mobile grows.
 * Global Disease Monitoring and Forecasting with Wikipedia
 * By Reid Priedhorsky (Los Alamos National Laboratory)
 * Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r² up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.

November 2014
November 14, 2014 Video: YouTube


 * Does Team Competition Increase Pro-Social Lending? Evidence from Online Microfinance.


 * By Yan Chen
 * In the first half of the talk, I will present our empirical analysis of the effects of team competition on pro-social lending activity on Kiva.org, the first microlending website to match lenders with entrepreneurs in developing countries. Using naturally occurring field data, we find that lenders who join teams contribute 1.2 more loans per month than those who do not. Furthermore, teams differ in activity levels. To investigate this heterogeneity, we run a field experiment by posting forum messages. Compared to the control, we find that lenders from inactive teams make significantly more loans when exposed to a goal-setting message and that team coordination increases the magnitude of this effect.


 * In the second part of the talk, I will discuss a randomized field experiment we ran in May 2014, in which we recommended teams to lenders on Kiva. We find that lenders are more likely to join teams in their local area. However, after joining, those who join popular teams (on the leaderboard) are more active in lending.

October 2014
October 15, 2014 Video: Commons? YouTube
 * Emotions under Discussion: Gender, Status and Communication in Wikipedia
 * By David Laniado: I will present a large-scale analysis of emotional expression and communication style of editors in Wikipedia discussions. The talk will focus especially on how emotion and dialogue differ depending on the status, gender, and communication network of the roughly 12,000 editors who have written at least 100 comments on the English Wikipedia's article talk pages. The analysis is based on three different predefined lexicon-based methods for quantifying emotions: ANEW, LIWC and SentiStrength. The results unveil significant differences in the emotional expression and communication style of editors according to their status and gender, and can help to address issues such as the gender gap and editor stagnation.


 * Wikipedia as a socio-technical system


 * By Aaron Halfaker: Wikipedia is a socio-technical system. In this presentation, I'll explain how the integration of human collective behavior ("social") and information technology ("technical") has led to phenomena that, while massively productive, are poorly understood due to a lack of precedent. Based on my work in this area, I'll describe five critical functions that healthy, Wikipedia-like socio-technical systems must serve in order to continue to function: allocation, regulation, quality control, community management and reflection. Finally, I'll conclude with an overview of three classes of new projects that should provide critical opportunities to both practically and academically understand the maintenance of Wikipedia's socio-technical fitness.



September 2014
September 17, 2014 ''The September showcase was canceled because of a conflict with other events scheduled by WMF. We will resume showcases in October.''

August 2014
August 20, 2014 Video: Commons? YouTube
 * Everything You Know About Mobile Is WrW^Right: Editing and Reading Pattern Variation Between User Types


 * By Oliver Keyes: Using new geolocation tools, we look at reader and editor behaviour to understand how and when people access and contribute to our content. This is largely exploratory research, but it has potential implications for our A/B testing and for how we understand both the cultural divides between reader and editor groups from different countries and the differences between types of edit and the editors who make them.


 * Wikipedia Article Curation: Understanding Quality, Recommending Tasks


 * By Morten Warncke-Wang: In this talk we look at article curation in Wikipedia through the lens of task suggestions and article quality. The first part of the talk presents SuggestBot, the Wikipedia article recommender. SuggestBot connects contributors with articles similar to those they previously edited. In the second part of the talk, we discuss Wikipedia article quality using “actionable” features, features that contributors can easily act upon to improve article quality. We will first discuss these features’ ability to predict article quality, before coming back to SuggestBot and showing how these predictions and actionable features can be used to improve the suggestions.

July 2014
July 16, 2014 Video: Commons YouTube
 * Halfak's wiki research libraries (v0.0.1)


 * By Aaron Halfaker: Along with quantitative research comes data and analysis code. In this presentation, Aaron will introduce you to four Python libraries that capture code he uses on a regular basis to get his wiki research done. MediaWiki Utilities is a general data processing library that includes connectors for the API and MySQL databases as well as an XML dump parser and revert detection. Wiki-Class is a machine learning library that is designed to train, test and deploy automatic quality assessment class detection for Wikipedia articles. MediaWiki-OAuth provides a simple interface for performing an OAuth handshake with a MediaWiki installation (e.g. Wikipedia). Deltas is an experimental text difference detection library that implements cutting-edge research to track changes to Wikipedia articles and attribute authorship of content.


 * Using Open Data and Stories to Broaden Crowd Content


 * By Nathan Matias: Nathan will share a series of research on gender diversity online and designs for collaborative content creation that foster learning and community. He will also demo a prototype for a system that could leverage open data to attract and support new Wikipedia contributors.



June 2014
June 18, 2014 Video: Commons YouTube
 * MoodBar -- lightweight socialization improves long-term editor retention
 * by Giovanni Luca Ciampaglia -- I will talk about MoodBar, an experimental feature deployed on the English Wikipedia from 2011 to 2013 to streamline the socialization of newcomers. I will present results from a natural experiment that measured the effect of MoodBar on the short-term engagement and long-term retention of newly registered users attempting to edit Wikipedia for the first time. Our results indicate that a mechanism to elicit lightweight feedback and to provide early mentoring to newcomers significantly improves their chances of becoming long-term contributors.


 * Active Editors' Survival Models
 * by Leila Zia -- I will talk about first results in building prediction models for active editors' survival. A sample of such prediction models, their performance, and the important variables in predicting survival will be presented.



May 2014
May 21, 2014 Video: Commons YouTube
 * A bird's eye view of editor activation
 * by Dario Taraborelli -- In this talk I will give a high-level overview of data on new editor activation, presenting longitudinal data from the largest Wikipedias, a comparison between desktop and mobile registrations and the relative activation rates of different cohorts of newbies.


 * Collaboration patterns in Articles for Creation
 * by Aaron Halfaker -- Wikipedia needs to attract and retain newcomers while also increasing the quality of its content. Yet new Wikipedia users are disproportionately affected by the quality assurance mechanisms designed to thwart spammers and promoters. English Wikipedia’s Articles for Creation provides a protected space for newcomers to draft articles, which are reviewed against minimum quality guidelines before they are published. In this presentation, I'll describe a study of how this drafting process has affected the productivity of newcomers in Wikipedia. Using a mixed qualitative and quantitative approach, I'll show that the process's pre-publication review, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia, and I'll offer recommendations for system designers.



April 2014
April 16, 2014 Video: Commons YouTube
 * WikiProjects yesterday, today and tomorrow
 * by Jonathan Morgan -- In this talk I'll give an overview of some research on English Wikipedia WikiProjects: what kind of work they do, how they do it, and how they have changed over time.


 * Visualizing Wikipedia Communities using Gephi
 * by Haitham Shammaa -- I will introduce Gephi as a tool for generating visual representations of Wikimedia project communities. Gephi is open-source network analysis and visualization software, used here to generate graphs that represent users and the interactions among them, based on how frequently they send messages to each other on their talk pages.
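To make the graph construction described in this abstract concrete, here is a minimal Python sketch that turns a list of talk-page messages into a weighted edge list in CSV form. The input format (one `(sender, recipient)` pair per message), the function names and the usernames are assumptions made for illustration, not the presenter's pipeline. The `Source`, `Target` and `Weight` column headers follow the convention that Gephi's spreadsheet importer expects for edge tables.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, TextIO, Tuple


def build_edge_rows(messages: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Count messages per (sender, recipient) pair; the count becomes the edge weight."""
    weights = Counter(messages)
    return [(sender, recipient, n) for (sender, recipient), n in sorted(weights.items())]


def write_gephi_edges(rows: Iterable[Tuple[str, str, int]], out: TextIO) -> None:
    """Write a directed, weighted edge list as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)


# Hypothetical talk-page messages: (user who posted, owner of the talk page).
messages = [
    ("Alice", "Bob"),
    ("Alice", "Bob"),
    ("Bob", "Alice"),
    ("Carol", "Alice"),
]
buffer = io.StringIO()
write_gephi_edges(build_edge_rows(messages), buffer)
print(buffer.getvalue())
```

Writing to a file instead of a `StringIO` buffer produces a CSV that can be loaded as an edge table, with message frequency available as the edge weight for layout and styling.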



March 2014
March 19, 2014 Video: Commons YouTube
 * Metrics standardization
 * by Dario Taraborelli -- In this talk I'll present the most recent updates on our work on metrics standardization and give a teaser of the Editor Engagement Vital Signs project.


 * Wikipedia: maintaining production efficiency
 * by Aaron Halfaker -- In Halfaker et al. (2013) we present data that show that several changes the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have ironically crippled the very growth they were designed to manage. Specifically, the restrictiveness of the encyclopedia's primary quality control mechanism and the algorithmic tools used to reject contributions are implicated as key causes of decreased newcomer retention.



February 2014
February 26, 2014 Video: Commons YouTube


 * Mobile session times
 * by Oliver Keyes -- A prerequisite to many pieces of interesting reader research is being able to accurately identify the length of users' 'sessions'. I will explain one potential way of doing it, how I've applied it to mobile readers, and what research this opens up.


 * Wikipedia article creation research
 * by Aaron Halfaker -- A brief overview of research examining trends in newcomer article creation across 10 languages with a focus on English and German Wikipedias. In wikis where anonymous users can create articles, their articles are less likely to be deleted than articles created by newly registered editors. An in-depth analysis of Articles for Creation (AfC) suggests that while AfC's process seems to result in the publication of high quality articles, it also dramatically reduces the rate at which good new articles are published.



January 2014
January 15, 2014
 * IP reliability tracking: by Oliver Keyes
 * The Wikipedia Adventure, quantitative and qualitative results from the pilot: by Jake Orlowitz (User:Ocaasi) We made a 7-mission gamified interactive onboarding tutorial to teach people how to edit Wikipedia in 1 hour. The journey involves badges, barnstars, challenges, and simulated interaction throughout a realistic quest to edit the article Earth. Game dynamics were used to create a sense of understanding, belonging, deep value identification, and technical proficiency. The use of games in open source and free culture online communities has great potential to drive participation. This talk will share the inspiration for taking a gamified approach, a review of the design highlights, and a discussion of quantitative and qualitative data and survey analysis.

December 2013
December 18, 2013


 * Metrics standardization: by Dario Taraborelli
 * On the nature of Anonymous Editors
 * by Aaron Halfaker -- A brief discussion & critique of the use of the term "anonymous" to refer to IP editors and a presentation of research results that suggest that newly registered users who edit anonymously right before registering their account are highly productive.


 * Overview of Program Evaluation (beta) Reports
 * by Jaime Anstee -- A brief overview of first-round reporting for programs, including a summary of the target measures along with strategies and challenges in metric standardization.