The Monthly Wikimedia Research Showcase is a public showcase of recent research by the Wikimedia Foundation's Research Team and guest presenters from the academic community. The showcase is hosted virtually every third Wednesday of the month at 9:30 a.m. Pacific Time (18:30 CET) and is live-streamed on YouTube. The schedule may change; see the calendar below for a list of confirmed showcases.

How to attend

We live stream our research showcase every month on YouTube. The link will be in each showcase's details below and is also announced in advance via wiki-research-l, analytics-l, and @WikiResearch on Twitter. You can join the conversation and participate in Q&A after each presentation using the YouTube chat. We expect all presenters and attendees to abide by our Friendly Space Policy.

Upcoming Events

August 2024

No showcase due to Wikimania.

Archive

For information about past research showcases (2013-present), you can search below or see the listing of all months here.

2024

July 2024

Time: Wednesday, July 24, 16:30 UTC: Find your local time here
Theme: Machine Translation on Wikipedia

July 24, 2024 Video: YouTube

The Promise and Pitfalls of AI Technology in Bridging Digital Language Divide

By Kai Zhu, Bocconi University

Machine translation technologies have the potential to bridge knowledge gaps across languages, promoting more inclusive access to information regardless of native languages. This study examines the impact of integrating Google Translate into Wikipedia's Content Translation system in January 2019. Employing a natural experiment design and difference-in-differences strategy, we analyze how this translation technology shock influenced the dynamics of content production and accessibility on Wikipedia across over a hundred languages. We find that this technology integration leads to a 149% increase in content production through translation, driven by existing editors becoming more productive as well as an expansion of the editor base. Moreover, we observe that machine translation enhances the propagation of biographical and geographical information, helping to close these knowledge gaps in the multilingual context. However, our findings also underscore the need for continued efforts to mitigate the preexisting systemic barriers. Our study contributes to our knowledge on the evolving role of artificial intelligence in shaping knowledge dissemination through enhanced language translation capabilities.

Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4708614

Implications of Using Inorganic Content in Arabic Wikipedia Editions

By Saied Alshahrani and Jeanna Matthews, Clarkson University

Wikipedia articles (content pages) are one of the most widely utilized training corpora for NLP tasks and systems, yet these articles are not always created, generated, or even edited organically by native speakers; some are automatically created, generated, or translated using Wikipedia bots or off-the-shelf translation tools like Google Translate without human revision or supervision. First, we analyzed the three Arabic Wikipedia editions, Arabic (AR), Egyptian Arabic (ARZ), and Moroccan Arabic (ARY), and found that these Arabic Wikipedia editions suffer from a few serious issues, like large-scale automatic creations and translations from English to Arabic, all without human involvement, generating content (articles) that lacks not only linguistic richness and diversity but also cultural richness and meaningful representation of the Arabic language and its native speakers. Second, we studied the performance implications of using such inorganic, unrepresentative articles to train NLP tasks or systems, where we intrinsically evaluated the performance of two main NLP upstream tasks, namely word representation and language modeling, using word analogy and fill-mask evaluations. We found that most of the models trained on the organic and representative content outperformed or, at worst, performed on par with the models trained with inorganic content generated using bots or translated using templates included, demonstrating that training on unrepresentative content not only impacts the representation of native speakers but also impacts the performance of NLP tasks or systems.
We recommend avoiding utilizing the automatically created, generated, or translated articles on Wikipedia when the task is a representation-based task, like measuring opinions, sentiments, or perspectives of native speakers, and also suggest that when registered users employ automated creation or translation, their contributions should be marked differently than “registered user” for better transparency; perhaps “registered user (automation-assisted)”.

Paper: https://aclanthology.org/2023.arabicnlp-1.19.pdf

June 2024

No Research Showcase due to Wiki Workshop.

May 2024

Time: Wednesday, May 15, 16:30 UTC: Find your local time here
Theme: Reader to Editor Pipeline

May 15, 2024 Video: YouTube

Journey Transitions

By Mike Raish and Daisy Chen

What kinds of events do readers and editors identify as separating the stages of their relationship with Wikipedia, and which of these kinds of events might the Wikimedia Foundation possibly support through design interventions? In the Journey Transitions qualitative research project, the WMF Design Research team interviewed readers and editors in Arabic, Spanish, and English in order to answer these questions and provide guidance to WMF Product teams making strategic decisions. A series of semi-structured interviews revealed that readers and editors describe their relationships with Wikipedia in different ways, with readers describing a static and transactional relationship, and that even many experienced editors express confusion about core functions of the Wikimedia ecosystem, such as the role of Talk pages. This presentation will describe the Journey Transitions research, as well as present its implications for the sponsoring Product teams in order to shed light on the way that qualitative research is used to inform strategic decisions in the Wikimedia Foundation.

Project: Journey transitions

Increasing participation in peer production communities with the Growth features

By Morten Warncke-Wang and Kirsten Stoller

For peer production communities to be sustainable, they must attract and retain new contributors. Studies have identified social and technical barriers to entry and discovered some potential solutions, but these solutions have typically focused on a single highly successful community, the English Wikipedia, been tested in isolation, and rarely evaluated through controlled experiments. In this talk, we show how the Wikimedia Foundation’s Growth team collaborates with Wikipedia communities to develop and experiment with new features to improve the newcomer experience in Wikipedia. We report findings from a large-scale controlled experiment using the Newcomer Homepage, a central place where newcomers can learn how peer production works and find opportunities to contribute, and show how the effectiveness depends on the newcomer’s context. Lastly, we show how the Growth team has continued developing features that further improve the newcomer experience while adapting to community needs.

Paper: https://arxiv.org/abs/2308.09642

April 2024

Time: Wednesday, April 17, 16:30 UTC: Find your local time here
Theme: Supporting Multimedia on Wikipedia

April 17, 2024 Video: YouTube

Towards image accessibility solutions grounded in communicative principles

By Elisa Kreiss

Images have become an omnipresent communicative tool -- and this is no exception on Wikipedia. However, the undeniable benefits they carry for sighted communicators turn into a serious accessibility challenge for people who are blind or have low vision (BLV). BLV users often have to rely on textual descriptions of those images to participate equally in an increasingly image-dominated online lifestyle. In this talk, I will present how framing accessibility as a communication problem highlights important ways forward in redefining image accessibility on Wikipedia. I will present the Wikipedia-based dataset Concadia and use it to discuss the successes and shortcomings of image captions and alt texts for accessibility, and how the usefulness of accessibility descriptions is fundamentally contextual. I will conclude by highlighting the potential and risks of AI-based solutions and discussing implications for different Wikipedia editing communities.

Code: https://github.com/elisakreiss/concadia
Paper: https://arxiv.org/abs/2104.08376

Automatic Multi-Path Web Story Creation from a Structural Article

By Daniel Nkemelu

Web articles such as Wikipedia serve as one of the major sources of knowledge dissemination and online learning. However, their in-depth information--often in a dense text format--may not be suitable for mobile browsing, even in a responsive user interface. We propose an automatic approach that converts a structured article of any length into a set of interactive Web Stories that are ideal for mobile experiences. We focused on Wikipedia articles and developed Wiki2Story, a pipeline based on language and layout models, to demonstrate the concept. Wiki2Story dynamically slices an article and plans one to multiple Story paths according to the document hierarchy. For each slice, it generates a multi-page summary Story composed of text and image pairs in visually appealing layouts. We derived design principles from an analysis of manually created Story practices. We executed our pipeline on 500 Wikipedia documents and conducted user studies to review selected outputs. Results showed that Wiki2Story effectively captured and presented salient content from the original articles and sparked interest in viewers.

Paper: https://arxiv.org/abs/2310.02383

March 2024

Time: Wednesday, March 20, 16:30 UTC: Find your local time here
Theme: Addressing Gender Gaps

Wednesday, March 20, 2024 Video: YouTube

Leveraging Recommender Systems to Reduce Content Gaps on Wikipedia

By Mo Houtti

Many Wikipedians use algorithmic recommender systems to help them find interesting articles to edit. The algorithms underlying those systems are driven by a straightforward assumption: we can look at what someone edited in the past to figure out what they’ll most likely want to edit next. But the story of what Wikipedians want to edit is almost definitely more complex than that. For example, our own prior research shows that Wikipedians prefer prioritizing articles that would minimize content gaps. So, we asked, what would happen if we incorporated that value into Wikipedians’ personalized recommendations? Through a controlled experiment on SuggestBot, we found that recommending more content gap articles didn’t significantly impact editing, despite those articles being less “optimally interesting” according to the recommendation algorithm. In this presentation, I will describe our experiment, our results, and their implications - including how recommender systems can be one useful strategy for tackling content gaps on Wikipedia.

Paper: https://arxiv.org/abs/2307.08669

Bridging the offline and online: Offline meetings of Wikipedians

By Nicole Schwitter

Wikipedia is primarily known as an online encyclopaedia, but it also features a noteworthy offline component: Wikipedia and particularly its German-language edition – which is one of the largest and most active language versions – is characterised by regular local offline meetups which give editors the chance to get to know each other. This talk will present the recently published dewiki meetup dataset which covers (almost) all offline gatherings organised on the German-language version of Wikipedia. The dataset covers almost 20 years of offline activity of the German-language Wikipedia, containing 4418 meetups that have been organised with information on attendees, apologies, date and place of meeting, and minutes recorded. The talk will explain how the dataset can be used for research, highlight the importance of considering offline meetings among Wikipedians, and place these insights within the context of addressing gender gaps within Wikipedia.

Paper: https://link.springer.com/article/10.1007/s42001-023-00225-8

February 2024

Time: Wednesday, February 21, 16:30 UTC: Find your local time here
Theme: Platform Governance and Policies

Wednesday, February 21, 2024 Video: YouTube

Sociotechnical Designs for Democratic and Pluralistic Governance of Social Media and AI

By Amy X. Zhang, University of Washington

Decisions about policies when using widely-deployed technologies, including social media and more recently, generative AI, are often made in a centralized and top-down fashion. Yet these systems are used by millions of people, with a diverse set of preferences and norms. Who gets to decide what the rules are, and what should the procedures be for deciding them -- and must we all abide by the same ones? In this talk, I draw on theories and lessons from offline governance to reimagine how sociotechnical systems could be designed to provide greater agency and voice to everyday users and communities. This includes the design and development of: 1) personal moderation and curation controls that are usable and understandable to laypeople, 2) tools for authoring and carrying out governance to suit a community's needs and values, and 3) decision-making workflows for large-scale democratic alignment that are legitimate and consistent.

January 2024

Time: Wednesday, January 17, 17:30 UTC: Find your local time here
Theme: Connecting Actions with Policy

January 17, 2024 Video: YouTube

Presenting the report "Unreliable Guidelines"

By Amber Berson and Monika Sengul-Jones

The goal behind the report Unreliable Guidelines: Reliable Sources and Marginalized Communities in French, English and Spanish Wikipedias was to understand the effects of the set of reliable source guidelines and rules on the participation of and the content about marginalized communities on three Wikipedias. Two years following the release of their report, researchers Berson and Sengul-Jones reflect on the impact of their research as well as the actionable next steps.

Paper: https://artandfeminism.org/resources/research/unreliable-guidelines/

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

By Lucie-Aimée Kaffee and Arnav Arora

The moderation of content on online platforms is usually non-transparent. On Wikipedia, however, this discussion is carried out publicly and the editors are encouraged to use the content moderation policies as explanations for making moderation decisions. However, currently only a few comments explicitly mention those policies. To aid in this process of understanding how content is moderated, we construct a novel multilingual dataset of Wikipedia editor discussions along with their reasoning in three languages. We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process.

Paper: Kaffee, Lucie-Aimée, Arnav Arora, and Isabelle Augenstein. Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.