File:Wikimedia Research Showcase - June 2017.webm

Original file (WebM audio/video file, VP8/Vorbis, length 1 h 8 min 0 s, 1,920 × 1,080 pixels, 563 kbps overall)



Wikimedia Research Showcase - June 2017

  • Allen Yilun Lin: Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia

Abstract: Wikipedia-based studies and systems frequently assume that each article describes a separate concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors’ tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g. “United States” and “American literature” in the English Wikipedia). We present evidence that this issue can have significant impacts on Wikipedia-based studies and systems and introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches. Related CSCW 2017 paper: (preprint, citation), Open-source code

  • Markus Kroetzsch: Understanding Wikidata Queries

Abstract: Wikimedia provides a public service that lets anyone answer complex questions over the sum of all knowledge stored in Wikidata. These questions are expressed in the query language SPARQL and range from the most simple fact retrievals ("What is the birthday of Douglas Adams?") to complex analytical queries ("Average lifespan of people by occupation"). The talk presents ongoing efforts to analyse the server logs of the millions of queries that are answered each month. It is an important but difficult challenge to draw meaningful conclusions from this dataset. One might hope to learn relevant information about the usage of the service and Wikidata in general, but at the same time one has to be careful not to be misled by the data. Indeed, the dataset turned out to be highly heterogeneous and unpredictable, with strongly varying usage patterns that make it difficult to draw conclusions about "normal" usage. The talk will give a status report, present preliminary results, and discuss possible next steps. (Project page on meta)
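
To make the "simple fact retrieval" case concrete, here is a minimal sketch of how the Douglas Adams birthday question could be written in SPARQL and turned into a request URL. It is not taken from the talk. The endpoint URL, the helper names `birthday_query` and `request_url`, and the choice of Python are assumptions for illustration; Q42 (Douglas Adams) and P569 (date of birth) are the Wikidata identifiers involved. The sketch only builds the query and URL and does not contact the service.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service endpoint (not named in the abstract).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def birthday_query(item_id: str) -> str:
    """Return a SPARQL query for the date of birth (P569) of a Wikidata item.

    Hypothetical helper; the wd: and wdt: prefixes are predefined by the service.
    """
    return (
        "SELECT ?birthday WHERE { "
        f"wd:{item_id} wdt:P569 ?birthday . "
        "}"
    )


def request_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results (hypothetical helper)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Q42 is the Wikidata item for Douglas Adams.
query = birthday_query("Q42")
url = request_url(query)
print(query)
print(url)
```

Each such request would presumably appear as one entry in the kind of server log the talk analyses; an analytical query such as "average lifespan by occupation" would use the same mechanism with grouping and aggregation in the query text.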

Source: YouTube: Wikimedia Research Showcase - June 2017
Author: Wikimedia Foundation


This screenshot or video was originally uploaded on YouTube under a CC license.
Their website states: "YouTube allows users to mark their videos with a Creative Commons CC BY license."
To the uploader: You must provide a link (URL) to the original file and the authorship information if available.
This file is licensed under the Creative Commons Attribution 3.0 Unported license.
Attribution: Wikimedia Foundation
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

This file, which was originally posted to YouTube: Wikimedia Research Showcase - June 2017, was reviewed on by the administrator or reviewer Dyolf77, who confirmed that it was available there under the stated license on that date.

File history


current: 04:41, 27 June 2017; 1 h 8 min 0 s, 1,920 × 1,080 (273.9 MB); uploaded by Atlasowa; Imported media from
