Wikimedia Technology/Annual Plans/FY2019/TEC9: Address Knowledge Gaps

Teams contributing to the program
Research, Services, Security, and Language

Annual Plan priorities
Primary Goal: 1. Knowledge Equity - grow new contributors and content

How does your program affect annual plan priority?
The program contributes to annual plan priorities by redefining knowledge gaps through three main types of gaps: content gaps, contributor gaps, and readership gaps.

Knowledge Equity

We contribute towards knowledge equity through:
 * more content in more languages and forms. We research, design, test, and develop technologies that can identify missing content across Wikimedia projects and languages, prioritize it, and recommend it to editors who are interested in contributing towards closing such gaps. By doing so, we surface the gaps of knowledge and their importance (awareness), as well as means for addressing them (enablement), which are key steps towards achieving knowledge equity.
 * a stronger and more balanced representation of minority voices. Some of the gaps in Wikimedia content are associated with gaps in contributor representation. We design and test socio-technical frameworks that can remove some of the barriers to contribution by minority voices, creating a more equal opportunity for knowledge creation.
 * measuring gaps through the lens of readers and their needs. We generate knowledge that can help the Wikimedia Movement better understand the needs of Wikimedia readers and consumers. What the consumers of the content seek and cannot find is key to understanding the gaps of knowledge on our projects, and encouraging content contribution by including this perspective contributes to knowledge equity.

Knowledge as a Service - increase reach

Algorithms and systems we develop and test are accompanied by the release of datasets and/or public API endpoints that empower others to build on the knowledge we gain through this program.

Knowledge as a Service - evolve our systems and structure

One of the fundamental problems we return to in every research project in this program is how to represent a Wikipedia article at an abstract level that would allow us to go beyond the content of the article and build technologies for that abstract representation, instead of technologies for specific use cases or subsets of the problems related to that article. The research on Wikipedia article section alignment indicates that creating such an abstract representation is possible. In the upcoming fiscal year, we will invest in representation learning to build towards an abstract representation of Wikipedia articles. Such an abstract representation will allow us and others to easily answer questions such as: give me the list of all articles that belong to the topic Presidents and already include the section title Honors and Awards. The answer to such a simple question is at the moment very hard to get.
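
To make the example question above concrete, the following is a minimal sketch of how such a query could look if each article were already reduced to a set of topics and a set of section titles. It is an illustration only, not the representation this program plans to build: the `ArticleRepresentation` class, the `articles_with_section` helper, and the sample articles are all hypothetical, and a learned representation would likely be richer than plain sets of strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ArticleRepresentation:
    """Hypothetical abstract view of an article: its topics and section titles."""
    title: str
    topics: FrozenSet[str]
    section_titles: FrozenSet[str]


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so that label variants compare equal."""
    return " ".join(text.casefold().split())


def articles_with_section(
    articles: Iterable[ArticleRepresentation], topic: str, section_title: str
) -> List[str]:
    """Return titles of articles in `topic` that already contain `section_title`."""
    wanted_topic = normalize(topic)
    wanted_section = normalize(section_title)
    return [
        article.title
        for article in articles
        if wanted_topic in {normalize(t) for t in article.topics}
        and wanted_section in {normalize(s) for s in article.section_titles}
    ]


# Invented placeholder data, not drawn from any Wikimedia project.
SAMPLE_ARTICLES = [
    ArticleRepresentation(
        "Example article A",
        frozenset({"Presidents"}),
        frozenset({"Early life", "Honors and Awards"}),
    ),
    ArticleRepresentation(
        "Example article B",
        frozenset({"Presidents"}),
        frozenset({"Early life", "Presidency"}),
    ),
    ArticleRepresentation(
        "Example article C",
        frozenset({"Scientists"}),
        frozenset({"Honors and awards"}),
    ),
]

matches = articles_with_section(SAMPLE_ARTICLES, "Presidents", "Honors and Awards")
```

With the placeholder data above, `matches` contains only "Example article A". The difficult part in practice is producing the topic and section sets consistently across articles and languages, which is what the abstract representation is meant to provide.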

Program Goal
''We help Wikimedia editors identify gaps in the content, readership, and contributors of Wikimedia projects, as well as means for reducing such gaps. We enable Wikimedia developers to build products and services that can surface and help reduce Wikimedia knowledge gaps. We research and develop end-to-end technologies and systems that automatically identify such gaps, prioritize them, and recommend actions or frameworks for reducing them.''
 * Outcome 1
 * One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.


 * Output 1.1
 * Improved section recommendation algorithm with user-feedback


 * Output 1.2
 * Section recommendation algorithm in many languages.


 * Output 1.3
 * Section recommendation algorithm with more context and information


 * Output 1.4
 * Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs.


 * Output 1.5
 * The first version of the algorithm that prioritizes missing sections


 * Outcome 2
 * Interested editors, developers, and partners can identify more types of gaps in content


 * Output 2.1
 * An improved task recommendation gadget


 * Output 2.2
 * A framework for understanding and measuring knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia by both their readership characteristics and their demographics.




 * Outcome 3
 * More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * Output 3.1
 * An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects. (Continuation of FY17-18 research)


 * Output 3.2
 * An algorithm to address Wikipedia's cold-start problem of learning the interests of new users when they join the project. (Continuation of FY17-18 research)


 * Output 3.3
 * A series of baseline statistics on contributor diversity in one or more Wikimedia projects.


 * Outcome 4
 * More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.


 * Output 4.1
 * Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.

Targets

 * Outcome 1
 * One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.


 * Target
 * Section recommendation with user feedback in test API in at least 2 languages
 * Section recommendation service in at least 10 languages
 * A first list of the types of context the editors would like to see exposed when editing an article
 * First version of section recommendation algorithm with section sorting feature available for testing


 * Measurement method
 * Count of languages supported after API is made available
 * Count of languages supported after API is made available
 * Interviewing users, study of the literature on user-generated-content platforms, and/or contextual research
 * No complex measurements required


 * Outcome 2
 * Interested editors, developers, and partners can identify more types of gaps in content


 * Target
 * At least 1 submitted peer-reviewed publication on reader needs as a function of reader demographics
 * At least 2 blog posts on the state of gaps and production of content with respect to the reader needs


 * Measurement
 * No complex measurements required
 * No complex measurements required


 * Outcome 3
 * More minority voices and diverse newcomers in Wikimedia projects stay longer on the projects to contribute.


 * Targets
 * A tested and approved framework for the longer term retention of more diverse newcomers
 * 1 peer-reviewed publication submitted
 * 1 blog post
 * Reliably match newcomers with similar interests


 * Measurements
 * Controlled experiment in at least one Wikipedia language
 * Whether it is submitted
 * No complex measurements required
 * Controlled experiment or through interviews and user satisfaction measures


 * Outcome 4
 * More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.


 * Targets
 * Blog posts, peer-reviewed publications, and ongoing documentation
 * Dissemination of citable knowledge through new platforms and means: pre-made slide decks for use by decision makers, "Did you know?" initiative, ...


 * Measurements
 * No complex measurements required
 * No complex measurements required

Dependencies
''Outcome 3 relies on deep collaboration with Audiences. The department and Research have agreed to work together to adapt QuickSurveys to make the collection of data and computation of the baseline statistics possible. The framework tested in this program is also closely related to Growth. The two teams expect to work closely on this front.''