Wikimedia Technology/Annual Plans/FY2019/TEC9: Address Knowledge Gaps/Goals

=Program Goals and Status for FY18/19=

TEC9: Address Knowledge Gaps
 * Goal Owner: Leila Zia
 * Program Goals for FY18/19: We help Wikimedia editors identify gaps in the content, readership, and contributors of Wikimedia projects, as well as means for reducing such gaps. We enable Wikimedia developers to build products and services that can surface and help reduce Wikimedia knowledge gaps. We research and develop end-to-end technologies and systems that automatically identify such gaps, prioritize them, and recommend actions or frameworks for reducing them.
 * Annual Plan: TEC9: Address Knowledge Gaps
 * Primary Goal is Knowledge Equity: grow new contributors and content
 * Tech Goal: Supporting our Community of contributors



= Q1 Goals =

Outcome 1 / Output 0
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.
 * A unit test to measure the bias of recommendation algorithms

Goal(s)

 * A report on the state of the art in bias detection and algorithm auditability in the context of recommendation systems (based on a review of the literature and industry interviews)

Status
July 2018

August 21, 2018
 * This is still in progress. The effort is directed toward developing a procedural framework, and this report will be the first step.

September 13, 2018


 * ✅ Literature review at

Outcome 1 / Output 2
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.
 * Section recommendation algorithm in many languages.

Goal(s)

 * Build a section recommender system based on the section mapping algorithm

Status
July 2018

August 21, 2018

September 12, 2018
 * ❌ Moved to next quarter because we needed more time to improve the section mapping algorithm before starting on section recommendations. Section mapping is progressing slowly, primarily because building a training set relies on experienced bilingual editors to help us label mappings.

Outcome 1 / Output 4
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.
 * Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs

Goal(s)

 * Build a test API for the section recommendation algorithm in Output 2

Status
July 2018

August 21, 2018
 * This is waiting on Output 2, which must be finished before this work can start in earnest.

September 12, 2018
 * This step can start only once the previous goal is completed. Given that that goal has been pushed to Q2, this one should move to Q2 as well.

Outcome 2 / Output 1
Interested editors, developers, and partners can identify more types of gaps in content
 * An improved task recommendation gadget or API

Goal(s)

 * Improve article recommendation API to completion (of the second stage improvements)

Status
July 2018

August 21, 2018
 * The database blockers have been removed and this should be in progress.

September 13, 2018
 * ✅ The algorithm for recommending articles to be translated has been improved: instead of ranking by pageviews in the source language, it now ranks missing articles by the probability that the article should exist in the destination language, considering sitelink counts as well as pageviews in the top 50 Wikipedia languages that have the article. We will continue the improvements in Q2; this task has met the intended goal for Q1.
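The ranking change described in this status can be illustrated with a minimal sketch. It is not the production algorithm: the logistic form, the weights, the bias term, and the names <code>existence_probability</code> and <code>rank_missing_articles</code> are all assumptions made for illustration. The only elements taken from the status above are the two signals (sitelink counts and pageviews across the languages that have the article) and the idea of ranking by an estimated probability rather than by source-language pageviews alone.

```python
import math

# Hypothetical parameters: the status note does not specify the real model,
# so these weights exist only to make the sketch runnable.
W_SITELINKS = 1.0
W_PAGEVIEWS = 0.5
BIAS = -6.0


def existence_probability(sitelink_count, pageviews_by_lang):
    """Estimate how likely it is that an article should exist in the
    destination language, from its sitelink count and its pageviews
    in the languages that already have it (assumed logistic model)."""
    total_views = sum(pageviews_by_lang.values())
    z = (BIAS
         + W_SITELINKS * math.log1p(sitelink_count)
         + W_PAGEVIEWS * math.log1p(total_views))
    return 1.0 / (1.0 + math.exp(-z))


def rank_missing_articles(candidates):
    """Sort candidate missing articles by descending estimated probability."""
    return sorted(
        candidates,
        key=lambda c: existence_probability(c["sitelinks"], c["pageviews"]),
        reverse=True,
    )


# Made-up candidates for illustration only.
candidates = [
    {"title": "A", "sitelinks": 3, "pageviews": {"en": 1200}},
    {"title": "B", "sitelinks": 40, "pageviews": {"en": 900, "de": 300, "fr": 150}},
    {"title": "C", "sitelinks": 12, "pageviews": {"es": 5000}},
]
ranked = [c["title"] for c in rank_missing_articles(candidates)]
```

In this made-up data, candidate C has the most pageviews, yet B ranks first because it exists in many more languages. That is the kind of reordering a sitelink-aware score can produce compared with a pageviews-only ranking.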

Outcome 2 / Output 2
Interested editors, developers, and partners can identify more types of gaps in content
 * A framework for understanding and measuring the knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia based on their readership characteristics as well as demographics.

Dependency: Formal collaborators

Goal(s)

 * Explore the interlanguage navigation patterns as a first approach to understand knowledge gaps in specific languages. ✅
 * Characterize Wikipedia readers across languages based on survey responses, request, article, and session activity. ✅

Status
July 2018

August 21, 2018
 * The second portion of this goal is done and will be captured in outcome 4.

September 12, 2018
 * ✅ The documentation for the first goal and the second goal.

Outcome 3 / Output 1
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.
 * An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Run an experiment to test the effectiveness of the first design of the framework
 * Provide an early analysis of the experiment and iterate if needed

Status
July 2018

August 21, 2018

September 13, 2018
 * Details: Given that the second stage of the first goal in Outcome 3 / Output 2 failed (see below for details), we will need to revise our strategy for this part of the study as well. We expect to take part of this goal to Q2 and the remainder to Q3.

Outcome 3 / Output 2
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.
 * An algorithm to address Wikipedia's cold-start problem when it comes to learning user interests when they join the project.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Run the experiment to test the quality of the algorithm to elicit user interests
 * Analyze the result of the experiment. Devise next steps.

Status
July 2018

August 21, 2018
 * This is on track for a September deadline.

September 13, 2018
 * ✅ Details: Both bullet points are done. However, the second stage of the experiment failed in that we did not gather enough responses from users who had participated in the first stage. We will introduce two new goals for Q2 based on what we learned.

Outcome 4 / Output 1
More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.
 * Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.

Goal(s)

 * Submit the research on characterizing Wikipedia readers across languages for publication (stretch)

Status
July 2018

August 21, 2018

= Q2 Goals =

Outcome 1 / Output 0
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.


 * A unit test to measure the bias of recommendation algorithms

Goal(s)

 * Iterate on and improve the report on the state of the art in bias detection and algorithm auditability in the context of recommendation systems, using input from internal subject matter experts and stakeholders

Status
October 18, 2018


 * We have begun reviewing a set of six process frameworks developed by governments, tech organizations, and NGOs for addressing harmful bias in algorithmically driven tools. The review will help synthesize locally applicable "best practices" and prepares us to start gathering input from internal subject matter experts and stakeholders.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 1 / Output 2
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.


 * Section recommendation algorithm in many languages.

Goal(s)

 * Build a section recommender system based on the section mapping algorithm

Status
October 2018


 * We are investigating ways to address section dependence, an important step before we start fusing the results from translation detection and synonym detection.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 1 / Output 4
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools.


 * Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs

Goal(s)

 * Build a test API for the section recommendation algorithm in Output 2

Note that the completion of this goal is dependent on the goal in Outcome 1 / Output 2.

Status
October 2018



November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 2 / Output 1
Interested editors, developers, and partners can identify more types of gaps in content


 * An improved task recommendation gadget or API

Goal(s)

 * Improve article recommendation API to completion (of the second stage improvements)

Status
October 2018


 * With the removal of hard-coded interlanguage links in 2015, we need to revisit the problem of how to identify articles that are not linked via Wikidata but are still (almost) the same.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 2 / Output 2
Interested editors, developers, and partners can identify more types of gaps in content


 * A framework for understanding and measuring the knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia based on their readership characteristics as well as demographics.
 * Dependencies: Formal collaborators, Legal, and community discussions.

Goal(s)

 * Expand the taxonomy of Wikipedia readers to include questions about demographics.
 * Prepare the infrastructure (both technical and legal) for conducting the survey(s)
 * (stretch) Run the survey in one or more Wikipedia languages.

Status
October 2018


 * Literature review and brainstorming in progress.
 * Dependent on the completion of the previous step.
 * Same as above.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 3 / Output 1
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Devise the framework for matching newcomers to improve the first design of the framework
 * (stretch) Run an experiment to test the effectiveness of the first design of the framework

Status
October 2018


 * Ongoing discussions and brainstorming between researchers and the Growth team.
 * Dependent on the completion of the previous step.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 3 / Output 2
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * An algorithm to address Wikipedia's cold-start problem when it comes to learning user interests when they join the project.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Develop a new experiment plan for testing the quality of the algorithm to elicit user interests
 * Conduct the experiment
 * (stretch) Analyze the result of the experiment. Devise next steps.

Status
October 2018


 * Ongoing discussions and brainstorming between researchers and the Growth team.
 * Dependent on the completion of the previous step.
 * Dependent on the completion of the previous step.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 3 / Output 3
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * A series of baseline statistics on contributor diversity in one or more Wikimedia projects.

Dependencies: Formal collaborators & Legal

Goal(s)

 * TBD: Note that there are ongoing discussions in the Phabricator task linked above about whether this output should be done in Q2 or moved to Q3, as the work in Q2 is becoming quite massive.

Status
October 2018


 * We need to wait for the result of the first bullet point of Outcome 1 / Output 2 before we can pick this up. As noted in the goals, this may be delayed to next quarter.

November 2018


 * Discussed...

December 2018


 * Discussed...

Outcome 4 / Output 1
More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.


 * Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.

Goal(s)

 * Finalize the documentation for the research on characterizing Wikipedia readers
 * A series of presentations to teams in WMF as well as language communities about the results on characterizing Wikipedia readers.
 * (stretch) Submit the research for aligning article sections across languages

Status
October 2018


 * check for more details.
 * 5 given so far. Details at, , , , all under the umbrella.
 * We are working against the upcoming abstract deadline on October 29 and the full paper deadline on November 5.

November 2018


 * Discussed...

December 2018


 * Discussed...