Wikimedia Technology/Annual Plans/FY2019/TEC9: Address Knowledge Gaps/Goals

=Program Goals and Status for FY18/19=

TEC9: Address Knowledge Gaps
 * Goal Owner: Leila Zia
 * Program Goals for FY18/19: We help Wikimedia editors identify gaps in the content, readership, and contributors of Wikimedia projects, as well as means for reducing such gaps. We enable Wikimedia developers to build products and services that can surface and help reduce Wikimedia knowledge gaps. We research and develop end-to-end technologies and systems that automatically identify such gaps, prioritize them, and recommend actions or frameworks for reducing them.
 * Annual Plan: TEC9: Address Knowledge Gaps
 * Primary Goal is Knowledge Equity: grow new contributors and content
 * Tech Goal: Supporting our Community of contributors



= Q1 Goals =

Outcome 1 / Output 0
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.
 * A unit test to measure the bias of recommendation algorithms

Goal(s)

 * A report of the state of the art on bias detection and algorithm auditability in the context of recommendation systems (based on the review of the literature and industry interviews)

Status
July 2018

August 21, 2018
 * This is still in a status, this effort will be toward developing a procedural framework and this will be the first step forward.

September 13, 2018
 * ✅ Literature review at

Outcome 1 / Output 2
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.
 * Section recommendation algorithm in many languages.

Goal(s)

 * Build a section recommender system based on the section mapping algorithm

Status
July 2018

August 21, 2018

September 12, 2018
 * ❌ Moved to next quarter, as we needed to spend more time improving the section mapping algorithm before starting on section recommendations. Section mapping is progressing slowly, primarily because building a training set relies on experienced bilingual editors to help us label mappings.

Outcome 1 / Output 4
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.
 * Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs

Goal(s)

 * Build a test API for the section recommendation algorithm in Output 2

Status
July 2018

August 21, 2018
 * This is waiting for Output 2 to be finished first before it can be started in earnest.

September 12, 2018
 * This step can only start once the previous goal is completed. Given that that goal has been pushed to Q2, this one should move to Q2 as well.

Outcome 2 / Output 1
Interested editors, developers, and partners can identify more types of gaps in content
 * An improved task recommendation gadget or API

Goal(s)

 * Improve article recommendation API to completion (of the second stage improvements)

Status
July 2018

August 21, 2018
 * The database blockers have been removed and this should be in progress.

September 13, 2018
 * ✅ The algorithm for recommending articles to be translated has been improved: instead of ranking by pageviews in the source language, it now ranks missing articles by the probability that the article should exist in the destination language, considering sitelink counts as well as pageviews in the top 50 WP languages that have the article. We will continue the improvements in Q2; this task has met the intended goal for Q1.
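
As a rough illustration of the ranking change described in this status update, the following Python sketch scores each missing article with a probability-like value computed from its sitelink count and its pageviews in the language editions that have it, then sorts by that score. The logistic form, the weights, and all names (<code>Candidate</code>, <code>existence_score</code>, <code>rank_missing_articles</code>) are assumptions made for this example; this page does not specify the actual model.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set

# Hypothetical weights chosen only for illustration; not the real model.
W_SITELINKS = 0.8
W_PAGEVIEWS = 0.3
BIAS = -6.0


@dataclass
class Candidate:
    """An article that is missing in the destination language."""
    title: str
    sitelinks: int              # number of language editions that have the article
    pageviews: Dict[str, int]   # pageviews keyed by language code


def existence_score(candidate: Candidate, top_languages: Set[str]) -> float:
    """Return a value in (0, 1) standing in for the probability that the
    article should exist in the destination language (assumed logistic form)."""
    views = sum(
        count for lang, count in candidate.pageviews.items()
        if lang in top_languages
    )
    z = (
        W_SITELINKS * math.log1p(candidate.sitelinks)
        + W_PAGEVIEWS * math.log1p(views)
        + BIAS
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_missing_articles(
    candidates: Iterable[Candidate], top_languages: Set[str]
) -> List[Candidate]:
    """Sort candidates from highest to lowest existence score."""
    return sorted(
        candidates,
        key=lambda c: existence_score(c, top_languages),
        reverse=True,
    )
```

In this sketch, an article with many sitelinks and high pageviews across the selected languages ranks above one that is popular in only a single source language, which is the qualitative behavior the update describes.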

Outcome 2 / Output 2
Interested editors, developers, and partners can identify more types of gaps in content
 * A framework for understanding and measuring the knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia based on their readership characteristics as well as demographics.

Dependency: Formal collaborators

Goal(s)

 * Explore the interlanguage navigation patterns as a first approach to understand knowledge gaps in specific languages. ✅
 * Characterize Wikipedia readers across languages based on survey responses, request, article, and session activity. ✅

Status
July 2018

August 21, 2018
 * The second portion of this goal is done and will be captured in outcome 4.

September 12, 2018
 * ✅ The documentation for the first goal and the second goal.

Outcome 3 / Output 1
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.
 * An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Run an experiment to test the effectiveness of the first design of the framework
 * Provide an early analysis of the experiment and iterate if needed

Status
July 2018

August 21, 2018

September 13, 2018
 * Details: Given that the second stage of the first goal in Outcome 3 / Output 2 failed (see below for details), we will need to revise our strategy for this part of the study as well. We expect to take part of this goal to Q2 and the remainder to Q3.

Outcome 3 / Output 2
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.
 * An algorithm to address Wikipedia's cold-start problem when it comes to learning user interests when they join the project.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Run the experiment to test the quality of the algorithm to elicit user interests
 * Analyze the result of the experiment. Devise next steps.

Status
July 2018

August 21, 2018
 * & this is on track for a September deadline.

September 13, 2018
 * ✅ Details: Both bullet points are done. However, the second stage of the experiment failed in that we did not manage to gather enough responses from users who had participated in the first stage of the experiment. We will introduce two new goals for Q2 based on what we learned.

Outcome 4 / Output 1
More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.
 * Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.

Goal(s)

 * Submit the research for characterizing Wikipedia readers across languages for publication (stretch)

Status
July 2018

August 21, 2018

= Q2 Goals =

Outcome 1 / Output 0
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.


 * A unit test to measure the bias of recommendation algorithms

Goal(s)

 * Iterate on and improve the report of the state of the art on bias detection and algorithm auditability in the context of recommendation systems through internal subject matter expert and stakeholder input

Status
October 18, 2018
 * We have begun reviewing a set of 6 process frameworks developed by governments, tech organizations, and NGOs for addressing harmful bias in algorithmically driven tools, which will help us synthesize locally applicable "best practices". This prepares us to start gathering input from internal subject matter experts and stakeholders.

December 14, 2018
 * This goal might be delayed ...

Outcome 1 / Output 2
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.


 * Section recommendation algorithm in many languages.

Goal(s)

 * Build a section recommender system based on the section mapping algorithm

Status
October 2018
 * We are investigating ways to address section dependence, an important step before we start fusing the results from translation detection and synonym detection.

November 2018
 * . We are making progress on and we expect to have the results of that work ready by the end of November. This will allow us to start building the section recommendation system.

December 14, 2018
 * This is and will be finished up by next week.

Outcome 1 / Output 4
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.


 * Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs

Goal(s)

 * Build a test API for the section recommendation algorithm in Output 2

Note that the completion of this goal is dependent on the goal in Outcome 1 / Output 2.

Status
October 2018

November 2018
 * This is still at the to-do stage, since the section mapping system must be completed first.

December 14, 2018
 * This is still ❌ until Q3

Outcome 2 / Output 1
Interested editors, developers, and partners can identify more types of gaps in content
 * An improved task recommendation gadget or API

Goal(s)

 * Improve article recommendation API to completion (of the second stage improvements)

Status
October 2018
 * With the removal of hard-coded Interlanguage Links in 2015, we need to revisit the problem of how to identify articles that are not linked via Wikidata but are still (almost) the same.

November 2018
 * We have made good progress on this front and expect the work that was planned for this quarter to finish by the end of the quarter. The updated API will be shared by then.

December 14, 2018
 * This goal is and will be completed by end of next week.

Outcome 2 / Output 2
Interested editors, developers, and partners can identify more types of gaps in content


 * A framework for understanding and measuring the knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia based on their readership characteristics as well as demographics.
 * Dependencies on: Formal collaborators, Legal, and community discussions.

Goal(s)

 * Expand the taxonomy of Wikipedia readers to include questions about demographics.
 * Prepare the infrastructure (both technical and legal) for conducting the survey(s)
 * (stretch) Run the survey in one or more Wikipedia languages.

Status
October 2018
 * Literature review and brainstorming in progress.
 * Dependent on the completion of the previous step.
 * Same as above.

November 2018
 * Literature review and brainstorming in progress.
 * ❌ We expect the running of the survey to be moved to Q3, as the start of the English campaign is near and we are now too close to the holiday season, which can introduce seasonality into the results.
 * ❌ We expect the running of the survey to be moved to Q3, as the start of the English campaign is near and we are now too close to the holiday season, which can introduce seasonality into the results.

December 2018
 * Expand the taxonomy of Wikipedia readers is ✅, preparing the infrastructure is still and should be  by the end of this year. The stretch goal has been ❌ until Q3.

Outcome 3 / Output 1
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Devise the framework for matching newcomers to improve the first design of the framework
 * (stretch) Run an experiment to test the effectiveness of the first design of the framework

Status
October 2018
 * Ongoing discussions and brainstorming between researchers and Growth team.
 * Dependent on the completion of the previous step.

December 14, 2018
 * The first portion of this goal is still and will go into Q3 and the stretch goal will also be finished up in Q3.

Outcome 3 / Output 2
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * An algorithm to address Wikipedia's cold-start problem when it comes to learning user interests when they join the project.

Dependencies: Formal collaborators & Legal

Goal(s)

 * Develop a new experiment plan for testing the quality of the algorithm to elicit user interests
 * Conduct the experiment
 * (stretch) Analyze the result of the experiment. Devise next steps.

Status
October 2018
 * Ongoing discussions and brainstorming between researchers and Growth team.
 * Dependent on the completion of the previous step.
 * Dependent on the completion of the previous step.

December 14, 2018
 * All goals are now ✅

Outcome 3 / Output 3
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.


 * A series of baseline statistics on contributor diversity in one or more Wikimedia projects.

Dependencies: Formal collaborators & Legal

Goal(s)

 * TBD: Note that there are ongoing discussions in the Phabricator task linked above to see if this output should be done in Q2 (vs. Q3), as the work in Q2 is becoming quite massive.

Status
October 2018
 * We will need to wait for the result of the first bullet point out of Outcome 1, Output 2 before we can pick this up. As noted in the Goals, this may be delayed to next quarter.

December 14, 2018
 * This goal was not defined for Q2 and will be captured in the following quarters.

Outcome 4 / Output 1
More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.


 * Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.

Goal(s)

 * Finalize the documentation for the research on characterizing Wikipedia readers
 * A series of presentations to teams in WMF as well as language communities about the results on characterizing Wikipedia readers.
 * (stretch) Submit the research for aligning article sections across languages

Status
October 2018
 * check for more details.
 * : 5 given so far. details at, , , , all under the umbrella.
 * : we are working hard against the upcoming abstract deadline on October 29 and full paper deadline on November 5.

November 2018
 * : The bulk of the work is done and the long tail of finalizing details is left. We expect to be able to finalize this by the end of Q2.
 * ❌ We decided to move this goal to Q3 and target a different conference (SIGIR 2019) as we needed some time to improve the results of the research.
 * ❌ We decided to move this goal to Q3 and target a different conference (SIGIR 2019) as we needed some time to improve the results of the research.

December 14, 2018
 * All goals will be ✅ by end of next week, but the stretch goal will be ❌ until Q3.



= Q3 Goals =

Outcome 1 / Output 0
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.
 * A unit test to measure the bias of recommendation algorithms

Goal(s)

 * Share process proposals with a broader internal audience and elicit feedback ✅
 * Identify, prioritize, and document open questions and key considerations for implementing specific proposals at WMF ✅
 * Incorporate best practices and feedback from external subject matter experts
 * Finalize and publish report v1.0

Status
January 17, 2019
 * A variety of meetings (during All-Hands and otherwise) are set up with external and internal people to achieve the first 3 goals. The last goal can be picked up once all other ones are completed.

February 22, 2019
 * The first two goals are ✅. The other two goals are . We are working with Policy, Scoring Platform, and the Berkman Klein Center to organize a workshop in April or May 2019, which can help with gathering more input from external subject matter experts. As a result, the third goal, gathering external expert input, may be extended to Q4. The last goal is and will be published by the end of March (current quarter).

March 2019
 * Discussed...

Outcome 1 / Output 1
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.


 * Improved section recommendation algorithm with user-feedback

Goal(s)

 * Elicit feedback from both experienced and new editors about the section recommendation system to identify feedback types
 * Update the section recommendation model to include user feedback for improvements

Status
January 17, 2019
 * We are working on improving the section recommendation system and once that's finalized, we will start the work on these two items.

February 22, 2019
 * Both goals are still . No major blockers exist.

March 2019
 * Discussed...

Outcome 1 / Output 2
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.
 * Section recommendation algorithm in many languages.

Goal(s)

 * Improve the section recommender system ✅

Status
January 17, 2019
 * We are working on this actively.

February 22, 2019
 * This work is and we're on track with it.

March 2019
 * Discussed...

Outcome 1 / Output 4
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.
 * Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs

Goal(s)

 * Build a test API for the section recommendation algorithm in Output 2 ✅

Status
January 17, 2019
 * You can test the current model in en, fr, es, ar, ru, ja via http://diegotest.wmflabs.org/en/Quilombo. We expect to improve the quality of the recommendations further in this quarter.

February 22, 2019
 * This goal is ✅. We have built a simple UI for testing the API with community members as well. https://secrec.wmflabs.org/

Outcome 2 / Output 1
Interested editors, developers, and partners can identify more types of gaps in content
 * An improved article recommendation pipeline and API

Dependencies on: Analytics

Goal(s)

 * Set up the production pipeline to generate recommendations in Hadoop ✅
 * Improve the current API to filter out newly created articles ✅
 * Improve the model for identifying similar Wikidata items

Status
January 17, 2019
 * There are quite a few team dependencies for having this work completed. At the moment, we are making good progress.

February 22, 2019
 * Some of the smaller tasks have been completed. We're still ❌ on some of the items like, which will happen in Q4, it seems. We also switched focus to page-links-change event stream during this month. Now that it's in a good shape, we'll pick up this output during March again and work on tasks such as that are not blocked on other teams.

March 2019
 * Parts of the production pipeline that don't depend on Analytics have been automated. Once we're unblocked on and, we'll be able to fully automate the pipeline.

Outcome 2 / Output 2
Interested editors, developers, and partners can identify more types of gaps in content
 * A framework for understanding and measuring the knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia based on their readership characteristics as well as demographics.

Dependencies on: Formal Collaborators, Legal, Reading Web

Goal(s)

 * Finalize the choice of Wikipedia languages for running the demographics survey
 * Work with Legal to develop a privacy statement for the new round of survey ✅
 * Run the survey in one or more Wikipedia languages ✅
 * Prepare the data for analysis in the following quarter

Status
January 17, 2019
 * We are on track for this goal. Conversations with Legal about the type of demographic questions we can ask are ongoing, and our understanding is that we can conclude them soon. We are going to start working with the community on 2019-01-17 to identify potential languages the surveys will be launched in. We expect a test (in 1-2 wikis) no later than mid-February and, if the results of the test are fine, a larger-scale survey no later than the end of February or the first week of March.

February 22, 2019
 * The first goal is and we expect to finalize the list no later than 2 weeks from now. We have already made a call in wikimedia-l about it. The second goal is almost completed, . The third goal: the first test survey is scheduled for 2019-02-27 and this goal is now {{in progress}}. We expect to be able to meet the last goal by the end of March.

March 14, 2019
 * The choice of languages (first goal) is and will be completed by 2019-03-21. The two goals after that are ✅. As part of the third goal, we ran a pilot test and we are seeing some unexpected distribution of demographics as well as unexpected quicksurvey results in mobile. We have put a hold on running the survey in many more languages until we address that. The fourth goal is dropped as a result.

Outcome 3 / Output 1
More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute.
 * An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects.

Dependencies on: Formal Collaborators, Growth Team (potentially)

Goal(s)

 * Improve the current design of the model (to be tested) for increasing the retention of women
 * (conditional on the result of the first goal) Prepare the infrastructure for running the experiment
 * (stretch and conditional on the result of the first goal) run the experiment

Status
January 17, 2019
 * We are working on the first goal. Note that finding a model has proved very difficult, as matching users and getting them to collaborate faces many barriers at the moment. We tested this approach last quarter and could not make it work; our hypothesis is that users don't participate because of the complexity of communication between users on Wikipedia. We are iterating on the model now, but we should note that tighter collaboration between Audiences and Tech may be needed to make this work.

February 22, 2019
 * This goal is now ❌ as a result of midterm planning and transitions in Research. We will get to it if possible during this quarter but we cannot promise this at this point.

March 2019
 * Discussed...

Outcome 4 / Output 1
More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research.
 * Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.

Goal(s)

 * Submit the research on aligning article sections across languages to a peer-reviewed publication venue ✅
 * Submit the research on eliciting new editor interest models to a peer-reviewed publication venue ✅

Dependencies on: Formal collaborators

Status
January 17, 2019
 * The first submission is completed. The second will happen on January 28.

February 22, 2019
 * Both goals are ✅ now.



= Q4 Goals =

Outcome 1 / Output 0
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.


 * A unit test to measure the bias of recommendation algorithms

Goal(s)

 * (carry-over from Q3) Incorporate best practices and feedback from external subject matter experts, gathered at the Community Data Science Collective workshop and the round table at the Georgetown Institute for Technology, Law, and Policy

Status
April 2019


 * Discussed...

May 2019


 * Discussed...

June 2019


 * Discussed...

Outcome 1 / Output 1
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.


 * Improved section recommendation algorithm with user-feedback

Goal(s)

 * (carry-over) Elicit feedback from both experienced and new editors about the section recommendation system to identify feedback types
 * (carry-over) Update the section recommendation model to include user feedback for improvements

Status
April 2019


 * Discussed...

May 2019


 * Discussed...

June 2019


 * Discussed...

Outcome 1 / Output 3
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.

 * Section recommendation algorithm with more context and information

Goal(s)

 * Research and identify the type of context to be provided to editors as part of the section recommendation (for article expansion and for a general use API)

Status
April 2019


 * Discussed...

May 2019


 * Discussed...

June 2019


 * Discussed...

Outcome 1 / Output 5
One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface more diverse set of recommendations for article expansion through their tools.

 * The first version of the algorithm that prioritizes missing sections

Goal(s)

 * Develop the first version of the model that can prioritize missing sections

Status
April 2019


 * Discussed...

May 2019


 * Discussed...

June 2019


 * Discussed...