Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity/Goals

=Program Goals and Status for FY18/19=

Segment 1 - Research
 * Goal Owner: Leila Zia
 * Program Goals for FY18/19: Wikimedia sites provide the most trustworthy, comprehensive, neutral information across topics and languages by referencing this information to vetted reliable sources and linking it to external content providers and metadata repositories, making Wikimedia projects the central gateway to access citable information in the knowledge ecosystem.
 * Annual Plan: Segment 1 - Research
 * Primary Goal is Knowledge as a Service: increase reach



=Q1 Goals =

Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
 * A map of verifiability of information in Wikimedia projects

Goal(s)

 * Design and test an end-to-end machine learning framework to identify statements in need of a citation. ✅
 * Improving the taxonomy of reasons why editors add citations to Wikipedia statements ✅
 * Design the experiment and collect larger-scale data about reasons why people add citations ✅

Status
July 2018

August 21, 2018

September 13, 2018
 * Details: we expect this goal to be fully done before the end of Q1. The first bullet point is expected to be done by the end of the month. The third bullet point is done, and we have done extensive extra work on it as well. What remains is documentation, which we expect to be done by 2018-09-18.
 * Update on Sept 18: all goals for this outcome are ✅

Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
 * Research to understand how readers use citations

Goal(s)

 * Prepare the data and do preliminary analysis on the first data collection on citation usage based on data gathered via Citation Usage schema ✅
 * Develop a survey to better understand the role of citations in Wikipedia readers' evaluations of Wikipedia articles and to identify opportunities for supporting their learning goals and increasing their digital literacy. ✅

Status
July 2018

August 21, 2018
 * Data collection is done and the documentation just needs to be finished up ✅. Developing the survey is in progress; more information is in T199188.

September 18, 2018
 * The survey wording and goals are now ✅

Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
 * Funding the WikiCite event series

Goal(s)

 * Fundraise for the annual meeting in the WikiCite series and set of satellite events, to improve the sustainability and global reach of the initiative. ✅
 * Organize the event, open the application process and design the program ✅

Status
July 2018
 * ✅ Fundraising is completed!

August 22, 2018
 * Organizing the event is underway

September 18, 2018
 * ✅ The selection process has completed, and notifications to applicants are being sent out as of October 1. The chairs of individual days of the event are now collecting information from selected attendees to finalize the agenda.



=Q2 Goals =

Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.


 * A map of verifiability of information in Wikimedia projects

Goal(s)

 * Design a machine learning framework to identify why statements need a citation in English Wikipedia. ✅
 * [Stretch] Submit a paper summarizing the modeling work for unsourced statement detection ✅

Status
November 13, 2018
 * This goal was finished up this week, yay!

Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.


 * Research to understand how readers use citations

Goal(s)

 * Run the second round of data collection to understand Wikipedia citation usage ✅
 * Prepare the data and analyze the data collected in the second round.
 * Perform first round of survey data collection of reader citation usage on English Wikipedia.
 * Analyze first round survey data of reader citation usage

Status
October 18, 2018
 * The survey work has been ported to Qualtrics and a privacy statement has been submitted to Legal for review.

November 13, 2018
 * Second round of data collection to understand Wikipedia citation usage is now ✅

December 14, 2018
 * Analyzing the data is still in progress and will continue through Q3 for paper submissions. Further data collection is awaiting the end of fundraising before we pick it up again; we will begin the analysis when the data is ready.

Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
 * Host the WikiCite 2018 event

Goal(s)

 * Host the WikiCite 2018 event in Berkeley, CA (November 27-29, 2018) ✅

Status
November 2018
 * ✅ We hosted 115 librarians, developers, linked open data experts, and Wikimedia contributors for WikiCite 2018 in Berkeley. See the final program, live-stream, and online conversation from the event.



=Q3 Goals =

Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
 * A map of verifiability of information in Wikimedia projects

Dependencies on: External collaborators

Goal(s)

 * Generate a "map of verifiability" by quantifying citation need across article topics and Wikipedia language editions, adapting the machine learning model developed in Q2 to a multilingual context ✅

Status
January 16, 2019
 * Work is in progress with our collaborators to assess the extensibility of the current model (trained on English Wikipedia) to other languages. The next steps also include identifying the set of languages we want to work with and the topic modeling approach to compare topics across them. We will also hear by the end of January about our paper submission on this line of research from Q2.

February 2019
 * Models have been developed for French and Italian. We will generate the map of verifiability next.

March 2019
 * Discussed...

Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.


 * Research to understand how readers use citations

Goal(s)
Quantitative analysis
 * Perform first round of research to characterize readers' usage of citations based on the features extracted in the past quarter. ✅
 * Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage on medical content ✅

Qualitative analysis
 * Perform first round of survey data collection of reader citation usage on English Wikipedia. ✅
 * Analyze first round survey data of reader citation usage ✅
 * (Conditional on the result of the previous goal) Perform the second round of survey data collection of reader citation usage in one or more Wikipedia languages. ✅

Status
January 16, 2019
 * Qualitative research: the first round of data collection is completed and analysis is currently in progress. Quantitative research: data analysis is underway. Fixes for the data quality issues identified in Q2 are being researched and will be deployed for a second round of data collection between January and February.

February 22, 2019
 * The first two goals are completed and we are working on the last one. Specifically, we're designing a fixed-response survey based on the free-form text responses from round 1. This last goal is also on track and we will deploy the survey in early March.

March 14, 2019


 * Perform first round of research to characterize readers' usage of citations is now ✅
 * Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage is ✅

Outcome 2 / Output 3
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.
 * A public reference event stream

Dependencies on: Analytics, Citoid, Parsoid, Reading Infrastructure, Internet Archive

Goal(s)

 * A working prototype of the stream ✅

Status
January 16, 2019
 * Analysis of the requirements for the MVP of the event stream is ✅.

February 22, 2019
 * This goal is in incredibly good shape. :) We expected to finish all the work for this output in Q4, but we have a working prototype, are addressing some bugs, and expect to wrap up the output fully by the end of Q3.

March 2019
 * The MVP is live and Internet Archive has been using it to archive links. ✅

Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
 * Host the WikiCite 2018 event

Dependencies on: Technical Engagement and Community Programs

Goal(s)

 * Publish the WikiCite 2018 annual report

Status
January 16, 2019
 * We are in the process of preparing a survey for WikiCite participants, with the goal of incorporating their feedback in the annual report to the funder, which is due in February.

February 22, 2019
 * The report preparation is in progress and on track.

March 2019
 * Discussed...



=Q4 Goals =

Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.


 * A map of verifiability of information in Wikimedia projects

Dependencies on: External collaborators

Goal(s)

 * Release the code and models developed so far
 * Finalize documentation to empower others to build on the results and/or expand the work to other languages
 * (stretch) Improve and analyze the data further, time permitting

Status
April 2019
 * Discussed...

May 2019
 * Discussed...

June 2019
 * Discussed...

Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.


 * Research to understand how readers use citations

Goal(s)
Quantitative analysis


 * Deeper research and analysis of the citation usage and user behavior considering user activities in the entire user session.
 * (stretch) A model for categorizing external links

Qualitative analysis


 * Perform the analysis and write the documentation on the second round of survey data about reader citation usage in one or more Wikipedia languages
 * Develop interview protocol and begin interview recruiting for contextual inquiry follow-up study

Dependency on formal collaborators

Status
April 2019


 * Discussed...

May 2019


 * Discussed...

June 2019


 * Discussed...

Outcome 2 / Output 3
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.


 * A public reference event stream

Goal(s)

 * Fix citation stream bugs that affect the minimum viable product ✅

Status
April 2019


 * The major bug has been fixed:.

May 2019


 * Discussed...

June 2019


 * Discussed...