Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity/Goals

From mediawiki.org

Program Goals and Status for FY18/19

  • Goal Owner: Leila Zia
  • Program Goals for FY18/19: Wikimedia sites provide the most trustworthy, comprehensive, neutral information across topics and languages by referencing this information to vetted reliable sources and linking it to external content providers and metadata repositories, making Wikimedia projects the central gateway to access citable information in the knowledge ecosystem.
  • Annual Plan: Segment 1 - Research

Q1

Outcome 1 / Output 1

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects

Goal(s)

  • Design and test an end-to-end machine learning framework to identify statements in need of a citation. Done
  • Improve the taxonomy of reasons why editors add citations to Wikipedia statements Done
  • Design the experiment and collect larger-scale data about reasons why people add citations Done

Status

Note: July 2018

In progress

Note: August 21, 2018

In progress

Note: September 13, 2018

In progress. Details: we expect this goal to be fully done before the end of Q1. The first bullet point is expected to be done by the end of the month. The third bullet point is done, and we have done extensive extra work on it as well. What remains is documentation, which we expect to be done by 2018-09-18.
Update on Sept 18: all goals for this outcome are Done


Outcome 1 / Output 2

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations

Goal(s)

  • Prepare the data and perform a preliminary analysis of the first round of citation usage data gathered via the Citation Usage schema Done
  • Develop a survey to better understand the role of citations in Wikipedia readers' evaluations of Wikipedia articles and to identify opportunities for supporting their learning goals and increasing their digital literacy. Done

Status

Note: July 2018

In progress

Note: August 21, 2018

Data collection is Done; only the documentation remains to be finished. Developing the survey is In progress; more information is in T199188.

Note: September 18, 2018

The survey wording and goals are now Done


Outcome 4 / Output 6

More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Funding the WikiCite event series

Goal(s)

  • Fundraise for the annual meeting in the WikiCite series and a set of satellite events, to improve the sustainability and global reach of the initiative. Done
  • Organize the event, open the application process, and design the program Done

Status

Note: July 2018

Done. Fundraising is completed!

Note: August 22, 2018

In progress. Organizing the event is underway.

Note: September 18, 2018

Done. The selection process is complete, and notifications to applicants are being sent out as of October 1. The chairs of individual days of the event are now collecting information from selected attendees to finalize the agenda.

Q2

Outcome 1 / Output 1

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects

Goal(s)

  • Design a machine learning framework to identify why statements need a citation in English Wikipedia. Done
  • [Stretch] Submit a paper summarizing the modeling work for unsourced statement detection Done

Status

Note: November 13, 2018

This goal was finished up this week, yay!


Outcome 1 / Output 2

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations

Goal(s)

  • Run the second round of data collection to understand Wikipedia citation usage Done
  • Prepare and analyze the data collected in the second round. In progress
  • Perform first round of survey data collection of reader citation usage on English Wikipedia. task T205164 Partially done
  • Analyze first round survey data of reader citation usage task T205165 To do

Status

Note: October 18, 2018

The survey work has been ported to Qualtrics and a privacy statement has been submitted to Legal for review.

Note: November 13, 2018

Second round of data collection to understand Wikipedia citation usage is now Done

Note: December 14, 2018

Analyzing the data is still In progress and will continue through Q3 for paper submissions. Further data collection is on hold until fundraising ends; we will then resume it and begin the analysis once the data is ready.


Outcome 4 / Output 6

More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Host the WikiCite 2018 event

Goal(s)

  • Host the WikiCite 2018 event in Berkeley, CA (November 27-29, 2018) Done

Status

Note: November 2018

Done. We hosted 115 librarians, developers, linked open data experts, and Wikimedia contributors for WikiCite 2018 in Berkeley. See the final program, live-stream, and online conversation from the event.

Q3

Outcome 1 / Output 1

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects

Dependencies on: External collaborators

Goal(s)

  • Generate a "map of verifiability" by quantifying citation need across article topics and Wikipedia language editions, adapting the machine learning model developed in Q2 to a multilingual context. task T213927 Done

Status

Note: January 16, 2019

Work is In progress with our collaborators to assess the extensibility of the current model (trained on English Wikipedia) to other languages. The next steps also include identifying the set of languages we want to work with and the topic modeling approach to compare topics across them. We will also hear by the end of January about our paper submission on this line of research from Q2.

Note: February 2019

Models have been developed for French and Italian. We will generate the map of verifiability next.

Note: March 2019

Done.


Outcome 1 / Output 2

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations

Goal(s)

Quantitative analysis

  • Perform first round of research to characterize readers' usage of citations based on the features extracted in the past quarter. task T212225 Done
  • Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage on medical content task T212937 Done

Qualitative analysis

  • Perform first round of survey data collection of reader citation usage on English Wikipedia. task T205164 Done
  • Analyze first round survey data of reader citation usage task T205165 Done
  • (Conditional on the result of the previous goal) Perform the second round of survey data collection of reader citation usage in one or more Wikipedia languages. Done

Status

Note: January 16, 2019

Qualitative research: the first round of data collection is completed and analysis is currently in progress. Quantitative research: data analysis is underway. Fixes for the data quality issues identified in Q2 are being researched and will be deployed for a second round of data collection between January and February.

Note: February 22, 2019

The first two goals are completed and we are working on the last one. Specifically, we're designing a fixed-response survey based on the free-form text responses from round 1. This last goal is also on track and we will deploy the survey in early March.

Note: March 14, 2019

  • Perform first round of research to characterize readers' usage of citations is now Done
  • Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage is Done


Outcome 2 / Output 3

Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

A public reference event stream

Dependencies on: Analytics, Citoid, Parsoid, Reading Infrastructure, Internet Archive

Goal(s)

Status

Note: January 16, 2019

Analysis of the requirements for the MVP of the event stream is Done.

Note: February 22, 2019

This goal is in incredibly good shape. :) We expected to finish all the work for this output in Q4, but we have a working prototype, are addressing some bugs, and expect to wrap up the output fully by the end of Q3.

Note: March 2019

The MVP is live and Internet Archive has been using it to archive links. Done

Outcome 4 / Output 6

More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Host the WikiCite 2018 event

Dependencies on: Technical Engagement and Community Programs

Goal(s)

Status

Note: January 16, 2019

We are in the process of preparing a survey for WikiCite participants, with the goal of incorporating their feedback in the annual report to the funder, which is due in February.

Note: February 22, 2019

The report preparation is In progress and on track.

Note: March 2019

The goal is now tracked under Community Engagement as the Principal Investigator for WikiCite changed from Research to CE. The PI has learned that the deadline for the report is May 2019; because of the transition in Research, medium-term planning, and annual planning work, completion of the report has been pushed to May 2019 (Q4).

Q4


Outcome 1 / Output 1

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects

Dependencies on: External collaborators

Goal(s)

  • Release the code and models developed so far task T221006 Done
  • Finalize documentation to empower others to build on the results and/or expand the work to other languages task T221009 Done
  • (stretch) Time permitting, improve and analyze the data further task T221005 Declined

Status

Note: April 2019

Work is in progress and we expect to be able to meet the goals set.

To do: May 2019

Discussed...

To do: June 2019

Discussed...

Outcome 1 / Output 2

Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations

Goal(s)

Quantitative analysis

  • Deeper research and analysis of the citation usage and user behavior considering user activities in the entire user session. task T212225 Done
  • (stretch) A model for categorizing external links Postponed

Qualitative analysis

  • Perform the analysis and write the documentation on the second round of survey data about reader citation usage in one or more Wikipedia languages To do
  • Develop interview protocol and begin interview recruiting for contextual inquiry follow-up study In progress

Dependency on formal collaborators

Status

Note: April 2019

Work in progress.

To do: May 2019

Discussed...

To do: June 2019

Discussed...

Outcome 2 / Output 3

Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

A public reference event stream

Goal(s)

  • Fix citation stream bugs that affect the minimum viable product Done

Status

Note: April 2019

The major bug has been fixed: task T216249.

To do: May 2019

Discussed...

To do: June 2019

Discussed...

Outcome 4 / Output 6

More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Host the WikiCite 2018 event

Dependencies on: Technical Engagement and Community Programs

Goal(s)

Status

Note: April 2019

As described in the last update from Q3, this goal is now scheduled to be accomplished by the end of May 2019.

To do: May 2019

Discussed...

To do: June 2019

The report has been prepared and delivered to the funder. The organizing committee is reviewing the report and iterating on it before it is publicly released.