Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity

From mediawiki.org
Program Manager: Jake Orlowitz
Executive Sponsor (C-level): Erika Bjune

CDP Goals and Outcomes

Annual Plan FY18-19 topline goals
#2) Knowledge as a Service - increase reach

How does your program affect the annual plan topline goal:

Facts matter, but they do not live alone. Facts are part of a chain of verifiability that gives every person the tools to learn how we know what we know. References to reliable sources are the foundation of Wikipedia’s trustworthiness and of its broad adoption as a global source of knowledge. The Wikimedia movement can only fulfil its mission of distributing free knowledge globally and effectively if it acts as a gateway for readers to reach the reliable sources that underpin our content. To support this goal, we need an infrastructure built on fact provenance, open citation standards, and interoperability, one that empowers readers with the best possible tools for consuming information critically. The yearlong plan described below will lay the foundations for this longer arc of work: strengthening our reference infrastructure, expanding our network of knowledge partners, building public awareness of how Wikimedians vet facts and sources, and centering Wikimedia in the pressing conversation about combating misinformation. Together, this program sets a course to establish Wikimedia as the backbone of the trustworthy web while extending Wikimedia’s reach into the broader knowledge ecosystem.

Program Goal
Wikimedia sites provide the most trustworthy, comprehensive, and neutral information across topics and languages by citing vetted, reliable sources for that information and linking it to external content providers and metadata repositories, making Wikimedia projects the central gateway to citable information in the knowledge ecosystem.
Outcome 1 (Research)
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
Outcome 2 (Infrastructure and tooling)
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.
Outcome 3 (Access and preservation)
Resources cited across Wikimedia projects are accessible to readers in perpetuity, thanks to technical partnerships securing their preservation and digitization.
Outcome 4 (Outreach)
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
Outcome 5 (Awareness)
The public has increased awareness and understanding of the processes Wikimedians follow to verify and fact-check information, and of the benefits of open, auditable, linked information ecosystems.

CDP Targets

Outcome 1 (Research)
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Target 1:
  • Output 1 & 2: A study on the state of sourcing and verifiability of Wikimedia projects is delivered. Measurement method: availability of the study.

Outcome 2 (Infrastructure and tooling)
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

Target 2:
  • Output 3: An event stream tracking the creation and modification of external links and references across Wikimedia projects is delivered. Measurement method: availability of the service.
  • Output 4: A study on the impact of algorithmic methods on citation gaps is delivered. Measurement method: availability of the study.
  • Output 5 & 11: End-to-end integration of Citoid in Wikidata. Measurement method: the integration is completed.

Outcome 3 (Access and preservation)
Resources cited across Wikimedia projects are accessible to readers in perpetuity, thanks to technical partnerships securing their preservation and digitization.

Target 3:
  • Output 7: Compared to last year, extend link archiving to all Wikipedia language editions. Measurement method: link archiving via InternetArchiveBot is currently active on only 12 projects.
  • Output 7: Reduce the time between link creation and link archiving to less than 1 minute. Measurement method: we will measure the lag using the new event stream.
  • Output 7: Increase by 50% the number of links recovered on Wikipedia editions other than English. Measurement method: links recovered to date, by project, are:
      • als: 935
      • bar: 6,588
      • ckb: 1,359
      • en: 2,776,866
      • es: 129,276
      • it: 107,646
      • nl: 86,559
      • no: 139,735
      • ru: 173,380
      • species: 31,766
      • sv: 201,660
      • zh: 325,676
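To make the 50% recovery target concrete, the short sketch below derives a baseline from the per-project counts above and raises it by 50%. It assumes the target applies to the sum over non-English Wikipedia editions and leaves out Wikispecies (`species`), which is not a Wikipedia; if Wikispecies were meant to count, the baseline would be 1,204,580 instead. The helper names `non_english_baseline` and `target_after_increase` are illustrative only and not part of any Wikimedia tool.

```python
# Links recovered to date by InternetArchiveBot, per project (from the list above).
recovered = {
    "als": 935, "bar": 6_588, "ckb": 1_359, "en": 2_776_866,
    "es": 129_276, "it": 107_646, "nl": 86_559, "no": 139_735,
    "ru": 173_380, "species": 31_766, "sv": 201_660, "zh": 325_676,
}

# Assumption: "Wikipedia outside of the English language edition" excludes
# English and Wikispecies ("species"), which is not a Wikipedia.
NON_WIKIPEDIA = {"species"}


def non_english_baseline(counts: dict[str, int]) -> int:
    """Sum recovered links over non-English Wikipedia editions."""
    return sum(n for code, n in counts.items()
               if code != "en" and code not in NON_WIKIPEDIA)


def target_after_increase(baseline: int, pct: int) -> int:
    """Return the baseline raised by pct percent, rounded up to a whole link."""
    return -(-baseline * (100 + pct) // 100)  # integer ceiling division


baseline = non_english_baseline(recovered)
target = target_after_increase(baseline, 50)
print(f"baseline={baseline:,} target={target:,}")
```

Run as-is, it prints `baseline=1,172,814 target=1,759,221`, i.e. about 586,400 additional links to recover across the non-English Wikipedias listed.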

Outcome 4 (Outreach)
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Target 4:
  • Output 6 & 8: Increase attendance at WikiCite outreach through satellite events by 50% over the current participation baseline. Measurement method: historical attendance statistics.
  • Output 9: #1Lib1Ref increases the number of edits from librarians by 50% and the number of editors by 25%, and 5 more languages participate and are tracked. OAbot raises the number of links added and increases unique participants by 50%, and 2 non-English Wikipedias participate. Measurement method: use the #Hashtag Tools data dump to compare #1Lib1Ref participants and contributions with last year's, and do the same with #OAbot’s internal data.

Outcome 5 (Awareness)
The public has increased awareness and understanding of the processes Wikimedians follow to verify and fact-check information, and of the benefits of open, auditable, linked information ecosystems.

Target 5:
  • Output 10: 10,000 people read or receive blog posts, tweets, or presentations about Knowledge Integrity, Wikipedia’s reliability, citation infrastructure, and fact-checking power, and 4 high-profile press stories draw on Knowledge Integrity narratives. Measurement method: WMF Blog, Medium.com, and Twitter.com statistics; conference attendance numbers; press coverage.

CDP Budget Segment 1

Team: Research (Technology)
Outcome 1 (Research)
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
Output 1: A map of verifiability of information in Wikimedia projects
We will conduct and publish research mapping the “state of verifiability” of free knowledge: which content in Wikipedia and Wikidata is unsourced or in need of citations, and which sources cited across Wikimedia projects are accessible to the general public.
Output 2: Research to understand how readers use citations
We will combine quantitative and qualitative analysis to understand how readers use citations and to identify information-quality and sourcing gaps. The aim is to determine to what degree readers’ learning goals are met by Wikimedia content alone, and when they instead require references and links to external resources.
Outcome 2 (Infrastructure and tooling)
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.
Output 3: A public reference event stream
We will create a robust, real-time event stream tracking the creation and modification of external links and references across Wikimedia projects. This data stream will give tool developers, content partners, and other data consumers (libraries and GLAM institutions, metadata organizations, researchers, altmetrics providers) a canonical data source for tracking and contributing to the sourcing work of Wikimedia volunteers. It is a dependency for the link rot initiative (Segment 2 (Programs) • Output 7).
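As an illustration of how a data consumer might use such a stream, the sketch below reduces a batch of link-change events to the external links newly added on each wiki, for example to queue them for archiving. The event shape (`wiki`, `page_title`, and `added_links` entries with `url` and `external` fields) is a hypothetical schema invented for this example; the plan does not define the stream's format, and a real consumer would read events incrementally rather than from an in-memory list.

```python
from collections import defaultdict
from typing import Any, Iterable

# Hypothetical event shape, for illustration only; the plan does not
# specify the stream's schema.
Event = dict[str, Any]


def new_external_links(events: Iterable[Event]) -> dict[str, list[str]]:
    """Group newly added external URLs by wiki, skipping internal links."""
    by_wiki: dict[str, list[str]] = defaultdict(list)
    for event in events:
        for link in event.get("added_links", []):
            if link.get("external"):
                by_wiki[event["wiki"]].append(link["url"])
    return dict(by_wiki)


sample = [
    {"wiki": "enwiki", "page_title": "Example",
     "added_links": [{"url": "https://example.org/report", "external": True},
                     {"url": "Main_Page", "external": False}]},
    {"wiki": "itwiki", "page_title": "Esempio",
     "added_links": [{"url": "https://example.org/articolo", "external": True}]},
    {"wiki": "enwiki", "page_title": "Other", "added_links": []},
]

print(new_external_links(sample))
```

Grouping by wiki keeps per-language reporting simple, which matches the plan's interest in tracking archiving coverage and lag language by language.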
Output 4: Smarter tools and recommender systems to add citations
We will use algorithmically generated recommendations to improve tools that identify unsourced statements, such as Citation Hunt, which outreach events and campaign organizers rely on heavily. We will continue to develop and test algorithmic methods that help volunteers identify and fill citation gaps, amplifying the reach and impact of initiatives such as 1Lib1Ref and extending the pilot we conducted in FY 2018.
Output 5: More usable interfaces to source Wikidata statements
We will conduct research on integrating Citoid in Wikidata, aiming to drastically reduce the number of unsourced Wikidata statements at risk of deletion and to facilitate their reuse across other Wikimedia projects (dependent on Segment 3 (WMDE) • Output 11).
Outcome 4 (Outreach)
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
Output 6: Funding the WikiCite event series
Fundraise for the annual WikiCite meeting and a set of satellite events, to improve the sustainability and global reach of the initiative.

Resources

Baseline

  • 1.75 FTE (4 researchers, 1 software engineer – Output 1 & 4)
  • 0.25 FTE (1 design researcher, 1 software engineer – Output 2 & 5)
  • 0.50 FTE (1 software engineer – Output 3)

Other costs

  • 100K (WikiCite 2018 restricted grant, year 1 – Output 6)

Dependencies

  • Analytics, Reading Infrastructure, Services (Output 3)
  • Audiences, Wikimedia Deutschland (Output 5)
  • Programs (CE) (Output 1, 4 & 5)
  • Advancement (Output 6)

CDP Budget Segment 2

Team: Programs (Community Engagement)
Outcome 3 (Access and preservation)
Resources cited across Wikimedia projects are accessible to readers in perpetuity, thanks to technical partnerships securing their preservation and digitization.
Output 7: Initiatives to prevent link rot
We will deepen our partnership with the Internet Archive to facilitate the immediate and widespread caching of resources linked from Wikimedia projects, and to prioritize external efforts to digitize sources cited in Wikimedia projects.
Outcome 4 (Outreach)
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
Output 8: Hosting the WikiCite event series
Host the annual WikiCite gathering and extend promotion efforts over a broader timeframe to include a set of satellite events in order to improve the sustainability and global reach of the event.
Output 9: Contribution campaigns
1Lib1Ref runs twice a year in January and May (instead of once) and is expanded to include references for statements on Wikidata; the second #OAbot campaign during Open Access Week adds more links to free-to-read versions alongside closed access sources.
Outcome 5 (Awareness)
The public has increased awareness and understanding of the processes Wikimedians follow to verify and fact-check information, and of the benefits of open, auditable, linked information ecosystems.
Output 10: An audience map and communication plan
We will establish an audience map of our ecosystem and develop a tailored communication strategy for each audience we want to reach. Strategies may include writing for our blog and for other outlets, social media, and an events messaging strategy (the story we tell each audience), which could include a blog series, Twitter tactics, and presentations at key conferences and events.

Resources

Baseline

  • 0.5 FTE (Program manager - supports all outputs, lead on 7, 8, & 9)
  • 0.25 FTE (1 Library Specialist – Output 7 & 9)
  • 0.1 FTE (Lead Programs Manager - Output 6, 7, 8, 9)
  • 100 hrs (Contractor - Output 9)

Growth

  • 0.25 FTE (1 Product and Metrics Analyst – Output 6 & 8)

Dependencies

  • Analytics, Reading Infrastructure, Services (Output 3 & 4)
  • Audiences, Wikimedia Deutschland (Output 5)
  • Communications (Output 9 & 10)
  • Research, Advancement (Output 6)

CDP Budget Segment 3

Team: Wikimedia Deutschland
Outcome 2 (Infrastructure and tooling)
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.
Output 11: More usable interfaces to source Wikidata statements
We will work on the integration of Citoid in Wikidata, aiming to drastically reduce the number of unsourced statements in Wikidata at risk of deletion and to facilitate their reuse across other Wikimedia projects.

Resources

(Output 5)

Baseline

Experimental Citoid integration in Wikidata:

  • 0.25 FTE (contracts) on WMF
  • 0.25 FTE on WMDE

Growth

Full Citoid integration in Wikidata (assuming additional funding):

  • 0.25 FTE (contracts) on WMF for maintenance and consulting
  • 2 FTE on WMDE (development, design, PM)

Dependencies

  • Design Research support from Research (see Segment 1 (Research) • Output 5)