Wikimedia Discovery/FDC Proposal

The Wikimedia movement has content discovery issues. We have a wealth of content in Wikipedia, Commons, Wiktionary, and our other projects that users cannot easily find. Discovery’s opportunity is to improve content discovery mechanisms, connecting the dots and building bridges between projects, so that users are exposed to the wealth of relevant content that exists but that they cannot currently find.

Discovery’s focus in FY 2015/16 was to measure and improve the fundamentals of search, and experiment with other content discovery mechanisms such as maps and complex queries to Wikidata. To that end, we launched the completion suggester in production on all wikis, piloted improvements to the search experience on wikipedia.org, launched experimental versions of a maps tile service and query service for Wikidata, and migrated all Wikivoyages to the new maps service.

In FY 2016/17, the Discovery Department's strategic direction will focus on improving search and content discovery on Wikimedia projects. We will do this by improving our user satisfaction KPI across all Wikimedia projects, focusing on surfacing imagery, maps, metadata, and sibling project content in on-wiki and www.wikipedia.org search results. We will also continue to explore new experimental mechanisms for content creation and discovery that encourage community participation and innovation.

Strategic: Focus area 1
Improve Search on Wikimedia projects

Project description
Improve our search result relevancy by incorporating new heuristics that affect result ranking, surface cross-project and cross-language results, and explore Commons media search.

Reasoning

 * We have a wealth of content available in other languages/projects/media that is not exposed; we should expose it!
 * Users should not have to rely on external services to search our site; we know our content better than anyone else
 * GLAM institutions contribute a significant amount of content to Commons, yet Commons has one of the worst search experiences

Dependencies

 * Continued support of a community liaison assigned at 100% time to Discovery to facilitate an increase in communication that otherwise would not be sustainable under Comms
 * Continued support of an agile coach assigned at 100% time to Discovery to maintain and evolve team practices in Discovery, ensuring optimal delivery of user value
 * Technical Operations: maintaining Wikimedia Labs, which our dashboarding runs on, so that it does not incur additional load on our operations hire
 * Wikimedia DE: A commitment to structured data on Commons so that we don't have to develop an interim data pipeline

Milestones/Results/Impact

 * Significantly improve the user experience of on-wiki search results by overhauling the user interface to include imagery and metadata
 * Push changes to turn www.wikipedia.org into a portal for exploring content across all Wikimedia projects and languages
 * Expose relevant results from other languages and projects to users who search on-wiki
 * Explore incorporating structured data into relevance rankings across projects such as Wikipedia and Commons

Foundation impact

 * Readers and editors can discover relevant content more easily
 * Less reliance on external services to search Wikipedia
 * Sibling project content is exposed more consistently, encouraging broader participation

KPIs

 * Search user engagement
 * Zero results rate
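
As an illustration of the second KPI, the zero results rate can be read as the share of searches that return no results at all. The following is a minimal sketch under that assumption; the `SearchEvent` record, its field names, and the sample queries are hypothetical and do not reflect Discovery's actual logging schema or dashboards.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One logged search request (hypothetical schema, for illustration only)."""
    query: str
    hits: int  # number of results returned for the query


def zero_results_rate(events):
    """Return the fraction of searches that returned no results."""
    if not events:
        return 0.0
    zero = sum(1 for e in events if e.hits == 0)
    return zero / len(events)


# Made-up sample data: two of the four searches return nothing.
sample = [
    SearchEvent("barack obama", 120),
    SearchEvent("barak obma", 0),
    SearchEvent("eiffel tower", 45),
    SearchEvent("asdfghjkl", 0),
]

rate = zero_results_rate(sample)
print(f"zero results rate: {rate:.0%}")
```

A lower rate is better: surfacing cross-language and cross-project results would be expected to turn some of the zero-hit searches into searches with at least one result.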

Strategic: Focus area 2
Evolve Content Discovery on Wikimedia Projects

Project description
Explore and implement new ways for users to create and discover content

Reasoning

 * A picture is worth a thousand words. But if we do not simplify and encourage rich media, our mission of educating will not be as successful.
 * Users expect and request new ways of using rich media to find, interact with, and contribute content.

Dependencies

 * Hardware for Maps to move from experimental to production
 * Consistent and extensive community engagement to evaluate and focus our efforts
 * Reading: Engagement of front end and design resources to improve maps and graphs

Milestones/Results/Impact

 * Launch maps service for Wikipedia
 * Empower the community to surface more content based on maps, Wikidata, and other data sources
 * Create more ways to interlink content across projects, allowing easier content discovery

Movement impact

 * Technologies that allow improvements to content quality and increase educational value, fulfilling our primary goal
 * Increases goodwill: these new capabilities correspond directly to community and readership requests

KPIs

 * Map usage
 * Graph usage
 * "Discoverability" increase - number of links followed due to new technology
 * Increased usage of WDQS

Core: Focus area 1
Split ElasticSearch cluster

Project description
Split the ElasticSearch cluster into two separate clusters, one for "big wikis" and one for "small wikis", to improve performance and support future scaling

Rationale

 * Our ElasticSearch cluster is monolithic, which will make it harder to scale up to meet increased server load and demand

Dependencies

 * Technical Operations: data centre ops racking up servers
 * Technical Operations: coordination with Discovery Operations to maintain cluster

Milestones/Results/Impact

 * Maintain 99th percentile load time below 1.1s for prefix search, 4s for full text search, 2s for "more like", and 2s for geodata
 * Maintain an n+1 configuration of ElasticSearch clusters such that, if one data centre is unavailable, the remaining data centre can serve the traffic load whilst maintaining the 99th percentile targets
 * Rack up additional server capacity for Q2 in preparation for entering full service in Q3
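
The latency milestone above can be checked mechanically against a set of measured load times. The sketch below is one way to do that, assuming the nearest-rank definition of a percentile (the proposal does not say which definition is used); the query-type keys and the sample timings are invented for illustration, and only the four targets come from the milestone itself.

```python
# 99th percentile targets in seconds, taken from the milestone above.
# The dictionary keys are hypothetical labels for the four query types.
P99_TARGETS = {
    "prefix": 1.1,
    "full_text": 4.0,
    "more_like": 2.0,
    "geodata": 2.0,
}


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def check_targets(latencies_by_type):
    """Map each query type to True if its 99th percentile is below target."""
    return {
        qtype: percentile(samples, 99) < P99_TARGETS[qtype]
        for qtype, samples in latencies_by_type.items()
    }


# Made-up load times in seconds, 100 samples per query type.
latencies = {
    "prefix": [0.05] * 99 + [0.9],
    "full_text": [0.4] * 98 + [4.5, 5.0],
}

result = check_targets(latencies)
for qtype, ok in result.items():
    print(qtype, "meets target" if ok else "misses target")
```

With these samples a single slow prefix search does not move the 99th percentile, whereas two slow full text searches out of 100 are enough to miss the 4s target.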

Movement impact

 * Maintain existing functionality with expected latency
 * Support smaller wikis better with more dedicated hardware
 * Better tolerance and resilience to outages

KPIs

 * Search load times
 * Search API usage

Core: Focus area 2
Expand capacity of ElasticSearch cluster

Project description
Scale and expand Core services (WDQS and ElasticSearch) to meet usage demands

Rationale

 * Demand on the ElasticSearch cluster has increased significantly due to extra functionality such as "more like"; search is critical to content discovery and should be well supported.

Dependencies

 * Technical Operations: data centre ops racking up servers
 * Technical Operations: coordination with Discovery Operations to maintain cluster

Milestones/Results/Impact

 * Improve our ability to monitor, alert on, and understand what is happening in the clusters
 * Improve performance of APIs
 * Improve documentation of APIs

Foundation impact

 * More consumers can use "more like" functionality
 * Existing consumers can continue to depend on "more like" functionality
 * Cleaner and better documented interfaces will increase usage

KPIs

 * Search load times
 * Search API usage