Wikimedia Search Platform

The Search Platform team (part of Wikimedia Technology) is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.

Current work by this team is tracked on this Phabricator workboard. The public Search Analytics Dashboard monitors and analyzes the impact of our efforts, and the External Search Traffic dashboard looks very broadly at where our requests are coming from.

The Search Platform team was formerly part of the Discovery Department in Audiences. As part of the re-organization (tune-up) of June 2017, it became part of Technology.
 * Pages of historical note:
 * * Discovery Department (April 2015 - December 2017)
 * * Search (prior to April 2015)

Search
Current Goals (FY 2017-18 Q3)
 * Objective 1: Implement advanced methodologies such as “learning to rank” machine learning techniques and signals to improve search result relevance across language Wikipedias.
 * * Create and test advanced parser features
 * * Evaluate and build new features for machine learning pipeline (T162279)
 * * Begin to build relationships with external information retrieval researchers
 * * Category search (keywords for sub-category searching)


 * Objective 2: Improve support for multiple languages by researching and deploying new language analyzers as they make sense to individual language wikis.
 * * Continue to investigate morphological libraries for Elasticsearch plugins
 * * Implement Serbian, investigate Slovak
 * * Improve search by using fuzzy (phonetic) language matching
 * * Continue general language support
 * * Investigate language analyzer config options


 * Objective 3: Investigate how to expand and scale Wikidata Query Service to improve its ability to power features on-wiki for readers
 * * Acquire and productionize six new servers for WDQS (see T178548)
 * * Set up individual internal and external service endpoints with enhanced features for expert users


 * Address technical debt:
 * * Elasticsearch 5.6/Logstash 5.6/Kibana 5.6 (ELK stack)
 * * Maintain APIs
 * * Translation extension

Structured Data on Commons
Current Goals (FY 2017-18 Q3)
 * Objective 1: Commons search will be extended via CirrusSearch, Elasticsearch, and Wikidata Query Service to support searching based on structured data elements describing media.
 * * Search for file captions, including multilingual support (there will be multilingual file captions, there might be file summaries, and the designs, once received, might imply additional related functionality); also general design for search on FE


 * Objective 2: Advanced search capabilities (e.g., Wikidata Query Service, SPARQL queries) will be updated to support the more specific media search filters and the relationships to the topics they represent
 * * Upgrade and re-implement full-text search on Elasticsearch on Wikidata
 * * Investigate using MCR with Wikidata

Wikidata Query Service (WDQS)
Searching structured data on Wikidata is an integral part of Discovery's work on building the Wikidata Query Service, which provides a SPARQL API through which tools can access Wikidata. Learn more about the Wikidata Query Service. Our current work is tracked on this Phabricator workboard, and weekly deployments of WDQS are documented on Deployments. A public WDQS Analytics Dashboard is used to monitor and analyze the impact of our efforts.
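As a rough illustration of how a tool might call the SPARQL API, the sketch below builds a GET request URL for a small query. The endpoint URL, the `query` and `format` parameters, and the helper name `build_wdqs_url` are assumptions made for illustration and are not taken from this page; check the WDQS documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed public endpoint; this page does not state it.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: five items that are an instance of (P31) house cat (Q146),
# with English labels supplied by the label service.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_wdqs_url(sparql: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL requesting JSON results (hypothetical helper)."""
    params = {"query": sparql.strip(), "format": "json"}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetching it is left to the HTTP client of your choice.
    print(build_wdqs_url(EXAMPLE_QUERY))
```

The helper only constructs the URL, so it can be inspected or logged before any request is sent.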

APIs
Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.

API:Search and discovery lists the search APIs available and in development. View our public API Analytics Dashboard to monitor and analyze the impact of our efforts.
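To give a sense of what calling a search API looks like, here is a minimal sketch that builds a full-text search request against the MediaWiki Action API. The wiki URL, the `list=search` module with its `srsearch` and `srlimit` parameters, and the helper name `build_search_url` are assumptions made for illustration rather than details from this page; API:Search and discovery remains the reference for what is actually available.

```python
from urllib.parse import urlencode

# Assumed Action API entry point for English Wikipedia (illustrative only).
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(terms: str, limit: int = 5,
                     endpoint: str = API_ENDPOINT) -> str:
    """Return a GET URL for a full-text search (hypothetical helper)."""
    params = {
        "action": "query",   # use the query action
        "list": "search",    # full-text search module
        "srsearch": terms,   # the search string
        "srlimit": limit,    # maximum number of results to return
        "format": "json",    # ask for a JSON response
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("learning to rank"))
```

Keeping the parameters in a dictionary makes it easy to add further search options later without touching the URL-building logic.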

The team
This list was last updated on December 12th, 2017.


 * Erik Bernhardson, Tech Lead, Software Engineer
 * David Causse, Software Engineer
 * Trey Jones, Software Engineer
 * Stas Malyshev, Software Engineer
 * Guillaume Lederrey, Operations Engineer
 * Erika Bjune, Engineering Manager
 * Deb Tankersley, Product Owner
 * Chris Koerner, Community Liaison

Weekly status updates
See Discovery weekly status updates for the archive of past team updates (Subscribe)

Mailing lists
Search Platform - A public mailing list about the Wikimedia Search Platform team and projects (formerly Discovery Department). Examples of topics would include:
 * Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
 * Technical discussions and brainstorming regarding our work:
 * * Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
 * * Our dashboards or related analysis
 * Other team news, such as changes to team structure, significant changes to processes, changes in how we use Phabricator or other tools like Gerrit

Meetup groups

 * San Francisco
 * * Directly relevant
 * * * Bay Area NLP (natural language processing, not neuro-linguistic programming)
 * * * San Francisco text
 * * * Elasticsearch San Francisco
 * * Indirectly related (these sorts of meetup groups attract smart/enthusiastic people who like to spend their free time learning and solving problems)
 * * * Silicon Valley Java user group
 * * * San Francisco PHP
 * * * Bay Area Haskell users group
 * * * Scala study group
 * * * SF JavaScript
 * * * Oakland advanced Scala study group

Process
The Search Platform team uses a "scrumban" process, which is a hybrid of Scrum and Kanban. It is described here: Search Platform Process.

Upcoming events

 * Hackathon 2018 - May 18-20, 2018
 * Wikimania 2018 - July 18-22, 2018
 * 17th International Semantic Web Conference (ISWC 2018) - October 8-12, 2018

Past events

 * All Hands - January 2018

Data Analysis
The guidelines for data access and analysis are documented on Meta. They cover how the Search Platform team works with data sources and how other teams work with Search Platform data sources.
 * Search Analytics Dashboard
 * Wikidata Query Service Analytics Dashboard
 * API Analytics Dashboard
 * External Traffic Analytics Dashboard

Deployers
Useful reference for who can deploy code. It's nice to know whom to bug if you need something:

Code
The Discovery team supports the following code: