Wikimedia Search Platform
The Search Platform team (part of Wikimedia Technology) is responsible for maintaining and enhancing the Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.
The public Search Analytics Dashboard monitors and analyzes the impact of our efforts, and the External Search Traffic dashboard looks very broadly at where our requests come from. Please note that these dashboards have not been updated since September 2019 and are kept only for historical purposes.
- Pages of historical note:
Our mission is to help people easily discover knowledge on Wikipedia and its sister projects by providing tools and infrastructure for casual readers and expert users with precise needs, while maintaining a strong emphasis on privacy.
- We operate and maintain a disparate collection of production services related to content discovery, enabling the wiki community to find information that is not available through simply following links. We also provide a platform on which other people can create tools to support editing and other workflows.
- We provide an open-source search engine, backed by an inverted index, for unstructured on-wiki data. We work to develop more sophisticated searching with machine learning and natural language processing.
- We provide a SPARQL-based query service for Wikidata, encouraging users to capitalize on this vast store of computer-readable structured data for use on-wiki and in knowledge discovery.
- We endeavor to support underserved wiki communities, and we rely on those communities to help us understand their needs and evaluate potential solutions, especially with respect to underserved languages.
- We prioritize privacy for logged-in users and anonymity for logged-out users over almost everything else, even when it slows down or complicates development or hinders our ability to collect or use data.
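The inverted index mentioned in the list above can be illustrated with a toy sketch. This is not how CirrusSearch or Elasticsearch is implemented; the function names (`tokenize`, `build_inverted_index`, `search`) and the sample pages are hypothetical and exist only to show the idea of mapping each word to the documents that contain it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical page ids and made-up text, for illustration only.
pages = {
    "A": "Search engines use an inverted index",
    "B": "Wikidata stores structured data",
    "C": "An index of structured search data",
}
page_index = build_inverted_index(pages)
matches = search(page_index, "structured data")
```

A production search engine adds ranking, language analysis, and on-disk storage on top of this lookup structure; the sketch covers only the lookup itself.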
The Search Platform team's goals for FY 2019/20 are part of the Technology Department's overall goals.
Wikidata Query Service (WDQS)
Searching structured data on Wikidata is an integral part of Discovery's work, which includes building the Wikidata Query Service. The service provides an API through which tools can access Wikidata. Learn more about the Wikidata Query Service. Our current work is tracked on this Phabricator workboard, and weekly deployments of WDQS are documented on wikitech:Deployments. A public WDQS Analytics Dashboard is used to monitor and analyze the impact of our efforts.
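As a minimal sketch of how a tool might use the query service API, the snippet below builds a request URL for a SPARQL query and reads values out of a result in the standard SPARQL JSON results layout. The endpoint URL, the `format=json` parameter, and the example query are assumptions that do not come from this page, and the helper names `build_wdqs_url` and `extract_values` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint; it is not stated on this page.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_url(sparql, endpoint=WDQS_ENDPOINT):
    """Build a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql, "format": "json"})


def extract_values(result, variable):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    bindings = result["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Illustrative query: up to five items that are instances of "house cat".
EXAMPLE_QUERY = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5"
example_url = build_wdqs_url(EXAMPLE_QUERY)

# Hand-written sample in the SPARQL JSON results layout (not real output).
sample_result = {
    "head": {"vars": ["item"]},
    "results": {
        "bindings": [
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1"}},
            {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2"}},
        ]
    },
}
sample_items = extract_values(sample_result, "item")
```

To run the query for real, the URL could be fetched with any HTTP client; sending a descriptive User-Agent header is generally expected of tools that call Wikimedia services.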
Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.
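For example, a developer could run a full-text search through the MediaWiki Action API. The sketch below only builds the request URL and parses a response; the endpoint (English Wikipedia's `api.php`), the parameter choices, and the helper names `build_search_url` and `titles_from_response` are assumptions made for illustration and are not taken from this page.

```python
from urllib.parse import urlencode

# Assumed endpoint; each wiki exposes its own api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, limit=5, endpoint=API_ENDPOINT):
    """Build a URL for a full-text search request (action=query, list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def titles_from_response(response):
    """Return the page titles from a decoded search response, if any."""
    hits = response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits]


search_url = build_search_url("inverted index", limit=3)

# Hand-written sample shaped like a search response (not real output).
sample_response = {"query": {"search": [{"title": "Inverted index"}]}}
sample_titles = titles_from_response(sample_response)
```

Keeping URL construction and response parsing separate from the network call makes both steps easy to test without contacting a live wiki.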
This list was last updated on May 11th, 2020.
- Erik Bernhardson, Tech Lead, Senior Software Engineer
- David Causse, Software Engineer
- Trey Jones, Senior Software Engineer
- Zbyszko Papierski, Senior Software Engineer
- Maryum Styles, Associate Software Engineer
- Guillaume Lederrey, Engineering Manager
- Deb Tankersley, Product Manager/Advisor
- Carly Bogen, Program Manager
- Ryan Kemper, Site Reliability Engineer
Weekly status updates
Search Platform - A public mailing list about the Wikimedia Search Platform team and projects (formerly Discovery Department). Examples of topics would include:
- Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
- Technical discussions and brainstorming regarding our work:
- Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
- Our dashboards or related analysis
- Other team news, such as changes to team structure, significant changes to processes, changes in how we use Phabricator or other tools like Gerrit
- San Francisco
- Directly relevant
- Indirectly related (these sorts of meetup groups attract smart/enthusiastic people who like to spend their free time learning and solving problems)
The Search Platform team uses a "scrumban" process, which is a hybrid of Scrum and Kanban. It is described here: Search Platform Process.
Conferences, gatherings, and other events
- May 2020, Hackathon, Tirana, Albania
- All Hands - January 2018
- Hackathon 2018 - May 18-20, 2018
- Wikimania 2018 - July 18-22, 2018
- 17th International Semantic Web Conference (ISWC 2018) - October 8-12, 2018
- October 22 - 25, 2018, Wikimedia Technical Conference (WMTechConf, formerly known as DevSummit) in Portland, Oregon
- Late January / early February 2019, All-Hands, San Francisco
- May 2019, Hackathon, Prague
- Late January / early February 2020, All-Hands, San Francisco
Data Analysis Archive
- Search Analytics Dashboard
- Wikidata Query Service Analytics Dashboard
- API Analytics Dashboard
- External Traffic Analytics Dashboard
The data access and analysis guidelines that the Search Platform team follows for its data sources, and that other teams follow when working with Search Platform data sources, are documented on Meta. Please note that the dashboards listed above have not been updated since September 2019 and are kept only for historical purposes.
Useful reference for who can deploy code. It's nice to know whom to bug if you need something:
The Discovery team supports the following code:
|Wikidata Query Service||https://phabricator.wikimedia.org/diffusion/WDQR/|
|Wikidata Query Service GUI||https://phabricator.wikimedia.org/diffusion/WDQG/|
|WDQS GUI deployment|
|Lucene Explain Parser||https://phabricator.wikimedia.org/diffusion/WLEP/|