The mission of the Discovery Department of Wikimedia Engineering is to make the wealth of knowledge and content in the Wikimedia projects easily discoverable. The projects detailed below focus on creating and supporting new forms of discovery.
- 1 Projects
- 1.1 Search
- 1.2 Wikipedia.org portal
- 1.3 Maps
- 1.4 Wikidata Query Service (WDQS)
- 1.5 Analysis
- 1.6 APIs
- 1.7 Other
- 1.8 The team
- 1.9 Communications
- 1.10 Process
- 1.11 Conferences, gatherings, and other events
- 1.12 Updates
- 1.13 Archives
- 1.14 Data Analysis
- 1.15 Deployers
- 1.16 Code
Discovery is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.
Learn more about Search and the current work of the team. Current work is tracked on this Phabricator workboard and on the public Search Analytics Dashboard, which monitors and analyzes the impact of our efforts, as well as on the External Search Traffic dashboard, which looks very broadly at where our requests are coming from.
- Implement advanced methodologies such as “learning to rank” machine learning techniques and signals to improve search result relevance across language Wikipedias.
- Perform load tests and A/B tests on new models to make sure they can be safely deployed to production
- When ready, deploy newly automated models which match (at a minimum) current performance of manually-configured search result relevance
- Improve support for multiple languages by researching and deploying new language analyzers as they make sense to individual language wikis.
- Perform research spikes to find new analyzers for different languages
- Test new analyzers to see if they are improvements (Japanese and Vietnamese)
- Deploy new / updated analyzers
- Deploy analyzers in progress from last quarter (Hebrew)
- Finish up testing of the Explore Similar feature on search results page
- Analyze data and gather community feedback
- Request internationalization strings
- Deploy feature
Many people discover Wikipedia via https://www.wikipedia.org/ (roughly 1.5-2% of our total page views) and the Discovery team has been improving the user experience for these visitors. Here is a report from 2015, detailing the initial analysis from the Discovery team about what we can do to make the portal better.
Learn more about the work around the Wikipedia.org portal project. Current work by this team is tracked on a Phabricator workboard and a listing of upcoming A/B tests can be found here. We also track usage on a public Portal Analytics Dashboard to monitor and analyze the impact of our efforts.
- The Wikipedia.org Portal team has no dedicated goals this quarter other than continuing to maintain the page by fixing critical bugs and performing regular statistics and translation updates.
Discovery is about finding and navigating to content, and one way for users to do that is via maps. To provide better maps the team is working to make OpenStreetMap tiles available on all Wikimedia projects. The technical challenge is doing so at a scale sufficient for their widespread usage.
- The Maps goal for this quarter is to finalize and deploy the new map style, monitor the service for critical bugs, and increase the frequency of OSM replication.
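To give a feel for why serving tiles at scale is the hard part, here is a minimal sketch of how map tiles are commonly addressed. It assumes the widely used XYZ ("slippy map") tiling scheme with a Web Mercator projection; the page does not say which scheme the Wikimedia tile service uses, and the function names are hypothetical.

```python
import math


def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the tile containing a point.

    Assumes the common XYZ / Web Mercator tiling scheme (an assumption,
    not something stated on this page).
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge still map to a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tiles_at_zoom(zoom: int) -> int:
    """Total number of tiles needed to cover the world at one zoom level."""
    return 4 ** zoom
```

Under this scheme the tile count quadruples with every zoom level, which is one reason pre-rendering and caching strategies matter for a widely used service.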
Wikidata Query Service (WDQS)
Searching structured data on Wikidata is an integral part of Discovery's work, which includes building the Wikidata Query Service. The service provides an API through which tools can access Wikidata. Learn more about the Wikidata query service. Our current work is tracked on this Phabricator workboard, weekly deployments of WDQS are documented on wikitech:Deployments, and a public WDQS Analytics Dashboard is used to monitor and analyze the impact of our efforts.
- The Wikidata Query Service goal for this quarter is to expand category search in the query service while also collecting SPARQL statistics. Stas and Guillaume will maintain the service to support its continued growth and use, and the Analysis team will help with statistics.
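As an illustration of how a tool might use the query service, the sketch below builds a request for the public SPARQL endpoint and, optionally, fetches the results. The example query, the function names, and the `user_agent` parameter are illustrative choices rather than anything prescribed by this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: five items that are instances of "house cat" (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_wdqs_url(sparql: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': sparql, 'format': 'json'})}"


def run_query(sparql: str, user_agent: str) -> list[dict]:
    """Send the query and return the result bindings (needs network access)."""
    request = Request(build_wdqs_url(sparql), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

A caller would typically pass a descriptive `user_agent` string identifying the tool, for example `run_query(EXAMPLE_QUERY, "my-demo-tool/0.1 (contact address)")`.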
The analysis group within Discovery manages the Discovery Dashboard and analyzes A/B tests and other data. Learn more about the Discovery analysis team, and find even more information on how they do their analysis and its impact (on Meta). Current work by the analysis team is tracked on this Phabricator workboard.
- The team will continue to work closely with the Search teams to analyze A/B tests and other assorted data; they will also prototype automated A/B test reports and investigate adding continuous integration to their R codebases.
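For readers unfamiliar with A/B test analysis, the following is a minimal sketch of one common approach: a two-proportion z-test comparing the clickthrough rates of a control group and a test group. It is not necessarily the method the analysis team uses (their codebases are in R; Python is used here purely for illustration), and the function name and inputs are hypothetical.

```python
import math


def two_proportion_z_test(
    clicks_a: int, n_a: int, clicks_b: int, n_b: int
) -> tuple[float, float, float]:
    """Compare clickthrough rates of group A (control) and group B (test).

    Returns (difference in rates, z statistic, two-sided p-value),
    using the normal approximation with a pooled standard error.
    """
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return rate_b - rate_a, z, p_value
```

An automated report could call this once per experiment and flag results whose p-value falls below a threshold chosen in advance.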
Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.
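As a small example of such an interaction, the sketch below builds a full-text search request against the MediaWiki Action API (`action=query` with `list=search`) and extracts page titles from a response. The choice of wiki in `API_ENDPOINT` and the function names are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Example endpoint; any MediaWiki installation exposes a similar api.php.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_search_url(term: str, limit: int = 5, endpoint: str = API_ENDPOINT) -> str:
    """Return a URL for a full-text search via the Action API."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return f"{endpoint}?{urlencode(params)}"


def titles_from_response(data: dict) -> list[str]:
    """Extract page titles from a decoded JSON search response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

The URL can be fetched with any HTTP client; the decoded JSON body is then passed to `titles_from_response`.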
For general questions about the work of the Discovery department, please see the FAQ. For any questions about the term "Knowledge Engine" please refer to this FAQ. You can find all of our data and key performance indicators on our data dashboard.
- An overview of Discovery's narrative and roadmap for FY 2017-18 (July 2017 - June 2018)
- An overview of Discovery's narrative and roadmap for FY 2016/17 (July 2016 - June 2017)
Below is a list of sub-teams in the Discovery Department. This list was last updated on June 8th, 2017.
Each sub-team lists the names and team roles (not job titles; those are listed in the staff and contractors page, and may or may not be the same as the person's team role) of anyone who spends a not insignificant amount of time on a project; this therefore means that some names are duplicated across teams.
These lists are only intended to roughly convey who is working on what; no guarantees are made that the list is accurate to any particular level of detail. If you have questions, please contact Deb Tankersley.
- Erik Bernhardson, Tech Lead, Software Engineer
- David Causse, Software Engineer
- Trey Jones, Software Engineer
- Stas Malyshev, Software Engineer
- Guillaume Lederrey, Operations Engineer
- Mikhail Popov, Data Analyst
- Chelsy Xie, Data Analyst
- Jan Drewniak, Software Engineer
- Paul Norman, Software Engineer Contractor
- Guillaume Lederrey, Operations Engineer
Wikidata Query Service
Discovery - A public mailing list about Wikimedia Discovery projects. Examples of topics would include:
- Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
- Technical discussions and brainstorming regarding our work:
- Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
- The portal and associated work
- Our dashboards or related analysis
- Note that there is a separate list for maps (below)
- Departmental news, such as changes to team structure, significant changes to team process, changes in how we use phabricator or other tools like gerrit
Maps - Discussion and development coordinating the integration of OpenStreetMap and other free map sources into Wikimedia projects.
- For discussing all interactive Wikimedia projects: maps, graphs, etc.
- San Francisco
- Directly relevant
- Indirectly related (these sorts of meetup groups attract smart/enthusiastic people who like to spend their free time learning and solving problems)
Discovery uses a "scrumban" process, which is a hybrid of Scrum and Kanban. It is described here: Discovery/Process.
Conferences, gatherings, and other events
- Wikimania 2017 - Aug 11 - 13, 2017
- State of the Map US, October 19 - 22, 2017, Boulder Colorado (USA)
- Wikimedia Discovery/Past events
- Hackathon - May 19 - 20, 2017
- Discovery offsite - May 15-17, 2017
- csv,conf,v3 - A community conference for data makers everywhere - May 2-3 2017, Portland, OR
- Developers Summit - Jan. 9-11, 2017
- All Hands - Jan. 12-13, 2017
- DBpedia Community Meeting in California 2016 - October 27th 2016
- WikiConference NA 2016 - Oct 7-10 2016, talked about Wikidata and WDQS.
- State of the Map 2016 - Sept 23 - 25, 2016, Brussels (Belgium)
- Wikimania 2016
- Esino Lario, June 22 - 28
- State of the Map US 2016 - July 23 - 25, 2016, Seattle (USA)
Weekly Discovery status updates
- See Discovery/Status updates for the archive of past Discovery updates
Discovery is working on many different projects. These weekly summaries are an attempt to keep interested people up-to-date on what the department is currently working on. Weekly summaries are posted to this page and on the Discovery mailing list every Friday.
Contribute to the next edition at Discovery/Status updates/Next.
- Discovery/Status updates/2017-01-02
- Discovery/Status updates/2017-01-16
- Discovery/Status updates/2017-01-23
- Discovery/Status updates/2017-01-30
- Discovery/Status updates/2017-02-06
- Discovery/Status updates/2017-02-13
- Discovery/Status updates/2017-02-20
- Discovery/Status updates/2017-02-27
- Discovery/Status updates/2017-03-13
- Discovery/Status updates/2017-03-20
- Discovery/Status updates/2017-04-03
- Discovery/Status updates/2017-04-10
- Discovery/Status updates/2017-04-24
- Discovery/Status updates/2017-05-01
- Discovery/Status updates/2017-05-08
- Discovery/Status updates/2017-05-29
- Discovery/Status updates/2017-06-05
- Discovery/Status updates/2017-06-12
- Discovery/Status updates/2017-06-19
- Discovery/Status updates/2017-06-26
- Discovery/Status updates/2017-07-03
- Discovery/Status updates/2017-07-10
- Discovery/Status updates/2017-07-17
- Discovery/Status updates/2017-07-24
- Discovery/Status updates/2017-07-31
- Discovery/Status updates/2016-03-11
- Discovery/Status updates/2016-03-18
- Discovery/Status updates/2016-03-28
- Discovery/Status updates/2016-04-04
- Discovery/Status updates/2016-04-11
- Discovery/Status updates/2016-04-18
- Discovery/Status updates/2016-04-25
- Discovery/Status updates/2016-05-02
- Discovery/Status updates/2016-05-09
- Discovery/Status updates/2016-05-16
- Discovery/Status updates/2016-05-23
- Discovery/Status updates/2016-05-30
- Discovery/Status updates/2016-06-06
- Discovery/Status updates/2016-06-13
- Discovery/Status updates/2016-06-20
- Discovery/Status updates/2016-06-27
- Discovery/Status updates/2016-07-04
- Discovery/Status updates/2016-07-11
- Discovery/Status updates/2016-07-25
- Discovery/Status updates/2016-08-01
- Discovery/Status updates/2016-08-08
- Discovery/Status updates/2016-08-15
- Discovery/Status updates/2016-08-22
- Discovery/Status updates/2016-08-29
- Discovery/Status updates/2016-09-05
- Discovery/Status updates/2016-09-12
- Discovery/Status updates/2016-09-19
- Discovery/Status updates/2016-09-26
- Discovery/Status updates/2016-10-03
- Discovery/Status updates/2016-10-10
- Discovery/Status updates/2016-10-17
- Discovery/Status updates/2016-10-24
- Discovery/Status updates/2016-10-31
- Discovery/Status updates/2016-11-07
- Discovery/Status updates/2016-11-14
- Discovery/Status updates/2016-11-28
- Discovery/Status updates/2016-12-05
- Discovery/Status updates/2016-12-12
- Discovery/Status updates/2016-12-19
- Q4 2014-15 (2015-07-07)
- Q1 2015-16 (2015-10-05)
- Q2 2015-16 (2016-01-21)
- Q3 2015-16 (2016-04-11)
- Q2-Q3 2016-2017
- Q4 2016/17 (2017-07) (starting on page 66)
- Search Analytics Dashboard
- Portal Analytics Dashboard
- Maps Analytics Dashboard
- Wikidata Query Service Analytics Dashboard
- API Analytics Dashboard
- External Traffic Analytics Dashboard
The data access and analysis guidelines used by the Discovery team around data sources, or by other teams around Discovery data sources, are documented on Meta.
Useful reference for who can deploy code. It's nice to know whom to bug if you need something:
The Discovery team supports the following code:
|Wikidata Query Service||https://phabricator.wikimedia.org/diffusion/WDQR/|
|Wikidata Query Service GUI||https://phabricator.wikimedia.org/diffusion/WDQG/|
|WDQS GUI deployment|
This page or project is maintained by the Discovery Department.