The Discovery Department of Wikimedia Engineering's mission is to make the wealth of knowledge and content in the Wikimedia projects easily discoverable. The projects detailed below focus on creating and supporting new forms of discovery.
Discovery is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.
Learn more about Search and the current work of the team.
Current work by this team is tracked on this Phabricator workboard.
Search Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.
- Enable the backend for cross-project indices on all Wikimedia projects.
- Enrich the data stored in Elasticsearch to allow targeted improvements to result relevance.
Many people discover Wikipedia via https://www.wikipedia.org/ (roughly 1.5-2% of our total page views). The Discovery team is looking at how to improve the user experience for these visitors. Here is an initial analysis from the Discovery team.
Learn more about the work around the Wikipedia.org portal project.
Portal Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.
External Search Traffic - A metric that broadly tracks which external search engines our requests come from.
- No dedicated goals this quarter (see the note below this table for more information).
Discovery is about finding and navigating to content, and one way for users to do that is via maps. To provide better maps, the team is working to make OpenStreetMap tiles available on all Wikimedia projects. The technical challenge is doing so at a scale sufficient for widespread usage.
Learn more about the Maps project.
Work by this team is tracked on this Phabricator workboard.
The team's roadmap can be viewed here; it was finalized in Nov 2016 for FY 2016/2017.
Maps Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.
- Increase maps and graphs usage on Wikipedia
- Enable shareable Geoshapes and Tabular data storage on Commons
Wikidata Query Service (WDQS)
Learn more about the Wikidata query service.
Current work by this team is tracked on this Phabricator workboard.
Weekly deployments of WDQS are documented on wikitech:Deployments.
WDQS Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.
- Wikidata Query Service will not have any dedicated goals this quarter.
Learn more about the Discovery analysis team, how they conduct their analysis, and its impact (on Meta).
Current work by the analysis team is tracked on this Phabricator workboard.
- This quarter, instead of having team-specific goals, the analysis team will be supporting the other teams' goals.
Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.
API:Search and discovery lists the search APIs available and in development.
API Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.
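Several of the search APIs listed on API:Search and discovery are exposed through the MediaWiki Action API (`api.php`). The following is a minimal sketch, assuming Python 3.9+ and English Wikipedia as the target wiki, of building requests for full-text search (`action=query&list=search`) and title suggestions (`action=opensearch`) and extracting result titles. The helper names and the User-Agent string are hypothetical; only the query parameters come from the MediaWiki API.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: English Wikipedia; other MediaWiki api.php endpoints accept the same parameters.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "discovery-search-example/0.1 (hypothetical)"


def build_search_url(term: str, limit: int = 10, endpoint: str = API_ENDPOINT) -> str:
    """Full-text search via list=search (served by CirrusSearch on Wikimedia wikis)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def build_opensearch_url(term: str, limit: int = 10, endpoint: str = API_ENDPOINT) -> str:
    """Title suggestions via action=opensearch."""
    params = {"action": "opensearch", "search": term, "limit": limit, "format": "json"}
    return endpoint + "?" + urlencode(params)


def fetch_json(url: str):
    """Perform the HTTP GET and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def search_titles(term: str, limit: int = 10) -> list[str]:
    """Return the titles of the top full-text search hits."""
    data = fetch_json(build_search_url(term, limit))
    return [hit["title"] for hit in data["query"]["search"]]


def suggest_titles(term: str, limit: int = 10) -> list[str]:
    """Return opensearch suggestions; the response is [query, titles, descriptions, urls]."""
    data = fetch_json(build_opensearch_url(term, limit))
    return data[1]
```

For example, `search_titles("Esino Lario", limit=5)` would return up to five page titles. Keeping URL construction separate from the network call lets the request-building logic be checked offline.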
For general questions about the work of the Discovery department, please see the FAQ.
For any questions about the term "Knowledge Engine" please refer to this FAQ.
You can find all of our data and key performance indicators on our data portal.
Below is a list of sub-teams in the Discovery Department. This list was last updated on 17th October 2016.
Each sub-team lists the names and team roles (not job titles; those are listed on the staff and contractors page and may differ from the person's team role) of anyone who spends a non-trivial amount of time on a project, so some names appear under more than one team.
These lists are only intended to roughly convey who is working on what; no guarantees are made that the list is accurate to any particular level of detail. If you have questions, please contact Dan Garry.
- Erik Bernhardson, Tech Lead, Software Engineer
- David Causse, Software Engineer
- Trey Jones, Software Engineer
- Stas Malyshev, Software Engineer
- Guillaume Lederrey, Operations Engineer
- Mikhail Popov, Data Analyst
- Chelsy Xie, Data Analyst
- Dan Garry, Product Owner
- Yuri Astrakhan, Product Owner, Software Engineer
- Max Semenik, Software Engineer
- Julien Girault, Software Engineer
- Guillaume Lederrey, Operations Engineer
- Deb Tankersley, Product Support
Wikidata Query Service
- Katie Horn, Head of Discovery
- Erika Bjune, Engineering Manager
- Dan Garry, Product Lead
- Kevin Smith, Agile Coach
- Chris Koerner, Community Liaison
Discovery - A public mailing list about Wikimedia Discovery projects. Examples of topics include:
- Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
- Technical discussions and brainstorming regarding our work:
  - Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
  - The portal and associated work
  - Our dashboards or related analysis
  - Note that there is a separate list for maps (below)
- Departmental news, such as changes to team structure, significant changes to team process, or changes in how we use Phabricator or other tools like Gerrit
Maps - Discussion and development coordinating the integration of OpenStreetMap and other free map sources into Wikimedia projects.
- For discussing all interactive Wikimedia projects: maps, graphs, etc.
- San Francisco
- Directly relevant
- Indirectly related (these sorts of meetup groups attract smart/enthusiastic people who like to spend their free time learning and solving problems)
Conferences, gatherings, and other events
- DBpedia Community Meeting in California 2016 - October 27th 2016
- Developers Summit - Jan. 9-11, 2017
- All Hands - Jan. 12-13, 2017
- Hackathon - May 19 - 20, 2017
- Wikimedia Discovery/Past events
- WikiConference NA 2016 - Oct 7-10, 2016; talked about Wikidata and WDQS.
- State of the Map 2016 - Sept 23 - 25, 2016, Brussels (Belgium)
- Wikimania 2016 - June 22 - 28, 2016, Esino Lario (Italy)
- State of the Map US 2016 - July 23 - 25, 2016, Seattle (USA)
Weekly Discovery status updates
- See Discovery/Status updates for the archive of past Discovery updates
This is the weekly update for the week starting 2016-11-07.
- Many older search tickets in the backlog were closed this week because the work they described had already been completed.
- Opened a discussion with the Wikipedia Ambassadors community requesting volunteer wikis that want to be part of upcoming A/B tests for cross-wiki search results.
- Double suggestion when searching on wikipedia with limit/offset
- EPIC: Review current ElasticSearch configuration, and use relevance lab to run tests to optimise the configuration to improve search result relevance
- CirrusSearch should do something helpful if the search does not return enough results
- CirrusSearch: No highlighted text returned from `intitle:` phrase searches
- Improve searches on poor spellings
- Add extra breakdowns to dashboards, e.g. by country, by language
- CirrusSearch: More search results when narrowing down search term
- "San Lorenzo (quartiere di Napoli)" not first match when searching the words in different order
- Implement a new fulltext query
- Poorly tuned rankings
- opensearch: Querying for "Big" or "Big!" should include "Big" or "Big!" as first suggestion
- "Ignoring nonexistent page" that does exist
Search tickets to be released in next week's train:
- Options for Completion Suggester misaligned when description uses more than one line
- Image search by file size - will be live when Commons is reindexed
- Verified data pipeline for BM25 AB test - ja, zh, th
- [Dashboard][Search] Make monthly metrics module work again
- Investigate if Interactive logging schema makes sense
- crickets while waiting for code reviews
- <maplink> does not work in Monobook skin
- <maplink> does not work on "Modern" skin
- Maps "align" attribute definitions, review and fix
Interactive tickets to be released in next week's train:
- View all open tickets related to Discovery.
- Looking to get involved? See tasks marked "Easy" or "Volunteer needed".
- Search Analytics Dashboard
- Portal Analytics Dashboard
- Maps Analytics Dashboard
- Wikidata Query Service Analytics Dashboard
- API Analytics Dashboard
- External Traffic Analytics Dashboard
The data access and analysis guidelines used by the Discovery team around data sources, or by other teams around Discovery data sources, are documented on Meta.
A useful reference for who can deploy code; it's nice to know whom to ask if you need something:
The Discovery team supports the following code: