Wikimedia Discovery

The Discovery Department of Wikimedia Engineering works to make the wealth of knowledge and content in the Wikimedia projects easily discoverable. The projects detailed below focus on creating and supporting new forms of discovery.

Search
Discovery is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.

Learn more about Search and the current work of the team.

Current work by this team is tracked on this Phabricator workboard.

Search Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.

Current Goals (FY 2016-17 Q1)

 * Improve search result relevance and enable fixes for multiple relevance issues by switching from tf–idf to BM25.
 * Evaluate and devise a plan for using cross-project indices, which would allow users to search across all projects in any given language.
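
To illustrate why moving from tf-idf to BM25 can affect relevance, here is a minimal sketch of the two per-term scoring formulas. It is not the code used by CirrusSearch or Elasticsearch: the function names are hypothetical, the tf-idf variant is the simple textbook form, and the parameters k1=1.2 and b=0.75 are commonly cited defaults that we assume here rather than values taken from this page.

```python
import math


def bm25_idf(n_docs, doc_freq):
    """Inverse document frequency in a non-negative form often used with BM25."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score contribution of one query term in one document under BM25."""
    # Length normalization: documents longer than average are penalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term-frequency saturation: the ratio approaches (k1 + 1) as tf grows.
    saturation = tf * (k1 + 1) / (tf + k1 * length_norm)
    return bm25_idf(n_docs, doc_freq) * saturation


def tfidf_term_score(tf, n_docs, doc_freq):
    """Textbook tf-idf for comparison: grows linearly with term frequency."""
    return tf * math.log(n_docs / doc_freq)


if __name__ == "__main__":
    # Compare how each score grows as a term repeats more often in a document.
    for tf in (1, 5, 50):
        print(tf,
              round(tfidf_term_score(tf, 1000, 10), 2),
              round(bm25_term_score(tf, 100, 100, 1000, 10), 2))
```

In this sketch the tf-idf score keeps growing with every repetition of a term, while the BM25 score levels off and also accounts for document length, which is one reason BM25 tends to be less sensitive to pages that simply repeat a word many times.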

Wikipedia.org portal
Many people discover Wikipedia via https://www.wikipedia.org/ (roughly 1.5-2% of our total page views). The Discovery team is looking at how to improve the user experience for these visitors. Here is an from the Discovery team.

Learn more about the work around the Wikipedia.org portal project.

Current work by this team is tracked on this Phabricator workboard and a listing of upcoming A/B tests can be found here.

Portal Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.

External Search Traffic - A broad metric of where our requests are coming from, focused on external search engines.

Current Goals (FY 2016-17 Q1)

 * Make the www.wikipedia.org portal a jumping-off point to explore open content on Wikimedia sites.

Maps
Discovery is about finding and navigating to content, and one way for users to do that is via maps. To provide better maps, the team is working to make OpenStreetMap tiles available on all Wikimedia projects. The technical challenge is doing so at a scale sufficient for widespread usage.

Learn more about the Maps project.

Work by this team is tracked on this Phabricator workboard.

Maps Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.

Current Goals (FY 2016-17 Q1)

 * Make the maplink tag available for all wiki projects, and the mapframe tag available everywhere except Wikipedia. Heavy operations participation will be needed to stabilize map server configuration and monitoring.
 * Offer community a way to migrate away from GeoHack by implementing a Special page (for legacy browsers) and extend  with configurable links to other mapping services information tab.

Wikidata Query Service (WDQS)
Searching structured data on Wikidata is also part of Discovery, so we are building the Wikidata query service. It provides a SPARQL API through which tools can access Wikidata.
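
As a rough illustration of how a tool might talk to a SPARQL API like this one, the sketch below builds a request URL and reads values out of a response in the standard SPARQL JSON results format. The endpoint URL, the example query, and the helper names are assumptions made for illustration and are not taken from this page; check the service documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed public endpoint; not stated on this page.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query (an assumption): five items that are instances of "house cat".
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_wdqs_url(sparql, endpoint=WDQS_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql, "format": "json"})


def extract_values(result_json, variable):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    bindings = result_json.get("results", {}).get("bindings", [])
    return [row[variable]["value"] for row in bindings if variable in row]


if __name__ == "__main__":
    # Printing the URL only; fetching it requires network access.
    print(build_wdqs_url(EXAMPLE_QUERY))
```

The URL can then be fetched with any HTTP client. Keeping URL construction and result parsing in separate functions makes both easy to test without network access.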

Learn more about the Wikidata query service.

Current work by this team is tracked on this Phabricator workboard.

Weekly deployments of WDQS are documented on Deployments.

WDQS Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.

Current Goals (FY 2016-17 Q1)

 * TBD

Analysis
The analysis group within Discovery manages the Discovery Dashboard and analyzes A/B tests and other data.

Learn more about the Discovery analysis team.

Current work by the analysis team is tracked on this Phabricator workboard.

Current Goals (FY 2016-17 Q1)
FDS Plan 2016/2017
 * This quarter, instead of having team-specific goals, the analysis team will be supporting the other teams' goals.

APIs
Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.

API:Search and discovery lists the search APIs available and in development.
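
For orientation, here is a minimal sketch of a full-text search request against the MediaWiki action API, using the list=search module. The default wiki URL and the helper names are assumptions for illustration; the API:Search and discovery page remains the reference for which modules and parameters are available.

```python
from urllib.parse import urlencode


def build_search_url(term, api_url="https://www.mediawiki.org/w/api.php", limit=5):
    """Return a GET URL for a full-text search (api_url is an assumed default)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,   # the search terms
        "srlimit": limit,   # maximum number of results to return
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def titles_from_response(data):
    """Pull page titles out of a decoded JSON search response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]


if __name__ == "__main__":
    # Printing the URL only; fetching it requires network access.
    print(build_search_url("elastic search"))
```

Fetching the URL with an HTTP client and passing the decoded JSON to titles_from_response would give a list of matching page titles.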

API Analytics Dashboard - Public dashboard to monitor and analyze the impact of our efforts.

Other
For any questions about the term "Knowledge Engine" please refer to our FAQ.

You can find all of our data and key performance indicators on our data portal.

The team
Below is a list of sub-teams in the Discovery Department. This section was last updated on 10th August 2016.

Each sub-team lists the names and team roles of anyone who spends a significant amount of time on a project, so some names appear under more than one team. Team roles are not job titles; job titles are listed on the staff and contractors page and may or may not match a person's team role.

These lists are only intended to roughly convey who is working on what; no guarantees are made that the list is accurate to any particular level of detail. If you have questions, please contact Dan Garry.

Search

 * Erik Bernhardson, Tech Lead, Software Engineer
 * David Causse, Software Engineer
 * Trey Jones, Software Engineer
 * Stas Malyshev, Software Engineer
 * Guillaume Lederrey, Operations Engineer
 * Mikhail Popov, Data Analyst
 * Chelsy Xie, Data Analyst
 * Deb Tankersley, Product Owner

Wikipedia Portal

 * Jan Drewniak, Software Engineer
 * Mikhail Popov, Data Analyst
 * Deb Tankersley, Product Owner

Maps

 * Yuri Astrakhan, Product Owner, Software Engineer
 * Max Semenik, Software Engineer
 * Julien Girault, Software Engineer
 * Guillaume Lederrey, Operations Engineer

Wikidata Query Service

 * Stas Malyshev, Software Engineer
 * Guillaume Lederrey, Operations Engineer

Analysis

 * Mikhail Popov, Data Analyst
 * Deb Tankersley, Product Owner
 * Chelsy Xie, Data Analyst

Cross-team support

 * Katie Horn, Head of Discovery
 * Dan Garry, Product Lead
 * Kevin Smith, Agile Coach
 * Chris Koerner, Community Liaison

Communications
"See Updates below for Discovery weekly status updates"

Mailing lists
Discovery - A public mailing list about Wikimedia Discovery projects. Examples of topics would include:
 * Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
 * Technical discussions and brainstorming regarding our work:
   * Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
   * The portal and associated work
   * Our dashboards or related analysis
   * Note that there is a separate list for maps (below)
 * Departmental news, such as changes to team structure, significant changes to team process, or changes in how we use Phabricator or other tools like Gerrit

Maps - Discussion and coordination of development work on integrating OpenStreetMap and other free map sources into Wikimedia projects.

IRC channels
- for discussing all Interactive Wikimedia projects (maps, graphs, etc.)

Twitter
https://twitter.com/WMF_Discovery

Meetup groups

 * San Francisco
   * Directly relevant
     * Bay Area NLP (natural language processing, not neuro-linguistic programming)
     * San Francisco text
     * Elasticsearch San Francisco
   * Indirectly related (these sorts of meetup groups attract smart/enthusiastic people who like to spend their free time learning and solving problems)
     * Silicon Valley Java user group
     * San Francisco PHP
     * Bay Area Haskell users group
     * Scala study group
     * SF JavaScript
     * Oakland advanced Scala study group

Upcoming events

 * State of the Map 2016 - Sept 23 - 25, 2016, Brussels (Belgium)
 * Developers Summit - Jan. 9-11, 2017
 * All Hands - Jan. 12-13, 2017
 * Hackathon - May 19 - 20, 2017

Past events

 * Wikimedia Discovery/Past events
 * Wikimania 2016 - June 22 - 28, 2016, Esino Lario
 * State of the Map US 2016 - July 23 - 25, 2016, Seattle (USA)

Weekly Discovery status updates

 * See Discovery/Status updates for the archive of past Discovery updates

Meeting minutes
Wikimedia Discovery/Meetings

Quarterly reviews

 * Q4 2014-15 (2015-07-07)
 * Q1 2015-16 (2015-10-05)
 * Q2 2015-16 (2016-01-21)
 * Q3 2015-16 (2016-04-11)

Data Analysis
The data access and analysis guidelines are documented on Meta. They apply to the Discovery team when working with data sources, and to other teams when working with Discovery data sources.
 * Search Analytics Dashboard
 * Portal Analytics Dashboard
 * Maps Analytics Dashboard
 * Wikidata Query Service Analytics Dashboard
 * API Analytics Dashboard
 * External Traffic Analytics Dashboard

Deployers
Useful reference for who can deploy code. It's nice to know whom to bug if you need something:

Code
The Discovery team supports the following code: