Wikimedia Discovery

The Discovery department of Wikimedia Engineering is building the anonymous path of discovery to a trusted and relevant source of knowledge. The projects detailed below focus on creating and supporting new forms of discovery for users of the Wikimedia wikis. For any questions about the term "Knowledge Engine", please refer to our FAQ.

You can find all of our data and key performance indicators on our data portal.

Come help us decide what we work on in the future with our request for comments.

Search
Discovery is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.

Current work by this team is tracked on this Phabricator workboard.

Current Goals (FY 2015-16 Q3)

 * Improve the relevance of intra-wiki search results (see a small demo comparing the new suggester to the current one)
 * Generate a model for user satisfaction with search results based on qualitatively-validated quantitative data.

Wikipedia.org portal
Many people discover Wikipedia via https://www.wikipedia.org/ (roughly 1.5-2% of our total page views). The Discovery team is looking at how to improve the user experience for these visitors. Here is an from the Discovery team.

Learn more about the Wikipedia.org Portal Improvements project.

Current work by this team is tracked on this Phabricator workboard.

Current Goals (FY 2015-16 Q3)

 * Make www.wikipedia.org a portal for exploring open content on Wikimedia sites.

Maps
Discovery is about finding and navigating to content, and maps are one way for users to do that. To provide better maps, the team is working to make OpenStreetMap tiles available on all Wikimedia projects. The technical challenge is serving them at a scale sufficient for widespread use.
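OpenStreetMap-style tiles are addressed by zoom level, column and row (z/x/y), where each zoom level z splits the Web Mercator world into 2^z by 2^z square tiles. The sketch below assumes that standard "slippy map" numbering; the function name `latlon_to_tile` is illustrative and not part of any Wikimedia code. It computes which tile contains a given point:

```python
import math


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) slippy-map tile indices containing a WGS84 point at `zoom`.

    Illustrative helper assuming standard OSM tile numbering:
    x grows eastward from longitude -180, y grows southward from the top edge.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Web Mercator projection of latitude, scaled to the range [0, n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp points on the far edges (lon = 180, or beyond the Mercator limits) into range
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

For example, `latlon_to_tile(0.0, 0.0, 1)` returns `(1, 1)`. Because the tile count grows as 4^z, each additional zoom level quadruples the number of tiles that may need to be rendered and served.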

Learn more about the Maps project.

Work by this team is tracked on this Phabricator workboard.

Current Goals (FY 2015-16 Q3)

 * Improve the content discovery experience on Wikivoyage by rolling out maps to all Wikivoyages.

Wikidata Query Service (WDQS)
Searching structured data on Wikidata is also part of Discovery, so we are building the Wikidata Query Service. It provides a SPARQL API through which tools can access Wikidata.
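A tool reaches the SPARQL API over plain HTTP. As a minimal sketch, assuming the public endpoint `https://query.wikidata.org/sparql` and its `format=json` parameter (the helper names are hypothetical), the code below builds a GET request for items that are an instance of (P31) house cat (Q146), and reads one variable out of a result document in the standard SPARQL 1.1 JSON results format. It performs no network access:

```python
from urllib.parse import urlencode

# Public WDQS endpoint (an assumption here; check the service documentation before relying on it).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of (P31) house cat (Q146), limited to five results.
# The wd: and wdt: prefixes are predefined by the query service.
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
}
LIMIT 5
"""


def build_sparql_url(query: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Encode a SPARQL query as a GET URL that asks for JSON results (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def binding_values(results: dict, var: str) -> list[str]:
    """Collect one variable's values from a SPARQL 1.1 JSON results document.

    Rows where the variable is unbound simply omit it, so they are skipped.
    """
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]
```

To run the query for real, a tool would fetch the URL with an HTTP client (for example `urllib.request.urlopen`), decode the JSON body, and pass it to `binding_values(..., "item")`. Wikimedia's User-Agent policy asks clients to identify themselves with a descriptive User-Agent header.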

Learn more about the Wikidata Query Service.

Current work by this team is tracked on this Phabricator workboard.

Current Goals (FY 2015-16 Q3)

 * Support ongoing stability, and add support for geocoordinate functionality, by upgrading the Wikidata Query Service to Blazegraph 2.0.

Analysis
The analysis group within Discovery manages the Discovery Dashboard and analyzes A/B tests and other data.

Current work by the analysis team is tracked on this Phabricator workboard.

Current Goals (FY 2015-16 Q3)
 * This quarter, instead of having team-specific goals, the analysis team will be supporting the other teams' goals.

FDC Plan 2016/2017

 * Addendum draft for our FDC submission

APIs
Application Programming Interfaces (APIs) provide developers with ways to interact with the MediaWiki software.

API:Search and discovery lists the search APIs available and in development.
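One of those APIs is full-text search, which the MediaWiki Action API exposes as `action=query&list=search` (on Wikimedia wikis it is backed by CirrusSearch). The sketch below only builds the request URL and parses a response dictionary, so it runs offline; the English Wikipedia endpoint and the helper names are assumptions chosen for illustration:

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; other MediaWiki wikis expose the same /w/api.php shape.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term: str, limit: int = 10, endpoint: str = API_ENDPOINT) -> str:
    """Build an Action API full-text search request (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "search",   # full-text search module
        "srsearch": term,   # the search terms
        "srlimit": limit,   # maximum number of results to return
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def result_titles(response: dict) -> list[str]:
    """Extract page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

Keeping URL construction separate from fetching makes it easy to point the same code at a different wiki by passing another `endpoint`.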

The team
Below is a list of sub-teams in the Discovery Department. This section was last updated on 24th March 2016.

Each sub-team lists the names and team roles of anyone who spends a non-trivial amount of time on its project. Team roles are not job titles; job titles are listed on the staff and contractors page and may or may not match a person's team role. As a result, some names appear under more than one team.

These lists are only intended to roughly convey who is working on what; no guarantees are made that the list is accurate to any particular level of detail. If you have questions, please contact Dan Garry.

Search

 * Erik Bernhardson, Tech Lead
 * David Causse, Software Engineer
 * Trey Jones, Software Engineer
 * Stas Malyshev, Software Engineer
 * Guillaume Lederrey, Operations Engineer
 * Mikhail Popov, Data Analyst
 * Dan Garry, Product Owner

Wikipedia Portal

 * Jan Drewniak, Software Engineer
 * Moiz Syed, User Experience Designer
 * Mikhail Popov, Data Analyst
 * Deb Tankersley, Product Owner

Maps

 * Yuri Astrakhan, Software Engineer
 * Max Semenik, Software Engineer
 * Julien Girault, Software Engineer

Wikidata Query Service

 * Stas Malyshev, Software Engineer

Analysis

 * Mikhail Popov, Data Analyst
 * Dan Garry, Product Owner

Cross-team support

 * Tomasz Finc, Head of Discovery
 * Moiz Syed, Design Lead
 * Dan Garry, Product Lead
 * Kevin Smith, Agile Coach
 * Chris Koerner, Community Liaison

Mailing lists
Discovery - A public mailing list about Wikimedia Discovery projects. Examples of topics include:
 * Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, and requests for feedback or input
 * Technical discussions and brainstorming regarding our work:
   * Search, Elasticsearch, CirrusSearch, the Relevance Forge, and other relevant subjects
   * The portal and associated work
   * Our dashboards or related analysis
   * Note that there is a separate list for maps (below)
 * Departmental news, such as changes to team structure, significant changes to team process, or changes in how we use Phabricator or other tools like Gerrit

Maps - Discussion and development coordinating the integration of OpenStreetMap and other free map sources into Wikimedia projects.

Twitter
https://twitter.com/WMF_Discovery

Meetup groups

 * San Francisco
   * Directly relevant
     * Bay Area NLP (natural language processing, not neuro-linguistic programming)
     * San Francisco text
     * Elasticsearch San Francisco
   * Indirectly related (these sorts of meetup groups attract smart, enthusiastic people who like to spend their free time learning and solving problems)
     * Silicon Valley Java user group
     * San Francisco PHP
     * Bay Area Haskell users group
     * Scala study group
     * SF JavaScript
     * Oakland advanced Scala study group

Upcoming events

 * Wikimedia Hackathon 2016
   * Jerusalem, March 31 to April 3
 * Wikimania 2016
   * Esino Lario, June 22-28

Past events

 * Elastic{ON} 2016
   * San Francisco, February 17-19, 2016
   * "the largest gathering of Elasticsearch, Logstash, and Kibana expertise anywhere in the world."
   * https://www.elastic.co/elasticon
 * Lightning Talks - February 2016
   * Visualizing Wikidata - Video on YouTube (Yuri's section starts at 49:47)
 * Discovery Days (team gathering)
   * San Francisco, January 11-12, 2016
 * Wikimedia Developer Summit 2016
   * San Francisco, January 4-6, 2016
   * https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016
 * WMF All Hands 2016
   * San Francisco, January 7-8, 2016
   * https://office.wikimedia.org/wiki/All_hands/2016
 * Hackathon 2015
   * Lyon, France, May 2015
 * OpenAir 2015
   * June 4 in San Francisco
   * https://openair2015.com/
   * "OpenAir is the premier conference that focuses on creating engineering solutions to the challenges of matching. The brightest minds in the industry will come together to tackle such issues as search and discovery, trust, internationalization, mobile, identity, and infrastructure."
 * State of the Map US 2015
   * June 6-8 in New York
   * An annual conference for all OpenStreetMap users: http://stateofthemap.us/
   * Yuri & Max attended
 * Wikimania 2015 (July 15-19 in Mexico City)
   * Presentation (video) "Are we failing our users when they search Wikipedia?" by Dan and Moiz
   * http://wikimania2015.wikimedia.org/wiki/Main_Page
 * Smart Data Conference 2015 (August 18-20 in San Jose, CA)
   * http://smartdata2015.dataversity.net/
   * Presentation (hosted at blazegraph.com): "The Wikidata Query Service - A Knowledge Graph Application Powered by Blazegraph"
 * Gerrit Cleanup Day
   * Wednesday 2015-09-23
   * Plans here: Discovery plans for Gerrit cleanup day 2015
 * WikiConference USA
   * Washington DC, October 9-11, 2015
   * http://wikiconferenceusa.org/wiki/2015/Main_Page
   * At least one team member will attend. No presentations planned.
 * Discovery offsite 2015
   * Sept 30-Oct 2, Cocoa Beach, FL, USA
 * 5th DBpedia Community in California 2015
   * Palo Alto, November 5th
   * "DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web."
   * http://wiki.dbpedia.org/meetings/California2015

Elasticsearch cluster

 * How Elasticsearch breaks (Part 1, Part 2)
 * Notes on unbreaking and optimizing elasticsearch

Wikipedia.org Portal Page

 * Analysis: Results of the first A/B Portal Test
 * Meeting: Rethinking portal planning 2016-02-03
 * Analysis: Results of removing inline JavaScript on wikipedia.org portal page
 * Meeting: Wikipedia Portal and JavaScript Usage - review meeting notes 2016-02-11


 * Presentation of Analysis: Browsers Geography and JavaScript Support on Wikipedia Portal (PDF)
 * Analysis: Referrers of the 10% of traffic to the Wikipedia Portal that is referred by something other than a search engine
 * Analysis: Assessment of Portal update and its impact on search rate post-deployment
 * Analysis: Clickthrough rates, section usage, and language preferences of Portal visitors

Maps

 * State of Wikimedia Maps - 2016-02-01

Meeting minutes
Wikimedia Discovery/Meetings

Quarterly reviews

 * Q4 2014-15 (2015-07-07)
 * Q1 2015-16 (2015-10-05)

Data Access Guidelines
The guidelines for accessing and analyzing data, which cover the Discovery team's use of any data source and other teams' use of Discovery data sources, are documented on Meta.

Deployers
A useful reference for who can deploy code. It's nice to know whom to ask if you need something: