Wikimedia Discovery/FAQ

What is the Knowledge Engine?
"Knowledge Engine" (KE) was an early term used to describe a number of initiatives related to search and discovery of content. It was not, and is not, a product; it was meant as shorthand for what the Discovery team was focusing on. The Wikimedia Foundation has stopped using the term (when?) because it caused confusion. It is also the title of a page the Knight Foundation posted about its grant to the Wikimedia Foundation ("What we fund / Journalism / Knowledge Engine By Wikipedia"), which states: "To advance new models for finding information by supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet." The actual text of the grant from the Knight Foundation has not been released publicly as of January 8, 2015, though many, including Trustee Jimmy Wales, have opined that it would be worthwhile to release it.

As the KE concept evolved, it included a wide variety of ideas, many of which ended up being discarded. Regardless of what the KE may have meant to different people at different times in the past, this page reflects the current thinking and plans that have replaced the KE.

Are you building Google?
We are not building Google. We are improving the existing CirrusSearch infrastructure with better relevance and with multi-language and multi-project search, and we are incorporating new data sources for our projects. We want users to have a relevant and consistent experience across searches on both wikipedia.org and our project sites.

If you're adding new data sources, isn't that a search engine?
Yes, the data could potentially be used to evolve and improve the quality of our existing search experience. Our first new data source is OpenStreetMap data for Maps, which our Wikivoyage community is already starting to experiment with. There are other data sets that we could potentially surface (census, national gallery, etc.), but that will be up to our communities to decide. Some of these could certainly show up in search results, and we have Phabricator tasks around improving GeoData content (T112026). The goal is to expand the amount of knowledge available and to expand the context beyond just textual search. We want to begin by showcasing content from other wiki projects, including appropriate languages based on query input.

What licenses will those new data sources be under?
This will need more discussion, as we want to conform to the standards and policies of the Wikimedia projects these data sources would serve. Our first exploration was with OSM licensing and legal, and we'll want to learn from that in any further work.

Does that mean we are looking to shift search traffic away from third parties?
No. We love all the third-party traffic that we get and hope that it increases over time. What we are trying to focus on is providing a search experience that doesn't look like this:
 * 1) Search on Google, Bing, etc.
 * 2) Follow a Wikipedia link
 * 3) Read
 * 4) Leave and search Google, Bing, etc. again because you are specifically looking for a Wikipedia article but couldn't find it using CirrusSearch

What does your overall strategy look like?

 * Year 0 - Look inward and improve the search experience across our projects
 * Year 1 - Look outward and see if we can incorporate new data streams and public curation models for relevance

What does year 0 include?
We call year 0 "Discovery" because we are focused on learning about and understanding user pathways, and on building an appreciation for other knowledge sources.
 * 2015-16 Goals

What does year 1 include?
Potential ideas that we need your feedback on:
 * Identify pathways for the community to improve relevance via Wikidata
 * Actively highlight difficult-to-find knowledge and make it possible to surface it in search, reading, and editing flows
 * Research open sources of knowledge to continually strengthen the legitimacy of our content through curation by humans and machines

How does this align with strategy?

 * Relevance, accuracy, and trustworthiness ratings on indexed entities
 * Extended context to geospatial, temporal, multimedia and relational paths of knowledge
 * Display of inter-wiki projects (internal) and potentially open data sources
 * Opportunities for mobile, voice, and a modern, consistent interface
 * Multilingual experiences and results that are appropriate to each user's language and region

How do you know if we are succeeding for our users?

 * Qualitative: surveys, discovery@ mailing list, talk pages
 * Quantitative: http://discovery.wmflabs.org/

Will there be any element of human curation?

 * We'd like to explore this and need your help on our RFC to think through how to do it right

How can I help?

 * Join our mailing list
 * Post on our talk page
 * Review what we're working on and create tasks for us