Wikimedia Discovery/FAQ/en

What is the Knowledge Engine?
"Knowledge Engine" was an early term for a number of initiatives related to search and discovery of content across Wikimedia projects. The Knight Foundation listed it under "What we fund / Journalism / Knowledge Engine By Wikipedia", with the stated purpose: "To advance new models for finding information by supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet."

As the concept evolved, it included a wide variety of ideas, many of which ended up being discarded. Although the original grant referred to "Knowledge Engine by Wikipedia" as "the Internet's first transparent search engine," the term "Knowledge Engine" was also used as shorthand for what the Discovery team was focusing on, rather than for a new project. The Wikimedia Foundation stopped using the term as of Q3 FY15-16 because it caused confusion.

Regardless of what "Knowledge Engine" may have meant to different people at different times in the past, this page reflects the current thinking and plans, as understood by the Discovery department.

Are you building a search engine to compete with Google?
No. As we said when this FAQ was created in early November 2015, we are not creating a general internet search engine. The team has been tasked with improving the search function for Wikimedia sites.

Could search results in Wikipedia include more information from its sister projects?
The Wikimedia movement's vision is to make the sum of all human knowledge freely available to everyone. Wikipedia is our largest and best-known project, but many other projects, like Wikimedia Commons and Wikidata, also move us towards that vision. These projects have millions of users every month! So, can we make a search system that is good, meets the needs of our users, and shows content from around our movement? We think people would use it. It would also bring more attention to the great work in projects across the movement by using one of the larger, better-known places people visit.

If you're adding new data sources, isn't that a search engine?
No, not if you define "search engine" as including a web crawler that indexes the whole web, which is the most common definition.

What licenses will those new data sources be under?
This will need more discussion, as we want to conform to the standards and policies of the Wikimedia projects that the new data sources would serve. Our first exploration dealt with OSM (OpenStreetMap) licensing and legal matters, and we want to learn from that in any further work.

Does that mean you are looking to shift search traffic away from third parties?
We love all the third-party traffic that we get, and we hope that it increases over time.


 * 1) Search on Google, Bing, etc.
 * 2) Follow a Wikipedia link
 * 3) Read
 * 4) Leave and search Google, Bing, etc. again, because you are looking for a specific Wikipedia article but couldn't find it using CirrusSearch

What does your overall strategy look like?



 * Year 0 - Look inward and improve the search experience across our projects
 * Year 1 - Look outward and see if we can incorporate new data streams and public curation models for relevance

What does year 0 include?
We call year 0 "Discovery" because we are focused on learning about user pathways and gaining an appreciation for other knowledge sources.
 * 2015-16 Goals

What does year 1 include?
Potential ideas that we need your feedback on:


 * Identify pathways for the community to improve relevance via Wikidata
 * Actively highlight difficult-to-find knowledge and make it possible to surface it in search, reading, and editing flows
 * Research open sources of knowledge to continually strengthen the legitimacy of our content through curation by humans and machines

This feels like a huge long-term project. Is it?
Users interested in search request a lot of improvements: interwiki search, multilingual search, media search [1][2], better UX, better search relevance, and others. The Discovery Department aims to improve search in these areas, and that will take a lot of time! During this process, we will re-evaluate our plans quarterly and annually to assess our impact and hold ourselves to the same standards as any other team at the Wikimedia Foundation.

How does this align with strategy?

 * Relevancy, accuracy, and trustworthiness ratings on indexed entities
 * Extended context for geospatial, temporal, multimedia, and relational paths of knowledge
 * Display of interwiki projects (internal) and potentially open data sources
 * Mobile, voice, and modern, consistent interface opportunities
 * Multilingual and global respective experiences and results

Is all your work open source?
Yes. All of our code is contained in public repositories, and falls under the same licensing as MediaWiki.

How do you know if you are succeeding for your users?

 * Qualitative: surveys, the discovery@ mailing list, talk pages
 * Quantitative: http://discovery.wmflabs.org/

Will there be any element of human curation?

 * We'd like to explore this and need your help on our RFC to think through how to do it right

I want to help! How can I get involved?
We'd love the input of anyone who wants to join us in building and improving search. Here's how you can do that:


 * Join our mailing list
 * Post on our talk page
 * Review what we're working on and create tasks for us