Wikimedia Discovery/FAQ

What is this grant for?
The Knight Foundation has awarded the Wikimedia Foundation an exploratory grant to research and evaluate ways to measure and improve search results on Wikimedia projects.

This is a restricted grant, and the funds may only be used by the Discovery team for the deliverables specified in the grant.

This grant does not increase the team's budget for this fiscal year.

Links:


 * Joint press release - Knight Foundation web site, 2016-01-06
 * "Exploring how people discover knowledge on Wikipedia and its sister projects" - Knight Foundation blog, 2016-01-06
 * "Wikimedia Foundation to explore new ways to search and discover reliable, relevant, free information with $250,000 from Knight Foundation" - Wikimedia Foundation blog, 2016-01-06
 * Grant announcement - Knight Foundation web site, date unknown, identifying the grant period as running from 2015-09-01 to 2016-08-31

Who is the Knight Foundation?
The Knight Foundation is a philanthropic organization dedicated to supporting media and journalism and to fostering communities and the arts.

As part of their efforts, they have previously given grants to the Wikimedia Foundation to support Wikipedia Zero.

What are the deliverables?

 * User testing and research on current user flows to understand the search and discovery experience
 * Creation and maintenance of a dashboard of core metrics to use in product development
 * Research on search relevancy and the possibility of integrating open data sources
 * Open discussion with the Wikimedia community of volunteer editors
 * Creation of sample prototypes to showcase discovery possibilities

Overall: conducting research around search and prototyping for a more effective search experience.

What are the problems you are trying to solve?
People often use external search services to find Wikimedia and Wikipedia content.

The problem is that people tend to go back to the external service for additional searches, even though they're already on Wikipedia.

The Discovery Department wants to create a better experience for the users of the sites by creating more accessible searching and discovery mechanisms than are presently available on our sites.

There is also an opportunity to explore surfacing information from sister projects to enhance the discovery of that knowledge for projects that have less visibility.

Lastly, the team is providing a foundation for product development that is both data driven and user driven, so that we can iterate toward the services and features our users want and need.

What are you trying to understand?
The Discovery Department tracks four core metrics (also known as key performance indicators) for search:


 * 1) Zero results rate for search. If users receive no results, it means we've not been able to help find what they're looking for, so we measure the zero results rate.
 * 2) User engagement with search results. If users do not click on results, then we haven't given them the results they wanted.
 * 3) Search latency. The faster our search works, the better.
 * 4) API use. It's important that apps and third parties can search our site too.

You can see the full range of metrics that we track on the [http://discovery.wmflabs.org/metrics/ Discovery Department's search dashboard].
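
As a rough illustration of how the first three metrics could be computed from a log of searches, here is a minimal sketch in Python. The `SearchEvent` record and its fields are hypothetical, made up for this example. They are not the Discovery Department's actual logging schema, and the dashboard may define each metric differently.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One search request (hypothetical record, not a real schema)."""
    result_count: int   # number of results returned
    clicked: bool       # whether the user clicked any result
    latency_ms: float   # time taken to serve the results


def zero_results_rate(events):
    """Fraction of searches that returned no results."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.result_count == 0) / len(events)


def clickthrough_rate(events):
    """Fraction of searches with results where the user clicked one."""
    with_results = [e for e in events if e.result_count > 0]
    if not with_results:
        return 0.0
    return sum(1 for e in with_results if e.clicked) / len(with_results)


def median_latency_ms(events):
    """Median time to serve a search, in milliseconds."""
    if not events:
        return 0.0
    return statistics.median(e.latency_ms for e in events)


# Made-up sample data, for illustration only.
sample = [
    SearchEvent(result_count=0, clicked=False, latency_ms=100),
    SearchEvent(result_count=5, clicked=True, latency_ms=200),
    SearchEvent(result_count=3, clicked=False, latency_ms=300),
    SearchEvent(result_count=0, clicked=False, latency_ms=400),
]
print(zero_results_rate(sample))   # 0.5
print(clickthrough_rate(sample))   # 0.5
print(median_latency_ms(sample))   # 250.0
```

In this sketch, clickthrough is measured only over searches that returned results, so that a high zero results rate does not also drag down the engagement figure. That is one possible design choice, not necessarily the one used on the dashboard.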

What happens afterwards?
The team and users will post ideas for deliberation, and will collectively come up with proposals.


 * Wikimedia Discovery/RFC

How will this affect other products that the Wikimedia Foundation is developing?
The research will improve our understanding of search and discovery mechanisms across all platforms, and of user flows from reading to editing. It will inform decisions on how to improve those mechanisms on desktop, mobile web, and mobile apps, as well as in specific products like VisualEditor.

We are also exploring API usage, best practices, mixing in content from sister projects like Wiktionary, Wikivoyage, Wikimedia Commons and more, and using open data sources like OpenStreetMap to expand contextual knowledge discovery.

We will, of course, be publishing our research, so that it may be read and taken into account by the broader movement and other interested parties.

What is the Knowledge Engine?
"Knowledge Engine" was an early term used to describe a number of initiatives related to search and discovery of content across Wikimedia projects. The Knight Foundation referenced it under [http://www.knightfoundation.org/grants/201551260/ "What we fund / Journalism / Knowledge Engine By Wikipedia"], which stated: "To advance new models for finding information by supporting stage one development of the Knowledge Engine by Wikipedia, a system for discovering reliable and trustworthy public information on the Internet."

There were requests to publish the details of the grant, and Lila Tretikov [https://meta.wikimedia.org/w/index.php?title=User_talk:LilaTretikov_(WMF)&oldid=15294374#Knowledge_Engine_grant shared her thoughts and the grant's activities and outcomes]. The Knight Foundation agreed to the publication of [https://upload.wikimedia.org/wikipedia/foundation/a/a7/Knowledge_engine_grant_agreement.pdf the grant application] in February 2016.

As the concept evolved, it included a wide variety of ideas, many of which ended up being discarded. Although the original grant referred to "Knowledge Engine by Wikipedia" as "the Internet's first transparent search engine," the term "Knowledge Engine" was also used as shorthand for what the Discovery team was focusing on, rather than for a new project. The Wikimedia Foundation stopped using the term as of Q3 FY15-16 because it caused confusion.

Regardless of what "Knowledge Engine" may have meant to different people at different times in the past, this page reflects the current thinking and plans, as understood by the Discovery department.

Are you building a search engine to replace Google?
We are not building a general-purpose search engine to replace Google. We are improving the existing CirrusSearch infrastructure with better relevance and with multi-language and multi-project search, and incorporating new [https://maps.wikimedia.org data sources] for our projects. We want a relevant and consistent experience for users across searches for both wikipedia.org and our project sites. Looking farther forward, we will explore including other sources of open knowledge. We remain fully committed to the movement's vision and values.

Could search results in Wikipedia include more information from its sister projects?
The Wikimedia movement's vision is to make the sum of all human knowledge freely available to everyone. Wikipedia is our largest and most well-known project, but there are many other projects like Wikimedia Commons and Wikidata that move us towards making our vision happen. These projects have millions of users every month! So, can we make a search system that is good, meets the needs of our users, and shows content from around our movement? We think people would use it. It would also bring more attention to the great work in projects across the movement by using one of the larger, more well-known places people visit.

If you're adding new data sources, isn't that a search engine?
The goal is to expand the amount of knowledge and expand the context beyond just textual search. We want to begin by showcasing content from other wiki projects including appropriate languages based on query input. The data could be used to potentially evolve and improve the quality of our existing search experience.

Our first new data source outside of Wikimedia projects is OpenStreetMap data for Maps, which our Wikivoyage community is already starting to experiment with. There are other data sets that we could potentially surface (census, national gallery, etc.) but that will be up to our communities to decide. Some of these could certainly show up in search results, and we have Phabricator tasks around improving GeoData content (T112026).

Is the Wikimedia Foundation looking into replacing editors with robots?
No. We think technologies like machine learning and similar tools can help with aggregation of all the rich content humans have created across our projects, much like the work our colleagues have done with ORES in improving the quality of article content.

At no point are we trying to replace or subvert the work of our human editors. We want to figure out smarter ways to return search results that answer visitors' questions - even when those searches currently return zero results. Imagine in the future searching for something we don't have an article for in a particular language Wikipedia - but we do have books in Wikisource, or quotes in Wikiquote, or photos in Commons. Wouldn't it be great to have a link to those items in search results instead of nothing?

What licenses will those new data sources be under?
This will need more discussion, as we want to conform to the standards and policies of the Wikimedia projects these data sources would serve. Our first exploration was with OSM licensing and legal (T105090), and we'll want to learn from that in any further work.

Does that mean we are looking to shift search traffic away from third parties?
No. We love all the [http://discovery.wmflabs.org/external/ third party traffic] that we get and hope that it increases over time. What we are trying to focus on is providing a search experience that doesn't look like this:


 * 1) Search on Google, Bing, etc
 * 2) Follow Wikipedia Link
 * 3) Read
 * 4) Leave and search Google, Bing, etc again because you are specifically looking for a Wikipedia article but couldn't find it using CirrusSearch

What does your overall strategy look like?



 * Year 0 - Look inward and improve the search experience across our projects
 * Year 1 - Look outward and see if we can incorporate new data streams and public curation models for relevance

What does year 0 include?
We call year 0 Discovery because we are focused on learning and understanding user pathways and appreciation for other knowledge sources.
 * 2015-16 Goals

What does year 1 include?
Potential ideas that we need your feedback on:


 * Identify pathways for the community to improve relevance via Wikidata
 * Actively highlight difficult to find knowledge and empower the ability to surface it in search, reading and editing flows
 * Research open sources of knowledge to continually strengthen the legitimacy of our content through curation by humans and machines

This feels like a huge long term project. Is it?
Our users interested in search request a lot of improvements: inter-wiki search (T87632), multilingual search (T104984), media search (T95223, T104565), improving UX, improving search relevance, and others. The Discovery Department aims to improve search in these areas, and that will take a lot of time! During this process, we will continually re-evaluate our plans on a quarterly and annual basis to assess our impact and hold ourselves to the same standards as any other team at the Wikimedia Foundation.

How does this align with strategy?

 * Relevancy, accuracy and trustworthy ratings on index entities
 * Extended context to geospatial, temporal, multimedia and relational paths of knowledge
 * Display Inter-wiki projects (internal) and potentially open data sources
 * Mobile, voice, and modern consistent interface opportunity
 * Multiple-lingual and global respective experiences and results

Is all your work open source?
Yes. All of our code is contained in public repositories, and falls under the same licensing as MediaWiki.

How do you know if we are succeeding for our users?

 * Qualitative: surveys, discovery@ mailing list, talk pages
 * Quantitative: http://discovery.wmflabs.org/

Will there be any element of human curation?

 * We'd like to explore this and need your help on our RFC to think through how to do it right

I want to help! How can I get involved?
We'd love the input of anyone who wants to join us in building and improving search. Here's how you can do that:


 * Join our mailing list
 * Post on our talk page
 * Review what we're working on and create tasks for us