Wikimedia Discovery/RFC

The purpose of this request for comments (RFC) is to open a broader discussion, with wider participation, about where Discovery should head next. A number of people have asked us what we're considering after this fiscal year, and we'd like this page to host a broad community discussion about where we can have an impact and where our communities need us most. The comments on this page will be taken into consideration when the Discovery Department does its quarterly and annual planning.

Incorporating New Data Sources
During 2015 the Discovery team added OpenStreetMap (OSM) as a new data source so that our editors on Wikivoyage can display maps and use them to surface Wikipedia content. We've seen a very positive response to the base tile set (cities, neighborhoods, countries, etc.), and recently we've also added transit locations such as ferries and boats. Going forward, we'd like to consider whether there are more data sources that we can make available to our community to better surface, and add to, our articles.

These could include (this is in no way a complete or committed list):
 * Trending articles
 * Public Census Data
 * Page View Data
 * Improving GeoData coverage
 * Archive.org book content
 * Relevantly licensed public content

We can see at least two approaches to this kind of content. The first would add these data sources to our existing Elasticsearch index; the second would surface these data sets much as we do with OSM, referencing them through on-wiki tools, extensions, etc. The goal would be to increase user engagement by better surfacing our existing articles through new discovery paths: maps, graphs, visuals, etc.

We want to be very careful not to bias our users' experiences with any kind of content, and to allow our communities to help steer this.

Open Questions

 * What data can we use to better surface our articles?
 * Should it be included in our Elasticsearch index, in a separate service, or both?

Public Curation of Relevance
Currently all relevance calculations are done as a black box by Elasticsearch. We'd like to explore a relevance model where Wikidata could be used as a component of our relevance calculations. This would not only leverage the high-quality data in Wikidata but could also empower our communities to affect relevance calculations rather than letting algorithms do all the work. As with any system that allows user contributions, we would have to be very alert to anyone gaming the system.
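
To make the idea easier to discuss, here is a minimal sketch of how a community-curated signal might be blended with a text-match score. Everything in it is an assumption made for illustration: the names `blended_score` and `rerank`, the `curated_boost` value (imagined as something derived from Wikidata), the `weight`, and the cap used to limit gaming are all hypothetical and do not describe how Elasticsearch or Wikidata behave today.

```python
def blended_score(text_score, curated_boost, weight=0.2, max_boost=1.0):
    """Combine a text-match score with a capped, community-curated boost.

    All parameters are hypothetical; this is not an existing search API.
    """
    # Clamp the curated signal so that no single item can dominate the ranking.
    capped = max(0.0, min(curated_boost, max_boost))
    return text_score * (1.0 + weight * capped)


def rerank(results, boosts, weight=0.2):
    """Re-order (title, text_score) pairs using hypothetical per-title boosts."""
    scored = [
        (title, blended_score(score, boosts.get(title, 0.0), weight))
        for title, score in results
    ]
    # Highest blended score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Capping the boost is one possible guard against gaming: however large the curated value becomes, it can raise a result's score by at most `weight * max_boost`.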

Open Questions

 * Could Wikidata handle an exponential increase in user queries as a relevance backend for our projects?
 * How would we build the feedback loop from adding an item in Wikidata to having it affect search results?
 * How would this be different than Google's Custom Search?

Improving Existing Multilingual / Cross-Project Search
During 2015 Q1 the Discovery team started exploring our zero results rate and found that between 5% and 10% of zero-result searches on English Wikipedia were searches for non-English terms that had matches on other wikis. We've started exploring what it would take to search across these languages and, in turn, across projects.
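
One way to picture the behaviour being explored is a fallback: query the wiki the user is on, and only when it returns nothing, try other wikis. The sketch below assumes a hypothetical `backends` mapping from a wiki code to a search function. None of these names correspond to an existing API, and the wiki codes are placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a per-wiki search call: query -> list of titles.
SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str,
    primary: str,
    backends: Dict[str, SearchFn],
    fallbacks: List[str],
) -> Dict[str, List[str]]:
    """Query the primary wiki; on zero results, try the fallback wikis in order."""
    hits = backends[primary](query)
    if hits:
        return {primary: hits}
    found: Dict[str, List[str]] = {}
    for wiki in fallbacks:
        other_hits = backends[wiki](query)
        if other_hits:
            # Keep results grouped by wiki; cross-wiki ranking is left open.
            found[wiki] = other_hits
    return found
```

The sketch deliberately returns results grouped by wiki rather than as one merged list, since how to rank results across projects is still an open question.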

Open Questions

 * If we have results from multiple projects, how do we rank them?