Wikimedia Discovery/RFC

From mediawiki.org

The purpose of this request for comments is to invite broader discussion of, and participation in, what the Discovery Department works on next. A number of people have asked what we are considering after this fiscal year, and we would like to use this page to encourage broad community discussion about where we can have the greatest impact and where the community needs us most. The comments on this page will be taken into consideration when the Discovery Department undertakes its quarterly and annual planning process.

Incorporating New Data Sources

During 2015, the Discovery team added OpenStreetMap (OSM) as a new data source, so that Wikivoyage editors can display maps and use them to show Wikipedia content. We have seen a very positive response to the base tile set (cities, neighborhoods, countries, etc.), and we recently added transit locations (ferries, boats, etc.). In the future we hope to identify more data sources that we can make available to the community.

These could include (this is neither a complete nor a committed list):

  • "Trending" articles
  • Public Census Data
  • Page View Data
  • Improving GeoData coverage
  • Archive.org book content
  • Appropriately-licensed public content

We can see at least two approaches to this kind of content. The first would add these data sources to our existing Elasticsearch index; the second would display these data sets in a manner similar to OSM. They could then be accessed through on-wiki tools, extensions, etc. The goal would be to increase user engagement by improving the discoverability of existing articles through new methods: maps, graphs, visualizations, etc.
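As a rough illustration of the first approach, the sketch below builds an Elasticsearch query body that blends ordinary text relevance with a popularity signal stored alongside each document. It uses the standard `function_score` query with `field_value_factor`. The field names `text` and `page_views`, the function name, and the weight are assumptions made for this example, not a description of the current index.

```python
def build_popularity_query(search_text, popularity_field="page_views", weight=0.5):
    """Return a query body that adds a dampened popularity signal to the text score.

    Assumptions (hypothetical): documents have a "text" field to match against
    and a numeric field, named by `popularity_field`, holding e.g. page views.
    """
    return {
        "query": {
            "function_score": {
                # Ordinary full-text relevance.
                "query": {"match": {"text": search_text}},
                # log1p dampens the signal so very popular pages do not swamp
                # the text score; documents missing the field are treated as 0.
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                    "factor": weight,
                    "missing": 0,
                },
                # Add the popularity value to the text score instead of multiplying.
                "boost_mode": "sum",
            }
        }
    }


query_body = build_popularity_query("ferry terminal")
```

Keeping the signal additive and log-scaled is only one possible choice; how strongly any new data source should influence ranking is exactly the kind of question this RFC leaves open.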

We want to be careful not to bias our users' experiences with any kind of content, and to allow our communities to help steer this work.

Open Questions

  • What types of data can we use to improve the discoverability of articles?
  • Should that data be included in the Elasticsearch index, offered as a separate service, or both?

Public Curation of Relevance

Currently, all "relevance" calculations are done as a black box by Elasticsearch. We would like to explore a model where Wikidata could be used as a component of our relevance calculations. This would not only take advantage of the high-quality information in Wikidata but could also empower our communities to affect relevance calculations rather than letting algorithms do all the work. As with any system that allows user contributions, we would have to be cognizant of potential methods for abusing the system.
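One way to picture community-curated relevance is as a re-ranking step applied on top of the scores the search engine already returns. The sketch below is purely illustrative: the function name, the shape of `curated_boosts`, and the idea that such boosts would be derived from Wikidata are all assumptions. The clamp is one simple guard against the kind of abuse mentioned above, since it stops any single curated value from dominating the ranking.

```python
def rerank_with_curation(hits, curated_boosts, max_boost=2.0):
    """Re-rank search hits using community-curated multipliers.

    hits: list of (title, engine_score) pairs from the search engine.
    curated_boosts: dict mapping title -> multiplier (hypothetically derived
        from community-maintained data such as Wikidata).
    max_boost: upper bound on any multiplier; the lower bound is 1 / max_boost.
    """
    reranked = []
    for title, score in hits:
        boost = curated_boosts.get(title, 1.0)  # no curation means no change
        # Clamp the multiplier to limit the effect of any single edit.
        boost = max(1.0 / max_boost, min(max_boost, boost))
        reranked.append((title, score * boost))
    # Highest adjusted score first.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


hits = [("Paris, Texas", 4.0), ("Paris", 3.5)]
print(rerank_with_curation(hits, {"Paris": 1.5}))
# [('Paris', 5.25), ('Paris, Texas', 4.0)]
```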

Open Questions

  • Could Wikidata handle an exponential increase in user queries as a relevance backend for our projects?
  • How would we build the feedback loop from adding an item in Wikidata to having it affect search results?
  • How would this be different from Google's Custom Search?

Improving existing multilingual / multi-project search

In 2015 Q1, the Discovery team started exploring our "zero results" rate. We found that between 5% and 10% of zero-result searches on English Wikipedia were non-English queries that did have matches on other wikis. We have started exploring what it would take to search across languages and across sister projects.

Open Questions

  • If we have results from multiple projects (e.g., English Wikipedia, French Wikipedia, and Italian Wiktionary) how do we rank them?
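To make the ranking question concrete, here is one naive option, offered as a discussion aid and not as a proposal. Raw scores from separately built indexes are generally not directly comparable, so this sketch rescales each project's scores to the range 0 to 1 (min-max normalization) before merging. The function name and input shape are assumptions made for this example.

```python
def merge_project_results(results_by_project):
    """Merge per-project hits into one list ranked by normalized score.

    results_by_project: dict mapping a project name (e.g. "enwiki") to a list
        of (title, raw_score) pairs as returned by that project's search.
    Returns a list of (project, title, normalized_score), best first.
    """
    merged = []
    for project, hits in results_by_project.items():
        if not hits:
            continue  # a project with zero results contributes nothing
        scores = [score for _, score in hits]
        low, high = min(scores), max(scores)
        span = high - low
        for title, score in hits:
            # If every hit has the same score, treat them all as top hits.
            normalized = (score - low) / span if span else 1.0
            merged.append((project, title, normalized))
    # sorted() is stable, so ties keep the order in which projects were given.
    return sorted(merged, key=lambda item: item[2], reverse=True)


merged = merge_project_results({
    "enwiki": [("A", 10.0), ("B", 5.0)],
    "frwiki": [("C", 2.0), ("D", 1.0), ("E", 1.5)],
})
print([title for _, title, _ in merged])
# ['A', 'C', 'E', 'B', 'D']
```

A known weakness of this approach is that the best hit of every project is treated as equally good, even when one project's best match is poor. That is part of why the ranking question remains open.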