The Wikipedia Library/Search

The Wikipedia Library is integrating a search tool into the Library Card platform to enable users to search across the library's collections from one place. This project page summarises this work and will provide updates as it progresses.

Please leave comments, feedback, and questions on the talk page. Check out the Open questions section below for topics we're looking for your thoughts on.

Background
Users of The Wikipedia Library have access to content from more than 60 publishers, most holding numerous different collections of content. Available content totals more than 100,000 periodicals, comprising countless individual sources that editors may wish to access, in addition to books, data, and other sources.

In its current form, the Library directs users to each publisher’s website individually, where they can then use the unique search and discovery capabilities of that website to search across their content. This presents a number of challenges to users. They ideally need to know which publishers have the content they want to access before searching, and must navigate a new website interface for every publisher they access. Advanced searches or filters, such as date ranges, need to be re-entered on each website. We therefore require users to have a high level of research literacy to identify publishers with relevant content and to then potentially spend a long time searching before finding the right information. This leads to frustration and confusion.

We want to provide editors with an easy way to search across all of their available collections from a single location, removing the need to visit individual websites and allowing cross-cutting filtering. We will present Library Bundle content (for which users simply need to meet an automatically verified activity threshold) as the default results, ensuring that users can navigate the results with confidence. We will also index free-to-read content and provide links to open access versions where possible.

Building a cross-publisher search platform is well out of scope for our team, and is a problem that other organisations are already solving. Major search products are already being used by libraries around the world, including Primo, WorldCat, and EBSCO Discovery Service. These products provide fully fledged search platforms and index collections from publishers, keeping them up-to-date for libraries.

Previous discussions
Users have raised issues with the current workflows a number of times. Some relevant discussions and quotes:


 * "sometimes I can guess which collections are likely to have what I'm looking for. But quite often I have to step through many of them one by one in the hope of finding what I'm after. What would be very nice is to be able to search all the Library Card collections from a single point, like a meta-search facility."
 * https://en.wikipedia.org/wiki/Wikipedia_talk:The_Wikipedia_Library/A%E2%80%93Z
 * https://meta.wikimedia.org/wiki/Talk:The_Wikipedia_Library#'Search_partners'
 * https://de.wikipedia.org/wiki/Benutzer_Diskussion:Martin_Rulsch_(WMDE)#Wikipedia_Library_Nachklapp

EBSCO Discovery Service
We have a hosted instance of EBSCO Discovery Service (EDS) - a library holdings search platform - for this project. The platform was chosen for three primary reasons: our ongoing good relationship with EBSCO, a high level of customisability, and an interface with a substantial number of translated languages (~30 at the time of writing).

EDS can be configured to index the content The Wikipedia Library has access to through its partnerships. EBSCO keeps these databases up to date, so we only need to flag the collections we want to index and their contents will be updated automatically. EDS has a wide range of configurable settings for the interface presented to users. Most importantly, we can add custom JavaScript and CSS to the interface to customise the user experience in more detail.

EBSCO also makes available a range of EBSCO Apps - technical solutions that have been developed to support specific workflows and use cases and are made available to all libraries. So far we have installed the following apps:


 * Unpaywall, which adds links to open access versions of results.
 * Zotero, which allows users to save a result to the bibliographic management software Zotero.
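
To illustrate the kind of lookup that an open access link relies on, here is a minimal sketch that assumes Unpaywall's public REST API (v2) and its `is_oa` and `best_oa_location` response fields. It is not a description of how the EBSCO app is implemented, and the function names and sample DOI are hypothetical. The sketch only builds the request URL and reads a response that has already been fetched, so it makes no network calls.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Base URL of Unpaywall's public REST API (v2).
UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def build_unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI (hypothetical helper).

    Unpaywall asks callers to identify themselves with an email parameter.
    """
    return f"{UNPAYWALL_BASE}{quote(doi)}?{urlencode({'email': email})}"


def pick_open_access_link(record: dict) -> Optional[str]:
    """Return a free-to-read URL from an Unpaywall record, or None."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    # Prefer a direct PDF link, falling back to the general location URL.
    return location.get("url_for_pdf") or location.get("url")
```

A result with no open access version simply yields `None`, in which case no extra link would be shown.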

User stories

 * As a Library Card user, I want to search for content from all my collections in one place so I can find the right sources faster
 * As a Library Card user, I want to browse content from each collection I have access to in the same place so that I don’t need to learn and use multiple interfaces
 * As an experienced researcher, I want flexible filtering and advanced search options so that I can find the most suitable content
 * As a Library Card user with little research experience, I want guidance on how to use the interface so that I can research effectively
 * As a Library Card user, I want to see open access links so that I can add free-to-read links to Wikipedia articles
 * As a Wikipedia editor, I want to browse content available through The Wikipedia Library so that I can identify collections to apply for

Open questions

 * How do you use the library currently? Do you explore collections you're unfamiliar with, or stick to the ones you know?
 * Would you use a cross-publisher search tool instead of going to specific publishers' websites?
 * Do the user stories above capture what you want from this feature? Is anything missing?
 * Are there other EBSCO Apps you would find useful?

Designs
EBSCO Discovery Service comes with an out-of-the-box design which we would like to further customise. Many interface elements may not be needed, or could be confusing, and we want to ensure the design is consistent with the Library Card platform.

Design iterations will be posted here as we work on them.

Implementation
Technical integration details are tracked at https://phabricator.wikimedia.org/T240128 and its subtasks.

In the default view presented to users, we will only index content from the Library Bundle. This totals more than 60% of our content across ~25 collections, and we're looking to expand this further over time. While we would ideally index all of our content, we feel that this would lead to a confusing user experience, where users can't easily understand which results they do or don't have access to. Additionally, access methods would vary between results: some content would be reached through authentication-based access and other content through a publisher-specific method.

Users will have an option to browse all TWL content indexed in EDS, but individual results will not, at least in the initial deployment, highlight whether that content is accessible. This feature would be technically complex, so we will evaluate demand for it post-deployment.
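
The difference between the two views can be sketched as a simple filter. This is an illustration only, not the EDS configuration itself: the collection names, the `collection` field, and both function names are hypothetical.

```python
# Hypothetical Library Bundle collection names, for illustration only.
BUNDLE_COLLECTIONS = {"Example Bundle Journals", "Example Bundle Archive"}


def default_view(results, bundle_collections=BUNDLE_COLLECTIONS):
    """Keep only results from Library Bundle collections (the default view)."""
    return [r for r in results if r["collection"] in bundle_collections]


def browse_all(results):
    """Return every indexed result, with no indication of accessibility."""
    return list(results)


results = [
    {"title": "Article A", "collection": "Example Bundle Journals"},
    {"title": "Article B", "collection": "Example Non-Bundle Publisher"},
]
bundle_only = default_view(results)
everything = browse_all(results)
```

Here `bundle_only` contains just "Article A", while `everything` contains both results without marking which of them the user can open.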

Updates
We carried out substantial groundwork in advance of this project page, so the following timeline starts after we identified EDS as our search platform but before we began design and implementation in earnest.

Week starting...

11th January 2021

 * We finished this project page!
 * We're working on finalising the EDS configuration (T269932), which includes speaking to publishers about exactly which collections they've provided to us so that search results are accurate.