Extension:CirrusSearch/Query Construction/Use cases

From mediawiki.org

The question to answer here is: how may an extension augment CirrusSearch to let it search the data (new ContentType/metadata) provided by the extension in an efficient way?

Use cases

  1. As an extension developer I want to customize everything related to the search query for a specific namespace/query
  2. As an extension developer I don't want to care about searches in other namespaces, I assume CirrusSearch will provide good defaults
  3. As an extension developer I want my search query builders to be used if the namespaces requested are part of the namespaces I support
  4. As an extension developer I don't want another extension to hijack my search query builders
  5. As a client of the search API I want to list/count all pages that match my search query and I sometimes don't really care about the ideal search query builders/results ranking
  6. As a client of the search API I want to be able to do everything that is done in Special:Search
  7. As an extension developer I want to customize the content/information displayed on every search result
  8. As a user of the search UI I want all metadata shown in the results to be clear and obvious about what they relate to
  9. As a user of the search UI I want the default search settings to be the best settings available
  10. As a user of the search UI I want the best settings to be easily available and/or to have messages that indicate that better options are available

Possible solutions

A dispatch service in CirrusSearch that selects one best set of builders

This is the first solution attempted in Gerrit change 491815. Cirrus applies a set of routes and keeps the best one.
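
The "apply a set of routes and keep the best one" idea can be sketched as follows. This is a minimal illustration, not the code from the Gerrit change: the `Route` class, its scoring rule, the profile names and the namespace numbers are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Route:
    """Hypothetical route: a builder profile plus the namespaces it supports."""
    profile: str
    namespaces: FrozenSet[int]  # an empty set marks the catch-all default route

    def score(self, requested: FrozenSet[int]) -> float:
        if not self.namespaces:
            return 0.1  # the default route always matches, with low priority
        # A specialised route matches only if it supports every requested namespace.
        if requested and requested <= self.namespaces:
            return 1.0
        return 0.0


def dispatch(routes: List[Route], requested: FrozenSet[int]) -> str:
    """Score every route against the request and keep the single best one."""
    return max(routes, key=lambda route: route.score(requested)).profile


# Illustrative configuration: one default route and one extension route.
routes = [
    Route("default", frozenset()),
    Route("entity", frozenset({0, 120})),
]

only_entities = dispatch(routes, frozenset({0}))
mixed_request = dispatch(routes, frozenset({0, 2}))
```

Under these assumptions `only_entities` is `"entity"`, while `mixed_request` silently falls back to `"default"`, which is the situation described for use case 10 below.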

This solution breaks:

  • Use case 10: selecting incompatible namespaces yields a sub-optimal search experience without any notice to the user.
  • Use case 9: the outcome depends solely on the wiki configuration; the namespaces searched by default must be in line with the available profiles.

Pros:

  • Easy to implement
  • No UI changes

Cons:

  • Some use cases not respected

UPDATE: After analyzing one month of search data on Wikidata, we see:

Search type                                Number of searches   Share
Content (entity search or entity+lexeme)   101,973,336          99.94%
Mixed (entity + classic page search)       24,771               0.02%
Classic page search                        13,067               0.01%
Everything                                 10,449               0.01%
Files                                      8,812                0.01%

This means that the vast majority of searches use the default settings, so our fear that users could be using improper search settings is not a blocker (searches where use case 10 is broken account for only 0.02% of the total). I think it's fine to pursue the approach described in this solution, as it fits the current use cases we serve on WMF wikis.
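
As a sanity check, the shares can be recomputed from the raw counts reported above. The snippet only uses the published numbers; the dictionary keys are labels chosen for the example.

```python
# Search counts for one month on Wikidata, copied from the figures above.
counts = {
    "content": 101_973_336,
    "mixed": 24_771,
    "classic page search": 13_067,
    "everything": 10_449,
    "files": 8_812,
}

total = sum(counts.values())

# Share of each search type, as a percentage rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```

The recomputed shares agree with the reported ones: 99.94% for content searches and 0.02% for mixed searches.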

Cirrus is able to combine multiple query builders in a single search query

This is a hypothetical solution in which Cirrus would find a way to combine all the best query builders in a single query. The result sets would contain mixed types of results.

This solution breaks:

  • Use case 7 or 8, because we provide different kinds of results in the same set.

Pros:

  • May be able to use optimal ranking in all cases

Cons:

  • Some use cases not respected
  • Extremely hard to implement (combine multiple queries with score normalization issues)
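
To illustrate why score normalization makes this hard, here is a small sketch with made-up scores. It assumes two builders whose scores live on unrelated scales. The titles, the scores and the choice of min-max normalization are all hypothetical and are not something CirrusSearch is known to do.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (title, score)


def merge_by_score(*result_sets: List[Hit]) -> List[str]:
    """Concatenate result sets and sort by score, highest first."""
    merged = [hit for results in result_sets for hit in results]
    return [title for title, _ in sorted(merged, key=lambda h: h[1], reverse=True)]


def min_max_normalize(results: List[Hit]) -> List[Hit]:
    """Rescale the scores of one result set to the [0, 1] range."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid dividing by zero when all scores are equal
    return [(title, (score - low) / span) for title, score in results]


# Made-up results: the two builders score on very different scales.
entity_hits = [("Q42", 950.0), ("Q5", 400.0), ("Q1", 100.0)]
page_hits = [("Help:Contents", 12.0), ("Project:Chat", 9.0), ("Help:Search", 3.0)]

raw_order = merge_by_score(entity_hits, page_hits)
normalized_order = merge_by_score(
    min_max_normalize(entity_hits), min_max_normalize(page_hits)
)
```

With raw scores every entity outranks every page. After normalization the lists interleave, but only because the top hit of each builder is forced to 1.0, whether or not the two hits are equally relevant.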

Current solution

The current solution is to implement a hook that allows an extension to override the current query builder on the fly.

This solution breaks:

  • Use case 4: extensions may compete with each other in a manner that is almost impossible to control (hook registration order)
  • Use case 5: an extension may decide arbitrarily that its namespace is better than the others and exclude them from the search query
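
The registration-order problem can be shown with a toy model of a hook. This is a sketch only: the handler names, the dictionary used as a search context and the `run_hook` function are hypothetical and do not reflect the actual MediaWiki hook API.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Handler = Callable[[Context], None]


def extension_a(context: Context) -> None:
    # Hypothetical handler: claims any search that includes namespace 0.
    if 0 in context["namespaces"]:
        context["builder"] = "extension_a_builder"


def extension_b(context: Context) -> None:
    # A second hypothetical handler that claims the same namespace.
    if 0 in context["namespaces"]:
        context["builder"] = "extension_b_builder"


def run_hook(handlers: List[Handler], namespaces: List[int]) -> str:
    """Call every handler in registration order; each may overwrite the builder."""
    context: Context = {"namespaces": namespaces, "builder": "default_builder"}
    for handler in handlers:
        handler(context)
    return context["builder"]


a_then_b = run_hook([extension_a, extension_b], [0])
b_then_a = run_hook([extension_b, extension_a], [0])
```

In this model the handler registered last always wins: `a_then_b` is `"extension_b_builder"` and `b_then_a` is `"extension_a_builder"`, so neither extension can protect its builder from the other.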

Pros:

  • Works for Wikidata so far

Cons:

  • Some use cases not respected
  • Prevents further refactoring in Cirrus as it relies on manipulating the SearchContext

A dispatch service that selects multiple best routes with multiple types of search results (e.g. tabs/sections)

One line of reasoning is: if the problem is a limitation of the backend, which cannot adapt to the current search UX, then we should focus on the solution where Cirrus combines everything in a single query and a unified list of search results.

Alternatively, we can see the problem as tightly coupled with the UX, which leads to this solution.

The root of this solution is the following: when different content types are mixed in a wiki, and the search system has specific tunings for these content types (through custom extensions), the UX should be aware of this and should not assume that the backend can only send a single unified list of results.

If the UI wants to take advantage of the best possible options provided by the search backend it must adapt itself in such a way that the user is aware that multiple ranked lists of search results are available.
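
A rough sketch of such a dispatch, under the assumption that each route owns a set of namespaces and that anything unclaimed goes to a default route. The function name, the profile names and the namespace numbers are illustrative.

```python
from typing import Dict, List, Set


def dispatch_sections(
    routes: Dict[str, Set[int]], requested: Set[int]
) -> Dict[str, List[int]]:
    """Split the requested namespaces into one section per matching route."""
    sections: Dict[str, List[int]] = {}
    remaining = set(requested)
    for profile, supported in routes.items():
        matched = remaining & supported
        if matched:
            sections[profile] = sorted(matched)
            remaining -= matched
    if remaining:
        # Namespaces that no extension claims use the default builders.
        sections["default"] = sorted(remaining)
    return sections


# Illustrative routes registered by two hypothetical extensions.
routes = {"entity": {0, 120}, "lexeme": {146}}

sections = dispatch_sections(routes, {0, 2, 146})
```

Here `sections` maps `"entity"` to `[0]`, `"lexeme"` to `[146]` and `"default"` to `[2]`. Each section would be queried and ranked with its own builders, and the UI would show them as separate tabs or sections instead of one merged list.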

Pros:

  • May allow us to support all use cases (assuming changes in the APIs)

Cons:

  • Does not fit into the current UX
