Wikimedia Discovery/So Many Search Options

December 2016 — See TJones_(WMF)/Notes for other projects.

Background
We’d always prefer providing some sort of search results to users rather than giving no results. And if we can’t give results, helpful suggestions and links are a better alternative than no results at all.

Right now we have, or have in the works, a large enough number of search modifications, extensions, and alternatives that we need to think clearly and carefully about how to order them and how to let them interact.

Current or Near Term Options
A brief summary of the features follows. ''* Items marked with an asterisk depend on search results being “poor” enough to trigger an alternative approach, which is not defined for all future projects. Common criteria are fewer than 3 results, or no results.''
 * Question mark stripping: ? characters are removed unless they are escaped with a backslash (\?), because most people use them when asking questions, not as one-character wildcards. This happens before searching.
 * ASCII/ICU folding, stemming, case folding, etc.: This happens right before search and is done by Elasticsearch as part of the language analysis step. It’s worth mentioning explicitly because we could theoretically use components of this type outside Elastic at some point. Currently, characters are mapped to other characters (e.g., upper case is converted to lower case, and some apostrophe-like marks are converted to apostrophes), words are reduced to their approximate roots (run, running, ran, and runs all become run), etc.
 * Inter-wiki / Cross-project searching: On Wikipedias, provide one result from each sister project in the same language, if projects and results are available.
 * Did You Mean (DYM) Spelling suggestions: If search terms don’t look very likely, and another similar term does, provide a clickable link with those changes made. If the original query gave zero results, go ahead and try the suggested search.
 * Quote stripping: If a query has quotes and does poorly,* try the query again without the quotes.
 * Language detection / identification (TextCat / cross-language searching): If a query has fewer than 3 results, do language detection on it. If the language detected is not the “host” language (the language of the current wiki), try to get results from the corresponding project in that language, if it exists, and show any results.
 * Wrong keyboard detection: If the query does poorly,* use the same technique as language detection to detect when a user has typed a query in one language (e.g., Russian) while using the keyboard layout of another language (e.g., English). This can be run concurrently with language detection, or separately. If a non-host language is detected, convert the query to the correct keyboard layout and run it again.
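
A few of the query rewrites above are simple enough to sketch in code. The following Python is only an illustration under stated assumptions, not the actual search implementation: the function names are hypothetical, escaped question marks are assumed to be passed through unchanged, and the keyboard map covers only lowercase letters on a standard QWERTY to ЙЦУКЕН layout pairing.

```python
import re

# All names below are hypothetical and exist only for this illustration.

def strip_question_marks(query: str) -> str:
    """Remove every '?' that is not immediately preceded by a backslash."""
    return re.sub(r"(?<!\\)\?", "", query)

def strip_quotes(query: str) -> str:
    """Drop double quotes so a poorly performing phrase query can be retried."""
    return query.replace('"', "")

# Assumed mapping from QWERTY key positions to the Russian layout (lowercase only).
_LATIN_KEYS = "qwertyuiop[]" + "asdfghjkl;'" + "zxcvbnm,."
_CYRILLIC_KEYS = "йцукенгшщзхъ" + "фывапролджэ" + "ячсмитьбю"
_TO_CYRILLIC = str.maketrans(_LATIN_KEYS, _CYRILLIC_KEYS)

def latin_to_cyrillic_keyboard(query: str) -> str:
    """Re-interpret keystrokes typed on an English layout as Russian ones."""
    return query.translate(_TO_CYRILLIC)

print(strip_question_marks("what is a cat?"))   # what is a cat
print(strip_quotes('"flotsam and jetsam"'))     # flotsam and jetsam
print(latin_to_cyrillic_keyboard("ghbdtn"))     # привет
```

In practice the keyboard conversion would only be attempted after detection suggests the converted text looks more like the target language than the original does.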

Stopping Criteria
Ideally, we would run everything, see which option gives the best results, and show those, but realistically, that’s probably too expensive. So it makes sense to order the options carefully and thoughtfully, and to consider stopping criteria. Potential stopping criteria include:
 * a certain amount of time has gone by or CPU has been used
 * a certain number of options have been tried (they don’t all have the same initial criteria, so aren’t all eligible to run on every query, and different options could be weighted based on the cost of running them, too)
 * an option achieves “success” (e.g., returns a certain number of results).
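
As a rough sketch of how these three criteria might be checked together, consider the Python below. The `Budget` class, the `should_stop` function, and every default threshold are assumptions made for illustration; none of them come from an existing implementation, and the cost units are arbitrary weights.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Budget:
    # All defaults are illustrative placeholders, not measured or agreed values.
    max_seconds: float = 0.5   # time allowed for extra search options
    max_cost: int = 5          # cap on the summed weights of options tried
    min_results: int = 3       # result count that counts as "success"

def should_stop(elapsed_seconds: float, cost_spent: int,
                result_count: int, budget: Budget) -> Optional[str]:
    """Return the reason to stop trying options, or None to keep going."""
    if result_count >= budget.min_results:
        return "success"
    if elapsed_seconds >= budget.max_seconds:
        return "time"
    if cost_spent >= budget.max_cost:
        return "cost"
    return None

print(should_stop(0.1, 1, 0, Budget()))   # None
print(should_stop(0.1, 1, 4, Budget()))   # success
print(should_stop(0.7, 1, 0, Budget()))   # time
print(should_stop(0.1, 6, 0, Budget()))   # cost
```

Summing weights rather than counting options is one way to let expensive options use up the budget faster than cheap ones.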

Initial Straw Man Proposal
Based on all this, I’m going to propose a straw man for further discussion of both generalities and specifics. Important elements are listed below. ''The numbered notes that follow are numbered at random as they came to me, and not in any logical order. Sorry.''
 * Order with respect to default search and to each other. Options below are roughly sorted into groups that happen at the same time. Exact sorting is a point of discussion.
 * Initial eligibility criteria: “automatic” always happens; “no previous successful results” is always assumed (see below) except for “automatic” actions; the number of main search results or results from previous options is probably the most common criterion.
 * Marginal cost estimate: start with very rough low/medium/high estimates of the marginal cost of the various options, if activated. The marginal cost of determining initial criteria is presumed to be low.
 * “Success” criteria: here defined as giving good enough results so as to stop processing and trying other alternatives—so while question mark stripping is probably always going to be successful in terms of removing question mark characters, its success criterion is “none” because it will never stop processing. Success criteria could include the number of results, the “quality” of results, and maybe the length of the query (short, one-word queries seem like they could be a different class than very long and/or multi-word queries).
 * Results shown: One way to cut down on UI complexity is to only show the “best” set of results from extra search options, so if stripping quotes gives 1 result, wrong keyboard gives 2 results, and language identification gives 200 results, only the final 200 results would be added to the original main search results (which are likely fewer than 3).
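
One possible way to represent these elements is sketched below in Python. The `SearchOption` class, the `run_options` function, the cost labels, and the stand-in search functions that return canned results are all hypothetical; the example reuses the result counts from the “Results shown” item above and assumes the “fewer than 3 results” criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class SearchOption:
    name: str
    cost: str                                    # rough "low" / "medium" / "high" (placeholder values)
    is_eligible: Callable[[str, int], bool]      # (query, number of main results)
    run: Callable[[str], List[str]]
    is_success: Optional[Callable[[List[str]], bool]]  # None: never stops processing

def run_options(query: str, main_count: int,
                options: List[SearchOption]) -> List[Tuple[str, int]]:
    """Try eligible options in order; stop after the first successful one."""
    attempts = []
    for option in options:
        if not option.is_eligible(query, main_count):
            continue
        results = option.run(query)
        attempts.append((option.name, len(results)))
        if option.is_success is not None and option.is_success(results):
            break
    return attempts

def enough(results: List[str]) -> bool:
    return len(results) >= 3   # assumed success criterion

# Stand-in options with canned results, purely for demonstration.
options = [
    SearchOption("quote stripping", "low",
                 lambda q, n: n < 3 and '"' in q, lambda q: ["r"] * 1, enough),
    SearchOption("wrong keyboard", "medium",
                 lambda q, n: n < 3, lambda q: ["r"] * 2, enough),
    SearchOption("language identification", "medium",
                 lambda q, n: n < 3, lambda q: ["r"] * 200, enough),
]

print(run_options('"flotam"', 0, options))
# [('quote stripping', 1), ('wrong keyboard', 2), ('language identification', 200)]
```

Here only the last option is successful, so only its 200 results would be added to the main search results.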

[1] Are interwiki results a reason not to do anything else? Wiktionary in particular can give exact title matches in another language based on a typo. For example, in English flotam is probably a typo for flotsam that gets no results on Wikipedia, but it does get a result from Wiktionary (as a transliteration of the dative plural form of Russian флот, “fleet, navy”).

[2] I’m assuming that interwiki search runs at approximately the same time as the main search, since it is automatic.

[3] Technically, the language analysis happens inside Elastic in the main search, but it happens before the query is looked up in the indices, so it’s logically “before”.

[4] Huh, I don’t know as much about DYM suggestions as I should. I’m not sure what kicks it off, or how expensive it is.

[5] I’m not sure about DYM results as a stopping criterion. Since the suggestions are always more frequent words, results are very likely, even if the suggestions aren’t great.

[6] “Fewer than 3 results” from the main search has been used to define “poorly performing queries” for language identification, but was arbitrarily chosen. It’s a reasonable initial proposal for the initial criterion (< 3) and the success criterion (3+), but is readily changed.

[7] I know I’m bad at these, so better estimates are very welcome. XXX-points vs stopping-XXX

[8] Do we want to mix DYM suggestions and/or results with language identification results? That depends on the success criteria, but we need a plan for showing (or not showing) suggestions if another option also gives “successful” results.

[9] The idea here is that if a stage has successful results, then we stop and show them. If the stage has some results, but not enough to be “successful”, we hold on to them. If a later stage is successful, show those and discard these. If no later stage is successful, show these.

So given the order here, if stripping quotes gives 1 result, wrong keyboard gives 2 results, and language identification gives 2 results, the single quote-stripping result would be shown. An alternative option would be to show the largest set, with ties broken by order.
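
The two selection rules in this note can be sketched as follows. This is a minimal Python illustration with hypothetical function names, and it assumes “successful” means 3 or more results.

```python
from typing import List, Optional, Tuple

Attempt = Tuple[str, int]   # (option name, number of results), in stage order

def pick_first_success_or_earliest(attempts: List[Attempt],
                                   success_threshold: int = 3) -> Optional[str]:
    """Show the first successful stage; otherwise the earliest stage with any results."""
    for name, count in attempts:
        if count >= success_threshold:
            return name
    for name, count in attempts:
        if count > 0:
            return name
    return None

def pick_largest(attempts: List[Attempt]) -> Optional[str]:
    """Alternative rule: show the largest result set, ties broken by stage order."""
    best: Optional[Attempt] = None
    for name, count in attempts:
        # Strict '>' keeps the earlier stage when counts are tied.
        if count > 0 and (best is None or count > best[1]):
            best = (name, count)
    return best[0] if best else None

attempts = [("quote stripping", 1), ("wrong keyboard", 2),
            ("language identification", 2)]
print(pick_first_success_or_earliest(attempts))   # quote stripping
print(pick_largest(attempts))                     # wrong keyboard
```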

[10] I feel like DYM suggestions should come before the others, but don’t have strong feelings about their order.

[11] Wrong keyboard detection can be run at the same time as language detection (i.e., they can be rolled into one process) if they are happening sequentially.