User:TJones (WMF)/Notes/Phrase Slop Pre-Test

Introduction
We are continuing with our A/B testing of potential ElasticSearch improvements, focusing this time on "phrase slop", which is a parameter that allows query phrases, like "quick fox", to match phrases in documents that have one or more extra words in them, like "quick brown fox". A slop of 1 allows one extra word to intervene; a slop of 50 allows up to 50 extra words to intervene.
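
As a rough illustration of the idea, here is a minimal Python sketch of in-order phrase matching with a budget of extra intervening words. It is a simplified model, not the search engine's actual algorithm: real slop is counted in position moves, so it can also match reordered terms at a higher cost. The function name `phrase_matches` and the whitespace tokenization are assumptions made for this example.

```python
def phrase_matches(phrase, text, slop=0):
    """Return True if the words of `phrase` occur in order in `text`
    with at most `slop` extra words in total between them.

    Simplified model: whitespace tokens, case-insensitive, no reordering.
    """
    words = phrase.lower().split()
    tokens = text.lower().split()
    if not words:
        return False

    def search(word_idx, prev_pos, remaining):
        # All phrase words have been placed: we have a match.
        if word_idx == len(words):
            return True
        # Try placing the next word after 0..remaining intervening tokens.
        for gap in range(remaining + 1):
            pos = prev_pos + 1 + gap
            if pos >= len(tokens):
                break
            if tokens[pos] == words[word_idx] and search(word_idx + 1, pos, remaining - gap):
                return True
        return False

    # Try every occurrence of the first phrase word as a starting point.
    return any(
        tokens[start] == words[0] and search(1, start, slop)
        for start in range(len(tokens))
    )
```

With this model, "quick fox" does not match "the quick brown fox" at slop 0 but does at slop 1.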

For more on phrase slop, see the ElasticSearch documentation.

Data Sample
I took a random sub-sample of one day's worth (~8 am to ~8 am starting on 2015-08-15) of full-text queries against ptwiki and dewiki, sampled at 1-in-10 and 1-in-50 respectively. We chose ptwiki and dewiki because they are the largest wikis we can load in labs for testing at the moment.

I also collected a similar sample from enwiki (1-in-100), for comparison purposes, though we do not have a test copy of enwiki to run slop tests against.

Below are stats on the samples. Because only queries with phrases in them are affected by phrase slop, I only looked at queries with at least one double quote character in them. Thus, not all are actually phrase searches—some have only one double quote in them, and so won't be affected by changes to slop.
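
The filtering described above can be sketched roughly as follows. This is an illustration only, not the script actually used: the function name `classify` and the category labels are hypothetical, and it assumes only straight ASCII double quotes count.

```python
import re

# A complete phrase: an opening quote, at least one non-quote character,
# and a closing quote.
PHRASE_RE = re.compile(r'"[^"]+"')


def classify(query):
    """Bucket a query by whether phrase slop could affect it."""
    if '"' not in query:
        return "no_quotes"              # not part of the quoted sample
    if PHRASE_RE.search(query):
        return "has_phrase"             # a real phrase search
    return "quotes_without_phrase"      # e.g. one stray quote; slop is irrelevant
```

Only queries in the `has_phrase` bucket can return different results when slop changes.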

In the quoted queries, I found:
 * Many were DOI queries, which I did not test.
 * There were also many media player queries ("..." film).
 * enwiki still has a lot of "title_1" AND "title 2" queries, generally all coming from one IP address; neither ptwiki nor dewiki has any.
 * There were a very small number of queries that seemed to be from malformed log entries, which were excluded.

Note that non-DOI, non–media-player, non-title-AND-title queries with quotes make up less than 1% of queries for all three wikis.

Method
I re-ran the unique quoted queries (ignoring the DOI queries for now) against the labs instances of ptwiki and dewiki, with "precise" slop (the value used when searching for "phrases in quotes") set to 0, 1, 2, and 50, and recorded the total number of hits that resulted from each query for each slop value. Note that 50 is probably not a plausible slop value, but it provides an upper bound for a very loose configuration.
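
For readers unfamiliar with how slop appears in a query, here is a hedged sketch that builds one generic `match_phrase` request body per slop value. It is not how the test was actually run (the test changed the wiki's search configuration rather than hand-building queries); the field name `text` and the helper name `build_phrase_bodies` are assumptions for illustration.

```python
import json

# Slop values used in this test.
SLOP_VALUES = (0, 1, 2, 50)


def build_phrase_bodies(phrase, field="text", slops=SLOP_VALUES):
    """Return {slop: request_body} for a phrase query at each slop value.

    "size": 0 asks for no documents back, since only the total hit
    count is of interest here.
    """
    return {
        slop: {
            "size": 0,
            "query": {"match_phrase": {field: {"query": phrase, "slop": slop}}},
        }
        for slop in slops
    }


if __name__ == "__main__":
    for slop, body in build_phrase_bodies("quick fox").items():
        print(slop, json.dumps(body))
```

Each body would be sent to a search endpoint and the reported total hit count recorded per query and per slop value.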

The current production configuration has zero slop, so I computed the differences in results compared to zero slop, both for zero-result queries ("zero queries"), which we want to improve, and for all queries, since we don't want to make everything else appreciably worse.
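
The comparison against the zero-slop baseline can be sketched like this. The data layout (a dict mapping each query to its hit counts per slop value) and the name `summarize_deltas` are assumptions for the example, not the format of the real results.

```python
def summarize_deltas(hits_by_query, slop, baseline=0):
    """Compare hit counts at `slop` against the baseline slop.

    hits_by_query maps query string -> {slop_value: total_hits}.
    """
    zero_total = 0      # queries with no results at baseline
    zero_improved = 0   # of those, how many now return something
    changed = 0         # any query whose hit count differs from baseline
    for hits in hits_by_query.values():
        base, new = hits[baseline], hits[slop]
        if base == 0:
            zero_total += 1
            if new > 0:
                zero_improved += 1
        if new != base:
            changed += 1
    return {
        "zero_result_queries": zero_total,
        "zero_result_now_nonzero": zero_improved,
        "queries_changed": changed,
    }
```

Calling it once per non-zero slop value gives the per-slop counts of zero-result queries that gained results and of queries whose totals changed at all.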

Results
The numbers below are for unique queries (to save CPU), so they are somewhat skewed (less so for ptwiki and more so for dewiki), but the trend is quite clear: the number of non-zero deltas in every category for slop values of 1 and 2 is in the single digits.

ptwiki
ptwiki showed no effect for slop of 1 or 2 on current zero queries, though there was a minimal increase in total hits returned for other queries with quotes.

dewiki
dewiki returned more results for zero queries with slop set at 1 or 2, but the effect is small—only 7 and 14 queries out of 362 unique zero-result queries.

Pretty Pictures
I was hoping for some pretty scatter plots showing the changes in results for various slop values, but they were pretty disappointing. Here they are anyway.

Some data points for dewiki are not shown on the graph (see above), because the changes were extreme.

ptwiki
Changes in results for ptwiki with slop set to 1, 2, or 50. The straight line represents no change.



dewiki
Changes in results for dewiki with slop set to 1, 2, or 50. The straight line represents no change.



Conclusions
Overall, the effect of setting slop to 1 or 2 was minimal. There aren't that many queries with quotes, and most quoted zero queries are not affected by changes in slop, while a small number of non-zero quoted queries return additional, arguably less precise, results.

Given the differences in behavior between ptwiki and dewiki, it isn't clear that we can easily extrapolate to enwiki. However, there aren't that many quoted queries (outside of DOI, media player, and "title_1" AND "title 2" queries), so the maximum effect on human users will necessarily be small.

To Do

 * Run some DOI queries against ptwiki and dewiki to see what happens
 * Run with higher slop values (3, 4, or 5) if anyone asks
 * Gather quoted query stats for other wikis if anyone asks
 * Fill out any empty stats slots (e.g., for enwiki) if anyone asks