User:TJones (WMF)/Notes/Phrase Slop Pre-Test

Introduction
We are continuing our A/B testing of potential Elasticsearch improvements, focusing this time on "phrase slop", a parameter that allows query phrases, like "quick fox", to match phrases in documents that have one or more extra words in them, like "quick brown fox". A slop of one allows one extra word; a slop of 50 allows 50 extra words.
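To make the definition concrete, here is a minimal Python sketch of the simplified model described above: the phrase terms must appear in order, with at most `slop` extra words between the first and last term. The function name `matches_with_slop` and the whitespace tokenization are assumptions for illustration only. The real Lucene/Elasticsearch slop calculation works on term positions and can also match reordered terms at a higher cost, which this sketch does not attempt to model.

```python
def matches_with_slop(phrase, document, slop=0):
    """Simplified, in-order model of phrase slop (not Lucene's actual algorithm).

    Returns True if the phrase terms occur in order in the document with
    at most `slop` extra words between the first and last phrase term.
    """
    terms = phrase.lower().split()
    words = document.lower().split()
    if not terms:
        return False

    for start, word in enumerate(words):
        if word != terms[0]:
            continue
        # From this starting position, take the earliest later occurrence
        # of each remaining term; that gives the tightest in-order span.
        pos = start
        found = True
        for term in terms[1:]:
            try:
                pos = words.index(term, pos + 1)
            except ValueError:
                found = False
                break
        # Extra words = span length minus the gaps an exact phrase would have.
        if found and (pos - start) - (len(terms) - 1) <= slop:
            return True
    return False
```

Under this model, `matches_with_slop("quick fox", "the quick brown fox", slop=0)` is false, while the same call with `slop=1` is true, mirroring the "quick brown fox" example.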

Data
I took a random sub-sample of one day's worth (2015-08-15) of full-text queries against ptwiki and dewiki, sampled at 1-in-10 and 1-in-50, respectively. We chose ptwiki and dewiki because they are the largest wikis we can load in labs for testing at the moment.

Below are stats on the samples. Because only queries with phrases in them are affected by phrase slop, I only looked at queries with at least one double quote character in them.
 * Many were DOI queries, which I did not test.
 * There were also many media player queries ("..." film).
 * There were a very small number of queries that seemed to come from malformed log entries.
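The sampling and filtering steps described in this section could be sketched roughly as follows. This is only an illustration, not the script actually used: the function names, the seeded random generator, and the assumption that queries arrive as a list of strings are all hypothetical, and the exclusion of DOI queries is not modeled because the notes do not say how those were identified.

```python
import random


def sample_one_in_n(queries, n, seed=0):
    """Keep each query with probability 1/n (hypothetical helper).

    A seeded generator is used so the sub-sample is reproducible.
    """
    rng = random.Random(seed)
    return [q for q in queries if rng.randrange(n) == 0]


def has_phrase_candidate(query):
    """True if the query contains at least one double quote character."""
    return '"' in query


def phrase_query_sample(queries, n, seed=0):
    """Sub-sample at 1-in-n, then keep only queries with a double quote."""
    return [q for q in sample_one_in_n(queries, n, seed) if has_phrase_candidate(q)]
```

For example, `phrase_query_sample(ptwiki_queries, 10)` would correspond to the 1-in-10 ptwiki sample restricted to queries containing a double quote, where `ptwiki_queries` is a placeholder for the day's logged queries.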