Thread:Talk:Search/Searching common words yeilds nothing/reply (3)

We have a fix for this in review. It mostly makes things better but it isn't perfect, unfortunately. This is one of the cases where lsearchd is still more advanced than the rest of the open source world. In particular the change does these things well:
1. Finds both exact matches and stemmed matches but sorts the exact matches higher. Yay.
2. Finds articles that only contain stop words! Sweet!
3. Highlights stop words in results. Very nice!

But it does these things poorly:
1. Stop words will be worth as much as exact matching text. They really ought to be worth less than stemmed text. Like 10% or something. Anyway, that requires some work in Elasticsearch to get it hopping. I mean, stop words in the article text will still be worth very little because they are common there, but they are uncommon in things like headings and titles, which makes them worth more when they appear in those places.
2. Stop words are now required. They used to be ignored, which was bad, but now they are required, which, I think, is less bad but still not good. The side effect here is that if you search for "the once and future king" you won't find an article named "once and future king" if the article doesn't contain the word "the". This is somewhat moot because most articles (in English) will likely contain the word "the". On enwiki, the article for "The Once and Future King" even contains the word "The" in the title.
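
To make the second point concrete, here is a toy sketch of the difference between ignoring stop words in the query and requiring them. It is not how Elasticsearch or lsearchd actually work; the stop word list, the `tokenize` helper and the `matches` function are all made up for illustration, and matching is reduced to "every query term appears somewhere in the document".

```python
# Hypothetical, tiny stop word list, for illustration only.
STOP_WORDS = {"the", "and", "a", "of"}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def matches(query, document, stop_words_required):
    """Return True if every query term we search for appears in the document."""
    doc_terms = set(tokenize(document))
    terms = tokenize(query)
    if not stop_words_required:
        # Old behavior in this toy model: stop words are dropped from the query.
        terms = [t for t in terms if t not in STOP_WORDS]
    if not terms:
        # Nothing left to search for, so nothing is found.
        return False
    return all(t in doc_terms for t in terms)


query = "the once and future king"
article = "once and future king"  # an article that never says "the"

old_result = matches(query, article, stop_words_required=False)
new_result = matches(query, article, stop_words_required=True)
print(old_result, new_result)
```

In this model the old behavior finds the article but returns nothing at all for a query made only of stop words, while the new behavior handles the stop-word-only query and misses the article that lacks "the".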

I think the tradeoff is worth it, but it is a tradeoff. I've started work upstream to get rid of the problems, though I really can't comment on when we'll get the fixes.