Thread:Talk:Search/LiquidThreads archive/'Old search' is better/reply (13)

Thanks for coming here to complain about these results. We'll figure out some way to make it at least as good for this class of search.

As to why we're replacing the old search when it is so good at finding results, here is the short list:
 * Old search crashes or runs out of resources from time to time and no one knows how to fix it. It's a pretty large code base built on really old libraries. New search is based on relatively standard services under active development.
 * Old search updates every few days and often misses things. New one updates in pretty near real time. Page edits are usually in the index in under a minute. Template edits can take longer to be reflected in the pages that contain those templates.
 * Old search doesn't do anything with templates. New search fully resolves templates. It's *righter* but it's more trouble.

The truth is that the replacement project was driven internally by ops folks raising a ruckus because the old one had no maintainer and wasn't super stable. There is also a significant backlog of bugs and feature requests for search that we've had to ignore because the old one was so hard to work on. So that's how we got where we are.

As for why the new search doesn't spit out results exactly like the old one, one of the reasons is that the old one is super customized for English Wikipedia. Its code is difficult to navigate and many of the customizations were speculative: they didn't really provide better results, they were just there. So we implemented the ones that were obviously better and deployed the new search as a BetaFeature so folks could try it. When we tried it we found the results were usually similar, neither clearly better nor worse. You've hit on one of the customizations that we didn't reimplement: the old search weights hits that are early in the article more highly than hits at the end. We didn't do this because our tests didn't show it made much difference. But for your searches it makes a pretty huge difference.

Long story short, we'll implement that.
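
For anyone wondering what "weighting early hits more highly" means in practice, here is a minimal sketch in Python. It is not the old search's code and not what we will ship. The function names, the 1 / (1 + decay * position) formula and the decay value are all made up for illustration.

```python
def flat_score(tokens, term):
    """Count occurrences of term, ignoring where they appear."""
    term = term.lower()
    return float(sum(1 for token in tokens if token.lower() == term))


def position_weighted_score(tokens, term, decay=0.01):
    """Sum a weight per occurrence of term; earlier occurrences weigh more.

    The weight 1 / (1 + decay * position) is a hypothetical choice made
    for this sketch, not the formula used by either search backend.
    """
    term = term.lower()
    score = 0.0
    for position, token in enumerate(tokens):
        if token.lower() == term:
            score += 1.0 / (1.0 + decay * position)
    return score


# Two toy "articles": one mentions the term first, the other mentions it last.
early = ["search"] + ["filler"] * 100
late = ["filler"] * 100 + ["search"]

# Without position weighting the two articles tie.
print(flat_score(early, "search"), flat_score(late, "search"))

# With position weighting the early mention ranks higher.
print(position_weighted_score(early, "search"),
      position_weighted_score(late, "search"))
```

With a flat count the two toy articles tie. With the position weight the article that mentions the term up front wins, which is roughly the behavior you were relying on in the old search.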

Also, if you are curious about how scoring works you can read the first half of this presentation. The other half won't be all that interesting.