Topic on Help talk:CirrusSearch

"Really" exact matches

Summary last edited by Quiddity (WMF) 21:55, 26 March 2015 9 years ago
Mikhail Ryazanov (talkcontribs)

Is it currently possible to find only really exact matches? I mean, searches "exact match", "exact-match" and "exact. Match" must give different results. The old search kind of did that, but the new engine seems to completely ignore the punctuation and the case, which is quite sad...

Nemo bis (talkcontribs)

You can use "insource", AFAIK. It's in the docs.

Junkyardsparkle (talkcontribs)

The basic form of "insource:" doesn't care about those differences, either. In theory, you could use the regex search, but it didn't seem to be working the last time I tried to use it...

Mikhail Ryazanov (talkcontribs)

Yes, I checked that "insource:" does not care. It is not even mentioned in the help, but now I found at least this. Regexp search does something (at least, the results are different), but it's hard to tell, because it does not highlight the matches and just shows two lines of text from the beginning of each found page... Moreover, it fails epically for regexps with spaces — searching for insource:/exact match/ produces:

An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later.

Luckily, insource:/exact\ match/ does work (although, again, the manual is absolutely silent about spaces).

Another sad thing is that regexp search, being an overkill for such problems, is extremely slow in real wikis. Why can't the engine just do its quick punctuation-ignorant search and then post-filter the results for exact matches?

Do we have any place/procedure to report bugs and requests?

Junkyardsparkle (talkcontribs)

I made some edits to the Help page to clarify the behavior of insource:, hopefully it makes sense. I don't know if bugzilla is appropriate for requests of that sort or not, haven't gone that far myself. (But yes, I agree that an intermediate form that supported literal non-word characters would probably be what most people would expect.)

NEverett (WMF) (talkcontribs)

I just tried to make it make more sense as well. That insource:"/spaces ok here/" is a bug - you _should_ be able to put spaces without the quotes. I've filed it as bugzilla:71053 and I'll work on it now. I have to admit that I'm sometimes too steeped in the technical search backend stuff to write sensible documentation. I just assume too much.

I've done some work on the error messaging as well, though it hasn't been merged or deployed yet. It's pretty minor: it'll just tell you whether you're hitting the maximum number of concurrent regex queries or not.

Chris the speller (talkcontribs)

This Bugzilla report covers the situation about ignoring hyphens, but a comment there actually points back to this thread as some kind of solution to the problem! This would be amusing if it were not so unhelpful. The regexp form of insource: goes away for many minutes on en.wikipedia, then returns a gateway error. The other form of insource: completely ignores hyphens. The insource: feature is not the answer.

NEverett (WMF) (talkcontribs)

> it does not highlight the matches and just shows two lines of text from the beginning of each found page


Yeah! I'll fix that right away because it's silly. bugzilla:71057

> Another sad thing is that regexp search, being an overkill for such problems, is extremely slow in real wikis. Why can't the engine just do its quick punctuation-ignorant search and then post-filter the results for exact matches?


It can but you have to tell it to do so. Example: <<insource:"well-liked" insource:/well-liked/>>. I wonder if we can go some kind of "super exact" route that does that for you. Something without the power of regexes but simpler to use. The problem is that it'd be a somewhat leaky abstraction over prefiltered regex searches like the example. Still might be worth it.
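The prefiltered query in the example above can be built mechanically from a literal string. What follows is only a minimal Python sketch, not anything that exists in CirrusSearch: `exact_insource_query` and `escape_regex` are hypothetical helpers, and the set of characters being escaped is an assumption (Lucene-style regex metacharacters, the `/` delimiter, and the space, since this thread reports that an escaped space works). It only builds the query string and does not contact a wiki.

```python
# Hypothetical helpers: build a query string only, no wiki is contacted.
# Assumption: these characters need a backslash inside insource:/.../
# (Lucene-style metacharacters, the "/" delimiter, and the space).
REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~/ ')


def escape_regex(literal: str) -> str:
    """Backslash-escape every character assumed to be special."""
    return "".join("\\" + ch if ch in REGEX_SPECIAL else ch for ch in literal)


def exact_insource_query(literal: str) -> str:
    """Pair a quoted insource: prefilter with a regex for the same text."""
    if '"' in literal:
        raise ValueError("double quotes are not handled by this sketch")
    return f'insource:"{literal}" insource:/{escape_regex(literal)}/'


print(exact_insource_query("well-liked"))
# insource:"well-liked" insource:/well-liked/
print(exact_insource_query("exact match"))
```

The quoted term does the cheap, punctuation-ignorant narrowing, so the regex only has to run over pages that already contain the words.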

> Do we have any place/procedure to report bugs and requests?


Filing bugs against the CirrusSearch component is the best way. I don't read on wiki as much as I should, but bugs email me.

Mikhail Ryazanov (talkcontribs)

Thanks for your attention to the problem! :–)

Regarding "super exact" matches, I think that the word "exact" has a well-defined meaning, ;–) and thus ignoring all non-word characters is not really "exact". Therefore I would at least suggest documenting the current behavior for "Quotes and exact matches" (that it matches the given words in the given order, or whatever it really does). But I also believe that truly exact matches are sometimes needed. While they can be emulated by combining a quoted string with a "duplicate" regexp, that route is cumbersome and works for "insource:" only. Can you add a modifier (for example, an exclamation mark or an equal sign: <<"well-liked"!>> or <<"well-liked"=>>) to request really exact matches (since the engine already has a "~" modifier for "less exact")?

Nemo bis (talkcontribs)

It's an exact comparison between two normalised forms. :P
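To illustrate what "two normalised forms" means for the three searches from the first post, here is a rough Python sketch. The `normalise` function is a hypothetical stand-in that just lowercases and splits on non-word characters; the real analysis chain is certainly more involved, so treat this as an assumption for illustration only.

```python
import re


def normalise(text: str) -> list[str]:
    """Lowercase and keep only runs of word characters (assumed, simplified)."""
    return re.findall(r"\w+", text.lower())


variants = ["exact match", "exact-match", "exact. Match"]
for v in variants:
    print(repr(v), "->", normalise(v))

# All three variants collapse to the same token sequence.
print(len({tuple(normalise(v)) for v in variants}))  # 1
```

Once punctuation and case are dropped this way, the three phrases are indistinguishable, which is why quoted searches return the same results for all of them.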
