Help talk:CirrusSearch


Automatic reset to 20 search results - why?

Gerbis (talkcontribs)

Why does the search function reset to 20 search results every time the search terms are rephrased? It is very cumbersome and scroll-intensive, particularly because the option to display more results is only at the bottom of the page.

Example: I tried to search for a picture of a BEST showroom by the architecture firm SITE. As both "best" and "site" are very common words, I didn't expect to find what I was looking for at the top of the results. But every time I refined the search terms (e.g. by adding a city), I had to scroll down to the bottom of the page to click on 500 results (and, of course, look at the first 20 results twice as well). What a waste of time!

And what's worse: the search results page does not automatically get keyboard focus. Therefore, to get to the bottom of the page (or, later, back to the top) you can't use the End or Home key on your keyboard without clicking somewhere in the page first. Very, very annoying.

Could this not be made more user friendly?

Speravir (talkcontribs)

This annoys me, too, most of the time. I think, though, that this is the wrong place for it; it would need one or two Phabricator tickets.

Cpiral (talkcontribs)

It is annoying. I use my browser shortcut key to go to the bottom of the search results page.

This is probably not a bug or a feature request; rather, 20 results per page is a consequence of the processing involved. The help page tries to say that while indexing and weighting are unavoidable pre-processing of a query, the snippets and bolding are avoidable post-processing requiring heavy networking and text processing.
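A possible stopgap, not mentioned in the thread: the results page's "20 | 50 | 100 | 250 | 500" links work through the `limit` and `offset` URL parameters of Special:Search, so a search URL that already carries `limit=500` can have its search terms edited and be reloaded without scrolling to the bottom first. The sketch below builds such a URL; the Wikimedia Commons base URL and the function name `search_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def search_url(query: str, limit: int = 500, offset: int = 0,
               base: str = "https://commons.wikimedia.org/w/index.php") -> str:
    """Build a Special:Search URL that asks for `limit` results per page.

    The base URL is an assumed example wiki; any MediaWiki index.php works the same way.
    """
    params = {
        "title": "Special:Search",
        "search": query,
        "fulltext": "1",       # show the results page rather than jumping to a matching title
        "limit": str(limit),   # results per page
        "offset": str(offset), # index of the first result shown
    }
    return base + "?" + urlencode(params)


print(search_url("BEST showroom SITE"))
# https://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=BEST+showroom+SITE&fulltext=1&limit=500&offset=0
```

Refining the search then only means changing the `search=` part of the URL; `limit=500` stays in place.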

Jonteemil (talkcontribs)

Hello!


On en.wikt, I want to find all pages with this syntax:


===Adjective===

{{head|de|adjective form}}


# {{inflection of|de|/positive form/||str|gen|m//n|s|supd|;|wk//mix|gen//dat|all-gender|s|supd|;|str//wk//mix|acc|m|s|supd|;|str|dat|p|supd|;|wk//mix|all-case|p|supd}}


The /positive form/ varies from page to page; it can be "rot", "dumm", "froh" etc. Is there any way to make an insource search for the entire syntax? I can do it for everything after /positive form/ and for everything before /positive form/, but not for everything including the varying /positive form/. Just to clarify, /positive form/ is never written on any page; it's just what I use as a variable for the words that are written in that place.

Speravir (talkcontribs)

I do not fully get it. Some examples of possible variations would be nice; the examples you give do not have this syntax. Your search query as you have it now would also help.

What exactly are you interested in finding: the doubled empty lines, the doubled slashes?

Jonteemil (talkcontribs)
Speravir (talkcontribs)

Thanks.

And do you need exactly this string, from the level-3 heading at the beginning to the end, where only the actual adjective in positive form varies?

Just as a start, I would first narrow down the set of pages the query has to search, so the query should begin with (do not overlook the first colon):

: hastemplate:head hastemplate:"inflection of" insource:adjective

After this would come the insource regex, depending on what exactly you expect. For the positive-form part I would use this regex: [^|}]+.
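A minimal sketch of how Speravir's suggestion could be assembled into one query string: the cheap filters first, then an insource regex with [^|}]+ in place of the varying adjective. It only covers the {{inflection of}} line, not the heading or the {{head}} line. The helper names are hypothetical, and `lucene_escape` rests on an assumption: that in the Lucene-style regex syntax behind insource:/regex/ a backslash makes the next character literal, which also protects the / delimiter.

```python
import re


def lucene_escape(literal: str) -> str:
    """Backslash-escape every character that is not a letter, digit or space.

    Assumption: a backslash makes the next character literal in the regex
    syntax used by insource:/.../, so over-escaping punctuation is harmless.
    """
    return re.sub(r"([^0-9A-Za-z ])", r"\\\1", literal)


# Literal text around the varying adjective (from the template call in the thread).
HEAD = "{{inflection of|de|"
TAIL = ("||str|gen|m//n|s|supd|;|wk//mix|gen//dat|all-gender|s|supd|;"
        "|str//wk//mix|acc|m|s|supd|;|str|dat|p|supd|;|wk//mix|all-case|p|supd}}")
ADJECTIVE = "[^|}]+"  # Speravir's class for the varying positive form, used unescaped


def build_query() -> str:
    """Cheap filters first to narrow the page set, then the expensive insource regex."""
    regex = lucene_escape(HEAD) + ADJECTIVE + lucene_escape(TAIL)
    filters = ': hastemplate:head hastemplate:"inflection of" insource:adjective'
    return filters + " insource:/" + regex + "/"


print(build_query())
```

Putting the filters first reflects Speravir's advice: the regex then only has to scan pages that already use both templates.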

Speravir (talkcontribs)

@Jonteemil, what’s up? I know you have been active in the meantime. – Speravir (talk) 01:39, 19 December 2019 (UTC)

Jonteemil (talkcontribs)

Sorry for not replying. I realized what I wanted wasn't possible to achieve in the way I thought, and that made me leave it and also forget this talk page. I appreciate your answer, thanks! Just out of curiosity, by the way, what do you mean by [^|}]+? Jonteemil (talk) 02:09, 19 December 2019 (UTC)

Speravir (talkcontribs)

Well, searching for what you presented above is possible, though the search query gets quite long. That is why I asked what exactly you are interested in.

[^|}]+ is the regex for “any character except a pipe or a closing brace, at least one occurrence”. This is for the variable adjective string. BTW: For German adjectives we could change this to a narrower character class, [a-zäöüß]+, or, if irregular upper-case letters have to be expected, [A-Za-zÄäÖöÜüß]+.
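How these character classes behave can be checked quickly in Python. This is only an approximation: Python's `re` module and the Lucene-based syntax behind insource:/regex/ differ in places, and `positive_form` is a hypothetical helper. It captures whatever sits between inflection of|de| and the fixed tail of the template call quoted in this thread.

```python
import re

# Fixed text after the variable adjective (copied from the template call in the thread).
TAIL = ("||str|gen|m//n|s|supd|;|wk//mix|gen//dat|all-gender|s|supd|;"
        "|str//wk//mix|acc|m|s|supd|;|str|dat|p|supd|;|wk//mix|all-case|p|supd}}")

BROAD = r"[^|}]+"       # any character except | or }, at least once
GERMAN = r"[a-zäöüß]+"  # narrower: lower-case German letters only


def positive_form(wikitext: str, adj_class: str = BROAD):
    """Return the adjective between 'inflection of|de|' and the fixed tail, or None."""
    pattern = r"\{\{inflection of\|de\|(" + adj_class + ")" + re.escape(TAIL)
    m = re.search(pattern, wikitext)
    return m.group(1) if m else None


for word in ["rot", "dumm", "froh"]:
    print(positive_form("# {{inflection of|de|" + word + TAIL))  # prints rot, dumm, froh
```

With `GERMAN`, a capitalized form such as "Rot" no longer matches, which is why the wider [A-Za-zÄäÖöÜüß]+ variant is suggested when upper-case letters can occur.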

Jonteemil (talkcontribs)

I see, thanks for the knowledge!

Colin M (talkcontribs)

The filters section says "A namespace or a prefix term is not a filter because a namespace will not run standalone, and a prefix will not negate." This seems empirically untrue. On EnWP I get the following number of results for each of these queries, as expected:

  • incategory:"LGBT-related musical films": 58
  • incategory:"LGBT-related musical films" prefix:"Hello": 2
  • incategory:"LGBT-related musical films" -prefix:"Hello": 56

So it seems like negating a prefix does work. Am I misunderstanding what this is trying to say? For now I added a {{dubious}} tag.
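Colin M's numbers are what a working negation predicts: in a simple set model, prefix: and -prefix: split the incategory: result set into two disjoint parts, so their counts must add up (2 + 56 = 58). The sketch below is a toy model with made-up placeholder titles; `apply_prefix` is hypothetical and ignores details such as namespaces.

```python
def apply_prefix(results: set, prefix: str, negate: bool = False) -> set:
    """Keep titles that start with `prefix`, or, when negate is True, those that do not."""
    return {title for title in results if title.startswith(prefix) != negate}


# Placeholder titles standing in for the pages of one category.
category = {"Hello 1", "Hello 2", "Page 3", "Page 4", "Page 5"}

with_prefix = apply_prefix(category, "Hello")                  # like prefix:"Hello"
without_prefix = apply_prefix(category, "Hello", negate=True)  # like -prefix:"Hello"

print(len(category), len(with_prefix), len(without_prefix))   # 5 2 3
```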

TJones (WMF) (talkcontribs)

I took a long, hard look at this and I'm confused, too. I'm not sure I understand the definition of "filter" being used. I don't know if the documentation is out of date or using some model that I'm not able to wrap my head around. Similarly, I don't get this: "Insource ... is also a filter, but insource:/regexp/ is not a filter." insource:word and insource:/regex/ behave pretty much the same, other than the regex being much slower. Sounds like the documentation could use a thorough review to make sure all the advanced features and special cases are still described correctly.

Cpiral (talkcontribs)

A "filter" can reduce unwanted matches, providing refinement. Regexes are special terms, specially catered for, that use filters. I made educated guesses at numerous terms, and even invented "greyspace". Just trying to help.

When I rewrote the help page into its current form (years ago), there was neither documentation nor discussion. So I would say prefixes can be negated now, but not then.

TJones (WMF) (talkcontribs)

@Cpiral, thanks for the explanation. Also, I very much appreciate all the work you've put into these help pages! They definitely keep getting better.

Reply to "Prefixes don't negate?"
185.66.254.155 (talkcontribs)

I want to upload content to the Wikipedia page as a user. I want to contribute as a content partner, but although I am a member, I am not able to do this.

What do I need to do?

TJones (WMF) (talkcontribs)

EN: Sorry, this is not related to CirrusSearch, so I don't think you will get an answer here. Try asking on the Village Pump or Köy çeşmesi.

TR: Üzgünüz, bu CirrusSearch ile ilişkili değil, bu yüzden burada bir cevap alacağınızı sanmıyorum. Burada sormayı deneyin: Village Pump / Köy çeşmesi.

Reply to "İçerik yükleme engeli!!!!" ("Content upload blocked!!!!")

incategory parameter and white space

Summary by Tacsipacsi

Use quotation marks (incategory:"New cars") so that spaces in the parameter are not treated as separators between search terms.

Loman87 (talkcontribs)

Hello everybody,

I am noticing an issue when using the incategory parameter, which doesn't work if there is a space in the query, e.g. incategory:New cars doesn't work, whereas incategory:New_cars works fine. This was always OK for me, but now I need to use this parameter with Extension:InputBox to limit the search to specific categories. When calling these categories I also need to use Variables, which output values containing spaces, e.g. {{FULLPAGENAME}} gives something like Category:New cars. Is there any way to make CirrusSearch work with spaces in category names? Or to "force" variables to output underscores instead of spaces?

I am not sure this is the right place to post this question; anyway, any help is really appreciated.

Thanks,

Lorenzo

Tacsipacsi (talkcontribs)

Space is the separator between search terms, so incategory:New cars searches for pages mentioning cars in Category:New. You can explicitly mark search term boundaries with quotation marks, i.e. incategory:"New cars".

Loman87 (talkcontribs)

This is a wonderful workaround, thanks very much!

PerfektesChaos (talkcontribs)

Or incategory:New_cars, since _ is the regular replacement for spaces in page names.

Tacsipacsi (talkcontribs)

But it’s not so easy to convert the output of {{PAGENAME}} (not {{PAGENAMEE}}) to use underscores, and that’s what the question is about.
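For scripts or bots that build these queries from page names, both workarounds from this thread fit in a small helper. `incategory_term` is hypothetical; inside the wiki itself, the quoting approach corresponds to writing incategory:"{{PAGENAME}}", since {{PAGENAME}} on a category page omits the Category: prefix that {{FULLPAGENAME}} includes.

```python
def incategory_term(category: str, style: str = "quote") -> str:
    """Build an incategory: search term that survives spaces in the category name.

    style="quote"      -> incategory:"New cars"
    style="underscore" -> incategory:New_cars
    """
    name = category.strip()
    # Accept {{FULLPAGENAME}}-style input such as "Category:New cars".
    if name.startswith("Category:"):
        name = name[len("Category:"):]
    if style == "underscore":
        return "incategory:" + name.replace(" ", "_")
    return 'incategory:"' + name + '"'


print(incategory_term("Category:New cars"))                # incategory:"New cars"
print(incategory_term("New cars", style="underscore"))     # incategory:New_cars
```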

T506 not clear about tilde position for easy translation - reformulate please

Wladek92 (talkcontribs)

In

<!--T:506--> 
A fuzzy-word or fuzzy-phrase search can suffix a tilde ~ character (and a number telling the degree).

It can be read as meaning that the fuzzy elements come AFTER the tilde, or one may also guess that the fuzzy element has a tilde as its suffix (...???). Moreover, the next sentence, "A tilde ~ character prefixed to the first term of a query guarantees search results instead of any possible navigation.", seems to say the same as the first one and is a repetition. Can somebody reformulate it, please? Thanks.

Christian FR (talk) 12:54, 1 November 2019 (UTC)

Ciencia Al Poder (talkcontribs)

I think T:506 refers to "phrase~".

About "A tilde ~ character prefixed to the first term of a query guarantees search results instead of any possible navigation.", I think it means, if you search for "MediaWiki", since a page with that name exists, it will redirect you straight to the page called MediaWiki. Searching for "~MediaWiki" will give you the search results page, even if a page with that name exists.
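Ciencia Al Poder's reading of the prefixed tilde can be expressed as a toy function: an exact title match normally jumps straight to the page, while a leading ~ forces the results page. `handle_query` and its `go:`/`results:` return strings are hypothetical; the real title matching may be more lenient than exact set membership (for example regarding capitalization or redirects), and the fuzzy word~ suffix from T:506 is not modelled.

```python
def handle_query(query: str, existing_titles: set) -> str:
    """Toy model: 'go:' means jump to the page, 'results:' means show the search results page."""
    if query.startswith("~"):
        # In this model a leading tilde always gives the results page and is itself dropped.
        return "results:" + query[1:]
    if query in existing_titles:
        return "go:" + query
    return "results:" + query


titles = {"MediaWiki"}
print(handle_query("MediaWiki", titles))   # go:MediaWiki
print(handle_query("~MediaWiki", titles))  # results:MediaWiki
```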


T398 seems bad command for ignore Translations: in inlanguage:

Wladek92 (talkcontribs)

Is the command in T398 correct? The Translations: namespace should appear in the text, but it is not present, so the command is similar to the selection of pages in Japanese (T396), yet we are explaining how to ignore them (strange!). Can someone explain or correct it? Thank you.

<!--T:398-->
* to ignore Translate, and where English is the base language, add
</translate>
: <kbd>inlanguage:en</kbd>

<!--T:396-->
* to count all Japanese pages on the wiki
: <kbd>all: inlanguage: ja</kbd>

Christian FR (talk) 12:34, 1 November 2019 (UTC)

Ciencia Al Poder (talkcontribs)

I'm not sure about the "ignore Translate" intent, because that seems to be done with the selection of namespaces to search (the Translations: namespace is not selected by default). I'd remove the "ignore Translate" part and focus only on the language.


T549 should be command => "hastemplate: portal:contents/..." instead of ": hastemplate: portal:contents/..."

Summary by Wladek92

Done, corrected; thanks.

Wladek92 (talkcontribs)

I think the first colon should be removed; any advice?

<!--T:549-->
* <tvar|hastportal><kbd>: hastemplate: portal:contents/tocnavbar</kbd></>, finds mainspace usage of a "<tvar|tocnavbar>Contents/TOCnavbar</>" template in the Portal namespace.
Speravir (talkcontribs)

Yes, this must be a typo.

Question about spelling corrections and "no results"

Equinox (talkcontribs)

For example: I put parimion into Wikipedia's search box. It says: "Showing results for pavilion. (LINK:) Search instead for parimion." I click that link and it says: "There were no results matching the query."

The spelling correction is (sometimes) useful, but in my experience, the "search instead" link never ever gives any results. Indeed that link only seems to be offered when your typed text is not present in the entire wiki, and then it does the best-guess spelling for you.

Am I right? If so, what's the point of that "search instead" link, which is guaranteed to produce no results?

TJones (WMF) (talkcontribs)

We do only replace your query with the suggestion if the original query got zero results. I think the "search instead" language pre-dates all of us who are currently working on the search platform team, so I can't give you the original justification for it—though mimicking Google's UI patterns generally makes search more understandable for most users. However, I can imagine that some people—particularly power users and editors of various sorts—would be upset if they searched for parimion, got results for pavilion, and then couldn't verify that parimion did in fact get zero results.

Google will override your intended search with their suggestion and give a link for your original search that gives fewer results. So, we are working in an environment where people might expect valid results to be overridden by a search engine; letting them see their original results even though there will be zero is goofy, but it's goofiness in the name of transparency.

197.235.220.190 (talkcontribs)

Seems rather simple to improve. Make it clear to the user that the query they chose will result in 0 entries, e.g.: "Showing results for pavilion. Search instead for parimion (Note: there are 0 results)".


>Am I right? If so, what's the point of that "search instead" link, which is guaranteed to produce no results?

No.

It is very important to keep the option allowing users to search instead for whatever they typed. First, it allows them to verify the search engine's claim; it also makes clear that the user isn't getting wild results because of some bug; and lastly, they can always check that it is accurate. After all, machines can and do make mistakes, and more importantly, the search engine can be wrong more often than not, especially in a wiki where things can change. At the time of the query the search engine might be right, but just a few seconds later someone can create the page, or the new entry might simply be taking time to reach the index even though the content was created right before your search.


Anyway, if a particular wiki doesn't like the message, I guess it could edit it at MediaWiki:Search-rewritten.

Equinox (talkcontribs)

If there are no results then I think it would be better to say "no results for X; here are results for Y", and drop the pointless link. I take "197"'s point that there might be results if you search again a few seconds later, but if you want to do that you can just hit Refresh or F5 etc. Hardly a common use case.

Equinox (talkcontribs)

What is my next step? I have had bad experiences with bug trackers. How can I suggest this change without being shit on? Thanks.

TJones (WMF) (talkcontribs)

I'm sorry that you've had bad experiences with task trackers. It's a recurring problem for a lot of people, unfortunately. In this case, the people who would be working on it agree with you, so there shouldn't be any reason for unpleasant discussion.

I've uncovered some of the history of the message—turns out one person on our team was here when it was implemented—and the original thought was that we might allow suggestions to overwrite queries that got a non-zero number of results, but that never materialized.

The current plan is to create a new message that says there are no results for the original query, which we'll show when appropriate, and keep the existing message for a possible future case where we overwrite a query with non-zero results.

I've created a task: T236296
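The behaviour TJones describes (a suggestion replaces the query only when the original query got zero results) and the plan for T236296 can be summarised as a toy decision function. `rewrite_notice`, its parameters and both message wordings are hypothetical, not the real interface messages; the `allow_nonzero_rewrite` branch stands for the possible future case that would keep the existing "Search instead for" message.

```python
from typing import Optional


def rewrite_notice(original: str, original_hits: int, suggestion: Optional[str],
                   allow_nonzero_rewrite: bool = False) -> Optional[str]:
    """Toy model: decide whether a spelling suggestion replaces the query, and which notice to show."""
    if suggestion is None:
        return None  # nothing to rewrite to
    if original_hits == 0:
        # Planned new message: state plainly that the original query found nothing.
        return f"Showing results for {suggestion}. No results found for {original}."
    if allow_nonzero_rewrite:
        # Hypothetical future case: the existing "search instead" message still makes sense here.
        return f"Showing results for {suggestion}. Search instead for {original}."
    return None  # original query had results, so it is not rewritten


print(rewrite_notice("parimion", 0, "pavilion"))
```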

Equinox (talkcontribs)

Okay. Thanks. I really appreciate your help here as an "insider". Let's see how it goes :)

TJones (WMF) (talkcontribs)

Glad I could help. Please do keep in mind that we have to prioritize and work through lots of tasks, so while this is probably straightforward, it may take a while for us to get to it. But you definitely gave us a helpful push in the right direction. Thanks!

Summary last edited by TJones (WMF) 16:46, 29 October 2019

The first post was not trolling but a question about the misunderstood meaning of a word. Solved by rephrasing this part.

Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)

Warning: Do not run a bare <tvar|insreg>insource:/regexp/</> search. It will probably timeout after 20 seconds anyway, while blocking responsible users.

is this a bad joke?

Speravir (talkcontribs)

No, it is a serious warning.

Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)

Am I missing something? But why should making a search query block someone???

Clump (talkcontribs)

Probably due to some serialization in handling the search request that results in the server being unable to respond to other requests in a timely fashion.

Speravir (talkcontribs)

@Clump, from Help:CirrusSearch#Insource:

Regex scan all the textual characters in a given list of pages; they don't have a word index to speed things up, […]

And in Help:CirrusSearch#Regular expression searches:

A regex search actually scours each page in the search domain character-by character. By contrast, an indexed search actually queries a few records from a database separately maintained from the wiki database, […]
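The contrast drawn in the quoted help text can be made concrete with a toy model: an indexed search is one lookup in a word index built ahead of time, while a regex search has to hand the text of every page in its search domain to the regex engine. All names here are hypothetical, and the model ignores everything the real search engine does to speed things up.

```python
import re
from collections import defaultdict


def build_index(pages: dict) -> dict:
    """Map each lower-cased word to the titles containing it (built once, ahead of time)."""
    index = defaultdict(set)
    for title, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(title)
    return index


def indexed_search(index: dict, word: str) -> set:
    """One dictionary lookup; no page text is read at query time."""
    return set(index.get(word.lower(), set()))


def regex_search(pages: dict, pattern: str):
    """Run the regex over every page; also report how many characters were handed to it."""
    rx = re.compile(pattern)
    hits, chars_read = set(), 0
    for title, text in pages.items():
        chars_read += len(text)
        if rx.search(text):
            hits.add(title)
    return hits, chars_read


PAGES = {"A": "The pavilion is open", "B": "A closed garden", "C": "Another pavilion"}
print(indexed_search(build_index(PAGES), "pavilion"))
print(regex_search(PAGES, r"pavilion"))
```

Both return the same pages here, but the work done by `regex_search` grows with the total amount of text searched, which is why a bare regex over a large wiki is so expensive.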
Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)
Speravir (talkcontribs)

Primarily, “to block” is just a verb and does not only have the meaning you apparently have in mind; cf. e.g. block - český překlad - slovník bab.la (Czech translation, bab.la dictionary) or, more verbose, Překladač Google (Google Translate) (I searched this for you). In this case it means that, in the worst case, the servers are unreachable for others, so these “responsible users” are blocked from using them.

Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)

Generally speaking, most terms with several meanings have a dominant interpretation. You can probably understand that the dominant interpretation of "USA" is not United Scenic Artists or United Soccer Association. I'm sure that in the context of MediaWiki the dominant interpretation of the verb "to block" is not "just a verb" but "to remove edit rights". (Accordingly, the dominant interpretation of "image" is an image file and not "A characteristic of a person, group or company etc., style, manner of dress, how one is, or wishes to be, perceived by others." Etc.)

PS: I'm most active at Wiktionary. No need to tell me what a verb means or how to find it.

Speravir (talkcontribs)

Well, my first language is not English, and I am sure I am far from perfect understanding, but I know that for every word we have to look at the context, and I think I understood it right here.

That said, a tip for you: next time you want to point out a potentially wrong use of a word, here or elsewhere, do not title it “bad joke”. End of discussion for me.

TJones (WMF) (talkcontribs)

Zabavuju, like Speravir, I didn't interpret "block" in the admin sense, since I'm familiar with how regex searches work: they can be very computationally expensive, so they can tie up the search servers, and only a limited number are allowed to run at once, so even if the servers aren't too busy overall, running very expensive regex searches can temporarily "block" other users from running more reasonable regex searches.

I took your "bad joke" comment to refer to the somewhat hostile tone of the documentation; I run bare insource regex queries fairly regularly because there's no other way to get the info I need, plus they aren't super expensive on much smaller wikis. I generally try not to get involved in issues of style and tone in the documentation, only technical accuracy, so I didn't get involved in the discussion when it first started. It wasn't until reading over it today that I saw the further replies and now see the issue.

Anyway, as a sometime copyeditor, I think that any text that is potentially confusing should be edited for clarity. It doesn't matter whether 95% or 5% of people are going to misread "block" as an admin action. It's easy enough to use another word. I'll edit it from my volunteer account, since this isn't a technical issue.
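The concurrency point TJones makes (only a limited number of regex searches may run at once, so expensive ones can crowd out reasonable ones) can be sketched with a semaphore. `RegexSlots` and the limit of 2 are hypothetical; the thread does not say what the real limit is or how it is enforced.

```python
import threading


class RegexSlots:
    """Toy model: at most `limit` expensive regex searches may run at the same time."""

    def __init__(self, limit: int):
        self._sem = threading.Semaphore(limit)

    def try_start(self) -> bool:
        """Claim a slot without waiting; False means this search is turned away for now."""
        return self._sem.acquire(blocking=False)

    def finish(self) -> None:
        """Release a slot when a search completes (or times out)."""
        self._sem.release()


slots = RegexSlots(2)
print(slots.try_start(), slots.try_start(), slots.try_start())  # True True False
slots.finish()
print(slots.try_start())  # True
```

While two slow searches hold both slots, a third user's cheap regex is refused even though it would finish quickly, which is the sense of "block" discussed above.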

Speravir (talkcontribs)

@TJones (WMF)/Trey314159: Thank you for your edit. And again: I think most of the discussion could have been avoided with a different title and first posting in a neutral tone. So, I misunderstood the intention of the thread starter.