Help talk:CirrusSearch

2001:14BA:9CD6:4200:D43C:5ABA:9AD8:104 (talkcontribs)

That ping didn't work so I'll try again: User:JWBTH

Reply to "Update en-wiki"
Pirhayati (talkcontribs)

Hi. Is it possible to do a fuzzy search for two words with another word between them, while excluding pages where the two words are directly adjacent? For example, I want "flowers for Algernon" to be in the results but not "flowers Algernon".

DCausse (WMF) (talkcontribs)

Hi,

Unfortunately no, but you can approximate it with a negation: "flowers Algernon"~1 NOT "flowers Algernon".

The first part would find documents with flowers algernon or flowers for algernon and the second part would exclude documents matching flowers algernon.

In the end you will find pages that contain flowers for algernon, but not all of them: a page that contains both forms, flowers for algernon and flowers algernon, is excluded.
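
As a minimal sketch of how such a query string could be assembled, the helper below (a hypothetical name, not part of CirrusSearch) combines the proximity phrase and its negation exactly as written above. It only builds the text to paste into the search box; it does not run a search.

```python
def near_but_not_adjacent(first: str, second: str, slop: int = 1) -> str:
    """Build a query for two words within `slop` words of each other,
    minus pages that contain the two words as an exact phrase."""
    phrase = f'"{first} {second}"'
    # Proximity part first, then the same phrase negated.
    return f"{phrase}~{slop} NOT {phrase}"


if __name__ == "__main__":
    print(near_but_not_adjacent("flowers", "Algernon"))
```

The same caveat applies: a page that contains both forms is still dropped by the negated part.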

Pirhayati (talkcontribs)

Thank you. It works for me.

DCausse (WMF) (talkcontribs)

Hi, I saw that you contacted me on IRC but I responded too late.

We don't have immediate plans to improve this kind of query or to implement the feature you need, so I would suggest filing a new ticket at https://phabricator.wikimedia.org/ (tagged CirrusSearch) that describes your use case.

Thanks!

Reply to "fuzzy search"

Multiple keyword searches

Seeker1030 (talkcontribs)

Hi, how do I search using multiple keywords? For example, for Libra ascendant born on 1965, how would I search with these parameters?

Speravir (talkcontribs)

Simply by typing libra ascendant born 1965 into the search form (I assume "on" is a so-called stop word). If there are dedicated categories for a topic, you could also use the filter keyword incategory, e.g. ascendant libra incategory:"1965 births".

Reply to "Multiple keyword searches"

The deepcat is not working

Summary by Speravir

User found the reason why it did not work themselves.

Strubbl (talkcontribs)

I tried to use the deepcat feature, but it shows no results. Replacing deepcat with incategory shows results, so deepcat should return results as well. How can I make this query work?

Strubbl (talkcontribs)

I just noticed that I was not using Special:Search, where deepcat does work.

Updates to #Prefix and namespace for clarity; <code> vs. <kbd>

Ernstkm (talkcontribs)

Hi all. I had a lot of trouble understanding the #Prefix and namespace section, so I did my best to make it more readable.

Specifically confusing was term vs. term:, so I tried to make that more consistent within that section. When I see code, the programmer part of my brain thinks "oh, I type this in." Italics is used in prose for emphasis, is not very visually distinctive, and therefore doesn't trigger the same "oh, I must do something!" response. So I hope it makes sense why I think term: is preferable here.

Secondly, the HTML <kbd> tag is typically used to denote hotkeys, such as Control+c, but I had a look at the MDN docs and it seems that using it for strings of literal text input is OK, too. What would be the preference then, <kbd> or <code>?

Would it be helpful to do a pass over the whole article, or the whole batch of CirrusSearch user-facing documentation, in order to make the use of ''term'' / <code>term</code> / <kbd>term</kbd> more consistent? --Ernstkm (talk) 03:17, 12 November 2023 (UTC)

TJones (WMF) (talkcontribs)

Thanks for improving the documentation!

I think clarity is the most important goal, but consistency almost always helps with clarity. A lot of what the italics and <code>/<kbd> markup is trying to get at is the use–mention distinction. The problem is that there's no consistency on how to format mentions, and different traditions vary, so we are collectively not always consistent.

Adding the monospaced <code> or <kbd> to the mix lets us make finer distinctions (linguistics does this kind of thing, too, sometimes using both italics and quotes for different mentions, and mixing single and double quotes: He said, "I told my cat that gato means 'cat' in Spanish.") Search discussions often use italics rather than quotes so we can mention quotes: You should search for "pet" dog cat. And like you, I tend to interpret monospaced text as things I could type; I guess it's a tech-flavored mention.

I guess I'm mostly agreeing that it's a mess, but I think that trying to make another finer distinction between <code> and <kbd> would only make it messier and harder for newcomers to understand or contribute. Since ⌘ Command+⇧ Shift+6 (on a Mac) generates <code> tags, that's probably the best thing to standardize on—unless clarity of formatting creates a reason to use <kbd>, too.

Ernstkm (talkcontribs)

OK, I agree, and thanks for the lesson on "use-mention distinction." I guess I intuitively knew that was a thing, but didn't know its name.

I'll go ahead and replace the <kbd>s with <code>s. I thought the use of <kbd> was a little odd anyway, given that it's used on other sites like Stack Overflow and GitHub specifically to indicate keypresses, and gets styled like keycaps, in the same way that {{key press}} is used here.

Ernstkm (talkcontribs)

…or not. There are 420 uses of <kbd> in the article. I could search-and-replace all of them, but I'm not sure that improves the article materially. Leaving as is for now.

TJones (WMF) (talkcontribs)

> …or not. There are 420 uses of <kbd> in the article.

Fair enough!

Reply to "Updates to #Prefix and namespace for clarity; <code> vs. <kbd>"
217.117.125.83 (talkcontribs)

How to search for an exact string including greyspace characters?

Speravir (talkcontribs)
Reply to "greyspace characters"
Jonteemil (talkcontribs)

The page says that the search index is updated at least once a day. I've been trying to fix broken files over at Commons that have 0 x 0 px. I used the search fileh:0 filew:0 filetype:image -filemime:image/tiff to find them. Files I fixed weeks ago are still listed in the results. When will they go away?

DCausse (WMF) (talkcontribs)

Thanks for reporting this. There seems to be a problem in the way CirrusSearch handles these edits; I filed Phab:T342562 to track and fix the issue.

Jonteemil (talkcontribs)

Okay, perfect.

Reply to "Search index update"

i want all existing templates

Wladek92 (talkcontribs)

Hi all. At https://www.mediawiki.org/w/index.php?search=%2A&title=Special:Search&profile=advanced&fulltext=1&ns10=1 I want to list all existing templates, i.e. all page titles in the Template: namespace. After selecting only this namespace from the drop-down list, I tried several forms without success: 1. with an empty search string I get no results; 2. with the wildcard '*' I get only the template named *.

So what is the syntax for this elementary request, "give me all page titles in the Template: namespace"? Thanks -- Christian 🇫🇷 FR (talk) 07:03, 27 June 2023 (UTC)

TheDJ (talkcontribs)

Search cannot do that. That's what the API or Quarry is for.

Tacsipacsi (talkcontribs)
Cpiral (talkcontribs)

That is a feature that I too once wanted: a list of page titles matching some query. Instead I settled on storing the search result as text, and then using my text-processing skills to extract the titles.


In your case it works to first capture the search result of prefix: template: to a file.


Then you grep, and can sort them alphabetically.

TheDJ (talkcontribs)

Again, this is not what you are supposed to use search for. If you want a list, you should use something made to generate lists, like Special:AllPages, database dumps or Quarry. Search is fuzzy; it's optimised to find words, not to generate lists.


This is an example to get the first 50 template names on mediawiki.org which are not redirects and not deleted:

https://quarry.wmcloud.org/query/74910


And when lists get really big, you will HAVE to use pagination. There is no way around this as WMF properties generally are very big properties.
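
For the API route, a rough sketch of paginated listing could look like the following. It assumes the standard Action API module list=allpages with namespace 10 (Template) on mediawiki.org; the function names and the User-Agent string are made up for this example. The fetch function is passed in, so the paging logic can be tried without touching the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

API_URL = "https://www.mediawiki.org/w/api.php"  # assumption: the wiki being queried


def fetch_json(params: Dict[str, str]) -> dict:
    """Send one GET request to the Action API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "template-lister-example/0.1"}  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_template_titles(
    fetch: Callable[[Dict[str, str]], dict] = fetch_json,
) -> Iterator[str]:
    """Yield every non-redirect title in the Template namespace, page by page."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": "10",              # 10 is the Template namespace
        "apfilterredir": "nonredirects",
        "aplimit": "max",
    }
    while True:
        reply = fetch(params)
        for page in reply["query"]["allpages"]:
            yield page["title"]
        if "continue" not in reply:
            break                         # no further batches
        # Merge the continuation values into the next request.
        params = {**params, **reply["continue"]}
```

Iterating over iter_template_titles() and stopping early (for example with itertools.islice) keeps the number of requests small while experimenting.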

Reply to "i want all existing templates"

Automatically jump to first result

Aschroet (talkcontribs)
TheDJ (talkcontribs)

There is no such functionality. What you are seeing is title matching. If your search exactly matches the title of a page, it will take you to that page. For Wikidata the title of a page is its Q id. So you can do https://www.wikidata.org/w/index.php?search=Q111351350 and it will take you to that Q id.

Reply to "Automatically jump to first result"

How to export search result

Bennylin (talkcontribs)

Hi, I have a search result on wiki, "articles without ref tags", and I want to dump/export the list of all the titles from that search. I tried the API, but each call is limited to 500 results.

Can anyone help? Thanks beforehand.

DCausse (WMF) (talkcontribs)

Hi, sadly this is not possible.

You can try to make multiple calls to the API using pagination via the API:Continue parameter to gather more than 500 results.

But there are limits there too: you can't paginate past the 10000th result.

Such limits are in place to protect the service, because even with the continue parameter Elasticsearch (the underlying search engine used by CirrusSearch) has to keep all the results from the start in memory.
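
To make the arithmetic concrete, here is a small sketch of how the requests would be laid out under those two limits. It assumes the list=search parameters sroffset and srlimit and takes the 500 and 10000 figures from this thread; the function name is hypothetical, and the exact behaviour at the boundary is not verified here.

```python
from typing import Iterator, Tuple

MAX_PER_REQUEST = 500    # per-call limit mentioned in this thread
MAX_REACHABLE = 10_000   # results past this point cannot be paged to


def search_windows(
    total_hits: int,
    per_request: int = MAX_PER_REQUEST,
    cap: int = MAX_REACHABLE,
) -> Iterator[Tuple[int, int]]:
    """Yield (sroffset, srlimit) pairs covering as many hits as the cap allows."""
    reachable = min(total_hits, cap)
    for offset in range(0, reachable, per_request):
        # The last window may be shorter than a full batch.
        yield offset, min(per_request, reachable - offset)


if __name__ == "__main__":
    for offset, limit in search_windows(1200):
        print(offset, limit)
```

A query with 25000 hits therefore needs 20 calls and still leaves 15000 titles unreachable through search, which is why dumps or Quarry are the better tools for full exports.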

A quick note regarding your query:

-insource:<ref>

The characters < and > will be ignored and what is actually run is

-insource:ref

and thus you might exclude pages that use the word ref outside a <ref> tag, e.g. https://id.wikipedia.org/wiki/Sumber_primer.

If you actually want to search for the < and > characters, you have to use the regular expression syntax by wrapping your search text in a pair of / and escaping the < and > characters with a \:

-insource:/\<ref\>/

But beware that the query above might not filter pages with named references <ref name="named ref"> or pages where the reference tag is added via a template.
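
The difference between the two queries, and the named-reference caveat, can be illustrated locally. This sketch uses Python's re module, which is a different regex dialect from the one CirrusSearch uses, and the sample strings are invented, so treat it as an analogy rather than a reproduction of the search engine's behaviour.

```python
import re

# Literal "<ref>", the intent of -insource:/\<ref\>/
literal_ref = re.compile(re.escape("<ref>"))
# Rough stand-in for insource:ref, which looks for the bare word
bare_word = re.compile(r"\bref\b")

samples = {
    "plain tag": "Text.<ref>Source</ref>",
    "named tag": 'Text.<ref name="named ref">Source</ref>',
    "word only": "See the ref desk for details.",
}

if __name__ == "__main__":
    for label, text in samples.items():
        print(label, bool(literal_ref.search(text)), bool(bare_word.search(text)))
```

Only the plain tag matches the literal pattern, while the bare word matches all three samples, so a page that uses only named references would slip past the regex-based exclusion.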

Bennylin (talkcontribs)

Thank you for the answer and the correction!

Reply to "How to export search result"