Help talk:CirrusSearch

About this board

Previous page history was archived for backup purposes at Help talk:CirrusSearch/LQT Archive 1 on 2015-07-10.

Update en-wiki

2 comments • 14:45, 8 April 2024 10 days ago

2001:14BA:9CD6:4200:D43C:5ABA:9AD8:104 (talkcontribs)

Ping @User:JWBTH (or anyone who notices this). Referring to these edits, can you also update en:Help:Searching/Regex#Workarounds_for_some_character_classes? I noticed en-wiki's 􏿽 doesn't work but this MediaWiki's 􏿿 does work for newlines.

Reply 12:36, 8 April 2024 10 days ago

2001:14BA:9CD6:4200:D43C:5ABA:9AD8:104 (talkcontribs)

That ping didn't work so I'll try again: User:JWBTH

Reply 14:45, 8 April 2024 10 days ago

fuzzy search

4 comments • 17:26, 3 April 2024 14 days ago

Pirhayati (talkcontribs)

Hi. Is it possible to do a fuzzy search for two words with another word between them, without matching the exact sequence of the two words on their own? For example, I want "flowers for Algernon" to be in the results but not "flowers Algernon".

Reply 20:41, 29 March 2024 19 days ago

DCausse (WMF) (talkcontribs)

Hi,

Unfortunately no. You could approximate it by using a negation: "flowers Algernon"~1 NOT "flowers Algernon".

The first part would find documents with flowers algernon or flowers for algernon and the second part would exclude documents matching flowers algernon.

In the end you might find pages that contain flowers for algernon, but not all of them: if a page has both forms, flowers for algernon and flowers algernon, it would be excluded.
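
For illustration, here is a minimal Python sketch of the approximation above. It is not CirrusSearch code: `approx_query`, `search_url` and `toy_page_matches` are hypothetical helper names, the mediawiki.org base URL is only an assumed example, and the toy matcher compares plain lowercased words (no stemming, no stop words), so it merely mimics the page-level behaviour described here.

```python
import re
from urllib.parse import urlencode


def approx_query(first: str, second: str, slop: int = 1) -> str:
    """Build the '"a b"~1 NOT "a b"' query string from this thread."""
    phrase = f'"{first} {second}"'
    return f"{phrase}~{slop} NOT {phrase}"


def search_url(query: str, base: str = "https://www.mediawiki.org/w/index.php") -> str:
    """Build a Special:Search URL for the query (base URL is an assumption)."""
    params = {"search": query, "title": "Special:Search", "fulltext": "1"}
    return base + "?" + urlencode(params)


def toy_page_matches(text: str, first: str, second: str) -> bool:
    """Toy model of the page-level result: the page needs the two words with
    one word in between, and must not contain them directly adjacent."""
    words = re.findall(r"\w+", text.lower())
    a, b = first.lower(), second.lower()
    adjacent = any(
        words[i] == a and words[i + 1] == b for i in range(len(words) - 1)
    )
    one_between = any(
        words[i] == a and words[i + 2] == b for i in range(len(words) - 2)
    )
    # The NOT clause applies to the whole page, not to single occurrences.
    return one_between and not adjacent


if __name__ == "__main__":
    query = approx_query("flowers", "Algernon")
    print(query)
    print(search_url(query))
```

The toy matcher shows why a page containing both forms drops out: the exclusion is decided per page, so one adjacent occurrence is enough to remove it.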

Reply 08:23, 2 April 2024 16 days ago

Pirhayati (talkcontribs)

Thank you. It works for me.

Reply 10:47, 2 April 2024 16 days ago

DCausse (WMF) (talkcontribs)

Hi, I saw that you contacted me on IRC but I responded too late.

We don't have immediate plans to improve these kinds of queries and implement the feature you need, so I would suggest filing a new ticket at https://phabricator.wikimedia.org/ (tagging CirrusSearch) to describe your use case.

Thanks!

Reply 17:26, 3 April 2024 14 days ago

Multiple keyword searches

2 comments • 03:28, 22 March 2024 27 days ago

Seeker1030 (talkcontribs)

Hi, how do I search using multiple keywords? For example: Libra ascendant born on 1965. How could we search with these parameters?

Reply 05:28, 25 February 2024 1 month ago

Speravir (talkcontribs)

Simply by typing libra ascendant born 1965 into the search form (I assume "on" is a so-called stop word). If there are dedicated categories for a topic, you could also use the filter keyword incategory, e.g. ascendant libra incategory:"1965 births".

Reply Edited 20:09, 26 February 2024 1 month ago

The deepcat is not working

2 comments • 22:59, 30 January 2024 2 months ago

Summary by Speravir

User found the reason why it did not work themselves.

Strubbl (talkcontribs)

I tried to use the deepcat feature, but it shows no results. Replacing deepcat with incategory shows results, so there should be results with deepcat as well. How can I make this query work?

20:13, 30 January 2024 2 months ago

Strubbl (talkcontribs)

I just saw that I did not use Special:Search, where deepcat works.

Edited 20:38, 30 January 2024 2 months ago

Updates to #Prefix and namespace for clarity; <code> vs. <kbd>

5 comments • 13:31, 27 January 2024 2 months ago

Ernstkm (talkcontribs)

Hi all. I had a lot of trouble understanding the #Prefix and namespace section, so I did my best to make it more readable.

Specifically confusing was term vs. term:, so I tried to make that more consistent within that section. When I see code, the programmer part of my brain thinks "oh, I type this in." Italics is used in prose for emphasis, is not very visually distinctive, and therefore doesn't trigger the same "oh, I must do something!" response. So I hope it makes sense why I think term: is preferable here.

Secondly, the HTML <kbd> tag is typically used to denote hotkeys, such as Control+c, but I had a look at the MDN docs and it seems that using it for strings of literal text input is OK, too. What would be the preference then, <kbd> or <code>?

Would it be helpful to do a pass over the whole article, or the whole batch of CirrusSearch user-facing documentation, in order to make the use of ''term'' / <code>term</code> / <kbd>term</kbd> more consistent? --Ernstkm (talk) 03:17, 12 November 2023 (UTC)

Reply Edited 03:25, 12 November 2023 5 months ago

TJones (WMF) (talkcontribs)

Thanks for improving the documentation!

I think clarity is the most important goal, but consistency almost always helps with clarity. A lot of what the italics and <code>/<kbd> markup is trying to get at is the use–mention distinction. The problem is that there's no consistency on how to format mentions, and different traditions vary, so we are collectively not always consistent.

Adding the monospaced <code> or <kbd> to the mix lets us make finer distinctions (linguistics does this kind of thing, too, sometimes using both italics and quotes for different mentions, and mixing single and double quotes: He said, "I told my cat that gato means 'cat' in Spanish.") Search discussions often use italics rather than quotes so we can mention quotes: You should search for "pet" dog cat. And like you, I tend to interpret monospaced text as things I could type; I guess it's a tech-flavored mention.

I guess I'm mostly agreeing that it's a mess, but I think that trying to make another finer distinction between <code> and <kbd> would only make it messier and harder for newcomers to understand or contribute. Since ⌘ Command+⇧ Shift+6 (on a Mac) generates <code> tags, that's probably the best thing to standardize on—unless clarity of formatting creates a reason to use <kbd>, too.

Reply 16:03, 13 November 2023 5 months ago

Ernstkm (talkcontribs)

OK, I agree, and thanks for the lesson on "use-mention distinction." I guess I intuitively knew that was a thing, but didn't know its name.

I'll go ahead and replace the <kbd>s with <code>s. I thought the use of <kbd> was a little odd anyway, given that it's used on other sites like Stack Overflow and GitHub specifically to indicate keypresses, and gets styled like keycaps, in the same way that {{key press}} is used here.

Reply 11:46, 28 November 2023 4 months ago

Ernstkm (talkcontribs)

…or not. There are 420 uses of <kbd> in the article. I could search-and-replace all of them, but I'm not sure that improves the article materially. Leaving as is for now.

Reply 12:07, 28 November 2023 4 months ago

TJones (WMF) (talkcontribs)

> …or not. There are 420 uses of <kbd> in the article.

Fair enough!

Reply 15:01, 28 November 2023 4 months ago

greyspace characters

2 comments • 12:25, 15 December 2023 4 months ago

217.117.125.83 (talkcontribs)

How to search for an exact string including greyspace characters?

Reply 16:54, 5 September 2022 1 year ago

Speravir (talkcontribs)

Try "exact string including?" insource:/"exact string including?"/. The last part is found under Regular Expression searches.

Reply 17:53, 5 September 2022 1 year ago

Search index update

3 comments • 14:56, 6 November 2023 5 months ago

Jonteemil (talkcontribs)

The page says that the search index is updated at least once a day. I've been trying to fix broken files over at Commons that have 0 x 0 px. I used the search fileh:0 filew:0 filetype:image -filemime:image/tiff to find them. Now, files I fixed weeks ago are still listed in the results. When will they go away?

Reply 12:48, 24 July 2023 8 months ago

DCausse (WMF) (talkcontribs)

Thanks for reporting the problem. There seems to be an issue in the way CirrusSearch handles these edits; I filed Phab:T342562 to track and fix it.

Reply Edited 17:07, 24 July 2023 8 months ago

Jonteemil (talkcontribs)

Okay, perfect.

Reply 18:58, 24 July 2023 8 months ago

i want all existing templates

5 comments • 17:57, 6 October 2023 6 months ago

Wladek92 (talkcontribs)

Hi all. Going to https://www.mediawiki.org/w/index.php?search=%2A&title=Special:Search&profile=advanced&fulltext=1&ns10=1 I want all existing templates, i.e. all page titles in the Template: namespace. After selecting only this namespace from the drop-down list, I tried several forms without success: 1. with no search string I get no results; 2. with the wildcard '*' I get only the template named *.

So what is the syntax for this elementary request, "give me all page titles in the Template: namespace"? Thanks -- Christian 🇫🇷 FR (talk) 07:03, 27 June 2023 (UTC)

Reply Edited 07:04, 27 June 2023 9 months ago

TheDJ (talkcontribs)

Search cannot do that. That's what the API or Quarry is for.

Reply 09:35, 27 June 2023 9 months ago

Tacsipacsi (talkcontribs)

Or Special:AllPages: https://www.mediawiki.org/wiki/Special:AllPages?namespace=10

Reply 20:02, 27 June 2023 9 months ago

Cpiral (talkcontribs)

That is a feature that I too once wanted: a list of page titles matching some query. Instead I settled on storing the search result as text, and then using my text-processing skills to extract the titles.

In your case it works to first capture the search result of prefix: template: to file.

Then you can grep the titles out of the file and sort them alphabetically.

Reply 00:08, 2 July 2023 9 months ago

TheDJ (talkcontribs)

Again, this is not what you are supposed to use search for. If you want a list, you should use something made to generate lists, like Special:AllPages, database dumps or Quarry. Search is fuzzy; it's optimised to find words, not to generate lists.

This is an example to get the first 50 template names on mediawiki.org which are not redirects and not deleted:

https://quarry.wmcloud.org/query/74910

And when lists get really big, you will HAVE to use pagination. There is no way around this as WMF properties generally are very big properties.
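
As a rough sketch of the API route with pagination, assuming Python and the Action API's list=allpages module: `iter_template_titles` and `http_get_json` are made-up names, the User-Agent string is a placeholder, and error handling is left out. The `apnamespace`, `apfilterredir`, `aplimit` and `continue` parts are standard Action API parameters; everything else is illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.mediawiki.org/w/api.php"


def http_get_json(params):
    """Fetch one API response over HTTP (placeholder User-Agent)."""
    req = Request(
        API_URL + "?" + urlencode(params),
        headers={"User-Agent": "template-lister-example/0.1"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def iter_template_titles(fetch=http_get_json):
    """Yield titles of non-redirect pages in namespace 10 (Template:),
    following the API's 'continue' block until it disappears."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": 10,
        "apfilterredir": "nonredirects",
        "aplimit": "max",
        "format": "json",
    }
    cont = {}
    while True:
        data = fetch({**params, **cont})
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        if "continue" not in data:
            break
        # Send back whatever the API handed us to get the next batch.
        cont = data["continue"]
```

Calling `list(iter_template_titles())` would hit the live API batch by batch; passing a different `fetch` function makes it possible to try the loop offline.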

Reply Edited 14:37, 3 July 2023 9 months ago

Automatically jump to first result

2 comments • 22:03, 4 October 2023 6 months ago

Aschroet (talkcontribs)

Hello everybody, is there a way to automatically jump/redirect to the first result? This apparently works for dewiki

https://de.wikipedia.org/w/index.php?search=Espenfeld

but not for Wikidata:

https://www.wikidata.org/w/index.php?search=Am_Hanffgraben_(Berlin)

Maybe there is a parameter that can be added to the URL?

Thank you in advance, --~~~~

Reply 15:21, 3 October 2023 6 months ago

TheDJ (talkcontribs)

There is no such functionality. What you are seeing is title matching. If your search exactly matches the title of a page, it will take you to that page. For wikidata the title of a page is its Q id. So you can do https://www.wikidata.org/w/index.php?search=Q111351350 and it will take you to that Q id.

Reply 19:12, 3 October 2023 6 months ago

How to export search result

3 comments • 12:10, 8 August 2023 8 months ago

Bennylin (talkcontribs)

Hi, I have a search result on wiki, "articles without ref tags", and I want to dump/export the list of all the titles from that search. I tried the API, but each call is limited to 500 results.

Can anyone help? Thanks in advance.

Reply Edited 14:45, 30 July 2023 8 months ago

DCausse (WMF) (talkcontribs)

Hi, sadly this is not possible.

You can try to make multiple calls to the API using pagination via the API:Continue parameter to gather more than 500 results.

But there are limits there too: you can't paginate past the 10,000th result.

Such limits are in place to protect the service, because even when using the continue parameter, Elasticsearch (the underlying search engine used by CirrusSearch) has to keep all the results from the start in memory.
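
A sketch of that paginated approach in Python, under stated assumptions: `list=search`, `srsearch`, `srlimit` and `sroffset` are Action API parameters, but the function names, the id.wikipedia.org endpoint and the placeholder User-Agent are only illustrative, and the client-side cap simply mirrors the 10,000-result limit mentioned above rather than anything the API requires you to do.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://id.wikipedia.org/w/api.php"  # assumed example wiki


def api_get(params):
    """Fetch one API response over HTTP (placeholder User-Agent)."""
    req = Request(
        API_URL + "?" + urlencode(params),
        headers={"User-Agent": "search-export-example/0.1"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def collect_search_titles(query, fetch=api_get, batch=500, max_results=10_000):
    """Collect result titles batch by batch, stopping when the API stops
    returning a 'continue' block or when max_results is reached."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": batch,
        "format": "json",
    }
    titles = []
    cont = {}
    while len(titles) < max_results:
        data = fetch({**params, **cont})
        titles.extend(hit["title"] for hit in data.get("query", {}).get("search", []))
        if "continue" not in data:
            break
        # The continue block carries the next sroffset.
        cont = data["continue"]
    return titles[:max_results]
```

For result sets larger than the cap, this only returns the first slice; splitting the query into smaller pieces or using a dump would still be needed.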

A quick note regarding your query:

-insource:<ref>

The characters < and > will be ignored and what is actually run is

-insource:ref

and thus you might exclude pages that have the word ref used outside a <ref>, e.g. https://id.wikipedia.org/wiki/Sumber_primer.

If you want to actually search for the < and > characters you have to use the regular expression syntax by wrapping your search text between a pair of / and escaping the < > characters with a \:

-insource:/\<ref\>/

But beware that the query above might not filter pages with named references <ref name="named ref"> or pages where the reference tag is added via a template.
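
To make these caveats concrete, here is a small offline illustration in Python. It uses Python's own `re` module, which is not the regex engine behind insource:, so it only mimics the two behaviours described above (matching the bare word ref versus the literal text <ref>). The sample strings, the `classify` helper and the {{sfn}} template call are invented for the example.

```python
import re

# Roughly what -insource:ref looks at: the bare word "ref".
WORD_REF = re.compile(r"\bref\b")
# Roughly what -insource:/\<ref\>/ looks at: the literal text "<ref>".
# (Python's re does not need < and > escaped.)
LITERAL_TAG = re.compile(r"<ref>")


def classify(wikitext: str) -> dict:
    """Report which of the two patterns occurs in a piece of wikitext."""
    return {
        "word_ref": bool(WORD_REF.search(wikitext)),
        "literal_tag": bool(LITERAL_TAG.search(wikitext)),
    }


SAMPLES = {
    "word only": "See the ref list in the appendix.",
    "plain tag": "Fact.<ref>Book, p. 1</ref>",
    "named tag": 'Fact.<ref name="named ref">Book</ref>',
    "template": "Fact.{{sfn|Smith|2020}}",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, classify(text))
```

The named-tag and template samples are the cases the literal pattern misses, which is why pages using them could still show up as "without ref tags".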