
Help talk:CirrusSearch

From mediawiki.org

Does boost force sort=relevance?


If I am using a defined sort order such as &sort=last_edit_desc, what happens if I add a bounded or boosted search on, say, 100km,San Francisco? Does it turn off my sort selection and switch it to relevance instead? If not, what does a search that combines the two (or any sort option other than relevance) mean? Mathglot (talk) 05:47, 19 December 2024 (UTC)

The sort param will take precedence over what's inside the query.
All the search keywords that affect only relevance, such as:
  • prefer-recent
  • boost-templates
  • boost-neartitle
  • morelike
have no impact when sort is not relevance. To be precise, morelike may still have an impact, but it should be very marginal, and there is not much value in using it when sort is not relevance.
Other keywords (including bounded geo searches like neartitle/nearcoord) affect the set of documents returned regardless of the sort parameter.
We could possibly emit a warning when such keywords are used in conjunction with a sort param that is not relevance. DCausse (WMF) (talk) 09:07, 22 September 2025 (UTC)
DCausse, thanks. Note that I have added a new section, § Interaction among search options, to the page, based on my possibly faulty understanding of what you wrote above. Please correct it as needed, and please expand the section to include any other interactions among search options that users should know about. Thanks in advance, Mathglot (talk) 19:54, 22 September 2025 (UTC)
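
To illustrate the interaction described in this thread, here is a minimal sketch that builds parameters for the MediaWiki search API (list=search with srsearch and srsort) and flags relevance-only keywords when the sort is not relevance. The helper names, the warning text, and the exact keyword spellings are assumptions made for illustration; CirrusSearch itself does not emit such a warning today, as noted above.

```python
import re
from urllib.parse import urlencode

# Keywords the reply above lists as affecting relevance only. The exact
# spellings are assumptions; check them against Help:CirrusSearch.
RELEVANCE_ONLY_KEYWORDS = (
    "prefer-recent",
    "boost-templates",
    "boost-neartitle",
    "morelike",
)


def find_relevance_only_keywords(query):
    """Return the relevance-only keywords used as `keyword:` in the query."""
    found = []
    for keyword in RELEVANCE_ONLY_KEYWORDS:
        # The lookbehind rejects longer tokens that merely end in the keyword.
        if re.search(r"(?<![\w-])" + re.escape(keyword) + ":", query):
            found.append(keyword)
    return found


def build_search_request(query, sort="relevance"):
    """Hypothetical helper: return (API parameters, list of warnings)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srsort": sort,
        "format": "json",
    }
    warnings = []
    if sort != "relevance":
        for keyword in find_relevance_only_keywords(query):
            warnings.append(
                f"'{keyword}' affects relevance only; "
                f"it has little or no effect with sort={sort}"
            )
    return params, warnings


# A bounded geo filter (neartitle) still narrows the result set,
# while prefer-recent is flagged because the sort is not relevance.
params, warnings = build_search_request(
    'prefer-recent: neartitle:"100km,San Francisco"', sort="last_edit_desc"
)
url = "https://www.mediawiki.org/w/api.php?" + urlencode(params)
```

The sketch only constructs the request; it does not contact the wiki, so the filtering and sorting behaviour itself still has to be checked against a live search.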

How to search the fields of the File information template on Commons?


Nearly all of the over 110 million files on Commons use this standardized template, which specifies various useful metadata such as the date taken and the file description. How can one make use of this data and search it?

For example, how could one search the file description for a term like "Kathmandu" (as asked about by another user), the way one can search titles with intitle? description:"Kathmandu" does show some results, but I don't know what it does, and the results don't have that word in their description. I could not find info on this at mw:Help:CirrusSearch either. Info on how to search specific fields of c:Template:Information should be added here.

EBernhardson (WMF) said: "Unfortunately, the image description is simply an argument to a template. CirrusSearch doesn't do anything at that level and can't be that specific." I think the best workaround currently is to use the insource search operator with the field name first; for example, I searched for insource:"|source=[https://soundcloud.com to identify files for c:Category:Audio files from Soundcloud.com. I think easily searching fields of the File pages' Information template could be enabled by:

  1. Developing some regex that searches for any content after e.g. |source=
  2. Creating some alias for it so instead of writing some complex regex query every time one can simply enter e.g. info-source:"soundcloud.com"

Please comment on what you think of this proposed approach, and share any info on what would be needed for it. It would be great if somebody could develop such regexes if there is no better way to search specific fields of the Information template. It's great that files have that structured metadata, but it would be much more useful if it were searchable.

Previously asked here. Maybe c:Module:Information could be used for this somehow. Prototyperspective (talk) 16:39, 5 March 2025 (UTC)
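
The two-step proposal above could be prototyped client-side along these lines. This is only a sketch: the info-&lt;field&gt; alias does not exist in CirrusSearch, the function names are made up, and the generated pattern assumes that the insource regex dialect accepts backslash escapes and negated character classes as used here. It has not been tested against a live index.

```python
import re


def escape_regex_literal(text):
    """Backslash-escape every non-alphanumeric character so it matches literally."""
    return "".join(ch if ch.isalnum() else "\\" + ch for ch in text)


def expand_info_alias(alias_query):
    """Expand a hypothetical alias like info-source:"soundcloud.com"
    into an insource regex that looks for the value after |source=."""
    match = re.fullmatch(r'info-([A-Za-z_]+):"([^"]*)"', alias_query)
    if match is None:
        raise ValueError("expected something like info-source:\"value\"")
    field, value = match.groups()
    # "|field=", optional spaces, then any text up to the next "|" or "}",
    # followed by the literal value.
    pattern = r"\| *" + field + r" *= *[^\|\}]*" + escape_regex_literal(value)
    return "insource:/" + pattern + "/"


expanded = expand_info_alias('info-source:"soundcloud.com"')
print(expanded)
```

The expanded query could then be pasted into the search box; whether the alias should live in a gadget, a user script, or CirrusSearch itself is left open here.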

Regex search speed


In my experience bare regex searches seem to work even without any other terms. For example https://syl.wikipedia.org/w/index.php?go=Go&search=insource%3A%2F%5C%7B%5C%7BINTERWIKI%2F&title=%EA%A0%9B%EA%A0%A4%EA%A0%A1%EA%A0%A6%EA%A0%A1%3ASearch&ns0=1 completes quickly and returns 43 results. Why is that, and can the warning be removed? * Pppery * it has begun 14:24, 1 April 2025 (UTC)

As I understand it, the regex search still scans the whole database unless the search area is narrowed with an index-based parameter or filter. The database for syl-wiki is simply not very large, so the regex search finishes before the timeout. However, the help page is for every possible MediaWiki installation, so any potential caveat has to be addressed. — Speravir (talk) – 23:47, 11 April 2025 (UTC)
Postscript: see phab:T411112, which corrected this longstanding misconception. * Pppery * it has begun 16:35, 4 December 2025 (UTC)
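
For readers who want to see which search the URL in this thread actually runs, the following sketch decodes its percent-encoded query string using only the Python standard library. Nothing is fetched from the wiki.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://syl.wikipedia.org/w/index.php?go=Go"
    "&search=insource%3A%2F%5C%7B%5C%7BINTERWIKI%2F"
    "&title=%EA%A0%9B%EA%A0%A4%EA%A0%A1%EA%A0%A6%EA%A0%A1%3ASearch&ns0=1"
)

# parse_qs percent-decodes each value and returns a list per parameter.
query = parse_qs(urlsplit(url).query)
search_term = query["search"][0]
print(search_term)  # insource:/\{\{INTERWIKI/
```

The decoded term is a bare insource regex with no other search terms, restricted to namespace 0 by ns0=1.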

not displayable chars U+10FFFF


In the chapter "Substitutions for some metacharacters", in the 'CirrusSearch' column and in the text, every char explained as "􏿿" (U+10FFFF) is displayed as a default pavement char, as for non-existent chars. Should this be adapted, or is it useless? Thanks. -- Christian 🇫🇷 FR 🚨 (talk) 08:04, 7 April 2025 (UTC)

You may see a “default pavement char”; I see a glyph with the Unicode number imprinted (very small). In general, you get a glyph from a font selected by your browser, if there is one; otherwise the behaviour apparently depends on the browser. Here it means that the character U+10FFFF is actually present, and you can copy and paste it. — Speravir (talk) – 00:05, 12 April 2025 (UTC)
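
As a small illustration of why fonts rarely have a real glyph for it: U+10FFFF is the last Unicode code point and is a noncharacter, yet it is still a valid code point that can be stored and copied. The sketch below inspects it with Python's standard unicodedata module.

```python
import unicodedata

ch = chr(0x10FFFF)  # the highest valid Unicode code point

category = unicodedata.category(ch)  # "Cn": not an assigned character
name = unicodedata.name(ch, None)    # no character name is defined
utf8_bytes = ch.encode("utf-8")      # it still encodes to four UTF-8 bytes

print(category, name, utf8_bytes.hex())
```

Because the code point has no assigned character, what appears on screen is whatever fallback the browser and its fonts provide.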

Explicit sort by pagename


I would have thought that ordering search results by PAGENAME would have been one of the most typical sort options, but I don't see it listed. I tried various searches to come out alphabetical by PAGENAME but was not able to. Surely pagename must be an index item, and this couldn't be that hard; I would have thought it would be one of the top choices, even the default sort option (for non-huge result sets, anyway). Any chance it could be added? Mathglot (talk) 21:29, 12 November 2025 (UTC)

At scale, nothing is ever simple. However, there has been some recent movement on this after a few blockers were resolved, and it seems they are about to evaluate whether this option can be created. Status is tracked at phab:T40403. —TheDJ (Not WMF) (talkcontribs) 22:47, 12 November 2025 (UTC)
At the moment, it works with a user script, see User:PerfektesChaos/js/resultListSort or, if you understand German, de:Benutzer:PerfektesChaos/js/resultListSort. — Speravir (talk) – 01:47, 21 November 2025 (UTC)
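
Until a server-side option exists, results that have already been fetched can be reordered on the client. The sketch below is not how the user script mentioned above works; it only shows the general idea, assuming each result is a dict with a "title" key, as in the MediaWiki search API output. The sample data is invented.

```python
def sort_results_by_title(results):
    """Return a new list of results ordered case-insensitively by title."""
    # casefold() gives a simple case-insensitive order; it is not a
    # locale-aware collation.
    return sorted(results, key=lambda result: result["title"].casefold())


sample_results = [
    {"title": "Zebra"},
    {"title": "apple"},
    {"title": "Mango"},
]
ordered_titles = [r["title"] for r in sort_results_by_title(sample_results)]
print(ordered_titles)  # ['apple', 'Mango', 'Zebra']
```

This only reorders the page of results already retrieved, so it is practical for small result sets rather than as a replacement for an indexed sort.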

You are invited to join the discussion at w:Wikipedia:Village pump (technical)#Early Explorations Into Semantic Search: Phase 0. Sdkb-WMFtalk 19:51, 11 January 2026 (UTC)