Help talk:CirrusSearch
This talk page is about the usage of CirrusSearch. See installation, or development, or bugs, or Wikipedia for related discussions.
Does boost force sort=relevance?
If I am using a defined sort order, such as, say, &sort=last_edit_desc, what happens if I choose a bounded or boosted search on, say, 100km,San Francisco? Does it turn off my sort selection, and switch it to relevance instead? If not, what is the meaning of a search which contains both of those (or any sort option other than relevance)? Mathglot (talk) 05:47, 19 December 2024 (UTC)
- The sort param will take precedence over what's inside the query.
- All the search keywords that only affect relevance, such as:
- prefer-recent
- boost-template
- boost-neartitle
- morelike
- have no impact when sort is not relevance. To be precise, morelike may still have an impact, but it should be very marginal, and there's not much value in using it when sort is not relevance.
- Other keywords (including bounded geo searches like neartitle/nearcoord) will have an impact on the set of documents returned regardless of the sort parameter.
- We could possibly emit a warning when such keywords are used in conjunction with a sort param that is not relevance. DCausse (WMF) (talk) 09:07, 22 September 2025 (UTC)
- DCausse, thanks. Note that I have added new section § Interaction among search options to the page, based on my possibly faulty understanding of what you wrote above. Please correct it as needed, and please expand the section to include any other interactions among search options that users should know about. Thanks in advance, Mathglot (talk) 19:54, 22 September 2025 (UTC)
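The warning suggested above could be sketched roughly as follows. This is only an illustration in Python, not how CirrusSearch does or would implement it: the function names are hypothetical, the keyword spellings are copied from the reply above (the names a given CirrusSearch version accepts may differ), and the matching is a simple token check rather than a real query parser.

```python
import re

# Keywords that, per the reply above, only influence relevance scoring.
# Spellings are taken from that reply (assumption: actual names may differ).
RELEVANCE_ONLY_KEYWORDS = ("prefer-recent", "boost-template", "boost-neartitle", "morelike")


def relevance_only_keywords(query: str) -> list[str]:
    """Return the relevance-only keywords used in `query` as `keyword:`."""
    found = []
    for keyword in RELEVANCE_ONLY_KEYWORDS:
        # Keyword at the start of a token, optional trailing letters
        # (for example a plural "s"), then a colon.
        if re.search(r"(?<!\S)" + re.escape(keyword) + r"\w*:", query):
            found.append(keyword)
    return found


def sort_warnings(query: str, sort: str = "relevance") -> list[str]:
    """Warn about keywords that have little or no effect under a non-relevance sort."""
    if sort == "relevance":
        return []
    return [
        f"'{kw}' only affects relevance and has little or no effect with sort={sort}"
        for kw in relevance_only_keywords(query)
    ]


if __name__ == "__main__":
    for line in sort_warnings("prefer-recent: cats", sort="last_edit_desc"):
        print(line)
```

A bounded geo search such as neartitle:100km,San_Francisco is deliberately not flagged, since it changes the set of documents returned regardless of the sort parameter.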
How to search the fields of the File information template on Commons?
Nearly all of the over 110 million files on Commons use this standardized template that specifies various useful metadata like date taken and file description. How to make use of this data and search it?
For example, how could one search the file description for a term like "Kathmandu" (as asked about by another user), like one can search with intitle? description:"Kathmandu" does show some results, but I don't know what it does, and the results don't have that word in their description. I could not find info on this at mw:Help:CirrusSearch either. Info on how to search specified fields of c:Template:Information should be added here.
- One could also use this to infer categories (such as by reading the date field and then adding it to a category by date like "Videos of {year}") as proposed here.
- For example, I found that many files in deepcategory:"NASA videos from unidentified year" deepcategory:"Videos of 2020" have been miscategorized into Videos of 2020 (and thus should not be copied into "NASA videos in 2020" from there) where they have the correct date in the date field which is why I'd like to use that to correct that as well as copy them to their year category in c:Category:Videos from NASA by year.
- This may also be needed for a date range filter, see phab:T329961. I'd like to search the date field but there is no information on how to do that at Help:Searching but I think it's already possible if I remember correctly.
- Another example: one could set subcats of c:Category:Media from scholarly journals depending on the link in the source field. For example, files with a URL starting with https://www.nature.com/ or a DOI that resolves to one should be in a respective subcat of c:Category:Media from Nature Publishing Group journals.
- One could also search the source field to put files into cats like c:Category:Audio files from Soundcloud.com and so on.
- Also how can one search for files from a specific uploader? (I'd like to check which of my video2commons uploads were imported below resolution at source.)
EBernhardson (WMF) said: "Unfortunately, the image description is simply an argument to a template. CirrusSearch doesn't do anything at that level and can't be that specific."
I think the best workaround currently would be to use the insource search operator with the field name first; for example, I searched for insource:"|source=[https://soundcloud.com to identify files for c:Category:Audio files from Soundcloud.com. I think easily searching fields of the File pages' Information template could be enabled by
- Developing some regex that searches for any content after e.g. |source=
- Creating some alias for it so instead of writing some complex regex query every time one can simply enter e.g. info-source:"soundcloud.com"
Please comment what you think about this proposed way to make this possible and if you have any info on what would be needed for that. Would be great if somebody could develop such (a) regex(es) if there is no better way to search specific fields of the Information template. It's great that files have that structured metadata but it could be much more useful if it was searchable.
Previously asked here. Maybe c:Module:Information could be used for this somehow. Prototyperspective (talk) 16:39, 5 March 2025 (UTC)
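To make the "regex that searches for any content after |source=" idea concrete, here is a minimal Python sketch that pulls one field out of a file page's wikitext locally. It is not a CirrusSearch feature and not an insource: query; the function name, the sample wikitext and the SoundCloud URL in it are made up for illustration. The pattern is naive: it stops at the next line starting with |name= or at the first }}, so values containing nested templates get truncated.

```python
import re
from typing import Optional


def information_field(wikitext: str, field: str) -> Optional[str]:
    """Return the raw value of `|field=` in an Information-style template.

    Naive sketch: the value ends at the next "|name=" line, at the first
    "}}", or at the end of the text, so nested templates are not handled.
    """
    pattern = re.compile(
        r"\|\s*" + re.escape(field) + r"\s*=[ \t]*(.*?)\s*(?=\n\s*\|[^|\n=]*=|\}\}|\Z)",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(wikitext)
    return match.group(1) if match else None


# Hypothetical file page content, for illustration only.
SAMPLE = """{{Information
|description=Street scene in Kathmandu
|date=2020-03-05
|source=[https://soundcloud.com/example/track Example track]
|author=Example
}}"""

if __name__ == "__main__":
    print(information_field(SAMPLE, "source"))
    print(information_field(SAMPLE, "date"))
```

An alias like the proposed info-source: would still need support on the search side; a local pattern like this would only help for post-filtering pages that were already found, for instance by an insource: search.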
Regex search speed
In my experience bare regex searches seem to work even without any other terms. For example https://syl.wikipedia.org/w/index.php?go=Go&search=insource%3A%2F%5C%7B%5C%7BINTERWIKI%2F&title=%EA%A0%9B%EA%A0%A4%EA%A0%A1%EA%A0%A6%EA%A0%A1%3ASearch&ns0=1 completes quickly and returns 43 results. Why is that, and can the warning be removed? * Pppery * it has begun 14:24, 1 April 2025 (UTC)
- The regex search is, as I understand it, still searching the whole database unless the search area is narrowed with an index-based parameter or filter. The database for syl-wiki is simply not very large, so the regex search finishes before the timeout. However, the help page is for every possible MediaWiki installation, so any potential caveat has to be addressed. — Speravir (talk) – 23:47, 11 April 2025 (UTC)
- Postscript, see phab:T411112 which corrected this longstanding misconception. * Pppery * it has begun 16:35, 4 December 2025 (UTC)
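For readers who want to see which query the percent-encoded link in the opening post actually runs, it can be decoded with Python's standard library. This only decodes the URL quoted above; it does not contact the wiki.

```python
from urllib.parse import parse_qs, urlsplit

# The URL quoted in the opening post of this thread.
URL = (
    "https://syl.wikipedia.org/w/index.php?go=Go"
    "&search=insource%3A%2F%5C%7B%5C%7BINTERWIKI%2F"
    "&title=%EA%A0%9B%EA%A0%A4%EA%A0%A1%EA%A0%A6%EA%A0%A1%3ASearch&ns0=1"
)

# parse_qs percent-decodes each value and returns a dict of lists.
params = parse_qs(urlsplit(URL).query)
search = params["search"][0]

if __name__ == "__main__":
    print(search)  # insource:/\{\{INTERWIKI/
```

So the search is a bare insource: regex for the literal text {{INTERWIKI, restricted to namespace 0 via ns0=1.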
not displayable chars U+10FFFF
In the chapter "Substitutions for some metacharacters", in the columns "CirrusSearch" and text, all chars explained as "is U+10FFFF" are displayed with a default pavement char, as for non-existent chars. Should this be adapted, or is it useless? Thanks. -- Christian 🇫🇷 FR 🚨 (talk) 08:04, 7 April 2025 (UTC)
- You may see a “default pavement char”, I see a glyph which has the Unicode number (very small) imprinted. In general, you get a glyph of a font selected by your browser, if there is one; otherwise the behaviour apparently depends on the browser. At this very place it means that the character U+10FFFF is actually there, and you can copy and paste it. — Speravir (talk) – 00:05, 12 April 2025 (UTC)
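As background on why browsers fall back to a placeholder glyph here: U+10FFFF is the last code point in Unicode and is a noncharacter, so fonts normally do not provide a glyph for it, yet it is still a valid code point that can be stored and copied. A small Python check using only the standard library illustrates this:

```python
import unicodedata

ch = chr(0x10FFFF)  # the last Unicode code point

# "Cn" is the general category for unassigned code points and noncharacters.
category = unicodedata.category(ch)

# The code point has no character name, so the supplied default is returned.
name = unicodedata.name(ch, "<no name>")

# It can still be encoded, which is why copy and paste works.
utf8_hex = ch.encode("utf-8").hex()

if __name__ == "__main__":
    print(category, name, utf8_hex)
```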
Explicit sort by pagename
I would have thought that ordering search results by PAGENAME would have been one of the most typical sort options, but I don't see it listed. I tried various searches to come out alphabetical by PAGENAME but was not able to. Surely pagename must be an index item, and this couldn't be that hard; I would have thought it would be one of the top choices, even the default sort option (for non-huge result sets, anyway). Any chance it could be added? Mathglot (talk) 21:29, 12 November 2025 (UTC)
- At scale, nothing is ever simple. However there has been some recent movement for this after a few blockers were unblocked and it seems they are about to evaluate if it would be possible to create this option. Status is tracked at phab:T40403. —TheDJ (Not WMF) (talk • contribs) 22:47, 12 November 2025 (UTC)
- At the moment, it works with a user script, see User:PerfektesChaos/js/resultListSort or, if you understand German, de:Benutzer:PerfektesChaos/js/resultListSort. — Speravir (talk) – 01:47, 21 November 2025 (UTC)
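Until a server-side option exists, results that have already been fetched can be re-sorted on the client. The following Python sketch shows the general idea only; it is not a description of how the user script mentioned above works, and the function name and sample data are hypothetical. It assumes each result is a dict with a "title" key.

```python
def sort_by_pagename(results):
    """Return the results ordered case-insensitively by page title."""
    # casefold() gives a case-insensitive key; sorted() is stable, so results
    # with equal keys keep their original (for example relevance) order.
    return sorted(results, key=lambda result: result["title"].casefold())


# Hypothetical result list, for illustration only.
SAMPLE_RESULTS = [
    {"title": "Zebra"},
    {"title": "apple"},
    {"title": "Mango"},
]

if __name__ == "__main__":
    for result in sort_by_pagename(SAMPLE_RESULTS):
        print(result["title"])
```

Note that this only orders the results that were actually retrieved; it cannot produce the alphabetically first pages of a large result set the way a real sort option would.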
Discussion at w:Wikipedia:Village pump (technical)#Early Explorations Into Semantic Search: Phase 0
You are invited to join the discussion at w:Wikipedia:Village pump (technical)#Early Explorations Into Semantic Search: Phase 0. Sdkb-WMF talk 19:51, 11 January 2026 (UTC)