Topic on Help talk:CirrusSearch

Jonteemil (talkcontribs)

Hello!

Is there a feature you can use to search by the beginning of a page? For example, to find every page on Commons that begins with {{Information, while excluding every page that begins with something else and has {{Information on the second line?

DCausse (WMF) (talkcontribs)

Hi,

Cirrus does not allow searching for anchors (start or end of document) but I believe you can search for what you want by combining two regular expressions:

insource:/\{\{Information/ -insource:/.\{\{Information/


The first insource:/\{\{Information/ will search for all pages containing the wikitext {{Information. The second -insource:/.\{\{Information/ will exclude all pages that contain a character followed by {{Information (these are all the pages where the Information template is not used at the beginning of the wikitext).

Note that this regular expression is a bit slow to process, as it has to scan a lot of pages, so you may end up seeing only partial results.
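To make the logic of the combined query easier to follow, here is a minimal Python sketch that emulates it on a few made-up page texts. It does not call CirrusSearch; the sample pages and the helper `matches_query` are hypothetical, and `re.DOTALL` is an assumption made so that "." also matches a line break, as the query above relies on.

```python
import re

# Python stand-ins for the two insource: regexes in the query above.
# Assumption: "." should also match a newline, hence re.DOTALL.
CONTAINS = re.compile(r"\{\{Information")
PRECEDED = re.compile(r".\{\{Information", re.DOTALL)


def matches_query(wikitext: str) -> bool:
    """Emulate: insource:/\\{\\{Information/ -insource:/.\\{\\{Information/"""
    has_template = CONTAINS.search(wikitext) is not None
    has_char_before = PRECEDED.search(wikitext) is not None
    return has_template and not has_char_before


# Hypothetical page texts, for illustration only.
pages = {
    "starts_with_template": "{{Information\n|description=A\n}}",
    "template_on_second_line": "== Summary ==\n{{Information\n|description=B\n}}",
    "no_template": "Just some text.",
}

results = {title: matches_query(text) for title, text in pages.items()}
print(results)
```

Only the page whose text starts with {{Information passes both conditions in this sketch.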

Jonteemil (talkcontribs)

I see, thanks! Why doesn't cirrus allow searching for anchors?

DCausse (WMF) (talkcontribs)

Simply because the underlying regular expression engine that we use does not support such a feature :)

Jonteemil (talkcontribs)

Just to be sure: would "beginswith:" and your insource regex give exactly the same results, just by different methods? "beginswith:" is what I call the nonexistent feature that would serve my need.

DCausse (WMF) (talkcontribs)

@Jonteemil: no, the solution I provided only works if the characters you search for appear only at the beginning of the wikitext and are not repeated elsewhere.


Assuming that we want to search for "xyz" appearing only at the beginning of the wikitext, insource:/xyz/ -insource:/.xyz/ will discard valid results where "xyz" appears at the beginning but also somewhere else in the text.

In other words, the query I provided is only 100% accurate for pages that include the Information template exactly once.
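The limitation can be reproduced with a small Python sketch. This is an emulation under assumptions, not the search backend: `query_matches` and the sample strings are hypothetical, and `re.DOTALL` is used so that "." matches any character, including a line break.

```python
import re


def query_matches(text: str, needle: str) -> bool:
    """Emulate: insource:/needle/ -insource:/.needle/ (needle taken literally)."""
    escaped = re.escape(needle)
    contains = re.search(escaped, text) is not None
    preceded = re.search("." + escaped, text, re.DOTALL) is not None
    return contains and not preceded


# Hypothetical texts: both really do begin with "xyz".
once = "xyz and nothing else"
twice = "xyz at the start, then xyz again"

for text in (once, twice):
    print(repr(text), "begins with xyz:", text.startswith("xyz"),
          "| query matches:", query_matches(text, "xyz"))
```

The second text begins with "xyz" but is still rejected, because the later "xyz" has a character in front of it.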


Anchoring the search string to the start or the end of the text has been brought up, at least indirectly, in https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019/Search#Search_by_suffix

I think it would make more sense to add support for ^ and $ to the insource:// and intitle:// keywords rather than adding a new keyword.
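For readers unfamiliar with anchors, this is roughly what ^ and $ do in an engine that supports them. The sketch uses Python's `re` module purely as an illustration; it says nothing about how CirrusSearch would implement anchors, and the sample strings are made up.

```python
import re

# In Python, without re.MULTILINE, "^" matches only at the very start of the string.
STARTS_WITH_TEMPLATE = re.compile(r"^\{\{Information")
# "$" anchors the match to the end of the string.
ENDS_WITH_SVG = re.compile(r"\.svg$")

# Hypothetical wikitext and titles, for illustration only.
at_start = "{{Information\n|description=A\n}}"
on_second_line = "== Summary ==\n{{Information\n|description=B\n}}"

print(bool(STARTS_WITH_TEMPLATE.search(at_start)))
print(bool(STARTS_WITH_TEMPLATE.search(on_second_line)))
print(bool(ENDS_WITH_SVG.search("Example.svg")))
print(bool(ENDS_WITH_SVG.search("Example.svg.png")))
```

With a start anchor, a single expression is enough, and a repeated occurrence later in the text no longer causes a false exclusion.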

Jonteemil (talkcontribs)

Okay, thanks!

Jonteemil (talkcontribs)

Aha, thanks for the knowledge!

Speravir (talkcontribs)

In addition to @DCausse (WMF): to cite the help page, “when possible, please avoid running a bare regexp search”. You also have to take care of the different possible spellings. Note that all of these are allowed: {{Information, {{information, {{ Information, {{ information; in fact, almost any number of spaces may appear between the opening braces and the template name.

Even though I narrowed down the search, I got a warning with this query because of the heavy template use: file: hastemplate:information insource:"information" insource:/\{\{ *[Ii]nformation/

And, of course, for this I was warned, too: file: hastemplate:information insource:"information" insource:/\{\{ *[Ii]nformation/ -insource:/.\{\{ *[Ii]nformation/
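As a quick check of which spellings the looser pattern covers, here is a small Python sketch. It only tests the regular expression itself against sample strings; the variable names are hypothetical and nothing here queries the search index.

```python
import re

# The looser pattern from the queries above: optional spaces after "{{"
# and either case for the first letter of the template name.
LOOSE = re.compile(r"\{\{ *[Ii]nformation")
# The stricter pattern from earlier in the thread, for comparison.
STRICT = re.compile(r"\{\{Information")

variants = [
    "{{Information",
    "{{information",
    "{{ Information",
    "{{    information",
]

loose_hits = [v for v in variants if LOOSE.search(v)]
strict_hits = [v for v in variants if STRICT.search(v)]

print("loose pattern matches:", loose_hits)
print("strict pattern matches:", strict_hits)
```

The strict pattern finds only the first spelling in this list, while the loose one finds all four.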

Jonteemil, this is the wrong place here (it should be discussed at Commons’ Village pump, I guess), but why do you want to know this? Do you want to add == {{int:filedesc}} ==? If so: This is not mandatory!

Jonteemil (talkcontribs)

Adding == {{int:filedesc}} == was indeed my intention. Even though it might not be mandatory, I think the goal should be that all files have it, but as you say this is not a MediaWiki matter but a Commons one. I asked here because the question itself could be of use for every Wikimedia project, even if I intended to use the answer on Commons.
