Topic on Manual talk:Pywikibot

Quotes, search parameter

Speravir (talk | contribs)

I do not know whether it is just me or whether there is an actual issue (or several), but according to commands.log every (supposed) parameter is surrounded by double quotes. In other words: they are added before and after every space character, even when using single quotes. This reliably causes program execution to fail, at least for users of the Windows command line (cmd.exe), wherever there are spaces in the parameter values. I noticed this with the -search param, where it gets even worse because double quotes are an essential part of CirrusSearch syntax, confer Help:CirrusSearch#Words, phrases, and modifiers. I had a hard time figuring out the probably correct syntax for Windows, and there is still some confusing difference compared with direct search.

Let me show you an example with some search queries in Commons: With file: example image I get more than 1 million results, but file: "example image" with double quotes leads to only around 500. Additionally, using the filter intitle it will be narrowed down to 40: file: intitle:"example image".
Now with pwb listpages -family:commons -lang:commons -format:{page.loc_title} -ns:File -search:… (the program call can be shortened this way in Windows, I will from now on leave out everything but the search param):

  • With "-search:'example image'" I get an unaltered entry in commands.log, and not surprisingly this leads to the message WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes. The program paused for quite a while and I canceled the execution, so I did not get any search result output.
  • "-search:'""example image""'" leads to a good-looking log entry "-search:'"example image"'". I still get the warning, but on the other hand not too many lines are put out. A comparison of some results suggested they are valid, but the program tells me it found about 600 pages (almost 100 more than with the search in Commons). Where does the difference come from?
  • Now for the (most) confusing part: Adding intitle: leads to 0 (in words: zero) results with Pywikibot! And this output comes very fast. I’d expect that the input "-search:'intitle:""example image""'", logged as "-search:'intitle:"example image"'", would get me 40 results, though.

So, long story short, depending on whether there is an issue with Pywikibot itself, I’d at least suggest documenting better how to use the quotes. It could be done in one place and then linked to from all params where quotes are possible. What I am thinking of:

  • Write “use single quotes” instead of just “quotes” (sometimes already used) and add a section especially for users of the Windows command line, saying that the whole parameter with its values has to be surrounded with a pair of double quotes while doubling every quote that should be preserved. If the double quotes around every param are also added in unixoid systems, then perhaps there should be a separate section for the search param as well, but this would have to be tested by someone using such an OS.
  • This could be done with “expected output”, “necessary input” or so: “expected output: -param:'foo bar', necessary input: "-param:'foo bar'"”, and for the search param the first two or all three examples from above.
  • Something you see above only implicitly: It should be pointed out that the dedicated namespace parameter should be used instead of the CirrusSearch equivalent (confer Help:CirrusSearch#Prefix and namespace). The search query "-search:'"example image" prefix:file:'" leads to 0 results, while in Commons this query leads to the same results as above with prepended file:, check "example image" prefix:file:. For a pwb search with this prepended namespace filter ("-search:'file: example'") I get this error message (pwb 6.0.1 from 2021-03-26):
Traceback (most recent call last):
  File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 363, in <module>
    if not main():
  File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 355, in main
    run_python_file(filename,
  File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 74, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File ".\scripts\listpages.py", line 282, in <module>
    main()
  File ".\scripts\listpages.py", line 257, in main
    output_list += [page_fmt.output(num=i, fmt=fmt)]
  File ".\scripts\listpages.py", line 165, in output
    return fmt.format(num=num, page=self)
TypeError: unsupported format string passed to Formatter.__format__
CRITICAL: Exiting due to uncaught exception <class 'TypeError'>
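Editorial note on the “expected output / necessary input” pairs suggested above: such pairs could be generated instead of written by hand. The following is only a minimal sketch using the Python standard library, not anything taken from Pywikibot. The function name necessary_input is hypothetical. subprocess.list2cmdline exists in the standard library but is not part of its documented public API; it follows the Microsoft C runtime quoting rules and therefore emits backslash-escaped quotes rather than the doubled quotes used in the examples above. It also does not escape cmd.exe metacharacters such as & or %, none of which occur here.

```python
import shlex
import subprocess


def necessary_input(expected):
    """Return what to type so a script receives `expected` as ONE argument.

    Hypothetical helper for documentation purposes only.
    """
    return {
        # Quoting per the Microsoft C runtime rules, which decide what a
        # Python script started from the Windows command line sees.
        'windows': subprocess.list2cmdline([expected]),
        # Quoting for POSIX shells (sh, bash, ...).
        'posix': shlex.quote(expected),
    }


examples = [
    '-param:foo bar',
    '-search:"example image"',
    '-search:intitle:"example image"',
]

for expected in examples:
    forms = necessary_input(expected)
    print('expected by the script:', expected)
    print('  Windows input:', forms['windows'])
    print('  POSIX input:  ', forms['posix'])
```

In this sketch the single quotes appear only in the POSIX form, where the shell consumes them; in the Windows form the double quotes that belong to the search value are escaped with a backslash.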
Xqt (talk | contribs)
  • Quotes for command line attributes are only necessary if you have spaces in your parameter e.g. -page:"Albert Einstein". To avoid this you can use an underscore instead: -page:Albert_Einstein.
  • Quotes can be escaped; this is important for the -search command, which explicitly uses quotes and spaces: -search:"\"example Image\" prefix:file:". The corresponding parameter value is '"example Image" prefix:file:' and the API search string is "example+Image"+prefix:file: as expected.
  • The truncation looks like too many files (> ~3000) were found, so the API reduces the load of them for a single step. You may use the -step parameter to avoid this; -step:100 seems to work well for it.
  • For the traceback: it seems your format string is wrong, please check it. It can also be found in the log file.
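Editorial note on the traceback: the thread does not say what the spelling mistake in the format string was, so the following is only one way to provoke the same TypeError. A minimal sketch with a hypothetical stand-in class (not the real class from listpages.py): writing a colon instead of a dot turns the attribute name into a format spec, which a plain object does not support.

```python
class Formatter:
    """Hypothetical stand-in for the object that listpages.py formats."""

    def __init__(self, loc_title):
        self.loc_title = loc_title


page = Formatter('File:Example image.jpg')

# Attribute access with a dot works.
good = '{page.loc_title}'.format(page=page)
print(good)

# A colon makes "loc_title" a format spec for the object itself,
# and object.__format__ rejects any non-empty format spec.
try:
    '{page:loc_title}'.format(page=page)
except TypeError as err:
    message = str(err)
    print('TypeError:', message)
```

The message printed by the second call has the same wording as the last line of the traceback above.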
Speravir (talk | contribs)

Prescript, but written last: Sorry for most of the noise …

  • “Quotes for command line attributes are only necessary if …” I know; potential issues with this and more ample documentation are my points. The usage of _ is a good example. It is very good for the page param, but not for search, confer file: ghostscript_image with file: "ghostscript image". Side note: The first version only works because of the greyspace concept in CirrusSearch. (As another lucky side note, in this case there are the expected two results with intitle:ghostscript_image.)
    Also, I cannot replace the space char with the underscore in search if I want to use more than one filter: incategory:ghostscript -intitle:ghostscript versus incategory:ghostscript_-intitle:ghostscript. Also, inside of the limited search regex I cannot replace the space char (\s is not supported), and the double quotes have a special meaning there. Pywikibot’s grep param is, according to the docs, only applicable to page titles, not the wiki source. Or do I misunderstand it? I ask because I just noticed there is also a titleregex parameter.
  • “Quotes can be escaped, this is important for -search command”. Yes, but escaping with a backslash usually does not work in the Windows command line. Nevertheless, I had tested this before, and it did not work. But now it does! The trick is apparently really to always surround the whole param with double quotes in the Windows command line. I must have missed this specific variant in my tests (I cannot exactly remember). Should be documented!
    (Interesting: "-search:'\"example image\"'" and "-search:\"example image\"" differ by the about 100 results in File namespace as told above.)
  • Truncation message: “You may use -step parameter to avoid this”. Ah, good. Alas, it is not documented for listpages.py (neither as script specific nor as one of the embedded filter, generator and global params) and therefore I did not know that it exists. But now that you point me to it I understand, it’s a usage of ‑<config var>:n … pause … I’m now trying …
    No, still a message for pwb listpages -family:commons -lang:commons -format:{page.loc_title} -step:100 "-search:'file: ""example image""'" (maybe I need even smaller steps), and with almost 1000 findings even more search results (cf. my first posting), because I get results from other namespaces. The variant with prefix still does not want to work (0 results). Hence again: It should strongly be suggested to use the dedicated namespace parameter.
  • For the traceback you are right: Now on a second try it did run without error. I had reused an earlier search, but introduced a spelling mistake. :-/
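Editorial note on the two variants that differ by about 100 results: the sketch below decodes both under POSIX quoting rules with shlex.split and then checks that subprocess.list2cmdline (in the standard library, but not part of its documented public API) re-encodes each value to exactly what was typed. That round trip suggests the Microsoft C runtime rules give the same values for these two strings, since there a single quote is an ordinary character. This is a sketch under that assumption, not a statement about what Pywikibot does internally; the name decode_posix is hypothetical.

```python
import shlex
import subprocess

# The two inputs from the posting above, exactly as typed.
variants = [
    r'''"-search:'\"example image\"'"''',
    r'''"-search:\"example image\""''',
]


def decode_posix(typed):
    """Return the single argument a POSIX shell would pass for `typed`."""
    (argument,) = shlex.split(typed)
    return argument


for typed in variants:
    value = decode_posix(typed)
    # Re-encode with the Microsoft C runtime quoting rules and compare.
    round_trip_ok = subprocess.list2cmdline([value]) == typed
    print('typed:   ', typed)
    print('received:', value)
    print('same under Windows quoting rules:', round_trip_ok)
```

In the first variant the single quotes sit inside the outer double quotes, so they stay part of the value and would be sent on as part of the search string. That may be one reason why the two variants return different numbers of results.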