I do not know whether it is just me or whether there are one or more actual issues, but according to commands.log every (supposed) parameter is surrounded by double quotes. In other words, double quotes are added before and after every space character, even when single quotes are used. This reliably causes program execution to fail, at least for users of the Windows command line (cmd.exe), wherever there are spaces in the parameter values. I noticed this with the -search param, where it gets even worse because double quotes are an essential part of CirrusSearch syntax (see Help:CirrusSearch#Words, phrases, and modifiers). I had a hard time figuring out what is probably the right syntax for Windows, and there are still some confusing differences compared with a direct search.
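For comparison, here is a small sketch of how an argument containing spaces or double quotes would be written on a Windows command line according to the Microsoft C runtime rules. It uses Python's `subprocess.list2cmdline`, an undocumented standard-library helper, and does not involve Pywikibot or cmd.exe at all, so it only illustrates the quoting convention. Note that it escapes embedded double quotes with a backslash rather than doubling them; whether both forms behave identically in cmd.exe is something I have not verified.

```python
import subprocess

# Arguments as the program should finally receive them,
# i.e. after the command line has been parsed.
plain = '-search:example image'
phrase = '-search:"example image"'
title = '-search:intitle:"example image"'

for arg in (plain, phrase, title):
    # list2cmdline renders an argument list as one Windows-style command line,
    # adding surrounding double quotes when the argument contains spaces
    # and backslash-escaping any embedded double quotes.
    print(subprocess.list2cmdline([arg]))
```

The printed lines show what would have to be typed so that the double quotes required by CirrusSearch survive command-line parsing.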
Let me show you an example with some search queries on Commons: With file: example image I get more than 1 million results, but file: "example image" with double quotes leads to only around 500. Additionally using the intitle filter narrows it down to 40: file: intitle:"example image".
Now with pwb listpages -family:commons -lang:commons -format:{page.loc_title} -ns:File -search:… (the program call can be shortened this way on Windows; from now on I will leave out everything but the search param):
- With "-search:'example image'" I get an unaltered entry in commands.log, and not surprisingly this leads to the message WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes. The program paused for quite a while and I canceled the execution, so I did not get any search result output.
- "-search:'""example image""'" leads to a good-looking log entry, "-search:'"example image"'". I still get the warning, but on the other hand not too many lines are output. A comparison of some results suggested that they are valid, but the program tells me it found about 600 pages (almost 100 more than the search on Commons). Where does the difference come from?
- Now for the (most) confusing part: Adding intitle: leads to 0 (in words: zero) results with Pywikibot! And this output comes very fast. I’d expect that the input "-search:'intitle:""example image""'", logged as "-search:'intitle:"example image"'", should get me 40 results, though.
So, long story short: depending on whether there is an issue with Pywikibot, I’d at least suggest documenting better how to use the quotes. This could be done in one place, which is then linked from all params where quotes are possible. What I have in mind:
- Write “use single quotes” instead of just “quotes” (already done in some places) and add a section especially for users of the Windows command line explaining that the whole parameter with its values has to be surrounded with a pair of double quotes, while every quote that should be preserved has to be doubled. If the double quotes around every param are also added on Unix-like systems, then perhaps there should be a separate section for the search param as well, but this would have to be tested by someone using such an OS.
- This could be done with “expected output”, “necessary input” or similar: “expected output: -param:'foo bar', necessary input: "-param:'foo bar'"”, and for the search param the first two or all three examples from above.
- Something you see above only implicitly: It should be pointed out that the dedicated namespace parameter should be used instead of the CirrusSearch equivalent (see Help:CirrusSearch#Prefix and namespace). The search query "-search:'"example image" prefix:file:'" leads to 0 results, while on Commons this query leads to the same results as above with the prepended file:, check "example image" prefix:file:. For a pwb search with this prepended namespace filter ("-search:'file: example'") I get this error message (pwb 6.0.1 from 2021-03-26):
Traceback (most recent call last):
  File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 363, in <module>
    if not main():
  File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 355, in main
    run_python_file(filename,
  File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 74, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File ".\scripts\listpages.py", line 282, in <module>
    main()
  File ".\scripts\listpages.py", line 257, in main
    output_list += [page_fmt.output(num=i, fmt=fmt)]
  File ".\scripts\listpages.py", line 165, in output
    return fmt.format(num=num, page=self)
TypeError: unsupported format string passed to Formatter.__format__
CRITICAL: Exiting due to uncaught exception <class 'TypeError'>
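Regarding the open question about Unix-like systems raised in the suggestions above: as a rough sketch only, Python's `shlex` module can approximate how a POSIX shell would split such a call. This is not a test of Pywikibot itself, `shlex` is only an approximation of a real shell, and the shortened command string is an assumption made for illustration. It suggests that single quotes are consumed by the shell there and the inner double quotes are kept without any doubling.

```python
import shlex

# Hypothetical, shortened call as it might be typed in a POSIX shell.
command = "pwb listpages -ns:File -search:'intitle:\"example image\"'"

# shlex.split applies POSIX-style splitting: single quotes group the text
# and are removed, double quotes inside them are kept literally.
args = shlex.split(command)

# The last element is what the script would see as its search argument.
print(args[-1])
```

If this matches the behaviour of real shells, the Windows-specific doubling of quotes would not be needed on those systems, which would support documenting the two cases separately.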