Manual talk:Pywikibot/2021
This page used the Structured Discussions extension to give structured discussions. It has since been converted to wikitext, so the content and history here are only an approximation of what was actually displayed at the time these comments were made.
Please use one of the communication channels listed on Manual:Pywikibot/Communication rather than using this discussion board. There is very little traffic here, so it may take a while before you get a response.
interwiki.py and interwiki_graph
RESOLVED: See https://phabricator.wikimedia.org/T278675
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Hi.
I want to edit interwiki links on my wiki. The command for this is:
python pwb.py interwiki -start:! -back
I am using Python 3.9.2, Pywikibot 6.0 and Windows 8.1.
When I try to run the script, I get the error shown below.
What can be done?
E:\install\wiki\pywikibot\core_stable-6.0>python pwb.py interwiki -start:! -back
Traceback (most recent call last):
File "E:\install\wiki\pywikibot\core_stable-6.0\pwb.py", line 363, in <module>
if not main():
File "E:\install\wiki\pywikibot\core_stable-6.0\pwb.py", line 355, in main
run_python_file(filename,
File "E:\install\wiki\pywikibot\core_stable-6.0\pwb.py", line 74, in run_python_file
exec(compile(source, filename, 'exec', dont_inherit=True),
File ".\scripts\archive\interwiki.py", line 348, in <module>
from pywikibot import config, i18n, pagegenerators, textlib, interwiki_graph
ImportError: cannot import name 'interwiki_graph' from 'pywikibot' (E:\install\wiki\pywikibot\core_stable-6.0\pywikibot\__init__.py)
CRITICAL: Exiting due to uncaught exception <class 'ImportError'>
95.30.62.231 (talk) 13:58, 27 March 2021 (UTC)
- This bug is being tracked at https://phabricator.wikimedia.org/T278675 Vzhik227 (talk) 17:52, 29 March 2021 (UTC)
- interwiki.py was archived due to https://phabricator.wikimedia.org/T223826. It will be restored. @xqt 08:30, 30 March 2021 (UTC)
Quotes, search parameter
I do not know whether it is just me or whether there is an actual issue (or more than one), but according to commands.log every (supposed) parameter is surrounded by double quotes; in other words, they are added before and after every space character, even when using single quotes. This reliably causes program execution to fail, at least for users of the Windows command line (cmd.exe), wherever there are spaces in the parameter values. I noticed this with the -search param, where it gets even worse because double quotes are an essential part of CirrusSearch syntax; see Help:CirrusSearch#Words, phrases, and modifiers. I had a hard time figuring out what is probably the right syntax for Windows, but there is still a confusing difference compared with a direct search.
Let me show you an example with some search queries in Commons: With file: example image I get more than 1 million results, but file: "example image" with double quotes leads to only around 500. Additionally, using the filter intitle narrows it down to 40: file: intitle:"example image".
Now with pwb listpages -family:commons -lang:commons -format:{page.loc_title} -ns:File -search:… (the program call can be shortened this way in Windows; from now on I will leave out everything but the search param):
- With "-search:'example image'" I get an unaltered entry in commands.log, and not surprisingly this leads to the message WARNING: API warning (result): This result was truncated because it would otherwise be larger than the limit of 12,582,912 bytes. The program paused for quite a while and I canceled the execution, so I did not get any search result output.
- "-search:'""example image""'" leads to a good-looking log entry, "-search:'"example image"'". But I still get the warning; on the other hand, not too many lines are put out. A comparison of some results suggested they are valid, but the program tells me it found about 600 pages (almost 100 more than with the search in Commons). Where does the difference come from?
- Now for the (most) confusing part: Adding intitle: leads to 0 (in words: zero) results with Pywikibot! And this output comes very fast. I would expect the input "-search:'intitle:""example image""'", logged as "-search:'intitle:"example image"'", to get me 40 results, though.
So, long story short, depending on whether there is an issue with Pywikibot, I would at least suggest better documenting how to use the quotes. It could be done in one place and then linked from all params where quotes are possible. What I have in mind:
- Write “use single quotes” instead of just “quotes” (sometimes already used) and add a section especially for users of the Windows command line, stating that the whole parameter with its values has to be surrounded with a pair of double quotes while doubling every quote that should be preserved. If the double quotes around every param are also added on Unix-like systems, then perhaps there should be a separate section for the search param as well, but this would have to be tested by someone using such an OS.
- This could be done with “expected output”, “necessary input” or so: “expected output: -param:'foo bar', necessary input: "-param:'foo bar'"”, and for the search param the first two or all three examples from above.
- Something you see above only implicitly: It should be pointed out to use the dedicated namespace parameter instead of the CirrusSearch equivalent (see Help:CirrusSearch#Prefix and namespace). The search query "-search:'"example image" prefix:file:'" leads to 0 results, while in Commons this query leads to the same results as above with prepended file:; check "example image" prefix:file:. For a pwb search with this prepended namespace filter ("-search:'file: example'") I get this error message (pwb 6.0.1 from 2021-03-26):
Traceback (most recent call last):
File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 363, in <module>
if not main():
File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 355, in main
run_python_file(filename,
File "C:\Programs\Netzwerk\Mediawiki-Tools\pywikibot\pwb.py", line 74, in run_python_file
exec(compile(source, filename, 'exec', dont_inherit=True),
File ".\scripts\listpages.py", line 282, in <module>
main()
File ".\scripts\listpages.py", line 257, in main
output_list += [page_fmt.output(num=i, fmt=fmt)]
File ".\scripts\listpages.py", line 165, in output
return fmt.format(num=num, page=self)
TypeError: unsupported format string passed to Formatter.__format__
CRITICAL: Exiting due to uncaught exception <class 'TypeError'>
Speravir (talk) 23:04, 29 March 2021 (UTC)
- Quotes for command line attributes are only necessary if you have spaces in your parameter, e.g. -page:"Albert Einstein". To avoid this you can use an underscore instead: -page:Albert_Einstein.
- Quotes can be escaped; this is important for the -search command, which explicitly uses quotes and spaces: -search:"\"example Image\" prefix:file:". The corresponding parameter value is '"example Image" prefix:file:' and the API search string is "example+Image"+prefix:file: as expected.
- The truncation looks as if too many files (> ~3000) are found, so the API cuts down what it loads in a single step. You may use the -step parameter to avoid this; -step:100 seems to work well for it.
- For the traceback: it seems your format string is wrong, please check it. It can also be found in the log file. @xqt 16:21, 1 April 2021 (UTC)
- Prescript, but written last: Sorry for most of the noise …
- “Quotes for command line attributes are only necessary if …” I know; potential issues with this and more ample documentation are my points. The usage of _ is a good example. It is very good for the page param, but not for search; compare file: ghostscript_image with file: "ghostscript image". Side note: The first version only works because of the greyspace concept in CirrusSearch. (As another lucky side note, in this case there are the expected two results with intitle:ghostscript_image.)
Also, I cannot replace the space char with the underscore in search if I want to use more than one filter: incategory:ghostscript -intitle:ghostscript versus incategory:ghostscript_-intitle:ghostscript. Also, inside the limited search regex I cannot replace the space char (\s is not supported), and the double quotes have a special meaning there. According to the docs, Pywikibot’s grep param is only applicable to page titles, not the wiki source. Or do I misunderstand it? Because I have just noticed there is also a titleregex parameter.
- “Quotes can be escaped, this is important for -search command”. Yes, but escaping with a backslash usually does not work in the Windows command line. Nevertheless, I had tested this before, and it did not work. But now it does! The trick is apparently to always surround the whole param with double quotes in the Windows command line. I must have missed this specific variant in my tests (I cannot remember exactly). Should be documented!
(Interesting: "-search:'\"example image\"'" and "-search:\"example image\"" differ by the roughly 100 results in the File namespace mentioned above.)
- Truncation message: “You may use -step parameter to avoid this“ – Ah good. Alas, not documented for listpages.py (neither as script specific nor as one of the embedded filter, generator and global params) and therefore I did not know that this exists. But now that you point me to it I understand, it’s a usage of ‑<config var>:n
… pause … I’m now trying …
No, still a message for pwb listpages -family:commons -lang:commons -format:{page.loc_title} -step:100 "-search:'file: ""example image""'" (maybe I need even smaller steps), and with almost 1000 findings even more search results (cf. my first posting), because I get results from other namespaces. The variant with prefix still does not want to work (0 results). Hence again: It should strongly be suggested to use the dedicated namespace parameter.
- For the traceback you are right: Now on a second try it ran without error. I had reused an earlier search, but introduced a spelling mistake. :-/ Speravir (talk) 02:26, 4 April 2021 (UTC)
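The quoting rule worked out in this thread (wrap the whole parameter in double quotes and backslash-escape the inner ones) can be reproduced with Python's standard library. The following is only a sketch: it assumes the argument is parsed by the Microsoft C runtime rules that `subprocess.list2cmdline` implements, it does not model cmd.exe's own special characters, and the `quote_plus` call merely imitates the API search string quoted in the reply above rather than Pywikibot's actual request encoding.

```python
import subprocess
from urllib.parse import quote_plus

# The single argument the script should receive (value taken from the reply above).
wanted_argument = '-search:"example Image" prefix:file:'

# Build the text to type on a Windows command line for that one argument.
windows_command_line = subprocess.list2cmdline([wanted_argument])
print(windows_command_line)

# Rough imitation of the search string: spaces become "+",
# while double quotes and colons are left untouched.
search_value = wanted_argument[len("-search:"):]
api_like = quote_plus(search_value, safe='":')
print(api_like)
```

The first printed line is the whole parameter in one pair of double quotes with the inner quotes escaped, which matches the variant that finally worked on Windows.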
certificate verify failed: unable to get local issuer certificate
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
HELP
I am trying to do something on the Test Wikipedia
ERROR: Traceback (most recent call last):
File "C:\code\pywikibot\core\pywikibot\data\api.py", line 1540, in _http_request
response = http.request(self.site, uri=uri,
File "C:\code\pywikibot\core\pywikibot\tools\__init__.py", line 1475, in wrapper
return obj(*__args, **__kw)
File "C:\code\pywikibot\core\pywikibot\comms\http.py", line 251, in request
r = fetch(baseuri, headers=headers, **kwargs)
File "C:\code\pywikibot\core\pywikibot\tools\__init__.py", line 1475, in wrapper
return obj(*__args, **__kw)
File "C:\code\pywikibot\core\pywikibot\comms\http.py", line 414, in fetch
callback(response)
File "C:\code\pywikibot\core\pywikibot\comms\http.py", line 290, in error_handling_callback
raise FatalServerError(str(response))
pywikibot.exceptions.FatalServerError: HTTPSConnectionPool(host='test.wikipedia.org', port=443): Max retries exceeded with url: /w/api.php (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
Traceback (most recent call last):
File "C:\code\pywikibot\core\pwb.py", line 363, in <module>
if not main():
File "C:\code\pywikibot\core\pwb.py", line 355, in main
run_python_file(filename,
File "C:\code\pywikibot\core\pwb.py", line 74, in run_python_file
exec(compile(source, filename, 'exec', dont_inherit=True),
File ".\scripts\userscripts\code.py", line 30, in <module>
page.text=""
File "C:\code\pywikibot\core\pywikibot\page\__init__.py", line 572, in text
self.botMayEdit() # T262136, T267770
File "C:\code\pywikibot\core\pywikibot\page\__init__.py", line 1024, in botMayEdit
templates = self.templatesWithParams()
File "C:\code\pywikibot\core\pywikibot\page\__init__.py", line 2076, in templatesWithParams
titles = {t.title() for t in self.templates()}
File "C:\code\pywikibot\core\pywikibot\page\__init__.py", line 1451, in templates
self._templates = list(self.itertemplates(content=content))
File "C:\code\pywikibot\core\pywikibot\data\api.py", line 2631, in __iter__
self.data = self.request.submit()
File "C:\code\pywikibot\core\pywikibot\data\api.py", line 1811, in submit
response, use_get = self._http_request(use_get, uri, body, headers,
File "C:\code\pywikibot\core\pywikibot\data\api.py", line 1540, in _http_request
response = http.request(self.site, uri=uri,
File "C:\code\pywikibot\core\pywikibot\tools\__init__.py", line 1475, in wrapper
return obj(*__args, **__kw)
File "C:\code\pywikibot\core\pywikibot\comms\http.py", line 251, in request
r = fetch(baseuri, headers=headers, **kwargs)
File "C:\code\pywikibot\core\pywikibot\tools\__init__.py", line 1475, in wrapper
return obj(*__args, **__kw)
File "C:\code\pywikibot\core\pywikibot\comms\http.py", line 414, in fetch
callback(response)
File "C:\code\pywikibot\core\pywikibot\comms\http.py", line 290, in error_handling_callback
raise FatalServerError(str(response))
pywikibot.exceptions.FatalServerError: HTTPSConnectionPool(host='test.wikipedia.org', port=443): Max retries exceeded with url: /w/api.php (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
CRITICAL: Exiting due to uncaught exception <class 'pywikibot.exceptions.FatalServerError'>
Vanished user 098323 (talk) 03:07, 7 April 2021 (UTC)
- Also, in the wikipedia_family.py file I tried adding this:
def ignore_certificate_error(self, code):
    return True
- It didn't work. Vanished user 098323 (talk) 03:09, 7 April 2021 (UTC)
- I got it now. Here it is:
site_verify=site.verify_SSL_certificate()
site_verify=False
Vanished user 098323 (talk) 22:26, 7 April 2021 (UTC)
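For readers who run a private test wiki with a self-signed certificate, the idea behind the family-file attempt above can be sketched as follows. This is an assumption-laden illustration, not the library's confirmed API: it assumes the family class exposes a hook named `verify_SSL_certificate`, as the `site.verify_SSL_certificate()` call in the last comment suggests. The `Family` base class below is a stand-in so the snippet runs on its own, and `LocalTestFamily` is a hypothetical name.

```python
class Family:
    """Stand-in for the framework's family base class (assumption)."""

    def verify_SSL_certificate(self, code: str) -> bool:
        # Assumed default: certificates are verified for every site code.
        return True


class LocalTestFamily(Family):
    """Hypothetical family for a private wiki with a self-signed certificate."""

    name = "localtest"

    def verify_SSL_certificate(self, code: str) -> bool:
        # Returning False would ask the framework to skip certificate checks.
        return False


family = LocalTestFamily()
print(family.verify_SSL_certificate("en"))
```

Skipping verification removes the protection HTTPS provides, so for a public site such as test.wikipedia.org it is usually safer to repair the local certificate store instead.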
Cannot edit protected namespaces
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
I'm running 1.35 and use $wgNamespaceProtection to restrict edit access to several namespaces, including the mainspace. ($wgNamespaceProtection[NS_MAIN] = ['main-edit'])
When I try to make an edit with pywikibot (add_text for example), the bot logs in and reads the page properly, but I'm told that the bot lacks permission to edit in the "Page" namespace. I've made sure that both the bot and the associated user have the 'main-edit' permission set to true.
When I remove that configuration, the bot edits just fine. It also edits other namespaces that don't have the special permission structure.
Is there anything I can do to get around this, other than constantly protecting/unprotecting the namespace? Backlitt (talk) 17:07, 22 June 2021 (UTC)
- How is your bot authenticating? If you're using BotPasswords or OAuth, you need to make sure the "main-edit" permission is in an appropriate grant (see Manual:$wgGrantPermissions) so the bot session actually has that permission. Legoktm (talk) 17:59, 22 June 2021 (UTC)
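One way to check whether the bot session really carries the right is to ask the MediaWiki API for the rights of the current session (`action=query&meta=userinfo&uiprop=rights`). The sketch below only builds the request URL and inspects a response; the wiki URL, the account name and the sample response are made up for illustration, and the real request would have to be sent through the bot's logged-in session.

```python
import json
from urllib.parse import urlencode


def userinfo_rights_url(api_url: str) -> str:
    """Build the API URL that lists the rights of the current session."""
    params = {
        "action": "query",
        "meta": "userinfo",
        "uiprop": "rights",
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def session_has_right(response_text: str, right: str) -> bool:
    """Return True if the userinfo response lists the given right."""
    data = json.loads(response_text)
    rights = data.get("query", {}).get("userinfo", {}).get("rights", [])
    return right in rights


# Hypothetical response for a BotPasswords session whose grants lack 'main-edit'.
sample = json.dumps(
    {"query": {"userinfo": {"id": 5, "name": "ExampleBot@task", "rights": ["read", "edit"]}}}
)
url = userinfo_rights_url("https://wiki.example.org/w/api.php")
print(url)
print(session_has_right(sample, "main-edit"))
```

If 'main-edit' is missing from the list for the bot session but present for the same user in a browser session, the grant configuration mentioned above is the likely cause.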
Creating new pages using add_text.py
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- I wanted to create a bunch of pages using add_text.py, but I get the error message
WARNING: Page [[de:xyz]] does not exist on wikipedia.de.
- Is there any possibility to suppress this error message and to create the pages anyway? Leyo 21:16, 11 September 2021 (UTC)
- Have a look at this:
- https://gerrit.wikimedia.org/r/c/pywikibot/core/+/721297 @xqt 12:35, 15 September 2021 (UTC)
- Thank you! I am looking forward to seeing this going live. Leyo 22:53, 16 September 2021 (UTC)
- Currently not, because AddTextBot is derived from ExistingPageBot, which skips non-existing pages. Maybe a -create option should be implemented? @xqt 12:00, 15 September 2021 (UTC)
How to do a string replacement?
RESOLVED. Solution: The text must match exactly, including whitespace etc.
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
I would like to replace {{MyFirstTemplate|field1=MyText|field2=MyText}} with {{MySecondTemplate|field3=MyNewText}}. According to the documentation, this should be the line:
python3 pwb.py replace -page:My_Page "{{MyFirstTemplate|field1=MyText|field2=MyText}}" "{{MySecondTemplate|field3=MyNewText}}" -summary:Awesome_replacement
However, nothing is replaced. The script reports that nothing to replace was found. I am a bit unsure what to enter. Any help appreciated. Marbot (talk) 06:28, 27 September 2021 (UTC)
- I guess it has something to do with the pipes, but adding a \ in front of them does not help either. Marbot (talk) 06:29, 27 September 2021 (UTC)
- Check for spelling mistakes and that you are connecting to the correct wiki. Note that the text must match exactly, including whitespace etc. Otherwise you will need to write a regex (and use -regex). The regex will then need backslashes to escape the pipes and other special characters. Matěj Suchánek (talk) 10:30, 27 September 2021 (UTC)
- Thanks for your response. This contained the decisive hint: Check the whitespace! I had {{MyFirstTemplate|field1=My Text|field2=My Text}} but wanted to replace {{MyFirstTemplate|field1=My_Text|field2=My_Text}}. The other parameters understandably require _, though this one does not, which is understandable, too. Thanks a lot for helping me out of my misery! Much appreciated. Marbot (talk) 14:35, 27 September 2021 (UTC)
- Works for me as expected:
C:\pwb\GIT\core>pwb replace -page:MyPage "{{MyFirstTemplate|field1=MyText|field2=MyText}}" "{{MySecondTemplate|field3=MyNewText}}" -site:wikipedia:test -simulate
The summary message for the command line replacements will be something like: Bot: Automated text replacement (-{{MyFirstTemplate|field1=MyText|field2=MyText}} +{{MySecondTemplate|field3=MyNewText}})
Press Enter to use this automatic message, or enter a description of the changes your bot will make:
Retrieving 1 pages from wikipedia:test.
>>> MyPage <<<
@@ -6 +6 @@
- {{MyFirstTemplate|field1=MyText|field2=MyText}}
+ {{MySecondTemplate|field3=MyNewText}}
Do you want to accept these changes? ([y]es, [N]o, [e]dit original, edit [l]atest, open in [b]rowser, [m]ore context, [a]ll, [q]uit):
- Note: There is no -regex option activated and the source string must match the text, see https://test.wikipedia.org/w/index.php?title=MyPage&action=edit @xqt 10:38, 27 September 2021 (UTC)
- Thank you for confirming! I know what I did wrong. See my answer to Matěj! Marbot (talk) 14:32, 27 September 2021 (UTC)
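The difference between an exact match and a regex match discussed in this thread can be reproduced in plain Python. This is a sketch of the general idea only, using `str.replace` and the `re` module rather than the replace script itself; the page text is made up, and the pattern accepting either a space or an underscore is one possible choice.

```python
import re

typed_old = "{{MyFirstTemplate|field1=My_Text|field2=My_Text}}"
new = "{{MySecondTemplate|field3=MyNewText}}"

# Hypothetical page content: it contains spaces where underscores were typed.
page_text = "Intro {{MyFirstTemplate|field1=My Text|field2=My Text}} outro"

# Exact matching: nothing is replaced because the whitespace differs.
exact_result = page_text.replace(typed_old, new)

# Regex matching: braces and pipes are escaped with backslashes,
# and [ _] accepts either a space or an underscore.
pattern = re.compile(
    r"\{\{MyFirstTemplate\|field1=My[ _]Text\|field2=My[ _]Text\}\}"
)
regex_result = pattern.sub(new, page_text)

print(exact_result)
print(regex_result)
```

With the exact string the text comes back unchanged, which corresponds to the "nothing to replace" report above; the escaped pattern finds the template in either spelling.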
Deprecation policy
There was a discussion in April on the mailing list about having a deprecation policy based on the one for the MediaWiki interface. Did this go anywhere?
I am asking because I have been bitten by a breaking change: specifically, this removal of APISite.getuserinfo() broke my bot on Toolforge. I am not saying the removal was unreasonable (2+ months after the "deprecated" decorator was added sounds reasonable), but I could not find any documentation of expectations anywhere.
(My bot breaking is 95% my fault, since I did not notice the breakage in a timely fashion, nor did I regularly review the Pywikibot code for deprecation tags, so even a one-year deprecation period would not have helped me much, but still.) Tigraan (talk) 14:29, 3 October 2021 (UTC)
- The deprecation warning was shown every time you ran your bot. There is a warning in our changelog for the stable release [1]. With 6.4 Pywikibot follows semantic versioning. All deprecated code will be removed with version 7.0.0 (coming at the end of this month, I guess) and new deprecations are kept until 8.0 is deployed (probably until the end of 2022). Please also have a look at the current deprecations [3], the code changes coming with 7.0 [4] (more to come here!) and pay attention to the deprecation warnings of your scripts.
- [1] https://doc.wikimedia.org/pywikibot/master/changelog.html#id6
- [2] https://www.mediawiki.org/wiki/Manual:Pywikibot/Development/Guidelines#Deprecation_Policy
- [3] https://doc.wikimedia.org/pywikibot/stable/changelog.html#deprecations
- [4] https://doc.wikimedia.org/pywikibot/master/changelog.html#current-release-changes @xqt 13:25, 4 October 2021 (UTC)
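A bot owner who wants to notice deprecation warnings before a major release removes the code could collect them in a small check with Python's `warnings` module. The sketch below is generic: the `deprecated` decorator is a hypothetical stand-in, not Pywikibot's own, and the function names and replacement hint are only illustrative.

```python
import functools
import warnings


def deprecated(replacement):
    """Hypothetical stand-in for a library's deprecation decorator."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                FutureWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("userinfo")
def getuserinfo():
    # Illustrative payload only.
    return {"name": "ExampleBot"}


def deprecated_calls(func):
    """Run func and return the deprecation messages it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, FutureWarning))
    ]


messages = deprecated_calls(getuserinfo)
print(messages)
```

Running such a check after every upgrade within one major version gives advance notice of what the next major version will remove.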
Alternative Implementation of Pywikibot
I am interested in programming, especially in a kind of programming that is close to natural language. I write scripts in the programming language R with SQL integration. I created a script that extracts the variables from a sentence with a defined structure. This can be used to write a program in a scripting language made of user-defined structured sentences close to natural language, and then convert it into source code in a more detailed programming language. You can find the script in the following folder in my PAWS profile. The script where it starts is called Structuredtexttranspiler 20211021.R. Do you think that defining sentences and mapping them to Python source code could be useful for Pywikibot, to make it possible for more people to use the framework? I do not have a deeper understanding of Python, but I can help set up sentences with a defined structure if you tell me what the Python equivalent looks like. Hogü-456 (talk) 20:20, 25 October 2021 (UTC)
- Currently I don't quite know what you mean by that. Perhaps there could be an operator interface which parses the more or less natural-language "commands" to get the expected results, or which parses such commands and translates them into SQL queries to retrieve data, or passes them to our SPARQL interface. @xqt 14:02, 29 October 2021 (UTC)
- I think that the current way of using Pywikibot requires an understanding of programming, and maybe what I linked is a possibility to reduce the required programming knowledge and make it possible for more people to use it. The idea is to do what you say in the second sentence: creating an operator interface which parses the more or less natural-language "commands" to get the expected results. Hogü-456 (talk) 14:12, 29 October 2021 (UTC)
- Pywikibot can be operated without any programming knowledge, that's what Pywikibot existing scripts are for. You call the script together with its parameters. But this is a pretty old approach, not that modern. R-like languages are a modern approach, for sure. But creating an R interface to Pywikibot's Page, Site, Family, Pagegenerator, Bot ... objects is quite a lot of work. Basically you would have to be pretty good at Python and have pretty good understanding of how Pywikibot works to make this happen. First I would recommend you to explore the internals of Pywikibot, get some basic idea how it works and what are its key parts, then perhaps you would find more concrete way R lang can help the project. Also I should mention there was a Pywikibot competitor written in R, you should be able to find it being mentioned on Mediawiki.org's or English Wikipedia's Bot Help pages. Dvorapa (talk) 16:40, 7 November 2021 (UTC)
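To make the proposal more concrete, here is a minimal sketch of how one structured sentence could be mapped to the arguments of an existing script. The sentence pattern and the function name are hypothetical; the target argument layout follows the replace call shown in the string replacement thread on this page.

```python
import re

# Hypothetical sentence pattern: replace "OLD" with "NEW" on page TITLE
SENTENCE = re.compile(
    r'^replace "(?P<old>[^"]+)" with "(?P<new>[^"]+)" on page (?P<page>\S+)$'
)


def sentence_to_command(sentence):
    """Translate one structured sentence into a pwb argument list."""
    match = SENTENCE.match(sentence.strip())
    if match is None:
        raise ValueError(f"unrecognised sentence: {sentence!r}")
    return ["replace", f"-page:{match['page']}", match["old"], match["new"]]


command = sentence_to_command('replace "foo" with "bar" on page My_Page')
print(command)
```

Returning an argument list instead of a single string avoids the shell quoting problems discussed earlier on this page, since each value stays one argument regardless of the spaces it contains.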
Bots workshops
Hello, we often get requests from smaller wiki communities part of the Small Wiki Toolkits initiative to learn how to create, deploy, run, and manage bots. Most of the requests are from novice programmers. As we do not have enough mentors in our technical community to address these needs, I am wondering if folks here might be interested in running a few sessions/workshops and helping develop a curriculum on the topic in the coming months. Thoughts? User:SSethi_(WMF) 23:37, 29 November 2021 (UTC)
- Hello User:SSethi (WMF),
- in the process of getting into pwb, I would put together something similar to said curriculum either way. If there is still interest in such a guide, please let me know.
- Best regards
- Tim from BorgNetzWerk TimBorgNetzWerk (talk) 16:25, 4 December 2022 (UTC)