Manual:Pywikipediabot/wikipedia.py/ca
This page was moved from MetaWiki.
It probably requires cleanup – please feel free to help out. In addition, some links on the page may be red; respective pages might be found at Meta. Remove this template once cleanup is complete.
| Llengua: | English • català |
|---|
wikipedia.py([1]) és un mòdul habilitat amb el marc de treball del pywikipedia. Malgrat el seu nom original, no s'empra únicament per a la Viquipèdia, sinó també per tots els projectes Mediawiki. Aquest mòdul ofereix algunes classes i funcions que poden emprar-se en altres scripts. Emprant directment el wikipedia.py, es pot obtenir informaciós sobre la versió.
Classes Page, Site i la funció wikipedia.stopme() [edit]
La classe més important (i la més emprada) és la classe Page. Proveeeix totes les accions comuns per a una pàgina wiki. Per usar-la, necessiteu abans que res cridar-la, i llavors podreu emprar les seus funcions internes.
import wikipedia # Importa el mòdul wikipedia.py site = wikipedia.getSite() # Assigna el projecte per defecte page = wikipedia.Page(site, 'PAGENAME') # Crida el constructor print page wikipedia.stopme()
A l'exemple, importeu el mòdul wikipedia. Després es necessita definir el site on es troba la pàgina. Si empreu l'exemple, el projecte de la pàgina serà el que s'ha definit al vostre fitxer user-config.py.
If you want to load the page of a different project you have to do as in the following example. At the end, there is wikipedia.stopme() that is needed so that your other bot processes aren't slowed down.
import wikipedia enWikisrcSite = wikipedia.getSite('en', 'wikisource') # loading a defined project's page page = wikipedia.Page(enWikisrcSite, 'Main Page') text = page.get() # Taking the text of the page print text wikipedia.stopme()
Rewrite code better [edit]
In the previous example, you import wikipedia. The site is defined as English Wikisource, with the parameters 'en' and 'wikisource'. Then we define the Main Page as the page to take. And we load the text.
Then we print the text to the screen to let the user read it. Actually, this code is not really clean and good. An experienced coder can improve it as follows.
import wikipedia # Define the main function def main(): titleOfPageToLoad = u'Main Page' # The "u" before the title means Unicode, important for special characters wikipedia.output(u'Loading %s...' % titleOfPageToLoad) enwiktsite = wikipedia.getSite('en', 'wikisource') # loading a defined project's page page = wikipedia.Page(enwiktsite, titleOfPageToLoad) text = page.get() # Taking the text of the page wikipedia.output(text) # Print the text, encoding it with wikipedia's method if __name__ == '__main__': try: main() finally: wikipedia.stopme()
This code is better than before for many reasons. First of all, it uses a function and it's not procedural as before. In this way, the main function will only be invoked if the script is called directly. This makes it possible to import it from another script, making the code more modular and reusable.
Furthermore, the try/finally block guarantees that stopme() is always called, even when an exception is raised. This prevents the slowing down of other process and is quicker when you test a script.
Another interesting new thing is the use of wikipedia.output(). It does the same job of printing "something" but, and this is the important thing, it encodes and decodes the text using the encoding defined in user-config. This makes sure that special characters like Umlauts or Cyrillic letters are shown correctly. Note: If you don't put the text in the u"something" (with the u) the function will return a (handled, but ugly to see) error. (the text variable contain already the code encode in a good way, so it won't raise an exception).
Unfortunately, you can't always be sure that the page you are trying to load exists. To prevent that, wikipedia.py will raise a NoPage exception. You can use an except block to solve the problem, as in the following example.
site.allpages() function and the wikipedia.py errors [edit]
import wikipedia # Define the main function def main(): site = wikipedia.getSite() startpage = '!' for page in site.allpages(startpage): # Use a generetor object, this will yield all pages one by one pagename = page.title() # Take the title of the page (not "[[page]]" but "page") wikipedia.output(u"Loading %s..." % pagename) # Please, see the "u" before the text try: text = page.get() # Taking the text of the page except wikipedia.NoPage: # First except, prevent empty pages text = '' except wikipedia.IsRedirectPage: # second except, prevent redirect wikipedia.output(u'%s is a redirect!' % pagename) continue except wikipedia.Error: # third exception, take the problem and print wikipedia.output(u"Some error, skipping..") continue wikipedia.output(text) # Print the output, encoding it with wikipedia's method if __name__ == '__main__': try: main() finally: wikipedia.stopme()
Here, we can see our first generator object that is in wikipedia.py. First of all, we define the site (it's suggested to do it immediatly, because it's quite impossibile not to use a site's function if you need to work with wikipedia). The second variable is the page from which the Bot will load all the pages on your wikimedia's project.
Then, in the line below, you start a for cycle parsing every page on Wikipedia. Look out, because the function allpages() isn't a sub-function of wikipedia, but of site! (so, you can determine from what project you want to load all the pages).
The allpages() function has optional parameter, one of these are the start page (default = '!'), then there is the namespace (default = 0), then if you want to include redirects (default: True), and the throttle (default: True). In the example, if you wanted to use the optional parameters as well, you would have used:
def allpages(self, start = '!', namespace = 0, includeredirects = True, throttle = True)
In the line below, we load the page's title, in this way you don't print [[Page]] but Page, so it's a better output. Two lines below, you find the try/except block.
The Bot tries to load the page and if it doesn't exists (maybe someone has deleted it while the bot was running) it will print only a empty text (you can do what you want, I've choosen this only for explanation purpose). If the page is a redirect, it will notify it and, to make sure that nothing will happend, we put an extreme Exception that don't block the Bot (for example, if there is a Spamfilter's problem, it will be handled in this way). The possible Wikipedia's Errors are:
*wikipedia.Error General Error *wikipedia.NoUsername the Username is not in user-config.py *wikipedia.NoPage The Page does not exist *wikipedia.IsRedirectPage The Page is a redirect page *wikipedia.IsNotRedirectPage The Page is not a redirect page *wikipedia.LockedPage The Page is locked *wikipedia.LockedNoPage Page does not exist, and creating it is not possible because of a lock (cascading protection) *wikipedia.NoSuchEntity No entity exist for this character *wikipedia.SectionError The section specified by # does not exist *wikipedia.PageNotSaved Saving the page has failed (General Error) *wikipedia.EditConflict There has been an edit conflict while uploading the page *wikipedia.SpamfilterError Saving the page has failed because the MediaWiki spam filter detected a blacklisted URL. *wikipedia.ServerError Got unexpected server response *wikipedia.UserBlocked Your username or IP has been blocked *wikipedia.PageNotFound Page not found in list
wikipedia.Page(site, pagename).get() and wikipedia.Page(site, pagename).put() analysis [edit]
Now we have only done a few tests, but the real job of a Bot, on Wikipedia, is to modify pages! It's not made to print in a DOS screen what is in a page. So, let's try to make the Bot to put the text inside a page.
import wikipedia # Define the main function def main(): site = wikipedia.getSite() pagename = 'Wikipedia:Sandbox' page = wikipedia.Page(site, pagename) wikipedia.output(u"Loading %s..." % pagename) # Please, see the "u" before the text try: text = page.get(force = False, get_redirect=False, throttle = True, sysop = False, change_edit_time = True) # text = page.get() <-- is the same except wikipedia.NoPage: # First except, prevent empty pages text = '' except wikipedia.IsRedirectPage: # second except, prevent redirect wikipedia.output(u'%s is a redirect!' % pagename) exit()# wikipedia.stopme() is in the finally, we don't need to use it twice, exit() will only close the script except wikipedia.Error: # third exception, take the problem and print wikipedia.output(u"Some Error, skipping..") exit() newtext = text + '\nThis is a nice test, deleteme please' page.put(newtext, comment='Bot: Test', watchArticle = None, minorEdit = True) # page.put(newtext, 'Bot: Test') <-- is the same if __name__ == '__main__': try: main() finally: wikipedia.stopme()
In this script there are options specified that are already set by default. This is to show examples of options that can be used with this useful function. So, you can use text = page.get() to load the text and only page.put(newtext, 'Bot: Test') to put the changes, if you don't need to change the other options.
wikipedia.Page(site, pagename).get() [edit]
- force = False : if force is set as True, the Bot will ignore all the exceptions raised, included redirects, so you won't get errors (but your code can make mistakes..)
- get_redirect = False : if it is set to True, the Bot won't stop himself but it will follow the redirect (f.e. if you have [[Php]] that redirect to [[PHP]], the bot will load PHP).
- throttle = True : to slow down the Bot so as not to over-stress the Server. Within wikimedia projects, you must to set it to be True.
- sysop = False: To get the page as sysop (if you need to do some sysop's actions with this page (you can log-in as sysop if you need to edit protected pages or to do something that only sysops can do).
- nofollow_redirects=False : Like force it won't raise an exception if you load a redirect... but it will raise in all the other cases.
- change_version_date = True : set it to False if you have already loaded the page before and do not check this version for changes before saving.
wikipedia.Page(site, pagename).put() [edit]
.put(newtext, comment='Bot: Test', watchArticle = None, minorEdit = True)
- newtext is the text that you'll put on Wikipedia
- comment = None : is the comment that the Bot will use when it will make a change. (default = "Wikipedia python library")
- watchArticle = None : if you want to put the article edited to your bot's watchlist you have to change it to True
- minorEdit = True : The Bot will make a minor-edit. If it's not necessary, set it to True also if you make big-changes. Note: If you edit a user's talk page, if you want to let them know of the change that you've done, set this parameter to False otherwise set it to True.
If you want to change the default comment, you can use the function wikipedia.setAction(text) to use text instead of "Wikipedia python library" to be sure that your code won't have strange summary's messages.
wikipedia.Page(site, pagename).getReferences(), .move(), .protect(), .delete() and .undelete() [edit]
The Page class has a lot of function that let you make what you want and there are so many function that also what you think that was impossibile to do with a Bot can become reality. For example, do you know that (if you are a sysop) you can load a deleted revision, using the Bot? (and you can also undelete it!) But now it's better to explain the main and most used function (by the way, you can find what you need also taking a look at the code and searching def getDeletedRevision()
wikipedia.Page(site, pagename).getReferences() [edit]
import wikipedia def main(): page = wikipedia.Page(wikipedia.getSite(), 'Wikipedia:Sandbox') for pagetoparse in page.getReferences(follow_redirects=True, withTemplateInclusion=True, onlyTemplateInclusion=False, redirectsOnly=False): wikipedia.output(pagetoparse.title()) if __name__ == '__main__': try: main() finally: wikipedia.stopme()
It's a generator, this means that it returns all the pages that reference to the page defined with the wikipedia.Page() class. If you need to load the entire list of referred pages to work with them, use:
pages = [page for page in s.getReferences()]
Like the previous examples, you can use the reference function as:
for i in page.getReferences(): wikipedia.output(i.title())
The parameters available are: (follow_redirects=True, withTemplateInclusion=True, onlyTemplateInclusion=False, redirectsOnly=False):
- follow_redirect=True : as above, it returns also redirects if it's True (otherwise it won't return the redirects)
- withTemplateInclusion=True : if you use a page as template, in the references appears a "included in page [...]" so, if you need only to load the pages that links to the page, set to True otherwise set to False.
- onlyTemplateInclusion=False : if True, it will load only the pages that use the page as a Template and not the other.
- redirectsOnly=False : if True, it will return only the redirects to the page
--- Work in Progress ---
