Manual:Pywikibot/wikipedia.py

wikipedia.py is a module of the pywikipedia framework. It is the most important module because it provides most of the commonly used functions, and it is designed to be as easy to use as possible.

Page class, site class and wikipedia.stopme function
The most important (and most used) class is the Page class. It provides all the common actions for a wiki page. To use it, you must first create a Page object; only then can you call its methods.

In the example, you import the wikipedia module and then define the site where the page lives. If you define the site as in the example, the page is taken from the project set in user-config.py.

To load a page from a different project, proceed as in the following example. At the end, wikipedia.stopme is called to exit from Wikipedia's site, so that you don't slow down other bots' processes.
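A minimal sketch of this flow follows. So that it runs without the framework or a network connection, it defines a small stand-in object named wikipedia. The stand-in classes and the text they return are hypothetical, and getSite, Page, get and stopme are assumed to mirror the calls of the real module.

```python
class _StandInPage(object):
    """Hypothetical replacement for wikipedia.Page (no network access)."""

    def __init__(self, site, title):
        self.site = site
        self._title = title

    def get(self):
        # The real method would download the wikitext; this one fakes it.
        return u"Text of '%s' on %s.%s" % (
            self._title, self.site[0], self.site[1])


class _StandInWikipedia(object):
    """Hypothetical replacement for the wikipedia module."""

    Page = _StandInPage
    stopped = False

    def getSite(self, code='en', fam='wikipedia'):
        # Assumption: without arguments, the real call returns the
        # project configured in user-config.py. The defaults used here
        # are placeholders.
        return (code, fam)

    def stopme(self):
        # The real function tells the framework that this bot is done.
        self.stopped = True


wikipedia = _StandInWikipedia()  # in a real script: import wikipedia

site = wikipedia.getSite('en', 'wikisource')  # explicit language and project
page = wikipedia.Page(site, 'Main Page')      # the page to load
text = page.get()                             # fetch the page text
print(text)
wikipedia.stopme()                            # sign off when finished
```

Calling getSite() without arguments corresponds to the first case above, where the project comes from user-config.py.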

Rewriting the code in a better way
In this example, you import wikipedia and define the site. The site is the English Wikisource, because you passed the parameters 'en' and 'wikisource', so the bot uses the project you have chosen instead of the default one. After that, we define the page to fetch (in this case the Main Page, which we are sure exists) and load the text it contains.

Finally, we print the text to the screen so that the user can read it. This code is not very clean, though; an experienced coder can do better, as in the following example.
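One way the cleaner version might look is sketched here. As before, wikipedia is a hypothetical stand-in object so that the sketch runs offline; only the structure (a main function, the __name__ check, try/finally around the call, and wikipedia.output instead of print) is the point.

```python
class _StandInPage(object):
    """Hypothetical replacement for wikipedia.Page."""

    def __init__(self, site, title):
        self._title = title

    def get(self):
        return u"Text of %s" % self._title


class _StandInWikipedia(object):
    """Hypothetical replacement for the wikipedia module."""

    Page = _StandInPage

    def __init__(self):
        self.printed = []
        self.stopped = False

    def getSite(self):
        return 'default site'  # placeholder for the configured project

    def output(self, text):
        # The real function also handles the console encoding.
        self.printed.append(text)
        print(text)

    def stopme(self):
        self.stopped = True


wikipedia = _StandInWikipedia()  # in a real script: import wikipedia


def main():
    site = wikipedia.getSite()
    pagename = 'Main Page'
    page = wikipedia.Page(site, pagename)
    wikipedia.output(u"Loading %s..." % pagename)
    text = page.get()
    wikipedia.output(text)


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when imported.
    try:
        main()
    finally:
        # Executed even if main() raises an exception.
        wikipedia.stopme()
```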

This code is better than before for several reasons. First of all, it uses a function instead of being purely procedural. This way, the main function is called only if the script is run directly. If the script is imported instead, nothing is executed until the coder calls the main function.

Better still is the try:/finally: block, which makes sure the bot always logs out from Wikipedia, even when an exception is raised. This avoids slowing down other processes and makes testing a script quicker. Another interesting new element is the use of wikipedia.output.

It does the same job as print "something" but, and this is the important part, it encodes and decodes the text using the encoding defined in user-config. Note: if you don't pass the text as u"something" (with the u), the function reports an error (handled, but ugly to see). The text variable already holds correctly encoded text, so it won't raise an exception.

Unfortunately, you can't always be sure that the page you load exists. To handle the NoPage error that wikipedia.py raises in that case (for instance, because you want to create the page), you can use an except block, as in the following example.

allpages function and the wikipedia.py errors
Here we meet the first generator object in wikipedia.py. First of all, we define the site (it is best to do this right away, because it is nearly impossible to work with Wikipedia without using one of the site's functions). The second variable is the page from which the bot starts loading all the pages of your Wikimedia project.

Then, in the line below, you start a for loop that iterates over every page on Wikipedia. Note that allpages is not a function of the wikipedia module but a method of the site! (This lets you decide which project to load all the pages from.)

The allpages function has optional parameters: the start page (default: '!'), the namespace (default: 0), whether to include redirects (default: True), and the throttle (default: True). Its signature with all the optional parameters is: def allpages(self, start = '!', namespace = 0, includeredirects = True, throttle = True)
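To illustrate how these optional parameters might be passed, the sketch below uses a hypothetical in-memory site whose allpages method copies the parameter names and defaults from the signature above. The filtering logic is a simplification written for this example, not the framework's implementation, and throttle is accepted but ignored because nothing is downloaded.

```python
class _StandInPage(object):
    """Hypothetical page with just enough data for the example."""

    def __init__(self, title, namespace=0, redirect=False):
        self._title = title
        self.namespace = namespace
        self.redirect = redirect

    def title(self):
        return self._title


class _StandInSite(object):
    """Hypothetical site holding four pages in memory."""

    _pages = [
        _StandInPage('Apple'),
        _StandInPage('Banana', redirect=True),
        _StandInPage('Cherry'),
        _StandInPage('Help:Contents', namespace=12),
    ]

    def allpages(self, start='!', namespace=0, includeredirects=True,
                 throttle=True):
        # Generator: pages are yielded one at a time.
        for page in self._pages:
            if page.namespace != namespace:
                continue
            if page.title() < start:
                continue  # simplified alphabetical starting point
            if page.redirect and not includeredirects:
                continue
            yield page


site = _StandInSite()

# All defaults: main namespace, from the beginning, redirects included.
all_titles = [p.title() for p in site.allpages()]

# Start at 'B' and leave redirects out.
from_b = [p.title() for p in site.allpages(start='B', includeredirects=False)]

print(all_titles)
print(from_b)
```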

In the line below, we load the page's title; this way the output shows the plain title, Page, rather than the wiki-link form [[Page]], which reads better. Two lines below, you find the try/except block.

The bot tries to load the page; if it doesn't exist (maybe someone deleted it while the bot was running), it prints only an empty text (you can do whatever you want here; I chose this only for explanation purposes). If the page is a redirect, the bot reports it. To make sure nothing else goes wrong, we add a catch-all Exception handler that keeps the bot from stopping (for example, a spam filter problem will be handled this way). The possible Wikipedia errors are:


 * wikipedia.Error               General Error
 * wikipedia.NoUsername          The username is not in user-config.py
 * wikipedia.NoPage              The Page does not exist
 * wikipedia.IsRedirectPage      The Page is a redirect page
 * wikipedia.IsNotRedirectPage   The Page is not a redirect page
 * wikipedia.LockedPage          The Page is locked
 * wikipedia.LockedNoPage        Page does not exist, and creating it is not possible because of a lock (cascading protection)
 * wikipedia.NoSuchEntity        No entity exists for this character
 * wikipedia.SectionError        The section specified by # does not exist
 * wikipedia.PageNotSaved        Saving the page has failed (General Error)
 * wikipedia.EditConflict        There has been an edit conflict while uploading the page
 * wikipedia.SpamfilterError     Saving the page has failed because the MediaWiki spam filter detected a blacklisted URL.
 * wikipedia.ServerError         Got unexpected server response
 * wikipedia.UserBlocked         Your username or IP has been blocked
 * wikipedia.PageNotFound        Page not found in list