Manual:Pywikibot/Cookbook/Getting a single page

Creating a Page object from title
In the further part of this cookbook, unless otherwise stated, we always assume that you have already used these two basic statements:

You want to get the article about Budapest in your wiki. While it is in the article namespace, it is as simple as

Note that Python is case sensitive, and in its world  and   mean classes,   and   class instances, while lowercase   and   should be variables.

For such simple experiments interactive Python shell is comfortable, as you can easily see the results without using, saving and running your code.

Getting the type of an object is often useful when you want to discover the capabilities of Pywikibot. It seems to be strange, but the main thing is that you got a. shows the path to it: you may find the  class in. Now let's see the user page of your bot. Either you prefix it with the namespace ( and other English names work everywhere, while the localized names only in your very wiki) or you give the namespace number as the third argument. So

and

will give the same result. is the localized version of  in Hungarian; Pywikibot won't respect that I used the English name for the namespace in my command, the result is always localized.

Getting the title of the page
On the other hand, if you already have a  object, and you need its title as a string,   method will do the job:

Possible errors
While getting pages may cause much less errors than saving them, a few types are worth to mention, some of them being technical, while others possible contradictions between our expectations and reality. Let's speak about them before actually getting the page.
 * 1) The page does not exist.
 * 2) The page is a redirect.
 * 3) You may have been mislead regarding the content in some namespaces. If your page is in Category namespace, the content is the descriptor page. If it is in User namespace, the content is the user page. The trickiest is the File namespace: the content is the file descriptor page, not the file itself; however if the file comes from Commons, the page may not exist in your wiki at all, while you still see the picture.
 * 4) The expected piece of text is not in the page content because it is transcluded from a template. You see the text on the page, but cannot replace it directly by bot.
 * 5) Sometimes a badly formed code may work well. For example  with two spaces will behave as  . While the page is in the category and you will get it from a pagegenerator (see below), you won't find the desired string in it.
 * 6) And, unfortunately, Wikipedia servers sometimes face errors. If you get a 500 error, go and read a book until server comes back.
 * 7) InvalidTitleError is raised in very rare cases. A possible reason is that you wanted to get a page title that contains illegal characters.

Getting the content of the page
Important: by this time we don't have any knowledge about the existence of the page. We have not contacted live wiki yet. We just created an object. It is just as a street number: you may write it on a document, but either there is a house there or not.

There are two main approaches of getting the content. It is important to understand the difference.

Page.text
You may notice that  does not have parentheses. Looking into the code we discover that it is not a method, rather a property. This means  is ready to use without calling it, may be assigned a value, and is present upon saving the page.
 * https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.page.html#page.BasePage.text

will write the whole text on your screen. Of course, this is for experiment.

You may write  if you need a copy of the text, but usually this is unneccessary. is not a method, so referring to it several times does not slow down your bot. Just manipulate  or assign it a new value, then save.

If you want to know details on how a property works, search for "Python decorators". For using it in your scripts it is enough to know the behaviour. Click on the above link and go through the right-hand menu. You will find some other properties without parentheses.

will never raise an error. If the page is a redirect, you will get the redirect link instead of the content of the target page. If the page does not exist, you will get an empty string which is just what happens if the page does exist, but is empty (it is usual at talk pages). Try this:

is comfortable if you don't have to deal with the existence of the page, otherwise it is your responsibility to make the difference. An easy way is.

While page creation does not contact the live wiki, refering to text for the first time and  usually does. For several pages it will take a while. If it is too slow for you, go to the section. shows if it is neccessary; if it returns True, the bot will not retrieve the page again. Therefore it returns  for non-existing pages as it is senseless to reload them. Although this is a public method, you are unlikely to have to use it directly.

Page.get
The traditional way is  which forces you to handle the errors. In this case we store the value in a variable.
 * https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.page.html#page.BasePage.get

A non-existing page causes a NoPageError:

A redirect page causes an IsRedirectPageError:

If you don't want to handle redirects, just make the difference between existing and non-existing pages,  will make its behaviour more similar to that of  :

Here is a piece of code to handle the cases. It is already too long for prompt, so I saved it. Which results in: While  is simple, it gives only the text of the redirect page. With  we got another Page instance without parsing the text. Of course, the target page may also not exist or be another redirect. Scripts/redirect.py gives a deeper insight.

For a practical application see.

Reloading
If your bot runs slowly and you are in doubt that the page text is still actual, use. The experiment shows that it does not update, which is good on one side, as you don't lose your data, but on the other side needs attention to be concious,

currently does not reflect to forced reload, see T330980.