Phabricator project: #PAWS

Manual:Pywikibot/PAWS

See Wikitech:PAWS for more details.

This document provides a quick interactive overview of Pywikibot, using a notebook hosted in the Wikimedia Labs environment through PAWS (Pywikibot: A Web Shell).

Note: The PAWS terminal supports copy and paste only in some browsers (Google Chrome, Opera and Safari are fine). If you use a different browser, you will need to type the commands in this walk-through manually. Alternatively, put the commands in a bash file and run it in the terminal with bash file.sh.

Create a Wikimedia account

To follow this walk-through, you only need an account on a Wikimedia project.

To create a Wikimedia account, see the Logging in help page.

Once you have created an account, please visit https://test.wikipedia.org/ and check that your username appears in the top right corner (this works around T120327).

Sign into a notebook

To start a hosted notebook, go to https://paws.wmflabs.org

Click "Sign in with MediaWiki", and then click "Allow" when asked to approve "Use OAuth for Authentication". The first time you access PAWS, you need to create a server. Click the green "Start my Server" button. It's normal to wait a few minutes for the new server to start up.

Once that is completed, you will be redirected to a URL like https://paws.wmflabs.org/paws/user/<username>/tree

Start a terminal

To start a new interactive terminal,

  1. Go to your PAWS home
  2. click 'New' on the right hand side, and
  3. select 'Terminal'.

This will open a new window with the URL https://paws.wmflabs.org/paws/user/<username>/terminals/1, with a Linux '$' prompt.

You can bookmark this URL and return to the terminal at any time, even after you have closed your browser or shut down your own computer.

This terminal is not an emulator. It is a real bash shell in a real Linux installation running in a Docker container, so you can use any bash command and any other Linux command that has been installed.

To see some of the commands available, use ls /bin/ or ls /usr/bin/:

bash-4.3$ ls /bin/
bash          cat            domainname  journalctl  mkdir          pwd         stty                            tar           zcmp
unzip2       chacl          echo        kill        mknod          rbash       su                              tempfile      zdiff
....
bash-4.3$ ls /usr/bin/
2to3-3.4                 dvipdf                     lcf                         printf               systemd-path                         
X11                      dwp                        ld                          prlimit              systemd-run
...


Login to the wiki

If you haven't yet, visit the testwiki (https://test.wikipedia.org/) in a browser. This will establish your account on the server and allow you to log in from the command line. The following command should confirm that you can log into the testwiki. It uses OAuth, so there is no need to enter a password.

$ pwb.py login
Logging in to test:test as <username>
Logged in on test:test as <username>.

You can connect pywikibot to a different wiki by creating a file named user-config.py in your $HOME directory (/home/paws) and adding mylang and family variables:

mylang = 'test'
family = 'wikipedia'
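
The same two-line file can also be generated from Python. The following is only a sketch: the helper name write_user_config is made up for this example, and it overwrites any existing user-config.py in the target directory, so check first whether you already have one.

```python
from pathlib import Path


def write_user_config(directory, lang, family):
    """Write a minimal user-config.py that sets mylang and family.

    Hypothetical helper for illustration; it overwrites an existing file.
    """
    config_path = Path(directory) / "user-config.py"
    # repr() quotes the values the same way the hand-written file does.
    config_path.write_text(
        f"mylang = {lang!r}\nfamily = {family!r}\n", encoding="utf-8"
    )
    return config_path


# Example call (not executed here): point Pywikibot at German Wikipedia.
# write_user_config(Path.home(), "de", "wikipedia")
```

Calling write_user_config(Path.home(), "test", "wikipedia") would reproduce the file shown above.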


Create a page

To create a page, enter the following command in the terminal, replacing '<username>' with your username, and pressing 'Y' when prompted to accept your changes:

bash-4.3$ pwb.py add_text -up -talk -page:"User talk:<username>" -text:"Hello. ~~~~"
Loading User talk:<username>...

>>> User talk:<username> <<<
@@ -0,0 +1 @@
+ Hello. ~~~~

Do you want to accept these changes? ([Y]es, [N]o, [a]ll, open in [b]rowser): Y
Page [[User talk:<username>]] saved

You have edited the wiki. View your changes by opening https://test.wikipedia.org/wiki/User_talk:<username> in your web browser.

You can read more about each of these command line scripts with the '-help' command line option.

bash-4.3$ pwb.py add_text -help
...
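
To make the diff in the output above easier to read, here is a rough stand-alone sketch of what placing text at the top of a page amounts to. It is not the add_text script's actual code; the function names are invented for this example, and the real script does more (for instance, it talks to the wiki and asks for confirmation).

```python
import difflib


def add_text_on_top(old_text, new_text):
    """Return the page text with new_text placed above the existing content."""
    if not old_text:
        # An empty (new) page simply becomes the added text.
        return new_text
    return new_text + "\n" + old_text


def show_diff(old_text, new_text):
    """Return a unified diff between two versions of a page as one string."""
    return "\n".join(
        difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm=""
        )
    )


# A page that does not exist yet has no text, so the diff is one added line.
proposed = add_text_on_top("", "Hello. ~~~~")
diff_text = show_diff("", proposed)
```

The four tildes are stored as typed here; the wiki expands them into your signature and a timestamp when the page is saved.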

Fetch a page

Fetching many pages is achieved with the "listpages" command.

To get the contents of the page you created in the previous section, enter the following command:

bash-4.3$ pwb.py listpages -page:"User talk:<username>" -save
   1 <username>
Saving User talk:<username> to /home/paws/User_talk_<username>
1 page(s) found

Now if you go to your PAWS files list, the saved page should be present.
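
For comparison, saving a page's text to a file can be sketched in Python as well. This is an approximation rather than the listpages implementation: save_page_text is a made-up helper, and UTF-8 is an assumed encoding. It accepts any object that offers title(as_filename=True) and a text attribute, as pywikibot.Page does.

```python
from pathlib import Path


def save_page_text(page, directory):
    """Write page.text to a file in directory named after the page title.

    Hypothetical helper; the UTF-8 encoding is an assumption.
    """
    # title(as_filename=True) asks Pywikibot for a filesystem-safe title.
    target = Path(directory) / page.title(as_filename=True)
    target.write_text(page.text, encoding="utf-8")
    return target


# In a PAWS notebook (not executed here), this might be used as:
# import pywikibot
# site = pywikibot.Site('test', 'wikipedia')
# save_page_text(pywikibot.Page(site, 'User talk:<username>'), '/home/paws')
```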

A real script example

When a website used on Wikipedia changes its URLs, the links on Wikipedia become outdated, and possibly dead if the website does not redirect the old URLs to the new ones. For example, Encyclopedia Britannica (EB) has changed its links, such as moving pages from http://www.britannica.com/EBchecked/media/ to http://www.britannica.com/topic/[topic name]/images-videos/*. You can find the list of usages of the old URL on English Wikipedia at https://en.wikipedia.org/wiki/Special:LinkSearch/http://www.britannica.com/EBchecked/media . Updating all those links manually would be very time-consuming. Thankfully, EB has maintained redirects from its old URLs to the new ones, so this does not need to be fixed immediately.

For a simpler example, English Wikipedia currently contains links to http://britannica.com/EBchecked/ instead of http://www.britannica.com/EBchecked/; i.e. a 'www.' subdomain is missing in the URL.

There are currently 14 cases on English Wikipedia: https://en.wikipedia.org/wiki/Special:LinkSearch/http://britannica.com/EBchecked/

Wikipedias in other languages also have this problem; e.g. there is one case on German Wikipedia: https://de.wikipedia.org/wiki/Spezial:Weblinksuche/http://britannica.com/EBchecked/

To fix those links, we can use Pywikibot's replace.py script. In this demo we will use the '-simulate' argument to avoid writing to the wiki, as there are strict rules about automated editing of English Wikipedia.

First, let's list all of the pages which link to http://britannica.com/EBchecked/.

bash-4.3$ pwb.py listpages -lang:en -weblink:"britannica.com/EBchecked/"
   1 Bhatner fort
   2 Mohammad Ishaq Khan
   3 Fringe theories/Noticeboard/Archive 7
   4 El Riego phase
   5 Catalonia/Archive 4
   6 Stephen I of Hungary
   7 Stephen I of Hungary/Archive 1
   8 Väinö Tanner
   9 Tokaji
  10 Transylvania/Archive5
  11 Hungarians in Romania
  12 Transylvania
  13 Uttarakhand
  14 Françoise Giroud
14 page(s) found

Now we check that those pages actually contain the literal URL in their wikitext; i.e. the link is not generated by a template.

bash-4.3$ pwb.py listpages -lang:en -weblink:"britannica.com/EBchecked/" -grep:"britannica.com\/EBchecked"
   1 Bhatner fort
   2 Mohammad Ishaq Khan
   3 Fringe theories/Noticeboard/Archive 7
   4 El Riego phase
   5 Catalonia/Archive 4
   6 Stephen I of Hungary
   7 Stephen I of Hungary/Archive 1
   8 Väinö Tanner
   9 Tokaji
  10 Transylvania/Archive5
  11 Hungarians in Romania
  12 Transylvania
  13 Uttarakhand
  14 Françoise Giroud
14 page(s) found
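
The idea behind the -grep check can be illustrated with plain Python. The sketch below is not Pywikibot code: grep_pages is an invented name, and the two sample pages (including the {{Britannica}} template call) are made-up data. It keeps only the pages whose wikitext matches the regular expression.

```python
import re


def grep_pages(pages, pattern):
    """Yield the titles whose wikitext matches the regular expression."""
    regex = re.compile(pattern)
    for title, text in pages.items():
        if regex.search(text):
            yield title


# Made-up sample data: one literal URL, one link produced by a template.
sample_pages = {
    "Literal link": "See [http://britannica.com/EBchecked/topic/1 EB].",
    "Template link": "See {{Britannica|1}}.",
}
matches = list(grep_pages(sample_pages, r"britannica\.com/EBchecked"))
```

Only "Literal link" is kept here. A page such as "Template link" could still show up in a link search, because the template expands to the URL when rendered, but a text replacement would find nothing to change in its wikitext.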

Now use replace to add the missing "www."

bash-4.3$ pwb.py replace -lang:en -simulate -weblink:"britannica.com/EBchecked/" -grep:"britannica.com\/EBchecked" "http://britannica.com/EBchecked/" "http://www.britannica.com/EBchecked/"
The summary message for the command line replacements will be something like: Bot: Automated text replacement  (-http://britannica.com/EBchecked/ +http://www.britannica.com/EBchecked/)
Press Enter to use this automatic message, or enter a description of the
changes your bot will make: 
Logging in to wikipedia:en as <username>
Retrieving 14 pages from wikipedia:en.
Retrieving 14 pages from wikipedia:en.


>>> Stephen I of Hungary <<<
@@ -47 +47 @@
- Stephen's birth date is uncertain because it was not recorded in contemporaneous documents.{{sfn|Györffy|1994|p=64}} Hungarian and Polish chronicles written centuries later give three different years: 967, 969 and 975.{{sfn|Kristó|2001|p=15}} The unanimous testimony of his three late 11th-century or early 12th-century [[hagiographies]] and other Hungarian sources, which state that Stephen was "still an adolescent" in 997,<ref>''Hartvic, Life of King Stephen of Hungary'' (ch. 5), p. 381.</ref> substantiate the reliability of the later year (975).{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}} Stephen's ''[[Life of Saint Stephen, King of Hungary (Vita minor)|Lesser Legend]]'' adds that he was born in [[Esztergom]],{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}}<ref name=Britannica>{{cite encyclopedia|title=Stephen I|url=http://britannica.com/EBchecked/topic/565415/Stephen-I|encyclopedia=[[Encyclopædia Britannica]]|publisher=Encyclopædia Britannica, Inc.|year=2008|accessdate=2008-07-29}}</ref> which implies that he was born after 972 because his father, [[Géza, Grand Prince of the Hungarians]], chose Esztergom as royal residence around that year.{{sfn|Györffy|1994|p=64}} Géza promoted the spread of Christianity among his subjects by force, but never ceased worshipping pagan gods.{{sfn|Kontler|1999|p=51}}{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}} Both his son's ''[[Life of Saint Stephen, King of Hungary (Vita maior)|Greater Legend]]'' and the nearly contemporaneous [[Thietmar of Merseburg]] described Géza as a cruel monarch, suggesting that he was a despot who mercilessly consolidated his authority over the rebellious Hungarian lords.{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}}{{sfn|Bakay|1999|p=547}}
+ Stephen's birth date is uncertain because it was not recorded in contemporaneous documents.{{sfn|Györffy|1994|p=64}} Hungarian and Polish chronicles written centuries later give three different years: 967, 969 and 975.{{sfn|Kristó|2001|p=15}} The unanimous testimony of his three late 11th-century or early 12th-century [[hagiographies]] and other Hungarian sources, which state that Stephen was "still an adolescent" in 997,<ref>''Hartvic, Life of King Stephen of Hungary'' (ch. 5), p. 381.</ref> substantiate the reliability of the later year (975).{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}} Stephen's ''[[Life of Saint Stephen, King of Hungary (Vita minor)|Lesser Legend]]'' adds that he was born in [[Esztergom]],{{sfn|Györffy|1994|p=64}}{{sfn|Kristó|2001|p=15}}<ref name=Britannica>{{cite encyclopedia|title=Stephen I|url=http://www.britannica.com/EBchecked/topic/565415/Stephen-I|encyclopedia=[[Encyclopædia Britannica]]|publisher=Encyclopædia Britannica, Inc.|year=2008|accessdate=2008-07-29}}</ref> which implies that he was born after 972 because his father, [[Géza, Grand Prince of the Hungarians]], chose Esztergom as royal residence around that year.{{sfn|Györffy|1994|p=64}} Géza promoted the spread of Christianity among his subjects by force, but never ceased worshipping pagan gods.{{sfn|Kontler|1999|p=51}}{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}} Both his son's ''[[Life of Saint Stephen, King of Hungary (Vita maior)|Greater Legend]]'' and the nearly contemporaneous [[Thietmar of Merseburg]] described Géza as a cruel monarch, suggesting that he was a despot who mercilessly consolidated his authority over the rebellious Hungarian lords.{{sfn|Berend|Laszlovszky|Szakács|2007|p=331}}{{sfn|Bakay|1999|p=547}}

Do you want to accept these changes? ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, [q]uit): N

...

In PAWS, and in any terminal that supports color, the diff will show the added "www." in green, making it easier to find the proposed changes.
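
The text substitution itself is simple, as this stand-alone sketch shows. It is not the replace.py implementation: fix_links is a made-up name, and the sample wikitext is a shortened version of the citation in the diff above.

```python
OLD = "http://britannica.com/EBchecked/"
NEW = "http://www.britannica.com/EBchecked/"


def fix_links(wikitext):
    """Return wikitext with the 'www.' subdomain added to the EB links."""
    return wikitext.replace(OLD, NEW)


# Shortened sample based on the citation shown in the diff above.
sample = (
    "<ref name=Britannica>{{cite encyclopedia|title=Stephen I"
    "|url=http://britannica.com/EBchecked/topic/565415/Stephen-I}}</ref>"
)
fixed = fix_links(sample)
```

Because the old string does not occur inside the new one, applying fix_links a second time leaves the text unchanged, so links that are already correct are not touched.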

Inside Pywikibot

Warning: Do not store passwords in files on the server; the files are public!

Next we will use the PAWS Python session.

  1. Go to your PAWS home,
  2. click 'New' on the right hand side, and
  3. select 'Python 3'.

This will open a new window.

In the text box, enter the following, then select 'Run' in the Cell menu (or press Shift+Enter).

import pywikibot

A new text box will appear below. Run the following to create an APISite object connected to https://test.wikipedia.org/:

site = pywikibot.Site('test', 'wikipedia')

Describe "site" by entering it into the new text box and selecting "Run".

site

It should show

 Out[3]: APISite("test", "wikipedia")

Create a page object:

page = pywikibot.Page(site, 'test')

Check it exists by running:

page.exists()

It should output

 VERBOSE:pywiki:Found 1 wikipedia:test processes running, including this one.
 Out[5]: True

Show the text on the page:

page.text

Change the page text in the object:

page.text = 'Hello world'

Save the page to the wiki:

page.save()

The response should be:

  Page [[Test]] saved
  INFO:pywiki:Page [[Test]] saved

The interactive Python 3 notebook allows many lines to be run together. The above could be put into one text box and run in a single step:

import pywikibot

site = pywikibot.Site('test', 'wikipedia')
page = pywikibot.Page(site, 'test')

page.text = 'Hello world!'
page.save()

The log of your interactive Python session can be saved or downloaded for future reference.
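
When experimenting in the notebook, it can be convenient to skip the save when nothing would change. The following is a sketch under that assumption: update_page is a hypothetical helper, not part of Pywikibot, and it works with any object that has a text attribute and a save() method accepting a summary keyword, as pywikibot.Page does.

```python
def update_page(page, new_text, summary):
    """Save new_text to the page only if it differs from the current text.

    Hypothetical helper; returns True when a save was attempted.
    """
    if page.text == new_text:
        # Nothing to change, so do not create an empty edit.
        return False
    page.text = new_text
    page.save(summary=summary)
    return True


# In a PAWS notebook (not executed here), this might be used as:
# import pywikibot
# site = pywikibot.Site('test', 'wikipedia')
# page = pywikibot.Page(site, 'test')
# update_page(page, 'Hello world!', 'Testing Pywikibot in PAWS')
```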

Accessing online documentation in PAWS

Pywikibot documentation may be found at https://doc.wikimedia.org/pywikibot/index.html . It is primarily sourced from docstrings, which can be loaded in the interactive Python 3 notebook using the Python built-in function help().

For example, to look at the arguments for the save method above, run either:

help(page.save)

or

help(pywikibot.Page.save)
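
If the full help() output is more than you need, Python's standard inspect module can produce a one-line summary. This is a sketch; describe is a made-up helper name, and it should work for any function or method that has a signature and a docstring.

```python
import inspect


def describe(func):
    """Return the call signature and the first docstring line of func."""
    doc = inspect.getdoc(func) or ""
    first_line = doc.splitlines()[0] if doc else ""
    return f"{func.__name__}{inspect.signature(func)}: {first_line}"


# In a PAWS notebook (not executed here), this might be used as:
# import pywikibot
# print(describe(pywikibot.Page.save))
```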


Editing Pywikibot scripts

The Pywikibot library and scripts are located in /srv/paws and are read-only. The installed Pywikibot library cannot be modified in PAWS.

Scripts may be modified after copying them into your PAWS home.

For example, to run a modified "checkimages.py":

  1. In the terminal, enter cp /srv/paws/pwb/scripts/checkimages.py ~
  2. In a browser, go to your PAWS home and click on the file checkimages.py.
  3. In the browser, you can edit the file. Edit the code -- for instance, just after the start = time.time() code on line 1775, add a new line 1776 that will print out your name: print("MYNAME's version.")
  4. In the editing interface, use the File menu and click Save to save your modifications.
  5. In the terminal, enter pwb.py ~/checkimages.py -simulate