Manual:Pywikibot/Workshop Materials/How to run basic scripts

This page is a resource for workshop organizers who want to run a workshop about basic Pywikibot scripts. By following the instructions on this page, you will learn everything you need to know to teach others about this subject. This page features a complete script based on a past workshop, and provides useful tips for people with little experience running such events.

Note that you don't need to follow this guide to the letter. Feel free to adapt it to your needs, and contribute back to it based on your experience.

If you want to learn more about this subject before committing to running a workshop, or need more concise materials, see How to run basic scripts (self-study).

For details of the past workshop on this subject, including the recording and the slides, see meta:Small_wiki_toolkits/Workshops.

How to use this guide?

Read the guide in its entirety to decide if the content of this workshop matches your expectations. Change whatever you feel doesn't match your style or isn't interesting for your audience.

Next, read the relevant linked materials to make sure you know everything you need to run this workshop.

Finally, prepare for the workshop by going through the #Preparation section.

Requirements

To run this workshop you should know how to run scripts included with Pywikibot. This means basic familiarity with Python, Pywikibot, your terminal or command line of choice, and PAWS.

Preparation

Organizer

To prepare for this workshop, follow the steps below.

  1. Prepare a local Pywikibot environment for the #Running Pywikibot scripts locally part of the workshop.
    • Make sure that Pywikibot and its scripts are available on your local environment. The best way to do this is to download the code from the Pywikibot repository as described on Manual:Pywikibot/Installation#Install_Pywikibot. You will then be able to run Pywikibot from the core directory by calling python pwb.py <script name> <other arguments>. The rest of this page assumes you have used this method of installing Pywikibot.
    • Run python pwb.py generate_user_files and select the wiki you want to use for your demonstrations. In this workshop we recommend working with https://test.wikimedia.org, which you can select by picking the wikipedia family, followed by the test site code. Specify your user credentials if you prefer.
    • Pick categories and pages on test Wikipedia that you will use to present how to manage categories and pages. Read through the instructions on this page first and then select the best categories and pages for your presentation. Create new ones if necessary.
  2. Prepare the PAWS environment for the #Running Pywikibot scripts using PAWS part of the workshop.
    • Create or upload a page file to use when presenting how to create a new page from a file.
    • Create a discussion page with threads you want to archive using a bot.
  3. Make the following decisions on how you want to run the workshop.
    • Do you want the participants to follow along and perform the described activities as you talk about them, or wait until a designated time to perform the activities?
    • Do you want the participants to interrupt you to ask questions, or wait for the designated time instead?
    • Do you want to record the workshop? If so, ask the participants if they consent to being recorded.
  4. Share your decisions during the introduction to the workshop.
  5. Consider if there are any prerequisites or preparations required from the participants and share them before the workshop.
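
As a quick sanity check for step 1, a sketch like the one below could confirm that the local environment is ready before the workshop. It assumes that generate_user_files wrote its configuration under the default name user-config.py in the same directory as pwb.py. The function name and the "core" path are placeholders, not part of Pywikibot.

```python
from pathlib import Path


def missing_pywikibot_files(core_dir):
    """Return the expected file names that are not present in core_dir."""
    # pwb.py is the wrapper script; user-config.py is assumed to be the
    # default file name written by generate_user_files.
    expected = ["pwb.py", "user-config.py"]
    root = Path(core_dir)
    return [name for name in expected if not (root / name).is_file()]


# "core" is a placeholder for the directory that holds your Pywikibot code.
missing = missing_pywikibot_files("core")
print("Missing files:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the files exist. Running a script such as login is still the real test of the setup.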

Participants

Be sure to inform workshop participants about the following points they need to consider to prepare for this workshop.

  • To run Pywikibot scripts locally, participants must have Pywikibot installed. For information on how to do this, see Pywikibot Installation or the materials from other small wiki toolkits workshops.
  • To run Pywikibot scripts on PAWS, participants need a unified login. They can create one by registering an account on any Wikipedia.

Workshop script

Start sharing your screen and recording if applicable.

Introduce yourself and the topic of the workshop. Explain the organizational choices and rules you've decided upon (for example, whether the participants should ask questions immediately or later - see #Organizer for details).

Present the agenda of the workshop, outlining which scripts you plan to run and how. Talk about the purpose of the workshop.

Running Pywikibot scripts locally

Working with categories

Explain that in this section you will focus on Pywikibot scripts that operate on page categories. Go through the scripts you plan to present.

Adding categories to pages

(slide 6)

Explain that you can use the category script to add a category to a list of pages.

Show a page you intend to provide to the script.

Run python pwb.py category add and explain what's happening.

Pywikibot asks you to specify a page with the list, and then a category. It then awaits your confirmation and adds the category to all pages linked from the page you provided.

Show the page again, pointing out the added category.

Mention that Pywikibot doesn't add the category directly to the page you specify, only to linked pages. Also, note that you can pass the -create option if you want the script to create the linked pages that don't exist. Otherwise the bot script skips them by default.
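
If participants ask what happens behind the prompt, the behavior described above can be sketched roughly as follows. This is a simplified illustration, not the category script's actual implementation. The function name is hypothetical, and the page argument is only assumed to behave like a pywikibot.Page object (linkedPages, exists, text, save, title).

```python
def add_category_to_linked_pages(list_page, category_name, create=False):
    """Add a category link to every page linked from list_page.

    list_page is expected to behave like a pywikibot.Page object.
    Returns the titles of the pages that were changed.
    """
    link = f"[[Category:{category_name}]]"
    changed = []
    for page in list_page.linkedPages():
        if not page.exists() and not create:
            continue  # mirror the default: skip pages that don't exist
        if link in page.text:
            continue  # the category is already there
        page.text = (page.text.rstrip() + "\n" + link).strip()
        page.save(summary=f"Adding {link}")
        changed.append(page.title())
    return changed


# In a configured Pywikibot environment this could be called as follows
# (the page and category names are placeholders):
#   import pywikibot
#   site = pywikibot.Site('test', 'wikipedia')
#   add_category_to_linked_pages(pywikibot.Page(site, 'My list'), 'Workshop')
```

Note that the list page itself is never modified, which matches the point made above.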

Removing categories from pages

(slide 7)

Explain that you can use the category script to remove the category from all pages with that category. Mention that this script does not allow you to specify pages to remove the category from. Its primary use is for when you want to delete a category entirely.

You can use a page generator to limit the reach of this script. For example, you can choose to remove a category only from pages that also belong to another, specified category. You can read more about this in the documentation.

Demonstrate the use of this script by running python pwb.py category remove and then picking a category from the test Wikipedia to remove. The best way to do this is to remove the category you have added earlier.

Creating a list of pages in a given category

(slide 8)

Explain that you can use the category script to generate a list of pages with a specific category.

Demonstrate the use of this script by running python pwb.py category listify and then providing the category name and the name of the list page to create. Show the newly created page on the wiki.

Note that you can use the -overwrite option to overwrite a page if it already exists.
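
To make the idea concrete, the sketch below shows one way a list page could be assembled from page titles, and how an overwrite flag might guard an existing page. It is an illustration only: the exact wikitext that listify writes may differ, and both function names are hypothetical.

```python
def build_list_page(titles):
    """Return wikitext with one bulleted link per title, in sorted order."""
    # The leading colon makes category and file titles render as plain links.
    return "\n".join(f"* [[:{title}]]" for title in sorted(titles))


def may_write(page_exists, overwrite=False):
    """Existing pages are only replaced when overwriting is requested."""
    return overwrite or not page_exists


print(build_list_page(["Page B", "Page A"]))
```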

Moving pages from one category to another

(slide 9)

Explain that you can use the category script to move pages from one category to another.

Demonstrate the use of this script by running python pwb.py category move -from:"<name of source category>" -to:"<name of new category>", and showing the result of this script.

Explain that you can combine this action with other options that may interest participants. You can read about them in the documentation.
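
On the wikitext level, moving a page between categories amounts to rewriting its category link. The sketch below illustrates that with a regular expression. It is a simplification under stated assumptions: it only handles the English "Category" prefix and exact category names, and it is not how the category script itself is implemented. The function name is hypothetical.

```python
import re


def move_category(text, old_name, new_name):
    """Replace links to one category with links to another in wikitext."""
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(old_name) + r"\s*(\|[^\]]*)?\]\]"
    )
    # Keep an optional sort key such as "|2024" unchanged.
    return pattern.sub(
        lambda match: f"[[Category:{new_name}{match.group(1) or ''}]]", text
    )


sample = "Intro\n[[Category:Old events|2024]]"
print(move_category(sample, "Old events", "Past events"))
```

Categories whose names merely start with the old name are left alone, because the pattern requires the link to end right after the name or its sort key.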

Page maintenance

Scripts covered in this section provide various page maintenance utilities.

Cleaning sandbox

(slide 12)

Explain that you can use the clean_sandbox script to clean up your sandbox page. By default, this script sets the sandbox to display the {{Sandbox}} template.

You can specify the -page option to clean up a custom page, and use the -text option to set page content to custom text instead of the default.

See the documentation for information about other interesting options, for example -hours and -delay that allow for basic scheduling of this bot.

Checking for broken links

(slide 13)

Explain that you can use the weblinkchecker script to identify broken external links on a page. Note that this script requires a Python package called memento_client.

Run the script using python pwb.py weblinkchecker -page:<page name>. Explain that when you run the weblinkchecker script for the first time, it generates a binary file that's a starting point for evaluating links. The script creates this file in the deadlinks directory. If you run the script again on the same page after at least a week, it will generate a list of links that are still broken, in a human-readable format.

This mechanism helps you to avoid removing temporarily broken links.
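
The two-pass idea can be sketched as follows: a link is only reported if it failed on an earlier run at least a week ago and is still failing now. This is an illustration of the principle, not the script's code or its storage format. The function name, the dictionary layout, and the example URLs are assumptions.

```python
from datetime import datetime, timedelta

RECHECK_AFTER = timedelta(weeks=1)


def links_to_report(first_failures, still_broken, now):
    """Return links that were first seen broken at least a week ago.

    first_failures maps a URL to the time it first failed a check.
    still_broken is the set of URLs that fail in the current run.
    """
    return sorted(
        url
        for url in still_broken
        if url in first_failures and now - first_failures[url] >= RECHECK_AFTER
    )


history = {"http://a.example": datetime(2024, 3, 1)}
print(links_to_report(history, {"http://a.example"}, datetime(2024, 3, 15)))
```

A link that fails for the first time in the current run is not reported, which is what protects temporarily unavailable links.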

As with other scripts, this script also provides more options you can read about in the documentation.

Deleting pages

(slide 14)

Explain that you can use the delete script to delete or restore specific pages. You can specify a page directly using the -page option, or use page generators to construct a list of pages to delete. For example, you can use the -ref option to delete pages that link to a certain page, or the -cat option to delete pages in a category.

Run python pwb.py delete -page:<page to delete> to show how the script works. You can follow it up with python pwb.py delete -undelete -page:<page deleted in the previous step>. Note that you might not have the necessary permissions to directly delete pages on test Wikipedia. In that case, the script offers to mark the page for deletion instead.

Running Pywikibot scripts using PAWS

(slide 17) (slide 18) (slide 19) (slide 20) (slide 21) (slide 22) (slide 23)

PAWS is a Jupyter Notebook instance hosted by the Wikimedia Foundation. Jupyter Notebook is a web application that allows you to run code and present results without needing to install anything on your computer.

PAWS has Pywikibot preinstalled. You can use it to run Pywikibot scripts in many ways. In this part of the workshop, you will see how to use the PAWS terminal to run built-in Pywikibot scripts. Then, you will see how to write a basic custom script in the notebook interface.

Explain that PAWS is a perfect way to run your less CPU-intensive scripts, or scripts that you don't need to schedule. It's also a great way to learn more about Pywikibot and experiment with it without having to set it up locally.

Note that PAWS is not a good solution for collaborative development or longer, scheduled Pywikibot tasks.

Show the wikitech:PAWS page and explain that it's a good starting point for running PAWS and learning more about it. Then show how to log in to PAWS by clicking the link on that page or accessing https://hub-paws.wmcloud.org directly.

In terminal

Demonstrate how to launch a terminal in PAWS. To do that, scroll down the Launcher page to the Other section and select Terminal.

Note that if you closed the Launcher page, you can reopen it using the New Launcher button in the top left corner of the PAWS screen.

Explain that you can use similar syntax to call Pywikibot from PAWS as you would locally. The only difference is that you omit python in your commands.
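
If you want to show the difference side by side, a small helper like the one below can print both forms of the same command. The helper is hypothetical and only builds the command line; it does not run anything. The category names are placeholders.

```python
import shlex


def pwb_command(script, *args, on_paws=False):
    """Return the argument list for calling a Pywikibot script."""
    # Locally the wrapper is started through python; on PAWS it is called directly.
    prefix = ["pwb.py"] if on_paws else ["python", "pwb.py"]
    return prefix + [script, *args]


local = pwb_command("category", "move", "-from:Old name", "-to:New name")
paws = pwb_command("category", "move", "-from:Old name", "-to:New name", on_paws=True)
print(shlex.join(local))
print(shlex.join(paws))
```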

Explain that running pwb.py login in terminal works without any configuration. This command logs you in to test Wikipedia using your own Wikipedia account. This means you can immediately start experimenting with Pywikibot.

Creating pages from a file

(slide 24) (slide 25)

To create a page from a file, you need to prepare the page content, place it in a file on PAWS, and then use the pagefromfile script. You can use the code sample below as the source of your page content.

Demonstrate how to do it directly in PAWS.

Explain the syntax of the page file. Every page must start with {{-start-}} and end with {{-stop-}}. The first bold element in the file serves as a title and a heading of the page. The rest of the page follows regular wikitext syntax rules.

page.txt
{{-start-}}

'''Pywikibot Workshop In MONTH YEAR'''

This was held on DATE

==List of Participants==
* Participant 1
* Participant 2
{{-stop-}}

Create the page using pwb.py pagefromfile -file:<path to your page.txt file>. Show the page on test wiki after the process finishes.
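
To help participants see how the markers and the first bold element are interpreted, the file format described above can be parsed with a few lines of Python. This is a minimal sketch of the format, not the pagefromfile script's own parser. The function name is hypothetical, and blocks without a bold title are simply skipped here.

```python
import re

PAGE_RE = re.compile(r"\{\{-start-\}\}(.*?)\{\{-stop-\}\}", re.DOTALL)
TITLE_RE = re.compile(r"'''(.*?)'''")


def parse_page_file(content):
    """Return a list of (title, text) pairs found in a page file."""
    pages = []
    for block in PAGE_RE.findall(content):
        body = block.strip()
        title_match = TITLE_RE.search(body)
        if title_match is None:
            continue  # no bold element, so no title can be derived
        pages.append((title_match.group(1), body))
    return pages


sample = """{{-start-}}

'''Pywikibot Workshop In MONTH YEAR'''

This was held on DATE
{{-stop-}}
"""
for title, text in parse_page_file(sample):
    print(title)
```

Because every page is wrapped in its own pair of markers, one file can describe several pages.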

Mention that this script has a lot of other options. You can learn more about them in the documentation.

Archiving page discussion

(slide 26) (slide 27)

Explain that you can use the archivebot script to archive old discussions. Show a discussion page with threads that you intend to archive.

To automatically archive threads, you need to use a configuration template on the discussion page. One commonly used template is User:MiszaBot/config, documented on wikipedia:User:MiszaBot/Archive_HowTo.

To configure your page for automatic archiving, add the following template at the top of the page.

page.txt
{{User:MiszaBot/config
|archive = <your discussion page>/Archive %(counter)d
|algo = old(10s)
|counter = 1
|minthreadsleft = 2
|minthreadstoarchive = 1
}}

Explain the meaning of template parameters (see the documentation on doc.wikimedia.org for details):

  • archive - name of the archive page. The script moves all archived threads to this page. You can generate its name dynamically using parameters described in the documentation. %(counter)d is one such parameter.
  • algo - archiving algorithm - in this case, the script archives every thread that's at least 10 seconds old. Typically, you would set this option to a few weeks or months. You are using a small number here to demonstrate how the script works.
  • counter - used to dynamically generate the archive page name. You can change it to create a new archive page.
  • minthreadsleft - minimum number of discussion threads to keep on the page. If the bot can't fulfill this requirement, it doesn't archive any threads.
  • minthreadstoarchive - minimum number of threads to archive. If the bot can't fulfill this requirement, it does not archive any threads.
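
The parameters above can be illustrated with a short sketch. It follows the descriptions in this list rather than the archivebot source code, so treat it as a teaching aid only. All three function names are hypothetical, and the set of unit letters handled here (s, h, d, w) is an assumption of this sketch.

```python
import re
from datetime import timedelta

UNIT_SECONDS = {"s": 1, "h": 3600, "d": 86400, "w": 604800}


def parse_algo(algo):
    """Turn a value such as 'old(10s)' into a timedelta."""
    match = re.fullmatch(r"old\((\d+)([shdw])\)", algo.strip())
    if match is None:
        raise ValueError(f"Unsupported algo value: {algo!r}")
    return timedelta(seconds=int(match.group(1)) * UNIT_SECONDS[match.group(2)])


def archive_page_name(pattern, counter):
    """Fill the %(counter)d placeholder using Python string formatting."""
    return pattern % {"counter": counter}


def threads_to_archive(total, old_enough, minthreadsleft, minthreadstoarchive):
    """Return how many threads get archived, or 0 if a minimum can't be met."""
    if total - old_enough < minthreadsleft:
        return 0  # too few threads would remain on the page
    if old_enough < minthreadstoarchive:
        return 0  # not enough threads are ready for archiving
    return old_enough


print(parse_algo("old(10s)"))
print(archive_page_name("Talk:Example/Archive %(counter)d", 1))
print(threads_to_archive(5, 3, minthreadsleft=2, minthreadstoarchive=1))
```

With the values from the template above, a page with five threads of which three are old enough would lose three threads and keep two.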

Once the template is present on the discussion page, run pwb.py archivebot -page:"Talk:Pywikibot March Workshop" "User:MiszaBot/config" and explain the script's parameters. Note that the name of the template is the last option when calling this script and has to match the template you used on the page.

Show the discussion page and point out how the number of threads decreased.

Checking images

(slide 28)

Explain that you can use the checkimages script to automatically analyze images for problems, for example with description or license.

Run the script by calling pwb.py checkimages -simulate -limit:10. This simulates the check of the newest 10 images. You can also specify a page with images that you want to check using the -page option.

In a notebook

Demonstrate how to create a new Jupyter notebook by opening the Launcher page and then choosing Python 3 (ipykernel) in the Notebook section.

This opens a new notebook where you can start creating your custom Pywikibot script.

Creating a basic page

(slide 29)

Copy and paste the script below into the notebook and run it using the Run the selected cells and advance option in the toolbar.

create_page.py
from pywikibot import Site, Page

site = Site('test', 'wikipedia')  # connect to test.wikipedia.org
page = Page(site, 'Pywikibot Workshop')  # page title on test Wikipedia

page.text = 'This is the second in our series of workshops'  # page content
page.save()  # publish the page on the wiki

Explain what the script does based on the comments. Open the Pywikibot Workshop page to show how it looks on test Wikipedia.
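
If the page already exists, the script above replaces its content. As an optional follow-up, you could show a slightly more careful variant that only writes when the page is missing and adds an edit summary. The helper name save_if_missing is hypothetical. The page argument is assumed to be a pywikibot Page object, whose exists() and save(summary=...) methods are used here.

```python
def save_if_missing(page, text, summary):
    """Create the page with the given text unless it already exists.

    Returns True if the page was saved and False if it was left alone.
    """
    if page.exists():
        return False
    page.text = text
    page.save(summary=summary)
    return True


# In a PAWS notebook this could be used as follows:
#   from pywikibot import Site, Page
#   page = Page(Site('test', 'wikipedia'), 'Pywikibot Workshop')
#   save_if_missing(page, 'Workshop notes', 'Creating workshop page')
```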

Page generators

Mention that you can use page generators with some scripts you showed in this workshop. Page generators allow you to run these scripts on dynamically generated lists of pages. You don't have to know the names of all pages you want to change using a bot. Instead, the generators automatically create a list of those pages based on specific criteria, for example pages in specific categories, or pages that are not watched by anyone.

To learn more about page generators, read the documentation.
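
In a notebook, the same idea can be shown with a category object, which yields its pages one at a time. The sketch below takes the first few titles without loading the whole category. The function name is hypothetical, and the category argument is assumed to behave like a pywikibot.Category object with its articles() method.

```python
from itertools import islice


def first_titles(category, limit):
    """Return the titles of up to `limit` pages from a category."""
    # islice stops pulling pages from the generator once the limit is reached.
    return [page.title() for page in islice(category.articles(), limit)]


# In a PAWS notebook this could be used as follows
# (the category name is a placeholder):
#   import pywikibot
#   site = pywikibot.Site('test', 'wikipedia')
#   category = pywikibot.Category(site, 'Category:Workshop')
#   print(first_titles(category, 5))
```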

Closing

Thank the participants and other organizers. Leave some time for final questions the participants might have. List links to extra materials on your last slide if applicable. Let the participants know where they can find the slides and any helpful resources.