Manual:Pywikibot/Workshop Materials/How to run basic scripts (self-study)

From mediawiki.org
This page is a self-study version of the How to run a basic script via Pywikibot workshop that first took place in March 2022. To watch the recording and access meeting notes from this workshop, see meta:Small wiki toolkits/Workshops. For slides, see meta:File:Running basic scripts via Pywikibot.pdf.

This guide walks you through common Pywikibot scripts that you can run locally or on PAWS. It also teaches you how to start writing and running custom Pywikibot scripts on PAWS.

Prerequisites

Before following this guide, make sure that you understand how to use the terminal or command line on your computer, and how to run Python scripts. Also make sure that Pywikibot is installed on your computer. For information on how to install and configure Pywikibot, see Manual:Pywikibot/Installation.

Having a basic familiarity with Jupyter notebooks should also help.


Running Pywikibot scripts locally

Make sure that Pywikibot and its scripts are available in your local environment. The best way to install them is to download the code from the Pywikibot repository as described on Manual:Pywikibot/Installation#Install_Pywikibot. This allows you to run Pywikibot from the core directory by calling python pwb.py <script name> <other arguments>. The rest of this page assumes you have used this method of installing Pywikibot.

Run python pwb.py generate_user_files and select the wiki you want to use with this guide. Test Wikipedia (https://test.wikipedia.org) is a good starting point if you have never used bots before. You can select it by picking wikipedia, followed by the test site code. Optionally, specify your user credentials.
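After you answer the prompts, generate_user_files writes a user-config.py file in the current directory. Assuming you chose test Wikipedia and your account is named ExampleBot (a placeholder), the relevant lines look roughly like this:

```python
# user-config.py (generated; exact contents depend on your answers)
family = 'wikipedia'        # the wiki family you selected
mylang = 'test'             # the site code for test Wikipedia
usernames['wikipedia']['test'] = 'ExampleBot'  # your username
```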

Run python pwb.py login to log in to test Wikipedia.

Category script

To work with categories using Pywikibot, use the category script. To learn about it, see Manual:Pywikibot/category.py or the category script documentation on doc.wikimedia.org.

Adding categories

(slide 6 in workshop materials)

Manually create three pages on test Wikipedia, for example in your sandbox. Edit the first one and create a list of links to the other two pages.
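For example, the wikitext of the first page might look like this (the linked page names are placeholders for the two other pages you created):

```
'''My workshop list'''
* [[Pywikibot workshop page A]]
* [[Pywikibot workshop page B]]
```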

To add a category to the two pages on the list, run python pwb.py category add.

Specify the name of the first page, and then a category to add - for example Test. The script then adds the category to all pages linked from the page you provided, but not directly to that page. Open the pages you created on test Wikipedia and notice that they now belong to the category you specified.
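Under the hood, adding a category just means appending a category link to each page's wikitext. The following stdlib-only sketch shows the core of that edit; the real category script is more careful, handling sort keys, localized category namespaces, and existing category placement.

```python
def add_category(wikitext, category):
    """Append a category link to page wikitext if it is missing.

    A simplified stand-in for the edit the category script makes
    to each page linked from the list page.
    """
    link = '[[Category:%s]]' % category
    if link in wikitext:
        # Page is already in the category; leave the text unchanged.
        return wikitext
    return wikitext.rstrip('\n') + '\n\n' + link + '\n'

print(add_category('Some article text.\n', 'Test'))
```

Running this prints the article text followed by `[[Category:Test]]` on its own line; calling it again on the result changes nothing, so the edit is safe to repeat.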

Removing categories

(slide 7 in workshop materials)

To remove a category from all pages in that category, run python pwb.py category remove and then specify a category to remove.

Caution: This script removes the category from all pages. You can't specify a single page to remove the category from, but you can use a page generator to limit the reach of this script. See the Using page generators section at the end of this page for more information.

Creating a list of pages in a category

(slide 8 in workshop materials)

To create a list of pages in a given category, run python pwb.py category listify. Specify a category name, for example Test, and the name of a list page to create. Open the newly created page to confirm that it lists all pages in the category.

Moving pages between categories

(slide 9 in workshop materials)

To move all pages from one category to another, run python pwb.py category move -from:"<name of source category>" -to:"<name of target category>". Open one of the affected pages on test Wikipedia to confirm the script worked.

Page maintenance

This section covers different page maintenance utilities available using Pywikibot scripts. You can read more about each script by following links in specific sections.

Cleaning sandbox

(slide 12 in workshop materials)

Use the clean_sandbox script to clean up your sandbox page. To learn about this script, see Manual:Pywikibot/clean_sandbox.py or the clean_sandbox script documentation on doc.wikimedia.org.

Run python pwb.py clean_sandbox -page:<your sandbox page>.

By default, this script sets the sandbox to display the {{Sandbox}} template. You can use the -text option to set page content to custom text instead. Verify that the script worked by opening your sandbox page.

You can also use a page generator to dynamically define which pages the script should clean. For more information on page generators, see the Using page generators section at the end of this page.

Read the documentation for information about other interesting options, for example -hours and -delay that allow for basic scheduling of this bot.


Checking for broken links

(slide 13 in workshop materials)

Use the weblinkchecker script to identify broken external links on a page. To learn about this script, see Manual:Pywikibot/weblinkchecker.py, or the weblinkchecker script documentation on doc.wikimedia.org.

This script requires a Python package called memento_client. Install it by running pip install memento_client.

Run the script using python pwb.py weblinkchecker -page:<page name>. A good page to test this script on test Wikipedia is testwiki:Aeroflot.

On its first run, the script saves its findings to a binary file as a starting point for later comparisons. When you run it again, by default at least a week later, it produces a human-readable list of the links that were still broken; you can change this interval with the -day option. This delay helps you avoid removing links that are only temporarily down.


Deleting pages

(slide 14 in workshop materials)

You can use the delete script to delete or restore specific pages. To learn about this script, see Manual:Pywikibot/delete.py or the delete script documentation on doc.wikimedia.org.

Run python pwb.py delete -page:<page to delete>, and then confirm that the page you indicated was removed. To restore it, run python pwb.py delete -undelete -page:<page deleted in the previous step>.

You can specify a page directly using the -page option, or use page generators to construct a list of pages to delete. See the Using page generators section at the end of this page for more information.

You might not have the necessary permissions to directly delete pages on test Wikipedia. In that case, the script offers to mark the page for deletion instead.


Running Pywikibot scripts on PAWS


PAWS is a Jupyter Notebook instance hosted by the Wikimedia Foundation. Jupyter Notebook is a web application that allows you to run code and present results without needing to install anything on your computer. To learn about PAWS and how to use it, see wikitech:PAWS, and wikitech:PAWS/Getting_started_with_PAWS.

To learn how to use PAWS with Pywikibot, see wikitech:PAWS/PAWS_and_Pywikibot.

To learn more about Jupyter Notebook, see the Jupyter project documentation.

Running Pywikibot in PAWS terminal

Log in to PAWS. Open the PAWS terminal by scrolling down the Launcher page to the Other section and selecting Terminal.

In PAWS, you can use Pywikibot without calling Python directly. For example, to log in to test Wikipedia, run pwb.py login.

Creating pages from a file

To create a new page based on the contents of a file, use the pagefromfile script. To learn about it, see Manual:Pywikibot/pagefromfile.py or the pagefromfile script documentation on doc.wikimedia.org.

Create a new file in your PAWS workspace. This file can contain one or more pages and must follow the syntax rules outlined in the script's documentation: Manual:Pywikibot/pagefromfile.py#Examples. You can use the following code sample as your starting point.

(slide 25 in workshop materials)

page.txt
{{-start-}}

'''Pywikibot Workshop In MONTH YEAR'''

This was held on DATE

==List of Participants==
* Participant 1
* Participant 2
{{-stop-}}

(slide 24 in workshop materials)

To create the page, run pwb.py pagefromfile -file:<path to your page.txt file>. Open the newly created page to confirm that the script worked.
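If you generate such files from data rather than writing them by hand, a small stdlib-only helper can produce the same format. This is a sketch; the {{-start-}}/{{-stop-}} markers and the bolded title on the first line follow the script's default syntax shown above.

```python
def make_pagefromfile_text(pages):
    """Build text in the default pagefromfile format.

    `pages` maps page titles to page bodies. Each page becomes a
    {{-start-}}...{{-stop-}} block whose first line is the bolded
    title, which pagefromfile uses as the page name by default.
    """
    blocks = []
    for title, body in pages.items():
        blocks.append("{{-start-}}\n'''%s'''\n%s\n{{-stop-}}" % (title, body))
    return "\n".join(blocks) + "\n"

text = make_pagefromfile_text({'Demo page': 'Hello from Pywikibot.'})
print(text)
```

Writing the returned string to a file gives you something you can pass to the script with -file.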


Archiving page discussion

Use the archivebot script to archive old discussions. To learn about it, see Manual:Pywikibot/archivebot.py or the archivebot script documentation on doc.wikimedia.org.

Before starting, select or create a discussion page with threads that you want to archive.

Automatic archiving is controlled by a configuration template placed on the discussion page. One commonly used template is User:MiszaBot/config, documented on Archive_HowTo.

To configure the page for automatic archiving, add the template at the top of the page, for example by using the following code.

(slide 27 in workshop materials)

{{User:MiszaBot/config
|archive = <your discussion page>/Archive %(counter)d
|algo = old(10s)
|counter = 1
|minthreadsleft = 2
|minthreadstoarchive = 1
}}

This script uses the following parameters:

  • archive - name of the archive page. The script moves all archived threads to this page. You can generate its name dynamically using parameters described in the documentation; %(counter)d is one such parameter.
  • algo - archiving algorithm - in this case, the script archives every thread that's at least 10 seconds old. Typically, you would set this option to a few weeks or months.
  • counter - used to dynamically generate the archive page name. You can change it to create a new archive page.
  • minthreadsleft - minimum number of discussion threads to keep on the page. You might need to change this value depending on the content of the page you've selected. If the bot can't fulfill this requirement, it doesn't archive any threads.
  • minthreadstoarchive - minimum number of threads to archive. You might need to change this value depending on the content of the page you've selected. If the bot can't fulfill this requirement, it doesn't archive any threads.
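The 10-second algo value above is only useful for demonstrations. A more realistic configuration might look like the following (the page name is a placeholder; 30d means 30 days):

```
{{User:MiszaBot/config
|archive = Talk:Example/Archive %(counter)d
|algo = old(30d)
|counter = 1
|minthreadsleft = 4
|minthreadstoarchive = 2
}}
```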

For detailed information on the meaning of all template parameters, see the archivebot script documentation on doc.wikimedia.org.

(slide 26 in workshop materials)

Once the template is present on the discussion page, run pwb.py archivebot -page:"<discussion page>" "User:MiszaBot/config". Open the discussion page and confirm that the number of threads has decreased.


Checking images

(slide 28 in workshop materials)

Use the checkimages script to automatically analyze images for problems, for example with description or license. To learn about this script, see Manual:Pywikibot/checkimages.py or the checkimages script documentation on doc.wikimedia.org.

Run the script by calling pwb.py checkimages -simulate -limit:10. This simulates a check of the 10 newest images. You can also specify a page with images that you want to check using the -page option.


Running Pywikibot code in a notebook

Create a new PAWS notebook by opening the Launcher page and then clicking Python 3 (ipykernel) in the Notebook section.

Creating a basic page

(slide 29 in workshop materials)

Copy and paste the script below into the notebook and run it using the Run the selected cells and advance option in the toolbar. Be sure to specify a unique page name so that you don't edit a page that already exists.

create_page.py
from pywikibot import Site, Page

site = Site('test', 'wikipedia')  # connect to test.wikipedia.org
page = Page(site, 'Pywikibot Workshop <name>')  # page title; replace <name>

page.text = 'This is the second in our series of workshops'  # page content
page.save()

Open the newly created page in your browser to verify that the script worked.

To learn more about writing your own scripts, see Manual:Pywikibot/Create_your_own_script.


Using page generators

Page generators allow you to run some scripts on a dynamically generated list of pages. You don't need to know the names of all the pages you want your bot to change. Instead, the generator builds the list from criteria you specify, for example pages in selected categories, or pages that aren't watched by anyone.
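Conceptually, a page generator is an ordinary Python generator that yields pages matching some criterion. The stdlib-only sketch below illustrates the idea with an in-memory dict standing in for the wiki; real Pywikibot generators in the pagegenerators module query the wiki's API instead.

```python
def categorized_pages(pages, category):
    """Yield titles of pages that belong to the given category.

    `pages` maps page titles to the set of categories each page
    is in. This is only a stand-in for a real page generator.
    """
    for title, categories in pages.items():
        if category in categories:
            yield title

pages = {
    'Page A': {'Test'},
    'Page B': {'Test', 'Other'},
    'Page C': {'Other'},
}
print(list(categorized_pages(pages, 'Test')))  # ['Page A', 'Page B']
```

A script that consumes such a generator never needs a hard-coded page list, which is exactly how the category, delete, and clean_sandbox scripts above can be pointed at many pages at once.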

To learn more about page generators, see the pagegenerators module documentation on doc.wikimedia.org.
