Manual:Pywikibot/Workshop Materials/How to run basic scripts (self-study)

This guide walks you through common Pywikibot scripts that you can run locally or on PAWS. It also teaches you how to start writing and running custom Pywikibot scripts on PAWS.

Prerequisites
Before following this guide, make sure that you understand how to use the terminal or command line on your computer, and how to run Python scripts. Also make sure that you have Pywikibot on your device. For information on how to install and configure Pywikibot, see any of the pages below:


 * Other small wiki toolkits workshops
 * Manual:Pywikibot

Basic familiarity with Jupyter notebooks also helps.

Additional resources :
 * wikibooks:Non-Programmer's_Tutorial_for_Python_3
 * Bash_Shell_Scripting
 * Jupyter Project documentation

Running Pywikibot scripts locally
Make sure that Pywikibot and its scripts are available in your local environment. The best way to install them is to download the code from the Pywikibot repository as described on Manual:Pywikibot/Installation. This allows you to run Pywikibot scripts from the core directory through the pwb.py wrapper script. The rest of this page assumes you have used this method of installing Pywikibot.

Run the generate_user_files script and select the wiki you want to use with this guide. Test Wikipedia (https://test.wikipedia.org) is a good starting point if you have never used bots before. You can select it by picking the wikipedia family, followed by the test site code. Optionally, specify your user credentials.

Run the login script to log in to test Wikipedia.

Category script
To work with categories using Pywikibot, use the category script. To learn about it, see Manual:Pywikibot/category.py or the category script documentation on doc.wikimedia.org.

Adding categories


Manually create three pages on test Wikipedia, for example in your sandbox. Edit the first one and create a list of links to the other two pages.

To add a category to the two pages on the list, run the category script with the add action.

Specify the name of the first page, and then a category to add, for example Test. The script then adds the category to all pages linked from the page you provided, but not to that page itself. Open the pages you created on test Wikipedia and confirm that they now belong to the category you specified.

Removing categories


To remove a category from all pages in that category, run the category script with the remove action, and then specify a category to remove.

Creating a list of pages in a category


To create a list of pages in a given category, run the category script with the listify action. Specify a category name, for example Test, and the name of a list page to create. Open the newly created page to confirm that it lists all pages in the category.

Moving pages between categories


To move all pages from one category to another, run the category script with the move action. Open one of the affected pages on test Wikipedia to confirm the script worked.
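The four category tasks above can be sketched as command lines. This is a minimal sketch, assuming Pywikibot was installed from the repository and is started through the pwb.py wrapper; the action names (add, remove, listify, move) are given as recalled from the category script documentation, and the helper names are hypothetical.

```python
import shlex

# Assumption: scripts are started from the core directory through pwb.py.
WRAPPER = ["python", "pwb.py"]

# Action names as recalled from the category script documentation.
CATEGORY_ACTIONS = ("add", "remove", "listify", "move")


def category_command(action):
    """Return the argument list that starts the category script for one action."""
    if action not in CATEGORY_ACTIONS:
        raise ValueError(f"unsupported category action: {action}")
    return [*WRAPPER, "category", action]


def command_lines():
    """Return one shell-ready command line per supported action."""
    return [shlex.join(category_command(action)) for action in CATEGORY_ACTIONS]


for line in command_lines():
    print(line)
```

Each command then prompts for the page or category names described in the sections above.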

Page maintenance
This section covers different page maintenance utilities available using Pywikibot scripts. You can read more about each script by following links in specific sections.

Cleaning sandbox


Use the clean_sandbox script to clean up your sandbox page. To learn about this script, see Manual:Pywikibot/clean_sandbox.py or the clean_sandbox script documentation on doc.wikimedia.org.

Run the clean_sandbox script.

By default, this script resets the sandbox to a standard sandbox template. The script also has an option that sets the page content to custom text instead; see the script documentation for its name. Verify that the script worked by opening your sandbox page.

You can also use a page generator to dynamically define which pages the script should clean. For more information on page generators, see the Using page generators section at the end of this page.

Read the documentation for information about other useful options, including ones that allow for basic scheduling of this bot.

Resources :
 * Manual:Pywikibot/clean_sandbox.py
 * clean_sandbox script documentation on doc.wikimedia.org

Checking for broken links


Use the weblinkchecker script to identify broken external links on a page. To learn about this script, see Manual:Pywikibot/weblinkchecker.py or the weblinkchecker script documentation on doc.wikimedia.org.

This script requires an additional Python package, which is named in the script documentation. Install it before you run the script.

Run the script on a page of your choice. A good page to test this script on test Wikipedia is Aeroflot.

The script generates a binary file as a starting point for its comparisons. By default, it produces a human-readable list of broken links only when you run it again at least a week later. A script option lets you change that period. This mechanism helps you avoid removing links that are only temporarily broken.
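The two-run mechanism can be illustrated with a short sketch. This is not the weblinkchecker code: an in-memory dictionary stands in for the binary file, and the function names and the example URL are hypothetical. It only shows why a link that fails once is not reported until it is still failing a week later.

```python
from datetime import datetime, timedelta


def update_history(first_seen, broken_now, now):
    """Keep the first-seen date of links that are still broken, add newly
    broken links, and forget links that work again."""
    return {url: first_seen.get(url, now) for url in broken_now}


def links_to_report(first_seen, broken_now, now, min_age=timedelta(days=7)):
    """Return links that are broken now and were already recorded as broken
    at least min_age ago."""
    return sorted(
        url for url in broken_now
        if url in first_seen and now - first_seen[url] >= min_age
    )


first_run = datetime(2024, 1, 1)
second_run = first_run + timedelta(days=8)
broken = {"http://example.org/gone"}

# First run: the link is recorded, but it is too recent to report.
history = update_history({}, broken, first_run)
print(links_to_report(history, broken, first_run))

# Second run, eight days later: the link is still broken, so it is reported.
print(links_to_report(history, broken, second_run))
```

A link that recovers between the two runs drops out of the history and is never reported.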

Resources :
 * Manual:Pywikibot/weblinkchecker.py
 * weblinkchecker script documentation on doc.wikimedia.org

Deleting pages


You can use the delete script to delete or restore specific pages. To learn about this script, see Manual:Pywikibot/delete.py or the delete script documentation on doc.wikimedia.org.

Run the delete script, and then confirm that the page you indicated was removed. You can follow it up with the script's restore option if you want to restore the page.

You can specify a single page directly, or use page generators to construct a list of pages to delete. See the Using page generators section at the end of this page for more information.

Resources :
 * Manual:Pywikibot/delete.py
 * delete script documentation on doc.wikimedia.org

Running Pywikibot scripts on PAWS
Slides 17 to 23 of the workshop materials introduce PAWS.

PAWS is a Jupyter Notebook instance hosted by the Wikimedia Foundation. Jupyter Notebook is a web application that allows you to run code and present results without needing to install anything on your computer. To learn about PAWS and how to use it, see PAWS, and PAWS/Getting_started_with_PAWS.

To learn how to use PAWS with Pywikibot, see PAWS/PAWS_and_Pywikibot.

To learn more about Jupyter Notebook, see the Jupyter project documentation.

Running Pywikibot in PAWS terminal
Open the PAWS terminal by scrolling down the Launcher page to the Other section and selecting Terminal.

In PAWS, you can use Pywikibot without calling Python directly. For example, to log in to test Wikipedia, run the login script through the pwb.py wrapper.

Creating pages from a file
To create a new page based on the contents of a file, use the pagefromfile script. To learn about it, see Manual:Pywikibot/pagefromfile.py or the pagefromfile script documentation on doc.wikimedia.org.

Create a new file in your PAWS workspace. This file can contain one or more pages and must follow the syntax rules outlined in the script's documentation: Manual:Pywikibot/pagefromfile.py. You can use the following code sample as your starting point.



{{Codesample
 | name = page.txt
 | code =
{{-start-}}
'''Pywikibot Workshop In MONTH YEAR'''

This was held on DATE

List of Participants
 * Participant 1
 * Participant 2
{{-stop-}}
}}


To create the page, run the pagefromfile script on this file. Open the newly created page to confirm that the script worked.
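When a file should contain several pages, it can be easier to generate it than to type it. The following is a minimal sketch, assuming the script's default markers ({{-start-}} and {{-stop-}}) and its default behavior of reading the page title from the first bold text. The function names and the sample page are hypothetical.

```python
START = "{{-start-}}"
STOP = "{{-stop-}}"


def format_page(title, body):
    """Wrap one page in the start and stop markers, with the title in bold."""
    return f"{START}\n'''{title}'''\n{body}\n{STOP}\n"


def build_input(pages):
    """Join (title, body) pairs into the text of one input file."""
    return "".join(format_page(title, body) for title, body in pages)


def write_input(path, pages):
    """Write the input file as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_input(pages))


sample = [("Pywikibot Workshop In MONTH YEAR", "This was held on DATE")]
print(build_input(sample))
```

Calling write_input("page.txt", sample) would produce a file that you can then pass to the script.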

Resources :
 * Manual:Pywikibot/pagefromfile.py
 * pagefromfile script documentation on doc.wikimedia.org

Archiving page discussion
Use the archivebot script to archive old discussions. To learn about it, see Manual:Pywikibot/archivebot.py or the archivebot script documentation on doc.wikimedia.org.

Before starting, select or create a discussion page with threads that you want to archive.

To automatically archive threads, you need to add a configuration template to the discussion page. One commonly used template is the MiszaBot configuration template, documented on wikipedia:User:MiszaBot/Archive_HowTo.

To configure the page for automatic archiving, add the template at the top of the page.



The configuration template uses the following parameters:
 * Archive page name. The script moves all archived threads to this page. You can generate the name dynamically using variables described in the documentation.
 * Archiving algorithm. In this example, the script archives every thread that's at least 10 seconds old. Typically, you would set this option to a few weeks or months.
 * Archive counter. The script uses it to dynamically generate the archive page name. You can change it to create a new archive page.
 * Minimum number of discussion threads to keep on the page. You might need to change this value depending on the content of the page you've selected. If the bot can't fulfill this requirement, it doesn't archive any threads.
 * Minimum number of threads to archive. You might need to change this value depending on the content of the page you've selected. If the bot can't fulfill this requirement, it doesn't archive any threads.

For detailed information on the meaning of all template parameters, see the archivebot script documentation on doc.wikimedia.org.
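The configuration described above can be sketched as generated wikitext. The parameter names used here (archive, algo, counter, minthreadsleft, minthreadstoarchive), the template name User:MiszaBot/config, and the old(10s) value are assumptions recalled from the archivebot documentation, so check them against that documentation before use. The archive page name is a hypothetical example.

```python
def archive_config(archive, algo, counter, minthreadsleft, minthreadstoarchive,
                   template="User:MiszaBot/config"):
    """Return wikitext for an archiving configuration template.

    Parameter and template names are assumptions; verify them in the
    archivebot script documentation.
    """
    params = {
        "archive": archive,
        "algo": algo,
        "counter": counter,
        "minthreadsleft": minthreadsleft,
        "minthreadstoarchive": minthreadstoarchive,
    }
    lines = ["{{" + template]
    lines += [f"|{name} = {value}" for name, value in params.items()]
    lines.append("}}")
    return "\n".join(lines)


print(archive_config(
    archive="User talk:Example/Archive %(counter)d",  # hypothetical page name
    algo="old(10s)",           # archive threads older than 10 seconds
    counter=1,
    minthreadsleft=1,
    minthreadstoarchive=1,
))
```

Paste the printed wikitext at the top of the discussion page, adjusting the values to the page you selected.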



Once the template is present on the discussion page, run the archivebot script. Open the discussion page and confirm that the number of threads decreased.

Resources :
 * Manual:Pywikibot/archivebot.py
 * archivebot script documentation on doc.wikimedia.org

Checking images


Use the checkimages script to automatically analyze images for problems, for example with their description or license. To learn about this script, see Manual:Pywikibot/checkimages.py or the checkimages script documentation on doc.wikimedia.org.

Run the checkimages script with the options that simulate a check of the newest 10 images; the script documentation lists them. You can also use a script option to specify a page with images that you want to check.

Resources :
 * Manual:Pywikibot/checkimages.py
 * checkimages script documentation on doc.wikimedia.org

Running Pywikibot code in a notebook
Create a new PAWS notebook by opening the Launcher page and then clicking Python 3 (ipykernel) in the Notebook section.

Creating a basic page


Copy and paste the script below into the notebook and run it using the Run the selected cells and advance option in the toolbar. Be sure to specify a unique name to make sure you aren't creating a page that already exists.
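A minimal sketch of such a script, assuming the pywikibot Site and Page classes and a configured login on test Wikipedia. The helper names, page text, and edit summary are hypothetical. The helper refuses to overwrite a page that already exists, which matches the advice to pick a unique name.

```python
def create_page(page, text, summary):
    """Save text to the page unless it already exists; return True if saved."""
    if page.exists():
        return False
    page.text = text
    page.save(summary=summary)
    return True


def main(title):
    """Create a page with the given title on test Wikipedia."""
    # Imported here so the helper above does not depend on Pywikibot.
    import pywikibot

    site = pywikibot.Site("test", "wikipedia")
    page = pywikibot.Page(site, title)
    if create_page(page, "Hello from the Pywikibot workshop.", "Create a test page"):
        print(f"Created {page.title()}")
    else:
        print(f"{page.title()} already exists; choose another title")
```

In the notebook, call main() with a title of your choice in the next cell.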

Open the newly created page in your browser to verify that the script worked.

To learn more about writing your own scripts, see the following pages.

Resources :
 * Manual:Pywikibot/Create_your_own_script
 * Using Pywikibot as a library in your own scripts
 * How to write a basic script using Pywikibot - workshop materials

Using page generators
Page generators allow you to run some scripts on dynamically generated lists of pages. You don't have to know the names of all pages you want to change using a bot. Instead, the generators automatically create a list of these pages based on specific criteria, for example pages in selected categories, or pages that aren't watched by anyone.
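In your own code, the same idea might look like the sketch below. It assumes the pagegenerators module and its CategorizedPageGenerator function from the Pywikibot library, plus a category named Test on test Wikipedia. The helper names are hypothetical. The helper works on any iterable of page objects, so the generator can be swapped for another one without changing the rest of the code.

```python
from itertools import islice


def first_titles(generator, limit):
    """Return the titles of at most `limit` pages taken from a generator."""
    return [page.title() for page in islice(generator, limit)]


def main():
    """Print up to five page titles from Category:Test on test Wikipedia."""
    # Imported here so the helper above does not depend on Pywikibot.
    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site("test", "wikipedia")
    category = pywikibot.Category(site, "Category:Test")
    generator = pagegenerators.CategorizedPageGenerator(category)
    for title in first_titles(generator, 5):
        print(title)
```

Because the pages are produced lazily, only as many pages are requested as the limit allows.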

To learn more about page generators, see the following pages.

Resources :
 * Page generators documentation
 * Page generators on doc.wikimedia.org