Manual:Pywikibot/Workshop Materials/How to host a bot on Toolforge (self-study)

From mediawiki.org
This page is based on the How to host a bot on Toolforge workshop that first took place in June 2022[1]. To watch the recording and access meeting notes from this workshop, see meta:Small wiki toolkits/Workshops. For slides, see meta:File:Bot workshop - SWT June 2022.pdf.

This self-study guide will walk you through the process of getting access to Toolforge, setting it up for Pywikibot, and running a bot on Toolforge manually and automatically. You will learn all this using existing materials available on different wikis.

Prerequisites

Before following this self-study guide, make sure you understand how to run Pywikibot scripts locally, on your own machine. For information on how to do that, see the other Small wiki toolkits workshops, Manual:Pywikibot, and How to run basic scripts (self-study).

It will also help if you have a basic understanding of the Linux terminal, Bash, and SSH. The list below provides useful resources if you want to learn more about any of these subjects.

Accessing Toolforge

What is Toolforge?

(slide 2 in original workshop materials)

Toolforge is a shared hosting platform supported by Wikimedia Foundation staff and volunteers. It provides users with a Linux machine they can use, for example, to run bots or host a website. Toolforge is extensively documented on wikitech:Help:Toolforge - use this portal if you have any questions or want to learn more than covered in this guide.

To use Toolforge, you must agree to its terms and conditions. For details, see wikitech:Portal:Toolforge/Quickstart#Terms_and_conditions.

Creating a developer account and setting up access to Toolforge

(slide 3 - differences between SUL and developer account; slide 4 - creating a developer account; slide 5 - creating an SSH key)

To use Toolforge for your bot, you need the following:

  • Wikimedia developer account (this is different from your Wikipedia or SUL account)
  • Toolforge membership
  • Tool account on Toolforge
  • SSH key you will use to log in to Toolforge

To fulfill these requirements, follow the Getting started with Toolforge - Quickstart guide.

If you need clarification on any of the steps in this process, check the list of resources below.

When creating the account, pay extra attention to the UNIX shell username. That is the username you will use when logging in to Toolforge.

Resources:

Logging in to Toolforge

(slide 6 in workshop materials)

To log in to Toolforge on Linux or macOS, use the ssh command in your terminal, for example ssh <your UNIX shell username>@login.toolforge.org. On Windows, you can try the same command in Command Prompt, PowerShell, or Git Bash (SSH might already be installed), or follow the instructions on wikitech:Help:Access_to_Toolforge_instances_with_PuTTY_and_WinSCP.

It's impossible to log in to Toolforge using your developer account login and password. You can only log in to Toolforge using your UNIX shell username and SSH key. The only password you will need is the password for that key.
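
As a rough illustration of the login command described above, the following sketch assembles it from its parts and prints it instead of running it. The username exampleuser and the key path ~/.ssh/id_ed25519 are placeholders chosen for this example; substitute your own UNIX shell username and the key you registered.

```shell
#!/bin/sh
# Dry run: this only prints the login command; nothing is executed.
# "exampleuser" and the key path are placeholders for illustration.
shell_user="exampleuser"           # your UNIX shell username, not your wiki login
key_file="$HOME/.ssh/id_ed25519"   # private half of the SSH key you registered
login_host="login.toolforge.org"

# -i tells ssh which private key to offer. ssh prompts for the key's
# passphrase (if it has one), never for your developer account password.
login_cmd="ssh -i $key_file $shell_user@$login_host"
echo "$login_cmd"
```

If your key is in one of the default locations that ssh checks, the -i option is usually unnecessary.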

Resources:

Running a bot on Toolforge

Setting up Pywikibot

To set up your tool account to run Pywikibot, follow the instructions on wikitech:Help:Toolforge/Pywikibot#Installation_and_setup. Note that if you intend to use Toolforge to run a script that comes with Pywikibot, you should follow the instructions for installing Pywikibot from git. If you plan to run Pywikibot with a custom script you wrote yourself, you can install Pywikibot from PyPI.
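
To make the choice between the two installation sources concrete, here is a small dry-run sketch that prints the command matching each case. The function name install_cmd, the labels bundled and custom, and the target directory pywikibot-core are made up for this example. The wikitech page remains the authority on the full procedure, including how to create the virtual environment.

```shell
#!/bin/sh
# Dry run: prints the install command for each case; nothing is executed.
# install_cmd and its "bundled"/"custom" labels are hypothetical helpers.
install_cmd() {
    case "$1" in
        bundled)
            # Scripts that ship with Pywikibot live in the git repository.
            echo "git clone --recursive --branch stable https://gerrit.wikimedia.org/r/pywikibot/core pywikibot-core"
            ;;
        custom)
            # Your own script only needs the library, which PyPI provides.
            echo "pip install pywikibot"
            ;;
        *)
            echo "unknown script type: $1" >&2
            return 1
            ;;
    esac
}

install_cmd bundled
install_cmd custom
```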

Accessing your code on Toolforge

If you intend to run one of the built-in Pywikibot scripts, you can skip this section.

If you have a bot that you wrote yourself, you will want to make it available to the tool's account, for example in its $HOME directory. The following list describes different ways to do this.

  • You can open a new file in nano or vim (in insert mode, if you use vim), and copy the contents of your script file directly into the editor in your terminal. You can do this using standard copy and paste key combinations (Ctrl+C and Ctrl+V on Windows and Linux, and Command+C and Command+V on macOS), or combinations supported by your terminal application (for example Ctrl+Shift+C and Ctrl+Shift+V or Command+Shift+C and Command+Shift+V). Some terminals also allow you to right-click and choose Copy or Paste from the context menu. After pasting your code, remember to save the file in the text editor.
  • You can use the scp command to copy a file over SSH. See Using SCP for information on how to do this. You can also use FileZilla as described in Access to Toolforge instances with PuTTY and WinSCP - FileZilla.
  • You can commit your code into a git repository, for example on GitLab, and then run git clone <link to your code repository> after switching to your tool account (using become <your tool name>).
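
The scp and git options from the list above might look roughly like the following. This is a dry-run sketch that only prints the commands. The names exampleuser, exampletool, and mybot.py and the repository URL are all placeholders invented for illustration.

```shell
#!/bin/sh
# Dry run: prints the commands for two of the options above; nothing is executed.
# Every name below is a placeholder for illustration.
shell_user="exampleuser"
tool_name="exampletool"
script_file="mybot.py"
repo_url="https://gitlab.wikimedia.org/exampleuser/mybot.git"

# Option: copy the file from your own machine over SSH (run this locally).
# The trailing colon means "my home directory on the remote side", so the
# file still has to be moved somewhere the tool account can read it.
scp_cmd="scp $script_file $shell_user@login.toolforge.org:"
echo "$scp_cmd"

# Option: on Toolforge, switch to the tool account and clone the repository.
become_cmd="become $tool_name"
clone_cmd="git clone $repo_url"
echo "$become_cmd"
echo "$clone_cmd"
```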

Running your bot

To run Pywikibot on Toolforge, use the jobs framework. This method ensures that your bot runs in a dedicated environment with enough resources, instead of directly on the Toolforge login machine you accessed using SSH. This is important when your bot performs many operations on more than one page. Running such a bot directly in the shell could slow down the login.toolforge.org machine and make it unusable for others.

To run your bot, either immediately in the background, or automatically - according to a schedule - follow the instructions in wikitech:Help:Toolforge/Pywikibot#Create_jobs.
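
For orientation, starting a one-off background job could look something like the sketch below, which prints the commands without running them. The job name examplebot and the command ./run-bot.sh are placeholders, and the image name python3.11 is an assumption: check the wikitech instructions or the output of toolforge jobs images for the images currently offered.

```shell
#!/bin/sh
# Dry run: prints the commands; nothing is executed.
job_name="examplebot"        # placeholder job name
image="python3.11"           # assumption; `toolforge jobs images` lists real names
bot_command="./run-bot.sh"   # placeholder for whatever starts your bot

# Start the job once, in the background (run this as the tool account).
run_cmd="toolforge jobs run $job_name --command \"$bot_command\" --image $image"
echo "$run_cmd"

# Check on the job afterwards.
echo "toolforge jobs list"
echo "toolforge jobs logs $job_name"
```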

When running the bot automatically using a schedule, note that the --schedule option uses the crontab format to specify when the bot should run. Crontab is commonly used to schedule tasks on UNIX-based operating systems. There are many helpful resources that describe it online. One such resource is https://crontab.guru, but typing "crontab run every 5 minutes" in a search engine should return plenty of useful resources.
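
A crontab expression has five whitespace-separated fields: minute, hour, day of month, month, and day of week. The sketch below splits the "every 5 minutes" expression into those fields and then prints, as a dry run, how it could be passed to --schedule. The job name, command, and image are placeholders, not values from the wikitech instructions.

```shell
#!/bin/sh
# Split a five-field crontab expression into its named parts.
schedule="*/5 * * * *"   # "every 5 minutes"

# read splits the line on whitespace without expanding the asterisks.
read -r minute hour day_of_month month day_of_week <<EOF
$schedule
EOF

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"

# Dry run: the same expression handed to the jobs framework.
# Job name, command, and image are placeholders.
scheduled_cmd="toolforge jobs run examplebot --command \"./run-bot.sh\" --image python3.11 --schedule \"$schedule\""
echo "$scheduled_cmd"
```

Quoting the expression matters: without the quotes, the shell would try to expand the asterisks into file names.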

Note that the command you run using the jobs framework doesn't need to use python or pwb directly. If your bot is more complicated, for example requires that you run preparatory commands consecutively, you can wrap all commands in a shell script and run that shell script instead.
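
A wrapper of the kind described above might be created as in the following sketch. It writes a hypothetical run-bot.sh into a temporary directory and only checks its syntax, so nothing in your tool's home is touched. The directory mybot, the virtual environment pwbvenv, and the script mybot.py are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: create a wrapper script that runs several steps in order.
# Paths and file names inside the wrapper are placeholders.
work_dir="$(mktemp -d)"          # temporary directory so nothing real is touched
wrapper="$work_dir/run-bot.sh"

# The quoted 'EOF' keeps $HOME unexpanded until the wrapper itself runs.
cat > "$wrapper" <<'EOF'
#!/bin/sh
set -e                                  # stop at the first failing step
cd "$HOME/mybot"                        # placeholder: where the bot's code lives
git pull --ff-only                      # preparatory step: fetch the latest code
"$HOME/pwbvenv/bin/python3" mybot.py    # placeholder venv and script names
EOF

chmod +x "$wrapper"
sh -n "$wrapper" && echo "syntax OK: $wrapper"   # parse only, do not run
```

On Toolforge you would save such a file in the tool's home directory and pass its path as the job's command.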

Resources:


  1. ↑ This workshop originally covered running bots directly and in the background using Grid Engine (starting at around 37 minutes into the recording). These methods are no longer recommended. This page explains how to use the jobs framework instead.