Manual:Pywikibot/Workshop Materials/How to host a bot on Toolforge (self-study)

This is a self-study guide for the How to host a bot on Toolforge workshop, which first took place in June 2022. To watch the recording and access the meeting notes from this workshop, see meta:Small_wiki_toolkits/Workshops. For slides, see. Note that the workshop originally covered running bots directly and in the background using Grid Engine (starting at around 37 minutes into the recording). These methods are no longer recommended. This self-study page explains how to use the jobs framework instead.

Prerequisites
Before following this self-study guide, make sure that you understand how to run Pywikibot scripts locally, on your own machine. This is covered in other Small wiki toolkits workshops (see Small wiki toolkits workshops), in Manual:Pywikibot, and in other self-study guides (TBA).

It will also help if you have a basic understanding of the Linux terminal, Bash, and SSH.


 * Manual:Pywikibot
 * Non-Programmer's_Tutorial_for_Python_3
 * Python
 * Linux_Guide/Using_the_shell
 * Bash_Shell_Scripting
 * Internet_Technologies/SSH

What is Toolforge?


Toolforge is a shared hosting platform supported by Wikimedia Foundation staff and volunteers. It provides users with a Linux machine they can use, for example, to run bots or host a website.

Toolforge is extensively documented on wikitech:Portal:Toolforge. Use this portal if you have any questions or want to learn more than this guide covers.

To use Toolforge, you must agree to its terms and conditions. For details, see wikitech:Portal:Toolforge/Quickstart.

Creating a developer account and setting up access to Toolforge
To use Toolforge for your bot, you need the following:


 * Wikimedia developer account (this is different from your Wikipedia or SUL account)
 * Toolforge membership
 * Tool account on Toolforge
 * SSH key you can use to log in to Toolforge

To fulfill these requirements, follow the Getting started with Toolforge - Quickstart guide.

If you need additional clarification on any of the steps in this process, check the list of additional resources below.

When creating the account, pay special attention to the UNIX shell username. That is the user name you will use when logging in to Toolforge.

Resources:


 * Generating a new SSH key
 * Instead of adding the SSH key to your developer account on Wikitech, you can do that directly in the admin console: https://toolsadmin.wikimedia.org/profile/settings/ssh-keys/

Logging in to Toolforge


To log in to Toolforge, follow the instructions on wikitech:Help:Access_to_Toolforge_instances_with_PuTTY_and_WinSCP. Note that on Linux you can use the ssh command in your terminal.

It is not possible to log in to Toolforge using your developer account login and password. You can only log in to Toolforge using your SSH key, and the only other credentials you will need are your UNIX shell username (login), and the password needed to unlock that SSH key.
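
As an illustration of the login step, here is a minimal sketch of what the command typically looks like in a Linux or macOS terminal. The shell username exampleuser and the key file path are hypothetical placeholders, and the host name login.toolforge.org is assumed to be the login machine. The sketch only prints the command so that you can check the values before running it yourself.

```shell
# Hypothetical placeholders: use your own UNIX shell username and key file.
SHELL_USER="exampleuser"
KEY_FILE="$HOME/.ssh/id_ed25519"
LOGIN_HOST="login.toolforge.org"

# Build the login command. The -i option selects the private key to authenticate with.
LOGIN_CMD="ssh -i ${KEY_FILE} ${SHELL_USER}@${LOGIN_HOST}"

# Print the command for review; paste it into your terminal to connect.
echo "$LOGIN_CMD"
```

When you connect, ssh asks for the password that unlocks the key, not for your developer account password. After logging in, the become command followed by the name of your tool should switch you from your personal account to the tool account.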

Resources:


 * Internet_Technologies/SSH

Setting up Pywikibot
To set up your tool account to run Pywikibot, follow the instructions on wikitech:Help:Toolforge/Pywikibot. Note that if you intend to use Toolforge to run a script that comes with Pywikibot, you should follow the instructions for installing Pywikibot from git. If you plan to run Pywikibot with a custom script you wrote yourself, you can install Pywikibot from PyPI.

Accessing your code on Toolforge
If you intend to run one of the built-in Pywikibot scripts, you can skip this section.

If you have a bot that you wrote yourself, you will want to make it available to the tool's account, for example in its home directory. There are multiple ways to do this.


 * You can open a new file in a terminal text editor such as nano or vim (in insert mode), and paste the contents of your script file directly into the editor in your terminal. You can usually do this using the standard copy and paste key combinations (Ctrl+C and Ctrl+V on Windows and Linux, and Command+C and Command+V on macOS), or the combinations supported by your terminal application (for example Ctrl+Shift+C and Ctrl+Shift+V, or Command+Shift+C and Command+Shift+V). Some terminals also allow you to right-click and choose Copy or Paste from a context menu. After pasting your code, remember to save the file in the text editor.
 * You can use the scp command to copy files over SSH. See Internet_Technologies/SSH for information on how to do this. You can also use WinSCP as described in wikitech:Help:Access_to_Toolforge_instances_with_PuTTY_and_WinSCP.
 * You can commit your code to a git repository, for example on GitLab, and then use the git clone command to download that code to Toolforge.
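
The second and third options in the list above can be sketched as follows. The file name mybot.py, the shell username, and the repository URL are hypothetical placeholders. The commands are printed rather than executed so that you can adapt them first.

```shell
# Hypothetical placeholders: adjust to your own account, file, and repository.
SHELL_USER="exampleuser"
BOT_FILE="mybot.py"
REPO_URL="https://gitlab.example.org/exampleuser/mybot.git"

# Option 1 (run on your own machine): copy the script over SSH to your
# home directory on Toolforge. The "~" is deliberately left unexpanded
# so that it refers to the remote home directory.
SCP_CMD="scp ${BOT_FILE} ${SHELL_USER}@login.toolforge.org:~/"

# Option 2 (run on Toolforge): download the code from a git repository.
CLONE_CMD="git clone ${REPO_URL}"

echo "$SCP_CMD"
echo "$CLONE_CMD"
```

A file copied to your personal home directory may still need to be moved into the tool's directory after you switch to the tool account.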

Running your bot
To run Pywikibot on Toolforge, you should use the jobs framework. This ensures that your bot runs in a dedicated environment with sufficient resources, rather than directly on the Toolforge login machine that you accessed over SSH. This is especially important when your bot performs many operations on many pages. Running such a bot directly in the shell could slow down the login.toolforge.org machine and make it unusable for others.

To run your bot, either immediately in the background or automatically according to a schedule, follow the instructions in wikitech:Help:Toolforge/Pywikibot.
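
For orientation, a job submission could look roughly like the sketch below. It assumes the toolforge jobs run command with its --command, --image, and --schedule options. The job names, the script path, and the python3.11 image name are hypothetical, and the linked instructions remain the authoritative reference. The commands are printed rather than executed, because they only work on Toolforge.

```shell
# Hypothetical placeholders: script path and image name.
BOT_COMMAND="./run_bot.sh"
IMAGE="python3.11"

# One-off job that starts immediately and runs in the background.
ONCE_CMD="toolforge jobs run mybot-once --command \"${BOT_COMMAND}\" --image ${IMAGE}"

# Scheduled job: the same command, started every day at 06:30.
DAILY_CMD="toolforge jobs run mybot-daily --command \"${BOT_COMMAND}\" --image ${IMAGE} --schedule \"30 6 * * *\""

echo "$ONCE_CMD"
echo "$DAILY_CMD"
```

On Toolforge, toolforge jobs images should list the images that are actually available, and toolforge jobs list should show the jobs you have submitted.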

When running the bot automatically on a schedule, note that the --schedule option uses the crontab format to specify when the bot should run. Crontab is commonly used to schedule tasks on Unix-like operating systems, and many helpful resources describe it online. One such resource is https://crontab.guru; typing "crontab run every 5 minutes" into a search engine should also return plenty of pages that give the appropriate schedule. To learn more, see Cron.
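
To make the format concrete, here are a few example schedules together with a small sanity check. A crontab schedule consists of five space-separated fields: minute, hour, day of month, month, and day of week. The variable names and the count_fields helper are hypothetical and only serve this illustration.

```shell
# Field order: minute  hour  day-of-month  month  day-of-week
EVERY_5_MINUTES="*/5 * * * *"   # every five minutes
DAILY_AT_0630="30 6 * * *"      # every day at 06:30
MONDAYS_AT_NOON="0 12 * * 1"    # every Monday at 12:00

# Hypothetical helper: count the fields in a schedule string.
count_fields() {
    set -f       # disable globbing so "*" is not expanded to file names
    set -- $1    # split the schedule on whitespace into positional parameters
    set +f       # re-enable globbing
    echo "$#"
}

# Each schedule should report exactly five fields.
for schedule in "$EVERY_5_MINUTES" "$DAILY_AT_0630" "$MONDAYS_AT_NOON"; do
    echo "$schedule -> $(count_fields "$schedule") fields"
done
```

Remember to quote the schedule when you pass it on the command line, so that the shell does not expand the asterisks.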

To learn more about scheduling jobs to run automatically, see wikitech:Help:Toolforge/Jobs_framework.

Note that the command you run using the jobs framework does not need to run python or pwb directly. If your bot is more complicated, for example if it requires several preparatory commands to run in sequence, you can wrap all of the commands in a shell script.
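
A minimal sketch of such a wrapper is shown below. The file name run_bot.sh, the mybot directory, the pwbvenv virtual environment, and mybot.py are all hypothetical names. The example writes the wrapper to a temporary directory so that it has no side effects; on Toolforge you would save it in the tool's directory instead.

```shell
# Write the wrapper to a temporary directory (on Toolforge, use the tool's directory).
WRAPPER_DIR="$(mktemp -d)"
WRAPPER="$WRAPPER_DIR/run_bot.sh"

# The quoted 'EOF' keeps $HOME unexpanded until the wrapper itself runs.
cat > "$WRAPPER" <<'EOF'
#!/bin/sh
# Stop at the first command that fails.
set -e

# Preparatory step: change to the directory that holds the bot (hypothetical path).
cd "$HOME/mybot"

# Run the bot with the Python interpreter of a virtual environment (hypothetical path).
"$HOME/pwbvenv/bin/python3" mybot.py
EOF

# Make the wrapper executable and check its syntax without running it.
chmod +x "$WRAPPER"
sh -n "$WRAPPER" && echo "Wrapper written to $WRAPPER"
```

The command you give to the jobs framework is then the path to the wrapper, rather than a direct python or pwb call.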