Manual:Pywikibot/Workshop Materials/How to host a bot on Toolforge

For details of the past workshop on this subject, including the recording and the slides, see meta:Small_wiki_toolkits/Workshops.

TODO: Link to any general tips and tricks for workshop organizers if they exist somewhere.

Requirements
To successfully run this workshop you should know how to run bots on Toolforge. This means basic familiarity with a terminal, SSH, Bash scripting, and scheduling jobs with cron. You should also understand how to manage your developer account.

Organizer
To prepare to run this workshop, follow the steps below.


 1) Create or choose an existing Pywikibot script that you want to run on Toolforge.
 2) Ensure you have access to Toolforge.
 3) Create a new Toolforge tool using the Toolforge admin console.
 4) Prepare your tool on Toolforge based on wikitech:Help:Toolforge/Pywikibot.
 5) Upload or copy and paste your Pywikibot script to the tool's home directory and make sure it works correctly.
     * You can copy and paste it into nano or vim (in insert mode) directly from your clipboard.
     * You can place the script in a Git repository (for example on https://gitlab.wikimedia.org) and clone it from there directly to your tool's home on Toolforge.
     * You can use a command-line file copy tool to copy the file from your computer to your Toolforge account.
 6) Make the following decisions on how you want to run the workshop:
     * Do you want the participants to follow along and perform the described activities immediately as you talk about them, or wait until a designated time to perform the activities?
     * Do you want the participants to interrupt you to ask questions, or wait for the designated time instead?
     * Do you want to record the workshop? If so, ask the participants if they consent to being recorded.
 7) Share your decisions during the introduction to the workshop.
 8) Consider if there are any prerequisites or preparations required from the participants.
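
For step 1, the demo script can be very small. The following is a minimal sketch and not the script used in the past workshop. It assumes Pywikibot is installed and configured for a wiki where the bot account may edit. The page title User:ExampleBot/sandbox and the helper name append_line are hypothetical.

```python
"""Demo bot sketch: append a timestamped line to a test page."""
from datetime import datetime, timezone

# Hypothetical page title; pick a sandbox page the bot account may edit.
PAGE_TITLE = "User:ExampleBot/sandbox"


def append_line(text: str, now: datetime) -> str:
    """Return the page text with one timestamped bullet added at the end."""
    line = f"* Bot run at {now:%Y-%m-%d %H:%M:%S} UTC"
    if not text.strip():
        return f"{line}\n"
    return f"{text.rstrip()}\n{line}\n"


def main() -> None:
    # Imported lazily so the helper above can be tried without Pywikibot.
    try:
        import pywikibot
    except ImportError:
        print("Pywikibot is not installed in this environment.")
        return

    site = pywikibot.Site()  # uses the wiki configured for Pywikibot
    page = pywikibot.Page(site, PAGE_TITLE)
    page.text = append_line(page.text, datetime.now(timezone.utc))
    page.save(summary="Workshop demo: add a timestamped line")


if __name__ == "__main__":
    main()
```

A script like this makes every run visible as a new line on the page, which is convenient when you later show the results of manual and scheduled runs.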

Participants
Be sure to inform workshop participants about the following points they need to consider to prepare for this workshop.


 * Participants will create their developer accounts (if they don't have them yet). To do this, they will have to come up with:
     * a developer account username
     * a shell username (required to log in to Toolforge)
     * a unique password
 * Each participant needs their own SUL account (regular Wikipedia account) to create a developer account as well.

It will also be helpful for participants to have basic familiarity with the concepts listed below before attending this workshop:


 * Terminal or command line usage
 * SSH and SSH authentication keys
 * Shared hosting concepts

Introduction
Start sharing your screen and, if you planned to record the workshop, make sure the recording is enabled.

Introduce yourself and the topic of the workshop. Explain the organizational choices and rules you've decided upon (for example, whether the participants should ask questions immediately or later, and whether they should try to follow along; see the Organizer section for details).

What is Toolforge?


Explain that Toolforge is a shared hosting platform supported by Wikimedia Foundation staff and volunteers. It provides users with a Linux machine they can use, for example, to run bots or host a website.

Present the wikitech:Portal:Toolforge page - explain that this is where the participants can find a lot of additional materials if they ever need help.

Source: wikitech:Portal:Toolforge/About_Toolforge

Differences between SUL and developer account


Explain the difference between SUL and the developer account. The former is used to log in to different wikis. The latter is used to log in to Wikitech, Toolforge, Gerrit, and other services.

Explain that participants need to have the SUL account to create a developer account in the way used in this workshop.

Clarify that the developer account grants users access to Wikitech, Gerrit, and Phabricator, in addition to Toolforge.

Sources:


 * meta:Help:Unified_login
 * wikitech:Help:Create_a_Wikimedia_developer_account

Creating a developer account


Explain how to create a developer account in the Toolforge admin console. Mention that it is also possible to create a developer account on Wikitech, so if any participant already did this, they will not be able to create a new account with the same username.

Go through the list of information required during account creation. Make it clear that a developer account requires a SUL account, a username, a shell username, and a password. Note that some people might have trouble coming up with these during the workshop, so it is a good idea to include these in communication sent to participants beforehand (see the Participants section).

Clarify that the UNIX shell username is used to access Toolforge so it is a good idea to make it simple and memorable.

Source: wikitech:Help:Create_a_Wikimedia_developer_account

Requesting access to Toolforge and creating a tool
Explain that creating the account does not immediately grant access to Toolforge. Demonstrate how to log in to the Toolforge admin console and create a new membership request.

Clarify that the request might be declined. You need to provide a valid reason to use Toolforge. Additionally, mention that using Toolforge requires that you accept its terms and conditions.

Once the request is approved, which can take a couple of days because it is done manually, you can create a new tool.

Explain how to create a new tool using the tool creation option in the admin console. Explain that this creates a tool account and a place on Toolforge for you to host and work with the tool.

Sources:


 * wikitech:Help:Create_a_Wikimedia_developer_account
 * wikitech:Portal:Toolforge/Tool_Accounts

Creating an SSH key


Explain what SSH is, what an SSH key is, and what the difference is between a private and a public SSH key.

Mention that the process of creating an SSH key pair might be a bit different on different operating systems.

Clarify that participants might already have their own SSH keys, created for another purpose, in the default key location on their computer.

Note that the two most popular SSH key types are RSA (most popular and compatible) and Ed25519 (more modern, equally secure, usually faster). If you don't specify a key type, the default depends on your OpenSSH version: older releases create RSA keys, while OpenSSH 9.5 and later create Ed25519 keys.

Create the key on the command line while explaining each step of the creation process. Mention that it is possible to create a key without a password: if you don't want to type it in every time you use the key, leave the password empty when creating it.

Point out the directory in which the key files are created.
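
To check for existing keys before the workshop, something like the following sketch can help. It assumes the conventional OpenSSH location (the .ssh directory in the user's home) and builds, but does not run, an ssh-keygen call that uses the standard -t, -C and -f options. The key file name id_ed25519_toolforge and the comment workshop-demo are hypothetical.

```python
"""Sketch: list existing public keys and print a key creation command."""
import shlex
from pathlib import Path


def find_public_keys(ssh_dir: Path) -> list[str]:
    """Return the names of public key files (*.pub) found in ssh_dir."""
    if not ssh_dir.is_dir():
        return []
    return sorted(p.name for p in ssh_dir.glob("*.pub"))


def keygen_command(key_path: Path, comment: str) -> list[str]:
    """Build an ssh-keygen call that would create an Ed25519 key pair."""
    return ["ssh-keygen", "-t", "ed25519", "-C", comment, "-f", str(key_path)]


if __name__ == "__main__":
    ssh_dir = Path.home() / ".ssh"  # assumption: default OpenSSH directory
    print("Existing public keys:", find_public_keys(ssh_dir) or "none found")
    # Printed only, not executed, so no existing key is overwritten.
    new_key = ssh_dir / "id_ed25519_toolforge"  # hypothetical file name
    print(shlex.join(keygen_command(new_key, "workshop-demo")))
```

Printing the command instead of executing it is a deliberate choice here: participants can read it, adjust the file name, and run it themselves.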

Sources:


 * SSH_keys
 * Secure_Shell
 * Public-key_cryptography
 * RSA_(cryptosystem)
 * EdDSA

Adding your SSH key to Toolsadmin
Demonstrate how to add the SSH key to your account in the admin console. Display the content of the public key file in the terminal, then copy the public key to the text field on Admin console > Settings (in user menu) > SSH Keys.

Explain that this tells Toolforge to expect the private key matching this public key when you try to authenticate. You can have more than one key pair that you use to connect to Toolforge. It is a good practice to have a separate key pair for each computer you use.
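
A common mistake at this step is pasting the private key instead of the public one. The sketch below is a rough sanity check, assuming the usual OpenSSH one-line public key layout (key type, base64 data, optional comment). The function name and the shortened sample key are hypothetical, and the check does not prove that a key is valid.

```python
"""Sketch: tell a public key line apart from other pasted text."""
import base64

# A few common OpenSSH public key type prefixes (not an exhaustive list).
KNOWN_TYPES = {"ssh-ed25519", "ssh-rsa", "ecdsa-sha2-nistp256"}


def looks_like_public_key(line: str) -> bool:
    """Return True if line has the shape '<type> <base64> [comment]'."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] not in KNOWN_TYPES:
        return False
    try:
        base64.b64decode(parts[1], validate=True)
    except ValueError:  # raised for characters or padding that are not base64
        return False
    return True


if __name__ == "__main__":
    # Shortened dummy data for illustration only; not a usable key.
    print(looks_like_public_key("ssh-ed25519 AAAAC3NzaC1lZDI1NTE5 demo@example"))
    print(looks_like_public_key("-----BEGIN OPENSSH PRIVATE KEY-----"))
```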

Source: wikitech:Portal:Toolforge/Quickstart

Logging in to Toolforge


Explain how to log in to Toolforge over SSH. Emphasize that you are using the shell username (not the developer account name), and the password you used when creating the key (and not the developer account password).

After successfully logging in, switch to the tool account. Clarify that the name must match the name of the tool previously created in the admin console. Highlight that the prompt in the terminal changes when the switch is successful.

Sources:


 * wikitech:Help:Access_to_Toolforge_instances_with_PuTTY_and_WinSCP
 * wikitech:Portal:Toolforge/Tool_Accounts

Running the bot directly
Demonstrate how to run the bot directly by issuing a command in the shell using the tool account. For example, pass the script to the PHP interpreter if your bot is a PHP script.

Show the results of such code execution. For example, if your script added some text to a specific page, display that page and point out the addition. You can run the script multiple times to make the addition easier to spot.

Explain that running a short, simple bot like this directly from the shell is OK, but doing so for a longer script, for example when it is operating on hundreds of pages, is not. This is because the machine you are currently logged in to has a limited amount of resources, and if every tool developer logged in and executed scripts in this way, these resources would run out.

For a longer script, or a script that you want to execute in the background, use Grid Engine.

Running the bot on Grid Engine
Demonstrate how to run your script once using Grid Engine. Create a single script file that will wrap your execution commands into a single call, for example a short shell script that contains these commands.

Explain that having a wrapper file like this is particularly helpful when you run your bot with multiple arguments.

Demonstrate how to submit this script to Grid Engine. Submitting the script queues it to be executed on Grid Engine, in the background, when the appropriate resources become available.

If your bot takes some time to execute, you can also demonstrate how to display the list of submitted bot executions.

Again, show how running the bot in this way also resulted in the appropriate changes to the page you opened earlier.

Source: wikitech:Help:Toolforge/Grid

Scheduling the bot to run automatically
Explain that you will now schedule a bot to run on Grid Engine using cron. Explain that cron is a scheduling mechanism available on Toolforge that allows you to execute any script or bot automatically, at a specific time or interval. For example, you can configure cron to execute the wrapper script every minute, which is what you want to do now.

Mention that you will run the bot every minute only to demonstrate how it works. It is generally better to schedule your bots to run less often, no more often than every five minutes.

Explain that cron uses its own scheduling format, and you might have to search for the correct expression or use a tool, for example crontab.guru. The correct schedule to run your bot every minute is "* * * * *".

Open your crontab for editing and add an entry for your bot to run every minute.

Explain what this entry means. The first five fields (minute, hour, day of month, month, and day of week) form the schedule you designed earlier. The rest of the line is the full path to the bot script you placed in your tool's home directory. Save the file and exit.
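
To make the structure of a crontab entry concrete, the sketch below splits an entry into its five named schedule fields and the command. The function name and the script path /data/project/example-tool/run_bot.sh are hypothetical, and the sketch does not validate the field values.

```python
"""Sketch: split a crontab entry into named schedule fields and a command."""

FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_entry(entry: str) -> tuple[dict[str, str], str]:
    """Return (schedule fields by name, command) for one crontab line."""
    parts = entry.strip().split(None, 5)  # five fields, then the rest
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    return dict(zip(FIELD_NAMES, parts[:5])), parts[5]


if __name__ == "__main__":
    # Hypothetical entry: run a wrapper script every minute.
    fields, command = split_crontab_entry(
        "* * * * * /data/project/example-tool/run_bot.sh"
    )
    for name, value in fields.items():
        print(f"{name:>12}: {value}")
    print(f"{'command':>12}: {command}")
```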

Open the page with the bot's modifications as you did earlier. Every minute, refresh the page to demonstrate how the text is added by the bot without your intervention. You can log out of Toolforge and close your terminal to make it more obvious.

Demonstrate how to delete a scheduled job by reopening the crontab (log back in to Toolforge, switch to the tool account, and open the crontab for editing) and removing or commenting out the entry you added.

Sources:


 * wikitech:Help:Toolforge/Grid
 * crontab.guru

Getting ready to run the bot
Explain that because Pywikibot sees a lot of development, it makes sense to closely control Python and Pywikibot versions when using Toolforge. This is why the recommended method described in this workshop requires both a Python virtual environment and a dedicated Python image and container.

Briefly explain what a virtual environment is and what an image and container are.

Describe the execution environment you have set up. Explain that to get access to a specific version of Python, you opened a shell in a container (specifically, using the python3.9 image).

In this new shell, with that specific version of Python, you created your virtual environment.

You activated the virtual environment and then installed Pywikibot and any other necessary libraries.

Note that the recommended way of installing Pywikibot if you intend to use built-in scripts is using Git. If you do not plan to make use of any built-in scripts, it is more convenient to install the package from PyPI. Choose and describe the selected method depending on the script you intend to present.

Explain that after installing Pywikibot you ran Pywikibot's initial setup.

Explain that the setup process described on Wikitech (link in Sources) wraps the commands you have described into a single script and then runs that script within the container. Running that script works the same as running the individual commands inside a container shell.
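
When showing the prepared environment, it can help to print which Python is running, whether a virtual environment is active, and which Pywikibot version is installed. The following is a small sketch with a hypothetical function name. It assumes Pywikibot was installed as a package named pywikibot (for example from PyPI); a plain Git checkout that was not installed with pip may be reported as not installed.

```python
"""Sketch: report the Python version, venv status and Pywikibot version."""
import platform
import sys
from importlib import metadata


def environment_report() -> dict[str, str]:
    """Collect a few facts about the current Python environment."""
    # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
    in_venv = sys.prefix != sys.base_prefix
    try:
        pwb_version = metadata.version("pywikibot")
    except metadata.PackageNotFoundError:
        pwb_version = "not installed"
    return {
        "python": platform.python_version(),
        "virtualenv": "active" if in_venv else "not active",
        "pywikibot": pwb_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running the same report in the setup shell and inside a job is one way to show that both use the same Python version.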

Sources:


 * https://wikitech.wikimedia.org/wiki/Kubernetes/Images
 * https://wikitech.wikimedia.org/wiki/Help:Toolforge/Pywikibot

Running the bot once
(running the bot using the wikitech:Help:Toolforge/Jobs_framework)

Explain that you will run your bot using the jobs framework. Describe what the jobs framework is and what you can use it for. This framework uses Kubernetes to execute the bot in the background, in a dedicated environment with sufficient resources.

Run the bot once by submitting it as a job.

Explain what this command does:


 * The first part is the command you use to execute a job in the background.
 * The command option indicates the command to run in the job. In this case, it is an execution of a Pywikibot script.
 * The image option indicates what type of container image Kubernetes should use as an execution environment. In this case, it is the same image you used to create the Python virtual environment and install the appropriate packages. This ensures compatibility between the setup environment and the execution environment.
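
The parts described above can be sketched as a small command builder. This assumes the jobs CLI is invoked as toolforge-jobs with a run subcommand and the --command, --image and --schedule options, as documented on the Jobs framework help page; check that page for the current spelling before the workshop. The job name, script path and image name below are placeholders, and the command is only printed, not executed.

```python
"""Sketch: assemble a jobs framework command line without running it."""
import shlex
from typing import Optional


def jobs_run_command(
    name: str, command: str, image: str, schedule: Optional[str] = None
) -> list[str]:
    """Build the argument list for a one-off or scheduled job."""
    argv = ["toolforge-jobs", "run", name, "--command", command, "--image", image]
    if schedule is not None:
        # A crontab-style schedule turns the one-off job into a recurring one.
        argv += ["--schedule", schedule]
    return argv


if __name__ == "__main__":
    # Placeholder values; use your own job name, script and image.
    once = jobs_run_command("demo-bot", "./run_bot.sh", "python3.9")
    print(shlex.join(once))
    scheduled = jobs_run_command(
        "demo-bot", "./run_bot.sh", "python3.9", "*/5 * * * *"
    )
    print(shlex.join(scheduled))
```

Because shlex.join quotes arguments where needed, the printed schedule appears in quotes, which is also how it has to be typed in a shell so that the asterisks are not expanded.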

Show the results of running the bot. For example, if your bot added some text to a specific page, display that page and point out the addition. You can run the script multiple times to make the addition easier to spot.

Sources:


 * https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework
 * https://wikitech.wikimedia.org/wiki/Help:Toolforge/Pywikibot

Scheduling the bot to run automatically
(running the bot based on a schedule using wikitech:Help:Toolforge/Jobs_framework)

Explain that scheduling a bot to run in the background, for example every five minutes, is done using the same command. In addition to the arguments and parameters passed before, you need to specify a schedule as well. The schedule uses the crontab entry format. For example, to run a bot every five minutes, use the schedule "*/5 * * * *".

Explain that the crontab format is commonly used to schedule tasks on UNIX-based operating systems, and there are many helpful resources that describe it online. One such resource is https://crontab.guru, but typing "crontab run every 5 minutes" in a search engine should return plenty of resources that will provide the appropriate schedule.

Demonstrate a few different crontab schedules to give participants a better idea how they work.
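
To prepare such a demonstration, a small matcher like the one below can show which schedules would fire at a given moment. This is a simplified sketch, not cron's own implementation: it handles numbers, "*", ranges, lists and "/step" values only, not month or weekday names, and its function names are hypothetical.

```python
"""Sketch: check whether a five-field cron schedule matches a point in time."""
from datetime import datetime


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one field such as "*/5", "1-5" or "0,30" into a set of ints."""
    values: set[int] = set()
    for part in field.split(","):
        span, _, step = part.partition("/")
        stride = int(step) if step else 1
        if span == "*":
            start, end = low, high
        elif "-" in span:
            first, last = span.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(span)
        values.update(range(start, end + 1, stride))
    return values


def cron_matches(schedule: str, when: datetime) -> bool:
    """Return True if the schedule would trigger at the minute of `when`."""
    minute, hour, day, month, weekday = schedule.split()
    time_ok = (
        when.minute in expand_field(minute, 0, 59)
        and when.hour in expand_field(hour, 0, 23)
        and when.month in expand_field(month, 1, 12)
    )
    day_ok = when.day in expand_field(day, 1, 31)
    # cron counts Sunday as 0 (or 7); isoweekday() counts Monday=1 .. Sunday=7.
    weekdays = {v % 7 for v in expand_field(weekday, 0, 7)}
    weekday_ok = when.isoweekday() % 7 in weekdays
    if day != "*" and weekday != "*":
        # When both day fields are restricted, cron runs if either matches.
        return time_ok and (day_ok or weekday_ok)
    return time_ok and day_ok and weekday_ok


if __name__ == "__main__":
    moment = datetime(2024, 1, 1, 10, 15)  # a Monday
    for schedule in ("* * * * *", "*/5 * * * *", "0 3 * * *", "15 10 * * 1"):
        print(f"{schedule:<14} {cron_matches(schedule, moment)}")
```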

If your bot is simple enough, you can schedule it to run automatically every minute using the "* * * * *" schedule. Mention that you will run the bot every minute only to demonstrate how it works. It is generally better to schedule your bots to run less often, no more often than every five minutes.

Again, explain all arguments in this command, putting extra emphasis on the schedule. Explain what the different fields mean.

Wait a short while and then show the page changed by the bot to demonstrate that the background job ran correctly. To emphasize this point, you can log out of Toolforge and close your terminal if you prefer.

Log back in to Toolforge and switch to the tool account.

Demonstrate how to delete the scheduled job. Wait a short while to demonstrate that the page that was previously changed by the bot now remains the same.

Source: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework#Creating_scheduled_jobs_(cron_jobs)

Closing
Thank the participants and other organizers. Leave some time for final questions the participants might have. List links to additional materials on your last slide (if you are using them), and let the participants know where they can find the slides and any helpful resources.

Frequently asked questions
(questions asked by participants during previous workshops)

Resource reference
(list all links for quick access)