Manual:Pywikibot/Workshop Materials/How to host a bot on Toolforge

From mediawiki.org

This page is a resource for workshop organizers who want to run a workshop about hosting bots on Toolforge. By following the instructions on this page, you will learn everything you need to know to teach others about this subject. This page features a complete script based on a past workshop, along with useful tips for people with little experience running such events.

Note that you don't need to follow this guide to the letter. Feel free to adapt it to your needs, and contribute back to it based on your experience.

If you want to learn more about running bots on Toolforge before committing to running a workshop, or need more concise materials about this subject, see How to host a bot on Toolforge (self-study).

For details of the past workshop on this subject, including the recording and the slides, see meta:Small_wiki_toolkits/Workshops.

How to use this guide?

Read the guide in its entirety to decide if the content of this workshop matches your expectations. Change whatever you feel doesn't match your style or isn't interesting to your audience.

Next, read the relevant linked materials to make sure you know everything you need to run this workshop.

Finally, prepare for the workshop by going through the #Preparation section.

Requirements

To run this workshop, you should know how to run bots on Toolforge. This means basic familiarity with Pywikibot, the terminal, SSH, Bash scripting, and job scheduling. You should also understand how to manage your developer account.

Preparation

Organizer

To prepare to run this workshop, follow the steps below.

  1. Create or choose an existing Pywikibot script that you want to run on Toolforge.
  2. Ensure you have access to Toolforge.
  3. Create a new Toolforge tool using the Toolforge admin console.
  4. Prepare your tool on Toolforge based on wikitech:Help:Toolforge/Pywikibot.
  5. Upload or copy and paste your Pywikibot script to the tool's home directory and make sure it works.
    • You can copy and paste it into nano or vim (in insert mode) directly from your clipboard.
    • You can place the script in a git repository - for example on https://gitlab.wikimedia.org, see GitLab for details - and clone it from there to your tool's home on Toolforge.
    • You can use scp or a similar command to copy a file from your computer to your Toolforge account. For example: scp script.sh <shell username>@login.toolforge.org:/home/<shell username>.
  6. Make the following decisions on how you want to run the workshop:
    • Do you want the participants to follow along and perform the described activities immediately as you talk about them, or wait until a designated time to perform the activities?
    • Do you want the participants to interrupt you to ask questions, or wait for the designated time instead?
    • Do you want to record the workshop? If so, ask the participants if they consent to being recorded.
  7. Share your decisions during the introduction to the workshop.
  8. Consider if there are any prerequisites or preparations required from the participants and share them before the workshop.

Participants

Be sure to inform workshop participants about the following points they need to consider to prepare for this workshop.

  • Participants will create their developer accounts (if they don't have them yet). To do this, they will have to come up with:
    • a developer account username
    • a shell username (required to log in to Toolforge)
    • two unique passwords - one for their developer account, another for their SSH key
  • Each participant also needs their own SUL account (a regular Wikipedia account) to create a developer account.

It's also helpful for participants to have basic familiarity with the concepts listed below before attending this workshop:

  • Terminal or command line usage
  • SSH and SSH authentication keys
  • Shared hosting

Workshop script

Introduction

Start sharing your screen and, if you planned to record the workshop, start the recording.

Introduce yourself and the topic of the workshop. Explain the organizational choices and rules you've decided upon (for example, whether the participants should interrupt you to ask questions, try to follow along or not - see #Organizer for details).

What is Toolforge?

(slide 2)

Explain that Toolforge is a shared hosting platform supported by Wikimedia Foundation staff and volunteers. It provides users with a Linux machine they can use, for example, to run bots or host a website.

Present the wikitech:Help:Toolforge page - explain that this is where the participants can find a lot of extra materials if they ever need help.

Source: wikitech:Portal:Toolforge/About_Toolforge

Accessing Toolforge

Differences between SUL and developer account

(slide 3)

Explain the difference between a SUL account and a developer account. You use the former to log in to different wikis, and the latter to log in to Wikitech, Toolforge, Gerrit, and other services.

Explain that participants need to have the SUL account to create a developer account in the way used in this workshop.

Clarify that, in addition to Toolforge, the developer account grants users access to Wikitech, Gerrit, and Phabricator.

Creating a developer account

(slide 4)

Explain how to create a developer account in the Toolforge admin console. Mention that it's also possible to create a developer account on Wikitech, so if any participant already did this, they won't be able to create a new account with the same username.

Go through the list of information required during account creation. Make it clear that a developer account requires a SUL account, a username, a shell username, and a password. Note that some people might have trouble coming up with these during the workshop, so it's a good idea to list them in the communication sent to participants beforehand (see #Participants).

Clarify that to access Toolforge, you use your UNIX shell username, so it's a good idea to make it simple and memorable.

Source: wikitech:Help:Create_a_Wikimedia_developer_account

Requesting access to Toolforge and creating a tool

Explain that creating the account doesn't immediately grant access to Toolforge. Show how to log in to the Toolforge admin console and create a new membership request.

Clarify that the request might be declined. You need to provide a valid reason to use Toolforge. Also mention that using Toolforge requires that you accept its terms and conditions.

Once the request is approved, which can take a couple of days because it's done manually, you can create a new tool.

Explain how to create a new tool using the tool creation option in the admin console. Explain that this creates a tool account and a place on Toolforge where you can host and work with the tool.

Creating an SSH key

(slide 5)

Explain what SSH is, what an SSH key is, and how a private SSH key differs from a public one.

Mention that the process of creating an SSH key pair might be a bit different on different operating systems.

Clarify that participants might already have their own SSH keys created for another purpose. These keys are typically in ~/.ssh.

Note that the two most popular SSH key types are RSA (most popular and compatible) and Ed25519 (more modern, equally secure, usually faster). If you don't specify a type using the -t flag, the default depends on the OpenSSH version: older versions create an RSA key, while OpenSSH 9.5 and later create an Ed25519 key.

Create the key using the ssh-keygen command while explaining each step of the creation process. Mention that it's possible to create a key without a password (ssh-keygen calls it a passphrase): if you don't want to type a password every time you use the key, leave it empty when creating the key. Point out the trade-off: anyone who obtains a private key file without a password can use it.

Point out that ssh-keygen created the key files inside the ~/.ssh directory.
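
For a rehearsal or a non-interactive demonstration, key creation could look roughly like the sketch below. It assumes OpenSSH's ssh-keygen is installed. The scratch directory, the file name id_ed25519_workshop, and the comment are made up for illustration, and the empty password (-N "") is there only so that the demo runs without prompts. For a real key, you would normally keep the default ~/.ssh location and let ssh-keygen ask for a password.

```shell
# Use a scratch directory so the demo cannot overwrite an existing key in ~/.ssh.
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519_workshop"   # hypothetical file name

# -t selects the key type, -C adds a label, -f sets the output file.
# -N "" sets an empty password; used here only to avoid interactive prompts.
ssh-keygen -q -t ed25519 -C "workshop-demo" -f "$key_file" -N ""

# Two files now exist: the private key and the matching public key (.pub).
ls "$key_dir"

# Print the public key; this is the part that is safe to share.
cat "$key_file.pub"
```

Showing both files side by side can make the difference between the private and the public key easier to explain.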

Adding your SSH key to Toolsadmin

Show how to add the SSH key to your account in the admin console. Display the content of the public key file, for example by issuing the cat ~/.ssh/<key file>.pub command. Copy the public key to the text field on Admin console > Settings (in user menu) > SSH Keys.

Explain that this tells Toolforge to expect the private key matching this public key when you try to authenticate. You can have more than one key pair that you use to connect to Toolforge. It's a good practice to have a separate key pair for each computer you use.

Source: wikitech:Portal:Toolforge/Quickstart#Getting_started_with_Toolforge_-_Quickstart

Logging in to Toolforge

(slide 6)

Explain how to log in to Toolforge by executing ssh <your shell username>@login.toolforge.org. Emphasize that you are using the shell username (not the developer account name), and the password you used when creating the key (and not the developer account password).

After successfully logging in, use the become <tool name> command to switch to the tool account. Clarify that the name must match the name of the tool you created in the admin console. Highlight that the prompt in the terminal changes when the become command is successful.
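
As an optional convenience that the past workshop did not necessarily cover, participants can store the login details in an SSH client configuration entry. The sketch below writes such an entry to a scratch file for review. The host alias toolforge, the user name example-user, and the key path are placeholders, not values required by Toolforge.

```shell
shell_user="example-user"            # placeholder: your shell username
key_file="$HOME/.ssh/id_ed25519"     # placeholder: your private key file

# Write the entry to a scratch file first, so nothing in ~/.ssh is touched.
config_snippet="$(mktemp)"
cat > "$config_snippet" <<EOF
Host toolforge
    HostName login.toolforge.org
    User $shell_user
    IdentityFile $key_file
EOF

# Review the entry before appending it to ~/.ssh/config by hand.
cat "$config_snippet"
```

With an entry like this in ~/.ssh/config, ssh toolforge would be shorthand for ssh <your shell username>@login.toolforge.org. The become step stays the same.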

Running the bot on Toolforge with Jobs framework

Getting ready to run the bot

Explain that because Pywikibot is under active development, it makes sense to control the Python and Pywikibot versions you use on Toolforge. This is why the recommended method described in this workshop uses both a Python virtual environment and a dedicated Python image and container.

Explain what a virtual environment is and what an image and container are.

Describe the environment you have set up. Explain that to get access to a specific version of Python, you used the webservice python3.9 shell command (using the python3.9 image).

In this new shell, with that specific version of Python, you ran python3 -m venv <virtual environment name and directory> to create your virtual environment.

You activated the virtual environment by running source <virtual environment directory>/bin/activate and then installed Pywikibot and any other necessary libraries.

Note that if you intend to use built-in scripts, the recommended way of installing Pywikibot is using Git. If you do not plan to use any built-in scripts, it's more convenient to install the package from PyPI. Choose and describe the method that suits the script you intend to present.

Explain that after installing Pywikibot you ran the pwb generate_user_files command to perform Pywikibot's initial setup.

Explain that the setup process described on Wikitech (link in Sources) wraps the commands you have described into a single script and then runs that script within the container. Running that script works the same as running individual commands inside a container shell.
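
The individual setup commands could be collected into a script along the lines of the sketch below. It is modeled on the steps described in this section and is not a copy of the script on Wikitech. The file name bootstrap_venv.sh, the virtual environment directory pwbvenv, and the job name bootstrap-venv are placeholders, and the sketch assumes the PyPI installation method.

```shell
# Write the setup script to the current directory (on Toolforge: the tool's home).
cat > bootstrap_venv.sh <<'EOF'
#!/bin/bash
# Stop at the first failing command.
set -euo pipefail

# Create the virtual environment with the Python version of the image.
python3 -m venv "$HOME/pwbvenv"

# Activate it so that pip installs into the virtual environment.
source "$HOME/pwbvenv/bin/activate"

# Install Pywikibot from PyPI, plus any other libraries your bot needs.
pip install --upgrade pip
pip install pywikibot
EOF
chmod +x bootstrap_venv.sh

# On Toolforge, the script would then be run inside the container, for example:
#   toolforge-jobs run bootstrap-venv --command "$HOME/bootstrap_venv.sh" --image python3.9
```

The pwb generate_user_files step is left out of the script because it asks questions interactively.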

Running the bot once

Explain that you will run your bot using the jobs framework. Describe what the jobs framework is and what you can use it for. This framework uses Kubernetes to run the bot in the background, ensuring a dedicated environment with enough resources.

Run the bot once by issuing the following command: toolforge-jobs run <job name> --command "$HOME/<your script or command>" --image python3.9

Explain what this command does:

  • toolforge-jobs is a command you use to run a job in the background
  • --command indicates the command to run in the job. In this case, you are running a built-in Pywikibot script
  • --image indicates what type of container image Kubernetes should use to run the command. In this case, it's python3.9 - the same image you used to create the Python virtual environment and install the appropriate packages with pip. This ensures compatibility between the setup environment and the execution environment.
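
The script passed to --command is often a small wrapper that activates the virtual environment before starting the bot. The following is a minimal sketch, assuming the virtual environment lives in $HOME/pwbvenv, the bot is a file named my_bot.py, and user-config.py sits in the tool's home directory. All three names, as well as the job name demo-bot, are placeholders for your own setup.

```shell
# Write the wrapper script to the current directory (on Toolforge: the tool's home).
cat > run_bot.sh <<'EOF'
#!/bin/bash
set -euo pipefail

# Use the same virtual environment that was created in the python3.9 image.
source "$HOME/pwbvenv/bin/activate"

# Run from the directory that holds user-config.py so Pywikibot can find it.
cd "$HOME"
python3 "$HOME/my_bot.py"
EOF
chmod +x run_bot.sh

# On Toolforge, the job could then be started with:
#   toolforge-jobs run demo-bot --command "$HOME/run_bot.sh" --image python3.9
```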

Show the results of running the bot. For example, if your bot added some text to a specific page, display that page and point out the addition. You can run the script multiple times to make the addition easier to spot.

Scheduling the bot to run automatically

Explain that you can schedule a bot to run in the background, for example every five minutes, using the same toolforge-jobs command. In addition to the arguments and parameters passed before, you need to specify the --schedule option as well. This parameter uses the crontab entry format. For example, to run a bot every five minutes, use the schedule defined as */5 * * * *.

Explain that the crontab format is commonly used to schedule tasks on Unix-like operating systems, and there are helpful resources that describe it online. One such resource is https://crontab.guru, but typing "crontab run every 5 minutes" in a search engine should return plenty of useful resources.

Show different crontab schedules to give participants a better idea of how they work.
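
A few schedules you could show are sketched below. The print_schedule helper is made up for this example and only formats the output; the schedules themselves use the standard five-field crontab format.

```shell
# Field order: minute  hour  day-of-month  month  day-of-week
print_schedule() {
    # $1 is the schedule, $2 is a plain-language description.
    printf '%-14s %s\n' "$1" "$2"
}

print_schedule '*/5 * * * *' 'every five minutes'
print_schedule '0 * * * *'   'at the start of every hour'
print_schedule '30 3 * * *'  'every day at 03:30'
print_schedule '0 6 * * 1'   'every Monday at 06:00'
print_schedule '0 0 1 * *'   'at midnight on the first day of every month'
```

Quote the schedule whenever you pass it on the command line. Otherwise the shell may expand the asterisks into file names.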

If your bot is simple enough, you can schedule it to automatically run every minute using the * * * * * schedule. Do that using the following command: toolforge-jobs run <job name> --command "$HOME/<your script or command>" --image python3.9 --schedule "* * * * *". Mention that you will run the bot every minute to show how it works. It's generally better to schedule your bots to run less often, every five minutes at most.

Again, explain all arguments in this command, putting extra emphasis on the --schedule option. You can then show that this job is scheduled to run in the background. Run toolforge-jobs show <job name>. Explain what the different fields mean.

Wait a short while and then show the page changed by the bot to demonstrate that the background job ran correctly. To emphasize this point, you can log out of Toolforge and close your terminal if you prefer.

Log back in to Toolforge (ssh <your shell username>@login.toolforge.org, followed by become <tool name>).

Show how to delete a scheduled job by running toolforge-jobs delete <job name>. Wait a short while to demonstrate that the page that was previously changed by the bot remains unchanged now.

Source: wikitech:Help:Toolforge/Jobs_framework#Creating_scheduled_jobs_(cron_jobs)

Closing

Thank the participants and other organizers. Leave some time for any final questions. If you are using slides, list links to additional materials on the last one, and let the participants know where they can find the slides and other helpful resources.