SQL/XML Dumps/Running a dump job

Running dump jobs
At some point you will actually want to run one or more dump jobs, if only for testing. We’ve talked about how the list of jobs is assembled, how just the requested jobs are marked to run, and how a given job runs. Today, let’s look at the worker.py script that is run from the command line, along with its options.

worker.py
This, like all of the python dump scripts, is a python3 script. Because debian stretch has python 2 as the default, we have to invoke python3 explicitly:

ariel@snapshot1009:/srv/deployment/dumps/dumps/xmldumps-backup$ python3 ./worker.py --help

We won’t discuss all of the options here, just the most useful ones.

Sample commands
We talked about “stage files” a while back. These all live in /etc/dumps/stages and contain the worker.py commands that are run automatically, so let’s look at one of them. We have stage files for full (page content with all revisions) and partial dump runs; let’s look at what’s required for a partial run. You can look at a copy of the stages file from December 2020, which was used for this document.
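As a rough illustration of what a scheduler or a human does with such a file, here is a minimal sketch that pulls the worker.py command lines out of a stage file. The file format shown here is assumed (comment lines starting with '#', one command per line), and the flags in the sample commands are illustrative; check the real files in /etc/dumps/stages for the actual format.

```python
# Hypothetical stage file fragment; the real files live in /etc/dumps/stages
# and their exact format may differ from this sketch.
SAMPLE_STAGE_FILE = """\
# partial run, first stage
python3 ./worker.py --configfile /etc/dumps/confs/wikidump.conf.dumps --job first-stage-job
python3 ./worker.py --configfile /etc/dumps/confs/wikidump.conf.dumps --job second-stage-job
"""


def worker_commands(text):
    """Return the non-comment lines that invoke worker.py."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#") and "worker.py" in line
    ]


for cmd in worker_commands(SAMPLE_STAGE_FILE):
    print(cmd)
```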

Configuration
I’ve thought about replacing these standard python config format (INI-style) files with yaml or json. And that’s as far as it’s gotten: thinking about it.
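For the curious, the conversion itself would be fairly mechanical. A sketch of turning one of these configparser-style files into json; the section and key names below are invented for illustration, not taken from the real config:

```python
import configparser
import json

# Hypothetical fragment in the standard python config (INI) format;
# these section and key names are made up for this example.
SAMPLE_CONF = """\
[wiki]
dblist = /srv/dumps/all.dblist

[output]
public = /mnt/dumpsdata/xmldatadumps/public
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

# Flatten the parsed sections into nested dicts, then serialize as JSON.
as_dict = {section: dict(parser[section]) for section in parser.sections()}
print(json.dumps(as_dict, indent=2))
```

The reverse direction (json back to INI) is just as mechanical, which is part of why the migration has never felt urgent.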

The production configuration settings are in /etc/dumps/confs/wikidump.conf.dumps, so let’s look at some of them. Note that the file is generated from a puppet template, so I can’t link you to a completed copy of the config in our repo. A copy of the generated config file from December 2020 is available, though, for people to look at.
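As a sketch of how the scripts can consume such a file, python’s configparser lets callers supply a fallback for any setting a deployment leaves out, so not every wiki’s config needs every key. The section and key names below are illustrative, loosely modeled on the real file but not guaranteed to match it:

```python
import configparser

# Hypothetical config fragment loosely modeled on wikidump.conf.dumps;
# the real section and key names may differ.
SAMPLE = """\
[database]
max_allowed_packet = 32M

[tools]
gzip = /bin/gzip
"""

conf = configparser.ConfigParser()
conf.read_string(SAMPLE)

# fallback= covers settings (or whole sections) a deployment omits
gzip_path = conf.get("tools", "gzip", fallback="/usr/bin/gzip")
checkpoint = conf.get("chunks", "checkpoint", fallback="0")
print(gzip_path, checkpoint)
```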

There may be other settings added from time to time; check the docs and the puppet manifests for details!