SQL/XML Dumps/A dump job using an existing MediaWiki script

Things to consider
Does your dump job run via a MediaWiki maintenance script (in core or an extension) or via some other command like mysqldump or a custom script? Does your dump job script write compressed output directly? Does your dump job script produce progress messages that can be used to judge the % of entries processed or to derive an ETA for when the job will complete?

Your job may integrate slightly differently than this example based on the answers to the above questions.
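To make the last question concrete, here is a minimal sketch of how a percentage and an ETA could be derived from a progress message. The message format ("processed N of M") and the function name are assumptions made up for this illustration; a real maintenance script will print something different, and the dumps infrastructure has its own handling.

```python
import re

# Hypothetical progress line format; real scripts will differ.
PROGRESS_RE = re.compile(r"processed (\d+) of (\d+)")


def estimate_progress(line, elapsed_seconds):
    """Return (percent_done, eta_seconds) parsed from one progress line, or None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    done, total = int(match.group(1)), int(match.group(2))
    if done == 0 or total == 0:
        # Nothing processed yet (or nothing to process): no basis for an estimate.
        return None
    percent = 100.0 * done / total
    # Assume the remaining entries take as long per entry as those done so far.
    eta = elapsed_seconds * (total - done) / done
    return percent, eta


print(estimate_progress("processed 250 of 1000", 30.0))  # (25.0, 90.0)
```

If the script only prints a running count without a total, a percentage cannot be derived this way, and the job may have to do without an ETA.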

Code of the module
First, we need the code for the new dump job, a Python script; see below for the code of sample_job.py:

While we're here, we might as well see how it works. At this point, nothing here should be surprising.

Comments are inline so that anyone who checks out the repo can study this example and see how it works.

Wiring it in
Next we need to make the job known to the infrastructure. We do this by adding an entry for it in the dumpitemlist.py module:

    diff --git a/xmldumps-backup/dumps/dumpitemlist.py b/xmldumps-backup/dumps/dumpitemlist.py
    index fb9898ad4..a3c449d0b 100644
    --- a/xmldumps-backup/dumps/dumpitemlist.py
    +++ b/xmldumps-backup/dumps/dumpitemlist.py
    @@ -20,6 +20,7 @@ from dumps.xmljobs import XmlLogging, XmlStub, AbstractDump
     from dumps.xmlcontentjobs import XmlDump, BigXmlDump
     from dumps.recompressjobs import XmlMultiStreamDump, XmlRecompressDump
     from dumps.flowjob import FlowDump
    +from dumps.sample_job import SitelistDump
     
     
     def get_setting(settings, setting_name):
    @@ -241,6 +242,8 @@ class DumpItemList:
             self.append_job_if_needed(
                 FlowDump("xmlflowhistorydump", "history content of flow pages in xml format", True))
     
    +        self.append_job_if_needed(SitelistDump("sitelistdump", "List all sites."))
    +
             if self.wiki.config.revinfostash:
                 recombine_prereq = self.find_item_by_name('xmlstubsdumprecombine')
             else:

That’s all we change. We add an import line at the end of the imports near the top of the module, so that the class name is recognized without a module prefix. We also add the job to the list of dump jobs that may or may not be run, passing in just what the class constructor needs, which isn’t much: the job name and a description.

Because this job is just for purposes of illustration and should not be run in a production environment, we also added a config switch that lets you disable jobs on all runs on all wikis; see the commit if you're interested in more details. Ordinarily you won't have to worry about that, since the jobs you add will be jobs you want to run :-)
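The idea of a list of jobs that "may or may not be run", combined with a switch that disables some of them, can be illustrated with a toy model. Everything here is hypothetical: ToyJob, ToyJobList and skip_jobs are invented names, and the method merely borrows the name append_job_if_needed; this is not the logic in dumpitemlist.py, which decides based on its own configuration.

```python
class ToyJob:
    """Stand-in for a dump job class; holds only a name and a description."""

    def __init__(self, name, desc):
        self.name = name
        self.desc = desc


class ToyJobList:
    """Collects jobs, leaving out any whose name is in skip_jobs (hypothetical)."""

    def __init__(self, skip_jobs=()):
        self.skip_jobs = set(skip_jobs)
        self.jobs = []

    def append_job_if_needed(self, job):
        """Add the job unless it has been disabled; report whether it was added."""
        if job.name in self.skip_jobs:
            return False
        self.jobs.append(job)
        return True


# With the illustration-only job disabled, it never makes it into the list.
joblist = ToyJobList(skip_jobs=["sitelistdump"])
joblist.append_job_if_needed(ToyJob("sitelistdump", "List all sites."))
joblist.append_job_if_needed(ToyJob("xmlflowhistorydump", "history content of flow pages"))
print([job.name for job in joblist.jobs])  # ['xmlflowhistorydump']
```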

Testing
Now we run it:

    [ariel@bigtrouble dumptesting]$ python ./worker.py --configfile ./confs/wikidump.conf.current:bigwikis --job sitelistdump
    Running elwikivoyage, jobs sitelistdump...
    2020-09-01 10:43:53: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/20200901 ...
    2020-09-01 10:43:53: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/private/elwikivoyage/20200901 ...
    2020-09-01 10:43:53: elwikivoyage Cleaning up old dumps for elwikivoyage
    2020-09-01 10:43:53: elwikivoyage No old public dumps to purge.
    2020-09-01 10:43:53: elwikivoyage No old private dumps to purge.
    Preparing for job sitelistdump of elwikivoyage
    command /usr/bin/php /var/www/html/elwv/maintenance/exportSites.php --wiki=elwikivoyage php://stdout (3305310) started...
    command /usr/bin/gzip (3305311) started...
    returned from 3305311 with 0
    2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
    2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
    2020-09-01 10:43:55: elwikivoyage adding rss feed file /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest/elwikivoyage-latest-sitelist.xml.gz-rss.xml
    2020-09-01 10:43:55: elwikivoyage Checksumming elwikivoyage-20200901-sitelist.xml.gz via md5
    2020-09-01 10:43:55: elwikivoyage Checksumming elwikivoyage-20200901-sitelist.xml.gz via sha1
    2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
    2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
    2020-09-01 10:43:55: elwikivoyage Completed job sitelistdump for elwikivoyage
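The two "command ... started" lines in the log show the shape of the job: the maintenance script writes to stdout, and gzip compresses that stream into the output file. The sketch below shows that pattern with the standard subprocess module. It is not the code the dumps infrastructure uses to run commands; run_compressed and its parameters are names invented for this example.

```python
import subprocess


def run_compressed(producer_cmd, output_path, compressor_cmd=("gzip",)):
    """Run producer_cmd and pipe its stdout through compressor_cmd into output_path.

    Returns the exit codes as (producer_rc, compressor_rc).
    """
    with open(output_path, "wb") as outfile:
        producer = subprocess.Popen(list(producer_cmd), stdout=subprocess.PIPE)
        compressor = subprocess.Popen(
            list(compressor_cmd), stdin=producer.stdout, stdout=outfile)
        # Close our copy of the pipe so the producer sees a broken pipe
        # if the compressor exits early.
        producer.stdout.close()
        compressor_rc = compressor.wait()
        producer_rc = producer.wait()
    return producer_rc, compressor_rc
```

A call modelled on the log would pass the php command line as producer_cmd and the path of the sitelist.xml.gz file as output_path; both exit codes should be 0 on success.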

Output check
And finally we check the output:

    [ariel@bigtrouble dumptesting]$ zcat /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/20200901/elwikivoyage-20200901-sitelist.xml.gz
    [ariel@bigtrouble dumptesting]$

Looks empty! But that’s because I’m testing on my local instance, which has no wikifarm and hence no list of site instances. Nonetheless, the empty list is formatted properly and written to the correct location. Success!
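If you'd rather not eyeball zcat output, the same check can be scripted. This sketch assumes the output file holds a single well-formed XML document; it decompresses the file, parses it, and counts the elements directly under the root. The function name is made up for this example, and no particular element names are assumed.

```python
import gzip
import xml.etree.ElementTree as ET


def count_top_level_entries(path):
    """Decompress path, parse it as XML, and return the number of children of the root."""
    with gzip.open(path, "rb") as infile:
        root = ET.parse(infile).getroot()
    # len() of an Element is the number of its direct children.
    return len(root)
```

A file that is not valid gzip or not well-formed XML raises an exception here, so a returned count of 0 means "properly formatted but empty" rather than "broken".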

Go forth and do likewise!
The end.