SQL/XML Dumps/A dump job using an existing MediaWiki script

Things to consider
Does your dump job run via a MediaWiki maintenance script (in core or an extension), or via some other command such as mysqldump or a custom script? Does your dump job script write compressed output directly? Does your dump job script produce progress messages that can be used to judge the percentage of entries processed, or to derive an ETA for when the job will complete?

Your job may integrate slightly differently than this example based on the answers to the above questions.
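To make the last question concrete, here is a minimal sketch of turning progress messages into a percentage and an ETA. The message format ("processed N of M") and the names parse_progress and estimate are assumptions made up for illustration; they are not taken from any MediaWiki script or from the dumps code, and the ETA assumes a constant processing rate.

```python
import re

# Hypothetical progress line format; real maintenance scripts differ.
PROGRESS_RE = re.compile(r"processed (\d+) of (\d+)")


def parse_progress(line):
    """Return (done, total) if the line is a progress message, else None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def estimate(done, total, elapsed_seconds):
    """Return (percent_done, eta_seconds), assuming a constant processing rate."""
    if done <= 0 or total <= 0:
        return 0.0, None
    percent = 100.0 * done / total
    rate = done / elapsed_seconds if elapsed_seconds > 0 else None
    eta = (total - done) / rate if rate else None
    return percent, eta
```

If your script prints nothing of the sort, there is nothing to parse, and the job can only report that it is running or done.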

Code of the module
First, we need the code for the new dump job: the Python script sample_job.py, which is in the repo.

While we're here, we might as well see how it works. At this point, nothing here should be surprising.

Comments are inline so that anyone who checks out the repo can study this example and see how it works.

Wiring it in
Next we need to make the job known to the infrastructure. We do this by adding an entry for it in the dumpitemlist.py module:

[ariel@bigtrouble dumps]$ diff dumpitemlist.py.orig dumpitemlist.py
22a23
> from dumps.sitejob import SitelistDump
232a234,235
>
>        self.dump_items.append(SitelistDump("sitelistdump", "List all sites."))

That’s all we change: an import line at the end of the imports near the top of the module, so that the class name can be used without a module prefix, and one new entry in the list of dump jobs that may or may not be run. The entry passes in just what the class constructor needs, which isn’t much: a job name and a description.
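The registration pattern in that diff can be sketched in isolation as follows. This is a toy under stated assumptions: the Dump base class, the DumpItemList container and the find_by_name helper are stand-ins invented here, not the real classes in the dumps repo. Only the append call and its two arguments come from the diff.

```python
class Dump:
    """Stand-in for a dump job base class; not the real dumps API."""

    def __init__(self, name, desc):
        self.name = name
        self.desc = desc


class SitelistDump(Dump):
    """Stand-in for the job class imported in the diff."""


class DumpItemList:
    """Hypothetical container holding every job that may be run."""

    def __init__(self):
        self.dump_items = []
        # The one line the diff adds: register the job by name and description.
        self.dump_items.append(SitelistDump("sitelistdump", "List all sites."))

    def find_by_name(self, name):
        """Return the registered job with this name, or None."""
        for item in self.dump_items:
            if item.name == name:
                return item
        return None
```

The point of the pattern is that a job requested by name on the command line can only be found if it has been appended to the list first.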

Testing
Now we run it:

[ariel@bigtrouble dumptesting]$ python ./worker.py --configfile ./confs/wikidump.conf.current:bigwikis --job sitelistdump
Running elwikivoyage, jobs sitelistdump...
2020-09-01 10:43:53: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/20200901 ...
2020-09-01 10:43:53: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/private/elwikivoyage/20200901 ...
2020-09-01 10:43:53: elwikivoyage Cleaning up old dumps for elwikivoyage
2020-09-01 10:43:53: elwikivoyage No old public dumps to purge.
2020-09-01 10:43:53: elwikivoyage No old private dumps to purge.
Preparing for job sitelistdump of elwikivoyage
command /usr/bin/php /var/www/html/elwv/maintenance/exportSites.php --wiki=elwikivoyage php://stdout (3305310) started...
command /usr/bin/gzip (3305311) started...
returned from 3305311 with 0
2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
2020-09-01 10:43:55: elwikivoyage adding rss feed file /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest/elwikivoyage-latest-sitelist.xml.gz-rss.xml
2020-09-01 10:43:55: elwikivoyage Checksumming elwikivoyage-20200901-sitelist.xml.gz via md5
2020-09-01 10:43:55: elwikivoyage Checksumming elwikivoyage-20200901-sitelist.xml.gz via sha1
2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
2020-09-01 10:43:55: elwikivoyage Checkdir dir /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/latest ...
2020-09-01 10:43:55: elwikivoyage Completed job sitelistdump for elwikivoyage
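The two "command ... started" lines in the log show the maintenance script writing to php://stdout and gzip compressing that stream. A rough standalone sketch of such a two-stage pipeline follows. It is not how the dumps infrastructure runs commands internally; build_pipeline and run_pipeline are hypothetical helpers, and only the command paths and arguments are copied from the log.

```python
import subprocess


def build_pipeline(php, script, wiki):
    """Return the two argument lists seen in the log: the exporter and gzip."""
    export_cmd = [php, script, "--wiki=" + wiki, "php://stdout"]
    gzip_cmd = ["/usr/bin/gzip"]
    return export_cmd, gzip_cmd


def run_pipeline(export_cmd, gzip_cmd, output_path):
    """Pipe export_cmd into gzip_cmd, write to output_path, return exit codes."""
    with open(output_path, "wb") as out:
        exporter = subprocess.Popen(export_cmd, stdout=subprocess.PIPE)
        compressor = subprocess.Popen(gzip_cmd, stdin=exporter.stdout, stdout=out)
        # Close our copy of the pipe so the exporter gets SIGPIPE if gzip exits early.
        exporter.stdout.close()
        compressor_rc = compressor.wait()
        exporter_rc = exporter.wait()
    return exporter_rc, compressor_rc
```

A job whose script writes compressed output directly would need only the first command and no gzip stage.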

Output check
And finally we check the output:

[ariel@bigtrouble dumptesting]$ zcat /home/ariel/wmf/dumps/testing/xmldumps/dumpruns/public/elwikivoyage/20200901/elwikivoyage-20200901-sitelist.xml.gz

[ariel@bigtrouble dumptesting]$

Looks empty! But that’s because I’m testing on my local instance, which has no wikifarm and hence no list of site instances. Nonetheless, the empty list is formatted properly and written to the correct location. Success!
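For a check that goes beyond eyeballing zcat output, something like the following sketch may help. It decompresses the file, confirms that the content parses as XML, and computes the same two kinds of checksum the log mentions (md5 and sha1) over the compressed file. The function name check_dump is made up for this example, and the sketch makes no assumption about which elements the file contains.

```python
import gzip
import hashlib
import xml.etree.ElementTree as ET


def check_dump(path):
    """Return (root_tag, child_count, md5_hex, sha1_hex) for a gzipped XML file."""
    with open(path, "rb") as infile:
        compressed = infile.read()
    # The log shows the .gz file itself being checksummed, so hash the compressed bytes.
    md5_hex = hashlib.md5(compressed).hexdigest()
    sha1_hex = hashlib.sha1(compressed).hexdigest()
    # fromstring raises ParseError if the content is not well-formed XML.
    root = ET.fromstring(gzip.decompress(compressed))
    return root.tag, len(root), md5_hex, sha1_hex
```

A child count of zero with a valid root element corresponds to the "empty but properly formatted" case described above.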

Go forth and do likewise!
The end.