User:Smalyshev (WMF)/Dump Test

Setting up a test of RDF dumps on mw-vagrant (requires a working Wikidata install with some items loaded):

  1. Enable dumps role for vagrant.
  2. Copy the missing scripts: wikidatadumps-shared.sh and dumpwikidatardf.sh to /usr/local/bin, and dump_functions.sh and set_dump_dirs.sh to /usr/local/etc.
  3. The script expects /etc/wikidump.conf.dumps but vagrant has /etc/wikidump.conf. Fixed with: sudo ln -s /etc/wikidump.conf /etc/wikidump.conf.dumps
  4. /var/log/wikidatadump/ does not exist; created it.
  5. MWScript.php requires the scripts to be run as www-data, so all output and temp directories need to be writable by www-data.
  6. pagesPerBatch=400000 is hardcoded and too big for a vagrant test setup; patched manually.
  7. The hardcoded minimum dump size is too large for a test dump (the test dump comes out smaller than it); patched manually.
  8. shards=8 is hardcoded and too large for a test dump; patched manually.
  9. /vagrant/srv/dumps/output is not writable by www-data; this needs to be fixed externally since vagrant does not control the permissions there.
  10. /vagrant/srv/dumps/output/temp does not exist; created manually.
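
Steps 3, 4, 5 and 10 above could be scripted roughly as follows. This is only a sketch: the function name prepare_dump_env and its two arguments (a root prefix for trying it out in a scratch directory, and an optional owner for the log directory) are made up for illustration and are not part of the dumps scripts.

```shell
# Sketch of steps 3, 4, 5 and 10. Both arguments are hypothetical helpers:
#   $1  root prefix (empty on the real box, a scratch dir for a dry run)
#   $2  user that should own the log directory (e.g. www-data), optional
prepare_dump_env() {
    local root="${1:-}" owner="${2:-}"

    # Step 3: the script looks for wikidump.conf.dumps, so link it to the
    # config that vagrant provides. -f and -n make the call repeatable.
    ln -sfn "${root}/etc/wikidump.conf" "${root}/etc/wikidump.conf.dumps" || return 1

    # Steps 4 and 10: create the missing log and temp directories.
    mkdir -p "${root}/var/log/wikidatadump" \
             "${root}/vagrant/srv/dumps/output/temp" || return 1

    # Step 5: hand the log directory to the dump user, if one was given.
    if [ -n "$owner" ]; then
        chown -R "$owner" "${root}/var/log/wikidatadump" || return 1
    fi
}

# On the vagrant box this would need root, for example:
#   sudo bash -c "$(declare -f prepare_dump_env); prepare_dump_env '' www-data"
```

The permissions on /vagrant/srv/dumps/output (step 9) are deliberately left out, since they have to be fixed from outside the box.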

set_dump_dirs.sh used:

confsdir="/etc"
repodir="/vagrant/srv/dumps/xmldumps-backup"
apachedir="/var/www/w"
cronsdir="/srv/dumps"

Command line:

sudo -u www-data bash /usr/local/bin/dumpwikidatardf.sh all ttl nt
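
Since the run depends on www-data being able to write to the log, output and temp directories, a quick pre-flight check before the command above can save a failed run. A minimal sketch; check_writable is a hypothetical helper, not something the dumps scripts ship.

```shell
# Report every listed directory that is missing or not writable by the
# current user. Returns non-zero if at least one such directory is found.
check_writable() {
    local dir status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "missing or not writable: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# To check as www-data on the vagrant box, something like:
#   sudo -u www-data bash -c "$(declare -f check_writable); \
#       check_writable /var/log/wikidatadump /vagrant/srv/dumps/output /vagrant/srv/dumps/output/temp"
```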

Suggestions for improvement

  1. Make hardcoded values above configurable (patch)
  2. Do not place temp/output directories inside /vagrant
  3. Create necessary dirs as part of the vagrant recipe (patch). Done.
  4. Make the /var/log/wikidatadump/ log directory configurable too?
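
One possible shape for suggestions 1 and 4 is to read each value from the environment and fall back to the current hardcoded one. This is a sketch, not the actual patch: the DUMP_* variable names, the logDir variable and the load_dump_settings function are invented here, and only pagesPerBatch, shards and the log path come from the notes above.

```shell
# Take each setting from the environment if present, otherwise keep the
# value that is hardcoded today. All DUMP_* names are hypothetical.
load_dump_settings() {
    pagesPerBatch="${DUMP_PAGES_PER_BATCH:-400000}"
    shards="${DUMP_SHARDS:-8}"
    logDir="${DUMP_LOG_DIR:-/var/log/wikidatadump}"
}

# A small test run could then be started along these lines (assuming the
# script were patched to call load_dump_settings; sudo does not pass the
# caller's environment by default, hence the explicit env):
#   sudo -u www-data env DUMP_SHARDS=1 DUMP_PAGES_PER_BATCH=100 \
#       bash /usr/local/bin/dumpwikidatardf.sh all ttl nt
```

With nothing set in the environment the behaviour stays as it is now, so production runs would not need any changes.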