SQL/XML Dumps/Useful skills checklist for dumps co-maintainers

From mediawiki.org

Some skills you'll want to have to become a dumps co-maintainer, in no particular order.

DRAFT and just the start of a list etc. Don't take it too seriously.

Part one: Keeping things working while the primary maintainer is away

  • Be able to run page content dumps manually for Wikidata at the right point in the full run
  • Be able to determine which snapshot host runs which dumps, and kill or restart them manually in case of problems
  • Be able to run 7z recompression manually for page content dumps for a given wiki, if needed
  • Be able to run md5 hash generation for bz2 or 7z files for a given wiki manually, if needed
  • Be able to determine whether a report on Phabricator is our issue or someone else's
    Example: a report that new analytics files are not available for public download
  • ...
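
For the md5 item above, the core operation is hashing a large compressed file without reading it all into memory. The following is a minimal Python sketch of that idea only; it is not the dumps repo's own code, and the helper names `md5_of_file` and `checksum_line` are hypothetical, as is the assumption that an `md5sum`-style output line is wanted.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex md5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as infile:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF)
        for chunk in iter(lambda: infile.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_line(path):
    """Format the digest the way md5sum does: '<hexdigest>  <basename>'."""
    return "{}  {}".format(md5_of_file(path), os.path.basename(path))
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte bz2 or 7z files.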

Part two: Well-defined tasks

  • Be able to handle mirror requests (with the aid of someone with puppet merge rights)
  • Be able to guide other teams through the process of setting up dumps for a new service or extension (with the aid of someone with puppet merge rights)
  • Be able to bring up a new snapshot host with the testbed role (with the aid of someone with puppet merge rights)
  • Be able to bring up a new dumpsdata host in a secondary role (with the aid of someone with puppet merge rights)
  • Be able to swap roles of snapshot hosts
  • Be able to swap roles of dumpsdata hosts
  • Be able to decommission snapshot and dumpsdata hosts
  • Be able to schedule maintenance on a given host or hosts according to when dumps of particular types are not running
  • Be able to work on capacity planning, understanding current and past baselines and being able to project growth
  • ...
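
For the capacity-planning item above, one simple way to go from a past baseline to a projection is to assume compound growth per period (say, per month or per dump run). This is a rough sketch under that assumption, with hypothetical function names; real planning would start from measured sizes of past runs and might well need a different growth model.

```python
import math


def growth_rate(old_size, new_size, periods):
    """Average per-period growth rate between two size measurements."""
    return (new_size / old_size) ** (1.0 / periods) - 1.0


def project_size(current_size, rate, periods):
    """Projected size after the given number of periods of compound growth."""
    return current_size * (1.0 + rate) ** periods


def periods_until_full(current_size, rate, capacity):
    """Whole periods until the projected size reaches capacity.

    Returns 0 if already at or over capacity, None if there is no growth.
    """
    if current_size >= capacity:
        return 0
    if rate <= 0:
        return None
    return math.ceil(math.log(capacity / current_size) / math.log(1.0 + rate))
```

Usage would look like `rate = growth_rate(size_a_year_ago, size_now, 12)` followed by `periods_until_full(size_now, rate, disk_capacity)`, where all three inputs are numbers you have measured yourself, in the same units.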

Part three: General troubleshooting

  • Be able to examine db queries in dumps code and verify that they are optimized well
  • Be able to do performance tests of db queries in production on servers in the inactive dc (in the dumps group!!)
  • Be able to track down the source of a dumps slowdown
    Example: a change in MW code that causes page content to be loaded upon retrieval of the revision metadata
  • Be able to watch incoming gerrit changes that could impact the dumps
  • ...

Part four: Testing

  • Be able to run dumps tests against a local install
  • Be able to test batches (etc.) in the Docker dumps testbed when that code is ready
  • Be able to fix broken mw code sync or puppet sync on beta, to facilitate dumps testing
  • Be able to test both XML/SQL and "misc" dumps on beta
  • Be able to test a single dump step at scale on the production cluster (small wikis only, writing to a temp area)
  • Be able to test a single "misc" (not XML/SQL) dump in production (small jobs only, writing to a temp area)
  • Be able to do code review of gerrit changes that add to dumps or impact dumps code
  • ...

Part five: Deploying

  • Be able to rebuild the mwbzutils package after making updates to it, and deploy it to the snapshot hosts
  • Be able to deploy a change to the XML/SQL dumps repo to the snapshot hosts
  • ...

Part six: Enhancements

  • Be able to write a new maintenance script for the "misc dumps"
  • Be able to write a new dumps step for the XML/SQL dumps
  • Be able to restructure the design of rsyncs of dumps data out to the secondary and labstore hosts
  • Be able to add new dumps appropriately to the infrastructure
    Example: OKAPI HTML dump service from the labstore boxes
  • Be able to redesign part or all of the XML/SQL dumps process, understanding the rationale (lol) for the current architecture and being able to adhere to the important parts of that in the redesign
  • ...