SQL/XML Dumps/Becoming a dumps co-maintainer/Deployment-prep

We do a lot of testing in the Deployment-prep project in the Wikimedia Cloud. You'll want to know how to set up a new snapshot server instance there, how to access it once it's ready, and how to run tests.

Setting up a new snapshot instance
All of our dumps are produced by servers named snapshotNNN and written out to NFS fileshares on hosts named dumpsdataNNN, except in the deployment-prep project. There, we have a single instance that has a local filesystem mounted where the NFS filesystem would usually go, and reads and writes are done directly to it. This means that certain features such as NFS locking are not testable there, but everything else is.

So, to set up a new testbed, we need only one instance, a new "deployment-snapshot0x" where x is the next available number. Currently we are on 03.
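Picking the next name amounts to finding the highest existing suffix and adding one. A minimal sketch of that, assuming you pass in the instance names you see in the project's instance list; the function name `next_snapshot_name` is made up for illustration, and the two-digit zero padding is inferred from "03" above:

```shell
# Hypothetical helper: print the next free deployment-snapshotNN name,
# given the existing instance names as arguments.
# The body runs in a subshell, so its variables do not leak to the caller.
next_snapshot_name() (
    max=0
    for name in "$@"; do
        num=${name#deployment-snapshot}
        # Skip names that lack the prefix or have a non-numeric suffix.
        [ "$num" != "$name" ] || continue
        case $num in
            ''|*[!0-9]*) continue ;;
        esac
        # Strip leading zeros so a value such as 08 is never parsed as octal
        # in the arithmetic below.
        num=$(printf '%s' "$num" | sed 's/^0*//')
        num=${num:-0}
        if [ "$num" -gt "$max" ]; then
            max=$num
        fi
    done
    printf 'deployment-snapshot%02d\n' "$((max + 1))"
)
```

For example, `next_snapshot_name deployment-snapshot01 deployment-snapshot03` prints `deployment-snapshot04`.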

You will want the following settings when setting up an instance:
 * flavor: g2.cores4.ram8.disk80 (should have: VCPUs: 4, RAM: 8GB, Disk: 80GB)
 * hostname: deployment-snapshotXX
 * OS: currently buster, change as needed
 * Security group: default
 * Server group: ignore this

Once you click "Launch Instance", the instance will be configured, created and booted. Don't expect it to come up with everything working. Instead:

Wait a while, then check that the instance is up and running and has completed at least part of its initial puppet run. You can see what happened in the logs: go to https://horizon.wikimedia.org/project/instances/ , select your new instance, and then click the "Logs" tab.

Now go to the "Puppet Configuration" tab. In the "Classes" section, make sure the contents are as follows:

    role::beta::mediawiki
    role::dumps::generation::worker::beta_testbed

and for "Hiera Config" you have:

    profile::dumps::generation::worker::common::dumps_misc_cronrunner: false
    profile::dumps::generation::worker::common::nfs_extra_mountopts: actimeo=0
    profile::dumps::generation::worker::common::php: /usr/bin/php7.2
    profile::dumps::generation_worker_cron_php: /usr/bin/php7.2
    profile::envoy::ensure: absent
    profile::services_proxy::envoy::local_clusters:
      - swift-https
      - search-https
      - search-omega-https
      - search-psi-https
    puppetmaster: deployment-puppetmaster04.deployment-prep.eqiad.wmflabs

You'll want to double-check the list of instances to be sure the current puppetmaster is indeed 04 and not some later number.

Once you submit this change, you'll need to wait for it to take effect. It typically takes 15 to 20 minutes to reach the puppetmaster and then your instance.

At this point you should be able to SSH in to your instance, but it will likely tell you that puppet failed to run nicely. As root on the instance, do

    rm -rf /var/lib/puppet/ssl
    puppet agent --test

to generate a new certificate request from the deployment-prep puppetmaster. Then SSH to the deployment-prep puppetmaster and do

    puppet ca list
    puppet cert sign deployment-snapshotNN.deployment-prep.eqiad1.wikimedia.cloud

(or whatever snapshot name showed up in the list), and then back on the instance, do

    puppet agent --test

This time it should run and do a whole bunch of stuff. You might want to run it more than once to make sure it's got nothing left to process.
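Re-running puppet by hand until nothing is left to process can be scripted. `puppet agent --test` turns on detailed exit codes: 0 means the run succeeded with no changes, 2 means it succeeded and applied changes, and other values (1, 4, 6) indicate a failure. The sketch below repeats a command until it reports no changes; the function name `run_until_converged` and the retry cap are assumptions made for illustration, not part of the dumps tooling:

```shell
# Hypothetical helper: run a command until it exits 0 ("nothing changed"),
# retrying while it exits 2 ("changes applied"), up to a maximum number of runs.
# Usage: run_until_converged MAX_RUNS COMMAND [ARGS...]
run_until_converged() {
    max_runs=$1
    shift
    run=1
    while [ "$run" -le "$max_runs" ]; do
        rc=0
        "$@" || rc=$?
        case $rc in
            0) return 0 ;;          # converged: no changes left to apply
            2) run=$((run + 1)) ;;  # changes applied: go round again
            *) return "$rc" ;;      # failure: stop and report it
        esac
    done
    return 2                        # still applying changes after max_runs runs
}

# As root on the new instance, for example:
#   run_until_converged 5 puppet agent --test
```

A return value of 0 means the last run had nothing to do; 2 means puppet was still making changes when the cap was reached, which is worth investigating rather than retrying forever.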

You now need to make sure this instance isn't in the WMCS backups since the disk is so large, and that it IS in the list of deployment targets for mediawiki and for dumps. Examples of how to do that in puppet: for backups and mediawiki,  for dumps.

Once you've made the appropriate patchset and merged it, you'll want to wait for it to make it over to the deployment-prep deployment server, again 15 to 20 minutes.

On the deployment server, as you, be in the  directory and do a git pull to make sure that the new target is added to the list.

At this point you should be ready to try running the test suite as the dumpsgen user; installation and setup are done!

Troubleshooting
This procedure can go wrong in a few places. The most common issue is that the sync from our gerrit puppet repo to deployment-prep's puppetmaster is broken for some reason. Typically you will want to do the following to sort it out:
 * ... (soon)