Continuous integration/Data center switch

The core of the CI infrastructure is hosted on two production machines, one in each datacenter. Most services are active on only one of the hosts, with the other host acting as a cold spare. When doing hardware maintenance or operating system upgrades, we move the services and their data from one host to the other. This document describes the steps needed to do the swap.

Hosts and services
We have two bare metal hosts, one in each of our primary datacenters. They host a variety of services:


 * Zuul: the scheduler and workflow system
 * Zuul mergers and their associated git-daemon. Active on both servers!
 * Jenkins holding jobs, their build history and artifacts
 * The website https://integration.wikimedia.org/ and the proxies to the above services:
   * Zuul status https://integration.wikimedia.org/zuul/
   * Jenkins interface https://integration.wikimedia.org/ci/
 * to build images
 * Docker daemon and images

Switching over
The general idea is to synchronize the Jenkins files from the primary to the spare server before anything else, since that transfer is by far the slowest part. The overall sequence is:


 * synchronize build artifacts
 * Stop all services on the primary
 * rsync data and states
 * change the DNS entry the Varnish/ATS layer uses to reach the backend
 * change primary in Puppet / Hiera
 * Start Jenkins
 * Start Zuul scheduler
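
The sequence above can be sketched as an ordered list of steps that a small driver walks through, aborting on the first failure. This is only an illustrative outline: the `Step` class, `run_steps` and the `noop` placeholder are hypothetical names, and each placeholder would have to be replaced by the real procedure for that step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One stage of the switch: a label and the callable that performs it."""
    name: str
    action: Callable[[], None]


def run_steps(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Run steps in order and return the names handled.

    In dry-run mode nothing is executed and the names are only listed.
    Otherwise an exception raised by a step aborts the remaining steps.
    """
    handled: List[str] = []
    for step in steps:
        if not dry_run:
            step.action()
        handled.append(step.name)
    return handled


def noop() -> None:
    """Placeholder (assumption): replace with the real procedure."""


# Step names follow the overview list; the actions are placeholders.
SWITCH_STEPS = [
    Step("synchronize build artifacts", noop),
    Step("Stop all services on the primary", noop),
    Step("rsync data and states", noop),
    Step("change DNS", noop),
    Step("change primary in Puppet / Hiera", noop),
    Step("Start Jenkins", noop),
    Step("Start Zuul scheduler", noop),
]

if __name__ == "__main__":
    for number, name in enumerate(run_steps(SWITCH_STEPS), start=1):
        print(f"{number}. {name}")
```

Keeping dry-run as the default means the script only prints the plan unless it is explicitly told to act.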

synchronize build artifacts
This step should be done ahead of time, since the transfer takes hours.

The Jenkins build history and artifacts are held solely on the primary Jenkins host. They amount to hundreds of gigabytes, spread over millions of files and directories.

On the spare server, ensure the destination directory is empty.
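
A minimal sketch of the emptiness check, to be run on the spare before starting the transfer. The function name `is_empty_dir` is hypothetical, and the path passed to it is whatever destination directory applies on the spare; no real path is assumed here.

```python
import os


def is_empty_dir(path: str) -> bool:
    """Return True if `path` is an existing directory with no entries.

    Raises FileNotFoundError or NotADirectoryError if `path` is missing
    or is not a directory, so a typo in the path is not mistaken for
    "empty".
    """
    with os.scandir(path) as entries:
        # Looking at the first entry only avoids listing a huge tree.
        return next(entries, None) is None
```

Stopping at the first entry matters here, because a leftover copy of the build history could hold millions of files.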

TODO: check Transfer.py or MariaDB/ImportTableSpace.

Stop all services
On the primary:

rsync data and states
Using rsync over ssh as root:


 * refresh the build artifacts from the primary to the spare.

Transfer the Jenkins and Zuul states:
 * Jenkins state: job configurations, build indices, plugins, etc.
 * Zuul state: function execution durations, used to estimate an ETA for each build
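
As an illustration of the rsync-over-ssh transfers listed above, the sketch below builds a push command to be run as root on the primary. The host name and directories in the usage example are placeholders, not the real ones, and the choice of `--delete` (so that the spare mirrors the primary exactly) is an assumption to review before use.

```python
import shlex
from typing import List


def rsync_push_command(src_dir: str, spare_host: str, dst_dir: str,
                       delete: bool = True) -> List[str]:
    """Build an rsync-over-ssh command copying src_dir to the spare.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = src_dir.rstrip("/") + "/"
    dst = f"root@{spare_host}:{dst_dir.rstrip('/')}/"
    cmd = ["rsync", "--archive", "--rsh=ssh"]
    if delete:
        # Remove files on the spare that no longer exist on the primary.
        cmd.append("--delete")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    cmd = rsync_push_command("/srv/example/state", "spare.example.org",
                             "/srv/example/state")
    print(shlex.join(cmd))
```

Printing the command first, instead of running it directly, leaves room to review it before a long transfer starts.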

change DNS
The Varnish/ATS layer reaches the backend through a DNS entry, which in turn points to the primary host. Update that entry so that it points to the new primary.
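
After the DNS change, one way to confirm the switch is to check that the service name resolves to the same addresses as the new primary. The sketch below is an assumption about how such a check could look; `resolve_addresses` and `points_to` are hypothetical helpers, and the names passed to them are the actual DNS entry and host name, which are not reproduced here.

```python
import socket
from typing import Callable, Set


def resolve_addresses(name: str) -> Set[str]:
    """Return the set of IP addresses the local resolver gives for name."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return {info[4][0] for info in infos}


def points_to(service_name: str, host_name: str,
              resolve: Callable[[str], Set[str]] = resolve_addresses) -> bool:
    """True if every address of service_name belongs to host_name."""
    service = resolve(service_name)
    host = resolve(host_name)
    return bool(service) and service <= host
```

A resolver may keep returning the old record until its cache expires, so a negative result right after the change is not necessarily an error.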

change primary in Puppet / Hiera
TODO: find the changes that need to happen. Ideally this should just be a role change.

Start services
On the new primary:
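
A sketch of the start order from the overview, Jenkins first and then the Zuul scheduler, assuming both run as systemd units. The unit names `jenkins` and `zuul` are assumptions and must be checked against the actual hosts.

```python
from typing import List

# Assumed unit names, in the order given by the overview:
# Jenkins first, then the Zuul scheduler.
START_ORDER = ["jenkins", "zuul"]


def systemctl_commands(action: str, units: List[str]) -> List[List[str]]:
    """Build one `systemctl <action> <unit>` command per unit, in order."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return [["systemctl", action, unit] for unit in units]


if __name__ == "__main__":
    for cmd in systemctl_commands("start", START_ORDER):
        print(" ".join(cmd))
```

The commands are only printed here; running them one at a time makes it easier to confirm each service is healthy before starting the next.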