Continuous integration/Zuul

Zuul is a Python daemon that acts as a gateway between Gerrit and Jenkins. It listens to Gerrit's stream-events feed and triggers Jenkins jobs according to a specification written in YAML.

WMF Setup
Zuul's source code is maintained by OpenStack. The WMF maintains a copy of the upstream git repository in its own Gerrit installation, under the project integration/zuul.git. The integration team manually updates our master branch from OpenStack's master.

Installation is handled by the puppet module zuul, which clones the source code from the WMF git repository and installs it on the server using python setup.py. WMF-specific configuration is handled by our puppet role classes, role::zuul::production and role::zuul::labs. The role classes invoke the zuul module with a set of parameters that fit our context. Because that configuration lives in operations/puppet.git, changes to it must be approved by the operations team.

Zuul has a second configuration that finely tunes how jobs are triggered. Since it is updated by the people in charge of continuous integration, the related configuration files have been extracted to a git repository outside of operations' responsibility: integration/zuul-config. This lets the integration team make changes without bothering ops with configuration changes that are harmless to most WMF servers. A wrong change can still render Zuul inoperative, but the integration team should be able to fix it themselves.

Log files are available under /var/log/zuul/ and are rotated daily. zuul.log should cover most needs; if not, debug.log has more detailed information. The logging configuration is handled by the puppet module zuul, which installs the file as /etc/zuul/logging.conf.

As of October 2012, integration/zuul-config only contains a layout.yaml file. Puppet deploys it simply by cloning the repository under /etc/zuul/wikimedia, and /etc/zuul/zuul.conf refers to it. Whenever a change is merged in integration/zuul-config, someone needs to update the git working directory and reload Zuul. Watch the log file while doing so: Zuul does not validate its configuration, so a typo in layout.yaml can easily make it unstable.

Restart
ssh gallium
sudo -su jenkins /etc/init.d/zuul restart && tail -f /var/log/zuul/zuul.log

Changing the configuration
Clone the integration/zuul-config.git repository:

git clone -o gerrit ssh://gerrit.wikimedia.org:29418/integration/zuul-config.git

As of December 2012, it only holds a single file named layout.yaml. Edit the file, push your commit to Gerrit, then ask for review.

Once your configuration change is merged, it needs to be deployed on the continuous integration server. This can be done by anyone allowed to sudo as the jenkins user:

yourself@host$ sudo -su jenkins
jenkins@host$ cd /etc/zuul/wikimedia
jenkins@host$ git remote update

Make sure that you are only going to deploy your change by reviewing the log between the local master branch and the remote one with git log master..origin/master.

jenkins@host$ git log master..origin/master
commit d39b1ed17971e3448f0a4633326ee0698d763460
Author: Antoine Musso <hashar@free.fr>
Date:   Mon Dec 3 13:07:53 2012 +0100

    (bug 42372) lint translatewiki shell scripts

    Change-Id: Iac4f4689734a09bd51d9cc387594c4b808e17958
jenkins@host$
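The review step above can also be scripted so that it refuses to continue when there is nothing to deploy. The following is a minimal sketch, not an existing tool: the function name pending_changes and its arguments are made up for illustration, and it assumes the remote is called origin, as in the commands on this page.

```shell
#!/bin/sh
# Sketch only: pending_changes is a hypothetical helper, not part of Zuul.

# pending_changes DIR [LOCAL] [REMOTE]
# Prints one line per commit that is in REMOTE but not yet in LOCAL.
# Returns 0 if there is something to deploy, 1 if not, 2 if git failed.
pending_changes() {
    dir=$1
    local_ref=${2:-master}
    remote_ref=${3:-origin/master}
    commits=$(cd "$dir" && git log --oneline "$local_ref..$remote_ref") || return 2
    [ -n "$commits" ] || return 1
    printf '%s\n' "$commits"
}

# Example use on the CI server, after `git remote update`:
#   pending_changes /etc/zuul/wikimedia || echo "nothing to deploy"
```

The one-line-per-commit output makes it easy to see at a glance whether more than your own change is waiting to be deployed.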

Apply the change:

jenkins@host$ git rebase
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.
jenkins@host$

IMPORTANT: In a second terminal, have a look at the Zuul log file:
$ tail -f /var/log/zuul/zuul.log

As the jenkins user, reload the daemon while watching the log file:
jenkins@host$ /etc/init.d/zuul reload
 * Reloading Zuul zuul                    [OK]
jenkins@host$

If you see any error in the log file, revert your change locally (git reset --hard HEAD^) and reload the daemon again.
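If a second terminal is not practical, the same check can be approximated by remembering where the log ended before the reload and scanning only what was appended afterwards. The sketch below is illustrative: log_offset and errors_since are hypothetical helper names, and the search for "ERROR" assumes the logging format in /etc/zuul/logging.conf includes the level name, which this page does not confirm. "Traceback" is the first word Python prints for a stack trace.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Zuul.

# log_offset FILE
# Prints the current size of FILE in bytes (0 if it does not exist yet).
log_offset() {
    if [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# errors_since FILE OFFSET
# Prints the lines appended to FILE after byte OFFSET that look like errors.
# Returns 0 if any were found, 1 otherwise (this is grep's exit status).
# Assumption: the log format includes the level name ("ERROR").
errors_since() {
    tail -c "+$(( $2 + 1 ))" "$1" | grep -E 'ERROR|Traceback'
}

# Example use as the jenkins user:
#   before=$(log_offset /var/log/zuul/zuul.log)
#   /etc/init.d/zuul reload
#   sleep 5
#   if errors_since /var/log/zuul/zuul.log "$before"; then
#       echo "reload logged errors, consider reverting" >&2
#   fi
```

Because the logs are rotated daily, an offset recorded before a rotation is no longer meaningful afterwards, so the offset should be taken immediately before the reload.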

After a few seconds, check that Zuul is running correctly:

$ /etc/init.d/zuul status
 * zuul is running
$

Known issues
Whenever Gerrit restarts or becomes unreachable, Zuul attempts to reconnect to it. It eventually stops trying after a few minutes and never reconnects. The symptom is easy to spot: no more jobs are triggered in Jenkins. The fix is to restart Zuul using the init script as the jenkins user.

That issue has been filed upstream as "Zuul does not reconnect to Gerrit properly": https://bugs.launchpad.net/zuul/+bug/1097307
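Until the upstream bug is fixed, a crude way to notice the problem early is to check whether the log has gone quiet. The sketch below rests on an assumption this page does not confirm: that a connected Zuul writes to its log at least every few minutes. During genuinely quiet periods it will raise false alarms, so it only reports and leaves the restart to a human. The function name log_is_stale is made up, and find's -mmin test is a GNU and BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch only: log_is_stale is a hypothetical helper, not part of Zuul.

# log_is_stale FILE MINUTES
# Succeeds (returns 0) if FILE was last modified more than MINUTES minutes ago.
# A missing FILE is reported as not stale, because find prints nothing for it.
log_is_stale() {
    [ -n "$(find "$1" -mmin "+$2" 2>/dev/null)" ]
}

# Example use, for instance from cron:
#   if log_is_stale /var/log/zuul/debug.log 30; then
#       echo "Zuul log idle for 30+ minutes, check the Gerrit connection" >&2
#   fi
```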