Continuous integration/Zuul

Zuul is a python daemon which acts as a gateway between Gerrit and Jenkins. It listens to Gerrit stream-events feed and trigger Jenkins jobs according to a specification written in yaml.

Architecture overview
''Settings described below comes mostly from /etc/zuul/zuul.conf which is maintained in puppet. They might not be up-to-date on this wiki page''.

Zuul maintains an ssh connection with the Gerrit master. It connects as the user jenkins-bot and issue the Gerrit command stream-events which provides a JSON feed of anything happening in Gerrit which can be seen by the jenkins-bot user.

Whenever a new project is detected, Zuul clones a non-bare repository from Gerrit master under the base path defined by git_dir in zuul.conf. As of September 2013, that is /srv/ssd/zuul/git. Zuul uses non-bare repositories to merge the received patchsets against the tip of the branch they are made against, and produce a merge commit which is then passed to Jenkins to fetch the change.

The local merge commits are not available publicly nor in Gerrit (push_change_refs is set to false). Nonetheless, the Zuul bare repositories are made available to Wikimedia internal network over the git protocol on port 9418. On the server one could clone the mediawiki/core repository using: git clone git://localhost:9418/mediawiki/core/.

Note that the continuous integration server also receives Git repositories under /var/lib/git/</tt>. Thoses are bare repositories which are not suitable for testing patch sets via Zuul. Their main use is the ability to take snapshots via git archive</tt> which is not supported by Gerrit 2.6.

Zuul would trigger jobs in Jenkins according to a specification file (available in integration/zuul-config.git</tt>). Zuul uses the Jenkins API to trigger jobs with a set of parameters such as the project and commit SHA1 to run the job with. Zuul will then wait for (or maybe poll) Jenkins to report back the build status and will report back the tests status to Gerrit. Whenever Jenkins is not reacheable or a job got deleted while running, the build result will be considered lost and Zuul will report the status of the build to be LOST.

Change configuration
Clone the integration/zuul-config.git</tt> repository: git clone -o gerrit ssh://gerrit.wikimedia.org:29418/integration/zuul-config.git

As of December 2012, this only hold a single file named layout.yaml</tt>. Edit the file and push your commit to Gerrit then ask for review.

Deploy configuration
Once your configuration change is merged it needs to be deployed on the continuous integration server. This can be done by someone allowed to sudo as jenkins user.

yourself@host$ sudo -su jenkins jenkins@host$ cd /etc/zuul/wikimedia jenkins@host$ git remote update

Make sure that you are only going to deploy your change by reviewing the log between the local master branch and the remote one:.

jenkins$ git log -p HEAD..origin commit d37a7a3fb7325a34f0b955328c2ab48895e2c0bd Author: Timo Tijhof  Date:  Thu Mar 21 22:21:54 2013 +0100 Make Parsoid jobs voting. They're now fully replacing the old non-JJB managed versions in   Jenkins (which have been disabled). Change-Id: I6b7626acd5be6d2adaca67d3139511454105b2ee diff --git a/layout.yaml b/layout.yaml index 045725c..4bd920a 100644 --- a/layout.yaml +++ b/layout.yaml @@ -348,16 +348,6 @@ jobs: - name: ^mwext-TranslationNotifications-testextensions.* voting: false - # New parsoid tests, not sure if they're working yet, but we want to run it to find out. - - name: ^parsoid-testCommit$ -   voting: false - - name: ^parsoid-server-sanity-check$ -   voting: false - - name: ^parsoid-parse-tool-check$ -   voting: false - - name: ^parsoid-roundtrip-test-check$ -   voting: false - projects: jenkins@host$ Apply the change:
 * 1) Register the Gerrit project name, apply them pipelines that in turn trigger
 * 2) a set of jobs.

jenkins@host$ git rebase First, rewinding head to replay your work on top of it... Fast-forwarded master to refs/remotes/origin/master. jenkins@host$

IMPORTANT: In a second terminal have a look at the Zuul log file: $ tail -f -n100 /var/log/zuul/zuul.log

As the Jenkins user reload the daemon while watching the log file. jenkins@host$ service zuul reload * Reloading Zuul zuul                    [OK] jenkins@host$

If you see any error in the log file, you should revert your change locally ( git reset --hard HEAD^ ) and reload the daemon again.

After a few seconds, check zuul is correctly running:

$ service zuul status * zuul is running $

Restart
Note: Restart is not needed when just deploying a configuration change. Zuul can reread configuration from disk while running. This way no Gerrit events are missed. As such, please do not take restarting Zuul lightly, as it means any Gerrit events during that time will be missed and need to be manually re-triggered.

ssh gallium sudo -su jenkins service zuul restart && tail -f -n100 /var/log/zuul/zuul.log

WMF Setup
Zuul source code is maintained by OpenStack, the WMF maintains a copy of their git repository in its own Gerrit installation under the project integration/zuul.git. Integration team manually update our master based on OpenStack master.

Installation is handled by the puppet module zuul which takes care of cloning the source code from the WMF git repository and install it on the server using python setup.py</tt>. WMF specific configuration is handled via our puppet role classes: role::zuul::production</tt> and role::zuul::labs</tt>. The role classes will invoke the zuul module using a set of parameter that fit our context. Changes to that configuration must be approved by the operations team (it is in operations/puppet.git</tt>).

Zuul has another configuration to finely tune how to trigger jobs. Since it is going to be updated by people in charge of continuous integration, the related configuration files has been extracted to a git repository out of operations responsibility : integration/zuul-config</tt>. This let integration people to do their change without bothering ops with configuration changes which are harmless to most WMF servers. A wrong change can still render Zuul non operant though but the integration people should be able to fix it by themselves.

Log files are available under /var/log/zuul/</tt> and are rotated daily. zuul.log</tt> should cover most needs, if not the debug.log</tt> has extended informations. The logging configuration is handled via the puppet module zuul which copy the file in <tt>/etc/zuul/logging.conf</tt>.

As of October 2012, <tt>integration/zuul-config</tt> only contains a <tt>layout.yaml</tt> file. It is deployed by puppet simply by cloning the repository under <tt>/etc/zuul/wikimedia</tt>. The <tt>/etc/zuul/zuul.conf</tt> refers to it. Whenever a change is merged in integration/zuul-config, one needs to update the git working directory and reload zuul. Watch out the log file, since Zuul does not validate its configuration, it can well be made unstable whenever a typo appear in the layout.yaml file.

upgrading
'''work in progress ... Antoine &#34;hashar&#34; Musso (talk) Nov 2013'''

Python dependencies MUST be available as packages and installed via puppet. You will want to test out the dependencies, if anything is downloaded it will need to be packaged.

Before checking the dependencies, we will add the distribution packages to python path list:

export PYTHONPATH=/usr/lib/python2.7/dist-packages

Then create a virtual environment and attempt an install with download disabled:

$ virtualenv venv $ venv/bin/python setup.py easy_install --allow-hosts=None.

If anything is missing, setup.py will issue a stacktrace or at least exit code 1.

Change the <tt>master</tt> branch to point to the desired commit. On gallium as root:

cd /usr/local/src/zuul git remote update git log --oneline --decorate --graph master..origin/master

If happy with the changes, continue:

git rebase HTTP_PROXY=. HTTPS_PROXY=. python setup.py install

If easy_install attempts to download a python module, it will bails out. You will have to rollback master to whatever previous commit and package the missing python module.

MAKE SURE the layout still validates:

zuul-server -c /etc/zuul/zuul.conf -l /etc/zuul/wikimedia/layout.yaml -t

Any stack trace there mean Zuul will not be able to reload the configuration. Rollback.

Known issues
Whenever Gerrit restart or ends up being unreacheable, Zuul will attempt to reconnect to Gerrit. It eventually stop trying after a few minutes and never reconnect again. The symptoms are easy: no more jobs are trigger in Jenkins. The fix is to restart Zuul using the init script as the jenkins user.

That issue has been filled upstream https://bugs.launchpad.net/zuul/+bug/1097307  "Zuul does not reconnect to Gerrit properly"