Wikimedia Release Engineering Team/Runbooks

This is a list of runbooks for the Wikimedia Release Engineering Team: step-by-step instructions for routine tasks and, especially, for when things go wrong.

Gerrit

 * Monitoring/Metrics
 * Take a thread dump
 * GitHub replicas

Configuration

 * Add/modify CI for a new/existing repo (Zuul)
 * Add/modify a new type of CI job (Jenkins Job Builder)
 * Add/modify a new Docker environment for CI jobs (Dockerfiles)
 * Update doc.wikimedia.org static content (docroot)
 * Replay a Gerrit CI event into Zuul to re-trigger jobs

Infrastructure

 * Clear part of Jenkins when jobs are deadlocked ("waiting on executors")
 * Restart Zuul (and drop all running jobs!)
 * Agent remote call failed
 * Upgrade Jenkins

Phabricator

 * Phabricator Administrative Commands

Deployments Schedule

 * Generating the wikitech:Deployments Page
 * Generating the train blocking tasks on Phab