Wikimedia Release Engineering Team/Quarterly review, November 2013

Date: TBD

Time: TBD

Slides: TBC

Notes: etherpad

Topics: Deploy process/pipeline, release process, bug fixing (the how of prioritization), code review, code management, security deploy/release, automation prioritization

Who:
 * Leads: Greg G, Chris M
 * Virtual team: Greg G, Chris M, Antoine, Sam, Chris S, Chad, Zeljko, Michelle G, Andre, Ariel
 * Other review participants (invited): Robla, Sumana, Quim, Maryana, James F, Ryan Lane, Ken, Terry, Tomasz, Alolita

Big picture
Release Engineering and QA are where our efforts in Platform can be amplified. When we do things well, we see more responsive development and higher-quality code. That is our focus.

What we want to accomplish:
 * More appreciation of, response to, and creation of tests in development
 * Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
 * Reduce the time between code being finished and being deployed, while finding issues earlier and with more certainty
 * Provide information about software quality in a way that informs release decisions
 * Help WMF Engineering learn and adapt from experience

...All in an effort to pave the path to a more reliable continuous deployment environment.

Team roles
Many people outside of the virtual team play an important role in releases, but this review will focus on the work of the following people in the following roles:
 * Release engineering: Greg G, Sam, Chris S (security)
 * QA and Test Automation: Chris M, Zeljko, Michelle G
 * Bug escalation: Andre, Greg G., Chris M, Chris S (security)
 * Beta cluster development/maintenance:  Antoine, Ariel, Sam
 * Development tools (e.g. Gerrit, Jenkins): Chad, Antoine

Goals from last quarter

 * Better align QA effort with high-profile features
   * Apply model of Language/Mobile embedded QA to a new feature team (specifically VisualEditor)
   * Include more user-contributed code testing (e.g. Gadgets)
   * Increase capacity through community training for browser tests
 * Improve our deployment process
   * automate as much as possible
   * improve monitoring
     * see DevOps Sprint 2013
   * improve tooling (e.g. atomic updates/rollbacks and cache invalidation)
     * see DevOps Sprint 2013
 * Take the Beta Cluster to the next level
   * monitoring of fatals, errors, performance
   * add more automated tests, e.g. for the API
   * feed the experience and knowledge gained from Beta Cluster automation back into production automation

Measures of success from last quarter

 * Successfully integrate QA support in one more feature team (as defined by: more regular/predictable testing and more test coverage)
 * Automation now provides the bulk of what is needed to deploy code
 * The Beta Cluster has monitoring equal to that of production (just without paging Ops). https://bugzilla.wikimedia.org/show_bug.cgi?id=51497
 * Atomicity in deploys (see DevOps Sprint 2013)

What we've done

 * Proper support for all extensions in beta cluster https://bugzilla.wikimedia.org/show_bug.cgi?id=49846 (from the "In progress" list in the last quarterly review)
 * Better align QA effort with high-profile features
   * Apply model of Language/Mobile embedded QA to a new feature team (specifically VisualEditor)
   * Increase capacity through community training for browser tests
 * Take the Beta Cluster to the next level
   * monitoring of fatals, errors, performance
   * support for Parsoid
   * support for pre-release extensions, e.g. Flow, with full i18n messages
 * Improve our deployment process
   * improve tooling (e.g. atomic updates/rollbacks and cache invalidation)
     * see DevOps Sprint 2013

Still in progress

 * Break browser tests out of the catch-all /qa/browsertests and into per-feature builds, following the Mobile model (CirrusSearch, ULS, VE). https://bugzilla.wikimedia.org/show_bug.cgi?id=52890 https://bugzilla.wikimedia.org/show_bug.cgi?id=52120

Goals for the next quarter
TODO

Stretch activities as time allows

 * Provide hermetic test environments for developers, testers, and the community. Vagrant shows the way.
 * Use Vagrant for targeted tests within the WMF Jenkins workflow
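
The hermetic-environment idea above could be sketched as a minimal Vagrantfile. This is an illustrative assumption, not the actual MediaWiki-Vagrant configuration: the base box name, forwarded port, and provisioning packages are placeholders.

```ruby
# Minimal Vagrantfile sketch for a throwaway, reproducible test environment.
# Box name, port, and packages are hypothetical, not the real
# MediaWiki-Vagrant setup.
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"  # assumed Ubuntu 12.04 base box

  # Expose the guest webserver on the host for manual or browser testing.
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Provision a basic LAMP stack; a real setup would instead run the
  # project's own provisioning scripts.
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y apache2 php5 mysql-server
  SHELL
end
```

Because each `vagrant up` builds the environment from the same box and provisioning steps, every developer or tester gets an identical, disposable sandbox, which is what makes the tests hermetic.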