Wikimedia Release Engineering Team/Checkin archive/20150901

= 2015-09-01 =

Team Business

 * FYI
 ** Aaron's multi-DC work in public RFC meeting, Wednesday Sept 2, 21:00 UTC


 * Q2 Goals
 ** https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Year
 ** https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Q2

Deploy tooling
Objective: Reduce number of deploy tools from 3 to 2
Key result: Migrate all Service team owned services and MW deploys to scap3 - task T109926

We don't have a KPI to track this one, but success is easy to measure ("did we retire Trebuchet and Ansible?").

Open questions:
1) Still doable (retirement of Trebuchet and Ansible by December)?
2) Request: We probably want specific subtasks of that one for the individual Service Team owned services
3) Note: The current "scap3" sprint does not need to be tracked by this task (T109926) because it's our Q1 work.

Migrate Gerrit to Differential
Objective: Retire Gerrit in favor of Phabricator (Differential)
Key results: .... tbd ...

Open questions:
1) We need to own the Gerrit-Migration board and make it reflect reality (what actually needs to happen): https://phabricator.wikimedia.org/project/board/9/ I presume that will take a conversation between at least Chad, Mukunda, and myself (and Quim? Andre?).
2) I've filed a meta-task to track this work (the creation of the plan): https://phabricator.wikimedia.org/T110623

TODO: Meeting of Roadmap doom: Greg (optional), Chad, Mukunda, Antoine?
TODO: Greg to make the business case stuff
 * get rid of all the various glue bots

NB: Keep in mind that we've allocated ourselves two quarters to complete this work.

CI Scaling
Objective: Reduce CI wait time
Key result: CI cluster responds to spike in queued builds by starting and registering additional Jenkins slaves

We can use the "Jenkins/Zuul queue wait" KPI to track the effectiveness of this work: https://phabricator.wikimedia.org/T108750.
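
To make the key result concrete, here is a minimal sketch of the kind of scaling decision involved. It is illustrative only and is not how Nodepool actually decides; the function name, its parameters, and the one-build-per-slave default are all assumptions made for the example.

```python
import math


def slaves_to_start(queued_builds, idle_slaves, running_slaves,
                    max_slaves, builds_per_slave=1):
    """Return how many additional slaves to start (hypothetical policy).

    running_slaves is the total number of registered slaves, busy or idle.
    """
    # Builds that the currently idle slaves cannot absorb right away.
    backlog = max(0, queued_builds - idle_slaves * builds_per_slave)
    # Slaves needed to take the backlog, rounded up.
    wanted = math.ceil(backlog / builds_per_slave)
    # Never grow past the configured size of the pool.
    headroom = max(0, max_slaves - running_slaves)
    return min(wanted, headroom)
```

With 10 queued builds, 2 idle slaves out of 5 running, and a pool limit of 20, this policy would ask for 8 more slaves; with a pool limit of 8 it would ask for only 3.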

Open questions:
1) Is this task still the right task to judge the completion of this work? https://phabricator.wikimedia.org/T47499
2) Are the blockers of that task still accurate? IOW: when all of those blockers are completed we can consider the work done (whether or not it makes a change to the KPI)?
3) Doable in 3 months?
 * or look at the Phabricator board https://phabricator.wikimedia.org/tag/continuous-integration-scaling/

Zuul gate time KPI attempt: https://grafana.wikimedia.org/#/dashboard/db/releng-zuul

Ongoing:

 * Jessie image https://phabricator.wikimedia.org/T110735 (give https://gerrit.wikimedia.org/r/#/c/234975/ a try?)
 * Current process is lame:
 ** build on a labs instance (integration-dev)
 ** copy .qcow2 image to /var/www/html
 ** curl from labnodepool
 ** sudo -H -u nodepool -s
 ** cd && . .profile
 ** openstack image create .....
 * bump Nodepool to support python-statsd 3.x https://phabricator.wikimedia.org/T107268
 * Create a MySQL DB (Jaime on it) https://phabricator.wikimedia.org/T110693

Todo:
 * Refactor MediaWiki tests. Split unit tests into their own jobs and speed up the lame 'integration' tests (10 minutes with Zend).
 * DOCUMENTATION (T2001)
 * figure out a solution to cache npm/pip/composer/rubygems modules (tarballs and compiled)

Q1 goal: Nodepool infra build, at least 1 production-grade job using it
Q2: CI cluster responds to spike in queued builds by starting and registering additional Jenkins slaves (and migrate more jobs)
Q3: migrate the rest / phase out legacy

#together

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Skill_matrix
 * if you want to know how to say #together in baby sign language :) http://www.babysignlanguage.com/dictionary/t/together/

Scrum of Scrums

 * https://phabricator.wikimedia.org/project/board/64/
 * Blocked on us: https://phabricator.wikimedia.org/maniphest/?statuses=open%28%29&projects=PHID-PROJ-arpazvuktn2l647rb6us#R

Isolated CI instances / CI Scaling

 * https://phabricator.wikimedia.org/tag/continuous-integration/board/?order=priority
 * Quarterly Priority: Disposable VMs - https://phabricator.wikimedia.org/T47499

Beta Cluster

 * https://phabricator.wikimedia.org/project/board/497/?order=priority

Other Work

 * Željko is blocked on https://phabricator.wikimedia.org/T102020, waiting for somebody who knows which folders in operations/puppet contain upstream code
 ** put on SoS (done)

Vacations/Confs/etc
Please add your time off to your gcal, **Phabricator**, and ADP, as appropriate


 * Chad - Sept 7-11 (last-minute vacation, mostly reachable by e-mail); Sept 18th & 28th (music festivals/shows)
 * Željko planned to be offline on Wednesday September 2, but that has a 1% chance of happening (sick kid)
 * Monday Sept 7th - US Holiday (Labor Day)
 * Tyler: Sept 8th, in the mountains
 * Mukunda: Sept 4th (this Friday), taking the afternoon off but can attend the sprint meeting
 * Andrew out this Friday