Wikimedia Release Engineering Team/Checkin archive/20151207

= 2015-12-07 =

Vacations/Confs/etc
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
 * Dec 4th: Greg - disconnected, leaving Thursday evening, returning Sunday :)
 * Dec 14-Jan 1: Greg - vacation (3 weeks, will be checking email)
 * Dec 22-29: Chad - Christmas (will be reachable by e-mail, will have laptop in case of emergencies)
 * Dec 23-25: Tyler - hopefully (probably) Christmas in Kansas!
 * Dec 24-Jan 3: Dan - Holidays
 * Dec 24-30: Antoine - Holidays (bringing laptop - ring phone as needed)
 * Dec 24: mukunda - holiday
 * Dec 25: US HOLIDAY - Christmas Day
 * Dec 28: mukunda - holiday
 * Dec 31: mukunda - holiday
 * Jan 1: US HOLIDAY - New Year's Day
 * Jan 4 - 8: WikiDev16 + All Hands
 * Jan 16-18: Chad - another music festival
 * Jan 18: US HOLIDAY - Martin Luther King Day
 * Feb 15: US HOLIDAY - Presidents' Day
 * May 17-(?): Dan - paternity leave :D
 * PO Box for pastries? - Antoine
 * May 30: US HOLIDAY - Memorial Day
 * June-ish: Chad - EDC
 * August: France holiday - because French. :)

Actions from last meeting

 * TODO - Antoine + Mukunda: should sit down and talk about CI/Harbormaster/Nodepool
 * Mukunda just needs to find time to finally test Harbormaster triggering Jenkins jobs.
 * This works fine; we still need to get Jenkins to report back to Harbormaster.
 * TODO - Greg: look into reasons for spikes of info-level log spam
 * RunJobs logging changes? See engineering-l/ops-l?
 * yes....
 * TODO - No One Yet: investigate carbon's aggregation behavior for stats older than 1 month
 * ACTION: Antoine to create a task

WikiDev16

 * Code Review RFC - https://phabricator.wikimedia.org/T114320
 * Scap3 - https://phabricator.wikimedia.org/T114045

Q3 Goals

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Q3


 * Goals timeline:
 * December 3: Group goal scoped and drafted on mediawiki.org for Technology team.
 * December 10: Group goal + all individual team *drafts completed* on mediawiki.org; discuss at Infra+Tech group and identify dependencies.
 * December 17: individual team goals + group goal *finalized* on mediawiki.org; discuss at Monthly Eng Staff.
 * A "Technology Group Goal"
 * https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q3_Goals#Group_goal
 * TODO: Greg to add our items for this to our own goals

KPIs

 * https://grafana.wikimedia.org/dashboard/db/releng-kpis

New vs Maint time spent

 * https://docs.google.com/a/wikimedia.org/spreadsheets/d/1FI90AefwdLHGzVVdrLS6AxcTcJtLFyX0aQTQfyc88s4/edit

#together

 * Team workboard: to triage: https://phabricator.wikimedia.org/project/board/20/query/TRiVy4zOMdR./
 * Team workboard: only-in-#releng(ish): https://phabricator.wikimedia.org/project/sprint/board/20/query/g2T5.QSLJVRQ/
 * Rotating Deploys
 * Tyler proposes rotating the responsibility for cutting the branch / running the deployment train.
 * Tyler to cut the branch and deploy on Tuesday 12/08 (group0: 1.27.0-wmf.7 -> 1.27.0-wmf.8)

Scrum of Scrums

 * https://phabricator.wikimedia.org/project/board/64/
 * Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

CI Scaling

 * https://phabricator.wikimedia.org/project/board/1010/
 * Quarterly Goal: "CI cluster responds to spike in queued builds by starting and registering additional jenkins slaves" - https://phabricator.wikimedia.org/T111106


 * Jobs cleanup
 * Nova scheduler went wild on Sunday around midnight UTC, and Nodepool could no longer spawn instances. Fixed by ops (restarted some OpenStack processes).
 * Some small Nodepool and Zuul upgrades coming in (speedup related)
 * Jenkins security upgrade

Beta Cluster

 * https://phabricator.wikimedia.org/project/board/497/?order=priority


 * Team: might have to watch slow query logs / strict errors etc
 * labs lost DNS aliasing on Monday for a few hours
 * EventBus <-- new extension
 * DB outage followup
 * Potentially move the beta cluster DB to dedicated hardware?
 * Set up a beta-cluster-specific Tendril?
 * ACTION: Antoine to check in with Jaime on what to do next.
 * Goal: monitor slow queries before they land in prod

Deployment Cabal

 * Main: https://phabricator.wikimedia.org/project/board/349/
 * Scap3: https://phabricator.wikimedia.org/project/board/1449/
 * Quarterly Goal: "Migrate all Service team owned services and MW deploys to scap3" - https://phabricator.wikimedia.org/T109926


 * (Tyler) Deployed AQS on beta
 * Scap worked flawlessly
 * Mathoid to follow, straightforward
 * the simpler 'oids have a shared puppet module
 * TODO: Jenkins job to wrap around scap3 deploy on beta

Diff[usion|erential] migration

 * https://etherpad.wikimedia.org/p/diffuerential-weekly
 * Gitblit-Deprecate: https://phabricator.wikimedia.org/project/board/46/
 * Quarterly Goal: https://phabricator.wikimedia.org/T111465
 * Gerrit-Migration: https://phabricator.wikimedia.org/project/board/9/


 * Redirect of Gerrit project name -> Phabricator canonical URL with callsign
 * A couple of patches by Paladox are pending; they highlight some issues with Phabricator.

Other Work

 * Gerrit 2.12 - https://phabricator.wikimedia.org/T70271
 * Antoine: the current web UI is gone. Not sure whether it is worth upgrading.
 * It is. The UI isn't that different and we get stability/security fixes. We're not all going to be using Phabricator in less time than it'll take to upgrade.
 * I am all for keeping up with upstream. I am worried about the community drama related to the new UI :-((( I (Antoine) don't really care about the UI :D
 * #nodrama. Seriously, people will live with it if they don't like it. It's freaking gerrit, their UI has always sucked.
 * Makes sense :-}
 * Security is going to be my go-to answer if anyone complains. 2.8.x is long past EOL and we're probably vulnerable to more than one issue.
 * ^ that, always the safe answer