Wikimedia Release Engineering Team/Checkin archive/20160307

= 2016-03-07 =

Vacations/Important dates
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
 * March 11th - draft Q4 (April 1st - June 30th) goals due
 * March 11th - Željko - conference
 * March 14th - Antoine can't make it to weekly team meeting
 * March 25th Friday - Tyler
 * March 28th - Antoine && Željko - local holiday (Easter Monday)
 * March 31st - April 3rd : Hackathon in Israel
 * April 1st - Q4 goals published
 * April - Antoine: holidays one of the two first weeks
 * May 6th Friday - Antoine
 * May 9-Mid June-ish?: Greg - paternity leave - exact dates TBD
 * May 17-(?): Dan - paternity leave :D
 * Late May - draft Q1 (July 1st - Sept 30th) due
 * May 30: US HOLIDAY - Memorial Day
 * June 15-24: Chad - Vegas/EDC
 * June 22nd - 28th : Wikimania in Italy
 * July 1st - Q1 goals published
 * July 1st – Annual Plan, Budget, Risks Document and FAQ are posted
 * August: Antoine - France holiday - because French. :)
 * January 2017 : Dev Summit + All Hands (presumably)

Train conductor
Week of ...
 * Mar 7: Mukunda
 * Mar 14: Mukunda
 * Mar 21: Tyler - Code freeze, due to the eqiad -> codfw switch over (announcement:
 * So we need to make sure Mar 14th week is super stable.
 * Mar 28: Tyler

Scrum of Scrums representative
(bad time for EU folks) Dan, Tyler, Chad, Mukunda
Week of ...
 * Mar 7: Chad
 * Mar 14: Chad
 * Mar 21: Mukunda

CI point person

 * reassess later

Actions from last meeting

 * TODO - No One Yet: investigate carbon aggregation of stats >1 month old behavior
 * ACTION: Antoine to create a task
 * Overdue

New vs Maint time spent

 * Q3: https://docs.google.com/spreadsheets/d/1LJDc5W2Mlpzc0L1i7WyPwWU8AgWMn0fXRuNEEmg1EMU/edit#gid=0

Scrum of Scrums

 * https://phabricator.wikimedia.org/project/board/64/
 * Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

The only new item was from Chris Steipp, the TOC issue: https://phabricator.wikimedia.org/T124356

For this week:
 * scap adoption shout out
 * link to the adoption milestone https://phabricator.wikimedia.org/project/view/1824/

Annual Planning

 * Spreadsheet (team only) - https://docs.google.com/spreadsheets/d/1GBokh9zeO5vflAAZLjMuagV4FeFQHCFrApjs_KXNZ7o/edit#gid=0
 * Planning worksheet: https://docs.google.com/spreadsheets/d/1ZsB0RCoZD3a6qKsX-qkCpA3HK81mNrZYI3GXeiuzzI0/edit#gid=0

Q4 Goals

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Q4

What we said for next fiscal: https://docs.google.com/spreadsheets/d/1ZsB0RCoZD3a6qKsX-qkCpA3HK81mNrZYI3GXeiuzzI0/edit#gid=0

Phabricator maintenance

Scap decrease in time

Differential increase
 * things we have: debian packages
 * things we need:
 * MW Core (need to define CI reality and actually integrate CI into Differential)
 * Ops Puppet

Browser test creation change (the matrix building)
 * defining and enforcing test ownership responsibilities

Not pulling from your repo (including MW Core) unless your tests are green, period. Want it to be deployed? Fix your tests. You own your code and tests.
 * First pass is to only block on what we already block on (ie: voting tests in Jenkins)
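
As a rough sketch of the "first pass" rule above, only failures of voting jobs would hold back a deploy. The job names and the `(job_name, is_voting, passed)` tuple shape are assumptions made for illustration; this is not how Jenkins or Zuul actually report results.

```python
def blocking_failures(job_results):
    """Return the names of failed jobs that should block a deploy.

    `job_results` is an iterable of (job_name, is_voting, passed) tuples.
    Only voting jobs block; failed non-voting jobs are ignored for now.
    """
    return [name for name, is_voting, passed in job_results
            if is_voting and not passed]


# Hypothetical job names and results, for illustration only.
results = [
    ("unit-tests", True, True),
    ("browser-tests-voting", True, False),
    ("browser-tests-experimental", False, False),
]
blocked_by = blocking_failures(results)
print("deploy blocked by:" if blocked_by else "ok to deploy", blocked_by)
```

Widening the rule later would mean flipping more jobs to voting rather than changing the check itself.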

TODO: Chad or Tyler to send the "no more Trebuchet for new services, kthx" email to Ops

TODO: make a timeline

 tin$ find /srv/deployment -maxdepth 2 -wholename '*/*/*/*/*'|wc -l
 58
 tin$

Should be a list of everything: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/role/deployment.yaml#L1
Which is 40 repos.
Grepping operations/puppet for 'provider.*trebuchet' gives 30 (the truth is somewhere in between?)
 * 1) get a list of repos from ops/puppet
 * 2) order by last deploy change, descending
 * 3) schedule x repos per week over the quarter
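
The three steps above could be sketched roughly like this. Everything here is illustrative: the repo names and last-deploy dates are made up, `schedule_migrations` is a hypothetical helper, and a 13-week quarter is assumed.

```python
from datetime import date
from math import ceil


def schedule_migrations(last_deploy, weeks=13):
    """Spread repos over at most `weeks` weeks, most recently deployed first.

    `last_deploy` maps a repo name to the date of its last deploy
    (hypothetical input; the real list would come from ops/puppet).
    """
    # Step 2: order by last deploy, descending (ties broken by name).
    ordered = sorted(last_deploy,
                     key=lambda r: (-last_deploy[r].toordinal(), r))
    if not ordered:
        return []
    # Step 3: x repos per week, so that everything fits in the quarter.
    per_week = ceil(len(ordered) / weeks)
    return [ordered[i:i + per_week]
            for i in range(0, len(ordered), per_week)]


# Made-up example data, not the real repo list.
example = {
    "repo/a": date(2016, 3, 1),
    "repo/b": date(2015, 11, 20),
    "repo/c": date(2016, 2, 15),
    "repo/d": date(2014, 6, 3),
    "repo/e": date(2016, 1, 9),
}
plan = schedule_migrations(example, weeks=3)
for week, repos in enumerate(plan, start=1):
    print(f"week {week}: {', '.join(repos)}")
```

If the real count turns out to be around 40 repos, this rounding gives x = 4 repos per week over a 13-week quarter.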

browser tests discussion
 * when things start failing there are long gaps before diagnosis and then fixing
 * people assume it's just an issue with CI or the tests themselves
 * how to put a little bit of pressure on people to diagnose/fix failed tests
 * integrating diagnosis of tests before the train would put pressure on people
 * if we do this we need a way to correlate failures and changes in code
 * if we had a deploy dashboard (when it started, the commits in between, and the test status)
 * we could see whether we're going to be in a good place before the train
 * can offer the pre-merge voting browser test job
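
One way the "correlate failures and changes in code" idea could work is to list the commits merged between the last green run and the first red run after it. This is a minimal sketch under assumptions: the `BrowserTestRun` and `Commit` records are hypothetical, timestamps are plain integers, and no real Jenkins or Gerrit API is used.

```python
from dataclasses import dataclass


@dataclass
class BrowserTestRun:
    timestamp: int  # when the test job ran (e.g. epoch seconds)
    green: bool     # True if the run passed


@dataclass
class Commit:
    timestamp: int  # when the change was merged
    sha: str


def suspect_commits(runs, commits):
    """Commits merged after the last green run, up to the first red run
    that followed it. Returns [] if the most recent run is green."""
    last_green = None
    first_red = None
    for run in sorted(runs, key=lambda r: r.timestamp):
        if run.green:
            # A green run resets the window.
            last_green, first_red = run.timestamp, None
        elif first_red is None:
            first_red = run.timestamp
    if first_red is None:
        return []
    lower = last_green if last_green is not None else float("-inf")
    return [c.sha for c in sorted(commits, key=lambda c: c.timestamp)
            if lower < c.timestamp <= first_red]


# Made-up data for illustration.
runs = [BrowserTestRun(100, True), BrowserTestRun(200, False),
        BrowserTestRun(300, False)]
commits = [Commit(50, "aaa"), Commit(150, "bbb"),
           Commit(180, "ccc"), Commit(250, "ddd")]
print(suspect_commits(runs, commits))
```

A dashboard built on something like this would show a short list of candidate changes per failure instead of leaving people to assume CI itself is at fault.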


 * give warning of 2 weeks

Sun  Mon     Tues     Wed     Thur      Fri    Sat
g1         g2         g0

Sun  Mon     Tues     Wed     Thur      Fri    Sat
g0                   g1          g2

Antoine: deploy to G0, run all browser tests against them. If any is red: DEPLOY FREEZE

Reduce CI Wait time

 * KPI: https://grafana.wikimedia.org/dashboard/db/releng-kpis?panelId=2&fullscreen
 * Migrate remaining CI jobs to Nodepool -
 * php composer (Zend and HHVM) -
 * as many miscellaneous jobs as possible -
 * Migrate Jenkins to Jessie -

Antoine:
 * Lots of reviews
 * Lurking on the daily browser tests refactoring
 * Nodepool had corrupted files
 * Hiera for Nodepool instances is badly configured
 * Nodepool upgrade on March 7th at 20:00 UTC to speed up deletion (faster pool replenishment, might grow pool as well)

Consolidate deploy tools

 * Migrate MediaWiki to scap3 -
 * Q2 Quarterly Goal hold over: Migrate all Service team owned services and MW deploys to scap3 - https://phabricator.wikimedia.org/T109926

Differential Migration

 * https://etherpad.wikimedia.org/p/diffuerential-weekly
 * Integrate Differential with our Continuous Integration infrastructure -
 * build debian packages from differential: https://integration.wikimedia.org/ci/job/beta-build-deb/
 * Shepherd the RFC -
 * Garner early adopter projects (goal: 1 project per WMF "team")