Wikimedia Release Engineering Team/Checkin archive/20160307

From mediawiki.org

2016-03-07

Vacations/Important dates

How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

  • March 11th - draft Q4 (April 1st - June 30th) goals due
  • March 11th - Željko - conference
  • March 14th - Antoine can't make it to weekly team meeting
  • March 25th Friday - Tyler
  • March 28th - Antoine && Željko - local holiday (Easter Monday)
  • March 31st - April 3rd : Hackathon in Israel
  • April 1st - Q4 goals published
  • April - Antoine: holiday during one of the first two weeks
  • May 6th Friday - Antoine
  • May 9-Mid June-ish?: Greg - paternity leave - exact dates TBD
  • May 17-(?): Dan - paternity leave :D
  • Late May - draft Q1 (July 1st - Sept 30th) due
  • May 30: US HOLIDAY - Memorial Day
  • June 15-24: Chad - Vegas/EDC
  • June 22nd - 28th : Wikimania in Italy
  • July 1st - Q1 goals published
  • July 1st – Annual Plan, Budget, Risks Document and FAQ are posted
  • August: Antoine - France holiday - because French. :)
  • January 2017 : Dev Summit + All Hands (presumably)

Team Business

Rotating positions

Train conductor

Week of ...

  • Mar 7: Mukunda
  • Mar 14: Mukunda
  • Mar 21: Tyler - Code freeze, due to the eqiad -> codfw switch over (announcement:
    • So we need to make sure Mar 14th week is super stable.
  • Mar 28: Tyler

Scrum of Scrums representative

(bad time for EU folks) Dan, Tyler, Chad, Mukunda

Week of ...

  • Mar 7: Chad
  • Mar 14: Chad
  • Mar 21: Mukunda


CI point person

  • reassess later


Actions from last meeting

  • TODO - No One Yet: investigate how carbon aggregates stats that are more than 1 month old
    • ACTION: Antoine to create a task
      • Overdue

New vs Maint time spent


Scrum of Scrums

https://phabricator.wikimedia.org/project/board/64/
Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R

The only new item was from Chris Steipp, the TOC issue: https://phabricator.wikimedia.org/T124356

For this week:

Other Team Business

Annual Planning

Spreadsheet (team only) - https://docs.google.com/spreadsheets/d/1GBokh9zeO5vflAAZLjMuagV4FeFQHCFrApjs_KXNZ7o/edit#gid=0
Planning worksheet: https://docs.google.com/spreadsheets/d/1ZsB0RCoZD3a6qKsX-qkCpA3HK81mNrZYI3GXeiuzzI0/edit#gid=0

Q4 Goals

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201516Q4

What we said for next fiscal: https://docs.google.com/spreadsheets/d/1ZsB0RCoZD3a6qKsX-qkCpA3HK81mNrZYI3GXeiuzzI0/edit#gid=0

Phabricator maintenance

Scap decrease in time

Differential increase

  • things we have: debian packages
  • things we need:
    • MW Core (need to define CI reality and actually integrate CI into Differential)
    • Ops Puppet

Browser test creation change (the matrix building)

  • defining and enforcing test ownership responsibilities

Not pulling from your repo (including MW Core) unless your tests are green, period. Want it to be deployed? Fix your tests. You own your code and tests.

  • First pass is to only block on what we already block on (ie: voting tests in Jenkins)
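A minimal sketch of that first-pass rule, assuming each repo's CI results are available as a list of job records. The `JobResult` type, its field names, and the job names in the example are hypothetical and not taken from Jenkins or Zuul; the only thing modelled is "block on voting jobs, ignore non-voting ones".

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """One CI job outcome for a repo (hypothetical shape, not a Jenkins API)."""
    name: str
    voting: bool  # non-voting jobs report results but never block
    passed: bool


def blocking_failures(jobs):
    """Return the names of voting jobs that are red."""
    return [job.name for job in jobs if job.voting and not job.passed]


def can_pull(jobs):
    """A repo is pulled for deployment only if no voting job is red."""
    return not blocking_failures(jobs)


# Hypothetical example data: the only red job is non-voting.
jobs = [
    JobResult("unit-tests", voting=True, passed=True),
    JobResult("browser-tests-experimental", voting=False, passed=False),
]
print(can_pull(jobs))
```

Tightening the policy later (for example, also blocking on browser tests) would then only mean flipping jobs to voting, not changing the gate itself.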

TODO: Chad or Tyler to send the "no more Trebuchet for new services, kthx" email to Ops

TODO: make a timeline

  1. get a list of repos from ops/puppet
  2. order by last deploy change, descending
  3. schedule x repos per week over the quarter
tin$ find /srv/deployment -maxdepth 2 -wholename '*/*/*/*/*'|wc -l
58
tin$

Should be a list of everything: https://github.com/wikimedia/operations-puppet/blob/production/hieradata/common/role/deployment.yaml#L1 , which is 40 repos. Grepping operations/puppet for 'provider.*trebuchet' gives 30. (The truth is somewhere in between?)
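The three timeline steps above could be sketched roughly as follows. This assumes the repo list and each repo's last deploy date have already been collected by hand; the function name, the repo names, and the dates are all made up for illustration.

```python
from datetime import date
from math import ceil


def schedule_migrations(last_deploy, weeks):
    """Spread repos over at most `weeks` weeks, most recently deployed first.

    last_deploy maps a repo name to the date of its last deploy change.
    Returns a list of per-week lists of repo names.
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    # Step 2: order by last deploy change, descending.
    ordered = sorted(last_deploy, key=last_deploy.get, reverse=True)
    if not ordered:
        return []
    # Step 3: x repos per week, rounded up so everything fits in the quarter.
    per_week = ceil(len(ordered) / weeks)
    return [ordered[i:i + per_week] for i in range(0, len(ordered), per_week)]


# Step 1 stand-in: hypothetical repos and dates, not real deploy data.
sample = {
    "repo-a": date(2016, 2, 29),
    "repo-b": date(2015, 11, 3),
    "repo-c": date(2016, 1, 15),
    "repo-d": date(2014, 6, 1),
    "repo-e": date(2016, 3, 1),
}
for week, repos in enumerate(schedule_migrations(sample, weeks=3), start=1):
    print(week, repos)
```

With the roughly 40 repos from the hiera list and an assumed 13-week quarter, this rounds up to 4 repos per week.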


browser tests discussion

  • when things start failing there are long gaps before diagnosis and then fixing
    • people assume it's just an issue with CI or the tests themselves
  • how to put a little bit of pressure on people to diagnose/fix failed tests
  • integrate diagnosis of tests before train would put the pressure on people
  • if we do this we need a way to correlate failures and changes in code
    • if we had a deploy dashboard, when it started and the commits in between, and the test status
    • we could see if we're going to be in a good place before the train
  • can offer the pre-merge voting browser test job
  • give warning of 2 weeks
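One way to correlate failures with code changes, as discussed above, is to list the commits merged between the last green run and the first red run that followed it. The sketch below assumes run results and merge times are already available as plain records; the `Run` and `Commit` types and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Run:
    finished: datetime
    passed: bool


@dataclass
class Commit:
    sha: str
    merged: datetime


def suspect_commits(runs, commits):
    """Commits merged after the last green run, up to the first red run after it.

    Returns an empty list if the most recent run is green.
    """
    ordered = sorted(runs, key=lambda r: r.finished)
    if not ordered or ordered[-1].passed:
        return []  # nothing is currently red
    first_red = ordered[-1].finished
    last_green = None
    # Walk back through the current red streak to find where it began.
    for run in reversed(ordered):
        if run.passed:
            last_green = run.finished
            break
        first_red = run.finished
    return [
        c for c in sorted(commits, key=lambda c: c.merged)
        if (last_green is None or c.merged > last_green) and c.merged <= first_red
    ]


# Hypothetical data: daily runs at 06:00, green until March 3rd.
runs = [
    Run(datetime(2016, 3, 1, 6), True),
    Run(datetime(2016, 3, 2, 6), True),
    Run(datetime(2016, 3, 3, 6), False),
    Run(datetime(2016, 3, 4, 6), False),
]
commits = [
    Commit("aaa111", datetime(2016, 3, 1, 12)),
    Commit("bbb222", datetime(2016, 3, 2, 12)),
    Commit("ccc333", datetime(2016, 3, 3, 12)),
]
print([c.sha for c in suspect_commits(runs, commits)])
```

A deploy dashboard could show this window next to the test status, so that a red job points at a short list of changes rather than at "CI or the tests themselves".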

Sun Mon Tues Wed Thur Fri Sat

                     g1          g2         g0


Sun Mon Tues Wed Thur Fri Sat

          g0                    g1          g2       

Antoine: deploy to G0, run all browser tests against it. If any is red: DEPLOY FREEZE


Q3 goal/project check-in

Reduce CI Wait time

KPI: https://grafana.wikimedia.org/dashboard/db/releng-kpis?panelId=2&fullscreen

Antoine:

  • Lots of reviews
  • Lurking on the daily browser tests refactoring
  • Nodepool had corrupted files
  • Hiera is badly configured for Nodepool instances
  • Nodepool upgrade on March 7th at 20:00 UTC to speed up deletion (faster pool replenishment; might grow the pool as well)

Consolidate deploy tools

Migrate MediaWiki to scap3 - task T114313
Q2 Quarterly Goal hold over: Migrate all Service team owned services and MW deploys to scap3 - https://phabricator.wikimedia.org/T109926


Differential Migration

https://etherpad.wikimedia.org/p/diffuerential-weekly
Integrate Differential with our Continuous Integration infrastructure - task T31
Build Debian packages from Differential: https://integration.wikimedia.org/ci/job/beta-build-deb/
Shepherd the RFC - task T119908
Garner early adopter projects (goal: 1 project per WMF "team")


Other Work

Browser tests cleanup of red tests

Beta Cluster