Wikimedia Release Engineering Team/Checkin archive/20180917


2018-09-17

Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
How to do it
  • Mid-September to mid-October - Antoine to take off some weeks/days/part time
  • October 5th (Friday) - Željko on a conference (https://2018.webcampzg.org/ )
  • October 8th - Holiday (Indigenous Peoples' Day, Independence Day - Željko)
  • November 1 (Thursday) - Holiday (All Saints' Day - Željko)
  • November 9th - Holiday (Veterans Day)
  • November 22+23 - Holidays (Thanksgiving)
  • Week of December 3rd - Team offsite
  • December 24-28 - Holidays (Christmas)

Rotating positions

Train

Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
  • July 02 - wmf.11 - Zeljko - no train, Fourth of July
  • July 09 - wmf.12 - Zeljko
  • July 16 - wmf.13 - Zeljko
  • July 23 - wmf.14 - Zeljko
  • July 30 - wmf.15 - Mukunda
  • Aug 06 - wmf.16 - Mukunda
  • Aug 13 - wmf.17 - Mukunda (No train - Wednesday is a holiday)
  • Aug 20 - wmf.18 - Tyler
  • Aug 27 - wmf.19 - Dan && Antoine lurking over the shoulders
  • Sep 03 - wmf.20 - Antoine
  • Sep 10 - wmf.21 - Antoine (No train due to DC switchover)
  • Sep 17 - wmf.22 - Antoine <----
  • Sep 24 - wmf.23 - Zeljko (only one week for me? -- Željko)
  • Oct 01 - wmf.24 - Dan
  • Oct 08 - wmf.25 - Dan (No train due to DC switchover)
  • Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
  • Oct 22 - wmf.1 - Mukunda

SoS

  • July 04 - Dan
  • July 11 - Antoine
  • July 18 - Antoine
  • July 25 - Tyler
  • Aug 01 - Tyler
  • Aug 08 - Zeljko
  • Aug 15 - Dan (No SoS this week)
  • Aug 22 - Zeljko
  • Aug 29 - Zeljko
  • Sep 05 - Tyler / Željko
  • Sep 12 - Tyler / Željko
  • Sep 19 - Dan / Željko <----
  • Sep 26 - Zeljko
  • Oct 03 - Zeljko
  • Oct 10 - Zeljko
  • Oct 17 - Zeljko
  • Oct 24 - Zeljko
  • Oct 31 - Zeljko

Team Business

Hiring


First Offsite

Details:

  • Week of December 3rd
  • At the Queen Mary hotel in Long Beach
  • Deb T will be facilitating

Topics!

Development plans

  • Due end of month
  • We'll review on Wednesday the 26th


Needs attention

  • [Ops] Use of mwdebug2XXX for mediawiki deployers during codfw switch

Google Code-in?

Scrum of Scrums

Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

This week

Release Engineering

  • Blocked by:
  • Blocking:
  • Updates:
    • Train Health: no train last week due to DC switchover, train continues this week
    • Log Health:
    • Code Health:
      • Code Health Metrics Working Group Kickoff last week
      • Code Health Metrics Working Group meeting this week - further discuss/define the workgroup's scope and next steps

Last week

Release Engineering


Train status and happenings

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor


Past week status updates

All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4

Quarterly Goals for Q1

TEC 1

Output 1.1

  • Determine the procedure and requirements for an automated MediaWiki branch cut


RAW NOTES OMG
  • Generate a deployment, and reporting
    • automate branch cut
    • automate change log upload
      • reupdate it upon backports
    • add all people with commits in the current train as subscribers to the weekly train task
      • many people would always be on there, though?
  • deployment metrics
    • # of commits/committers
    • on schedule/rollbacks
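
A rough sketch of how the "# of commits/committers" metric could be counted for one train. Everything here is an assumption for illustration, not an existing team tool: the function name `summarize_train` is made up, and the input is assumed to be the text printed by `git log --format=%ae <previous branch>..<new branch>` (one author email per commit). Note that `%ae` gives the author; use `%ce` instead if "committer" is meant literally.

```python
from collections import Counter


def summarize_train(log_output: str) -> dict:
    """Count commits and distinct authors from one-email-per-line git log output.

    Hypothetical helper: assumes each non-empty line is the author email of
    exactly one commit in the train.
    """
    # Normalize case and drop blank lines so the same person is not counted twice.
    emails = [line.strip().lower() for line in log_output.splitlines() if line.strip()]
    per_author = Counter(emails)
    return {
        "commits": len(emails),
        "committers": len(per_author),
        "per_author": dict(per_author),
    }


if __name__ == "__main__":
    # Made-up sample data, not real train output.
    sample = "alice@example.org\nbob@example.org\nAlice@example.org\n"
    print(summarize_train(sample))
```

The `per_author` mapping could also feed the "add all people with commits as subscribers" idea, though the concern above (the same people showing up every week) would still apply.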

Pipeline: Move verify stage from Minikube to CI k8s namespace in production context

tracking task
  • Output 3.1
    • Zotero v2
    • graphoid
    • blubberoid
  • Develop set of metrics to assess incident reports/post mortems. (NB: see the killer spreadsheet)

PUNT:

  • Determine how to gather the Code Health metrics in a programmatic way
    • Q3: Create a deployments report with metrics from the Code Health Group.


Code Health

  • T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
    • reached out to Mark/Faidon to talk about proposal.
  • Perform existing Stewardship review process for Q1 cycle.
    • nothing to review this Q
  • T199254 - Add test evaluation to post mortem review process.
  • Review existing e2e test coverage.
  • Define prioritization scheme.
    • Zeljko and I talked about the prioritization scheme.
  • Prioritize e2e testing gaps.
  • T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
  • T199259 - Platform and Search Platform teams are using TDM PoC
    • on hold
  • T199262 - Identify key Tech Debt areas
    • on hold
  • T199263 - Put in place Tech Debt management process for PEP
    • on hold
  • T199261 - Define base Code Health metric set.
    • held workgroup kickoff meeting. Additional async discussions took place as well. Meeting again this week.


Developer Productivity

  • Make a hire to create the capacity needed for this program.
  • Write and share a survey to measure developer satisfaction and areas for investment. - task T197635


Other work

Selenium

  • Q1 goals task: T198389 Q1 Selenium framework improvements
    • T179188 Video recording for Selenium tests in Node.js
    • T199113 All repositories with Selenium tests should use wdio-mediawiki
      • 3 out of 13 repos remaining (AdvancedSearch, TwoColConflict, WikibaseLexeme), at legendary 80%, so about 20% of code (or 80% of effort) still TODO
    • T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
      • 4 out of 13 repos failing (31%), all errors are due to repos not using wdio-mediawiki

Gerrit

Phabricator

  • Working on a phab blog post about task types and custom fields. Will ask for editorial review before posting.

Jenkins

QA

Standup!

Antoine

  • What I plan to do this week
  • What I'm blocked on
  • Other?


Dan

  • What I plan to do this week
    • Get integration/{pipelinelib,config} patches merged (finish up https://phabricator.wikimedia.org/T196940)
    • Continue futzing with an integration-prometheus to collect Jenkins build stats
    • Migrate m1.medium instances to bigram instances and continue comparing stats
      • Configure some bigram instances with 6 executors and compare stats with those w/ 4
  • What I'm blocked on
  • Other?


Greg

  • What I plan to do this week
    • read and do manager development task
    • get everyone's dev plan in place (mostly)
    • get our Q2 goals ready
  • What I'm blocked on
  • Other?


Jean-Rene

  • What I plan to do this week
    • Review Queue refresh/ROO work
    • Code Health Metrics
    • Code Coverage report
  • What I'm blocked on
  • Other?
    • New laptop seems to no longer be captured in the web of enterprise monitoring


Mukunda

  • What I plan to do this week
    • Lots of writing this week
      • Phab custom fields blog post
      • scap swat documentation
      • Finish development plan
  • What I'm blocked on
  • Other?


Tyler

  • What I plan to do this week
    • CoC gerrit and such
    • Make deployment-mwmaint (try to)
    • reviews
      • keyholder
      • pipeline
    • development plan stuffs
    • maintenance script still has some timeouts
  • What I'm blocked on
  • Other?


Zeljko

  • What I plan to do this week
    • T179188 Video recording for Selenium tests in Node.js
    • T199113 All repositories with Selenium tests should use wdio-mediawiki
      • 3 out of 13 repos remaining (AdvancedSearch, TwoColConflict, WikibaseLexeme), at legendary 80%, so about 20% of code (or 80% of effort) still TODO
    • T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
      • 4 out of 13 repos failing (31%), all errors are due to repos not using wdio-mediawiki
    • Code Health Q1
      • Review existing e2e test coverage.
      • Define prioritization scheme.
      • Prioritize e2e testing gaps.
  • What I'm blocked on
  • Other?
    • My 2006 Kia managed to pass its yearly technical review this week :flexing biceps:
    • Slightly reduced availability this week and next, visiting doctors for another muscle injury :|

Grooming

Team Kanban Board Review and Triage


Once / month-ish review of backlog(s)


Kanban stats

Burnup chart