Wikimedia Release Engineering Team/Checkin archive/20180917

= 2018-09-17 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * Mid-September to mid-October - Antoine to take some weeks/days off or work part time
 * October 5th (Friday) - Željko at a conference (https://2018.webcampzg.org/)
 * October 8th - Holiday (Indigenous Peoples' Day, Independence Day - Željko)
 * November 1 (Thursday) - Holiday (All Saints' Day - Željko)
 * November 9th - Holiday (Veterans Day)
 * November 22+23 - Holidays (Thanksgiving)
 * Week of December 3rd - Team offsite
 * December 24-28 - Holidays (Christmas)

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R


 * July 02 - wmf.11 - Zeljko - no train, Fourth of July
 * July 09 - wmf.12 - Zeljko
 * July 16 - wmf.13 - Zeljko
 * July 23 - wmf.14 - Zeljko
 * July 30 - wmf.15 - Mukunda
 * Aug 06 - wmf.16 - Mukunda
 * Aug 13 - wmf.17 - Mukunda (No train - Wednesday is a holiday)
 * Aug 20 - wmf.18 - Tyler
 * Aug 27 - wmf.19 - Dan && Antoine lurking over the shoulders
 * Sep 03 - wmf.20 - Antoine
 * Sep 10 - wmf.21 - Antoine (No train due to DC switchover)
 * Sep 17 - wmf.22 - Antoine <
 * Sep 24 - wmf.23 - Zeljko (only one week for me? -- Željko)
 * Oct 01 - wmf.24 - Dan
 * Oct 08 - wmf.25 - Dan (No train due to DC switchover)
 * Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
 * Oct 22 - wmf.1 - Mukunda
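
The rollover noted above (wmf.26 is the last 1.32 release, then 1.33 starts at wmf.1) could be sketched as below. This is only an illustration: the full version string format `1.32.0-wmf.26` and the function name `next_train_version` are assumptions, and deciding when a new release series starts remains a manual scheduling call passed in as a flag.

```python
def next_train_version(current, start_new_release=False):
    """Return the next weekly train version string.

    Assumes (hypothetically) versions look like '<major>.<minor>.<patch>-wmf.<n>'.
    """
    base, _, counter = current.partition("-wmf.")
    major, minor, patch = (int(part) for part in base.split("."))
    if start_new_release:
        # New release series: bump the minor version and restart at wmf.1.
        return f"{major}.{minor + 1}.0-wmf.1"
    # Same release series: just increment the weekly counter.
    return f"{major}.{minor}.{patch}-wmf.{int(counter) + 1}"
```

For example, `next_train_version("1.32.0-wmf.26", start_new_release=True)` would give the first 1.33 train version.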

SoS

 * July 04 - Dan
 * July 11 - Antoine
 * July 18 - Antoine
 * July 25 - Tyler
 * Aug 01 - Tyler
 * Aug 08 - Zeljko
 * Aug 15 - Dan (No SoS this week)
 * Aug 22 - Zeljko
 * Aug 29 - Zeljko
 * Sep 05 - Tyler / Željko
 * Sep 12 - Tyler / Željko
 * Sep 19 - Dan / Željko <
 * Sep 26 - Zeljko
 * Oct 03 - Zeljko
 * Oct 10 - Zeljko
 * Oct 17 - Zeljko
 * Oct 24 - Zeljko
 * Oct 31 - Zeljko

Hiring

 * Software Engineer position is open; currently reviewing candidates and hiring
 * https://boards.greenhouse.io/wikimedia/jobs/1225258

First Offsite
Details:
 * Week of December 3rd
 * At the Queen Mary hotel in Long Beach
 * Deb T will be facilitating

Topics!
 * https://etherpad.wikimedia.org/p/RelEng-Offsite-201811-Topics

Development plans

 * Due end of month
 * We'll review on Wednesday the 26th

Needs attention

 * 2018-09-10 -- Gerrit Privacy Policy & CoC patch
 * https://phabricator.wikimedia.org/T196835
 * 2018-09-17 -- Patches for new UI:
 * (ops/puppet) Replace polygerrit theme in repo: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458523/
 * (gerrit) Remove from repo: https://gerrit.wikimedia.org/r/#/c/operations/software/gerrit/+/458524/
 * (ops/puppet) Add footer link for new UI: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/458833/
 * (ops/puppet) Add footer link for old UI: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/460914/
 * All applied to: http://gerrit.tylercipriani.com:8080


 * 2018-09-10 -- Run mediawiki::maintenance scripts in Beta Cluster
 * https://phabricator.wikimedia.org/T125976
 * Tyler to create instance
 * 2018-09-17 - not done


 * [Ops] Use of mwdebug2XXX for mediawiki deployers during codfw switch

Google Code In ?

 * https://lists.wikimedia.org/pipermail/wikitech-l/2018-September/090799.html
 * Interest? We need small/easy-ish tasks that you're willing to help someone think through and review.

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Release Engineering

 * Blocked by:
 * [WMCS] Increased quotas for vcpu and memory in integration project: https://phabricator.wikimedia.org/T204373
 * Blocking:
 * Updates:
 * Train Health: no train last week due to DC switchover, train continues this week
 * Log Health:
 * Code Health:
 * Code Health Metrics Working Group Kickoff last week
 * Code Health Metrics Working Group meeting this week - further discuss/define the workgroup's scope and next steps

Release Engineering

 * Blocked by:
 * DBA (in support of Reedy): https://phabricator.wikimedia.org/T174802 (EducationProgram db dump in prep of removing the extension)
 * Blocking:
 * Language RelEng to review: https://gerrit.wikimedia.org/r/450508
 * Updates:
 * Train:
 * we had a UBN! backport needed on Thursday ( https://phabricator.wikimedia.org/T203566 )
 * This is thoroughly documented in https://phabricator.wikimedia.org/T156541. It is a recurring problem that breaks production every time the structure of a class changes in an incompatible way. We can do better!
 * Log Health:
 * Exception thrown for failure to save settings appears ~ 1000 times/day: https://phabricator.wikimedia.org/T202149 (Note: add to SoS Callouts)

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

Past week status updates

 * All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4

TEC 1
Output 1.1
 * Determine the procedure and requirements for an automated MediaWiki branch cut

RAW NOTES OMG

 * Generate a deployment, and reporting


 * automate branch cut
 * automate change log upload
 * reupdate it upon backports
 * add all people with commits in the current train as subscribers to the weekly train task
 * many people would always be on there, though?
 * deployment metrics
 * # of commits/committers
 * on schedule/rollbacks
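
One way the commit/committer metrics from the notes above might be gathered is sketched here. It is not the team's tooling: the function names, the repository path, and the ref names in the usage note are hypothetical, and it assumes counting by author e-mail from `git log` is an acceptable approximation of "committers".

```python
import subprocess
from collections import Counter


def summarize_commits(author_lines):
    """Count commits and distinct committers from one author e-mail per commit."""
    counts = Counter(line.strip() for line in author_lines if line.strip())
    return {
        "commits": sum(counts.values()),
        "committers": len(counts),
        "by_author": dict(counts),
    }


def authors_between(repo_path, old_ref, new_ref):
    """List the author e-mail of each commit in new_ref that is not in old_ref."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae", f"{old_ref}..{new_ref}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

A call such as `summarize_commits(authors_between("/path/to/core", old_branch, new_branch))` would give the numbers for one train; the keys of `by_author` could also serve as the candidate subscriber list for the weekly train task.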

Pipeline: Move verify stage from Minikube to CI k8s namespace in production context

 * tracking task


 * Output 3.1
 * Zotero v2
 * graphoid
 * blubberoid
 * Develop set of metrics to assess incident reports/post mortems. (NB: see the killer spreadsheet)

PUNT:
 * Determine how to gather the Code Health metrics in a programmatic way
 * Q3: Create a deployments report with metrics from the Code Health Group.

Code Health

 * T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
 * reached out to Mark/Faidon to talk about proposal.
 * Perform existing Stewardship review process for Q1 cycle.
 * nothing to review this Q
 * T199254 - Add test evaluation to post mortem review process.
 * Review existing e2e test coverage.
 * Define prioritization scheme.
 * Zeljko and I talked about the prioritization scheme.
 * Prioritize e2e testing gaps.
 * T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
 * T199259 - Platform and Search Platform teams are using TDM PoC
 * on hold
 * T199262 - Identify key Tech Debt areas
 * on hold
 * T199263 - Put in place Tech Debt management process for PEP
 * on hold
 * T199261 - Define base Code Health metric set.
 * Held workgroup kickoff meeting. Additional async discussions took place as well. Meeting again this week.

Developer Productivity

 * Make a hire to create the capacity needed for this program.
 * Write and share a survey to measure developer satisfaction and areas for investment.

Selenium

 * Q1 goals task: T198389 Q1 Selenium framework improvements
 * T179188 Video recording for Selenium tests in Node.js
 * Patch in final review. Timo said code is fine but videos don't work. I've checked and videos work. :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
 * T199113 All repositories with Selenium tests should use wdio-mediawiki
 * 3 out of 13 repos remaining (AdvancedSearch, TwoColConflict, WikibaseLexeme), at legendary 80%, so about 20% of code (or 80% of effort) still TODO
 * T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
 * 4 out of 13 repos failing (31%), all errors are due to repos not using wdio-mediawiki

Gerrit

 * Merging accounts in notedb feature request (of interest, not us directly): https://bugs.chromium.org/p/gerrit/issues/detail?id=9716

Phabricator

 * Working on a phab blog post about task types and custom fields. Will ask for editorial review before posting.

Jenkins

 * Planning migration from m1.medium instances to bigram instances, based on stats from last week showing performance gains with the latter
 * https://phabricator.wikimedia.org/T202160

Antoine

 * What I plan to do this week
 * What I'm blocked on
 * Other?

Dan

 * What I plan to do this week
 * Get integration/{pipelinelib,config} patches merged (finish up https://phabricator.wikimedia.org/T196940)
 * Continue futzing with an integration-prometheus to collect Jenkins build stats
 * Migrate m1.medium instances to bigram instances and continue comparing stats
 * Configure some bigram instances with 6 executors and compare stats with those w/ 4
 * What I'm blocked on
 * Quota increase for integration project https://phabricator.wikimedia.org/T204373
 * Other?

Greg

 * What I plan to do this week
 * read and do manager development task
 * get everyone's dev plan in place (mostly)
 * get our Q2 goals ready
 * What I'm blocked on
 * Other?

Jean-Rene

 * What I plan to do this week
 * Review Queue refresh/ROO work
 * Code Health Metrics
 * Code Coverage report
 * What I'm blocked on
 * Other?
 * New laptop seems to no longer be captured in the web of enterprise monitoring

Mukunda

 * What I plan to do this week
 * Lots of writing this week
 * Phab custom fields blog post
 * scap swat documentation
 * Finish development plan
 * What I'm blocked on
 * Other?

Tyler

 * What I plan to do this week
 * CoC gerrit and such
 * Make deployment-mwmaint (try to)
 * reviews
 * keyholder
 * pipeline
 * development plan stuffs
 * maintenance script still has some timeouts
 * What I'm blocked on
 * Other?

Zeljko

 * What I plan to do this week
 * T179188 Video recording for Selenium tests in Node.js
 * Patch in final review. Timo said code is fine but videos don't work. I've checked and videos work. :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
 * T199113 All repositories with Selenium tests should use wdio-mediawiki
 * 3 out of 13 repos remaining (AdvancedSearch, TwoColConflict, WikibaseLexeme), at legendary 80%, so about 20% of code (or 80% of effort) still TODO
 * T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
 * 4 out of 13 repos failing (31%), all errors are due to repos not using wdio-mediawiki
 * Code Health Q1
 * Review existing e2e test coverage.
 * Define prioritization scheme.
 * Prioritize e2e testing gaps.
 * What I'm blocked on
 * T179188 Video recording for Selenium tests in Node.js
 * Patch in final review. Timo said code is fine but videos don't work. I've checked and videos work. :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
 * Other?
 * My 2006 Kia managed to pass its yearly technical review this week :flexing biceps:
 * Slightly reduced availability this week and next, visiting doctors for another muscle injury :|

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart