Wikimedia Release Engineering Team/Checkin archive/20180820

= 2018-08-20 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * August 13-24: Greg vacation
 * August 23-24 (Thursday-Friday): Željko vacation
 * August ~: Antoine
 * August 29-31: Dan vacation
 * September a week or so - Antoine

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R


 * July 02 - wmf.11 - Zeljko - no train, Fourth of July
 * July 09 - wmf.12 - Zeljko
 * July 16 - wmf.13 - Zeljko
 * July 23 - wmf.14 - Zeljko
 * July 30 - wmf.15 - Mukunda
 * Aug 06 - wmf.16 - Mukunda
 * Aug 13 - wmf.17 - Mukunda  (No train - Wednesday is a holiday)
 * Aug 20 - wmf.18 - Tyler   <
 * Aug 27 - wmf.19 - Dan
 * Sep 03 - wmf.20 - Tyler
 * Sep 10 - wmf.21 - Dan
 * Sep 17 - wmf.22 - Zeljko
 * Sep 24 - wmf.23 - Zeljko
 * Oct 01 - wmf.24 - Antoine
 * Oct 08 - wmf.25 - Antoine
 * Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
 * Oct 22 - wmf.1 - Mukunda

SoS

 * July 04 - Dan
 * July 11 - Antoine
 * July 18 - Antoine
 * July 25 - Tyler
 * Aug 01 - Tyler
 * Aug 08 - Zeljko
 * Aug 15 - Dan (probably no SoS because it's a WMF holiday?)
 * Aug 22 - Zeljko   < (Željko can go to SoS for the next few weeks since he has done 1 SoS so far)
 * Aug 29 - Mukunda
 * Sep 05 - Tyler
 * Sep 12 - Tyler
 * Sep 19 - Dan
 * Sep 26 - Dan
 * Oct 03 - Zeljko
 * Oct 10 - Zeljko
 * Oct 17 - Antoine
 * Oct 24 - Antoine
 * Oct 31 - Mukunda

First Offsite

 * Still waiting for confirmation from Travel, but I was told that no more offsites can be scheduled next to TechConf in Portland in October, so the week of Nov 5th it is.

Needs attention

 * Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616
 * Status: Mark H and Amanda reached out to me, I asked for a meeting with Mark H.


 * Re-evaluate use of "Dependent Pipeline" in Zuul for gate-and-submit - https://phabricator.wikimedia.org/T94322
 * ^ for antoine

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums


 * Already added the Code Health Metric working group info to the SoS etherpad.

Release Engineering

 * Blocked by:
 * Feedback needed from many people/teams on a recent MediaWiki train-related incident report, specifically on how the problems could have been prevented.
 * 1.32.0-wmf.13, 9 blockers, feedback needed for 8 of them: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180717-Train
 * Aaron Schulz (Performance), Adam Wight (Scoring Platform), Bartosz Dziewoński (Contributors), Brad Jorsch (MediaWiki Platform), C. Scott Ananian (Contributors), Daniel Kinzler (Wikimedia Deutschland), Timo Tijhof (Performance), Prateek Saxena (Audiences Design)
 * Blocking:
 * MediaWiki 1.29 final release and EOL; was due in June: https://phabricator.wikimedia.org/T197669 (w/ Security)
 * Updates
 * New general purpose CI job that builds and runs test containers via Blubber/Docker based on config provided in each project (think `.travis.yml` file)
 * Read more about Blubber here: https://wikitech.wikimedia.org/wiki/Blubber
 * See recent builds at https://integration.wikimedia.org/ci/blue/organizations/jenkins/blubber-test/activity
 * Gives developers one major benefit of the CD pipeline work now: control over their pre-merge and gating tests without having to mess with integration/config
 * Only scheduled to run for a few repos at the moment, but will eventually be expanded to many more projects (we need to tune CI infra around it first)
 * Looking for more participants to join the Code Health Metrics working group. This group's purpose is to define and later implement a set of core metrics that we will use to assess the health of our code base. More info: https://www.mediawiki.org/wiki/Code_Health_Group/projects/Code_Health_Metrics

Last week
Last week didn't happen due to holiday


 * Blocked by:
 * Blocking:
 * Feedback needed from various teams (too many to name each one) on two recent MediaWiki train-related incident reports, specifically on how the problems could have been prevented.
 * 1.32.0-wmf.13, 9 blockers, feedback needed for all of them: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180717-Train
 * 1.32.0-wmf.14, 6 blockers, feedback needed for 2 of them: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180724-Train
 * Updates
 * Blubber test
 * Code health working group -- join up!
 * Quarterly cross-dependencies

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

Past week status updates

 * All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q4

Pipeline: Move verify stage from Minikube to CI k8s namespace in production context

 * tracking task


 * Done?
 * Dan made a blubberd for fun (`curl -s --data-binary @blubber.yaml http://tools.wmflabs.org/blubber/test`)
 * Evaluate strategy for Docker/CI capacity https://phabricator.wikimedia.org/T202160
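
The `curl` one-liner above posts the raw contents of `blubber.yaml` to the test blubberd instance. A rough Python equivalent is sketched below, for anyone who wants to call it from a script. The URL is the one from the notes. The function names are hypothetical, and the sketch makes no assumption about what the service sends back; it just returns the response body as text.

```python
import urllib.request

# URL copied from the curl command in the notes; the service may have moved since.
BLUBBERD_URL = "http://tools.wmflabs.org/blubber/test"


def build_blubberd_request(config_bytes: bytes, url: str = BLUBBERD_URL) -> urllib.request.Request:
    """Build a POST request carrying the raw config bytes, like `curl --data-binary`."""
    return urllib.request.Request(
        url,
        data=config_bytes,
        method="POST",
        # curl sends this content type by default when posting data.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def post_config(config_path: str, url: str = BLUBBERD_URL) -> str:
    """Send the file at config_path to blubberd and return the response body as text."""
    with open(config_path, "rb") as config_file:
        request = build_blubberd_request(config_file.read(), url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Usage would be something like `print(post_config("blubber.yaml"))`, assuming the instance is still reachable.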

Code Health

 * T199253 - Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
 * Perform existing Stewardship review process for Q1 cycle.
 * T199254 - Add test evaluation to post mortem review process.
 * Added test evaluation to PM template.  Need to wiki-ize it now.
 * Review existing e2e test coverage.
 * Define prioritization scheme.
 * Prioritize e2e testing gaps.
 * T199257 - make current unit testing coverage more visible by reporting out to Engineering Management.
 * T199259 - Platform and Search Platform teams are using TDM PoC
 * T199262 - Identify key Tech Debt areas
 * T199263 - Put in place Tech Debt management process for PEP
 * T199261 - Define base Code Health metric set.
 * Base Workgroup defined (Kumal, Petr, Guillaume, me).  TechCom to participate as reviewers.

Developer Productivity

 * Make a hire to create the capacity needed for this program.
 * Write and share a survey to measure developer satisfaction and areas for investment.

Selenium

 * Q1 goals task: T198389 Q1 Selenium framework improvements
 * T179188 Video recording for Selenium tests in Node.js
 * T193157 Quibble does not install ffmpeg - comments from Antoine at https://gerrit.wikimedia.org/r/c/integration/quibble/+/451645

Gerrit

 * Kaldari is able to log in again! https://phabricator.wikimedia.org/T197083
 * \o/ nice!

Phabricator

 * Antivandalism
 * I intend to publish the source for phabricator-antivandalism after I move some parameters to configuration values so that the "secret sauce" isn't in the source. https://phabricator.wikimedia.org/T202080
 * I fixed a couple of other false positives last week, need to deploy the code this week.

QA

 * Had discussion with Audiences team (EMs and QA folks) regarding QA Career path.

Antoine

 * What I plan to do this week
 * More Nodepool/Quibble migrations :/
 * Write a document about running fewer tests
 * What I'm blocked on
 * Mail backlog
 * Other?
 * Are we migrating Differential repos to Gerrit?
 * Mukunda: Harbormaster/Nodepool job is only used for Scap

Dan

 * What I plan to do this week
 * Refactor service-pipeline job using integration/pipelinelib
 * Trying out larger CI instances with more Jenkins executors
 * What I'm blocked on
 * Other?
 * Other?

Greg

 * What I plan to do this week
 * Be on Vacation
 * What I'm blocked on
 * Other?
 * Other?

Jean-Rene

 * What I plan to do this week
 * T199261 - Define base Code Health metric set.
 * Organize/set up WG kickoff
 * complete Group wiki page and add it to SoS section
 * T199254 - Add test evaluation to post mortem review process
 * Wikiize PM template
 * Perform existing Stewardship review process for Q1 cycle.
 * Kickoff Q1 review cycle


 * What I'm blocked on
 * Other?
 * Other?

Mukunda

 * What I plan to do this week
 * Deploy updates to phabricator-antivandalism
 * Develop a plan/schedule for upcoming work
 * Phabricator wishlist stuff
 * Look at swat workflow changes \o/
 * What I'm blocked on
 * Other?
 * Other?

Tyler

 * What I plan to do this week
 * review blubberoid
 * releng.team https
 * Move dist/pipeline -> .pipeline
 * Review paladox work on gerrit avatars
 * Deploy depool for nodes where disk is > 95% full
 * What I'm blocked on
 * Other?
 * Other?
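
For the "depool where disk is > 95% full" item in Tyler's list, the check itself could look roughly like the sketch below. This is only an illustration of the threshold logic using the standard library. The function names and the default path are hypothetical, and how a node actually gets depooled is not covered here.

```python
import shutil


def disk_fraction_used(path: str = "/") -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Avoid dividing by zero on odd pseudo-filesystems.
        return 0.0
    return usage.used / usage.total


def should_depool(fraction_used: float, threshold: float = 0.95) -> bool:
    """True when usage is strictly above the threshold (95% by default, as in the notes)."""
    return fraction_used > threshold
```

A node-side script might call `should_depool(disk_fraction_used("/"))` and trigger whatever depool mechanism is in place when it returns true.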

Zeljko

 * What I plan to do this week
 * T179188 Video recording for Selenium tests in Node.js
 * T193157 Quibble does not install ffmpeg - will merge and deploy today with Tyler https://gerrit.wikimedia.org/r/c/integration/quibble/+/451645
 * Retrospective - Train Conducting
 * Review edits for two recent train incident reports (.13 and .14) to see if more feedback is needed
 * https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/TEC13:_Code_Health/Goals#Outcome_2_/_Output_2.2
 * What I'm blocked on
 * Other?
 * Other?

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng - Review To Triage column of #releng
 * releng-kanban - Review unassigned in kanban
 * releng-kanban - Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart