Wikimedia Release Engineering Team/Checkin archive/20190624

= 2019-06-24 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * June 24 - Željko, vacation
 * June 25 - Željko, Statehood Day


 * July 2 - Greg's birthday, unsure if taking off, already have one meeting
 * July 4 (US Independence Day) - US Staff
 * July 5 - thcipriani vacation
 * July 5 - Lars off (swapping with weekend)
 * July 10 - Lars off (swapping with weekend)
 * July 22 - August 9 - Željko vacation


 * August 7–19 - James off (inc. Wikimania)
 * August 12 - September 8 - Dan leave
 * August 12 (Glorious Twelfth) - US Staff
 * August ??? - ??? - Antoine
 * August 14–18 - Wikimania
 * Attending: James, Lars, Jean-Rene
 * August 15 - Željko, Assumption of Mary
 * August 25 - September 4 - Brennen vacation


 * September 2 (Labor Day) - US Staff


 * October 14 (Indigenous Peoples' Day) - US Staff


 * November 11 (Veterans' Day) - US Staff
 * November 28–29 (Thanksgiving) - US Staff


 * December 6 - Lars, Finnish Independence Day
 * December 25–31 (Christmas) - US Staff
 * December 25–26 - Lars, Christmas


 * 2020 January 1 (New Year's Day) - US Staff, Lars

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R


 * June 10 - wmf.9 - No Train (SRE Summit)
 * June 17 - wmf.10 - Mukunda (but Juneteenth on the Wednesday? Yes. Do group0 and group1 an hour apart on Tuesday)
 * June 24 - wmf.11 - Jeena (with Mukunda)
 * July 1 - wmf.12 - No train (Fourth of July)
 * July 8 - wmf.13 - Jeena
 * July 15 - wmf.14 - Lars (with Antoine)
 * July 22 - wmf.15 - Lars
 * July 29 - wmf.16 - Brennen (with Tyler)
 * Aug 5 - wmf.17 - Brennen
 * Aug 12 - wmf.18 - No Train (Wikimania)
 * Aug 19 - wmf.19 - Zeljko 😱
 * Aug 26 - wmf.20 - Zeljko 😭

SoS

 * Zeljko 4eva! :)

Timespent spreadsheet

 * For the avoidance of doubt: fill out the sheet week number for the previous week


 * link to week starting June 17: https://docs.google.com/spreadsheets/d/1urCLNQXeEi1DOR8Iu0qW0yPt-glxX1laqlMovbGyCW0/edit#gid=1318258617

Book club

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
 * Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
 * Next: June 28th, chapters 12+13 (9am Pacific)

Fall Offsite + TechConf19

 * Decided: 1 long trip, offsite after TechConf
 * dates? 2019-11-1{6,7}--2019-11-21..ish

Annual Planning

 * https://docs.google.com/spreadsheets/d/1TrkGTfPLR0C74va3XyY6faYplSh6UggGiPdmxIVm1uo/edit#gid=0
 * All of our Outcomes/Key Deliverables and projects for next year
 * We need to determine Q1 goals this week/next week.
 * Yes, Tyler is wearing glasses

Changes to the meeting

 * Turn into more of a real stand-up (see new section: What I did last week) so that we can answer most of the other questions (e.g. what is the team blocked on?) from those individual updates.
 * Might also move this meeting to not be on Monday, e.g. Thursday/Friday so the accuracy of "what I did this week" will be much higher.
 * Annual plan/etc. discussions will move into one-off meetings rather than crashing the stand-up.
 * Engineering Productivity won't meet as a whole each week. Sub-team meetings will continue (for RelEng and Performance) and be set up (for Q&T) :-) Annual planning managed by managers.
 * SoS managed somehow?

Monthly reflection on accomplishments - May '19 edition

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
 * Add as you have them!


 * Phabricator vandalism rollback tool completed 🎉 (blog post? 😉)
 * Upgrade Zuul to 2.5.1-wmf6 (which unblocks the Gerrit upgrade to 2.16) - https://phabricator.wikimedia.org/T208426
 * Team offsite in Chicago
 * Repository-hosted CI/CD pipeline configurations now supported (.pipeline/config.yaml) - https://phabricator.wikimedia.org/T210267
 * Train notes published on branch cut
 * Codehealth pipeline beta - https://phabricator.wikimedia.org/phame/live/1/post/160/introducing_the_codehealth_pipeline_beta/
 * Some baseline local development images published
 * Speculative CI meta-architecture published within WMF for feedback
 * Old image versions automatically removed from Jenkins agents when /var/lib/docker disk usage exceeds 80%
 * scap 3.10.0 cut
 * Jenkins build timings reports: https://people.wikimedia.org/~dduvall/jenkins/
 * Helped Kask team sketch an outline of its architecture (https://www.mediawiki.org/wiki/Kask)
 * Fatal Monitor with marker lines for deployments: https://logstash.wikimedia.org/app/kibana#/dashboard/77cc3e90-aa27-11e7-9109-51bd3197f7a9?_g=

Incoming/Needs attention

 * REL1_33 branching for extensions: https://phabricator.wikimedia.org/T220653
 * Reedy said he'll move forward with rc0 announcement soon.
 * Mukunda tried to run the script but it ran into trouble. Will re-try, manually.
 * Switching on HTTP Auth again still seems blocked. Barricade should help with this; review when Tyler gets back.
 * Update 2019-06-03: Fighting fires last week; should be able to do this this week.
 * Update 2019-06-10: Done with a quick hack by Reedy; do we need to fix the script for next time?
 * http auth patches merged in upstream, next week is the earliest it'll be released
 * Update 2019-06-17: Gerrit 2.15.14 is out, need to build and release, hopefully this week


 * Documentation!
 * Zuul and force merge: https://www.mediawiki.org/wiki/Topic:V14dlv7nt5ne7gsd
 * Antoine to file task and reply
 * Update 2019-06-24: https://phabricator.wikimedia.org/T225955 filed.

Release Engineering

 * Blocked by:
 * Security team (already acknowledged): Make phan-taint-check-plugin work on PHP > 7.0 so we can move CI to PHP72 https://phabricator.wikimedia.org/T207344
 * Core Platform Team:
 * (low priority): https://phabricator.wikimedia.org/T205361 is blocking undeployment of CodeReview.
 * MediaWiki installer silently ignores invalid extensions https://phabricator.wikimedia.org/T225512
 * SRE:
 * Traffic Team (low priority): https://phabricator.wikimedia.org/T213769 is blocking undeployment of Wikipedia Zero.
 * ServiceOps Team:
 * Thanks to DC Ops, contint1001 now has extra drives; how do we get them mounted? https://phabricator.wikimedia.org/T207707
 * Phabricator tweak for allowing "silenced" job runs by more RelEngers https://gerrit.wikimedia.org/r/c/operations/puppet/+/517140
 * Unknown team (?): wikimania-scholarships hosting needs to move to PHP7 so we can drop php56 from CI. https://phabricator.wikimedia.org/T224906
 * Blocking:
 * Updates:
 * Train Health
 * Last week: 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T220735 (derailed due to blockers)
 * This week: 1.34.0-wmf.11 - https://phabricator.wikimedia.org/T220736
 * Next week: 1.34.0-wmf.12 - NO TRAIN, WMF HOLIDAY (4 July)
 * Code Health
 * Log Health
 * All: Input greatly wished for on the "Future of CI" planning document: https://lists.wikimedia.org/pipermail/wikitech-l/2019-June/092227.html

Callouts

 * Release Engineering
 * All: Input greatly wished for on the "Future of CI" planning document: https://lists.wikimedia.org/pipermail/wikitech-l/2019-June/092227.html
 * Unknown team (?): wikimania-scholarships hosting needs to move to PHP7 so we can drop php56 from CI. https://phabricator.wikimedia.org/T224906

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor


 * New filtered fatal monitor dashboard including markers for scap deployments: https://logstash.wikimedia.org/app/kibana#/dashboard/77cc3e90-aa27-11e7-9109-51bd3197f7a9?_g=
 * Need to fix scap clean :\
 * thcipriani has a crappy fix in mind until http tokens in gerrit are back
 * Any idea when HTTP tokens will come back? Weeks? Months? Never? :-(
 * ~Weeks
 * 2019-05-06: cleaned up stuff last week on deploy hosts, just not the gerrit branches
 * 2019-05-13: …
 * 2019-06-03: upstream issues/patches we want resolved before doing this
 * cf: https://phabricator.wikimedia.org/T218750#5128424
 * looks like these patches merged -- I'll check what release they're going out with
 * 2019-06-10: upstream cutting new version with security fixes (hopefully) end of week, ETA early next week
 * 2019-06-17: gerrit 2.15.14 is out, need to build and release, hopefully this week
 * 2019-06-24: gerrit upgrade 19:00 UTC today, announce going out after

Quarterly Goals for Q4
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q4

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Undeploy the CodeReview extension.
 * WHO: James, need help from CPT


 * James will ping CPT about this this week (April 8th)
 * … and again w/c 15 April.
 * … and again w/c 6 May (in SoS).
 * … and again w/c 27 May (in SoS).
 * [Recurring item]

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Set up 1-3 of the CI WG options (Zuul v3, Argo, GitLab)
 * WHO: Lars


 * Gitlab:
 * https://wmf-gitlab3.vm.liw.fi/ is up and accepts registrations with wikimedia.org (and liw.fi) email addresses
 * Please play with it and tell Lars anything that seems iffy

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Instrument Quibble for data collection
 * WHO: Mukunda, Antoine


 * Blocked

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
 * WHO: Mukunda, Antoine


 * Blocked

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Prepare the Deployment Pipeline for changes to our CI tooling.
 * WHO: ???, ???


 * Blocked by not having new CI tooling yet

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOAL: Create a .pipeline/config.yaml standard to give users more control over how their tests are run in the pipeline and allow the easy saving of artifacts at pipeline completion. (RelEng)
 * WHO: Dan, Tyler, ???

✅

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOALS:
 * Adopt more services into Deployment pipeline -
 * Wikidata Termbox SSR, Kask for Session Storage Service, cpjobqueue (stretch), ORES (stretch)
 * WHO: Dan, Tyler, Lars

There are tasks: https://phabricator.wikimedia.org/T220403


 * Wikidata Termbox SSR


 * Kask for Session Storage Service


 * cpjobqueue (stretch)
 * ❌ -> later


 * ORES
 * cf: Dan's comments
 * ❌ -> later

TEC12 (DevProd): Outcome 1 / Output 1.1

 * GOAL: Provide an "Official" Docker base image for local development of MediaWiki based on the production tooling.
 * WHO: Jeena, Brennen
 * https://phabricator.wikimedia.org/T212449


 * Done for MediaWiki, for some values of "done" and "MediaWiki". Production-likeness needs considerable work.

TEC13 (Code Health): Outcome 1 / Output 3

 * GOALs: Presentation/session(s) at the Wikimedia Hackathon on the current state of Code Health projects (technical debt and code stewardship)
 * WHO: JR

✅

TEC13 (Code Health): Outcome 1 / Output 1.1

 * GOAL:
 * Publish a re-imagination of the Review Queue process.
 * Develop and implement metrics around task and code-review responsiveness
 * WHO: Greg, JR (and Andre)


 * Review Queue
 * Blocked on Greg time


 * Task and code-review responsiveness metrics
 * No progress

TEC13 (Code Health): Outcome 4 / Output 4.2

 * GOALs:
 * Expand SonarQube reporting into CI infrastructure
 * Perform SonarQube analysis on all extensions
 * Engage user communities in direct feedback solicitation
 * WHO: JR, Zeljko, Code Health Metrics

Release MW 1.33

 * Handed off to Reedy along with security releases.

Gerrit

 * 2.15.14 deploy

Phabricator

 * Built a new sshd with a patch applied for https://phabricator.wikimedia.org/T224677 - need to test the fix then get sre to upload the package

SCAP

 * Enhance MediaWiki deployments for support of php7.x

Antoine

 * What I did last week
 * Reviewed Quibble patches
 * assisted with jessie/php5.5 phase out
 * random maintenance
 * What I plan to do this week
 * Work on phasing out injection of extension/skin dependencies by CI https://phabricator.wikimedia.org/T220199 and others
 * Puppet cleanup following up on the conversation with Brennen/Tyler https://gerrit.wikimedia.org/r/#/q/bug:T225735 and https://phabricator.wikimedia.org/T225735
 * What I'm blocked on
 * contint1001 partitioning, need RAID/LVM to be set up + FS formatting https://phabricator.wikimedia.org/T207707
 * Other?
 * Get docker-ce uploaded and start rebuilding Jenkins agents to Stretch, with less RAM and s/slave/agent/ https://phabricator.wikimedia.org/T226233
 * Early leave on Friday. Volunteer for the school party and have to fry 40kg of french fries

Brennen

 * What I did last week
 * trigger agent stuff merged
 * Discussion about local dev interface, started messing with golang
 * Merged docker-installation removal
 * Read about dependency resolution
 * Tested some minor local-charts changes
 * Paired on pipeline validation with Tyler, began getting head around pipelinelib
 * What I plan to do this week
 * Turn pipeline stage steps into objects with run/validate methods
 * Discussion with tech writer?
 * Prepare something useful for Docker SIG on Thursday / advertise its continued existence
 * Read chapter 12
 * Take better notes
 * What I'm blocked on
 * Other?

Dan

 * What I did last week
 * What I plan to do this week
 * What I'm blocked on
 * Other?

Greg

 * What I did last week
 * Security council meeting kickoff
 * Met new Test Engineer for Language
 * CTO candidate Meet and Greet (more this week)
 * EngProd management thinking/planning
 * TechConf19 PC nominations
 * Phabricator workboard skeleton creation for the future
 * What I plan to do this week
 * CTO candidate meet and greets
 * DockerSIG
 * Internal Comms discussion
 * EngProd management planning
 * Phabricator workboard massaging/triaging with the new skeleton/framework
 * What I'm blocked on
 * Other?

James

 * What I did last week
 * Landed the unit/integration test framework split in MediaWiki, yay. Now to use it!
 * Migrated all php70 generic jobs to php72 except phan-seccheck (blocked externally) https://phabricator.wikimedia.org/T225457
 * Some gentle killing off of php55, php56, and jessie.
 * [10%] Launched new TimedMediaHandler beta feature (yay) https://phabricator.wikimedia.org/T148103
 * What I plan to do this week
 * [Again] Fixing quibble node10 follow-up for MobileFrontend https://phabricator.wikimedia.org/T224997
 * [Again] Building a proof of concept of shims in WikimediaMessages so we can undeploy things better: https://phabricator.wikimedia.org/T222918
 * What I'm blocked on
 * Blocked by SRE ServiceOps (migrate contint1001 to stretch -> php7x): Dropping php56 CI testing https://phabricator.wikimedia.org/T224906
 * Blocked by Security (make phan-seccheck php72 compat): Migrating CI phan jobs over to php72 https://phabricator.wikimedia.org/T207344
 * [low priority] Blocked by SRE Traffic (scary VCL management): Dropping WikipediaZero https://phabricator.wikimedia.org/T213769
 * [low priority] Blocked by Core Platform (dump of review comments): Dropping CodeReview https://phabricator.wikimedia.org/T116948


 * Other?

Jean-Rene

 * What I did last week
 * got Code Review WG kickoff set up
 * continued reviews of Code Stewardship requests.
 * started drafting up statement re: Code Stewardship of Beta Cluster (after chat with Tyler)
 * CTO Candidates M/G
 * What I plan to do this week
 * Quality and Test Engineering team planning/kickoff
 * more Code Stewardship work
 * Extending Code Health pipeline to 3 extensions (MobileFrontend, Minerva, and Popups)
 * CTO Candidates M/G
 * What I'm blocked on
 * Other?

Jeena

 * What I did last week
 * Read book
 * bug fix for local-charts
 * deployment-charts migration work/discussion
 * meet and greet thing
 * Some other stuff I can't remember
 * What I plan to do this week
 * Train train
 * choo choo
 * meet and greet
 * Learn about beta environment?
 * What I'm blocked on
 * Waiting for someone in SRE to take a look at my deployment-charts work
 * Other?

Lars

 * What I did last week
 * Worked on v2 of CI arch doc, incorporating a lot of feedback
 * Arranged travel for Wikimania
 * Read annual plan parts relevant for RelEng and updated my goals for Q1
 * Reviewed new Acceptable Use Policy, made sure my personal devices have no access, except calendar and IRC
 * Participated in a meet & greet with CTO candidate Scott Noteboom
 * What I plan to do this week
 * Finish reading ch 13, attend book club
 * Finish v2 of CI arch doc
 * Plan (and document) CI arch implementation around GitLab
 * Attend meet & greets with the two other CTO candidates
 * What I'm blocked on
 * Other?
 * I like the what-I-did-last-week part

Mukunda

 * What I did last week
 * Built a new sshd with a patch applied for https://phabricator.wikimedia.org/T224677
 * Played around with kubernetes (fun!)
 * Did a bit of testing with go cobra command: https://github.com/spf13/cobra
 * Train
 * Read a book
 * What I plan to do this week
 * Read a book
 * Train (With Jeena)
 * https://phabricator.wikimedia.org/T224677 - need to test the fix then get SRE to upload the package
 * What I'm blocked on
 * Train blockers: https://phabricator.wikimedia.org/T220736
 * Other?

Tyler

 * What I did last week
 * Planned
 * Prep Gerrit 2.15.14 release, ❌ deploy Gerrit 2.15.14
 * ❌ Remove barricade v2 lucene dependency (with dcausse)
 * ❌ Finish Blubberoid policy file work
 * get up-to-date on the lib/extension dependency work -- meeting didn't happen, but I know the background, I think
 * Other work
 * Started work on .pipeline/config.yaml validation -- crammed things into my brain with brennen
 * What I plan to do this week
 * Deploy Gerrit 2.15.14
 * Turn back on HTTP auth + announce
 * Bump blubber version (to finish policy file work)
 * lib/extensions deps meeting
 * More talking about Enhance MediaWiki deployments for support of php7.x
 * Cannot assign user name "XXX" to account ####; name already in use. https://phabricator.wikimedia.org/T216605
 * What I'm blocked on
 * scap 3.10.0-1
 * Other?

Zeljko

 * What I did last week
 * What I plan to do this week
 * What I'm blocked on
 * Other?

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart