Wikimedia Release Engineering Team/Checkin archive/20190220

= 2019-02-20 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * February 19 - March 1 - Dan, vacation
 * March 11 (WMF Holiday) - US Staff
 * April 22 (WMF Holiday) - US Staff
 * April 22-27: Team offsite in Chicago
 * April 22nd - Antoine, Easter - we're flying to Chicago?
 * May 1st - Antoine and Željko, Labor Day / May Day
 * May 8th - Antoine, 1945 victory
 * May 17-19 - Wikimedia Hackathon 2019 (Prague, Czechia)
 * May 30th-31st - Antoine, Feast of the Ascension
 * June 10th - Antoine, Pentecost -- see https://en.wikipedia.org/wiki/Eastertide for Antoine/France Easter holidays
 * May 27 (Memorial Day) - US Staff
 * June 19 (Juneteenth) - US Staff

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R


 * Jan 07 - wmf.12 - Dan
 * Jan 14 - wmf.13 - Dan
 * Jan 21 - wmf.14 - Mukunda
 * Jan 28 - wmf.15 - No Train (All Hands)
 * Feb 04 - wmf.16 - Mukunda
 * Feb 11 - wmf.17 - Tyler
 * Feb 18 - wmf.18 - Tyler
 * Feb 25 - wmf.19 - Antoine
 * Mar 04 - wmf.20 -
 * Mar 11 - wmf.21 -
 * Mar 18 - wmf.22 -
 * Mar 25 - wmf.23 -
 * Apr 01 - wmf.24 -
 * Apr 08 - wmf.25 -
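
The train list above follows a simple weekly pattern: one branch number per Monday, even in the "No Train" week. A minimal Python sketch of that pattern, assuming the Jan 07 / wmf.12 starting point from the list (the function and constant names are made up for illustration):

```python
from datetime import date, timedelta

# Starting point taken from the list above: Monday Jan 07, 2019 is wmf.12.
FIRST_TRAIN = date(2019, 1, 7)
FIRST_VERSION = 12


def train_schedule(weeks: int) -> list[tuple[str, str]]:
    """Return (date label, branch name) pairs, one per week."""
    rows = []
    for i in range(weeks):
        monday = FIRST_TRAIN + timedelta(weeks=i)
        rows.append((monday.strftime("%b %d"), f"wmf.{FIRST_VERSION + i}"))
    return rows


# Print the schedule in the same bullet style as the notes,
# leaving the conductor slot empty.
for day, version in train_schedule(14):
    print(f" * {day} - {version} -")
```

Conductor names and skipped weeks still have to be filled in by hand; the sketch only covers the date and branch numbering.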

SoS

 * Zeljko 4eva! :)

Book club

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
 * March 4th: discuss Part I plus Chapter 5.
 * TWO MORE WEEKS!!!!1eleven

Spring Offsite

 * Location: Chicago, IL (Central timezone, UTC-5 while we're there)
 * Dates: Arrive Monday 4/22, Depart Saturday 4/27.
 * BOOK FLIGHTS BY
 * Activity day: Send your suggestions to me if you have them :) I'll make the voting spreadsheet later.
 * Chicago Bulls!!!11!oneone
 * April 10 -- Regular Season ends, so only if they're good this year :)
 * I've heard there's good pizza :P
 * I'm sure we'll have some of that for our dinners, unless you want to do a cooking class :)
 * Garfield Park Conservatory?
 * Museum of Science and Industry - https://www.msichicago.org/
 * Program: Haven't started yet :)
 * Any American sport would be fun (basketball, football, baseball..) (Lars doesn't like watching sports, but would be happy to sit somewhere quiet for the duration) (thcipriani: baseball isn't so much about watching baseball :))

Technical Advice IRC Meetings

 * What: Joint WMF/WMDE-led IRC advice/Q&A session. See: https://docs.google.com/document/d/1kXE2k6nM_eyIzcFHkU-d8G2asD5EpgYc64XiKLFgSAI/edit
 * When: every Wednesday at 16:00 UTC (we always keep the meeting at 17:00 CET, i.e. MEZ)
 * Signup to co-host: https://docs.google.com/spreadsheets/d/1ExZWzV8vQJ6WQbQrhpzBggGia9Tr7SrpbSfNjOOeosw/edit#gid=0
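
If the meeting is pinned to 17:00 local time, its UTC time moves when daylight saving starts. A minimal Python sketch of that conversion, assuming the local time zone is Europe/Berlin (the zone name and the two sample Wednesdays are assumptions, not from the notes):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: the 17:00 local time is Europe/Berlin wall-clock time.
BERLIN = ZoneInfo("Europe/Berlin")


def meeting_time_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC time of a meeting held at 17:00 Berlin local time."""
    local = datetime(year, month, day, 17, 0, tzinfo=BERLIN)
    return local.astimezone(timezone.utc)


# Two illustrative Wednesdays: one before and one after the DST switch.
winter = meeting_time_utc(2019, 2, 20)
summer = meeting_time_utc(2019, 4, 24)
print("winter:", winter.strftime("%H:%M UTC"))
print("summer:", summer.strftime("%H:%M UTC"))
```

Under that assumption the winter meeting falls on 16:00 UTC and the summer one on 15:00 UTC, so co-hosts outside Europe may want to re-check the time after the switch.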

Monthly reflection on accomplishments

 * Let's start keeping a list of accomplishments we've had over the last month (instead of monthly or weekly)
 * Purpose: helps with morale :) and can be a way of identifying good blog posts or other ways of showcasing our work
 * blubber uses blubberoid.wikimedia.org in the pipeline and pipeline is almost there for end-to-end functionality (can't yet deploy to production, but nearly can)
 * scap development back on gerrit -- new contributors
 * local-charts repo created
 * docker SIG announced/setup
 * Developer satisfaction survey results https://www.mediawiki.org/wiki/Developer_Satisfaction

beta-update-databases-eqiad still failing

 * https://integration.wikimedia.org/ci/view/Beta/job/beta-update-databases-eqiad/

The instances hosting the master and slave MySQL databases crashed last week.


 * slave: https://phabricator.wikimedia.org/T216067
 * master: https://phabricator.wikimedia.org/T216635 (instance got recovered via https://phabricator.wikimedia.org/T216404 )
 * InnoDB crashed:
 * The log sequence numbers 212565209189 and 212565209189 in ibdata files do not match the log sequence number 233682420105 in the ib_logfiles!
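
To get a feel for how far apart the data files and the redo log are, the numbers can be pulled straight out of the error text. A minimal Python sketch, assuming the message format quoted above (the helper name is made up for illustration; this only does arithmetic on the message and does not touch the database):

```python
import re

# Error text copied from the notes above.
MESSAGE = (
    "The log sequence numbers 212565209189 and 212565209189 in ibdata files "
    "do not match the log sequence number 233682420105 in the ib_logfiles!"
)


def parse_lsns(message: str) -> tuple[int, int, int]:
    """Extract the two ibdata LSNs and the ib_logfile LSN from the message."""
    numbers = [int(n) for n in re.findall(r"\d+", message)]
    if len(numbers) != 3:
        raise ValueError("expected exactly three log sequence numbers")
    return numbers[0], numbers[1], numbers[2]


ibdata_a, ibdata_b, redo_lsn = parse_lsns(MESSAGE)
gap = redo_lsn - ibdata_a
print(f"ibdata LSNs agree with each other: {ibdata_a == ibdata_b}")
print(f"ib_logfiles LSN is ahead by {gap}")
```

The two ibdata values match each other, and the ib_logfiles value is ahead of them by 21117210916. That would fit data files that are older than the redo log, for example after a restore, though the notes do not say what actually happened.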

"What does the Pipeline mean for X?"

 * Beta Cluster?
 * https://phabricator.wikimedia.org/T215217
 * What I said on task: "Things will be changing with what is possible and what is needed as we migrate more and more parts of our infrastructure to the Deployment Pipeline. We (RelEng and SRE) should scope out what that is and how that impacts Beta Cluster in the short, medium, and long term (read: that's the conversation that should happen to move this stewardship review forward)."

Questions

 * How long will it get worse?
 * What is the replacement?
 * What to do if it breaks fatally before the replacement is ready?
 * How does this fit with the Staging/Canaries work?

discussion

 * If one of the outcomes of stewardship is that beta continues to exist, we need to better define the use-cases that beta will support. We need to break apart the use-cases: i.e., new use-cases covered by "staging", x-use-cases covered by "beta"
 * More resources not necessarily going to solve this until we sort use-cases
 * Some of the use-cases identified during discussion with SRE re:staging -- might be a good next step
 * Greg does not volunteer :)
 * ACTION: thcipriani: bring up the eventual "staging" -- what that means -- during cross-team meeting


 * CI "k8s cluster"
 * See email from Alexandros I forwarded to the team list
 * Nebulous specs from Antoine last year: https://docs.google.com/document/d/1IV4bprNRDWBX-OZHZC5tS1-c7GCY3HEfoblqMT_CRJs/edit
 * This would be *instead of* the WMCS VPS usage.
 * Lars: Does the dependent builds thing affect this?
 * Original idea: stop running Docker on VMs in WMCS and run containers on k8s instead
 * How will the pipeline build images?
 * Unclear
 * still ideal to move off of WMCS infra
 * Blocked on unknowns related to zuul
 * Currently have a legacy fork of upstream zuulv2
 * migration to zuulv3 is a large overhaul
 * Requires nodepool, zookeeper, etc
 * Make zuulv3 move to k8s
 * Open question: does this involve some k8s cluster?


 * TODO Make a group to make a decision on this
 * Lars
 * Guest speaker on Zuulv3: Paladox?

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Incoming from last week

 * Blocking:
 * Multimedia: Adding WikibaseMediaInfo to the gate; patch is https://gerrit.wikimedia.org/r/c/integration/config/+/480463 and needs deployment

Release Engineering

 * Blocked by:
 * Blocking:
 * Core Platform: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins https://phabricator.wikimedia.org/T216069
 * Editing: https://phabricator.wikimedia.org/T216045#4966360 Testing of everything is stalled due to Beta cluster being read-only/down
 * Search Platform: CI for WikibaseLexemeCirrusSearch https://gerrit.wikimedia.org/r/c/integration/config/+/490792
 * Fundraising Tech: we might reach out to rel-eng this week if we need help updating our fundraising-branch tests to REL1_31 and composer merge plugin
 * Updates:
 * Developer satisfaction survey results https://www.mediawiki.org/wiki/Developer_Satisfaction
 * Train Health:
 * Last week: 1.33.0-wmf.17 - https://phabricator.wikimedia.org/T206671
 * This week: 1.33.0-wmf.18 - https://phabricator.wikimedia.org/T206672
 * Thanks to folks who responded to wmf.18 train email: Mainframe98, Krinkle, JForrester, Thiemo, Anomie, AOtto!
 * Next week: 1.33.0-wmf.19 - https://phabricator.wikimedia.org/T206673
 * Log Health:
 * Code Health:
 * Notable updates:

Callouts

 * Release Engineering

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

Quarterly Goals for Q3
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q3

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Automate the generation of change log notes
 * WHO: Mukunda, (Tyler on backup)


 * Still want to trigger on branch creation of mw/core...still in large list of TODOs

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Investigate notification methods for developers with changes that are riding any given train
 * WHO: Mukunda, Tyler


 * No movement this week

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Instrument Quibble for data collection
 * WHO: Mukunda, Antoine

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
 * WHO: Mukunda, Antoine

TEC3 (Pipeline): Outcome 2 / Output 2.1

 * GOAL: Select and integrate a code health metric solution into our tooling.
 * WHO: JR, ...


 * Pending code metrics workgroup work

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOALS:
 * Adopt more services into Deployment pipeline -
 * cxserver, ORES (partially), citoid, changeprop, cpjobqueue (stretch)
 * Deploy eventgate
 * WHO: Dan, Tyler, Lars


 * citoid, eventgate: have images built via the pipeline
 * just merged cxserver move to pipeline
 * feedback from pipeline in progress

TEC12 (DevProd): Outcome 1 / Output 1.1

 * GOAL: Conduct interviews with development stakeholders and compile a report that informs future work creation of a rubric.
 * WHO: Jeena, Mukunda


 * Results are posted: https://www.mediawiki.org/wiki/Developer_Satisfaction

TEC13 (Code Health): Outcome 1 / Output 1.1

 * GOALs:
 * Develop and communicate guidelines and best practices for successful Code Stewardship.
 * (Continued from Q2) Update/refresh review queue (review process for initial code deployment)
 * WHO: JR

minor progress

TEC13 (Code Health): Outcome 2 / Output 2.2

 * GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test -
 * WHO: Zeljko

TEC13 (Code Health): Outcome 2 / Output 2.3

 * GOALs:
 * Evolve/develop tools and processes to support the PE refactoring effort to improve code health.
 * Develop a common test strategy that enables teams to engage in more effective and efficient testing practices. (maybe should be output 2.4?)
 * WHO: JR, Core Platform Team

Met up with CPT last week to discuss unit testing and code coverage tooling/process. Next steps defined.

TEC13 (Code Health): Outcome 3 / Output 3.2

 * GOALs:
 * Speak at All Hands on the status of Technical Debt
 * Engage and coach development teams on their approach to managing technical debt.
 * WHO: JR, Core Platform Team

No progress

TEC13 (Code Health): Outcome 4 / Output 4.1

 * GOALs: Code Health Dashboard with 50% of repositories covered.
 * WHO: JR, Core Platform Team

Core platform codebase now included in SonarQube POC.

Selenium

 * T216424 The first Selenium test for ContentTranslation - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/491448

Phabricator

 * I spent some time over the weekend experimenting with running phabricator in kubernetes
 * Most of this was time spent learning all of the tooling: minikube, kubectl and helm
 * Limited success! Got a vanilla phabricator container running from helm

QA/Code Health
Dom joined the Foundation as a QA engineer.

Discussions starting up again re: Code Maintenance. How to properly plan and resource for the ongoing work. Corey and Marcella driving the discussion.

Antoine

 * What I plan to do this week
 * What I'm blocked on
 * Other?

Brennen

 * What I plan to do this week
 * local-charts docs changes based on https://gerrit.wikimedia.org/r/c/releng/local-charts/+/491566 feedback
 * Refine Debian / Ubuntu install process for local-charts
 * i.e., it's broken again because I installed VirtualBox on the same machine, fixing that behavior
 * Much reading
 * What I'm blocked on
 * Other?

Dan

 * What I plan to do this week
 * Vacation!
 * What I'm blocked on
 * Other?

Greg

 * What I plan to do this week
 * ISOSSTWG results, maybe
 * Read the book
 * Get James approved for offsite
 * Review queue brain dump
 * Schedule retro of ExternalGuidance deployment
 * What I'm blocked on
 * Is everyone selected for the Hackathon going to go? (JR, Zeljko (yeah), James (yes), Greg, Jeena)
 * Other?

James

 * What I plan to do this week
 * Helped Krinkle with Fresnel/performance testing
 * Still reading into docker/CD stuff
 * What I'm blocked on
 * Other?
 * Out next week.

Jean-Rene

 * What I plan to do this week
 * Code Stewardship reviews continued
 * Code Stewardship best practices
 * What I'm blocked on
 * Other?

Jeena

 * What I plan to do this week
 * Add restbase to local charts
 * document install process on Mac for local charts
 * read book
 * What I'm blocked on
 * Other?

Lars

 * What I plan to do this week
 * Set up and run Quibble locally on my laptop
 * Skim Quibble source code, see about instrumenting it to see where time is spent
 * Read CD book
 * Read Go book
 * What I'm blocked on
 * Other?
 * Not getting Phabricator notification emails about new comments to tickets I'm subscribed to - is this normal?

Mukunda

 * What I plan to do this week
 * Release MediaWiki 1.32.1
 * Release is done, just need to announce it ( https://phabricator.wikimedia.org/T213595 )
 * Deploy phabricator update!
 * This hasn't happened for a long time due to a long list of interruptions and delays: difficult to resolve merge conflicts, followed by offsite, holidays, all-hands, moving into my new place and a broken local test environment.
 * Woohoo.
 * Lots of good upstream changes are incoming, so we should get some nice new functionality.
 * I intend to write a phame blog post covering some key changes.
 * Also continue to work on phabricator in kubernetes for local dev / test environment.
 * What I'm blocked on
 * Other?

Tyler

 * What I plan to do this week
 * train
 * scap release
 * scap local dev stuff (pairing friday)
 * maybe
 * pipelinelib gerrit commenting
 * branch notes on branch creation
 * What I'm blocked on
 * Other?

Zeljko

 * What I plan to do this week
 * T206621 5 of the 15 prioritized repositories have at least 1 end-to-end test
 * T214478 The first Selenium test for AbuseFilter
 * T216424 The first Selenium test for ContentTranslation - https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/491448
 * T214480 The first Selenium test for TimedMediaHandler
 * T204068 QA: Automation Testing - port Echo Notification tests to Node.js
 * T207046 Code health metrics spike
 * What I'm blocked on
 * Other?

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart