Wikimedia Release Engineering Team/Checkin archive/20181022

= 2018-10-22 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * October 21-28 - Greg in Portland for TechConf+TechMgrs F2F
 * November 1 (Thursday) - Holiday (All Saints' Day - Željko)
 * November 12 - Holiday (Veterans Day, observed)
 * November 22+23 - Holidays (Thanksgiving)
 * November 25 - December 2: Mukunda vacation (in California ahead of the offsite)
 * Week of December 3rd - Team offsite
 * December 24-28 - Holidays (Christmas)

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R


 * Oct 08 - wmf.25 - Dan (No train due to DC switchover)
 * Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
 * Oct 22 - wmf.1 - Mukunda  < (warning: TechConf is happening; ping Greg if you need responses from anyone there)
 * Oct 29 - wmf.2 - Tyler
 * Nov 05 - wmf.3 - Tyler
 * Nov 12 - wmf.4 - Antoine
 * Nov 19 - wmf.5 - No Train (Thanksgiving)
 * Nov 26 - wmf.6 - Antoine
 * Dec 03 - wmf.7 - No Train (Offsite)
 * Dec 10 - wmf.8 - Zeljko
 * Dec 17 - wmf.9 - Zeljko
 * Dec 24 - wmf.10 - No Train (Holiday break)
 * Dec 31 - wmf.11 - No Train (Holiday break)
 * Jan 07 - wmf.12 - Dan
 * Jan 14 - wmf.13 - Dan
 * Jan 21 - wmf.14 - Mukunda
 * Jan 28 - wmf.15 - No Train (All Hands)
 * Feb 04 - wmf.16 - Mukunda
 * Feb 11 - wmf.17 - Tyler
 * Feb 18 - wmf.18 - Tyler
 * Feb 25 - wmf.19 - Antoine

SoS

 * Oct 10 - Zeljko
 * Oct 17 - Zeljko
 * Oct 24 - Zeljko <
 * Oct 31 - Zeljko
 * Nov 07 - Zeljko
 * Nov 14 - Zeljko
 * Nov 21 - Zeljko
 * Nov 28 - Zeljko
 * Dec 05 - Zeljko
 * Dec 12 - Zeljko
 * Dec 19 - Zeljko
 * Dec 26 - Zeljko
 * Jan 02 - Zeljko
 * Jan 09 - Zeljko
 * Jan 16 - Zeljko
 * Jan 23 - Zeljko
 * Jan 30 - Zeljko
 * Feb 06 - Zeljko
 * Feb 13 - Zeljko
 * Feb 20 - Zeljko
 * Feb 27 - Zeljko

Hiring

 * Software Engineer position is open; currently reviewing candidates and hiring
 * https://boards.greenhouse.io/wikimedia/jobs/1225258

Circle up meeting was at 8am Pacific today. Greg may not be able to update this etherpad with the outcome before he leaves for the conference.

First Offsite
Details:
 * Week of December 3rd
 * At the Queen Mary hotel in Long Beach
 * Deb T will be facilitating

Topics!
 * https://etherpad.wikimedia.org/p/RelEng-Offsite-201811-Topics

Needs attention

 * Gerrit security release 2018-10-08
 * https://groups.google.com/forum/m/#!topic/repo-discuss/eH0iLt2XawU
 * JGit update; we are unaffected
 * may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836
 * 2018-10-15 -- paladox tells me they're working on a fix and should have a 2.15.6 tagged Soon™
 * 2018-10-22 -- JGit updated to fix leaks https://gerrit-review.googlesource.com/c/gerrit/+/201273


 * deploy1001:/srv/mediawiki out of date?
 * https://phabricator.wikimedia.org/T207602
 * Found because the Security team noticed that a previously deployed security patch was no longer deployed; we should sync up with them this week about that (Reedy or Brian)
 * See: https://phabricator.wikimedia.org/T207600
 * Cause unknown so far; thcipriani will look into it

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Incoming from last week

 * Blocking:
 * Fundraising Tech: CRM tests are still regularly failing due to a full MySQL partition on integration hosts. Possible fix noted by Eileen on https://phabricator.wikimedia.org/T205950

Release Engineering

 * Blocked by:
 * Blocking:
 * Updates:
 * Train Health:
 * Train is blocked on group0
 * WDQS problems
 * For callouts: the revision-create event date format has changed (we think). New format example: "2018-10-24T00:28:24.162300+00:00". If you know anything, please chime in on the task.
 * Last week: 1.32.0-wmf.26 deployment blockers https://phabricator.wikimedia.org/T191072 - all good
 * This week:
 * Next week:
 * Log Health:
 * Code Health:

Callouts

 * Release Engineering

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Determine the procedure and requirements for an automated MediaWiki branch cut.
 * WHO: Mukunda, Tyler, Antoine

- Filed task for figuring out job storage for releases-jenkins

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Formalize the collection of CI infrastructure and tooling metrics
 * WHO: Dan, Antoine

TEC3 (Pipeline): Outcome 2 / Output 2.3

 * GOAL: Develop a set of metrics to assess incident reports/post-mortems
 * WHO: Greg, Zeljko

https://docs.google.com/spreadsheets/d/1AUqMgzThBHNL7DgI8C9PO_YQ1oD5CSd0iWvcVbowzdg/edit#gid=1154483822

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOALS:
 * Adopt more services into the Deployment pipeline
 * Migrate graphoid to the Deployment pipeline
 * Deploy zotero v2 to the Deployment pipeline
 * Deploy blubberoid
 * WHO: Dan, Tyler, Lars

- The docker-registry.wikimedia.org/wikimedia/mediawiki-services-zotero:20181019165254-production image exists.
- Still working on the OpenAPI spec for Blubber; reached a crisis around config format and validation (i.e. if we had used JSON, the API spec and validation would have been easier, unfortunately).

TEC12 (DevProd): Outcome 2 / Output 2.1

 * GOAL: The Annual Developer Productivity Survey results are synthesized and shared, creating a first year baseline.
 * WHO: Mukunda, Greg


 * We got a response from Legal; I need to finalize with them this week and create the survey in Google Forms.

TEC13 (Code Health): Outcome 1 / Output 1.1

 * GOAL: Update/refresh review queue (review process for initial code deployment)
 * WHO: JR

Little progress.

TEC13 (Code Health): Outcome 2 / Output 2.2

 * GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test
 * WHO: Zeljko

Made good progress with data analysis/transformation approaches.

TEC13 (Code Health): Outcome 2 / Output 2.3

 * GOAL: Assess Platform unit test practices and define improvement plan
 * WHO: JR, Core Platform Team

No progress.

TEC13 (Code Health): Outcome 3 / Output 3.2

 * GOAL: Core Platform and Search Platform teams are using TDM PoC
 * WHO: JR, Core Platform Team

No progress.

TEC13 (Code Health): Outcome 3 / Output 3.4

 * GOALS:
 * Identify key Tech Debt areas
 * Put in place Tech Debt management process for PEP
 * WHO: JR, Core Platform Team

No progress.

TEC13 (Code Health): Outcome 4 / Output 4.1

 * GOAL: Metrics defined and deployed for all 4 Code Health areas.
 * WHO: JR, Code Health Metrics Working Group

Task breakdown, discussion, and investigation of tools.

Phabricator

 * Fixed the deployment blocker task plugin so the task series goes in the right order
 * But wait, there's moar! The link to submit a blocking subtask uses the custom form created by Krinkle.

SCAP
Still investigating: https://phabricator.wikimedia.org/T121597#4652873
 * "probably not scap"; maybe eval.php or something writing to stderr
 * next...

Antoine

 * What I plan to do this week
 * What I'm blocked on
 * Other?
 * I wrote a first draft about testing extensions. It is still too rough to share; hoping to feel better so I can improve it.
 * The operations/dns repository has a test running on Jessie, which no longer has the latest version of our DNS server (gdnsd). We would like to migrate it to a Docker container.
 * I got DonationInterface tests to pass with Quibble; they need to update the composer merge plugin and their mediawiki/vendor branch. I have pinged Ejegg about it but have to push harder.
 * Wikibase, gotta audit tests run with Quibble and the ones run by the old jobs. If that matches, we can drop the old jobs (they run on Nodepool).

Dan

 * What I plan to do this week
 * Continue working on Blubber and the JSON conundrum
 * What I'm blocked on
 * Other?

Greg

 * What I plan to do this week
 * Platform Evolution Program's offsite on Sunday (yesterday)
 * TechConf Mon-Thur
 * Tech-Mgrs F2F Fri&Sat
 * Hiring
 * What I'm blocked on
 * Other?

Jean-Rene

 * What I plan to do this week
 * Update/refresh review queue
 * Metrics WG stuff
 * What I'm blocked on
 * Other?

Lars

 * What I plan to do this week
 * Learn more about how the deployment pipeline works, and should work
 * Update my write-up of that and put it in a suitable wiki
 * What I'm blocked on
 * Head exploding from cramming too much information too fast
 * Other?
 * Setting up home office

Mukunda

 * What I plan to do this week
 * Train
 * Finish dev productivity survey
 * Help Tyler code-review keyholder
 * Scap pre-deployment checks - https://phabricator.wikimedia.org/T121597


 * What I'm blocked on
 * Other?

Tyler

 * What I plan to do this week
 * crush the review of keyholder, finally
 * investigate scap pull
 * keep eye on gerrit upgrade/mailing list
 * investigate jenkins per-job security
 * poke zotero pipeline work where possible
 * What I'm blocked on
 * Other?

Zeljko

 * What I plan to do this week
 * Finish T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
 * What I'm blocked on
 * Other?

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart