Wikimedia Release Engineering Team/Checkin archive/20181015

= 2018-10-15 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * Early to mid-October: Antoine taking some weeks/days off or working part time (October 1-14 according to https://phabricator.wikimedia.org/E40)
 * October 21-28 - Greg in Portland for TechConf+TechMgrs F2F
 * November 1 (Thursday) - Holiday (All Saints' Day - Željko)
 * November 12th - Holiday (Veterans Day, Observed)
 * November 22+23 - Holidays (Thanksgiving)
 * November 25 - December 2: Mukunda vacation (in California ahead of the offsite)
 * Week of December 3rd - Team offsite
 * December 24-28 - Holidays (Christmas)

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R


 * Oct 08 - wmf.25 - Dan (No train due to DC switchover)
 * Oct 15 - wmf.26 - Mukunda < (last 1.32 wmf.XX release; 1.33 starts the following week)
 * Oct 22 - wmf.1 - Mukunda (warning: TechConf is happening; ping Greg if you need responses from anyone there)
 * Oct 29 - wmf.2 - Tyler
 * Nov 05 - wmf.3 - Tyler
 * Nov 12 - wmf.4 - Antoine
 * Nov 19 - wmf.5 - No Train (Thanksgiving)
 * Nov 26 - wmf.6 - Antoine
 * Dec 03 - wmf.7 - No Train (Offsite)
 * Dec 10 - wmf.8 - Zeljko
 * Dec 17 - wmf.9 - Zeljko
 * Dec 24 - wmf.10 - No Train (Holiday break)
 * Dec 31 - wmf.11 - No Train (Holiday break)
 * Jan 07 - wmf.12 - Dan
 * Jan 14 - wmf.13 - Dan
 * Jan 21 - wmf.14 - Mukunda
 * Jan 28 - wmf.15 - No Train (All Hands)
 * Feb 04 - wmf.16 - Mukunda
 * Feb 11 - wmf.17 - Tyler
 * Feb 18 - wmf.18 - Tyler
 * Feb 25 - wmf.19 - Antoine

SoS

 * Oct 10 - Zeljko
 * Oct 17 - Zeljko <
 * Oct 24 - Zeljko
 * Oct 31 - Zeljko
 * Nov 07 - Zeljko
 * Nov 14 - Zeljko
 * Nov 21 - Zeljko
 * Nov 28 - Zeljko
 * Dec 05 - Zeljko
 * Dec 12 - Zeljko
 * Dec 19 - Zeljko
 * Dec 26 - Zeljko
 * Jan 02 - Zeljko
 * Jan 09 - Zeljko
 * Jan 16 - Zeljko
 * Jan 23 - Zeljko
 * Jan 30 - Zeljko
 * Feb 06 - Zeljko
 * Feb 13 - Zeljko
 * Feb 20 - Zeljko
 * Feb 27 - Zeljko

Hiring

 * Software Engineer position is open; currently reviewing candidates and hiring
 * https://boards.greenhouse.io/wikimedia/jobs/1225258

"all candidates are good at being them" - Greg

First Offsite
Details:
 * Week of December 3rd
 * At the Queen Mary hotel in Long Beach
 * Deb T will be facilitating

Topics!
 * https://etherpad.wikimedia.org/p/RelEng-Offsite-201811-Topics

Needs attention

 * gerrit security release 2018-10-08
 * https://groups.google.com/forum/m/#!topic/repo-discuss/eH0iLt2XawU
 * JGit update; we are unaffected
 * may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836
 * 2018-10-15 -- paladox tells me they're working on a fix and should have a 2.15.6 tagged Soon™

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Release Engineering

 * Blocked by:
 * (MediaWiki-General-or-Unknown) T207288 Text in the Sidebar does no longer show the message text, only the message name
 * Blocking:
 * Fundraising Tech: CRM tests still regularly failing due to a full MySQL partition on integration hosts. Possible fix noted by Eileen on https://phabricator.wikimedia.org/T205950
 * Updates:
 * Interviewing on-going for our Developer Productivity position: https://boards.greenhouse.io/wikimedia/jobs/1225258?gh_src=f15731e11
 * Train Health:
 * Last week: no train (datacenter switchover); T191071 1.32.0-wmf.25 deployment blockers
 * This week: the last 1.32 release; T191072 1.32.0-wmf.26 deployment blockers
 * Still open/blocking: T207288 Text in the Sidebar does no longer show the message text, only the message name (MediaWiki-General-or-Unknown)
 * Resolved: T207220 AFComputedVariable.php: Argument to getLinksFromDB must be an instance of Article - the offending change has been identified and reverted.
 * Next week: 1.33 starts - T191072 1.32.0-wmf.26 deployment blockers
 * Log Health:
 * T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
 * Code Health:
 * Metrics group meets weekly
 * T207046 Code health metrics spike

Callouts

 * Release Engineering
 * Train blocked: (MediaWiki-General-or-Unknown) T207288 Text in the Sidebar does no longer show the message text, only the message name

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor


 * no train last week

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Determine the procedure and requirements for an automated MediaWiki branch cut.
 * WHO: Mukunda, Tyler, Antoine


 * No update this week
 * Need to decide whether to use JJB (Jenkins Job Builder) and, if so, where to keep it
 * TODO: thcipriani to create a task to discuss with the relevant folks

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Formalize the collection of CI infrastructure and tooling metrics
 * WHO: Dan, Antoine

TEC3 (Pipeline): Outcome 2 / Output 2.3

 * GOAL: Develop a set of metrics to assess incident reports/post-mortems
 * WHO: Greg, Zeljko


 * nothing this week

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOALS:
 * Adopt more services into the Deployment pipeline
 * Migrate graphoid to the Deployment pipeline
 * Deploy zotero v2 to the Deployment pipeline
 * Deploy blubberoid
 * WHO: Dan, Tyler, Lars


 * Zotero v2 blocked on the new release of blubber: https://phabricator.wikimedia.org/T206766
 * Alexandros said he'd look at it this week
 * Possibly pairing with Lars to get it set up
 * thcipriani: will schedule pairing session for CD pipeline setup for zotero v2 on Friday

TEC12 (DevProd): Outcome 2 / Output 2.1

 * GOAL: The Annual Developer Productivity Survey results are synthesized and shared, creating a first year baseline.
 * WHO: Mukunda, Greg


 * Legal wants to know about mailing lists and anonymized results.
 * Greg: please respond to the email to confirm that I've got the details correct.
 * subject: Developer Productivity Survey - Privacy Statement Request

TEC13 (Code Health): Outcome 1 / Output 1.1

 * GOAL: Update/refresh review queue (review process for initial code deployment)
 * WHO: JR


 * task breakdown activities

TEC13 (Code Health): Outcome 2 / Output 2.2

 * GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test
 * WHO: Zeljko


 * no activity last week

TEC13 (Code Health): Outcome 2 / Output 2.3

 * GOAL: Assess Platform unit test practices and define improvement plan
 * WHO: JR, Core Platform Team


 * no activity

TEC13 (Code Health): Outcome 3 / Output 3.2

 * GOAL: Core Platform and Search Platform teams are using TDM PoC
 * WHO: JR, Core Platform Team


 * no activity

TEC13 (Code Health): Outcome 3 / Output 3.4

 * GOALS:
 * Identify key Tech Debt areas
 * Put in place Tech Debt management process for PEP
 * WHO: JR, Core Platform Team


 * no activity

TEC13 (Code Health): Outcome 4 / Output 4.1

 * GOAL: Metrics defined and deployed for all 4 Code Health areas.
 * WHO: JR, Code Health Metrics Working Group


 * no meeting last week
 * WG members made some progress asynchronously
 * shared information about various tools
 * updated tasks with more core metric candidates

Selenium

 * T206624 Q2 Selenium framework improvements
 * T179188 Video recording for Selenium tests in Node.js
 * Waiting for clarification on code review feedback; I'm not sure what to do :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933


 * T206640 selenium-daily-beta-Popups CI job failing - will debug

Gerrit

 * Upgrade Gerrit to 2.15.5 (previously planned: 2.15.4)
 * may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836

Phabricator

 * Need to get phab1002 ready with Daniel

SCAP
Still investigating: https://phabricator.wikimedia.org/T121597#4652873
 * "probably not scap"; maybe eval.php or something writing to stderr

Antoine

 * What I plan to do this week
 * I turn 40 today, October 15th (it is also my 16th Wikipedia birthday)
 * 7-year hire anniversary on Wednesday, too :)
 * Doc about overhauling MediaWiki testing
 * Get back to phasing out Nodepool (really the vast bulk of it is done)
 * (done) Formally meet/chat with Lars
 * What I'm blocked on
 * (done) 700+ emails to triage
 * Other?
 * More wall painting

Dan

 * What I plan to do this week
 * Finish up Blubberoid Swagger spec
 * Get back on integration-prometheus now that CI is stable
 * What I'm blocked on
 * Other?

Greg

 * What I plan to do this week
 * Get back to Petaluma on Tuesday night
 * respond to legal re survey -- done
 * lots of last minute TechConf planning work
 * Quarterly Reviews tomorrow
 * probably some follow-up on l10nupdate
 * What I'm blocked on
 * Other?

Jean-Rene

 * What I plan to do this week
 * interviews
 * QCI prep/prez
 * QA strategy stuff
 * Update/refresh review queue
 * Metrics WG task creation/breakdown
 * What I'm blocked on
 * Other?

Lars

 * What I plan to do this week
 * Learn how the deployment pipeline currently works. IIUC it deploys one microservice to Kubernetes for now.
 * Also how it's meant to work.
 * Find and review any documentation relevant to this.
 * What I'm blocked on
 * Lost in a sea of accounts and services.
 * Other?

Mukunda

 * What I plan to do this week
 * Train
 * More troubleshooting of the scap pre-deploy fatal check.
 * Dev Productivity survey
 * Developer productivity interviews at 9:00 AM on Monday and Tuesday
 * What I'm blocked on
 * Legal: need privacy statement
 * Other?

Tyler

 * What I plan to do this week
 * pairing on Zotero v2 pipeline
 * probable gerrit upgrade this week
 * further troubleshooting of scap initial check
 * Carry over from last week
 * Releases-jenkins icinga stuff
 * Moar keyholder review
 * Docs for ORES github sync problem (with heavy disclaimer)
 * What I'm blocked on
 * Other?

Zeljko

 * What I plan to do this week
 * T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
 * T204068 QA: Automation Testing - port Echo Notification tests to Node.js
 * "60 seconds" task https://phabricator.wikimedia.org/maniphest/query/bUA0dYsX1iBb/#R
 * What I'm blocked on
 * Other?
 * T207018 RuntimeError: scap failed: average error rate on 4/11 canaries increased by 10x

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart