Wikimedia Release Engineering Team/Checkin archive/20190313

= 2019-03-13 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * April 9-12: Greg at tech-mgt F2F in Portland
 * April 17-19 (Wednesday - Friday) - Željko vacation
 * April 22 (WMF Holiday) - US Staff
 * April 22-27: Team offsite in Chicago
 * April 29: Moved WMF Holiday for US staff at offsite
 * May 1st - Lars, Antoine and Željko, Labor Day / May Day
 * May 8th - Antoine, 1945 victory
 * May 15 (Wednesday) - Željko vacation
 * May 16-20 - Wikimedia Hackathon 2019 (Prague, Czechia)
 * Attending: Greg, JR, Zeljko, James, and Jeena
 * May 30th-31st - Antoine, Feast of the Ascension
 * June 10th - Antoine, Pentecost -- see https://en.wikipedia.org/wiki/Eastertide for Antoine/France Easter holidays
 * May 27 (Memorial Day) - US Staff
 * June 6-7 - Brennen, Apogaea
 * June 19 (Juneteenth) - US Staff
 * June 17 - July 5 - Željko vacation

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R


 * Jan 07 - wmf.12 - Dan
 * Jan 14 - wmf.13 - Dan
 * Jan 21 - wmf.14 - Mukunda
 * Jan 28 - wmf.15 - No Train (All Hands)
 * Feb 04 - wmf.16 - Mukunda
 * Feb 11 - wmf.17 - Tyler
 * Feb 18 - wmf.18 - Tyler
 * Feb 25 - wmf.19 - Antoine
 * Mar 04 - wmf.20 - Antoine
 * Mar 11 - wmf.21 - Zeljko
 * Mar 18 - wmf.22 - Zeljko
 * Mar 25 - wmf.23 - Dan
 * Apr 01 - wmf.24 - Dan
 * Apr 08 - wmf.25 - Mukunda
 * Apr 15 - wmf.26 - Mukunda
 * Apr 22 - 1.34.0-wmf.1 - NO TRAIN, team offsite
 * Apr 29 - wmf.2 - Tyler
 * May 06 - wmf.3 - Tyler
 * May 13 - wmf.4 - Antoine
 * May 20 - wmf.5 - Antoine
 * May 27 - wmf.6 - Zeljko
 * June 03 - wmf.7 - Zeljko

SoS

 * Zeljko 4eva! :)

Book club

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
 * Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
 * Next: March 21st at the "same" time (9am Pacific/16:00 UTC)

Spring Offsite

 * Location: Chicago, IL (Central timezone, UTC-5 while we're there)
 * Dates: Arrive Monday 4/22, Depart Saturday 4/27.
 * BOOK YOUR FLIGHTS BY: March 21
 * Activity day
 * Fill out the spreadsheet: https://docs.google.com/spreadsheets/d/1zqO8Mk1wUU2ZtyAM9xU68CQTpJFEOPALfDKCj7aMNo4/edit
 * Program:
 * start listing your topics! https://etherpad.wikimedia.org/p/releng-offsite-201904-topics

Monthly reflection on accomplishments - March '19 edition

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
 * Add as you have them!


 * CI tooling future WG started, blogged
 * GerritBot comments on patches going through the pipeline (with fancy badges and the like)
 * Train deploy notes are now automatically generated on branch push
 * Scap 3.9.2-1 released in production
 * Phabricator upgrade: https://phabricator.wikimedia.org/phame/post/view/147/projects_forms_and_subtypes_oh_my/
 * Published the ISOSTWG results and recommendation on officewiki and announced: https://office.wikimedia.org/wiki/Internal_Support_for_Open_Source_Tools_Working_Group

Q4 Goals planning

 * etherpad: https://etherpad.wikimedia.org/p/releng-1819Q4-goals
 * Due: Monday March 18th, aka this Friday


 * Q3 goals question from debt:
 * https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/TEC1:_Reliability,_Performance,_and_Maintenance/Goals#Q3_Goals
 * both done

Annual Planning is coming up

 * I emailed mark re future testing/"evaluation" environments
 * See notes here: https://docs.google.com/document/d/1QU_6Svn4iduK0TPLSOghYP4g1lK-byCv-0ZKoHfIAVY/edit#heading=h.6gq2j7lm5pz8

Pywikibot CI

 * https://phabricator.wikimedia.org/T132138
 * Antoine to take a time boxed look into this, this week

Post-mortem "MWException: No localisation cache found for English."
> I think we missed running a scap pull and the cache generation. [when the server was repooled]
> So that is a glitch in how we repool a MediaWiki server?
 * https://phabricator.wikimedia.org/T217719
 * next steps?
 * greg to follow-up

Merge blocker: The table 'l10n_cache' is full in quibble-vendor-mysql-hhvm-docker

 * https://phabricator.wikimedia.org/T217654
 * "The bump from 256M to 320M must be good enough and I have updated the Jenkins jobs. Lowering priority to High." -- https://phabricator.wikimedia.org/T217654#5020364

Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11)

 * https://phabricator.wikimedia.org/T216689
 * "I have rollbacked the jobs container:" -- https://phabricator.wikimedia.org/T216689#5020757
 * See T218209 though. :-(

Merge blocker: Failed to create /nonexistent/.pki/nssdb directory

 * https://phabricator.wikimedia.org/T218209
 * Caused by revert for T216689?

FYI: Wikimedia-production-error (Shared Build Failure)

 * https://phabricator.wikimedia.org/project/profile/3298/

Cannot access beta cluster db

 * https://phabricator.wikimedia.org/T217938
 * Mukunda to take a look

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Incoming from last week

 * Blocking:

Release Engineering

 * Blocked by:
 * Blocking:
 * Language: Several CI failures
 * Readers Infrastructure: Review needed for deploying Extension:WikimediaEditorTasks to production (https://phabricator.wikimedia.org/T218136 )
 * Search Platform: Thanks RelEng for working on https://phabricator.wikimedia.org/T216689
 * Updates:
 * Work progresses on CI tool evaluation https://phabricator.wikimedia.org/phame/post/view/149/work_progresses_on_ci_tool_evaluation/
 * Train Health:
 * Last week: 1.33.0-wmf.20 - https://phabricator.wikimedia.org/T206674
 * This week: 1.33.0-wmf.21 - https://phabricator.wikimedia.org/T206675
 * Next week: 1.33.0-wmf.22 - https://phabricator.wikimedia.org/T206676
 * Code Health:
 * SonarQube is available as an experimental job for all extensions https://gerrit.wikimedia.org/r/c/integration/config/+/490950

Callouts

 * Release Engineering

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor


 * minor issue in MFE yesterday (undeclared variable, somehow not caught somewhere first)

Quarterly Goals for Q3
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q3

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Automate the generation of change log notes
 * WHO: Mukunda, (Tyler on backup)


 * should now run on branch cut https://integration.wikimedia.org/ci/job/train-deploy-notes/
 * problem with ref filter: https://gerrit.wikimedia.org/r/#/c/integration/config/+/494778/

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Investigate notification methods for developers with changes that are riding any given train
 * WHO: Mukunda, Tyler

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Instrument Quibble for data collection
 * WHO: Mukunda, Antoine

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
 * WHO: Mukunda, Antoine

TEC3 (Pipeline): Outcome 2 / Output 2.1

 * GOAL: Select and integrate a code health metric solution into our tooling.
 * WHO: JR, ...

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOALS:
 * Adopt more services into Deployment pipeline -
 * cxserver, ORES (partially), citoid, changeprop, cpjobqueue (stretch)
 * Deploy eventgate
 * WHO: Dan, Tyler, Lars


 * cxserver
 * Images built via deployment pipeline
 * Namespaces created for k8s eqiad/codfw
 * helm charts created


 * ✅ citoid
 * Images built via deployment pipeline
 * Deployed
 * Traffic switched


 * changeprop


 * ✅ eventgate
 * Image built via pipeline
 * Chart
 * Deployed


 * ORES
 * cf: Dan's comments

TEC12 (DevProd): Outcome 1 / Output 1.1

 * GOAL: Conduct interviews with development stakeholders and compile a report that informs future work and the creation of a rubric.
 * WHO: Jeena, Mukunda


 * Results are posted: https://www.mediawiki.org/wiki/Developer_Satisfaction

TEC13 (Code Health): Outcome 1 / Output 1.1

 * GOALs:
 * Develop and communicate guidelines and best practices for successful Code Stewardship.
 * (Continued from Q2) Update/refresh review queue (review process for initial code deployment)
 * WHO: JR


 * Created mockup for Code Stewardship dashboard
 * Created metrics tracking spreadsheet

TEC13 (Code Health): Outcome 2 / Output 2.2

 * GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test -
 * WHO: Zeljko

TEC13 (Code Health): Outcome 2 / Output 2.3

 * GOALs:
 * Evolve/develop tools and processes to support the PE refactoring effort to improve code health.
 * Develop a common test strategy that enables teams to engage in more effective and efficient testing practices. (maybe should be output 2.4?)
 * WHO: JR, Core Platform Team


 * made progress on addressing some of the action items from discussions with CPT
 * Started putting strategy to paper

TEC13 (Code Health): Outcome 3 / Output 3.2

 * GOALs:
 * Speak at All Hands on the status of Technical Debt
 * Engage and coach development teams on their approach to managing technical debt.
 * WHO: JR, Core Platform Team


 * This goal area to be absorbed into broader Code Health goals moving forward.

TEC13 (Code Health): Outcome 4 / Output 4.1

 * GOALs: Code Health Dashboard with 50% of repositories covered.
 * WHO: JR, Core Platform Team


 * SonarQube is available as experimental job for all extensions. Key step towards general availability of Code Health metrics dashboard.

Phabricator

 * James: I'm very excited that https://secure.phabricator.com/T10578 and https://secure.phabricator.com/T10333 are now Resolved upstream. It's only been three years. ;-)
 * The `user.transactions` API method is now deployed to production; this will facilitate rollback of vandalism should anyone get past the antivandalism extension.

Jenkins

 * Gerrit 2.15.11 still needs to be deployed due to healthcheck rollback

Antoine

 * What I plan to do this week
 * What I'm blocked on
 * Other?
 * Other?
 * Other?

Brennen

 * What I plan to do this week
 * CI WG
 * Evaluate Zuul v3 - https://phabricator.wikimedia.org/T218138
 * Pivotal/Concourse discussion
 * Rough out docker-pkg templates for use by local-charts
 * Script sshfs setup in local-charts
 * Revisit docs questions - https://phabricator.wikimedia.org/T217614
 * What I'm blocked on
 * Other?
 * Other?

Dan

 * What I plan to do this week
 * Evaluate Tekton for CI WG https://phabricator.wikimedia.org/T217912
 * Modify blubber.yaml configs in projects for v4 https://phabricator.wikimedia.org/T218142
 * Deploy blubberoid
 * Draft email to Analytics about feedback on Jenkins/Gerrit event-log datastore
 * Begin implementation of .pipeline/config.yaml https://phabricator.wikimedia.org/T210267
 * What I'm blocked on
 * Other?
 * Other?

Greg

 * What I plan to do this week
 * Slides for c-level/board(?) meeting at end of month
 * Book reading
 * TechConf planning with Deb (meeting with big group on Monday)
 * What I'm blocked on
 * Other?
 * Other?

James

 * What I plan to do this week
 * Mostly still working with the Multimedia team on SDC stuff
 * Book reading!
 * What I'm blocked on
 * Other?
 * Other?

Jean-Rene

 * What I plan to do this week
 * Work on stewardship best practices, including relocating the Code Stewardship page
 * Work on test strategy goal
 * What I'm blocked on
 * Other?
 * Other?

Jeena

 * What I plan to do this week
 * Work on Localsettings in local-charts (automate manual config/install steps)
 * Other local-charts work
 * Read Book
 * What I'm blocked on
 * Other?
 * Other?

Lars

 * What I plan to do this week
 * CI WG
 * Pivotal meeting
 * Concourse
 * Read CD book
 * What I'm blocked on
 * possibly getting ill
 * Other?

Mukunda

 * What I plan to do this week
 * look into beta cluster db issue ( https://phabricator.wikimedia.org/T217938 )
 * Phabricator, Phabricator, Phabricator
 * Finish rolling out the Vandalism rollback stuff with Andre
 * More dabbling with phabricator on minikube
 * Read a book ( https://www.youtube.com/watch?v=GlKL_EpnSp8 )
 * What I'm blocked on
 * Other?
 * Other?

Tyler

 * What I plan to do this week
 * Deploy notes fix deployment
 * Gerrit 2.15.11 re-rollforward
 * GerritHealthCheckBot setup for healthcheck plugin
 * blubber policyfile
 * update docker-pkg documentation
 * What I'm blocked on
 * review for https://gerrit.wikimedia.org/r/#/c/integration/config/+/494778/
 * Other?
 * code health metrics (Kosta) blocked on releng (Tyler/Antoine) https://gerrit.wikimedia.org/r/c/integration/config/+/494548

Željko

 * What I plan to do this week
 * T206675 1.33.0-wmf.21 deployment blockers
 * T217901 Evaluate Phabricator Harbormaster
 * Mukunda will be glad to have a 1:1 if you'd like help with this one.
 * T214478 The first Selenium test for AbuseFilter
 * T217051 Echo notifications automation smoke test
 * What I'm blocked on
 * code health metrics (Kosta) blocked on releng (Tyler/Antoine) https://gerrit.wikimedia.org/r/c/integration/config/+/494548
 * thcipriani: I talked to Kosta a bit about this on Friday, I'd like to make sonarqube be triggered after the existing coverage jobs rather than reimplement the coverage jobs (I think that makes sense anyway)
 * Other?
 * Google calendar and Deployments calendar are not in sync :(

Team Kanban Board Review and Triage

 * closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart