Wikimedia Release Engineering Team/Checkin archive/20190603

= 2019-06-03 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * June 6-7 - Brennen, Apogaea
 * June 10th - Antoine, Pentecost -- see https://en.wikipedia.org/wiki/Eastertide for Antoine/France Easter holidays
 * June 10 – July 21 - Dan leave (6 weeks, then additional leave later)
 * June 19 (Juneteenth) - US Staff - on a Wednesday!?
 * June 20 - Željko, Corpus Christi
 * June 25 - Željko, Statehood Day


 * July 4 (US Independence Day) - US Staff
 * July 22 - August 9 - Željko vacation
 * July 22 - Lars, Midsummer


 * August 7–19 - James off (inc. Wikimania)
 * August 12 - September 8 - Dan leave
 * August 12 (Glorious Twelfth) - US Staff
 * August ??? - ??? - Antoine
 * August 14–18 - Wikimania
 * Attending: James, Lars, ? …
 * August 15 - Željko, Assumption of Mary
 * August 25 - September 4 - Brennen vacation


 * September 2 (Labor Day) - US Staff


 * October 14 (Indigenous Peoples' Day) - US Staff


 * November 11 (Veterans' Day) - US Staff
 * November 28–29 (Thanksgiving) - US Staff


 * December 6 - Lars, Finnish Independence Day
 * December 25–31 (Christmas) - US Staff
 * December 25-26 - Lars, Christmas


 * 2020 January 1 (New Year's Day) - US Staff, Lars

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R


 * May 27 - wmf.7 - Zeljko 😢
 * June 03 - wmf.8 - Zeljko 😭
 * June 10 - wmf.9 - No Train (SRE Summit)
 * June 17 - wmf.10 - Mukunda (but Juneteenth on the Wednesday? Yes.)
 * June 24 - wmf.11 - Mukunda
 * July 1 - wmf.12 - No train (Fourth of July)
 * July 8 - wmf.13 - Tyler
 * July 15 - wmf.14 - Tyler
 * July 22 - wmf.15 - Antoine
 * July 29 - wmf.16 - Antoine
 * Aug 5 - wmf.17 - one of Mukunda/Tyler (Antoine and Zeljko on vacation)
 * Aug 12 - wmf.18 - No Train (Wikimania) 😳 Last year we discussed not having train during Wikimania https://wikitech.wikimedia.org/wiki/Incident_documentation/20180717-Train
 * Aug 19 - wmf.19 - Zeljko (after Wikimania) 😱
 * Aug 26 - wmf.20 - Zeljko
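The numbering in the schedule above is simple weekly arithmetic: the week of May 27 is wmf.7, and every later week adds one, including the no-train weeks. A minimal Python sketch of that mapping follows; the helper name `wmf_version_for` is hypothetical and not part of any RelEng tooling.

```python
from datetime import date

# Anchor taken from the schedule above: the week of 2019-05-27 is wmf.7.
ANCHOR_MONDAY = date(2019, 5, 27)
ANCHOR_VERSION = 7


def wmf_version_for(monday: date) -> str:
    """Return the 1.34.0-wmf.N label for the week starting on `monday`.

    Hypothetical helper: assumes one version number per calendar week,
    with no-train weeks still consuming a number, as in the list above.
    """
    if monday.weekday() != 0:
        raise ValueError(f"{monday} is not a Monday")
    weeks = (monday - ANCHOR_MONDAY).days // 7
    if weeks < 0:
        raise ValueError("date is before the anchor week")
    return f"1.34.0-wmf.{ANCHOR_VERSION + weeks}"


print(wmf_version_for(date(2019, 8, 19)))
```

The call for August 19 lines up with the wmf.19 entry in the list.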

SoS

 * Zeljko 4eva! :)

Timespent spreadsheet

 * For the avoidance of doubt: fill out the sheet week number for the previous week


 * link to week starting May 27: https://docs.google.com/spreadsheets/d/1urCLNQXeEi1DOR8Iu0qW0yPt-glxX1laqlMovbGyCW0/edit#gid=1684896476

Book club

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
 * Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
 * Next: June 14th, chapters 10+11 (9am Pacific)

Spring Offsite
Follow-ups:


 * Greg: email mark about capex request for next year for pipeline
 * I'm actually not sure what this is about/what the ask is, help?!
 * "staging" pipeline?
 * Production access?
 * CapEx budget now locked.


 * ????: re Integration environments: establish SLAs between the teams for what is their responsibility and ours, what is the working relationship
 * I think there's something more here that needs to be fleshed out, see the relevant section here: https://docs.google.com/document/d/1Y-cYrPKT0dvN2oj0hScIjRjkM2zWL5NY9xMYfMuC2Do/edit?ts=5c9cd50b#heading=h.vbm26ktfhprv
 * Greg: flesh out/say more on this
 * 2019-05-13: not yet...


 * Mukunda: talk with Timo and Filippo about our prioritized list of feature requests for LMM
 * Note: Gergo confirmed that SRE is going to work on Sentry in Q1/Q2 (from a conversation with Faidon and Filippo)
 * See: https://docs.google.com/document/d/1Y-cYrPKT0dvN2oj0hScIjRjkM2zWL5NY9xMYfMuC2Do/edit?ts=5c9cd50b#heading=h.ra3pbkbq71i4
 * 2019-06-03: Mukunda sent a discussion-starting email to Timo and Filippo
 * Filippo responded: he did not seem to think that we are close to having sentry in production.
 * Greg: announce that RelEng is backup only for SWAT (removing people’s names so they are not pinged every time on IRC) and we’ll start working on automating the train
 * Still need to do Q4 goals...table this “doing” until Q1?
 * Greg will send a signed email if someone writes it up ;)
 * Željko will write the e-mail this week - done
 * Greg to follow-up...

Fall Offsite + TechConf19

 * Travel travel travel!
 * Two short trips, or one long?
 * https://docs.google.com/forms/d/e/1FAIpQLScxVG8xz_CCacGusirtxrz2dfGKVvHZ5jes4attCh0BtdVcjw/viewform?usp=sf_link

Monthly reflection on accomplishments - May '19 edition

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
 * Add as you have them!


 * Phabricator vandalism rollback tool completed 🎉 (blog post? 😉)
 * Upgrade Zuul to 2.5.1-wmf6 (which unblocks the Gerrit upgrade to 2.16) - https://phabricator.wikimedia.org/T208426
 * Team offsite in Chicago
 * Repository-hosted CI/CD pipeline configurations now supported (.pipeline/config.yaml) - https://phabricator.wikimedia.org/T210267
 * Train notes published on branch cut
 * Codehealth pipeline beta - https://phabricator.wikimedia.org/phame/live/1/post/160/introducing_the_codehealth_pipeline_beta/
 * Some baseline local development images published

Annual Reviews
Overview: https://office.wikimedia.org/wiki/FY_2018-19_Annual_Review_and_Retrospective
 * Note: there is a workshop you can attend to get advice: https://office.wikimedia.org/wiki/FY_2018-19_Annual_Review_and_Retrospective#Sprints_&_trainings_-_support_from_T&C

Deadlines
Everyone:
 * Starting now: You and I discuss who your peer reviewers should be
 * April 26th: Enter your peer reviewers into Namely (please run them by me first)
 * May 17th: Deadline to complete self-reviews, peer reviews, and reviews of your manager.
 * May 20th: I start reviewing the peer reviews and writing my feedback on you.

Non SafeGuard (aka US Employees):
 * June 14th: Deadline for managers to complete all 1:1 meetings with direct reports and provide written feedback in Namely.

SafeGuard:
 * June 14th - Managers of those employed by Safeguard submit their reviews to HR for submission to Safeguard
 * July 12th - Deadline to have a 1:1 and share final manager review with direct report in Namely

Incoming/Needs attention

 * REL1_33 branching for extensions: https://phabricator.wikimedia.org/T220653
 * Reedy said he'll move forward with rc0 announcement soon.
 * Mukunda tried to run the script but it ran into trouble. Will re-try, manually.
 * Switching on HTTP Auth again still seems blocked. Barricade should help with this; review when Tyler gets back.
 * Update 2019-06-03: Fighting fires last week; should be able to do this this week.


 * CI Node 10 migration – let's JFDI? https://phabricator.wikimedia.org/T222406 Will need to pair with a CI expert (hashar?)
 * James and Antoine to pair next week.
 * Update 2019-06-03: In progress. Paired on it for a couple of hours so far today, more to come.


 * We need to merge this patch publicly:
 * https://phabricator.wikimedia.org/T205563#4999568
 * Tyler can do this


 * Jenkins plugin update
 * https://phabricator.wikimedia.org/T224745
 * Antoine to do in the EU morning this week


 * Phabricator puppet patch needing to be merged
 * https://gerrit.wikimedia.org/r/c/operations/puppet/+/513713 && https://phabricator.wikimedia.org/T224752
 * Blocked on SRE, obvs


 * Enhance MediaWiki deployments for support of php7.x
 * https://phabricator.wikimedia.org/T224857
 * Tyler can reply this week


 * [Wikitech-l] Separating MediaWiki core unit and integration tests — help welcome!
 * https://lists.wikimedia.org/pipermail/wikitech-l/2019-June/092136.html
 * James has looked

Incoming from last week

 * Blocking:

Release Engineering

 * Blocked by:
 * Core Platform Team (low priority): https://phabricator.wikimedia.org/T205361 is blocking undeployment of CodeReview.
 * SRE Traffic Team (low priority): https://phabricator.wikimedia.org/T213769 is blocking undeployment of Wikipedia Zero.
 * Wikidata: We need to update wikiba.se tests to PHP7 so we can drop php56 from CI. https://phabricator.wikimedia.org/T224905
 * Blocking:
 * Fundraising Tech: Need to update Fundraising Tech CiviCRM tests to PHP7: https://phabricator.wikimedia.org/T223348
 * Parsing: Can RelEng team take a look at https://phabricator.wikimedia.org/T221872 ? We seem to be babysitting merges a lot more than we would like to because of having to "recheck" patches frequently.
 * Updates:
 * Train Health
 * Last week: 1.34.0-wmf.7 - https://phabricator.wikimedia.org/T220732
 * This week: 1.34.0-wmf.8 - https://phabricator.wikimedia.org/T220733
 * Next week: 1.34.0-wmf.9 - NO TRAIN OR ANY OTHER DEPLOYS due to SRE Off-site
 * Code Health
 * Log Health

Callouts

 * Release Engineering

Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor


 * Need to fix scap clean :\
 * thcipriani has a crappy fix in mind until http tokens in gerrit are back
 * Any idea when HTTP tokens will come back? Weeks? Months? Never? :-(
 * ~Weeks
 * 2019-05-06: cleaned up stuff last week on deploy hosts, just not the gerrit branches
 * 2019-05-13: …
 * 2019-06-03: upstream issues/patches we want resolved before doing this
 * cf: https://phabricator.wikimedia.org/T218750#5128424
 * looks like these patches merged -- I'll check what release they're going out with


 * 1.33 branch cut for extensions is blocked (except tarball ones, which James did manually)
 * 2019-05-06: Mukunda to do it this week
 * Greg: email Cindy re process of this release
 * 2019-05-13: We talked on Thursday. Mukunda will review hexmode's work, Cindy will email Greg with plan of action re timeline.
 * 2019-06-03: See above.

Quarterly Goals for Q4
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q4

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Undeploy the CodeReview extension.
 * WHO: James, need help from CPT


 * James will ping CPT about this this week (April 8th)
 * … and again w/c 15 April.
 * … and again w/c 6 May (in SoS).
 * … and again w/c 27 May (in SoS).

TEC1 (Maint): Outcome 1 / Output 1.1

 * GOAL: Setup 1-3 of the CI WG options (Zuul v3, Argo, GitLab)
 * WHO:


 * Focus on a couple noteworthy repos: e.g.,
 * core
 * extensions
 * ops/puppet
 * Maybe setup in serial, i.e., a week per evaluation


 * Questions:
 * RelEng/Extended working group?
 * At least in the WG eval it was good to have non-familiar people
 * But maybe for the setup of options it would be beneficial to have people experienced with the current setup.
 * Folks outside the original working group to join-in to setup options; people TBD
 * Do we need a rubric before we do this prototyping? (yes)
 * DONE lars to work on rubric week of 2019-04-01
 * See email 2019-04-08
 * CI arch doc in team google drive now, open for feedback


 * 2019-05-06: Feedback from Android. Working on an arch document. Do in Q1?

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Instrument Quibble for data collection
 * WHO: Mukunda, Antoine



TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
 * WHO: Mukunda, Antoine


 * Blocked

TEC3 (Pipeline): Outcome 1 / Output 1.2

 * GOAL: Prepare the Deployment Pipeline for changes to our CI tooling.
 * WHO: ???, ???


 * Blocked by not having new CI tooling yet

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOAL: Create a .pipeline/config.yaml standard to give users more control over how their tests are run in the pipeline and allow the easy saving of artifacts at pipeline completion. (RelEng)
 * WHO: Dan, Tyler, ???


 * Implementation is working, but in testing a Blubber .pipeline/config.yaml there are some glaring deficiencies
 * Re: https://gerrit.wikimedia.org/r/c/blubber/+/511784/6/.pipeline/config.yaml
 * A high degree of repetition/duplication. Could use some sort of includes functionality (à la Blubber's variants) and/or a `defaults` section up top for `chart`, `blubberfile`, etc.
 * A configuration validation system with human-readable errors.
 * Long term concerns about Groovy implementation include:
 * Dependencies on Jenkins and many plugins
 * Groovy CPS is a huge pain to debug, and it's rarely clear that CPS is the issue when things go awry; instead, the code just executes in unexpected ways.
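One way to picture the `defaults` idea and the human-readable validation mentioned above: shared keys such as `chart` and `blubberfile` are declared once and overlaid onto each stage, and missing keys are reported in plain language. The sketch below is plain Python over dicts. The key names `defaults`, `stages` and `name`, and the list of required keys, are assumptions for illustration only, not the actual pipelinelib schema.

```python
from typing import Any

# Hypothetical required keys per stage; the real schema may differ.
REQUIRED_STAGE_KEYS = ("name", "blubberfile")


def expand_stages(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Overlay each stage on top of the shared defaults."""
    defaults = config.get("defaults", {})
    return [{**defaults, **stage} for stage in config.get("stages", [])]


def validate(stages: list[dict[str, Any]]) -> list[str]:
    """Return human-readable error messages; an empty list means valid."""
    errors = []
    for index, stage in enumerate(stages):
        label = stage.get("name", f"#{index}")
        for key in REQUIRED_STAGE_KEYS:
            if key not in stage:
                errors.append(f"stage {label}: missing required key '{key}'")
    return errors


# Example input, as it might look after parsing a YAML file into dicts.
config = {
    "defaults": {"blubberfile": "blubber.yaml", "chart": "example-chart"},
    "stages": [
        {"name": "test"},
        {"name": "publish", "chart": "other-chart"},
    ],
}
stages = expand_stages(config)
print(stages)
print(validate(stages))
```

Stage-level keys win over defaults here, so a stage can still override `chart` without repeating everything else.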

TEC3 (Pipeline): Outcome 3 / Output 3.1

 * GOALS:
 * Adopt more services into Deployment pipeline -
 * Wikidata Termbox SSR, Kask for Session Storage Service, cpjobqueue (stretch), ORES (stretch)
 * WHO: Dan, Tyler, Lars

There are tasks: https://phabricator.wikimedia.org/T220403


 * changeprop


 * ORES
 * cf: Dan's comments


 * Wikidata Termbox SSR


 * Kask for Session Storage Service


 * cpjobqueue (stretch)

TEC12 (DevProd): Outcome 1 / Output 1.1

 * GOAL: Provide an "Official" Docker base image for local development of MediaWiki based on the production tooling.
 * WHO: Jeena, Brennen
 * https://phabricator.wikimedia.org/T212449


 * Done for MediaWiki, for some values of "done" and "MediaWiki". Production-likeness needs considerable work.

TEC13 (Code Health): Outcome 1 / Output 3

 * GOALs: Presentation/session(s) at the Wikimedia Hackathon on the current state of Code Health projects (technical debt and code stewardship)
 * WHO: JR

TEC13 (Code Health): Outcome 1 / Output 1.1

 * GOAL:
 * Publish a re-imagination of the Review Queue process.
 * Develop and implement metrics around task and code-review responsiveness
 * WHO: Greg, JR (and Andre)


 * Closed call for participation in Code Review workgroup. Working on scheduling kickoff meeting.

TEC13 (Code Health): Outcome 4 / Output 4.2

 * GOALs:
 * Expand SonarQube reporting into CI infrastructure
 * Perform SonarQube analysis on all extensions
 * Engage user communities in direct feedback solicitation
 * WHO: JR, Zeljko, Code Health Metrics


 * We currently have 6 extensions in the new pipeline, will be at 10 within the week. Planning to push for the rest of extensions starting next week.
 * Core is not currently in a state to be added due to extended unit test run times. A project (non-CHMWG) is underway to separate unit from integration tests.
 * Looking to expand to other areas of analysis such as Python/SCAP. This will require some ramp-up and assistance from others.

Selenium

 * T223774 The first Selenium test for WikibaseCirrusSearch - started at the hackathon, have to finish it

Phabricator

 * git-ssh.wikimedia.org got broken by the migration to a new phab server.
 * This is still broken and I have no idea what is wrong or where to look for the solution.
 * We have exhausted all reasonable troubleshooting steps already.
 * Email-based replies to tasks were broken after the migration to a new server; this was a bug in puppet related to php-fpm: https://phabricator.wikimedia.org/T224752
 * Mukunda wrote a patch ( https://gerrit.wikimedia.org/r/c/operations/puppet/+/513713 )
 * it's hotfixed in prod
 * epriestley has been triaging a lot of issues on the phabricator-upstream workboard: https://phabricator.wikimedia.org/tag/phabricator-upstream/
 * Reading some of his comments reveals that there are tons of upstream changes that need to be merged and deployed.

QA/Code Health

 * Community project started for Core to start splitting unit and integration tests.

Antoine

 * What I plan to do this week
 * Mail backlog/catchup
 * CI node 10.
 * Civicrm to docker - Run wikimedia/fundraising/crm CI jobs on PHP7x, not PHP5x - https://phabricator.wikimedia.org/T223348
 * composer-package-php73-docker seems to fail often on Parsoid builds - https://phabricator.wikimedia.org/T221872
 * Quibble refactor
 * What I'm blocked on
 * Other?

Brennen

 * What I plan to do this week
 * Remove Docker dependency from local-charts: https://phabricator.wikimedia.org/T223715
 * Get various incoming local-dev patchsets merged.
 * What I'm blocked on
 * Nada
 * Other?
 * Off-grid June 6-9
 * If I seem to be operating at like 20% of my cognitive baseline, it's 'cause allergy season just started in Colorado.

Dan

 * What I plan to do this week
 * Wrap up before leave
 * How to best pass off pipelinelib work?
 * Meeting with Analytics delayed. They want a document of use cases. JR or Mukunda may want to pick this up or wait. Either works for me
 * JR: "We'll give it a go"
 * Get calendar in order and decline meetings
 * What I'm blocked on
 * Nothing really
 * Other?
 * Thoughts on MediaWiki dependency resolution/installation
 * https://phabricator.wikimedia.org/T193824#5227012
 * A "dev-requires" field has been added to extension.json schema, and a maintenance/checkDependencies.php script is now in core
 * I rambled about an idea of aggregating dependencies into a central service on postmerge, a service that could potentially map extension/skin version constraints to git remotes/refs. This needs fleshing out if it's generally sane.
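To make the central-service idea above a little more concrete, here is a rough Python sketch of the lookup such a service might perform: given an extension name and a version constraint, return a git remote and ref. Everything in it is hypothetical, including the in-memory `INDEX`, the example URLs, the `resolve` function, and the restriction to `>=` constraints; it is not based on any existing MediaWiki tooling.

```python
# Hypothetical in-memory index: extension name -> {version: (remote, ref)}.
# A real service would presumably populate this on postmerge.
INDEX = {
    "ExampleExtension": {
        (1, 0, 0): ("https://example.org/ExampleExtension.git", "refs/tags/1.0.0"),
        (1, 2, 0): ("https://example.org/ExampleExtension.git", "refs/tags/1.2.0"),
    },
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(name: str, constraint: str) -> tuple[str, str]:
    """Pick the newest indexed version satisfying a '>= X.Y.Z' constraint.

    Only the '>=' operator is handled in this sketch.
    """
    operator, _, wanted = constraint.partition(" ")
    if operator != ">=":
        raise ValueError(f"unsupported constraint: {constraint!r}")
    minimum = parse_version(wanted)
    candidates = [v for v in INDEX.get(name, {}) if v >= minimum]
    if not candidates:
        raise LookupError(f"no version of {name} satisfies {constraint!r}")
    return INDEX[name][max(candidates)]


print(resolve("ExampleExtension", ">= 1.1.0"))
```

A fuller version would need the complete constraint syntax and a policy for branches versus tags, which is part of what still needs fleshing out.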

Greg

 * What I plan to do this week
 * Annual Reviews writing
 * TechConf19
 * DMV appointment on Thursday :-(
 * Read book, hopefully!
 * What I'm blocked on
 * The linearity of time.
 * Other?
 * SRE summit week of Jun 9

James

 * What I plan to do this week
 * Node 10 CI stuff
 * Pipeline documentation
 * Unit vs. Integration test split help
 * What I'm blocked on
 * Extension undeployment stuff, as before.
 * Other?
 * Whatever blows up.

Jean-Rene

 * What I plan to do this week
 * More time on Code Stewardship review
 * Continued Code Review workgroup setup/planning
 * Work with CPT on Integration Testing framework decision
 * What I'm blocked on
 * Other?

Jeena

 * What I plan to do this week
 * Organize local-charts backlog/workboard
 * Create phabricator task for liveness/readiness
 * Discuss/plan interface for interacting with local-charts
 * Read book
 * What I'm blocked on
 * Other?

Lars

 * What I plan to do this week
 * reading CD book
 * update CI arch doc, and reach out more for more feedback
 * look at installing GitLab somewhere
 * What I'm blocked on
 * Other?

Mukunda

 * What I plan to do this week
 * Lots of upstream changes to Phabricator that need to merge and deploy
 * git-ssh.wikimedia.org is still broken
 * Create release branches for mediawiki extensions: https://phabricator.wikimedia.org/T220653
 * continue working with MarkAHershberger on the branching and release automation stuff: https://phabricator.wikimedia.org/T222829


 * What I'm blocked on
 * yaks
 * Other?

Tyler

 * What I plan to do this week
 * get things from dan's brain
 * l10n checker patches
 * scap release
 * reply on "rethinking deployment" task
 * dig out from my vacation email backlog
 * dcausse annual review
 * What I'm blocked on
 * Other?

Zeljko

 * What I plan to do this week
 * T220733 1.34.0-wmf.8 deployment blockers
 * T223774 The first Selenium test for WikibaseCirrusSearch
 * What I'm blocked on
 * Other?
 * Didn't do a swat since the hackathon :D

Team Kanban Board Review and Triage

 * closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart