Wikimedia Release Engineering Team/Checkin archive/20180115

= 2018-01-15 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * Jan 15 (Mon): Martin Luther King Day (All US Staff)
 * Jan 22/23: Dev Summit
 * Jan 24: Tech Management F2F
 * Jan 25/26: WMF All Hands
 * Jan 29-31: Team offsite
 * Feb 19 (Mon): President's Day (All US Staff)
 * Mar 30 (Fri): WMF Holiday
 * April 14 (Fri): WMF Holiday
 * May 15?/16/17: Team offsite in Barcelona
 * May 18-20: Wikimedia Hackathon in Barcelona
 * May 21 (Mon): Tech-Mgt F2F

Rotating positions and absences
Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R

Jan 15 and Jan 22

 * Train: Tyler
 * wmf.17
 * wmf.18 - NO TRAIN THIS WEEK
 * SoS: Mukunda
 * Out
 * Jan 15 (Mon): Martin Luther King Day (All US Staff)
 * Jan 22/23: Dev Summit
 * Jan 24: Tech Management F2F
 * Jan 25/26: WMF All Hands

Jan 29 and Feb 05

 * Train: Chad
 * wmf.19 - NO TRAIN THIS WEEK
 * wmf.20
 * SoS: Tyler
 * Out
 * Jan 29-31: Team offsite

Feb 12 and Feb 19

 * Train: Mukunda
 * wmf.21
 * wmf.22
 * SoS: Chad
 * Out:
 * Feb 19 (Mon): President's Day (All US Staff)

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Release Engineering

 * Blocking
 * None?
 * Blocked
 * "Stack overflow when Redis is down" - https://phabricator.wikimedia.org/T185055
 * Need help from Operations and/or Performance
 * Updates
 * Catching up the train this week and rolling out the last version before DevSummit/All Hands and RelEng team offsite weeks. [wiki][email]
 * https://phabricator.wikimedia.org/T180749#3897321
 * We moved Wednesday morning’s SWAT window 1 hour earlier (to 10am) to give us an hour's break before the new MW version rolls to the second set of wikis (all non-Wikipedias). This was a follow-up from a recent post-mortem. [wiki][email]
 * https://lists.wikimedia.org/pipermail/wikitech-l/2018-January/089404.html
 * https://phabricator.wikimedia.org/T182733
 * We broke git-fat deploy repos in scap (old config no longer valid); a workaround/fix is available in all relevant repos.
 * https://phabricator.wikimedia.org/T184882#3899710
 * (Yes, we’re re-doing how the CI for scap is done, see: https://phabricator.wikimedia.org/T184628 )
 * Updated the Debian packaging for Zuul (CI task scheduler) and released 2.5.0-8-gcbc7f62-wmf6, unblocking an upgrade of Gerrit.
 * https://phabricator.wikimedia.org/T158243
 * Converted our home-grown docker image builder to `docker-pkg` from Giuseppe
 * https://phabricator.wikimedia.org/T177276
 * Getting started with the basics of planning our team offsite before the Barcelona Hackathon. Submitted travel request form and let eng-admin@ know.
 * Working on browser tests with Search (“selenium-CirrusSearch-jessie daily Jenkins job”).

Last week

 * Blocking
 * None?
 * Blocked
 * ops: zuul package update (blocks gerrit upgrade)
 * ops: node-tunnel-agent package update (blocks moving node testing to docker in ci)
 * Updates
 * 2 weeks of normal MediaWiki deploys (this and next) followed by 2 weeks of no MediaWiki but SWATs as needed (DevSummit/All Hands followed by RelEng team offsite)
 * Currently building nightlies of MediaWiki on the new “releases” (aka non-CI) Jenkins host. Working with Security on the best way to handle security patches; the goal is to ensure security patches stay cleanly applicable.

Puppet SWAT

 * list of patches you want to submit to Puppet SWAT

Logspam \ Last week's train updates

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

 * Train was rolled back on Thursday due to a critical bug introduced by the new Revision storage infrastructure: https://phabricator.wikimedia.org/T184749
 * The problem was apparently fixed on Friday, but the fix missed the window for deployment during the week.
 * Monday was a US holiday.
 * Therefore, 1.31.0-wmf.16 is finally to be deployed on Tuesday, January 16th, just as wmf.17 is being cut from master.

Q3 goal/project check-in

 * All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q3

Program 1: Outcome 5: Milestone 1: Develop and migrate to a JavaScript-based browser testing stack

 * Due: End of this quarter
 * What: Specific improvements to the now canonical framework, see: task T182421, notably:
 * Upgrade webdriverIO to version 4.9
 * Investigate replacing nodemw with mwbot
 * Video recording for Selenium tests in Node.js
 * Task:


 * T175179 Create selenium-CirrusSearch-jessie daily Jenkins job
 * Talked with David Causse and we decided that he will rewrite smoke tests (just a few tests in one file) from Cucumber to Mocha.
 * The WIP commit is created https://gerrit.wikimedia.org/r/#/c/381785/
 * Job configuration is ready: https://gerrit.wikimedia.org/r/#/c/398030/
 * Test job is created and working: https://integration.wikimedia.org/ci/view/Selenium/job/selenium-CirrusSearch-jessie-381785/

Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure

 * Goals
 * Draft requirements for a Kubernetes based solution for CI -
 * Migrate MediaWiki PHPUnit tests to Shipyard (docker-based CI) (~40% of Nodepool usage) -
 * Unify production and CI docker image build process -



Program 3: Outcome 1: Objective 2: Identify and find stewards for high-priority/high use code segment orphans

 * Due: End of quarter


 * SLA definition
 * The SLA structure is currently dependent on task/bug priority. As this is not a consistently used attribute, basing SLAs on it could be problematic. Started a dialog with Andre to see what alternatives we might have (such as severity) to segment the bugs into manageable sizes.


 * Stewardship definition is being reviewed by Toby and Victoria. Desire is to get their support in rolling this out across WMF.

Program 3: Outcome 2: Objective 2: Define and implement a process to regularly address technical debt across the Foundation

 * Due: End of quarter


 * Restarted work on TechDebt blog post series. Targeting 1/18 for review and the following week for publication.

Program 3: Outcome 2: Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s))

 * Due: End of next quarter

 * No progress

Program 6: Outcome 2: Objective 2: Set up a continuous integration and deployment pipeline

 * Due: End of this quarter
 * Keyword: SSD
 * phab project: https://phabricator.wikimedia.org/project/view/2453/
 * Goal:
 * Verify basic functionality of 'production' deployment and image (initially targeting mathoid):
 * Functional PoC within integration in the deployment-pipeline
 * Deploy to isolated k8s


 * Minikube packaging going slowly but happening (may need to pair with Dan at some point)

Program 1: Outcome 1: Objective 1: Scap (Tech Debt Sprint FY201718-Q2)

 * workboard

Program 1: Outcome 6: Milestone 2: Maintain Phabricator

 * Streamline logspam workflows by adding some integration with phabricator
 * Store git-lfs (and other phab uploads) in swift:

Other work
 * New service reviews and the review queue
 * One of the outcomes of a recent post-mortem review meeting was the desire to better understand what we currently do to review new components/extensions/services prior to their first deployment to production. In addition to the initial review, I am also investigating what ongoing reviews are done on deployed components/extensions/services.
 * Started conversation with Marko and Daniel on this topic. Goal is to see if "active stewardship" should be one of the pre-deployment requirements.

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng Review To Triage column of #releng
 * releng-kanban Review unassigned in kanban
 * releng-kanban Review 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart