Wikimedia Release Engineering Team/Checkin archive/20180326

= 2018-03-26 =

Vacations/Important dates

 * https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
 * How to do it


 * Mar 26-29 (full week off, since Fri is a WMF holiday): thcipriani vacation
 * Mar 30 (Fri): WMF Holiday
 * Apr 2: Željko (Holiday in Croatia - Easter Monday)
 * Apr 3-13: Greg vacation
 * Apr 16 (Mon): WMF Holiday
 * May 1: Željko (Holiday in Croatia - Labor Day / May Day)
 * May 14-17: Team offsite in Barcelona
 * May 18-21: Wikimedia Hackathon in Barcelona
 * May 21 (Mon): Tech-Mgt F2F
 * May 31: Željko (Holiday in Croatia - Corpus Christi)

Train

 * Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R


 * Feb 19 - wmf.22 - Mukunda
 * Feb 26 - wmf.23 - Tyler
 * Mar 05 - wmf.24 - Tyler
 * Mar 12 - wmf.25 - Chad
 * Mar 19 - wmf.26 - Chad
 * Mar 26 - wmf.27 - Mukunda <
 * Apr 02 - wmf.28 - Mukunda
 * Apr 09 - wmf.29 - Tyler
 * Apr 16 - wmf.30 - Tyler

SoS

 * Feb 19 - Chad
 * Feb 26 - Mukunda
 * Mar 05 - Mukunda
 * Mar 12 - Tyler
 * Mar 19 - Tyler
 * Mar 26 - Chad <
 * Apr 02 - Chad
 * Apr 09 - Mukunda
 * Apr 16 - Mukunda

Scrum of Scrums

 * Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums

Release Engineering

 * Blocking
 * Blocked
 * Updates
 * Minor Gerrit upgrade planned for this week (2.14.6 -> 2.14.7)
 * Last week we started analyzing the past year's worth of incident reports
 * Scap 3.7.7 should be rolled out to production this week
 * Quarterly goal dependency update:
 * Continue improving the ways that users can download articles of interest for later consumption
 * Reading Web: Tech Ops/RelEng (work is currently blocked on https://phabricator.wikimedia.org/T187821 which is part of a larger epic https://phabricator.wikimedia.org/T181084)
 * Talked about in team meeting Monday
 * is there a task?



Train status and happenings

 * https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor

Past week status updates

 * All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q3

Program 1: Outcome 5: Milestone 1: Develop and migrate to a JavaScript-based browser testing stack

 * Due: End of this quarter
 * What: Specific improvements to the now canonical framework, see: task T182421, notably:
 * Upgrade WebdriverIO to version 4.9
 * Investigate replacing nodemw with mwbot
 * Video recording for Selenium tests in Node.js
 * Task:

Priority: high


 * T179188 Video recording for Selenium tests in Node.js - in progress - will do this week
 * T180144 Upgrade WebdriverIO to 4.12.0 - resolved
 * T181284 Replace nodemw with mwbot - in progress - almost done, updating documentation
 * T190426 Refactor AdvancedSearch browser tests which use nodemw module - in progress - helping AdvancedSearch team

Priority: normal


 * T164721 Run Selenium tests in CI for extensions - not started - CI changing
 * T180125 Refactor mediawiki-core-qunit-selenium-jessie Jenkins job so qunit/karma and webdriverio are invoked via npm script - not started
 * T180482 Create mediawiki-core-qunit-selenium-composer-jessie - not started
 * T182692 Document differences between Ruby and Node.js Selenium frameworks - not started - not hard to do, will do this week
 * T185011 Create selenium-MediaWiki-jessie daily Jenkins job - in progress
 * T188740 Retrospective for T139740 Port Selenium tests from Ruby to Node.js - resolved

Priority: low


 * T182412 Investigate if WebdriverIO `sync: false` would be useful to us and document how to use it - in progress - it could be useful for some tests, documentation pending, will do this week
 * T182691 Selenium tests should be easier to run - in progress - blocked by upstream bug
 * T183160 Sample code in Node.js for repositories that still have Selenium+Ruby tests - not started
 * T183162 Patches in Gerrit deleting Selenium+Ruby tests for repositories that still have them - not started
 * T185094 Update page object pattern in Selenium tests - in progress - done, but probably will not be adopted; discussing with upstream a revert to the previous recommendation is probably the best course
 * T187859 Move one Selenium test from mediawiki/core to mediawiki/skins/Vector - in progress - blocked on understanding how it breaks Minerva
 * T188742 Should selenium-EXTENSION-jessie run for all repositories with Selenium tests? - not started - have to contact repository owners

Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure

 * Goals
 * Draft requirements for a Kubernetes-based solution for CI -
 * Migrate MediaWiki PHPUnit tests to Shipyard (docker-based CI) (~40% of Nodepool usage) -
 * Will be worked on after the long tail
 * Unify production and CI docker image build process -
 * ✅ 01/15

Program 3: Outcome 1: Objective 2: Identify and find stewards for high-priority/high use code segment orphans

 * Due: End of quarter

Pivoted the stewardship review process: we now work with delegates before engaging with Toby and Victoria. Scheduled a standing monthly review with Toby and Victoria.

Program 3: Outcome 2: Objective 2: Define and implement a process to regularly address technical debt across the Foundation

 * Due: End of quarter

Worked on the technical debt avoidance framework.

Program 3: Outcome 2: Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s))

 * Due: End of next quarter

No activity

Program 6: Outcome 2: Objective 2: Set up a continuous integration and deployment pipeline

 * Due: End of this quarter
 * Keyword: SSD
 * phab project: https://phabricator.wikimedia.org/project/view/2453/
 * Goal:
 * Verify basic functionality of 'production' deployment and image (initially targeting mathoid):
 * Functional PoC within integration in the deployment-pipeline
 * Deploy to isolated k8s

thcipriani update
This is a severely long bit of notes about what I did last week, so that you all can pick up where I left off... hopefully.


 * We are sooo close to getting the PoC working. I was trying to build a working image that I could then puppetize.
 * I ended up blocked on a few things, some of which were resolved over the weekend.
 * https://phabricator.wikimedia.org/T190584 upgrade docker agents to stretch
 * Needed mostly because of systemd changes (minikube shells out to it and reports "Stopped" on Jessie)
 * Subtasks are mostly resolved
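Since the minikube agents need stretch, a preflight check could stop the setup early on a Jessie host. This is a minimal sketch, not existing tooling: the function names `debian_codename` and `require_stretch` are hypothetical, and the parsing assumes the standard /etc/os-release format, falling back to the parenthesised name in VERSION (e.g. `VERSION="8 (jessie)"`) for releases that may not set VERSION_CODENAME.

```shell
# Hypothetical preflight helper: print the Debian codename found in an
# os-release file (defaults to /etc/os-release).
debian_codename() {
    local file="${1:-/etc/os-release}"
    local codename
    # Prefer VERSION_CODENAME when the release sets it (quotes stripped).
    codename=$(sed -n 's/^VERSION_CODENAME=//p' "$file" | tr -d '"')
    if [ -z "$codename" ]; then
        # Fall back to the name in parentheses, e.g. VERSION="8 (jessie)".
        codename=$(sed -n 's/^VERSION="[^(]*(\([a-z]*\)).*/\1/p' "$file")
    fi
    printf '%s\n' "$codename"
}

# Fail (return 1) unless the host is stretch, since minikube reports
# "Stopped" on Jessie.
require_stretch() {
    local codename
    codename=$(debian_codename "$@")
    if [ "$codename" != "stretch" ]; then
        echo "minikube agents need stretch, found: ${codename:-unknown}" >&2
        return 1
    fi
}
```

Calling `require_stretch || exit 1` at the top of a setup script would abort on a Jessie agent before any packages are installed.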

Creating a new minikube agent

1. Create a new machine in Horizon named like: integration-slave-k8s-10XX
2. ssh to the machine (you have to wait a bit; puppet needs to run there first)
3. Fix weird self-hosted puppet issues (see https://www.mediawiki.org/wiki/Continuous_integration/Docker#Jenkins_Agent_Creation ):
 * sudo rm -fR /var/lib/puppet/ssl
 * sudo mkdir -p /var/lib/puppet/client/ssl/certs
 * sudo puppet agent -tv
 * sudo cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs
 * sudo puppet agent -tv
4. Apply the role role::ci::slave::labs::docker to the instance via Horizon
5. sudo puppet agent -tv (this was failing last week, see https://phabricator.wikimedia.org/T190584 )
6. Set up minikube:

    sudo apt-get install -y helm minikube kubernetes-client
    export MINIKUBE_WANTUPDATENOTIFICATION=false
    export MINIKUBE_WANTREPORTERRORPROMPT=false
    export MINIKUBE_HOME=$HOME
    export CHANGE_MINIKUBE_NONE_USER=true
    mkdir $HOME/.kube || true
    touch $HOME/.kube/config
    export KUBECONFIG=$HOME/.kube/config
    sudo -E minikube start --vm-driver none --bootstrapper=localkube

7. Clone all necessary repos:

    git clone https://gerrit.wikimedia.org/r/operations/deployment-charts
    git clone https://gerrit.wikimedia.org/r/mediawiki/services/mathoid

8. Build the mathoid image:

    cd mathoid
    blubber dist/pipeline/blubber.yaml production | docker build -t mathoid -f - .

9. Set up helm/tiller. This is where I got stuck :( See: https://phabricator.wikimedia.org/T190589

10. helm install && helm test? Maybe? Didn't get this far :(
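For whoever picks this up, steps 3 and 5-8 of the procedure could be collected into one script so a new agent is less error-prone to rebuild. This is a sketch under assumptions, not a tested tool: the function names and the DRY_RUN switch are hypothetical, it assumes `puppet agent -tv` returns 2 when it applies changes (because `--test` implies `--detailed-exitcodes`), and it assumes the first puppet run in step 3 may fail before ca.pem is copied.

```shell
#!/usr/bin/env bash
# Sketch: steps 3 and 5-8 of the minikube agent setup as shell functions.
# Set DRY_RUN=1 to print the commands instead of executing them.
set -eu

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# `puppet agent --test` implies --detailed-exitcodes:
# 0 = no changes, 2 = changes applied, anything else = failure.
puppet_run() {
    local rc=0
    run sudo puppet agent -tv || rc=$?
    [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]
}

# Step 3: reset the self-hosted puppet SSL state.
fix_puppet_ssl() {
    run sudo rm -fR /var/lib/puppet/ssl
    run sudo mkdir -p /var/lib/puppet/client/ssl/certs
    # Assumption: this first run may fail until ca.pem is copied below,
    # so its result is ignored; the second run must succeed.
    puppet_run || true
    run sudo cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs
    puppet_run
}

# Step 6: install and start minikube with the "none" driver.
setup_minikube() {
    run sudo apt-get install -y helm minikube kubernetes-client
    export MINIKUBE_WANTUPDATENOTIFICATION=false
    export MINIKUBE_WANTREPORTERRORPROMPT=false
    export MINIKUBE_HOME="$HOME"
    export CHANGE_MINIKUBE_NONE_USER=true
    export KUBECONFIG="$HOME/.kube/config"
    run mkdir -p "$HOME/.kube"
    run touch "$KUBECONFIG"
    run sudo -E minikube start --vm-driver none --bootstrapper=localkube
}

# Step 7: clone the repositories, skipping any already checked out.
clone_repos() {
    local repo
    for repo in operations/deployment-charts mediawiki/services/mathoid; do
        if [ ! -d "${repo##*/}" ]; then
            run git clone "https://gerrit.wikimedia.org/r/$repo"
        fi
    done
}

# Step 8: render the production Dockerfile with Blubber and build it.
build_mathoid() {
    local cmd='cd mathoid && blubber dist/pipeline/blubber.yaml production | docker build -t mathoid -f - .'
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$cmd"
    else
        sh -c "$cmd"
    fi
}

# Step 5 onwards; run after role::ci::slave::labs::docker is applied in Horizon.
post_role() {
    puppet_run
    setup_minikube
    clone_repos
    build_mathoid
}

# With no argument the script only defines the functions.
case "${1:-}" in
    pre-role) fix_puppet_ssl ;;
    post-role) post_role ;;
esac
```

Saved under a hypothetical name such as setup-minikube-agent.sh, it would be run with `pre-role` before step 4 and with `post-role` after it. Steps 9-10 (helm/tiller) are deliberately left out, since they were still unresolved in T190589.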

Program 1: Outcome 1: Objective 1: Scap (Tech Debt Sprint FY201718-Q2)

 * workboard


 * Worked with awight on git-lfs + scap

Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure

 * https://phabricator.wikimedia.org/T189660
 * Fixed the phabricator-jessie-diffs job. Thanks to Antoine for identifying the problem.
 * Also improved the logging on failures so jenkins-bot will now comment with more useful info.

Program 1: Outcome 6: Milestone 2: Maintain Phabricator

 * Streamline logspam workflows by adding some integration with phabricator
 * Store git-lfs (and other phab uploads) in swift:


 * Finally got back into this during the second half of the week.
 * Found out that there is already a swift cluster in deployment-prep and started configuring phab.wmflabs.org to work with this shared swift cluster.

Other work
 * The Selenium retrospective took place last week. See: https://phabricator.wikimedia.org/phame/post/view/88/selenium_tests_in_node.js_project_retrospective/
 * Post mortem on the 20180129-MediaWiki incident. See: https://etherpad.wikimedia.org/p/postmortem-20180129-MediaWiki_Incident
 * Code Health Group meeting. See: https://etherpad.wikimedia.org/p/codehealthgroup-20180321

Antoine

 * What I plan to do this week
 * Demo of quibble right now
 * Add experimental job to CI for mediawiki/core that would run some subset of phpunit/qunit/composer test/npm test and webdriver.io
 * What I'm blocked on
 * Patch for MediaWiki
 * https://gerrit.wikimedia.org/r/#/c/421500/ //Let built-in web server handle .php requests//
 * https://gerrit.wikimedia.org/r/#/c/419605/ //Let install.php detect and inject extensions// + backports
 * Could use a PHP 7.0 roadmap. Does anyone know who is in charge?
 * Other?
 * The mediawiki/core test suite fails on SQLite, or when LANG is set to anything other than C.
 * I didn't know there were other LANGs ;-)
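The planned experimental job chains several suites, so a failure in one should not hide results from the others. A minimal sketch of such a runner, assuming nothing about Quibble's actual implementation: `run_stages` is a hypothetical name, each stage is passed as a "name=command" pair, and every stage runs with LANG=C because the notes report that the suite fails under other locales.

```shell
# Hypothetical stage runner for an experimental mediawiki/core CI job:
# run every stage under LANG=C, keep going after a failure, and return
# non-zero at the end if any stage failed.
run_stages() {
    local failed="" stage name cmd
    # Each argument is a "name=command" pair.
    for stage in "$@"; do
        name=${stage%%=*}
        cmd=${stage#*=}
        echo ">>> $name: $cmd"
        # Pin the locale, since the suite is reported to fail when LANG is not C.
        if ! (export LANG=C LC_ALL=C; sh -c "$cmd"); then
            failed="$failed $name"
        fi
    done
    if [ -n "$failed" ]; then
        echo "FAILED:$failed" >&2
        return 1
    fi
    echo "all stages passed"
}

# Example invocation. The commands are assumptions about mediawiki/core's
# composer/npm scripts of the time, not the real job definition:
# run_stages \
#     "composer=composer test" \
#     "npm=npm test" \
#     "phpunit=php tests/phpunit/phpunit.php" \
#     "selenium=npm run selenium-test"
```

Collecting failures and reporting them once at the end keeps a single Jenkins run informative, at the cost of a longer run when an early stage is already broken.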

Chad

 * What I plan to do this week
 * abusefilter private logs / data pruning
 * gerrit missing branch thingie? I hate git
 * helm helm helm
 * MW general release planning?
 * What I'm blocked on
 * Other?

Dan

 * What I plan to do this week
 * Integrate new Blubber release into pipeline script
 * Publish a common policy file for Blubber to integration.wikimedia.org
 * Refactor scap's CI jobs to use blubber
 * Start working on Composer support in Blubber
 * What I'm blocked on
 * Re-review from Antoine on https://phabricator.wikimedia.org/D993
 * Other?
 * thcipriani: see the Program 6: Outcome 2: Objective 2 update for where I left off last week...

Greg

 * What I plan to do this week
 * MW Release meeting as well
 * talking with Mark & Faidon re 'staging' tomorrow
 * apparently another budget [urgent] review item
 * Q4 team goals
 * SWAT changes
 * What I'm blocked on
 * Other?

Jean-Rene

 * What I plan to do this week
 * Finish up Q3 goal work re Technical Debt process
 * Q3 Stewardship review
 * What I'm blocked on
 * Other?

Mukunda

 * What I plan to do this week
 * Swift, Swift and train
 * more Swift
 * What I'm blocked on
 * n/a
 * Other?

Tyler

 * What I plan to do this week
 * Vacation
 * What I'm blocked on
 * Blocked? Baby I'm on vacation!
 * Other?
 * <3 you all -- have a good week (I posted an update in program 6)

Zeljko

 * What I plan to do this week
 * Should I move tasks marked not started to T182986 Selenium framework improvements?
 * Greg: yeah, I think so
 * T179188 Video recording for Selenium tests in Node.js
 * T190426 Refactor AdvancedSearch browser tests which use nodemw module
 * T182692 Document differences between Ruby and Node.js Selenium frameworks
 * T188740 Retrospective for T139740 Port Selenium tests from Ruby to Node.js
 * T185011 Create selenium-MediaWiki-jessie daily Jenkins job
 * T182412 Investigate if WebdriverIO `sync: false` would be useful to us and document how to use it
 * What I'm blocked on
 * T182691 Selenium tests should be easier to run - blocked by upstream or a new idea
 * T185094 Update page object pattern in Selenium tests - waiting to see if Timo will explain to upstream that they are doing it wrong
 * T187859 Move one Selenium test from mediawiki/core to mediawiki/skins/Vector - blocked on understanding how it breaks Minerva
 * Other?
 * T190039 - CirrusSearch smoke selenium tests cause failures of mediawiki-core-qunit-selenium-jessie job for extensions - CI fixed
 * Will there be Q4 Selenium framework improvements?
 * Ordered Kinesis Advantage2 <3

Team Kanban Board Review and Triage

 * Closed and touched in the last 7 days
 * No update for 4 weeks
 * No update for 3 weeks
 * No update for 2 weeks
 * No update for 1 week
 * All Open
 * Review To Triage column of #releng
 * Assigned
 * Unassigned

Once / month-ish review of backlog(s)

 * releng - Review the To Triage column of #releng
 * releng-kanban - Review unassigned tasks in kanban
 * releng-kanban - Review the 'backlog' column of -kanban
 * releng-next - Review for things we need to put on our kanban backlog
 * releng-backlog - oh my, the huge backlog of things...

Kanban stats

 * Burnup chart