Wikimedia Release Engineering Team

This is the team responsible for Release Engineering/Management and QA at the Wikimedia Foundation. We predominantly use the QA mailing list along with conversations on the appropriate IRC channels.

= Status =

= Quarterly Reviews =
 * April 2014
 * February 2014
 * November 2013
 * August 2013

= Backlog/Wishlist =

We maintain a list of projects that would be great to have done, but are not on our roadmap in the near term. See the wishlist here.

= Meetings =

See /Meetings for notes from the Showcase meetings of the Release and QA Team.

= Quarterly Progress =

Feb - Apr '14 Goal Progress
See the WMF Engineering 13-14 goals page for the yearly view.


 * Process through all (useful) pain points from the Dev/Deploy review session (Greg)
 * Scap incremental improvements
 * 1) Refactor existing scap scripts to enhance maintainability and reveal hidden complexity of current solution (Bryan)
 * 2) create matrix of tool requirements per software stack (MW, Parsoid, ElasticSearch) (Greg)
 * 3) Use above matrix to add/fix functionality in scap (or related) tooling for ONE software stack, prioritized by cross stack use (Bryan)
 * Use the API to create test data for given tests at run time. (Jeff, Chris, Željko)
 * Create the ability to test headless (Željko, Jeff, Chris)
 * Run versions of tests compatible with target test environments (Chris, all)
 * Make database(s) in Beta Cluster emulate production (set up db slaves) (Antoine)
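As an illustration of the "use the API to create test data" goal above, here is a minimal sketch of building a page-creation request against the MediaWiki Action API (`action=edit`). It is written in Python for illustration only; the function names, the example URL, and the token value are assumptions, and fetching a real edit token and session cookies is left out.

```python
import json
import urllib.parse
import urllib.request


def build_create_page_request(api_url, title, text, token):
    """Build (but do not send) a POST request that creates or overwrites a page.

    Hypothetical helper: a real test run would first log in and fetch an
    edit token, which is not shown here.
    """
    params = {
        "action": "edit",
        "title": title,
        "text": text,
        "token": token,
        "format": "json",
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(api_url, data=body, method="POST")


def edit_succeeded(response_body):
    """Return True if the JSON reply reports a successful edit."""
    reply = json.loads(response_body)
    return reply.get("edit", {}).get("result") == "Success"
```

A test setup step could send the request with `urllib.request.urlopen` and pass the response body to `edit_succeeded` before the browser scenario starts.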

Greg

 * - Update/create new MW Release Management RFP
 * - Process through pain points
 * - security patches
 * - LD SWAT team
 * - config changes not being deployed after merge
 * - Refine deployment system requirements with Bryan
 * Deployment_tooling/Notes/Deployment_system_requirements

Bryan

 * Scap refactor/python port
 * Set up test environment in beta (carried forward from Feb & March; much harder than I'd hoped)
 * ✅ Scap functionality available from deployment-bastion including fanout rsync and l10n rebuilds
 * ✅ Jenkins job running scap after each beta-code-update job finished
 * ✅ Remove NFS dependency from beta for MediaWiki deploys
 * ✅ Convert hand-built Jenkins jobs to CI/JJB
 * ✅ Use trigger publisher instead of trigger-builds builder (fixed deadlock problem on Jenkins slave)
 * Continue to enhance and simplify scripts
 * ✅ Compile wikiversions.json to cdb on deploy server
 * ✅ Allow hyphen (-) in dsh hostnames
 * ✅ Fix IRCSocketHandler to work from Jenkins
 * ✅ Make logging destinations configurable
 * ✅ Improve error message when scap lock fails
 * ✅ Exit with non-zero status on soft failure (Helps Jenkins jobs identify partial failures)
 * ✅ Convert scap-rebuild-cdbs to python
 * ✅ Convert mw-update-l10n to python
 * Build .mw-git-info.json caches during scap and support in GitInfo.php
 * Use trebuchet to deploy scap scripts
 * Add scap/scap trebuchet target
 * Provision scap scripts using trebuchet
 * Next generation deployment tooling
 * Organize requirements into wiki page (carried forward from Feb & March; pairing with Greg)
 * Support train deploys
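The "exit with non-zero status on soft failure" item above can be pictured with a small sketch. This is not the scap code; `run_on_hosts`, `exit_status`, and `main` are hypothetical names, and the per-host task is assumed to signal failure by raising an exception.

```python
import sys


def run_on_hosts(hosts, task):
    """Run task(host) for every host and return the hosts that failed."""
    failed = []
    for host in hosts:
        try:
            task(host)
        except Exception as exc:
            # Keep going so one bad host does not stop the whole run.
            print("%s failed: %s" % (host, exc), file=sys.stderr)
            failed.append(host)
    return failed


def exit_status(failed_hosts):
    """Return 0 when every host succeeded, 1 when at least one failed."""
    return 1 if failed_hosts else 0


def main(hosts, task):
    """Run the task everywhere and report a partial failure via the status."""
    return exit_status(run_on_hosts(hosts, task))
```

A script would end with `sys.exit(main(hosts, task))`, so that a Jenkins job sees a failed build even when most hosts were updated.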

Chris
Continuing from Feb:


 * Test and announce general availability of feature to check for ResourceLoader upon page load: ResourceLoader error checking more globally.


 * Test and announce general availability of feature to use the API to create wiki pages and users: create-page (and create-user) API.


 * Investigate how release branches are described in make-deploy-notes
 * Tracking this at https://bugzilla.wikimedia.org/show_bug.cgi?id=62509 . I have an idea about how to make this work.

Also:


 * Update documentation on mw.o to reflect the refactoring and consolidation of recent times. Much of what exists now is long out of date.
 * Also see https://www.mediawiki.org/wiki/Browser_testing/shared_features

Greg

 * - Update/create new MW Release Management RFP


 * - Process through pain points
 * - security patches
 * - LD SWAT team
 * - config changes not being deployed after merge


 * - Refine deployment system requirements with Bryan
 * Deployment_tooling/Notes/Deployment_system_requirements


 * - create blockdiag version of flow chart
 * commons:File:Wikimedia_development_and_deployment_flowchart.png
 * https://gerrit.wikimedia.org/r/#/c/116750/


 * LD SWAT team
 * - Kick off
 * - PushBot? - NO (without more dev than Greg can provide right now)


 * - quarterly post-mortem review
 * See the review

Antoine

 * Fix up VisualEditor browser tests and make them voting in Gerrit
 * Make CirrusSearch browser tests voting
 * ✅ Publish sphinx documentation (use jenkins jobs). Examples:
 * https://doc.wikimedia.org/mw-tools-releng/html/devdeployflow/
 * https://doc.wikimedia.org/mw-tools-scap/docs/_build/html/
 * Migrate beta cluster from pmtpa to eqiad!
 * ✅ Varnish instances created + puppet passing
 * ✅ Application servers created + puppet passing
 * ✅ Apache config in git (operations/apache-config.git branch: betacluster)
 * ✅ Bunch of files / git repos copied from pmtpa to eqiad
 * ✅ Sean Pringle to create MariaDB
 * ✅ Ariel Glenn to set up the swift emulator (copy pasted instance)
 * ✅ last minute sync of files
 * ✅ add Jenkins slaves to sync Parsoid and MW code
 * ✅ write jobs to update Parsoid and MW code on eqiad

Bryan

 * Scap refactor/python port
 * ✅ Extract common logic for making a command line interface into a class
 * ✅ Fixed sync-wikiversions to use common dsh arguments
 * ✅ Converted sync-wikiversions to python
 * ✅ Invented scap-purge-l10n-cache script to cleanup l10n cache on unused branches
 * ✅ Documented process for retiring a branch from tin
 * Set up test environment in beta (carried forward from Feb; harder than I'd hoped)
 * ✅ Salt master in eqiad beta project
 * ✅ Puppet master in eqiad beta project
 * ✅ logstash host in eqiad beta project
 * ❌ scap host in eqiad beta project (did this and then nuked it)
 * Continue to enhance and simplify scripts
 * Next generation deployment tooling
 * Organize requirements into wiki page (carried forward from Feb; pairing with Greg)
 * Support train deploys
 * ✅ Deploy 1.23wmf16 to group1
 * ✅ Deploy 1.23wmf17 to group0
 * ✅ Deploy 1.23wmf17 to group1
 * ✅ Cleanup old l10n cache files on cluster
 * ✅ Deploy 1.23wmf18 to group0
 * ✅ Deploy 1.23wmf18 to group1
 * ✅ Deploy 1.23wmf20 to group0
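The "extract common logic for making a command line interface into a class" item above could look roughly like the following sketch. The class and option names (`CliApp`, `SyncVersions`, `--verbose`) are hypothetical and chosen for illustration; the actual scap classes may be organized differently.

```python
import argparse


class CliApp:
    """Hypothetical base class holding argument parsing shared by scripts."""

    description = ""

    def __init__(self):
        self.parser = argparse.ArgumentParser(description=self.description)
        # Options every script shares are declared once, here.
        self.parser.add_argument("-v", "--verbose", action="store_true",
                                 help="print more detail")
        self.add_arguments(self.parser)

    def add_arguments(self, parser):
        """Subclasses add their own options here."""

    def run(self, args):
        """Subclasses do the real work here and return an exit status."""
        raise NotImplementedError

    def main(self, argv=None):
        """Parse argv (or sys.argv when None) and run the script."""
        return self.run(self.parser.parse_args(argv))


class SyncVersions(CliApp):
    """Example script built on the shared base class (hypothetical)."""

    description = "Example script"

    def add_arguments(self, parser):
        parser.add_argument("message", nargs="?", default="(no message)")

    def run(self, args):
        if args.verbose:
            print("syncing:", args.message)
        return 0
```

Each converted script then only declares its own arguments and its `run` method, while shared flags stay in one place.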

Chris
Continuing from Feb:


 * Test and announce general availability of feature to check for ResourceLoader upon page load: ResourceLoader error checking more globally.


 * Test and announce general availability of feature to use the API to create wiki pages and users: create-page (and create-user) API.


 * Investigate how release branches are described in make-deploy-notes
 * Tracking this at https://bugzilla.wikimedia.org/show_bug.cgi?id=62509 . I have an idea about how to make this work.

Also:


 * Update documentation on mw.o to reflect the refactoring and consolidation of recent times. Much of what exists now is long out of date.

Greg

 * Process through pain points
 * security patches
 * Kick off LD SWAT team
 * next...
 * create blockdiag version of flow chart
 * commons:File:Wikimedia_development_and_deployment_flowchart.png
 * https://gerrit.wikimedia.org/r/#/c/116750/
 * Refine deployment system requirements with Bryan
 * Deployment_tooling/Notes/Deployment_system_requirements
 * quarterly post-mortem kickoff
 * set up morgue?

Antoine

 * Complete integration of browsertests for VisualEditor
 * One build worked and triggered two successful scenarios!!
 * Parsoid is stopped by the job now, was not previously :/
 * ✅ Integration of browsertests for CirrusSearch. Caused me to slightly rethink the browsertests infra to more closely match production.
 * ✅ Train Zeljkof on Jenkins Job Builder script
 * Zeljkof started on it. Will pair with him to finish up.
 * Zeljkof found his way through JJB arcanes \O/
 * ✅ Help migrating Cloudbees Jenkins template to JJB YAML templates
 * Zeljkof started on it. Will pair with him to finish up.
 * Zeljkof found his way through JJB arcanes \O/

Bryan

 * Scap refactor/python port
 * ✅ Local test environment in a MW-Vagrant instance
 * ✅ scap converted to python
 * ✅ scap-1 converted to python
 * ✅ add detailed duration timing for scap & scap-1
 * ✅ remove external script dependencies from scap-1
 * ✅ add progress bar for dsh commands
 * ✅ Scap logs in json format for easy parsing
 * ✅ Scap logs sent to florine via udp2log
 * ✅ Scap logs sent to logstash via udp2log
 * ✅ Converted mwversionsinuse to python
 * ✅ Deleted obsolete scripts: scap-1, scap-2, find-nearest-rsync, scap-old
 * Set up test environment in beta (will pair with Antoine)
 * Continue to enhance and simplify scripts
 * Next generation deployment tooling
 * ✅ gather requirements via etherpad and Ops-l mailing list
 * Organize requirements into wiki page (will pair with Greg)
 * Support train deploys
 * ✅ Deploy 1.23wmf16 to group0
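For the "scap logs in json format for easy parsing" item above, a minimal sketch using Python's standard `logging` module is shown here. The field names in the payload are assumptions, not the format scap actually emits, and shipping the lines over udp2log is not covered.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single line of JSON."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)
```

Attaching it with `handler.setFormatter(JsonFormatter())` gives one JSON object per line, which a log collector can parse without custom patterns.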

Chris

 * Investigate how release branches are described in make-deploy-notes


 * Investigate using ResourceLoader error checking more globally.


 * Refactor tests to use the create-page (and create-user) API.

Dec-Feb Goal Progress
See the WMF Engineering 13-14 goals page for the yearly view.


 * - Browser tests managed in feature repos with feature teams (Chris, Zeljko, Jeff, Rummana)
 * - Successfully managed the first release of MediaWiki in conjunction with our outside contractor (Greg, Antoine)
 * - More comprehensive quarter assessments of postmortems (Greg)
 * - Create process documentation for ideal test/deployment steps (Greg, Reedy, Chris, and others)
 * Automated API integration tests in important areas (Chris, Zeljko, Jeff, Rummana, Antoine, also with Mark Holmquist)
 * - UploadWizard
 * - Parsoid / VisualEditor
 * - ResourceLoader

Greg

 * do post Dev and Deploy process review follow up
 * post images to Commons - - included in commons:Category:Wikimedia_Foundation_software_development
 * send one email with list of grouped red-cards
 * send off first email on first topic (make it a good one)
 * Create plan to evaluate progress on postmortem/retrospective actions
 * archive/put on wikitech missing post-mortems -
 * https://wikitech.wikimedia.org/wiki/Incident_documentation/20140203-LVS
 * create BZ whiteboard entry to track retrospective bugs -
 * RT something or other?

Antoine

 * Complete integration of browsertests for VisualEditor
 * Train Zeljkof on Jenkins Job Builder script
 * Help migrating Cloudbees Jenkins template to JJB YAML templates

Chris

 * Refactor tests to use API article creation
 * Continue creating a suite of tests for local environments
 * Requires the API gem in mediawiki-selenium 1.20


 * Finish headless Xvfb integration
 * In Jenkins, investigate pulling master branch for beta labs builds but pull release branch for test2wiki builds
 * Follow up on ResourceLoader error reporting

Jeff

 * New task: Create browser test for VisualEditor availability on production wikis
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=60797
 * Add browser tests for using VisualEditor via Mobile UI
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=60290
 * Continuing to work on getting Jenkins (CloudBees) versions of VE automated tests from red to green
 * Clean-up work on browser test for the "Nearby" page for the Mobile team
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=58720
 * Working on a request from Antoine to set up VisualEditor browser test triggers
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=53691
 * Continuing work on adding browser tests for outstanding VE regression items
 * https://www.mediawiki.org/wiki/Quality_Assurance/VisualEditor_browser_regression_test_backlog

Greg

 * Prep for Dev and Deploy process meeting (Jan 22nd)
 * Create baseline flowchart of dev/deploy documentation for use in the January in-person meeting -
 * the chart
 * refinements (ie: suggestions from Chris) -
 * write up final agenda/notes, share before meeting (clean up notes from Robla) -
 * make physical version of flowchart -
 * do post-review followup -
 * Create plan to evaluate progress on postmortem/retrospective actions
 * archive/put on wikitech missing post-mortems -
 * create BZ whiteboard entry to track retrospective bugs -

Antoine

 * For January: got to prepare myself for the MediaWiki summit (two weeks left + one week summit)
 * Mostly focused on CI (Zuul upgrade, new jenkins jobs..) last week.
 * ✅ Parsoid self update on beta cluster via a Jenkins job.
 * See 'Parsoid update' on the CI dashboard https://integration.wikimedia.org/dashboard/
 * ✅ Parsoid job migrated to new repos mediawiki/services/parsoid and mediawiki/services/parsoid/deploy
 * VE browser tests from Gerrit. Waiting for them to pass with a fresh wiki + phantomjs
 * ✅ mediawiki/extensions.git out of sync since Jan 2nd

Chris

 * Bring about monitoring of uploads in production with the existing test, with failure messages emailed to QA staff. Željko and Antoine to discuss hiding the password properly.
 * Create a suite of local bare-wiki tests to use PhantomJS in WMF Jenkins
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=60347
 * Only one existing test under /qa/browsertests passes in a bare wiki
 * Use the API to create test data (e.g. a wiki page) on a target wiki. Jeff is working on https://gerrit.wikimedia.org/r/#/c/106548/ . Jeff and Chris paired Jan 24 to make progress. Commit is pending an unexpected auth issue: https://bugzilla.wikimedia.org/show_bug.cgi?id=60407
 * Continue contributing code and review to Mobile tests. Paired with Arthur on https://gerrit.wikimedia.org/r/#/c/106833/, etc.
 * Continue contributing code and review to Flow tests. Reported/fixed upstream Selenium bug, continuing...
 * Monitor fatal errors in beta labs, send email alerts
 * merged a monitor script, needs to be cron'd/puppet'd

Jeff

 * New task: add browser tests for using VisualEditor via Mobile UI
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=60290
 * Worked with Aaron Arcos to create initial browser test for MultimediaViewer repo
 * Creating scripts for adding new wiki users and articles via MediaWiki API
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=58939
 * Continuing to work on getting Jenkins (CloudBees) versions of VE automated tests from red to green
 * Clean-up work on browser test for the "Nearby" page for the Mobile team
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=58720
 * Working on a request from Antoine to set up VisualEditor browser test triggers
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=53691
 * Continuing work on adding browser tests for outstanding VE regression items
 * https://www.mediawiki.org/wiki/Quality_Assurance/VisualEditor_browser_regression_test_backlog

Greg

 * Create baseline flowchart of dev/deploy documentation for use in the January in-person meeting -
 * the chart
 * Create plan to evaluate progress on postmortem/retrospective actions

Antoine

 * jenkins job for VisualEditor / Parsoid (reporting to James F / Gabriel Wicke)
 * making sure a change in VE or in Parsoid does not break the other since they are tightly coupled
 * ✅ parsoid init script doesn't play well when run over ssh
 * Worked on, wrote an upstart job to wrap around parsoid server + logrotate configuration
 * browser tests in Gerrit.
 * ✅ First with ULS because it is simpler, pairing with Zeljkof and i18n team
 * Second MobileFrontend https://gerrit.wikimedia.org/r/#/c/97497/
 * later VisualEditor
 * ✅ polishing up the mw release tarball job (was )

Slowdowns:
 * deployed bunch of favicons for Google Code-in
 * helped on gwtoolset, an extension to mass import materials from museum libraries
 * bunch of CI changes to make jobs run in parallel

Chris

 * Get everyone interested in API testing and monitoring pulling together: Multimedia + Jenkins + QA
 * Met with Mark Holmquist and Aaron Arcos Dec 5
 * Progress: https://bugzilla.wikimedia.org/show_bug.cgi?id=58555
 * Test is merged and is pending adding to Jenkins https://gerrit.wikimedia.org/r/#/c/102603/
 * Move login method to shared code (Željko working on this right now), demonstrate improved pass rates particularly for IE
 * In process, interim commit is https://gerrit.wikimedia.org/r/#/c/100579/
 * Upgraded login method and corrected other issues affecting IE pass rate, builds are significantly more green, will pass on benefit to other repos when consolidated
 * Concrete steps for Mobile QA after Michelle's departure
 * Chris to support automation effort, Chris/Jeff/Rummana to contribute as requested, e.g. post-deployment checking
 * Chris is monitoring the backlog of mingle cards for tests immediately. Next step is to collaborate with Mobile devs for regression tests like https://gerrit.wikimedia.org/r/#/c/103761/
 * Browser test coverage for Flow
 * Coverage ongoing
 * Continuing working with S on refinements

Jeff

 * Creating scripts for adding new wiki users and articles via MediaWiki API
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=58939
 * Continuing to work on getting Jenkins (CloudBees) versions of VE automated tests from red to green
 * Clean-up work on browser test for the "Nearby" page for the Mobile team
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=58720
 * Working on a request from Antoine to set up VisualEditor browser test triggers
 * https://bugzilla.wikimedia.org/show_bug.cgi?id=53691
 * Continuing work on adding browser tests for outstanding VE regression items
 * https://www.mediawiki.org/wiki/Quality_Assurance/VisualEditor_browser_regression_test_backlog

Rummana

 * Regular exploration testing on betalabs and test2
 * Verifying each week's VE deployment
 * Verifying resolved bugs
 * Verifying new copy-paste implementation
 * Tracking already-reported bugs in Bugzilla and changing their status accordingly

= Check-ins =

These are here just for historical reasons; we no longer use this format.
 * /Checkin-20131202
 * /Checkin-20131119
 * /Checkin-20131105
 * /Checkin-20131008
 * /Checkin-20130924
 * /Checkin-20130917
 * /Checkin-20131022
 * /Checkin-20130910
 * /Checkin-20130903