Development process improvement/2014-01-22/Notes

From mediawiki.org

Pain Points

Inter-team Collaboration

  • Product review doesn't always happen
  • Getting security review can take a long time
  • WMF product should be consulted on some shellbugs

Task/Bug/Story tracking

  • Keeping non-Bugzilla tracking systems (Mingle/Trello) synced with Bugzilla is hard
  • Sometimes shellbug requests bypass Bugzilla
  • How do we know that a shellbug request has consensus?

Deployment software and configuration (e.g., scap, sartoris)

  • Security patches don't always get reapplied when extensions are redeployed
  • Beta cluster can be broken by a production config change
  • External software dependencies keep some software from riding the train

Deployment process/cultural norms

  • People sometimes merge wmfconfig changes without deploying
  • Some teams/products don't ride the train
  • Error apathy: many known bugs go unfixed ("Meh, that error is always there; ignore it")
  • Time between merge and release branch cut can range from one minute to one week.
  • "Minor" changes deployed outside windows
  • Sometimes people deploy during reserved deploy windows that they don't own
  • Need for backwards compatibility with schema changes limits velocity
  • Instrumentation is not sufficient for continuous deployment
  • Bug fixes don't roll out quickly enough

(automated) Testing

  • Unit test coverage is inadequate across features and projects.
  • No facility for pre-merge full stack tests
  • Browser tests are slow (and always will be, even at their fastest)
  • We don't test integration across repos at branch cut time (extensions with core, config with extensions; not an easy task)
    • Could run browser tests on branch cut. Integration/API tests would be useful.
  • Labs configuration is not like production
  • Setting up a complex wiki environment in Labs is often manual/difficult
  • Can't easily run automated browser tests against Vagrant. Improvements to this are in progress now: https://bugzilla.wikimedia.org/show_bug.cgi?id=58939
  • Bootstrapping a wiki on Vagrant isn't automated

Other

  • No official Vagrant maintainer
  • Gerrit's workflow is "not like github"

Deploy Train

  • "Most" things ride the train
  • But lots of things go as Lightning deploys
    • Is it broken in prod?
    • Is it going to break prod?
  • And then there is Parsoid...

Wants

  • Block a commit from going to production unless a related commit (from core or an extension) is already in production
    • Has bitten Cirrus on more than one occasion; primarily on the old branch
    • Would be nice to automate a "-2 until other change merges" workflow (used by VE)
    • Backports suffer from the same or a similar problem, possibly exacerbated
  • Integrate browser tests with Jenkins (CI is working on this; browser tests being slow is a problem)
  • Replace (most of) lightning deploys with a task force of rotating deployers that gathers bug fixes and deploys them during a daily window
    • Hopefully makes nominating things for fast deployments more egalitarian
  • Visual regression testing. We have spiked this using Sikuli, but the value seems low, at least for now.
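The first want (holding a commit back until a related core or extension commit is live) is essentially a dependency check over the set of pending changes. Below is a minimal sketch in Python, assuming each pending change carries a list of the change IDs it depends on. The `Change` type and the `deployable` function are hypothetical illustrations, not part of Gerrit, scap, or any existing Wikimedia tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A hypothetical merged change waiting to be deployed."""
    change_id: str
    # IDs of changes (in core or an extension) that must be live first.
    depends_on: frozenset[str] = frozenset()


def deployable(pending: list[Change], in_production: set[str]) -> tuple[list[str], list[str]]:
    """Split pending changes into (deploy order, held back).

    A change may ship once every dependency is already in production or
    has been scheduled earlier in this same batch. Changes with missing
    (or cyclic) dependencies are held, like an automated
    "-2 until the other change merges" rule.
    """
    live = set(in_production)
    remaining = {c.change_id: c for c in pending}
    order: list[str] = []
    progress = True
    # Repeat until a full pass schedules nothing new (a fixpoint).
    while progress:
        progress = False
        for cid, change in list(remaining.items()):
            if change.depends_on <= live:
                order.append(cid)
                live.add(cid)  # later changes may depend on this one
                del remaining[cid]
                progress = True
    return order, sorted(remaining)


# Usage: ext-A waits for core-1 in the same batch; ext-B's dependency is missing.
pending = [
    Change("ext-A", frozenset({"core-1"})),
    Change("core-1"),
    Change("ext-B", frozenset({"core-2"})),
]
print(deployable(pending, in_production=set()))
# (['core-1', 'ext-A'], ['ext-B'])
```

The repeated-pass loop is quadratic in the worst case, which is acceptable for the handful of changes in a single deploy window; a real tool would more likely read dependencies from commit metadata than from a hand-built list.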

Investigate