Development process improvement/2014-01-22/Pain points

Being Processed

 * - security patches (on list)
 * - SWAT team
 * - config changes not being deployed after merge (on list)
 * https://rt.wikimedia.org/Ticket/Display.html?id=7427
 * - WMF product should be consulted on some shellbugs

Inter-team Collaboration

 * Product review doesn't always happen
 * Getting security review can take a long time
 * WMF product should be consulted on some shellbugs

Task/Bug/Story tracking

 * Keeping non-Bugzilla tracking systems (Mingle/Trello) synced with Bugzilla is hard
 * Sometimes shellbug requests bypass Bugzilla
 * How do we know that a shellbug request has consensus?

Deployment software and configuration (e.g. scap, sartoris)

 * Security patches don't always get reapplied when extensions are redeployed
 * Beta cluster can be broken by a production config change
 * External software dependencies keep some software from riding the train

Deployment process/cultural norms

 * People sometimes merge wmfconfig changes without deploying
 * Some teams/products don't ride the train
 * Error apathy: lots of known bugs that nobody is fixing ("Meh, that error is always there" or "ignore it")
 * Time between merge and release branch cut can be anywhere from 1 minute to 1 week
 * "Minor" changes deployed outside windows
 * Sometimes people deploy during reserved deploy windows that they don't own
 * Need for backwards compatibility with schema changes limits velocity
 * Instrumentation is not sufficient for continuous deployment
 * Bug fixes don't roll out quickly enough

(automated) Testing

 * Unit test coverage is inadequate across features and projects.
 * Browser/full stack tests are effective, but we rely on them too much
 * Our "test pyramid" is upside-down: http://martinfowler.com/bliki/TestPyramid.html
 * No facility for pre-merge full stack tests
 * Browser tests are slow (and always will be, even at their fastest)
 * Cloudbees is flaky, and browser tests have lots of other known problems. See: https://www.mediawiki.org/wiki/Browser_testing/architecture
 * We don't test integration across repos at branch cut time (extensions with core, config with extensions; not an easy task)
 * We could run browser tests at branch cut. Integration/API tests would also be useful.
 * Labs configuration is not like production
 * Setting up a complex wiki environment in Labs is often manual/difficult
 * Can't easily run automated browser tests against Vagrant. Improvements are in progress: https://bugzilla.wikimedia.org/show_bug.cgi?id=58939
 * Bootstrapping a wiki on Vagrant isn't automated

Other

 * No official Vagrant maintainer
 * Gerrit's workflow is "not like GitHub"