Wikimedia Platform Engineering/Site performance and architecture

Rationale
Many small architectural changes and improvements are made continually without much fanfare. This page is a general activity area where we communicate changes made along these lines.

April-June 2013

 * JobQueue improvements
 * Eqiad migration wrapup
 * Migrate fenari to tin.eqiad.wmnet
 * Migration to Ceph - still running sync scripts, possible split-brain issues with memcache
 * Migrate hume to terbium.eqiad.wmnet

Mysterious future
As yet unscheduled work for the (hopefully) near term:

Deployment sprint
We plan to put the items below in a deployment infrastructure sprint sometime between July and December 2013:
 * Kill deployment hacks with fire
 * mwscript.php/mctest.php does not know about memcache in both datacenters
 * Database config cleanup -- multisite awareness in MediaWiki
 * git-deploy/sartoris
 * Better 500 error/PHP exception monitoring
 * resetUserTokens.php not usable on large wikis
 * Improve file syncing with production on Apaches
 * Make updates atomic (e.g. symlink + directory move tricks)
 * Reconcile the use of timestamps on JavaScript files (rsync vs ResourceLoader vs git)
 * Live hacks that are still applied as of 2013-05-16
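
As an illustration of the "symlink + directory move tricks" item above, here is a minimal sketch of an atomic switch between release directories. It is not the planned implementation: the function name atomic_switch, the "current" link, and the per-release directory layout are all assumptions made for the example. The idea is to build a new symlink under a temporary name and then rename it over the live one, since a rename within one filesystem is atomic on POSIX systems.

```python
import os


def atomic_switch(link_path: str, new_target: str) -> None:
    """Point the symlink at link_path to new_target in one atomic step.

    Hypothetical helper for illustration only. Assumes a POSIX filesystem
    and that link_path is either absent or already a symlink (not a real
    directory).
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # The temporary link lives in the same directory as the final link,
    # so the rename below never crosses a filesystem boundary.
    tmp_link = os.path.join(
        link_dir, ".{}.tmp.{}".format(os.path.basename(link_path), os.getpid())
    )
    # Clear a leftover temporary link from an earlier interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # rename(2) replaces the old symlink itself; readers see either the
    # old target or the new one, never a missing link.
    os.replace(tmp_link, link_path)
```

With a layout such as releases/1.22wmf3 and releases/1.22wmf4 (hypothetical names), calling atomic_switch("current", "releases/1.22wmf4") would move the live pointer without a window in which "current" does not exist. Files already opened through the old path are unaffected, so a request in flight may still finish against the previous release.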

Shell automation sprint
As yet completely unscheduled:
 * Enable importing across all Wikimedia projects

Documents

 * Task management: Bugzilla
 * Release management plan:
 * Communications plan: