Wikimedia Platform Engineering/Site performance and architecture

Rationale
Many small architectural changes and improvements are made all the time without much fanfare. This page is a general activity area where we communicate changes of that kind.

April-June 2013

 * JobQueue improvements
 * Eqiad migration wrapup
 * Migrate fenari to tin.eqiad.wmnet
 * Migration to Ceph - still running sync scripts, possible split-brain issues with memcache
 * Migrate hume to terbium.eqiad.wmnet

Mysterious future
As yet unscheduled work for the (hopefully) near term:

Deployment sprint
We plan to put the items below in a deployment infrastructure sprint sometime between July and December 2013:
 * mwscript.php/mctest.php does not know about memcache in both datacenters
 * Database config cleanup -- multisite awareness in MediaWiki
 * git-deploy/sartoris
 * depends on (auditing salt scripts for completeness)
 * Better 500 error/PHP exception monitoring
 * resetUserTokens.php not usable on large wikis
 * Some improvements for the deployment scripts
 * Improve file syncing with production on Apaches
 * Make updates atomic (e.g. symlink + directory move tricks or git-deploy?)
 * Reconcile the use of timestamps on JavaScript files (rsync vs ResourceLoader vs git)
 * Kill deployment hacks with fire
 * live hacks that are still applied as of 2013-05-16
 * Beta related:
 * Allow extensions to be run from branches other than master
 * Migrate scripts from hume to terbium
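
The "symlink + directory move" idea mentioned in the list above can be sketched as follows. This is only an illustration of the general technique, not the actual Wikimedia deployment tooling: the function name `activate_release`, the `current` link name, and the release directory names are all hypothetical. It assumes a POSIX filesystem, where renaming one symlink over another replaces it in a single step.

```python
import os
import tempfile


def activate_release(release_dir: str, current_link: str) -> None:
    """Atomically repoint the symlink `current_link` at `release_dir`.

    Hypothetical helper: readers of `current_link` see either the old
    target or the new one, never a missing or half-updated link.
    """
    link_parent = os.path.dirname(os.path.abspath(current_link))
    # Create the new symlink under a temporary name in the same directory,
    # so the final rename never crosses a filesystem boundary.
    tmp_name = ".{}.tmp-{}".format(os.path.basename(current_link), os.getpid())
    tmp_link = os.path.join(link_parent, tmp_name)
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted run
    os.symlink(os.path.abspath(release_dir), tmp_link)
    # rename(2) overwrites an existing symlink at the destination in one step.
    os.replace(tmp_link, current_link)


def demo() -> str:
    """Switch a 'current' link between two throwaway release directories."""
    root = tempfile.mkdtemp()
    for name in ("release-a", "release-b"):
        os.mkdir(os.path.join(root, name))
    current = os.path.join(root, "current")
    activate_release(os.path.join(root, "release-a"), current)
    activate_release(os.path.join(root, "release-b"), current)
    return os.path.basename(os.readlink(current))
```

Syncing a full new directory first and switching the link last keeps the window in which servers could see a mix of old and new files as small as a single rename.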

Performance sprint

 * resetUserTokens.php not usable on large wikis
 * Rewrite jobs-loop.sh in a proper programming language
 * Separate Cache-Control header for proxy and client
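
For the Cache-Control item above, one standard way to give proxies and clients different lifetimes is to combine the `max-age` and `s-maxage` directives: shared caches honor `s-maxage`, while browsers fall back to `max-age`. The page does not say how this would be implemented in MediaWiki, so the sketch below is only an assumption-laden illustration; the function name `cache_control` and the example lifetimes are hypothetical.

```python
def cache_control(client_max_age: int, proxy_max_age: int) -> str:
    """Build a Cache-Control value with separate client and proxy lifetimes.

    Hypothetical helper: `max-age` applies to private (browser) caches,
    `s-maxage` overrides it for shared caches such as reverse proxies.
    """
    if client_max_age < 0 or proxy_max_age < 0:
        raise ValueError("cache lifetimes must be non-negative")
    return "public, max-age={}, s-maxage={}".format(client_max_age, proxy_max_age)


# Example values only: clients revalidate after 5 minutes,
# proxies may keep the response for a day.
EXAMPLE_HEADER = cache_control(300, 86400)
```

Keeping the client lifetime short while letting the proxy layer cache for longer only works if the proxies are purged when content changes; browsers cannot be purged, which is the usual reason for splitting the two values.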

Shell automation sprint
As yet completely unscheduled:
 * Enable importing across all Wikimedia projects

Documents

 * Task management: Bugzilla
 * Release management plan:
 * Communications plan: