Wikimedia Engineering/Report/2012/March

 Engineering metrics in March:
 * 98 unique committers contributed code to MediaWiki.
 * About code commits were reviewed.
 * The total number of unreviewed commits went from 31 to.
 * About 34 shell requests were processed.
 * developers got commit access, among which volunteers.
 * Wikimedia Labs now hosts projects,  instances and  users.

Major news in March include:
 * The completion of the [//blog.wikimedia.org/2012/03/09/transfer-of-wikipedia-sites-from-godaddy-complete/ move of all our domain names from registrar GoDaddy];
 * The [//blog.wikimedia.org/2012/02/15/wikimedia-engineering-moving-from-subversion-to-git/ move from Subversion to git] as our primary code versioning system;
 * The kick-off of [//blog.wikimedia.org/2012/03/16/project-ideas-students-and-mentors-wanted-gsoc-2012/ Wikimedia's participation in Google Summer of Code 2012].


Recent events

 * Chennai Hackathon March 2012 (17 March 2012, Chennai, India) — Yuvaraj Pandian and volunteer Srikanthlogic held this one-day hackathon for experienced developers.  Pandian's report praised the 21 participants for coming up with 13 completed hacks, including 2 core MediaWiki patches, 3 Tamil Wikipedia userscript updates, and 2 new deployed tools.

Upcoming events

 * Berlin hackathon (1–3 June 2012, Berlin, Germany) —


 * Wikimania hackathon (10–11 July 2012, Washington, D.C., USA) —

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.


 * Lucene Search Operations Engineer (RFP)
 * Mobile Quality Assurance (RFP)
 * Senior Software Frontend Engineer
 * Software Developer Backend
 * Software Developer Frontend
 * Software Developer Mobile
 * Software Security Engineer
 * Technical Product Analyst
 * Interaction Designer

Site infrastructure

 * Ashburn data center — Mark Bergsma completed the Squid-to-Varnish conversion for image caching and successfully deployed Varnish on 8 servers in our Ashburn data center for about half a day, during which he monitored and assessed the software's behavior and its impact on the servers. Although there are currently 24 Squid servers, 8 Varnish servers would provide sufficient capacity to replace them; however, there are concerns about overloading the network interface cards and concentrating too much cache on each server. Mark is now working on improving the Varnish implementation and may add a few more servers. Also, because the Ashburn data center seems to be experiencing a higher rate of server outages than the Tampa site, Rob Halsell reviewed the cabinets and added extra earth grounding as a precaution; we are monitoring the situation to see whether this reduces server issues. Finally, Peter Youngmeister and Jeff Green are making good progress in testing, preparing and bringing up the Ashburn search clusters. Full-scale testing has just started, and the results have been quite promising. In the coming weeks, they will conduct limited trial deployments of the Ashburn clusters, running in parallel with those at the Tampa data center. The Ashburn data center added network peering last week; in that short time, Leslie Carr peered our network with more than 10 other large sites and ISPs, reducing latency for many of our users, especially in Europe, Japan and Hong Kong (and reducing our bandwidth costs too!).


 * Amsterdam data center — Mark Bergsma restacked, rearranged and decommissioned servers, and started racking the new router and switches. The actual network switchover at EvoSwitch has yet to be scheduled; however, Mark did replace the old core router at Vancis and deployed the new one there.


 * Media Storage — After addressing earlier issues with the Swift deployment, Ben Hartshorne redeployed it, and it has been stable since. He removed the original testing hardware from the cluster and added the final production node, bringing the total to 5 new Swift nodes serving as the thumbnail object store in Tampa. Swift is also now running in the Labs environment and is ready to be used by other Labs projects that interact with Swift in production. Volunteers are welcome to help with the Swift Labs cluster by improving monitoring, analyzing the configuration, or otherwise helping us better understand this component of our infrastructure.

Testing environment

 * Wikimedia Labs —

Backups and data archives

 * Data Dumps — We resolved the network issues on our end affecting our mirror sites by replacing a switch, and set up a new host to hold a copy of all uploaded media for copying to our mirror sites; the first copy of this media to an external mirror is now underway. Mirror sites will also be able to pick up a list of dump files to copy (the last 1, 2 or 5 good dumps) in a few different formats, produced by a new script. The first copy of recent dumps to a gluster share available to Labs users is available but already out of date; because a single copy process is too slow, we are testing a script that dispatches copy requests to several processes running at once. Christian is now working on PHPUnit tests for the maintenance scripts used for the dumps. We've also improved our deployment process for new versions of the XML dump scripts so that new code can be rolled out more often.
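The idea of selecting the last few good dumps and dispatching copy requests to several workers at once can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not taken from the actual dump scripts: the per-date `YYYYMMDD` directory layout, the `status-done.txt` marker that flags a good run, and the function names `last_good_dumps`, `copy_one` and `copy_dumps` are assumptions for illustration only. The report describes several processes; this sketch uses a thread pool instead, since file copying is mostly I/O-bound, and `concurrent.futures.ProcessPoolExecutor` could be substituted where process-level parallelism is preferred.

```python
# Minimal sketch: copy the newest good dump runs to a mirror using parallel workers.
# Assumptions (hypothetical, not the real Wikimedia dump tooling):
#   - each dump run is a subdirectory of source_root named YYYYMMDD
#   - a run counts as "good" if it contains the marker file STATUS_OK
#   - the mirror target is a locally mounted path (e.g. a gluster share)
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

STATUS_OK = "status-done.txt"  # hypothetical marker file name


def last_good_dumps(source_root, count):
    """Return the names of the newest `count` runs that contain STATUS_OK."""
    runs = sorted(
        name for name in os.listdir(source_root)
        if os.path.isfile(os.path.join(source_root, name, STATUS_OK))
    )
    # YYYYMMDD names sort chronologically; guard count <= 0 because runs[-0:] is the whole list
    return runs[-count:] if count > 0 else []


def copy_one(src, dst):
    """Copy a single file, creating the destination directory if needed."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)
    return dst


def copy_dumps(source_root, target_root, count, workers=4):
    """Build one copy request per file and dispatch them to a worker pool."""
    jobs = []
    for run in last_good_dumps(source_root, count):
        run_dir = os.path.join(source_root, run)
        for name in sorted(os.listdir(run_dir)):
            src = os.path.join(run_dir, name)
            if os.path.isfile(src):
                jobs.append((src, os.path.join(target_root, run, name)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_one, src, dst) for src, dst in jobs]
        # result() re-raises any worker exception, so a failed copy is not silently lost
        return sorted(f.result() for f in futures)
```

For example, `copy_dumps("/dumps", "/mnt/mirror", 2, workers=4)` would copy every file from the two newest good runs; both paths are placeholders. Keeping the selection step separate from the copy step means the same list of runs could also be written out for mirror sites to fetch themselves.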

Other news

 * Performance engineer Asher Feldman published an [//blog.wikimedia.org/2012/03/15/measuring-site-performance-at-the-wikimedia-foundation/ article explaining how site performance is measured] at the Wikimedia Foundation. In it, he presented Graphite, a limited version of which is available at http://gdash.wikimedia.org.
 * Operations engineer Ryan Lane, who is leading the Wikimedia Labs project, was [//blog.wikimedia.org/2012/03/20/a-profile-in-free-collaboration/ featured on the Wikimedia Blog] this month.
 * We started investigating the possibility of a caching center on the West Coast of the US. We believe it would improve the experience for users in Asia and America's West Coast.
 * Readers reported intermittent performance issues on March 25. Tim Starling investigated and determined that it was a network problem. Leslie Carr quickly found the root cause and redirected the traffic, resolving the problem. Rob Halsell later replaced the faulty fiber and transceiver.

Offline Projects

 * Kiwix UX initiative —

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on upcoming deployments to Wikimedia sites, as well as the engineering roadmap, which lists ongoing and future Wikimedia engineering efforts. In March, the engineering management team also focused on the annual goals and budgeting process.