Wikimedia Engineering/Report/2012/March

 Engineering metrics in March:
 * 98 unique committers contributed code to MediaWiki.
 * About code commits were reviewed.
 * The total number of unreviewed commits went from 31 to.
 * About 34 shell requests were processed.
 * developers got commit access, among which volunteers.
 * Wikimedia Labs now hosts 75 projects, 126 instances and 222 users.

Major news in March includes:
 * The completion of the [//blog.wikimedia.org/2012/03/09/transfer-of-wikipedia-sites-from-godaddy-complete/ move of all our domain names from registrar GoDaddy];
 * The [//blog.wikimedia.org/2012/02/15/wikimedia-engineering-moving-from-subversion-to-git/ move from Subversion to Git] as our primary code versioning system;
 * The kick-off of [//blog.wikimedia.org/2012/03/16/project-ideas-students-and-mentors-wanted-gsoc-2012/ Wikimedia's participation in Google Summer of Code 2012].


Recent events

 * Chennai Hackathon March 2012 (17 March 2012, Chennai, India) — Yuvaraj Pandian and volunteer Srikanthlogic held this one-day hackathon for experienced developers.  Pandian's report praised the 21 participants for coming up with 13 completed hacks, including 2 core MediaWiki patches, 3 Tamil Wikipedia userscript updates, and 2 new deployed tools.

Upcoming events

 * Berlin hackathon (1–3 June 2012, Berlin, Germany) — Registration opened in March for this three-day "inreach" hackathon for the Wikimedia technical community, including MediaWiki developers, Toolserver users, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The event, hosted by Wikimedia Germany, will mostly involve focused sprints, bugbashing, and other coding, with a few focused tutorials and trainings on Git, Lua, Gadgets changes, or other topics of interest. Wikimedia Germany will also use this event to consult on and discuss the Wikidata structured data project. Developers are encouraged to register now, and to mention in the registration form whether they will need financial subsidies or help with accommodation or visas. Developers who need that sort of assistance are urged to register as soon as possible, preferably before May 1st.


 * Wikimania hackathon (10–11 July 2012, Washington, D.C., USA) — Katie Filbert, Gregory Varnum, and Sumana Harihareswara are planning the hybrid inreach/outreach hackathon occurring just prior to Wikimania. Experienced Wikimedia technologists will collaborate, while interested new developers will be able to learn introductory MediaWiki development. The organizers are still deciding on themes and focus topics for the event, possibly including accessibility.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.


 * Lucene Search Operations Engineer (RFP)
 * Mobile Quality assurance (RFP)
 * Senior Software Frontend Engineer
 * Software Developer Backend
 * Software Developer Frontend
 * Software Developer Mobile
 * Software Security Engineer
 * Technical Product Analyst
 * Interaction Designer

Site infrastructure

 * Ashburn data center — Mark Bergsma completed the Squid to Varnish conversion for image caching, and successfully deployed Varnish on 8 servers in our Ashburn data center for about half a day. During that time, he monitored and assessed the behavior of the software and its impact on the servers. Although 8 Varnish servers would provide sufficient capacity to replace the 24 Squid servers currently in use, there are concerns about overloading the network interface cards and concentrating too much cache on each server. Mark is now working on improving the Varnish implementation and possibly adding a few more servers. Also, because the Ashburn data center seems to be experiencing a higher server outage ratio than the Tampa site, Rob Halsell reviewed the cabinets and added extra earth grounding to them as a precaution. We are monitoring the situation to see whether this reduces server issues. Lastly, Peter Youngmeister and Jeff Green are making good progress in testing, preparing and bringing up the Ashburn Search clusters. Full-scale testing has just started and results have been quite promising. In the coming weeks, they will conduct limited trial deployments of the Ashburn clusters, running in parallel with the ones at the Tampa data center. The Ashburn data center added network peering last week and, in that short time, Leslie Carr peered our network with over 10 other big sites and ISPs, reducing latency for many of our users, especially in Europe, Japan and Hong Kong (and reducing our bandwidth costs too!).


 * Amsterdam data center — Mark Bergsma restacked, re-arranged and decommissioned servers, and started racking the new router and switches. The actual network switchover at Evoswitch is still to be scheduled; however, Mark did replace the old core router in Vancis and deployed the new one there.


 * Media Storage — After addressing earlier issues with the Swift deployment, Ben Hartshorne re-deployed it, and it has been stable since. He removed the original testing hardware from the cluster and added the final production node, bringing the total to 5 new Swift nodes serving as the thumbnail object store at Tampa. Swift is also now running in the Labs environment and is ready to be used by other Labs projects that interact with Swift in production. Volunteer attention to the Swift Labs cluster is welcome to improve monitoring, analyze the configuration, and otherwise help us better understand this component of our infrastructure.

Testing environment

 * Wikimedia Labs — Gluster project storage is now available, with 71TB available for use in total. Each project has a default quota of 300GB that can be raised on request. Public datasets (such as XML dumps) will soon also be available within Labs. There were two Labs downtime events this month, both due to GlusterFS instance storage. The first was caused by a limitation in the FUSE filesystem (regarding recreating deleted directories) and was relatively short (roughly 2 hours). The second was caused by malfunctioning hardware, which put the GlusterFS storage into an unresolvable split-brain situation. There was no data loss, but the instance images had to be recovered manually from Gluster's backend. Total downtime for the second outage was roughly 24 hours. Andrew Bogott has finished his work on SharedFS support in Nova, with a Gluster driver. It is proposed for inclusion in Nova for the Folsom release; this will be discussed at the upcoming OpenStack design summit. Andrew has begun work on adding support for updating MediaWiki on Nova changes.

Backups and data archives

 * Data Dumps — We resolved the network issues to our mirror sites on our end by replacing a switch, and set up a new host to hold a copy of all uploaded media for copying to our mirror sites; the first copy of this media to an external mirror is now underway. Mirror sites will also be able to pick up a list of dump files to copy (the last 1, 2 or 5 good dumps) in a few different formats, produced by a new script. The first copy of recent dumps to a Gluster share available to Labs users is in place but already out of date; a single copy process is too slow, so a script is being tested that will dispatch copy requests to several processes running at once. Christian is now working on PHPUnit tests for the maintenance scripts used for the dumps. We've improved our process for deploying new versions of the XML dump scripts so that new code can be rolled out more often.
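
The idea of dispatching copy requests to several workers at once can be illustrated with a minimal sketch. This is not the script being tested by the dumps team; the function names (copy_one, dispatch_copies) and the workers parameter are hypothetical, and the sketch uses a thread pool rather than separate processes, on the assumption that file copies are mostly I/O-bound.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def copy_one(src: Path, dest_dir: Path) -> Path:
    """Copy a single file into dest_dir and return the destination path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    shutil.copy2(src, target)
    return target


def dispatch_copies(sources, dest_dir, workers=4):
    """Copy every file in `sources` to `dest_dir` with several workers at once.

    Hypothetical helper: returns (copied, failed), where `copied` is a sorted
    list of destination paths and `failed` is a list of (source, exception).
    """
    dest_dir = Path(dest_dir)
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each submitted job back to its source so failures can be reported.
        futures = {pool.submit(copy_one, Path(s), dest_dir): Path(s) for s in sources}
        for fut in as_completed(futures):
            src = futures[fut]
            try:
                copied.append(fut.result())
            except OSError as exc:
                # One failed copy (e.g. a missing file) does not stop the others.
                failed.append((src, exc))
    return sorted(copied), failed
```

A caller would pass the list of dump files and a destination directory, for example dispatch_copies(files, "/some/share", workers=8), and retry whatever ends up in the failed list. Both the path and the worker count here are placeholders.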

Other news

 * Performance engineer Asher Feldman published an [//blog.wikimedia.org/2012/03/15/measuring-site-performance-at-the-wikimedia-foundation/ article explaining how site performance is measured] at the Wikimedia Foundation. He notably presented Graphite and a limited version of it available at http://gdash.wikimedia.org.
 * Operations engineer Ryan Lane, who is leading the Wikimedia Labs project, was [//blog.wikimedia.org/2012/03/20/a-profile-in-free-collaboration/ featured on the Wikimedia Blog] this month.
 * We started investigating the possibility of a caching center on the West Coast of the US. We believe it would improve the experience for users in Asia and America's West Coast.
 * Readers reported intermittent performance issues on March 25th. Tim Starling investigated and determined that it was a network problem. Leslie Carr quickly found the root cause and redirected the traffic, which resolved the problem. Rob Halsell later replaced the problematic fiber and transceiver.

Offline Projects

 * Kiwix UX initiative —

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. In March, a particular focus of the engineering management team was also the annual goal and budgeting process.