Wikimedia Engineering/Report/2012/April

 Engineering metrics in April:
 * 53 unique committers contributed code to MediaWiki.
 * The total number of unreviewed commits went from about 100 to 138.
 * About 34 shell requests were processed.
 * 63 developers, including volunteers, received developer access to Git and Wikimedia Labs.
 * Wikimedia Labs now hosts 81 projects, 136 instances and 305 users.

Major news in April includes:
 * work on Wikimedia engineering's goals for the next fiscal year


Recent events

 * OpenStack Design Summit and Conference — The Wikimedia Labs team attended this San Francisco event, collaborated on upcoming OpenStack design decisions, spoke to other users, and publicized the Wikimedia Labs project.

Upcoming events

 * Berlin hackathon (1–3 June 2012, Berlin, Germany) — All 120 tickets for this three-day "inreach" hackathon were claimed in April, and the event is now sold out. The Wikimedia technical community, including MediaWiki developers, Toolserver users, bot writers and maintainers, Gadget creators, and other Wikimedia technologists, showed substantial interest in the hackathon. The event, hosted by Wikimedia Deutschland, will mostly involve focused sprints, bugbashing, and other coding, with a few focused tutorials and trainings on Git, Lua, Gadgets changes, security, and performance optimization. Wikimedia Deutschland will also use this event to consult on and discuss the Wikidata structured data project.


 * Wikimania hackathon (10–11 July 2012, Washington, D.C., USA) — Katie Filbert, Gregory Varnum, and Sumana Harihareswara are planning the hybrid inreach/outreach hackathon occurring just prior to Wikimania. Experienced Wikimedia technologists will collaborate, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



New hires

 * Matthias Mullie joined the Features team (announcement).
 * Faidon Liambotis joined the Operations team to work on Wikimedia Labs (announcement).
 * Chris Steipp joined the Platform team as Senior Security Engineer (announcement).
 * Tauhida Parveen joined the Platform team to work on QA and testing (announcement).

Site infrastructure

 * US data centers
 * Added servers to bits.wikimedia.org at Eqiad for capacity growth and redundancy.
 * Redeployed Swift with added capacity after addressing some initial teething problems. All thumbnails are now served from Swift.
 * After months of preparation and refactoring work on our dated Lucene search implementation in Tampa, we are glad to report that Peter (with help from Asher, rainmain_sr and Jeff) successfully built and deployed the new Search infrastructure at our Eqiad data center. The performance improvement is substantial: at the 99th percentile, search latency dropped from a high of 9 seconds to 1 second, and the average search now takes only 100 ms, down from 700 ms. In addition, the new infrastructure addresses some of the previous single points of failure and capacity limitations.
 * Asher completed the database schema migration/upgrade to support SHA-1 hashes in the coming MediaWiki release.
 * Varnish is now used in Eqiad to serve all upload (images & media) traffic, except for Europe, which has its own servers. Mark deployed Varnish to replace our Squid instances running in Tampa. The new setup uses consistent hashing, and Mark also ran half of the Eqiad Varnish instances with the experimental persistent storage backend. Unfortunately, after a few days, he found showstopper bugs and reverted those instances to the stable version.
 * A secondary network transit link has been added to our Eqiad network. It provides redundancy and extra capacity, and comes with IPv6 enabled.
 * Deployed a new udp2log server in Eqiad, providing extra capacity to collect new data for the Analytics team.


 * Media Storage — April saw progress in two areas. The MediaWiki code that allows original media to be stored in Swift was deployed to production, though it is not yet in use. The investigation into old corrupted objects also continued, turning up new evidence and leading to further cleanup. During May we hope to begin migrating data from the older storage system into Swift and to deploy improved monitoring and metrics.

Testing environment

 * Wikimedia Labs — A new version of OpenStackManager was released, adding project filters for all interfaces, usability fixes and a number of bug fixes. OpenStackManager and LdapAuthentication were switched to Git; because this lets us keep a stable master branch, a few more changes could be pushed. Notable changes were per-project sudo management, which lets sysadmins in a project manage in a fine-grained manner who gets which sudo permissions, and a change in how groups are added to LDAP for projects. A compute node (virt5) was added to the compute cluster, adding capacity for another 40 instances. We had an outage towards the end of the month, again due to GlusterFS, and we will start looking for alternatives to it soon. Per-project Ganglia has been added thanks to Sara Smollett, displaying resource graphs for instances in projects. Andrew Bogott finished work on a plugin framework for OpenStack Nova, and has added an example plugin for a SharedFS driver, which would allow us to manage Gluster volumes via an API.

Backups and data archives

 * Data Dumps — The Gluster share with the last 5 or so good dumps for all projects is ready for use by Labs projects. A first copy of uploaded media, accessible via rsync, was announced, and some work was done on the infrastructure to generate downloadable bundles of media per project. We're working with the Internet Archive to produce media bundles that they can host for download as well. A new version of the dump scripts was deployed with some minor bug fixes. Christian Aistleitner wrapped up work on the PHPUnit tests for the dump maintenance scripts and discovered a problem with the database schema, which we will need to discuss with the user community in order to find a resolution that works for everyone.

Other news

 * There was a short site incident at our Amsterdam site on 26 April 2012 at around 06:00 UTC. It lasted for half an hour and affected some of our European users. We experienced an unusual traffic surge that overwhelmed some resources; the problem was quickly addressed once we found the cause.

Offline Projects

 * Kiwix UX initiative — The team released 0.9 RC1...

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.