Wikimedia Engineering/Report/2012/April

 Engineering metrics in April:
 * 53 unique committers contributed code to MediaWiki.
 * The total number of unreviewed commits went from about 100 to 138.
 * About 34 shell requests were processed.
 * 63 developers got developer access to Git and Wikimedia Labs, of whom 60 are volunteers.
 * Wikimedia Labs now hosts 81 projects, 136 instances and 305 users.

Major news in April includes:
 * Substantial work on Wikimedia engineering's goals for the next fiscal year;
 * The selection of 9 Google Summer of Code students and the start of their work;
 * The [//blog.wikimedia.org/2012/04/12/mediawiki-1-20wmf1-deployment/ shift to a rapid deployment cycle];
 * A new mobile skin deployed to Wikimedia sites;
 * The Wikipedia mobile app for iOS switching to using OpenStreetMap data.

Recent events

 * OpenStack Design Summit and Conference — The Wikimedia Labs team attended this San Francisco event, collaborated on upcoming OpenStack design decisions, spoke to other users, and publicized the Wikimedia Labs project.

Upcoming events

 * Berlin hackathon (1–3 June 2012, Berlin, Germany) — The 120 tickets available for this three-day "inreach" hackathon sold out in April. The Wikimedia technical community, including MediaWiki developers, Toolserver users, bot writers and maintainers, Gadget creators, and other Wikimedia technologists, showed substantial interest in the hackathon. The event, hosted by Wikimedia Deutschland, will mostly involve focused sprints, bugbashing, and other coding, with a few focused tutorials and trainings on Git, Lua, Gadgets changes, security, and performance optimization. Wikimedia Deutschland will also use this event to consult on and discuss the Wikidata structured data project.


 * Wikimania hackathon (10–11 July 2012, Washington, D.C., USA) — Katie Filbert, Gregory Varnum, and Sumana Harihareswara are planning the hybrid inreach/outreach hackathon occurring just prior to Wikimania. Experienced Wikimedia technologists will collaborate, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Matthias Mullie joined the Features team (announcement).
 * Faidon Liambotis joined the Operations team to work on Wikimedia Labs (announcement).
 * Chris Steipp joined the Platform team as Senior Security Engineer (announcement).
 * Tauhida Parveen joined the Platform team to work on QA and testing (announcement).
 * Sumana Harihareswara was promoted to Engineering Community Manager (announcement).

Site infrastructure

 * Servers — We added servers to bits.wikimedia.org in our Virginia data center for network capacity growth and redundancy. We also deployed a new udp2log server in Eqiad, providing extra capacity to collect new data for the Analytics team.


 * Search — After months of preparation and refactoring work on our dated Lucene implementation at the Tampa data center, we are glad to report that Peter Youngmeister (with help from Asher Feldman, Robert Stojnic and Jeff Green) successfully built and deployed the new search infrastructure at our Eqiad data center. The performance improvement is substantial: at the 99th percentile, search latency dropped from a high of 9 seconds to 1 second, and the average search now takes only 100 ms, down from 700 ms. In addition, the new infrastructure addresses some of the previous single points of failure and capacity limitations.
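
To illustrate why the report quotes both a 99th-percentile figure and an average, here is a minimal sketch of how the two are computed from a list of latency samples. The sample values are invented purely for illustration and are not measurements from the search cluster; the `percentile` helper is a hypothetical name and uses the simple nearest-rank definition, which may differ from whatever the monitoring tools actually use.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; integer product keeps the division exact for whole percentages.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (NOT real Wikimedia data):
# most requests are fast, a few are very slow.
latencies_ms = [100] * 98 + [1000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

A handful of slow requests barely moves the average but dominates the 99th percentile, which is why tail latency is tracked separately.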


 * Databases — Asher also completed the database migration/upgrade to support SHA-1 hashes in the coming MediaWiki release.


 * Caching — Varnish is now used in Eqiad to serve all upload (images and media) traffic, except for Europe, which has its own servers. Mark Bergsma deployed Varnish to replace our Squid instances running in Tampa. In addition to using consistent hashing, Mark ran half of the Eqiad Varnish instances with the experimental persistent storage back-end. Unfortunately, after a few days, he found showstopper bugs and reverted those instances to the stable version.
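
For readers unfamiliar with consistent hashing, the following is a minimal, generic sketch of the idea: each cache server owns many points on a hash ring, and a URL is served by the server owning the next point after the URL's hash. This is not Varnish's actual implementation or configuration; the `HashRing` class, the server names and the URL paths are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string to a 64-bit position on the ring (MD5 used only for spreading keys)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring; an illustration, not Varnish code."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Hypothetical cache servers and object URLs.
ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
urls = [f"/upload/file{i}.jpg" for i in range(1000)]
before = {u: ring.lookup(u) for u in urls}

ring.remove("cache-c")
moved = [u for u in urls if ring.lookup(u) != before[u]]
print(f"{len(moved)} of {len(urls)} URLs changed server after removing cache-c")
```

The property that matters for a cache cluster is that removing a server only remaps the objects that server held; everything else keeps hitting the same, already warm, cache.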


 * Networking — A secondary network transit link has been added to our Eqiad network. It provides redundancy and additional capacity, and comes with IPv6 enabled.


 * Media Storage — April saw two areas of progress: the MediaWiki code to allow original media storage in Swift was deployed to production (though it is not yet in use), and further investigation into old corrupted objects continued with new evidence and cleanup.  During May, we hope to begin the data migration from the older storage system into Swift, and to deploy improved monitoring and metrics.

Testing environment

 * Wikimedia Labs — A new version of OpenStackManager was released, adding project filters for all interfaces, usability fixes and a number of bug fixes. OpenStackManager and LdapAuthentication were switched to Git, which made it possible to push a few more changes while keeping a stable master branch. Notable changes were per-project sudo management, which lets sysadmins in a project manage who gets which sudo permissions in a fine-grained manner, and a change in how groups are added to LDAP for projects. A compute node (virt5) was added to the compute cluster, adding capacity for another 40 instances. We had an outage towards the end of the month, again due to glusterfs. We will start looking for alternatives to glusterfs soon. Sara Smollett added per-project Ganglia monitoring, displaying resource graphs for instances in projects. Andrew Bogott finished work on a plugin framework for OpenStack Nova, and has added an example plugin for a SharedFS driver, which would allow us to manage gluster volumes via an API.

Backups and data archives

 * Data Dumps — The gluster share with the last 5 or so good dumps for all projects is ready for use by Wikimedia Labs projects. A first copy of uploaded media, accessible via rsync, was announced, and some work was done on the infrastructure to generate downloadable bundles of media per project. We're working with the Internet Archive to produce media bundles that they can host for download as well. A new version of the dump scripts was deployed with some minor bug fixes. Christian Aistleitner wrapped up work on the PHPUnit tests for the dump maintenance scripts, and discovered a problem with the database schema, which we will need to discuss with the user community in order to find a resolution that works for everyone.

Other news

 * There was a short site incident at our Amsterdam site on April 26th at around 6:00 (UTC), which lasted for 30 minutes and impacted some of our European users. An unusual traffic surge overwhelmed some resources; the problem was quickly addressed once we found the cause.

Internationalization and Editor Engagement Experiments

 * Editor Engagement Experiments (E3) — Karyn Gladstone, Steven Walling, Maryana Pinchuk and Ryan Faulkner conducted the Necromancy experiment, [//blog.wikimedia.org/2012/05/02/enticing-wikipedians-back/ emailing lapsed editors] to encourage them to edit Wikipedia again. Work on the Template A/B testing project is wrapping up; a full report is expected in May. The E3 team will also be publishing details on each experiment on Meta and the English Wikipedia. The technical specifications for each implementation will be posted on mediawiki.org. The team also began recruiting for its open positions; the first software engineer for the team will be joining in mid-May.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.