Wikimedia Engineering/Report/2012/May

 Engineering metrics in May:
 * 77 unique committers contributed code to MediaWiki.
 * The total number of unreviewed commits went from about 140 to 250.
 * About 58 shell requests were processed.
 * 108 developers got developer access to Git and Wikimedia Labs.
 * Wikimedia Labs now hosts 97 projects, 177 instances and 431 users.

Major news in May includes:
 * the publication of the [//blog.wikimedia.org/2012/05/11/book-architecture-mediawiki-open-source-applications/ Architecture of Open-Source Applications book], which contains a chapter on MediaWiki;
 * initial designs for a [//blog.wikimedia.org/2012/05/21/introducing-designs-for-the-universal-language-selector/ universal language selector];
 * a new and easier way to view a wiki's [//blog.wikimedia.org/2012/05/29/wikimedia-wikis-reveal-interwiki-map/ interwiki map];
 * [//blog.wikimedia.org/2012/05/29/1-million-media-files-uploaded-using-upload-wizard/ 1 million files] uploaded with our UploadWizard;
 * the [//blog.wikimedia.org/2012/05/31/wikidata-summit-kicks-off-in-berlin/ Wikidata/RENDER summit] in Berlin, followed by the [//blog.wikimedia.org/2012/06/02/diverse-wikimedia-tech-crowd-gathers-in-berlin/ hackathon].

Upcoming events
Berlin hackathon (1–3 June 2012, Berlin, Germany)
 * The Wikimedia technical community prepared tutorials and plans for the event. MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists looked forward to learning about and working on Lua, Git, Gadgets changes, security, Wikidata, RENDER, and other Wikimedia technologies. More information will be available in the June engineering report.

Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)
 * Katie Filbert, Gregory Varnum, and Sumana Harihareswara are organizing a hybrid inreach/outreach hackathon occurring just prior to Wikimania, and aim to make it welcoming for both novices and experts. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Danielle Benoit joined the Platform Engineering team as a contractor working on development tutorials (announcement).
 * Ori Livneh joined the Editor Engagement Experiments (E3) engineering team as Software Developer (announcement).
 * Vibha Bamba joined the Product team as Interaction Designer (announcement).
 * James Forrester joined the Product team as Technical Product Analyst, focusing on the Visual Editor (announcement).
 * Subramanya Sastry joined the Features team as Senior Software Engineer (announcement).

Site infrastructure
Data Centers
 * May has been a busy month for racking, stacking and provisioning newly purchased servers, especially by Rob Halsell, Chris Johnson, Leslie Carr and Mark Bergsma. We recently purchased new hardware to refresh aging servers (many are out of warranty and over 4 years old), to add capacity and redundancy, and to support new projects; this includes servers for Search, Analytics, Fundraising, OpenStreetMap, databases, Varnish, Memcached and backups. Much effort went into OS installation and network setup; the servers are now ready for the various system and application deployments.


 * IPv6 work went into full swing as well, in order to be ready for IPv6 Launch Day on June 6. As of the end of May, the database schemas had been updated, and work had started on refactoring LVS, PayPal, Varnish, Squid, DNS, Nagios monitoring and puppetization.


 * In April, we deployed the newly built Search cluster at our Ashburn datacenter, and disabled the Tampa search cluster. This month, Peter Youngmeister went through the exercise of upgrading the 4-year-old Tampa search cluster infrastructure, and brought it back up. We now have a cross-datacenter hot standby for the 'Search' service.


 * With Ubuntu 12.04 (Precise Pangolin) available, we have packaged it and started using it selectively on some of our systems, including Search at Tampa and half of the LVS servers. Next, we will set up the Apache servers at the Ashburn data center using Precise as well.


 * Recently, a few of our systems rebooted themselves unexpectedly. Faidon Liambotis investigated and reported a bug in our Ubuntu 10.04 kernel that caused servers to reboot after about 208 days of uptime. We applied the necessary kernel and security patches to the affected servers.


 * Ben Hartshorne has been working with the SwiftStack folks on enhancing Swift to feed additional Swift-specific monitoring data into our Ganglia tool. Next, they will work on identifying potential Swift performance bottlenecks under load in our implementation, and will recommend mitigations. Ben has started testing the upgrade from our current version to the just-released 1.4.8, which should improve the stability of the software.
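
The figure of about 208 days in the kernel reboot item above matches the time a nanosecond counter takes to exhaust 54 usable bits. The report does not state the cause, so the 54-bit figure here is an assumption (a 64-bit value that loses 10 bits to a fixed-point scaling shift), and the function name is hypothetical. A minimal sketch of the arithmetic:

```python
# Assumption: a 64-bit nanosecond counter with 10 bits consumed by a
# fixed-point scaling shift, leaving 2**54 ns before it overflows.
NS_PER_SECOND = 10**9
SECONDS_PER_DAY = 86_400


def overflow_days(usable_bits: int = 54) -> float:
    """Return the number of days until a nanosecond counter with
    `usable_bits` usable bits overflows."""
    return 2**usable_bits / NS_PER_SECOND / SECONDS_PER_DAY


print(f"{overflow_days():.1f} days")  # prints "208.5 days"
```

Under that assumption, any server that stays up for roughly seven months reaches the limit, which would explain why machines booted at different times failed at different dates.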

Testing environment
Wikimedia Labs
 * The Labs infrastructure had a couple of outages, caused by excess load and by problems with the GlusterFS system. As a result, Ryan Lane, Faidon Liambotis and Andrew Bogott are working on a get-well plan, which includes finding a suitable replacement for GlusterFS. The short-term plan now in the works places the instances in local storage on each node, which leaves the infrastructure without redundancy. For the longer term, we are evaluating Ceph and possibly writing a new filesystem mode in OpenStack to use DRBD in a way similar to Ganeti. Faidon implemented a new way of managing Puppet that allows users to test all of their changes locally before pushing them in for review. Sara Smollett moved her changes for Ganglia from Labs to production. Andrew Bogott has been working on bringing up the new cluster in eqiad, for testing Ceph, testing the upgrade of OpenStack from Diablo to Essex, and preparing for the new zone we'll add there. Ryan Lane wrote a new software deployment system, slated for use in Labs and in production, using git-deploy and SaltStack.

Backups and data archives
Data Dumps
 * We've been busy creating bundles of media in use per project, and the first set of files is almost complete. For each wiki, there are now one or more files containing all media uploaded locally to the wiki, and one or more files containing all media used by the wiki but uploaded to Commons. We've also been preparing for the media back-end switch to Swift; since we won't be able to make copies of all media files in the usual way, we put together some scripts that check the  and   tables and retrieve and/or update media files via HTTP as needed. The Your.org and Masaryk University mirrors officially came online; we're still looking for other partners to host media backups and pageview statistics.
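
The "retrieve and/or update as needed" step described above can be pictured as a comparison between what the wiki records and what the backup already holds. The sketch below is illustrative only and is not the actual dump scripts: the function name, the dictionary inputs and the sample file names are all hypothetical, and it assumes timestamps are 14-digit `YYYYMMDDHHMMSS` strings, which compare correctly as plain strings.

```python
from typing import Dict, List, Tuple


def plan_media_sync(
    wiki_rows: Dict[str, str], local_copies: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Decide which media files to download and which to refresh.

    wiki_rows:    file name -> upload timestamp recorded by the wiki
    local_copies: file name -> timestamp of the copy already held
    Returns (to_fetch, to_update), each sorted by file name.
    """
    # Files the wiki knows about that the backup does not hold at all.
    to_fetch = sorted(name for name in wiki_rows if name not in local_copies)
    # Files held locally whose copy is older than the wiki's version.
    to_update = sorted(
        name
        for name, timestamp in wiki_rows.items()
        if name in local_copies and local_copies[name] < timestamp
    )
    return to_fetch, to_update


# Hypothetical sample data.
wiki = {
    "A.jpg": "20120501120000",
    "B.png": "20120520093000",
    "C.svg": "20120401000000",
}
local = {"A.jpg": "20120501120000", "B.png": "20120101000000"}
print(plan_media_sync(wiki, local))  # (['C.svg'], ['B.png'])
```

Keeping the decision step separate from the HTTP download makes it possible to check the plan without touching the network.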

Offline
Kiwix


 * We set up a fully virtualized compilation farm using technologies like Buildbot, VirtualBox and QEMU. This will allow for better continuous integration and more frequent releases. We have also developed our first proof of concept for kiwix-mobile using cordova-qt. Kiwix was featured as "project of the week" on SourceForge during the last week of May, which helped us reach the milestone of 25,000 monthly software downloads for the first time.

Wikidata

 * The Wikidata project is funded and executed by Wikimedia Deutschland.

The team made good progress on their work on interwiki links. The demo system shows the current state of development. They published a draft showing how interwiki links should work in the future, which was amended after the recent work done on the [//blog.wikimedia.org/2012/05/21/introducing-designs-for-the-universal-language-selector/ universal language selector]. They published another document explaining how data from Wikidata is going to be included in Wikipedia sites, also rewritten based on community feedback. Finally, members of the Wikidata team attended many events (like LinuxTag, re:publica and the 2nd ESWC Summer School) and held IRC office hours. At the end of the month, the team met with Foundation staff and community members in Berlin at the [//blog.wikimedia.org/2012/05/31/wikidata-summit-kicks-off-in-berlin/ Wikidata/RENDER summit] to present the work done so far and to discuss important decisions for the future of the project.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.