Wikimedia Engineering/Report/2013/May

Engineering metrics in May:
 * unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 815 to.
 * About shell requests were processed.
 * Wikimedia Labs now hosts 165 projects and 1382 users; to date 1943 instances have been created.

Major news in May include:
 * Tool Labs is now operational

Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

 * May Galloway joined the Product Development team as Visual Designer (announcement).
 * Jared Zimmerman joined the Engineering Department as Director of User Experience (announcement).
 * Summer of Code 2013

Technical Operations
Site infrastructure
 * The migration to MariaDB continued, with Wikimedia Commons now fully moved over. Additional database infrastructure work was completed in support of Tool Labs, producing a row-based replication stream with all personally identifiable information (PII) removed for the publicly accessible Tool Labs databases.
 * We'll be upgrading to Request Tracker version 4 and migrating the service to eqiad soon. Most of the groundwork is laid: RT4 is puppetized and we've done a dry run.
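One reason row-based replication suits public replicas is that the stream carries full row images rather than SQL statements, so private columns can be stripped row by row before the data reaches a publicly accessible database. The sketch below illustrates that idea only; the redaction map and the `redact_rows` helper are hypothetical, and the actual rules used for the Tool Labs databases are not described in this report.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical redaction map: table name -> columns treated as private.
# These are example MediaWiki user-table columns, not the real Tool Labs rules.
PRIVATE_COLUMNS: Dict[str, Set[str]] = {
    "user": {"user_password", "user_newpassword", "user_email", "user_token"},
}


def redact_rows(table: str, rows: Iterable[Dict[str, object]]) -> Iterator[Dict[str, object]]:
    """Yield copies of row images with private columns replaced by None.

    Row-based replication delivers each changed row as data, so filtering
    can be done column by column instead of rewriting SQL statements.
    """
    private = PRIVATE_COLUMNS.get(table, set())
    for row in rows:
        yield {col: (None if col in private else val) for col, val in row.items()}


if __name__ == "__main__":
    events = [{"user_id": 1, "user_name": "Example", "user_email": "a@example.org"}]
    for row in redact_rows("user", events):
        print(row)  # → {'user_id': 1, 'user_name': 'Example', 'user_email': None}
```

Nulling the column (rather than dropping it) keeps the replica schema identical to production, which is a design choice assumed here for illustration.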

Data Dumps
 * The routine dumps of Wikidata ran into two roadblocks. The first is related to the way that RC patrol and autopatrolling are handled in MediaWiki; while a local workaround is in place, there has been discussion of revamping the patrolling mechanism, including changes to the database. The second, affecting the full-history content dumps, also has a temporary workaround until we decide what the 'rev_len' field for revisions in the database should actually mean.
 * The mwxml2sql utilities have been through some testing and bug fixes, and a volunteer is interested in packaging them for Debian to accompany his code for installing a local mirror of Wikipedia (or the project of one's choice).
 * Incremental dumps will be hacked on this summer by User:Svick as part of this year's Google Summer of Code (GSoC) program. We're happy to have him and can't wait for the finished code!
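The report does not say which interpretations of 'rev_len' were in question. As a hedged illustration of why a length field needs a precise definition, the sketch below contrasts two plausible readings, Unicode code points and UTF-8 bytes; the function names are hypothetical and the example text is arbitrary.

```python
def length_in_characters(text: str) -> int:
    """Length as a count of Unicode code points."""
    return len(text)


def length_in_utf8_bytes(text: str) -> int:
    """Length as the number of bytes in the UTF-8 encoding."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "Zürich"  # "ü" takes two bytes in UTF-8
    print(length_in_characters(sample), length_in_utf8_bytes(sample))  # → 6 7
```

The two readings agree for pure ASCII text and diverge as soon as a revision contains other characters, so a dump consumer comparing stored lengths against recomputed ones would need to know which one was intended.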

Wikimedia Labs
 * This month was mostly dedicated to bringing Tool Labs online, but a number of other changes occurred as well. Work has progressed on AJAXification of the OpenStackManager interface: instance reboot actions now use it, and Gerrit changes are awaiting review for instance console output, instance deletion, and some IP address actions. The custom virtual machine image received a number of fixes this month that improve reliability and boot speed; we expect further improvements when we upgrade to the OpenStack Grizzly or Havana releases. OpenStack was upgraded from the Essex release to the Folsom release, speeding up operations and bringing some new features (such as instance resizing), which we'll make available for use soon. All virtualization hosts and all instances received kernel upgrades for a security vulnerability and were rebooted this month, causing roughly an hour of scheduled downtime. Work has also progressed on creating instances pre-configured for MediaWiki development; this worked in the past, but it is now more reliable, faster, and meets our legal team's requirements for MediaWiki installations in Labs. Finally, work has progressed on a more reliable development environment for Labs itself: soon it should be possible to create a pre-configured instance ready for making changes to the Labs infrastructure.
 * Tool Labs is now operational, with roughly 150 tools already migrated. With the basics of replication complete (all but s7 are operational), there are no remaining roadblocks to migration. In the week since the Amsterdam Hackathon, a fair number of minor issues have been found and fixed, and the general consensus among users is that the environment is functional. On the roadmap for next month: cleaning up the replication management (so that it is more generalized), finishing s7, and helping users with their migration issues.


 * Tool Labs work has also added a new feature available to all of Labs: service groups. A service group is a user/group combination available locally within a project. Members of a service group can sudo to its user, which provides per-project service users without needing to create local users via Puppet or global users via wikitech.


 * Work progresses on puppetizing more OpenStack services and the OpenStackManager extension. OpenStackManager (OSM) development is currently hampered by a lack of test installs; soon we should be able to easily create new Labs instances running OpenStack and OpenStackManager for testing and development.
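The report does not describe how service-group sudo access is implemented. As a hedged sketch of the idea "members of a group may act as the matching service user", the code below renders a standard sudoers rule of the form `%group host=(runas) commands`. The `<project>-<service>` naming scheme, the name-validation pattern and the `NOPASSWD: ALL` policy are all hypothetical choices made for illustration.

```python
import re

# Hypothetical naming rule for service users/groups; not taken from Labs.
_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,31}")


def service_name(project: str, service: str) -> str:
    """Build a hypothetical per-project service user/group name."""
    name = f"{project}-{service}"
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid service name: {name!r}")
    return name


def sudoers_rule(project: str, service: str) -> str:
    """Allow members of group %name to run any command as user name."""
    name = service_name(project, service)
    return f"%{name} ALL=({name}) NOPASSWD: ALL"


if __name__ == "__main__":
    print(sudoers_rule("tools", "mybot"))
    # → %tools-mybot ALL=(tools-mybot) NOPASSWD: ALL
```

Because the rule is keyed on group membership, adding or removing a maintainer only changes the group, not the sudo configuration.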

Varnish Cache Invalidation Improvements
 * We've written a replacement for varnishhtcpd called vhtcpd and deployed it to the production Varnish machines. The new daemon is about 50 times more efficient at the same basic job. This buys us some performance on the Varnish machines in general, but more importantly it eliminates invalidation failures caused by network buffer overruns when the old daemon couldn't keep up. It should also rid us of the random software failures in the old daemon that caused some or all cache purge requests to be missed for extended periods. The initial deployment is just a basic swap of the two daemons. Near-term improvements include enabling the new daemon's HTCP regex filtering for further efficiency gains, and feeding its HTCP packet statistics into our normal monitoring and analysis infrastructure so that we can better spot any multicast invalidation delivery issues that may arise.
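With multicast invalidation, every cache receives every purge, so a cache that serves only some hostnames gets many purges it can ignore; regex filtering lets the daemon drop those before they reach Varnish. The sketch below assumes purge URLs have already been decoded from HTCP packets, and shows hostname filtering followed by a minimal HTTP PURGE request. The pattern, function names and request shape are hypothetical illustrations, not vhtcpd's actual configuration or code.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical filter: this cache only serves the upload hostname.
HOST_FILTER = re.compile(r"^upload\.wikimedia\.org$")


def purges_to_forward(urls: Iterable[str], host_filter: re.Pattern) -> List[str]:
    """Keep only purge URLs whose hostname matches the filter."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host_filter.search(host):
            kept.append(url)
    return kept


def purge_request(url: str) -> str:
    """Render a minimal HTTP PURGE request for a local cache."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"PURGE {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"


if __name__ == "__main__":
    urls = ["http://en.wikipedia.org/wiki/Foo", "http://upload.wikimedia.org/a/b.png"]
    for url in purges_to_forward(urls, HOST_FILTER):
        print(repr(purge_request(url)))
```

Filtering before the HTTP step means unmatched purges cost only a regex check instead of a request to Varnish.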

Support

 * Operations moved db1025 into the firewalled fundraising cluster (frack) and rebuilt it on Ubuntu Precise with MariaDB. RAID monitoring tools were updated to support the RAID controllers used in frack. We've allocated, and mostly finished building and puppetizing, the new payments listener (thulium) and a new CiviCRM host (barium), both of which are in frack.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.



Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.



Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.