Wikimedia Engineering/Report/2012/July

 Engineering metrics in July:
 * The total number of unreviewed commits went from about 320 to about 360.
 * About 35 shell requests were processed.
 * About 80 developers got access to Git and Wikimedia Labs.
 * Wikimedia Labs now hosts 114 projects, 211 instances and 559 users.

Major news in July includes:

Recent events
Pre-Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)

This year's pre-Wikimania Hackathon was special in that it had a full track for newcomers, going beyond tutorials. The Hackathon was a collaboration with OpenHatch, an open source teaching non-profit. The new efforts included identifying appropriate first-time tasks to orient newcomers toward more advanced Wikipedia editing and tech contribution, creating a laptop setup guide that walks attendees through configuring development environments, and providing constant in-person assistance to help people past problems they encountered. At the event, we saw many people learning more about templates, editing Wikipedia, and using and modifying bots to improve the encyclopedia and the media on it. At least 65 people signed in, and more were surely in attendance. A fuller report on the Hackathon is forthcoming on the wikitech-l list.

Wikipedia Engineering Meetup
To showcase to the local developer community the interesting problems and products that Wikimedia engineering works on, the Tech group has created a Wikipedia Engineering Meetup. The Meetup plans to meet every two months at the WMF offices in San Francisco and tentatively consists of three short 15-minute engineering presentations followed by a question and answer period, bracketed by mingling.

The inaugural meetup will be on August 15 and the talks scheduled will be:
 * Tomasz Finc and Jon Robson talking about Wikipedia Mobile
 * David Schoonover talking about Analytics at the Foundation
 * Trevor Parscal and Roan Kattouw talking about the VisualEditor

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Peter Youngmeister, who was working as a contractor for the Operations team, was converted to full-time Technical Operations Engineer (announcement).
 * S Page joined the Editor engagement experiments team as Software Engineer (announcement).

Site infrastructure

 * July was a relatively quiet month for Operations, and the team was working mostly behind the scenes, focusing on the key projects. Mark has successfully integrated and tested the upgraded Varnish software (with the persistent cache patch) on some of our mobile caching servers. They are working very well and the plan is to roll it out widely in the coming weeks.


 * Mark made several feature upgrades to LVS/PyBal. Those features include IPv6 BGP support and a DNS recursor implemented as an LVS cluster, built on Ubuntu 12.04. The package has been puppetized and deployed. It also means EQIAD servers no longer hop to Tampa to get DNS answers (finally).


 * Peter has been packaging, puppetizing and testing the new application server build. This build runs on Precise (Ubuntu 12.04) and works with the Swift object store (rather than the current NFS filer). There will be further performance and scalability tests across a bigger portion of the Tampa application servers shortly.


 * Asher has deployed an upgraded version of the parser cache server and the results have been impressive. Comparing p90 (90th percentile) and p99 (99th percentile) cache get times, averaged over several days (July 3-5) for db40, against the last 8 hours for the new, improved parser cache shows p90 dropping from 53.6ms to 7.17ms, and p99 dropping from 185.3ms to 17.1ms. This is relevant to every page request from logged-in users and cookied logged-out users, so it should have a meaningful impact on the user experience. In addition, Asher has completed and deployed his latest (Precise) MySQL build on one of the database slaves.
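
To make the p90 and p99 figures above more concrete, here is a minimal sketch of how such percentiles can be computed from a set of latency samples, and of the speedup implied by the quoted numbers. The nearest-rank method and the function names `percentile` and `speedup` are assumptions chosen for illustration; the report does not say how the percentiles were actually measured.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that very small p still returns a sample.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def speedup(before_ms, after_ms):
    """How many times faster the 'after' latency is than the 'before' latency."""
    return before_ms / after_ms


# Latency figures quoted in the report (milliseconds).
p90_speedup = speedup(53.6, 7.17)
p99_speedup = speedup(185.3, 17.1)
print(f"p90 improved about {p90_speedup:.1f}x, p99 about {p99_speedup:.1f}x")
```

With the figures from the report, this works out to roughly a 7.5x improvement at p90 and a 10.8x improvement at p99.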

Object Store/Swift
 * Migration to Swift is progressing to the final stages now that the performance bottleneck issue identified in the June report has been resolved. MediaWiki is now operating fully using Swift as the primary object store for thumbnails (the NFS filer is relegated to a secondary fail-over backup). The 'originals' (uploaded images and multimedia content) have been copied over to Swift as well, setting the stage to migrate away from the NFS filer next.

Testing environment
Wikimedia Labs
 * Labs is in a stability cycle of development. This month was focused on adding new hardware, upgrading the OpenStack infrastructure, and other stability efforts.
 * virt6-8 have been added to the cluster and about 20 instances have been migrated to these nodes so far. Another 40 instances have been created on virt6-8 since they were added. Initial instance migration efforts ended in 30 instances being corrupted due to a KVM block migration bug. A cold migration process, with an automated script, was created as a workaround.
 * Development effort is ongoing to upgrade OpenStackManager to support the Essex release of OpenStack. Keystone support is complete and OpenStack API support is currently being added.
 * Development work on OpenStack continues as well. Andrew Bogott's openstack-common plugin framework has been merged. novaclient work is progressing and should be merged after some cleanup efforts. Some changes needed for OpenStack Keystone's LDAP backend to work for the Essex (stable) release were pushed in collaboratively by Ryan Lane and OpenStack developer Adam Young. A blueprint has been submitted for using Keystone to manage LDAP entries via templates, so that we can move to Keystone as an LDAP manager in the future. Work has begun on using OpenStack Nova for managing DNS entries.
 * GlusterFS project storage has been upgraded to version 3.3.
 * A tutorial on Using Puppet with Labs was hosted at the pre-Wikimania Hackathon by Leslie Carr and Ryan Lane. A presentation on Labs and the State of Our Open Source Infrastructure was given at Wikimania.

Backups and data archives
Data Dumps
 * The YAS3 library for uploading to archive.org and to other S3-compatible sites, along with several command line clients, is now usable (though still under heavy development). This library handles 100 Continue correctly; this means that for large file uploads, the upload is only attempted once the client has been redirected to the right host, a great time saver. The library also supports uploading large files in multiple chunks automatically, rather than requiring the user to split the file into separate pieces. That's a necessity for us since many of our dump files are quite large.
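
As a rough illustration of the multi-chunk upload idea, the sketch below reads a file-like object sequentially and yields numbered parts of a fixed size, which is the general shape of input that S3-style multipart uploads work with. This is not YAS3's code: the function name `iter_parts` and the 5 MiB default part size are assumptions for illustration, and the HTTP requests themselves are left out.

```python
import io


def iter_parts(stream, part_size=5 * 1024 * 1024):
    """Yield (part_number, data) tuples read sequentially from a binary stream.

    Part numbers start at 1. Every part except possibly the last holds
    exactly part_size bytes, so the caller never has to split the file by hand.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_number = 1
    while True:
        data = stream.read(part_size)
        if not data:
            break
        yield part_number, data
        part_number += 1


# Demo with a tiny in-memory "file" and a hypothetical 10-byte part size.
demo = io.BytesIO(b"x" * 25)
parts = list(iter_parts(demo, part_size=10))
print([(number, len(data)) for number, data in parts])
```

Because the parts are produced lazily, only one chunk needs to be held in memory at a time, which matters for files as large as the dumps.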

Offline
Kiwix


 * We finally released Kiwix 0.9 rc1 (see the CHANGELOG). All the binary files were compiled using our new continuous integration build platform. In collaboration with Wikimedia France (for the Afripedia project), we released a first version of kiwix-plug, a standalone WiFi hotspot using cheap plug computers. The Black&White project, contracted by Wikimedia CH, was completed; a recent achievement was the introduction of Kiwix in the official Debian package repository. Also in collaboration with Wikimedia CH, we started a new project called ZIM autobuild aiming to quickly and automatically generate ZIM files of our projects.

Wikidata

 * The Wikidata project is funded and executed by Wikimedia Deutschland.

The Wikidata team has made good progress towards their first roll-out. The initial deployment plans are being made and the Hungarian Wikipedia community stepped up to be the first to use the interwiki part of Wikidata in a few weeks. You can follow the deployment planning at Wikidata/Deployment. This also means the demo system needs to be tested more. If you have five spare minutes, have a look at the demo system and report any bugs you might find there so they can be fixed before the initial deployment.

The team also started to collect future use-cases of Wikidata that should be kept in mind during development. You can find the existing ones here and are invited to refine them or add your own. Additionally, the team is looking for feedback on the third iteration of the storyboard for linking Wikipedia articles in the future.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.