Wikimedia Engineering/Report/2012/June

 Engineering metrics in June:
 * 92 unique committers contributed 1401 patchsets of code to MediaWiki.
 * The total number of unreviewed commits went from about 250 to about 320.
 * About 53 shell requests were processed.
 * 45 developers got developer access to Git and Wikimedia Labs.
 * Wikimedia Labs now hosts 100 projects, 182 instances and 468 users.
 * The Wikipedia Android app has now been downloaded over 4,000,000 times and retains a rating of 4.5/5.
 * 4 Wikimedia projects are now mobile default: Wikipedia, Wiktionary, Wikinews & Wikisource.

Major news in June includes:
 * the [//blog.wikimedia.org/2012/06/02/diverse-wikimedia-tech-crowd-gathers-in-berlin/ Berlin hackathon], the largest gathering of Wikimedia technologists to date, co-organized with Wikimedia Deutschland;
 * the [//blog.wikimedia.org/2012/06/21/help-us-shape-wikimedias-prototype-visual-editor/ June milestone release of the Visual Editor and Parsoid] to mediawiki.org;
 * the mobile team kicking off app development for Wiki Loves Monuments;
 * the launch of IPv6 support for all Wikimedia projects.

Recent events
Berlin hackathon (1–3 June 2012, Berlin, Germany)
 * Approximately 104 participants from 30 countries came to Berlin, including MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The community also learned more about the Wikidata and RENDER projects. More updates, links to videos, and follow-ups are on the talk page.

Upcoming events
Pre-Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)
 * OpenHatch, a nonprofit that teaches open source development, will help organize and run this two-day event, together with Katie Filbert, Gregory Varnum, and Sumana Harihareswara. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes. The event is free to attend, even for those not attending Wikimania itself.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Munaf Assaf joined the Product team as UX Designer, mainly working on the Editor engagement experiments (announcement)
 * Adam Wight joined the Features team as Fundraising Engineer (announcement)
 * Bugmeister Mark Hershberger's time with WMF ended on May 31st (announcement)
 * Ian Baker's time with the Foundation ended on June 8th.
 * Jon Robson became a full-time employee of the Foundation.

Site infrastructure

 * June was another busy month of racking, stacking and provisioning newly purchased equipment for Chris and Rob. In the works are additional servers for clusters such as External Store, Memcached, Parser Cache, Object Store and Labs. Meanwhile, new servers were rolled out in EQIAD for analytics, DNS resolver, and UDP2Log. Servers and firewalls were racked and cabled for the new EQIAD payments cluster. Storage3's RAID controller failure was repaired, and a replacement machine was ordered.


 * IPv6 Launch Day (6 June 2012) came and went without much fanfare. Mark, Faidon, Ryan and Asher put a great deal of work into the infrastructure and system stack, especially LVS, PyBal, Varnish, Squid, DNS, databases, Nagios monitoring and puppetization. We also took this opportunity to update those technologies and to run them on Ubuntu Precise (12.04) where possible. We have kept IPv6 traffic on ever since. To mitigate risk, only half of the LVS and PyBal servers were upgraded to run IPv6 and the enhanced features, allowing us to fall back if needed. Now that we have had one month of stability, we will soon begin the rest of the migration.


 * During the Berlin Hackathon, the TechOps team got together for about 2 hours to review the year's progress. In summary, the team completed 19 priority 1 projects (e.g., deploying Mobile, SSL, Labs, DB upgrades and network redundancy) that were identified at the beginning of the year. We followed up with a list of high-priority projects for the new fiscal year. Blog posts with more details on both the review and the new project list will follow soon. In addition to IPv6-related work, the team did a major cleanup of jobs creating cronspam, making the logfiles more readable.


 * Asher performed benchmark testing on the External Store, comparing the current MyISAM engine with InnoDB. He dispelled the myth that MyISAM is faster for this use case. Based on these results, he has started migrating the External Store databases to the InnoDB engine. You can read his report here.

Data Centers Object Store/Swift
 * We have identified a colocation facility to serve as the new West Coast caching center; it is located at 200 Paul Street, San Francisco. Work on building up the infrastructure is planned to begin this coming August/September. With this caching center, we will be able to improve the site experience for users on the US West Coast and in the Asia-Pacific region.
 * A severe bottleneck has been identified in container listings in Swift, and Ben Hartshorne is adding SSD drives to the Swift back-end storage nodes to make container listings faster. Testing has verified that this change will solve the problem, and it is being deployed to production this month. Additionally, the SwiftStack monitoring improvements were accepted into the mainline Swift codebase last month and will be deployed to our environment in July.

Testing environment
Wikimedia Labs
 * The Labs infrastructure had a DNS outage, caused by glue records that must be updated via a manual process. To combat that issue in the future, Labs DNS resolvers are now on service IPs with service host names. A DNS resolver was brought up in EQIAD, as well as an additional LDAP replica.
 * Faidon's puppetmaster::self class is being put into use. It's working well enough that the test branch for puppet was merged into the production branch, and Labs now runs directly off of the production branch.
 * The very annoying "No nova credentials for your account" bug has been fixed.
 * virt6-12 in pmtpa have been racked, wired and installed. They will soon be put into production.
 * Andrew Bogott's work on the nova plugin framework continued this month. The plugin framework has been moved into openstack-common, making it the plugin framework for all OpenStack services. Work is now ongoing to merge the changes back into nova.
 * Per-project Debian repositories (for ubuntu-precise and above) are now available. An all-in-one MediaWiki puppet class is now available as well.

Backups and data archives
Data Dumps
 * Media downloads per project are now live, along with one or two "incremental" downloads per month. The new deployment system (which actually uses scripts instead of moving files around by hand) was completed and is in place. It was even used this month to push some minor changes. We're working with another organization that wants to mirror media, and we're still looking for more mirror sites for media, dumps or pageview stats; send us ideas! The archive.org uploader code was rewritten as a core S3 uploader library with archive.org extensions, and the new features we need are being added; the library will be extended for Google Storage usage as well.

Other news

 * We had several short site incidents in June. On June 7, users reported API slowness and unavailability. Tim was around to resolve that incident (detailed report). On June 20 and again on June 21, users reported Apache HTTP timeouts. In both cases, one of the memcached servers was found to be under high load, and restarting it resolved the issue (detailed report). An incident on June 19 did not impact our MediaWiki production clusters, though it held up our email system for half a day.


 * Jeff discovered a distributed spam attack on our mail system involving what appeared to be a few thousand malicious hosts. They were flooding our secondary mail server with undeliverable messages to fake addresses at various WMF domains. The secondary mail server forwarded those to the primary mail server, which became overloaded and slow to process legitimate mail. A temporary fix was put in place to drop those fake and spam messages, but it took a day for the mail system to catch up. We subsequently put a proper fix in place.
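
The failure mode described above (a secondary mail server accepting mail for nonexistent addresses and passing it on to the primary) is commonly avoided by having the secondary check recipients before queueing anything. The Python sketch below only illustrates that general idea. The address list, domain and function names are hypothetical, and this is not a description of the fix that was actually deployed.

```python
# Hypothetical illustration: a secondary relay checks each recipient
# against a list of known mailboxes before forwarding to the primary.

# Assumed example data; a real server would look this up elsewhere.
VALID_RECIPIENTS = {"info@example.org", "press@example.org"}


def accept_recipient(address, valid_recipients=VALID_RECIPIENTS):
    """Return True only if the address is a known mailbox."""
    return address.strip().lower() in valid_recipients


def filter_queue(messages, valid_recipients=VALID_RECIPIENTS):
    """Split (recipient, body) pairs into forwarded and dropped lists."""
    forwarded, dropped = [], []
    for recipient, body in messages:
        if accept_recipient(recipient, valid_recipients):
            forwarded.append((recipient, body))
        else:
            dropped.append((recipient, body))
    return forwarded, dropped
```

With a check like this, messages to fake addresses never reach the primary server's queue, so a flood of them cannot delay legitimate mail there.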

Wikidata

 * The Wikidata project is funded and executed by Wikimedia Deutschland.

The team published an easier-to-understand version of their data model, updated their storyboards for how to link between Wikipedias in the future, and submitted a proposal to the Knight News Challenge to make Wikidata a central, persistent repository for identifiers on the web in a second year of development. Proposed logos also went up for public voting.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.