Wikimedia Engineering/Report/2012/June

 Engineering metrics in June:
 * unique committers contributed code to MediaWiki.
 * The total number of unreviewed commits went from about 250 to about 320.
 * About 53 shell requests were processed.
 * 45 developers got developer access to Git and Wikimedia Labs.
 * Wikimedia Labs now hosts 100 projects, 182 instances and 468 users.

Major news in June includes:
 * the [//blog.wikimedia.org/2012/06/02/diverse-wikimedia-tech-crowd-gathers-in-berlin/ Berlin hackathon], the largest gathering of Wikimedia technologists to date, co-organized with Wikimedia Deutschland;
 * the [//blog.wikimedia.org/2012/06/21/help-us-shape-wikimedias-prototype-visual-editor/ June milestone release of the Visual Editor and Parsoid] to mediawiki.org;
 * the launch of IPv6 support for all Wikimedia projects.

Recent events
Berlin hackathon (1–3 June 2012, Berlin, Germany)
 * Approximately 104 participants from 30 countries came to Berlin, including MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The community also learned more about the Wikidata and RENDER projects. More updates, links to videos, and follow-ups are on the talk page.

Upcoming events
Pre-Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)
 * Open source teaching nonprofit OpenHatch will help organize and run this two-day event, with Katie Filbert, Gregory Varnum, and Sumana Harihareswara. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes. The event is free to attend, even for those not attending Wikimania itself.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Munaf Assaf joined the Product team as UX Designer, mainly working on the Editor engagement experiments (announcement)
 * Adam Wight joined the Features team as Fundraising Engineer (announcement)



Site infrastructure

 * June was another busy month for racking, stacking and provisioning newly purchased equipment. In the works are additional servers for clusters such as External Store, Memcached, Parser Cache, Object Store and Labs. Meanwhile, new servers were rolled out in EQIAD for analytics, DNS resolution and UDP2Log, and to replace Storage3.


 * IPv6 Launch day (June 6, 2012) came and went without much fanfare. Mark, Faidon, Ryan and Asher put much work into the infrastructure and system stack, especially into LVS, PyBal, Varnish, Squid, DNS, databases, Nagios monitoring and puppetization. We also took this opportunity to update those technologies and to run them on Precise (12.04) where possible. We have kept IPv6 traffic on ever since. As part of risk mitigation, only half of the LVS and PyBal servers were upgraded to run IPv6 and the enhanced features, allowing us to fall back if needed. Now that we have had one month of stability, we will soon begin the rest of the migration.


 * During the Berlin Hackathon, the TechOps team got together for about 2 hours to review the year's progress. In summary, the team completed 20 priority 1 projects (e.g., implementing Mobile, SSL, Labs, DB upgrades and network redundancy) that were identified at the beginning of the year. We followed up with a list of high-priority projects for the new fiscal year. A blog post with more details will follow soon. In addition to IPv6-related work, the team did a major cleanup of jobs creating cronspam, making the logfiles more readable.


 * Asher performed benchmark testing on the External Store, comparing the current MyISAM engine with InnoDB. His results dispelled the myth that MyISAM is faster for this use case. Based on this new information, he has started migrating the External Store databases to the InnoDB engine. You can read his report here.

Data Centers
 * We have identified a new colocation facility to serve as the new West Coast caching center, located at 200 Paul Street, San Francisco. Work on building up the infrastructure is planned to begin this coming August or September. With this caching center, we will be able to improve the site experience for users on the US west coast and in Asia Pacific.
Object Store/Swift
 * A severe bottleneck has been identified in container listings in Swift, and Ben Hartshorne is adding SSD drives to the Swift back-end storage nodes to provide faster container listings. Testing has been completed to verify that this change will solve the problem, and it is being deployed to production this month. Additionally, the SwiftStack monitoring improvements were accepted into the mainline Swift codebase last month and will be deployed to our environment in July.

Testing environment
Wikimedia Labs
 * The Labs infrastructure had a DNS outage, caused by glue records that must be updated via a manual process. To prevent that issue in the future, Labs DNS resolvers are now on service IPs with service host names. A DNS resolver was brought up in eqiad, as well as an additional LDAP replica. Faidon's puppetmaster::self class is being put into use. It's working well enough that the test branch for puppet was merged into the production branch, and Labs now runs directly off of the production branch. The very annoying "No nova credentials for your account" bug has been fixed. virt6-12 in pmtpa have been racked, wired and installed; they will soon be put into production. Andrew Bogott's work on the nova plugin framework continued this month. The plugin framework has been moved into openstack-common, making it the plugin framework for all OpenStack services. Work is now ongoing to merge the changes back into nova. Per-project Debian repositories (for ubuntu-precise and above) are now available. An all-in-one MediaWiki puppet class is now available as well.

Backups and data archives
Data Dumps
 * Media downloads per project are now live, along with one or two "incremental" downloads per month. The new deployment system (which actually uses scripts instead of moving files around by hand) was completed and is in place. It was even used this month to push some minor changes. We're working with another organization that wants to mirror media, and we're still looking for more mirror sites for media, dumps or pageview stats; send us ideas! The archive.org uploader code was rewritten as a core S3 uploader library with archive.org extensions, and the new features we need are being added; this will be extended for Google Storage usage as well.

Other news

 * We had our fair share of short site incidents in June. On June 7, users reported API service slowness and unavailability. Tim was around to resolve that incident. For a more detailed report, see http://wikitech.wikimedia.org/view/Site_issue_June_7_2012 . On June 20 (and again on June 21), users reported Apache HTTP timeouts. In both cases, one of the memcached servers was found to be experiencing high load, and restarting it resolved the issue. For a more detailed report, see http://wikitech.wikimedia.org/view/Site_issue_June_20_2012 . The incident on June 19 did not impact our MediaWiki production clusters, though it held up our email system for half a day. Jeff discovered a distributed spam attack on our mail system involving what appeared to be a few thousand malicious hosts. They were flooding our secondary mailserver with undeliverable messages to fake addresses at various WMF domains. The secondary mailserver forwarded those to the primary mailserver, which became overloaded and slow in processing legitimate mail. A temporary fix was put in place to drop those fake/spam messages, but it took a day for the mail system to catch up. A proper fix was put in place subsequently.

Wikidata

 * The Wikidata project is funded and executed by Wikimedia Deutschland.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.