Wikimedia Engineering/Report/2010/December

(to be posted on techblog.wikimedia.org by RobLa-WMF)

Welcome to the December 2010 monthly report from WMF Engineering! As always, we're reporting on what we've been working on and what's coming up. In December, the fundraiser was in full swing, with part of the Engineering team (Arthur Richards, Ryan Kaldari, Nimish Gautam, and Tomasz Finc) supporting the fundraising infrastructure. Danese Cooper, Erik Möller, and Alolita Sharma were in India for most of the month, while much of the rest of the team focused on the ramp-up to MediaWiki 1.17. More below the fold...

Events
India — Senior Wikimedia Engineering staff go to India. Danese Cooper, Alolita Sharma and Erik Möller traveled to India (along with other Wikimedia senior staff) in December. One of the primary purposes of this trip was to identify the technical gaps standing in the way of Wikimedia's success in India. We identified several areas of localization to standardize across Indic-language wikis, such as editing tools and font rendering; this work will benefit non-Indic languages as well. Offline reading was a recurring topic, and copies of an offline version of the Malayalam Wikipedia were shown to various government agencies and technology distributors. It generated a lot of interest, as did the work on Kiwix (see below), which will make it possible to produce offline versions of Wikipedia.

Data Summit (February 4th, California) — A working session about semantic data, analytics and research into data dumps. Invitations have been sent for this invite-only session.

Amsterdam Hack-a-Ton (January 14-15, the Netherlands) — A coding event for MediaWiki developers. The Dutch Wikimedia chapter is organizing this event alongside its party for Wikipedia's 10th anniversary. More information is available on the chapter's wiki in Dutch and English.

Other events

 * StrataConf 2011 (February 1-3, 2011, Santa Clara, California) — Many Wikimedians will be attending this O'Reilly conference, and there will be a birds-of-a-feather session on Wikimedia data with WMF staff member Erik Zachte.
 * FOSDEM 2011 (February 5-6, Brussels, Belgium) — Tomasz Finc, Arthur Richards and Roan Kattouw will attend FOSDEM 2011, where we'll be speaking about data collection at Wikimedia.
 * GNUnify 2011 (February 11-12, Pune, India) — This year's GNUnify conference will have a special focus on Wikimedia Engineering in a dedicated track, in which WMF engineers will be speaking.
 * Wikimedia Conference 2011 (Spring, Berlin) — The Wikimedia Foundation and Wikimedia Deutschland are currently discussing the format of the Berlin Developers meeting and Hacking Days (coinciding with the annual Chapters Meeting).

Hiring
Are you looking to work for Wikimedia? We have a lot of hiring coming up before the end of the year. Job descriptions are already posted for the following:
 * Performance Engineer
 * Software Developer (Features)
 * Software Developer (Mobile)
 * Data Analyst

In addition, we hope to post the following positions over the next few months: (Note: all of these positions may change as our requirements evolve.)
 * Senior QA Engineer
 * Release Engineer
 * Technical Writer
 * DBA/Storage Engineer (contractor)
 * Network Engineer (contractor)
 * Volunteer Development Coordinator

We also recently hired C.T. Woo as Director of Technical Operations (read the announcement).

Operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites. Status: We signed a lease with a co-location facility in Ashburn, Virginia, and our data center cage is being built over January. At the same time, we are procuring the equipment we need so that we can begin installing it and building out our infrastructure there in February. Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate the expected increase in media uploads. Status: WMF contractor Russ Nelson is evaluating two existing distributed file systems and media storage systems on test hardware. This will allow us to make an informed decision on the development of our new Media Storage Architecture, which is planned to use commodity hardware and to be more extensible than our existing architecture. This work is needed because of growing demand for, and uploads of, images and other media on Commons. Program manager: Mark Bergsma

Monitoring — Operations and public monitoring system to improve overall uptime, prevent outages, increase transparency and support progress tracking. Status: We are now tuning text message notification of downtime on critical services in both Amsterdam and Tampa. We improved monitoring of our database servers and other critical services. We're turning our attention to deepening our use of Watchmouse, which will help us spot trends and provide a public status page. Program manager: Mark Bergsma
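One common way to tune text-message alerts so that brief blips don't wake anyone up is to page only after several consecutive failed checks and then send a single recovery notice. The Python sketch below illustrates that idea under stated assumptions: it is not our actual monitoring configuration, and the names and the threshold of three failures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Tracks one service's recent check results for notification purposes."""
    failures_before_alert: int = 3  # hypothetical threshold; tune per service
    consecutive_failures: int = 0
    alerting: bool = False


def record_check(state: AlertState, ok: bool) -> Optional[str]:
    """Feed one check result; return a message only when a text should be sent.

    A PROBLEM page goes out after `failures_before_alert` consecutive failures,
    and exactly one RECOVERY notice follows when the service comes back.
    Failures below the threshold produce no message at all.
    """
    if ok:
        state.consecutive_failures = 0
        if state.alerting:
            state.alerting = False
            return "RECOVERY"
        return None
    state.consecutive_failures += 1
    if not state.alerting and state.consecutive_failures >= state.failures_before_alert:
        state.alerting = True
        return "PROBLEM"
    return None  # either below the threshold or already paged
```

Feeding it the sequence ok, fail, ok, fail, fail, fail, fail, ok yields exactly two messages, PROBLEM and then RECOVERY, even though five checks failed.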

Virtualization cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows). Status: New production hardware has arrived for implementation of this virtualization cluster. On the software and development side, Ryan Lane (and volunteers) have built virtual machine images for Nova testing, completed about 70% of the OpenStackManager MediaWiki extension, and integrated LDAP support into OpenStack Nova (read Ryan's article). In the office, Ryan also implemented a virtualized test server to reduce the need for copies of separate end-user operating environments (such as various configurations of Windows, IE, MacOS, etc.) for Features testing. Program manager: Mark Bergsma

Backups — Improvement of backup coverage of Wikimedia-hosted data. Status: We're turning our attention to offsite backups in Amsterdam of all critical data in Tampa. The concurrent buildout of the new primary data center will provide additional backup capacity. We are also looking at more reliable general-purpose storage. Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data. Status: After a serious hardware failure that halted our usual Data Dumps, we've bought a new server which should be more reliable and give us some more redundancy. Additionally, we are seeking additional public mirrors of our Data Dumps, which are very popular with researchers. We have begun generating new XML dumps and expect to complete a full run of dumps of all wikis this month. Program manager: Mark Bergsma
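Because full dumps can run to many gigabytes, researchers typically stream them rather than load them into memory. The Python sketch below assumes only the general shape of MediaWiki XML export files (`<page>` elements containing a `<title>` and one or more `<revision>` children) and ignores the namespace prefix; the sample document and the namespace URI in it are made up for illustration. Compressed dumps could be opened with `bz2.open(path, "rb")` and passed in the same way.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[Tuple[str, int]]:
    """Yield (title, revision_count) for each <page>, streaming through the file.

    Each page element is cleared once processed, so memory stays roughly flat
    no matter how large the dump is.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, revisions = "", 0
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "revision":
                revisions += 1
        yield title, revisions
        elem.clear()  # drop the page's children now that we're done with them


# Made-up miniature dump for demonstration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Foo</title><revision><id>1</id></revision><revision><id>2</id></revision></page>
  <page><title>Bar</title><revision><id>3</id></revision></page>
</mediawiki>"""

print(list(iter_pages(io.BytesIO(SAMPLE))))  # [('Foo', 2), ('Bar', 1)]
```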

Content Quality Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia. Status: In development. The initial version (phase 1) has been in production since the end of September on English Wikipedia. Phase 2, slated for release in January 2011, has been under development. Planning for phase 3 will begin soon. Program manager: Alolita Sharma

Pending Changes — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article. Status: We deployed another minor update for Pending Changes on December 22. Our colleagues in the Community department plan to work with the community to come up with a long-term plan for the feature on the English Wikipedia.  Program manager: Rob Lanphier
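To make the idea concrete, here is a simplified Python model of which revision readers see. It assumes that a revision counts as accepted if a reviewer approved it, or if a trusted user made it on top of an already-accepted version, and that readers see the newest accepted revision. This is a hypothetical sketch for illustration, not the code of FlaggedRevs (the extension Pending Changes is built on); the class and field names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    trusted_author: bool    # hypothetical flag for an established, logged-in editor
    reviewed: bool = False  # set when a reviewer accepts this change


def displayed_revision(history: List[Revision]) -> Optional[Revision]:
    """Return the revision shown to readers, given history ordered oldest first.

    Trusted edits are accepted automatically only when the revision before
    them was accepted; anything stacked on a pending change stays pending.
    """
    shown: Optional[Revision] = None
    prev_accepted = True  # treat the empty page as an accepted baseline
    for rev in history:
        accepted = rev.reviewed or (rev.trusted_author and prev_accepted)
        if accepted:
            shown = rev
        prev_accepted = accepted
    return shown
```

With a history of a trusted edit (1), an anonymous edit (2), and another trusted edit (3), readers keep seeing revision 1 until a reviewer accepts revision 2, at which point revision 3 becomes visible too.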

Threaded Discussions
Liquid Threads — A feature that brings threaded discussion capabilities to Wikimedia projects and MediaWiki. Status: During the latter part of December, WMF designer Brandon Harris focused on the next iteration of this feature, with development slated to begin in January. Program manager: Alolita Sharma

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia. Status: After the deployment of this feature in beta on November 30th, we've spent the past month working on analyzing feedback from users, fixing bugs and polishing the feature. Additional localized versions of the Licensing tutorial were also created. Program manager: Alolita Sharma

Media Projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor. Status: Michael Dale from Kaltura continues work on the HTML5 player, improving the core player with better embeddability and Wikimedia Commons integration. He is now migrating the player to the Resource Loader in MediaWiki 1.17, which should help with its overall integration. The media sequencer, first made available last September, has also received further timeline and image editing improvements. Look for a more detailed update from Michael later this month. Program manager: Alolita Sharma

MediaWiki Infrastructure
Resource loader — A feature to improve the load times for JavaScript and CSS in MediaWiki, enabling faster loading of the Vector skin, media extensions, and anything else that makes extensive use of JavaScript and CSS. Status: Integration and testing are in progress. We hope to have this ready in January for the deployment of MediaWiki 1.17 to Wikimedia sites, so that it can be part of a downloadable MediaWiki release sometime after that. Program manager: Alolita Sharma
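Much of the speedup from a resource loader comes from batching: instead of one HTTP request per script or stylesheet, the page asks for all the modules it needs in a single request, and the server concatenates them into one response. The Python sketch below shows only that batching idea. The `load.php` path, the `modules`/`skin` parameter names, and the registry layout are assumptions loosely modeled on ResourceLoader, and real loaders also minify, cache and version their output, which this sketch omits.

```python
from typing import Dict, List
from urllib.parse import urlencode


def load_url(base: str, modules: List[str], skin: str) -> str:
    """Build one request URL naming every module the page needs.

    Names are sorted so the same set of modules always produces the same URL,
    which lets browsers and caching proxies reuse earlier responses.
    """
    query = urlencode({"modules": "|".join(sorted(modules)), "skin": skin})
    return f"{base}?{query}"


def bundle(registry: Dict[str, str], modules: List[str]) -> str:
    """Concatenate the requested modules' source into a single response body."""
    missing = [m for m in modules if m not in registry]
    if missing:
        raise KeyError(f"unknown modules: {missing}")
    return "\n".join(f"/* {m} */\n{registry[m]}" for m in sorted(modules))
```

For example, `load_url("/w/load.php", ["mediawiki", "jquery"], "vector")` returns `/w/load.php?modules=jquery%7Cmediawiki&skin=vector` (`urlencode` percent-encodes the `|` separator), so two modules cost one round trip instead of two.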

MediaWiki development
MediaWiki 1.16.1 — A security release. Status: We released MediaWiki 1.16.1, a security release that provides a new set of defense mechanisms against clickjacking. See the release announcement for more information.
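Clickjacking works by loading a victim page inside an invisible frame on another site and tricking the user into clicking it. One widely used server-side defense is the `X-Frame-Options` response header, which tells supporting browsers not to render the page in a frame. The Python sketch below only illustrates that header. It is an assumption that this matches what 1.16.1 does, and the function and type names are hypothetical; the release announcement describes the actual changes.

```python
from typing import List, Tuple

Headers = List[Tuple[str, str]]


def with_frame_options(headers: Headers, policy: str = "DENY") -> Headers:
    """Return a copy of the response headers with an X-Frame-Options policy set.

    'DENY' forbids rendering the page inside any frame; 'SAMEORIGIN' allows
    framing only by pages from the same origin. Any existing X-Frame-Options
    header is replaced so the response carries exactly one policy.
    """
    if policy not in ("DENY", "SAMEORIGIN"):
        raise ValueError(f"unsupported policy: {policy}")
    kept = [(k, v) for k, v in headers if k.lower() != "x-frame-options"]
    return kept + [("X-Frame-Options", policy)]
```

A page that legitimately needs to be framed by the same site could use `SAMEORIGIN`, while sensitive pages such as edit forms are natural candidates for `DENY`.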

MediaWiki 1.17 — The upcoming MediaWiki release. Status: Our team of code reviewers continues to make headway on our backlog of outstanding code commits in "new" status. We're now under 300 unreviewed commits (see the evolution over time). We hope this means we can push a version of MediaWiki from our 1.17 branch to production this month. In the weeks after the production push, we plan to fix the more urgent bugs we find in production, further test and fix the installer and alternate databases (e.g. PostgreSQL), and provide a tarball for download by users. Program manager: Rob Lanphier

Test framework deployment — Creation of an automated test environment for MediaWiki using CruiseControl, Selenium, and PHPUnit. Status: The bulk of the work in December has been fleshing out Selenium tests. Calcey Technologies has been working on installer testing. There is renewed interest in PHPUnit tests, with many commits from both WMF staff and volunteers. Program manager: Rob Lanphier

Technical Documentation – Improvement of our technical documentation by making small, incremental improvements to the docs and docs process. Status: WMF contractor Zak Greant is currently pairing up with many of the core committers in documentation sessions. Program managers: Rob Lanphier & Zak Greant

Wikimedia analytics
udp2log — A custom data analytics logging system (formerly "Analytics upgrade"). Status: Now that the fundraiser is over, our next step is to deploy some changes to this critical piece of infrastructure. We've reached the limits of what our current architecture can do; each time we want to track a new metric, we have to kill another metric we're tracking. Our current version of the software uses unicast UDP to a single machine that has to handle all of the log traffic. Our new version of the software multicasts the log messages, so that we can process the packets on multiple machines. We're targeting a deployment in the first quarter of 2011 for this version. Program manager: Rob Lanphier
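The difference between the two designs can be sketched with plain UDP sockets. With unicast, the sender addresses one machine, which must do all the processing. With multicast, the sender writes each packet once to a group address and every host that has joined the group receives its own copy, so new metrics can be added on new machines. The Python below is an illustration only, not udp2log's code; the group address, port and function names are hypothetical.

```python
import socket
import struct

# Hypothetical values for illustration; not Wikimedia's real configuration.
LOG_GROUP = "239.1.2.3"
LOG_PORT = 8420


def make_sender(ttl: int = 1) -> socket.socket:
    """UDP socket that sends log lines to a multicast group.

    Each packet is sent once; the network delivers a copy to every joined
    host, so adding consumers puts no extra load on the sender.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # The TTL limits how many router hops the multicast packets may cross.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    return sock


def send_line(sock: socket.socket, line: str,
              group: str = LOG_GROUP, port: int = LOG_PORT) -> None:
    """Send one log line as a single datagram to the group."""
    sock.sendto(line.encode("utf-8"), (group, port))


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq struct: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def make_receiver(group: str = LOG_GROUP, port: int = LOG_PORT) -> socket.socket:
    """UDP socket for one log-processing host that joins the multicast group.

    Every host running this gets the full log stream and can apply its own
    filters, so metrics no longer compete for a single machine's capacity.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock
```

Each processing host would call `make_receiver()` and loop on `recvfrom`, while the web servers keep calling `send_line` exactly as before; whether the network actually routes the group to those hosts depends on switch and router configuration.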

OWA — Installation and customization of an Open Web Analytics (OWA) platform to process data that can help the Wikimedia movement understand how Wikimedia sites are used. Status: We deployed OWA in beta form in our infrastructure at the end of December. We haven't yet had a chance to fully put it through its paces or pull much interesting information out of it, but we plan to spend time in January working out requirements for uses of this tool beyond fundraising. For more information about OWA, see our recent blog post on the OWA release candidate. Program managers: Rob Lanphier & Tomasz Finc

Wikimedia deployment
wmsync – Replacement of our current deployment tools (e.g. "scap") with more robust software. Status: This project is on hold. Tim Starling and Mark Bergsma got an early start, but other priorities intervened. Program manager: Rob Lanphier

Fundraising
2010 Fundraiser — Engineering support for the yearly fundraiser (includes fraud prevention, CentralNotice, and the analytics upgrade). Status: We're done! There's not much to say here that hasn't already been said about the fundraiser itself. On the tech side, we did a number of things behind the scenes that drastically cut down on the level of fraud and bad credit card transactions. We handled over 500,000 transactions, tested 500 banners, and tested 1200 landing pages over the course of the fundraiser. Program manager: Tomasz Finc

Mobile
Mobile site rewrite — Port of our existing gateway to another framework for easier support & collaborative development. Status: We're still taking applications for the Software Developer (Mobile) position. Once we have the staff in place, we'll take the lessons learned from our current Ruby-based implementation to create something that integrates more completely into our system architecture. Program manager: Tomasz Finc

Offline
Offline — Better support for offline reading of Wikimedia content. Status: We signed a contract with PediaPress to add openZIM export functionality to the Collection extension. We're beginning a usability evaluation of the Kiwix reader this month. Program manager: Tomasz Finc

Internal support
We have a number of small projects of note:
 * Working with our HR department to evaluate HR management software (Project Manager: Alolita Sharma)
 * Evaluating online stores (Project Manager: Tomasz Finc)