Wikimedia Engineering/Report/2011/January

January 2011 was a tough month for Wikimedia engineers. About 75% of us caught the "WikiPlague" (a.k.a. RSV) and were out of commission for between 3 and 10 days. Also, with the Fundraiser ending early, the past month has been a time of restarting and resetting priorities as we shift our main focus away from supporting money making and on to money spending.

Major accomplishments this month include:
 * the completion of equipment specs and negotiations to order all equipment for the new primary data center in Ashburn, Virginia.
 * major work on getting MediaWiki 1.17 released, especially by reducing the Code Review queue to releasable levels.
 * work on increasing Nagios and Watchmouse monitoring.

Recent events

 * Amsterdam Hack-a-Ton — Developers accomplished a ton of great work during this coding event, held on January 14-15 in the Netherlands, particularly around multimedia. Alpha tools and projects include a WordPress plugin to embed pictures from Commons, an iPhone app to upload photos directly to Commons, and the integration of license information into MediaWiki.
 * StrataConf 2011 (February 1-3, 2011, Santa Clara, California) — Many Wikimedians attended this O'Reilly conference, and WMF staff member Erik Zachte proposed a birds-of-a-feather session on Wikimedia Data.
 * Data Summit (today, California) — Most invited attendees have arrived in California to participate in this working session about semantic data, analytics and research into data dumps.

Upcoming events

 * FOSDEM 2011 (February 5-6, Brussels, Belgium) — Tomasz Finc, Arthur Richards and Roan Kattouw will be at FOSDEM this year to speak about data collection at Wikimedia.
 * GNUnify 2011 (February 11-12, Pune, India) — This year's GNUnify conference will have a special focus on Wikimedia Engineering in a dedicated track. WMF engineers Tomasz Finc and Arthur Richards will be doing presentations on Wikimedia, Mobile, Becoming a MediaWiki Developer & writing extensions, and Drupal & CiviCRM within Wikimedia.
 * Wikimedia Conference 2011 (May, Berlin) — Wikimedia Deutschland has held an annual Chapters' meeting in Berlin since 2009. Last year's event included a "Developer's Symposium" rather than a Hackathon, and was co-located with the annual Chapters' meeting. This year, Daniel Kinzler, from Wikimedia Deutschland, announced that the Berlin Developers' meeting would be separate from the Wikimedia Chapters' meeting. Instead, a hacking event will happen in late May.
 * Wikimania (August 2-7, Haifa, Israel) — This year's Wikimania will be preceded by two days of hacking. Mark your calendar for August 2-3! You can also submit a talk or workshop for the Technology tracks of the actual conference (August 4-7).

Hiring
Are you looking to work for Wikimedia? We have a lot of hiring coming up this year, and we really love talking to active community members about these roles. The following positions are currently open:
 * Volunteer Development Coordinator
 * Performance Engineer
 * Software Developer (Features)
 * Software Developer (Mobile)
 * Data Analytics Engineer
 * Operations Engineer
 * Senior QA Engineer

In addition, we hope to post the following positions over the next few months:
 * Release Engineer
 * Technical Writer
 * Network Engineer (contractor)

Operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: The data center cage build is almost complete, and almost all hardware has been ordered. We are on track to start our build-out in February.
 * Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
 * Status: Russell Nelson has evaluated both MogileFS and Swift as possible foundations for our rearchitected Media Storage. Although both systems performed reliably, we have selected OpenStack Swift for further testing, as it performed slightly better and is slightly more suitable to our needs. We're going to build a test setup consisting of three machines using a small portion of our media storage production data, and we will mirror part of our actual live requests and replay them against the test cluster.
 * Program manager: Mark Bergsma

Monitoring — Operations and public monitoring system to improve overall uptime, prevent outages, increase transparency and support progress tracking.
 * Status: We've been testing and improving monitoring checks using Watchmouse. Although we will continuously be updating and tweaking our monitoring, a public status dashboard is now available.
 * Program manager: Mark Bergsma

Virtualization cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status: Ryan Lane announced the release of the OpenStackManager extension for MediaWiki, which interacts with OpenStack, the EC2 API and LDAP to manage a virtual machine infrastructure. However, testing revealed some missing features in OpenStack, which set us back a bit, so we're waiting for the next OpenStack release before deploying this. We also have a little more hardware to configure.
 * Program manager: Mark Bergsma

Backups — Improvement of backup coverage of Wikimedia-hosted data.
 * Status: We're focusing on the new data center so that we have all of the data in both Tampa and our Virginia data center, and we're evaluating our options for a dedicated storage solution.
 * Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: The data sets are back up and new dumps are running. We brought up a second machine for redundantly storing dumps, and we're also buying more machines for generating dumps faster. We'll have more capacity for dumps once the new data center is fully online.
 * Program manager: Mark Bergsma

Content Quality Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: Phase 2 is currently in development. The requirements have been validated against the current codebase and UI. This feature will be dark launched on prototype on February 4. Because of Resource Loader dependencies, we'll wait until after the 1.17 deployment before making this feature live.
 * Program manager: Alolita Sharma

Pending Changes — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.
 * Status: Development has been largely on hold, apart from minor bug fixes, pending further community discussion, including another poll. Now that the dust is settling on the fundraiser and the Wikipedia 10 celebration, that poll should happen soon. See continuing discussion.
 * Program manager: Rob Lanphier

Threaded Discussions
Liquid Threads — A feature that brings threaded discussions capabilities to Wikimedia projects and MediaWiki.
 * Status: An evaluation of LiquidThreads is underway, covering both the UI and the code infrastructure design. We are modifying the design to incorporate input from several community discussions as well as from our engineering staff. We plan to consolidate all of our documentation on these discussions on mediawiki.org in the next month.
 * Program manager: Alolita Sharma

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: The final Multimedia usability project report was published on meta, and the licensing tutorial was localized into more languages, including Hindi and Farsi. Bugfixes are in progress.
 * Program manager: Alolita Sharma

Media Projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status: Integration of the MediaWiki 1.17 Resource Loader into the timedMediaHandler code base is in progress. This will help the overall integration of the video editor.
 * Program manager: Alolita Sharma

MediaWiki infrastructure
Resource loader — A feature to improve the load times for JavaScript and CSS in MediaWiki, enabling faster loading of the Vector skin, media extensions, and anything else that makes extensive use of JavaScript and CSS.
 * Status: One aspect of Resource Loader is the automatic compression of JavaScript files as they are delivered to the browser. Trevor Parscal announced that the customized version of JSMin (our previous JavaScript compressor) had been replaced by JavaScriptDistiller, which is more efficient. Resource Loader will be deployed as part of the 1.17 upgrade next week.
 * Program manager: Alolita Sharma

MediaWiki development
MediaWiki 1.17 — The upcoming MediaWiki release.
 * Status: As announced earlier this week, we will be deploying MediaWiki 1.17 to the site on February 8th. We're still working through the code review queue (revisions, graph) and fixing some integration issues. The major noticeable feature of this deployment will be Resource Loader (see above), which will enable the release of Article Feedback sometime after that. We're also dark-launching the improved category collation code that Aryeh Gregor wrote back in August, and we'll enable this feature very soon after the 1.17 launch. Some new extensions will also wait until after the dust settles on the 1.17 launch.
 * Program manager: Rob Lanphier

Test framework deployment — Creation of an automated test environment for MediaWiki using CruiseControl, Selenium, and PHPUnit.
 * Status: Markus Glaser continues to work on the Selenium framework. Most of the other people working on this have shifted their focus to 1.17 deployment for the time being.
 * Program manager: Rob Lanphier

Technical Documentation — Improvement of our technical documentation by making small, incremental improvements to the docs and docs process.
 * Status: We're continuing to work on improving the technical documentation, and we're looking to expand volunteer involvement in this area.
 * Program managers: Rob Lanphier & Zak Greant

Wikimedia analytics
udp2log — A custom data analytics logging system.
 * Status: Our new version of the software multicasts log messages over UDP, so that we can process the packets on multiple machines. We have hardware reserved for this project, and will now deploy a second logging machine. This will allow us to add more metrics without taking any away.
 * Program manager: Rob Lanphier
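
As a rough illustration of the multicast approach described above, here is a minimal Python sketch of a receiver that joins a UDP multicast group, so that several machines could each receive the same log stream. This is not the udp2log code: the function names, the group address and the port are hypothetical, and the assumption that each datagram carries newline-separated log lines is ours.

```python
import socket


def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte ip_mreq value: group address followed by local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_listener(group, port):
    """Open a UDP socket bound to `port` and join the IPv4 multicast `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several listeners on the same host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the kernel to deliver packets addressed to the multicast group.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def split_log_lines(datagram):
    """Split one datagram into non-empty log lines (assumed newline-separated)."""
    text = datagram.decode("utf-8", errors="replace")
    return [line for line in text.split("\n") if line]


if __name__ == "__main__":
    # Hypothetical group and port, chosen only for this example.
    listener = open_listener("239.128.0.112", 8420)
    while True:
        data, _sender = listener.recvfrom(65535)
        for line in split_log_lines(data):
            print(line)
```

Because every machine that joins the group receives its own copy of each packet, a second logging machine can run the same listener without the sender needing to know about it.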

OWA — Installation and customization of an Open Web Analytics (OWA) platform to process data to support decision making.
 * Status: We're still going through the information recorded during the fundraiser and working on requirements to use this tool in other areas.
 * Program managers: Rob Lanphier & Tomasz Finc

Fundraising
2010 Fundraiser — Engineering support for the yearly fundraiser (includes fraud prevention, CentralNotice, and the analytics upgrade).
 * Status: January has been our cleanup and documentation update month.
 * Program manager: Tomasz Finc

Mobile
Mobile site rewrite — Port of our existing gateway to another framework for easier support & collaborative development.
 * Status: We're still in hiring mode. We're putting together a roadmap for our mobile development and starting to coordinate research and development. We're drafting a survey now.
 * Program manager: Tomasz Finc

Offline
Offline — Better support for offline reading of Wikimedia content.
 * Status: Our partners from PediaPress have started to extend the Collection extension to support openZim, the file format we decided to support. We're also preparing a usability study to improve the user experience of Kiwix, an offline app for reading Wikimedia content. Finally, we're working to improve the Wikipedia version tools. A more detailed update on Offline-related work was recently published. Starting next month, these three projects will get dedicated updates.
 * Program manager: Tomasz Finc