Wikimedia Engineering/Report/2010/December

(to be posted on techblog.wikimedia.org by RobLa-WMF)

Welcome to the December monthly report from WMF Engineering! As always, we're reporting on what we've been working on and what's coming up. In December, ...

(insert the article break here)

Events
India — Danese Cooper, Alolita Sharma and Erik Möller traveled to India (along with other Wikimedia senior staff) in December. One of the primary purposes of this trip was to assess the technical gaps that stand in the way of success in India. We found some areas of localization that should be standardized across Indic language wikis, such as editing tools and font rendering. We think that work will be useful for non-Indic languages as well. We spoke a lot about offline reading, and took the offline version of the Malayalam Wikipedia around to various government agencies and technology distributors. There was a lot of interest in that work, as well as in the Kiwix work (see below) that will make it possible to produce offline versions of Wikipedia.
 * XXX Link to full summary / dedicated blog post would be great XXX

Data Summit (February 4th, California) — Invitations have been sent for this invite-only working session in California. We are still accepting requests for invitations up until January 17. Topics discussed at the summit will include semantic data, analytics, and research into data dumps.

Amsterdam Hack-a-Ton (February 14-15, the Netherlands) — The Dutch Wikimedia chapter is organizing a coding event around the 10th Anniversary party there. More information is available on the chapter's wiki in Dutch and English. (XXX needs English translation before we publish the blog post XXX)

Other events

 * StrataConf 2011 (February 1-3, 2011, Santa Clara, California) — Many Wikimedians will be attending this O'Reilly conference, and there will be a birds-of-a-feather session on Wikimedia Data with WMF staff member Erik Zachte.
 * FOSDEM 2011 (February 5-6, Brussels, Belgium) — XXX pending XXX
 * GNUnify 2011 (February 11-12, Pune, India) — This year's GNUnify conference will have a special focus on Wikimedia Engineering in a dedicated track, in which WMF engineers will be speaking.
 * Wikimedia Conference 2011 (Spring, Berlin) — The Wikimedia Foundation and Wikimedia Deutschland are currently discussing the format of the Berlin Developers meeting and Hacking Days (coinciding with the annual Chapters Meeting).

Operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: We signed a lease with a co-location facility in Ashburn, VA, and our data center cage is being built during January. At the same time, we are procuring the equipment we need so that we can start installing it and building our infrastructure there in February.
 * Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate an expected increase in media uploads.
 * Status: WMF Contractor Russ Nelson is evaluating and actively testing two existing distributed file systems / media storage systems on test hardware. This will allow us to make an informed decision on the development of our new Media Storage Architecture, which is planned to use commodity hardware and to be more extensible than our existing architecture. This work is needed because of increased demand for and contribution of images and other media to Commons.
 * Program manager: Mark Bergsma

Monitoring — Operations and public monitoring system to improve overall uptime, prevent outages, increase transparency and support progress tracking.
 * Status: We now have text message notification of downtime on critical services in both Amsterdam and Tampa, and we are tuning it. We improved monitoring of our database servers and other critical services. We're turning our attention to deepening our use of Watchmouse, which will help us spot trends and provide a public status page. We plan to report separately on our site reliability on a periodic basis.
 * Program manager: Mark Bergsma

Virtualization cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status: New production hardware has arrived for implementation of this virtualization cluster. On the software and development side, Ryan Lane (and volunteers) have built virtual machine images for Nova testing, completed about 70% of the OpenStackManager MediaWiki extension, and integrated LDAP support into OpenStack Nova. In the office, Ryan also implemented a virtualized test server to reduce the need for copies of separate end-user operating environments (such as various configurations of Windows, IE, MacOS, etc.) for Features testing.
 * Program manager: Mark Bergsma

Backups — Improvement of backup coverage of Wikimedia-hosted data.
 * Status: We're turning our attention to offsite backups in Amsterdam of all critical data in Tampa. The concurrent buildout of the new primary data center will provide additional backup. We are also looking at more reliable general-purpose storage.
 * Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: After a serious hardware failure that halted our usual Data Dumps, we've bought a new server which should be more reliable and give us some more redundancy. Additionally, we are seeking additional public mirrors of our Data Dumps, which are very popular with researchers. By next week we expect to have restored full generation of dumps of all wikis.
 * Program manager: Mark Bergsma

Content Quality Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: In development. The initial version (phase 1) has been in production since the end of September on English Wikipedia. Phase 2 of the tool, slated for release in January 2011, has been under development. Planning for phase 3 will begin soon.
 * Program manager: Alolita Sharma

Pending Changes/FlaggedRevs — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.
 * Status: We deployed another minor update for Pending Changes on December 22. Our community team plans to work with the community to come up with a long-term plan for the feature on English Wikipedia.
 * Program manager: Rob Lanphier

Threaded Discussions
Liquid Threads — A feature that brings threaded discussions capabilities to Wikimedia projects and MediaWiki.
 * Status: Brandon spent the latter part of December focused on design work for the next iteration of this feature, with development slated to begin this month.
 * Program manager: Alolita Sharma

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: Following the deployment of this feature, we've spent the past month on bug fixes and polish.
 * Program manager: Alolita Sharma

Media Projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status: XXX-find status from Michael Dale (email on 2011-01-04?)
 * Program manager: Alolita Sharma

MediaWiki Infrastructure
Resource loader — A feature to improve the load times for JavaScript and CSS in MediaWiki, enabling faster loading of the Vector skin, media extensions, and anything else that makes extensive use of JavaScript and CSS.
 * Status: Integration and testing in progress. We hope to have this ready for deployment in January, so that it can be part of a MediaWiki release sometime after that.
 * Program manager: Alolita Sharma

General Engineering
udp2log — A custom data analytics logging system (formerly "Analytics upgrade").
 * Status: Now that the fundraiser is over, our next step is to deploy some changes to this critical piece of infrastructure. We've reached the limits of what our current architecture can do: each time we want to track a new metric, we have to stop tracking another one. Our current version of the software uses unicast UDP to a single machine that has to handle all of the log traffic. Our new version of the software multicasts the log messages, so that we can process the packets on multiple machines. We're targeting a deployment in the first quarter of 2011 for this version.
 * Program manager: Rob Lanphier
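
To illustrate the difference between unicast and multicast logging described above, here is a minimal Python sketch. It is not the udp2log code: the group address, port, and function names are all hypothetical and chosen only for illustration. The sender emits each log line once to a multicast group, and any number of receiver machines can join that group to get their own copy of the stream.

```python
import socket
import struct

# Hypothetical values for illustration only; not the real udp2log settings.
GROUP = "239.255.0.1"  # an administratively scoped IPv4 multicast group
PORT = 8420


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def open_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket whose multicast packets survive `ttl` router hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    return sock


def send_log_line(sock: socket.socket, line: str,
                  group: str = GROUP, port: int = PORT) -> None:
    """Send one log line to the group; every subscribed receiver gets a copy."""
    sock.sendto(line.encode("utf-8"), (group, port))


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Create a UDP socket that joins the multicast group on the given port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Allow several receiver processes on one host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

With unicast, the sender would call sendto() with the address of a single log host, and that host alone would have to process everything. In this sketch each processing machine would call open_receiver() and read with recvfrom(), so a new metric could be computed on an additional machine without touching the sender.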

OWA — Installation and customization of an OpenWebAnalytics platform to process and analyze analytics data that can help the Wikimedia movement understand how Wikimedia sites are used.
 * Status: We deployed OWA in beta form in our infrastructure at the end of the month. We haven't yet had a chance to put it through its paces or pull a lot of interesting information out, but we plan to spend some time this month working out requirements for non-fundraising uses of this tool. For more information about OWA, see our recent blog post on the OWA release candidate.
 * Program managers: Rob Lanphier & Tomasz Finc

Test framework deployment — Creation of an automated test environment for MediaWiki using CruiseControl, Selenium, and PHPUnit.
 * Status: The bulk of the work in December has been fleshing out Selenium tests. Calcey Solutions has been working on installer testing. There is renewed interest in PHPUnit tests, with many commits from both WMF staff (Mark H.) and volunteers (soxred93(?)).
 * Program manager: Rob Lanphier

Code review — Improvement of the way collaboratively-developed MediaWiki code is peer-reviewed.

MediaWiki 1.17 — The upcoming MediaWiki release.
 * Status: Our team of code reviewers continues to make headway on the backlog of outstanding checkins in "new" status. We peaked at 1400 unreviewed checkins back in September; last month we were at 800, and we're now under 300. We hope this means we can push a 1.17 version of MediaWiki to production sometime this month, with a tarball available some time after that. See the statistics for 1.17. We're now also at the point where we need to either address the issues marked "FIXME" or revert the commits that have that marking.
 * Program manager: Rob Lanphier

Technical Documentation – Improvement of our technical documentation by making small, incremental improvements to the docs and docs process.
 * Status: Zak Greant is currently pairing up with many of the core committers in documentation sessions.
 * Program Managers: Rob Lanphier / Zak Greant

wmsync – Replacement of our current deployment tools (e.g. "scap") with more robust software. Tim Starling and Mark Bergsma are in the very early stages of collaborating on the design of this, with some prototype software written in Python.
 * Status: This project is now on hold.
 * Program Manager: Rob Lanphier

Fundraising
2010 Fundraiser - The engineering tasks necessary to run a successful fundraiser, with sub-projects involving fraud prevention, CentralNotice, and the analytics upgrade.
 * Status: We're done! There's not much to say here that hasn't already been said about the fundraiser itself. On the tech side, we did a number of things behind the scenes that drastically cut down on the level of fraud and bad credit card transactions. We handled 500,000 transactions over the course of the fundraiser, and tested XXX-fixme banners.
 * Program manager: Tomasz Finc

Mobile
Mobile site rewrite - Porting our existing gateway for easier support, development and participation
 * Status: We're still taking applications for the Software Engineer (Mobile) position here at the Wikimedia Foundation. Once we have the staff in place, we'll take the lessons learned from our current Ruby-based implementation to create something that integrates more completely into our system architecture.
 * Program manager: Tomasz Finc

Offline
Offline - better support for offline reading of Wikimedia projects
 * Status: We signed a contract with PediaPress to add openZim export functionality to the collections extension. We're beginning a usability evaluation of the Kiwix reader this month.
 * Program manager: Tomasz Finc