Wikimedia Engineering/Report/2010/December

(to be posted on techblog.wikimedia.org by RobLa-WMF)

Welcome to the January monthly report from WMF Engineering! As always, we're reporting on what we've been working on and what's coming up. In December, ...

(insert the article break here)

Engineering Organization
Engineering Organization - Events and changes to overall WMF Engineering

India Tech Trip - One of the primary purposes of this trip was to assess the technical gaps standing in the way of success in India. We identified some areas of localization to standardize across Indic-language wikis, such as editing tools and font rendering. We think that work will be useful for non-Indic languages as well. We spoke a lot about offline reading, including taking the offline version of the Malayalam Wikipedia around to various government agencies and technology distributors, and got a lot of interest in that work as well as in the Kiwix work (see below) that will make it possible to produce offline versions of Wikipedia.

Data Summit - Invitations have been sent for the February 4th event, an invite-only working session in California. We are still accepting requests for invitations up until January 17. Topics discussed at the summit will include semantic data, analytics, and research into data dumps.

Hack-a-Ton Amsterdam - The Dutch Wikimedia chapter is organizing the Amsterdam Hack-a-Ton around the 10th Anniversary party there. Those interested in attending should xxxx.

Planning for Hacking Days in Berlin - The Wikimedia Foundation (WMF) is working with Wikimedia Deutschland (WMDE) on the format of the Berlin Developers meeting and Hacking Days (coinciding with the annual Chapters Meeting).

Other events

 * Strata - Many Wikimedians will be attending, and there will be a birds of a feather session with WMF's Erik Zachte on Wikimedia Data
 * FOSDEM - pending
 * GNUnify Wikimedia Track - this year's GNUnify in Pune, India will have a special focus on Wikimedia Engineering. WMF will be sending engineers to speak at this event.

Operations
Virginia Data Center - Setting up a world-class primary data center for Wikimedia Foundation websites
 * Status: We signed a lease with a co-location facility in Ashburn, VA, and our data center cage is being built over January. At the same time, we are procuring all of the necessary equipment so that we can start installing it and building our infrastructure there in February.
 * Program manager: Mark Bergsma

Media Storage - Re-working our media storage architecture to accommodate an expected increase in media uploads.
 * Status: WMF Contractor Russ Nelson is evaluating and actively testing two existing distributed file systems / media storage systems on test hardware. This will allow us to make an informed decision on the development of our new Media Storage Architecture, which is planned to use commodity hardware and to be more extensible than our existing architecture. This work is needed because of increased demand for and contribution of images and other media to Commons.
 * Program manager: Mark Bergsma

Monitoring - As part of our initiative to improve overall uptime, we are enhancing both Operations and Public Monitoring to a) notice potential outages sooner, b) increase transparency to the community, c) support progress tracking required in the 5-year plan.
 * Status: We now have text message notification of downtime on critical services in both Amsterdam and Tampa, and we are tuning it. We improved monitoring of our database servers and other critical services. We're turning our attention to deepening our use of Watchmouse, which will help us spot trends and provide a public status page. We plan to report separately on our site reliability on a periodic basis.
 * Program manager: Mark Bergsma

Virtualization cluster - more easily deploy temporary machines for testing and experimentation. This cluster is intended for use not just by WMF staff, but also by volunteers working on important projects, as capacity allows.
 * Status: New production hardware has arrived for implementation of this virtualization cluster. On the software and development side, Ryan Lane (and volunteers) have built virtual machine images for Nova testing, completed about 70% of the OpenStackManager MediaWiki extension, and integrated LDAP support into OpenStack Nova. In the office, Ryan also implemented a virtualized test server to reduce the need for copies of separate end-user operating environments (such as various configurations of Windows, IE, MacOS, etc.) for Features testing.
 * Program manager: Mark Bergsma

Backups - Improve backup coverage of Wikimedia-hosted data
 * Status: We're turning our attention to offsite backups in Amsterdam of all critical data in Tampa. The concurrent buildout of the new primary data center will provide additional backup. We are looking at more reliable general-purpose storage.
 * Program manager: Mark Bergsma

Data Dumps - Providing public dumps of public Wikimedia data
 * Status: After a serious hardware failure that halted our usual Data Dumps, we've bought a new server which should be more reliable and give us some more redundancy. Additionally, we are seeking additional public mirrors of our Data Dumps, which are very popular with researchers. By next week we expect to have restored full generation of dumps of all wikis.
 * Program manager: Mark Bergsma

Content Quality Tools
Article Feedback - Feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: In development. The initial version (phase 1) has been in production since the end of September on English Wikipedia. Phase 2 of the tool, slated for release in January 2011, has been under development.  Planning for phase 3 will begin soon.
 * Program manager: Alolita Sharma

Pending Changes/FlaggedRevs - Pending Changes is a new review feature deployed to English Wikipedia, which allows changes made by anonymous and new users to be reviewed before they appear as the primary version of an article. The code is shared with the traditional FlaggedRevs implementation found on many other Wikipedia projects.
 * Status: We deployed another minor update for Pending Changes on December 22. Our community team plans to work with the community to come up with a long-term plan for the feature on English Wikipedia.
 * Program manager: Rob Lanphier

Threaded Discussions
LiquidThreads - LiquidThreads is an extension that brings threaded discussion capabilities to Wikimedia projects and MediaWiki.
 * Status: Brandon spent the latter part of December focused on design work for the next iteration of this feature, with development slated to begin this month.
 * Program manager: Alolita Sharma

Multimedia Tools
Upload wizard - The upload wizard is an extension for MediaWiki that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: Following the deployment of this feature, we've spent the past month on bug fixing and polish.
 * Program manager: Alolita Sharma

Media Projects - improved media handling and key infrastructure support tools. Many media components are developed in an open source development partnership with Kaltura, including Metavid, MwEmbed, and the Video Editor.
 * Status: XXX-find status from Michael Dale (email on 2011-01-04?)
 * Program manager: Alolita Sharma

MediaWiki Infrastructure
Resource loader - The resource loader aims to improve the load times for JavaScript and CSS in MediaWiki. This will enable faster loading of the Vector skin, media extensions, and anything else that makes extensive use of JavaScript and CSS.
 * Status: Integration and testing in progress. We hope to have this ready for deployment in January, so that it can be part of a MediaWiki release sometime after that.
 * Program manager: Alolita Sharma

General Engineering
XXX-delete - replaced by OWA and udp2log sections Analytics Revamp - Incorporate a web analytics solution that can help the Wikimedia movement understand how Wikimedia sites are used. This project includes reworking our primary system (udp2log) as well as augmenting it with other systems (like Open Web Analytics) ^^^ XXX-delete - replaced by OWA and udp2log sections
 * Status:
 * Program managers: Rob Lanphier & Tomasz Finc

OWA -
 * Status: We deployed OWA in beta form in our infrastructure at the end of the month. We haven't yet had a chance to fully put it through its paces or pull a lot of interesting information out of it, but we plan to spend some time this month working out requirements for non-fundraising uses of this tool. For more information about OWA, see our recent blog post on the OWA release candidate.
 * Program managers: Rob Lanphier & Tomasz Finc

udp2log:
 * Status: Now that the fundraiser is over, our next step is to deploy some changes to this critical piece of infrastructure. We've reached the limits of what our current architecture can do; each time we want to track a new metric, we have to kill another metric we're tracking. Our current version of the software uses unicast UDP to a single machine that has to handle all of the log traffic. Our new version of the software multicasts the log messages, so that we can process the packets on multiple machines. We're targeting a Q1 deployment for this version.
 * Program manager: Rob Lanphier
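
To illustrate the difference between the unicast and multicast designs, here is a minimal Python sketch of the general technique. It is not the udp2log code: the group address, port, and all function names are hypothetical, chosen only for illustration. The idea is that the sender addresses a multicast group rather than one host, and each processing machine joins the group and keeps only the log lines it cares about.

```python
import socket
import struct

# Hypothetical values for illustration only; the report does not state
# the real group address or port.
MCAST_GROUP = "239.1.2.3"
MCAST_PORT = 8420


def membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_sender(ttl=1):
    """Create a UDP socket suitable for sending to a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def open_receiver(group=MCAST_GROUP, port=MCAST_PORT):
    """Create a UDP socket that has joined the multicast group.

    Any number of machines can do this, and each receives its own copy
    of every log packet.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def select_lines(packet, needle):
    """Return the log lines in one packet that contain the given bytes."""
    return [line for line in packet.splitlines() if needle in line]


def consume(sock, needle, handle, max_packets=None):
    """Read packets from the socket and pass matching lines to handle()."""
    seen = 0
    while max_packets is None or seen < max_packets:
        packet, _addr = sock.recvfrom(65535)
        seen += 1
        for line in select_lines(packet, needle):
            handle(line)
```

In this sketch a sender would call `open_sender().sendto(line, (MCAST_GROUP, MCAST_PORT))`, while each processing machine would call `consume(open_receiver(), needle, handle)` with its own filter. Adding a new metric then means adding a receiver, rather than adding load to a single machine.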

Test framework deployment - Build an automated test environment for MediaWiki using CruiseControl, Selenium, and PHPUnit.
 * Status: The bulk of the work in December has been fleshing out Selenium tests. Calcey Solutions has been working on installer testing. There is renewed interest in PHPUnit tests, with many commits from both WMF staff (Mark H.) and volunteers (soxred93(?)).
 * Program manager: Rob Lanphier

Code review - improving the way we provide code reviews for MediaWiki

1.17
 * Status: Our team of code reviewers continues to make headway on the backlog of outstanding checkins in "new" status. We peaked at 1400 unreviewed checkins back in September; last month we were at 800, and we're now under 300. We hope this means we can push a 1.17 version of MediaWiki to production sometime this month, with a tarball available some time after that. See the statistics for 1.17. We're now also at the point where we need to either address the issues marked "FIXME" or revert the commits that have that marking.
 * Program manager: Rob Lanphier

Technical Documentation – Improve our technical documentation by making small, incremental improvements to the docs and docs process.
 * Status: Zak Greant is currently pairing up with many of the core committers in documentation sessions.
 * Program Managers: Rob Lanphier / Zak Greant

wmsync – Replace our current deployment tools (e.g. "scap") with more robust software. Tim Starling and Mark Bergsma are in the very early stages of collaborating on the design of this, with some prototype software written in Python.
 * Status: This project is now on hold.
 * Program Manager: Rob Lanphier

Fundraising
2010 Fundraiser - The engineering tasks necessary to run a successful fundraiser, with sub-projects involving fraud prevention, CentralNotice, and the analytics upgrade.
 * Status:  We're done!  There's not much to say here that hasn't already been said about the fundraiser itself.  On the tech side, we did a number of things behind the scenes that drastically cut down on the level of fraud and bad credit card transactions.  We handled 500,000 transactions over the course of the fundraiser, and tested XXX-fixme banners.
 * Program manager: Tomasz Finc

Mobile
Mobile site rewrite - Porting our existing gateway for easier support, development and participation
 * Status: We're still taking applications for the Software Engineer (Mobile) position here at Wikimedia Foundation. Once we have the staff in place, we will apply the lessons learned from our current Ruby-based implementation to create something that integrates more completely into our system architecture.
 * Program manager: Tomasz Finc

Offline
Offline - better support for offline reading of Wikimedia projects
 * Status: We signed a contract with PediaPress to add openZim export functionality to the collections extension. We're beginning a usability evaluation on the Kiwix reader this month.
 * Program manager: Tomasz Finc

Misc
Data summit -