Wikimedia Engineering/Report/2011/May

Major news this month includes:
 * the Berlin Hackathon, where about 70 developers and engineers met to improve our technical infrastructure;
 * the deployment of the Upload Wizard as default uploader on Wikimedia Commons;
 * the continued development, deployment and roll-out of the Article feedback tool on the English Wikipedia;
 * major progress in reducing our code review backlog.

Recent events

 * Berlin Hackathon 2011 (May 13-15, Berlin) — About 70 MediaWiki developers and engineers participated in this event, organized by Wikimedia Deutschland. A lot of coding, bug squashing and discussion happened over these three days, including on the new parser, performance improvement and infrastructure; see the dedicated blog posts for more information (Friday, Saturday, Sunday, Monday). A special effort was made on documentation and remote attendance: all the talks were recorded, photos were taken and notes were taken in real time on all three days (notes for Friday, Saturday and Sunday).


 * CiviCRM code sprint (May 24-25, San Francisco) — The Wikimedia Foundation office in San Francisco hosted a coding sprint for about 8 CiviCRM developers in May. Participants squashed many bugs, and also improved contact/contribution search performance by 15-25x. This is particularly useful for major users of CiviCRM with large databases, like the Wikimedia Foundation and its donor database. The Wikimedia Foundation, which endeavors to contribute to the free software ecosystem, had already hosted meet-ups for the CiviCRM community in the past.

Upcoming events

 * Wikimania (August 2-7, Haifa, Israel) —


 * Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

The following positions have opened this month:
 * Operations Engineer — Special projects
 * Software Developer Front-end — General
 * Software Developer Back-end — General

The following positions are still open:
 * Engineering Program Manager — Data Analytics
 * Software Developer — Features
 * Systems Engineer — Data Analytics (previously Data Analytics Engineer)
 * Operations Engineer
 * Senior QA Engineer
 * Networking Contractor — Amsterdam
 * Software Developer, Rich Text Editing — Features
 * Product Manager — Features

In addition, we hope to post the following positions over the next few months:
 * Release Engineer
 * Technical Writer

Short news

 * Visitors —
 * Hires and changes
 * Asher Feldman was hired as full-time Performance Engineer.
 * Andrew Shields is a new contractor working as a Technician in our Tampa data center.
 * Nimish Gautam started to transition from the Technology department to the Global Development department of the Wikimedia Foundation.
 * Software Engineer — Community R&D

Operations

 * Program manager: Mark Bergsma

Site operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: Unfortunately, delivery of our connectivity has been delayed further, which has prevented us from bringing services live in the new data center. The latest estimate for delivery is June 10th, after which we should be able to deploy some services running actively from the new location.

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
 * Status: The Swift cluster on the test servers was upgraded from version 1.1 to 1.3, to fix some problems we were observing, and to test with the latest released code as well. Russ Nelson also continued to develop a MediaWiki extension to integrate MediaWiki with Swift.

Testing environment
Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status:

Backups and data archives
Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: We incurred delays in moving to the new server, but started to puppetize the rollout of new servers. This will simplify not only the setup of new hosts, but also the maintenance of current servers. We are still trying to identify the cause of an issue with compression of files sometimes quitting partway through. We missed our target of producing new English Wikipedia dumps once a month (holding out for the new server), but data will be available in early June.

Other activities

 * Backups — This project was on hold in May, as we were still waiting for connectivity between our two data centers to be installed. We expect to have live replication or daily backups of all important data by the end of June.
 * Router upgrade — Due to multiple ongoing issues on several network devices, we decided to schedule an hour of maintenance on Tuesday May 24th, to upgrade software on nearly all devices, fixing multiple bugs that we were experiencing. This appears to have resolved the issues that we were seeing. A summary of the event is posted here.
 * HTTPS & IPv6 — Ryan Lane, Peter Youngmeister and Mark Bergsma have started working on HTTPS and IPv6 support for the wiki platforms. A new test cluster is being set up to serve these protocols, and limited testing on a subset of wikis will commence soon.
 * Tampa Data Center contractor - Rich Cole is no longer with us; Andrew Shields has taken his place.
 * Additional servers for m.wikimedia.org - To cater for growth in mobile traffic, two new servers have been racked and are ready for application installation.

Features Engineering

 * Program manager: Alolita Sharma

Editing tools
Visual editor 0.1 — Exploratory work to identify & prototype initial ideas for a visual editor for MediaWiki.
 * Status: Trevor Parscal and Neil Kandalgaonkar have done exploratory work on the visual editor project. Neil worked with developers of HackPad (a custom version of real-time collaborative editing software Etherpad) on a proof of concept of integration between Etherpad and MediaWiki (read more). They're now working on turning it into a MediaWiki extension. Work on the visual editor is also intersecting with the groundwork done on the new parser.

Content Quality and Editorial Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: Version 3 was deployed to the English Wikipedia on May 9 with new features, like the article feedback dashboard, a summary page showing general rating trends. The experiment was expanded to 100,000 articles and may be expanded further after analysis of the results. The article feedback feature is now in maintenance and bugfixing mode.

Discussions and Interactions
Liquid Threads — A feature that brings threaded discussions capabilities to Wikimedia projects and MediaWiki.
 * Status: Lead developer Andrew Garrett continued to work on his new object model for the back-end. Timo Tijhof or Jan Paul Posma will be working with Andrew on the new front-end.

WikiLove 1.0 — An extension to encourage praise and virtual gifts between users.
 * Status: Ryan Kaldari and Jan Paul Posma completed the first version of the WikiLove extension, including documentation for the API. They also changed the code so that new gifts can be added through configuration parameters. The code is now pending review.

MoodBar 0.1 — A feature to encourage new users to provide feedback.
 * Status: Brandon Harris published initial designs for this feature allowing new users to quickly provide feedback to the community. Erik Möller reviewed the designs with the team, and prototyping began.

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: The Upload wizard was enabled as the default upload system on Wikimedia Commons on May 9. It was disabled shortly after that because of issues believed to come from a bug in ResourceLoader. It was re-enabled on May 17 after further investigation. This project is now in maintenance and bugfixing mode.

Other projects

 * ResourceLoader 2.0 — ResourceLoader 1.0 is now in maintenance mode. Roan Kattouw and Timo Tijhof discussed requirements during the Berlin Hackathon, but there are currently no engineering resources available to work on ResourceLoader 2.0.
 * Non-Roman character set localization — Alolita Sharma and Erik Möller are currently gathering requirements on this project with the help of possible customers, including the language committee.
 * Community feature prototyping —
 * German Wikipedia editor survey support — Wikimedia Deutschland will be running its own editor survey in mid-June, to assess community health. The survey will run on the English and German Wikipedias. Wikimedia Deutschland will handle the development work needed to integrate CentralNotice with the user profile information. The Features team of the Wikimedia tech department is helping with code review and deployment.
 * Mobile survey support — The Global Development department of the Wikimedia Foundation is planning to run a survey about mobile usage on the English Wikipedia in early June. Ryan Kaldari and Arthur Richards provided engineering support for the survey; Nimish Gautam took over the project as he transitioned to the Global Development department.

Wikimedia Labs
Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status:


 * Program manager: Alolita Sharma

Special projects

 * Program manager: Tomasz Finc

Mobile
Mobile projects — All things Mobile and Wikimedia.
 * Status:

Mobile Research — A research project to help determine our Mobile strategy.
 * Status: Our India fieldwork in Bangalore and Delhi continued in May, consisting of about 30 interviews. A follow-up workshop with interview participants, as well as community members, took place in Bangalore on May 15. We continued to recruit and prepare for the parallel study in Brazil, consisting of about 20 interviews, that will be conducted in June. We also sent out an RfP for the third mobile research study in the United States. The mobile survey launch was delayed due to reallocation of resources, but is planned to go live in mid-July.

Mobile site rewrite — Port of our Ruby-based mobile gateway to PHP.
 * Status: We demoed the mobile extension at the Berlin hackathon and answered concerns about implementing it as a skin vs. an extension (see the follow-up discussion on wikitech-l). We started to prepare an upcoming QA cycle for which volunteer help will be very much welcome.

Fundraising support
2011 Fundraiser — Support and development for the annual fundraiser of the Foundation.
 * Status:

Offline
Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.
 * Status: GSoC student Yuvi Panda began to port the WP 1.0 Bot to a MediaWiki extension. He drafted a project plan and started to develop a way to parse and track assessment data found in articles.

OpenZim for Collections — Integration of openZim into the Collections extension.
 * Status: All the existing critical bugs were fixed. We are now talking with PediaPress and numerous others about where to take the projects next.

Kiwix — Improvement of the user experience of the Kiwix app to access offline Wikimedia content.
 * Status:
 * Kiwix usability study in Berlin: results?

General Engineering

 * Program manager: Rob Lanphier

MediaWiki development and tools
MediaWiki 1.17 release — The upcoming MediaWiki release.
 * Status: Almost all the blockers have been dealt with.



Code review management — Review of changes made to the MediaWiki code.
 * Status: Until a few weeks ago, the number of unreviewed commits was increasing at the same rate as before the 1.17 code review sprint. The group of reviewers (Brion Vibber, Tim Starling, Chad Horohoe, Trevor Parscal, Roan Kattouw and Sam Reed) was expanded to include Timo Tijhof, Bryan Tong Minh, Alexandre Emsenhuber, Aryeh Gregor, Neil Kandalgaonkar and Andrew Garrett. Thanks to a conscious effort made by code reviewers, the trend was reversed, and the backlog of unreviewed commits slated for inclusion in the next 1.18 release is now decreasing.

Bugmeistering — Management of our bug tracker.
 * Status: Mark Hershberger continued his efforts to watch, assign and resolve bugs, notably by leading the bug squashing sessions at the Berlin Hackathon. He also worked with Priyanka Dhanda to get meaningful reports and metrics out of Bugzilla.

Summer of Code 2011 — A sponsored community program allowing students to join the community as developers.
 * Status: In May, our eight Summer of Code students learned how MediaWiki works, as software and as a community. They checked out IRC and the mailing lists, talked with their mentors, asked for commit access, and started investigating the components that would include their project areas. On May 23rd, they started working on their projects full-time. WMF staffers and community members are mentoring the students, assisted by Sumana Harihareswara.

Parser & gadgets — Groundwork for the next generation visual editor of MediaWiki.
 * Status: There was activity on this front at the Berlin Hackathon, and the wikitext-l mailing list was revived.
 * Parser plan, Abstract syntax tree, Parser test cases, Gadget Studio

App-level monitoring — Implementation of application-level parameter monitors using the API.
 * Status: Sam Reed started to implement a job queue monitor, as the first of a set of application-level parameter monitoring tools using the API.
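
As a rough illustration of what a job queue monitor built on the API could look like, the sketch below reads the job queue length from a MediaWiki API `siteinfo` statistics response and classifies it against thresholds. This is not the implementation mentioned above: the function names, the threshold values and the sample payload are all assumptions made for illustration, and the example is written in Python rather than in MediaWiki's own PHP.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_stats_url(api_url):
    """Build a siteinfo statistics query URL for a MediaWiki api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def job_queue_length(response):
    """Extract the job queue length from a parsed statistics response."""
    return int(response["query"]["statistics"]["jobs"])


def classify(jobs, warn=10000, crit=100000):
    """Map a queue length to a status; the thresholds are hypothetical."""
    if jobs >= crit:
        return "CRITICAL"
    if jobs >= warn:
        return "WARNING"
    return "OK"


def check(api_url):
    """Fetch live statistics and classify them (requires network access)."""
    with urlopen(build_stats_url(api_url)) as reply:
        return classify(job_queue_length(json.load(reply)))


# Offline example with a made-up payload shaped like the API's JSON output.
sample = {"query": {"statistics": {"articles": 1, "jobs": 25000}}}
status = classify(job_queue_length(sample))
```

Keeping the parsing and classification separate from the network call means the thresholds can be exercised without touching a live site; `check()` is only a thin wrapper around them.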

Wikimedia analytics
udp2log — A custom data analytics logging system.
 * Status: Our recent router upgrade made it possible to enable multicast logging.

A/B testing — A set of tools to perform A/B testing on Wikimedia sites.
 * Status:

Wikimedia Report Card 2.0 — Usability improvements and streamlining of the creation of this monthly report of key metrics measuring community health.
 * Status: Erik Zachte, Nimish Gautam, Erik Möller, Mani Pande and Asher Feldman laid down the requirements and groundwork of the next version of the Report Card. Erik Zachte's scripts will be modified to enter the data into a database that can then be accessed with a dedicated API to automatically generate the report card and other charts using a visualization framework. The API will also be publicly available for third parties to access the data.
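
The planned flow (scripts write metrics into a database, and an API reads them back for charting) can be sketched as follows. This is only an illustration under stated assumptions: the table layout, the metric name and the numbers are hypothetical placeholders, and an in-memory SQLite database stands in for whatever storage the project ends up using.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical metrics table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE metrics ("
        " month TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value REAL NOT NULL,"
        " PRIMARY KEY (month, name))"
    )
    return db


def record(db, month, name, value):
    """Insert or overwrite one monthly data point, as a stats script might."""
    db.execute(
        "INSERT OR REPLACE INTO metrics (month, name, value) VALUES (?, ?, ?)",
        (month, name, value),
    )


def series(db, name):
    """Return (month, value) pairs for one metric, ordered for charting."""
    rows = db.execute(
        "SELECT month, value FROM metrics WHERE name = ? ORDER BY month",
        (name,),
    )
    return rows.fetchall()


db = open_store()
# Placeholder numbers, not real Wikimedia statistics.
record(db, "2011-04", "active_editors", 100.0)
record(db, "2011-05", "active_editors", 110.0)
record(db, "2011-05", "active_editors", 120.0)  # a re-run replaces the old value
points = series(db, "active_editors")
```

A function like `series()` is the kind of query a public API endpoint could expose, so that the report card and third-party charts read from the same source.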

Technical communications
Engineering project documentation — An activity to ensure that project documentation of Wikimedia engineering projects is complete and up-to-date.
 * Status: The Berlin Hackathon and Wikimedia tech days were an opportunity to start catching up on missing project pages. Stubs were created using the new, lighter format, and some existing pages were transitioned to the new format.


 * Comms support

Other activities

 * Disk-backed object cache (DBOC) — The deployment of a disk-backed object cache to increase the parser cache hit ratio was on hold in May, in favor of the 1.17 release work.
 * API maintenance — Sam Reed continued to fix bugs and to do general maintenance on the MediaWiki API.
 * Shell bugs — The backlog of shell bugs was on the agenda for the Berlin hackathon, and the team continued to hold dedicated triage sessions. Priyanka Dhanda also started to work on a process to streamline configuration changes, especially for popular requests.
 * Access to Subversion — About 13 new developers were granted commit access in May, including 6 Summer of Code 2011 students and 2 Wikimedia Foundation employees. Volunteer development coordinator Sumana Harihareswara joined the review team, and will become the primary point of contact for commit access requests.
 * Migration to Git —
 * Heterogeneous deployment — Priyanka Dhanda prepared a project plan and did the initial preparatory work, including setting up a prototype.
 * HipHop support — HipHop was discussed during the Berlin hackathon, and it was agreed that HipHop support would be part of the MediaWiki 1.20 release.
 * Configuration management — Chad Horohoe committed to svn his initial work on a configuration management system. The goal is to move from the current system (where configuration mostly happens through globals in LocalSettings.php) to one where all the configuration parameters are contained in a configuration object.
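
To illustrate the idea behind the configuration management item above (settings held in a configuration object instead of in globals), here is a minimal sketch. It does not reflect the code committed to svn: the `Config` class, its methods and the setting names are hypothetical, and Python is used for illustration only, whereas MediaWiki itself is written in PHP.

```python
class Config:
    """Hypothetical configuration object holding named settings."""

    def __init__(self, defaults):
        # Copy the defaults so later overrides do not mutate the caller's dict.
        self._values = dict(defaults)

    def get(self, name):
        """Return a setting, failing loudly on unknown names."""
        if name not in self._values:
            raise KeyError("unknown setting: " + name)
        return self._values[name]

    def set(self, name, value):
        """Override a known setting; typos are rejected instead of ignored."""
        if name not in self._values:
            raise KeyError("unknown setting: " + name)
        self._values[name] = value


def page_title(config, page):
    """Code receives the config object rather than reading globals."""
    return page + " - " + config.get("sitename")


defaults = {"sitename": "MediaWiki", "enable_uploads": False}
config = Config(defaults)
config.set("sitename", "Example Wiki")
title = page_title(config, "Main Page")
```

One benefit of this style is that a misspelled setting name raises an error immediately, whereas a misspelled global variable would silently create a new, unused variable.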