Wikimedia Engineering/Report/2011/May

Major news this month includes:
 * the Berlin Hackathon, where about 70 developers and engineers met to improve our technical infrastructure;
 * the deployment of the Upload Wizard as default uploader on Wikimedia Commons;
 * the continued development, deployment and roll-out of the Article Feedback tool on the English Wikipedia;
 * major progress in reducing our code review backlog.

Recent events



 * Berlin Hackathon 2011 (May 13-15, Berlin) — About 70 MediaWiki developers and engineers participated in this event, organized by Wikimedia Deutschland. Over these three days, participants spent much of their time coding, squashing bugs and discussing topics such as the new parser, performance improvements and infrastructure; see the dedicated blog posts for more information (Friday, Saturday, Sunday, Monday). A special effort was made on documentation and remote attendance: all the talks were recorded, photos were taken, and notes were written in real time on all three days (notes for Friday, Saturday and Sunday).


 * CiviCRM code sprint (May 24-25, San Francisco) — The Wikimedia Foundation office in San Francisco hosted a coding sprint for about eight CiviCRM developers in May. Participants squashed many bugs and made contact and contribution searches 15 to 25 times faster. This is particularly useful for major CiviCRM users with large databases, such as the Wikimedia Foundation with its donor database. The Wikimedia Foundation, which endeavors to contribute to the free software ecosystem, had already hosted meet-ups for the CiviCRM community in the past (read more about the event).

Upcoming events

 * Wikimania (August 2-7, Haifa, Israel) — A small delegation of the engineering staff will attend Wikimania to report on their work and to continue discussions with the rest of the community. We'll provide additional information once the list of attendees and talks is finalized.
 * Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

The following positions have opened this month:
 * Operations Engineer — Special projects (now closed)
 * Software Developer Front-end — General
 * Software Developer Back-end — General

The following positions are still open:
 * Engineering Program Manager — Data Analytics
 * Software Developer — Features
 * Systems Engineer — Data Analytics (previously Data Analytics Engineer)
 * Operations Engineer
 * Senior QA Engineer
 * Networking Contractor — Amsterdam
 * Software Developer, Rich Text Editing — Features
 * Product Manager — Features

In addition, we hope to post the following positions over the next few months:
 * Release Engineer
 * Technical Writer

Short news

 * Hires and changes
 * Asher Feldman has joined the TechOps team as Performance Engineer.
 * Andrew Shields is a new contractor working as a Technician in our Tampa data center. He is replacing Rich Cole.
 * Nimish Gautam started to transition from the Technology department to the Global Development department of the Wikimedia Foundation.
 * Katie Horn joined on May 31 as Software Engineer for Community R&D, to support prototyping efforts.

Operations

 * Program manager: Mark Bergsma

Site operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: Unfortunately, further delays in the delivery of our network connectivity have prevented us from bringing services live in the new data center. The latest estimate for delivery is June 10, after which we should be able to run some services actively from the new location.

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
 * Status: The Swift cluster on the test servers was upgraded from version 1.1 to 1.3, both to fix some problems we were observing and to test the latest released code. Russ Nelson also continued to develop a MediaWiki extension to integrate MediaWiki with Swift.

Testing environment
Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status: MediaWiki is now installed, and most of the networking is configured. We are testing a couple of new network configurations to avoid single points of failure in OpenStack Nova's current design. We will begin installing instances in early June for basic infrastructure in the test/dev environment.

Backups and data archives
Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: Moving to the new server was delayed, but we started to puppetize the rollout of new servers (that is, to manage it with Puppet). This will simplify not only the setup of new hosts but also the maintenance of current servers. We are still trying to identify why file compression sometimes quits partway through. We missed our target of producing new English Wikipedia dumps once a month (we were holding out for the new server), but new data will be available in early June.

Other activities

 * Backups — This project was on hold in May, as we were still waiting for connectivity between our two data centers to be installed. We expect to have live replication or daily backups of all important data by the end of June.
 * Router upgrade — Because of multiple ongoing issues on several network devices, we scheduled one hour of maintenance on Tuesday, May 24 to upgrade the software on nearly all devices, fixing multiple bugs we were experiencing. This appears to have resolved the issues we were seeing.
 * HTTPS & IPv6 — Ryan Lane, Peter Youngmeister and Mark Bergsma have started working on HTTPS and IPv6 support for the wiki platforms. A new test cluster has been set up to serve these protocols, and limited testing on a subset of wikis has commenced.
 * m.wikimedia.org — To cater for the growth of our mobile portal, two new servers were added and are now ready for application installation.

Features Engineering

 * Program manager: Alolita Sharma

Editing tools
Visual editor 0.1 — Exploratory work to identify & prototype initial ideas for a visual editor for MediaWiki.
 * Status: Trevor Parscal and Neil Kandalgaonkar have done exploratory work on the visual editor project. Neil worked with the developers of HackPad (a customized version of the real-time collaborative editor Etherpad) on a proof of concept integrating Etherpad with MediaWiki (read more). They are now turning it into a MediaWiki extension. Trevor continues to work on WikiDom, a storage structure and functionality that acts as an intermediate layer between the parser and a visual editor. This work also intersects with the groundwork done on the new parser.

Content Quality and Editorial Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: Version 3 was deployed to the English Wikipedia on May 9 with new features, such as the Article Feedback dashboard, a summary page showing overall rating trends. The experiment was expanded to 100,000 articles and may be expanded further once the results have been analyzed. The next version of the Article Feedback feature is currently in development.

Discussions and Interactions
Liquid Threads — A feature that brings threaded discussion capabilities to Wikimedia projects and MediaWiki.
 * Status: Lead developer Andrew Garrett continued to work on his new object model for the back-end. Timo Tijhof will be working with Andrew on the new front-end.

WikiLove 1.0 — An extension to encourage praise and virtual gifts between users.
 * Status: Ryan Kaldari and Jan Paul Posma completed the first version of the WikiLove extension, including API documentation. They also changed the code so that new gifts can be added through configuration parameters. The code is now pending review and should be deployed in June.

MoodBar 0.1 — A feature to encourage new users to provide feedback.
 * Status: Brandon Harris published initial designs for this feature allowing new users to quickly provide feedback to the community. Erik Möller reviewed the designs with the team, and Brandon is working on a second round of designs. Andrew Garrett will be the lead developer on this project.

Multimedia Tools
Upload Wizard — A feature that provides an easier way to upload files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: The Upload Wizard was enabled as the default upload system on Wikimedia Commons on May 9. It was disabled shortly afterwards because of issues believed to be caused by a bug in ResourceLoader, and was re-enabled on May 17 after further investigation. The next phase, which includes changing how images are stored before the wizard is completed, is being planned.

Other projects

 * ResourceLoader 2.0 — ResourceLoader 1.0 is now in maintenance mode. Roan Kattouw and Timo Tijhof discussed requirements and design specifications during the Berlin Hackathon, but there are currently no engineering resources available to work on ResourceLoader 2.0. A development sprint is planned for July.
 * Non-Roman character set localization — Alolita Sharma and Erik Möller are currently gathering requirements for this project with the help of potential customers, including the language committee.
 * German Wikipedia editor survey support — Wikimedia Deutschland will run its own editor survey in mid-June to assess community health. The survey will run on the English and German Wikipedias. Wikimedia Deutschland completed the development work needed to integrate CentralNotice with user profile information. The Features team of the Wikimedia tech department is now helping with code review and deployment.
 * Mobile survey support — The Global Development department of the Wikimedia Foundation is planning to run a survey about mobile usage on the English Wikipedia in early June. Ryan Kaldari and Arthur Richards provided engineering support for the survey; Nimish Gautam took over the project as he transitioned to the Global Development department.

Wikimedia Labs
Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status: Michael Dale's TimedMediaHandler extension was reviewed by Brion Vibber; Michael is now addressing the comments to make the extension ready for testing and deployment.
 * Program manager: Alolita Sharma

Special projects

 * Program manager: Tomasz Finc

Mobile projects
Mobile Research — A research project to help determine our Mobile strategy.
 * Status: Our fieldwork in Bangalore and Delhi, India, led by Parul Vora and Mani Pande, continued in May and comprised about 30 interviews. A follow-up workshop with interview participants and community members took place in Bangalore on May 15. We continued recruiting for and preparing the parallel study in Brazil, which will consist of about 20 interviews and will be conducted in June. We also sent out a request for proposals (RfP) for the third mobile research study, in the United States. The launch of the mobile survey was delayed because resources were reallocated, but it is planned to go live in mid-July.

Mobile site rewrite — Port of our Ruby-based mobile gateway to PHP.
 * Status: Patrick Reilly demoed the mobile extension at the Berlin Hackathon and addressed implementation concerns about using a skin versus an extension (see the follow-up discussion on wikitech-l). We also continued to develop the extension, notably by integrating functionality from the WAP platform and by expanding our device detection list. A prototype will soon be set up, and we will need volunteers to help us test the new portal on their mobile devices.

Fundraising support
2011 Fundraiser — Support and development for the annual fundraiser of the Foundation.
 * Status: Arthur Richards continued work on streamlining our audit framework to surface missing donation transactions. This has helped us find actual transactions that were missing from our fundraising database. Arthur also helped run a code sprint that significantly improved the performance of the WMF instance of CiviCRM. At the same time, the fundraising team started writing the next set of user stories to guide future development.

Offline
Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.
 * Status: GSoC student Yuvi Panda began to port the WP 1.0 Bot to a MediaWiki extension. He drafted a project plan and started to develop a way to parse and track assessment data found in articles.

OpenZim for Collections — Integration of openZim into the Collections extension.
 * Status: All the existing critical bugs were fixed. We are now talking with PediaPress and numerous others about where to take the project next.

Kiwix — Improvement of the user experience of the Kiwix app to access offline Wikimedia content.
 * Status: While in Berlin, Ryan Kaldari, Sumana Harihareswara and Emmanuel Engelhart conducted a simple usability study with seven volunteers; six of the sessions were recorded, and the videos will be posted to Commons. The initial findings were eye-opening and will be incorporated into the next development sprint. We are also wrapping up phase 2 of development, which will allow us to release the first version of the integrated downloader. Finally, we will conduct testing during the week of June 6; please add your name to the list of testers if you are available.

General Engineering

 * Program manager: Rob Lanphier

MediaWiki development and tools
MediaWiki 1.17 release — The upcoming MediaWiki release.
 * Status: Almost all the blockers have been dealt with. Only bug 28840 remains; finding the right fix for it has been tricky, but it should be resolved this week. MediaWiki 1.17 beta 2 is due June 3, with a final release of 1.17 slated for the week of June 6.



Code review management — Review of changes made to the MediaWiki code.
 * Status: Until a few weeks ago, the number of unreviewed commits was increasing at the same rate as before the 1.17 code review sprint. The group of reviewers (Brion Vibber, Tim Starling, Chad Horohoe, Trevor Parscal, Roan Kattouw and Sam Reed) was expanded to include Timo Tijhof, Bryan Tong Minh, Alexandre Emsenhuber, Aryeh Gregor, Neil Kandalgaonkar and Andrew Garrett. Thanks to a concerted effort by the code reviewers, the trend was reversed, and the backlog of unreviewed commits slated for the upcoming 1.18 release is now shrinking.

Bugmeistering — Management of our bug tracker.
 * Status: Mark Hershberger continued his efforts to watch, assign and resolve bugs, notably by leading the bug squashing sessions at the Berlin Hackathon, where more than 50 bugs were closed. He also worked with Priyanka Dhanda to get meaningful reports and metrics out of Bugzilla.

Summer of Code 2011 — A sponsored community program allowing students to join the community as developers.
 * Status: In May, our eight Summer of Code students learned how MediaWiki works, both as software and as a community. They explored IRC and the mailing lists, talked with their mentors, requested commit access, and started investigating the components their projects would involve. On May 23, they started working on their projects full-time. WMF staffers and community members are mentoring the students, assisted by Sumana Harihareswara.

Parser — Groundwork for the next generation visual editor of MediaWiki.
 * Status: There was a lot of activity and discussion in Berlin around the parser and the general direction we should follow. Brion Vibber consolidated the outcomes into write-ups on the parser plan, the abstract syntax tree and parser test cases. We generally agreed to keep the current syntax as intact as possible, to allow for an easy and clean migration. A syntax reform, though popular among alternative parser writers, is left for the future: conversion to a new, streamlined format will become easier once most contributors are used to visual editing. Trevor Parscal also started working on initial tests modeled on the general plan (formal parse, additional transforms and output conversion).

Wikimedia analytics
Wikimedia Report Card 2.0 — Usability improvements and streamlining of the creation of the monthly report card.
 * Status: Erik Zachte, Nimish Gautam, Erik Möller, Mani Pande and Asher Feldman laid down the requirements and groundwork for the next version of the Report Card. Erik Zachte's scripts will be modified to store the data in a database, which can then be accessed through a dedicated API to automatically generate the report card and other charts using a visualization framework. The API will also be publicly available so that third parties can access the data.

Technical communications
Engineering project documentation — An activity to ensure that project documentation of Wikimedia engineering activities is complete and up-to-date.
 * Status: The Berlin Hackathon and Wikimedia tech days were an opportunity to start catching up on missing project pages. Stubs were created using the new, lighter format, and some existing pages were converted to the new format. As new projects start, we will keep working to publish their documentation diligently and publicly.

Engineering communications support — Ongoing communications support for Wikimedia engineering projects and staff.
 * Status: In May, Guillaume Paumier provided communications help and advice for major deployments, including the deployment of Article Feedback 3.0 to the English Wikipedia and of the Upload Wizard as the default uploader on Commons, as well as for the planned downtime caused by the router upgrade.

Other activities

 * Disk-backed object cache (DBOC) — The deployment of a disk-backed object cache to increase the parser cache hit ratio was on hold in May, in favor of the 1.17 release work. It will be resumed in June.
 * API maintenance — Sam Reed continued to fix bugs and to do general maintenance on the MediaWiki API.
 * Shell bugs — The backlog of shell bugs was on the agenda for the Berlin hackathon, and the team continued to hold dedicated triage sessions. Priyanka Dhanda also started to work on a process to streamline configuration changes, especially for popular requests.
 * Access to Subversion — About 13 new developers were granted commit access in May, including six Summer of Code students and two Wikimedia Foundation employees. Volunteer development coordinator Sumana Harihareswara joined the review team and will become the primary point of contact for commit access requests.
 * Heterogeneous deployment — Priyanka Dhanda prepared a project plan and did the initial preparatory work. A prototype will soon be set up.
 * HipHop support — HipHop was discussed during the Berlin hackathon, and it was agreed that HipHop support would be part of the MediaWiki 1.20 release.
 * udp2log — Nimish Gautam's work on this custom data analytics logging system is complete. The Operations team is now ready to test and enable multicast logging.
 * App-level monitoring — Sam Reed started to implement a job queue monitor, as the first of a set of application-level monitoring tools using the API.
 * A/B testing — Nimish Gautam completed this set of tools for performing A/B testing on Wikimedia sites. The project has now transitioned to maintenance and bug-fixing mode.
 * Configuration management — Chad Horohoe committed his initial work on a configuration management system. The goal is to move from the current system (where configuration mostly happens through globals in LocalSettings.php) to one where all the configuration parameters are contained in a configuration object.
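The job queue monitor described above relies on the MediaWiki API. As a minimal sketch, and not Sam Reed's actual implementation, the check below reads the `jobs` count that `api.php?action=query&meta=siteinfo&siprop=statistics` reports and compares it with an alert threshold. The sample response, the 10,000 threshold and the function names are hypothetical; a real monitor would fetch the response over HTTP rather than use a hard-coded string.

```python
import json

# Hypothetical sample response, shaped like the JSON output of
# api.php?action=query&meta=siteinfo&siprop=statistics&format=json
SAMPLE_RESPONSE = (
    '{"query": {"statistics": '
    '{"pages": 1000, "articles": 400, "edits": 5000, "jobs": 12500}}}'
)


def job_queue_length(response_text: str) -> int:
    """Return the job queue length reported in a siteinfo statistics response."""
    data = json.loads(response_text)
    return int(data["query"]["statistics"]["jobs"])


def check_job_queue(response_text: str, threshold: int) -> str:
    """Compare the job queue length with a (hypothetical) alert threshold."""
    jobs = job_queue_length(response_text)
    if jobs > threshold:
        return f"WARNING: job queue length {jobs} exceeds {threshold}"
    return f"OK: job queue length {jobs}"


if __name__ == "__main__":
    print(check_job_queue(SAMPLE_RESPONSE, threshold=10000))
```

Keeping the parsing separate from the threshold check makes it easy to reuse the same API call for other application-level checks.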
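The configuration management item contrasts globals set in LocalSettings.php with a single configuration object. The Python sketch below illustrates that general idea under stated assumptions: the `Config` class, its methods and the example values are hypothetical and do not reflect Chad Horohoe's actual design or MediaWiki's PHP code. Passing one explicit object around makes dependencies visible and lets misspelled setting names be rejected early.

```python
from typing import Any, Dict, Mapping, Optional


class Config:
    """Hypothetical configuration object: every setting lives in one explicit
    container instead of in module-level globals."""

    def __init__(self, defaults: Mapping[str, Any],
                 overrides: Optional[Mapping[str, Any]] = None) -> None:
        # Start from the defaults, then apply site-specific overrides.
        self._settings: Dict[str, Any] = dict(defaults)
        if overrides:
            # Reject misspelled setting names instead of silently adding them.
            unknown = set(overrides) - set(defaults)
            if unknown:
                raise KeyError(f"unknown settings: {sorted(unknown)}")
            self._settings.update(overrides)

    def get(self, name: str) -> Any:
        # Raise on unknown names rather than returning an unset value.
        if name not in self._settings:
            raise KeyError(name)
        return self._settings[name]


# Setting names mirror existing LocalSettings.php globals ($wgSitename,
# $wgEnableUploads) without the $wg prefix; the values are illustrative.
DEFAULTS = {"Sitename": "MediaWiki", "EnableUploads": False}
config = Config(DEFAULTS, {"Sitename": "Example Wiki", "EnableUploads": True})
print(config.get("Sitename"), config.get("EnableUploads"))
```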