Wikimedia Engineering/Report/2011/April

Major news items this month include:
 * The senior engineering team started working on the budgeting exercise for the 2011-2012 fiscal year.
 * Progress on phase 3 of the Article Feedback Tool and on the Upload Wizard.
 * Mobile work is taking off.

Upcoming events

 * Berlin Hackathon 2011 (May 13-15, Berlin) — This event will be almost entirely devoted to hacking, with short presentations throughout the weekend. The overall schedule is now available. One of the main topics at the Hackathon will be the parser work planned to support rich text editing features. There has already been much discussion of this topic on the mailing lists, indicating great interest in the general MediaWiki community. We are exploring ways to broadcast discussions from the Hackathon.
 * Wikimania (August 2-7, Haifa, Israel) — The engineering staff was encouraged to submit proposals for Wikimania, and submitted talks on a variety of topics to give the rest of the community opportunities to learn about and discuss their work. We are now reviewing submissions for overlap.
 * Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites. This page has been populated since mid-April and will be maintained as another way for the community to see what Foundation engineering is accomplishing.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

The following positions have opened this month:
 * Software Developer, Rich Text Editing — Features
 * Product Manager — Features

The following positions are still open:
 * Engineering Program Manager — Data Analytics
 * Software Developer — Features
 * Systems Engineer — Data Analytics (previously Data Analytics Engineer)
 * Operations Engineer
 * Senior QA Engineer
 * Networking Contractor — Amsterdam
 * Software Engineer — Community R&D

In addition, we hope to post the following positions over the next few months:
 * Release Engineer
 * Technical Writer

Short news

 * Visitors — Ward Cunningham continued his "in-residence" visits to the Wikimedia office in San Francisco. We also welcomed Dr. Bryan Lewis, of Kent State, and Asher Feldman.
 * Hires:
    * Patrick Reilly was hired as Senior Software Developer for Mobile (read announcement).
    * Timo Tijhof and Jan Paul Posma were hired as contractor Software Developers for Features.
    * Sumana Harihareswara was hired as Volunteer Development Coordinator (read announcement).

Site operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: In the next few days we expect the connectivity between Tampa and Ashburn to be installed. This will allow all data to be copied to the new data center for backup and failover purposes. During the Hackathon in Berlin, we will address many of the challenges of making all our services redundant and making optimal use of the new data center.
 * Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
 * Status: A test wiki is pushing new media uploads to the Swift cluster and serving pages that fetch media from it as well. For thumbnail generation, there are too many media handlers to teach them all about Swift. This month's work therefore focuses on fetching the original into the local filesystem, running the handler, and then pushing the thumbnail it wrote back into Swift.
 * Program manager: Mark Bergsma
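The thumbnail workaround described in the Media Storage status can be sketched as follows. This is a hypothetical illustration, not the code running on the test wiki: a plain dictionary stands in for the Swift cluster, and the handler signature `(src, dst, width)` as well as the `make_thumbnail` and `truncating_handler` names are assumptions made for the example. The point it shows is that existing handlers never need to know about Swift; they only ever see local files.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict

# Hypothetical stand-in for a Swift container: object name -> bytes.
# A real deployment would talk to Swift over its HTTP API instead.
ObjectStore = Dict[str, bytes]

# A handler reads a local source file and writes a thumbnail to a local
# destination path (this signature is an assumption for the sketch).
Handler = Callable[[str, str, int], None]


def make_thumbnail(store: ObjectStore, name: str, width: int, handler: Handler) -> str:
    """Fetch the original, run an unmodified handler locally, push the thumb back."""
    workdir = tempfile.mkdtemp(prefix="thumb-")
    try:
        # 1. Copy the original out of the object store onto the local disk.
        src = os.path.join(workdir, "original")
        with open(src, "wb") as f:
            f.write(store[name])
        # 2. Run the handler, which only knows about local files.
        dst = os.path.join(workdir, "thumb")
        handler(src, dst, width)
        # 3. Upload the thumbnail the handler wrote back into the store.
        thumb_name = f"thumb/{width}px-{name}"
        with open(dst, "rb") as f:
            store[thumb_name] = f.read()
        return thumb_name
    finally:
        # Remove the scratch directory whether or not the handler succeeded.
        shutil.rmtree(workdir, ignore_errors=True)


def truncating_handler(src: str, dst: str, width: int) -> None:
    """Toy handler for illustration only: keeps the first `width` bytes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read()[:width])
```

For example, with `store = {"Example.jpg": b"0123456789"}`, calling `make_thumbnail(store, "Example.jpg", 4, truncating_handler)` leaves the original untouched and adds a `thumb/4px-Example.jpg` object to the store.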

Testing environment
Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status: Production hardware is set up, and initial configuration of the software is done. We should have a basic environment ready to demo in time for the Berlin Hackathon. Ryan Lane gave a keynote at the OpenStack Developer's conference about this in late April (see the slides).
 * Program manager: Mark Bergsma

Backups and data archives
Backups — Improvement of backup coverage of Wikimedia-hosted data.
 * Status: Now that connectivity between our two data centers is finally being installed, we can start using the new hardware and storage space to ensure full backup coverage of all data. We expect to have live replication or daily backups of all important data by the end of May.
 * Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: We are investigating some kernel messages on the old dumps server, but in the meantime Google has come through with the account setup we need for copying dumps to Google storage. The April run of the English Wikipedia dumps completed in just over two weeks, even after having to rerun two pieces. The next run will be on the new beefy server. We're looking forward to seeing how much faster it is!
 * Program manager: Mark Bergsma

Short news

 * A new search indexer was installed last month, resolving the space issues we had with the older server.
 * Five new database machines and a new snapshot generation server have been installed and are being deployed.
 * We enhanced our WatchMouse setup with more service uptime monitoring and reporting.
 * We upgraded our etherpad software and server.
 * We encountered (and resolved) a few production issues in April:
    * The thumbnail server experienced a ZFS problem (a memory leak bug) after a surge in uploads.
    * Our Squid software upgrade caused a caching problem and was reverted.
    * Enabling click-tracking for the Article Feedback Tool caused site performance issues until we disabled it.

Content Quality and Editorial Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Phase 2: Trevor Parscal implemented the expiration of ratings, added error handling mechanisms and fixed IE bugs. Roan Kattouw reviewed and deployed the code to production.
 * Phase 3: Timo Tijhof and Trevor Parscal implemented the EmailCapture extension, a tool that allows unregistered users rating an article to leave their e-mail address if they want to be contacted later by the Community department. It was deployed by Roan Kattouw. The team is now working on the dashboard, a summary page to surface general rating trends.
 * Program manager: Alolita Sharma

Article feedback (extended review) — An interface for quality reviews of Wikipedia content.
 * Status: The specifications and wireframes are now stable, and this phase of the project is considered completed. The system will provide an expanded interface for readers to provide feedback, to praise authors and to report abuse. A "quality page" will show aggregated summary data, as well as a list of reviews & praise. Users will be able to promote particularly relevant reviews to the talk page of the article reviewed. The system will also include a mechanism for credentialed experts belonging to a specific organization to attach their credentials to the review. This phase
 * Commissioned by: Erik Möller

FlaggedRevs — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.
 * Status: Aaron Schulz continued to refactor the extension and to fix bugs. He improved the API error messages, added other features to the API, and worked on performance improvements.
 * Program manager: Alolita Sharma

Discussions and Interactions
Liquid Threads — A feature that brings threaded discussions capabilities to Wikimedia projects and MediaWiki.
 * Status: Lead developer Andrew Garrett wrote a new object model for LiquidThreads, with support for channels, topics, posts, summaries and respective version objects. He also began work on integrating the new object model with the rest of the extension, starting with a more maintainable reimplementation of the display layer. Integration with the rest of the extension proved to require more time than expected, and an updated schedule was published.
 * Program manager: Alolita Sharma

WikiLove — An extension to encourage praise and virtual gifts between users.
 * Status: Ryan Kaldari and Jan Paul Posma converted the initial script into a rough MediaWiki extension, which is now in SVN.
 * Program manager: Alolita Sharma

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: Neil Kandalgaonkar and Ryan Kaldari continued to fix bugs and also added new features. The Upload Wizard now offers a configurable and localizable license picker; it is now also possible to abort uploads and to recover from errors. A built-in feedback form allows users to report bugs and issues directly. Roan Kattouw reviewed and deployed the new code every week in April. Finally, Brandon Harris provided design recommendations to improve the interface.
 * Program manager: Alolita Sharma

Community feature prototyping

 * Extension:CustomUserSignup

Engineering support
Editor survey — Integration work between LimeSurvey and MediaWiki to support
 * Status:


 * Program manager: Alolita Sharma

Other projects

 * Style guide for forms —


 * SimpleSurvey 2.0 —


 * Resource loader — Roan Kattouw and Timo Tijhof started working on specifications for version 2 of ResourceLoader; it will be on the agenda for the Berlin Hackathon.
 * Non-Roman character set localization — Trevor Parscal provided mockups and recommendations to better integrate the Narayam extension with the Vector interface. Brion Vibber also added support for Esperanto to the Narayam extension.
 * Interlanguage extension design improvements — Following discussions about the deployment of the Interlanguage extension to Wikipedia, Brandon Harris made interaction design recommendations.
 * Commons & multimedia strategy — Neil Kandalgaonkar, Ryan Kaldari and Brandon Harris had several discussions about a strategy for Wikimedia Commons and multimedia content, including one about titles and filenames.
 * Parser functions - Ryan Kaldari added multi-language support to #time and fixed several bugs.

Wikimedia Labs
Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status: Michael Dale continued to improve the TimedMediaHandler extension, notably by adding a transcode state manager and tests. Trevor Parscal reviewed the front-end and JavaScript code.
 * Program manager: Alolita Sharma

Mobile
Mobile projects — All things Mobile and Wikimedia.
 * Status: We expanded on the features and roadmap definition, and started diagramming the interface. We are currently reviewing a features list with the community.
 * Program manager: Tomasz Finc

Mobile Research — A research project to help determine our Mobile strategy.
 * Status: Mani Pande and Parul Vora led a series of 30 interviews in New Delhi and Bangalore with Wikipedia readers and editors to assess mobile user experience and needs. Preliminary results indicate that many respondents prefer to access Wikipedia on their phone instead of on their computer. However, technical and editorial issues can make this difficult; for example, limited bandwidth causes articles with many or large images to load very slowly. Readers also said that scrolling was tedious, and emphasized their preference for good introductory summaries. Finally, users expressed a wish to be able to download Wikipedia articles or save them to read later.
 * Program manager: Tomasz Finc

Mobile site rewrite — Port of our Ruby-based mobile gateway to PHP.
 * Status: We made good progress on a PHP MediaWiki extension to replace our Ruby-based mobile gateway. We hope to demo our first all-MediaWiki version, whose appearance is similar to the current gateway, at the Berlin hackathon.
 * Program manager: Tomasz Finc

Fundraising support
2011 Fundraiser — Support and development for the annual fundraiser of the Foundation.
 * Status: Arthur Richards upgraded our CiviCRM platform to version 3.4, and wrote a contribution auditing framework for fundraising. Ryan Kaldari worked on a development roadmap for the CentralNotice extension, which is used for global messaging to potential donors.
 * Program manager: Tomasz Finc

Offline
Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.
 * Status: Yuvi Panda was accepted as one of the Foundation's students for Google Summer of Code. Arthur Richards will mentor him as he ports the existing collection tools to a MediaWiki extension.
 * Program manager: Tomasz Finc

OpenZim for Collections — Integration of openZim into the Collections extension.
 * Status: PediaPress patched the current extension so that any generated openZim files now have a navigable table of contents. We've had over 1,000 downloads of openZim files in just the last week!
 * Program manager: Tomasz Finc

Kiwix — Improvement of the user experience of the Kiwix app to access offline Wikimedia content.
 * Status: We ran through a second development sprint, working on a content download manager to download offline archives directly from Kiwix (see mockups). We also worked with the Wikimedia operations team to connect the content manager with the Wikimedia infrastructure. Since we picked up this project, Kiwix downloads have gone up 97%, or about 17% per month.
 * Program manager: Tomasz Finc

MediaWiki development and tools
MediaWiki 1.17 release — The upcoming MediaWiki release.
 * Status: Tim Starling and Roan Kattouw investigated security bugs, and three security releases were made for MediaWiki 1.16 (1.16.3, 1.16.4 and 1.16.5). Developers continued to prepare the 1.17 release, and a beta 1.17 release was announced. The final release is now imminent.
 * Program manager: Rob Lanphier

Code review — Review of changes made to the MediaWiki code.
 * Status: Tim Starling, Sam Reed, Chad Horohoe and Roan Kattouw devoted part of their time to code review. Despite their efforts, the backlog of unreviewed new commits is still increasing. A new feature in the CodeReview tool since the deployment of MediaWiki 1.17 is the ability to "sign off" on commits. Developers are encouraged to test and sign off on commits, in order to help the team prioritize what is ready for review.
 * Program manager: Rob Lanphier

Bugmeistering — Management of our bug tracker.
 * Status: Mark Hershberger reached out to other open-source communities (like Mozilla) to look for best practices in bug management and workflow; he started to experiment with a new "unprioritized" value for the "priority" field. He has also been organizing weekly bug triage sessions at different times to allow for participation from different timezones.
 * Program manager: Rob Lanphier

Summer of Code 2011 — A sponsored community program allowing students to join the community as developers.
 * Status: More than 25 proposals were submitted. Sumana Harihareswara announced the eight students and projects that were selected for this year's Google Summer of Code. The projects include interface improvements using AJAX, extension release management and work on Semantic MediaWiki. Students and mentors have now entered the "community bonding" period. (Read more.)
 * Program manager: Rob Lanphier

Parser & gadgets — Groundwork for the next generation visual editor of MediaWiki.
 * Status: Brion Vibber is laying the groundwork for exploratory tools for the upcoming parser work, integral to the future Visual editor. He created a JavaScript tool to compare the parse tree and output of several parsers. On a related note, he also worked on tools to facilitate the development and use of gadgets, for example by embedding a JavaScript syntax highlighting editor.
 * Program manager: Rob Lanphier

Performance optimization
PoolCounter — A MediaWiki extension to avoid parser deadlocks on high-traffic pages.
 * Status: This extension was deployed and is now in production. We've observed a reduction of roughly 2% in total parse time since PoolCounter was activated. We believe the biggest benefits come when a lot of editing and view traffic is directed at a single page. While we don't have good metrics to prove that assertion, we've had a few major events that might have triggered performance issues before PoolCounter was deployed, and they didn't pose a problem for us.
 * Program manager: Rob Lanphier
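The core idea behind PoolCounter can be illustrated with a minimal, single-process sketch. The deployed extension coordinates workers across application servers through a shared network service; the `PoolCounter` class below, its `max_workers` parameter and its `get` method are assumptions for illustration only and do not mirror the extension's actual interface. Only a limited number of workers may render a given page at once, and each waiting worker checks the cache once admitted, so a burst of requests for one heavily edited page results in a single parse instead of many parallel ones.

```python
import threading
from typing import Callable, Dict


class PoolCounter:
    """Per-key limit on concurrent rendering, with a shared result cache.

    Hypothetical single-process sketch: at most `max_workers` threads may
    render the same key at once; the others wait, and once admitted they
    check the cache before doing any work themselves.
    """

    def __init__(self, max_workers: int = 1) -> None:
        self._max_workers = max_workers
        self._guard = threading.Lock()  # protects the two dicts below
        self._pools: Dict[str, threading.Semaphore] = {}
        self._cache: Dict[str, str] = {}

    def _pool_for(self, key: str) -> threading.Semaphore:
        # Create one semaphore per page key, lazily and thread-safely.
        with self._guard:
            if key not in self._pools:
                self._pools[key] = threading.Semaphore(self._max_workers)
            return self._pools[key]

    def get(self, key: str, render: Callable[[], str]) -> str:
        with self._guard:
            if key in self._cache:
                return self._cache[key]  # fast path: already rendered
        with self._pool_for(key):
            # Re-check: another worker may have finished while we waited.
            with self._guard:
                if key in self._cache:
                    return self._cache[key]
            result = render()  # the expensive parse runs outside the guard
            with self._guard:
                self._cache[key] = result
            return result
```

With `max_workers=1`, twenty threads calling `get("Main_Page", render)` at the same time trigger `render` exactly once; the other nineteen receive the cached output. Allowing more than one worker per key trades some duplicated work for lower waiting time.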

Disk-backed object cache — Deployment of a disk-backed object cache to increase the parser cache hit ratio.
 * Status: Issues that arose during testing of Ehcache convinced Tim Starling to use another tool. His next trial will involve implementing a thin caching layer on top of a MySQL-based disk store. Implementation is planned to happen after the MediaWiki 1.17 release.
 * Program manager: Rob Lanphier
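The planned design is described only as a thin caching layer on top of a MySQL-based disk store. One way such a layer could look is sketched below, under clearly stated assumptions: `sqlite3` stands in for MySQL so the example runs anywhere, the `objectcache` table and the `DiskBackedCache` class are hypothetical, and a small in-memory LRU tier sits in front of a disk store that keeps every entry. Because evicted entries remain on disk, the effective cache size is bounded by disk space rather than memory, which is what raises the hit ratio.

```python
import sqlite3
from collections import OrderedDict
from typing import Optional


class DiskBackedCache:
    """Small in-memory LRU tier in front of a SQL key/value table.

    Sketch only: sqlite3 stands in for MySQL, and the table layout is an
    assumption, not the deployed schema.
    """

    def __init__(self, db_path: str = ":memory:", memory_slots: int = 2) -> None:
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS objectcache (keyname TEXT PRIMARY KEY, value BLOB)"
        )
        self._memory: "OrderedDict[str, bytes]" = OrderedDict()
        self._slots = memory_slots

    def _remember(self, key: str, value: bytes) -> None:
        self._memory[key] = value
        self._memory.move_to_end(key)
        while len(self._memory) > self._slots:
            self._memory.popitem(last=False)  # evict least recently used

    def set(self, key: str, value: bytes) -> None:
        # Write through to disk so evicted entries can still be served later.
        with self._db:
            self._db.execute(
                "REPLACE INTO objectcache (keyname, value) VALUES (?, ?)", (key, value)
            )
        self._remember(key, value)

    def get(self, key: str) -> Optional[bytes]:
        if key in self._memory:
            self._memory.move_to_end(key)
            return self._memory[key]
        row = self._db.execute(
            "SELECT value FROM objectcache WHERE keyname = ?", (key,)
        ).fetchone()
        if row is None:
            return None  # true miss: the caller must re-parse the page
        self._remember(key, row[0])
        return row[0]
```

After storing three entries with `memory_slots=2`, the first entry is no longer in memory, but `get` still returns it from the disk table.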

Wikimedia analytics
udp2log — A custom data analytics logging system.
 * Status: Nimish Gautam completed a patch for our Squids to implement multicast logging, but issues with the Squid upgrade (which caused a site outage) delayed the deployment of the patch. The operations team is now testing the patches to diagnose the issue before redeploying.
 * Program manager: Rob Lanphier

A/B testing — A set of tools to perform A/B testing on Wikimedia sites.
 * Status: Nimish Gautam continued to work on the ClickTracking extension, to allow us to put users into buckets. Deployment to the English Wikipedia was completed on April 27. The Community department is using this tool to try out different designs for the account creation improvement project. (read more)
 * Program manager: Rob Lanphier
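The report does not say how ClickTracking assigns users to buckets. One common technique, shown here as a hypothetical sketch rather than the extension's actual code, is to hash a stable per-user token together with the experiment name. The same user then always lands in the same bucket for a given experiment, while assignments stay independent across experiments. The `assign_bucket` function and the token format are assumptions.

```python
import hashlib


def assign_bucket(user_token: str, experiment: str, buckets: int) -> int:
    """Deterministically map a user to one of `buckets` groups for an experiment."""
    if buckets < 1:
        raise ValueError("buckets must be at least 1")
    # Salting with the experiment name keeps assignments independent across
    # experiments; hashing keeps them stable across page views and sessions.
    digest = hashlib.md5(f"{experiment}:{user_token}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets
```

A hash-based assignment needs no server-side state, but changing the number of buckets reshuffles users, so an experiment's bucket count should stay fixed while it runs.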

Technical communications
Development process improvement — A project to increase transparency and organize Wikimedia Foundation's engineering efforts more efficiently.
 * Status: Guillaume Paumier created a set of pages, templates and tools to facilitate the maintenance of project pages. The new system allows project information to be pulled from one central place per project, using the Labeled Section Transclusion extension, which was installed on mediawiki.org for this purpose.
 * Program manager: Rob Lanphier

Wikimedia blog overhaul — A project to consolidate and improve the Wikimedia blogs.
 * Status: Rob Halsell set up a test blog and documented the blog in our configuration management system. This makes backups and redeployment easier, and further streamlines and automates our operations processes. Technical issues with the back-end delayed the implementation, but Rob resolved them with Ryan Lane's help. Deployment happened on May 7 and went smoothly. This project is considered complete, although we'll continue to improve the blogs incrementally in the future (read more).
 * Project manager: Guillaume Paumier

Other projects

 * Bugzilla upgrade to 4.0 — Priyanka Dhanda fixed a few bugs following the upgrade to Bugzilla 4.0 back in March. Our bug tracker is pretty stable now.
 * OpenWebAnalytics — Integrating a full-fledged OWA framework with our infrastructure proved to be difficult, so we decided to scale down our efforts. A postmortem will be published, notably to help the new dedicated analytics team decide if they want to use individual components of OWA for specific uses like heatmaps.
 * API maintenance — Besides general maintenance and bug fixing, Sam Reed is starting work on app-level system health monitoring, by creating a job queue monitor.
 * Shell bugs — Mark Hershberger organized triage meetings specifically for shell bugs, with Priyanka Dhanda, Rob Halsell and CT Woo. Priyanka and Rob have been moving through the backlog of issues filed there.
 * Access to Subversion — The team (composed of Rob Lanphier, Priyanka Dhanda, Chad Horohoe and Tim Starling) is now meeting briefly every Wednesday to go through the commit access requests.
 * Migration to Git — The migration to Git will be a major topic of discussion during the upcoming Berlin Hackathon.
 * Heterogeneous deployment — Priyanka Dhanda is working on a project plan. Implementation is scheduled to happen after the deployment of the disk cache component.
 * Report card — Erik Zachte, Nimish Gautam and Erik Möller are investigating visualization toolkits to use in the report card (a monthly report of key metrics to measure community health). Additionally, they are streamlining and modularizing the report creation process.
 * HipHop support — Tim Starling implemented basic support for HipHop for PHP in MediaWiki, and invited other developers to improve and continue his work. We will pick this work back up later this year after the completion of some of the other projects above.