Wikimedia Engineering/Report/2011/July

Major news in July includes:
 * Ongoing data replication from our primary Florida data center to our new Virginia data center;
 * The deployment of the Article Feedback feature to all articles on the English Wikipedia, and the deployment of MoodBar;
 * The successful implementation of a MySQL-based parser cache on Wikimedia wikis;
 * Mid-term evaluation of our Summer of Code projects.

Recent events

 * OSCON (July 25-29, Portland, Oregon, USA) — About a dozen Wikimedia engineers attended the Open Source Convention in late July. OSCON is used to showcase the latest and greatest developments in open source technologies (including hands-on tutorials), and is generally an opportunity for Wikimedia developers to stay in the loop and to network with individuals from other projects and communities. We had two presentations in the program (on the 2010-11 fundraising campaign, and on ResourceLoader), which are available in the Wikimedia engineering presentations collection. We also promoted WMF job openings at every opportunity. Finally, Danese Cooper, Sumana Harihareswara and Erik Moeller participated in a workshop with like-minded organizations regarding volunteer matching strategies for open source projects.

Upcoming events

 * Wikimania (August 2-7, Haifa, Israel) — Another delegation of about a dozen Wikimedia engineers will be attending the Wikimania conference in early August, as well as satellite meetings such as the Developer Days and the OpenZIM Developers Meeting. The engineering report for August will cover these events in more detail.


 * Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles. The following positions opened in July:
 * Product Manager (Mobile)
 * Software Developer (Mobile)
 * Product Manager (New Editor Engagement)

New Requests for Proposals:
 * Article Feedback Feature

The following positions are still open: Product Manager (Analytics), QA Lead, Operations Engineer (Networking), Director of Features Engineering, Systems Engineer (Data Analytics), Networking Contractor (Amsterdam), Software Developer (Rich Text Editing, Features), Software Developer (Front-end) and Software Developer (Back-end).

Short news

 * Jeff Green was hired as Operations Engineer for Special Projects (announcement).
 * The operations team continued to grow with Ben Hartshorne and contractor Daniel Zahn joining as Operations Engineers (announcement).
 * Ian Baker joined the Feature engineering team as Software Developer (announcement).
 * Chief technology officer Danese Cooper and Code maintenance engineer Priyanka Dhanda left the Wikimedia Foundation in July (announcement: wikimediaannounce-l/2011-June/000180.html).

Site infrastructure

 * Tampa Data Center — 74 new servers were purchased to increase the capacity of our Apache cluster; they will be installed in August. Network maintenance was also performed to install a new router and replace a core switch. A number of servers were upgraded, and automated with puppet.


 * Virginia Data Center — Full network connectivity was set up and the 7 wiki database clusters have now been replicated to our new servers in Virginia. We have also standardized the puppet configuration and enabled LVM snapshots. About 20 other databases (of tools like OTRS, CiviCRM, Bugzilla, WordPress and RT) have been replicated as well. Next steps include rolling out some of our Varnish caching servers, after a stability and performance assessment.


 * Media Storage — The SwiftMedia extension developed by Russ Nelson now supports all the major media features such as download, upload, re-upload, revert, delete, and restore. Upcoming work includes unit tests and performing end-to-end tests.


 * HTTPS & IPv6 — HTTPS was enabled on a private production wiki and testwiki to test functionality and uncover bugs. Protocol-relative URLs (which will be a major feature of MediaWiki 1.18) were enabled on testwiki for community testing before rolling out to all projects (read more).
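
As an illustration of the protocol-relative URLs mentioned above, the following minimal Python sketch shows how a link that starts with `//` inherits the scheme of the page it appears on. It relies on the standard library's `urljoin` rather than on any MediaWiki code; the `resolve` helper and the image path are hypothetical and only serve the example.

```python
from urllib.parse import urljoin


def resolve(page_url: str, link: str) -> str:
    """Resolve a link found on a page against the URL of that page."""
    return urljoin(page_url, link)


# Hypothetical protocol-relative link: it names a host but no scheme.
link = "//upload.wikimedia.org/example.png"

# The same link resolves to HTTP or HTTPS depending on how the page was loaded.
print(resolve("http://test.wikipedia.org/wiki/Main_Page", link))
print(resolve("https://test.wikipedia.org/wiki/Main_Page", link))
```

Because the scheme is taken from the page, the same cached HTML can be served to both HTTP and HTTPS readers without mixed-content warnings.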

Testing environment

 * Virtualization test cluster — This project was slowed down in favor of deploying HTTPS. Some work was done to move the puppet configuration into a public repository.

Backups and data archives

 * Data Dumps — The June and July runs of the English Wikipedia dump were completed, and the August run is underway. Possible explanations for why the earlier issues went away include different NFS mounting options and fine-tuning of the number of concurrent jobs. Chinese Wikipedia dumps have also been fixed. Upcoming work focuses on checkpoint files for history dumps, to break in-progress dumps into chunks.

Editing tools

 * Visual editor — Trevor Parscal continued to work on the front-end of the visual editor and rich text rendering; he was joined by Inez Korczynski, a developer from Wikia, which is also interested in the visual editor. Neil Kandalgaonkar worked on real-time collaboration and concurrent editing, and delved into the inner workings of Etherpad. (Read summary on wikitext-l.)
 * Internationalization and localization tools — Alolita Sharma continued to assemble a team dedicated to localization, and to work on the project definition and priorities.

Content Quality and Editorial Tools

 * Article feedback — Roan Kattouw completed the UDP logger (for clicktracking metrics) and deployed it to production. The Article feedback feature was incrementally rolled out to all articles on the English Wikipedia, and the Product research team continued to analyze its impact (read more).

Participation and editor retention
 * WikiLove — The code was completed, and the feature deployed to the English Wikipedia at the end of June. The Product research team published a basic analysis of its usage, and stories of its evolving usage and impact. This project is now considered to be completed.
 * MoodBar — The code was completed and deployed to the English Wikipedia. The research team is now analyzing its impact.
 * GlobalProfile (formerly "StructuredProfile") — Brandon Harris continued to engage in discussions with users to collect feedback and assemble requirements. The feature was renamed to "GlobalProfile" as it is now intended to work consistently across all wikis.
 * LiquidThreads 3.0 — This project was mostly on hold in July, in favor of the MoodBar feature.

Multimedia Tools

 * UploadWizard — Ian Baker joined the team and started to work on the UploadStash back-end. Jeroen De Dauw started to extend the UploadWizard code base to support customized campaigns, like the Wiki Loves Monuments contest. Neil Kandalgaonkar refactored some libraries to better support Ian and Jeroen's work, and committed some fixes to reduce categorization and licensing mistakes.

MediaWiki infrastructure

 * ResourceLoader — Roan Kattouw and Timo Tijhof started to work on global gadgets and a gadget manager. The back-end for loading gadgets remotely from another wiki now works, although it is limited to database loading within the same server farm; an API back-end is in the works. A Gadgets inventory is now also available, with plans to add actions like creation, modification, deletion of gadgets.

Wikimedia Labs

 * Multimedia — Michael Dale continued to address comments from code review, and participated in a Multimedia sprint planning meeting. He also started to plan the final review and possible deployment of TimedMediaHandler around September.
 * MediaWiki.next — Brion Vibber continued to work on the ParserPlayground extension, which is now a mostly working demo. He's now focusing on the API between the parser/renderer and its host environment (read more).

Mobile

 * Mobile Research — Parul Vora and Mani Pande continued to plan the US mobile research, to talk to possible firms, and to draft the mobile survey. Reports and syntheses from the India and Brazil field research were delayed in favor of the US research planning.


 * MobileFrontend —

Fundraising support

 * 2011 Fundraiser — Ryan Kaldari modified CentralNotice to allow the logging of changes to banners and campaigns, and has begun working on a log filter. Katie Horn fixed an issue with the PayflowPro Pending Processor script (which determines whether credit card donations flagged as 'pending' have been approved). She also added unit tests to new and existing code. Our server was successfully puppetized and upgraded by Peter Youngmeister and Arthur Richards. Arthur also set up advanced monitoring through Ganglia.

Offline

 * Wikipedia version tools — GSoC student Yuvaraj Pandian continued to port User:CBM's WP 1.0 bot to a MediaWiki extension, and nearly achieved feature parity with it by implementing article selection filtering based on project, quality, importance and category. Mentored by Arthur Richards, Yuvaraj also implemented the ability to save lists of filtered articles. In August, Yuvaraj will wrap up the initial development by adding the ability to manually curate article selection lists and export article lists in CSV format.
 * Kiwix UX initiative — Kiwix 0.9 beta1 was released in July and included a new content manager, better search results, and fixes from our first usability study (more details in the changelog). We also refined our build system to speed up the release process.
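
The article selection filtering and CSV export described for the WP 1.0 extension above can be pictured with a short Python sketch. This is not the extension's code (which is a MediaWiki extension); the `Article` record, the `filter_articles` and `to_csv` helpers, and the sample values are all hypothetical and only illustrate filtering by project, quality, importance and category.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Article:
    # Hypothetical record; the field names mirror the filters named in the report.
    title: str
    project: str
    quality: str
    importance: str
    category: str


def filter_articles(articles, **criteria):
    """Keep the articles whose fields equal every given criterion."""
    return [
        a for a in articles
        if all(getattr(a, name) == value for name, value in criteria.items())
    ]


def to_csv(articles):
    """Serialize a list of articles to CSV text, with a header row."""
    names = [f.name for f in fields(Article)]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(names)
    for a in articles:
        writer.writerow([getattr(a, n) for n in names])
    return buf.getvalue()


# Made-up sample data for the example.
articles = [
    Article("Haifa", "Israel", "B", "High", "Cities"),
    Article("Portland", "Oregon", "GA", "Mid", "Cities"),
    Article("Etherpad", "Software", "B", "Low", "Editors"),
]
selection = filter_articles(articles, quality="B", category="Cities")
print(to_csv(selection))
```

A saved list would then simply be the stored result of one such filter, which can later be curated by hand before export.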

MediaWiki Core

 * MediaWiki 1.18 — MediaWiki 1.18 was initially branched in May. It was re-branched in mid-July because trunk was in a better working state than the former 1.18 branch. This increased the number of commits not yet reviewed, but will eventually save time and effort towards the deployment of 1.18 to the Wikimedia cluster, and its release to the public. A revision report was created to focus on the remaining commits to review.
 * Code review management — Work continued to review commits (see chart); the re-branching of MediaWiki 1.18 aims to reduce the backlog faster. In July, Wikimedia Foundation engineering staff and contractors also attended a Code review workshop; the goal was to share experience and practices on the general review process, as well as security and performance. The accompanying documentation is now being organized.
 * Heterogeneous deployment — Aaron Schulz picked up work on this project, and completed most of it. Testing is scheduled to happen by early August.
 * Disk-backed object cache — To improve the MySQL-based version of this system, Domas Mituzas suggested splitting the cache into several tables, which Tim Starling implemented in MediaWiki. The system was then deployed on July 11th and the cache has been filling up since then, increasing the parser cache hit ratio from about 30% to 80%. Possible future steps include adding previous page revisions to the cache.
 * API maintenance — Sam Reed continued to fix bugs and to add new features to the MediaWiki API. Sam's API work in July focused on providing the API component to the new Report Card project.
 * Shell requests — Sam Reed took over maintenance of shell requests. He added a new "ops" keyword to differentiate between requests that require shell access (which he can process), and other requests that can only be processed by someone with root access ("ops"). As of July 26, there were only 69 remaining shell requests, and that number keeps decreasing.
 * Continuous integration — This project aims to rebuild the Wikimedia continuous integration legacy server (currently hosted on a virtual machine) on a dedicated server in eqiad, our new data center. Chad Horohoe started to consolidate the platform to run automated tests systematically at post-commit time, to check that the SVN trunk is in an (almost) constantly deployable state. This project also relates to the desire for more frequent code deployments, as continuous integration will give us more confidence in new code that has already passed the automated tests. The new server will be combined with TestSwarm, a distributed continuous integration tool for JavaScript, currently hosted on the Toolserver. Timo Tijhof reached out to the TestSwarm team, who were enthusiastic about incorporating our improvements, notably on performance.
 * Projects on hold — The HipHop deployment, AcademicAccess, App-level monitoring and Configuration management projects were mostly on hold in July.
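
The idea of splitting a database-backed cache into several tables, as mentioned for the disk-backed object cache above, can be sketched as follows. This is a minimal illustration and not MediaWiki's implementation: it uses SQLite in place of MySQL so that it runs anywhere, and the table names (`pc000`, `pc001`, ...), the number of tables, the MD5-based routing and the sample key are all assumptions made for the example.

```python
import hashlib
import sqlite3

NUM_TABLES = 4  # hypothetical shard count, chosen only for the example


def table_for_key(key: str, num_tables: int = NUM_TABLES) -> str:
    """Map a cache key to one of the tables using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"pc{int(digest, 16) % num_tables:03d}"


class ShardedCache:
    """Key-value cache spread over several identically shaped tables."""

    def __init__(self, conn: sqlite3.Connection, num_tables: int = NUM_TABLES):
        self.conn = conn
        self.num_tables = num_tables
        for i in range(num_tables):
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS pc{i:03d} "
                "(keyname TEXT PRIMARY KEY, value BLOB)"
            )

    def set(self, key: str, value: bytes) -> None:
        table = table_for_key(key, self.num_tables)
        self.conn.execute(
            f"INSERT OR REPLACE INTO {table} (keyname, value) VALUES (?, ?)",
            (key, value),
        )

    def get(self, key: str):
        table = table_for_key(key, self.num_tables)
        row = self.conn.execute(
            f"SELECT value FROM {table} WHERE keyname = ?", (key,)
        ).fetchone()
        return row[0] if row else None


cache = ShardedCache(sqlite3.connect(":memory:"))
cache.set("enwiki:pcache:12345", b"<p>rendered page</p>")  # made-up key
print(cache.get("enwiki:pcache:12345"))
```

Since a key always hashes to the same table, reads and writes touch only one smaller table and index instead of a single very large one.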

Wikimedia analytics

 * Wikimedia Report Card 2.0 — The team started their second sprint in July; its goal was to incorporate key metrics into the Report card, such as editors by geography, page views (both mobile and non-mobile) and gender breakdown of editors. Nimish Gautam worked on the infrastructure and analytics for editors by geography. Sam Reed implemented a generic CSV importer, and looked at how to use the Google API to automatically draw data about offline usage into the Report card from Google Spreadsheets.

Technical Liaison; Developer Relations

 * Bug management — Mark Hershberger continued to conduct bug triage sessions on IRC, some of which were focused on MediaWiki 1.18 blockers, thumbnail issues, caching and operations-related requests. With Sumana Harihareswara, he cleaned up default assignees in Bugzilla so that assignments are more meaningful, which prompted a discussion on the wikitech-l list.
 * Summer of Code 2011 — 7 out of 8 students made it through the mid-term evaluation, and continue to work on their projects (read more).
 * Engineering project documentation — Guillaume Paumier continued to create, update, clean up and organize the project documentation pages for most engineering activities. This report was built in part using content transcluded from the project status pages. An activity index was also drafted.
 * Volunteer coordination and outreach — About 11 developers were granted commit access in July, including 2 Wikimedia employees and 4 Wikia employees. Sumana Harihareswara attended the Community leadership summit and OSCON in Portland, notably to reach out to potential new developers and testers for MediaWiki.