Wikimedia Engineering/Report/2011/June

Major news this month includes:
 * the network setup in our new data center, which opened the way to new server setups and backups;
 * progress on features to encourage and facilitate participation, such as the visual editor groundwork and the WikiLove button;
 * productive community testing on our new mobile front-end and the Kiwix download manager;
 * the stable release of MediaWiki 1.17.0;
 * the first commits by our Summer of Code students;
 * major progress on our code review backlog.

Recent events

 * 2011 Wikimedia fundraising summit (June 17-19, Vienna, Austria) — Representatives from Wikimedia Austria, UK, Sweden, Hungary, Germany, Switzerland, France, the Netherlands, New York City, Portugal and the Foundation met for two and a half days in Vienna, Austria to discuss strategic, operational and technical aspects of Wikimedia fundraising. Arthur Richards presented how the WMF uses CiviCRM for donor data management, how the Foundation's donation pipeline works, and how different parts of the infrastructure could potentially be adapted for use by fundraising chapters. A big thanks to Wikimedia Austria for hosting and showing us all a terrific time!
 * Open Source Bridge conference (June 21-25, Portland, Oregon, USA) — Sumana Harihareswara presented new features in MediaWiki 1.17 and gathered offers of volunteer help (especially around database support, testing, bug triage, and right-to-left support). She also gave a talk about technology management, and recruited candidates for the Wikimedia Foundation's current job openings (read more).

Upcoming events

 * OSCON (July 25-29, Portland, Oregon, USA) — A delegation of about a dozen Wikimedia engineers will be attending the Open Source Convention in July. Many of them will give talks (see full schedule).
 * Wikimania (August 2-7, Haifa, Israel) — Another delegation of about a dozen Wikimedia engineers will be attending the Wikimania conference in August. Besides the Developer Days and the OpenZIM Developers Meeting, they will also give several talks; the full schedule is now available.
 * Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

The following positions have opened this month:
 * Product Manager — Analytics
 * QA Lead
 * Operations Engineer — Networking
 * Director of Features Engineering

Requests for proposals:
 * Internationalization and Localization Outreach
 * Internationalization and Localization Feature Development

The following positions are still open: Software Developer (Features), Systems Engineer (Data Analytics), Operations Engineer, Networking Contractor (Amsterdam), Software Developer (Rich Text Editing, Features), Product Manager (Features), Software Developer (Front-end) and Software Developer (Back-end).

Short news

 * CTO leaving WMF — Danese Cooper announced in early June that she would be leaving as Chief Technology Officer of the Wikimedia Foundation at the end of July.
 * Promotions and role changes — Erik Möller announced a reorganization of the engineering management team, promoting Rob Lanphier to Director of Platform Engineering, Tomasz Finc to Director of Mobile and Special Projects, Tim Starling to Lead Platform Architect and Mark Bergsma to Lead Operations Architect. Alolita Sharma is serving as acting Director for Features Engineering, and Erik Möller assumed the position of VP of Engineering and Product Development (read more).
 * Other changes — Guillaume Paumier's position officially transitioned from Product Manager to Technical Communications Manager.

Site infrastructure

 * Virginia Data Center — In June, our network setup in EQIAD, our facility at Equinix in Ashburn, Virginia, was finished. Two independent "transport" connections between our two data centers in Tampa and Ashburn have been installed, and local "IP transit" (Internet connectivity) is now available in Ashburn as well. We started replicating our thumb server from ms5 (Tampa) to ms1004 (eqiad). All 48 of our database servers are now up and running, and database replication will start this week. Several services should come online from eqiad in July.
 * Media Storage — Russ Nelson continued to develop the SwiftMedia extension to integrate MediaWiki with Swift. He also cleaned up and removed duplicate code, made performance assessments and, with the help of Neil Kandalgaonkar, got the extension to mostly work with UploadStash.


 * HTTPS & IPv6 — Ryan Lane announced that Wikimedia sites would be switching to protocol-relative URLs in July, as part of the work to properly support HTTPS. Support for the HTTPS cluster has been added to geoiplookup. An nginx logging module has been written for udp2log. HTTPS has been tested for bits, upload, test Wikipedia, and some smaller wikis.


 * June 23 outage — Production wikis suffered from a 45-minute outage on June 23; a postmortem was drafted, as well as a more general incident response document.


 * Server decommission donations — Rob Halsell announced that decommissioned Wikimedia servers would be donated to non-profits, and called for applicants. Applications are now closed.


 * Summer of Research 2011 — Asher Feldman and Ryan Lane created the systems infrastructure for the Summer of Research team to perform data mining and analysis work.
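
To illustrate the protocol-relative URLs mentioned in the HTTPS item above, here is a minimal Python sketch of the general idea: dropping the scheme from a URL so that the link inherits the protocol (HTTP or HTTPS) of the page it appears on. The function name and the sample URL are hypothetical; this is not the code used on Wikimedia sites.

```python
from urllib.parse import urlsplit, urlunsplit


def to_protocol_relative(url: str) -> str:
    """Drop an http/https scheme so the URL inherits the page's protocol.

    Hypothetical helper for illustration only. URLs with any other
    scheme (or with no scheme at all) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return url
    # Rebuild the URL with an empty scheme; the result starts with "//".
    return urlunsplit(("", parts.netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Sample URL is made up for the example.
    print(to_protocol_relative("http://upload.wikimedia.org/example.png"))
```

A browser resolves a link of the form `//host/path` against the scheme of the current page, so the same HTML can be served over both HTTP and HTTPS.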

Testing environment

 * Virtualization test cluster — This project has been temporarily slowed down in favor of deploying HTTPS, but some work has been done: networking has been set up, and instances can be created and can register with Puppet. Work is now being done to move the Puppet configuration into a Git repository so that instances can fully build themselves.

Backups and data archives

 * Data Dumps — The first run of the English Wikipedia dumps on the new beefy server was somewhat disappointing. More than half of the files were truncated; clearly, running 32 jobs at once is a bit much for it. We'll be doing some testing to find the sweet spot this coming month. In the meantime, the truncated files are being regenerated in smaller batches. Code to detect truncated dump files has been committed and will go live shortly. This code let us find an issue with the zh Wikipedia history dumps that has been around for almost a year; the issue is under investigation while a good dump is being generated in smaller chunks. After we fixed a glitch that caused the per-project dump web page not to be readable on hosts with the upgraded OS, installs are now fully puppetized, with just one host left to upgrade.
 * Backups — Network connectivity to our new data center is now available. The first data is being copied and/or replicated onto the new servers and storage systems in Ashburn, and all important data will be present in our new facility in July.
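
The detection code mentioned in the Data Dumps item is not reproduced in this report. As a rough sketch of what such a check can look like, the Python example below assumes a bzip2-compressed dump file, reads the compressed stream to its end, and treats a premature end of data as truncation. The function name is hypothetical, and the approach is an assumption rather than a description of the committed code.

```python
import bz2


def is_truncated_bz2(path: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the bzip2 file ends before its end-of-stream marker.

    Hypothetical helper for illustration. Corrupt (rather than truncated)
    data surfaces as OSError and is deliberately not handled here.
    """
    try:
        with bz2.open(path, "rb") as stream:
            # Decompress in chunks and discard the output; only whether
            # the stream can be read to completion matters.
            while stream.read(chunk_size):
                pass
    except EOFError:
        # Raised by the bz2 module when the compressed data stops early.
        return True
    return False
```

Decompressing an entire dump is slow for large files, so this is a correctness check rather than a fast one.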

Editing tools

 * Visual editor 0.1 — Trevor Parscal continued to work on the front-end of the visual editor, and on specifications for accessing the editing surface via the API. A hybrid rendering approach appears to be the best strategy for the visual editor. Neil Kandalgaonkar continued to work on the middleware, DOM and transactions. Neil also continued to work on a demo to integrate MediaWiki and Etherpad. Together with Alolita Sharma, Trevor and Neil planned their upcoming sprints. Neil and Trevor are posting about their work to the parser email list.
 * FlaggedRevs — Aaron Schulz improved user preferences and changed the way statistics are stored in the database, among other minor improvements. Chad Horohoe helped review the backlog of unreviewed commits.


 * Non-Roman character set localization — Alolita Sharma published two RfPs to assemble a dedicated team to tackle localization issues from a feature development and outreach perspective. She is currently reviewing applications and contracts.

Content Quality and Editorial Tools

 * Article Feedback — New features were added in June, including a dashboard that tracks articles receiving low ratings. In response to community feedback, tooltips were added to provide more information on the meaning of the star ratings. Roan Kattouw is currently implementing the UDP back-end to provide clicktracking metrics to assess user engagement. The community provided feedback and bug reports, and the development team addressed the concerns raised, for example by implementing a user preference to hide the tool. Dario Taraborelli continued to evaluate the data provided by the articles already showing the feature. The incremental roll-out to all articles on the English Wikipedia is planned to be completed by mid-July.

Participation and editor retention

 * WikiLove 1.0 — Jan Paul Posma completed the back-end work and fixed bugs. Roan Kattouw reviewed the code and enabled the extension on a private production wiki for testing. The feature was then enabled on a public prototype wiki that was used for informal user testing. The extension is planned to be deployed to production wikis on June 30.
 * MoodBar 0.1 — Brandon Harris updated the feature's design, while Andrew Garrett and Timo Tijhof have completed a first iteration of development on both front-end and back-end (see code in SVN).
 * StructuredProfile — This possible feature aims to make it easier for new editors to fill out their profile pages with meaningful information about their background and interests, and surface select profile information to experienced editors within lists such as recent changes, watchlist, etc. The ideas are still in development, and feedback is welcome on the StructuredProfile Talk page.


 * LiquidThreads 3.0 — WMF work on this project was mostly on hold in June due to limited resources, which were assigned in priority to supporting the 2011 Board Election and the MoodBar.

Multimedia Tools

 * Upload wizard — Neil Kandalgaonkar continued to fix bugs, and added the ability to show thumbnails before upload in modern browsers.

MediaWiki infrastructure

 * ResourceLoader 2.0 — This project was mostly on hold in June due to the lack of engineering resources. Work is planned to resume in July.

Wikimedia Labs

 * Media projects — Michael Dale continued to address Brion Vibber's comments from code review, and update and fix the TimedMediaHandler code. He also started to work on a test plan to perform user experience testing on a prototype.
 * Parser — Brion Vibber continued to work on the parser plan, and also moved the "parser playground" gadget to an extension. He invited the developer community to use it and provide feedback (read more).

Mobile

 * Mobile Research — In June, we completed our fieldwork in Brazil, consisting of 16 interviews, led by Parul Vora and Mani Pande in São Paulo, Salvador and Porto Alegre. We conducted extensive in-home interviews with three kinds of participants: readers of Wikipedia on a mobile phone, potential mobile readers (i.e. who currently use a computer, but could become mobile readers) and editors (primarily of the Portuguese Wikipedia, and to a lesser degree the English Wikipedia). We also received about 6 proposals from US firms in response to our RfP, to conduct research in three cities in the US. The mobile survey is scheduled to be launched at the end of July.
 * Mobile site rewrite — Tomasz Finc sent a call for testers to help test the prototype in English, Japanese and Hebrew. Feedback is now being addressed by the mobile team, which is tracking fixes and new feature requests in Bugzilla. Patrick Reilly and Asher Feldman also worked together to profile the MobileFrontend extension (formerly "PatchOutputMobile") to prepare it for deployment. We're now looking at how to integrate it with our Varnish and Squid caching architecture, so that we can have the advantages of the WURFL mobile device database with acceptable performance.
 * Mobile site issues — In June, we noticed a lot of 500 errors for image resources on our mobile platform. Asher Feldman upgraded the mobile Ruby gateway to a newer version of Ruby and Passenger, which cleared up a lot of production issues and cut our service times dramatically.

Fundraising support

 * 2011 Fundraiser — Arthur Richards, Ryan Kaldari and Katie Horn are wrapping up their first development sprint, and preparing for the next one beginning on July 5. Their work focuses on new features for CentralNotice to better facilitate banner management, as well as back-end enhancements to the donation processing pipeline.

Offline

 * Wikipedia version tools — GSoC student Yuvi Panda continued to port the WP 1.0 bot to a MediaWiki extension. Mentored by Arthur Richards, and supported by WP 1.0 Bot author/maintainer User:CBM, Yuvi implemented an assessment template processing feature, and is now working on a WP 1.0 bot replacement feature that will automatically include real-time assessment statistics on project pages. A feature to filter and select articles based on assessment criteria is planned to be added in July.


 * Collections — Tomasz Finc discussed future work with PediaPress, Wikimedia Italia and others. Possible directions include scaling the Collections extension to output much larger collections, integrating it with the new Kiwix download manager, improving general user experience and making it a general purpose solution for article selection.
 * Kiwix UX initiative — We posted the videos from our Berlin usability testing session, and released the next beta of Kiwix. Users of Kiwix can now easily download new openZIM files right within the interface. We're looking at connecting it to the Collections extension (see above) so that anyone can easily download new book collections. If you're interested in participating, please check out our volunteer program. We're especially in need of an expert who can help us with some complex libtool issues.

MediaWiki Core

 * MediaWiki 1.17 — Tim Starling announced the release of MediaWiki 1.17.0. Besides many bug fixes, MediaWiki 1.17 features a new install wizard, the ResourceLoader (a tool to load JavaScript and CSS assets), category sorting improvements, and improved localization (see full release notes).


 * MediaWiki 1.18 — Thanks to the efforts of the code review team, the backlog of unreviewed commits for MediaWiki 1.18 was drastically reduced in June (see chart). Mark Hershberger started a discussion about which extensions to bundle with MediaWiki 1.18.
 * Code review management — The backlog of unreviewed commits continued to decrease in June. A long but productive discussion among developers took place on the wikitech-l list about how to further improve the code review process. It led to a proposal for a "20% policy", according to which every eligible Wikimedia engineer would spend 20% of their time doing service work that directly benefits the rest of the community.


 * Heterogeneous deployment — Priyanka Dhanda and Tim Starling added features and improved the code, which is now in SVN. Priyanka used it to deploy different versions of MediaWiki on a prototype. Tim and Priyanka are now discussing edge cases and remaining tasks.


 * HipHop support — HipHop support is still planned to be part of MediaWiki 1.20. In the meantime, we're looking for volunteers to help us package it for different distributions. Please contact Sumana Harihareswara or the wikitech-l mailing list if you have experience with packaging or would like to get involved in this area.


 * Disk-backed object cache — A deployment was attempted on June 27, but had to be rolled back. Tim Starling and Domas Mituzas are investigating what went wrong, and another deployment attempt will be scheduled soon.


 * Academic publications authentication proxy — Chad Horohoe started a project whose goal is to allow selected Wikimedians to access third-party academic publishing sites to help with verifiability. The authentication challenges this entails are not trivial; Ryan Lane is also involved in this project, particularly because of his previous experience with OpenID (which may be used as a mechanism for tying into CentralAuth).


 * API maintenance — Sam Reed continued to fix bugs and to add new features to the MediaWiki API.


 * Shell requests — Priyanka Dhanda continued to process shell requests, after it appeared that a harmonization of wiki configuration options wouldn't be a significant time saver.


 * Projects on hold — The App-level monitoring and Configuration management projects were delayed in June, in favor of other work like the 1.17 release. Some of them will be resumed in July.

Wikimedia analytics

 * Wikimedia Report Card 2.0 — Erik Zachte and Nimish Gautam started a development sprint and worked on the back-end infrastructure, supported by Asher Feldman and Sam Reed. The information stored in a database is accessed via a new MediaWiki extension ("MetricsReporting", see in SVN), and the visualization part uses jqPlot. The team hopes to demonstrate a prototype for the next report card in early July.

Technical Liaison; Developer Relations

 * Bugmeistering — Mark Hershberger continued conducting bug triages to surface issues that require attention or decisions; this month these meetings switched from phone to IRC to improve transparency and accessibility to the community. After helping developers wrap up the 1.17 tarball, he started looking at 1.18 bugs, and led a triage to narrow down the list of open bugs blocking deployment of MediaWiki 1.18 on Wikimedia Foundation sites. He also worked on several concerns raised by the community, such as enabling "International" numerals on Hindi wiki with Priyanka Dhanda's help, and right-to-left and extension bundling issues. He also did internal knowledge sharing on his Bugzilla API client.
 * Summer of Code 2011 — Our eight Summer of Code students continued working on their projects full-time, and all are now committing code. WMF staffers and community members are mentoring the students, assisted by Sumana Harihareswara. They are preparing for their mid-term evaluations, to take place in mid-July.
 * Engineering project documentation — Guillaume Paumier finalized the infrastructure for project pages, using templates and transclusion. Because of the tools' limits, full automation wasn't possible. He also continued to update project pages and statuses. A sprint is planned in July to catch up with the documentation of all projects.


 * Access to Subversion — Volunteer development coordinator Sumana Harihareswara is now the primary point of contact for commit access requests. About 7 new developers were granted commit access in June, including 2 Summer of Code students and 2 Wikimedia Foundation employees.


 * translatewiki.net support — The list of 500 most used MediaWiki interface messages was updated to help translators focus on the messages with the most impact. The Translate extension may be reviewed in July to be used for content translation on Wikimedia sites, e.g. on meta-wiki.