Wikimedia Engineering/Report/2011/August

Major news in August includes:
 * Technical discussions at Wikimania and Developer Days;
 * Progress on HTTPS, and generally better processes in Operations;
 * The kick-off of the Internationalization and localization tools project;
 * New features in UploadWizard, including major work on customized campaigns for the Wiki Loves Monuments event;
 * MediaWiki 1.18 and the new Mobile platform approaching deployment readiness.


Recent events

 * Wikimania (August 2-7, Haifa, Israel) — A delegation of about a dozen Wikimedia engineers attended the Wikimania conference in early August, as well as the Developer Days and OpenZIM Developers Meeting. A special focus of the dev days was the recruitment and mentoring of new MediaWiki developers. Extensive notes were taken, with the hope of turning them into perennial on-wiki documentation.


 * Google Tech Talk (August 25, Mountain View, California) — Erik Möller, Rob Lanphier and Alolita Sharma presented a tech talk at Google, giving a broad update on Wikimedia's engineering projects to refresh the understanding of Googlers and other interested parties. The presentation slides are available, and the video will be made available through the GoogleTechTalks YouTube channel.

Upcoming events

 * New Orleans hackathon (14-16 October, New Orleans) — Ryan Lane and Sumana Harihareswara are organizing a coding event on the theme "The infrastructure of innovation". The hackathon's goal is to advance Wikimedia's tools and infrastructure; a major focus will be Wikimedia Labs, starting with the dev-ops virtualization cluster. Other areas of work include gadgets/extensions/tools support, authorization/authentication strategy, and general training and hacking.
 * Check out the Software deployments page on the wikitech wiki for up-to-date information on the upcoming deployments to Wikimedia sites.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles. The following positions opened in August:
 * Product Manager (New Editor Engagement)

New and open Requests for Proposals:
 * Engineering Outreach
 * Development and Operations Engineer
 * Article Feedback Feature

The following positions are still open: Product Manager (Analytics), QA Lead, Operations Engineer (Networking), Director of Features Engineering, Systems Engineer (Data Analytics), Software Developer (Rich Text Editing, Features), Software Developer (Front-end), Software Developer (Back-end), Product Manager (Mobile) and Software Developer (Mobile).

Short news

 * Jeremy Postlethwaite joined as Software Engineer for Fundraising (announcement).
 * The team for the Internationalization and localization tools (I18n/L10n) project was put together in August. It consists of Siebrand Mazeland (Product Manager, Localization), Niklas Laxström & Santhosh Thottingal (I18n/L10n Software Engineers), and Gerard Meijssen (I18n/L10n Outreach Consultant).
 * Contractor Aaron Schulz is now a full-time employee, working as Software Developer Back-end (announcement).

Training and Process Improvement

 * Operations staff meeting — The Operations team got together the week of August 22. The goals of the meeting were to: improve and share site recovery knowledge (documentation and training); share knowledge of new project designs; review and prioritize operations projects; document and communicate our Eqiad data center buildup milestones; and develop the RT management process.

Site infrastructure

 * Tampa Data Center — Mark Bergsma put the second router into production, which means we now have router redundancy in our Tampa network infrastructure. Follow-up work is underway to fully implement automatic (hot) router failover, which should be completed by mid-September. Mark also standardized our LVS implementation and puppetized the configuration. The installation of new application servers was delayed because our data center contractor in Tampa left us. Other highlights in August include a software upgrade of our Squid servers and the resolution of upload performance issues.


 * Virginia Data Center — Asher Feldman deployed the new Mobile Varnish servers in our Eqiad data center. All six LVS servers are ready and two of them are now in production, load-balancing the mobile Varnish servers. Ben Hartshorne and Asher also created two new database servers for the Summer of Research interns, in addition to the original one created last month. Finally, the team updated the backup procedures documentation to reflect Eqiad being our (current) key backup and recovery store.


 * Media Storage — Russ Nelson refactored code in the SwiftMedia extension following Tim Starling's code review feedback; he also asked users for peculiar use cases involving file manipulation. Ben Hartshorne started to plan for the deployment and switchover on Wikimedia sites.


 * HTTPS — Roan Kattouw and Ryan Lane have been fixing issues that surfaced during the internal testing period. New servers were ordered for AMS and TPA to handle SSL termination. In the meantime, Ryan has been setting up SSL servers in Eqiad, enhancing Varnish to deal with the X-Forwarded-For and X-Forwarded-Proto HTTP headers, and making the necessary changes to Squid. On August 31st, HTTPS was enabled on wikimediafoundation.org and Wikimedia Commons.
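
The X-Forwarded-For and X-Forwarded-Proto headers mentioned in the HTTPS item are what let a back-end server learn the original client address and scheme once SSL has been terminated in front of it. The following is only a minimal sketch of that general idea, not the Varnish or Squid configuration used on Wikimedia sites; the function name, its parameters and the trusted-proxy set are hypothetical, and the header keys are assumed to be spelled exactly as shown.

```python
def original_request_info(headers, peer_ip, peer_scheme, trusted_proxies):
    """Return (client_ip, scheme) for a request that may have crossed proxies.

    headers, peer_ip, peer_scheme and trusted_proxies are illustrative inputs:
    a dict of request headers, the address and scheme of the direct
    connection, and a set of proxy addresses we control.
    """
    # Forwarded headers are only believed when the direct peer is one of our
    # own proxies; otherwise any client could spoof them.
    if peer_ip not in trusted_proxies:
        return peer_ip, peer_scheme

    # X-Forwarded-For is a comma-separated chain: client, proxy1, proxy2, ...
    raw_chain = headers.get("X-Forwarded-For", "")
    chain = [hop.strip() for hop in raw_chain.split(",") if hop.strip()]

    # Walk the chain from the nearest hop outwards and stop at the first
    # address that is not one of our proxies.
    client_ip = peer_ip
    for hop in reversed(chain):
        client_ip = hop
        if hop not in trusted_proxies:
            break

    # X-Forwarded-Proto carries the scheme seen by the SSL terminator.
    scheme = headers.get("X-Forwarded-Proto", peer_scheme).lower()
    if scheme not in ("http", "https"):
        scheme = peer_scheme
    return client_ip, scheme
```

With a trusted terminator at 10.0.0.1 and a trusted cache at 10.0.0.2, a request carrying `X-Forwarded-For: 203.0.113.7, 10.0.0.2` and `X-Forwarded-Proto: https` is attributed to 203.0.113.7 over HTTPS, while the same headers sent directly by an untrusted peer are ignored.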

Testing environment

 * Virtualization test cluster — Work has resumed on this project. The puppet configuration used in the production sites has been split into public and private repositories, and all sensitive information has been moved to the private repository. Gerrit has been configured, and the public puppet configuration will soon be moved into a public repository there. Labs LDAP and SVN LDAP are currently being merged, so that SVN users will more easily have access to Labs.

Backups and data archives

 * Data Dumps — August runs for English Wikipedia started on the 3rd of the month and completed on the 13th, after a couple of restarts of the history phase of the dumps. We also cleared out the backlog of earlier incomplete dumps for May, bringing us up to date. In the meantime, we did yet more tweaking of the filesystem to try to reduce file corruption and truncation issues. Finally, code to "checkpoint" the history dumps by writing a sequence of smaller files was announced and tested, and it will be used for the next production run in September.
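
The benefit of checkpointing a long run into a sequence of smaller files is that a restart only has to redo the pieces that were not finished. The sketch below illustrates that general pattern under stated assumptions; it is not the dump scripts themselves, and the file naming, the `per_file` parameter and both function names are hypothetical.

```python
import os


def chunk_path(out_dir, index):
    """Hypothetical naming scheme for the numbered checkpoint files."""
    return os.path.join(out_dir, f"history-part{index:04d}.txt")


def write_checkpointed(records, out_dir, per_file):
    """Write records as numbered files and return the indexes written now.

    Files that already exist are treated as completed checkpoints and are
    skipped, so calling this again after a failure resumes the run.
    """
    written = []
    total_chunks = (len(records) + per_file - 1) // per_file
    for index in range(total_chunks):
        final = chunk_path(out_dir, index)
        if os.path.exists(final):
            continue  # finished in an earlier run
        # Write under a temporary name first so that a crash mid-write
        # never leaves a truncated file that looks complete.
        tmp = final + ".inprogress"
        with open(tmp, "w", encoding="utf-8") as handle:
            for record in records[index * per_file:(index + 1) * per_file]:
                handle.write(record + "\n")
        os.replace(tmp, final)  # the rename marks the checkpoint as done
        written.append(index)
    return written
```

Writing to a temporary name and renaming afterwards is a design choice of this sketch, chosen because it keeps truncated output from being mistaken for a finished checkpoint.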

Editing tools

 * Visual editor — Trevor Parscal and Inez Korczynski worked on a transaction-based model for the visual editor, where the document is built as a series of events (instead of being saved in its entirety at every change), which makes it easier to undo actions. Neil Kandalgaonkar continued to work on real-time collaboration and is close to presenting a demo of Etherpad working inside a MediaWiki edit window. Ian Baker investigated and started to work on a chat system to be integrated into the concurrent editing interface, for collaboration and live help. More details are available on the wikitext-l mailing list.
 * Internationalization and localization tools — Siebrand Mazeland, Niklas Laxström and Gerard Meijssen joined the project in August. Niklas focused on code review for MediaWiki 1.18 regarding internationalization issues; he also introduced more flexible language fallback sequences. Siebrand worked on the product roadmap for 2011-2012, and started to plan hackathons & localization sprints in India in November. The Kiwix offline app was added to translatewiki.net, where it can now be localized.
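
The transaction-based model described in the visual editor item can be illustrated with a toy example: the document is only ever changed by applying small, invertible transactions, so undoing an action means applying the inverse of the last one. This is a minimal sketch of the general technique on plain text, not the visual editor's actual data model; the `Transaction` and `Document` classes and their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One small change: insert or remove `text` at `offset`."""
    kind: str  # "insert" or "remove"
    offset: int
    text: str

    def inverse(self):
        # Removing what was inserted (or re-inserting what was removed)
        # at the same offset restores the previous state.
        other = "remove" if self.kind == "insert" else "insert"
        return Transaction(other, self.offset, self.text)


class Document:
    def __init__(self):
        self.text = ""
        self.history = []  # applied transactions, oldest first

    def _run(self, tx):
        if tx.kind == "insert":
            self.text = self.text[:tx.offset] + tx.text + self.text[tx.offset:]
        elif tx.kind == "remove":
            end = tx.offset + len(tx.text)
            if self.text[tx.offset:end] != tx.text:
                raise ValueError("transaction does not match the document")
            self.text = self.text[:tx.offset] + self.text[end:]
        else:
            raise ValueError(f"unknown transaction kind: {tx.kind}")

    def apply(self, tx):
        self._run(tx)
        self.history.append(tx)

    def undo(self):
        # Undo replays the inverse of the most recent transaction;
        # with an empty history it does nothing.
        if self.history:
            self._run(self.history.pop().inverse())


doc = Document()
doc.apply(Transaction("insert", 0, "Hello"))
doc.apply(Transaction("insert", 5, " world"))
doc.undo()  # doc.text is "Hello" again
```

Because only the transactions are recorded, the history stays small compared with storing a full copy of the document after every change.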

Participation and editor retention
 * Article feedback — Development was completed in July; August was mostly devoted to data analysis. Dario Taraborelli analyzed the volume of edits, and couldn't yet find any statistically significant difference in edits before and after the activation of the feature on English Wikipedia articles. In order to clarify licensing and privacy policies regarding the data (and to facilitate its reuse by external tools & researchers), an explanation was published, stating that user feedback data (from Article feedback and MoodBar, for example) were considered public contributions just like any edit.
 * WikiLove — Development was completed in July, with minor fixes in August. Dario Taraborelli analyzed data from the usage of the tool on the English Wikipedia, which showed that WikiLove messages were disproportionately sent by new editors. Data specifications were published in preparation for the release of data dumps; Summer of Research fellows also worked on an algorithm to automatically categorize WikiLove comments.
 * Feedback Dashboard — Brandon Harris started to design a dashboard to surface and sort data from MoodBar comments. It could become a help center where experienced users can easily answer questions and concerns from new users.
 * GlobalProfile — Brandon Harris presented this project at Wikimania, where it was well received and echoed other talks encouraging better tools for social interactions between users.
 * QuickComments — Brandon Harris started to design this feature proposal after it appeared that many new users were using the WikiLove tool to send messages to other users, because they couldn't find any other way. Initial designs add a new icon at the top of user pages, which opens a modal overlay to leave a new message.

Multimedia Tools

 * UploadWizard — Ian Baker worked on the TitleBlacklist API, as well as bug fixes for UploadStash. Together with Neil Kandalgaonkar, he investigated video thumbnail issues. Ian also released Neil's message string library (that leverages wikitext, jQuery, and internationalization tools) after packaging it into a MediaWiki extension. Contractor Jan Gerber added XHR FormData support to UploadWizard, and chunk uploads. Jeroen De Dauw's code to support customized campaigns was deployed to the Commons prototype wiki, then to production on Commons; Jeroen also added new features based on the feedback from Wiki Loves Monuments organizers. Neil reviewed Jeroen's and Jan's code, and generally prepared the code for deployment.

MediaWiki infrastructure

 * ResourceLoader — Roan Kattouw and Timo Tijhof got together in late August to do back-end work on Global gadgets. They improved the format for defining gadgets, which will eventually be done via a user interface. Gadget internationalization is now also fully supported and happening in a MediaWiki: page for each message, as opposed to being a large blob in the gadget source.

Wikimedia Labs

 * Multimedia — Michael Dale completed the fixes suggested in code review, and continued to prepare the extension for deployment. Jan Gerber fixed an ffmpeg seek issue and cleaned up transcode key names.

Mobile

 * Mobile Research — Mani Pande and Parul Vora continued to synthesize the findings from field research in India and Brazil; a research page was also created on meta. They launched user experience research in the US with AnswerLab, and have started recruiting readers and editors for ethnographic research to be conducted in San Francisco, Dallas and Chicago. The mobile survey was prepared in LimeSurvey, and translations are ongoing.


 * MobileFrontend —

Fundraising support

 * 2011 Fundraiser — Ryan Kaldari built views and filters to facilitate reviewing changes to CentralNotice campaigns and banners. Katie Horn added an API to the ContributionTracking extension and made minor bug fixes to some of our Drupal/CiviCRM modules. Jeremy Postlethwaite joined the team and began getting familiar with the code base. Arthur Richards added a new log parser and made bug fixes to the contribution auditing framework. Jeff Green analyzed our server architecture and began working to improve resiliency and security.

Offline

 * Wikipedia version tools — Yuvaraj Pandian successfully completed his Google Summer of Code project. He achieved feature parity with User:CBM's WP 1.0 bot and added the ability to save selections of articles, manually modify/delete the contents of a selection, and export a CSV of article selections. Yuvaraj reached out to the community to review his code, and plans to continue work on the extension, which still requires extensive testing and bug fixing.
 * Kiwix UX initiative — Following the release of Kiwix 0.9 beta1 in July, Tomasz Finc invited users to test it, which surfaced issues with the Ubuntu package. A mailing list dedicated to testing Kiwix is also available.

Platform Engineering
An overview of the Platform engineering team was published on the Wikimedia blog in August.

MediaWiki Core

 * MediaWiki 1.18 — Code review on MediaWiki 1.18 progressed well in August and should be over by mid-September (see chart). Engineers have been going through the remaining revisions in the revision report and identifying those that needed fixing. Actually resolving the issues discovered took longer, but the rate eventually improved. Gradual deployment to Wikimedia wikis is planned over September, using the newly completed heterogeneous deployment system.
 * Code review management — Code review efforts largely focused on MediaWiki 1.18 in August. Still, code review of trunk also remained under control (see chart), which is encouraging, since it's likely to lead to a shorter release cycle. Revisions are now tagged more systematically where specific expertise is needed, e.g. with "front-end", "database" or "i18n"; this also makes it easier to involve volunteers and to hold focused sign-off triages. Wikimania's Developer Days also included a code review training session.
 * Heterogeneous deployment — Aaron Schulz completed most of the features, and Tim Starling reviewed Aaron's code. Only a few minor fixes were required; the system should be deployed in early September, so it can be used for the deployment of MediaWiki 1.18 to Wikimedia sites.
 * API maintenance — Sam Reed continued to maintain the API and to work on the Report Card API component, and reviewed API patches from volunteer John du Hart.
 * Shell requests — Sam Reed continued to go through the backlog of shell requests and to process them. He notably enabled the AbuseFilter extension on all Wikimedia wikis.
 * Continuous integration — Chad Horohoe continued to set up the virtual machine environment, while the Operations team set up the physical hardware in the Virginia data center. The final server will use Jenkins instead of CruiseControl.
 * Wikitext scripting — Volunteer Victor Vasiliev worked on a MediaWiki extension to embed scripts into pages; this was a result of discussions over the years about replacing ad-hoc template- and ParserFunctions-based logic with a more efficient and powerful solution. Tim Starling discussed the extension with Victor to become more familiar with his work, and researched other alternatives. Tim wrote a PHP extension embedding a Lua interpreter, and added support for it to the existing Lua MediaWiki extension for backward compatibility.
 * Projects on hold — The HipHop deployment, AcademicAccess and Disk-backed object cache projects were on hold in August.

Wikimedia analytics

 * Wikimedia Report Card 2.0 — Nimish Gautam and Sam Reed worked on allowing content from CSV files and from Google Spreadsheets into the dashboard. Nimish also mined data to identify editors by geography, and worked on a page views tab, using the WURFL library to estimate mobile page views and device capabilities.

Technical Liaison; Developer Relations

 * Bug management — Mark Hershberger held bug triage sessions on Mobile & PDF export/Collections. The bug triage page now lists past and upcoming triages, as well as notes and summaries when available.
 * Summer of Code 2011 — In August, the GSoC students finished their projects, and both students and mentors turned in their final evaluations; all seven remaining students passed. The students started to write to the wikitech-l mailing list to summarize what they finished and what still needs to be done. For example, Salvatore Ingala wrote an integration howto to guide other MediaWiki developers in merging his code into trunk. Students are expected to upload representative tarballs of their code into the Google Code portfolio repository.
 * Engineering project documentation — Guillaume Paumier continued to update project documentation pages and to write engineering reports.
 * Volunteer coordination and outreach — Sumana Harihareswara has been following up on contacts made at OSCON and Wikimania conferences. She has publicized the NOLA Hackathon and encouraged extension, gadget, script, tool, and template developers to attend. Additionally, she has been publicizing the work of the parser and visual editor team, encouraging code reviewers, and finding administrators and developers of other intensive MediaWiki installations to bring them into the larger MediaWiki ecology. In August, 9 developers were granted commit access: six volunteers and three Wikimedia Foundation employees.
 * MediaWiki architecture document — Greg Wilson, editor of the Architecture of Open Source Applications book, contacted the engineering department of the Wikimedia Foundation to offer to include a chapter on MediaWiki in volume 2 of the book, which presents the architecture of large-scale open-source projects and the decisions that led to it. Since it appeared that such a document would also be generally useful to help new developers dive into MediaWiki development, Guillaume Paumier and Sumana Harihareswara accepted the responsibility of leading the collaborative writing of the document by the MediaWiki community.