Wikimedia Engineering/Report/2011/February

Major accomplishments this month include:
 * the racking party at our new data center in Virginia
 * the Data Summit that happened in early February in California
 * the release of Editor Trends study data and tooling
 * the painful, but ultimately successful, deployment of MediaWiki 1.17 to all Wikimedia wikis.

Note: In the past, each "monthly engineering update" reported on what was accomplished during the previous month; the previous "February update" therefore covered what we did in January. To avoid any ambiguity, and to be more consistent with the other Wikimedia reports, we are now naming each report after the month it covers. This means this "February report" is about what we did in February.

Recent events

 * Data Summit (February 4, California) — Many fruitful discussions took place during this working session. Notes are available from the working groups on parsers, structured data and analytics.
 * FOSDEM 2011 (February 5-6, Brussels, Belgium) — Arthur Richards and Tomasz Finc attended and engaged in discussions about data, CiviCRM and Drupal. They also gave an overview presentation of the current state of Wikimedia analytics.
 * GNUnify 2011 (February 11-12, Pune, India) — This year's GNUnify Wikimedia track was an opportunity to present the general MediaWiki architecture, how to hack MediaWiki, the use of Drupal and CiviCRM at the Wikimedia Foundation, and the current and future state of Wikimedia mobile. An Android version of WikiSnaps was also developed there (see below). Alongside the technical tracks, numerous Wikipedians attended and gave presentations on the Schools offline projects, challenges within India, and the basics of editing.

Upcoming events

 * Berlin Hackathon 2011 (May, Berlin) — If you haven't already, please participate in the poll to choose the date of this coding event. Participants are also listing topics to work on.
 * Wikimania (August 2-7, Haifa, Israel) — This year's Wikimania will be preceded by two days of hacking (August 2-3); the actual conference (August 4-7) will also include Technology tracks.

Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up this year, and we really love talking to active community members about these roles. The following positions are currently open:
 * Volunteer Development Coordinator
 * Performance Engineer
 * Software Developer (Features)
 * Software Developer (Mobile)
 * Data Analytics Engineer
 * Operations Engineer
 * Senior QA Engineer

We are also looking for a contractor in the Netherlands who will support the Operations team in designing and maintaining the Wikimedia network(s), and perform on-site work in the data center facilities in Haarlem and Amsterdam.

In addition, we hope to post the following positions over the next few months:
 * Rich Text Editor Engineer
 * Release Engineer
 * Technical Writer

Short news

 * Visitors — Ward Cunningham started a year of monthly "in residence" visits to the Wikimedia office in San Francisco.
 * Hires — Sumana Harihareswara was hired as a contractor to help out with Google Summer of Code 2011 and the Berlin Developer meeting.

Operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: Nearly all hardware has been delivered to the data center. More than 50 pallets of equipment have been unboxed, stacked and installed in the 16 racks by a 4-person team. Almost everything has been cabled, and we are working on the finishing touches, as well as the initial setup of all devices to make them available for management on the network. In March, configuration of the first clusters of servers and services will begin, while we wait for network transport and transit services to be installed.
 * Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
 * Status: Contractor Russell Nelson has installed and deployed Swift on a test cluster of three machines. Some code has been written to integrate Swift with MediaWiki's thumbnail generation, as well as with Squid, the caching proxy software used on the "upload" media serving cluster. We are still fixing some bugs and doing preliminary testing before we can deploy this test setup to serve a small portion of our media and replicated traffic (read more about distributed file storage choices).
 * Program manager: Mark Bergsma

Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status: A new OpenStack release, which contains the software features we need, has just come out. However, this project was also delayed by the build-out of the new data center. We expect to have the virtualization test cluster production-ready in March.
 * Program manager: Mark Bergsma

Backups — Improvement of backup coverage of Wikimedia-hosted data.
 * Status: We have purchased a dedicated storage solution, which should arrive in March and will improve the reliability of backups for part of our data. Once servers in the new data center are online, and our private connection between Tampa and Ashburn is up, we will also be able to replicate all data between the two data centers.
 * Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: Dumps were suspended for the upgrade to MediaWiki 1.17, and delayed by the difficulties encountered during its deployment. They are now running again, and their performance is being tested. With the new version, they include the byte length of revisions (a popular request, implemented by Rob Lanphier). To make dumps faster and more regular to produce, as well as easier to reuse, Ariel Glenn is now looking into producing dumps in many small pieces.
 * Program manager: Mark Bergsma
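
Producing dumps "in many small pieces" could, for example, mean splitting the range of page IDs into fixed-size chunks that separate jobs dump independently, so that a failed piece can be rerun on its own. The sketch below is a minimal illustration under that assumption; page_id_chunks and revision_byte_length are hypothetical helpers, not part of the actual dump scripts. It also shows that a revision's byte length is the size of its wikitext encoded as UTF-8, which differs from its character count for non-ASCII text.

```python
from typing import Iterator, Tuple


def page_id_chunks(max_page_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) page ID ranges covering 1..max_page_id.

    Hypothetical helper: each range could be dumped by a separate job.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = 1
    while start <= max_page_id:
        end = min(start + chunk_size - 1, max_page_id)
        yield (start, end)
        start = end + 1


def revision_byte_length(wikitext: str) -> int:
    """Length of the revision text in UTF-8 bytes, not in characters."""
    return len(wikitext.encode("utf-8"))
```

For example, list(page_id_chunks(10, 4)) gives [(1, 4), (5, 8), (9, 10)], and revision_byte_length("é") is 2 even though the string has a single character.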

Content Quality and Editorial Tools
Article Feedback — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: The deployment to our prototype has surfaced additional feature requirements that we've now addressed. Now that MediaWiki 1.17 has been successfully deployed, we can release the latest version of the Article feedback tool on the English Wikipedia, as part of our pilot experiment this quarter. Requirements for the next version (3.0) are being drafted.
 * Program manager: Alolita Sharma

Pending Changes — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.
 * Status: Developer Aaron Schulz has focused on bug fixes. Further development is waiting for the English Wikipedia community to come to a consensus regarding what the future of the trial should be. A new Request for Comment was started for this purpose.
 * Program manager: Alolita Sharma

Personal image filter — A feature to allow users to selectively hide images on a wiki.
 * Status: Following the 2010 Wikimedia Study of Controversial Content, Brandon Harris has created mockups of the feature, including initial UI design recommendations, in collaboration with the Community department and Board member Phoebe Ayers, who also sent an update. They will be presented to the Board of Trustees by the Strategic product team.
 * Program manager: Alolita Sharma

Review system — An interface for external reviews of Wikipedia content.
 * Status: At the request of the Strategic Product Department, Guillaume Paumier researched and compared previous and current initiatives of quality review of Wikipedia content. The analysis of goals and needs of both Wikipedians and "experts" led to a set of draft requirements for an open review system for Wikipedia, as well as an API and user interface for quality indicators.
 * Commissioned by: Erik Möller

Discussions and Interactions
Liquid Threads — A feature that brings threaded discussion capabilities to Wikimedia projects and MediaWiki.
 * Status: Andrew Garrett has published documentation on upcoming back-end and architecture changes. New design specifications have been published by Brandon Harris as well. A discussion was started on the "gender gap" mailing list about how this new discussion system could improve interactions between participants.
 * Program manager: Alolita Sharma

SimpleSurvey 2.0 — A MediaWiki extension to create and run surveys in MediaWiki.
 * Status: In our work on the Article Feedback tool, we used some functionality from the existing SimpleSurvey extension. In order to make it more robust, Trevor Parscal has been evaluating the existing codebase, refactoring the extension, and consolidating code from other survey extensions. SimpleSurvey will also help us conduct small surveys to support strategic research.
 * Program manager: Alolita Sharma

Editing features
Non-Roman character set localization — A set of tools to facilitate editing in languages using a non-Roman alphabet.
 * Status: Volunteer developer Junaidpv created the Narayam extension for MediaWiki, which adds input methods for some Indic scripts. Roan Kattouw refactored it heavily in order to facilitate its future deployment. We're planning to create a team in India to continue to work on Indic scripts, as a first step in our efforts to support non-Roman alphabet editors.
 * Program manager: Alolita Sharma
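
Input methods of this kind typically map sequences of typed Latin letters onto characters of the target script, preferring the longest matching sequence. Below is a minimal sketch of greedy longest-match transliteration, using a toy four-rule Devanagari table made up for illustration; it does not reproduce Narayam's actual rule format or tables, and a real input method applies its rules incrementally as the user types.

```python
from typing import Dict


def transliterate(text: str, table: Dict[str, str]) -> str:
    """Greedy longest-match transliteration of Latin input into another script."""
    longest = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first, so "ka" wins over "k".
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No rule matches: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Toy, hypothetical subset of Latin-to-Devanagari rules (not Narayam's tables).
DEVANAGARI_TOY = {"k": "क्", "ka": "क", "ma": "म", "la": "ल"}
```

With this table, transliterate("kamala", DEVANAGARI_TOY) yields कमल, while a lone "k" becomes क् (consonant with virama).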

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: Ryan Kaldari has joined Neil Kandalgaonkar to fix bugs and prioritize the work to be done for a 1.0 release of UploadWizard.
 * Program manager: Alolita Sharma

JavaScript parsing library — A JavaScript parsing library for wikitext.
 * Status: Neil Kandalgaonkar implemented a JavaScript parser for wikitext using a parsing expression grammar (PEG). It will allow JavaScript tools to support internationalization, templating and other features; it will especially benefit multimedia and Media labs tools. Integration with ResourceLoader is underway.
 * Program manager: Alolita Sharma
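
Parsing expression grammars work by ordered choice: alternatives are tried in order, the first one that matches wins, and a failed alternative backtracks to where it started. The sketch below illustrates that idea on a tiny, hypothetical wikitext subset (internal links only). It is written in Python rather than JavaScript, and it is not the grammar Neil Kandalgaonkar wrote; Link, parse_link and parse are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Link:
    """An internal link such as [[Target|label]]; label is None without a pipe."""
    target: str
    label: Optional[str]


# A parsed document is a list of plain-text runs and Link nodes.
Node = Union[str, Link]


def parse_link(src: str, pos: int) -> Optional[Tuple[Link, int]]:
    """link <- "[[" target ("|" label)? "]]"

    Returns (node, next position) on success, or None so the caller can
    backtrack to pos and try the next alternative, as a PEG does.
    """
    if not src.startswith("[[", pos):
        return None
    end = src.find("]]", pos + 2)
    if end == -1:
        return None  # unclosed link: this alternative fails
    target, sep, label = src[pos + 2:end].partition("|")
    if not target:
        return None  # target <- (!"]]" !"|" .)+ needs at least one character
    return Link(target, label if sep else None), end + 2


def parse(src: str) -> List[Node]:
    """document <- (link / char)* with adjacent characters merged into text runs."""
    nodes: List[Node] = []
    text: List[str] = []
    pos = 0
    while pos < len(src):
        result = parse_link(src, pos)  # ordered choice: try link first
        if result is not None:
            if text:
                nodes.append("".join(text))
                text = []
            link, pos = result
            nodes.append(link)
        else:
            text.append(src[pos])  # fall back to consuming one character as text
            pos += 1
    if text:
        nodes.append("".join(text))
    return nodes
```

For instance, parse("See [[Main Page|home]] now") returns ["See ", Link("Main Page", "home"), " now"], while an unclosed "[[" is simply kept as text.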

MediaWiki infrastructure
Resource loader — A feature to improve the load times for JavaScript and CSS in MediaWiki.
 * Status: The deployment of MediaWiki 1.17 to Wikimedia sites has surfaced many bugs. Roan Kattouw and Trevor Parscal have worked on fixing them, and were also available for an IRC office hour to help JavaScript maintainers fix compatibility issues. Migration guides are now available for users and developers.
 * Program manager: Alolita Sharma

Community feature prototyping
In February, the Community and Tech departments started a joint experiment in which engineers are working even more closely with Community department staff. Developers are "embedded" in the Community department, to try out a more agile way to prototype features. Trevor Parscal started in this role in February, and will continue in March.

In February, Alolita Sharma and Brandon Harris also provided some support to the Outreach team, by discussing A/B testing requirements with Frank Schulenburg & Lennart Guldbrandsson.

Wikimedia Labs
Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status: Michael Dale has been working on the integration of TimedMediaHandler and the Add Media Wizard with the Resource Loader, by converting them from gadgets to MediaWiki extensions. A prototype for TimedMediaHandler is now available to showcase some of its new features.
 * Program manager: Alolita Sharma

Short news

 * Rich-text editor — In February, Trevor Parscal continued to assess technologies in preparation for the upcoming rich-text editor project for MediaWiki.

MediaWiki development
MediaWiki 1.17 deployment — Deployment of the latest MediaWiki version (1.17) to Wikimedia sites.
 * Status: In preparation for the planned deployment of MediaWiki 1.17 on February 8, all outstanding revisions were reviewed. The deployment was attempted twice that day, and eventually postponed because of major performance issues that caused an outage. The problems were investigated, and a new plan was published, based on heterogeneous deployment (meaning that not all wikis would run the same version of the software at the same time). Tim Starling and Roan Kattouw developed wmerrors, a PHP extension to display custom error pages for PHP fatal errors. On February 11, a first wave of small wikis was switched to MediaWiki 1.17. On February 16, other small and medium-sized wikis were switched. An attempt to deploy to our biggest wiki (en.wikipedia.org) resulted in a short outage. The English Wikipedia and all remaining wikis were successfully upgraded to MediaWiki 1.17 later that day (read the latest update). Many issues encountered this month were due to the large amount of code changed since the last release (almost 5,500 changes reviewed over 7 months). In the future, software deployments should be smaller and more frequent, reducing the risk of repeated outages.
 * Program manager: Rob Lanphier
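
Heterogeneous deployment means that, during the rollout, each wiki has to be mapped to the version it should run. A minimal sketch, assuming a simple per-wiki lookup built from ordered deployment waves; the wave lists, wiki names other than enwiki, and the function name are hypothetical, and the actual Wikimedia configuration mechanism is not described in this report.

```python
from typing import Dict, List


def versions_after_wave(
    waves: List[List[str]], completed: int, old: str, new: str
) -> Dict[str, str]:
    """Return the MediaWiki version each wiki should run after `completed` waves.

    Wikis in the first `completed` waves run `new`; all others stay on `old`,
    so both versions are live at the same time during the rollout.
    """
    upgraded = {wiki for wave in waves[:completed] for wiki in wave}
    return {
        wiki: (new if wiki in upgraded else old)
        for wave in waves
        for wiki in wave
    }


# Hypothetical rollout order: small wikis first, the largest wiki last.
WAVES = [["smallwiki1", "smallwiki2"], ["mediumwiki1"], ["enwiki"]]
```

After one completed wave, only the small wikis map to 1.17; in this sketch, backing out a problematic wave is just a matter of lowering completed.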

MediaWiki 1.17 release — The upcoming MediaWiki release.
 * Status: Now that MediaWiki 1.17 has been deployed to all Wikimedia wikis, remaining bugs are expected to surface and be fixed. We're hoping to release MediaWiki 1.17 soon for third-party users (see draft release notes), but problems related to DBMS support may delay it. Its main feature will be the Resource loader. It will also include category collation improvements. Developers are already discussing MediaWiki 1.18.
 * Program manager: Rob Lanphier

Test framework deployment — Creation of an automated test environment for MediaWiki using CruiseControl, Selenium, and PHPUnit.
 * Status: Foundation work on this was put on hold pending the 1.17 release. We're now planning on publishing an open request for proposals calling for developers to move this work forward. In the community, Markus Glaser continues to add support for database setup inside the Selenium framework.
 * Program manager: Rob Lanphier

Technical Documentation – Improvement of our technical documentation by making small, incremental improvements to the docs and docs process.
 * Status: The initial phase of this effort was wrapped up in February. We plan to put Foundation work on this on hold while we shift focus to Volunteer Developer services.
 * Program manager: Rob Lanphier

Wikimedia analytics
udp2log — A custom data analytics logging system.
 * Status: We initially attempted to deploy the multicast version of udp2log, but we discovered firmware problems in our routing infrastructure. Our plan is now to have a second machine receive unicast logging messages, which we will use for secondary services.
 * Program manager: Rob Lanphier
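
The change of plan is between multicast, where routers replicate each log packet to every listening host, and unicast, where each datagram is addressed to a single host. One way to feed a second machine over unicast is for the sender to emit a copy of every log line to each receiver. A minimal sketch under that assumption; send_log_line and the receiver list are hypothetical, and the report does not describe udp2log's wire format or which component sends the extra copy.

```python
import socket
from typing import Iterable, Tuple

# (host, port) of a machine collecting log lines.
Address = Tuple[str, int]


def send_log_line(sock: socket.socket, line: str, receivers: Iterable[Address]) -> None:
    """Send one log line as a separate unicast UDP datagram to each receiver.

    With multicast, the network would copy a single packet to every
    subscribed host; with unicast, the sender emits one copy per receiver.
    UDP is fire-and-forget, so a lost datagram is simply not delivered.
    """
    payload = line.encode("utf-8")
    for address in receivers:
        sock.sendto(payload, address)
```

In this setup, the primary collector and the secondary machine would each appear in receivers; their host names and ports are deployment details not given here.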

OpenWebAnalytics — Installation and customization of an Open Web Analytics (OWA) platform to process data to support decision-making.
 * Status: We're testing OWA integration on private wikis, with the goal of understanding its reporting characteristics and how sharing these reports publicly would work. We've begun a second engagement with OWA's author, Peter Adams, to ensure that it fully complies with our Privacy policy while still being able to publish summary reports. Now that our fundraiser has concluded, we're evaluating which public projects to run the next pilot on.
 * Program managers: Rob Lanphier & Tomasz Finc

Wikilytics — A toolkit to create data sets to analyze Editor Trends.
 * Status: During the Data Summit, Diederik van Liere released and presented the Python toolkit he developed as part of his work on data analytics for the Editor Trends Study. It is now available in SVN.
 * Program manager: Howie Fung
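
Data sets for editor-trend analysis are typically built by aggregating edit records per editor over time. A minimal sketch, assuming the input is a stream of (editor, timestamp) pairs and defining an "active editor" as one with at least min_edits edits in a calendar month; the default of 5 is an assumption here, and active_editors_per_month is a hypothetical function, not part of the Wikilytics toolkit.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def active_editors_per_month(
    edits: Iterable[Tuple[str, datetime]], min_edits: int = 5
) -> Dict[str, int]:
    """Count editors with at least `min_edits` edits in each calendar month."""
    # month ("YYYY-MM") -> editor -> number of edits that month
    per_month: Dict[str, Counter] = defaultdict(Counter)
    for editor, ts in edits:
        per_month[ts.strftime("%Y-%m")][editor] += 1
    return {
        month: sum(1 for n in counts.values() if n >= min_edits)
        for month, counts in sorted(per_month.items())
    }
```

Counting per calendar month keeps the output small and comparable across months; the threshold is a parameter so that different definitions of "active" can be compared.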

Mobile
Mobile site rewrite — Port of our existing gateway to another framework for easier support & collaborative development.
 * Status: We're still in hiring mode, looking for a great developer to lead our efforts. At the same time, we're putting together a roadmap for our mobile development and starting to coordinate research and development. We're drafting a survey now.
 * Program manager: Tomasz Finc

WikiSnaps for Android — Port of the mobile upload app experience with WikiSnaps to the Android platform.
 * Status: Currently in development and will be on GitHub shortly.
 * Program manager: Tomasz Finc

Offline
Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.
 * Status: Currently, offline copies of Wikipedia content are generated by the Wikipedia 1.0 team through use of the release version tools written by User:CBM. Since many in the community would like to see more options, Arthur Richards is actively assessing the codebase on the toolserver to understand the work involved in extending the current toolset.
 * Program manager: Tomasz Finc

OpenZim for Collections — Integration of OpenZim into the Collections extension.
 * Status: PediaPress has wrapped up their first development push for adding openZim support to the Collections extension. Testers are invited to try the new extension on PediaPress' test wiki. We're now collecting bug reports before deploying it to the live site.
 * Program manager: Tomasz Finc

Kiwix UX study — Evaluation of the user experience of the Kiwix mobile app to access offline Wikimedia content.
 * Status: We've finished our first UX pass over Kiwix and published the recommendations on the Kiwix wiki. Emmanuel Engelhart is implementing some of these new features while we gear up for the next phase of assessment. At the same time, we're engaging with the local Wikimedia community in India to see how well the tool is working.
 * Program manager: Tomasz Finc