Wikimedia Engineering/Report/2011/March

Major news this month includes:
 * The publication of a Product whitepaper by the Strategic product team (and the associated update from Sue Gardner) that will guide future engineering efforts.
 * The return of Brion Vibber, Wikimedia's first employee, as Lead Architect for MediaWiki.
 * The deployment of Article Feedback 2.0 to the English Wikipedia, and of Upload Wizard 1.0 to Wikimedia Commons.

Upcoming events

 * Berlin Hackathon 2011 (May 13-15, Berlin) — Daniel Kinzler announced the dates and location of the Berlin Hackathon. Registration is open until April 10. Participants are also listing topics to work on.
 * Summer of Code 2011 — Sumana Harihareswara sent a call for students for the upcoming Summer of Code. Developers are now signing up as students and mentors, and projects are being discussed. Read the dedicated article to learn more and join us.
 * Wikimania (August 2-7, Haifa, Israel) — This year's Wikimania will be preceded by two days of hacking (August 2-3); the actual conference (August 4-7) will also include Technology tracks.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

The following positions have opened this month:
 * Engineering Program Manager — Data Analytics

The following positions are still open:
 * Performance Engineer
 * Software Developer — Features
 * Software Developer — Mobile
 * Systems Engineer — Data Analytics (previously Data Analytics Engineer)
 * Operations Engineer
 * Senior QA Engineer
 * Networking Contractor — Amsterdam
 * Software Engineer — Community R&D

In addition, we hope to post the following positions over the next few months:
 * Rich Text Editor Engineer
 * Release Engineer
 * Technical Writer

Short news

 * Visitors — Ward Cunningham continued his "in-residence" visits to the Wikimedia office in San Francisco.
 * Hires — We're delighted to welcome Peter Youngmeister as a Consultant Operations Engineer, and the legendary Brion Vibber, who rejoined the Wikimedia Foundation as Lead Architect.

Site operations
Virginia Data Center — Installation of a world-class primary data center for Wikimedia Foundation websites.
 * Status: The last pieces of hardware arrived at the data center and were racked. The network routers and switches were set up, and the configuration is about 60% done. The first servers are being brought up while we wait for our network connectivity to be installed. We expect to be able to serve limited live traffic and services starting in May.
 * Program manager: Mark Bergsma

Media Storage — Improvement of our media storage architecture to accommodate expected increase in media uploads.
 * Status: A test cluster of three machines running OpenStack Swift was deployed, and is now serving a small portion of media traffic. Contractor Russ Nelson is also developing MediaWiki FileRepo support for Swift, so new media uploads can be pushed to the Swift cluster directly.
 * Program manager: Mark Bergsma

Testing environment
Virtualization test cluster — Environment to deploy temporary machines for testing and experimentation, for use by WMF staff and volunteers working on important projects (as capacity allows).
 * Status: The virtualization test cluster hardware, whose deployment was slightly delayed, is now ready for service. Ryan Lane released version 1.2 of his OpenStackManager extension and created detailed documentation on the setup. He will finish deploying the virtual test cluster in the first weeks of April.
 * Program manager: Mark Bergsma

Backups and data archives
Backups — Improvement of backup coverage of Wikimedia-hosted data.
 * Status: Backup coverage of Wikimedia-hosted data will see a major increase as soon as connectivity between our two primary data centers is available and data can be copied and replicated. As reliability, fail-over and backup are the main goals of the new primary data center, setting up live replicas and frequent backups of all our data will have the highest priority among the services deployed there.
 * Program manager: Mark Bergsma

Data Dumps — Improvement of processes to create and provide public copies of public Wikimedia data.
 * Status: The dumps server is back, with its hardware repaired and running, and we have started to move data over as a live backup of the XML dumps. The new server for the English Wikipedia dumps arrived and is being set up. The January run of the English Wikipedia dumps completed in March and the history files are available for download in two formats. The March run is almost complete and the history files are already available for download in one format. We're also working with Google to enable regular mirroring of the most recent dumps to Google storage for download.
 * Program manager: Mark Bergsma

Short news

 * Thumbnail issues — Our existing, non-scalable media storage architecture hit a performance limit again, which caused image thumbnail download slowdowns around Monday, March 28th. This is a known problem that will finally be resolved by our Media Storage redesign described above. In the meantime, we have been working on fixing the existing problems by fine-tuning the performance and behavior of the existing systems, and increasing the memory capacity of the current media servers. We are also working on deploying a second thumbnail server to take on some load, as a temporary solution.

Content Quality and Editorial Tools


Article Feedback (phase 2) — A feature to collaboratively assess article quality and incorporate reader ratings on Wikipedia.
 * Status: The second phase of this feature was released on the English Wikipedia in mid-March. A major change in the interface is the ability for reviewers to specify the source of their knowledge, e.g. if they have an academic degree in a related field (see screenshot). Experiments to encourage user engagement are being performed as well. Dario Taraborelli also published an analysis of the first phase experiment. We're currently expanding the scope of the experiment to include several thousand articles, in order to get results that are more statistically meaningful.
 * Program manager: Alolita Sharma

Article feedback (extended review) — An interface for quality reviews of Wikipedia content.
 * Status: The "Open wiki review system" is now considered as a possible evolution of the Article feedback feature. It would offer an interface to submit detailed quality reviews, as well as a system to sort and assess reviews. Ways to surface quality indicators for readers are also being explored.
 * Commissioned by: Erik Möller

Pending Changes — A feature to allow changes made by logged-out and new users to be reviewed before they appear as the primary version of an article.
 * Status: Development is in maintenance mode; work will resume when developer resources become available, and after the English Wikipedia community makes a decision regarding the future of this trial. Steven Walling requested additional data to help the community come to a consensus.
 * Program manager: Alolita Sharma

Personal image filter — A feature to allow users to selectively hide media files on a wiki.
 * Status: Brandon Harris' initial UI design recommendations were presented to the Board of Trustees. Erik Möller is now coordinating with Brandon to take the Board's feedback into consideration. See the detailed article published in a recent Signpost issue.
 * Program manager: Alolita Sharma

Discussions and Interactions


Wikilove 0.1 — A user script to encourage praise and virtual gifts between users.
 * Status: Because many automated patrolling tools and gadgets are focused on making it easy to warn or reprimand users, Ryan Kaldari wrote a user script to facilitate nice behavior between editors. For example, it is now possible, on the English Wikipedia and other wikis, to give a "virtual kitten" to another editor. The script was adapted for use by the Russian and Tamil communities, and Ryan is helping support other communities willing to use it.
 * Program manager: Alolita Sharma

Multimedia Tools
Upload wizard — A feature that provides an easier way of uploading files to Wikimedia Commons, the media library associated with Wikipedia.
 * Status: Neil Kandalgaonkar and Ryan Kaldari continued to fix bugs, test functionality, and generally ready the software for a 1.0 release. Roan Kattouw reviewed the code and deployed the changes to the Commons prototype; Neil sent out a call for testing to uncover remaining bugs. The 1.0 release was deployed on Commons on March 30th.
 * Program manager: Alolita Sharma

Community feature prototyping
As the first engineer "embedded" in the Community department, Trevor Parscal completed the first experiment, related to the location and appearance of the edit link. The results are not available yet, but will be published in the coming weeks. He's now turning to the account creation improvement project (and the associated A/B testing) with Frank Schulenburg & Lennart Guldbrandsson.

Nimish Gautam and Roan Kattouw also provided support for the A/B testing and deployment respectively.

Engineering support
Editor survey — Integration work between LimeSurvey and MediaWiki to support the upcoming Editors survey.
 * Status: In preparation for the upcoming Editors survey conducted by the Global development department, work was done to integrate the survey software (LimeSurvey) with Wikimedia's infrastructure. Arthur Richards and Nimish Gautam worked on the back-end to allow LimeSurvey to pull information directly from our database, and automatically provide useful stats about editors, hence simplifying and shortening the survey. Ryan Kaldari worked on integrating LimeSurvey with CentralNotice.
 * Program manager: Alolita Sharma

Other projects



 * Style guide for forms — Designer Brandon Harris published a draft style guide for forms in MediaWiki, and started a discussion on wikitech-l.
 * Liquid Threads — Main developer Andrew Garrett laid out a timeline for his upcoming work on this feature, which brings threaded discussion capabilities to MediaWiki. He will first focus on back-end work, before moving to documentation and front-end.
 * SimpleSurvey 2.0 — Work on this survey extension for MediaWiki is currently on hold, and will resume as developer resources become available.
 * JavaScript parsing library — Work on this JavaScript parsing library for wikitext was slowed down in favor of the Upload Wizard.
 * Resource loader — This core feature of MediaWiki 1.17, improving the load time for JavaScript and CSS, is now feature-complete and transitioned to maintenance mode. Trevor Parscal and Roan Kattouw continued to fix bugs as they arose.
 * Non-Roman character set localization — Roan Kattouw deployed the Narayam extension, which he had previously refactored in depth. It is now in production on all Malayalam-language wikis. This extension adds input methods for some Indic scripts.

Wikimedia Labs
Media projects — A set of features to improve media handling and key infrastructure support tools, many developed with Kaltura, such as Metavid, MwEmbed, and the Video Editor.
 * Status: The back-end code of the TimedMediaHandler extension was reviewed by Roan Kattouw, and Michael Dale started to integrate the feedback in the code. The front-end and JavaScript code will be reviewed by Trevor Parscal.
 * Program manager: Alolita Sharma

MediaWiki development and tools
MediaWiki 1.17 release — The upcoming MediaWiki release.
 * Status: Developers continued to fix bugs discovered after the deployment of MediaWiki 1.17 to Wikimedia sites. A few issues remain, notably related to the new installer and the support of alternative database management systems. We plan to release a beta in early April.
 * Program manager: Rob Lanphier

Code review — Review of changes made to the MediaWiki code.
 * Status: After the 1.17 code review sprint, the number of unreviewed new revisions started to increase again (see the automatically generated chart). Mark Hershberger started to assign name tags to revisions, to help developers track reviews that are requested from them.
 * Program manager: Rob Lanphier

Bugzilla 4.0 upgrade — Upgrade of our bug tracker to the latest version of Bugzilla.
 * Status: Priyanka Dhanda coordinated with Rob Halsell to prepare for the upgrade. A prototype was set up, the Vector skin was cleaned up, and some old tweaks were moved into extensions. Chad Horohoe also used the prototype to try out a summary report script shared by the KDE community.
 * Program manager: Rob Lanphier

Performance optimization
PoolCounter — A MediaWiki extension to avoid parser deadlocks on high-traffic pages.
 * Status: Tim Starling deployed this extension, written by Platonides, to control the number of simultaneous parses that happen on a single page (to avoid the "Michael Jackson" effect). It was later disabled because of a bug, which has since been fixed; Platonides also added integrated statistics to this tool. We plan a second deployment attempt early in the week of April 4.
 * Program manager: Rob Lanphier
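
The idea of capping simultaneous parses of one page can be illustrated with a minimal Python sketch. This is not the PoolCounter extension's code or protocol: the class and function names are hypothetical, and the sketch is simplified so that a request finding the pool full falls back to a cached copy immediately instead of waiting for the running parse to finish.

```python
import threading
from collections import defaultdict


class PoolCounterSketch:
    """Caps how many workers may parse the same page key at once (hypothetical)."""

    def __init__(self, max_workers_per_key=1):
        self._max = max_workers_per_key
        self._lock = threading.Lock()
        self._active = defaultdict(int)  # key -> workers currently parsing

    def try_acquire(self, key):
        """Return True if the caller may parse `key`, False if the pool is full."""
        with self._lock:
            if self._active[key] >= self._max:
                return False
            self._active[key] += 1
            return True

    def release(self, key):
        """Give back a slot previously obtained with try_acquire()."""
        with self._lock:
            self._active[key] -= 1
            if self._active[key] == 0:
                del self._active[key]


def render_page(pool, cache, key, parse):
    """Parse `key` only if a slot is free; otherwise serve the cached copy."""
    if pool.try_acquire(key):
        try:
            cache[key] = parse(key)
            return cache[key]
        finally:
            pool.release(key)
    # Pool is full: another worker is already parsing this page.
    return cache.get(key)  # may be stale, or None if nothing is cached yet
```

With `max_workers_per_key=1`, a burst of requests for one popular page triggers a single parse while the other requests reuse whatever is in the cache, which is the behavior the extension aims for on high-traffic pages.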

Ehcache deployment — Deployment of a disk-backed object cache to increase parser cache hit ratio.
 * Status: Tim Starling investigated Wikimedia's low parser cache hit ratio and suggested increasing the parser cache size to reduce Apache CPU usage. After researching available options for disk-backed object caches, he selected Ehcache and wrote a MediaWiki client for it. Our test deployments showed promising results, but also surfaced additional problems that we need to sort out.
 * Program manager: Rob Lanphier
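
Why a larger cache can raise the hit ratio (and so save parsing work) can be shown with a small simulation. The sketch below assumes a least-recently-used eviction policy purely for illustration; it does not describe how Ehcache or the MediaWiki parser cache actually evict entries, and `hit_ratio` is a hypothetical helper.

```python
from collections import OrderedDict


def hit_ratio(requests, capacity):
    """Replay a list of page requests against an LRU cache and return hits / total."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True  # a miss stands for one expensive parse
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests) if requests else 0.0


requests = ["A", "B", "C", "A", "B", "C"]
print(hit_ratio(requests, capacity=2))  # 0.0: every page is evicted before reuse
print(hit_ratio(requests, capacity=3))  # 0.5: the second round is served from cache
```

In this toy trace, one extra cache slot turns every repeated request from a miss into a hit, which is the effect a larger disk-backed cache is expected to have on pages that are currently parsed again and again.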

Wikimedia analytics
udp2log — A custom data analytics logging system.
 * Status: A second logging machine was installed and a load balancer set up to handle the amount of data. Data is now being collected, sampled, filtered and cleaned up. The long-term plan is still to use multicast, in order to allow for growth.
 * Program manager: Rob Lanphier
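
The "sampled, filtered and cleaned up" steps can be sketched as a small pipeline. This is an illustration only, not udp2log's implementation: the function name, the log line format and the 1-in-N sampling scheme are assumptions.

```python
def sample_and_filter(lines, sample_every, keep):
    """Keep every `sample_every`-th line, strip it, and drop lines rejected by `keep`."""
    for position, line in enumerate(lines):
        if position % sample_every != 0:
            continue  # sampling: only 1 line out of `sample_every` goes on
        line = line.strip()  # cleanup: remove the trailing newline and padding
        if line and keep(line):  # filtering: caller-supplied predicate
            yield line


# Hypothetical log lines; real request logs carry many more fields.
raw = [f"host{i} GET /wiki/Page{i}\n" for i in range(10)]
for entry in sample_and_filter(raw, sample_every=3, keep=lambda l: "Page3" not in l):
    print(entry)
```

Sampling before filtering keeps the per-line cost low, since most lines are discarded before any predicate runs.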

A/B testing — A set of tools to perform A/B testing on Wikimedia sites.
 * Status: Nimish Gautam and Trevor Parscal are working on a tally extension, based on the ClickTracking extension. Its purpose is to provide a management console for A/B tests via an interface similar to how we manage banners in CentralNotice. A "bucketing" extension is also planned, which will direct users to the proper test group. This feature will be integral to the account creation improvement project led by the Community department.
 * Program manager: Rob Lanphier
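
One common way to place users in test groups is deterministic hashing, sketched below. This is an assumption about how bucketing could work, not the design of the planned extension; `assign_bucket`, the token format and the experiment name are hypothetical.

```python
import hashlib


def assign_bucket(user_token, experiment, buckets=("control", "test")):
    """Deterministically map a user token to one bucket of an experiment."""
    # Hashing the experiment name together with the token keeps bucket
    # assignments independent from one experiment to the next.
    digest = hashlib.sha256(f"{experiment}:{user_token}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


# The same token always lands in the same bucket for a given experiment.
print(assign_bucket("session-12345", "account-creation"))
```

Because the assignment depends only on the token and the experiment name, no per-user state has to be stored for a user to keep seeing the same variant.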

Report card — A monthly report of key metrics to measure community health.
 * Status: Erik Zachte tweaked his code on page view statistics. Future improvements include mining the CentralAuth database to identify accounts of the same user across wikis, and using this information to refine editor counts.
 * Program manager: Rob Lanphier

Technical communications
Development process improvement — A project to increase transparency and organize Wikimedia Foundation's engineering efforts more efficiently.
 * Status: Guillaume Paumier revived this project and has been focusing on summary pages and versions & phases for Wikimedia-funded engineering projects. The goal is to make it easier to find this information and keep it up-to-date, for the benefit of staff, volunteer developers and users.
 * Program manager: Rob Lanphier

Wikimedia blog overhaul — A project to consolidate and improve the Wikimedia blogs.
 * Status: After assessing the current situation of Wikimedia blogs, Guillaume Paumier worked with the Communications team, and other departments, to collect requirements. A technical proposal was then created and a prototype set up. Implementation should now happen shortly.
 * Project manager: Guillaume Paumier

Other projects

 * MediaWiki 1.17 deployment — Some bugs and other minor issues were fixed following the deployment of MediaWiki 1.17 to Wikimedia sites.
 * Test framework deployment — Work on this automated test environment for MediaWiki (based on Selenium and PHPUnit) is currently on hold. It will resume when the virtualization cluster is in place, and resources become available.
 * OpenWebAnalytics — We're wrapping up our work on OWA until we're able to hire our new dedicated analytics team. In the short term, we're focusing our efforts on A/B testing and other immediate needs, allowing the future analytics team to map out a long-term strategy.
 * API maintenance — Sam Reed continued to work on the backlog of bugs and feature requests. He is also investigating appropriate APIs for monitoring system health.
 * Shell bugs — Site requests that require shell access to the servers are mostly handled by Rob Halsell and a few dedicated volunteers. Priyanka Dhanda is going to join the team and help out where possible.
 * Access to Subversion — Rob Lanphier, Priyanka Dhanda and Chad Horohoe have joined Tim Starling to handle requests for commit access to Subversion.
 * Migration to Git — Migrating from Subversion to Git was discussed on the wikitech-l list and issues were raised. The engineering staff is interested in supporting this migration once consensus is formed amongst developers.
 * Heterogeneous deployment — The deployment of MediaWiki 1.17 across Wikimedia sites confirmed the need for a way to target software changes and upgrades to specific sets of wikis. Progress is expected by the time MediaWiki 1.18 is deployed.
 * Software deployments tracking — A new page on the wikitech wiki is now tracking recent and upcoming software changes, besides the server admin log.
 * Wikistats — Erik Zachte checked the source code of many of his tools (which provide general statistics on Wikimedia wikis) into our version control system.

Mobile
Mobile — All things Mobile and Wikimedia
 * Status: We've almost wrapped up our hiring and have been actively building a space for all mobile projects. We're eager to have this as the launching point not only for the port but for any of our other projects. Many thanks to User:Qgil for helping us build out this space. Alongside our software engineering efforts, we're also setting out to do a significant amount of on-the-ground research for mobile. As soon as we wrap up hiring (very soon), we'll be speeding ahead on the various mobile projects.
 * Program manager: Tomasz Finc

WikiSnaps for Android — Port of the mobile upload app experience with WikiSnaps to the Android platform.
 * Status: Our volunteer developer Vivek has been a bit busy and is almost done with his check-in. We'll put up the GitHub link as soon as we get it.
 * Program manager: Tomasz Finc

Offline
Wikipedia version tools — Support and development of a series of tools to select Wikipedia content for offline use.
 * Status: We finished assessing the existing tools and are actively working with their original author (User:CBM) to plan our next steps. The project is going to focus on making it easier to create school collections from wiki projects and is an excellent fit for a Google Summer of Code project. We are also in active discussions with one of the most active offline project members (User:Walkerma) to make sure that our use cases capture what's needed.
 * Program manager: Tomasz Finc

OpenZim for Collections — Integration of OpenZim into the Collections extension.
 * Status: After a successful launch of the enhancements, we've collected both email feedback and reports of the bugs that have arisen. We are now exploring where else we might engage with PediaPress for further work to improve the workflow of our offline projects.
 * Program manager: Tomasz Finc

Kiwix UX study — Evaluation of the user experience of the Kiwix mobile app to access offline Wikimedia content.
 * Status: We finished our first development sprint of the Kiwix UX improvements. Our next step is to work with testers from Wikimedia Kenya, Wikimedia India and WMF staff members to help us find bugs in the beta. If you would like to help us, please sign up as a tester. For our next sprint, we're looking at adding an integrated download manager to facilitate the download of new OpenZim collections. See our mockups for an early preview.
 * Program manager: Tomasz Finc