Engineering metrics in May:
Major news in May includes:
Berlin hackathon (1–3 June 2012, Berlin, Germany)
- The Wikimedia technical community prepared tutorials and plans for the event. MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists looked forward to learning about and working on Lua, Git, Gadgets changes, security, Wikidata, RENDER, and other Wikimedia technologies. More information will be available in the June engineering report.
Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)
- Katie Filbert, Gregory Varnum, and Sumana Harihareswara are organizing a hybrid inreach/outreach hackathon occurring just prior to Wikimania, and aim to make it welcoming for both novices and experts. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- May has been a busy month for racking, stacking and provisioning newly purchased servers, especially for Rob Halsell, Chris Johnson, Leslie Carr and Mark Bergsma. We recently purchased new hardware to refresh existing servers (many are out of warranty and over 4 years old), to add capacity and redundancy, and to support new projects, including servers for Search, Analytics, Fundraising, OpenStreetMap, databases, Varnish, Memcached and backups. Much effort went into OS installation and server networking; the new servers are now ready for the various system and application deployments.
- IPv6 work went into full swing as well, in order to be ready for IPv6 Launch Day on June 6. As of end of May, the database schemas were updated, and work started on refactoring LVS, PayPal, Varnish, Squid, DNS, Nagios monitoring and puppetization.
- In April, we deployed the newly built Search cluster at our Ashburn datacenter, and disabled the Tampa search cluster. This month, Peter Youngmeister went through the exercise of upgrading the 4-year-old Tampa search cluster infrastructure, and brought it back up. We now have a cross-datacenter hot standby for the 'Search' service.
- With Ubuntu 12.04 (Precise Pangolin) available, we have packaged it and started using it selectively on some of our systems, including Search at Tampa and half of the LVS servers. Next, we will set up the Apache servers at the Ashburn data center using Precise as well.
- Recently, a few of our systems rebooted themselves unexpectedly. Faidon Liambotis investigated and reported a bug in our Ubuntu 10.04 kernel that caused servers to reboot after about 208 days of uptime. We applied the necessary kernel and security patches to the affected servers.
- Ben Hartshorne has been working with the SwiftStack folks on enhancing Swift to feed additional Swift-specific monitoring into our Ganglia tool. Next, they will work on identifying potential Swift performance bottlenecks (under load) in our implementation and recommend mitigations. Ben has started testing the upgrade of our current version to the just-released 1.4.8, which should improve the stability of the software.
- The Labs infrastructure had a couple of outages, caused by excess load and by the GlusterFS system. As a result, Ryan Lane, Faidon Liambotis and Andrew Bogott are working on a get-well plan, which includes finding a suitable replacement for GlusterFS. The short-term plan in the works, however, would leave us with a non-redundant infrastructure, because it places the instances in local storage on each node. Longer-term plans include evaluating Ceph and possibly writing a new filesystem mode in OpenStack to use DRBD in a way similar to Ganeti. Faidon implemented a new way of managing Puppet that allows users to test all of their changes locally before pushing them in for review. Sara Smollett moved her changes for Ganglia from Labs to production. Andrew Bogott has been working on bringing up the new cluster in eqiad, for testing Ceph, testing the upgrade of OpenStack from Diablo to Essex, and preparing for the new zone we'll add there. Ryan Lane wrote a new software deployment system, slated for use in Labs and in production, using git-deploy and SaltStack.
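As an aside on the reboots after about 208 days of uptime mentioned above: the report does not say which kernel bug was involved. One widely reported issue from that period was an overflow in the kernel's scheduler clock, where a nanosecond counter effectively wraps after 2^54 ns. Assuming that is the bug in question (an assumption, not something stated here), the arithmetic can be sketched as follows. The function names are illustrative only.

```python
# Assumption: the reboots match the widely reported scheduler-clock overflow,
# in which a nanosecond counter effectively wraps after 2**54 ns.
NS_PER_SECOND = 10**9
SECONDS_PER_DAY = 86_400


def overflow_days(bits: int = 54) -> float:
    """Days until a nanosecond counter with `bits` usable bits wraps."""
    return (2**bits) / NS_PER_SECOND / SECONDS_PER_DAY


def days_left(uptime_days: float, bits: int = 54) -> float:
    """Days remaining before the wrap, or 0.0 if it has already passed."""
    return max(0.0, overflow_days(bits) - uptime_days)


if __name__ == "__main__":
    print(round(overflow_days(), 1))   # 208.5
    print(round(days_left(200.0), 1))  # 8.5
```

A check like `days_left` could be used to flag hosts that are approaching the threshold so they can be patched and rebooted on a schedule.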
Backups and data archives
- We've been busy creating bundles of media in use per project, and the first set of files is almost complete. For each wiki, there are now one or more files containing all media uploaded locally to the wiki, and one or more files containing all media used by the wiki but uploaded to Commons. We've also been preparing for the media back-end switch to Swift; since we won't be able to make copies of all media files in the usual way, we hacked together some scripts that check the imagelinks tables and retrieve and/or update media files via HTTP as needed. The Your.org and Masaryk University mirrors officially came online; we're still looking for other partners to host media backups and pageview statistics.
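The scripts themselves are not shown in this report. As a minimal sketch of the idea only, the following assumes that file names have already been read from the imagelinks table, that a last-modified timestamp is known for each file, and that files live under MediaWiki's default hashed upload layout (first hex digit of the MD5 of the name, then the first two digits). The names `hashed_media_path` and `plan_sync` are hypothetical, and the sketch skips details such as first-letter capitalization and URL encoding.

```python
import hashlib


def hashed_media_path(filename: str) -> str:
    """Relative path of a file under the default hashed upload layout."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def plan_sync(referenced: dict, local: dict):
    """Split referenced files into ones to fetch and ones to refresh.

    Both arguments map a file name to a last-modified timestamp string;
    ISO 8601 strings in one fixed format compare correctly as text.
    """
    to_fetch = sorted(n for n in referenced if n not in local)
    to_update = sorted(
        n for n in referenced if n in local and referenced[n] > local[n]
    )
    return to_fetch, to_update


if __name__ == "__main__":
    referenced = {"A.jpg": "2012-05-01", "B.png": "2012-05-20", "C.svg": "2012-04-01"}
    local = {"B.png": "2012-05-02", "C.svg": "2012-04-01"}
    fetch, update = plan_sync(referenced, local)
    for name in fetch + update:
        print(name, "->", hashed_media_path(name))
```

Separating the planning step from the HTTP retrieval keeps the comparison logic testable without network access.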
The team completed release planning for June and welcomed James Forrester as the Technical Product Analyst for the project. Ongoing work on the Visual Editor is tracked on-wiki. Gabriel Wicke set up a very basic parsoid service that lets users browse the English Wikipedia as Parsoid sees it, and convert Wikitext to HTML DOM and vice versa.
worked with OmniTI and new WMF engineer Matthias Mullie to develop a range of new features for version 5 of the Article Feedback Tool (AFT5). This month, the team deployed a new look and feel for the article feedback page (based on a streamlined design by Pau Giner) as well as a central feedback page, where editors can monitor posts from all articles on Wikipedia. We also developed a final feedback form, which gradually engages users to contribute to the encyclopedia. Dario Taraborelli, Oliver Keyes and Aaron Halfaker collected and analyzed data on how posting feedback impacts both conversion and newcomer quality. Based on this analysis, we now project over 2 million feedback posts per month on the English Wikipedia when the tool is widely deployed later this year (on par with the total number of edits per month). Our research suggests that posting feedback encourages a substantial number of users to productively edit articles on Wikipedia, which is expected to help reverse the recent decline in both new and existing editors. Roan Kattouw continued to review our code and deploy weekly releases, while training our team to deploy their own code over time. We are planning for a wider deployment by the end of June, with full deployment a couple of months later.
Benny Situ, Ian Baker, Brandon Harris, Oliver Keyes, Fabrice Florin and Howie Fung deployed the first prototype of a list view for Page Triage on the English Wikipedia. This new tool, called New Pages Feed, provides an enhanced list of pages for review by community patrollers. The team started work on a new curation toolbar to appear on article pages, enabling patrollers to get more article info, mark pages as reviewed, tag them or nominate them for deletion. Current goals are to complete development of this new curation toolbar this month, then deploy an integrated release version the following month, along with the Article Creation landing system. Check out the current prototype on the English Wikipedia, as well as the latest version, now under development on Wikimedia Labs.
Article Creation Workflow
Benny Situ, Ryan Kaldari, Ian Baker, Oliver Keyes and Brandon Harris have put this Article Creation feature on hold, in order to make more progress on the New Pages Feed project (code-named Page Triage). We are not comfortable deploying this feature until the New Pages Feed is released, because it may create more work for page patrollers. Dario Taraborelli prepared a streamlined metrics plan to test this feature, and determine whether or not to build a special drafts tool. These metrics will be collected next month, once the New Pages Feed project is further along. The current ACW prototype is available for testing on Wikimedia Labs.
Work was mostly delayed until the June Berlin hackathon, as engineering resources have been devoted to Platform engineering (automated testing) and Visual Editor this month.
Wikipedia Education Program
Jeroen De Dauw completed the project. Sam Reed is now reviewing the code.
2012 Wikimedia fundraiser
The fundraising team developed and deployed new filters to help identify and stop fraudulent transactions. In addition, the team made two employment offers, both of which were accepted. The new staff will be integrated into the team, which will be fully staffed before Wikimania.
Internationalization and Editor Engagement Experiments
Internationalization and localization tools
The team continued integrating the first round of UI design for the Universal Language Selector (ULS) for desktop and mobile browsers. The prototype to showcase the first version of ULS was completed and demonstrated. The team completed development and deployed enhancements to the Translate extension with notification support, added more language support to the Narayam extension, fixed bugs, reviewed code for i18n support in MediaWiki, and completed a first draft for language impact metrics. The team also participated in IRC office hours with the community.
Editor engagement experiments
The team started the development of the Timestamp Position Modification experimental feature, which was deployed then disabled due to a conflict between the ClickTracking feature and the MediaWiki API. Further testing and tuning continues, as well as analysis, redesign and development of the ClickTracking extension. We are gathering requirements for the next experiment on analyzing post-edit feedback, and we continue to hire software engineers.
Mobile Contact Us
finished his changes to the Contact us feature, and we've deployed the first version to production. We'll be collecting feedback over the next week to figure out what other features to add.
Wiki Loves Monuments mobile application
Lindsey Smith, Yuvaraj Pandian, and Brion Vibber spent the month defining specifications, prototyping, and implementing the first version of the Wiki Loves Monuments (WLM) app. Phil & Lindsey worked with various members of the WLM community (including Elke Wetzig and Maarten Dammers) to better understand the requirements of the contest.
The mobile team spent the month of May converting the Wikipedia app to use the API, increasing the number of supported platforms, and porting it to the latest PhoneGap codebase. Yuvaraj Pandian and Max Semenik used the new mobile API to fully decouple the Wikipedia app and started beta testing. By using the new mobile API, the Wikipedia app no longer has to screen-scrape the site, allowing us to make design changes that don't break the app experience. Brion Vibber released the Windows 8 version.
May saw the first official launch of Wikipedia Zero in Malaysia. Patrick Reilly, Dan Foy and others conducted tests in Tunisia, Kenya, Uganda, Cameroon, Niger and Ivory Coast. Patrick worked with Dan to further improve the Wikipedia Zero extension.
We've settled on a vendor and are completing the contract.
Mobile support in MediaWiki core
Max Semenik, and Arthur Richards worked on the very ambitious project of moving MobileFrontend to MediaWiki core. We now have a dedicated set of tasks for the project and have started to process them. Max added modular device detection support to core, and Arthur migrated HTMLForm.
Mobile default for sibling projects
defined the specifications for the move of Wikipedia's sister projects to the default redirect to the mobile site for mobile devices. We now have a schedule posted and have started to reach out to our various communities to let them know of the change.
Improved Mobile Device Detection
Diederik van Liere and Patrick Reilly defined the specifications and built an early prototype for the Apache Device Map project. Over the next months, we'll use it for simple data collection.
- We set up a fully virtualized compilation farm using technologies like Buildbot, VirtualBox and QEMU. This will allow for better continuous integration and more frequent releases. We have also developed our first proof-of-concept for kiwix-mobile using cordova-qt. Kiwix was featured as "project of the week" on SourceForge the last week of May, which helped us reach the milestone of 25,000 monthly software downloads for the first time.
and Ryan Lane upgraded Gerrit to version 2.3, which brings a variety of fixes and features; notably, a less cluttered diff interface, and the ability to have the "mediawiki/extensions" meta-repository that holds all extensions be automatically updated. Many new repositories were created for extensions and other uses, including a repository for packaging MediaWiki to easily install on Windows servers. We have also managed to fix our long-standing UTF-8 issues with help from Marcin Cieślak, so users can all now use Unicode for their commits, comments and usernames. Image changes can now be shown via the UI, rather than being downloaded as a ZIP file to be compared locally. Brion Vibber has agreed to lead a process for evaluating Gerrit (and possible alternatives), which will conclude in early August. David Schoonover is currently writing up a list of alternatives to Gerrit which he plans to publish on mediawiki.org and announce on wikitech-l.
and the Labs team have unblocked the deployment prep issues; Labs is now closely tracking production MediaWiki. Most of the features (upload, play, full screen, etc.) are now in testing, and upload seems to be faster than before as well. Aaron Schulz and Ben Hartshorne deployed a new version of the thumbnail handler to Commons, test, test2, and mediawiki.org, that uses our Swift FileBackend code. It should provide us with useful production testing prior to using Swift FileBackend for handling original files. Cleanup of corrupted thumbnails is now finished. Aaron deployed a SiteStats fix that should make uploads much faster and fix some timeout problems. Ben and Aaron will also roll out the FileBackend-based thumbnail handler to the rest of the wikis.
Code review management
We continue to handle the bulk of code review via cross-review among team members plus the Wikimedia engineering 20% policy for reviewing volunteer code. Diederik van Liere is working on getting Gerrit stats published so that we can establish a trendline on our backlog. In addition to code review in Gerrit, we continue to keep an eye on Bugzilla, RFCs and extensions to review.
Security auditing and response
QA and testing
Sam Reed, Antoine Musso, Faidon Liambotis, and Ryan Lane met in San Francisco the week of May 7 to bootstrap work on this project, kickstarting a process of aligning the configuration with our production cluster. Apache web server instances are now completely configured automatically using Puppet classes. A few key Wikimedia configuration files that were previously managed via private Subversion repository are now managed in a public Git repository. Much work remains to make this a stable testing environment, which will continue in June.
continued to work on the TestSwarm rewrite. The team is considering moving the continuous integration environment into Wikimedia Labs. The new TestSwarm version will probably be first deployed in the new environment instead of the current environment.
Fabian Kaelin and Erik Zachte updated the datasets to include April's data, and the whole team contributed to improving the graphs' appearance. David Schoonover implemented the high-priority requests from people who use Reportcard in their presentations: for example, the front page has been streamlined, and now loads only the "core" graphs. Finally, the team has been working behind the scenes to make the framework behind the Reportcard, named "Limn", a best-of-breed project for general use. While not ready for public consumption, we implemented a GUI for selecting and manipulating datasets, and began work to support multiple visualization types. We now have several staging environments, including both test targets. We hope to be in a place to open-source the framework in June.
Our plan to improve logging sources (Squid, Varnish, nginx, etc.) includes adding more fields, and also allowing us to add arbitrary fields in the future without breaking features. Changing the field formats of the logging sources requires coordination with the Operations team. The format changes have been committed, but not yet deployed.
has been modified so that it is more flexible, and a few features have been added as well: it now can geocode and anonymize inline in the same field as the IP address, so that later log parsers don't have to try to detect a new field.
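To illustrate why rewriting the IP field in place helps downstream parsers, here is a minimal sketch. It is not the tool's actual code or output format: the `|` separator, the field position, the /24 and /48 truncation, and the tiny `GEO_TABLE` lookup (standing in for a real GeoIP database) are all assumptions made for the example.

```python
import ipaddress

# Hypothetical prefix-to-country table standing in for a real GeoIP database.
# The prefixes are documentation ranges and "XX"/"YY" are placeholder codes.
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "XX",
    ipaddress.ip_network("203.0.113.0/24"): "YY",
}


def geocode(ip: str) -> str:
    """Return a country code for the address, or "--" if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return "--"


def anonymize(ip: str) -> str:
    """Zero the host part of the address (assumed: /24 for IPv4, /48 for IPv6)."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def rewrite_ip_field(line: str, ip_index: int = 4, sep: str = " ") -> str:
    """Replace the IP field with "anonymized|country", keeping the field count."""
    fields = line.split(sep)
    ip = fields[ip_index]
    fields[ip_index] = f"{anonymize(ip)}|{geocode(ip)}"
    return sep.join(fields)


if __name__ == "__main__":
    sample = "cache1 42 2012-05-31T12:00:00 0.002 198.51.100.37 GET"
    print(rewrite_ip_field(sample))
```

Because the number of fields per line is unchanged, a parser that indexes fields by position keeps working; only consumers that care about the country need to split the IP field further.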
Mark Hershberger wrote a triaging guide and the Engineering Community Team is now encouraging volunteers to use it to respond to new bugs.
Summer of Code 2012/management
The nine Google Summer of Code students have begun their twelve weeks of design and coding.
Wikimedia Foundation engineering project documentation
Volunteer coordination and outreach
continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor new contributors. She granted developer access and Gerrit project ownership requests, and planned upcoming events.
Wikimedia engineering 20% policy
- The Wikidata project is funded and executed by Wikimedia Deutschland.
The team made good progress on their work on interwiki links. The demo system shows the current state of development. They published a draft showing how interwiki links should work in the future, which was amended after the recent work done on the universal language selector. They published another document explaining how data from Wikidata is going to be included in Wikipedia sites, also rewritten based on community feedback. Last, members of the Wikidata team attended a lot of events (like LinuxTag, re:publica and the 2nd ESWC Summer School) and held IRC office hours. At the end of the month, the team met with Foundation staff and community members in Berlin at the Wikidata/RENDER summit to present the work done so far, and discuss important decisions for the future of the project.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.