Wikimedia Engineering/Report/2012/January

 Engineering metrics in January:
 * 100 unique committers contributed code to MediaWiki.
 * About 2800 code commits were reviewed.
 * The total number of unreviewed commits went from 116 to 44.
 * About 42 shell requests were processed.
 * Nine developers got commit access, seven of them volunteers.
 * Wikimedia Labs now hosts 46 projects, 80 instances and 103 users.

Major news in January includes:
 * Tech support for the blackout protesting SOPA & PIPA;
 * The successful San Francisco hackathon;
 * The release of the official Wikipedia Android app;
 * A new beta cluster for Wikimedians to test upcoming software before it's deployed to production.


Recent events



 * English Wikipedia anti-SOPA blackout — The engineering team supported this online protest by developing and deploying the blackout code and design. This included the CongressLookup extension, which helped people find and contact their representatives using data from Sunlight Foundation APIs and other sources. The Operations team disabled editing for the 24-hour period agreed upon by the community, and helped keep other systems up and running, including the temporarily overloaded Wikimedia blog.


 * San Francisco hackathon (21–22 January 2012, San Francisco, California, USA) — More than 90 participants learned and hacked during this outreach-focused developer weekend. Sumana Harihareswara and several other WMF staffers worked with the coworking venue pariSoma to organize the event. The participating teams demonstrated more than a dozen projects. The demos, a speech by Erik Möller, and tutorials on mobile, Gadgets, and the web API were recorded and are available on Commons, along with photos of the event.


 * October 2011 Coding Challenge — The winners of the coding challenge were announced. They include an Android app for uploading to Wikimedia Commons, a user script for surfacing pages with a lot of recent editing activity, and a user script for displaying relevant images in an article as a lightbox slideshow.


 * Linux.conf.au — Trevor Parscal and Roan Kattouw visited Ballarat, Australia and presented "Low-hanging Fruit vs. Micro-optimization: Creative Techniques for Loading Web Pages Faster", a talk about ResourceLoader. A recording is available.

Upcoming events

 * Pune hackathon (10–12 February 2012, Pune, India) — Preparation began and registration continued for an outreach-focused developer weekend in Pune, India, led by Alolita Sharma. Approximately 70 participants are expected to learn and contribute at this event, focusing on the gadgets framework, mobile Wikimedia access, and language support (i18n/L10n).


 * GLAMcamp DC (10–12 February 2012, Washington, D.C., USA) — The Foundation's Ryan Kaldari and Asaf Bartov plan to attend the technical track of this GLAM conference. Engineers will work on mass upload and analytics functionality, which cultural institutions find useful in partnering with Wikimedia.

Job openings
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.


 * Developers and engineers:
 * Senior Software Engineer Front-end
 * Interaction Designer
 * Systems Engineer (Data Analytics)
 * Software Developer (Back-end, Data Analytics)
 * Software Developer (Rich Text Editing, Features)
 * Software Developer (Front-end)
 * QA Lead
 * Software Developer (Mobile)
 * Software Security Engineer
 * Operations Engineer (Labs)


 * Management & Product:
 * Director of Features Engineering


 * Requests for proposals:
 * Mobile UX — Help us redesign our mobile platform and apps as more and more visitors access Wikipedia and its sister sites via mobile devices.
 * Mobile QA — Help us set up testing and automation processes for all Wikimedia Mobile projects.

Short news

 * Andrew Otto joined the Platform engineering team as Software Developer for Analytics (announcement: //lists.wikimedia.org/pipermail/wikitech-l/2012-January/057430.html).
 * Fabrice Florin joined the Product Development team as Product Manager for New Editor Engagement (announcement).
 * Andre Engels joined the Mobile team as data analyst contractor (announcement).
 * Howie Fung, previously Senior Product Manager, was promoted to Director of Product Development, a newly formed group within the Engineering department (announcement).
 * Chris McMahon joined the Platform engineering team as Quality Assurance Lead Engineer (announcement).
 * Neil Kandalgaonkar left the Wikimedia Foundation in January (announcement).

Site infrastructure

 * Data Centers — Work continued on building up the EQIAD datacenter in Virginia. We added a new bastion host, a ganglia server, new dataset servers and upgraded the database servers, with a new chained replication topology and heartbeat-based replication monitoring. We upgraded mailman and migrated it to a new server in EQIAD from ESAMS. We have also successfully tested the new thumbnail system and the text squid implementation that we'll start rolling out fully in February. As for our Tampa datacenter, we have been upgrading the database servers as well as adding new ones for redundancy and capacity needs. We have at least 4 databases per cluster now (except s5, where the fourth one is being added). At the same time, we have retired 40 old servers, freeing up 2 racks of space and the much-needed IP addresses. The retired servers will be available for donation soon.


 * Media Storage — Performance testing of Swift continued in January. We confirmed that performance degrades above roughly 10 million objects in a bucket, so we adjusted both the Swift middleware and MediaWiki support to allow sharded containers. Our current plan is to shard the Commons and English Wikipedia containers. The test hardware for object storage arrived, and we validated that it works as expected. We have thus ordered what will be the production hardware, and expect it to arrive and go into service towards the end of February. We have started populating the current cluster in Tampa with thumbnails in preparation for putting it into production service.


 * HTTPS — HTTPS work is still on hold in favor of other projects. We did have some activity thanks to volunteer Abe Music, who fixed a portion of our UDP logging module for Nginx. One more fix is still needed before HTTPS page views are properly tracked in our statistics. The following outstanding issues have been fixed: nagios (replaced self-signed cert), upload (served wrong cert via IPv6), integration (wrong cert / certificate chain). Issues remain for jobs, status and mail. The wiki table of HTTPS-less domains has been updated, and details can be found in the page's edit history.
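
The container sharding described under Media Storage can be illustrated with a minimal sketch. It assumes objects are assigned to a shard by hashing their names; the report does not say which scheme the Swift middleware or MediaWiki actually uses. The function names, the MD5 choice, the two-hex-digit suffix and the container name "wiki-local-public" are all hypothetical.

```python
import hashlib


def shard_container(base: str, object_name: str, hex_digits: int = 2) -> str:
    """Return a shard container name for an object (illustrative scheme only).

    The object name is hashed and the first `hex_digits` hex characters of the
    digest are appended to the base container name, so objects spread evenly
    across 16 ** hex_digits containers instead of piling up in one.
    """
    digest = hashlib.md5(object_name.encode("utf-8")).hexdigest()
    return f"{base}.{digest[:hex_digits]}"


def shard_count(hex_digits: int = 2) -> int:
    """Number of distinct shard containers for a given suffix length."""
    return 16 ** hex_digits


if __name__ == "__main__":
    # "wiki-local-public" is a made-up container name for this example.
    print(shard_container("wiki-local-public", "Example.jpg"))
    print(shard_count())  # 256 shards with the default two hex digits
```

With two hex digits, a container that would otherwise hold 100 million objects is split into 256 shards of roughly 400,000 objects each, well under the 10 million mark at which degradation was observed.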

Testing environment

 * Wikimedia Labs — To keep up with project growth, the virtualization controller has been converted into a compute node. Doing so let two more hosts join the instance storage, doubling the available filesystem storage. The additional compute node also allows Labs to grow by up to another 30 instances. An old Ruby gateway server was converted into the virtualization controller. A number of projects were added or moved to Labs, including incubator, ganglia, deployment-prep, globaleducation, a number of mobile projects, and several others. Labs was very useful during the SF hackathon: a number of projects were created, implemented, and demoed using Labs, and one project was created, implemented, tested, and deployed to production during the event. We are still waiting for the gluster storage to arrive for volume storage; it should arrive in early February. There are now 46 projects, 80 instances and 103 users.

Backups and data archives

 * Data Dumps — A problem with the rsync to our mirror site was located and fixed. Another organization agreed to mirror the dumps as well, and we are waiting for their server to come online. Back issues of dumps from 2002 through 2006 were made available, for folks interested in historical data. New hardware has arrived in our Virginia datacenter, and we'll be copying all dumps over there as soon as it's ready. We're thinking about how to provide image dumps in some fashion, even if we don't keep local copies of the dumps or they are not run on a regular basis. We also cleaned up the dumps documentation and drafted this year's development plans. Finally, we have a contractor, Christian Aistleitner, who will be working on a test suite for dump generation.

Other news

 * Some users complained of slowness and pages failing to render on the Occitan Wikipedia. Domas Mituzas, one of our volunteers, helped fix the issue temporarily. After further investigation, he found that the root cause was badly constructed templates, which were consuming excessive server memory. Fixes are planned to prevent this kind of problem from recurring.
 * Some US-based users of Wikipedia experienced slow page rendering time for 10–15 minutes on January 20, 2012; this was due to the bits.wikimedia.org servers in EQIAD being overloaded. The issue was investigated and quickly resolved.

Editing tools

 * Visual editor — January was a bit slower on the visual editor front, as parts of the team took some well-deserved vacation after the successful prototype launch in December. During the SF Hackathon, a lot of issues were fleshed out. Plans for the second phase of the editor project were formulated. Inez Korczynski investigated a possible use of  to help with input methods and text selections on mobile devices. Gabriel Wicke extended the parser with the ability to fetch and expand templates in a parallel and asynchronous fashion. The parser now supports most parts of the English Wikipedia Main Page.
 * Internationalization and localization tools — January 2012 was slow on the code production side because of the code slush preparing for the branching of MediaWiki 1.19. The Localization team invested a lot of time in writing user documentation for translation tools, input methods and web fonts. Thanks go in particular to the volunteers who assisted in writing and proofreading the documentation. Much of the user documentation can be translated using the Translate extension. Amir worked on a long-standing bug in the EasyTimeline extension that prevented it from working with Indic and right-to-left scripts. Another focus was to improve test coverage for WebFonts and Narayam. In February we expect to deliver further improvements for the translation process, notably on Meta, with workflow improvements, the introduction of a translation memory and (subject to delay) notifying translators of newly available translations from inside MediaWiki by e-mail.
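
The parallel, asynchronous template expansion mentioned under Visual editor can be sketched as follows. This is not the parser's code: it uses Python's asyncio purely for illustration, and fetch_template, expand_all and the template names are hypothetical stand-ins. The point is only that independent template fetches are started together and awaited as a group, rather than one after another.

```python
import asyncio


async def fetch_template(name: str) -> str:
    """Hypothetical stand-in for fetching and expanding one template."""
    await asyncio.sleep(0.01)  # simulate network or storage latency
    return f"<expanded {name}>"


async def expand_all(names: list[str]) -> dict[str, str]:
    """Expand every distinct template concurrently and map name -> result."""
    unique = list(dict.fromkeys(names))  # drop duplicates, keep first-seen order
    # gather() schedules all fetches at once and returns results in input order.
    results = await asyncio.gather(*(fetch_template(n) for n in unique))
    return dict(zip(unique, results))


if __name__ == "__main__":
    expansions = asyncio.run(expand_all(["Infobox", "Citation", "Infobox"]))
    print(expansions)
```

Because the fetches overlap, total waiting time is close to that of the slowest single template rather than the sum of all of them, which matters for pages that transclude many templates.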

Participation and editor engagement

 * Article feedback — Fabrice Florin led development on the next round of Article Feedback Tool v5 features, including a new feedback page, to be released in early February by OmniTI, our development partner. Aaron Halfaker, Oliver Keyes and Dario Taraborelli continued to collect valuable data from the community about the usefulness of comments coming in from each of the three forms launched in December. A survey to get comments from readers about the effectiveness and attractiveness of each design was also launched, and the team has been compiling the various sets of data to produce a report on the pros and cons of each form in early February. The target date for an expanded feedback page is Feb. 15 for pre-deployment testing on en-labs, then wider deployment on Feb. 22.
 * Feedback Dashboard — We implemented a leaderboard of recent top responders on the feedback dashboard. New editor feedback is now added to a dedicated log. When feedback is marked as helpful, that fact is displayed on the feedback dashboard itself. Apart from a few smaller changes, we're now moving the project into maintenance mode to focus on article creation workflow and New Page Triage.
 * Article Creation Workflow — Benny Situ, Ryan Kaldari, Brandon Harris, Alolita Sharma, Oliver Keyes, Howie Fung, and Ian Baker met to discuss sprint planning. They mapped out various user flows leading to article creation, agreed on a proposed landing system and defined changes that are going to be required.

MediaWiki infrastructure

 * ResourceLoader — Roan Kattouw and Timo Tijhof fixed bugs that would have affected the upcoming 1.19 deployment, and implemented an experimental asynchronous loading feature that will make JavaScript load faster.

Feature support

 * Wikipedia Education Program — Jeroen De Dauw implemented many features, like institution & course management, and instructor & student workflows. He also implemented logging and revision histories.

Mobile

 * Wikipedia Android App — The Mobile team released the first version of the Wikipedia Android application into the Android Market. In just over three weeks, the app has had over 900,000 downloads, become the #1 search result for "Wikipedia", become the #1 trending app, and received a consistent 4/5 stars in the Android Market. We released two minor updates to fix bugs, and are processing user feedback to guide our next version.


 * WikipediaZero — We launched our first demo version of WikipediaZero for carrier testing. While there is still much work to be done in order to integrate with as many carriers as we'd like to see, we're already starting to make progress on how to simplify our implementations.


 * Wikipedia over SMS/USSD — Patrick Reilly, along with the PraeKelt Foundation, worked on a demo instance of an SMS/USSD gateway to access Wikipedia. We're hoping to have a complete demo in time for Mobile World Congress next month.


 * GPS Storage/Retrieval — An early prototype of the GPS storage/retrieval API went live this month. We still have a large to-do list to complete before rolling it out in production, but early progress is promising.


 * FeaturedFeeds — Max Semenik, with the help of Arthur Richards, deployed the first version of FeaturedFeeds to production. Wikimedia communities can now make use of these RSS feeds to better surface their content to other applications. A list of existing feeds is available on the Toolserver.

Fundraising support

 * 2011 Wikimedia fundraiser — After meeting the budget goals, the 2011 Fundraiser was wrapped up in early January, after which the team participated in a two-day retrospective. Work began on the Fundraiser 2011 cleanup, including the addition of recurring donation and auditing support for GlobalCollect. A two-day inception was held for preliminary planning of the 2012 Fundraiser.

Offline

 * Kiwix UX initiative — We have finished the second round of the UX initiative. Focus is now on porting Kiwix to Android/ARM, and we expect to release a first beta version by the end of February. Another goal is to reduce the list of show-stopper bugs for the final release of Kiwix Desktop v0.9.

MediaWiki Core

 * MediaWiki 1.19/Roadmap — A new Beta cluster, replicating the production environment, was set up to allow Wikimedians to test upcoming software (including MediaWiki 1.19) on Wikimedia Labs before deployment. A preliminary schedule was drafted, according to which deployment of MediaWiki 1.19 will start on February 13th and complete on March 1st.
 * Continuous integration — The team has rearranged Jenkins jobs to make them easier to manage in the long run, and to add capacity. TestSwarm is pending testing of the new Special:JavaScriptTest page.
 * Git conversion — The MediaWiki  repository has been successfully converted with branches. Another upcoming test repository will include release tags, and extension projects will be set up shortly.
 * Multimedia — The beginnings of a TimedMediaHandler test setup were put into place in Wikimedia Labs, including video transcoding infrastructure, at http://commons.wikimedia.beta.wmflabs.org. Work on this test setup will continue in February, with the goal to begin executing the test plan in preparation for deployment. Aaron Schulz added container sharding to FileBackend, and continued to make performance improvements and fixes. The Swift back-end now passes all unit tests; the code is being reviewed and cleaned up. Some of Aaron's code for purging thumbnails will be backported to MediaWiki 1.18. Ben Hartshorne has prepared interim hardware for production deployment, which will begin the week of February 6.
 * Lua scripting — A team of Wikimedia engineers agreed on Lua as the language to implement as a production-ready replacement for MediaWiki markup-based templates. Tim Starling will lead this effort after the 1.19 deployment and Git migration.

Wikimedia analytics

 * Analytics/Reportcard — The new key/value storage approach has been approved by Rob Lanphier. Andrew Otto and Diederik van Liere have started working on a data pipeline framework. All reportcard related code can be found in git.

Technical Liaison; Developer Relations

 * Bug management — In January, Mark Hershberger worked with developers to prepare for the (planned) 1.19 deployment in February. He worked with volunteers to launch the beta cluster and held a triage to review 1.19 deployment blockers. The beta cluster has already begun to show some promise with the bugs it has helped reveal.
 * Summer of Code 2011/management — Merges of student code from GSoC 2011 were substantially on hold in January as developer attention focused on the deployment of MediaWiki 1.19.
 * Wikimedia Foundation engineering project documentation — Besides the usual ongoing maintenance of project pages, and putting together this report, Guillaume Paumier also wrote a how-to guide about how to create, use and update project and status pages.
 * Volunteer coordination and outreach — In preparation for the San Francisco hackathon, Guillaume Paumier rewrote How to become a MediaWiki hacker along the lines suggested by Yuvaraj Pandian, and cleaned up the documentation about gadgets. Sumana Harihareswara focused on improving the API documentation, and wrote and edited tutorial references for building the Wikipedia Android application, MediaWiki's web API, and Gadgets. Nine developers got commit access, including seven volunteers. Sumana continued to follow up on contacts and recruit new contributors to the Wikimedia tech community (especially for commit and patch review), and mentor new contributors. Sumana also prepared for the February Pune hackathon and the May hackathon organized by Wikimedia Germany, introduced a friendly space policy for WMF technical events, and recruited participants for upcoming events.
 * MediaWiki architecture document — This project is now completed; Guillaume Paumier did some final polishing of the text in collaboration with the book editors. We expect the Architecture of Open Source Applications book to come out in March 2012.
 * Wikimedia blog maintenance — The new WMBlog plugin (which brings functionality specific to the Wikimedia blog independently of the theme) was deployed in January, as well as tweaks to the theme (//github.com/gpaumier/WP-Victor/). Due to the SOPA/PIPA blackout-related traffic, the Operations team moved the blog to a newer, more powerful server and added caching layers (Varnish & Memcached), and Guillaume Paumier fixed the blog's theme to paginate comments. Guillaume also fixed issues with attachment pages, and created a custom "meta" widget that adds a link to the Wikimedia blog guidelines to the WordPress standard meta widget.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.