Wikimedia Engineering/Report/2013/September

Engineering metrics in September:
 * 135 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from around 1080 to about 1020.
 * About 29 shell requests were processed.
 * Wikimedia Labs now hosts 173 projects and 1,848 users; to date 2,305 instances have been created.
 * The tools project in Labs now hosts 325 tools and 266 members.

Major news in September includes:
 * A recap on how our engineers worked with volunteers to improve language tools at Wikimania;
 * A call for wikis willing to experiment with using HTTPS for all users;
 * A recap on how our new image scaling system was implemented by a volunteer developer;
 * A call for technical projects that could for instance be completed as part of our mentorship programs;
 * Design experiments to show the community behind Wikipedia articles on mobile devices;
 * Another release of the MediaWiki Language Extension Bundle, with an explanation of how it's put together;
 * The completion of the sixth round of the Outreach Program for Women;
 * A recap of the launch of Notifications to more language versions of Wikipedia, and their impact.

Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Kartik Mistry joined the Language Engineering team as Software Engineer (announcement).
 * Sucheta Ghoshal joined the Language Engineering team as Associate Software Engineer (announcement).
 * Kaity Hammerstein joined the User experience team as Associate UX Designer (announcement).
 * Aaron Halfaker joined the Analytics team as Research Analyst (announcement).
 * Oliver Keyes transitioned to the role of Product Analyst (announcement).
 * Dan Garry joined the Product development team as Associate Product Manager for Platform (announcement).
 * Nick Wilson joined the Product development team as Community Liaison (announcement).

Technical Operations
Site infrastructure
 * Work to refactor and modularize our Puppet repository continues: this month, lots of dead code was removed, and some tiny miscellaneous classes aggregated with more relevant components. Work on git-deploy has also restarted this month. Changes were made to make git-deploy easier to configure and to make initial setup of new repositories and setup of new minion targets completely automated.
 * Many of the services within the Tampa data center have already migrated to EQIAD; however, several smaller, unique, or in some cases orphaned services remain that we still need to document or scope before this center closes. Several of these services may no longer be required, and we expect some discussion about how they are migrated or maintained going forward. Additionally, several of these systems need to be moved to the new secondary data center (see below), and will wait until infrastructure is in place to do so. Our goal is to have these systems moved before the end of 2013, but we'll continue to have equipment in this location for as long as necessary to ensure the stability of our network.
 * For EQIAD, the ordering process is underway to complete our fourth row of machines, and ensure we have capacity to take in systems that will be arriving from the sunsetting of the Tampa data center, as well as handle our expected growth.
 * For ULSFO, after a long initial setup, initial bootstrapping and configuration of the systems is finally underway. Over the next several weeks, we will be configuring, testing, and redirecting traffic at this location.
 * Lastly, work has begun on a definition and RFP process for a new, secondary data center, likely on the west coast of the US. We will send further updates on this project once our RFPs are complete and we begin the selection process. Our hope is to have this facility ready to take systems from Tampa by the end of 2013.

Data Dumps
 * The GSoC incremental dumps project has drawn to a close, but User:Svick will still be around. There's work to be done before this can go into production, as well as extensive testing and code review from folks with C++ expertise. If you want to help, check the repository.

Wikimedia Labs
 * The DNS infrastructure of Labs has been overhauled and much improved. The switch to new hardware, replacing the unreliable hardware of Labs' NFS server, is ready and should be enabled this week. Yuvaraj Pandian has created and deployed a new instance proxy with an OpenStack-style API. The new proxy is in use for a small number of instances right now, but will be expanded to most instances in the future. It uses nginx with Lua code to read its configuration of virtual hosts from redis, and can route arbitrary URLs to arbitrary back-ends.
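
The routing idea behind such a proxy can be sketched in a few lines. The following is only an illustration in Python, not the proxy's actual Lua code: a plain dictionary stands in for the redis store, and the host names, path prefixes, back-end addresses and the `resolve_backend` helper are all hypothetical. It assumes that each virtual host maps URL path prefixes to back-ends and that the longest matching prefix wins.

```python
from typing import Dict, Optional

# Stand-in for the redis store (assumption): virtual host -> {path prefix -> back-end}.
# All names and addresses below are made up for illustration.
ROUTES: Dict[str, Dict[str, str]] = {
    "tool-a.example.org": {"/": "10.0.0.5:8080", "/api/": "10.0.0.5:9000"},
    "tool-b.example.org": {"/": "10.0.0.6:8000"},
}


def resolve_backend(
    routes: Dict[str, Dict[str, str]], host: str, path: str
) -> Optional[str]:
    """Return the back-end for a request, or None if no route matches."""
    # Normalize the Host header: drop any port and ignore case.
    name = host.split(":", 1)[0].strip().lower()
    prefixes = routes.get(name, {})
    # Keep every configured prefix that the request path starts with.
    matches = [prefix for prefix in prefixes if path.startswith(prefix)]
    if not matches:
        return None
    # The longest (most specific) prefix decides the back-end.
    return prefixes[max(matches, key=len)]
```

With this table, a request for `/api/v1/x` on `tool-a.example.org` resolves to the more specific `/api/` back-end, while an unknown host resolves to `None`, which a proxy would typically turn into an error response.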

Editor retention: Editing tools
VisualEditor
In September, the VisualEditor team continued their work to improve the editor and roll it out to additional wikis. The deployed version of the code was updated four times (1.22-wmf16, 1.22-wmf17, 1.22-wmf18 and 1.22-wmf19). The focus of the team's work this month was to continue to improve the stability and performance of the system, fix a number of bugs uncovered by the community, and make some usability improvements.

Parsoid
We fixed a few bugs reported in production, added performance stats to our RT-testing framework (and discovered and fixed a couple of bugs as a result) and did some long-standing cleanup work in our codebase. September also saw the all-staff meeting at the WMF offices in San Francisco, which gave us the opportunity to work in person and discuss some proposals. We planned out an implementation strategy for language variant support, and started researching and experimenting with HTML storage options, which are required for a number of projects in our roadmap.

Core Features
Echo (Notifications)
In September, we released Notifications on more Wikipedias, including the Dutch, Hebrew, Japanese, Korean, Spanish, Ukrainian and Vietnamese editions. Fabrice Florin and Keegan Peterzell managed community relations for these new releases, and are reaching out to more projects. Our next deployments will take place every other Tuesday. Developer Benny Situ was responsible for these deployments and fixed a number of bugs, with the help of Erik Bernhardson and Matthias Mullie. Community response has been very positive so far, across languages and regions. For each release, we reached out to community members weeks in advance, inviting them to translate and discuss the tool with their peers. As a result, we have now formed productive relationships with volunteer groups in each project, and are very grateful for their generous support. To learn more, visit our project hub, read the help page and join the discussion on the talk page.

Flow Portal/Project information
This month, we continued back-end work on the Flow first release: integrating with the recent changes table (to ensure that users will be able to monitor Flow boards via the watchlist and Special:Recentchanges, in the same way they monitor wiki pages), mentions and notifications, and an early experiment with VisualEditor-enabled posting. We also kicked off a sprint to create a new visual design treatment for the board and discussions that will work across desktop and mobile platforms. We are aiming to implement this design next month, in preparation for several rounds of feedback from new and experienced users before the first on-wiki release.

Growth
In September, the Growth team (formerly known as Editor Engagement Experiments, or E3) primarily worked on the onboarding new Wikipedians project. In particular, this included the creation and deployment of two new guided tours to teach any new user how to make their first edit, using wikitext or VisualEditor. The guided tours extension was also deployed to the following language editions of Wikipedia: Catalan, Hebrew, Hungarian, Malay, Spanish, Swedish, and Ukrainian.

Along with the renaming, the team held its third Quarterly Review (minutes are available), published its 2013–2014 product goals, and shared a new job opening for two additional software engineers.

In accordance with our 2013-14 goals, the Growth team began research into modeling newcomer retention on Wikipedia, anonymous editor acquisition, and article creation improvement.

Support
2013 Wikimedia fundraiser
This month, the team mostly focused on preparing for the upcoming English fundraiser. Planning began for periodic tests throughout October, which will help determine the launch date and other aspects of our fundraising efforts in November and December.

Mobile
Wikipedia Zero
This month, the team released enhanced URL rewriting and debug flag-only Edge Side Includes (ESI) banner inclusion to production, supported the Ops implementation of dynamic MCC/MNC carrier tagging, identified web access log and user agent anomalies, further analyzed and recommended load balancer IP address-related changes in support of HTTPS requirements, and tested JavaScript-based Wikipedia Zero user interface enhancements.

Mobile web projects
In September, we mostly focused on Tutorial A/B testing, Notifications overlay in Beta, and adding campaign tracking to MobileFrontend.

MediaWiki Core
MediaWiki 1.22/Roadmap
In September, MediaWiki 1.22wmf16 through 1.22wmf19 were deployed to the production Wikimedia Foundation cluster.

Multimedia
In September, we continued to expand our multimedia team and updated our multimedia plan for the coming year (see slides). Mark Holmquist continued development on the Media Viewer to improve the image viewing experience, based on designs by May Tee-Galloway and Jared Zimmerman. We also made good progress on the Beta Features project, which will invite users to test, give feedback, and use a range of new features in real-world settings. We aim to have first beta versions of both products ready by the end of October. New employee Bryan Davis started work on several multimedia platform bugs, and summer contractor Jan Gerber completed his work on the TimedMediaHandler extension. To discuss these features and keep up with our work, we invite you to join the new multimedia mailing list. Last but not least, we are also recruiting for a senior software engineer position on our team.

Admin tools development
Although this activity is still officially on hold, several bug fixes were committed this month by community members. The GSoC project to implement a simple Bayesian Filter extension was successfully completed.

Search
In September, we expanded the new CirrusSearch back-end to a number of wikis. Italian Wiktionary, Catalan Wikipedia and English Wikisource are all running CirrusSearch now. Additionally, we deployed to all "closed" wikis. Further feature refinement and bugfixing are ongoing, with roughly 2 to 3 deployments a week.

Auth systems
The team improved the user interface of OAuth and deployed these changes to mediawiki.org and test.wikipedia.org. We hope to test and refine the extension with third-party developers, and subsequently deploy to all wikis. An initial review of Extension:OpenID was performed, and several issues were brought to the attention of the extension maintainer. Several bugs with CentralAuth/SUL were also fixed.

Security auditing and response
The team responded to reported issues, and released MediaWiki 1.21.2, 1.20.7 and 1.19.8 security releases to fix several issues in core and extensions.

Quality assurance
Quality Assurance
This month, we wrapped up Rachel Thomas' Outreach Program for Women internship successfully. Rachel helped us extend our browser test coverage of VisualEditor. Besides our ongoing collaboration with Wikimedia Foundation development projects, we are also engaging the greater community on the QA mailing list, where we discuss both code contributions and general QA topics.

Quality Assurance/Browser testing
This month saw significant improvements to both coverage and speed in our tests for VisualEditor. We are collaborating with the Language team on browser tests for the UniversalLanguageSelector extension and Translatewiki.net. We created our first tests for the new Flow feature and are in the process of supporting Flow fully in a reference test environment. We presented yet another of our ongoing series of training sessions, this one live in San Francisco.

Engineering community team
Bug management
This month, besides our usual ongoing work and numerous small fixes to Bugzilla's code and changes to taxonomy, Legoktm provided a patch to support SourceForge URLs in Bugzilla's "See Also" field, as part of moving pywikipedia bug reports from SourceForge to Bugzilla. Andre Klapper added an option to display metadata changes to a bug report (which are available via the "History" link in a Bugzilla ticket) inline, between the comments. It is now possible to distinguish the products 'MediaWiki' and 'MediaWiki extensions' in Bugzilla's search results. Furthermore, work on creating a guided bug entry form for newcomers continued.

Mentorship programs
18 out of the 20 Google Summer of Code projects have passed the program evaluation, as well as the one Outreach Program for Women project (read our announcement and blog post). These numbers are unprecedented and we have to ensure that they are not just occasional results but a trend. Wrap-up reports from the projects:
 * Browser test automation for Visual Editor
 * Internationalization and Right-To-Left Support in VisualEditor
 * Improve support for book structures
 * Section handling in Semantic Forms
 * Prototyping inline comments
 * jQuery.IME extensions for Firefox and Chrome
 * Pronunciation Recording Tool
 * Mobilize Wikidata
 * VisualEditor plugin for source code (SyntaxHighlight GeSHi support)
 * Android app for MediaWiki translation
 * Language Coverage Matrix Dashboard
 * Incremental dumps

Technical communications
Guillaume Paumier wrapped up work on supporting the deployment of VisualEditor, and resumed regular activities like preparing the Tech newsletter and ongoing communications support for the engineering staff.

Volunteer coordination and outreach
Together with XWiki and Tiki, we submitted a Wiki devroom proposal for FOSDEM, the biggest open source conference in Europe. We are also preparing a proposal for a stand, led by volunteers at the nascent Wikimedia Belgium chapter. The overall goal is to achieve a good MediaWiki & Wikimedia tech gathering in Brussels next February. We are also supporting the organization of the MediaWiki Architecture Summit in San Francisco on 23-24 January, 2014.

Analytics

 * The team has been focused on smaller but more important work items this month, including enhancements to Wikimetrics, Grantmaking and Program Development's graphing infrastructure, and fixes for some long-standing Limn bugs. On the infrastructure side, our collaboration with Ops has the Kafka middleware project moving along nicely. The all-staff meeting and travel schedules definitely impacted our throughput this month.
 * One notable accomplishment should be called out: our Hadoop environment is now 100% free software, as we swapped out a proprietary JDK for OpenJDK 7. We also spent a lot of time on our engagement processes and planning for our first combined quarterly review in October, and made significant progress on our hiring goals.

Research and data
This month, Aaron Halfaker joined the research team as a full-time employee. We started to reorganize the team structure and engagement model in coordination with the Analytics developers. We performed a survival analysis of new editors in preparation for new experiments led by the Growth team, and worked with the team to iron out the data collection and experimental design for the forthcoming iteration of GettingStarted.
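
For readers unfamiliar with survival analysis, here is a minimal sketch of one common estimator, Kaplan-Meier. The report does not say which method the team used, so the choice of estimator is an assumption, and the `kaplan_meier` function and the sample numbers are purely hypothetical. Each editor has a duration (days between registration and last observed edit) and a flag saying whether they actually stopped editing (an event) or were simply still active when the observation window ended (censored).

```python
from typing import List, Tuple


def kaplan_meier(
    durations: List[int], stopped: List[bool]
) -> List[Tuple[int, float]]:
    """Return (time, estimated survival probability) pairs at each event time."""
    # Only times at which at least one editor actually stopped change the curve.
    event_times = sorted({d for d, s in zip(durations, stopped) if s})
    survival = 1.0
    curve = []
    for t in event_times:
        # Editors still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Editors who stopped exactly at time t (censored ones are excluded).
        events = sum(1 for d, s in zip(durations, stopped) if d == t and s)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical sample: five new editors, two of them still active (censored).
sample_durations = [1, 2, 2, 3, 5]
sample_stopped = [True, True, False, True, False]
curve = kaplan_meier(sample_durations, sample_stopped)
```

The resulting curve gives, for each day on which someone stopped editing, the estimated share of newcomers still active after that day. Censored editors count as "at risk" up to their last observed day, which is what distinguishes this from a naive retention percentage.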

We worked with product owners to determine the initial research strategy for features with key releases scheduled for the next two quarters (Mobile Web, Beta Features, Multimedia, Flow, Universal Language Selector, Content translation). We started a cohort analysis of conversion rates for mobile vs desktop account registrations; the results will be published on Meta shortly.

We drafted a proposal to host tabular datasets in a dedicated namespace and solicited feedback from interested parties (particularly the Wikidata community). We also started fleshing out the Labs2 proposal, an outreach program for academic researchers and community members, launched at Wikimania 2013 in Hong Kong. We co-hosted the second IRC research office hours and prepared for the first Wikimedia research hackathon, an offline/online event to be held in various locations worldwide on November 9, 2013.

Last, we contributed to the September 2013 issue of the Wikimedia research newsletter.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * The MediaWiki offliner is now pretty stable, and its first release will happen in October. The ZIM incremental update GSoC project was successfully completed; we still need to do a little bit of work to finish the integration in the openZIM and Kiwix code bases. libzim, the openZIM reference implementation, has been packaged for Debian.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * In September, the Wikidata team mainly concentrated on the sister projects Wikimedia Commons and Wiktionary. For Wikimedia Commons, we added the ability to store interwiki links in one central location (Wikidata) together with the ones for Wikipedia and Wikivoyage. For Wiktionary, we published an analysis of all existing proposals for the integration of Wikidata and Wiktionary.
 * On Wikidata itself, we rolled out the URL datatype. For example, this allows you to provide a URL as a source of a statement. Denny Vrandečić published two blog posts about the ideas behind Wikidata: "Wikidata Quality and Quantity" and "A categorical imperative?". In addition, he shared a few thoughts on the future of Wikidata before leaving the project at the end of the month.
 * The development team is looking to hire another front-end developer experienced in JavaScript.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.