Wikimedia Engineering/Report/2013/August

Engineering metrics in August:
 * 122 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from around 1280 to about 1080.
 * About 36 shell requests were processed.
 * Wikimedia Labs now hosts 171 projects and 1,743 users; to date 2,212 instances have been created.
 * The tools project in Labs now hosts 256 tools and 300 members.

Major news in August includes:
 * A discussion about using the secure HTTP protocol on Wikimedia sites, followed by a switch to that protocol for all registered users;
 * The launch of the Notifications feature on the mobile site;
 * A discussion about how security issues are handled in our community;
 * The Wikimania conference, which was notably an opportunity for the Language engineering team to meet with users and improve language support, particularly for the Javanese language;
 * A much-anticipated upgrade of the software used by our volunteer e-mail response team, OTRS.

Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Technical Operations
Data Dumps
 * All dumps ran from the Ashburn data center this month; only the miscellaneous and experimental services remain to be moved. GSoC student Petr Onderka completed the first code for producing incremental dumps, along with a draft specification for the new format. Test it out and let us know what you think!

Wikimedia Labs
 * Because of Wikimania and staff vacations, this month saw relatively few infrastructure changes, but a relatively high influx of users and tools. We ran three workshops during Wikimania and helped Toolserver users migrate their tools to Labs. There were still a few infrastructure changes: a change to the service group interface was merged but not yet deployed; it removes the service group interface from the project interface, reducing clutter. An API for project and service group information was added, to make that information available from Wikitech rather than from LDAP. The other infrastructure changes were bug fixes, which can be found in Bugzilla.

OTRS
 * OTRS received a long-overdue update to version 3.2.9, with the generous volunteer support of Martin and Marcel of Znuny GmbH. As part of the upgrade, the service was migrated from pmtpa to eqiad, and spam filtering was overhauled.

Editor retention: Editing tools
VisualEditor
In August, the VisualEditor team continued its work, and gave presentations and ran workshops at Wikimania in Hong Kong to discuss how best to improve the system. The deployed version of the code was updated three times (1.22-wmf13, 1.22-wmf14 and 1.22-wmf15), with several additional releases between deployments to patch urgent issues. The work focused on improving the stability and performance of the system, fixing a number of bugs uncovered by the community, and making some usability improvements.

Parsoid
In August, the Parsoid team continued to polish compatibility with existing wikitext. User feedback after the July VisualEditor release was instrumental in identifying issues and developing support for important use cases of creative templating.

The increased team size also allowed us to perform some long-standing code cleanup, make Parsoid compatible with Node 0.10, and improve testing. The round-trip testing infrastructure received a much-needed overhaul. The storage back-end switched from SQLite to MySQL, which greatly improved throughput and lets us test new code far more quickly than before. Performance statistics are now recorded, which will let us identify performance bottlenecks and catch performance regressions.

During Wikimania, the Kiwix team used Parsoid output to create an offline copy of Wikivoyage. With standard HTML libraries and the rich RDFa information in the Parsoid DOM, downloading and modifying the HTML representation was done in about 1000 lines of JavaScript.
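As a rough illustration of why the RDFa markup makes this kind of processing easy, here is a minimal sketch that uses only Python's standard html.parser module to pull internal links and template transclusions out of Parsoid-style HTML. It assumes Parsoid's conventions of marking internal links with rel="mw:WikiLink" and template output with typeof="mw:Transclusion"; the class and function names and the sample HTML are hypothetical, and this is not the Kiwix team's code.

```python
from html.parser import HTMLParser


class ParsoidLinkCollector(HTMLParser):
    """Hypothetical helper: collects wiki links and transclusion markers
    from Parsoid-style HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.wiki_links = []    # href values of <a rel="mw:WikiLink">
        self.transclusions = 0  # number of elements typed mw:Transclusion

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # RDFa attributes can hold several space-separated values,
        # and a valueless attribute arrives as None.
        rels = (attr.get("rel") or "").split()
        types = (attr.get("typeof") or "").split()
        if tag == "a" and "mw:WikiLink" in rels:
            self.wiki_links.append(attr.get("href"))
        if "mw:Transclusion" in types:
            self.transclusions += 1


def collect(html):
    """Return (list of wiki-link hrefs, count of transclusions)."""
    parser = ParsoidLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.wiki_links, parser.transclusions


# Invented sample in the style of Parsoid output.
sample = (
    '<p><a rel="mw:WikiLink" href="./Hong_Kong">Hong Kong</a> '
    '<span typeof="mw:Transclusion" data-mw="{}">...</span> '
    '<a rel="mw:ExtLink" href="http://example.org">ext</a></p>'
)
links, templates = collect(sample)
print(links, templates)  # ['./Hong_Kong'] 1
```

An offline exporter could hook into the same places to rewrite the collected link targets so they point at local files; the external link is ignored because it carries rel="mw:ExtLink" rather than rel="mw:WikiLink".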

Editor engagement features
Echo (Notifications)
In August, we released Notifications on the French, Hungarian, Polish, Portuguese and Swedish Wikipedias, after extensive testing on the English Wikipedia, mediawiki.org and Meta-Wiki. This engagement tool was well received by our new communities, especially social features such as Mentions and Thanks, which enable users to communicate more effectively than before. Benny Situ led the engineering work for this deployment and fixed a number of bugs, with the help of Erik Bernhardson and Matthias Mullie. Fabrice Florin managed community relations for these new releases, updating the release plan and reaching out to more projects to prepare for worldwide deployment on all wiki projects in coming months. To that end, we teamed up with Philippe Beaudette, Maggie Dennis, Patrick Earley, Jan Eissfeldt, Anna Koval, Keegan Peterzell and Sherry Snyder to coordinate these releases with the communities they serve. Dario Taraborelli created new metrics dashboards for the French, Hungarian, Polish, Portuguese and Swedish Wikipedias. Lastly, we presented our work on Notifications in two talks at Wikimania 2013: a general overview and a technical presentation (see slides). We are very grateful to all our community champions for each language and look forward to more collaborations in the future. Our next major deployment to non-English Wikipedias will take place on September 17, followed by weekly releases throughout the fall, as outlined in our release plan. To learn more, visit the project portal, read the help page and join the discussion on the talk page.

Flow
In August, we continued development of the Flow prototype by implementing revisioning, moderation and display code on top of the storage and block abstractions. We deployed this prototype to an internal Labs instance to encourage the full team's involvement in development.
Additionally, we participated in an agile workshop run by Arthur and Tomasz from the mobile team. The workshop helped us plan the Flow MVP and set goals for the team's first development sprint, and provided information about agile guidelines and practices that have worked well for the mobile team.

Article feedback
In August, we made a few feature tweaks and bug fixes for the Article Feedback Tool (AFT5) on the English and French Wikipedias. Matthias Mullie released a few patches to improve the opt-in/opt-out tool, and tested new feedback notifications that let users know when feedback is marked as useful for a page they watch (or for a comment they posted). We also presented our work on AFT5 at Wikimania 2013, with designer Pau Giner and our French and German champions Benoît Evellin and Denis Barthel, in this session (see slides). The team plans to make AFT5 available later this year to other wiki projects interested in testing it, as outlined in the release plan.

Editor engagement experiments
In August, the Editor Engagement Experiments team (E3) primarily focused on development for its next and final A/B test of the Getting Started task suggestion system, part of a project aimed at onboarding new Wikipedians. The team also worked on enhancements and bug fixes for the GuidedTour extension, such as the ability to customize default tour actions and better integration with VisualEditor.

During part of August, most of the E3 team was at Wikimania 2013 in Hong Kong, where they delivered three talks: on guided tours, on the team's new editor onboarding process, and on product management at the Wikimedia Foundation.

Mobile
Wikimedia Apps/Commons
This month, the Mobile Apps team pushed out additional releases of the Commons photo uploader app for iOS and Android. The iOS version includes a major UI revamp by Monte, while the Android version received multiple incremental updates by Yuvi and Brion. Yuvi has been working on modernizing support for campaigns in UploadWizard, which will make it easier to coordinate uploads for events like Wiki Loves Monuments. Viewer, contributor and admin user interfaces for campaigns will come to the web, with campaign-tied uploading on the web and in the mobile app. The team also started planning the next generation of the Wikipedia reader app, which will be more closely integrated with the mobile website so that new features are always available through a web view, even where we don't build specific native support. More details will be shared in the next couple of months.

Wikipedia Zero
This month, the team completed version 1 of the Wikipedia Zero automation tests, continued programming the re-architecture of Wikipedia Zero, implemented search engine non-indexing, and analyzed HTTPS requirements in support of the push for greater use of HTTPS across Wikimedia projects. The Wikipedia Zero engineering team thanks Amit Kapoor from the Wikipedia Zero partnerships team, who wrapped up his work with the Wikimedia Foundation this month, for his hard work getting the program off the ground. The team is also pleased to welcome Carolynne Schloeder, who joins the Wikipedia Zero program as Director of Mobile, Programs.
Mobile web projects
This month we continued to improve the mobile editing feature, monitoring and triaging bugs and expanding the feature to the section level of articles. We also released the first iteration of mobile notifications to projects where Echo is enabled (the English, French, Polish, Portuguese, Hungarian and Swedish Wikipedias, as well as mediawiki.org and Meta-Wiki). In beta, we built a new notifications treatment to be released in coming months and continued working on mobile talk pages.

Language engineering
Language tools
The language team continued maintenance of the UniversalLanguageSelector, in particular improving performance and integration testing, and completed its integration with EventLogging, which will provide metrics useful for, e.g., choosing the best default font for a language. Counts from translatewiki.net are live, and a deployment plan for Wikimedia projects is under analysis. The team also released its monthly version of the MediaWiki Language Extension Bundle (MLEB), which third-party developers and community members use to add language support to their MediaWiki installations.

The team continued mentoring four Google Summer of Code (GSoC) students. Praveen Singh, mentored by Santhosh Thottingal, released a Chrome extension for Wikimedia Input Tools and contributed to the Indic Font Specification, a collaborative open source project. Team members also continued to work with Red Hat on various language initiatives.

The team participated in Wikimania in Hong Kong, which was an opportunity to meet face to face and to work with Wikipedians and community members on a variety of issues, including handling Chinese language variants and adding language assets for Javanese. Team members also gave several talks on language engineering.

MediaWiki Core
Multimedia
In August, we continued to expand our multimedia team: Bryan Davis joined us as senior platform engineer, working with product manager Fabrice Florin, front-end engineer Mark Holmquist and engineering director Rob Lanphier. We discussed multimedia plans and new feature ideas with community members at two separate events, a multimedia roundtable at Wikimania 2013 and an IRC chat, and updated our multimedia plan for the coming year based on their feedback (see slides). Summer contractor Brian Wolff completed development of new gallery tags that support more appealing thumbnail layouts, while Jan Gerber made improvements to the Score and TimedMediaHandler extensions. Mark Holmquist started development of the Media Viewer, based on designs by May Tee-Galloway and Jared Zimmerman; this new tool will display images at a larger size when users click on article thumbnails, along with file information and a full-screen viewing option, right on the same page. We aim to test a first version of this tool as a beta experiment on a few pilot sites in coming weeks. To discuss these features and keep up with our work, please join the new multimedia mailing list. Last but not least, we are recruiting for two more positions on our team: a multimedia systems engineer and a senior software engineer. Please spread the word about this unique opportunity to create a richer multimedia experience for Wikipedia and MediaWiki sites!
Search
In August, we deployed CirrusSearch to test2.wikipedia.org and mediawiki.org, where we are testing it. We're actively looking for other volunteers to help test CirrusSearch. For now, CirrusSearch is not the primary search engine on mediawiki.org; you have to use a URL parameter to try it. We hope to make it the primary search there in September.

Auth systems
The team deployed OAuth to mediawiki.org on August 20 and is working on enhancement requests before the extension is enabled on the rest of the WMF wikis. Several small bugs were fixed in SUL.

Security auditing and response
The team responded to reported issues and prepared for the next MediaWiki release, scheduled for September 3. We worked with Operations to enable HTTPS for user logins in most geographies.

Quality assurance
Quality Assurance
This month, QA began collaborating closely with Release Engineering to improve reporting, monitoring and testing of software releases. Our goal is to make our frequent software releases even more reliable than they already are, and to use the tools and systems already in place, such as the beta labs cluster, to make those reliable releases even more frequent.

Quality Assurance/Browser testing
This month saw a significant change to the structure and organization of browser tests: following the example of MobileFrontend, tests and builds for CirrusSearch, UniversalLanguageSelector and VisualEditor now reside in the git repositories for those extensions rather than in the /qa/browsertests repository. This allows more frequent and more accurate Jenkins builds of the tests, while also reducing the overhead of analyzing test failures.

Engineering community team
Bug management
Andre gave presentations at Wikimania 2013 on "Improving MediaWiki quality: How everybody can help with bug report triaging" and "Transparency and collaboration in Wikimedia engineering". He updated Bugzilla's technical documentation and documented how to test Bugzilla code changes on the Wikimedia Labs instance. Bugzilla now consistently links to canonical pages explaining how to write a good bug report and what Bugzilla's UI fields mean. Bugzilla also shows a new "Show other bugs" link next to the "Component" field to make finding similar reports easier. Andre cleaned up his Greasemonkey triage helper scripts by providing a setting for each feature at the beginning of the file; a blog post provides more details. Several patches were deployed for testing on Bugzilla's test instance on Wikimedia Labs and, after further testing, should reach the live Bugzilla; changes include showing the history of a bug report inline between the comments, and configuring the guided bug entry form for users who are new to bug reporting.

Mentorship programs
The 20 Google Summer of Code projects passed the official mid-term evaluation at the beginning of August, and the Outreach Program for Women project is on track as well.
Katie Filbert (Aude), David Cuenca (Micru) and Quim Gil (Qgil) will attend the GSoC Mentors Summit in Mountain View (CA, USA) on October 19-20.

Monthly reports from the projects:
 * Refactoring of ProofreadPage extension
 * Section handling in Semantic forms
 * jQuery.IME extensions for Firefox and Chrome
 * Android app for MediaWiki translation
 * Mobilizing Wikidata
 * Improve support for book structures
 * Incremental data dumps
 * Language Coverage Matrix Dashboard
 * Internationalization and Right-To-Left Support in VisualEditor
 * Browser test automation for Visual Editor
 * VisualEditor plugin for source code
 * UploadWizard: Book upload customization
 * Prototyping inline comments
 * Improvement of glossary tools
 * Incremental updates for Kiwix
 * Pronunciation Recording Tool
 * Bayesian Spam Filter
 * Wikidata language fallback and conversion

Technical communications
Guillaume Paumier continued to focus on the VisualEditor deployment effort, working on communications and documentation and liaising with the French Wikipedia. Other technical communications work mostly focused on perennial activities like Tech News and ongoing communications support for the engineering staff.

Volunteer coordination and outreach
We had a team presentation at Wikimania, "Transparency and collaboration in Wikimedia engineering", explaining how volunteers can make a difference. Following the work on Community metrics, five key performance indicators (KPIs) were discussed and agreed upon; we are focusing on the first one: who contributes code. A list of Key Wikimedia software projects has been created to define the scope of these KPIs. Recruiting automated browser testers remains our top priority. We are organizing the next workshop, in San Francisco and online, on September 18: "Epic fail: figuring out Selenium test results".

Analytics
Analytics/Infrastructure
We continue to pursue the initiatives listed in our planning document. One analyst has accepted a job offer (welcome Aaron!) and we are in discussions with a software engineer. Our candidate pipeline remains solid and we are spending a lot of time interviewing. Wikimetrics is on target for an early September release, and we've made good progress against our Hadoop infrastructure goals. In cooperation with Ops, we completed the reinstall of the Hadoop cluster and ran several days of reliability testing over the Labor Day weekend. We are currently investigating replacing the Oracle JDK with OpenJDK, in line with our goal of using open source whenever possible. Our project to replace udp2log with Kafka is making steady progress. Varnishkafka, which will replace varnishncsa, has been debianized, and the first performance tests of compressing message sets are very encouraging. We created a test environment in Labs to test Kafka failover modes, and we have been prototyping with Camus to consume data from a broker and write it to HDFS. We are now considering how to set up Kafka in a multi-data-center environment. The ZooKeeper servers have also been reinstalled through Puppet.
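To illustrate why compressing whole message sets tends to pay off, the sketch below compares compressing log lines one at a time with compressing them as a single batch, in the spirit of how Kafka compresses a message set as a unit. It is a simplified model built on stated assumptions: zlib stands in for Kafka's gzip or Snappy codecs, the log lines are invented, and this is not varnishkafka code.

```python
import zlib

# Invented access-log lines; real varnishkafka output uses a different format.
messages = [
    f"cp1001 2013-08-15T12:00:{i:02d} GET "
    f"http://en.wikipedia.org/wiki/Main_Page 200 text/html".encode()
    for i in range(60)
]


def compressed_individually(msgs):
    """Total size when each message is compressed on its own."""
    return sum(len(zlib.compress(m)) for m in msgs)


def compressed_as_set(msgs):
    """Size when the whole batch (a 'message set') is compressed together."""
    return len(zlib.compress(b"\n".join(msgs)))


raw = sum(len(m) for m in messages)
individual = compressed_individually(messages)
batched = compressed_as_set(messages)
print(raw, individual, batched)
```

Because consecutive request-log lines share most of their text, the batched size comes out far smaller than the sum of the individually compressed sizes; actual savings on production traffic would depend on the real log format and batch size.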
Analytics/Visualization, Reporting & Applications
In close collaboration with Dario, Jaime and Jessie, we have worked on new features for Wikimetrics. In particular, we are adding new metrics such as survival and pages created, as well as aggregation of metrics, metadata in the CSV output and a support page; test coverage of the codebase now exceeds 90%. In preparation for the reinstallation of the Hadoop cluster, we moved all Wikipedia Zero jobs off the cluster, and took this opportunity to add monitoring to the creation of Wikipedia Zero dashboards. We worked with Wikipedia Zero to identify a problem with the geolocation of requests that has caused large jumps in total traffic. We spent considerable time creating a more robust process for updating and monitoring gp.wmflabs.org. This dashboard is used by various internal stakeholders and receives its information from different data streams via different scripts. We have been working on running these scripts under the general-purpose stats user, have added monitoring to prevent stale data, and have puppetized some of the jobs.
Analytics/Data Releases
In August, we attended WikiSym and Wikimania. Dario Taraborelli gave a keynote address on actionable Wikipedia research at WikiSym, where several other Wikipedia research papers were presented. At Wikimania, we hosted two sessions focused on Wikimedia data and analytics tools. We also worked with Platform engineering this month to analyze and visualize HTTPS failure rates by country, in preparation for the switch to HTTPS as a default. We released new dashboards for the launch of Notifications on five more Wikipedias and continued to provide ad hoc support to teams in Editor Engagement. Lastly, we continued screening and interviewing candidates for an open research analyst position.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * Release of the new MediaWiki offliner was slightly delayed; we are still fixing stability bugs. This solution has already proven its efficiency: we released 20 new ZIM files this month, a new throughput record. The ZIM incremental update GSoC project is also progressing, as the student works on integrating zimdiff/zimpatch into the Kiwix ecosystem. Kiwix developers held a six-day hackathon in Hong Kong to prepare the next Kiwix release, after some final work on compilation.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * In August, the Wikidata team attended three events: COSCUP, Wikimania and a meetup about Wikidata and Incubator. A lot of work went into improving the API and its documentation. The team also worked on the ability to reorder qualifiers and sources in a statement, slightly improved the speed of Wikidata, and made progress on the ability to query for statements with a specific property and value, as well as on merging items. An improved proposal for supporting Wiktionary has been published. They also started the paper cuts initiative to find and fix small bugs that have a large impact on how enjoyable Wikidata is to use. Denny and Adam gave a short overview of the state of Wikidata and answered questions during an IRC office hour. The biggest news in August, though, was the activation of data access (Wikidata phase 2) on Wikivoyage.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.