Wikimedia Engineering/Report/2014/December

Major news in December includes:
 * the third version of the Content translation tool;
 * a look at collaboration practices within the mobile web engineering team;
 * lead images prominently featured in the Beta version of the Wikipedia for Android app;
 * the migration of the Technical Operations team from RT to Phabricator;
 * a retrospective on how the deployment of HHVM improved editing performance on Wikimedia sites.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

 * Director of Engineering
 * Senior Software Engineer - Services
 * Software Engineer - Flow (Front-end)
 * Software Engineer - Mobile - Android
 * Application Security Engineer
 * Full Stack Developer - Analytics
 * Agile Coach
 * Scrum Master
 * Senior Technical Product Manager
 * Community Liaison
 * UX Senior Designer
 * UX Senior Design Researcher
 * UX Visual Design Fellowship

Announcements

 * Nirzar Pangarkar joined the User Experience team as User Experience Designer (announcement).

Technical Operations
Labs metrics in December:
 * Number of projects: 155
 * Number of instances: 437
 * Amount of RAM in use (in MBs): 2,147,840
 * Amount of allocated storage (in GBs): 21,745
 * Number of virtual CPUs in use: 1,059
 * Number of users: 4,599

Tool metrics:
 * Number of tools: 1,002
 * Number of tool maintainers: 559

Wikimedia Labs
 * We've added two new large servers to the Labs virtualization cluster in eqiad. This increases our capacity by about 40%; Andrew is now in the process of rebalancing load among servers and allocating new resources to projects that need them.
 * Yuvi has made some progress with an intermittent DNS issue. With luck this will be more stable now.
 * The toolserver is now officially decommissioned.

Editor retention: Editing tools
VisualEditor
In December, the team working on VisualEditor introduced a new design for the front-end system that VisualEditor uses, improved several existing features, and resolved over 80 tasks, bugs and requests.

The new design of the interface is based on the future path set by the Wikimedia User Experience team, and is intended to gel with designs across Wikimedia's tools and services and the wider direction of travel for the Web as a whole. The interface is subject to revision and improvements, and nothing is set in stone; feedback is appreciated from users and designers alike.

We also made a number of improvements to language editing and browser stability, progress on providing a new auto-filled citations tool, and improvements to the link editing and media searching tools, all of which will be coming in the near future.

The deployed version of the code was updated three times in the regular release cycle (1.25-wmf11, 1.25-wmf12 and 1.25-wmf13).

Editing
In December, the Editing team continued their work on the front-end standardisation project and VisualEditor, both of which are reported separately. The team landed a key new function in the ResourceLoader library used inside MediaWiki core, which converts icons for use in the interface automatically based on browser capabilities and language. The team also provided the OOjs UI library inside MediaWiki vendor for the first time. The TemplateData extension has been completely rewritten and now uses OOjs UI rather than jQuery UI. The team made a number of improvements to the forthcoming citoid service, including support for PMIDs. The team also supported a number of other teams, with some work on each of the MoodBar, GlobalBlocking, ContentTranslation, MobileFrontend, LastModified, and UrlShortener projects.

Parsoid
In December, we finished work on supporting templates that generate attributes of a table as well as its content, which do not fit well within the DOM-based model that Parsoid works with. We also improved error reporting for images, which lets clients like Flow and VisualEditor handle them better. To reduce the size of the HTML that clients need to load and parse, we stripped the private data-parsoid attribute from templated content, since it is unnecessary there. We also continued cleaning up code and paying back technical debt. Specifically, we made a number of fixes to our nowiki handling when HTML is serialized to wikitext: we improved robustness and correctness, and reduced the number of scenarios where nowikis are needed for quotes. We also made it simpler to detect nowiki scenarios for other wikitext constructs, and applied this to links of all flavors.

Collaboration
Flow/Project information
In December, the Collaboration team completed work on the first iteration of Flow's Table of Contents feature, for release in early January. Catalan Wikipedia now uses Flow on its new Village Pump/Technical page, and we're working on feature requests that will get Flow ready for further rollout on both Catalan and French Wikipedia.

Mobile
Mobile web projects
This month, the mobile web team released the first test of WikiGrok to the full mobile site. The test ran for one week on articles about people and music albums on English Wikipedia, for logged-in users only. The test was intended to assess the level of engagement that users have with WikiGrok and the quality of their responses. Initial data indicate that the quality of submissions from logged-in users, both those who have edited in the past and those with no edits, is high: around 80% of a sample of hand-coded responses were correct. Given these findings, the team will begin to send WikiGrok contributions to Wikidata and run a test with logged-out users in the coming months.

Language Engineering
Language tools
 * UniversalLanguageSelector:
   * Made the width of the ULS panel configurable according to the number of languages.
   * Removed the map, since analytics showed that very few people click it.
   * Improved the visual design of the Compact Language Links beta feature (done by Niharika Kohli, design and review by Pau Giner and Santhosh Thottingal).
   * Improved the performance of language search (done by Thiemo Mättig, review by Santhosh Thottingal).
 * Added the ability to rely on default configuration when loading messages in jquery.i18n.
 * Positioning fixes for the Input Method selector.
 * Translate:
   * Multiple fixes to translation memory.

Content translation
 * Refactored and redesigned the article and language selector using ULS. (This required changes in the design of ULS itself.)
 * Multiple fixes in section alignment in the translation interface.
 * Warning about possible overwriting of the translated page (same title, same topic).
 * Show license text on entry points.
 * Disabled all ContentTranslation features unless the user enabled the beta feature.
 * Automatic draft translation saving (one of the most requested features from user feedback sessions).
 * Multiple other minor bug fixes.
 * Preparation for deployment in January: configuration, puppetization, etc.

MediaWiki Core
Search
In December we wound down the search project after successfully deploying CirrusSearch to enwiki. We fixed a few bugs and did some work on preventing very complex queries from putting undue load on the servers. We don't plan to do any work on this project in January.

Wikibase/Indexing
In December we picked a backend for the query engine. Unfortunately, in early February that choice became less attractive when its primary developers were snapped up by another company and said they would have "less time" to work on it. Sad. We've reopened the investigation into backends.

SUL finalisation
Special:GlobalRenameRequest and Special:GlobalRenameQueue have been tested and are ready for production; Special:GlobalUserMerge continues to be problematic and buggy. An upstream HHVM bug that showed up in late December caused us to turn off Special:MergeAccount, a vital function for globalizing unattached accounts. This bug is a blocker until it gets fixed (estimated late January).

Library infrastructure for MediaWiki
Work on integrating XHProf as the preferred method of profiling an individual request has largely been completed. Ori Livneh has also built a parallel infrastructure for use on the Wikimedia Foundation servers that profiles HHVM using an extension named Xenon. The flame graph output of the Xenon-based reporting can be seen at performance.wikimedia.org and the scripts that power it are available on GitHub.

Documentation on the addition of external libraries to MediaWiki core is underway. An RFC on guidelines for extracting, publishing and managing libraries has also been drafted and will be discussed by the MediaWiki developer community. A blog post describing the accomplishments and next steps for the continuation of the project is being drafted as well and is expected to be published in January 2015.

Antoine Musso and Timo Tijhof are finalizing a standard practice for testing Composer managed projects using CDB as a test bed for refining the techniques.

Structured logging implementation is continuing with the use of Monolog and Redis transport being tested in the beta cluster and a subset of the production Wikimedia Foundation cluster. A new Monolog handler was written and upstreamed to Monolog to support the use case of randomly sampled log event streams. Chris Steipp also contributed a security related patch to Monolog based on a security review he did for deployment on the Wikimedia Foundation cluster.
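The idea behind a randomly sampled log event stream can be illustrated with a minimal sketch. This is not the Monolog handler mentioned above (Monolog is a PHP library); it is an analogous example built on Python's standard logging module. The names SamplingFilter, ListHandler and build_sampled_logger are hypothetical, and the in-memory list only stands in for a real transport such as Redis.

```python
import logging
import random


class SamplingFilter(logging.Filter):
    """Let through roughly one record in `factor` (hypothetical helper)."""

    def __init__(self, factor, rng=None):
        super().__init__()
        if factor < 1:
            raise ValueError("factor must be >= 1")
        self.factor = factor
        self.rng = rng if rng is not None else random.Random()

    def filter(self, record):
        # randint is inclusive on both ends, so a record passes
        # with probability 1/factor.
        return self.rng.randint(1, self.factor) == 1


class ListHandler(logging.Handler):
    """Collect records in memory; a stand-in for a real log transport."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def build_sampled_logger(name, factor, seed=None):
    """Return (logger, handler) where the handler sees a 1-in-`factor` sample."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the sample out of the root logger
    handler = ListHandler()
    handler.addFilter(SamplingFilter(factor, random.Random(seed)))
    logger.addHandler(handler)
    return logger, handler
```

With a factor of 10, logging 10,000 events should leave about 1,000 in the handler; a factor of 1 keeps every event. Attaching the filter to the handler rather than the logger means other handlers on the same logger could still receive the full stream.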

The project's "top priority" status expires with the end of the fiscal year quarter, but organic contributions and a raised awareness of the functionality offered by library extraction and integration of high-quality third-party code are expected to continue. A list of future projects has been started, and Bryan Davis and others who have participated in the project have high hopes for the transformative changes that this project has unblocked in MediaWiki.

Security auditing and response
MediaWiki 1.24.1 was released, fixing issues in core and several extensions. Reviews for kafkatee and plancake email parser were finished. During December, the WMF also participated in a security assessment of MediaWiki by iSec Partners, sponsored by the Open Technology Fund. The results will be made public in February.

Multimedia
In December 2014, the team fully migrated its tasks and workflow to Phabricator.

Active maintenance was performed on Media Viewer and CommonsMetadata, which have both seen several minor improvements. The long tail of small bugs affecting Media Viewer in particular was a source of several Google Code-in tasks, which several volunteers tackled.

UploadWizard has been the continued focus of major code cleanup and test coverage increase efforts. Bugs were fixed along the way as well.

Thumbnail chaining was deployed for a short period of time and subsequently undeployed following concerns about the over-sharpening it caused. We investigated whether thumbnail prerendering at upload time and thumbnail chaining actually improve performance. The discussion and further research are happening on our mailing list.

Significant work has gone into creating an extension to support Sentry, a tool designed to log JavaScript errors. This effort should benefit the entire organization and give engineering visibility into the live client-side errors encountered by users. Our team is working on it in order to identify client-side errors that might be happening in UploadWizard and have not yet been the subject of bug reports.

For more information about our work, join the multimedia mailing list.

Engineering Community Team
Bug management
Numerous security-related tags were added. After the migration from Bugzilla to Phabricator, many projects were created or renamed. "Blocked-on-team" projects were created as requested by Scrum teams. The MediaWiki-extensions-MWSearch and Wikimedia-lucene-search-2 projects were archived. The Future-Release milestone was removed and several other milestones archived. "Technical debt" was turned into a tag, superseding a tracking task. The MediaWiki-Javascript project and the javascript tracking task were superseded by a generic Javascript tag.

Phabricator/Migration
On December 18, Wikimedia's RT instance was migrated to Phabricator (except for the access-requests@ and procurement@ queues). RT users were added to Phabricator's WMF-NDA group. The Sprint extension was deployed, allowing teams to visualize progress. Best practices for Project management, working with workboards, working with the upstream Phabricator team, and our Phabricator Security setup were documented. The Language Engineering, Multimedia and Analytics teams started migrating to Phabricator. The Legalpad application was enabled and configured. A maintenance window for updating our instance was defined.

Mentorship programs
FOSS Outreach Program for Women (Round 9) interns started working on their projects and reporting on a weekly basis:
 * Neta Livneh: Wikipedia article translation metrics (weekly reports).
 * Roxana Necula: Wikipedia article translation metrics (weekly reports).
 * Priyanka Jayaswal: Pywikibot: Compat to core migration (weekly reports).
 * Anke Nowottne: Wikipedia Education Program need-finding research (weekly reports).
 * Ankita Shukla: Collaborative spelling dictionary building tool (weekly reports).
 * Manpreet Kaur: Extending PyWikiBot support to sites on IWM (weekly reports).

Christy Okpo's project "Improving the Wikimedia Performance Portal" had a delayed start due to a last-minute problem finding an available mentor. She finally started working on a metrics project for Parsoid (weekly reports).

Wikimedia's participation in Google Code-in 2014 during December progressed as planned, with the participation of several students and mentors.

Volunteer coordination and outreach
In October we had a successful week-long MediaWiki Core offsite in San Diego. Currently we are focusing most of our energy on planning for the MediaWiki Developer Summit 2015 (January 26-27, 2015). We are also slowly ramping up work on the French Hackathon in Lyon, which will take place on May 23-25. Team offsites are coming up in January: Ops (1-day offsite) and Team Practices (3-day offsite).

Analytics
Analytics/Research and Data
We gave a presentation at the December 2014 Monthly Metrics meeting, highlighting growth and decline in pageviews to Wikimedia projects across different global areas and by access type. We continued to provide research support to the English fundraising campaign throughout the month, and to the Mobile team with code instrumentation, experimental design and data analysis.

We kicked off a collaboration with a research team at Stanford University (Bob West and Jure Leskovec) aimed at understanding and improving link coverage in Wikipedia across languages. In collaboration with this team, we submitted a workshop proposal to ICWSM '15 that was accepted into the conference program; the workshop will be held in Oxford on May 26, 2015.

We kicked off a new IEG-funded project to bring a machine learning based revision scoring service to Wikimedia Labs with User:とある白い猫 and User:He7d3r.

We published an overview of the most edited articles in 2014 (press). We published an open data corpus from the Article Feedback pilot in collaboration with the Product team. The corpus contains over 1.5 million messages posted on three major Wikipedias (English, French, German) over the course of a year.

We hosted our monthly showcase with a presentation on mobile readership by Oliver Keyes and a guest talk on flu forecasting with Wikipedia data by Reid Priedhorsky (Los Alamos National Laboratory).

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * Google Code-in has started and we are part of it; we are offering a good dozen tasks related to Kiwix for Android.
 * Two years after 0.9rc2, we have finally released version 0.9 of Kiwix for PCs (Windows, OSX, Linux). This new release brings a lot of small improvements and bug fixes which should noticeably improve the overall user experience.
 * Most of our resources this month went into improving mwoffliner, our solution for building a ZIM file (offline version) of any MediaWiki. These efforts to improve performance, features and usability are starting to pay off: we built more than 400 ZIM files this month (and the trend is increasing), and third-party projects are starting to use it to provide their own ZIM files.
 * On the operational side, we have started to set up a ZIM building farm in Wikimedia Labs. Work is ongoing, but the goal is to soon be able to generate ZIM files for all our projects at least once a month.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * In December the Wikidata team released a number of new features:
   * language fallbacks (so you see information in a different language if it is not available in your primary language)
   * statements on properties (this is important for mapping our properties to other databases, for example)
   * a new datatype for linking to properties (this is necessary to say that one property is the inverse of another, for example "father" and "child")
   * performance improvements.
 * In addition we provided support for the work on query functionality for Wikidata. Your input is requested for a new feature: article placeholders powered by Wikidata.


 * It was announced that Freebase will be shut down in favor of Wikidata. Lydia wrote a blog post about scaling Wikidata over the next year.
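
The language fallback behaviour mentioned among the new Wikidata features can be illustrated with a minimal sketch: walk an ordered list of languages and return the first one that has a label. The function name resolve_label, the fallback chain and the sample labels are all made up for illustration; the actual fallback rules used by Wikidata are not described in this report and are likely more involved.

```python
def resolve_label(labels, fallback_chain):
    """Return (language, text) for the first language in the chain with a label.

    `labels` maps language codes to label text; `fallback_chain` lists
    language codes in order of preference, primary language first.
    Returns None when no language in the chain has a non-empty label.
    """
    for lang in fallback_chain:
        text = labels.get(lang)
        if text:
            return lang, text
    return None


# Illustrative data only: no label exists in the reader's primary language,
# so the next language in the chain is used instead.
sample_labels = {"de": "Beispiel", "en": "example"}
print(resolve_label(sample_labels, ["gsw", "de", "en"]))
```

Returning the language code along with the text lets an interface indicate that the label shown is not in the reader's primary language.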

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.