Wikimedia Engineering/Report/2014/November

Major news in November includes:
 * the release of the second version of the Content Translation tool, which heavily relies on Apertium for machine translation;
 * updates to MediaWiki's internationalization based on new CLDR data;
 * the move from Bugzilla to Phabricator as the new collaboration platform for the Wikimedia technical community.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

 * Director of Engineering
 * Senior Software Engineer - Services
 * Software Engineer - Flow (Front-end)
 * Software Engineer - Mobile - Android
 * Application Security Engineer
 * Full Stack Developer - Analytics
 * Agile Coach
 * Scrum Master
 * Senior Technical Product Manager
 * Community Liaison
 * Community Liaison (PT Contract)
 * UX Senior Designer
 * UX Senior Design Researcher
 * UX Visual Design Fellowship

Announcements

 * Andrew Garrett joined the Wikimedia Foundation as a full-time Software Engineer (announcement).
 * Yuvaraj Pandian joined the Wikimedia Technical Operations team (announcement).
 * Tracy Beasley joined the Design Research Team as Participant Recruiter (announcement).
 * James Douglas joined the Platform engineering team as part of the Services group (announcement).
 * Stas Malyshev joined the Platform engineering team as part of the MediaWiki Core group (announcement).

Technical Operations
Labs metrics in November:
 * Number of projects: 154
 * Number of instances: 440
 * Amount of RAM in use (in MBs): 2,131,456
 * Amount of allocated storage (in GBs): 21,555
 * Number of virtual CPUs in use: 1,047
 * Number of users: 4,426

Tool metrics:
 * Number of tools: 976
 * Number of tool maintainers: 543

Wikimedia Labs
 * Yuvi has officially joined the Labs team.
 * We updated the Labs OpenStack install from version 'Havana' to version 'Icehouse'.
 * LDAP (used for sign-in on many WMF services) is now decoupled from the Labs hardware. LDAP has a dedicated server in each of eqiad and codfw.
 * Hardware to expand Labs VM capacity in eqiad is now racked. Work on the OS and OpenStack install is ongoing.
 * Trusty instances can now pull SSH keys directly from LDAP, so logins (on Trusty instances) will still work in case of a shared-storage outage.
 * We now have redirects from Toolserver to Tool Labs. This is one of the last steps in sunsetting Toolserver.
 * Marc added a few experimental Trusty nodes to Tool Labs.

Editor retention: Editing tools
VisualEditor
In November, the team working on VisualEditor introduced table structure editing, improved some existing features, and fixed over 100 tasks, bugs and requests.

You can now edit the structure of a table: adding or deleting rows and columns, and performing other common tasks such as merging cells and using captions. VisualEditor now supports keyboard shortcuts like entering " " at the start of a line to make a bullet list; if you didn't mean to use the "smart" sequence, pressing undo will get back to what you typed. Most wikis now have VisualEditor available as an opt-in tool, whereas previously communities had to ask for it to be switched on.

The toolbar's menus in VisualEditor now open in a collapsed, short form, with uncommon tools only shown when requested. You can now create and edit simple "blockquoted" paragraphs for indenting. You can now use a basic editor for gallery and hieroglyphic blocks on the page. Category editing was enhanced in a few ways: adding a redirect category now adds its target, and categories without a description page are shown in red. We improved compatibility with some variations of how wikis use the Flagged Revisions system. Armenian language users now get customised bold and italic icons in the toolbar; if your language would benefit from customised icons, please contact us.

We also made progress on providing a new auto-filled citations tool, and improvements to the link editing and media searching tools, all of which will be coming in the near future.

The deployed version of the code was updated four times in the regular release cycle (1.25-wmf7, 1.25-wmf8, 1.25-wmf9 and 1.25-wmf10).

Editing
In November, the Editing team continued their work on the front-end standardisation project and VisualEditor, both of which are reported separately. The team made some improvements to the ResourceLoader library used inside MediaWiki core, as part of their wider work to bring the OOjs UI library into MediaWiki. A volunteer attempt to add high quality (SVG) versions of the toolbar icons in WikiEditor was introduced, but later removed because of some quality issues; we will be re-doing this soon. The team led continuous integration work to move the existing unit testing system for MediaWiki from production slaves to virtual boxes in Wikimedia Labs, and CI improvements for the citoid and MobileFrontend projects. The team made a number of improvements to the Vector, Monobook and Apex skins.

Parsoid
In November, the Parsoid team continued to work through the big blockers to using Parsoid HTML for read views. We made further progress in customizing the Cite extension via CSS, and started work on supporting templates that Parsoid does not handle properly yet. These templates, used on a subset of pages on Wikipedia, generate attributes of a table as well as content of the table, and do not fit well within the DOM-based model that Parsoid works with. We expect both these blockers to be lifted by early January, which significantly furthers our goal of serving read views via Parsoid's HTML. Besides this, we continued with ongoing code cleanup, maintenance, bug fixes, and regular deployments.

Core Features
Flow/Project information
In November, the Flow team completed our first conversion of LiquidThreads (LQT) pages into Flow pages, on the private Wikimedia Office wiki. The team now has the ability to turn LQT pages into Flow boards, keeping the items in history and user contributions intact, including edits made to LQT posts. OfficeWiki also has existing wiki talk pages, and the team prepared a conversion script to archive the existing pages, with a prominent link to the archives in the new Flow board headers. The conversion of all talk pages on OfficeWiki will happen in early December.

The team created a new Flow dashboard in Limn to chart usage across all projects. This is a helpful baseline for comparison when we make changes and build new features. We're also working on implementing EventLogging in Flow for the first time. Feature work included front-end work on a Table of Contents feature, and back-end work supporting an upcoming Search feature.

Mobile
Wikipedia Zero
During November 2014, the Partners engineering team continued work on the partners portal (ZeroPortal) minimum viable product ("MVP"), further improved the fidelity of zero-rated Graph extension pageview statistics relative to the legacy Limn graphs and identified further page filtering criteria, generated statistics on basic pageviews by JavaScript support level in both Wikipedia Zero and general mobile web Wikipedia, added basic scheduled runs of the portal Cucumber tests, added colorized indicator support to the Wikipedia for iOS Wikipedia Zero experience, fixed the broken "x" close button on Wikipedia Zero banners, and finalized the speedier mdot landing page and language-aware redirect code for Wikipedia Zero users. The team also consulted with partners on API usage and zero-rating implementations.

Mobile web projects
This month the team ran tests of the WikiGrok interface (versions a and b) for readers in beta, in preparation for launching a logged-in test on the stable mobile site this quarter. In order to test in production, we also expanded our question set to include simple claims that are easier to generate. To help support our continued testing and data-driven decision making, the team also overhauled many of the mobile dashboards, adding more features and functionality to the set that we monitor for improvement.

Language Engineering
Language tools
The team fixed several bugs in UniversalLanguageSelector. A bug was fixed that caused JavaScript errors on a few special pages without headings. The font size for button text was fixed to improve display on Monobook skin for Mozilla Firefox. The team also added support for the WOFF2 webfont format. Experiments revealed significant improvement in overheads. However, there are no WOFF2 webfonts in the font repository yet due to pending issues in WOFF2 font generation. 21 new languages are now supported in the language selector, and autonyms for 5 languages were updated.

The team migrated the translation memory service of the Translate extension to ElasticSearch. Thanks to WMF's ElasticSearch cluster, this migration increases the speed and reliability of the service. We have identified one issue with the suggestions, which is being fixed during December. Thanks to Chad and Nik for helping Niklas.

Last, the team also made RTL fixes in MobileFrontend and VisualEditor.

Content translation
The third version was released; it includes several enhancements and fixes. New features include a simple first version of a translation dashboard for viewing, loading and saving one's own translations, and the ability to save ongoing translations. The deployment of the Content Translation Database is currently in progress. On completion, users will be able to use the newly added dashboard and save and resume translations for unfinished articles. Collaboration continues with the Tech Ops team to prepare the tool for deployment in a production Wikipedia as a beta feature.

The Machine Translation service code was refactored to make it more extensible for other languages and translation services. As an experiment, the Yandex machine translation service was tested. Several fixes related to template adaptation were done. The language selector and the top-bar in the editing interface have been redesigned.

The fourth release is currently underway with a specific goal to prepare the tool for deployment as a beta feature in January.

MediaWiki Core
HHVM
Rollout is ongoing. The plan is to deploy HHVM on 100% of app servers by Christmas, barring unforeseen issues. Tim Starling and Giuseppe Lavagetto isolated and resolved an issue with OAuth: the Authorization header was not available to HHVM via apache_request_headers due to a mod_proxy_fcgi bug. It was resolved by reinjecting the header via a SetEnvIf directive in the Apache config.

Tim Starling isolated and fixed an API throughput issue (T758). Shelling out to tidy didn’t perform well on HHVM. It was solved by making the tidy extension for PHP compatible with HHVM (at least for essential functionality) and using that instead. This needs to be tested and fully deployed.

SUL finalisation
Special:GlobalRenameRequest and Special:GlobalRenameQueue, special pages intended to help users, stewards and global renamers process SUL finalization name changes, are just about ready to move into production on Meta. Special:GlobalUserMerge required some more work this month due to database conflicts with beta labs. Once properly tested out, GlobalUserMerge will join the other two tools on Meta to complete the kit needed to process the anticipated large number of rename requests.

One last bit of engineering is being completed in November and executed in early December: contacting existing accounts with unconfirmed email addresses to request confirmation. This will allow for additional formerly globally unattached accounts to be attached without going through any forms or process before SUL finalization takes place.

Library infrastructure for MediaWiki
Aaron Schulz and Chad Horohoe joined the team. Both are helping make updates to the Profiler classes used to measure the performance of MediaWiki related to the Better PHP profiling RFC. Profiling was identified early on as a common entanglement for many generally useful utility libraries found in the MediaWiki codebase. The work that is progressing here should enable us to remove many explicit  and   calls in the MediaWiki PHP code while still getting the benefit of performance measurements via the XHProf profiling library. Profiling via the XHProf functionality built into HHVM is currently in use in both the beta and production clusters and helping drive some low hanging fruit code improvements.

Bryan is continuing to work on structured logging changes and is testing a Monolog-based logging pipeline in Beta to replace the current system. has been deprecated and all usage in the core of MediaWiki replaced with the new  class which was introduced by the PSR-3 logging changes.

The cssjanus library has been removed from MediaWiki's core repository and replaced with a Composer managed import from the official upstream. The lessphp CSS pre-processor which was historically manually copied into MediaWiki's git repository is now imported via Composer.

The CDB library originally written by Tim Starling has been extracted to its own git repository and published on Packagist. Both MediaWiki itself and the "multiversion" scripts that are used to manage the WMF wiki family are now importing CDB via Composer instead of the old practice of keeping two copies of the code updated manually in the respective repositories.

The simplei18n PHP library that was developed for the IEG's Grant review application, based on code from the Wikimania Scholarships application, was transferred from Bryan's personal GitHub account to the official Wikimedia account.

External dependencies for the BounceHandler and Elastica extensions have been removed from the extension git repositories and replaced with Composer-managed imports. For the WMF cluster, these dependent libraries have been added to the mediawiki/vendor.git repository. ExtensionDistributor has been updated to package Composer-managed dependencies in the tarballs it generates for installing extensions. The php-composer-validate test is now applied to all extensions and skins to validate the syntax of composer.json when changes are uploaded to Gerrit.

Several classes have been moved from includes/utils to includes/libs (ArrayUtils, MapCacheLRU, Cookie/CookieJar) which makes them easy candidates for publication in stand alone libraries in the future. Aaron is working on a list of possible libraries to create from the MediaWiki codebase that would group several useful classes together. This should produce more sustainable projects than having literally dozens of libraries made up of only a single class.

Security auditing and response
We fixed four security issues in the 1.23.7 release, and completed security reviews of OOjs UI (PHP Implementation), SandboxLink extension, GlobalUserPage, and Phabricator Sprint.

Release Engineering
Quality Assurance
In November we made significant improvements to the Vagrant development environments, and also conducted a poll of Vagrant users to guide future development. We sorted out issues with HHVM on beta labs, and we made improvements to Jenkins and Zuul for speed and efficiency. We introduced the Ruby linter "rubocop" to all of the Jenkins builds, one more step in managing technical debt and improving the quality of our browser test code.

Quality Assurance/Browser testing
In November the CentralNotice repo acquired its first browser tests. In addition to adding new test coverage, we continue to refactor existing tests across all of our repositories. While the primary purpose of this refactoring is to update all of the tests' assertions to RSpec3 syntax from RSpec2, we are also taking the time to address technical debt and sort out other issues in the tests, such as removing inefficient code and replacing explicit wait statements with dynamic wait-for statements. This not only improves the speed at which the tests run, but also helps immensely with maintenance and usability into the future.
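
To illustrate the difference between an explicit wait and a dynamic wait-for, here is a minimal sketch in Python. It is not the team's code (their browser tests are written in Ruby); the helper name wait_until, its parameters and the simulated element are all hypothetical. An explicit wait always sleeps for the full duration, while a wait-for polls a condition and returns as soon as it holds.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value.

    Returns that value as soon as it is truthy, or raises TimeoutError
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for a page element that becomes visible after 0.2 s.
appears_at = time.monotonic() + 0.2


def element_visible():
    return time.monotonic() >= appears_at


# Explicit wait (the pattern being removed): always costs the full 2 seconds.
#     time.sleep(2.0)
# Dynamic wait-for: returns shortly after the element appears.
wait_until(element_visible, timeout=2.0)
```

With this pattern the timeout is only an upper bound, so a test that usually succeeds quickly no longer pays for the worst case on every run.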

Multimedia
In November 2014, after releasing requested improvements, the multimedia team started shifting its focus away from Media Viewer and onto bugs in the upload pipeline and UploadWizard.

The team attended the Amsterdam Hackathon, with a focus on GLAMs and structured data. There, the team worked on a prototype of what entering structured data on Commons might be like, on research and groundwork for tracking per-file views (a long-standing request from GLAMs) in preparation for Erik Zachte and Christian Aistleitner's RfC, and on parsing image annotations so that they may be displayed in Media Viewer in the future.

The team's focus on Media Viewer has significantly reduced after releasing the last round of improvements that came out of the community consultation. The project has moved to maintenance mode, taking care of major bugs that need immediate attention. The team has provided support for the file metadata cleanup drive and will continue to do so, in order to improve the accuracy of the metadata displayed in Media Viewer.

UploadWizard has seen numerous code cleanup improvements, as well as the trimming down of a few obscure legacy features. This refactoring effort is already making UploadWizard easier to maintain, which supports the team's goal of fixing bugs and improving the efficiency of the upload pipeline.

For more information about our work, join the multimedia mailing list.

Engineering Community Team
Bug management
Bugzilla was migrated to Phabricator (see announcement on wikitech-l). All tasks and accounts (which need to be claimed by their users) were imported into Phabricator. Documentation related to bug management or Bugzilla was updated accordingly. Details and a list of all the steps performed for the migration are also available. Bugzilla is still available at https://old-bugzilla.wikimedia.org; see Phabricator/versus_Bugzilla for more information on the differences between Phabricator and Bugzilla. Bugzilla users who were default CC or default assignees of components in Bugzilla were contacted to join their corresponding projects in Phabricator. Change notifications to the wikibugs-l mailing list are disabled as they were considered too noisy. Availability of batch editing is currently restricted to members of the Triagers project. After the migration, several projects were renamed or newly created (such requests are handled in the Project-Creators project in Phabricator).

Phabricator/Migration
Wikimedia migrated from Bugzilla to Phabricator for issue tracking. All Bugzilla tasks and accounts (which need to be claimed by their users) were imported into Phabricator. URLs for Bugzilla reports are redirected to the corresponding tasks in Phabricator. (Details and a list of steps performed for the Bugzilla migration are available.) Bugzilla accounts that were default CC or default assignees of components in Bugzilla were asked to join their corresponding projects in Phabricator. Furthermore, Diffusion (for hosting and browsing repositories, to replace gitblit) was enabled in Phabricator, fab.wmflabs.org now redirects to phabricator.wikimedia.org, the Gerrit notification bot creates notifications in a corresponding Phab task even when the commit message contains a link to a Bugzilla ticket, and the notification feature in the top bar of Phabricator was enabled. Work continues on common project management guidelines and providing burndown charts for sprints. Next on the Phabricator migration plan is RT, followed by Mingle and Trello.

Outreach programs
FOSS Outreach Program for Women/Round 9 interns and projects were selected:
 * Neta Livneh and Roxana Necula: Wikipedia article translation metrics
 * Priyanka Jayaswal: Pywikibot: Compat to core migration
 * Anke Nowottne: Wikipedia Education Program need-finding research
 * Ankita Shukla: Collaborative spelling dictionary building tool
 * Christy Okpo: Improving the Wikimedia Performance Portal

Also, Wikimedia was accepted for taking part in Google Code-in 2014, a contest for 13-17 year old pre-university students, running from December 1, 2014 to January 19, 2015.

Analytics
Analytics/Research and Data
In November, we supported the Fundraising team with the preparation and kickoff of the English fundraising campaign.

We started applying our new page view definition towards a number of reports and presentations, including an update for the Wikimedia Foundation board on readership trends.

We continued supporting the Mobile team with data on mobile traffic and prepared the launch of two controlled tests of microcontributions on the beta version of the mobile site. We performed preliminary analysis and QA of the data in preparation of a larger test to be launched on the stable site in January. We concluded data analysis for the test of HHVM and found no conclusive evidence that HHVM substantially affects newcomer engagement in Wikipedia, but hypothesized that HHVM would have effects elsewhere.

We hosted a research showcase with Yan Chen (University of Michigan) as a guest speaker. We finalized a formal collaboration with a team of researchers at Stanford University, to be launched in December. A workshop we submitted to CSCW '15 on creating industry/academic partnerships for open collaboration research was accepted and will be held at the conference in Vancouver on March 14, 2015.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.




 * We have released, for the first time, a complete offline version of the Gutenberg project, a 50,000-book public domain online library. This new software solution is able to easily create a complete offline snapshot proposing all the books in HTML and EPUB format. We make the books accessible via a custom and really easy-to-use interface. It's consequently trivial to have this big library available everywhere on your PC, local network or even smartphone. This was the first step of a broader effort to increase outreach of public domain literature; further development will take place in 2015. If you want to know more, read the release announcement.


 * Automation and consolidation of the Wikimedia projects dumping process continues its progress. Besides the continuous improvement of mwoffliner, a new small tool called mwmatrixoffliner was released. It uses MediaWiki's Matrix extension to allow dumping of all linguistic versions of a project. As a result, we have started to produce monthly ZIM snapshots of Wikivoyage, Wikinews, Wikiquote, Wikiversity, Wikibooks and Wikispecies. For all these projects, we make available complete dumps with or without pictures, as a raw ZIM file or pre-packaged with Kiwix in a "portable version" on download.kiwix.org, over BitTorrent and HTTP. We will soon provide this service for bigger projects like Wiktionaries and Wikipedias, and have therefore started to set up new server instances in Wikimedia Labs.


 * We have also made a new release of TED talks ZIM files. These updated files follow an effort to improve the user interface and come with a slightly revised one.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * Wikidata won the Open Data Award in the Publisher category from the Open Data Institute.

Development in November focused on:
 * performance improvements
 * introducing language fallbacks (so you will see labels in other languages you likely speak if they are not available in your language)
 * statements on properties (so you can indicate that one property is the inverse of another property or that a given property on Wikidata corresponds to another one on another website)
 * redesigning the sitelinks section.
 * Wikidata was also a big topic at the GLAM hackathon in Amsterdam and was used for many great applications like the Sum of all Paintings. An office hour about structured data on Commons was well-attended.
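
The language fallback idea listed above can be sketched in a few lines of Python. This is only an illustration, not Wikidata's implementation: the fallback chains, the default of falling back to English, and the function name pick_label are assumptions made for the example.

```python
# Hypothetical fallback chains: for a given user language, the other
# languages to try, in order. Real chains are defined by the software.
FALLBACKS = {
    "de-at": ["de", "en"],
    "pt-br": ["pt", "en"],
}


def pick_label(labels, user_lang, fallbacks=FALLBACKS):
    """Return (language, label) for the first language that has a label.

    The user's own language is tried first, then its fallback chain.
    Languages without a configured chain fall back to English only
    (an assumption of this sketch). Returns None if nothing matches.
    """
    for lang in [user_lang, *fallbacks.get(user_lang, ["en"])]:
        if lang in labels:
            return lang, labels[lang]
    return None


# An item with labels in Portuguese and English, but not Brazilian Portuguese.
labels = {"pt": "Lisboa", "en": "Lisbon"}
print(pick_label(labels, "pt-br"))  # prints ('pt', 'Lisboa')
```

Returning the language alongside the label lets an interface indicate that the text shown is not in the reader's own language.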

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.