Wikimedia Engineering/Report/2013/March

Engineering metrics in March:
 * unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 830 to.
 * About shell requests were processed.
 * Wikimedia Labs now hosts projects and  users; to date  instances have been created.

Major news in March includes:

Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements
Two new full-time employees started in WMF engineering in March:


 * Yuri Astrakhan, Senior Software Engineer in the Mobile group (announcement).
 * Adam Baso, Senior Software Engineer, Mobile (Engineering) (announcement).

Technical Operations
Site infrastructure
 * This month we saw a few short site glitches that lasted from about a minute to ten minutes each. The outages did not noticeably affect readers, but editors and contributors experienced intermittent problems.
 * The first incident was triggered by a deployment of Article Feedback Tool v5, and once the code was reverted, the site outage ended. The incident lasted for about 10 minutes (incident documentation).
 * The other two were jobqueue-related, according to Asher Feldman. The current MySQL jobqueue implementation is far too costly. Analyzing the data for that 24-hour period, we see that 75% of all queries that take over 450ms to run on the English Wikipedia master are related to the jobqueue, and all major actions result in replicated writes. In fact, the jobqueue accounts for 58% of all query execution time when the analysis is not limited to queries over the slow threshold. If 1 million refresh-links jobs are queued as quickly as possible without regard to replication lag, the Apache servers time out because of that lag. MediaWiki depends on reading from slaves to scale, and avoids lagged ones. If all slaves are lagged, the master is used for everything, and if this happens on English Wikipedia, the site falls over. The MySQL jobqueue was identified as a scaling bottleneck a while ago, so we will be switching to Redis very soon. We're currently aiming for that switch to coincide with the release of 1.22wmf1, but we may be able to backport to 1.21wmf12 and get this done in early April.
 * On March 12, we experienced an Esams site outage, which was probably caused by packet loss between Esams and Eqiad. Leslie changed routes from Esams to Eqiad to fix the packet loss, which allowed Esams to recover. While we still don't clearly understand what caused the outage, we did notice that it coincided with the news of the new Pope's election, which triggered a surge in traffic to our web properties.
 * In March, we had a short security sprint led by Leslie Carr. We patched servers that needed security upgrades. In addition, we continued to work on MariaDB migration, Ceph deployment and fixing Varnish bugs.
 * TechOps has initiated a biweekly meeting with the engineering teams to drive alignment amongst the various engineering projects and TechOps regarding requirements and expectations. This is also the process to surface potential deployment issues (such as capacity demand, new infrastructure and performance). Meeting minutes are documented on the meeting Etherpad.
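
The replication-lag behavior described in the jobqueue item above can be illustrated with a minimal sketch. This is not MediaWiki's actual load balancer or jobqueue code: the names `Server`, `pick_read_server`, `enqueue_throttled` and the 5-second lag threshold are all hypothetical, chosen only to show why reads fall back to the master when every slave is lagged, and how queueing in batches while watching lag avoids that situation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

MAX_LAG_SECONDS = 5.0  # hypothetical threshold, not a MediaWiki setting


@dataclass
class Server:
    name: str
    lag_seconds: float = 0.0


def pick_read_server(master: Server, slaves: Sequence[Server],
                     max_lag: float = MAX_LAG_SECONDS) -> Server:
    """Prefer the least-lagged slave; fall back to the master if all are lagged."""
    usable = [s for s in slaves if s.lag_seconds <= max_lag]
    if usable:
        return min(usable, key=lambda s: s.lag_seconds)
    # Every slave is too far behind, so the master takes the read as well.
    return master


def enqueue_throttled(jobs: List[object],
                      push: Callable[[List[object]], None],
                      get_max_lag: Callable[[], float],
                      wait: Callable[[], None],
                      batch_size: int = 100,
                      max_lag: float = MAX_LAG_SECONDS) -> None:
    """Queue jobs in batches, pausing whenever replication lag is too high."""
    for start in range(0, len(jobs), batch_size):
        while get_max_lag() > max_lag:
            wait()  # give the slaves time to catch up before writing more
        push(jobs[start:start + batch_size])
```

In this sketch the lag probe, the wait function and the push function are passed in as callables, so the throttling logic can be exercised without a database.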

Fundraising
 * Built and deployed new public reporting host samarium.wikimedia.org. Added logging to fundraising deployment scripts.

Data Dumps
 * Work is continuing on tools for import. Setting up a local copy of a wiki which includes only a subset of the page content has always been problematic, since this requires use of the notoriously slow and finicky importDump.php maintenance script. Under development is a tool to filter the currently produced SQL table dumps against a list of page IDs of a content subset; these tables could then be imported into a MySQL database, along with tables produced from the content subset, bypassing the need for importDump.php. Additionally, these SQL files could be shared with other users who are interested in the same content subset. We hope to be able to launch this in April.
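
To make the filtering idea concrete, here is a rough sketch of how rows in an SQL table dump might be filtered against a list of page IDs. It is not the tool under development: the function names `split_rows` and `filter_insert` are hypothetical, and the sketch assumes one `INSERT INTO ... VALUES (...),(...);` statement per line, with the page ID as the first column of each row and quotes escaped with a backslash.

```python
from typing import List, Set


def split_rows(values: str) -> List[str]:
    """Split a VALUES list into its top-level '(...)' rows, respecting quoted strings."""
    rows = []
    depth = 0
    in_quote = False
    escaped = False
    start = 0
    for i, ch in enumerate(values):
        if in_quote:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == "'":
                in_quote = False
            continue
        if ch == "'":
            in_quote = True
        elif ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                rows.append(values[start:i + 1])
    return rows


def filter_insert(line: str, keep_ids: Set[int]) -> str:
    """Keep only the rows whose first column (assumed to be the page ID) is wanted."""
    prefix, sep, values = line.partition(" VALUES ")
    if not sep:
        return line  # not an INSERT statement; pass it through unchanged
    values = values.rstrip().rstrip(";")
    kept = [row for row in split_rows(values)
            if int(row[1:-1].split(",", 1)[0]) in keep_ids]
    if not kept:
        return ""  # nothing from the subset in this statement
    return prefix + " VALUES " + ",".join(kept) + ";"
```

For example, calling `filter_insert` on each line of a table dump with `keep_ids={1, 3}` would drop every row whose first column is not 1 or 3, leaving statements that can be imported directly into MySQL.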

Wikimedia Labs
 * Check out the current plan for LabsDB. Also, in March we worked on a GlusterFS issue.

Language engineering
Language community outreach:
 * The Language Engineering team kickstarted its Language Support Maven plan for getting language tools feedback from Wikimedian community members who are using internationalisation and localisation tools developed by the team. The team also held its regular monthly office hours in March. The team's outreach coordinator also reported team progress with multiple blog posts on the technology blog. The team plans to restore its bug triage sessions, starting in April 2013.

Analytics
Visualization, Reporting & Applications
 * To support mobile initiatives (including the Mobile Website, Mobile Apps, and Wikipedia Zero), we focused on providing relevant data extracts and visualizations. New visualizations include the Mobile app dashboard.


 * In addition, we updated the report card for the March Metrics Meeting, improved the robustness of the reportcard infrastructure, added target bars and added links to the metric definitions.

Wikistats
 * We are currently working on a new mobile pageview report.

Services & Access Points
 * In March, we saw the launch of the User Metrics API, a service that allows researchers to perform cohort analysis on various data sets, making it easier to measure the effects of programs and platform experiments among discrete sets of users. We are currently working on improving the web-based user interface to make it available for use outside of Wikimedia Foundation staff in the coming months.

Analytics Infrastructure
 * Our big-data cluster, known as Kraken, has undergone no major changes in capability, but we have been working to make it more robust and improve security. Our udp2log monitoring has become more accurate, and Limn can be installed on both production and Labs instances.

Misc: Defects Closed
 * Fixed the "Space characters in pagecounts-raw titles" bug.

Misc: Management & Communication
 * The Analytics team has started to use Mingle to manage its day-to-day work more effectively. Bugzilla remains our primary interface for managing defects and communicating their priority and status.


 * Finally, we had our Analytics Reboot meeting, where all internal WMF Analytics stakeholders convened and we surveyed what customer opportunities were out there, what Analytics models are currently available, and how to improve inter-team communication.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.
 * Work on the 0.rc3 release of Kiwix is ongoing, mostly consisting of bug fixing and a few UI improvements. The release is expected in about a month. For the first time, a ZIM file of Wikisource (in French) was produced, within the scope of the Afripedia project.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * Denny Vrandečić and Lydia Pintscher gave a short update on Wikidata's status at the metrics and activities meeting. A more detailed analysis can be found in our blog post. In addition, Wikidata phase 1 (language links) has been activated on the remaining 282 Wikipedias. This means that all Wikipedias now get their language links from Wikidata. Not too long after that, phase 2 (infoboxes) was activated on the first 11 Wikipedias. They can now make use of shared structured data from Wikidata in their articles. On Wikidata itself we introduced a new data type, extended references in statements (they can now have multiple values), and improved the search box.


 * We have written down how we envision queries on Wikidata and would appreciate your feedback.


 * As a nice demonstration of the potential of Wikidata we've seen two new projects this month: Wiri and a tree of life.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.