Wikimedia Engineering/Report/2013/March

Engineering metrics in March:
 * unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 830 to.
 * About shell requests were processed.
 * Wikimedia Labs now hosts projects and  users; to date  instances have been created.

Major news in March include:

''Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.''

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Technical Operations
Site infrastructure
 * This month we saw a couple of short site glitches, lasting from about one minute to ten minutes. There was no noticeable impact on readers, but editors and contributors experienced intermittent problems.


 * The first incident was triggered by the release of AFTv5; the outage ended once the code was reverted, about 10 minutes later. The incident is documented at https://wikitech.wikimedia.org/wiki/Site_issue_Mar_15_2013.
 * The other two were jobqueue-related, according to Asher. The current MySQL jobqueue implementation is far too costly. Analyzing the data over that 24-hour period, 75% of all queries taking over 450 ms to run on the enwiki master were jobqueue-related, and all major actions result in replicated writes. Jobqueue queries also account for 58% of all query execution time when not restricting to queries over the slow threshold. If one million refreshLinks jobs were queued as quickly as possible without paying attention to replication lag, the resulting lag would cause the Apache servers to time out. MediaWiki depends on reading from slaves to scale, and avoids lagged ones; if all slaves are lagged, the master is used for everything, and if that happens on enwiki, the site falls over. The MySQL jobqueue was identified as a scaling bottleneck a while ago, and we will be switching to Redis very soon. The switch is currently targeted for the 1.22wmf1 release, but we may be able to backport it to 1.21wmf12 and complete the migration in early April.
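To illustrate the lag problem described above, here is a minimal, hypothetical sketch of lag-aware batch enqueuing. This is not MediaWiki's actual jobqueue code (which is PHP); the function names, thresholds, and batch size are all illustrative assumptions.

```python
import time

def enqueue_jobs(jobs, get_max_lag, batch_size=100, max_lag=5.0, poll=1.0):
    """Enqueue jobs in batches, pausing whenever the worst slave
    replication lag exceeds max_lag seconds.

    get_max_lag is assumed to report the highest lag (in seconds)
    among the replication slaves; in a real deployment that figure
    would come from the database layer, not from this sketch.
    """
    queued = []
    for start in range(0, len(jobs), batch_size):
        # Back off before each batch so a flood of jobs (e.g. a million
        # refreshLinks jobs) cannot push every slave past the threshold,
        # which would force all reads onto the master.
        while get_max_lag() > max_lag:
            time.sleep(poll)
        queued.extend(jobs[start:start + batch_size])
    return queued
```

The key design point is that the producer, not the slaves, absorbs the delay: queuing slows down while replication catches up, instead of lag accumulating until reads fall back to the master.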
 * On March 12, we experienced an esams site outage, probably caused by packet loss between esams and eqiad. Leslie changed routes from esams to eqiad to route around the packet loss, after which esams recovered. While we are still not clear what caused the packet loss, we did notice that it coincided with the announcement of the newly elected Pope, which triggered a surge in traffic to our web properties.
 * In March, we held a short 'security' sprint, led by Leslie, during which we patched servers that needed security upgrades. In addition, we continued to work on the MariaDB migration, Ceph deployment, and fixing Varnish bugs.
 * TechOps has initiated a biweekly meeting with the engineering teams to align the various engineering projects and TechOps on requirements and expectations. The meeting also serves to surface potential deployment issues (such as capacity demands, new infrastructure, and performance). Meeting minutes are documented at http://etherpad.wmflabs.org/pad/p/EMGT-Ops-Projects-25March2013.

Fundraising
 * Built and deployed the new public reporting host samarium.wikimedia.org, and added logging to the fundraising deployment scripts.

Data Dumps
 * Work is continuing on tools for import. Setting up a local copy of a wiki that includes only a subset of the page content has always been problematic, since it requires the notoriously slow and finicky importDump.php maintenance script. Under development is a tool to filter the currently produced SQL table dumps against a list of page ids for a content subset; these tables could then be imported into a MySQL database, along with tables produced from the content subset, bypassing the need for importDump.php. Additionally, these SQL files could be shared with other users who are interested in the same content subset.
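As a rough illustration of the filtering idea (a sketch, not the tool actually under development), the following Python function keeps only the row tuples of a multi-row INSERT statement whose first column, assumed here to be the page id, is in a wanted set. The naive split on `),(` is a stated simplification: a production tool would need a quoting-aware parser, since that character sequence can occur inside quoted string values.

```python
import re

def filter_insert_line(line, wanted_ids):
    """Filter one line of an SQL table dump of the form
    INSERT INTO `table` VALUES (...),(...);
    keeping only the tuples whose first column is in wanted_ids.

    Returns the filtered statement, the line unchanged if it is not
    an INSERT statement, or None if no tuples survive the filter.
    """
    m = re.match(r"(INSERT INTO \S+ VALUES )\((.*)\);\s*$", line)
    if not m:
        return line  # comments, CREATE TABLE, etc. pass through unchanged
    prefix, body = m.groups()
    # Naive tuple split -- assumes '),(' never appears inside quoted data.
    rows = body.split("),(")
    kept = [r for r in rows if int(r.split(",", 1)[0]) in wanted_ids]
    if not kept:
        return None  # drop the statement entirely
    return prefix + "(" + "),(".join(kept) + ");"
```

Applied line by line to a table dump, this yields a smaller dump containing only the rows for the chosen content subset, which can then be loaded with a plain mysql import.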

Wikimedia Labs
 * LabsDB - https://wikitech.wikimedia.org/wiki/ToolLabsDatabasePlan - current plan
 * GlusterFS issue

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.



Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.

Denny and Lydia gave a short update on Wikidata's status at the metrics and activities meeting. A more detailed analysis can be found in this blog post. In addition, Wikidata phase 1 (language links) has been activated on the remaining 282 Wikipedias, which means that all Wikipedias now get their language links from Wikidata. Not long after that, phase 2 (infoboxes) was activated on the first 11 Wikipedias, which can now make use of shared structured data from Wikidata in their articles. On Wikidata itself, we introduced a new data type (string), extended references in statements (they can now have multiple values), and improved the search box.

We have written down how we envision queries on Wikidata and would appreciate your feedback.

As a nice demonstration of the potential of Wikidata we've seen two new projects this month: Wiri and a tree of life.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.