Wikimedia Engineering/Report/2013/February

Engineering metrics in February:
 * 110 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 650 to about 830.
 * About 69 shell requests were processed.
 * Wikimedia Labs now hosts 150 projects and 1,002 users; to date 1,561 instances have been created.

Major news in February includes:
 * The Wikipedia Zero project got a Knight News Challenge grant.
 * Additional input methods were made available for jQuery.IME.
 * The Translate extension introduced a new iteration of the Translation Editor.
 * The Wikimedia mobile web team launched the ability to view the watchlist or add pages to it, all from mobile devices.
 * Echo is a new notification system for Wikipedia.
 * The Technical Operations team found ways to stop problems in their tracks.
 * Wikipedia Mobile hit 3 billion monthly page views.

Note: We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Ed Sanders joined the Features engineering group as Software Engineer working on VisualEditor (announcement).
 * Christian Aistleitner joined as a contractor specializing in work on Gerrit (announcement).
 * Marc-Andre Pelletier joined the Technical Operations team as Operations Engineer (contractor), focusing on the Wikimedia Labs infrastructure and migration of tools (announcement).
 * Kirsten Menger-Anderson joined the Features group as a part-time contractor Technical Writer focusing on Editor Engagement Experiments.
 * Greg Grossmeier joined the Platform engineering group as Release Manager (announcement).
 * Site Performance Engineer and Senior Technical Advisor Patrick Reilly's last day with WMF was February 19th (announcement).

Technical Operations
Site infrastructure
 * Asher Feldman and Peter Youngmeister are proceeding cautiously in broadening MariaDB deployment in our clusters. We have one MariaDB instance for each of the database clusters (s1 to s7). The MariaDB support team has been quick in resolving bugs we encountered along the way. In another database administration task, Asher reviewed and deployed the Wikidata schema changes and migrated Wikidata from the s3 cluster to s5, giving it more capacity to grow.
 * We put sixty new application servers into production in each of the two datacenters. This is in anticipation of expected traffic growth coming from both our regular and mobile sites in the coming year.
 * Lately we have been experiencing short time-out failures in the nightly search indices built with search-pool4. Asher is experimenting with a fix. He redistributed the search-pool4 indices in the Tampa data center based on their sizes and on what seems to be a more acceptable ratio of index size to RAM. We essentially have a virtual search-pool5 shard, but with the spelling and highlight indices for pool4 and pool5 sharing the same servers. The pool4 wikis are using the new setup in Tampa, with everything else continuing to use our Ashburn cluster. We should know soon if it works.
 * The TechOps team had an in-person team meeting the week of 25 February in WMF's San Francisco office. The highlights of the meeting were:
 * Discuss the upkeep of the "failover" datacenter and capture the lessons learned from the recent datacenter switchover. For example, we think we could reduce the switchover "readonly" time from 32 minutes to 10 minutes by automating more of the database and caching failover procedures.
 * Improve and streamline our hiring process and our security access model.
 * Organize a coordinated sprint process to fill the gap between smaller tasks (for which we use the RT ticketing system) and larger tasks (which require department coordination). We started collecting thoughts, brainstorming and forming teams via the Projects wikitech page.
 * Face-to-face meetings with the Engineering teams from Platform, Mobile, Analytics and Wikidata.
 * Short TechOps sprints to reduce cron spam and our RT queue.
 * Review budget needs for the 2013-2014 fiscal year.

Data Dumps
 * Numerous bug fixes were made to the mwxml2sql tool, and a set of SQL files based on an English-language Wikipedia XML dump was published for use by testers. A tool to convert SQL dumps to escaped tab-delimited format is now available for use with MySQL's LOAD DATA INFILE command, which is much faster than INSERTs. All SQL files from the same dump were converted to this format and also published.
 * A new mirror has come online, initially mirroring historical archives of XML files as well as MediaWiki releases, page view statistics and other files. Thanks to Robert Smith and Wansecurity.com for providing the resources to make this happen.
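
To illustrate the escaped tab-delimited format mentioned above, here is a minimal Python sketch. It is not the published conversion tool; the function names and the sample file and table names are hypothetical, and it assumes MySQL's default LOAD DATA INFILE settings (fields terminated by a tab, lines terminated by a newline, backslash as the escape character, \N for NULL).

```python
def escape_field(value):
    """Escape one value for MySQL's default LOAD DATA INFILE format."""
    if value is None:
        return "\\N"  # MySQL reads \N as NULL
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    text = text.replace("\\", "\\\\")
    text = text.replace("\t", "\\t")
    text = text.replace("\n", "\\n")
    text = text.replace("\r", "\\r")
    text = text.replace("\0", "\\0")
    return text


def rows_to_tsv(rows):
    """Render an iterable of row tuples as escaped, tab-delimited text."""
    return "".join(
        "\t".join(escape_field(v) for v in row) + "\n" for row in rows
    )


if __name__ == "__main__":
    # Hypothetical sample rows; the real dumps have many more columns.
    sample = [(1, "Main\tPage", None), (2, "C:\\temp", "ok")]
    print(rows_to_tsv(sample), end="")
    # The output could then be loaded with something like:
    #   LOAD DATA INFILE 'page.tsv' INTO TABLE page;
    # ('page.tsv' and 'page' are placeholder names.)
```

Bulk loading a file like this avoids parsing and executing one INSERT statement per batch of rows, which is the reason given above for the speed-up.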

Wikimedia Labs
 * This month was mostly spent stabilizing Labs components. Labs Ganglia was fixed to report instance statistics properly. Adminbot was updated to fix UTF-8 issues, and to fix package issues when upgrading. A number of changes were made to the GlusterFS support to bring more stability. Gluster was upgraded to 3.3.1 to fix a memory leak on both the client and server. Gluster isn't a good match for our multitenancy use case, as the glusterd daemon isn't handling the large number of volumes well. To help with this, until we either fix the issue in Gluster or replace it, we've made a change to not create or manage Gluster volumes for projects unless they opt in. We've also disabled and deleted Gluster volumes for projects that are currently unused. Work was done to turn Puppet classes for installing MediaWiki in Labs into modules, so that they can be reused more easily.
 * We merged wikitech.wikimedia.org (our operations and infrastructure documentation) and labsconsole.wikimedia.org into wikitech.wikimedia.org. wikitech-static.wikimedia.org is available as a backup, in case all access to our cluster is unavailable. Work was started on supporting SaltStack reactors, to replace the bootstrapping for instance creation. This month we have a new member of the Labs team, Marc-Andre Pelletier, also known in the community as Coren. Coren will be working on the new Tool Labs infrastructure and we're very excited to have him on board. Asher and Peter started work on replicated databases for Tool Labs during the last week of the month.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * We have migrated our source code repository from Subversion to Git. We also focused in February on revamping the Kiwix website. The new website is much more user-friendly. The audience continues to grow, with 120,000 downloads of the software in February.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.

In February the first phase of Wikidata (language links) was deployed on the English-language Wikipedia. Additionally, the first parts of phase 2 (infoboxes) went live on wikidata.org. It is now possible to add statements. For an example see 159. The first tools have already been written on top of this, for example Geneawiki and Reasonator. In the meantime, more work has been put into additional data types, like strings and geocoordinates, as well as the foundations of phase 3 (lists based on queries).

In other good news: Wikimedia Germany has decided to fund Wikidata development after the end of the first year of development at the end of March.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.