Wikimedia Engineering/Report/2013/February

Engineering metrics in February:
 * 110 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 650 to about 830.
 * About 69 shell requests were processed.
 * Wikimedia Labs now hosts 150 projects and 1,002 users; to date 1,561 instances have been created.

Major news in February includes:
 * The Wikipedia Zero project got a Knight News Challenge grant.
 * Additional input methods were made available for jQuery.IME.
 * The Translate extension introduced a new iteration of the Translation Editor.
 * The Wikimedia mobile web team launched the ability to view the watchlist or add pages to it, all from mobile devices.
 * Echo is a new notification system for Wikipedia.
 * The Technical Operations team found ways to stop problems in their tracks.
 * Wikipedia Mobile hit 3 billion monthly page views.

Note: We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Ed Sanders joined the Features engineering group as Software Engineer working on Visual Editor (announcement).
 * Christian Aistleitner joined as a contractor specializing in work on Gerrit (announcement).
 * Marc-Andre Pelletier joined the Technical Operations team as Operations Engineer (contractor), focusing on the Wikimedia Labs infrastructure and migration of tools (announcement).
 * Greg Grossmeier joined the Platform engineering group as Release Manager (announcement).

Technical Operations
Site infrastructure
 * Asher and Peter are proceeding cautiously in broadening the MariaDB deployment in our clusters. We now have one MariaDB instance for each of the database clusters (s1 to s7). The MariaDB support team has been quick to resolve the bugs we encountered along the way. In another database administration task, Asher reviewed and deployed the Wikidata schema changes and migrated Wikidata from the s3 cluster to s5, giving it more capacity to grow.
 * Sixty new application servers were put into production in each of the two datacenters. This is in anticipation of expected traffic growth coming from both our regular and mobile sites in the coming year.
 * Lately we have been experiencing short time-out failures in the nightly search index build with search-pool4. Asher is experimenting with a fix. He redistributed the search-pool4 indices in pmtpa based on their sizes and on what seems to be a more acceptable index-size-to-RAM ratio. We essentially have a virtual search-pool5 shard, but with the spelling and highlight indices for pool4 and pool5 sharing the same servers. The pool4 wikis are using the new setup in Tampa, with everything else continuing to use our Ashburn cluster. We should know soon if it works.
 * The TechOps team had an in-person team meeting the week of 25 February in the San Francisco office. The highlights of the meeting were:
 * Discuss the upkeep of the 'failover' datacenter and capture the lessons learned from the recent datacenter switchover. For example, we think we could reduce the switchover 'readonly' time from 32 minutes to 10 minutes by automating more of the database and caching failover procedures.
 * Improve and streamline our hiring process & our security access model.
 * Organize a coordinated sprint process to fill the gap between smaller tasks (RT) and larger tasks (department coordination) by collecting thoughts, brainstorming and forming teams via the following wikitech page: https://wikitech.wikimedia.org/view/Projects
 * Face-to-face meetings with the Engineering teams from Platform, Mobile, Analytics and Wikidata.
 * Short TechOps sprints to reduce cron spam and our RT queue.
 * Review 2013/2014 budget needs.

Data Dumps
 * Numerous bug fixes were made to the mwxml2sql tool, and a set of SQL files based on an English-language Wikipedia XML dump was published for use by testers. A tool to convert SQL dumps to escaped tab-delimited format is now available for use with MySQL's LOAD DATA INFILE command, which is much faster than INSERTs. All SQL files from the same dump were converted to this format and also published.
 * A new mirror has come online, initially mirroring historical archives of XML files as well as MediaWiki releases, page view statistics and other files. Thanks to Robert Smith and Wansecurity.com for providing the resources to make this happen.
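
To illustrate what an "escaped tab-delimited format" for LOAD DATA INFILE can look like, here is a minimal sketch in Python. It is not the conversion tool mentioned above; the function names and the example row are hypothetical. It assumes MySQL's default text format: fields separated by tabs, lines ended by a newline, backslash as the escape character, and \N standing for NULL.

```python
# Illustrative sketch only; not the actual dump conversion tool.
# Assumes MySQL's default LOAD DATA text format (tab-separated fields,
# newline-terminated lines, backslash escapes, \N for NULL).

_ESCAPES = {
    "\\": "\\\\",  # a literal backslash must itself be escaped
    "\t": "\\t",   # tab would otherwise end the field
    "\n": "\\n",   # newline would otherwise end the row
    "\r": "\\r",
    "\0": "\\0",
}


def escape_field(value):
    """Escape one column value; None becomes the NULL marker \\N."""
    if value is None:
        return "\\N"
    return "".join(_ESCAPES.get(ch, ch) for ch in str(value))


def to_tsv_line(row):
    """Turn one row (a sequence of column values) into one output line."""
    return "\t".join(escape_field(v) for v in row) + "\n"


def write_tsv(rows, path):
    """Write all rows to a file suitable for LOAD DATA INFILE."""
    with open(path, "w", encoding="utf-8", newline="") as out:
        for row in rows:
            out.write(to_tsv_line(row))


if __name__ == "__main__":
    # Hypothetical row: an id, a title containing a tab, and a NULL column.
    print(repr(to_tsv_line((1, "Main\tPage", None))))
```

A file written this way could then be loaded with a statement such as `LOAD DATA INFILE 'page.tsv' INTO TABLE page;` (file and table names are placeholders). The server parses each line directly instead of executing one INSERT statement per row.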

Wikimedia Labs
 * This month was mostly spent stabilizing Labs components. Labs Ganglia was fixed to report instance statistics properly. Adminbot was updated to fix UTF-8 issues and package issues when upgrading. A number of changes were made to the GlusterFS support to bring more stability. Gluster was upgraded to 3.3.1 to fix a memory leak on both the client and server. Gluster isn't matching our multitenancy use case, as the glusterd daemon isn't handling the large number of volumes well. To help with this, until we either fix the issue in Gluster or replace it, we've made a change to not create or manage Gluster volumes for projects unless they opt in. We've also disabled and deleted Gluster volumes for projects that are currently unused. Work was done to turn the Puppet classes for installing MediaWiki in Labs into modules, so that they can be reused more easily. wikitech.wikimedia.org (our operations and infrastructure documentation) and labsconsole.wikimedia.org were merged into wikitech.wikimedia.org. wikitech-static.wikimedia.org is available as a backup in case all access to our cluster is unavailable. Work was started on supporting SaltStack reactors, to replace the bootstrapping for instance creation. This month we have a new member of the Labs team, Marc-Andre Pelletier, also known in the community as Coren. Coren will be working on the new Tool Labs infrastructure and we're very excited to have him on board. Work on replicated databases for Tool Labs began during the last week of the month.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * We have migrated our source code repository from Subversion to Git. In February we also focused on revamping the Kiwix website. The new website is much more user-friendly. The audience continues to grow, with 120,000 downloads of the software in February.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.

In February, the first phase of Wikidata (language links) was deployed on the English-language Wikipedia. Additionally, the first parts of phase 2 (infoboxes) went live on wikidata.org. It is now possible to add statements. For an example see 159. The first tools have already been written on top of this, for example Geneawiki and Reasonator. In the meantime, more work has been put into additional data types, like strings and geocoordinates, as well as the foundations of phase 3 (lists based on queries).

In other good news: Wikimedia Germany has decided to fund Wikidata development after the end of the first year of development at the end of March.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.