Wikimedia Engineering/Report/2013/January

Engineering metrics in January:
 * 112 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits remained stable around 650.
 * About 45 shell requests were processed.
 * Wikimedia Labs now hosts 155 projects and 931 users; to date 1473 instances have been created.
 * Detailed community metrics are also available.

Major news in January include:
 * https://blog.wikimedia.org/2013/01/19/wikimedia-sites-move-to-primary-data-center-in-ashburn-virginia/
 * https://blog.wikimedia.org/2013/01/11/mobile-beta-a-sandbox-for-new-experimental-features/
 * https://blog.wikimedia.org/2013/01/25/language-engineering-progress-with-input-methods-and-translation-editor/
 * https://blog.wikimedia.org/2013/01/11/a-more-efficient-translation-interface/
 * https://blog.wikimedia.org/2013/01/31/geodata-a-new-age-of-geotagging-on-wikipedia/
 * https://blog.wikimedia.org/2013/01/28/help-us-test-and-investigate-visualeditor/

''Note: We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.''

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Yuvaraj Pandian re-joined the Mobile engineering team as a Software Developer (announcement), joining the newly created Mobile App team with Brion Vibber and Shankar Narayan.
 * Munagala Ramanath (Ram) joined the MediaWiki core team of the Platform engineering group as Senior Software Engineer (announcement).
 * Runa Bhattacharjee joined the Language Engineering team as Outreach and QA coordinator (announcement).

Technical Operations
Production Site Switchover


 * The Wikimedia Foundation switched its primary data center from Tampa to Ashburn on January 22. Given the scale and complexity of the migration, we scheduled three 8-hour windows on the 22nd, 23rd and 24th to perform the fail-over tasks, and we are glad to report we completed them on the first attempt. Because the switchover involved, among other things, moving the master databases from Tampa to Ashburn, the site was set to read-only mode for about 32 minutes to facilitate those operations. During that period, the site remained available, but no new content could be created, edited or uploaded. As expected, there was some minor fallout from the switchover, mostly due to configuration changes; for example, test.wikimedia.org remained in read-only mode, and users in Europe were served stale caches. These issues were quickly contained by the Engineering and Operations teams. For those interested in the details of the operational tasks performed that day, the documentation is available at http://wikitech.wikimedia.org/view/Eqiad_Migration_Planning/Steps.
 * With this migration, the Tampa data center becomes our fail-over site, and we plan to perform site fail-over tests every few months. A few small non-core applications, such as RT, Etherpad and Bugzilla, still use Tampa as their primary site; they too will be migrated in the coming months.
 * The Engineering Community team has started publishing a series of blog posts on the site migration effort. The first two are already out: http://blog.wikimedia.org/2013/01/19/wikimedia-sites-move-to-primary-data-center-in-ashburn-virginia/ and https://blog.wikimedia.org/2013/02/01/from-duct-tape-to-puppets/. Enjoy!

Site infrastructure
 * One of the main concerns about the switchover was serving traffic from the new data center with empty memcached servers: the resulting spike in load on the Apache and database servers could have been disastrous for the site. To address this, Tim expanded the single-instance parser cache persistent store in Tampa to three sharded instances, and Asher then built and replicated those databases across the two data centers. That indeed helped.
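The idea behind sharding the parser cache can be sketched as follows. This is an illustrative sketch, not the actual MediaWiki implementation; the shard names and key format are hypothetical. The point is that every appserver hashes the cache key the same way, so a given page's parser output always lands on the same store:

```python
# Illustrative sketch (not the MediaWiki implementation): distribute
# parser cache entries across sharded stores by hashing the key,
# so each server holds roughly a third of the cache.

import hashlib

# Hypothetical server names for the three parser cache shards.
PARSER_CACHE_SHARDS = ["pc1", "pc2", "pc3"]

def shard_for(cache_key: str) -> str:
    """Pick a shard deterministically from the cache key, so every
    appserver sends a given page's parser output to the same store."""
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    index = int(digest, 16) % len(PARSER_CACHE_SHARDS)
    return PARSER_CACHE_SHARDS[index]

# The same key always maps to the same shard, on any appserver.
assert shard_for("enwiki:pcache:idhash:12345") == shard_for("enwiki:pcache:idhash:12345")
```

Because the mapping depends only on the key, the shards can be warmed and replicated independently across data centers.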


 * Another improvement, made by Peter and Asher, was implementing MHA (Master High Availability) on our MySQL clusters. MHA's primary objective is to automate the promotion of a slave database in a master fail-over scenario and to reduce downtime, without suffering replication integrity problems, without prolonged database latency, and without changing existing deployments.
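The core selection step in such a failover can be sketched conceptually. This is a simplified illustration of the idea, not MHA's actual code or API (the host names and data structures are hypothetical): when the master dies, promote the replica whose replication position is most advanced, so the least data is lost and the other replicas can resynchronize from it:

```python
# Conceptual sketch of automated master failover (hypothetical names,
# not MHA's real interface): choose the replica with the most advanced
# replication position as the new master.

def pick_new_master(replicas):
    """replicas maps host name -> (binlog_file_seq, binlog_pos).
    Tuples compare lexicographically, so the max is the replica that
    has applied the most of the dead master's binary log."""
    return max(replicas, key=lambda host: replicas[host])

replicas = {
    "db1001": (102, 4_120_000),
    "db1002": (102, 4_250_000),  # most advanced: loses the least data
    "db1003": (101, 9_990_000),  # still on an older binlog file
}
print(pick_new_master(replicas))  # db1002
```

MHA additionally copies the missing binary log events to the lagging replicas before repointing them, which is what preserves replication integrity during the promotion.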


 * Faidon and Mark continued work on the Ceph file object store. With Domas's help, they identified a performance issue with the RAID card that caused severe read/write latency on the Ceph cluster. Faidon confirmed with the vendor that it is a known problem with no fix available yet, so we ordered and installed replacement RAID cards. Test results with the new cards appear to resolve the performance issue.

Fundraising
 * Fundraising bastion hosts were deployed in eqiad and pmtpa. The queue (ActiveMQ) host silicon was put into service to replace erzurumi. Central logging and monitoring were tweaked and tuned. The temporarily borrowed database server db1013 was returned to the general pool now that the fundraiser is over. The remaining fundraising MyISAM tables were converted to InnoDB, which should fix dump-induced replication lag.

Data Dumps
 * This month we had a look at the process of using the XML dumps to create a local copy of a WMF project. This process is painful and cumbersome at best, and unfathomable for the end user in the worst case. As part of an attempt to improve this situation, there is a new tool available for *nix platforms that generates MySQL tables from the XML stub and page content files. It is intended to read input files from various versions of MediaWiki and generate output for the version the user wants. It is strictly alpha code, but we'd love help testing it.
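The kind of transformation involved can be sketched briefly. This is a minimal illustration of the shape of the problem, not the new tool itself: real dumps use the namespaced MediaWiki export schema and need proper SQL escaping and batching, all of which is omitted here, and the XML below is a simplified stand-in for a stub dump:

```python
# Minimal sketch: read page records from a simplified XML stub dump
# and emit one row per page for a MySQL `page` table.

import xml.etree.ElementTree as ET

# Simplified stand-in for an XML stub dump (real dumps are namespaced).
STUB_XML = """
<mediawiki>
  <page>
    <title>Main Page</title>
    <ns>0</ns>
    <id>1</id>
  </page>
  <page>
    <title>Help:Contents</title>
    <ns>12</ns>
    <id>42</id>
  </page>
</mediawiki>
"""

def stub_to_inserts(xml_text):
    """Yield one INSERT statement per <page> element in the stub."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        page_id = int(page.findtext("id"))
        ns = int(page.findtext("ns"))
        title = page.findtext("title").replace(" ", "_")
        yield ("INSERT INTO page (page_id, page_namespace, page_title) "
               f"VALUES ({page_id}, {ns}, '{title}');")

for stmt in stub_to_inserts(STUB_XML):
    print(stmt)
```

A production converter would stream the (multi-gigabyte) dump with an incremental parser rather than loading it whole, which is part of what makes the real problem hard.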

Wikimedia Labs
 * This month brought a number of performance and usability improvements. Three compute nodes were added to the pmtpa zone. Alex Monk added Echo notification support to labsconsole. Passwordless sudo is now the default for projects, and shell requests are created automatically on account creation. The sysadmin and netadmin roles have been combined into a single projectadmin role. GlusterFS was upgraded to handle a memory leak, but unfortunately the upgrade introduced a new bug that caused some instability in project storage; work is ongoing to improve the project storage situation.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.

We have adapted the kiwix-plug script to the Tonidoplug2, a device cheaper than the Dreamplug. Kiwix was elected by users as SourceForge's Project of the Month for February, and an interview with Kelson was published. For the first time, Kiwix reached 100,000 downloads in a month in January.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.

January has been an exciting month for Wikidata. Wikidata was deployed on its first three Wikipedias: Hungarian, Hebrew and Italian. At the same time, work continued on the user interface and backend for statements, the core part of phase 2 of Wikidata. This will enable users to enter information like the children of a given person, or a link to their portrait on Wikimedia Commons; it can already be tested on the demo system. We also worked on making AbuseFilter work with Wikidata and wrote a new mechanism to distribute changes to the clients (the Wikipedias) so they can show Wikidata changes in their Recent Changes. We made progress on using Solr for search, and rewrote the draft for the inclusion syntax (the syntax Wikipedia editors will use to include data from Wikidata in Wikipedia) to be much simpler. A manual for using Pywikipedia on Wikidata was written as well.

If you want to code on Wikibase, the software powering Wikidata, have a look at these bugs and tasks.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.