Wikimedia Engineering/Report/2012/December

Engineering metrics in December:
 * 113 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 535 to about 648.
 * About 39 shell requests were processed.
 * As of December 2012, users can self-register on Wikimedia Labs (and get access to git/Gerrit). It is no longer necessary to request an account for developer access.
 * Wikimedia Labs now hosts 148 projects and 847 users; to date, 1378 instances have been created.
 * Detailed community metrics are also available.

Major news in December includes:
 * The launch of an alpha, opt-in version of the VisualEditor to the English Wikipedia, a project more complex than it appears;
 * A research study on the use of the Article Feedback feature;
 * New metrics for the MediaWiki community;
 * The start of the Outreach Program for Women;
 * Continued work to improve the workflow and interface for translators.

Note: We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Matthew Flaschen joined the Wikimedia Features engineering team as Features Engineer (announcement).
 * Mike Wang joined the Operations team as part-time Labs Ops Engineer (consultant) (announcement).

Technical Operations
Production Site Switchover


 * The Technical Operations team continued to work through the outstanding migration tasks and to ready our Ashburn infrastructure for the big switchover day, i.e., the complete transition from the Tampa datacenter to the one in Ashburn, planned for the week of January 22, 2013.
 * In the past few months, we've transitioned services from the Tampa datacenter to the one in Ashburn, which now serves most of our traffic (about 90%). However, application (MediaWiki), memcached and database systems are all still running exclusively out of Tampa. We have been working to upgrade the technologies and set up those systems at Ashburn, and we plan to perform the switchover of those services from Tampa to Ashburn in the coming weeks. This will provide us some assurance of a hot standby datacenter, should we encounter an irrecoverable and lengthy outage in one of the main datacenters.

Site infrastructure
 * Because December is when the annual Wikimedia fundraiser happens, the Operations team usually makes fewer site infrastructure changes, to reduce the risk of causing outages. The lower-risk work performed this month included deploying the new Parsoid cluster to support the VisualEditor project, rolling out doc.wikimedia.org (our auto-generated puppet documentation), using a new and unified SSL certificate for *.wikipedia.org and *.m.wikipedia.org sites, and setting up a monitoring server and service in Ashburn.
 * Asher Feldman migrated one of the main production slave database servers (db59) for the English Wikipedia (enwiki) to MariaDB 5.5.28. He has been testing 5.5.27 on the primary research slave, and the current build on a slave in Ashburn. Taking the times of 100% of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB compared to our production build of MySQL 5.1-fb. Some query types are 10–15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput, as measured by queries per second (qps), has generally improved by 2–10%. Asher wouldn't draw any conclusions from this data yet: more testing is needed to filter out noise, but initial results are positive. The main reason for migrating to MariaDB is not performance, but rather the belief that it's in the Wikimedia Foundation's and the open-source communities' interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for MySQL-derived database technology.
 * Mark Bergsma and Faidon Liambotis have made tremendous progress in testing and deploying Ceph in Ashburn. We are hopeful it will be robust and scalable.
 * Ryan Lane has been writing a new deployment system using git and Saltstack. Parsoid is currently being deployed with this system, and MediaWiki is slated to use it for its next major deployment.
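
The "about 8% faster" figure in the MariaDB item above is a relative difference between two average query times. The sketch below shows one way such a figure could be computed; it is an illustration only, not the tooling used by the Operations team. The function name `relative_speedup` and the sample timings are hypothetical and are not measurements from db59 or any production server.

```python
from statistics import mean


def relative_speedup(baseline_times, candidate_times):
    """Return the fractional reduction in mean query time.

    A positive result means the candidate build is faster on average;
    a negative result means it is slower.
    """
    baseline_avg = mean(baseline_times)
    candidate_avg = mean(candidate_times)
    return (baseline_avg - candidate_avg) / baseline_avg


# Hypothetical per-query times in milliseconds for one sample window.
# These values are made up for illustration only.
mysql_ms = [10.0, 12.0, 8.0, 10.0]
mariadb_ms = [9.0, 11.0, 8.0, 8.8]

speedup = relative_speedup(mysql_ms, mariadb_ms)
print(f"Mean query time reduced by {speedup:.1%}")
```

Averaging over all queries in a window can hide per-query-type differences, which is why the report also gives the range for individual query types.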

Fundraising
 * There were no major changes on the fundraising infrastructure because of the fundraiser itself. We ordered and received bastion hosts that we're in the process of deploying. Monitoring got an overhaul and we're now sending alerts to the fundraising technical staff or the technical operations team depending on what triggered the alert.

Data Dumps
 * A tool for dump users to set up interwiki links on their local mirrors is available in alpha, as well as documentation of the interwiki cdb file. Also, work with WanSecurity on mirroring is moving forward: they now hold a current copy of all 'other' files, including page views and Picture of the Year bundles, among other things.

Wikimedia Labs
 * Labs came out of beta this month, following the opening of self-registration. Another major change this month was the migration from the shared NFS instance to per-project glusterfs volumes. A number of smaller changes were made, including: the addition of puppet documentation links from classes and variables on the instance configuration pages; the modification of the project filter to act as a table of contents; a split of LDAP project groups into projects and POSIX groups; and the installation of Saltstack on all instances to act as a guest agent.

Language engineering

Other news
 * Pau Giner and Amir Aharoni participated in the Open Tech Chat this month to talk about best practices in multilingual user testing and internationalization. Amir Aharoni also participated in mentoring OPW candidate Priyanka Nag for the new LevelUp program. Srikanth Lakshmanan's and Arun Ganesh's tenures with the Language Engineering team ended in December.

Mobile

 * The Mobile development and design team worked to finalize contributory and other experimental editor-focused features on the Beta site (uploads, editing, and watchlist functionality) in order to clear the way for a full push on mobile uploads by March 2013. We also worked to improve the reader and potential editor experience by introducing features geared toward educating/engaging our users, such as a human-readable last modified timestamp for articles and watchlist, and thumbnail images to illustrate the watchlist view. Lastly, because of the huge interest we generated in our Beta testing site, we created an Alpha site to house very early work on contributory features, in order not to disrupt the reading experience of our 100,000+ Beta users.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * Kiwix 0.9rc2 was released. This version embeds our ZIM HTTP server, kiwix-serve, for Windows, OS X and Linux. It is now integrated in the Kiwix UI, allowing everyone to share Wikipedia on a LAN in two clicks. We have revamped our audience measurement tool, a solution that could be interesting for other projects using MirrorBrain. At the same time, we continue to increase our ZIM production throughput, with 8 new Wikipedia ZIM files in December. December was also a month of new records for Kiwix: for the first time, we had more than 70,000 downloads in a month and a leading position for Education software on SourceForge.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * New code and bugfixes have been deployed (with MediaWiki 1.21wmf5 and 1.21wmf6) and test2 now gets language links from Wikidata. Changes on Wikidata that concern articles on test2 are shown in the recent changes of test2 as well. If there are no problems, deployment on the Hungarian Wikipedia will happen on January 14, 2013. Other Wikipedia sites will follow.


 * For the second phase of Wikidata, representation of values is the central focus. We published a draft and discussions have started; we'd appreciate your feedback. Additionally, Denny Vrandečić and Lydia Pintscher held IRC office hours; logs are available in English and German.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.