Wikimedia Engineering/Report/2012/December

Engineering metrics in December:
 * 113 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 535 to about 648.
 * About 39 shell requests were processed.
 * As of December 2012, users can self-register on Wikimedia Labs (and get access to git/Gerrit). It is no longer necessary to request an account for developer access.
 * Wikimedia Labs now hosts 148 projects, 847 users; to date 1378 instances have been created.
 * Detailed community metrics are also available.

Major news in December includes:
 * https://blog.wikimedia.org/2012/12/07/inventing-as-we-go-building-a-visual-editor-for-mediawiki/ https://blog.wikimedia.org/2012/12/12/try-out-the-alpha-version-of-the-visualeditor/
 * https://blog.wikimedia.org/2012/12/20/article-feedback-new-research-and-next-steps/
 * https://blog.wikimedia.org/2012/12/10/introducing-mediawiki-community-metrics/
 * https://blog.wikimedia.org/2012/12/11/welcome-to-floss-outreach-program-for-women-interns/
 * https://blog.wikimedia.org/2012/12/12/translation-interface-makeover-in-progress/

Note: Like last month, we're proposing a shorter and simpler version of this report that does not assume specialized technical knowledge.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Matthew Flaschen joined the Wikimedia Features engineering team as Features Engineer (announcement).
 * Mike Wang joined the Operations team as part-time Labs Ops Engineer (consultant) (announcement).

Technical Operations
Production Site Switchover


 * The Technical Operations team continued to work on completing the outstanding migration tasks, and to ready our Ashburn infrastructure for the big switchover day, i.e., the complete transition from the Tampa datacenter to the one in Ashburn, in the week of January 22, 2013.
 * In the past few months, we've transitioned services from the Tampa datacenter to the one in Ashburn, which now serves most of our traffic (about 90%). However, application (MediaWiki), memcached and database systems are all still running exclusively out of Tampa. We have been working to upgrade the technologies and set up those systems at Ashburn, and we plan to perform the switchover of those services from Tampa to Ashburn in the coming weeks. This will provide us some assurance of a hot standby datacenter, should we encounter an irrecoverable and lengthy outage in one of the main datacenters.

Site Infrastructure
 * Because December is when the annual Wikimedia fundraiser happens, the Operations team usually makes fewer site infrastructure changes to reduce the risk of causing outages. The lower-risk work performed this month includes deploying the new Parsoid cluster to support the Visual Editor project, rolling out doc.wikimedia.org (our auto-generated puppet documentation), using a new and unified SSL certificate for *.wikipedia.org and *.m.wikipedia.org sites, and setting up the new Ashburn monitoring server and service, icinga.wikimedia.org.
 * Asher migrated one of the main production English Wikipedia slaves, db59, to MariaDB 5.5.28. He had previously been testing 5.5.27 on the primary research slave, and the current build on a slave in the Ashburn datacenter. Taking the times of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB than with our production build of 5.1-fb. Some query types are 10-15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput, as measured by queries per second (qps), has generally improved by 2-10%. He wouldn't draw any conclusions from this data yet, as more is needed to filter out noise, but it's positive. The main goal of migrating to MariaDB is not performance. Rather, he thinks it is in the interest of the WMF and the open source community to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for MySQL-derived database technology.
 * Both Mark and Faidon have made tremendous progress in testing and deploying Ceph at our Ashburn site. We are hopeful that it will prove robust and scalable.
 * Ryan has been working on writing a new deployment system using git and Saltstack. Parsoid is currently being deployed with this system and MediaWiki is slated to use this system for its next major deployment.
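
The MariaDB comparison above reports relative changes in average query time. As a minimal sketch of how such a percentage could be derived from two sets of sampled timings (the function name and the sample values are made up for illustration and are not taken from the actual measurements or tooling):

```python
from statistics import mean


def percent_faster(baseline_times, candidate_times):
    """Return how much lower the candidate's mean query time is,
    expressed as a percentage of the baseline's mean query time.
    A negative result means the candidate is slower."""
    baseline_mean = mean(baseline_times)
    candidate_mean = mean(candidate_times)
    return (baseline_mean - candidate_mean) / baseline_mean * 100.0


# Hypothetical query times in milliseconds, for illustration only.
baseline = [10.0, 12.0, 8.0, 10.0]   # e.g. the existing production build
candidate = [9.0, 11.0, 7.6, 9.2]    # e.g. the build under test

print(round(percent_faster(baseline, candidate), 1))
```

A single mean hides per-query-type differences, which is why the report also quotes ranges for individual query types and cautions that more data is needed to filter out noise.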

Fundraising
 * There were no major changes to the fundraising infrastructure because of the fundraiser itself. We ordered and received bastion hosts, which we're in the process of deploying. Monitoring got an overhaul, and we're now sending alerts to fundraising-tech and/or techops depending on the metric.

Data Dumps
 * A tool for dump users to set up interwiki links on their local mirrors is now in alpha; details of that, and docs on the innards of the interwiki cdb file, are here.
 * Work with WanSecurity on mirroring is moving forward again; they now hold a current copy of all 'other' files, including page views and Picture of the Year bundles, among other things. More to come soon.

Wikimedia Labs
 * Labs came out of beta this month, following the opening of self-registration. Another major change this month was the migration from the shared NFS instance to per-project GlusterFS volumes. A number of smaller changes were made, including:
 * Addition of puppet documentation links from classes and variables on the instance configuration pages
 * Modification of the project filter to act as a table of contents
 * Split LDAP project groups into projects and POSIX groups; fixed a bug with group search
 * Saltstack was installed on all instances to act as a guest agent

Language engineering
Highlights of the Language Engineering team’s projects this month include:

1. Translate extension improvements: Development of the new user interface for Translate, as well as of the translation editor functionality, continued at full pace throughout December, with iterative feature development and user experience improvements. Santhosh Thottingal and Niklas Laxstrom are leading development while Pau Giner is focused on optimizing user experience elements.

2. MediaWiki Language Extension Bundle: The latest version of MLEB was released by the team.

3. Universal Language Selector: Support for language variants was increased, and alternate language codes were added to ULS.

4. L10n/i18n language tools collaboration: Alolita Sharma continued to work with Red Hat’s L10n and i18n teams to evaluate localization data, translation tools as well as i18n tools and technologies.

5. Milkshake: Added more language input methods contributed by language communities to jquery.ime library.

6. Community outreach: Pau Giner and Amir Aharoni participated in the Open Tech Chat this month to talk about best practices in multilingual user testing and internationalization. Amir Aharoni also participated in mentoring OPW’s candidate Priyanka Nag for the new LevelUp program.

7. Blog posts by the team this month:

a. Translation editor growing snazzier - http://blog.wikimedia.org/2012/12/31/translation-editor-growing-snazzier/

b. Translation interface makeover in progress - http://blog.wikimedia.org/2012/12/12/translation-interface-makeover-in-progress/

8. Srikanth Lakshmanan and Arun Ganesh’s tenure ended with the Language Engineering team in December.

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.

Kiwix 0.9rc2 was released. This version embeds our ZIM HTTP server, kiwix-serve, for Windows, OS X and Linux. Better still, this software is now integrated into the Kiwix UI, allowing everyone to share Wikipedia on a LAN in two mouse clicks. We have also revamped our audience measurement tool, a solution which could be interesting for other projects using MirrorBrain. At the same time, we continue to increase our ZIM production throughput, with 8 new Wikipedia ZIM files in December. December was also a month of new records for Kiwix: for the first time, we had more than 70,000 downloads in a month and a leading position for education software at SourceForge.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.

New code and bugfixes have been deployed (detailed changes here and here), and test2.wikipedia.org now gets language links from Wikidata. Changes on Wikidata that concern articles on test2 are shown in the recent changes of test2 as well. If there are no problems, deployment on the Hungarian Wikipedia will take place on January 14th. Other Wikipedias are going to follow later.

For the second phase of Wikidata, the representation of values is the central focus. We published a draft for this and discussions have started. We'd appreciate your feedback.

Additionally, Denny and Lydia held office hours on IRC again (logs in English and German).

More detailed summaries about what is happening around Wikidata are available here.

Future

 * The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.