Wikimedia Engineering/Report/2013/September

Engineering metrics in September:
 * 135 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from around 1080 to about 1020.
 * About 29 shell requests were processed.
 * Wikimedia Labs now hosts 173 projects and 1,848 users; to date 2,305 instances have been created.
 * The tools project in Labs now hosts 325 tools and 266 members.

Major news in September includes:
 * A recap on how our engineers worked with volunteers to improve language tools at Wikimania;
 * A call for wikis willing to experiment with using HTTPS for all users;
 * A recap on how our new image scaling system was implemented by a volunteer developer;
 * A call for technical projects that could for instance be completed as part of our mentorship programs;
 * Design experiments to show the community behind Wikipedia articles on mobile devices;
 * Another release of the MediaWiki Language Extension Bundle, with an explanation of how it's put together;
 * The completion of the sixth round of the Outreach Program for Women;
 * A recap of the launch of Notifications to more language versions of Wikipedia, and their impact.

Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.

Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.

For a more complete and up-to-date list, check out the Project:Calendar.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Kartik Mistry joined the Language Engineering team as Software Engineer (announcement).
 * Sucheta Ghoshal joined the Language Engineering team as Associate Software Engineer (announcement).
 * Kaity Hammerstein joined the User Experience team as Associate UX Designer (announcement).
 * Aaron Halfaker joined the Analytics team as Research Analyst (announcement).
 * Oliver Keyes transitioned to the role of Product Analyst (announcement).
 * Dan Garry joined the Product development team as Associate Product Manager for Platform (announcement).
 * Nick Wilson joined the Product development team as Community Liaison (announcement).

Technical Operations
Site infrastructure
 * Work to refactor and modularize our Puppet repository continues: this month, lots of dead code was removed, and some tiny miscellaneous classes were aggregated with more relevant components. Work on git-deploy has also restarted this month. Changes were made to make git-deploy easier to configure and to fully automate the initial setup of new repositories and of new minion targets.
 * Many of the services within the Tampa data center have already migrated to EQIAD; however, several smaller, unique, or in some cases orphaned services remain that we still need to document or scope before this center closes. Several of these services may no longer be required, and we expect some discussion about how they are migrated or maintained going forward. Additionally, several of these systems need to be moved to the new secondary data center (see below), and will wait until infrastructure is in place to do so. Our goal is to have these systems moved before the end of 2013, but we'll continue to have equipment in this location for as long as necessary to ensure the stability of our network.
 * For EQIAD, the ordering process is underway to complete our fourth row of machines, and ensure we have capacity to take in systems that will be arriving from the sunsetting of the Tampa data center, as well as handle our expected growth.
 * For ULSFO, after a lengthy setup phase, bootstrapping and configuration of the systems is finally underway. Over the next several weeks, we will be configuring, testing, and redirecting traffic at this location.
 * Lastly, work has begun on a definition and RFP process for a new, secondary data center, likely on the west coast of the US. We will send further updates on this project once our RFPs are complete and we begin the selection process. Our hope is to have this facility ready to take systems from Tampa by the end of 2013.

Data Dumps
 * The GSoC incremental dumps project has drawn to a close, but User:Svick will still be around. There's work to be done before this can go into production, as well as extensive testing and code review from folks with C++ expertise. If you want to help, check the repository.

Wikimedia Labs
 * The DNS infrastructure of Labs has been overhauled and much improved. The switch to new hardware, replacing the unreliable hardware of the Labs NFS server, is ready and should be enabled this week. Yuvaraj Pandian has created and deployed a new instance proxy with an OpenStack-style API. The new proxy is in use for a small number of instances right now, but will be expanded to most instances in the future. It uses nginx with Lua code to read its virtual host configuration from redis, and can route arbitrary URLs to arbitrary back-ends.
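
The routing idea behind such a proxy (look up the requested host in a key-value store, then pick a back-end by URL) can be sketched as follows. This is only an illustration in Python, not the proxy's actual Lua code: the in-memory dictionary stands in for redis, and the host name, URL prefixes, back-end addresses, and the longest-prefix rule are all assumptions made for the example.

```python
from typing import Dict, Optional

# Stand-in for the redis store: host name -> {URL prefix -> back-end address}.
# Hosts, prefixes, and addresses below are hypothetical.
ROUTES: Dict[str, Dict[str, str]] = {
    "example-tool.wmflabs.org": {
        "/": "http://10.0.0.5:80",
        "/api": "http://10.0.0.6:8080",
    },
}


def resolve_backend(
    host: str, path: str, routes: Dict[str, Dict[str, str]] = ROUTES
) -> Optional[str]:
    """Return the back-end for a request, or None if no route matches."""
    prefixes = routes.get(host.lower())
    if prefixes is None:
        return None
    best = None
    for prefix in prefixes:
        # Match the prefix exactly or at a path-segment boundary.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            # Assumed policy: the longest matching prefix wins.
            if best is None or len(prefix) > len(best):
                best = prefix
    return prefixes[best] if best is not None else None
```

With these sample routes, a request for `/api/v1` on `example-tool.wmflabs.org` resolves to the second back-end, while any other path on that host falls through to the `/` entry; an unknown host yields `None`, which a real proxy would turn into an error page.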

Analytics

 * The team has been focused on smaller but more important work items this month, including enhancements to Wikimetrics and to Grantmaking and Program Development's graphing infrastructure, and fixes for some long-standing Limn bugs. On the infrastructure side, our collaboration with Ops has the Kafka middleware project moving along nicely. The all-staff meeting and travel schedules definitely impacted our throughput this month.
 * Two notable accomplishments should be called out: our Hadoop environment is now 100% free software, as we swapped out a proprietary JDK for OpenJDK 7. We also spent a lot of time on our engagement processes and planning for our first combined quarterly review in October, and made significant progress on our hiring goals.

Research and data

Kiwix
The Kiwix project is funded and executed by Wikimedia CH.


 * The MediaWiki offliner is now pretty stable, and its first release will happen in October. The ZIM incremental update GSoC project was successfully completed; we still need to do a little bit of work to finish the integration into the openZIM and Kiwix code bases. libzim, the openZIM reference implementation, has been packaged for Debian.

Wikidata
The Wikidata project is funded and executed by Wikimedia Deutschland.


 * In September, the Wikidata team mainly concentrated on the sister projects Wikimedia Commons and Wiktionary. For Wikimedia Commons, we added the ability to store interwiki links in one central location (Wikidata) together with the ones for Wikipedia and Wikivoyage. For Wiktionary, we published an analysis of all existing proposals for the integration of Wikidata and Wiktionary.
 * On Wikidata itself, we rolled out the URL datatype. This allows you, for example, to provide a URL as the source of a statement. Denny Vrandečić published two blog posts about the ideas behind Wikidata: "Wikidata Quality and Quantity" and "A categorical imperative?". In addition, he shared a few thoughts on the future of Wikidata before leaving the project at the end of the month.
 * The development team is looking to hire another front-end developer experienced in JavaScript.

Future

 * The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.