Wikimedia Engineering/Report/2012/September

Engineering metrics in September:
 * 98 unique committers contributed patchsets of code to MediaWiki.
 * The total number of unresolved commits went from about 360 to 440.
 * About 35 shell requests were processed.
 * About 41 developers got access to Git and Wikimedia Labs.
 * Wikimedia Labs now hosts 131 projects, 633 users; to date 241 instances have been created.

Major news in September includes:
 * The introduction of translation memory by the Language engineering team;
 * A breakage of our gerrit tool, and its subsequent recovery;
 * A new version of the Wiki Loves Monuments App that allows users to use it in combination with a separate camera;
 * A new EPUB export feature enabled on the English Wikipedia;
 * A call to nonprofits, schools, libraries, etc. who may be interested in getting decommissioned Wikimedia servers;
 * The launch of the Page curation feature on the English Wikipedia;
 * The completion of Google Summer of Code 2012.

Past events
Wikimedia Tech Days (11–12 September 2012, San Francisco, USA)
 * Just before the Wikimedia Foundation's yearly all-staff meeting, the engineering department met to discuss designs and procedures, meet each other face to face, and hack.

Upcoming events
Bangalore DevCamp (9–11 November 2012, Bangalore, India)
 * The Wikimedia Foundation is planning a technical meetup at the Indian Institute of Management campus. This DevCamp will focus on development of JavaScript-based internationalization and localization tools, as well as mobile applications using PhoneGap and LAMP technologies. The Mobile, Language and Engineering Community teams worked on planning the event and planning outreach to software engineers, UX/UI designers, and translators.

Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.



Announcements

 * Mark Holmquist joined the Features team as Software Engineer working primarily on Parsoid (announcement).
 * Dan Andreescu joined the Platform team as JavaScript/UI Engineer working on Analytics (announcement).

Technical Operations
Site infrastructure
 * The focus for September was getting EQIAD to take over as the primary data center, if possible in October. The outstanding items are setting up:
 * Varnish with persistent cache (to replace current Squid implementation). Mark Bergsma has successfully deployed it on 8 servers at EQIAD, and routed traffic through them for the last three weeks. He will add another 8 servers and fully deploy it in the coming week or two.
 * Redis as a replacement for the current Memcached implementation. Asher Feldman has built and puppetized it, and the Tampa servers have been set up. Asher will be deploying it in the coming week or two as well, and he will be testing it in parallel with the current Memcached, to mitigate any risk associated with the Redis implementation. Once the team is comfortable and satisfied with it, Asher will be replicating the Redis datastore across to EQIAD. This is critical to the EQIAD migration because we will then have 'warm' caches at both data centers.
 * Apache servers to run MediaWiki and image scalers. Peter Youngmeister has expanded his deployment at Tampa, which surfaced a blocker bug. He identified the issue with Tim Starling's help, and Faidon Liambotis is working on the fix. In the meantime, Peter has deployed several application servers at EQIAD to be used by Asher and Aaron Schulz for testing purposes.
 * Swift to be replicated across the data centers. When Faidon was implementing Swift replication, he encountered and fixed several bugs. However, the replication rate is still very slow and at that pace, the replication would take 6 to 12 months to complete. This is now an issue and the team is currently brainstorming a suitable solution. The OpenStack Swift team has acknowledged inherent weaknesses with the current implementation and they have plans to rewrite the replication feature, but that is months away.
 * Asher has reconfigured db1047 for data analysis users. This database contains both the enwiki replica and custom user databases. The new db1047 is running MariaDB 5.5, and now has an additional database called "staging" that users can write to, with 5TB of free space. This is our first use of MariaDB.
 * Jeff Green has been building the new Fundraising infrastructure at EQIAD, and it has successfully processed live fundraising traffic. We're in the process of switching over to the new infrastructure: new logging, monitoring, and backup services are deployed. We have more testing to do before we switch to the new payments hosts, and the PMTPA hosts will remain in service. Banner log collection and archiving have been moved from storage3 to NFS storage arrays.

Wikimedia Labs
 * Several key enhancements were implemented:
 * Code was moved from OpenStackManager to OpenStack Nova for updating Instances' on-wiki status pages, making their updates much more reliable.
 * Salt was installed on all Labs instances, with virt0 as the master. This allows us to easily and quickly do remote execution tasks on all instances in all projects. There are plans to extend Salt's capabilities to make it multi-tenant, so that we can allow remote execution rights for instances within projects.
 * A new deployment system is being written and tested in the demo project. demo-deployment1 is the deployment system, and demo-web1/demo-web2 are the app servers. demo-deployment1 can call the deployment runner on virt0 via peer permissions.
 * Work was completed to allow open registration for Labs. Specifically, shell access was split apart as a right. Shell access must be requested separately from creating an account.
 * Two-factor authentication was modified so that certain groups are required to use it when logging in, if they wish to use Nova features. Any user that can modify user rights is currently forced to use two-factor, as they can add themselves to any project and role.
 * A new compute node was added to the PMTPA cluster. The rest of the Cisco nodes will soon be added as well.
 * Work began on replacing the home directory NFS share with Gluster shares.

Data Dumps
 * Although most of this month went to beating on Swift hardware, we found some time to find and squash the pesky bug in the bz2 multistream index generation. There's now a toy offline reader using the bz2 multistream XML file, a sorted index file and a Python script to grab and display the text of the English Wikipedia article of your choice on demand, without reading through the entire file.
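
The idea behind such an offline reader can be sketched as follows. This is a minimal illustration, not the actual script: it assumes the multistream file is a concatenation of independent bz2 streams, that the index maps each title to the byte offset of the stream containing it, and that a simplified `<page><title>…</title>…<text>…</text></page>` layout is enough for the demo. All function names here are hypothetical, and the index is held in memory as a sorted list of `(title, offset)` pairs rather than read from a file.

```python
import bisect
import bz2
import io
import re


def build_multistream(chunks):
    """Compress each chunk as its own bz2 stream and concatenate them.

    Returns (data, offsets), where offsets[i] is the byte offset at
    which stream i starts. Used here only to build demo input.
    """
    out = io.BytesIO()
    offsets = []
    for chunk in chunks:
        offsets.append(out.tell())
        out.write(bz2.compress(chunk.encode("utf-8")))
    return out.getvalue(), offsets


def lookup(index, title):
    """Binary-search a sorted list of (title, offset) pairs."""
    i = bisect.bisect_left(index, (title,))
    if i < len(index) and index[i][0] == title:
        return index[i][1]
    return None


def read_stream(fileobj, offset):
    """Decompress exactly one bz2 stream starting at the given offset."""
    fileobj.seek(offset)
    decomp = bz2.BZ2Decompressor()
    parts = []
    while not decomp.eof:
        block = fileobj.read(64 * 1024)
        if not block:
            break  # truncated input; return what was decoded
        parts.append(decomp.decompress(block))
    return b"".join(parts).decode("utf-8")


def find_page_text(xml_fragment, title):
    """Pull the <text> body of the page with this title (simplified)."""
    pattern = re.compile(
        r"<page>\s*<title>" + re.escape(title) + r"</title>"
        r".*?<text[^>]*>(.*?)</text>",
        re.DOTALL,
    )
    match = pattern.search(xml_fragment)
    return match.group(1) if match else None


def get_article(fileobj, index, title):
    """Seek to the right stream and return the article text, or None."""
    offset = lookup(index, title)
    if offset is None:
        return None
    return find_page_text(read_stream(fileobj, offset), title)
```

Because each stream can be decompressed on its own, only the one stream holding the requested page is read; the rest of the file is skipped with a single seek.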

Other news
 * Wikimedia sites experienced three episodes of intermittent performance lags and brief unavailability on September 16 and 17, 2012.

Mobile
We are preparing for another work sprint on the mobile interface! Some beta features will be graduated to the standard mobile view, such as the new navigation menu.

Preliminary support for sharper images on high-density displays (such as the iPhone 4/4S/5 and many Android phones) is being worked on; this will apply also to the desktop view on suitable tablets (iPad 3, Nexus 7, Kindle HD) and laptops (Retina MacBook Pro, Windows laptops with desktop zoom at 150% or 200%).
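
As a rough illustration of what serving sharper images involves, the sketch below computes larger thumbnail widths for high-density displays and formats them as an HTML `srcset` value. The density factors (1.5 and 2, matching the zoom levels mentioned above), the function names and the URL template are assumptions made for the example; the report does not describe the actual implementation.

```python
def thumbnail_widths(base_width, original_width, densities=(1.5, 2.0)):
    """Return {density: pixel width} for each high-density variant.

    Variants wider than the original image are skipped, since
    upscaling would not make the image any sharper.
    """
    widths = {}
    for density in densities:
        width = round(base_width * density)
        if width <= original_width:
            widths[density] = width
    return widths


def srcset(url_template, base_width, original_width):
    """Format the variants as an HTML srcset attribute value.

    url_template is a hypothetical pattern with a {width} placeholder.
    """
    parts = []
    for density, width in thumbnail_widths(base_width, original_width).items():
        parts.append(f"{url_template.format(width=width)} {density:g}x")
    return ", ".join(parts)
```

For a 220-pixel thumbnail of a 1000-pixel original, this yields 330-pixel and 440-pixel variants; for a 400-pixel original, only the 330-pixel variant is offered.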

Wikidata

 * The Wikidata project is funded and executed by Wikimedia Deutschland.

The Wikidata team is working on the last parts of a first deployment and the code is currently being reviewed by WMF engineers. Anja Jentzsch has joined the team and focuses on quality and the deployment of Wikidata. On the coding side, a lot has been done, including work on edit conflicts and permissions, and reworking the special page to create new items. Work on phase 2 of Wikidata (infoboxes) has also started. This includes, for example, the ValueHandler extension, which will be used for our data values. The team has also met with a group of database experts from different projects to get their input for phases 2 and 3.

In addition, we started a page for bot discussions and coordination, published test coverage data, updated the demo system, attended a lot of events (including WikiCon and Software Freedom Day), and held another round of office hours. If you are interested in contributing to Wikidata, there is now a new contribute page for you.

For more information, you can check out our more comprehensive weekly updates; we now also have Facebook and Google+ pages.

Future
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.