The focus for September was getting EQIAD to take over as the primary data center, if possible in October. The outstanding items are setting up:
Varnish with persistent cache (to replace the current Squid implementation). Mark Bergsma has successfully deployed it on 8 servers at EQIAD and has been routing traffic through them for the past three weeks. He will add another 8 servers and complete the deployment in the next week or two.
Redis as a replacement for the current Memcached implementation. Asher Feldman has built and puppetized it, and the Tampa servers have been set up. Asher will deploy it in the next week or two as well, running it in parallel with the current Memcached to mitigate any risk associated with the Redis implementation. Once the team is satisfied with it, Asher will replicate the Redis datastore to EQIAD. This is critical to the EQIAD migration because we will then have 'warm' caches at both data centers.
Apache servers to run MediaWiki and the image scalers. Peter Youngmeister has expanded his deployment at Tampa, which surfaced a blocking bug. He identified the issue with Tim Starling's help, and Faidon Liambotis is working on the fix. In the meantime, Peter has deployed several application servers at EQIAD for Asher and Aaron Schulz to use for testing.
Swift to be replicated across the data centers. While implementing Swift replication, Faidon encountered and fixed several bugs. However, replication is still very slow: at the current rate, it would take 6 to 12 months to complete. This is now a significant problem, and the team is brainstorming a suitable solution. The OpenStack Swift team has acknowledged inherent weaknesses in the current implementation and plans to rewrite the replication feature, but that work is months away.
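Running Redis in parallel with Memcached, as described for the Redis item above, can be pictured as a "shadow" setup: users are always served from the existing cache, while writes are mirrored to the new one and disagreements are counted. The following is a minimal sketch under that assumption; the `ShadowCache` and `DictBackend` names are hypothetical, the in-memory dictionaries stand in for real Memcached and Redis clients, and this is not a description of how the actual test was wired up.

```python
from typing import Dict, Optional


class DictBackend:
    """Hypothetical in-memory stand-in for a Memcached or Redis client."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class ShadowCache:
    """Answers every read from the primary backend while mirroring writes to a
    candidate backend and counting reads where the two disagree."""

    def __init__(self, primary, candidate) -> None:
        self.primary = primary
        self.candidate = candidate
        self.reads = 0
        self.mismatches = 0

    def set(self, key: str, value: bytes) -> None:
        self.primary.set(key, value)
        try:
            self.candidate.set(key, value)
        except Exception:
            pass  # a failing candidate must never break production writes

    def get(self, key: str) -> Optional[bytes]:
        value = self.primary.get(key)
        self.reads += 1
        try:
            if self.candidate.get(key) != value:
                self.mismatches += 1
        except Exception:
            self.mismatches += 1  # treat candidate errors as disagreements
        return value


# Usage: users always get the primary (Memcached-like) answer.
cache = ShadowCache(primary=DictBackend(), candidate=DictBackend())
cache.set("session:42", b"logged-in")
print(cache.get("session:42"), cache.mismatches)  # b'logged-in' 0
```

The design choice is that the candidate can only ever affect the mismatch counter, never what users see, which is what makes a parallel run low-risk.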
Asher has reconfigured db1047 for data analysis users. This server contains both the enwiki replica and custom user databases. The new db1047 runs MariaDB 5.5 and now has an additional database called "staging" that users can write to, with 5 TB of free space. This is our first use of MariaDB.
Jeff Green has been building the new Fundraising infrastructure at EQIAD, and it has successfully processed live fundraising traffic. We're in the process of switching over to the new infrastructure: new logging, monitoring, and backup services have been deployed. We have more testing to do before we switch to the new payments hosts, and the PMTPA hosts will remain in service. Banner log collection and archiving have been moved from storage3 to NFS storage arrays.
The code that updates instances' on-wiki status pages was moved from OpenStackManager into OpenStack Nova, making these updates much more reliable.
Salt was installed on all Labs instances, with virt0 as the master. This lets us quickly and easily run remote execution tasks on all instances in all projects. There are plans to extend Salt to make it multi-tenant, so that remote execution rights can be granted for instances within individual projects.
A new deployment system is being written and tested in the demo project: demo-deployment1 is the deployment system, and demo-web1 and demo-web2 are the application servers. demo-deployment1 can call the deployment runner on virt0 via peer permissions.
Work was completed to allow open registration for Labs. Specifically, shell access was split out into a separate right, which must be requested separately from creating an account.
Two-factor authentication was modified so that members of certain groups must use it when logging in if they wish to use Nova features. Any user who can modify user rights is currently required to use two-factor authentication, since such users can add themselves to any project and role.
A new compute node was added to the PMTPA cluster. The rest of the Cisco nodes will soon be added as well.
Work began on replacing the home directory NFS share with Gluster shares.
Although most of this month went to beating on Swift hardware, we found time to track down and squash the pesky bug in the bz2 multistream index generation. There's now a toy offline reader that uses the bz2 multistream XML file, a sorted index file and a Python script to grab and display the text of any English Wikipedia article on demand, without reading through the entire file.
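The reason such a reader never has to scan the whole dump is that a multistream file is a series of independent bz2 streams, so a reader can seek straight to the stream that holds an article and decompress only that stream. The sketch below illustrates the idea on a tiny in-memory file; it is not the script mentioned above. It assumes an index line format of `offset:page_id:title`, uses a simplified `<page>` markup with no XML escaping (a real reader would use an XML parser), and puts 2 pages per stream purely for the demo.

```python
import bisect
import bz2
import io
import re

PAGE_RE = re.compile(r"<page><title>(.*?)</title>.*?<text>(.*?)</text></page>", re.S)


def build_multistream(pages, pages_per_stream=2):
    """Compress (page_id, title, text) tuples into concatenated bz2 streams.

    Returns the compressed bytes and index lines of the form
    'offset:page_id:title', where offset is the byte position of the stream
    that holds the page.
    """
    out = io.BytesIO()
    index = []
    for start in range(0, len(pages), pages_per_stream):
        chunk = pages[start:start + pages_per_stream]
        offset = out.tell()
        xml = "".join(
            f"<page><title>{title}</title><id>{pid}</id><text>{text}</text></page>"
            for pid, title, text in chunk
        )
        out.write(bz2.compress(xml.encode("utf-8")))
        index.extend(f"{offset}:{pid}:{title}" for pid, title, _ in chunk)
    return out.getvalue(), index


def sort_index(index_lines):
    """Turn index lines into a title-sorted list of (title, offset) pairs."""
    entries = []
    for line in index_lines:
        offset, _pid, title = line.split(":", 2)  # titles may contain ':'
        entries.append((title, int(offset)))
    entries.sort()
    return entries


def read_article(fileobj, entries, title):
    """Binary-search the sorted index, then decompress only one bz2 stream."""
    i = bisect.bisect_left(entries, (title, -1))
    if i == len(entries) or entries[i][0] != title:
        return None
    fileobj.seek(entries[i][1])
    decomp = bz2.BZ2Decompressor()
    xml = b""
    while not decomp.eof:  # stop at the end of this stream
        block = fileobj.read(64 * 1024)
        if not block:
            break
        xml += decomp.decompress(block)
    for found, text in PAGE_RE.findall(xml.decode("utf-8")):
        if found == title:
            return text
    return None


pages = [(1, "Alpha", "first"), (2, "Beta", "second"), (3, "Gamma", "third")]
data, index = build_multistream(pages)
print(read_article(io.BytesIO(data), sort_index(index), "Gamma"))  # third
```

Sorting the index by title turns each lookup into a binary search, and `BZ2Decompressor` stops at the end of the first stream it sees, so only one small stream is ever decompressed per lookup.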
Wikimedia sites experienced three episodes of intermittent performance lag and brief unavailability on September 16 and 17, 2012.
In September, the team continued its focus on re-engineering the code design of VisualEditor so that it is more modular and easier to extend. This involves creating and documenting a number of formal APIs at each point in the architecture, so that a developer does not have to understand the entire code base to add new features. The early version of the VisualEditor on mediawiki.org was updated twice (wmf11 and wmf12), fixing a number of bugs and building out better support for internationalisation and for key concepts like categories, language links and other "magic words".
This month, we deployed more new features for Article Feedback, which is still being tested on 10% of the English Wikipedia. Improvements include iPad support, special abuse filters updated to automatically disallow swear words and common vandalism, and new automated filtering features that reduce the workload for editors, administrators and oversighters. These and other features can be tested on this sample article feedback page or on the central feedback page (please report any bugs on Bugzilla). We are also refactoring our code and 'sharding' our database to make this extension more scalable, prototyping a mobile web interface, and collecting more data to track how many readers who post feedback go on to become editors as a result. As we complete these final improvements and confirm that the tool is converting readers into editors as intended, we are preparing for a full release to 100% of the English Wikipedia in the coming weeks, with other wiki projects to follow later this year. For more information about this tool, see our project overview. This new tool was created in collaboration with the Wikipedia community and developed by Fabrice Florin, Matthias Mullie, Pau Giner, Ryan Kaldari, Oliver Keyes, Chris McMahon, Benny Situ, Dario Taraborelli, Howie Fung and Terry Chay.
This month, we deployed our first release version of Page Curation on the English Wikipedia. Page Curation aims to help Wikipedia patrollers review new pages more quickly and easily, as well as provide better feedback to page creators. To learn more, visit our introduction page, watch this video tour or read this tutorial. This product includes two integrated tools: 1) the New Pages Feed, a dynamic list of new pages for review by community patrollers; and 2) the Curation Toolbar, an optional panel on article pages that enables editors to quickly review these pages. The Curation Toolbar provides a variety of tools that let users get page info, mark a page as reviewed, tag it, mark it for deletion, send WikiLove to page creators, or jump to the next page on the list. If you are an experienced editor, try out the final version on the English Wikipedia, and please report any bugs on Bugzilla. We are now winding down feature development for this product and preparing to make it available to other wiki projects in the coming months, with a plan to upgrade it again in 2013. Formerly called 'Page Triage', this new tool was designed in close collaboration with the Wikipedia community and developed by Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba, Howie Fung and Terry Chay.
Our first project to simplify the "edit" window has now been deployed on the English-language Wikipedia, with design by Vibha Bamba, and Rob Moen & Benny Situ handling development. Our current priority is to fully productise these changes and fix some associated bugs before moving forward.
This month the E3 team announced the results of the first iteration of the post-edit feedback experiment and worked on productizing the most successful confirmation message, both in a new extension and in collaboration with the VisualEditor team. In addition, the team deployed the second iteration of experimental post-edit feedback, which lets new editors know when they reach important editing milestones early in their participation on Wikipedia. E3 also continued work on the account creation user experience and on the new event logging and user-tagging analytics infrastructure that supports feature experimentation; all of these are in alpha deployment on the English Wikipedia.
This month, we started to ramp up planning, design and development for Notifications on MediaWiki (code-named 'Echo'), with the aim of a limited deployment in early 2013. Andrew Garrett worked on getting the Echo extension re-deployed on MediaWiki.org (currently blocked on a timestamp change in Gerrit), with special thanks to Alex Monk (krenair) for his contributions and participation, and met with Wikia about further collaboration. Vibha and Fabrice are auditing the messaging currently done in the system to improve flows and design. Aaron has been working on abstracting the JobQueue to support queuing systems that can handle Echo's design. This new infrastructure tool will be developed by Wikimedia's editor engagement team, including Fabrice Florin, Vibha Bamba, Brandon Harris, Ryan Kaldari, Matthias Mullie, Benny Situ, Andrew Garrett, Oliver Keyes, with Terry Chay and Howie Fung.
Development work for Messaging on MediaWiki (code-named 'Flow') will start officially in January 2013, after Echo's first deployment. This new user-to-user messaging infrastructure tool will be developed by Wikimedia's editor engagement team, including Fabrice Florin, Vibha Bamba, Ryan Kaldari, Benny Situ, Matthias Mullie, Brandon Harris, Oliver Keyes, Howie Fung and Terry Chay. In the meantime, Performance engineering and Matthias Mullie are doing the underlying preparatory work with the RDB store (database sharding) on AFTv5, including proper abstraction.
Throughout September the team worked toward the October 1 fundraising code slush, refactoring a few existing payment processor integrations and integrating with Adyen. The Adyen integration will give the fundraiser redundancy in credit card processing and reduce the percentage of each donation lost to processing fees. In addition, the team worked to integrate the Translate extension during a sprint with the Internationalization team and made many other, smaller bug fixes and enhancements for the upcoming 2012 fundraiser.
The team formerly known as the "Localisation team" has been renamed the "Language Engineering team". The name change is meant to communicate the team's goals more easily; we felt that terms like internationalization and localisation do not convey them clearly enough to those not in the know.
The Universal Language Selector is now mostly complete, and talks are underway with Wikimedia operations to plan its first deployments on (very) small Wikimedia wikis. The reason for such a careful deployment is a valid concern about so-called "cache fragmentation": the current caching strategy serves all anonymous users the same single cached version of a content page, whereas the caches would now have to store multiple versions of each page, one per user language, to serve anonymous users with different browser language settings.
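The cache fragmentation concern comes down to simple multiplication: once the rendered page varies by user language, every page needs one cache entry per language instead of one entry in total. The sketch below counts the entries under that assumption; the `cache_key` format is hypothetical and is not how Squid or Varnish actually key Wikimedia pages.

```python
from itertools import product


def cache_key(title, user_lang=None):
    """Hypothetical cache key: the title, plus the user language if pages vary by it."""
    return title if user_lang is None else f"{title}|lang={user_lang}"


def cached_variants(titles, languages, vary_by_language):
    """Return the set of distinct cache entries needed to serve anonymous readers."""
    if not vary_by_language:
        return {cache_key(t) for t in titles}
    return {cache_key(t, lang) for t, lang in product(titles, languages)}


titles = ["Main_Page", "Paris", "Tokyo"]
languages = ["en", "fr", "hi", "ta"]
print(len(cached_variants(titles, languages, vary_by_language=False)))  # 3
print(len(cached_variants(titles, languages, vary_by_language=True)))   # 12
```

With hundreds of interface languages the multiplier grows accordingly, which is why a first deployment on very small wikis limits the risk.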
The Language Engineering team gave presentations about the components of Project Milkshake at San Francisco State University, Twitter, Google and Change.org.
We are preparing for another work sprint on the mobile interface! Some beta features will be graduated to the standard mobile view, such as the new navigation menu.
Preliminary support for sharper images on high-density displays (such as the iPhone 4/4S/5 and many Android phones) is being worked on; it will also apply to the desktop view on suitable tablets (iPad 3, Nexus 7, Kindle HD) and laptops (Retina MacBook Pro, Windows laptops with desktop zoom at 150% or 200%).
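One common way to serve sharper images is to offer extra thumbnails at 1.5x and 2x the displayed width through the HTML `srcset` attribute, so that the browser picks the one matching its pixel density. The sketch below assumes exactly that approach and those two density factors (matching the 150% and 200% zoom levels above); it is not necessarily how MediaWiki implements the feature, and `example_thumb_url` is a hypothetical URL scheme.

```python
import math


def thumbnail_widths(css_width, original_width, densities=(1.5, 2.0)):
    """Map each density factor to the pixel width of an extra, sharper thumbnail.

    Widths are capped at the original file's width (no upscaling), and a density
    is skipped if it would not yield a new width larger than the 1x thumbnail.
    """
    widths = {}
    for d in sorted(densities):
        w = min(math.ceil(css_width * d), original_width)
        if w > css_width and w not in widths.values():
            widths[d] = w
    return widths


def srcset(thumb_url, css_width, original_width):
    """Build an HTML srcset value from a caller-supplied thumb_url(width) function."""
    widths = thumbnail_widths(css_width, original_width)
    return ", ".join(f"{thumb_url(w)} {d:g}x" for d, w in widths.items())


def example_thumb_url(width):
    """Hypothetical thumbnail URL scheme, for illustration only."""
    return f"/thumb/{width}px-Example.jpg"


print(srcset(example_thumb_url, 220, 1000))
# /thumb/330px-Example.jpg 1.5x, /thumb/440px-Example.jpg 2x
```

Capping at the original width matters for small source images: a 300px-wide file shown at 220px can only offer one sharper version, not two.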
The mobile features team of Brion, Jon, Max and Arthur released several updates to the Wiki Loves Monuments (WLM) app after launching it in the Google Play store. The app, the first dedicated mobile contribution pipeline the projects have had, has brought in over 3,000 mobile uploads. Working with the product team, the mobile team will next assess the data from the competition to better understand how to proceed with a dedicated Commons upload tool. The team has received a significant amount of positive feedback about the app, and early data analysis suggests it has been a big hit with new Commons users.
Open Path delivered the final build of the Wikipedia J2ME app this month. Patrick Reilly and the Global Development team did internal testing to validate that it was performing as expected. We're now feature-complete and spending cycles on improving the app's performance on low-memory devices. The app is now awaiting distribution through our partners.
Patrick worked with Jeremy from the Praekelt Foundation to finalize the Puppet configurations for Vumi. Next, the team will create the analytics pipelines to measure the effectiveness of the app. Working with various partners, we'll launch focused country tests later this year.
MediaWiki 1.20wmf11 and MediaWiki 1.20wmf12 were deployed to the Wikimedia sites in September. Mark Hershberger published two testing tarballs in preparation for a tentative October release of MediaWiki 1.20.
September saw some major problems with Gerrit that took about two full weeks to repair. Replication testing and preparation have been completed, and we will begin replicating to GitHub in early October. Gerrit 2.5 is nearing release; we have tested it and plan to upgrade to it in early October. Finally, discussion has begun about what to do with the code still in Subversion.
The TimedMediaHandler (TMH) test deployment and testing are tracked in bug 27699; Michael Dale is following up, and we hope to fix the last issues and deploy TMH to Wikimedia sites soon. The Math, EasyTimeline and ConfirmEdit extensions were updated to use Swift-based storage, and all but ConfirmEdit now use Swift in production. We weren't able to turn off the old NFS server (ms7) as originally planned, due to hardware issues on our Swift nodes and unanticipated issues with Swift-based replication to our Ashburn data center. We are very nearly out of space on ms7, so our new plan is to back our media storage onto a newer NFS system.
Tim Starling created a basic profiler for Lua code, added some time and date functions to the default environment, and fixed bugs. Experimentation with Lua continues on mediawiki.org and test2wiki.
show an uptick in patchsets awaiting review at the end of September, and a Signpost analysis showed (among other statistics) that WMF staff provide 86% of first reviews for core patchsets, and just five staffers collectively account for about 55% of that total.
Barring unforeseen circumstances, and after examining many hundreds of applicants, we expect hiring of the final candidates for the open positions closely tied to QA (QA Engineer, Mobile QA Engineer, Volunteer Community Coordinator, Bug Wrangler) to be complete within days. These new hires will greatly accelerate QA work in the near future. On the testing front, several key extensions are now deployed automatically to the beta labs test environment, and AFTv5 will be the first key extension hosted entirely on beta, allowing us to retire its now-obsolete prototype test host. This month also saw a renewed focus on browser test automation, with the creation of an automated test for the UploadWizard extension that is being used by both QA and at least one WMF developer. The new QA Engineer will extend and refine this work in the very near future.
In September, QA Lead Chris McMahon announced that the Beta cluster is a fit test environment: code is routinely deployed there ahead of production, the test environment closely emulates the production environment, and we can easily and reliably manipulate aspects of the test environment (configuration, permissions, etc.) for testing purposes. Also, bits.beta.wmflabs.org is now fully managed by Puppet. It serves the assets of MediaWiki and its extensions, as well as geographical lookup of IP addresses. Some work remains to be done (performance tuning, configuration), but the infrastructure is in place for software testing and browser test automation.
Antoine is integrating a new Gerrit/Jenkins gateway that lets us finely tune how jobs are triggered in Jenkins. The system comes from OpenStack and is written in Python. We've set up the new gateway on Labs, and the production jobs are being migrated to the new system. The Gerrit tool will need some upstream patches, and how to get them onto our production server is being discussed with Chad Horohoe. Timo Tijhof has rewritten the testswarm-browserstack bridge in preparation for a more scalable deployment that automatically creates and terminates browser workers according to the TestSwarm queue. This is currently being tested at integration.wmflabs.org.
Sumana Harihareswara is coordinating WMF engineers' efforts to spend 20% of their work time on code review and other efforts benefiting the entire Wikimedia engineering community. Some teams were exempted from the 20% policy in September because of pressing deadlines. Also, the discussion of 20% time at the yearly all-engineering meeting included debate of various alternative proposals. Sumana will write and circulate a suggested way forward in October.
The Wikidata team is working on the last parts of a first deployment, and the code is currently being reviewed by WMF engineers. Anja Jentzsch has joined the team and focuses on quality and the deployment of Wikidata. On the coding side, much has been done, including work on edit conflicts and permissions and a rework of the special page for creating new items. Work on phase 2 of Wikidata (infoboxes) has also started; this includes, for example, the ValueHandler extension, which will be used for our data values. The team has also met with a group of database experts from different projects to get their input on phases 2 and 3.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.