Mark Bergsma made a breakthrough in resolving an old and elusive instability issue in our Varnish servers, which occurs when they are under extreme load or experiencing hanging connections or packet loss. The problem turned out to be the slow epoll thread. Under load, once the pipe buffer (64 KB) is full, the writing Varnish worker threads block, and the server situation deteriorates rapidly. Mark fixed this issue by moving the reading of the sessions earlier in the epoll event loop, before the thread does anything else, thereby reducing how much data accumulates in the pipe buffer. With this enhancement, Mark is confident he could further reduce the number of Varnish servers in our caching infrastructure.
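The idea behind the fix can be illustrated with a minimal Python sketch. This is not the Varnish code (which is written in C); the names `drain_pipe`, `loop_iteration`, `handle_sessions` and `do_other_work` are hypothetical, and the 4096-byte read size is an arbitrary assumption. It only shows the ordering: the loop empties the pipe before doing anything else, so writers are less likely to find it full.

```python
import os

PIPE_CHUNK = 4096  # bytes per read; an arbitrary choice for this sketch


def drain_pipe(read_fd):
    """Read everything currently queued on a non-blocking pipe."""
    chunks = []
    while True:
        try:
            data = os.read(read_fd, PIPE_CHUNK)
        except BlockingIOError:
            break  # pipe is empty for now
        if not data:
            break  # all writers closed their end
        chunks.append(data)
    return b"".join(chunks)


def loop_iteration(read_fd, handle_sessions, do_other_work):
    """One pass of a hypothetical event loop: empty the pipe first."""
    pending = drain_pipe(read_fd)
    if pending:
        handle_sessions(pending)
    # Slower work runs only after the pipe has been emptied.
    do_other_work()
    return len(pending)
```

The read end must be put in non-blocking mode beforehand, for example with `os.set_blocking(read_fd, False)`, so that an empty pipe ends the drain instead of stalling the loop.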
Asher Feldman is happy to report that the memcached instances on the app servers in Tampa are no longer in use. This will give us back an extra 2GB of RAM on many of the app servers (which only have 8 or 12GB to begin with), which can go towards increasing PHP capacity. It also improves the stability of the site by addressing some of the root causes of multiple site outages, and brings with it multiple client improvements including consistent hashing, igbinary serialization, and better timeout handling. The total cache pool has increased from 140GB to 1392GB, enough to currently meet full parser cache requirements from RAM. Sessions are no longer stored in memcached at all but have been migrated to redis, which will provide replication to the stand-by datacenter. Performance is also noticeably better, as can be seen by comparing the maximum values of the 90th and 99th percentile times in the attached graph.
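For readers unfamiliar with consistent hashing, the following Python sketch shows the general technique. It is not the client library used in production; the class name, the server names, the use of SHA-256 and the number of virtual points per server are all assumptions made for illustration. The property of interest is that removing one cache server only remaps the keys that lived on that server.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal hash ring with virtual points (illustrative only)."""

    def __init__(self, servers, replicas=100):
        self._replicas = replicas  # virtual points per server (assumed value)
        self._ring = []            # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add(self, server):
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get(self, key):
        """Return the server owning the first ring point at or after the key's hash."""
        if not self._ring:
            raise LookupError("no servers in the ring")
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With a plain modulo scheme, removing a server would remap most keys; with the ring, keys on the surviving servers stay where they were, which keeps the cache warm during a server failure.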
In recent months, we've seen a high hardware failure rate with our batch of Swift servers. After discussion with our vendor, they agreed to replace all those servers with newer hardware. All the required servers to replace the Tampa Swift servers have just arrived. We are in the process of migrating data from the old servers to the new ones, but it will take time to drain traffic, remove the old hardware from production and slowly ramp up the new machines. Ariel Glenn's current plan is to add 2 servers per week.
After several months of testing and tweaking, Peter Youngmeister finally rolled out the new Apache-on-Precise build on all our Tampa app and imagescaler servers. This will be the same (and tested) image that we'll be using on the Ashburn App servers in the coming month.
Thanks to the efforts of Leslie Carr and Mark Bergsma, we are now a RIPE NCC member, and with this membership, we may be eligible to receive a one-time allocation of a /22 of IPv4 address space from the last /8 of IPv4 address space. This is particularly important to us since we have run out of IPv4 addresses in Europe.
The SSL cluster was upgraded to Ubuntu Precise which provided a newer version of nginx and openssl, closing out the CRIME vulnerability and giving us the possibility of using HTTP 1.1 to the back-end. Testing of HTTP 1.1 for proxying will occur in the future.
The fundraising season started. Jeff Green and Leslie Carr rolled out the new Ashburn Fundraising server cluster and it is currently handling all payments. Leslie applied and tested firewall rules for the new cluster. The Operations and Fundraising tech teams made many bug fixes and small improvements to configuration management, monitoring, logging and cluster administration. Jeff built out the second payments messaging box (ActiveMQ) as a hot standby. A new wiki was deployed for the Fundraising email unsubscribe page, to segregate it from sensitive services (payments, CiviCRM). Specifications for new payments bastion hosts were started.
Media bundles are back in business at your.org now that the network issues have been fixed. Work has started on upgrading the OS on the servers that produce the dumps, rebuilding the necessary packages and testing. The 'add/changes' experimental dumps have been running stably long enough that we've made them available on the gluster public data volume accessible to all Labs projects.
Andrew Bogott continues to work on some long-term OpenStack issues. There's a new project, Moniker, which should (eventually) allow us to properly integrate the Labs cloud with our DNS back-end and provide better stability and a bit more user control. He also continues to do more basic OpenStack work, which will eventually trickle into Labs.
Andrew has also been fiddling quite a bit with the usability of OpenStackManager, which is the GUI for labsconsole. The interface is now marginally easier to use and understand, and improvements are ongoing.
Chris Johnson has relocated from Tampa to work in our Ashburn datacenter. Steven Bernardin is now the main Tampa data center engineer.
In November, the team worked primarily on finalizing the code re-engineering of VisualEditor so that it is more modular and easier to extend, and on the integration ahead of deploying it for wider testing in December. The early version of the VisualEditor on mediawiki.org was updated twice (1.21-wmf4 and -wmf5), fixing a number of bugs, adding missing wikitext compatibility, and making widespread improvements to much of the user interface code so that it will be easier to change in future.
In preparation for the upcoming deployment on the English Wikipedia, the Parsoid team concentrated on the preservation of existing content. Automated round-trip testing on 100,000 randomly chosen pages from the English Wikipedia using distributed test runners helped to identify many issues, which were fixed and often resulted in new minimal test cases being added to the parser test suite. Currently, 79.4% of test articles (up from about 65% last month) round-trip without any differences at all, an additional 18% round-trip with only minor differences (whitespace, quote style, etc.), and the remaining 2.6% of pages have differences that still need fixing (down from about 15% last month). Selective serialization will further avoid dirty diffs in unmodified parts of a page by using the original wikitext for those. This will help further fix the roughly 20% of pages that had any kind of difference in wikitext. The implementation of this algorithm is currently being finalized.
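The three-way classification used in these round-trip results can be sketched as follows. This is a simplified Python illustration, not Parsoid's actual test code; the function names and the exact normalization rules for "minor" differences (collapsing whitespace and unifying quote characters) are assumptions.

```python
import re
from collections import Counter


def normalize_minor(wikitext):
    """Erase differences treated as minor here: quote style and whitespace."""
    text = wikitext.replace('"', "'")
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def classify_round_trip(original, round_tripped):
    """Compare a page's wikitext before and after a parse/serialize cycle."""
    if original == round_tripped:
        return "identical"
    if normalize_minor(original) == normalize_minor(round_tripped):
        return "minor"
    return "significant"


def summarize(pairs):
    """Return the percentage of pages in each category."""
    counts = Counter(classify_round_trip(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

Running `summarize` over (original, round-tripped) pairs yields the kind of breakdown quoted above, with the "significant" bucket being the pages that still need fixing.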
This month, we continued to develop final features for Article Feedback, and researched how people are using this tool on the English Wikipedia. With the help of community members, we designed new features to reduce the editor workload, including improved moderation tools and a more prominent feedback link. These features will be developed next month, once we've completed code re-factoring to improve database performance. We also analyzed new research data to track how moderators use the feedback page, and measure how many readers who post feedback become editors or registered users. Next month, we will invite Wikipedians to evaluate the usefulness of feedback posts and the effectiveness of our new moderation tools. Once these tasks are done, we plan to release Article Feedback v5 to 100% of the English Wikipedia in early 2013. For more information about this tool, check our project overview.
Page Curation is now in 'maintenance mode', following its release on the English Wikipedia in September 2012. We have been tracking the impact of this tool with a metrics dashboard, which confirms that it is being used actively, with over 27,000 pages reviewed since launch. To learn more, visit our introduction page, watch this video tour or read this tutorial. If you are an experienced editor, try out the final version on the English Wikipedia.
The Agora extension moves ever-closer to completion, with help from Munaf Assaf, Trevor Parscal, Rob Moen and Vibha Bamba. Several templates on the English-language Wikipedia have been redesigned to reduce interface clutter, with some already implemented.
In November, the Editor Engagement Experiments team (E3) deployed the third and final A/B test of the new account creation page, including client-side validation. Results from basic data analysis of all three tests were published on Meta, and the project will now move to the productization stage. Extension:PostEdit was put in maintenance mode after being deployed to a further seven Wikipedias, including French and Portuguese. On the analytics side, E3 transitioned permanently to Extension:EventLogging for data collection purposes, and collaborated with the mobile team to track activity on Wikipedia's mobile beta. Last but not least, the team also deployed a small design improvement to the personal tools menu in MediaWiki core, in collaboration with the Language Engineering team.
The work of Ankur Anand (a.k.a drecodream) on Flickr integration, done during GSoC, has now been merged, and Wikimedia engineers are working towards its deployment in the near future. Specifically, several bugs related to Internet Explorer were fixed. Once all the bug fixes are deployed, the feature will be turned on for Commons (hopefully in early December). Initially it will only be available to administrators.
This month, we designed and started building key features of the Notifications project (code-named 'Echo'), towards a first experimental deployment in early 2013. Fabrice Florin wrote detailed feature requirements for our first release, and Vibha Bamba designed the first components of the user experience. Ryan Kaldari and Benny Situ developed the main features of this application, including the notifications flyout, the all-notifications archive, as well as email notifications and preferences. To test our work in progress, visit our first prototype (create an account and post on your talkpage from a separate account). New employee Luke Welling is also starting work on an HTML email module for this project. For more information, visit our project hub, or check our overview slides.
November has been a busy month for Fundraising as the team helped to kick off the annual 2012 fundraiser on November 26th, with heavy testing before then. So far the 2012 fundraiser has been a resounding success, raising over $12M in the 5 full days and limited testing days since November 15th. For current information, see the live stats. Shortly before the full launch, it was announced that the annual fundraiser would be split into an English-language fundraiser in Australia, Canada, Great Britain, the United States and New Zealand during the traditional November/December period, with other languages and all countries following in April. For more details see the announcement on wikimedia-l.
The rest of the month, development time was spent on completing the Universal Language Selector, and getting it to a state where it could be put in maintenance mode for a few months. In April 2013, phase two of the ULS will start; it will consist of adding content language selection.
The Language Engineering designers completed the design for the Translation UX project. Development commenced at the end of November and will continue for 8 fortnightly sprints, until mid-March 2013.
The first phase of the Universal Language Selector (ULS) was completed in November. The jQuery modules jQuery.ULS, jQuery.IME, jQuery Webfonts and jQuery i18n have had their first stable release. The Universal Language Selector MediaWiki extension is now being used on Wikidata. During the DevCamp in Bangalore, experiments were done with ULS on Android, a Chrome extension was created to make jQuery.IME usable in the Chrome web browser, and an extension for Firefox implementing the input methods is underway.
The first contributions by non-Wikimedia developers have been made, which indicates that the jQuery extensions are getting some attention. The Wikimedia Language Engineering team will now put the modules and MediaWiki extension in maintenance mode until April 2013.
The Mobile team (Jon Robson, Juliusz Gonera, Arthur Richards and Max Semenik) deployed several features to our beta and production mobile web infrastructure this month. To beta, we deployed experimental edit functionality, reformatted tables, random article support, simpler layout for cleanup templates, and watchlists. For production, we added log-in support.
This month, we've worked with volunteer developers at the Bangalore DevCamp to start supporting an important variant of our upcoming text messaging support. We currently have the SMS/USSD combination working and awaiting launch, and we are now working on the SMS-only version for carriers that cannot support USSD.
We have created several automated Mobile browser-based tests that are now running in our CloudBees/Sauce Labs continuous integration configuration. Both Platform engineering and Mobile QA are leveraging Watir Webdriver and Cucumber. We also continue to add to our Mobile Browser Regression Tests.
A new project, Phpzim, was started with the support of Wikimedia CH. This project will create a PHP binding for the zimlib, allowing any PHP developer to easily create and read ZIM files. This is the first building block of a bigger project to allow quick ZIM file generation in MediaWiki (and also other PHP CMSes). Work on ZIM Autobuild continues and Kiwix ZIM throughput is increasing slowly (4 files in November). A short testing stage for Kiwix 0.9rc2 will finally start in early December, followed by the release.
Wikimedia engineers deployed 1.21wmf3 and 1.21wmf4 to all Wikimedia sites, and began deploying 1.21wmf5 (with a momentary breakage). These updates included many significant improvements, including one-click (AJAX) patrolling, for both new page and diff patrol, and a Template Sandbox, which lets users preview changes to a template by previewing an example page where it's used.
We're still very much looking forward to deploying the latest version of Gerrit (see last month's update), but unfortunately remain blocked on a complicated LDAP propagation issue. Chad Horohoe is working with the Gerrit developers on finalizing the fix for this issue. Chad also attended the Gerrit Developer Summit in November, and both Chad and Rob Lanphier attended the Gerrit Users Summit (notes).
Wikivoyage was launched into public beta on November 10. The site is running on Wikimedia servers, and accounts and text content were migrated. Images from the old site have not been automatically imported, because some contain non-free content, and need to be added to each language wiki in accordance with the Exemption Doctrine Policy for that site. Public announcement and promotion of the site is delayed while the community is working on the image transfer.
We have deployed TimedMediaHandler to all wikis. Jan Gerber and Michael Dale continue to fix bugs. Jan Gerber and Aaron Schulz are working on an improved file upload mechanism in UploadWizard to make larger file uploads more practical. Thumbnails (and math/timeline files) are now written to nas1 and Swift. More improvements have been made to FileBackend to avoid extra HEAD requests for 404 errors. WebM thumbnails use temporary Swift URLs to support range requests. Feature requests and bug reports are filed against Ceph as MediaWiki takes advantage of other Swift features.
Brad Jorsch and Chad Horohoe have joined Tim Starling on the Lua scripting project. Brad has built a template sandbox which will help in debugging both Lua scripts and regular templates. Chad is working on a shared repository for scripts, and Tim has been extending the API. His latest work has been around adding multilingual APIs for handling things like plurals within Lua. We're currently seeking a volunteer product manager to help out with the roll-out of this.
Various improvements to the job queue have been made to avoid CPU time wasted on duplicate jobs and redundant page cache purges. Changes have also been made to make it possible to edit heavily used templates without timeouts. Support needed for more complex data structures (lists, sets) in memcached (with atomic updates) is awaiting more review and testing. The coding is essentially done (changeset).
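One common way to avoid spending CPU time on duplicate jobs is to key each pending job by a signature of its type and parameters and to drop a new job if an identical one is already waiting. The Python sketch below illustrates that general technique only; it is not the MediaWiki job queue implementation, and the class, method and job names are hypothetical.

```python
import hashlib
import json


def job_signature(job_type, params):
    """Stable fingerprint of a job: same type and parameters, same signature."""
    payload = json.dumps({"type": job_type, "params": params}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class DedupJobQueue:
    """FIFO queue that ignores a job while an identical one is still pending."""

    def __init__(self):
        self._pending = {}  # signature -> (job_type, params), in insertion order

    def push(self, job_type, params):
        """Queue the job; return False if an identical job is already pending."""
        sig = job_signature(job_type, params)
        if sig in self._pending:
            return False
        self._pending[sig] = (job_type, params)
        return True

    def pop(self):
        """Return the oldest pending job, or None if the queue is empty."""
        if not self._pending:
            return None
        sig = next(iter(self._pending))
        return self._pending.pop(sig)

    def __len__(self):
        return len(self._pending)
```

Once a job has been popped, an identical job can be queued again, so repeated edits to a heavily used template would collapse into one pending job at a time in this sketch, not one job per edit.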
Wikia wants to attract motivated app developers and companies using Wikia's products to use the API. They also want to make the API more standards-compliant (a RESTful interface, using HTTP verbs), but that's a high-level goal. Mobile-related work is first, but this redesign would improve the whole platform, including the enterprise. The Wikimedia Foundation and Wikia want to work together on this; the Wikimedia Foundation also wants to avoid boxing itself into special-purpose, specific apps. Wikia developer Federico Lucignano is currently working on a Request for comments on the REST proposal.
The team contributed to the community QA draft strategy and presented the Acceptance Test-Driven Development concept to Wikimedia Product/Project managers. Regression testing of software deployments is ongoing.
We deployed ArticleFeedbackv5 to the beta cluster, which is the primary host for AFTv5 testing, including browser test automation. New Page Patrol is being maintained there as well. We are still working on issues of ongoing maintenance, and this cluster played a role in catching a defect that recently escaped to production.
A continuous integration summit occurred during the Netherlands Hackathon. integration-jenkins2 is now fully operational with Jenkins / Gerrit and a Zuul installation. Antoine Musso has generated the new MediaWiki core Jenkins jobs. Zuul has been deployed in production successfully. It triggers a new set of Jenkins jobs that will eventually replace the old MediaWiki-.* ones. The new Jenkins jobs for MediaWiki core (triggered by Zuul) have been tested in production and are successful. The new workflow has been documented.
In November, the QA team created a backlog of tests to be automated, ported existing tests from RSpec to Cucumber, and is now working on browser testing architecture, creating basic new tests (see the qa/browsertests repository in Gerrit), and refactoring tests for cleanliness. Chris McMahon began discussing automated browser tests with Wikimedia tech managers to get developers writing those tests as they develop extensions deployed on Wikimedia sites; public announcement will be coming very soon, when the existing example tests are in final or near-final form. Noisy tests failing for known reasons have been removed from the suite, which is now completely green (that is, passing); the team will soon be writing and adding more tests. Browser tests in November identified a serious regression in UploadWizard running on test2 and prevented its release to production.
Andre Klapper improved, cleaned up and updated large parts of the bug management and Bugzilla documentation. This includes the beginnings of a triage guide. He also published his Greasemonkey scripts in a Git repository and went through obsolete extensions and updated their Bugzilla descriptions. Andre started analyzing how Wikimedia engineering teams use Bugzilla and their related workflows. He also investigated a potential upgrade of Bugzilla to version 4.2 by doing some basic testing. Furthermore, a wikitech-l discussion on standardizing the meaning of "highest priority" in Bugzilla resulted in creating a new "Immediate" priority status.
The first phase of the Outreach Program for Women (OPW) has been completed: more than 15 firm candidates submitted applications, which were distributed among the 8 available mentors. The Wikimedia Foundation is funding 4 full-time internship positions between January and March 2013. More positions may become available, depending on external sponsors of the program. The selected candidates will be announced on December 11. The OPW is organized by the GNOME Foundation and 11 FLOSS projects are taking part.
Management reviewed options to determine the direction this activity would follow in future months. In the meantime, Guillaume Paumier cleaned up and expanded the Wikimedia glossary with terms related to Wikimedia technology and engineering, and volunteers & engineers came to expand it further. He also followed up on the consultation process initiated in October to identify how to improve dialogue between technical communities and user communities. He's now in the process of widening this discussion to more communities. Sumana Harihareswara sent a call for volunteers to lead or advise Wikimedia engineering staff on select activities, and followed up on the offers.
Sumana Harihareswara started sharing new volunteer coordination tasks with Quim Gil, the new technical contributor coordinator who started working with the Wikimedia Foundation in November. They continued to follow up on contacts (such as those gained at October's Grace Hopper Celebration of Women in Computing), recruit new contributors to the Wikimedia tech community, and mentor newer contributors. The weekly online tech chats continued on Thursdays. Sumana and others continued to grant developer access and work on Gerrit project ownership requests.
The repository side of Wikidata has been launched on http://www.wikidata.org. It contains the results of phase 1 (language links) and has already attracted a community to maintain the wiki. Meanwhile, the Wikidata team has continued work on Phase 2 of Wikidata (Infoboxes) to add statements with values to the items in the Wikidata repository. The team improved the propagation of changes from the repository to the client and the messaging in Recent Changes. There is a constant exchange with Wikimedia Foundation engineers about the upcoming deployment cycle. Feedback and questions are welcome on the mailing list and on meta.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.