Wikimedia Engineering/Report/2012/June

Major news in June include:

Events

Recent events

Berlin hackathon (1–3 June 2012, Berlin, Germany)

Approximately 104 participants from 30 countries came to Berlin, including MediaWiki developers, Toolserver users, systems administrators, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The community also learned more about the Wikidata and RENDER projects. More updates, links to videos, and followups are on the talk page.

Upcoming events

Pre-Wikimania hackathon (10–11 July 2012, Washington, D.C., USA)

Open source teaching nonprofit OpenHatch will help organize and run this two-day event, together with Katie Filbert, Gregory Varnum, and Sumana Harihareswara. Experienced Wikimedia technologists will collaborate on their own projects, while interested new developers will be able to learn introductory MediaWiki development. Accessibility will be one of the event themes. The event is free to attend, even for those not attending Wikimania itself.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

Operations

Site infrastructure

June was another busy month of racking, stacking and provisioning newly purchased equipment for Chris and Rob. In the works are additional servers for clusters such as External Store, Memcached, Parser Cache, Object Store and Labs. Meanwhile, new servers were rolled out in EQIAD for analytics, DNS resolver, and UDP2Log. Servers and firewalls were racked and cabled for the new EQIAD payments cluster. Storage3's RAID controller failure was repaired, and a replacement machine was ordered.
IPv6 Launch day (June 6, 2012) came and went without much fanfare. Mark, Faidon, Ryan and Asher put a lot of work into the infrastructure and system stack, especially LVS, PyBal, Varnish, Squid, DNS, databases, Nagios monitoring and puppetization. We also took this opportunity to update those technologies and to run them on Precise (12.04) where possible. We have kept IPv6 traffic on ever since. To mitigate risk, only half of the LVS and PyBal servers were upgraded to run IPv6 and the enhanced features, allowing us to fall back if needed. Now that we have had one month of stability, we will soon begin the rest of the migration.
During the Berlin Hackathon, the TechOps team got together for about 2 hours to review the year's progress. A blog post on this will follow soon. In summary, the team completed 19 priority 1 projects (e.g., deploying Mobile, SSL, Labs, database upgrades and network redundancy) that were identified at the beginning of the year. We followed up with a list of high priority projects for the new fiscal year. A blog post with more details on this will also follow soon. In addition to IPv6-related work, the team did a major cleanup of jobs creating cronspam, making the logfiles more readable.
Asher performed benchmark testing on the External Store, comparing the current MyISAM engine with InnoDB. He dispelled the myth that MyISAM is faster for this use case. Based on these results, he has started migrating the External Store databases to the InnoDB engine. You can read his report here.

Data Centers

We have identified a colocation facility to serve as the new West Coast caching center; it is located at 200 Paul Street, San Francisco. Work on building up the infrastructure is planned to begin this coming August/September. With this caching center, we will be able to improve the site experience for users on the US West Coast and in the Asia-Pacific region.

Object Store/Swift

A severe bottleneck has been identified in container listings in Swift, and Ben Hartshorne is adding SSD drives to the Swift back-end storage nodes to make container listings faster. Testing has been completed to verify that this change will solve the problem, and it is being deployed to production this month. Additionally, the SwiftStack monitoring improvements were accepted into the mainline Swift codebase last month and will be deployed to our environment in July.

Testing environment

Wikimedia Labs

The Labs infrastructure had a DNS outage, caused by glue records that must be updated via a manual process. To combat that issue in the future, Labs DNS resolvers are now on service IPs with service host names. A DNS resolver was brought up in EQIAD, as well as an additional LDAP replica. Faidon's puppetmaster::self class is being put into use. It's working well enough that the test branch for puppet was merged into the production branch, and Labs now runs directly off of the production branch. The very annoying "No nova credentials for your account" bug has been fixed. virt6-12 in pmtpa have been racked, wired and installed. They will soon be put into production. Andrew Bogott's work on the nova plugin framework continued this month. The plugin framework has been moved into openstack-common, making it the plugin framework for all openstack services. Work is now ongoing to merge the changes back into nova. Per-project Debian repositories (for ubuntu-precise and above) are now available. An all-in-one MediaWiki puppet class is now available as well.

Backups and data archives

Data Dumps

Media downloads per project are now live, along with one or two "incremental" downloads per month. The new deployment system (which actually uses scripts instead of moving files around by hand) was completed and is in place. It was even used this month to push some minor changes. We're working with another organization that wants to mirror media, and we're still looking for more mirror sites for media, dumps or pageview stats; send us ideas! The archive.org uploader code was rewritten as a core S3 uploader library with archive.org extensions, and the new features we need are being added. This library will be extended for Google Storage usage as well.


Other news

  • We had our fair share of short site incidents in June. On June 7, users reported API service slowness and unavailability. Tim was around to resolve that incident (detailed report). On June 20 (and again on June 21), users reported Apache HTTP timeouts. In both cases, one of the memcached servers was experiencing high load, and restarting it resolved the issue (detailed report). An incident on June 19 did not affect our MediaWiki production clusters, though it held up our email system for half a day.
Jeff discovered a distributed spam attack on our mail system involving what appeared to be a few thousand malicious hosts. They were flooding our secondary mail server with undeliverable messages to fake addresses at various WMF domains. The secondary mail server forwarded those to the primary mail server, which became overloaded and slow in processing legitimate mail. A temporary fix was put in place to drop those fake and spam messages, but it took a day for the mail system to catch up. We subsequently put a proper fix in place.

Editing tools

VisualEditor

The team did the first deployment of VisualEditor and Parsoid, with an early version now live in a test namespace on mediawiki.org. This editor is broadly feature-compatible with the old, EditableSurface-style code that it replaces, and it is the first release that can create and edit pages. The team now plans to deploy new code every two weeks or so as development continues. The initial push will be to work on bug fixes, and to finalise the code for a few features that were close to being ready before the first deployment.

Editor engagement

Article feedback

Fabrice Florin worked with new WMF engineer Matthias Mullie and OmniTI to develop a range of new features for version 5 of the Article Feedback Tool (AFT5). This month, the team completed primary feature development for this tool, including the article feedback page, the central feedback page, and the final feedback form (scroll to bottom of page). We started writing and publishing new documentation about this project, including this help page. Dario Taraborelli, Aaron Halfaker and Oliver Keyes published a full report that suggests that people who post feedback are more likely to edit articles afterwards. Roan Kattouw continued to review our code and trained our team to start deploying code on their own. We have started a wider deployment of AFT5, which will gradually increase our coverage to 10% of the English encyclopedia by the end of July, with full deployment a couple of months later.

Page Triage

Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba and Howie Fung deployed an updated version of the New Pages Feed (formerly called Page Triage) on the English Wikipedia. This new tool provides an enhanced list of pages for review by community patrollers. The team also deployed the first version of a new curation toolbar to appear on article pages, enabling patrollers to get more article info, mark pages as reviewed, or tag them. We plan to complete development of the full curation toolbar this month (including tools to nominate articles for deletion and send WikiLove to page creators), then start integrating it with the Article Creation landing system. Check out the current prototype on the English Wikipedia, as well as the latest version on Wikimedia Labs. (Tech tip: if you are an auto-confirmed editor, click "Review" on any unreviewed article shown in red on New Pages Feed and add "?curationtoolbar=true" to the URL.) Please report any bugs on Bugzilla.
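
The tech tip above amounts to appending one query parameter to an article URL. A minimal sketch of that step, assuming only the `curationtoolbar=true` parameter named in the tip; the helper name `with_curation_toolbar` and the example URLs are hypothetical and not part of the extension:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_curation_toolbar(url: str) -> str:
    """Return `url` with curationtoolbar=true set in its query string."""
    parts = urlparse(url)
    # Keep existing parameters, dropping any earlier curationtoolbar value
    # so the parameter is not duplicated.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "curationtoolbar"
    ]
    pairs.append(("curationtoolbar", "true"))
    return urlunparse(parts._replace(query=urlencode(pairs)))


if __name__ == "__main__":
    print(with_curation_toolbar("https://en.wikipedia.org/wiki/Example"))
    print(with_curation_toolbar("https://en.wikipedia.org/w/index.php?title=Example"))
```

The helper works for both the short `/wiki/Title` form and the `index.php?title=...` form, since it rebuilds the query string rather than concatenating text.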

MediaWiki infrastructure

ResourceLoader

The demo with the latest version is deployed on WMF Labs where there's a cluster of 4 wikis connected with a shared Gadget repository.

Recently implemented:

  • Finished back-end validation of gadget definitions when saving. If a definition is invalid, users now get a descriptive error and the edit is not saved.
  • Roan implemented a view for Gadget definitions where the JSON syntax is prettified with indentation etc.
  • Timo is currently going through a review backlog in the RL2 branch, and working on front-end implementation of the new "skins" and "position" properties in the (visual) gadget definition editor.
  • Assorted other progress on the implementation of the specification, and task list of small bug fixes and improvements.

Feature support

Wikipedia Education Program

Jeroen De Dauw and Sam Reed finished the code review. The extension was deployed, but then temporarily disabled again due to a namespace/title conflict with a Star Trek: Voyager episode ("Course: Oblivion"!). This should be resolved shortly.

2012 Wikimedia fundraiser

Onboarded Adam Wight to the team. GlobalCollect recurring is now code complete (now in code review). Integrated with Yandex through GlobalCollect. Finished migration of payments deployment to Git.

Internationalization and Editor Engagement Experiments

Internationalization and localization tools

In June, the team:
  • Completed initial UI design and user experience testing for the Universal Language Selector (ULS)
  • Began developing an initial prototype for the Universal Language Selector (ULS)
  • Developed and deployed Translation Notifications
  • Added more language input methods to Narayam
  • Added more language script fonts to Web Fonts
  • Made progress on integration of Translate functionality on meta for communications and fundraising groups (with integration into CentralNotice)
  • Started work with the Arabic community to improve Arabic language support in i18n/L10n tools

Editor engagement experiments

The team redeployed the Timestamp Position Modification experiment; it is now wrapped up and in analysis. Design and analytics work on the next experiment, post-edit feedback, was completed in preparation for a July deployment. Debug hooks were added to the clicktracking extension with the goal of improving QA for experiments. We wrote a clicktracking dashboard that intercepts event logging calls and displays them on-screen, and shows which experiments are currently active and to which bucket (if any) the current user has been assigned. Work is ongoing on a rewrite of the clicktracking extension, which is taking shape at Extension:E3_Experiments.

Contributors

Mobile Contact US

Phil, Tomasz, and Arthur worked together to test out several new contact methods for mobile users. General contact emails are now routed to OTRS, and technical emails go directly to the engineering team. With some minor changes, a more permanent email address will be set up for technical problems.

Wiki Loves Monuments mobile application

The WLM app team (Yuvi, Phil, Elke, Lindsey & Jon) worked closely during the month of June to finalize requirements, draft workflows, design mockups, and begin implementation of the first version of the app. The team did its first showcase of the app and is working quickly to resolve any outstanding bugs.

Readers

Mobile design/Wikipedia navigation

The mobile nav continued its progression through the month of June. Jon, Phil and Lindsey worked together to get us closer to the new mobile experience. Language functionality got into an initial form, ready for deployment to the beta site, and the team has nearly settled the Article Action approach, along with an initial version of hiding and revealing the search bar. A relatively complete implementation of the new site UI will be pushed to the Beta site soon.

Wikimedia Apps

The app team (Yuvi) spent the month of June polishing the Wikipedia app for Android (version 1.2) and iOS (version 3.2). The Android version is available in the market now, while the iOS version is still under Apple review. The iOS version picks up the latest PhoneGap 1.7 changes and will see a dramatic speed improvement.

Wikipedia Zero

Dan and Patrick continued conducting tests with Orange in six different countries. Additional testing and refinement is under way with our partners in Bangladesh and Montenegro. We debugged and resolved serious issues with our Opera Mini integration and general infrastructure.

MobileFrontend/J2ME app

Tomasz worked with Legal and our new contractor OpenPath to close the initial work agreement. OpenPath has kicked off development and we should see their first check-in shortly. Initial screen flows can be found here. Kul & Phil worked to settle our initial device test list. Patrick will be providing any necessary technical assistance.

Wikipedia over SMS & USSD

We've been making significant progress recently to secure the necessary partnerships that will make it possible to provide Wikipedia over SMS.

Infrastructure

Mobile default for sibling projects

Phil, Patrick, Max, and Asher worked together to prep the sibling projects to become mobile default. Phil used the global messaging to notify all affected village pumps about the change. Patrick, Max, and Asher refined our squid mobile redirector to allow for new projects and enabled it for Wiktionary, Wikinews, and Wikisource. All three of these projects default to mobile now if we detect a mobile phone. We've scheduled our next set of projects on the project timeline.
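
To illustrate the redirect decision described above, here is a rough sketch in Python. It is not the production Squid redirector: the function name, the user-agent pattern and the `<lang>.m.<project>.org` host layout are assumptions made for this example. Only the three enabled projects are taken from the report.

```python
import re
from typing import Optional

# Projects the report lists as newly mobile-default.
MOBILE_DEFAULT_PROJECTS = {"wiktionary", "wikinews", "wikisource"}

# Illustrative pattern only; the real detection rules are not given in the report.
MOBILE_UA = re.compile(
    r"iphone|ipod|android.*mobile|blackberry|opera mini|windows phone",
    re.IGNORECASE,
)


def mobile_redirect_host(host: str, user_agent: str) -> Optional[str]:
    """Return the mobile host to redirect to, or None to serve the request as is."""
    labels = host.lower().split(".")
    # Expect <lang>.<project>.org; other shapes (including hosts that are
    # already on the .m. subdomain) are left alone.
    if len(labels) != 3 or labels[2] != "org":
        return None
    lang, project, tld = labels
    if project not in MOBILE_DEFAULT_PROJECTS:
        return None
    if not MOBILE_UA.search(user_agent):
        return None
    return f"{lang}.m.{project}.{tld}"


if __name__ == "__main__":
    ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X)"
    print(mobile_redirect_host("en.wiktionary.org", ua))
```

Keeping the enabled projects in a single set mirrors the staged rollout: adding the next batch of projects would only require extending that set.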

Improved Mobile Device Detection

Diederik van Liere and Patrick Reilly continued our work on integration with the Apache Device Map project.

MediaWiki Core

MediaWiki 1.20/Roadmap

We successfully completed the MediaWiki 1.20wmf4 and MediaWiki 1.20wmf5 deployments in June, and started the MediaWiki 1.20wmf6 deployment process.

Git conversion

This has been a very busy month for Gerrit. The creation of new projects continues; this month saw every extension deployed on Translatewiki moved to Git. During the week of June 25th, we experienced some downtime with Gerrit due to search engine crawlers overloading the server. Also that week, lots of improvements to IRC logging were made, although discussion continues on how to make the bots more effective. We have scheduled an upgrade of Gerrit to the 2.4 release for the week of July 2nd; this will bring the much desired "Rebase button" to Gerrit, which should lighten users' workload for trivial merges.

Multimedia

Development on TimedMediaHandler has been paused until Jan Gerber comes to San Francisco in late July for the final push. Ben Hartshorne is installing SSDs to store the object listing database, in the hope that faster storage will result in faster purge times (fixing bug 34717). We had hoped to complete this work in June, but it is stretching into July. All work on deploying Swift for storage of original images is on hold until we fix the object listing performance problems.

Lua scripting

Tim Starling led tutorial sessions in June and videos (first session, second session) are now available on Vimeo. They will be on Wikimedia Commons by mid-July. Ross Andrews is now working on documentation in the form of help/tutorial pages, especially describing the MediaWiki interface. Once that's done, Tim will promote the prototyping site on Labs more heavily, and at some point after that, we will install the Scribunto extension on mediawiki.org. Work on Lua was paused in late June to catch up on other activities. Full deployment to Wikimedia sites is scheduled for 2013.

OAuth

OAuth/status

Code review management

Diederik van Liere is gathering Gerrit stats now, and is planning to publish the first batch soon. In the meantime, here are the current statistics on open changes across all MediaWiki repositories (core and extensions):
  • 49 have received a positive tentative review (+1) but have not been approved and merged (+2)
  • 203 have received neither -2, -1, +1, nor +2 reviews (but might have textual comments)
  • 61 have received a negative tentative review (-1) with issues to be addressed by the original contributor
  • 15 have been rejected (-2) but not yet abandoned by their original authors
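
The four buckets above can be derived from the review scores on each open change. The sketch below is one way to do that. It does not query Gerrit, and the precedence rule (the most negative score wins when a change has mixed votes) is an assumption for this example, as are the function and category names.

```python
from collections import Counter
from typing import Iterable, List


def classify_change(votes: Iterable[int]) -> str:
    """Map the review scores on one open change to a status bucket.

    Assumed precedence: -2, then -1, then +2, then +1. A score of 0
    (comment only) counts as no vote.
    """
    scores = set(votes) - {0}
    if -2 in scores:
        return "rejected"
    if -1 in scores:
        return "needs_work"
    if 2 in scores:
        return "approved"
    if 1 in scores:
        return "tentatively_approved"
    return "unreviewed"


def summarize(open_changes: List[List[int]]) -> Counter:
    """Count open changes per status bucket."""
    return Counter(classify_change(votes) for votes in open_changes)


if __name__ == "__main__":
    # Each inner list holds the scores left on one hypothetical open change.
    sample = [[1], [], [-1, 1], [-2], [0], [1, 1]]
    for status, count in sorted(summarize(sample).items()):
        print(status, count)
```

A different precedence rule (for example, letting the most recent vote win) would shift changes between buckets, so published figures should state which rule was used.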

Security auditing and response

Site performance and architecture

An initial investigation has begun on the possibility of upgrading from PHP 5.3 to PHP 5.4. Benchmarks are very promising, but a security enhancement we are currently using with PHP 5.3 (Suhosin) is not yet available for PHP 5.4, so the team is debating whether to carry on without it, as well as estimating the performance penalty introduced by this patch. More improvements have been made to Ganglia and Graphite.

Quality assurance

QA and testing

This month saw a big focus on hiring the QA Engineer and Volunteer QA Coordinator. We also continued to be focused on testing Article Feedback (including via an event on IRC with OpenHatch and new testing volunteers), and are working to get beta labs fit for use as a test environment for AFT and Editor Engagement (E2).

Beta cluster

The primary focus of Beta cluster work in June was in service to TimedMediaHandler (TMH). TMH has been set up, though transcoding is not operational yet, since that would require a fully functional job queue. The team discovered that the version of Ubuntu currently used in production (Lucid) won't work with TimedMediaHandler. As a result, Antoine and Faidon updated the Puppet configurations for the Apache web servers to run on the next-generation Ubuntu release (Precise). Administrative tools have been set up to closely follow the way things are done in production. For example, the Beta Cluster now uses the exact same workflow to update the l10n cache as we do in production. The team plans to further improve this by fetching l10n updates from translatewiki.

Continuous integration

Timo Tijhof is working on setting up the new TestSwarm in Wikimedia Labs. We will use the TestSwarm and BrowserStack API through the testswarm-browserstack bridge to automatically populate the swarm with needed browsers. Antoine Musso upgraded Jenkins to the latest version, 1.472.

Wikimedia analytics

Analytics/Reportcard

Erik Zachte and Fabian Kaelin prepared the June Reportcard for the monthly Metrics Meeting. David Schoonover worked on adding d3.js support to Limn and is preparing to publish the project's source code on github.com.

Analytics/Pageview logging

A change to add two new headers to the logged fields has been submitted. We are waiting on the go-ahead from consumers to merge and deploy it.

Analytics/Kraken

Engineering Community

Bug management

The Wikimedia Foundation is seeking a Bug Wrangler to work on management of bugs.

Summer of Code 2012/management

The Google Summer of Code students continue their twelve weeks of design and coding.

Wikimedia Foundation engineering project documentation

At the Berlin Hackathon, Guillaume Paumier, Rob Lanphier and Timo Tijhof discussed how to summon the Status Helper tool from custom edit links. Guillaume modified the templates to provide hidden metadata, and Rob implemented the functionality in the JavaScript. Timo also converted the user script into a full-fledged opt-in gadget. Rob Moen created a JavaScript tool to easily assess and tag pages, as part of an effort to clean up the wikitech wiki, before merging it with labsconsole.

Volunteer coordination and outreach

Sumana Harihareswara continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor new contributors. She granted developer access and Gerrit project ownership requests, and planned upcoming events. The Foundation is also hiring a coordinator for volunteer testers and an engineering outreach coordinator to work on volunteer coordination and outreach.

Wikimedia engineering 20% policy

Sumana Harihareswara is coordinating WMF engineers' efforts to spend 20% of their work time on code review and other efforts benefiting the entire Wikimedia engineering community. Their highest priority is the Gerrit merge queue, especially for backlogged components such as UploadWizard and ProofreadPage, and secondarily patches awaiting review in Bugzilla for MediaWiki or WMF-deployed extensions. Some participants are instead concentrating on bug triage, documentation, and the extensions awaiting review for deployment.

Wikidata

The Wikidata project is funded and executed by Wikimedia Deutschland.

The team published an easier-to-understand version of their data model, updated their storyboards for how to link between Wikipedias in the future, and submitted a proposal to the Knight News Challenge to make Wikidata a central, persistent repository for identifiers on the web in a second year of development. Also, proposed logos went up for public voting.

Future

The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.