Engineering metrics in July:
Major news in July includes:
- engineering presence at Wikimania 2012 in Washington, D.C., and the pre-Wikimania hackathon;
- the launch of Limn, an open source dataviz toolkit developed by the Wikimedia analytics team;
- the deployment of Article Feedback Version 5 (which supports free-text feedback and its moderation) to 10% of English Wikipedia articles.
Wikimania and Pre-Wikimania hackathon (10–15 July 2012, Washington, D.C., USA)
- This year's pre-Wikimania Hackathon was special in that it had a full track for newcomers, going beyond tutorials. The Hackathon was a collaboration with OpenHatch, an open-source teaching non-profit. The new efforts included appropriate first-time tasks to orient newcomers toward more advanced Wikipedia editing and tech contribution, a laptop setup guide that steps attendees through the process of configuring development environments, and constant in-person assistance to help people past problems they encountered. At the event, we saw many people learning more about templates, editing Wikipedia, and using and modifying bots to improve the encyclopedia and the media on it. At least 65 people signed in, and more likely attended without signing in. During the main Wikimania conference, a number of volunteers and staff members gave talks and led discussions about technology-related topics.
Wikipedia Engineering Meetup (15 August 2012, San Francisco, USA)
- The Engineering department of the Wikimedia Foundation has initiated a Wikipedia Engineering Meetup to showcase the interesting problems and products they work on to the local developer community. Tentatively, the meetup will happen every two months at the Wikimedia offices in San Francisco, and will consist of three 15-minute engineering presentations, followed by a question & answer period bracketed by mingling. The inaugural meetup will feature talks about Mobile engineering, Analytics and the VisualEditor.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- July was a relatively quiet month for Operations, and the team was working mostly behind the scenes. Mark Bergsma has successfully integrated and tested the upgraded Varnish software (with the persistent cache patch) on some of our mobile caching servers. They are working very well and the plan is to roll it out widely in the coming weeks.
- Mark also made several feature upgrades to LVS/Pybal, including IPv6 BGP support, and a DNS recursor implemented as an LVS cluster and running on Ubuntu 12.04. The package has been puppetized and deployed. It also means eqiad servers no longer hop to Tampa to get DNS answers.
- Peter Youngmeister has been packaging, puppetizing and testing the new application server build. This build runs on Precise (Ubuntu 12.04) and works with the Swift object store (rather than the current NFS filer). There will be further performance and scalability tests across a bigger portion of the Tampa application servers shortly.
- Asher has completed and tested his latest (Precise) MySQL build on one of the database slaves. This will serve as the package for future MySQL upgrades and new deployments.
- Migration to Swift is progressing to the final stages now that the performance bottleneck identified in June has been resolved. MediaWiki is now operating fully using Swift as the primary object store for thumbnails (the NFS filer is relegated to a secondary fail-over backup). The 'originals' (uploaded images and multimedia content) have been copied over to Swift as well, setting the stage to migrate away from the NFS filer next month. Also coming next month: upgrading to Swift version 1.5.0 and bringing a second Swift cluster online in eqiad.
- This month was focused on adding new hardware, upgrading the OpenStack infrastructure, and other stability efforts. virt6–8 have been added to the cluster and about 20 instances have been migrated to these nodes so far. Another 40 instances have been created on virt6–8 since the addition. Initial instance migration efforts ended in 30 instances being corrupted due to a KVM block migration bug. A cold migration process, with an automated script, was created as a workaround. Development effort is ongoing to upgrade OpenStackManager to support the Essex release of OpenStack. Keystone support is complete and OpenStack API support is currently being added. Development work on OpenStack continues as well. Andrew Bogott's openstack-common plugin framework has been merged. novaclient work is progressing and should be merged after some cleanup efforts. Some changes needed for OpenStack Keystone's LDAP backend to work for the Essex (stable) release were pushed in through a collaboration between Ryan Lane and OpenStack developer Adam Young. A blueprint has been submitted for using Keystone to manage LDAP entries via templates, so that we can move to Keystone as an LDAP manager in the future. Work has begun on using OpenStack Nova for managing DNS entries. GlusterFS project storage has been upgraded to version 3.3. A tutorial on Using Puppet with Labs was hosted at the pre-Wikimania Hackathon by Leslie Carr and Ryan Lane, and a presentation on Labs and the State of Our Open Source Infrastructure was given at Wikimania.
- The YAS3 library for uploading to archive.org and to other S3-compatible sites, along with several command line clients, is now usable (though still under heavy development). This library handles the HTTP 100 Continue status correctly; this means that for large file uploads, the upload is only attempted once the client has been redirected to the right host, a great time saver. The library also supports uploads of large files in multiple chunks automatically, rather than requiring the user to split the file into separate pieces. That's a necessity for us since many of our dump files are quite large.
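The automatic chunking described in the last item can be illustrated with a minimal Python sketch. This is not the YAS3 code or its API: the function name `iter_chunks` and the chunk size are hypothetical, and the sketch only shows how a large file can be read piece by piece so that each piece could be sent as a separate upload part without the user splitting the file by hand.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs, reading at most chunk_size bytes at a time.

    Hypothetical helper for illustration only; a real uploader would send
    each part to the remote host instead of just yielding it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    part_number = 1
    while True:
        data = stream.read(chunk_size)
        if not data:
            # End of stream: no more parts to produce.
            break
        yield part_number, data
        part_number += 1


if __name__ == "__main__":
    # A 25-byte in-memory stand-in for a large dump file.
    fake_dump = io.BytesIO(b"x" * 25)
    for number, data in iter_chunks(fake_dump, 10):
        print(number, len(data))
```

Because the generator reads lazily, memory use stays bounded by the chunk size regardless of how large the file is.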
The VisualEditor (VE) team presented their work at Wikimania and received a good deal of feedback from the community. The team created a rough plan for the next three months' work. The early version of VE on mediawiki.org was updated twice, fixing a number of bugs and notably adding support for nested lists. Gabriel Wicke relocated to San Francisco, and Timo Tijhof visited the SF office for three weeks after Wikimania.
and Matthias Mullie led the deployment of Article Feedback on 10% of the English Wikipedia, in collaboration with Pau Giner, Ryan Kaldari, Roan Kattouw, Oliver Keyes, Chris McMahon, Benny Situ, Heather Walls, Howie Fung and Terry Chay. This month, the team developed final features for this tool, including the article feedback page, the central feedback page, and the final feedback form (scroll to bottom of page). To guide users of this tool, we also published a new video tour, a walkthrough tutorial and various help pages. We have received a very positive response to article feedback from the Wikipedia community through a variety of channels, from talk pages to IRC chats and Wikimania presentations. Community members typically find the tool useful and well thought out, and many editors have told us they have already made improvements to articles based on feedback from readers, which is exactly the behavior we were hoping to encourage. We have started our productization phase (more platforms, scalability, code refactoring, localization, metrics, mobile). We are now aiming for a full release to 100% of English Wikipedia by October 2012, with other wiki projects starting later this year.
, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba, Terry Chay and Howie Fung deployed an updated version of the new Page Curation product (formerly called Page Triage) on the English Wikipedia. This new product includes two main features: 1) the New Pages Feed, a dynamic list of new pages for review by community patrollers; and 2) the Curation Toolbar, an optional panel on article pages, which enables editors to get page info, mark a page as reviewed, tag it, mark it for deletion, send WikiLove to page creators, or jump to the next page on the list. This month, we completed development of all key curation tools and are now adding a couple of final features (such as the ability to send a personal note to page creators, to give them helpful tips and let them know their page has been reviewed, tagged or nominated for deletion). We now plan to pre-release Page Curation on the English Wikipedia in mid-August, with a full release in September 2012. Check out the current beta version on the English Wikipedia, as well as the latest version on Wikimedia Labs. (Tech tip: if you are an auto-confirmed editor, click "Review" on any unreviewed article shown in red on the New Pages Feed; until the product is pre-released, please add "?curationtoolbar=true" to the URL in order to see the Curation Toolbar.) Please report any bugs on Bugzilla.
continued to build this feature and plans to deploy a test Echo prototype on mediawiki.org in early August. It will not include the infrastructure parts that depend on JobQueue services. Right now it only supports notifications for talk pages and LiquidThreads.
The Labs back-end is complete on Wikimedia Labs. Timo Tijhof is currently debugging and testing
Wikipedia Education Program
The extension is still disabled, pending resolution of namespace issues. The Education Program team has been presenting at various conferences around the world.
2012 Wikimedia fundraiser
We onboarded Matt Walker to the team. Progress was made on enhancements to CiviCRM that enable Finance and other departments to get relevant metrics and reports more easily. Katie Horn traveled to Wikimania and gave a presentation about the fundraising infrastructure.
Internationalization and Editor Engagement Experiments
Internationalization and localization tools
Editor engagement experiments
The Timestamp Position Modification experiment was completed, and initial analysis shows that adding the timestamp on articles increases clicks on the History tab. Development started on delivering post-edit feedback messages; this experiment includes a proof-of-concept dry run, deployed in advance of the full experiment, of a new editor bucketing strategy for delivering experimental treatments. The team configured a test environment on Labs, to be used for validating UI and functional requirements. The Wikimania conference was an opportunity to interact with editors from the English Wikipedia, and to define a new experiment related to cleanup templates.
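The report does not describe how the new editor bucketing strategy works. As a rough illustration of one common approach (an assumption, not the team's actual method), editors can be assigned to experimental treatments deterministically by hashing their user ID together with an experiment name. The function name `assign_bucket`, the experiment label and the bucket names below are all hypothetical.

```python
import hashlib
from typing import Sequence


def assign_bucket(user_id: int, experiment: str, buckets: Sequence[str]) -> str:
    """Map a user to one bucket, always returning the same answer for the same input.

    Hypothetical sketch: hashing the experiment name with the user ID keeps
    assignments stable across requests and independent between experiments.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


if __name__ == "__main__":
    buckets = ["control", "treatment"]
    for user_id in (1, 2, 3):
        print(user_id, assign_bucket(user_id, "post-edit-feedback", buckets))
```

A scheme of this kind needs no stored assignment table, which makes a dry run cheap: the same function can be deployed ahead of the experiment to check that bucket sizes come out roughly even.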
Wiki Loves Monuments mobile application
Significant progress was made at the Wikimania Hackathon in Washington, D.C., and elsewhere. The development team of Jon, Brion, Yuvi, Max and Arthur improved the robustness of photo uploads, sped up monument discovery, and polished the app extensively. We held a showcase to demo our app to the WLM community and have released a first beta for Android.
Dan and Patrick continued conducting tests with our partners in Bangladesh and Montenegro. We debugged and resolved serious issues with our Opera Mini integration and general infrastructure.
Development continued with our partner OpenPath. We completed basic API integration, image light boxes, search, main page. Her name from you integration, and numerous others. OpenPath delivered two alpha builds to use for testing and was eager to move forward into the beta cycle.
Wikipedia over SMS & USSD
Patrick, Jeremy, and Dan worked to make the vumi architecture stack production ready. They worked with the operations team to puppetize the setup in prep for moving it to real hardware. Next month we'll be prepping for a demo with one of our potential partners.
Mobile default for sister projects
Patrick and Arthur added three new projects and one new domain to be mobile default. Wikiquote, Wikibooks, Wikiversity, and *.wikimedia.org are now equipped to better serve our mobile users. The development team is eager to hear back from our community about what they would like to see from their projects on mobile. Our last project to migrate will be Commons, after which we'll close this project.
- We finally released Kiwix 0.9 rc1 (see the CHANGELOG). All the binary files were compiled using our new continuous integration build platform. In collaboration with Wikimedia France (for the Afripedia project), we released a first version of kiwix-plug, a standalone WiFi hotspot using cheap plug computers. The Black&White project, contracted by Wikimedia CH, was completed; a recent achievement was the introduction of Kiwix in the official Debian package repository. Also in collaboration with Wikimedia CH, we started a new project called ZIM autobuild aiming to quickly and automatically generate ZIM files of our projects.
We're in the process of evaluating Gerrit and its alternatives as a code review tool. Chad Horohoe and Ryan Lane upgraded our Gerrit instance to 2.4, which provided many incremental fixes and small features (e.g. the "rebase" button). Ryan and Asher Feldman migrated the Gerrit database to our Ashburn datacenter, which resulted in a big performance boost.
Jan Gerber is in San Francisco, working with Michael Dale on fixing up the transcoding and ensuring interoperability with SwiftMedia. On July 31, Aaron Schulz, Jan, and Michael deployed the TimedMediaHandler extension to test2. Aaron Schulz is in the process of migrating image originals into Swift. Commons is completed (barring a few minor problems to investigate in the logs), and the rest of the wikis are in various stages of progress. Meanwhile, there's also a minor architectural change (MultiWrite backend) that will be deployed soon, which is a necessary prerequisite to serving/storing originals in Swift.
Tim Starling has added a debug console to test code snippets. We believe we're ready to deploy Lua to the Wikimedia cluster, starting with test2 in August, followed by mediawiki.org. We plan to let Lua incubate on mediawiki.org while we test the performance characteristics with key templates, and work out a deployment plan for larger wikis that includes community involvement.
Site performance and architecture
Tim Starling investigated an LLVM PHP bytecode converter this month, which looked like a promising direction for performance optimization (slides are available). The theoretical gain seems pretty significant, but the actual performance he was able to observe was disappointing and we probably won't go in that direction. Asher Feldman has deployed an upgraded version of the parser cache server (db40) and the results have been impressive. Comparing 90th and 99th percentile cache response times for the old parser cache server, averaged over several days (July 3-5), with those from the last 8 hours for the new, improved parser cache server shows the 90th percentile response time dropping from 53.6ms to 7.17ms, and the 99th percentile response time dropping from 185.3ms to 17.1ms. This is relevant to every page request from logged-in users and from logged-out users with cookies, so it should have a meaningful impact on the user experience. Aaron Schulz and Andrew Garrett have been working on job queue improvements, Tim Starling on Apache configuration cleanup, and Antoine Musso on normalizing Labs and production configurations.
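To put the parser cache figures in perspective, the short Python sketch below computes the speedup implied by the reported numbers, alongside a simple nearest-rank percentile helper. The helper is only one common way to compute a percentile; the report does not say which method was used for the measurements, and the function names are illustrative.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method (an assumed method)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank covering at least pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the new response time is."""
    return before_ms / after_ms


if __name__ == "__main__":
    # Response times reported above, in milliseconds.
    print(f"p90 speedup: {speedup(53.6, 7.17):.1f}x")
    print(f"p99 speedup: {speedup(185.3, 17.1):.1f}x")
```

With the reported values, the 90th percentile response is roughly 7.5 times faster and the 99th percentile roughly 10.8 times faster.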
Aaron Schulz wrote a new class (ExternalRDBStore) used for sharding tables in MediaWiki, and is now in bugfixing mode. He also wrote a patch to shard some of the tables associated with FlaggedRevs as a first use of this class. Asher Feldman is currently investigating hardware requirements for utilizing sharding.
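For readers unfamiliar with sharding, the general idea is to spread the rows of one logical table across several physical tables or servers, choosing the destination from a key. The Python sketch below shows a generic hash-based variant; it is not the ExternalRDBStore implementation (which is written for MediaWiki and whose design is not described here), and the function names, table name and shard count are hypothetical.

```python
import zlib


def shard_for_key(key: str, shard_count: int) -> int:
    """Pick a shard index for a key using a stable CRC32 hash (illustrative scheme)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return zlib.crc32(key.encode("utf-8")) % shard_count


def shard_table_name(base_table: str, key: str, shard_count: int) -> str:
    """Build a hypothetical per-shard table name such as 'example_table_002'."""
    return f"{base_table}_{shard_for_key(key, shard_count):03d}"


if __name__ == "__main__":
    # 'example_table' and the page keys are made-up values for illustration.
    for page_key in ("page:1", "page:2", "page:3"):
        print(page_key, shard_table_name("example_table", page_key, 4))
```

The same key always maps to the same shard, so reads and writes for a given row stay on one table; the trade-off is that changing the shard count later requires moving data.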
Code review management
These are the numbers as of 2012-08-01
- +1 but not merged: 41
- 0 but not merged: 210
- -1 but not merged: 87
- -2 and not merged: 15
Security auditing and response
Daniel Kinzler has finished up what we hope is the final round of changes to the Wikidata branch, which Tim Starling needs to review prior to merging into master. There is currently a discussion about the best way of storing a local copy of the Wikidata-based language links (see the thread about Wikidata blockers).
Admin tools development
has been generally tasked with working on features related to spam blocking and other disruptive behavior blocking, with Tim Starling's guidance. Chris Steipp, Andrew Garrett, Tim Starling, James Forrester, and Rob Lanphier met to discuss the scope of this work, charting a broad list of feature requests, and attempted an initial prioritization of the most important items. Jack Phoenix has volunteered to act as a Product Manager to help manage this work.
QA and testing
Hiring a QA Engineer remains a high priority. Article Feedback Version 5 is now on 5% of Wikipedia, with a plan in place to increase that percentage over the summer. AFTv5 is being praised highly by the Wikipedia community, although a small number of power users are experiencing a particular problem. We isolated that problem and have a potential fix in place for deployment July 24. Work on the labs beta cluster continues, with AFT and TimedMediaHandler first priorities and the Editor Engagement project to follow. No community test events are planned right now, although the groundwork is in place for community test events related to Bugzilla tickets for Extensions and to the Visual Editor.
The beta cluster infrastructure is now mostly in our configuration change engine (puppet) and is starting to be used by third parties. The Features team and Jan Gerber are now taking advantage of the beta cluster to stage changes for production. We have set up Captcha and IP blocking to reduce the amount of spam being generated on the beta wikis. An overview document has been started to help introduce new people to the beta cluster.
Antoine Musso automated the process of updating extension code from Git/Gerrit using Ant, for purposes of automating unit tests on extensions. The first experiment was with the Wikidata project which revealed issues with other parts of the build scripts, so this is still a work in progress. Antoine will be out for much of August, and his primary focus has been on Beta Labs, so work in this area will resume in September.
The Wikimedia-specific portions of the Report Card were split out so that Limn can be used by third parties.
Modified lucene lsearchd code to use log4j appender for udp2log rather than manually editing codebase. Also built scribe and scribe log4j appenders for sending arbitrary logs to scribe. No movement on log format changes.
The Wikimedia Foundation is seeking a Bug Wrangler to work on management of bugs.
Summer of Code 2012/management
Wikimedia Foundation engineering project documentation
- The Wikidata project is funded and executed by Wikimedia Deutschland.
The Wikidata team has made good progress towards their first roll-out. The initial deployment plans are being made and the Hungarian Wikipedia community stepped up to be the first to use the interwiki part of Wikidata in a few weeks. This also means the demo system needs to be tested more. If you have five spare minutes, have a look at the demo system and report any bugs you might find there so they can be fixed before the initial deployment.
The team also started to collect future use cases of Wikidata that should be kept in mind during development. You are invited to refine them or add your own. Additionally, the team is looking for feedback on the third iteration of the storyboard for linking Wikipedia articles in the future.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.