Engineering metrics in August:
Major news in August includes:
Wikipedia Engineering Meetup (15 August 2012, San Francisco, USA)
- Approximately 100 people attended the first Wikipedia Engineering Meetup in San Francisco, in a series meant to showcase Wikimedia's interesting engineering problems and products to the local developer community. Tentatively, the meetup will happen every two months at the Wikimedia offices in San Francisco, and will consist of three 15-minute engineering presentations, followed by a question & answer period bracketed by mingling. The inaugural meetup featured talks about Mobile engineering, Analytics and the VisualEditor.
Wikimedia's internationalization and mobile teams are tentatively planning a volunteer outreach event in Bangalore, India, November 9–11. More information will come in September.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Continuing from his earlier MySQL work, Asher Feldman built additional MySQL servers for each of the clusters in Ashburn, all in preparation for the primary data center migration in the coming quarter. In the Tampa datacenter, he added a new server to the English Wikipedia (en.wp) cluster and replaced the en.wp master with newer hardware. A database tree chart provides the latest information on our database clusters.
- Thanks to Varnish Software support, we have a new build of Varnish that comes with persistent cache and the video streaming bug fix. Mark Bergsma tested the build on one of the mobile Varnish servers, and so far it has been stable. In the coming days, Mark will update the 'upload' Varnish cluster at Ashburn (Eqiad) and move traffic through it.
- Mark has also successfully updated and deployed the NetApp storage servers and enabled replication from Tampa to Ashburn. He started migrating some of the systems that mount nfs1 to this new server. With this, Mark has resolved another critical path item on the migration to the new primary data center. In addition, Jeff Green started using nas1-a to archive the Fundraising banner logs.
- The usual traffic surge due to the new school year caused an increase in packet loss on our Tampa internal network. With Chris Johnson's help, Mark upgraded the links between the racks. Earlier this month, Leslie Carr and Chris installed a new passive optics (CWDM) system between the two floors of the Tampa datacenter hosting our servers, effectively giving us a 4x capacity increase.
- Jeff continues to make progress on the Fundraising infrastructure buildup at Ashburn (Eqiad). With Leslie's help, the new firewall was set up. Jeff deployed a build host, a logging host and the application cluster, and built the pxeboot, preseed and puppet configurations. He has also enabled nagios-nsca monitoring for those new hosts.
- 'Originals' were successfully copied over to the Swift cluster from ms7 (an NFS filer for images). In addition to serving thumbnails (which was completed last month), Swift is now also the primary object store for images and other multimedia content. In the current setup, MediaWiki reads from Swift only, but writes to both the Swift cluster and the legacy NFS servers (ms5 and ms7). In the coming months, we will disable ms5 and ms7 and run solely on Swift.
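The read-from-one, write-to-both arrangement described above can be illustrated with a minimal Python sketch. It is not MediaWiki's file backend code: `DictStore`, `DualWriteStore` and `retire_legacy` are hypothetical names, and in-memory dictionaries stand in for Swift and the NFS filers.

```python
class DictStore:
    """Hypothetical in-memory stand-in for an object store or an NFS filer."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class DualWriteStore:
    """Reads from the primary store only; writes to the primary and every legacy store."""

    def __init__(self, primary, legacy_stores):
        self.primary = primary
        self.legacy_stores = list(legacy_stores)

    def put(self, key, data):
        # Write to the primary first, then mirror the write to the legacy
        # stores so they stay usable as a fallback during the transition.
        self.primary.put(key, data)
        for store in self.legacy_stores:
            store.put(key, data)

    def get(self, key):
        # Reads never touch the legacy stores.
        return self.primary.get(key)

    def retire_legacy(self):
        # Once the legacy stores are disabled, writes go to the primary only.
        self.legacy_stores = []
```

Keeping the legacy writes behind a single wrapper means that turning them off later is a one-line change and does not affect readers.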
- This month was mostly spent on upgrading all of the Labs infrastructure. OpenStack nova and glance were upgraded to the essex release. The keystone service was added and now handles all authentication for Labs-related OpenStack services. OpenStackManager was upgraded to support keystone, to use the OpenStack API rather than the EC2 API, and to have multi-region support, in anticipation of the new region we'll be bringing up in Eqiad. Testing of ceph as a replacement for gluster for project storage continued during this month; more testing is required. A lot of puppet work has been done to start moving our spaghetti code-style repository into modules.
- We've been focusing on the media infrastructure, working on the migration to Swift, and also taking a hard look at scaled media usage and storage. Since scaled media (thumbnails) could be regenerated at will from the original, we are going to evaluate treating thumb storage as a medium-to-long term cache rather than permanent disk storage as we have been doing. Running the numbers on existing thumbs turned up some interesting results. We're still bringing mirrors online; we've gotten all the hardware and network issues worked out with WANSecurity and have started copying over the data. They'll have files most mirrors don't host: page view files, archives, and more, as well as a full copy of our media files.
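The idea of treating thumbnail storage as a cache can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ThumbCache` and its `render` callback are hypothetical names, and least-recently-used eviction is an assumed policy, since the report does not say how entries would be expired.

```python
from collections import OrderedDict


class ThumbCache:
    """Bounded cache of scaled images that re-renders from the original on a miss."""

    def __init__(self, render, max_entries):
        self.render = render            # callable(name, width) -> thumbnail
        self.max_entries = max_entries
        self._entries = OrderedDict()   # (name, width) -> thumbnail, oldest first

    def get(self, name, width):
        key = (name, width)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: regenerate the thumbnail from the original.
        thumb = self.render(name, width)
        self._entries[key] = thumb
        if len(self._entries) > self.max_entries:
            # Evict the least recently used thumbnail; it can be rebuilt later.
            self._entries.popitem(last=False)
        return thumb
```

Because an evicted thumbnail can always be rebuilt from its original, eviction costs only render time, not data.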
In August, the team focused on overhauling the code design of VisualEditor so that it is more modular and easier to extend. This involves creating and documenting a number of formal APIs at each point in the architecture, which means a developer does not have to understand the entire code base to be able to add new features. The early version of the VisualEditor on mediawiki.org was updated twice (wmf9), fixing a number of bugs, as well as adding a much-improved link inspector to help users build links, and a save dialog that better guides users on what to do.
The Parsoid team reached a major milestone in August by implementing a template output encapsulation algorithm, and started to use it to support expanded template round-tripping. In parallel with this and the usual smaller tweaks, work on a C++ port of the parser was started. The port is expected to allow an efficient integration with PHP and Lua, improve performance and allow the parallelization of the parser in the longer term.
This month, we developed a range of new features for Article Feedback, which is now deployed on 10% of the English Wikipedia. Improvements include the ability to view feedback from my watched pages, hide my posts, and give feedback on help pages, as well as enabling editors to clear all flags and administrators to protect articles to limit feedback on controversial pages. These and other features can be tested on this sample article feedback page or on the central feedback page (please report any bugs on Bugzilla). For more information about this tool, check our project overview. We are now in our productization phase (support for more platforms, scalability, code re-factoring, localization, metrics, mobile) and are aiming for a full release to 100% of English Wikipedia by the end of October 2012, with other wiki projects starting later this year. This new tool was created in collaboration with the Wikipedia community and developed by Fabrice Florin, Matthias Mullie, Pau Giner, Ryan Kaldari, Roan Kattouw, Oliver Keyes, Chris McMahon, Benny Situ, Dario Taraborelli, Howie Fung and Terry Chay, in association with OmniTI.
This month, we deployed a 'pre-release' version of Page Curation on the English Wikipedia. This new product includes two main features: 1) the New Pages Feed, a dynamic list of new pages for review by community patrollers; and 2) the Curation Toolbar, an optional panel on article pages, which enables editors to quickly review these pages. The Curation Toolbar provides a variety of tools that let users get page info, mark a page as reviewed, tag it, mark it for deletion, send WikiLove to page creators, or jump to the next page on the list. This month, we completed development on final features, such as the ability to send a personal note to page creators, as well as special logs, links and templates, as outlined in this help page and these project slides. We are now preparing for a full release of Page Curation on the English Wikipedia at the end of September 2012. Check out the current beta version on the English Wikipedia, as well as the latest version on Wikimedia Labs (confirmed editors can click "Review" to curate any article on the New Pages Feed). Please report any bugs on Bugzilla. Formerly called 'Page Triage', this new tool was designed in close collaboration with the Wikipedia community and developed by Ryan Kaldari, Benny Situ, Fabrice Florin, Oliver Keyes, Brandon Harris, Vibha Bamba, Terry Chay and Howie Fung.
Micro Design Improvements
This month saw the creation of the Micro Design Improvements team, an ad-hoc group of staffers who look at small but useful design improvements to make to MediaWiki. Vibha Bamba, Oliver Keyes and Munaf Assaf (with assistance from Howie Fung) worked on the design for our first feature, which simplifies the "edit" window. The team is very grateful to Terry Chay for securing technical assistance in the form of Rob Moen, who has agreed to donate his 20% time to working on this project. In the coming month, we plan to talk to the community about this feature, deploy it, and work on more of the items on our to-do list; if you have any thoughts about our current work or ideas for future projects, please leave us a note on the project talkpage.
Editor engagement experiments
We deployed and ran the first iteration of post-edit feedback, testing whether various types of positive feedback after submission of an edit increase the productivity and retention of Wikipedia editors. (The results will be publicized soon.) We are currently working on the next iteration of post-edit feedback and on a new experiment which centers around the account creation process. We've also deployed click-tracking to the English Wikipedia community portal, account creation page, and the article edit form, and devised a tool for generating reports from the raw log data. Working with Asher Feldman, we've also architected an alternative data pipeline for event tracking, and begun its deployment.
After the sprint in July, there was no notable progress as the team were busy with other urgent projects. There was the start of a community discussion about where global gadgets will be hosted for access across the Wikimedia cluster, and about their licensing (as they have generally been caught by the content license, which is less suitable for code).
Echo was deployed to MediaWiki.org, but it was temporarily turned off because of a bug that has since been fixed. Vibha Bamba is working on some of the UI backlog.
Flow Portal/Project information
Work on Flow will officially start in January. In the meantime, preparatory work will focus on database sharding.
2012 Wikimedia fundraiser
The fundraising team completed 3 very successful sprints, completing more work in each sprint than some of the previous sprints combined (Sprint 7: Auditing and Reconciliation; Sprint 8: Amazon, and a bunch of other random stuff; and Sprint 9: Adyen, Amazon wrap-up, and Listeners). During the sprints, the team integrated with Amazon Payments, added features to CiviCRM to enable the settlement of donations in multiple currencies, added features (including the beginning of an API) and made bugfixes to CentralNotice, discovered and dealt with an issue in the global credit card processing system, and began integration on a new payment processor that will give the fundraising team access to additional payment methods around the world.
Internationalization and localization tools
The team continued to work on the Universal Language Selector (ULS): the display settings dialog was completed and is now able to show and set WebFonts, similarly to the WebFonts extension, which will be phased out once the ULS is deployed. The lists of languages were tweaked to emphasize those likely to be chosen by the user, based on their location and past selections. Translation memory was deployed on all Wikimedia sites using the Translate extension, and CLDR (Common Locale Data Repository) plurals support was merged into the core master. User experience testing of the Translate extension is in progress. Initial analysis for i18n metrics was also completed and published. The team conducted its monthly office hours, a bug triage and development showcase.
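To show what CLDR plural support provides, here is a small Python sketch of plural category selection. It is not the MediaWiki implementation: the function names are hypothetical, and the rules are simplified to non-negative integers for English and Russian only (the full CLDR rules also cover fractions and many more languages).

```python
def plural_category_en(n: int) -> str:
    """CLDR-style category for English, restricted to non-negative integers."""
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    """CLDR-style category for Russian, restricted to non-negative integers."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"    # 2-4, 22-24, ... but not 12-14
    return "many"       # 0, 5-20, 25-30, ...


# Russian forms of "page", keyed by plural category.
PAGE_FORMS_RU = {"one": "страница", "few": "страницы", "many": "страниц"}


def format_pages_ru(n: int) -> str:
    """Pick the grammatically correct form of "page" for the count n."""
    return f"{n} {PAGE_FORMS_RU[plural_category_ru(n)]}"
```

A message system can then store one form per category and let the rule choose between them, instead of hard-coding a singular/plural pair that only fits some languages.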
Development on Project Milkshake continued at a lower priority due to the focus on the Universal Language Selector this month. We are getting some basic blocks together in our GitHub repositories.
Wiki Loves Monuments mobile application
The mobile team released three new betas for the WLM app and published the last one on Google Play. We finalized many new features, such as saving for later and showing the current location, and cleaned up data issues. The contest started on September 1st.
Partner data configuration is now more flexible, and several additional partners are now in testing mode. A list of launches will follow.
Open Path delivered numerous new builds of the Wikipedia J2ME app this month. Patrick Reilly and the Global Development team did internal testing to validate that they were performing as expected. We're now feature-complete and spending cycles on making the app perform better on low memory devices. We expect to complete this project in a few weeks.
Wikipedia over SMS & USSD
Production hardware is now in place and running the latest builds via puppet configuration.
- Our work mostly focused on the 0.9 RC2 (see CHANGELOG) which should be released soon after we port kiwix-serve to MS/Windows. Kiwix UI localization was improved, thanks to the translatewiki.net Translation Rally; four new languages have been added. For the ZIM autobuild project, we have migrated the server to a datacenter in Zurich, Switzerland, and coding work is ongoing. We are planning our next projects and seeking volunteer help.
Chad Horohoe spent a good amount of time fixing issues upstream, including two big improvements to the project listing page. He also cleaned up the Gerrit installation on Labs to more accurately mirror production—also cleaning up the production setup along the way. Initial research was done into replication to GitHub. Finally, Gerrit 2.5 is nearing release, which brings a bunch of new features (like plugins) and fixes. The Labs instance of Gerrit is already running the release candidate. In September, we'll be upgrading to Gerrit 2.5 and getting repositories replicated out to GitHub.
In August we concentrated on the testwiki, and found some issues that need addressing. The project is on hold for now, but we expect to resume in September. All Wikimedia sites are now using Swift as the primary storage mechanism for multimedia files such as images (both original images as well as image thumbnails). We continue to write images to our old NFS server as well, though we plan to turn this off in September. Some specialized extensions still use the old NFS server, such as the Math and Timeline extensions. These will be migrated to Swift soon (tentatively in September).
The Scribunto extension has been deployed to test2.wikipedia.org and www.mediawiki.org, and several editors are porting existing templates such as Cite over to Lua (see recent changes in the "Module:" namespace).
Site performance and architecture
In addition to the Lua work, Tim Starling did some investigation of parallel parsing, but that project may go on the backburner until after Parsoid goes into production. Tim Starling wrote a new Redis-based client for session handling. This will be important for the Virginia Datacenter Migration.
Admin tools development
Two major new features were added to the AbuseFilter extension: global rules and global throttling. Code review was done by Tim Starling and the changesets were merged successfully. These features will allow the creation of filters that apply to all Wikimedia wikis, which is effective for stopping cross-wiki spambots. Jack Phoenix released the Phalanx extension and began working on making it suitable for deployment on Wikimedia servers. During the rest of 2012, the team will work through their roadmap: CentralAuth mass account locking, improving, stabilizing and reviewing Phalanx, and evaluating the effectiveness of the current CAPTCHA system and possible replacements for it.
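As a rough illustration of why a throttle shared across wikis helps against cross-wiki spambots, the Python sketch below counts an actor's actions in one sliding window regardless of which wiki they happen on. It is a generic technique, not AbuseFilter's code: `GlobalThrottle`, its parameters and the wiki names in the usage note are all hypothetical.

```python
from collections import defaultdict, deque


class GlobalThrottle:
    """Sliding-window counter keyed by actor only, so hits on every wiki share one budget."""

    def __init__(self, limit, period):
        self.limit = limit      # actions allowed within one period
        self.period = period    # window length in seconds
        self._hits = defaultdict(deque)

    def hit(self, actor, wiki, now):
        """Record one action at time `now`; return True if the actor is over the limit."""
        # `wiki` is accepted only to show that it is deliberately left out
        # of the key: a per-wiki throttle would use (actor, wiki) instead.
        window = self._hits[actor]
        window.append(now)
        # Drop timestamps that have fallen out of the window.
        while window and now - window[0] >= self.period:
            window.popleft()
        return len(window) > self.limit
```

With `limit=2` and `period=60`, three actions by the same actor on three different wikis within a minute trip the throttle, while a per-wiki counter would see only one action on each wiki.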
Code review management
The analytics team released code review graphs, and Brian Wolff created a tool showing a view of unmerged patchsets and a "Wall of Shame" for authors with several patchsets requiring improvement. Both tools helped inform the discussion about the code review situation. Sumana Harihareswara encouraged authors to take steps to get their code reviewed faster, and actively requested reviews for many submissions.
Security auditing and response
QA and testing
This month saw an emphasis on hiring, with excellent candidates soon to be hired for all the positions that will be closely related to QA. With AFTv5 in place in production, testing focus shifted to NewPagesFeed and Page Curation Toolbar. Due to conflicts of holidays, vacations, time of year, meetings, and general complications, we decided not to hold an explicit community test event for NewPagesFeed/Curation, but test environments and a test plan will be available for those interested to explore this new feature. NOTE: announcement for QA Engineer and possibly Mobile QA will have been made by the time this is published.
The MediaWiki core and its extensions are now automatically updating, and the beta cluster is now always using the very latest version published under the master branch of each Git repository.
The TitleBlacklist extension is the first MediaWiki extension for which tests are now automatically run via Jenkins. The dashboard is at https://integration.wikimedia.org/ci/job/Ext-TitleBlacklist/ and build status is sent back to Gerrit.
The team started preparations to move hosting from Labs to a dedicated server (stat1001), and is investigating how to package a nodejs app.
The Wikimedia Foundation is nearing the end of its hiring process for a new Bug Wrangler, who will lead triage activities and train volunteers to triage as well. In the interim, volunteers such as Krenair and Thehelpfulone have stepped in to partially fill the gap. Volunteer Matanya Moses is planning to lead an online bug triage meeting, focusing on unreviewed patches, on September 5th.
Summer of Code 2011/management
A wikitech-l discussion of new user account creation drew former GSoC student Akshay Agarwal out of the woodwork to complete work on his SignupAPI extension. WMF engineers are planning to collaborate with him this autumn. Also, WMF engineers plan to review student Salvatore Ingala's Gadgets work as they improve ResourceLoader.
Summer of Code 2012/management
In the end, eight of Wikimedia's nine Summer of Code 2012 students passed, and each posted a wrapup post on wikitech-l. Their achievements have already led to improvements in the Wikimedia Incubator, and improvements to Semantic MediaWiki and UploadWizard will reach users soon. Improvements to SVG translation, realtime editing collaboration, and other functionality are also progressing as the students clean up, merge, and iterate on their summer work.
Volunteer coordination and outreach
continued to follow up on contacts, recruit new contributors to the Wikimedia tech community, and mentor newer contributors. She granted Developer access and Gerrit project ownership requests, and worked on planning for the upcoming Bangalore outreach event. Hiring for a volunteer engineering coordinator to work on volunteer coordination and outreach is almost finished. Community discussion topics included Git and Gerrit's difficulty, bug triages, new mailing lists, transparency and collaboration in feature design, MediaWiki releases and a potential community organization, GSoC's effectiveness, code review, and appreciation for each other.
Wikimedia engineering 20% policy
is coordinating WMF engineers' efforts to spend 20% of their work time on code review and other efforts benefiting the entire Wikimedia engineering community. Their highest priorities are fixing new urgent bugs, which surface during deployments, and addressing the Gerrit merge queue, especially for backlogged components such as Wikidata, UploadWizard, and ProofreadPage. Some participants are concentrating on bug triage, documentation, and the extensions awaiting review for deployment. Some teams were exempt in August from the 20% policy, because of pressing deadlines.
- The Wikidata project is funded and executed by Wikimedia Deutschland.
The team has been working further on getting the code-base ready for a first deployment. You can try the current status on the demo system. Work focused on diff, undo, migrating to using the Universal Language Selector, and providing useful edit summaries in recent changes and article history. They also published a draft for the export to RDF.
The team published tasks to get started to make it easier to contribute to Wikidata.
Joan Creus released pywikidata, a framework for Wikidata bots.
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.