Engineering metrics in March:
Some metrics were disrupted this month by the move to git.
Major news in March includes:
Hover your mouse over the green question marks ([?]) to see the description of a particular project.
- Chennai Hackathon (17 March 2012, Chennai, India) — Yuvaraj Pandian and volunteer Srikanthlogic held this one-day hackathon for experienced developers. Yuvaraj's report praised the 21 participants for coming up with 13 completed hacks, including 2 core MediaWiki patches, 3 Tamil Wikipedia userscript updates, and 2 new deployed tools.
- Berlin hackathon (1–3 June 2012, Berlin, Germany) — Registration opened in March for this three-day "inreach" hackathon for the Wikimedia technical community, including MediaWiki developers, Toolserver users, bot writers and maintainers, Gadget creators, and other Wikimedia technologists. The event, hosted by Wikimedia Deutschland, will mostly involve focused sprints, bugbashing, and other coding, with a few focused tutorials and trainings on Git, Lua, Gadgets changes, or other topics of interest. Wikimedia Deutschland will also use this event to consult on and discuss the Wikidata structured data project. Developers are encouraged to register now, and to mention in the registration form if they will need financial subsidies or help with accommodation or visa. Developers who will need that sort of assistance are urged to register as soon as possible, preferably before May 1st.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Pau Giner joined the Product team as Interaction Designer (announcement).
- Wikimedia Deutschland announced the composition of their team working on the Wikidata project (announcement).
- Ashburn data center [?] — Mark Bergsma completed the Squid to Varnish conversion for image caching, and successfully deployed Varnish on 8 servers in our Ashburn data center for about half a day. During that time, he monitored and assessed the behavior of the software and its impact on the servers. The 8 Varnish servers would provide sufficient capacity to replace the 24 Squid servers currently in use. However, there are concerns about overloading the NICs and the risk of concentrating too much cache on each server. Mark is now working on improving the Varnish implementation and possibly adding a few more servers. Also, because the Ashburn data center seems to be experiencing a higher server outage ratio than the Tampa site, Rob Halsell reviewed the cabinets and added extra earth grounding to them as a precaution. We are monitoring the situation to see whether that reduces server issues. Peter Youngmeister and Jeff Green are making good progress in testing, preparing and bringing up the Ashburn Search clusters. Full-scale testing has just started and results have been quite promising. In the coming weeks, they will conduct limited trial deployments of the Ashburn clusters, running in parallel to the ones at the Tampa data center. The Ashburn data center added network peering, and Leslie Carr peered our network with over 10 other big sites/ISPs shortly after that, reducing latency for many of our users, especially in Europe, Japan and Hong Kong (and reducing bandwidth costs too).
- Amsterdam data center [?] — Mark Bergsma restacked, re-arranged and decommissioned servers, and started racking the new router and switches. The actual network switchover at Evoswitch is still to be scheduled; however, Mark did replace the old core router in Vancis and deployed the new one there.
- Media Storage [?] — After addressing earlier issues with the Swift deployment, Ben Hartshorne re-deployed it and it has been stable since. Ben removed the original testing hardware from the cluster and added the final production node, bringing the total to 5 new Swift nodes serving as the thumbnail object store at Tampa. Swift is also now running in the Labs environment and ready to be used by other Labs projects that interact with Swift in production. Volunteer attention to the Swift Labs cluster is welcome to improve monitoring, analyze the configuration, and in any other way understand this component of our infrastructure better.
- Wikimedia Labs [?] — Gluster project storage is now available. In total, 71TB are available for use. Each project has a default quota of 300GB that can be increased on request. Soon, public datasets (such as XML dumps) will also be available within Labs. There were two Labs downtime events this month, both due to GlusterFS instance storage. The first was caused by a limitation in the FUSE filesystem (with regard to recreating deleted directories) and was relatively short (roughly 2 hours). The second was caused by malfunctioning hardware, which put the GlusterFS storage into an unresolvable split-brain situation. There was no data loss, but the instance images had to be recovered manually from Gluster's backend. Total downtime for the second outage was roughly 24 hours. Andrew Bogott has finished his work on SharedFS support in Nova, with a Gluster driver. It is proposed for inclusion in Nova for the Folsom release; this will be discussed at the upcoming OpenStack design summit. Andrew has begun work on adding support for updating MediaWiki on Nova changes.
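As a rough illustration of the Labs storage figures above, the short sketch below estimates how many projects could fill their default 300GB quota before the 71TB pool is exhausted. The function name and the choice of 1000 GB per TB are assumptions made for this example; the report does not say whether the figures are decimal or binary, and quotas can be raised on request, so this is only a back-of-the-envelope estimate.

```python
def projects_at_default_quota(total_tb: int, quota_gb: int, gb_per_tb: int = 1000) -> int:
    """Return how many projects fit if each one uses its full default quota.

    gb_per_tb is an assumption: 1000 for decimal units, 1024 for binary units.
    """
    total_gb = total_tb * gb_per_tb
    # Integer division: a partially available quota does not count as a full project.
    return total_gb // quota_gb


if __name__ == "__main__":
    # Figures from the report: 71TB total, 300GB default quota per project.
    print(projects_at_default_quota(71, 300))
    print(projects_at_default_quota(71, 300, gb_per_tb=1024))
```

Under either unit convention the pool covers a little over 230 projects at the default quota.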
Backups and data archives
- Data Dumps [?] — We sorted out the network issues affecting our mirror sites on our end by replacing a switch. We set up a new host to hold a copy of all uploaded media for copying to our mirror sites, and the first copy of this media to an external mirror is now underway. Mirror sites will also be able to pick up a list of dump files to copy (the last 1, 2 or 5 good dumps) in a few different formats, produced by a new script. A first copy of recent dumps has been made to a Gluster share available to Labs users, but it is already out of date; copying with a single process is too slow, so a script is being tested that will dispatch copy requests to several processes running at once. Christian Aistleitner is now working on PHPUnit tests for the maintenance scripts used for the dumps. We've improved our process for deploying new versions of the XML dump scripts, so that new code can be rolled out more often.
- Performance engineer Asher Feldman published an article explaining how site performance is measured at the Wikimedia Foundation. He notably presented Graphite and a limited version available at http://gdash.wikimedia.org.
- Operations engineer Ryan Lane, who is leading the Wikimedia Labs project, was featured on the Wikimedia Blog this month.
- We started investigating the possibility of a caching center on the West Coast of the US. We believe it would improve the experience for users in Asia and America's West Coast.
- Readers reported intermittent performance issues on March 25th. Tim Starling investigated and determined it was a network problem. Leslie Carr quickly found the root cause and redirected the traffic, which resolved the problem. Rob Halsell later replaced the problematic fiber and transceiver.
- Visual editor — A big decision in March was to move forward with contentEditable (CE), implemented by Wikia developers Inez Korczynski and Christian Williams, instead of Editable Surface (ES). Trevor Parscal and Roan Kattouw focused on the data model. Rob Moen worked on the user interface, first on right-to-left support in ES, then on getting the UI working in CE. Gabriel Wicke and Audrey Tang continued their work on Parsoid and need to decide on RDFa vs. microdata. They created a dump grepper with syntax highlighting, and used it to analyze existing wikilink/image syntax use.
- Article feedback — Fabrice Florin worked with OmniTI to develop new features for version 5 of the Article Feedback Tool (AFT5). This month, the team created new feedback links, new monitoring tools for editors and for oversighters, and started development on an abuse filter and a relevance filter. Brandon Harris and Heather Walls helped enhance the article feedback design, creating new icons for the monitoring tools. The AFT5 team also published its first report on phase 1 of AFT5, co-authored by Fabrice and Dario Taraborelli, with Howie Fung, Oliver Keyes and Aaron Halfaker. Roan Kattouw helped solve some tricky technical issues and deployed several new releases with the team this month. Current goals for this project are to complete feature development by the end of May, with full deployment in the summer.
- Page Triage — This month, the new editor engagement team developed the first prototype of Page Triage, which provides an enhanced list of articles to be triaged by community patrollers. Benny Situ completed APIs for retrieving metadata about articles and their authors, while Ian Baker and Ryan Kaldari developed new features for the list view enabling users to see that article metadata, working with new designs by Brandon Harris. Oliver Keyes acted as community liaison and Fabrice Florin managed this project with Howie Fung. Current goals for this project are to complete development of the list view in April, and start development of advanced features like the zoom view, for release in May.
- Article Creation Workflow — Benny Situ, Ryan Kaldari, Brandon Harris, Andrew Garrett, and Ian Baker released ACW to Labs (development is ongoing, so bugs are expected). Oliver Keyes is collecting feedback from the community. Fabrice Florin started to facilitate new development as product manager.
- 2012 Wikimedia fundraiser — The team continued to work on GlobalCollect recurring donations; the code review remains to be done. They also worked on cleanup after an eventually successful upgrade of our production instance of CiviCRM from 3.4 to 4.1.1, on the migration to git, and on Mingle training. An imbalance of chargebacks, caused by the spinning down of the winter fundraiser's fraud flagging in GlobalCollect, was resolved.
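The Data Dumps item above mentions a script that dispatches copy requests to several processes running at once, because a single copy process is too slow. The actual script is not shown in this report; the following is only a minimal sketch of that general idea, assuming a flat list of source files and a single destination directory. The function name `copy_in_parallel` and the `max_procs` parameter are hypothetical names chosen for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Each child process copies exactly one file (source path, destination path).
COPY_SNIPPET = "import shutil, sys; shutil.copy2(sys.argv[1], sys.argv[2])"


def copy_in_parallel(files, dest_dir, max_procs=4):
    """Copy files into dest_dir using up to max_procs child processes at once.

    Returns the list of source paths whose copy failed.
    Hypothetical helper: not the script used for the real dumps.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    pending = [Path(f) for f in files]
    running = []  # list of (source path, Popen handle)
    failed = []
    while pending or running:
        # Start new copy processes while a slot is free.
        while pending and len(running) < max_procs:
            src = pending.pop(0)
            proc = subprocess.Popen(
                [sys.executable, "-c", COPY_SNIPPET, str(src), str(dest / src.name)]
            )
            running.append((src, proc))
        # Wait for the oldest running copy and record a failure if it exited non-zero.
        src, proc = running.pop(0)
        if proc.wait() != 0:
            failed.append(src)
    return failed
```

Waiting on the oldest process keeps the sketch simple; a production script would more likely poll all running processes so that one slow file does not hold up a free slot, and would add retries and checksum verification.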
Internationalization and Editor Engagement Experimentation
- Internationalization and localization tools — The team started to develop (with UI/UX contractors) the UI for a Universal language selector for desktop and mobile. They also added keymaps for language support to Narayam, added Lohit font updates from upstream to WebFonts, fixed bugs, reviewed code for localization support in MediaWiki 1.19, and discussed language support metrics. Niklas Laxström migrated the Translatewiki.net workflow to reflect the move to git/gerrit.
- Editor Engagement Experimentation [?] — The newly created, cross-functional Editor Engagement Experimentation team will focus on engineering for experimentation around strategies to reverse stagnating/declining participation in Wikimedia projects, and will effectively launch on April 16. It will be composed of people from the Community and Engineering/Product departments, tasked specifically with conducting small, rapid experiments designed to improve editor retention. This is intended to go beyond the projects that are already being worked on; the purpose of this team will be to identify the possible changes we don't yet know about. The engineering team will report to Alolita Sharma, with two new software developer positions to be hired in the current fiscal year.
- Kiwix UX initiative — The team decided not to use Mozilla Gecko as the platform to port Kiwix to Android; an alternative is cordova-qt. Work continued on Kiwix 0.9 RC1, the largest release ever made for Kiwix. New ZIM files are regularly released for offline reading using Kiwix. In particular, for the first time this month, a full ZIM version of the English Wikipedia was made available, containing about 4 million articles, 11 million redirects, and 300,000 math images (see online demo).
- MediaWiki 1.19/Roadmap — We have now finished deploying MediaWiki 1.19 to all Wikipedia sites, including the Chinese language wikis (zh*). However, we are monitoring some post-deploy issues. We are keeping an eye on site performance; there's been a slight regression in our parser cache hit rate. The new diff colors have been temporarily reverted, and Trevor Parscal and Timo Tijhof plan to look into the subject. Marcin Cieślak and Aaron Schulz have cleaned up areas where the CheckUser feature briefly stopped working properly.
- Continuous integration — This activity was somewhat deprioritized in March in favor of the git migration. Nonetheless, Jenkins is now running the PHPUnit test suite and reporting test results in the Gerrit interface. This will help catch possible culprits as soon as a patch is submitted. Timo Tijhof wrote workflow specifications for continuous integration. Over the course of April, the Jenkins/Gerrit interaction will be polished and we will start looking at Selenium and bringing TestSwarm back into action.
- Git conversion — We've now moved MediaWiki core and WMF-deployed extensions over to Git and Gerrit, and for those directories Subversion is now read-only. We've communicated links and workflow planning, and the new procedure to add and remove people from Gerrit project owner groups. A summary of the move was published in the Wikipedia Signpost.
- Multimedia — Michael Dale and Jan Gerber have TimedMediaHandler set up on beta. It is running into issues related to the Labs beta setup that are preventing the test plan from being run. Labs and QA leads are working with them to get to the point where testing can be run. QA support has been lined up. Swift is deployed for thumbnails. There are still some corrupted thumbnails in the Squid cache, but all known issues with new thumbnail corruption have been resolved. Work is underway to test and deploy Swift for original images, with work scheduled to complete in late May.
- Analytics/Reportcard — The analytics team is fine-tuning the interface of the new Report card. The test site in Labs is currently unavailable. The team is working towards showcasing a first report card prototype by April 6th, the date of the next metrics meeting for the Wikimedia Foundation. This prototype will replicate readers and pageviews. For the April 6th meeting, the team will also make a serious attempt at getting editor data up and running, and at adding the ability to add and signal benchmarks.
Technical Liaison; Developer Relations
The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. In March, a particular focus of the engineering management team was also the annual goal and budgeting process.