Final negotiations and coordination are still ongoing for the data center RFP, but we expect to be able to make an announcement soon.
Labs metrics in March:
Number of projects: 149
Number of instances: 310
Amount of RAM in use (in MBs): 1,288,704
Amount of allocated storage (in GBs): 14,925
Number of virtual CPUs in use: 635
Number of users: 2,907
The Labs Ops team has spent the month shepherding projects from the Tampa cloud to the Ashburn cloud. Dozens of volunteers contributed to the move, and all tools and projects have now been copied to or rebuilt in Ashburn. Some projects and tools are in a non-running state pending action on the part of their owners or admins. Ashburn Labs is running OpenStack Havana, with NFS for shared storage.
The usage stats this month are quite a bit different from last month. Quite a number of obsolete instances have been purged, and last month's stats may have included some data center duplication.
Tampa data center
During March, the Ops team has been decommissioning and shutting down a lot of hosts in the old Tampa data center, including all former appservers. The amount of energy consumed in the old data center has been greatly reduced. A few hosts are going to be migrated to another floor in the existing data center and physical data center work is coming up.
In March, the VisualEditor team continued their work on improving the stability and performance of the system, and added some new features and simplifications, helping users edit and create pages more swiftly and easily. Editing templates is now much simpler: most of the advanced controls that users don't often need have been moved into a special version of that dialog. The media dialog was improved and streamlined a little, with hints added to the controls to better explain how they work. The cursor entry points that VisualEditor inserts next to items like images or templates, to give users somewhere to put the cursor, now animate on hover and on cursor entry to show that they're special. The overall design of dialogs and controls was improved a little to make it flow better; for example, double-clicking a block now opens its dialog. A new system for quickly and simply inserting and editing "citations" (references based on templates) neared completion and will be deployed in the coming month. The deployed version of the code was updated four times in the regular releases (1.23-wmf17, 1.23-wmf18, 1.23-wmf19 and 1.23-wmf20).
March saw the Parsoid team continuing with a lot of unglamorous bug fixing and tweaking. Media / image handling in particular received a good amount of love, and is now in a much better state than it used to be. In the process, we discovered a lot of edge cases and inconsistent behavior in the PHP parser, and fixed some of those issues there as well.
We wrapped up our mentorship for Be Birchall and Maria Pecana in the Outreach Program for Women. We revamped our round-trip test server interface and fixed some diffing issues in the round-trip test system. Maria wrote a generic logging backend that lets us dynamically map an event stream to any number of logging sinks, a huge step up from the basic console.error-based error logging we have used so far.
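To illustrate the idea of routing an event stream to several logging sinks, here is a minimal sketch in Python. It is not the actual Parsoid backend: the `EventLogger` class, the prefix-matching rule and the event type names are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# A sink is any callable that accepts (event_type, message).
Sink = Callable[[str, str], None]


class EventLogger:
    """Hypothetical router that maps typed events to registered sinks."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Sink]] = []

    def register(self, prefix: str, sink: Sink) -> None:
        # Sinks can be added at any time, so the mapping is dynamic.
        self._routes.append((prefix, sink))

    def log(self, event_type: str, message: str) -> int:
        """Deliver the event to every matching sink; return the match count."""
        delivered = 0
        for prefix, sink in self._routes:
            if event_type.startswith(prefix):
                sink(event_type, message)
                delivered += 1
        return delivered


# Two in-memory sinks: one for errors only, one for everything.
errors: List[Tuple[str, str]] = []
everything: List[Tuple[str, str]] = []

logger = EventLogger()
logger.register("error", lambda t, m: errors.append((t, m)))
logger.register("", lambda t, m: everything.append((t, m)))

logger.log("error/request", "timeout")
logger.log("trace/dom", "pass 1")
```

In this sketch the error event reaches both sinks, while the trace event reaches only the catch-all sink. A real backend would replace the in-memory lists with files, network loggers or similar.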
We also designed and implemented an HTML templating library which combines the correctness and security support of a DOM-based solution with the performance of string-based templating. This is implemented as a compiler from KnockoutJS-compatible HTML syntax to a JSON intermediate representation, and a small and very fast runtime for the JSON representation. The runtime is now also being ported to PHP in order to gauge the performance there as well. It will also be a test bed for further forays into HTML templating for translation messages and eventually wiki content.
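The split between a compiler and a small runtime can be pictured with a toy example. The sketch below is only an illustration in Python: the JSON layout (plain strings for literal markup, `["text", name]` for escaped bindings, `["foreach", {...}]` for loops) is invented here and is not the library's actual intermediate representation.

```python
import html
import json


def render(ir, model):
    """Render a hypothetical JSON template representation against a model."""
    out = []
    for node in ir:
        if isinstance(node, str):
            # Literal markup produced at compile time: emitted as-is.
            out.append(node)
            continue
        op, arg = node
        if op == "text":
            # Bound values are always escaped, so data cannot inject markup.
            out.append(html.escape(str(model.get(arg, ""))))
        elif op == "foreach":
            # Render the nested template once per item in the bound list.
            for item in model.get(arg["data"], []):
                out.append(render(arg["tpl"], item))
        else:
            raise ValueError("unknown template op: %r" % (op,))
    return "".join(out)


# An assumed compiled form of a template that lists item names.
ir = json.loads(
    '["<ul>", ["foreach", {"data": "items",'
    ' "tpl": ["<li>", ["text", "name"], "</li>"]}], "</ul>"]'
)

page = render(ir, {"items": [{"name": "a<b"}, {"name": "c"}]})
print(page)  # <ul><li>a&lt;b</li><li>c</li></ul>
```

The point of the sketch is that the runtime only concatenates precompiled strings and escaped values, which keeps it fast, while the decision about what is markup and what is data is made once, at compile time.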
This month the Core Features team focused on improvements to how Flow works with key MediaWiki tools and processes. We made changes to the history, watchlist, and recent changes views, adding more context and bringing them more in line with what experienced users expect from these features. We also worked on improvements to the API and links tables integration. On the core discussion side, we released a Flow thank feature, allowing users to thank each other for posts, and began work on a feature to close and summarize discussions. Lastly, we continued work on rewriting the Flow front-end to make it cleaner, faster, and more responsive across a wide number of browsers/devices, which will be ongoing over the next month.
In March, the Growth team primarily focused on bug fixing, design enhancements, and refactoring of the GettingStarted and GuidedTour extensions, which were recently launched on 30 Wikipedias. We updated icons and button styles, rewrote the interface copy, and refactored the interface to be more usable in non-English languages. We also began work on a significant refactor of the GuidedTour API, in order to support interactive tours that are non-linear. Non-linear tours will not depend on a page load to run, which will enable better support for tours in VisualEditor, among other things. Last but not least, we made progress on measuring the impact of GettingStarted across all wikis where it is deployed, with results for the first 30 days of editor activity expected in early April.
This month, thanks to the work of Facebook Open Academy student JJ Liu, we added a new type of notification for course pages: users are now notified whenever they get added to a course. We also fixed inconsistencies with interface messages, user rights, and the deletion of institutions from the system.
Yuri continued analytics work on SMS/USSD pilot data. Post hoc analysis was performed on WML usage after its deprecation; it is still low, although obtaining more low-end phones to check how well HTML renders and how to enhance the HTML could be useful. Post hoc analysis was also performed on anomalous declines and growth spurts in log lines (not strictly related to pageviews); the former had much to do with API changes, and the latter with an external polling mechanism.
With the assistance of the Apps team, User-Agent, Send App Feedback, and Random features were added to the forthcoming reboots of the Android and iOS apps. The Share feature for Android was changed to allow a different target app each time, code review assistance was provided on the Android and iOS app code, and a proof of concept for full-text search was started on iOS. Wikipedia for Firefox OS bugfixes were also pushed to production. Screencap workflows and preload information were put together for the Android reboot with respect to Wikipedia Zero as well.
The team worked with Ops on forward planning in light of the extremely infrastructure-oriented nature of the program. A quarterly review was held with the ED, VP of Engineering, and the W0 cross-functional team, and the W0 cross-functional team reviewed presentation material for publication. The team also continued work on additional proxy and gateway support. To help partner tech contacts, the team worked on reformatting the tech partner introductory documentation.
Finally, the team explored proactive MCC/MNC-to-IP address drift correction, and will be emailing the community for input soon.
Wikipedia Zero (partnerships)
Smart, the largest mobile operator in the Philippines, is giving access to Wikipedia free of data charges through the end of April. They announced the promotion in a press release. Ingrid Flores, Wikipedia Zero Partner Manager, visited the Philippines and arranged a meeting with local community members and Smart. They are now exploring ways to collaborate in support of education. The partnerships team kicked off account reviews with the 27 existing Wikipedia Zero partners, to update the implementation, identify opportunities for collaboration in corporate social responsibility (CSR) initiatives and get feedback on the program. The account reviews will continue for the next few months. Last, we continued recruiting for Wikipedia Partner Manager for the Asia region.
MediaWiki's LocalisationUpdate extension was rewritten by Niklas Laxström to modernize its internal architecture to be able to support JSON message file formats. Kartik Mistry released the team's monthly MediaWiki Language Extension Bundle (MLEB 2014.3) with the latest version of LocalisationUpdate (see release notes). Niklas Laxström also started migrating the Translate extension's translation memory and translation search back-end from Solr to ElasticSearch in line with Wikimedia's search migration. David Chan continued his work on input method support for the VisualEditor project.
Santhosh Thottingal, Kartik Mistry and Niklas Laxström fixed numerous bugs and made performance improvements in jquery.webfonts, jquery.ime and jquery.uls. Amir Aharoni started collecting metrics on usage of Universal Language Selector.
Runa Bhattacharjee and Kartik Mistry set up a manual testing infrastructure using the Test Case Management System (TCMS) to encourage greater participation from the volunteer community in testing the software tools and features developed by the team. Volunteer testing of language software is expected to kick off this coming month. The team's monthly office hour was hosted by Runa Bhattacharjee on March 12. An overview of webfonts with advantages and challenges of using them on Wikimedia sites was also published by the team.
Santhosh Thottingal and David Chan continued development and technology research on the Content Translation project. Development was focused specifically on updates to the side-by-side translation editor and section alignment of translated text. Kartik Mistry and Santhosh Thottingal worked on infrastructure for testing the Content Translation server. David Chan continued his technology research on sentence segmentation.
Pau Giner updated the Content Translation UI design specification incorporating review comments from UX and product reviews. The team also participated in a review of the Content Translation project with the product team leadership.
The team continued to work on porting C extensions to HHVM. Tim Starling did major work on a compatibility layer allowing Zend extensions to be used by HHVM, and started further work on making the layer compatible with newer HHVM interfaces. The team has made a preliminary deployment of HHVM to the Beta cluster, but this still needs further debugging before it is useful to a wider audience.
The Beta Cluster has been migrated from the Tampa data center to the Ashburn data center. In the move, a ton of cleanup and Puppetization work was done. This will make future Beta Cluster work easier. In addition, the Beta Cluster is getting closer to a place where we can test our current main deployment tool known as "scap" along with future/other deployment tools.
The team continued the rewrite of scap in Python (from Bash scripts + PHP), improving both performance and maintainability and putting us in a better position to move to a new tool in the future. We have also started doing SWAT deploy windows twice a day (Monday to Thursday), which has greatly increased momentum for many developers who would otherwise have to wait for the weekly deployment cycle.
In March we upgraded to the newest version of Elasticsearch and expanded onto more wikis. We also started a performance assessment which has started showing us the work required to use Cirrus as the primary search back-end for the larger wikis. We then started in on that work.
Support for the production application during the applicant review period continued in March. A dataset of applicants passing the phase 1 review criteria who had opted in to sharing application details with chapters and thematic organizations was prepared and delivered to Foundation staff. The beta testing server was migrated from the Tampa data center to the Ashburn data center as a component of the Labs environment migration. The new beta server in Labs is now managed via the MediaWiki-Vagrant role::wikimania_scholarships Puppet role and labs-vagrant. This should make keeping development changes and the testing application in sync easier in the future.
The QA team continues to identify and report issues in a timely way. Of particular interest in March was that an automated test uncovered an issue in the interaction of the MobileFrontend and VisualEditor extensions. This is exactly the kind of cross-cutting concern that our QA systems are designed to uncover. It is likely that we will be in a position to discuss these systems at the Wikimania conference in London.
There was a substantial effort to migrate the Beta cluster over from the Tampa data center to the Ashburn, VA ("eqiad") data center. This was led by Antoine Musso with assistance from Bryan Davis and many others.
Besides a particular focus on MobileFrontend browser tests in March, we have also made available some new features, in particular shared code to upload files properly in all browsers, the ability to check for ResourceLoader problems in any test in any repository, and a basic wrapper for using the MediaWiki API from within browser tests to set up and tear down test data.
In addition to ongoing communications support for the engineering staff, and contributing to the technical newsletter, Guillaume Paumier edited and published a series of essays on the Wikimedia Tech blog written by Google Code-in students, who shared their impressions, frustrations and surprises as they discovered the Wikimedia and MediaWiki technical community.
The bulk of the work to create community metrics around five Key Progress Indicators is complete, and we are now polishing help strings and usability details. The next step is to share the news with the community and start looking at bottlenecks and actions.
A page about Upstream projects was drafted collaboratively in order to start mapping the key communities where Wikimedia should be active, either as a contributor / stakeholder, or by promoting our own tools. We helped select the participants sponsored to travel to the Zürich Hackathon 2014 in May.
We reached a milestone in our ability to deploy Java applications at the Foundation this month when we stood up an Archiva build artifact repository. This enables us to consistently deploy Java libraries and applications and will be used in Hadoop and Search initially.
The first Analytics use case for this system will be Camus, LinkedIn's open source application for loading Kafka data into Hadoop. Once this is in production, we'll have the ability to regularly load log data from our servers into Hadoop for processing and analysis.
We did some significant architectural work on WikiMetrics this month to prepare it for its role as our recurrent report scheduling and generation system. The first use case for this system will be the Editor Engagement Vital Signs project, which will provide daily updates on key metrics around participation.
We continue to investigate network issues between our data centers that are causing occasional delivery issues. As noted above, we are currently deploying Camus, our software for transferring data between Kafka and Hadoop.
This month we concluded the first stage of work on metrics standardization. We created an overview of the project with a timeline and a list of milestones and deliverables. We also gave an update on metrics standardization during the March session of the Research and Data monthly showcase. The showcase also hosted a presentation by Aaron Halfaker on his research on the impact of quality control mechanisms on the growth of Wikipedia.
We published an extensive report from a session we hosted at CSCW '14 on Wikipedia research, discussing with academic researchers and students how to work with researchers at the Foundation.
We submitted 8 session proposals for Wikimania '14, authored or co-authored by members of the research team.
We completed the handover of Fundraising analytics tools and knowledge transfer in preparation for a new full-time research position that we will be opening shortly to support the Fundraising team.
We continued to provide support to teams in focus areas (Growth and Mobile) with an analysis of the impact of the rollout of the new onboarding workflows across multiple wikis, an analysis of mobile browsing sessions, and ongoing analysis of mobile user acquisition tests. We also supported the Ops team in measuring the impact of the deployment of the ULSFO cluster, which provides caching for West USA and East Asia.
The team worked on making ranks more useful. From now on, by default the property parser function and Lua always return the values with the "preferred" rank or, when none is available, those with the "normal" rank. This makes it possible, for example, to exclude past mayors when asking Wikidata for the mayor of a city. Additionally, considerable speed improvements have been made; browsing Wikidata is now a lot faster. Diffs between versions of pages on Wikidata have also been improved to make it easier to see what changes were made to an item. Last but not least, the user interface redesign research went on.
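The rank fallback described above can be sketched in a few lines of Python. This is an illustration only, not Wikibase code: the `best_values` function and the dictionary layout of a statement are assumptions made for the example.

```python
from typing import Iterable, List


def best_values(statements: Iterable[dict]) -> List[str]:
    """Return the "preferred" values, falling back to the "normal" ones."""
    statements = list(statements)
    for rank in ("preferred", "normal"):
        values = [s["value"] for s in statements if s["rank"] == rank]
        if values:
            return values
    # Nothing left: no statements at all, or only ones with other ranks.
    return []


# Hypothetical statements for a city's "mayor" property.
mayors = [
    {"value": "Past Mayor A", "rank": "normal"},
    {"value": "Current Mayor", "rank": "preferred"},
    {"value": "Past Mayor B", "rank": "normal"},
]

print(best_values(mayors))  # ['Current Mayor']
```

If no statement is marked as preferred, the same call returns every value with the normal rank, which matches the default behaviour described in the paragraph above.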
The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.