Wikimedia Platform Engineering/MediaWiki Core Team/Quarterly review, October 2013

This is an outline for a review of the MediaWiki Core team which will take place at WMF on October 15.
 * Notes for this review

Team
See Wikimedia MediaWiki Core Team page.

Changes since last review:
 * Bryan Davis started as Senior Software Engineer, on loan to Multimedia and focused on bugfixing and technical debt.
 * Dan Garry started as Product Manager for Platform.
 * Ori Livneh moved into the Senior Performance Engineer role.

Auth systems / Security
Chris Steipp, Aaron Schulz, and Brad Jorsch did the bulk of the software development on this, with support from Tim Starling on software design; May Tee-Galloway, Brandon Harris, and Jared Zimmerman on user experience; and Chad Horohoe, Ryan Lane, and Ori Livneh on the SSL deployment.

We had planned to wrap this up early in the quarter. It unfortunately lasted the duration of the quarter, and still isn't quite finished.


 * SUL - most work on this was completed in August. We had not fully anticipated the usability concerns with our initial implementation, so substantial rework was needed. We were also nagged by some pernicious high-priority bugs, the last of which we finally got to the bottom of in September.
 * SSL deployment - significant work by the MediaWiki Core group was not in the plan for this quarter, but our decision to deploy this took substantial time away from other aspects of the project.
 * OAuth - this is another case where we underestimated the amount of work that was needed, in particular on usability. After substantial iteration, we have this in a pretty good place.
 * OpenID - We're relying on volunteer development for this. It will also need substantial usability work, so we would like to postpone this work until 2014.

We're happy with where SUL, SSL and OAuth are right now, but want to do better with how we get there for our next project of similar scope.

Search (ElasticSearch) deployment
This project is primarily driven by Chad Horohoe and Nik Everett in MediaWiki Core, along with support from Faidon Liambotis, Peter Youngmeister and Asher Feldman in TechOps.

We have developed the CirrusSearch extension to MediaWiki, which is an application glue layer to ElasticSearch. ElasticSearch is an externally developed search indexing component.

Italian Wiktionary, Catalan Wikipedia and English Wikisource are all running CirrusSearch now. Additionally, we deployed to all "closed" wikis. Further feature refinement and bugfixing are ongoing, with roughly 2 to 3 deployments a week.

Architecture formalization
Tim Starling has largely been spearheading this work, with support from Brion Vibber and Mark Bergsma.

The Architecture guidelines document continues to progress, though substantial work is still needed to reach consensus on many points of direction and to make the document well organized and well scoped. Our RFC list is now much more organized, and we have recently held two IRC meetings in consecutive weeks (September 25 and October 2).

DevOps sprint
This was postponed due to work on auth systems.

Upcoming quarter
Personnel allocations are still somewhat unsettled at this point.

Search (ElasticSearch) deployment
We anticipate we'll be able to complete this deployment by the end of 2013, and disable our previous search technology (lsearchd) in January 2014. Nik Everett and Chad Horohoe will continue on this project through the quarter.

Architecture formalization
Tim Starling will continue to put effort into this area, though he will likely spend some time on other projects. We plan to continue with RFC review meetings on IRC, more or less weekly.

Our first Architecture summit is planned for January 2014, where we plan to make substantial progress toward consensus on architecture changes planned in RFCs. We plan to make this an annual event.

DevOps sprint
We haven't finalized the personnel allocation in this area. The main areas of focus for this sprint are:
 * git-deploy
 * monitoring / reporting
 * deployment script improvements
 * multi-site awareness
 * Labs related

Wikimania scholarships app
This project has a January 2014 deadline. We haven't yet figured out which projects we need to pull resources from in order to do this work, and we will likely need help from other groups to complete it in a timely manner.

Auth systems
Chris Steipp will continue work in this area as a solo effort, focusing mainly on refining our password expiration protocol as well as improving our password hashing.

Security auditing and response
Chris Steipp will also continue his work on security auditing. The review queue for this has stacked up, with reviews promised for Limn, TimedMediaHandler v2, Kraken, and GLAM upload, among others.

Performance
Since this is Ori's first quarter in his new role, his priority will be to understand and articulate the current state of site performance, with a focus on performance blind spots, i.e., performance bottlenecks that fall outside the purview of current Foundation engineering projects and existing expertise. He plans to bring this work to bear by building a set of tools and visualizations that make these bottlenecks visible, and by building tools that help MediaWiki developers and gadget and template authors profile, understand, and optimize the performance impact of their work. Examples of this are the mediawiki.inspect module, which lets savvy users scrutinize the static asset payload of a MediaWiki page in their browser debug console, and the instrumentation of real user monitoring in Ganglia and Graphite for page views and for VisualEditor.

Ori has also been working on augmenting client-side asset caching by using the Web Storage API to cache ResourceLoader modules (change If2ad2d80d). He would like to see this work through to completion and deployment in this quarter.
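The caching idea can be illustrated with a minimal sketch. It is not the code in the change referenced above: the names ModuleStore, load_module and fetch are hypothetical, a plain dict stands in for the browser's localStorage, and the sketch assumes each module carries a version string that changes whenever its content changes.

```python
import json


class ModuleStore:
    """Cache module source by name; an entry is stale once its version changes."""

    def __init__(self, storage, key="moduleStore"):
        # `storage` is any dict-like object standing in for localStorage (assumption).
        self.storage = storage
        self.key = key

    def _load(self):
        raw = self.storage.get(self.key)
        return json.loads(raw) if raw else {}

    def get(self, name, version):
        """Return the cached source, or None if it is missing or stale."""
        entry = self._load().get(name)
        if entry is None or entry["version"] != version:
            return None
        return entry["source"]

    def set(self, name, version, source):
        items = self._load()
        items[name] = {"version": version, "source": source}
        # Web Storage holds strings only, so the whole map is serialized as JSON.
        self.storage[self.key] = json.dumps(items)


def load_module(store, name, version, fetch):
    """Return module source, calling the hypothetical `fetch` only on a cache miss."""
    source = store.get(name, version)
    if source is None:
        source = fetch(name, version)
        store.set(name, version, source)
    return source
```

Keying the lookup on both name and version means a deployment that changes a module invalidates only that module's entry, and the network fetch is skipped for everything else.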

Ori also plans to continue working with the TechOps team on improving the state of profiling and monitoring tools on the cluster. A realistic goal for this quarter is to finish the work of migrating Graphite from Tampa to Ashburn and improving the usability of the interface provided by MediaWiki core for logging performance metrics to a remote host for aggregation and analysis.
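As a rough sketch of what a friendlier metrics-logging interface could look like: the example below is not MediaWiki core's interface. The MetricsClient class and its methods are hypothetical, and the StatsD-style line format ("name:value|ms") sent over UDP is an assumption, since this outline does not say which wire format the cluster uses.

```python
import socket
import time
from contextlib import contextmanager


def format_timing(name, millis):
    """Render one timing sample as a StatsD-style line (assumed wire format)."""
    return f"{name}:{int(round(millis))}|ms"


class MetricsClient:
    """Hypothetical client that ships timing samples to a remote aggregator."""

    def __init__(self, host, port):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def timing(self, name, millis):
        # UDP is fire-and-forget: a lost packet costs one sample, never a request.
        try:
            self.sock.sendto(format_timing(name, millis).encode("utf-8"), self.addr)
        except OSError:
            pass

    @contextmanager
    def timer(self, name):
        """Time the enclosed block and report its duration in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timing(name, (time.perf_counter() - start) * 1000.0)
```

With an interface of this shape, instrumenting a code path is a single `with client.timer("some.metric"):` block, which is the kind of usability improvement the goal describes.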

Finally, Ori is interested in working with Dan Garry and with folks in Analytics, Fundraising and Features to begin the work of correlating site performance with editor engagement metrics and other product goals, with a view toward being able to relate the value of performance and infrastructure work to the Foundation's mission. This is a long-term project, obviously. A specific way in which it could be advanced this quarter is to ensure that engagement data collected from new users (edits attempted / saved, etc.) is annotated with latency measurements.
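To show why the latency annotation matters, here is a minimal sketch of one analysis it would enable. The event fields latency_ms and saved and the function save_rate_by_latency are hypothetical; the actual schema for engagement data is not described here.

```python
from collections import defaultdict


def save_rate_by_latency(events, bucket_ms=500):
    """Group edit attempts into latency buckets and return the fraction saved in each.

    Each event is assumed to be a dict with `latency_ms` (number) and `saved` (bool).
    """
    attempts = defaultdict(int)
    saves = defaultdict(int)
    for event in events:
        # The bucket is labeled by its lower bound, e.g. 0, 500, 1000, ...
        bucket = int(event["latency_ms"] // bucket_ms) * bucket_ms
        attempts[bucket] += 1
        if event["saved"]:
            saves[bucket] += 1
    return {bucket: saves[bucket] / attempts[bucket] for bucket in sorted(attempts)}
```

Without a latency value attached to each event at collection time, this kind of grouping would require joining against separately collected performance logs after the fact.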

HipHop
HHVM is maturing quite quickly, and it has the potential to vastly improve performance. However, although the set of features supported by HHVM is nominally sufficient for running MediaWiki in production, a lot of work must be done to verify that the software components essential to our MediaWiki deployment work properly when running under HHVM. The big-ticket compatibility issue that we already know we have is the need to patch or rewrite several PHP extensions (wikidiff2, wmerrors, LuaSandbox) so that they run under HHVM. This work has a very long tail, and we hope to engage other teams so that they help us see it through. To do this, some infrastructure for collaboration needs to be in place:
 * Packages and Puppet manifests for provisioning HHVM on Ubuntu Precise.
 * Jenkins job that tests patches to core and extensions using HHVM.
 * Jenkins job that runs the full suite of unit tests against HHVM.
 * An HHVM role for MediaWiki-Vagrant.

The HHVM development team has expressed a strong commitment to ensuring that HHVM is compatible with the most popular PHP frameworks. We would like to capitalize on this momentum. The team is generally very interested in moving this project forward, and working on it would boost morale.

Admin tools sprint
We sadly keep kicking this can down the road, but unfortunately we need to keep kicking, as we're oversubscribed this quarter.

Deferred

 * Central code repo
 * Configuration database
 * Gerrit improvements (further BZ integration)