Wikimedia Platform Engineering/MediaWiki Core Team/Quarterly review, January 2014

This will be an outline for a review of the MediaWiki Core team which will take place at WMF on January 21, 2014.

We are currently sorting through items on the Wikimedia MediaWiki Core Team/Ideas list.

 * Notes for this review

Team
See Wikimedia MediaWiki Core Team page.

Changes since last review: (none)

Search (ElasticSearch/CirrusSearch) deployment
Nik Everett and Chad Horohoe have been working on this through the past quarter. We optimistically projected we'd be able to complete this deployment by the end of 2013, and disable our previous search technology (lsearchd) in January 2014. This rollout is going pretty well, but it's going to take a bit longer to get fully switched over. Several wikis are now using CirrusSearch as the primary search engine, constituting roughly 8% of our search traffic as of this writing. Also as of this writing, we are indexing English Wikipedia. If that indexing goes smoothly and the index size is manageable, we may attempt switching it next. We currently project that we'll be done rolling this out by the end of March with the ability to turn off lsearchd sometime shortly thereafter, subject to hardware availability and how smoothly our gradual rollout goes.

Architecture formalization
We held fortnightly IRC meetings to discuss many in-progress RFCs. Additionally, much of the planning for the Architecture summit happened in the quarter.

DevOps sprint
We haven't finalized the personnel allocation in this area. The main areas of focus for this sprint are:
 * git-deploy
 * monitoring / reporting
 * deployment script improvements
 * multi-site awareness
 * Labs related

Wikimania scholarships app
Bryan Davis refactored many parts of the Scholarship code, working with Chad Horohoe and Katie Filbert on code review and some coding.

Auth systems
Chris Steipp will continue work in this area as a solo effort, focusing mainly on refining our password expiration protocol as well as improving our password hashing.

Security auditing and response
Chris Steipp will also continue his work on security auditing. The review queue for this has stacked up, with reviews promised for Limn, TimedMediaHandler v2, Kraken, and GLAM upload, among others.

Performance
Since this is Ori's first quarter in his new role, his priority will be to understand and articulate the current state of site performance, with a focus on performance blind spots, i.e., performance bottlenecks that fall outside the purview of current Foundation engineering projects and existing expertise. He plans to bring this work to bear by building a set of tools and visualizations that make these bottlenecks visible and by building tools that help MediaWiki developers and gadget and template authors profile, understand, and optimize the performance impact of their work. Examples of this are the mediawiki.inspect module, which lets savvy users scrutinize the static asset payload of a MediaWiki page in their browser debug console, and the instrumentation of real user monitoring in Ganglia and Graphite for page views and for VisualEditor.

Ori has also been working on augmenting client-side asset caching by using the Web Storage API to cache ResourceLoader modules (change If2ad2d80d). He would like to see this work through to completion and deployment in this quarter.

Ori also plans to continue working with the TechOps team on improving the state of profiling and monitoring tools on the cluster. A realistic goal for this quarter is to finish migrating Graphite from Tampa to Ashburn and to improve the usability of the interface provided by MediaWiki core for logging performance metrics to a remote host for aggregation and analysis.

Finally, Ori is interested in working with Dan Garry and with folks in Analytics, Fundraising and Features to begin the work of correlating site performance with editor engagement metrics and other product goals, with a view toward being able to relate the value of performance and infrastructure work to the Foundation's mission. This is a long-term project, obviously. A specific way in which it could be advanced this quarter is to ensure that engagement data collected from new users (edits attempted / saved, etc.) is annotated with latency measurements.

HipHop VM Deployment
As stated in the previous review, HHVM is maturing quite quickly, and it has the potential to vastly improve performance. However, although the set of features supported by HHVM is nominally sufficient for running MediaWiki in production, a lot of work must be done to verify that the set of software components that are essential to our MediaWiki deployment work properly when running under HHVM. This work has a very long tail and we hope to engage other teams in this work so that they help us see it through. To do this, some infrastructure for collaboration needs to be in place:
 * Packages and Puppet manifests for provisioning HHVM on Ubuntu Precise.
 * Jenkins job that tests patches to core and extensions using HHVM.
 * Jenkins job that runs the full suite of unit tests against HHVM.
 * An HHVM role for MediaWiki-Vagrant.

The big-ticket compatibility issue that we already know we have is the need to patch or rewrite several PHP extensions (wikidiff2, wmerrors, LuaSandbox) so that they run under HHVM.

Tim Starling, Chad Horohoe, Ori Livneh, Aaron Schulz and Antoine Musso plan to play a role in this. The goal for this coming quarter is to have at least one production service migrated over to HipHop VM.

Search
Chad and Nik plan to have all sites using CirrusSearch (and thus, ElasticSearch) as the primary search engine by the end of the quarter.

Deployment-related Development
LogStash is a new logging framework which should make it much easier to view and query system logs for purposes of debugging. Several team members have been and will continue to be involved in work on this as it nears initial rollout: Bryan Davis, Ori Livneh, Aaron Schulz and Antoine Musso.

Bryan Davis will work on requirements for a migration from scap to another tool.

Security
Password storage update, Security reviews

Admin tools development
Dan Garry will be scoping this project.

SecurePoll cleanup
Dan Garry will be scoping this project, with Brad Jorsch doing development work if it can be scoped on time.

Allocations
This is our planned allocation for January through March of 2014:
 * Tim Starling: HHVM, Architecture/RFC Review
 * Bryan Davis: git-deploy, LogStash
 * Nik Everett: Search, Search in Wikidata
 * Chad Horohoe: HHVM, Search
 * Brad Jorsch: SecurePoll cleanup, PDF rendering, API Maintenance, Scribunto maintenance
 * Ori Livneh: HHVM, git-deploy, LogStash, Performance Infrastructure
 * Aaron Schulz: HHVM, git-deploy, LogStash, Password storage update, l10n cache
 * Chris Steipp: Password storage update, Security reviews
 * Antoine Musso: HHVM, LogStash, JobQueue, Zuul upgrade
 * Sam Reed: Deployments
 * Dan Garry: Admin tools development (scoping out), SecurePoll (scoping out), SUL finalisation