Wikimedia Platform Engineering/MediaWiki Core Team/Backlog

This page contains the backlog for the MediaWiki Core team. This backlog is maintained by the Product Manager for Platform (currently Dan Garry).

Items designated as high priority are either in active development, or are being considered for development during either the current or next quarter. Items are designated as medium priority if they lack the necessary details to be worked on or if they are not planned for the current or next quarter. Items are designated as low priority if they are recognised as good ideas, but are not currently being considered in the next few quarters.

For more details of the process behind this backlog, see the process subpage.

In progress

 * HHVM
 * CirrusSearch

Revision storage revamp
This would involve:
 * Specifying an API for storage of revisions; anything implementing that API could be trivially used with MediaWiki.
 * Refactor core so that revision storage is an implementation of that API.

There are two parts to revision storage: revision metadata and revision text. For revision text storage, we have External Storage, which was very innovative in its day, but is showing its age. Due to the design, reads and writes are largely focused on a single node (where the most recent revisions are), so adding nodes doesn't necessarily improve performance. The compression scripts may be fine, but we don't know for sure because we're a bit afraid of running them to find out. Thus, we're not really getting the benefit of the compression offered by the system.

A likely solution for storing revision text is Rashamon. Metadata could also be stored in Rashamon as well, but we may still need a copy of the metadata in our SQL database for fast and simple querying. Regardless of our plans for Rashamon, there is significant work involved in MediaWiki to abstract away the many built-in assumptions that our code has about retrieving revision metadata directly from a database.

This project would be done in service of a broader push toward a service-oriented architecture. We would use this project to set an example for how we foresee other aspects of MediaWiki being turned into modular services, and would likely use this as an opportunity to further establish use of the value-object pattern exemplified in TitleValue. As part of this work, we would also provide a simple SQL-based implementation for developer installations of MediaWiki. We would also like to work on the infrastructure for providing proper authentication tokens for API access. Possible solutions for this include oAuth 2, Kerberos, or a custom solution built on JSON Web Tokens; the solutions for non-interactive applications are not as clear as they are for interactive web applications (where basic oAuth works pretty well).

This project also offers us an opportunity to make revision storage a nicely abstracted interface in core, and could be quite complementary to Rashamon, giving Gabriel some clean API points to plug it in. It also offers us the ability to establish the template for how MediaWiki can be incrementally refactored into cleanly-separated services.

After some investigation, we decided to pass on this project. What made us consider this project was indications that our current revision table for English Wikipedia is becoming very unwieldy, and it's really time for a sharded implementation. However, Sean Pringle made some adjustments, and we're comfortable that we have enough headway that this project isn't as urgent as we first thought.

Job queue work
There have been repeated calls in the past for a "proper" job queue implementation for Wikimedia production use, where "proper" means "not relying on cron". Some recent discussion on the operations@ mailing list led the team to at least consider this as a possible near-term project.

The tracking bug for that comes closest to describing this project is Bug 46770 “Rewrite jobs-loop.sh in a proper programming language”. That is probably an oversimplification, though. In addition to removing shell scripting, we would at least like better monitoring of the system and more sophisticated system for assigning priorities to jobs. Aaron Schulz has already made the system much better with many incremental improvements over the past 18 months that, taken together, represent a pretty massive improvement to the features and reliability of the system. Hence, we've decided not to make further work on this a priority until an urgent need is identified.

Push Phabricator to production
The Engineering Community Team is currently working toward consensus around standardizing project management, and ended up with the possibility of a much bigger project: replacing many of our tools (Gerrit, Bugzilla, Trello, Mingle, maybe even RT) with Phabricator. This would need a substantial investment in engineering resources, but it's not entirely clear where this would come from. Platform Engineering may need help from other groups in order to accomplish this. It is not likely that members of MediaWiki Core would be substantially available for such a project in the coming quarter.

OpenID connect
One thing we've long wanted to do as complementary work with our OAuth work is to implement some form of standardized federated login. We briefly considered work on OpenID connect, which integrates well with OAuth as a project. Doing work in this area would make Phabricator integration with our Wikimedia project logins very simple.

File upload pipeline
The Multimedia team will, at some point, need to embark on substantial work on our file upload pipeline. Aaron Schulz in particular has deep expertise in this area. At this point, the Multimedia team hasn't requested help in this area, so it's premature for us to get involved, but this may come up again in July.

Localisation cache write performance improvement
One thing that makes our current deployments unreliable is the fact that we have many workarounds to make it possible to avoid rewriting the l10n cache. Ideally, we would use "scap" no matter how small the change we're making to the site, but that's not practical due to l10n cache rebuild and distribution often taking over 10 minutes. Some of us have some nascent ideas about how to make this faster, but we haven't yet agreed on a solution we want to pursue.

Other

 * Admin tools
 * See Admin tools development
 * git-deploy / deploy-tooling
 * will be fleshed out within the wider "Deployment status and improvements" work (on point: Greg)
 * Caching improvements:
 * Include version in thumbnail URL (17577)
 * Separate Cache-Control header for proxy and client (48835)
 * JobQueue
 * Implement immediate priority jobs (replaces DeferredUpdates basically?)
 * Blame map extension (we should do enough to avoid blocking Luca's work)
 * Figure out Parsoid strategy
 * SecurePoll cleanup
 * Central code repo
 * LogStash
 * buuugs
 * User preferences
 * Requests for comment/Redesign user preferences
 * HHVM
 * Non-web things, like job queue & terbium crons are probably the lowest hanging fruit
 * OpenID connect
 * OpenID provider
 * File upload pipeline (in conjunction with Multimedia team)
 * Scribunto improvements
 * Central wiki repo
 * Central git repo
 * ElasticSearch category intersection with simple JS UI embedded on CategoryPage, presented to the user as "filtering" or "refining" a category.
 * Setup.php speed improvements and service registry
 * 22102: Add edit notices/warnings in a consistent fashion, and put them in a sensible order
 * Phabricator
 * Project management tools/Review/Options
 * Dedicated cluster for private/closed wikis (e.g. officewiki)
 * Infrastructure for "claim an edit" feature