Wikimedia Platform Engineering/MediaWiki Core Team/Backlog

This page contains the backlog for the MediaWiki Core team. This backlog is maintained by the Product Manager for Platform (currently Dan Garry).

Items designated as high priority are either in active development, or are being considered for development during either the current or next quarter. Items are designated as medium priority if they lack the necessary details to be worked on or if they are not planned for the current or next quarter. Items are designated as low priority if they are recognised as good ideas, but are not currently being considered in the next few quarters.

For more details of the process behind this backlog, see the process subpage.

In progress

 * HHVM
 * CirrusSearch

Job queue work
There have been repeated calls in the past for a "proper" job queue implementation for Wikimedia production use, where "proper" means "not relying on a shell script". Some recent discussion on the operations@ mailing list led the team to at least consider this as a possible near-term project.

The tracking bug that comes closest to describing this project is Bug 46770, “Rewrite jobs-loop.sh in a proper programming language”. That is probably an oversimplification, though: in addition to removing the shell scripting, we would at least like better monitoring of the system and a more sophisticated scheme for assigning priorities to jobs. Aaron Schulz has already made many incremental improvements over the past 18 months that, taken together, amount to a massive improvement in the features and reliability of the system. Hence, we've decided not to make further work on this a priority until an urgent need is identified.

Benefits
 * Better monitoring
 * Fix poor code quality
 * Better performance (less overhead, since fewer processes need to be started)
 * The ability to wrap runJobs.php in a per-section PoolCounter pool. It has recently become apparent (Ie5bb11b0) that if the CPU power available to the job runners is going to be efficiently utilised, then the number of job runners running on any given DB master needs to be limited.
 * Make it straightforward to provide a configuration file. Currently Puppet configures jobs-loop.sh in two different ways simultaneously: by changing the command line parameters via JR_EXTRA_ARGS in mw-job-runner.default, and by altering the code itself by making the shell script be a template. I don't think either is an especially elegant configuration method.
 * The ability to check the exit status of runJobs.php, somehow avoiding a tight loop of respawns.
 * An error log might be nice too.
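The exit-status point above can be sketched as a loop that backs off exponentially on failure instead of respawning immediately. Everything here is hypothetical: `run_jobs` is a demo stub (failing twice, then succeeding) standing in for an actual `php runJobs.php` invocation, and the delay values are invented.

```shell
#!/bin/bash
# Hypothetical sketch: respawn the job runner, but inspect its exit
# status and back off exponentially on failure rather than respawning
# in a tight loop. run_jobs is a demo stub for "php runJobs.php ...".
attempt=0
run_jobs() {
    attempt=$((attempt + 1))
    [ "$attempt" -ge 3 ]   # stub: nonzero status on the first two attempts
}

delay=1       # seconds to sleep after a failure
max_delay=64  # cap on the backoff

while true; do
    if run_jobs; then
        echo "runner exited cleanly after $attempt attempt(s)"
        break   # a real loop would reset delay=1 and keep running
    fi
    echo "runner failed; backing off for ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    if [ "$delay" -gt "$max_delay" ]; then
        delay=$max_delay
    fi
done
```

A real implementation would also want to distinguish "no jobs available" from genuine failures, which is exactly the kind of information a plain shell wrapper discards today.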

Other work:
 * Implement immediate-priority jobs (essentially replacing DeferredUpdates?)

Push Phabricator to production
The Engineering Community Team is currently working toward consensus on standardising project management, and has arrived at the possibility of a much bigger project: replacing many of our tools (Gerrit, Bugzilla, Trello, Mingle, maybe even RT) with Phabricator.

File handling and thumbnailing
The Multimedia team will, at some point, need to embark on substantial work on our file upload pipeline, how we name files, and how we handle thumbnailing. Aaron Schulz in particular has deep expertise in this area. At this point, the Multimedia team hasn't requested help in this area, so it's premature for us to get involved, but this may come up again in July.
 * Next step: scoping by Aaron and/or Gilles
 * Include version in thumbnail URL (17577)

Elasticsearch

 * Elasticsearch category intersection with simple JS UI embedded on CategoryPage, presented to the user as "filtering" or "refining" a category.
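A category intersection maps naturally onto an Elasticsearch bool query with one filter per category. The sketch below just builds that query as plain JSON; the `category` field name and the use of `term` filters are assumptions for illustration, not confirmed details of the CirrusSearch mapping.

```python
import json

def category_intersection_query(categories):
    """Build an Elasticsearch query matching pages that belong to
    *all* of the given categories: a bool query with one term filter
    per category. The "category" field name is hypothetical."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {"category": c}} for c in categories]
            }
        }
    }

q = category_intersection_query(["Physicists", "1879 births"])
print(json.dumps(q, indent=2))
```

Since filters are cached and carry no scoring cost, intersections like this stay cheap even for large categories, which is what would make a simple "refine this category" UI feasible.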

Setup.php improvements

 * Setup.php speed improvements and service registry

Next step: Aaron to explain what this means :-)

Dedicated cluster for private/closed wikis

 * Dedicated cluster for private/closed wikis (e.g. officewiki)

Authn/z service to interact with other services
As we move towards independent services (SOA), we need a system for identifying users and their rights across services, which involves building infrastructure for providing proper authentication tokens for API access. Possible solutions include OAuth 2, Kerberos, or a custom solution built on JSON Web Tokens; the options for non-interactive applications are not as clear as they are for interactive web applications (where basic OAuth works pretty well).
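As a concrete illustration of the JSON Web Token option, the sketch below signs and verifies an HS256 token using only the Python standard library. The claim names and shared secret are invented for the example; a real deployment would add expiry claims and key management.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Produce a compact HS256 JWT: header.payload.signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, secret: bytes) -> dict:
    """Check the HMAC signature and return the decoded claims."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Hypothetical claims a service-to-service call might carry
token = sign_jwt({"sub": "ExampleUser", "rights": ["edit"]}, b"shared-secret")
claims = verify_jwt(token, b"shared-secret")
```

The appeal for non-interactive use is that any service holding the shared secret (or, with RS256, the public key) can verify a token locally without a round trip to a central auth service.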

Revision storage revamp
This would involve:
 * Specifying an API for storage of revisions; anything implementing that API could be trivially used with MediaWiki.
 * Refactoring core so that revision storage is an implementation of that API.
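To make the idea concrete, a storage API of this kind might look like the following sketch. Every name here is hypothetical (none of it is MediaWiki core code), and a real interface would also cover metadata queries, blob addressing, and batching.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class RevisionRecord:
    """Minimal stand-in for a stored revision; real records would
    carry timestamp, author, comment, content model, etc."""
    rev_id: int
    page_id: int
    text: str

class RevisionStore(ABC):
    """The abstract API: anything implementing these methods could
    back MediaWiki's revision storage."""
    @abstractmethod
    def get(self, rev_id: int) -> Optional[RevisionRecord]: ...

    @abstractmethod
    def insert(self, record: RevisionRecord) -> None: ...

class InMemoryRevisionStore(RevisionStore):
    """Trivial reference implementation. A SQL-backed store (for
    developer installs) or a service-backed one would implement the
    same two methods."""
    def __init__(self):
        self._revs = {}

    def get(self, rev_id):
        return self._revs.get(rev_id)

    def insert(self, record):
        self._revs[record.rev_id] = record

store = InMemoryRevisionStore()
store.insert(RevisionRecord(rev_id=1, page_id=10, text="Hello, wiki"))
```

The point of the abstraction is that calling code depends only on `RevisionStore`, so the backing implementation can be swapped without touching callers.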

There are two parts to revision storage: revision metadata and revision text. For revision text storage, we have External Storage, which was very innovative in its day, but is showing its age. Due to the design, reads and writes are largely focused on a single node (where the most recent revisions are), so adding nodes doesn't necessarily improve performance. The compression scripts may be fine, but we don't know for sure because we're a bit afraid of running them to find out. Thus, we're not really getting the benefit of the compression offered by the system.

A likely solution for storing revision text is Rashomon. Metadata could also be stored in Rashomon, but we may still need a copy of it in our SQL database for fast and simple querying. Regardless of our plans for Rashomon, there is significant work involved in MediaWiki to abstract away the many built-in assumptions our code makes about retrieving revision metadata directly from a database.

This project would be done in service of a broader push toward a service-oriented architecture. We would use it to set an example for how we foresee other aspects of MediaWiki being turned into modular services, and would likely take the opportunity to further establish use of the value-object pattern exemplified in TitleValue. As part of this work, we would also provide a simple SQL-based implementation for developer installations of MediaWiki. The related infrastructure for providing proper authentication tokens for API access is covered in the authn/z section above.

This project also offers us an opportunity to make revision storage a nicely abstracted interface in core, and could be quite complementary to Rashomon, giving Gabriel some clean API points to plug it in. It also lets us establish the template for how MediaWiki can be incrementally refactored into cleanly separated services.

After some investigation, we decided to pass on this project. What made us consider it was indications that the revision table for English Wikipedia is becoming very unwieldy, and that it's really time for a sharded implementation. However, Sean Pringle made some adjustments, and we're comfortable that we have enough headroom that this project isn't as urgent as we first thought.

OpenID Connect
One thing we've long wanted to do as complementary work to our OAuth work is to implement some form of standardised federated login. We briefly considered work on OpenID Connect, which integrates well with OAuth. Doing work in this area would make Phabricator integration with our Wikimedia project logins very simple.

See also: OpenID provider (request for OpenID for use in Tool Labs)

Localisation cache write performance improvement
One thing that makes our current deployments unreliable is the many workarounds we rely on to avoid rewriting the l10n cache. Ideally, we would use "scap" no matter how small the change we're making to the site, but that's not practical because l10n cache rebuild and distribution often takes over 10 minutes. Some of us have nascent ideas about how to make this faster, but we haven't yet agreed on a solution we want to pursue.

Next step: RFC from Ori and/or Bryan Davis

Admin tools
See Admin tools development

Next step: Dan to offer specific choices for the group to consider.

git-deploy / deploy-tooling

 * will be fleshed out within the wider "Deployment status and improvements" work (on point: Greg)
 * Dependency: Localisation cache

Moving VCL logic to MediaWiki

 * Separate Cache-Control header for proxy and client (48835)
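HTTP already provides the split this bullet asks for: `s-maxage` governs shared caches such as Varnish, while `max-age` applies to the client's own cache. A minimal helper for emitting such a header (a hypothetical sketch, not MediaWiki code):

```python
def cache_control(proxy_ttl, client_ttl):
    """Build a Cache-Control value with separate TTLs: s-maxage for
    shared caches (e.g. Varnish), max-age for browsers. Per RFC 7234,
    s-maxage overrides max-age in shared caches."""
    return "s-maxage=%d, max-age=%d, public" % (proxy_ttl, client_ttl)

# Cache aggressively at the edge, but not in browsers:
header = cache_control(3600, 0)
```

Emitting both directives from MediaWiki would let the application, rather than VCL, decide how long proxies and clients may each hold a response.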

Central code repo
Gadgets, Lua, templates

LogStash

 * Bugs

User attribution

 * Infrastructure for "claim an edit" feature
 * Making user renaming simple and fast

User preferences
Possible high priority for another team.
 * Requests for comment/Redesign user preferences

Edit notices

 * 22102: Add edit notices/warnings in a consistent fashion, and put them in a sensible order

Unprioritized

 * SUL finalization
 * OutputPage refactor and/or better UI <-> backend separation
 * Better Captcha infrastructure (possibly as part of thumbnailing revamp)