Wikimedia Platform Engineering/Site performance and architecture/status

Last update on: 2013-02-monthly

2012-06-29
An initial investigation has begun on the possibility of upgrading from PHP 5.3 to PHP 5.4. Benchmarks are very promising, but a security enhancement we currently using with PHP 5.3 (Suhosin) is not yet available for PHP 5.4, so the team is debating whether to carry on without it, as well as estimating the performance penalty introduced by this patch. More improvements have been made to Ganglia and Graphite.

2012-06-monthly
An initial investigation has begun on the possibility of upgrading from PHP 5.3 to PHP 5.4. Benchmarks are very promising, but a security enhancement we are currently using with PHP 5.3 (Suhosin) is not yet available for PHP 5.4, so the team is debating whether to carry on without it, as well as estimating the performance penalty introduced by this patch. More improvements have been made to Ganglia and Graphite.

2012-07-monthly
Tim Starling investigated an LLVM PHP bytecode converter this month, which looked like a promising direction for performance optimization (slides here). The theoretical gain seems pretty significant, but actual performance he was able to observe was disappointing and we probably won't go in that direction. Asher Feldman has deployed an upgraded version of the parser cache server (db40) and the results have been impressive. Comparing 90th percentile and 99th percentile cache response times averaged over several days (July 3-5) for the parser cache server versus the last 8 hour for new improved parsercache shows 90th percentile response time dropping from 53.6ms to 7.17ms, and 99th percentile response time dropping from 185.3ms to 17.1ms. This is relevant to every page request from logged in and cookied logged out users so should have a meaningful impact on the user experience.

2012-08-monthly
In addition to the Lua work, Tim Starling did some investigation of parallel parsing, but that project may go on the backburner until after Parsoid goes into production.

2012-10-monthly
Tim Starling committed changes to our implementation of libxml to use the PHP memory allocator, rather than using malloc (the C standard for allocating memory directly managed by the operating system). This will allow us to have per-page limits on the complexity of pages in a way that more closely mirrors their impact on our cluster.

2012-11-monthly
Various improvements to the job queue have been made to avoid CPU time wasted on duplicate jobs and redundant page cache purges. Changes have also been made to make it possible to edit heavily used templates without timeouts.

2012-12-monthly
After an assessment by Asher Feldman, Patrick Reilly and Tim Starling, the RDB database patch was canceled. Instead, in the short term, a separate vertically partitioned data cluster will be provided as a temporary storage until a horizontally scalable architecture can be finalized. Matthias Mullie is modifying the RDB-dependent ArticleFeedbackToolv5 to remove that dependency through an abstraction layer. When a sharded or horizontally scaled solution is provided, AFTv5's abstraction will be migrated.

An initial assessment of various non-MySQL alternatives for using Aaron Schulz's JobQueue core patch in 1.20 is being done for Echo. Because of the time it takes to exhaust the Echo queues, it is written to bypass the JobQueue through direct calls. Luke Welling is abstracting the JobQueue for Redis, ZeroMQ, and others.

2013-01-monthly
A patch to allow moving the DB job queue to another cluster is under review. An experimental redis-based job queue patch also exists in gerrit.

2013-02-monthly
<section begin="2013-02-monthly"/>A patch to allow moving the job queue to another DB cluster has been merged and a patch for support for an alternative Redis based queue is in review in gerrit. Currently, job related operations consume a significant portion of production database master walltime.<section end="2013-02-monthly"/>