PHP optimisation tips

Some tips for developers who want to make their code faster.

Benchmarking
Confirm any proposed performance improvement by benchmarking it.

You can benchmark by timing, for example, in eval.php:

> $t = microtime(true); for( $i = 0; $i < 100000000; $i++ ) { md5('testing'); } print microtime(true)-$t;
13.321235179901

MediaWiki has benchmarking scripts in maintenance/benchmarks, including the generic utility benchmarkEval.php:

php benchmarkEval.php --code="md5('testing')" --inner=1000000 --count=100

Multiple timing runs will vary substantially. To minimise the impact of this:


 * Use a large loop count to benchmark for a long time — at least 10 seconds.
 * Avoid any other system activity while the benchmark runs. If you are using your laptop, kill your browser and anything else that might wake up periodically.
 * Don't use a VM if there is any other activity on the same hardware.
 * Avoid unnecessary I/O within the benchmark. For example, disable logging.
 * Benchmark a small amount of code in a tight loop, so that the relative effect of the intervention will be larger.
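
As an illustration of the advice above, here is a minimal sketch of a timing helper. The function name benchLoop and its parameters are hypothetical (the parameter names only loosely echo the --inner and --count options shown earlier, and this is not a description of how benchmarkEval.php works). Reporting the fastest run is an assumption of this sketch, chosen because background activity can only make a run slower.

```php
<?php
// Hypothetical helper (not part of MediaWiki): times $fn in a tight loop,
// repeats the measurement $count times, and returns the fastest run in seconds.
function benchLoop( callable $fn, int $inner, int $count ): float {
	$best = INF;
	for ( $run = 0; $run < $count; $run++ ) {
		$start = hrtime( true ); // monotonic clock in nanoseconds (PHP 7.3+)
		for ( $i = 0; $i < $inner; $i++ ) {
			$fn();
		}
		$elapsed = ( hrtime( true ) - $start ) / 1e9;
		$best = min( $best, $elapsed );
	}
	return $best;
}

$seconds = benchLoop( static function () {
	md5( 'testing' );
}, 100000, 5 );
printf( "best of 5 runs: %.6f s\n", $seconds );
```

Calling the code through a closure adds function call overhead to every iteration, so for very cheap operations it may be better to inline the code in the loop, as in the eval.php example.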

Extremely accurate performance measurements can be made using hardware performance counters. On Linux you can use perf. On other operating systems, Intel VTune may be an option, for example vtune -collect-with runsa (this invocation has not been confirmed).

For example, running a benchmark under `perf stat -e instructions` will give a metric which is not affected by background activity on the same host. It tells you how much machine code is executed, which may be a decent model for cost depending on what you're measuring.

Array separation
Arrays in PHP have copy-on-write semantics. When you modify an array, if the reference count is one, the modification is done in place and is fast. If the reference count is more than one, the array needs to be copied before the modification can take place. This is termed "separation" in the PHP source, since you start with one value and separate it so that there are two values.

Array separation means that innocuous-looking code can be surprisingly slow.

Modification of the iteration pointer by reset requires separation, so the getFirstElement call takes 14ms.
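
The code behind this measurement is not shown here. As a rough sketch of the kind of code that behaves this way, assume a one-million-element array passed by value to a function that calls reset. The array size and both function bodies are assumptions; getFirstElementNoCopy is a hypothetical name for an alternative based on array_key_first.

```php
<?php
// Assumed reconstruction: the array size and function bodies are illustrative.
$a = array_fill( 0, 1000000, 1 );

// reset() takes its argument by reference and moves the internal pointer.
// $arr shares its storage with the caller's $a, so the array is copied first.
function getFirstElement( array $arr ) {
	return reset( $arr );
}

// array_key_first() (PHP 7.3+) only reads the array, so nothing is copied.
function getFirstElementNoCopy( array $arr ) {
	$key = array_key_first( $arr );
	return $key === null ? false : $arr[$key];
}

$t = microtime( true );
$first = getFirstElement( $a );
$withReset = microtime( true ) - $t;

$t = microtime( true );
$firstNoCopy = getFirstElementNoCopy( $a );
$withKeyFirst = microtime( true ) - $t;

printf( "reset: %.6f s, array_key_first: %.6f s\n", $withReset, $withKeyFirst );
```

A single timing like this is noisy; the benchmarking advice earlier on this page applies here too.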

Anything that takes an array as input and returns a modified version of the array will require O(N) time.

For example, building an array of n = 1000 elements by appending one element at a time through such a function requires n(n+1)/2 = 500500 single-element operations, for a running time of 1ms. The alternative using $a[] = 1 is 50 times faster.
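
The snippet these figures refer to is not reproduced here. The following sketch shows the kind of loop they describe, assuming array_merge is the function involved and n = 1000 (which matches 500500 = 1000 × 1001 / 2), next to the $a[] = 1 alternative.

```php
<?php
$n = 1000;

// Slow: each array_merge() call builds a new array, copying every existing element.
$t = microtime( true );
$a = [];
for ( $i = 0; $i < $n; $i++ ) {
	$a = array_merge( $a, [ 1 ] );
}
$mergeTime = microtime( true ) - $t;

// Fast: $b has a reference count of one, so each append happens in place.
$t = microtime( true );
$b = [];
for ( $i = 0; $i < $n; $i++ ) {
	$b[] = 1;
}
$appendTime = microtime( true ) - $t;

printf( "array_merge: %.6f s, append: %.6f s\n", $mergeTime, $appendTime );
```

Both loops produce the same array; only the amount of copying differs.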

Some observations about built-in functions:


 * reset and end should be avoided. PHP 7.3+ provides array_key_first and array_key_last as alternatives.
 * array_merge should be replaced by a loop when the result replaces the first argument, especially if the first argument is large.
 * array_pop and array_key_last are fast as long as there are not too many holes at the end of the array.
 * array_push is O(1), although $a[] = ... is faster unless there is a very large number of arguments.
 * array_splice is slow despite its apparent in-place semantics. It always copies its input arguments.
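
Two of the observations above can be sketched as follows. The data is hypothetical, and the loop is only equivalent to array_merge when both arrays are lists with sequential integer keys; with string keys array_merge behaves differently.

```php
<?php
// Hypothetical data: a large list and a small batch to append to it.
$big = range( 1, 100000 );
$batch = [ 'x', 'y', 'z' ];

// Instead of: $big = array_merge( $big, $batch );
// append in place, so the large array is not copied.
foreach ( $batch as $value ) {
	$big[] = $value;
}

// Instead of: $last = end( $big );
// read the last element without touching the iteration pointer (PHP 7.3+).
$lastKey = array_key_last( $big );
$last = $lastKey === null ? null : $big[$lastKey];

print $last . "\n";
```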

Constant factors
PHP code is compiled to an array of operations (an oparray). The PHP VM traverses the oparray, executing each opcode as it finds it. Some ops are faster than others.


 * Local variable access is heavily optimised and is generally fast. The only slow thing you can do with local variables is accessing them by name, e.g. $$varname = 1. This builds a hashtable of local variables at the first instance of such code in a function.
 * Function calls are relatively slow due to the need to initialise a new stack frame. Userspace function calls are slightly slower than built-in function calls.
 * Some things that look like functions are actually special opcodes. This makes them faster. For example, count, strlen, isset and empty are fast.
 * Object construction is comparable to a function call.
 * Access to declared properties of an object is pretty fast, since this has been heavily optimised. It is faster to access a declared property than an undeclared property or an element of an associative array. But in a tight loop it might still be worthwhile to copy an object property to a local variable.
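
The last point can be illustrated with a small sketch. The Scaler class is hypothetical and exists only to show the pattern; whether the copy is worthwhile in real code should be confirmed by benchmarking.

```php
<?php
// Hypothetical class, used only to illustrate the pattern.
class Scaler {
	/** @var float Declared property */
	private $factor;

	public function __construct( float $factor ) {
		$this->factor = $factor;
	}

	public function scaleAll( array $values ): array {
		// Copy the property to a local variable once, outside the loop,
		// so the loop body only touches local variables.
		$factor = $this->factor;
		$out = [];
		foreach ( $values as $v ) {
			$out[] = $v * $factor;
		}
		return $out;
	}
}

$scaled = ( new Scaler( 2.0 ) )->scaleAll( [ 1, 2, 3 ] );
print implode( ', ', $scaled ) . "\n";
```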

The PHP compiler is not as smart as a C compiler because it operates under strict time constraints. Do not assume the PHP compiler is going to help you out by optimising away your slow code. With a few exceptions, what you write is what it executes.

Caching and memoization
Memoization means caching the result of a function call in a way that is transparent to the caller. It is often an easy and effective way to improve performance.
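
A minimal sketch of memoization using a static variable follows. The functions slowSquare and square are hypothetical; slowSquare stands in for any expensive function whose result depends only on its argument.

```php
<?php
// Hypothetical stand-in for an expensive, side-effect-free computation.
function slowSquare( int $n ): int {
	usleep( 1000 ); // simulate expensive work
	return $n * $n;
}

// Memoized wrapper: same signature and result, so callers need not know
// that a cache is involved.
function square( int $n ): int {
	static $cache = [];
	if ( !isset( $cache[$n] ) ) {
		$cache[$n] = slowSquare( $n );
	}
	return $cache[$n];
}

print square( 12 ) . "\n"; // computed
print square( 12 ) . "\n"; // served from the cache
```

This cache is never emptied and lives for the duration of the request, so the approach fits functions with a bounded set of inputs. The isset check works here because the cached values are never null.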

The optimisation operator
Most languages have an optimisation operator. The optimisation operator makes any code faster when the operator is placed in front of the code in question.

In other words, it is faster to not do a thing than to do the thing. Engineers need to push back against expensive requirements. Product managers do not necessarily understand the cost of a requirement in terms of user-observed latency or hardware cost.

Wirth's law states that software becomes slower as hardware becomes faster. Increasing abstractions and features keep pace with hardware improvements so that the benefits of hardware improvements are never seen by users. In fact, latency tends to increase over time.

Wirth's law is inherent in the way engineers think and operate. Engineers cannot resist the dopamine hit which comes from introducing a neat abstraction. This tendency must be consciously and continuously resisted to preserve a reasonable user experience.