Manual:Parser cache/Serialization compatibility

Since the ParserCache persists serialized ParserOutput for a relatively long time, we must consider what happens if we change the code that reads and writes the data in any way. This includes adding or removing fields, making fields optional or required, changing fields from single value to multi-value, etc. Note: this extends to the serialization of any sub-structure that may be contained in a ParserOutput object, including "extension data" and "page props" that are not known to MediaWiki core.

In this context, we must consider both backward and forward compatibility:

Backward compatibility ensures that current code is able to read serialized data produced by older code. In the short term, this is a critical requirement when deploying new code to production: the new code must not break when reading older data, and must be able to actually use it. Note: it is not sufficient to simply discard old data - this would effectively mean running the site on a cold ParserCache, forcing all pages to be re-parsed on the next request. This is bound to overload application servers and bring down the site!

However, we also require backwards compatibility long term, to allow third parties to update MediaWiki without losing or breaking their cache content. This means that code in a given release of MediaWiki has to be able to handle ParserCache entries generated by all previous releases that we support updating from (as of this writing in December of 2022, this means two long term support releases).
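As a language-neutral illustration (MediaWiki itself is written in PHP), the following minimal sketch shows what a backward compatible reader can look like. The entry layout, the field names `extra_flags` and `language`, and the function `read_entry` are hypothetical and do not reflect the actual ParserOutput format.

```python
import json


def read_entry(raw: str) -> dict:
    """Current reader: must accept entries written by older code."""
    data = json.loads(raw)
    # Hypothetical field added by newer code. Older entries lack it, so
    # supply a default instead of discarding the whole entry.
    data.setdefault("extra_flags", [])
    # Hypothetical field that changed from a single value to multi-value.
    if isinstance(data.get("language"), str):
        data["language"] = [data["language"]]
    return data


# An entry as older code might have written it.
old_entry = json.dumps({"text": "<p>Hello</p>", "language": "en"})
entry = read_entry(old_entry)
```

The point of the sketch is that the old entry stays usable after the upgrade, so the cache does not have to be repopulated.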

In addition to backwards compatibility, forward compatibility is required so that rolling back a deployment does not cause breakage. Rolling back a deployment causes older code to read cache entries that were generated by newer code. To make this safe, it is necessary to first deploy a change that ensures we can read data in the future format, and then wait long enough to be sure that we will not roll back to a version without this forward compatibility code (that is, at least one deployment "train", better two). Only then can we safely deploy the code that writes the new data format to the cache, since we can be sure that, if we need to roll back, we will roll back to a version that can handle the new data.
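The two-step rollout can be sketched as follows, again in Python rather than PHP and with an invented format. The functions `write_old`, `write_new` and `read_with_forward_compat`, and the nested `text` layout, are assumptions made for illustration only.

```python
import json


def write_old(html: str) -> str:
    """Format written by the code that is live before the change."""
    return json.dumps({"text": html})


def write_new(html: str) -> str:
    """Hypothetical future format; only deployed in the second step."""
    return json.dumps({"text": {"html": html}, "format": 2})


def read_with_forward_compat(raw: str) -> str:
    """Reader deployed in the first step: understands both formats."""
    text = json.loads(raw)["text"]
    if isinstance(text, dict):
        # Future format: not written by anything yet when this reader ships.
        return text["html"]
    return text


from_old = read_with_forward_compat(write_old("<p>Hi</p>"))
from_new = read_with_forward_compat(write_new("<p>Hi</p>"))
```

Once a reader like this has been live long enough, a rollback from the code that calls `write_new` lands on code that can still read what was written.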

To safeguard against breaking serialization compatibility, MediaWiki has PHPUnit tests in place that assert that current code is able to successfully unserialize older forms of cache data, and that the serialization format is not changed unintentionally. By itself, however, this only enforces backwards compatibility. To also ensure forward compatibility, the following procedure has to be followed:


 * You have a commit that changes what data gets written to the cache. Let's call this commit S.
 * Confirm that it is backwards compatible by running the PHPUnit serialization compatibility tests.
 * Once it passes, make sure all changes are committed.
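 * Now, run the command that creates the serialization test data files, using a version label built from 1.40 and a suffix such as "_with_stuff" (where 1.40 is replaced by the actual development version of MediaWiki, and "_with_stuff" is replaced by some description of your change. NOTE: no dashes allowed!). This will create serialization files with your modification applied.
 * Stage the newly created files and stash them away, for example with git add followed by git stash.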
 * You now need to create a new commit that establishes forward compatibility. Let's call that commit F.
 * To create this new commit, first switch to the master branch (for example with git checkout master). If you like, create a feature branch using git checkout -b my-stuff (where "my-stuff" is replaced with a description of your change).
 * Use git stash pop to get the stashed data files.
 * Run the serialization compatibility tests again to see if the old code is able to handle the new data. This emulates the case where the production environment is rolled back to an old version of the code, after the new version has already been writing to the parser cache.
 * Add forward-compatibility code as needed, until the tests pass.
 * Create commit F, including the new data files and any modifications you had to make to establish forward compatibility.
 * Push F for review.
 * Rebase the original commit S on top of the new commit F. The forward compatibility code may be removed during the rebase (typically it would be replaced by the backwards compatibility code).
 * The rebased commit S should be marked in some way so it does not get merged accidentally (e.g. using "WIP" or "DNM" or CR-2). It must only be merged after patch F has been live for long enough that a rollback is unlikely (for the WMF, that means at least one "train", better two to be safe).
 * In the rebased commit S, make the new data files the default for the current version of MediaWiki by removing the suffix you added to the file names when generating them (in the example above, it would be "_with_stuff"). If there are already serialization data files for the current version, they should be replaced with the updated ones. You may want to use a batch renaming utility for this.
 * After F has been live for sufficiently long, S can safely be merged and deployed. Note that any backward compatibility code needs to remain in place, and will become part of the next release.
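
The kind of check that this procedure relies on can be sketched as follows. This is not MediaWiki's PHPUnit test code: it is a minimal Python illustration in which the fixture contents, the version label "1.39", and the helpers `serialize`, `deserialize` and `check_fixtures` are all hypothetical.

```python
import json


def serialize(obj: dict) -> str:
    """Current writer; sorted keys keep the output stable for comparison."""
    return json.dumps(obj, sort_keys=True)


def deserialize(raw: str) -> dict:
    """Current reader; fills in a field that older data does not have."""
    data = json.loads(raw)
    data.setdefault("flags", [])
    return data


# Stored test data, one entry per supported version (contents invented).
FIXTURES = {
    "1.39": '{"text": "hi"}',
    "1.40": '{"flags": [], "text": "hi"}',
}
CURRENT_VERSION = "1.40"
EXPECTED = {"text": "hi", "flags": []}


def check_fixtures() -> list:
    failures = []
    # Backward compatibility: every stored version must still be readable.
    for version, raw in FIXTURES.items():
        if deserialize(raw) != EXPECTED:
            failures.append("cannot read data from " + version)
    # Guard against unintentional changes to the current format.
    if serialize(EXPECTED) != FIXTURES[CURRENT_VERSION]:
        failures.append("serialization format changed")
    return failures


failures = check_fixtures()
```

Running the same check with the old reader against data files produced by the new writer is what the forward compatibility step in the list above emulates.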