Manual:Parser cache/Serialization compatibility

Since the ParserCache persists serialized ParserOutput objects for a relatively long time, we must consider what happens when we change, in any way, the code that reads and writes this data. Such changes include adding or removing fields, making fields optional or required, changing fields from single-value to multi-value, etc. Note: this extends to the serialization of any sub-structure that may be contained in a ParserOutput object, including "extension data" and "page props" that are not known to MediaWiki core.

In this context, we must consider both backward and forward compatibility:

Backward compatibility ensures that current code is able to read serialized data produced by older code. In the short term, this is a critical requirement when deploying new code to production: the new code must not break when reading older data, and must be able to actually use it. Note: it is not sufficient to simply discard old data - this would effectively mean running the site on a cold ParserCache, forcing all pages to be re-parsed on the next request. This is bound to overload application servers and bring down the site!

However, we also require backward compatibility in the long term, to allow third parties to update MediaWiki without losing or breaking their cache content. This means that code in a given release of MediaWiki has to be able to handle ParserCache entries generated by all previous releases that we support updating from (as of this writing in December 2022, this means two long-term support releases).
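
To make the backward-compatibility requirement concrete, here is a minimal sketch in Python. It is not MediaWiki code (ParserOutput is implemented in PHP), and the field names `category`, `categories`, and `extension_data` are hypothetical. It assumes an older writer stored a single value under `category`, while the current writer stores a list under `categories` and adds an `extension_data` field. The reader upgrades old entries instead of discarding them:

```python
import json

def read_entry(serialized: str) -> dict:
    """Decode a cache entry written by the current or an older writer."""
    data = json.loads(serialized)
    # Hypothetical older format: a single value stored under "category".
    if "categories" not in data:
        old_value = data.pop("category", None)
        data["categories"] = [] if old_value is None else [old_value]
    # A field added later is absent from old entries, so supply a default.
    data.setdefault("extension_data", {})
    return data

# One entry as an older writer might have produced it, one in the current format.
old_entry = json.dumps({"text": "<p>Hi</p>", "category": "Foo"})
new_entry = json.dumps(
    {"text": "<p>Hi</p>", "categories": ["Foo", "Bar"], "extension_data": {}}
)
```

Both `read_entry(old_entry)` and `read_entry(new_entry)` return a dictionary with a `categories` list, so old cache content stays usable rather than forcing a re-parse.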

In addition to backward compatibility, forward compatibility is required so that production is not broken when a deployment is rolled back. Rolling back a deployment causes older code to read cache entries that were generated by newer code. To make this safe, it is necessary to first deploy a change that ensures we can read data in the future format, and then wait long enough that we are sure we will not roll back to a version without this forward-compatibility code (that is, at least one deployment "train", better two). Only then can we safely deploy the code that writes the new data format to the cache, since we can be sure that, if we need to roll back, we will roll back to a version that can handle the new data.
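
The two-step deployment can be sketched as follows. This is an illustrative Python model, not MediaWiki code: the `title`/`titles` fields and all function names are hypothetical. It assumes the format change turns a single `title` into a `titles` list.

```python
import json

def write_v1(title: str) -> str:
    """Format written before the change (and still written by commit F)."""
    return json.dumps({"title": title})

def write_v2(title: str) -> str:
    """New format, written only once commit S is deployed."""
    return json.dumps({"titles": [title]})

def read_before_f(serialized: str) -> str:
    """Reader without forward compatibility: fails on v2 data."""
    return json.loads(serialized)["title"]

def read_with_f(serialized: str) -> str:
    """Reader from commit F: understands both the old and the future format."""
    data = json.loads(serialized)
    if "titles" in data:
        return data["titles"][0]
    return data["title"]
```

Under these assumptions, rolling back from S to F is safe, because `read_with_f(write_v2("Main Page"))` still succeeds. Rolling back to code from before F is not, because `read_before_f(write_v2("Main Page"))` raises a `KeyError`. That is why F has to be live for a while before S is merged.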


 * Suppose you have a commit that produces the new output, that is, the new serialization format. Let's call it commit S.
 * Confirm that it is backwards compatible by running ParserOutputTest.
 * Once it passes, make sure all changes are committed.
 * Now, run `tests/phpunit/includes/parser/validateParserCacheSerializationTestData.php --create --version 1.40-with-stuff` (where 1.40 is replaced by the actual development version of MediaWiki, and "with-stuff" is replaced by some description of your change). This will create serialization files with your modification applied.
 * Use `git add tests/phpunit/data/ParserCache/` followed by `git stash` to stash away the newly created files.
 * You now need to create a new commit that establishes forward compatibility. Let's call that commit F.
 * To create this new commit, first switch to the master branch (`git switch master`). If you like, create a feature branch using `git checkout -b forward-compatibility-for-my-stuff` (where "my-stuff" is replaced with a description of your change).
 * Use `git stash pop` to get the stashed data files.
 * Run ParserOutputTest to see if the old code is able to handle the new data. This emulates the case where the production environment is rolled back to an old version of the code after the new version has already been writing to the parser cache.
 * Add forward-compatibility code until the tests pass.
 * Create commit F, including the new data files and any modifications you had to make to establish forward compatibility.
 * Push F for review.
 * Rebase the original commit S on top of the new commit F. Mark it in some way so it does not get merged accidentally (e.g. using WIP, DNM, or CR-2). It must only be merged after patch F has been live for long enough that a rollback to a version without it is unlikely (for the WMF, that means at least one "train", better two to be safe).
 * In the rebased commit S, make the new data files the default for the current version of MediaWiki by removing the suffix you added to the file names using the --version parameter (in the example above, "-with-stuff"). If there are already serialization data files for the current version, they should be replaced with the updated ones.
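
The role of the versioned data files in the steps above can be illustrated with a small Python sketch. It only models the idea of checking a reader against stored serializations from several versions and makes no claim about how ParserOutputTest is actually implemented. The fixture contents, the `title`/`titles` fields, and the function names are hypothetical. The dictionary stands in for files under `tests/phpunit/data/ParserCache/`.

```python
import json

# Hypothetical fixtures keyed by "<version>[-<suffix>]".
FIXTURES = {
    "1.39": '{"title": "Main Page"}',
    "1.40": '{"title": "Main Page"}',
    "1.40-with-stuff": '{"titles": ["Main Page"]}',  # created from commit S
}

def read_titles(serialized: str) -> list:
    """Reader as of commit F: accepts the old and the new format."""
    data = json.loads(serialized)
    return data["titles"] if "titles" in data else [data["title"]]

def failing_fixtures(reader, fixtures: dict) -> list:
    """Return the names of all fixtures the given reader cannot handle."""
    failed = []
    for name, serialized in sorted(fixtures.items()):
        try:
            reader(serialized)
        except (KeyError, TypeError, ValueError):
            failed.append(name)
    return failed
```

With the forward-compatible reader, `failing_fixtures(read_titles, FIXTURES)` reports no failures. A reader that only knows the old format fails on the `1.40-with-stuff` fixture, which corresponds to the failing test run on master before the forward-compatibility code is added.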