Manual:Parser cache/Serialization compatibility

From mediawiki.org

Since the ParserCache persists serialized ParserOutput for a relatively long time, we must consider what happens if we change the code that reads and writes the data in any way. This includes adding or removing fields, making fields optional or required, changing fields from single value to multi-value, etc.

This page specifically discusses caching ParserOutput objects and any sub-structure that may be contained in a ParserOutput object, including "extension data" and "page props" that are not known to MediaWiki core. However, in principle, all of this applies to any code that uses the JsonUnserializable interface or the PHP serialize() function. A change to the serialization format is much like a database schema change, which must also be forward-compatible and backward-compatible.[1]

In this context, we must consider both backward and forward compatibility:

Backward compatibility ensures that current code is able to read serialized data produced by older code. In the short term, this is a critical requirement when deploying new code to production: the new code must not break when reading older data, and must be able to actually use it. Note that it is not sufficient to simply discard old data: doing so would effectively mean running the site on a cold ParserCache, forcing every page to be re-parsed on its next request. This is bound to overload application servers and bring down the site!

However, we also require backward compatibility in the long term, so that third parties can update MediaWiki without losing or breaking their cache content. This means that code in a given release of MediaWiki has to be able to handle ParserCache entries generated by any previous release that we support updating from (as of April 2024, as documented at Version lifecycle, the previous two long term support releases).
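As an illustration of what tolerating older data can look like, the following Python sketch reads a JSON entry in which one field was added later and another changed from a single value to a list. It is not MediaWiki code: the function name read_entry and the field names text, categories, and extra_flags are hypothetical, chosen only to mirror the kinds of change mentioned at the top of this page.

```python
import json


def read_entry(serialized: str) -> dict:
    """Decode a cache entry written by current or by older code."""
    data = json.loads(serialized)

    # Field added in a later version: older entries lack it,
    # so fall back to a default instead of failing.
    data.setdefault("extra_flags", [])

    # Field changed from single value to multi-value:
    # wrap the scalar that older code wrote into a list.
    if isinstance(data.get("categories"), str):
        data["categories"] = [data["categories"]]

    return data


# An entry as older code might have written it (hypothetical shape).
old_entry = json.dumps({"text": "<p>Hi</p>", "categories": "Cats"})

# An entry as current code might write it (hypothetical shape).
new_entry = json.dumps(
    {"text": "<p>Hi</p>", "categories": ["Cats", "Pets"], "extra_flags": ["toc"]}
)
```

Both entries decode to the same shape, so the rest of the code never needs to know which version wrote them, and no old entry has to be discarded.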

In addition to backward compatibility, forward compatibility is required so that a deployment to production can be rolled back without breakage. Rolling back a deployment causes older code to read cache entries that were generated by newer code. To make this safe, it is necessary to first deploy a change that ensures we can read data in the future format, and then wait long enough that we are sure that we are not going to roll back to a version without this forward compatibility code (that is, at least one deployment "train"). Only then can we safely deploy the code that will write the new data format to the cache, since we can be sure that, if we need to roll back, we will roll back to a version which can handle that new data.
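The two-step rollout can be sketched in Python as follows. This is a simplified model under stated assumptions, not MediaWiki code: all function names are hypothetical, and the format change (a title field that turns from a plain string into an object) is invented for illustration.

```python
import json


def write_old(title: str) -> str:
    """Writer before the format change: title is a plain string."""
    return json.dumps({"title": title})


def write_new(title: str, lang: str) -> str:
    """Writer deployed in the second step: title becomes an object."""
    return json.dumps({"title": {"text": title, "lang": lang}})


def read_original(serialized: str) -> str:
    """Reader with no forward-compatibility code: knows only the old format."""
    title = json.loads(serialized)["title"]
    if not isinstance(title, str):
        raise ValueError("unknown title format")
    return title


def read_forward_compatible(serialized: str) -> str:
    """Reader deployed in the first step: accepts both formats."""
    title = json.loads(serialized)["title"]
    if isinstance(title, dict):
        # Future format: use the part that the old code understands.
        return title["text"]
    return title
```

If write_new were deployed while read_original is still the rollback target, a rollback would leave read_original facing entries it rejects. Once read_forward_compatible has been live for long enough, a rollback from write_new lands on code that can still read everything in the cache.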

To safeguard against breaking serialization compatibility, MediaWiki has PHPUnit tests in place that assert that current code is able to successfully unserialize older forms of parser cache data (see /tests/phpunit/data/ParserCache), and that the serialization format is not changed unintentionally. By itself, however, this only enforces backward compatibility. To also ensure forward compatibility, the following procedure has to be followed:

  • You have a commit that changes what data gets written to the cache. Let's call this commit S.
  • Confirm that it is backwards compatible by running ParserOutputTest.
  • Once it passes, make sure all changes are committed.
  • Now, run php maintenance/run.php ./tests/phpunit/includes/parser/validateParserCacheSerializationTestData.php --create --version 1.43_with_stuff (where 1.43 is replaced by the actual development version of MediaWiki, and "_with_stuff" is replaced by some description of your change - NOTE: no dashes allowed!). This will create serialization files with your modification applied.
  • Use git add tests/phpunit/data/ParserCache/ followed by git stash to stash away the newly created files.
  • You now need to create a new commit that establishes forward compatibility. Let's call that commit F.
  • To create this new commit, first switch to the master branch (git switch master). If you like, create a feature branch using git checkout -b forward-compatibility-for-my-stuff (where "my-stuff" is replaced with a description of your change).
  • Use git stash pop to get the stashed data files.
  • Run vendor/bin/phpunit tests/phpunit/includes/parser/ParserOutputTest.php to see if the old code is able to handle the new data. This emulates the case where the production environment is rolled back to an old version of the code, after the new version has already been writing to the parser cache.
  • Add forward-compatibility code as needed, until the tests pass.
  • Create commit F, including the new data files and any modifications you had to make to establish forward compatibility.
  • Push F for review.
  • Rebase the original commit S on top of the new commit F. The forward compatibility code may be removed during the rebase (typically it would be replaced by the backwards compatibility code).
  • The rebased commit S should be marked in some way so it does not get merged accidentally (e.g. using "WIP" or "DNM" or CR-2). It must only be merged after patch F has been live for long enough that a rollback is unlikely (for the WMF, that means at least one "train", better two to be safe).
  • In the rebased commit S, make the new data files the default for the current version of MediaWiki by removing any suffix you added to the file names using the --version parameter (in the example above, it would be "_with_stuff"). If there are already serialization data files for the current version, they should be replaced with the updated ones. You may want to use the GNU rename utility.
  • After F has been live for sufficiently long, S can safely be merged and deployed. Note that any backward compatibility code needs to remain in place, and will become part of the next release.
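The renaming step for the rebased commit S can also be scripted. The Python sketch below assumes that each data file name starts with the value passed to --version, followed by a dash and the rest of the name; that layout is an assumption and should be checked against the actual files in tests/phpunit/data/ParserCache/. The helper names default_name and promote are hypothetical.

```python
from pathlib import Path


def default_name(file_name: str, version: str, suffix: str) -> str:
    """Strip the --version suffix, e.g. '1.43_with_stuff-X' -> '1.43-X'."""
    prefix = f"{version}{suffix}-"
    if file_name.startswith(prefix):
        return f"{version}-" + file_name[len(prefix):]
    # Not one of the newly created files: leave the name alone.
    return file_name


def promote(directory: Path, version: str, suffix: str) -> list:
    """Rename suffixed files in place, replacing existing default files."""
    renamed = []
    # Materialize the listing first so renames do not disturb iteration.
    for path in sorted(directory.iterdir()):
        target = default_name(path.name, version, suffix)
        if target != path.name:
            # Path.replace overwrites an existing file with the target name.
            path.replace(path.with_name(target))
            renamed.append((path.name, target))
    return renamed
```

For the example command above, promote(Path("tests/phpunit/data/ParserCache"), "1.43", "_with_stuff") would be the corresponding call. Review the result with git status before committing.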

NOTE: Since Gerrit change 1016462 (MW 1.43), there is similar serialization test data in tests/phpunit/data/Message/ for the MessageValue class. Use the script in tests/phpunit/unit/includes/lib/Message/validateMessageValueTestData.php to validate and update these serialization tests if MessageValue serialization changes.

Cleaning up old test cases

Since version 1.36, MediaWiki only commits to supporting upgrades from two LTS releases ago (see phab:T259771). Upgrades from older versions of MediaWiki have to be performed in multiple steps. For example, to upgrade to 1.41 from 1.34 or earlier, you first have to upgrade your wiki to 1.35 (or 1.39); from 1.35 (or 1.39), you can then upgrade to 1.41.

As a consequence of the upgrade policy, test data in tests/phpunit/data/ParserCache/* that dates from more than two LTS releases ago should be cleaned up periodically. See phab:T353570 for one example of such a maintenance task.
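Finding candidates for such a cleanup could look like the Python sketch below. It assumes that data file names begin with the MediaWiki version that produced them (for example "1.35-..."), which should be verified against the directory contents. The helper names are hypothetical, and the oldest supported version is left as a parameter because it depends on the current release.

```python
import re

# Leading "major.minor" version at the start of a file name (assumed layout).
_VERSION = re.compile(r"^(\d+)\.(\d+)")


def parse_version(name: str):
    """Return (major, minor) from the start of a name, or None if absent."""
    match = _VERSION.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def stale_files(file_names, oldest_supported: str) -> list:
    """List files whose version prefix is older than oldest_supported."""
    cutoff = parse_version(oldest_supported)
    if cutoff is None:
        raise ValueError(f"not a version: {oldest_supported!r}")
    stale = []
    for name in file_names:
        version = parse_version(name)
        # Files without a version prefix are never reported.
        if version is not None and version < cutoff:
            stale.append(name)
    return stale
```

Versions are compared as integer pairs rather than as strings, so that 1.9 sorts before 1.35. The function only reports names; deleting the files remains a deliberate manual step.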