Incremental dumps/File format/Specification

This page describes the current format of the incremental dumps file. The format is far from finished, so it may change at any time and this page can easily become out of date.

The format is binary; the file contains various objects and can also contain free space (remaining after deleted objects).

Data types
The encoding for various data types is as follows:


 * integers: 1-, 2-, 4- and 6-byte unsigned integers are saved directly in little-endian order (6-byte integers are used to represent offsets in the file).
 * timestamps: timestamps from 1 January 2000 to beyond 2100 with second accuracy are represented as 4-byte unsigned integers. The integer is not the number of seconds from the start date, but is instead directly calculated from parts of the date as.
 * strings: strings are saved as the length of the string (n) followed by the n bytes of its content. For short strings (those that are guaranteed to be at most 255 bytes long), the length is a 1-byte integer; for long strings it is a 4-byte integer.
 * lists: lists are saved as a 4-byte count of items (n) followed by the n items. The size of each item depends on its type and can be variable (e.g. for a list of strings).
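
The integer, string and list encodings above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, strings are treated as raw bytes because this page does not name a character encoding, and timestamps are left out because their calculation is not given here.

```python
def write_uint(value, size):
    """Encode an unsigned integer in little-endian order."""
    if size not in (1, 2, 4, 6):
        raise ValueError("size must be 1, 2, 4 or 6 bytes")
    return value.to_bytes(size, "little")


def read_uint(buf, pos, size):
    """Decode an unsigned integer; return (value, next position)."""
    return int.from_bytes(buf[pos:pos + size], "little"), pos + size


def write_string(data, long=False):
    """Encode bytes as a 1-byte (short) or 4-byte (long) length plus content."""
    if not long and len(data) > 255:
        raise ValueError("short strings hold at most 255 bytes")
    return write_uint(len(data), 4 if long else 1) + data


def read_string(buf, pos, long=False):
    """Decode a length-prefixed string; return (bytes, next position)."""
    n, pos = read_uint(buf, pos, 4 if long else 1)
    return buf[pos:pos + n], pos + n


def write_list(encoded_items):
    """Encode a list as a 4-byte item count plus the already-encoded items."""
    return write_uint(len(encoded_items), 4) + b"".join(encoded_items)


def read_list(buf, pos, read_item):
    """Decode a list; read_item(buf, pos) must return (item, next position)."""
    n, pos = read_uint(buf, pos, 4)
    items = []
    for _ in range(n):
        item, pos = read_item(buf, pos)
        items.append(item)
    return items, pos
```

For example, `write_list([write_string(b"a"), write_string(b"bc")])` produces 9 bytes: a 4-byte count followed by two short strings, which `read_list(data, 0, read_string)` reads back.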

File header
The file header always starts at offset 0 and contains the offsets of the indexes, which can be used to access the data stored in the file.


 * 4 bytes magic number:
 * 1 byte file format version: 1
 * 1 byte data version: 1
 * 6 bytes offset to the end of the file
 * 6 bytes offset to the root of the page id index
 * 6 bytes offset to the root of the revision id index
 * 6 bytes offset to the free space index
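
A minimal sketch of reading this header, assuming the fields are stored back to back in the order listed, which gives 30 bytes in total. The names `FileHeader` and `parse_header` are hypothetical, and the magic number is returned as raw bytes without being checked, since its value is not stated here.

```python
from typing import NamedTuple

# 4 (magic) + 1 (format version) + 1 (data version) + 4 offsets of 6 bytes each
HEADER_SIZE = 4 + 1 + 1 + 4 * 6


class FileHeader(NamedTuple):
    magic: bytes
    file_format_version: int
    data_version: int
    file_end_offset: int
    page_id_index_offset: int
    revision_id_index_offset: int
    free_space_index_offset: int


def parse_header(buf):
    """Parse the header from the first HEADER_SIZE bytes of the file."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the file header")
    # The four 6-byte little-endian offsets start right after the two version bytes.
    offsets = [
        int.from_bytes(buf[6 + 6 * i:12 + 6 * i], "little") for i in range(4)
    ]
    return FileHeader(buf[0:4], buf[4], buf[5], *offsets)
```

In use, a reader would load the first `HEADER_SIZE` bytes of the file, call `parse_header`, and then seek to the index offsets it returns.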

Index
The file currently contains 3 indexes:


 * page id index maps 4-byte page ids to 6-byte offsets of the corresponding page objects
 * revision id index maps 4-byte revision ids to 6-byte offsets of the corresponding revision objects
 * free space index contains pairs of 6-byte offsets and 4-byte lengths of blocks of free space in the file

The index is saved as:


 * 1 byte object kind:
 * 2 bytes count of items (n)
 * n keys
 * n values

This kind of index object is meant to be a leaf in a B-tree, but the tree structure is not implemented yet.
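
The index layout described above could be read roughly as follows. This sketch assumes that all n keys are stored contiguously, followed by all n values, and that the i-th key belongs to the i-th value. The function name `parse_index_leaf` is hypothetical, and the object kind byte is returned unchecked because its value is not listed here.

```python
def parse_index_leaf(buf, pos, key_size, value_size):
    """Read one index object starting at pos; return (kind, {key: value})."""
    kind = buf[pos]  # 1-byte object kind, returned unchecked
    count = int.from_bytes(buf[pos + 1:pos + 3], "little")  # 2-byte item count
    pos += 3

    keys = []
    for _ in range(count):
        keys.append(int.from_bytes(buf[pos:pos + key_size], "little"))
        pos += key_size

    values = []
    for _ in range(count):
        values.append(int.from_bytes(buf[pos:pos + value_size], "little"))
        pos += value_size

    # Assumption: keys and values are paired by position.
    return kind, dict(zip(keys, values))
```

For the page id and revision id indexes, this would be called with `key_size=4` and `value_size=6`.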