Parsoid/Page metadata

Common metadata such as the page title and the revision number will be available in the head section of the HTML document. Other, internal information will instead be stored in a separate header / index section for efficient processing.

Internal page metadata

 * page TTL : minimum of all (non-ESI) fragment TTLs, if any. A TTL (Time To Live) is the amount of time a fragment is expected to stay fresh. The page TTL sets the HTTP cache headers.
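
As a minimal sketch of the rule above, the page TTL could be derived as follows. The `FragmentTTL` type, the `page_ttl` and `cache_control` helpers, and the choice of `Cache-Control: max-age` as the cache header are all assumptions made for illustration; the design only states that the page TTL sets the HTTP cache headers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FragmentTTL:
    """Hypothetical per-fragment record, for illustration only."""
    ttl_seconds: Optional[int]  # None means the fragment has no TTL
    is_esi: bool = False        # ESI fragments do not count towards the page TTL


def page_ttl(fragments: Iterable[FragmentTTL]) -> Optional[int]:
    """Return the minimum TTL over all non-ESI fragments, or None if there is none."""
    ttls = [
        f.ttl_seconds
        for f in fragments
        if not f.is_esi and f.ttl_seconds is not None
    ]
    return min(ttls) if ttls else None


def cache_control(ttl: Optional[int]) -> Optional[str]:
    """Assumption: the page TTL is mapped onto a Cache-Control max-age value."""
    return None if ttl is None else f"max-age={ttl}"
```

A page whose non-ESI fragments carry no TTL at all yields `None` here, leaving the choice of default caching policy to the caller.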

Fragment index
A list of per-fragment index entries, each of which contains
 * byte length : used for efficient seeking / updating without parsing the full page
 * update events : set of conditions under which the fragment might need to be updated:
   * edit : Re-render on every edit. Examples: PAGESIZE, REVISION* and similar magic words
   * view : Potentially re-render on view if the TTL for the fragment has elapsed
   * rights : Re-render if protection levels have changed
   * move : Re-render if the page is moved / renamed (example: page-name-dependent templates)
 * fragment TTL : time to live for the full fragment.
 * dependencies : other resources used in the rendering of this fragment: any kind of transclusion (including templates / pages, parser functions, magic words, Lua modules etc.) and files
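
The index entry described above might be sketched as the following data structure. This is an illustration under stated assumptions, not the actual Parsoid format: the `FragmentEntry` type and the helper names are hypothetical, fragments are assumed to be stored contiguously in index order, and byte offsets are taken relative to the start of the fragment data.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class FragmentEntry:
    """Hypothetical index entry mirroring the fields listed above."""
    byte_length: int
    update_events: FrozenSet[str] = frozenset()  # subset of edit / view / rights / move
    ttl_seconds: Optional[int] = None
    dependencies: Tuple[str, ...] = ()


def fragment_span(index: List[FragmentEntry], i: int) -> Tuple[int, int]:
    """Byte range [start, end) of fragment i, computed from lengths alone."""
    start = sum(entry.byte_length for entry in index[:i])
    return start, start + index[i].byte_length


def fragments_to_update(index: List[FragmentEntry], event: str,
                        age_seconds: int = 0) -> List[int]:
    """Positions of fragments that may need re-rendering for the given event."""
    stale = []
    for i, entry in enumerate(index):
        if event not in entry.update_events:
            continue
        # On view, only re-render once the fragment TTL has elapsed.
        if event == "view" and (entry.ttl_seconds is None
                                or age_seconds < entry.ttl_seconds):
            continue
        stale.append(i)
    return stale


def replace_fragment(page: bytes, index: List[FragmentEntry], i: int,
                     new_fragment: bytes) -> bytes:
    """Splice a re-rendered fragment into the page without parsing the page."""
    start, end = fragment_span(index, i)
    index[i].byte_length = len(new_fragment)  # keep later offsets consistent
    return page[:start] + new_fragment + page[end:]
```

Because only lengths are stored, replacing one fragment changes a single index entry; the offsets of all later fragments follow from the updated length.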

data-parsoid
In our current implementation we use a private data-parsoid attribute with JSON data to store per-node round-trip information. Since this information is private and not needed by clients, we should move it out of the DOM itself, which will also reduce the size of the returned DOM. To preserve the link between nodes and the external metadata, we need a stable node key. A simple solution is to add an attribute with a UID to each node. We should try to be somewhat robust against client-side reassignment of such UIDs, so at least the node type should probably be checked. We might also be able to avoid assigning UIDs to all nodes by using hash trees on DOM subtrees, similar to those used in the XyDiff diff algorithm.
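
The idea can be sketched roughly as follows, using Python's standard `xml.etree.ElementTree` on well-formed markup. Everything here is an assumption made for illustration: the `data-node-id` attribute name, the UID scheme, the shape of the external store, and the hash function. The subtree hash is a simple bottom-up (Merkle-style) hash in the spirit of the hash-tree idea, not XyDiff's actual signature computation.

```python
import hashlib
import json
import xml.etree.ElementTree as ET

UID_ATTR = "data-node-id"  # hypothetical attribute name for the stable node key


def externalize(root: ET.Element) -> dict:
    """Move data-parsoid JSON out of the DOM into a side table keyed by UID."""
    store = {}
    for n, node in enumerate(root.iter()):
        raw = node.attrib.pop("data-parsoid", None)
        if raw is None:
            continue
        uid = f"n{n}"  # assumed UID scheme: position in document order
        node.set(UID_ATTR, uid)
        # Record the node type so that reassigned UIDs can be detected later.
        store[uid] = {"tag": node.tag, "data": json.loads(raw)}
    return store


def lookup(node: ET.Element, store: dict):
    """Return the stored metadata only if the UID exists and the node type matches."""
    entry = store.get(node.get(UID_ATTR, ""))
    if entry is None or entry["tag"] != node.tag:
        return None
    return entry["data"]


def subtree_hash(node: ET.Element) -> str:
    """Bottom-up hash over tag, attributes, text and child hashes (UIDs ignored)."""
    h = hashlib.sha1()
    h.update(node.tag.encode())
    for key, value in sorted(node.attrib.items()):
        if key != UID_ATTR:
            h.update(f"\0{key}={value}".encode())
    h.update(b"\0" + (node.text or "").strip().encode())
    for child in node:
        h.update(bytes.fromhex(subtree_hash(child)))
        h.update((child.tail or "").strip().encode())
    return h.hexdigest()
```

With subtree hashes, an unchanged subtree could in principle be matched to its stored metadata by hash alone, so UIDs would only be needed where hashes are ambiguous; whether that is workable in practice is left open here.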