Parsoid/MediaWiki DOM spec/Element IDs/Brainstorm

Stable IDs
Summary of the Nov. 22, 2013 brainstorm, http://etherpad.wikimedia.org/p/Parsoid_stable_id_brainstorming

Metadata storage using ids vs. stable ids
Metadata storage and stable ids are two separate issues. Stable ids are not necessary for the correctness of VisualEditor (VE) edits: we can always reparse and regenerate the ids (and their associated metadata). This page is about keeping ids stable when switching back and forth between HTML and wikitext.

Ideas for stable ids across wikitext edits
We need to demarcate edited and unedited sections so that ids can be reused in the unedited ones. The two approaches below are different angles on the same problem, but 1) is coarser and 2) is more performance-oriented:


 * HTML DOM diffing: https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec/Element_IDs#Implementation_issues
 * Incremental reparsing: https://www.mediawiki.org/wiki/Parsoid/Incremental_re-parsing_after_wikitext_edit

HTML DOM diffing

 * Re-parse modified wikitext, and DOM-diff the resulting DOM while ignoring data-parsoid.
 * For each DOM node that did not differ (significantly), transfer the old IDs to the new DOM.
 * Update data-parsoid, and any other element-associated metadata that needs updates (authorship maps for example).
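
The id transfer step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Parsoid's implementation: it uses Python's standard `xml.etree.ElementTree` on well-formed markup, treats two nodes as "not differing" when their tag, own text and attributes (minus `id` and `data-parsoid`) are equal, and aligns sibling lists with `difflib.SequenceMatcher`. The function names and the sample ids are made up for the example.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Assumption: these attributes are ignored when deciding whether nodes differ.
IGNORED_ATTRS = {"id", "data-parsoid"}


def signature(node):
    """Shallow fingerprint: tag, own text, and the non-ignored attributes."""
    attrs = tuple(sorted(
        (k, v) for k, v in node.attrib.items() if k not in IGNORED_ATTRS
    ))
    return (node.tag, (node.text or "").strip(), attrs)


def transfer_ids(old, new):
    """Copy ids from `old` onto matching nodes of `new`; return how many moved."""
    if signature(old) != signature(new):
        return 0  # node changed: it (and its subtree) keeps whatever ids it has
    moved = 0
    if "id" in old.attrib:
        new.set("id", old.attrib["id"])
        moved = 1
    old_kids, new_kids = list(old), list(new)
    # Align the two child lists so an inserted or removed sibling
    # does not shift every later match.
    matcher = SequenceMatcher(
        None,
        [signature(k) for k in old_kids],
        [signature(k) for k in new_kids],
        autojunk=False,
    )
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            moved += transfer_ids(old_kids[block.a + offset],
                                  new_kids[block.b + offset])
    return moved


old_dom = ET.fromstring(
    '<body id="mwAA"><p id="mwAQ" data-parsoid="{}">Hello</p>'
    '<p id="mwAg">World</p></body>'
)
new_dom = ET.fromstring(
    "<body><p>Hello</p><p>Inserted</p><p>World</p></body>"
)
transfer_ids(old_dom, new_dom)
```

In this sketch the inserted paragraph ends up without an id, so a later step would still have to assign it a fresh one. Moved nodes and changed wrappers are not handled at all.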

Incremental reparsing

 * Bad nesting forces wider reparsing
 * Don't have a good sense for how hard it will be

Dependencies

 * Might need better DOM diffing, possibly a port of XyDiff (http://leo.saclay.inria.fr/software/XyDiff/)?
 * We want better DOM diffing anyway for move detection, more intelligent wrapper handling, performance, and an HTML diff UI

Performance considerations

 * more aggressive template / fragment reuse
 * depends on nesting enforcement
 * shortcut DOM-diffing for unchanged wikitext in top-level sections
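
The last point, skipping the DOM diff for top-level sections whose wikitext did not change, might look something like the sketch below. It assumes that a top-level section starts at a level-2 heading line (`== Title ==`) and that identical section wikitext yields an identical DOM, which only holds if nesting is enforced and templates render the same way. The helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Assumption: a top-level section begins at a level-2 heading line.
HEADING = re.compile(r"^==[^=].*==\s*$", re.MULTILINE)


def split_sections(wikitext):
    """Split wikitext into the lead section plus one chunk per level-2 heading."""
    starts = [0] + [m.start() for m in HEADING.finditer(wikitext)
                    if m.start() != 0]
    ends = starts[1:] + [len(wikitext)]
    return [wikitext[s:e] for s, e in zip(starts, ends)]


def unchanged_sections(old_wikitext, new_wikitext):
    """Return (old_index, new_index) pairs of sections with identical wikitext."""
    old_sections = split_sections(old_wikitext)
    new_sections = split_sections(new_wikitext)
    matcher = SequenceMatcher(None, old_sections, new_sections, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


old_text = "Lead\n== A ==\nfoo\n== B ==\nbar\n"
new_text = "Lead\n== A ==\nfoo edited\n== B ==\nbar\n"
reusable = unchanged_sections(old_text, new_text)
```

Sections that appear in `reusable` could keep their old DOM subtrees and ids as they are. Only the remaining sections would need to be reparsed and DOM-diffed.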

Copy & Paste

 * The client needs to generate unique ids and pass in the HTML and metadata
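
One way a client could keep ids unique when pasting is sketched below. This is an assumption about how it might be done, not an agreed design: pasted elements whose ids collide with ids already in the document get fresh ones, and the returned mapping lets the client re-key any metadata it passes along. The `paste` prefix and the function names are invented for the example.

```python
import itertools
import xml.etree.ElementTree as ET


def existing_ids(root):
    """Collect every id already used in the target document."""
    return {el.get("id") for el in root.iter() if el.get("id") is not None}


def make_ids_unique(pasted, taken, prefix="paste"):
    """Rename colliding ids in `pasted`; return a {old_id: new_id} mapping."""
    taken = set(taken)  # work on a copy so the caller's set is untouched
    renamed = {}
    counter = itertools.count()
    for el in pasted.iter():
        current = el.get("id")
        if current is None:
            continue
        if current in taken:
            # Draw candidate ids until one is free.
            fresh = f"{prefix}{next(counter)}"
            while fresh in taken:
                fresh = f"{prefix}{next(counter)}"
            el.set("id", fresh)
            renamed[current] = fresh
            current = fresh
        taken.add(current)
    return renamed


document = ET.fromstring('<body><p id="mwAQ">a</p><p id="mwAg">b</p></body>')
fragment = ET.fromstring('<div><p id="mwAQ">copy</p><p id="mwBQ">x</p></div>')
id_map = make_ids_unique(fragment, existing_ids(document))
```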

Conclusions
DOM diffing seems to be the better starting point.


 * start work on id transplantation using DOM diff output
 * refine DOM diff to handle moves, wrappers