User:SSastry (WMF)/Notes/Wikitext

'''Disclaimer: These are not necessarily all my ideas. They are the result of conversations we have had inside and outside the Parsoid team over the last couple of years, and are also based on experience developing Parsoid. I am just pulling things together, organizing them, and maybe extending them in some cases.'''

Goals:

 * Improve the ability to reason about wikitext
 * Reduce edge cases by bounding the range of wikitext errors
 * Improve editability in VE
 * Improve performance

DOM scopes
Introduce the notion of DOM scopes: markup within a DOM scope is processed to yield a forest of well-balanced DOM trees, so a markup error inside a scope cannot leak outside it.

DOM scopes within a page are properly nested.

A top-level section is an obvious DOM scope. We could also consider lists and tables to be DOM scopes, and maybe image captions as well, and anything that is wrapped in .. markup (the previously proposed tag).
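To make the "forest of well-balanced DOM trees" idea concrete, here is a minimal sketch of how a single scope could be balanced. It assumes the scope's markup has already been tokenized into hypothetical <code>(kind, value)</code> pairs; the names <code>Node</code>, <code>parse_scope</code> and <code>serialize</code> are invented for illustration and are not Parsoid APIs. The recovery policy (drop stray closing tags, auto-close whatever is still open at the end of the scope) is also just one possible choice.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def parse_scope(tokens):
    """Turn the tokens of ONE scope into a balanced forest.

    tokens: iterable of (kind, value), kind in {"open", "close", "text"}.
    Errors never escape the scope: stray closes are dropped and
    unclosed elements are implicitly closed at the scope boundary.
    """
    root = Node("#scope")          # hypothetical sentinel, never emitted
    stack = [root]
    for kind, value in tokens:
        if kind == "text":
            stack[-1].children.append(value)
        elif kind == "open":
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "close":
            # Nearest matching open element, never the sentinel at index 0.
            depth = next(
                (i for i in range(len(stack) - 1, 0, -1) if stack[i].tag == value),
                None,
            )
            if depth is not None:
                del stack[depth:]  # also closes anything opened after it
            # else: stray close tag, dropped
    return root.children


def serialize(forest):
    """Render the forest back to markup so the balancing is visible."""
    return "".join(
        n if isinstance(n, str) else f"<{n.tag}>{serialize(n.children)}</{n.tag}>"
        for n in forest
    )


# Misnested <b>/<i> plus a stray </i>: the result is still balanced.
tokens = [("open", "b"), ("text", "x"), ("open", "i"), ("text", "y"),
          ("close", "b"), ("text", "z"), ("close", "i")]
print(serialize(parse_scope(tokens)))  # <b>x<i>y</i></b>z
```

Because each scope is balanced independently, an unclosed tag in one list or table cell would, under this sketch, affect only that scope's forest.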

Core ideas

 * Parsing the wikitext for any page returns a DOM tree with metadata annotations on different nodes.
 * All pages are composed using three forms of syntax:
 ** Basic markup (lists, bold, italic, headings, tables, links, ...):
 *** Can be wikitext-1.0 basic markup, Markdown, or anything else, as long as there is a pluggable implementation for it.
 *** For Wikipedias, it will continue to be wikitext-1.0.
 ** Metadata markup:
 *** Markup for references, annotations, edit notices, category links, language links, etc. The following may not be the best syntax for it, but it is something to riff on.
 **** .. (instead of <!-- ... some edit-notice here ... -->)
 **** foo (instead of )
 **** ..
 **** ..
 **** ..
 *** Pluggable processors for different metadata types.
 *** All metadata is attached to a specific place in the DOM. Metadata could be page-specific or DOM-node-specific. The former could be treated as a special case of the latter where it is attached to .
 *** Metadata markup generates visible output in some cases (refs) and JSON data or non-visible HTML markup (category links, annotations, etc.) in others.
 ** Content-generator markup (transclusions, extensions, data widgets, Wikidata-driven infoboxes, whatever):
 *** Depending on the context in which it is used, the output is treated as a string, a list of k=v HTML attributes, or a DOM forest. There is nothing in between.
 * No concept of preprocessing at the top level. The top-level parser is just that: a parser that returns a DOM tree.
 ** However, the transclusion generator implementation can support preprocessing. The output of the preprocessor can be wikitext, which is then processed according to what the use context demands (string, HTML attributes, DOM). We can treat this as just another extension that happens to be shipped and enabled by default on most wikis.
 * No page-global state for anything, even for s . The effect of global state is reproduced by doing a DOM walk on the final DOM. For example, as a fallback for browsers that do not support CSS counters, a post-pass might update the DOM by inserting ref counter values.
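
The ref-counter fallback in the last point can be sketched as a single walk over the finished DOM. This is only an illustration: it uses Python's standard <code>xml.etree.ElementTree</code> on a well-formed fragment, and it assumes reference markers are <code>sup</code> elements with <code>class="reference"</code>. Both the markup shape and the function name <code>number_refs</code> are assumptions, not something this proposal specifies.

```python
import xml.etree.ElementTree as ET


def number_refs(root):
    """Post-pass: assign reference numbers in document order.

    No counter is carried through parsing; numbering is derived
    entirely from the final DOM. Returns the number of refs found.
    """
    count = 0
    for el in root.iter("sup"):              # iter() visits in document order
        if el.get("class") == "reference":   # assumed marker for a ref
            count += 1
            el.text = f"[{count}]"
    return count


# Usage on a tiny, hypothetical page DOM.
page = ET.fromstring(
    '<body>'
    '<p>A<sup class="reference"/> B<sup class="reference"/></p>'
    '<p>C<sup class="other"/><sup class="reference"/></p>'
    '</body>'
)
total = number_refs(page)
print(total)  # 3
```

Since the numbers are computed from the final tree, fragments produced independently (and possibly cached) would not need to know their position on the page while they are being generated.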

Other notes

 * A page can be a special case of a content generator.
 ** This enables caching at the content-generator level, so the output of 2 km can be cached regardless of which page it is used on (unlike now, where it is processed once for every page it appears on).
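
A rough sketch of what generator-level caching could look like follows. It rests on an assumption: a generator's output depends only on its name and arguments, never on the page that uses it. The class <code>GeneratorCache</code>, the <code>expand</code> method and the toy generator are hypothetical names invented for this example.

```python
class GeneratorCache:
    """Cache content-generator output keyed by (name, args), not by page."""

    def __init__(self, generator):
        self._generator = generator   # callable(name, args) -> output
        self._cache = {}
        self.misses = 0               # how often the generator really ran

    def expand(self, name, args):
        # The page is deliberately absent from the key: that is what
        # lets the same output be reused across pages.
        key = (name, tuple(sorted(args.items())))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._generator(name, args)
        return self._cache[key]


def toy_generator(name, args):
    """Stand-in for a real transclusion or extension (hypothetical)."""
    return f"<span>{name}:{args['value']} {args['unit']}</span>"


cache = GeneratorCache(toy_generator)
# The same call appearing on two different pages runs the generator once.
on_page_a = cache.expand("measure", {"value": "2", "unit": "km"})
on_page_b = cache.expand("measure", {"value": "2", "unit": "km"})
print(cache.misses)  # 1
```

Anything that does depend on the page (a page-name variable, for instance) would have to become an explicit argument, and therefore part of the key, for this scheme to stay correct.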

Open questions

 * Where do parser functions fit in this scheme?
 * Does this anticipate and cover all current uses of templating?