Parsoid/Internals/data-parsoid

From MediaWiki.org

Temporarily in data-parsoid, but not in final DOM output

tsr: Tag widths for all tokens (from tokenizer)

extTagWidths: Widths of the opening and closing tags for extension tags. Ex: <ref ...>...</ref>, <gallery ...>...</gallery>

Proposal: Make these temporary properties used till we lint the HTML (instead of emitting to final DOM output)

autoInsertedStart: whether this start HTML tag has no corresponding wikitext and was auto-inserted to generate well-formed HTML. This usually happens when the treebuilder fixes up badly nested HTML.

autoInsertedEnd: whether this end HTML tag has no corresponding wikitext and was auto-inserted to generate well-formed HTML. Ex: <tr>, <th>, <td>, <li>, etc. that have no explicit closing markup, or HTML tags that are never closed.

Proposal: Remove from data-parsoid and rely on selser to preserve syntax variations

selfClose: are void tags self-closed? (ex: <br> vs <br />)

noClose: void tags that are not self-closed (ex: <br>)

brokenHTMLTag: used to round-trip these kinds of tags: </br> or <br/  > or <hr/  >

srcTagName: source tag name (records case variations) for HTML tags. Ex: <div> vs <DiV> vs <DIV>

startTagSrc, endTagSrc, attrSepSrc: source for start/end/attribute-text separators (used in table wikitext)

  • |foo || bar
  • |foo {{!}}{{!}}bar
  • {{!}} foo
  • |style='color:red;'{{!}}foo || bar

pipetrick: true if the link used the pipe trick, as in [[Foo|]]. (NOTE: This will likely be removed soon, because the pipe trick is a pre-save transformation and should not show up in saved wikitext.)

Proposal: Maybe move to data-mw?

stx_v: "row" - set for td/th cells that appear on the same line. Ex: |foo ||bar ||baz (Maybe use stx for this as well.)

stx:

  • "html" - set for HTML tags. Ex: <div>foo</div>
  • "row" - set for dt/dd that appear on the same line. Ex: ";a:b" vs ";a\n:b"
  • "piped" - set for piped wikilinks with explicit content. Ex: [[Foo|bar]] vs [[Foo]]
  • "magiclink" - set for magic links (RFC/PMID/ISBN). Ex: RFC 1234, ISBN 1234567890 (Not needed anymore?)
  • "url" - set for URL links. Ex: http://google.com (Not needed anymore?)
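
As a rough illustration of how a serializer might use these values, the sketch below picks between the syntax variants listed above. The function names and the dict standing in for data-parsoid are hypothetical, chosen only for this example; this is not Parsoid's serializer.

```python
def serialize_definition(term, definition, data_parsoid):
    """Emit a dt/dd pair (hypothetical helper).

    stx == "row" keeps the term and definition on one line.
    """
    if data_parsoid.get("stx") == "row":
        return f";{term}:{definition}"
    return f";{term}\n:{definition}"


def serialize_wikilink(target, content, data_parsoid):
    """Emit a wikilink (hypothetical helper).

    stx == "piped" keeps the explicit link content after the pipe.
    """
    if data_parsoid.get("stx") == "piped":
        return f"[[{target}|{content}]]"
    return f"[[{target}]]"


print(serialize_definition("a", "b", {"stx": "row"}))
print(serialize_wikilink("Foo", "bar", {"stx": "piped"}))
```

Without the stx hint, both helpers fall back to the other variant (";a" and ":b" on separate lines, or the unpiped [[Foo]]), which is why the property matters for round-tripping the original syntax.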

Required properties

dsr: Wikitext source ranges that generated this DOM node (start-offset, end-offset, start-tag-width, end-tag-width).

Consider the input wikitext: abcdef ''foo'' something else. Let us look at the ''foo'' part of the input. It generates <i data-parsoid='{"dsr":[7,14,2,2]}'>foo</i>. The dsr property of the data-parsoid attribute of this i tag tells us the following: this HTML node maps to the input wikitext substring 7..14, the opening tag <i> was 2 characters wide in wikitext, and the closing tag </i> was also 2 characters wide in wikitext.
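
The mapping described above can be checked with a short sketch. It assumes the dsr offsets index characters of the input string, which holds for this ASCII example but may not match the offset unit used for other inputs. split_by_dsr is a hypothetical helper written for this illustration, not a Parsoid function.

```python
import json

# Input wikitext and the data-parsoid value from the example above.
wikitext = "abcdef ''foo'' something else"
data_parsoid = json.loads('{"dsr":[7,14,2,2]}')


def split_by_dsr(source, dsr):
    """Split the range described by dsr into (open tag, content, close tag).

    Hypothetical helper; assumes offsets count characters of `source`.
    """
    start, end, open_width, close_width = dsr
    open_src = source[start:start + open_width]
    content_src = source[start + open_width:end - close_width]
    close_src = source[end - close_width:end]
    return open_src, content_src, close_src


open_src, content_src, close_src = split_by_dsr(wikitext, data_parsoid["dsr"])
print(repr(open_src), repr(content_src), repr(close_src))
```

Here the full range wikitext[7:14] is ''foo'', and the two tag widths peel off the leading and trailing '' to leave foo as the node's content.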

src: used to emit original wikitext in some scenarios (entities, placeholder spans)

tail: link trail source (Ex: the "l" in [[Foo]]l)

prefix: link prefix source