Parsing/Notes/Section Wrapping

TODO:
 * 1) Better terminology for the two different section notions: the MediaWiki notion used by the section-edit interface, and the notion that will be represented by <section> tags in output
 * 2) Link up to other pages (wiki, phab, elsewhere) to make this discussion self-contained
 * 3) Any other language cleanup / tightening to clarify the problem, options, and solution space

Adding wrappers to HTML output
MediaWiki currently has a notion of a section that is used for section editing: editors commonly use it to edit a small fragment of a page and avoid edit conflicts. These sections are identified with regular expressions. Separately, T114072 is a proposal to add <section> tags to MediaWiki output. This page discusses the considerations and constraints around adding these section tags.

Constraints
<section> tags in output will be used by clients either to selectively load / display content of a page or to provide editing interfaces for just that part of the page. Consequently, it is important that the two notions of sections are consistent with each other.

Consistency Options
There are three possibilities here:
 * 1) If HTML output has a <section> tag, there is a corresponding MediaWiki section that can be edited.
 * 2) If there is a MediaWiki section that can be edited, the HTML output for that wikitext fragment has a <section> tag around it.
 * 3) Both of the above are true, which guarantees a 1-1 mapping between the two notions.

Guarantee 1 seems to be the simplest to provide right now. Given unbalanced HTML tags and/or tags around multiple partial sections, guarantee 2 is harder to provide, since it essentially requires that the content of every MediaWiki section has well-balanced output. That isn't true right now. For example, see this section-edit form in the wikitext editor: there is an unclosed tag in the middle of the section. Given this, it is not possible to add a <section> wrapper around the contents of this MediaWiki section.

The expectation is that we want to gradually move MediaWiki output towards the stronger guarantees (2 and 3), but we aren't there right now and don't want to block section wrapping on getting there first.
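
The three guarantees can also be read as set relations between the sections that are wrapped in output and the sections that MediaWiki lets you edit. The sketch below is only an illustration of that reading: the function name and the use of section titles as identifiers are assumptions made for this example, not part of any existing API.

```python
def consistency(wrapped, editable):
    """Report which consistency guarantees hold for one page.

    wrapped:  identifiers of sections that have a wrapper in the HTML output
    editable: identifiers of sections MediaWiki offers for section editing
    (Both are hypothetical inputs; real sections are not keyed by title.)
    """
    g1 = wrapped <= editable   # every wrapper maps to an editable section
    g2 = editable <= wrapped   # every editable section has a wrapper
    return {"guarantee_1": g1, "guarantee_2": g2, "guarantee_3": g1 and g2}


# A page where one section could not be wrapped because of bad nesting.
editable = {"Intro", "History", "Notes"}
wrapped = {"Intro", "History"}
result = consistency(wrapped, editable)
```

In this example guarantee 1 holds while guarantees 2 and 3 do not, which is the state that a wrapping solution limited to guarantee 1 would leave such a page in.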

First step: A section wrapping solution that provides consistency guarantee 1
So, we could instead focus on ensuring that any <section>-wrapping solution provides guarantee 1. With this guarantee, the set of sections wrapped in <section> tags is a subset of the set of sections that MediaWiki knows about. This is not necessarily a problem. It just means that VE section editing or the mobile notion of sections will support a restricted set of the MediaWiki sections on a page. So, on pages with block tags around MediaWiki sections, there will be degraded functionality for reading and editing in certain clients, but this doesn't introduce broken or inconsistent support for sections.

So, it seems we can provide a section wrapping solution in Parsoid for now that provides guarantee 1 above. This can be easily computed on a DOM, and there is an existing library, developed earlier by Gabriel, that MCS uses for its apps. We just need to audit it to ensure that its output provides guarantee 1.
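
To make "computed on a DOM" concrete, here is a simplified sketch of one conservative rule. It is not the library mentioned above, and the Node class and function names are hypothetical. It only considers headings that are direct children of the body, and it refuses to wrap a section whose span hides a heading of the same or higher rank inside another element, since the wrapper would then not match the MediaWiki section.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal stand-in for a DOM element (hypothetical, for illustration)."""
    tag: str
    children: list = field(default_factory=list)


def heading_level(node):
    """Return 1-6 for h1-h6 elements, otherwise None."""
    if len(node.tag) == 2 and node.tag[0] == "h" and node.tag[1] in "123456":
        return int(node.tag[1])
    return None


def min_nested_heading_level(node):
    """Smallest heading level among the descendants of node, or None."""
    best = None
    for child in node.children:
        for lvl in (heading_level(child), min_nested_heading_level(child)):
            if lvl is not None and (best is None or lvl < best):
                best = lvl
    return best


def wrappable_sections(body_children):
    """Return (start, end) index ranges of body children that are safe to wrap."""
    ranges = []
    for i, node in enumerate(body_children):
        level = heading_level(node)
        if level is None:
            continue
        # The section runs up to the next top-level heading of the same or higher rank.
        end = len(body_children)
        for j in range(i + 1, len(body_children)):
            other = heading_level(body_children[j])
            if other is not None and other <= level:
                end = j
                break
        # Skip the section if such a heading is buried inside its span.
        nested = [min_nested_heading_level(n) for n in body_children[i + 1:end]]
        if any(lvl is not None and lvl <= level for lvl in nested):
            continue
        ranges.append((i, end))
    return ranges


body = [
    Node("h2"), Node("p"),
    Node("div", [Node("h2"), Node("p")]),   # a heading hidden inside a div
    Node("h2"), Node("p"), Node("h3"), Node("p"),
]
sections = wrappable_sections(body)
```

Here the first h2 is left unwrapped because a second h2 starts inside the div, while the later h2 and its h3 subsection are both wrappable. The heading inside the div gets no wrapper at all, which is the degraded but consistent behaviour described above.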

However, we are not going to provide this section wrapping functionality in the PHP parser right now because we cannot provide the guarantee we identified above without DOM-based processing. With RemexHTML, it is more of a possibility, but that is not something we are considering right now.

Long term: Supporting use cases for adding wrappers around multiple (partial) sections
So, why are editors adding div (or other block-tag) wrappers around partial sections or multiple sections?

I need more input on this, but it looks like this is possibly done to highlight that portion of the page for action / attention. This is likely a use case on user pages and other non-article pages, such as the Wikipedia or talk namespaces, where there might be a call to action or a notice that spans multiple sections.

Solution 1: Stronger consistency guarantee for the Article namespace only
Given this, one way to get to the stronger consistency guarantees (2 and 3) is to restrict section wrapping to the article namespace only. In that namespace, we could provide the stronger guarantee by breaking some badly nested sections: the output for every section is DOM-balanced independently, on its own. This forcibly closes unclosed div (and other "block") tags and discards stray closing tags. If we want to go down this route, we need to get a sense of how common bad nesting is in the article namespace and how amenable editors are to fixing pages that have it. The Linter extension can help identify this set of pages precisely.
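
As a rough illustration of what balancing each section on its own means, the sketch below closes unclosed tags and drops stray closing tags in a single HTML fragment. It is a toy stack-based balancer written for this page, not Parsoid's implementation and not a real HTML5 tree builder; the function name, the regular expression, and the list of void elements are all assumptions.

```python
import re

# Matches opening and closing tags; attributes are kept verbatim on opening tags.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")
# Elements that never take a closing tag (partial list, enough for the sketch).
VOID = {"br", "hr", "img", "input", "meta", "link", "wbr"}


def balance_fragment(html):
    """Return html with unclosed tags closed and stray closing tags removed."""
    out = []
    stack = []          # names of currently open elements
    pos = 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])     # text between tags
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()
        if name in VOID:
            if not closing:
                out.append(m.group(0))
        elif not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close any elements still open inside the one being closed.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # Otherwise this is a stray closing tag and is discarded.
    out.append(html[pos:])
    while stack:                            # forcibly close what is left open
        out.append(f"</{stack.pop()}>")
    return "".join(out)


# A div opened in one section and closed in the next.
first = balance_fragment('<h2>A</h2><div class="note"><p>text</p>')
second = balance_fragment('<p>more</p></div><h2>B</h2>')
```

Balancing the two sections separately yields two well-formed fragments, but the div no longer spans both sections. That is the breakage that would need to be surfaced to editors before going down this route.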

Note however that this breakage will only be in Parsoid output (and hence clients that use Parsoid output - mobile, VE). But, since we are progressing towards adopting Parsoid output as the de facto output for MediaWiki, this is not a concern per se.

Solution 2: Provide other solutions for the use cases that require multi-section styling
This is a somewhat longer-term strategy than solution 1, but it lets us provide uniform section editing / wrapping / reading semantics across the board for all namespaces and all clients. Some nascent ideas for this include relying on templates and the upcoming template styles solution, or introducing new wikitext syntax for styling individual sections (see below).

Solution 3: Make it possible to style sections "properly"
For example (strawman alert!), one option would be to support "class" and "style" attributes on sections in the same way we do for table cells, with the attributes carried over to the generated HTML. The attribute could also be moved to the <section> tag. You would need to lint for all section titles currently containing a | character, and escape them.

Another option would let the user write <section> tags manually in wikitext, which would then be detected and suppress normal <section> generation. The above example would then be written with an explicit <section> tag. This is perhaps more "compatible" (no existing headings need to be escaped), but it rather encourages the use (advertent or inadvertent) of explicit <section>s which don't line up with headings. It's probably best to avoid candy machine interfaces of this sort, which make it easy to generate bad results.