Parsoid/MediaWiki DOM spec/Language conversion blocks

Status: provisional / strawman. See bug 41716. Also see Writing_systems/Syntax.

 foo-{bar baz}- quux 

 foo-{zh-cn:blog; zh-hk:WEBJOURNAL; zh-tw:WEBLOG;}- quux 

 foo-{zh;zh-hans;zh-hant|blog, WEBJOURNAL, WEBLOG}- quux 

General language conversion plan
The wikitext language variant converter interface documented in Writing_systems/Syntax exposes two classes of operations:
 * 1) selecting content in place by variant, and
 * 2) dynamically modifying conversion rules, which then apply from that point in the page onward.

In-place content selection is not just used for regular translation pairs, but also for constructs like. Content is mostly well-nested, so we can represent this as an element. The exceptions found by grepping (see regexp used below) are constructs like. Those partly stem from times when the language converter could not be used inside attributes. We can probably fix this automatically by moving the variant block inside the attribute.

In general, we will render content-producing variant code based on the wiki's default variant and the fallback chain. Regular content conversion will only happen as a post-processing step on the saved Parsoid HTML.
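A minimal sketch of that selection step, assuming the variant texts of a block are already available as a mapping. The function name pick_variant_text and the fallback table are hypothetical and for illustration only; the real chain would come from the wiki's language configuration.

```python
from typing import Mapping, Optional, Sequence


def pick_variant_text(
    texts: Mapping[str, str],
    variant: str,
    fallbacks: Mapping[str, Sequence[str]],
) -> Optional[str]:
    """Return the text for `variant`, or for the first fallback that has one."""
    for code in [variant, *fallbacks.get(variant, [])]:
        if code in texts:
            return texts[code]
    # No entry for the variant or any of its fallbacks.
    return None


# Hypothetical fallback table, NOT the real zh configuration.
EXAMPLE_FALLBACKS = {
    "zh-hk": ["zh-hant", "zh-tw"],
    "zh-cn": ["zh-hans"],
}

# Variant texts of -{zh-tw:foo;zh-cn:bar}-
block = {"zh-tw": "foo", "zh-cn": "bar"}
chosen = pick_variant_text(block, "zh-hk", EXAMPLE_FALLBACKS)
```

With this example table, a default variant of zh-hk has no direct entry in the block, so the lookup walks the chain and settles on the zh-tw text.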

Dynamic modification of rules does not seem to be needed in general. Page-global and per-category rules can replace template-based definitions. Until that is implemented, however, we need to represent existing add / remove rules inline. For constructs like  that also produce content, we can both render the content and record the rule modification in data-mw. Pure modifications (H flag) can be represented as meta tags.

Rule format for separately stored page-global rules
-{H|..}- and -{-|..}- can be represented as metas, others as spans. Block-level content seems to be rare.


 * {"*": "XXX"} for rules migrated from -{A|XXX}-
 * {"zh-cn": "tom hanks", "zh-hk": "SOUP HANS", "zh-tw": "TOM HANKS"}
 * {"zh-cn": {"HUGEBLOCK":"macro"}, "zh-hk": {"BLOCKHUGE":"big"}} for migrated -{H|HUGEBLOCK=>zh-cn:macro;BLOCKHUGE=>zh-hk:big;}-
 * {"zh-cn": {"HUGEBLOCK":"macro"}, "zh-hk": {"HUGEBLOCK":"big"}} for migrated -{H|HUGEBLOCK=>zh-cn:macro;HUGEBLOCK=>zh-hk:big;}-
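The migration implied by the examples above could be sketched roughly as follows. This is a simplified illustration, not the converter's actual parser: the function name rule_body_to_stored is hypothetical, the input is the rule body with the flags and the -{ }- delimiters already removed, and the naive splitting on ";" and ":" ignores escaping and nested markup.

```python
def rule_body_to_stored(body: str) -> dict:
    """Convert a rule body such as 'A=>zh-cn:x;zh-hk:y;' to the stored format."""
    stored: dict = {}
    for part in body.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        if "=>" in part:
            # Unidirectional rule: SOURCE=>variant:text
            source, _, target = part.partition("=>")
            variant, _, text = target.partition(":")
            nested = stored.setdefault(variant.strip(), {})
            if not isinstance(nested, dict):
                raise ValueError(f"mixed rule kinds for {variant.strip()!r}")
            nested[source.strip()] = text.strip()
        elif ":" in part:
            # Direct translation rule: variant:text
            variant, _, text = part.partition(":")
            stored[variant.strip()] = text.strip()
        else:
            # No variant given: applies to all variants
            stored["*"] = part
    return stored


migrated = rule_body_to_stored("HUGEBLOCK=>zh-cn:macro;BLOCKHUGE=>zh-hk:big;")
```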

For consumers of this format:


 * If a rule value is a string, it is a direct translation rule
 * If a rule value is an object, it contains one or more unidirectional nested rules
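A consumer following these two cases might look like the sketch below. The function name split_rules and the tuple layout of the result are assumptions made for illustration; only the string-versus-object distinction comes from the format itself.

```python
from typing import Dict, List, Tuple


def split_rules(stored: dict) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Separate direct translation rules from unidirectional nested rules."""
    direct: Dict[str, str] = {}
    unidirectional: List[Tuple[str, str, str]] = []
    for variant, value in stored.items():
        if isinstance(value, str):
            # String value: direct translation rule for this variant.
            direct[variant] = value
        elif isinstance(value, dict):
            # Object value: one or more source => target rules for this variant.
            for source, target in value.items():
                unidirectional.append((variant, source, target))
        else:
            raise TypeError(f"unexpected rule value for {variant!r}: {value!r}")
    return direct, unidirectional


direct, one_way = split_rules(
    {"zh-cn": {"HUGEBLOCK": "macro"}, "zh-hk": {"BLOCKHUGE": "big"}}
)
```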

Other considerations

 * $wgDefaultLanguageVariant and its fallback chain are not currently exposed in the API. We'll need both to pick the right content to render for -{zh-tw:foo;zh-cn:bar}-. For the fallback chain, search for variantfallbacks in LanguageZh.php, retrievable from . Note that getVariantFallbacks can return a string OR an array depending on the input. It seems to make more sense for getVariantFallbacks to do the array_diff itself, but it does not currently do so.
 * See also 52700.


 * What to store in  when it contains some other structures?
 * HTML, but the content needs to be properly nested. Run  on a zhwiki dump to find potentially problematic language conversion blocks, then check their nesting. Problematic cases seem to be:
 * different start tags per variant that really only differ in an attribute (title, for example). Conversion pairs are now also supported in attributes, so try to fix the wikitext to convert only the attribute.

Result of : Total revisions: 2234532 Total matches: 773 Ratio: 0.034593373467016804%

  
 * Need a way to mark up in-place variant conversions in attributes. Idea that might also be useful for transclusion-affected attributes: