Requests for comment/Page and category based language variant conversion

The current language variant converter uses a rule-based conversion engine with two kinds of rules: local and global. Local rules select content only at the point of definition and do not modify the global rule table. Global rules modify the global rule table, so a new rule applies to the remainder of the page from its point of definition onward. See Writing systems/Syntax for more information. This RFC is mainly concerned with the scope and definition of the global conversion rule table.
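
The position-dependent behaviour described above can be illustrated with a minimal sketch. This is not the converter's actual implementation: the token format, the function names `render` and `convert_text`, and the toy `en-gb`/`en-us` variants are assumptions made for this example only.

```python
import re


def convert_text(text, table):
    """Replace every source string found in `table` in a single pass."""
    if not table:
        return text
    # Try longer source strings first so they win over their own prefixes.
    sources = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in sources))
    return pattern.sub(lambda m: table[m.group(0)], text)


def render(tokens, variant):
    """Render (kind, value) tokens in document order for one variant."""
    table = {}  # global rule table, built up while walking the page
    out = []
    for kind, value in tokens:
        if kind == "global_rule":
            # Hypothetical shape: {variant: {source: replacement}}.
            # The rule affects all text after this point, and nothing before.
            table.update(value.get(variant, {}))
        elif kind == "local_rule":
            # Hypothetical shape: {variant: text}. Selects content here only
            # and leaves the global table untouched.
            out.append(value[variant])
        else:
            out.append(convert_text(value, table))
    return "".join(out)


tokens = [
    ("text", "color "),
    ("local_rule", {"en-gb": "grey", "en-us": "gray"}),
    ("text", " "),
    ("global_rule", {"en-gb": {"color": "colour"}}),
    ("text", "color"),
]
print(render(tokens, "en-gb"))  # prints "color grey colour"
```

The first "color" is left alone because the global rule is defined after it, which is exactly the ordering dependence that this RFC is concerned with.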

Allowing arbitrary modifications of the global rule table has several disadvantages, which are discussed in bug 41716. Rules defined in a template can unexpectedly leak into the body of an article. Changes to a template require the entire page to be re-rendered from the point of inclusion on. A block of inline rules at the top of a page makes the wikitext harder to edit, in particular for novice editors. Finally, position-dependent rules expose unnecessary complexity to the author or editor of an article, who would like to edit language converter rules on a global basis (in a page properties dialog, for example) without having to worry about exactly where in the article the rules are defined.

One useful consequence of rules leaking out of templates is that editors can define a set of common rules for a topic: all articles in a given topic area can include the same template at the top of the page.

Proposal
This RFC proposes to migrate global rules from the wikitext into page- and category-specific conversion rules stored in page properties. These rules will always apply to full pages, which makes the processing model more efficient and predictable. The design also aims to improve support for visual editing of both rules and page content.

Semantics
Page-global rules are specific to a page and will apply to all parts of the page. This includes templated content. Page-specific rules take precedence over category-provided rules.

Category-global rules are shared between pages in a specific content area. They apply to all pages that are direct members of that category. Sub-category relationships are not considered, which avoids issues with cycles and performance.
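
The precedence between page-global and category-global rules could be sketched as follows. The function name `effective_rules` and the plain-dict rule tables are assumptions for illustration. The RFC does not say how conflicts between two categories are resolved, so the sorted order used here is an arbitrary choice.

```python
def effective_rules(page_rules, category_rules, direct_categories):
    """Build the rule table for one page.

    page_rules:        {source: replacement} defined on the page itself
    category_rules:    {category name: {source: replacement}}
    direct_categories: categories the page is a direct member of;
                       parent categories are deliberately not followed
    """
    merged = {}
    # Assumption: apply categories in sorted order so the result is
    # deterministic when two categories define the same source string.
    for category in sorted(direct_categories):
        merged.update(category_rules.get(category, {}))
    # Page-specific rules are applied last, so they override category rules.
    merged.update(page_rules)
    return merged


category_rules = {
    "Category:Computing": {"color": "COLOR", "gray": "grey"},
    "Category:Unrelated": {"center": "centre"},
}
page_rules = {"color": "colour"}
print(effective_rules(page_rules, category_rules, ["Category:Computing"]))
# prints {'color': 'colour', 'gray': 'grey'}
```

Because only direct memberships are consulted, the lookup is a flat loop with no graph traversal, so category cycles cannot occur.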

Dynamic categories added from templates are problematic as they can change at any time, which can in turn make it necessary to re-render the entire page. This runs counter to the long-term architectural goal of better encapsulation for transclusion and extension content. The issue could be avoided by considering only static category memberships for rule inheritance.

Editing
Moving rule sets out of the wikitext content avoids confronting authors with large blocks of strange-looking inline conversion rules, and eliminates the need to update both wikitext and auxiliary storage on edit.

The public JSON page property API planned in bug 49143 frees user interface writers from extracting and updating conversion rules in content. This lets them concentrate on building the best user interface possible. In the VisualEditor, page-global rules can be added to the existing page property dialog. Editing support could also be added to gadgets like zh:MediaWiki:Gadget-noteTA.js which already display rule information. A text-based property editing interface for power users is a possibility as well, and can also be developed as a gadget first.

Storage and API
Global conversion rules can be stored in page properties as discussed in bug 49143. Page properties are designed to be efficient to retrieve without having to parse wikitext, which is important as this information is needed when rendering. They are also versioned with the page content and will have a generic API that can be used by gadgets like zh:MediaWiki:Gadget-noteTA.js and the VisualEditor to provide a friendly editing experience.
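
As a rough illustration of what a JSON-encoded rule set might look like, the sketch below stores each rule as a mapping from variant code to text and derives a replacement table for one target variant. The property name `conversion-rules`, the JSON layout, and the helper `table_for_variant` are hypothetical. The actual format would be defined by the page property work in bug 49143.

```python
import json

# Hypothetical page property value: a list of rules, each mapping a
# variant code to the text that should be shown for that variant.
EXAMPLE_PROPERTY = """
{
  "conversion-rules": [
    {"zh-cn": "计算机", "zh-tw": "電腦"}
  ]
}
"""


def table_for_variant(property_json, variant):
    """Return {source text: replacement} for the requested variant."""
    rules = json.loads(property_json).get("conversion-rules", [])
    table = {}
    for rule in rules:
        target = rule.get(variant)
        if target is None:
            continue  # this rule says nothing about the requested variant
        for other_variant, text in rule.items():
            # Every other variant's form is rewritten to the target form.
            if other_variant != variant and text != target:
                table[text] = target
    return table


print(table_for_variant(EXAMPLE_PROPERTY, "zh-tw"))  # prints {'计算机': '電腦'}
```

Since the rules are plain data, no wikitext parsing is needed to obtain the table, and the same structure can be read and written by an editing interface.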

Migration process
Once support for page- and category-global rules is implemented, dynamic table modification in wikitext can be deprecated and gradually replaced with page- or category-global rules. Much of this conversion can be automated. Once all dynamic table updates are removed, the old update syntax can be disabled and the migration is complete.
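
The automated part of the migration might look roughly like the sketch below, which lifts inline global rules out of the wikitext and returns them as data. It assumes that hidden global rules are written in the `-{H|variant:text; variant:text}-` form and handles only that simple bidirectional case. Unidirectional rules, nested markup, and rule text containing semicolons are not handled. The function name `extract_global_rules` is made up for this example.

```python
import re

# Assumed inline form of a hidden global rule: -{H|zh-cn:...; zh-tw:...}-
GLOBAL_RULE = re.compile(r"-\{H\|(.*?)\}-", re.DOTALL)


def extract_global_rules(wikitext):
    """Return (wikitext without inline global rules, list of rule dicts)."""
    rules = []

    def collect(match):
        rule = {}
        for part in match.group(1).split(";"):
            if ":" in part:
                variant, text = part.split(":", 1)
                rule[variant.strip()] = text.strip()
        rules.append(rule)
        return ""  # drop the inline rule from the page text

    cleaned = GLOBAL_RULE.sub(collect, wikitext)
    return cleaned, rules


cleaned, rules = extract_global_rules("-{H|zh-cn:计算机; zh-tw:電腦}-Article text")
print(cleaned)  # prints "Article text"
print(rules)    # prints [{'zh-cn': '计算机', 'zh-tw': '電腦'}]
```

A real migration tool would also need to check that moving a rule to page scope does not change the rendering of text that appears before the original point of definition.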

Discussion
This proposal allows an incremental transition to language converter rule semantics that are easier to edit. It avoids the need to implement additional language variant conversion rule syntax. Once implemented, the VE and other UIs can directly use a clean JSON interface to edit page- or category-level language variant conversion rules.

Implementation
The Parsoid team plans to implement page property storage in the coming months. This will provide fast and standardized access to JSON-encoded page properties, which the language variant converter can use to pull in page- and category-global rules efficiently. The generic page property API will also provide editing support without the need to implement a custom API.