VisualEditor/Design/Bidirectional text requirements

UNFINISHED

This is a very, very rough draft. Any comment is welcome.

Preamble
Since MediaWiki in particular and Wikimedia projects in particular are used in right-to-left languages, such as Arabic, Pashto, Persian and Hebrew, the projected visual editor must support writing and displaying them easily.

"Easily" in this context means that any user must be able to write mixed right-to-left and left-to-right text as easily as he would do it with pen and paper and see the results while editing.

Although this is true for any text in right-to-left languages, proper support for mixing right-to-left and left-to-right is of particular importance for Wikimedia projects, which are heavily multilingual and frequently cite foreign names.

This document tries to be more functional - to describe what the software should do, rather than technical - to describe how it should be implemented. Some implementation suggestions may creep into the document nevertheless.

The direction property
It must be possible to apply the direction property at several levels:
 * the level of a content page
 * a block-level element (usually a paragraph, but possibly also a table cell, and possible a template etc.)
 * an inline element (effectively , but probably set using a template).

Page-level
The page-level direction property is inherited from the direction of the content language of the wiki, but there must be a possibility to override it. This is not currently implemented in any structured way in MediaWiki and is usually addressed by manually applying  to the whole page. (See bug 28970.) Having this is particularly useful for current Wikimedia sites like Meta, Commons, Outreach, Strategy etc., for "embassy" pages in various projects, and for other purposes.

Paragraph-level
The paragraph-level direction property is inherited from the direction of the page, but it must be possible to override it.

Inline
A common example of an inline element where a direction setting would be useful is the English spelling of a monarch's name in an Arabic Wikipedia article about the monarch or a quote from the Hebrew Bible in an English article. Most likely this would be an HTML , but it can also be (in HTML5), and other elements. It is likely that it would be applied using a template.

Usage of control characters
Several transparent control characters are used for controlling directionality:
 * RLM / LRM - transparent characters with strong directionality. It's easy to understand them as Hebrew and Latin letters, respectively, but invisible. They are used for isolating inline pieces of text with ambiguous directionality, for example a Latin letter from a following number in a right-to-left paragraph. Equivalent to the HTML entities $amp;$rlm; and $amp;$lrm;.
 * RLE / LRE - start an embedded inline block of right-to-left or left-to-right text that runs until a PDF character. Similar to /.
 * RLO / LRO - starts a section in which the Unicode bidirectional algorithm doesn't apply. That is, Hebrew letters will appear left-to-right. Equivalent to the HTML tag.
 * PDF - pop directional formatting. Ends a block of text started by RLE, LRE, RLO or LRO.

There's no practical use for RLE, LRE, RLO, LRO and PDF in text that can be marked up with HTML.

RLM and LRM are currently frequently used in MediaWiki, as there is no other common way to isolate pieces of text with ambiguous directionality. They are often inserted as a template (such as w:he:Template:כ). Implementation of the HTML5 tag may diminish the need for them, but currently they are needed.

Other considerations

 * Offline and PDF export: These features shouldn't be directly affected by this if parsing and rendering stays the same, but they must be tested anyway.
 * Web fonts.

Current bidirectional features in MediaWiki

 * Directionality support
 * Existing and resolved bugs related to right-to-left text

Related standards

 * Unicode Bidirectional Algorithm
 * Language information and text direction - Existing HTML4 standard.
 * Additional Requirements for Bidi in HTML - Prospective bidirectional features in HTML5.
 * Open Document Format for Office Applications (OpenDocument) v1.1 - a.k.a. ODF and ODT, used in LibreOffice, OpenOffice and other applications.