VisualEditor/Internals/CE

From mediawiki.org

What is the CE view?

The CE view handles rendering, selection and input. The state of the data model is rendered into ContentEditable HTML. Only a highly limited set of operations is allowed to occur without intervention. JavaScript listeners constantly watch the DOM for changes in the content, which are then sent to the model as transactions. Selection and input are normally allowed to happen natively, but many actions such as cursor movement or clipboard actions are overridden or quickly corrected. In certain cases selection and input are emulated in JavaScript.

The CE HTML is optimized to achieve a high level of software control over the ContentEditable functionality, which means it is more elaborate than the simple rendering that would represent the data model in HTML in the most straightforward way.

Why the CE view is technically challenging to code

There is a very strong need for VisualEditor to roundtrip cleanly. Each Wikipedia page has a potentially unlimited lifetime and can keep developing forever, with a huge total number of editors. (This is completely unlike most written content, for which one of those factors is usually limited.) Moreover, many of the editors edit in wikitext form. Therefore the editing system as a whole must be able to produce clean wikitext and clean diffs.

Plain contenteditable does not come close to meeting this requirement. Editing “Hello <i>world</i>” can very easily turn it into “Hello, <i>world</i>” or even “Hello, <span style="font-style: italic">world</span>”. Most JavaScript editing software can improve on this but still will not roundtrip and diff 100% cleanly, as this is not typically a core requirement for its use cases.

Wikipedia pages also make very heavy use of templates. There are block templates (whole paragraphs) as well as inline templates (within running text). Inline templates are the most technically problematic for VisualEditor: we can think of these as islands within the text that cannot be edited in the same way as normal text, so they need a custom editing experience. In larger wikis, articles will typically contain one or more inline templates within the first paragraph, so supporting them is a completely core requirement.

These requirements combine to mean VisualEditor must “second guess” the browser. VisualEditor must prevent, or “fix up”, native behaviour when it would change content in undesirable ways (e.g. breaking clean roundtripping or inserting unwanted markup). That means detecting when to perform fixups, and knowing what to fix up (which can vary between browsers).

Cursoring, selection and text input

During normal text input, VisualEditor uses “native selection”, i.e. there is an actual browser selection within the contentEditable. But at certain other times, it uses “emulated selection”. For example, if the user selects an image or a template, the browser selection is cleared and instead VisualEditor “emulates” selection markers and shading on the selected object.

The same applies to cursoring: during normal text input, the browser native behaviour handles cursor key presses, mouse clicks and mobile selection handle drags. But in certain circumstances VisualEditor intercepts the event and programmatically determines how to modify the selection.

Input methods present engineering challenges for cursoring and selection. There are over 300 language versions of Wikipedia, but even more languages are used within wikis (e.g. Egyptian hieroglyphs). Different languages have radically different input methods (ways of inputting text). Unfortunately the API between the input method software and the web browser is very rudimentary right now, and there are a bewildering number of variations between browser + OS + input method software combinations. Most of the time, it is unsafe to fix up text while the user is actively composing text in the input method: technically speaking, the browser events are “uncancellable”, and attempting to change the text during the event can even cause the input method software to crash.

Bidirectionality is another issue. Certain languages such as Arabic and Hebrew are written right to left (RTL) but can have “islands” of left to right (LTR) text, such as Latin words or numerals. This bidirectional text presents engineering challenges because selections and cursoring can work either logically (in the order the reader would read) or visually (in the order the text appears on screen). This can make it impossible to predict in which direction a native cursor movement will go, and therefore also impossible to emulate the behaviour in JavaScript.

Grapheme clusters are sequences of multiple logical characters that form a single visual character on screen, for example in Indic scripts. Cursor movements typically cross the entire grapheme cluster in one go (so the cursor cannot land inside the cluster), but this depends on the script and even the font. It is therefore impossible to predict how far a native cursor movement will jump, and so also impossible to emulate the behaviour in JavaScript.

Since VisualEditor cannot reliably detect when the IME window may be open, or whether certain text in the document is tentative “IME candidate text” that the user may not be ready to commit, most of the time it is unsafe for VisualEditor to modify the text or selection programmatically. However, in many circumstances it is always required to modify the text or selection programmatically:

  • cut/paste
  • drag/drop
  • pressing Enter
  • deletion into alienated content etc.
  • Many UI actions (e.g. applying Italic)

In other circumstances, it is sometimes required to modify the text or selection:

  • Cursoring (if the cursor has landed in the wrong place)
  • Cursor selection (if the selection has landed in the wrong place)
  • Delete/backspace on/over a branch node
  • Mouse selection (when a FocusableNode is involved)

These are opposing constraints: modifications are often unsafe but often required, and it is not always possible to determine which situation currently applies. This makes it extremely challenging to write correct fixup behaviour.
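The grapheme cluster issue mentioned earlier in this section can be seen in a few lines. The following is a minimal sketch, not VisualEditor code: it uses the standard Intl.Segmenter API (available in modern browsers and in recent Node.js versions), and the helper name countGraphemes is hypothetical. It only reports Unicode's default cluster boundaries; as noted above, where a native cursor actually stops can also depend on the script and the font.

```javascript
// Count grapheme clusters (user-perceived characters) in a string.
// countGraphemes is a hypothetical helper, for illustration only.
function countGraphemes( text ) {
	const segmenter = new Intl.Segmenter( undefined, { granularity: 'grapheme' } );
	return Array.from( segmenter.segment( text ) ).length;
}

// 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code units, one cluster.
const accented = 'e\u0301';
// Devanagari NA (U+0928) followed by vowel sign I (U+093F).
const devanagari = '\u0928\u093F';

console.log( accented.length, countGraphemes( accented ) );
console.log( devanagari.length, countGraphemes( devanagari ) );
```

A string offset can therefore point inside a cluster even though the cursor can never visibly rest there, which is one reason offsets computed in code and positions reached by native cursoring can disagree.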

On the other hand, it is safe to update the data model from the CE view as long as this will not affect the CE content/selection. We do this in the following situations:

  • Keydown / keypress / input / keyup events
  • On ve.ce.SurfaceObserver's poll timer (for eventless changes e.g. from spellchecker)
  • On mutation events (unmerged prototype)

DM-CE tree synchronization

The purpose of the CE view is to show the state of the data model, and make it possible to edit the data model. An edit can originate in either of two places: in the native contentEditable or in the data model.

Changes originating in the native contentEditable

The ve.ce.SurfaceObserver periodically polls the active block node of the native contentEditable to determine whether its text has changed in a way that needs to be synchronized to the data model. If so, the CE view builds a transaction or selection change and passes it to the data model in ve.ce.Surface#changeModel. This method disables synchronizing the model back to the CE, to avoid never-ending feedback loops (“event storms”) when the CE surface is already correct.
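The feedback-loop guard described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the real implementation: ModelSketch, SurfaceSketch, the renderingLocked flag and the plain-string "document" are all hypothetical stand-ins, and a real transaction is replaced by a simple text assignment.

```javascript
// Hypothetical stand-in for the data model: holds text, notifies listeners.
class ModelSketch {
	constructor( text ) {
		this.text = text;
		this.listeners = [];
	}

	onChange( listener ) {
		this.listeners.push( listener );
	}

	setText( text ) {
		this.text = text;
		// Listeners are called synchronously
		this.listeners.forEach( ( listener ) => listener( text ) );
	}
}

// Hypothetical stand-in for the CE surface.
class SurfaceSketch {
	constructor( model, domText ) {
		this.model = model;
		this.domText = domText; // stands in for the contentEditable's text
		this.renderingLocked = false;
		this.renderCount = 0;
		model.onChange( ( text ) => this.onModelChange( text ) );
	}

	// Model -> CE: re-render, unless the change came from the CE itself.
	onModelChange( text ) {
		if ( this.renderingLocked ) {
			return;
		}
		this.domText = text;
		this.renderCount++;
	}

	// CE -> model: push an observed change without rendering it back.
	changeModel( text ) {
		this.renderingLocked = true;
		try {
			this.model.setText( text );
		} finally {
			this.renderingLocked = false;
		}
	}

	// One poll: compare the DOM text with the model and sync if needed.
	poll() {
		if ( this.domText !== this.model.text ) {
			this.changeModel( this.domText );
			return true;
		}
		return false;
	}
}

const model = new ModelSketch( 'Hello' );
const surface = new SurfaceSketch( model, 'Hello' );
surface.domText = 'Hello!'; // the user typed natively
console.log( surface.poll(), model.text, surface.renderCount ); // true 'Hello!' 0
model.setText( 'Hi' ); // a model-originated change does re-render
console.log( surface.domText, surface.renderCount ); // Hi 1
```

The point of the lock is that a CE-originated change reaches the model without the already-correct DOM being rewritten, which would otherwise risk disturbing the native selection or an in-progress input method composition.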

Changes originating in the data model

When a transaction is applied to the data model, it splices the change into the linear model and each change triggers any necessary DM tree mutations, step by step (which is possible because of the algorithm explained in VisualEditor/Internals/DM/TreeModifier ).

Each DM tree mutation fires synchronous events to corresponding CE tree nodes. This means the CE tree gets mutated immediately, before control returns to the DM and the next tree mutation is processed.

This means the linear model, the DM tree and the CE tree all mutate in lockstep. This is a very useful guarantee; for example, as a CE tree node mutates, it can look into its corresponding DM tree node, and the linear data, to determine how to render. The whole tree update mechanism fundamentally depends on a synchronous event paradigm.
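The lockstep guarantee can be illustrated with a toy synchronous event scheme. This is a sketch under assumptions, not VisualEditor's actual classes: DmNodeSketch, CeNodeSketch and the string "nodes" are hypothetical, and the only thing it demonstrates is the ordering that synchronous events give.

```javascript
// Hypothetical miniature of a DM node that notifies its CE node synchronously.
class DmNodeSketch {
	constructor() {
		this.children = [];
		this.handlers = [];
	}

	connect( handler ) {
		this.handlers.push( handler );
	}

	splice( index, removeCount, ...items ) {
		this.children.splice( index, removeCount, ...items );
		// Synchronous: handlers finish before splice() returns
		this.handlers.forEach( ( handler ) => handler( index, removeCount, items ) );
	}
}

// Hypothetical CE node that mirrors the DM node's children as "rendered" strings.
class CeNodeSketch {
	constructor( dmNode, log ) {
		this.children = [];
		dmNode.connect( ( index, removeCount, items ) => {
			// The DM node is already in its new state here, so rendering
			// code could safely read from it at this point.
			const rendered = items.map( ( item ) => '<' + item + '>' );
			this.children.splice( index, removeCount, ...rendered );
			log.push( 'ce:' + this.children.join( '' ) );
		} );
	}
}

const log = [];
const dm = new DmNodeSketch();
const ce = new CeNodeSketch( dm, log );

// Two tree mutations, applied one after the other
[ [ 0, 0, 'p' ], [ 1, 0, 'h2' ] ].forEach( ( args ) => {
	dm.splice( ...args );
	log.push( 'dm:' + dm.children.join( ',' ) );
} );

console.log( log ); // [ 'ce:<p>', 'dm:p', 'ce:<p><h2>', 'dm:p,h2' ]
```

Each "ce:" entry appears before the matching "dm:" entry, showing that the CE side has finished updating before control returns to the code that mutated the DM side. With an asynchronous scheme, both DM mutations could complete before the first CE update runs, so the CE handler could no longer assume the DM tree matches the step it is rendering.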

When VueJS was adopted as the preferred JavaScript framework for use in MediaWiki, research was done on modifying VisualEditor to use VueJS instead of the current OOUI toolkit. VueJS uses an asynchronous paradigm, and the research found a significant “impedance mismatch” between this paradigm and VisualEditor’s fundamental dependence on synchronous events. See https://phabricator.wikimedia.org/T287214.

ve.ce.SurfaceObserver

The SurfaceObserver periodically polls the native contentEditable content and compares it to the data model, to see if a contentEditable-originated change needs to be propagated. It uses ve.ce.TextState and ve.ce.RangeState objects to compare text content and selections respectively.

There are three types of change that can be detected:

  • contentChanged
  • selectionChanged
  • branchNodeChanged

ve.ce.Surface#handleObservedChanges is called to generate the corresponding data model transactions / selection changes.
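As a rough illustration of how the three change types could be derived from two successive snapshots, here is a minimal sketch. The snapshot shape ({ branchNodeId, text, selection }) and the function compareStates are assumptions made for this example; the real ve.ce.TextState and ve.ce.RangeState objects hold richer information.

```javascript
// Compare two hypothetical snapshots of the surface and report what changed.
// Snapshot shape (an assumption): { branchNodeId, text, selection: { start, end } }
function compareStates( oldState, newState ) {
	const branchNodeChanged = oldState.branchNodeId !== newState.branchNodeId;
	return {
		branchNodeChanged,
		// Text is only comparable while the same branch node is active
		contentChanged: !branchNodeChanged && oldState.text !== newState.text,
		selectionChanged:
			oldState.selection.start !== newState.selection.start ||
			oldState.selection.end !== newState.selection.end
	};
}

const before = { branchNodeId: 'p1', text: 'Hello', selection: { start: 5, end: 5 } };
const afterTyping = { branchNodeId: 'p1', text: 'Hello!', selection: { start: 6, end: 6 } };
const afterArrowDown = { branchNodeId: 'p2', text: 'World', selection: { start: 10, end: 10 } };

console.log( compareStates( before, afterTyping ) );
console.log( compareStates( afterTyping, afterArrowDown ) );
```

In this sketch, typing a character yields contentChanged and selectionChanged, while moving the cursor into another paragraph yields branchNodeChanged and selectionChanged but no contentChanged.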

Unicorns

When the user applies an inline annotation (e.g. Italic) but no text is selected, VisualEditor needs to insert a corresponding empty tag pair (e.g. <i></i>) into the contentEditable. Unfortunately such tags can disappear immediately because the browser normalizes them away. So VisualEditor keeps them in existence by inserting temporary 1px images that we call unicorns. In debug mode, the 1px images are replaced by literal unicorn icons, so the developer can see what is happening. Unicorns exist only in the CE view (not the data model).

Unicorns are created by ve.ce.ContentBranchNode.js#getRenderedContents, which includes unicorns where pre-annotation so requires. They are registered with the CE surface so they can be removed when no longer needed.

This is called by ve.ce.Surface.js#onInsertionAnnotationsChange, which re-renders the selected content branch node in case unicorns are needed.

ve.ce.Surface#afterDocumentKeyDown removes unicorns (and fixes up the cursor position) if an arrow key was pressed and the unicorns can be removed. Fixing up cursor positions is challenging because of bidirectionality: see ve.ce#nextCursorOffset.

Slugs and chimeras

Positions adjacent to certain elements in native contentEditable do not accept the cursor properly. To work around this, the CE view inserts slugs: either block slugs (a div between branch nodes) or inline slugs (a span within a ContentBranchNode).

Chimeras are single, unicorn-like images, used in inline slugs. Text typed into the slug ends up inside the span, either before or after the chimera.

Once there is text content that renders the slug unnecessary, it will be removed at the next re-render.

Link cartouches and annotation nails

In native contentEditable, the boundaries of most annotations (like <i> or <a>) are “soft”, meaning it may not be well-defined whether the cursor at the edge of the annotation is inside or outside. Browsers tend not to extend links if the user types at the link boundary. This is a particular problem with links, because the user typing new text including a link usually wants the link to continue to the end of the word, and then stop. Previous versions of VisualEditor would automatically end the link when the user pressed Space, but that was found to cause problems with input method software, as outlined in the introduction above.

In modern VisualEditor, links have a “hard” boundary that must be cursored across to enter/leave. The link is highlighted with a CSS “cartouche” when the cursor is inside the link. When inside a link at either end, typing always extends the link.

The cartouche is implemented with four nails (special img nodes) to bookend each link annotation: two at the start, and two at the end. See ve.ce.NailedAnnotation. This makes the native behaviour work almost perfectly, except that there is an unwanted extra cursor position at each end. ve.ce.Surface#fixupCursorPosition is carefully written to fix this in an IME-safe, bidirectionality-aware way.

“Snowman” placeholders

Text changes originating in a native contentEditable ContentBranchNode are converted into data model transactions, by using ve.ce.TextState#getChangeTransaction to perform a text diff. But a ContentBranchNode can contain non-text content such as images and templates. These are represented within the diffed text with U+2603 placeholder characters. U+2603 ☃ was chosen because it is the SNOWMAN emoji, which is unlikely to be used in real text, so if users report spurious snowmen appearing, we know something is going wrong with the text diffing.
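The placeholder idea can be sketched as follows. This is an illustration under assumptions, not the actual ve.ce.TextState#getChangeTransaction logic: the item list, toDiffText and diffText are hypothetical, and the diff shown is a simple common-prefix/common-suffix comparison.

```javascript
const SNOWMAN = '\u2603';

// Flatten hypothetical content items into diffable text: strings are kept,
// and every non-text item (image, template, ...) becomes one placeholder.
function toDiffText( items ) {
	return items.map( ( item ) => ( typeof item === 'string' ? item : SNOWMAN ) ).join( '' );
}

// Common-prefix/common-suffix diff, described as retain/remove/insert/retain.
function diffText( oldText, newText ) {
	const maxLen = Math.min( oldText.length, newText.length );
	let start = 0;
	while ( start < maxLen && oldText[ start ] === newText[ start ] ) {
		start++;
	}
	let end = 0;
	while (
		end < maxLen - start &&
		oldText[ oldText.length - 1 - end ] === newText[ newText.length - 1 - end ]
	) {
		end++;
	}
	return {
		retainStart: start,
		remove: oldText.slice( start, oldText.length - end ),
		insert: newText.slice( start, newText.length - end ),
		retainEnd: end
	};
}

const image = { type: 'inlineImage' }; // hypothetical non-text item
const oldText = toDiffText( [ 'See ', image, ' here' ] );
const newText = toDiffText( [ 'See ', image, ' over here' ] );

console.log( oldText ); // See ☃ here
console.log( diffText( oldText, newText ) );
// { retainStart: 6, remove: '', insert: 'over ', retainEnd: 4 }
```

In this sketch each non-text item occupies exactly one character of the diffed text, so the retained offsets still count the image as a single position and the computed change never touches it.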

Focusable nodes

A focusable node is an element for which, when selected, VisualEditor will clear the native selection and emulate selection shading and handles. Examples include images and templates.

Focusable nodes receive special treatment from ve.ce.Surface. When the user selects only a single node, if it is focusable, the surface will set the focusable node's focused state.
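The rule in the previous paragraph can be sketched as a small function. The node shape ({ name, focusable, focused }) and the function updateFocus are assumptions made for illustration; in the real surface, setting the focused state is also where the native selection would be cleared and the emulated shading drawn.

```javascript
// Decide which node (if any) should be focused for the current selection.
// selectedNodes: hypothetical list of nodes fully covered by the selection.
// previouslyFocused: the node focused before this selection change, or null.
function updateFocus( selectedNodes, previouslyFocused ) {
	if ( previouslyFocused ) {
		previouslyFocused.focused = false;
	}
	if ( selectedNodes.length === 1 && selectedNodes[ 0 ].focusable ) {
		selectedNodes[ 0 ].focused = true;
		return selectedNodes[ 0 ];
	}
	return null;
}

const imageNode = { name: 'image', focusable: true, focused: false };
const textNode = { name: 'text', focusable: false, focused: false };

let focusedNode = updateFocus( [ imageNode ], null );
console.log( focusedNode === imageNode, imageNode.focused ); // true true

focusedNode = updateFocus( [ textNode ], focusedNode );
console.log( focusedNode, imageNode.focused ); // null false
```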

Tables

Copy and paste

Aliens and generated content

Input methods

The CE surface

Classes: ve.ce.Surface

See also: ve.ce.SurfaceObserver, ve.ce.RangeState, ve.ce.TextState, ve.ce.*KeyDownHandler.

This is the CE equivalent of the DM surface. It is easily the most complex CE class, because it contains tricks, hacks and workarounds for a lot of quirky browser behaviour.