VisualEditor/Phase 2

The first phase of research and development for the VisualEditor is complete. The strategies explored and the technologies developed have provided much-needed clarity on the way forward. In an effort to bring these technologies to users for research purposes, some critical features were deferred, while others were implemented with limited functionality or known problems. In the next phase of research and development, we plan to design and prototype the deferred and incomplete features and resolve known issues in the current implementation.

Deferred features

Parser integration

A new Wikitext parser has been developed in parallel and is now mature enough that it can begin to be used together with the visual editor to edit existing Wikitext articles. The approach taken by the parser team diverged from what the editor team had anticipated. The parser does not directly produce a JSON representation (WikiDOM) as was originally conceived. Instead, the parser takes advantage of an HTML5 tree builder and produces an HTML DOM with wiki-specific information stored in data attributes.

The parser team also created an HTML DOM to WikiDOM converter to meet the expectations of the editor team. It has since become clear that WikiDOM is only an intermediate format between the HTML DOM and the linear data model used inside the editor, and that it could be bypassed by having the client produce a linear model directly from an HTML DOM. Doing so would also call for some minor changes to how the linear model works, bringing it conceptually closer to the HTML DOM.

Given these proposed changes, the linear model would be equivalent to a token stream of the HTML DOM at the block level. The linear model would still retain its approach to inline formatting, which encodes each character together with its formatting data. Only modest changes are required to bring the linear model closer to the HTML DOM, but aligning these data structures will greatly improve the simplicity and efficiency of the system.
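As a rough illustration of the idea, the following Python sketch flattens a small block-level tree into a token stream in which each formatted character carries its own formatting data. The `Node` class, the `linearize` function and the exact token shapes are hypothetical and chosen for this example only; they are not the editor's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an HTML DOM element or a run of text."""
    tag: str = ""              # element name; empty for a text run
    text: str = ""             # characters of a text run
    annotations: tuple = ()    # inline formatting applied to the text run
    children: list = field(default_factory=list)


def linearize(node):
    """Flatten a tree into a list of opening, character and closing tokens."""
    if not node.tag:
        # Text run: one item per character; formatted characters are
        # paired with their formatting data, plain ones stay bare.
        if node.annotations:
            return [(ch, node.annotations) for ch in node.text]
        return list(node.text)
    items = [{"type": node.tag}]
    for child in node.children:
        items.extend(linearize(child))
    items.append({"type": "/" + node.tag})
    return items


paragraph = Node("paragraph", children=[
    Node(text="a"),
    Node(text="b", annotations=("bold",)),
])
linear = linearize(paragraph)
```

Block structure becomes a pair of opening and closing tokens, as in an HTML token stream, while inline formatting stays attached to individual characters rather than becoming nested elements.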

Input method editor (IME)

To insert text in non-English languages, especially those that use non-Latin character sets, users depend on systems called input method editors (IMEs). An IME allows users to produce a single character from multiple keystrokes, and is implemented at the operating system level for many languages. For languages which are not supported by the operating system, an IME can be implemented in the browser. Because the editor does not use a browser-provided editing control, operating system IMEs may be difficult or impossible to integrate. Alternative implementations of the display layer which employ browser-provided editing controls naturally support IMEs, but have complex problems around capturing the input correctly. Both native and alternative display layers need to be prototyped further to assess their ability to provide IME support.

Floating elements

Block-level elements such as tables and images frequently take advantage of floating, a layout feature in web browsers. The editor does not currently support this functionality. Browser-provided editing controls do support this, but are limited in their ability to provide ideal user interactions when dragging and dropping the floated content to change its position. Both native and alternative display layers need to be prototyped further to assess their ability to provide floating support with sensible user interaction characteristics.

Bi-directional text

When combining left-to-right and right-to-left scripts in the same body of text, an editor must support bi-directional text editing. For example, in a paragraph containing English with Arabic text in the middle, selection and cursor advance should behave in reverse within the Arabic text only. Details of the requirements for bi-directional text editing have been provided by the internationalization team. Browser-provided editing controls may not provide a complete bi-directional text implementation on all platforms, and further research is still required to evaluate the ideal user experience when interacting with bi-directional text. Further research also needs to be conducted to assess the feasibility of implementing bi-directional text editing in the editor.

Selection on mobile devices

The editor does not currently use native selection, but instead mimics its interactive characteristics and appearance using DOM events and DOM elements. On mobile devices, due to smaller screens and touch-based user interfaces, selection looks and feels different. Additionally, each mobile platform implements selection differently, and these implementations evolve relatively quickly. Selection on mobile devices could be mimicked in the browser, but more research into the feasibility of retaining a high-quality user experience using this technique is needed.

Spell check

Checking for spelling errors as the user types has become a widely implemented feature in both rich and plain text editing controls across browsers and platforms. A dictionary and an efficient data structure will need to be generated, and could possibly be sourced from Wiktionary. Browser-provided editing controls use operating system provided spell checking systems, some of which make corrections automatically as the user types. Not all browsers provide events which notify the web application when corrections occur, either automatically or manually through a context menu. Further research needs to be performed to assess whether spell checking can be reliably and consistently implemented across browsers and platforms on both native and alternative display layers.

Incomplete features

Toolbars and Inspectors

The initial implementation of the toolbar and inspector system provided a prototype that can be tested with users, but will require more research and development before an API can be developed around it. The strategy has been to keep the user interface as lightweight as possible, employing in-place inspection where possible, while also using some well-known affordances to ease the transition for new users. More design work is required around where this line will continue to be drawn. The link inspector has a large number of bugs filed against it. Many of these are enhancement suggestions, most of which aim to bring the inspector up to the same feature level as the existing and widely deployed WikiEditor link dialog box. Some of the more subtle interactions involving selection and links need to be explored and refined. User testing needs to be conducted on existing designs to validate them before moving forward with development.

Cut/Copy/Paste

Only plain text can be copied and pasted at this time, preventing inline or structural formatting from being preserved when using the clipboard. The use cases around these behaviors (content importing, exporting, rearranging, etc.) need to be identified more clearly for prioritization to take place. Strategies exist to support rich text clipboard actions within the document, as well as between the document and external sources. These strategies need to be prototyped and evaluated against a set of priorities.

Broken features

Tree synchronization

The editor uses a linear data model which is indexed using a tree structure called the model tree. The user interface is derived directly from the model tree. When changes are made to the linear model through the application of transactions, the linear model and the model tree become out of sync. To update the user interface or prepare additional transactions, the model tree must be reconciled. The current approach updates the tree on a per-operation basis. This approach leads to inefficiencies in both the kinds of transactions that are allowed to be processed and the frequency at which the user interface is redrawn. Specifically, because synchronization occurs after each operation, operations must never leave the document in an unbalanced or otherwise invalid state. An example of where this becomes a problem is changing the type of an element while leaving its content intact.

One way to do this is with a transaction that retains up to the element, deletes the opening, inserts a new opening, retains the contents, deletes the closing, inserts the new closing and then retains the rest of the content following the element. This is the most efficient way to change the element type, because it replaces only the data that is being changed. However, because the document is in an invalid state after the second operation, this transaction results in a tree sync failure. The limitations of the current tree sync approach force a more expensive version of this transaction to be used instead, which retains up to the element, deletes the entire element and its contents, inserts a new element with a copy of the deleted element's contents and then retains all content following the element.
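The two transactions can be compared in a small Python sketch. The operation names (`retain`, `remove`, `insert`), the `apply` helper and the token shapes are assumptions made for this illustration and do not reflect the editor's actual API; the sketch also performs no tree synchronization, so it only shows that both transactions yield the same document.

```python
def apply(data, ops):
    """Apply retain/remove/insert operations to a list and return a new list."""
    out, cursor = [], 0
    for kind, arg in ops:
        if kind == "retain":
            # Copy `arg` items through unchanged.
            out.extend(data[cursor:cursor + arg])
            cursor += arg
        elif kind == "remove":
            # Skip the listed items, checking that they are really there.
            if data[cursor:cursor + len(arg)] != arg:
                raise ValueError("removed data does not match document")
            cursor += len(arg)
        elif kind == "insert":
            out.extend(arg)
        else:
            raise ValueError("unknown operation: " + kind)
    if cursor != len(data):
        raise ValueError("transaction does not cover the whole document")
    return out


P, P_END = {"type": "paragraph"}, {"type": "/paragraph"}
H, H_END = {"type": "heading"}, {"type": "/heading"}
doc = [P, "a", P_END, P, "b", "c", P_END, P, "d", P_END]

# Efficient version: only the opening and closing of the element are replaced.
minimal = [
    ("retain", 3), ("remove", [P]), ("insert", [H]),
    ("retain", 2), ("remove", [P_END]), ("insert", [H_END]),
    ("retain", 3),
]

# Expensive version: the whole element is removed and re-inserted with a copy
# of its contents.
wholesale = [
    ("retain", 3),
    ("remove", [P, "b", "c", P_END]),
    ("insert", [H, "b", "c", H_END]),
    ("retain", 3),
]

result_minimal = apply(doc, minimal)
result_wholesale = apply(doc, wholesale)
```

Both transactions produce the same document, but the second removes and re-inserts every item inside the element, so its cost grows with the size of the element's contents.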

A more appropriate approach would allow transactions to contain operations which leave the document in an invalid state, as long as the document is valid at the end of the transaction. By deferring tree synchronization until the end of the transaction, more efficient transactions can be created and applied, and fewer user interface updates will be initiated, some of which may previously have overlapped. Tree synchronization will need to be more efficient moving forward, and it is used regardless of which display layer approach is chosen.
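The difference between the two strategies can be sketched in Python as follows. Here a simple balance check stands in for tree synchronization, and the `commit` function, its `defer_sync` flag and the token shapes are hypothetical names introduced for this example only.

```python
def is_balanced(data):
    """Return True if every opening token has a matching closing token."""
    stack = []
    for item in data:
        if isinstance(item, dict):
            name = item["type"]
            if name.startswith("/"):
                if not stack or stack.pop() != name[1:]:
                    return False
            else:
                stack.append(name)
    return not stack


def commit(data, ops, defer_sync):
    """Apply ops and return (new_data, number_of_syncs).

    With defer_sync=False the document is checked after every operation;
    with defer_sync=True it is checked once, after the last operation.
    """
    done, cursor, syncs = [], 0, 0
    for kind, arg in ops:
        if kind == "retain":
            done.extend(data[cursor:cursor + arg])
            cursor += arg
        elif kind == "remove":
            cursor += len(arg)
        elif kind == "insert":
            done.extend(arg)
        else:
            raise ValueError("unknown operation: " + kind)
        if not defer_sync:
            syncs += 1
            # Document as it stands mid-transaction: processed part plus
            # the part not yet visited.
            if not is_balanced(done + data[cursor:]):
                raise ValueError("tree sync failure after " + kind)
    if defer_sync:
        syncs += 1
        if not is_balanced(done + data[cursor:]):
            raise ValueError("tree sync failure at end of transaction")
    return done + data[cursor:], syncs


P, P_END = {"type": "paragraph"}, {"type": "/paragraph"}
H, H_END = {"type": "heading"}, {"type": "/heading"}
doc = [P, "x", P_END, P, "a", P_END]
ops = [
    ("retain", 3), ("remove", [P]), ("insert", [H]),
    ("retain", 1), ("remove", [P_END]), ("insert", [H_END]),
]

deferred_result, deferred_syncs = commit(doc, ops, defer_sync=True)
```

With `defer_sync=True` the transaction succeeds with a single check, whereas calling `commit(doc, ops, defer_sync=False)` raises an error as soon as the opening is removed, mirroring the failure described above.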

Attribute operations

The current linear model's transaction processing system does not support replacing an attribute value. This causes a problem when set operations are used, because the previous value is not stored in the transaction and the operation therefore cannot be reversed.
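One possible remedy, sketched below in Python, is an attribute operation that records both the old and the new value so that it can be inverted. The operation shape (`key`, `from`, `to`) and the function names are assumptions made for this illustration, not an existing API.

```python
def apply_attribute(element, op):
    """Return a copy of element with one attribute replaced.

    op is a dict with "key", "from" and "to"; None means the attribute
    is absent on that side of the change.
    """
    attrs = dict(element.get("attributes", {}))
    if attrs.get(op["key"]) != op["from"]:
        raise ValueError("attribute does not have the expected previous value")
    if op["to"] is None:
        attrs.pop(op["key"], None)
    else:
        attrs[op["key"]] = op["to"]
    return {**element, "attributes": attrs}


def invert(op):
    """Build the operation that undoes op by swapping its two values."""
    return {"key": op["key"], "from": op["to"], "to": op["from"]}


heading = {"type": "heading", "attributes": {"level": 1}}
op = {"key": "level", "from": 1, "to": 2}

changed = apply_attribute(heading, op)
restored = apply_attribute(changed, invert(op))
```

Because the previous value travels with the operation, the inverse can be built from the operation alone, without consulting the document.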