User:PerfektesChaos/WikidiffLX

The current standard diff (Wikidiff2) suffers from some limitations. The implementation can be improved somewhat without a major decrease in performance. A million users worldwide should not be startled by an entirely new appearance. The basic algorithm (a line-based diff engine) is quite fast and should be kept. However, the preparation of the input and the display of the results can be enhanced.

Six improvements are suggested.

Performance is not expected to suffer. The suggested code is longer, but it saves the existing and untouched diff engine a lot of work, e.g. by hiding empty lines. Showing the differences in a better way should justify even a slight increase in resource usage.

All suggestions have been implemented recently. For some simple cases the new code has already been tested locally. More sophisticated examples now need to be designed in order to detect obstacles. A statement on measured performance can't be made at this stage.

Avoid confusion by empty lines
Objective: If an “empty line” (maybe with some invisible content) has been inserted or removed by the author and the adjacent paragraphs are modified in some way, the presentation of the result is currently disturbed.

Example:

Could be presented as: more…
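
The idea of hiding empty lines from the diff engine can be illustrated with a minimal Python sketch. It is not the WikidiffLX code: `difflib.SequenceMatcher` merely stands in for the Wikidiff2 line engine, and the names `INVISIBLE`, `is_empty` and `diff_ignoring_empty` as well as the selection of invisible characters are assumptions made for illustration.

```python
import difflib

# Characters regarded as invisible content of an "empty line" (an assumption).
INVISIBLE = " \t\u00a0\u200b\u200e\u200f"


def is_empty(line: str) -> bool:
    """True if the line has no visible content."""
    return line.strip(INVISIBLE) == ""


def visible_lines(lines):
    """Return (original index, line) pairs for all lines with visible content."""
    return [(i, ln) for i, ln in enumerate(lines) if not is_empty(ln)]


def diff_ignoring_empty(old, new):
    """Diff visible lines only, but report the original line indices."""
    old_vis = visible_lines(old)
    new_vis = visible_lines(new)
    matcher = difflib.SequenceMatcher(
        a=[ln for _, ln in old_vis],
        b=[ln for _, ln in new_vis],
        autojunk=False,
    )
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ops.append((
            tag,
            [old_vis[i][0] for i in range(i1, i2)],
            [new_vis[j][0] for j in range(j1, j2)],
        ))
    return ops
```

With `old = ["First paragraph.", "Second paragraph."]` and `new = ["First paragraph, edited.", " ", "Second paragraph."]`, only the first paragraph is reported as replaced; the inserted blank line never reaches the engine and cannot shift the alignment.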

Improve modified consecutive lines
Objective: The line-based algorithm makes minor corrections in consecutive lines appear as one dramatic change.

Example: results currently in:

This shall be made more readable: more…
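
One conceivable way to make such a block more readable is to pair the removed and added lines one by one and to compare each pair word by word. The following Python sketch only illustrates that idea under stated assumptions: the pairing rule (equal block sizes, a similarity threshold of 0.6) and the function names are hypothetical and not taken from the WikidiffLX code.

```python
import difflib


def pair_changed_lines(removed, added, threshold=0.6):
    """Pair removed[i] with added[i] if the block sizes match and every
    pair is similar enough; otherwise return None (keep block display)."""
    if len(removed) != len(added):
        return None
    pairs = []
    for old, new in zip(removed, added):
        ratio = difflib.SequenceMatcher(
            a=old.split(), b=new.split(), autojunk=False
        ).ratio()
        if ratio < threshold:
            return None
        pairs.append((old, new))
    return pairs


def changed_words(old, new):
    """Return the (old words, new words) groups that differ within a pair."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

If pairing succeeds, each pair can be shown as one modified line with only the changed words highlighted, instead of a block of deleted lines followed by a block of added lines.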

Improve visualization of context lines
Objectives: Currently the two lines preceding and the two lines following a changed block are displayed, whatever they contain, to give an impression of the unchanged context in which the block is located.
 * 1) If one or both of these lines are empty, they are put into the HTML source but are invisible and not informative.
 * 2) If these lines are very long (sometimes 1000 bytes and more each), both paragraphs are displayed anyway, making the output lengthy and hard to survey.
The suggested code has a slightly different behaviour:
 * 1) The nearest non-empty (visible) lines will be shown.
 * 2) Not two paragraphs but the next two unchanged virtual lines (each expected to be at least 100 bytes, if not a full paragraph) will be displayed.
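
The context selection described above might look roughly like the following Python sketch. It rests on assumptions: a "virtual line" is taken to be a chunk of at least `min_len` characters (not bytes) that ends at the next space, and the function names are hypothetical. Only the context preceding a change is shown; the following context would work the same way in the other direction.

```python
def virtual_lines(paragraph, min_len=100):
    """Split a paragraph into chunks of at least min_len characters,
    breaking at the first space after min_len; the last chunk may be shorter."""
    chunks = []
    start = 0
    while start < len(paragraph):
        cut = paragraph.find(" ", start + min_len)
        if cut == -1:
            chunks.append(paragraph[start:])
            break
        chunks.append(paragraph[start:cut])
        start = cut + 1
    return chunks


def context_before(lines, index, count=2, min_len=100):
    """Collect up to `count` virtual lines preceding lines[index],
    skipping lines without visible content."""
    result = []
    i = index - 1
    while i >= 0 and len(result) < count:
        if lines[i].strip():
            for chunk in reversed(virtual_lines(lines[i], min_len)):
                result.insert(0, chunk)
                if len(result) == count:
                    break
        i -= 1
    return result
```

For a short preceding paragraph the whole paragraph is returned; for a very long one only its last two chunks are returned, so the context stays compact.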

Visualize space-only differences
Objective: The current function doesn’t show space-only differences. Readers find identical black text and may guess that the reason is a superfluous space character somewhere, or perhaps a period changed into a comma?

Example: old: The old lady looks  confused. new: The young girl looks confused. more…
 * Make space-only differences visible, including leading/trailing space.
 * Enable the reader to count the number of spacing characters.
 * Don’t confuse the reader with space-▯ if there are visible changes: If one of the adjacent words is already red, the space difference is negligible.
 * Show not only a differing number of spacing characters, but also differing types (currently only: ASCII Space U+0020 and HorTab U+0009, but there are many more spaces like U+2004-200A).
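
The points above can be sketched in Python as follows. This is an illustration under assumptions, not the WikidiffLX implementation: the marker characters (▯ for U+0020, → for U+0009) and the function names are chosen freely, and only spacing runs that actually differ are marked.

```python
import re

# Visible stand-ins for spacing characters (hypothetical choice of markers).
MARKERS = {" ": "\u25af", "\t": "\u2192"}


def split_spacing(text):
    """Return (words, runs): runs[k] is the spacing before words[k] and
    runs[-1] is the trailing spacing, so len(runs) == len(words) + 1."""
    words = re.findall(r"[^ \t]+", text)
    runs = re.split(r"[^ \t]+", text)
    return words, runs


def mark(run):
    """Replace every spacing character by its marker so that it can be counted."""
    return "".join(MARKERS[ch] for ch in run)


def show_space_only_diff(old, new):
    """Return marked-up (old, new) if the texts differ in spacing only, else None."""
    old_words, old_runs = split_spacing(old)
    new_words, new_runs = split_spacing(new)
    if old == new or old_words != new_words:
        return None  # identical, or visible changes make spacing negligible

    def render(words, runs, other_runs):
        parts = []
        for k, run in enumerate(runs):
            parts.append(mark(run) if run != other_runs[k] else run)
            if k < len(words):
                parts.append(words[k])
        return "".join(parts)

    return (render(old_words, old_runs, new_runs),
            render(new_words, new_runs, old_runs))
```

For `"The lady looks  confused."` against `"The lady looks confused."` the sketch yields `"The lady looks▯▯confused."` and `"The lady looks▯confused."`, so the reader can count the spaces. If any word differs, nothing is marked.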

Visualize non-ASCII spaces
Objective: Other spaces shall be treated like the ASCII space. This applies to both word splitting and the display of modifications.

Affected unicodes:
 * 2002;EN SPACE
 * 2003;EM SPACE
 * 2004;THREE-PER-EM SPACE
 * 2005;FOUR-PER-EM SPACE
 * 2006;SIX-PER-EM SPACE
 * 2007;FIGURE SPACE
 * 2008;PUNCTUATION SPACE
 * 2009;THIN SPACE
 * 200A;HAIR SPACE
more…
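
Treating these code points like the ASCII space during word splitting could be sketched in Python as below. The character class covers U+0020, U+0009 and the range U+2002 to U+200A listed above; the names `SPACE_CLASS`, `split_words` and `describe_spaces` are hypothetical and serve only as illustration.

```python
import re
import unicodedata

# ASCII space, tab, and the range U+2002..U+200A, written as a regex class body.
SPACE_CLASS = " \t\u2002-\u200a"
WORD = re.compile(f"[^{SPACE_CLASS}]+")
IS_SPACE = re.compile(f"[{SPACE_CLASS}]")


def split_words(text):
    """Split text into words, treating all listed spaces as separators."""
    return WORD.findall(text)


def describe_spaces(text):
    """Name every spacing character found, so that its type can be displayed."""
    return [
        # Control characters such as the tab have no Unicode name.
        f"U+{ord(ch):04X} {unicodedata.name(ch, '(unnamed)')}"
        for ch in text
        if IS_SPACE.fullmatch(ch)
    ]
```

A thin space inside a number, as in `"10\u2009000 km"`, then separates words just like an ordinary space, and `describe_spaces` can tell the reader which kind of space was used.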

Visualize zero-width differences
Objective: There are modified words where the modification remains invisible, since a zero-width character has been added or removed. The user encounters two red words without any visible difference.

Affected unicodes:
 * 00AD;SOFT HYPHEN  &shy;
 * 200B;ZERO WIDTH SPACE
 * 200C;ZERO WIDTH NON-JOINER   &zwnj;
 * 200D;ZERO WIDTH JOINER  &zwj;
 * 200E;LEFT-TO-RIGHT MARK  &lrm;
 * 200F;RIGHT-TO-LEFT MARK  &rlm;
 * 202A;LEFT-TO-RIGHT EMBEDDING
 * 202B;RIGHT-TO-LEFT EMBEDDING
 * 202C;POP DIRECTIONAL FORMATTING
 * 202D;LEFT-TO-RIGHT OVERRIDE
 * 202E;RIGHT-TO-LEFT OVERRIDE

Example (invisible characters shown as HTML entities): old: Meaning&shy;less to &rlm;change&lrm; direction. new: Meaningless to change direction.

The users have no clue why these words are red, so they should be given one. Any red word is affected. Deleted and added lines are not considered. more…
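
Detecting such cases within a pair of red words could be sketched in Python as follows. The set of code points is the list given in this section; the function names and the `[U+XXXX]` notation for the revealed characters are assumptions made for illustration only.

```python
# Zero-width and directional characters from the list in this section.
ZERO_WIDTH = (
    {"\u00ad"}
    | {chr(c) for c in range(0x200B, 0x2010)}   # U+200B..U+200F
    | {chr(c) for c in range(0x202A, 0x202F)}   # U+202A..U+202E
)


def strip_zero_width(word):
    """Remove all zero-width characters from a word."""
    return "".join(ch for ch in word if ch not in ZERO_WIDTH)


def invisible_difference(old_word, new_word):
    """True if the two words differ only by zero-width characters."""
    return (old_word != new_word
            and strip_zero_width(old_word) == strip_zero_width(new_word))


def reveal(word):
    """Make zero-width characters visible as [U+XXXX] (hypothetical notation)."""
    return "".join(
        f"[U+{ord(ch):04X}]" if ch in ZERO_WIDTH else ch for ch in word
    )
```

Applied to the example, `invisible_difference("Meaning\u00adless", "Meaningless")` is true, and `reveal` turns the old word into `Meaning[U+00AD]less`, which gives the reader the missing clue.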

General remarks on diff presentation
There are two basic approaches for presenting differences to the user:
 * 1) Show the earlier and the later version side by side.
 * 2) Show the changes within one text and mark up deviating previous and current text inline. This was chosen by WikEdDiff and previously intended by Visual Diff.

Both methods have advantages and disadvantages, depending on the amount and type of changes. Experienced users might find it easy to understand minor modifications with the inline method, even though the context and grammar of a sentence in human language are lost if it is interrupted several times by insertions from the other version. Larger block movements and rearranged sequences of paragraphs are always a problem.

For new users dealing with archeology or football it is reasonable to present both versions undisturbed. Therefore it is a good choice to provide Wikidiff2/WikidiffLX as the standard method.

For those who are familiar with the interpretation of diff results, a change of the presentation type is welcome, to be configured as the default or chosen on a case-by-case basis. Using WikEd, the inline presentation can be requested whenever it is helpful.

The identifier chosen for this project, WikidiffLX, is derived from "Wikidiff Line approach eXtended".