Wikidiff2

Wikidiff2 is a native extension for PHP that provides a faster diff engine to MediaWiki. It is partly based on the original wikidiff, and partly on MediaWiki's DifferenceEngine class. It produces diffs from input text (line-based or word-level) and can format these as HTML or JSON.

Wikidiff2 supports character-level diffs for text composed of characters from the Japanese and Thai scripts and the unified Han characters, because Japanese, Chinese and Thai do not use spaces to separate words. It also supports Thai word segmentation for word-level diffs in that language. The input is assumed to be UTF-8 encoded. Invalid UTF-8 may cause undesirable behavior, such as truncation of the output, so the application should validate the input. The input text should have Unix-style line endings.
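The input requirements above (valid UTF-8 and Unix-style line endings) can be enforced before text is handed to the diff engine. The following is a minimal sketch in Python, used here only for illustration. The helper name `prepare_diff_input` is hypothetical and not part of Wikidiff2, and a PHP application would use equivalent checks.

```python
def prepare_diff_input(raw: bytes) -> str:
    """Validate UTF-8 and normalize line endings.

    Hypothetical helper for illustration; not part of Wikidiff2.
    """
    # Strict decoding raises UnicodeDecodeError on invalid UTF-8,
    # so malformed input is rejected before it reaches the diff engine.
    text = raw.decode("utf-8", errors="strict")
    # Convert CRLF and bare CR line endings to Unix-style LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Rejecting invalid input outright, rather than silently repairing it, keeps the failure visible to the caller instead of producing a truncated diff later.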

Debian or Ubuntu
On older versions of the package you may need to run a command to actually enable the extension:

Manually
First, get and compile libthai (it should be available in your OS or distribution's packages).

You can download wikidiff2 through git or by downloading a tarball from https://releases.wikimedia.org/wikidiff2/.

Then compile wikidiff2. You need phpize (shipped with PHP).

Make sure that your php option is set. This is usually set in your "php.ini" file.

Configuration
The following php.ini settings are supported:

wikidiff2.moved_line_threshold
Wikidiff2 estimates similarity of added and deleted lines based on changed character count. When the similarity of an added and deleted line is greater than this threshold, the lines are displayed as moved.

Range 0.0 .. 1.0. Default 0.4.

wikidiff2.change_threshold
Changed lines with a similarity value below this threshold will be split into a deleted line and an added line. This helps match up moved lines in some cases.

Range 0.0 .. 1.0. Default 0.2.
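To make the two thresholds concrete, the sketch below applies them to an already computed similarity value. It is written in Python for illustration only. The function names are hypothetical, and the way Wikidiff2 actually computes similarity from the changed character count is not reproduced here.

```python
MOVED_LINE_THRESHOLD = 0.4  # default of wikidiff2.moved_line_threshold
CHANGE_THRESHOLD = 0.2      # default of wikidiff2.change_threshold


def is_displayed_as_moved(similarity: float,
                          threshold: float = MOVED_LINE_THRESHOLD) -> bool:
    """An added/deleted line pair is shown as moved when its
    similarity is greater than the threshold (hypothetical helper)."""
    return similarity > threshold


def is_split_into_delete_and_add(similarity: float,
                                 threshold: float = CHANGE_THRESHOLD) -> bool:
    """A changed line is split into a deleted line and an added line
    when its similarity is below the threshold (hypothetical helper)."""
    return similarity < threshold
```

With the defaults, a pair with similarity 0.5 would be displayed as moved, while a changed line with similarity 0.1 would be split.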

wikidiff2.moved_paragraph_detection_cutoff
When the number of added and deleted lines in a table diff is greater than this limit, no attempt to detect moved lines will be made.

Default 100.

wikidiff2.max_word_level_diff_complexity
When comparing two lines for changes within the line, a word-level diff will be done unless the product of the LHS word count and the RHS word count exceeds this limit.

Default 40000000.
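The complexity check described for this setting can be sketched as follows. This is an illustrative Python sketch under the assumption that the check is a simple product comparison as stated above. The function name is hypothetical and this is not Wikidiff2's own code.

```python
MAX_WORD_LEVEL_DIFF_COMPLEXITY = 40_000_000  # default from php.ini


def word_level_diff_allowed(lhs_word_count: int,
                            rhs_word_count: int,
                            limit: int = MAX_WORD_LEVEL_DIFF_COMPLEXITY) -> bool:
    """Return True unless the product of the two word counts
    exceeds the limit (hypothetical helper for illustration)."""
    return lhs_word_count * rhs_word_count <= limit
```

For example, two lines of 6,000 words each give a product of 36,000,000, which is within the default limit, while two lines of 7,000 words each give 49,000,000, which exceeds it.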

MediaWiki
If the module is installed into PHP, MediaWiki will try to use it. See $wgExternalDiffEngine for configuration options.

HTML
The HTML diff (a number of HTML table rows with the rest of the document structure omitted) is available as a side-by-side or inline comparison. The characters "<", ">" and "&" are HTML-escaped in the output. In the Wikidiff2 C++ library, the side-by-side diff and the inline diff are each provided by their own class. Both classes include an execute method that returns the diff of the text passed in as parameters. These execute methods are also accessible through PHP wrapper functions, one for the side-by-side diff and one for the inline diff.
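The effect of the escaping rule can be illustrated with Python's standard library. This sketch only shows what escaping "<", ">" and "&" does to a line of wikitext. It is not how Wikidiff2 implements escaping, and the function name is hypothetical.

```python
import html


def escape_for_diff_output(text: str) -> str:
    """Escape "&", "<" and ">" as an HTML diff would show them
    (hypothetical helper for illustration)."""
    # quote=False limits escaping to "&", "<" and ">",
    # leaving quotation marks untouched.
    return html.escape(text, quote=False)
```

Because the markup characters are escaped, wikitext such as `<ref>` appears literally in the rendered diff rather than being interpreted by the browser.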

JSON
The JSON diff provides structured data to compose a visual, line-by-line comparison between two sets of text. In the Wikidiff2 C++ library, the JSON diff is provided by a dedicated class, which includes an execute method that returns the diff of the text passed in as parameters. This execute method is also accessible through a PHP wrapper function.

JSON diff schema

The JSON diff includes properties to identify changes between the two sets of text. For an example of a JSON diff, see the MediaWiki REST API compare revisions endpoint.

Links

 * Overview on the Wikidiff2 improvements by the WMDE tech team in 2017/18
 * How to release Wikidiff2 on the Wikimedia production system
 * Wikidiff2 on Phabricator