User:Dragons flight/EditSyntax

EditSyntax is an experimental tool and notation for expressing the revision histories used in data dumps and page exports more compactly.

Specifically, for article histories that contain multiple revisions, EditSyntax provides a mechanism for expressing each subsequent revision as a series of changes (i.e., "edits") to the preceding revision, rather than requiring that the full text of each revision be included in the dump. This is accomplished by adding a series of new XML-compatible tags that define the necessary changes.

The current version of the tool is implemented in Python as editSyntax.py, and the accompanying sample code ConvertToEditSyntax.py and ConvertFromEditSyntax.py demonstrate converting XML dump files to and from this notation.

The tag defines the EditSyntax block and replaces the tag within the block.

The changes applied are relative to the immediately preceding revision.

When applying changes, the text is broken into lines defined by the newline character. These form an array with initial index 0. The array is recalculated after each operation, so line numbering will change as lines are added/deleted.
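The line model described above can be sketched in Python as follows. This is only an illustration: the helper names split_lines and join_lines are hypothetical and are not taken from editSyntax.py.

```python
def split_lines(text):
    """Break revision text into a zero-indexed list of lines."""
    # Splitting on "\n" keeps a trailing empty string when the text ends
    # with a newline, so joining the list restores the text exactly.
    return text.split("\n")


def join_lines(lines):
    """Rebuild the revision text from its list of lines."""
    return "\n".join(lines)


lines = split_lines("first\nsecond\nthird")
# lines[0] is "first"; indices shift as soon as a line is added or deleted.
del lines[0]
# "second" is now line 0 and "third" is line 1.
```

Because the list is rebuilt after every operation, each edit must use line numbers that are valid at the moment it is applied, not the numbering of the original revision.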

The tag inserts a new line immediately before the line referenced by its "line" keyword.

Trains are expensive.

The inserted text may contain newline characters, allowing the effect of inserting multiple lines.
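A minimal sketch of the insert operation, assuming the revision is held as a Python list of lines. The function name insert_before and its parameters are hypothetical and do not come from editSyntax.py.

```python
def insert_before(lines, line, text):
    """Insert text immediately before index `line`.

    Newline characters inside `text` produce several inserted lines.
    """
    return lines[:line] + text.split("\n") + lines[line:]


old = ["Cars are cheap.", "Planes are fast."]
new = insert_before(old, 1, "Trains are expensive.")
# new is ["Cars are cheap.", "Trains are expensive.", "Planes are fast."]
```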

The tag will replace one or more existing lines with new text.

Trains are expensive. OR Mary had a little lamb.

The existing line (or lines) are removed and the new text is left in their place. The replacement text may also contain newline characters if multiple lines are being inserted.
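Line replacement can be sketched as below. How the tag identifies the lines to replace is not shown here, so this sketch assumes a starting index and a count of lines. Both parameters, and the name replace_lines, are hypothetical.

```python
def replace_lines(lines, start, count, text):
    """Replace `count` lines beginning at index `start` with `text`.

    The replacement may contain newlines, so the number of lines
    in the result can differ from the number removed.
    """
    return lines[:start] + text.split("\n") + lines[start + count:]


old = ["one", "two", "three", "four"]
new = replace_lines(old, 1, 2, "Mary had a little lamb.")
# new is ["one", "Mary had a little lamb.", "four"]
```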

The tag will remove one or more existing lines.

OR 


The tag will replace characters within a given line with the specified replacement text.

elephant

The text on line 5, characters 20-29, is removed and the new text "elephant" is inserted in its place. Characters are referenced in a zero-indexed fashion, so the first character is character 0. The "pos" keyword runs from the first character to be removed through to the first character to be kept. In other words, pos="x-y" replaces characters x to y-1. If y == x, then no characters are removed and the new text is simply inserted at position x.
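The "pos" convention matches Python's half-open slices, as this sketch shows. The function name replace_chars is hypothetical. The "x-y" string format follows the description above.

```python
def replace_chars(lines, line, pos, text):
    """Replace characters x..y-1 of one line, where pos is "x-y"."""
    start, end = (int(part) for part in pos.split("-"))
    result = list(lines)  # copy so the caller's list is left untouched
    target = result[line]
    result[line] = target[:start] + text + target[end:]
    return result


old = ["the big dog ran"]
new = replace_chars(old, 0, "4-7", "small")
# new is ["the small dog ran"]: characters 4, 5 and 6 ("big") were replaced.
inserted = replace_chars(["abc"], 0, "1-1", "X")
# With x == y nothing is removed, so inserted is ["aXbc"].
```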

The tag will append one or more lines to the end of the revision.

Mary had a little lamb.

The tag will delete the line at the specified index and all lines after it.

Line 17 and all subsequent lines are removed.
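Appending and truncating both act on the end of the line list, so they can be sketched together. The function names append and truncate are hypothetical and are not taken from editSyntax.py.

```python
def append(lines, text):
    """Add one or more lines to the end of the revision."""
    return lines + text.split("\n")


def truncate(lines, line):
    """Drop the line at index `line` and every line after it."""
    return lines[:line]


old = ["line %d" % i for i in range(20)]
shortened = truncate(old, 17)
# shortened keeps lines 0 through 16, so it has 17 entries.
extended = append(shortened, "Mary had a little lamb.")
```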

The tag will revert the text to a specified prior revision.

Reverts to the revision specified by id="123456". This revision id must belong to the same article and the revision must have occurred before the one currently being considered.

The tag specifies an entirely new text for the revision. It is functionally equivalent to using rather than.

full text goes here ...

The tag specifies that the lines of the revision should be permuted.

{4: 15, 8: 4, 10: 12}

The permute tag contains a "length" keyword specifying how long the new revision should be and a map relating new lines to old ones.

The new revision is generated via the following operation:


 * 1) Define two counters: new_line and old_line.
 * 2) Set both to zero.
 * 3) Repeat steps 4 to 6 until new_line equals length.
 * 4) If new_line is a key in the permute map, set old_line to the map's value at new_line.
 * 5) Set line new_line of the new revision to line old_line of the old revision.
 * 6) Increment both new_line and old_line.

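For the example {4: 15, 8: 4, 10: 12} above, this results in the following mapping: new lines 0 to 3 are copied from old lines 0 to 3, new lines 4 to 7 from old lines 15 to 18, new lines 8 and 9 from old lines 4 and 5, and new line 10 onward from old line 12 onward, continuing until the specified length is reached.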

Lines of the old revision may be used once, more than once, or not at all. By convention, if the map refers to a line beyond the end of the old revision, the result is a blank line.
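The permutation procedure, including the blank-line convention, can be sketched in Python as follows. The function name permute is hypothetical, and the length of 12 used in the example is an assumption chosen for illustration.

```python
def permute(old_lines, length, mapping):
    """Build a new revision of `length` lines from `old_lines`.

    `mapping` relates a new line index to the old line index at which
    copying should resume. Between mapped entries, lines are copied
    consecutively.
    """
    new_lines = []
    old_line = 0
    for new_line in range(length):
        if new_line in mapping:
            old_line = mapping[new_line]
        # A reference past the end of the old revision yields a blank line.
        if old_line < len(old_lines):
            new_lines.append(old_lines[old_line])
        else:
            new_lines.append("")
        old_line += 1
    return new_lines


old = [str(i) for i in range(20)]  # each line holds its own old index
new = permute(old, 12, {4: 15, 8: 4, 10: 12})
# new is ["0", "1", "2", "3", "15", "16", "17", "18", "4", "5", "12", "13"]
```

Because only the points where the copy position jumps need to be recorded, a revision that moves a large block of text can be described with a very small map.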