User:EProdromou (WMF)/The hard way to check for reverted edits

Suppose you want to differentiate between revisions of a page that contribute to the final content of the page, and revisions that don't. How could you do this?

Consider a page with a simplified history like the following. Revisions are in forward chronological order. Most of our wiki articles are much larger than a single word, but keeping it short helps later.

We can think of each text value of a revision as a state, and the ID of the revision as an edge that connects the previous state to the current state. We could then draw a graph that shows how the page evolves over time.
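As a rough sketch of that model, the graph can be stored as a map from each state to its incoming edges. The history table is not reproduced in this text, so the `HISTORY` list below is a hypothetical one, chosen only to agree with the results quoted on this page. The name `build_graph` is likewise made up for illustration.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical history as (revision ID, page text) pairs,
# in forward chronological order. Not the original table.
HISTORY = [
    (1, "charity"), (2, "premiere"), (3, "charity"), (4, "throwback"),
    (5, "snuggle"), (6, "perish"), (7, "throwback"), (8, "motocross"),
    (9, "perish"), (10, "motocross"),
]


def build_graph(history):
    """Map each state to its incoming edges as (revision ID, previous state)."""
    incoming = defaultdict(list)
    previous = None  # the null state before the first revision
    for rev_id, text in history:
        incoming[text].append((rev_id, previous))
        previous = text
    return dict(incoming)


graph = build_graph(HISTORY)
for state, edges in graph.items():
    print(state, edges)
```

For real articles, keying the states by a hash of the text rather than by the text itself would keep the map small.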

We can then use a simple graph-walking algorithm to determine the unreverted states:


 * Start with the last state ("motocross") and add it to our list of unreverted states
 * Find the earliest incoming edge (8)
 * Follow that edge to the previous state ("throwback") and add it to our list of unreverted states
 * Find the earliest incoming edge (4)
 * Follow that edge to the previous state ("charity") and add it to our list of unreverted states
 * Find the earliest incoming edge (1)
 * Follow that edge to the previous state (null); this is the empty starting point, not a revision's text, so it is not added to the list
 * Stop, because we're at the beginning null state.

So, our unreverted states are "charity", "throwback", and "motocross". Our reverted states are "premiere", "snuggle", "perish". Finally, our reverted edges are any edges that are incoming to reverted states: 2, 5, 6, and 9.
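The walk above can be sketched in a few lines. This is a minimal illustration, not a production implementation. The `HISTORY` list is a hypothetical history picked to match the results just described, and `classify` is an illustrative name.

```python
# ASSUMPTION: hypothetical history as (revision ID, page text) pairs,
# in forward chronological order. Not the original table.
HISTORY = [
    (1, "charity"), (2, "premiere"), (3, "charity"), (4, "throwback"),
    (5, "snuggle"), (6, "perish"), (7, "throwback"), (8, "motocross"),
    (9, "perish"), (10, "motocross"),
]


def classify(history):
    """Return (unreverted states, reverted states, reverted edges)."""
    # Earliest incoming edge per state: state -> (revision ID, previous state).
    earliest = {}
    previous = None  # the null state before the first revision
    for rev_id, text in history:
        earliest.setdefault(text, (rev_id, previous))
        previous = text

    # Walk back from the last state along earliest incoming edges.
    unreverted = []
    state = previous
    while state is not None:
        unreverted.append(state)
        state = earliest[state][1]
    unreverted.reverse()

    kept = set(unreverted)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    reverted = list(dict.fromkeys(t for _, t in history if t not in kept))
    # Reverted edges are the edges incoming to reverted states.
    reverted_edges = [rev_id for rev_id, t in history if t not in kept]
    return unreverted, reverted, reverted_edges


unreverted_states, reverted_states, reverted_edges = classify(HISTORY)
print(unreverted_states)  # ['charity', 'throwback', 'motocross']
print(reverted_states)    # ['premiere', 'snuggle', 'perish']
print(reverted_edges)     # [2, 5, 6, 9]
```

The walk always terminates, because the earliest edge into a state comes from a state that first appeared even earlier.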

Making it efficient
A few things can be done to make this a little more efficient.


 * 1) Cache the graph.
 * 2) Cache the reverted and unreverted edges lists.
 * 3) When a new edge is added (on edit), update the graph, but flush the reverted and unreverted edges lists.

A new edit can radically change the graph; for example, a new edit (11, "perish") would make the reverted edges 2 and 8.
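One way to sketch the caching scheme is a small class that keeps the graph, updates it on every edit, and throws away the edge lists so they are recomputed on the next read. The class name `RevertCache`, its methods, and the ten-revision history are all assumptions made for illustration. Counting every edge into an unreverted state as "unreverted" is one reading of the text, not something it states outright.

```python
class RevertCache:
    """Cached revision graph plus lazily recomputed edge lists."""

    def __init__(self):
        self.earliest = {}     # cached graph: state -> (rev ID, previous state)
        self.edges = []        # every edge as (rev ID, resulting state)
        self.last_state = None
        self._lists = None     # cached (reverted, unreverted); None = flushed

    def add_edit(self, rev_id, text):
        # Update the graph in place, then flush the cached lists.
        self.earliest.setdefault(text, (rev_id, self.last_state))
        self.edges.append((rev_id, text))
        self.last_state = text
        self._lists = None

    def edge_lists(self):
        if self._lists is None:
            # Walk back from the last state along earliest incoming edges.
            kept = set()
            state = self.last_state
            while state is not None:
                kept.add(state)
                state = self.earliest[state][1]
            reverted = [r for r, t in self.edges if t not in kept]
            unreverted = [r for r, t in self.edges if t in kept]
            self._lists = (reverted, unreverted)
        return self._lists


cache = RevertCache()
# ASSUMPTION: hypothetical history, not the original table.
for rev_id, text in [
    (1, "charity"), (2, "premiere"), (3, "charity"), (4, "throwback"),
    (5, "snuggle"), (6, "perish"), (7, "throwback"), (8, "motocross"),
    (9, "perish"), (10, "motocross"),
]:
    cache.add_edit(rev_id, text)

before = cache.edge_lists()[0]
cache.add_edit(11, "perish")
after = cache.edge_lists()[0]
print(before, after)
```

With this hypothetical history, the reverted edges after the new edit include 2 and 8. They also include 10, the second edge into "motocross", because the sketch counts every edge incoming to a reverted state.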