Requests for comment/Branching

This request for comment proposes a MediaWiki extension to allow branching histories for article content.

Background
Encyclopedia projects based on MediaWiki are limited by a linear article history. Because of this limitation, every edit depends on the edits before it, and tools which protect an article from open editing (Pending changes, Flagged Revisions, Page protection) cause undesirable side effects, such as discouraging well-intentioned new editors from working on popular or controversial articles. There is a huge anti-vandalism maintenance overhead on Wikipedia, mostly attributable to the linear revision model: vandalism must be carefully reverted rather than simply ignored. Even the newest protection and antivandalism tools use a "release tag" concept, which in the world of software development has been deprecated in favor of release branches for a very long time.

There has been some steady interest in implementing a branching model for Wikipedia, or even a forkable repository, which would have far-reaching implications ranging from lower maintenance overhead to social policy changes. See Tilman Bayer's excellent "Timeline of 'distributed Wikipedia' proposals" for an overview of similar initiatives. Many content forks have already been attempted. For example, the Spanish Wikipedia was challenged by the "Enciclopedia Libre Universal en Español" in 2002; a central demand of the seceding group was a guarantee that Wikipedia would not accept paid advertisements (which L. Sanger had promised to introduce). They have claimed that their actions forced WMF to eventually adopt this as a policy. Therefore, forking is good for your health ;) Since there is no infrastructure supporting forking, derivative works quickly diverge and cannot be reintegrated easily. In the case of the Spanish fork, tens of thousands of articles were rewritten and synchronized manually.

This proposed change also has the potential to kill the "View source" tab, a tab which is antithetical to the idea that the public is free to edit anything on the site. For example, when new editors create an account and attempt to edit a popular page such as (*cough*) Justin Bieber, they will discover that the page is semi-protected and cannot be edited until the new user has become "autoconfirmed", proving that they are not a young fan :P

Use-cases

 * Branching model determined by editor class
    * New and anonymous editors' changes could be on a branch, and confirmed editors' changes on trunk.
 * Article is protected
    * Editor wants to make a change for public review, and does not have permissions to edit the page.
 * Sandboxed changes
    * The edits are incomplete, and will be resumed at a later date.
    * Provisional edits, such as suggested edits to another user's page.
 * Edit conflicts
    * An edit was made to an out-of-date revision, and automatic rebase is impossible or not desired.
    * Decoupled conflict resolution workflow. With branching, resolution does not have to happen immediately.
    * Outdated, unreviewed changes can be rebased.
 * Research involving revision graph structure
    * Currently, nonlinear revision data must be extracted using complex heuristics.
 * Closed websites (see below)
 * Offline editorship
    * Changes are made without internet access. They will be synchronized later.
 * Interwiki synchronization
    * Merging and comparison between wikis (a fork or even a translation) in order to combat, or highlight, divergence
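
To make the "Edit conflicts" use-case concrete, here is a rough sketch of how a backend might decide whether an edit to an out-of-date revision can be rebased automatically. It is only an illustration in Python, not code from any extension: the names `hunks` and `try_rebase` are made up, articles are modelled as lists of lines, and the conflict rule (changed regions that overlap or touch) is a deliberately conservative assumption. When the rebase fails, the edit would simply stay on its branch.

```python
from difflib import SequenceMatcher


def hunks(base, other):
    """Return (start, end, replacement) for each region of base that other changes."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, other[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def try_rebase(base, trunk, edit):
    """Replay `edit` (written against `base`) on top of `trunk`.

    Returns the merged lines, or None when the two sets of changes
    overlap or touch, in which case the edit should stay on a branch.
    """
    ours = hunks(base, edit)
    theirs = hunks(base, trunk)
    for a1, a2, _ in ours:
        for b1, b2, _ in theirs:
            if a1 <= b2 and b1 <= a2:  # overlapping or adjacent regions
                return None
    merged = list(base)
    # Apply from the end of the page backwards so earlier indexes stay valid.
    for start, end, replacement in sorted(ours + theirs, key=lambda h: h[0], reverse=True):
        merged[start:end] = replacement
    return merged


base = ["a", "b", "c", "d", "e"]
trunk = ["a", "B", "c", "d", "e"]   # trunk moved on after the editor loaded the page
edit = ["a", "b", "c", "d", "E"]    # the editor's change against the old revision
rebased = try_rebase(base, trunk, edit)
conflict = try_rebase(base, ["a", "X", "c", "d", "e"], ["a", "Y", "c", "d", "e"])
```

Here `rebased` carries both changes, while `conflict` is `None` because both sides rewrote the same line.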

Divergence
We should build a system which encourages frequent merge/split resolution. The possibility of parallel articles is interesting, but would be a usability disaster. Also, because merging patches in non-line-based documents is not a well-studied problem, and prose is very sensitive to context, the opportunity for merging recedes as branches diverge.

Branching model
There are many options here. I am prejudiced towards a system like Wikimedia's git/gerrit review, where changes are always based on the currently accepted revision, "master". This solves the problem in which vandalism and its rollback are leapfrogged over legitimate changes. A major drawback is that patches can get lost or ignored, and become stale relative to master, causing the divergence problem above.
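
As a sketch of the review-style model described above, the following Python toy keeps pending changes separate from "master" and refuses to accept a change whose base is no longer the accepted revision. The class and method names (`Page`, `Change`, `submit`, `accept`, `is_stale`) are hypothetical and do not correspond to gerrit or to any MediaWiki API; a revision is reduced to an integer and a text string.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    change_id: int
    base_rev: int   # the master revision the author edited against
    text: str


@dataclass
class Page:
    master_rev: int = 1
    text: str = ""
    pending: dict = field(default_factory=dict)

    def submit(self, change):
        """Park a change for review; master is not touched."""
        self.pending[change.change_id] = change

    def is_stale(self, change_id):
        """A change is stale once master has moved past its base."""
        return self.pending[change_id].base_rev != self.master_rev

    def accept(self, change_id):
        """Promote a reviewed change to master."""
        if self.is_stale(change_id):
            raise ValueError("change is stale; rebase onto master first")
        change = self.pending.pop(change_id)
        self.master_rev += 1
        self.text = change.text


page = Page(text="v1")
page.submit(Change(change_id=1, base_rev=1, text="good edit"))
page.submit(Change(change_id=2, base_rev=1, text="vandalism"))
page.accept(1)
```

After `accept(1)`, the vandalism in change 2 never reached master and needs no revert; it is merely stale, which is also where the divergence drawback shows up.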

Complexity
It's probably best if this feature is mostly transparent to editors and readers. Branching is the cause of much confusion in source control systems, and there is little agreement on the best practices, even for the most basic tasks like development and releases. Ideally, a branch is created by the backend whenever necessary (i.e., whenever the trunk does not track the change), according to predetermined logic.
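
One possible reading of "predetermined logic" is sketched below in Python. The function name `choose_target` and its two inputs (whether the editor may edit the live page, and whether the edit was based on the current trunk head) are assumptions for illustration; a real rule set would likely take more into account.

```python
def choose_target(can_edit_trunk, base_rev, trunk_head):
    """Return "trunk" if the edit can be saved directly, otherwise "branch"."""
    if can_edit_trunk and base_rev == trunk_head:
        return "trunk"
    # Either the editor lacks permission or trunk has moved on:
    # the backend creates a branch without asking the editor.
    return "branch"


confirmed_up_to_date = choose_target(True, base_rev=7, trunk_head=7)
new_editor = choose_target(False, base_rev=7, trunk_head=7)
confirmed_out_of_date = choose_target(True, base_rev=6, trunk_head=7)
```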

Interwiki / Closed wikis
This concept could be used across wikis, if the schema supported it.

For example, suppose I find a protected page on a wiki which does not have the nonlinear extension. I copy that page onto my home wiki, using a tool which preserves origin metadata. I can then edit the page locally and publish the altered page and its diff for review.

No common ancestor
If revision metadata is lost or damaged, it would still be nice to have some ability to reintegrate the lost changes. For example, a page is copied into a sandbox incorrectly, then edited. We should be able to diff any pair of articles, and merge on a per-chunk basis.
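
A two-way diff with a per-chunk choice could look roughly like the Python sketch below, which uses the standard library's `difflib.SequenceMatcher`. The helper names `chunks` and `merge`, and the idea of passing the set of chunk numbers to take from the right-hand page, are invented for this example; articles are again treated as lists of lines or paragraphs.

```python
from difflib import SequenceMatcher


def chunks(left, right):
    """Split two pages into aligned chunks: (tag, left_part, right_part)."""
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    return [
        (tag, left[i1:i2], right[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


def merge(left, right, take_right):
    """Build a merged page, taking the right side for the differing chunks
    whose index (counting only differing chunks, from 0) is in take_right."""
    merged = []
    differing = 0
    for tag, left_part, right_part in chunks(left, right):
        if tag == "equal":
            merged.extend(left_part)
        else:
            merged.extend(right_part if differing in take_right else left_part)
            differing += 1
    return merged


article = ["Intro.", "Old fact.", "Outro."]
sandbox = ["Intro.", "New fact.", "Outro.", "Extra note."]
# Keep the article's wording for chunk 0, accept the sandbox's addition in chunk 1.
result = merge(article, sandbox, take_right={1})
```

No common ancestor is needed here, but without one the tool cannot tell which side made a change, so every differing chunk needs a human decision.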

Implementation
A very basic prototype, Extension:Nonlinear, can track branches and display them in the history log. There is no UI to create branches.

Plan

 * Gather requirements and community feedback, especially looking for help with the UI and finding the minimal useful scope for an initial release.
 * Write an extension to create and manage branches
 * Integrate with protection and antivandalism extensions
 * Extend schema to provide enhanced metadata and multiple inheritance

UI changes
The edit page will always be available.

If the user does not have permission to edit the live version of a page, the "Save" button will change to "Submit for review".

A "Save draft" button may appear in the edit page. The effect would be to save the current text to a user sandbox page, where it will not appear in the site's "Recent changes". When editing is resumed and then completed, the text can be saved under its original article name.

Schema changes
It has been pointed out that the current "revision" table already supports arbitrary directed graphs, so our extensions will simply provide extra information about these links. If the extension is disabled or uninstalled, the remaining links should not corrupt the database. Any revisions not appearing in the live article (trunk) should appear in the article's history page with a branch marker.
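
To illustrate how a branch marker could be derived from parent links alone, here is a small Python sketch. It assumes each revision stores its parent (as the `rev_parent_id` column does) and that the page points at its live revision (as `page_latest` does); the in-memory dictionary, the function names, and the use of 0 for "no parent" are simplifications made for this example.

```python
def trunk_revisions(parent_of, latest):
    """Walk parent links from the live revision back to the first revision."""
    trunk = set()
    rev = latest
    while rev and rev not in trunk:   # stop at 0 (no parent) or on a cycle
        trunk.add(rev)
        rev = parent_of.get(rev, 0)
    return trunk


def history_with_markers(parent_of, latest):
    """List revisions newest first, marking those not on trunk as branches."""
    trunk = trunk_revisions(parent_of, latest)
    return [
        (rev, "trunk" if rev in trunk else "branch")
        for rev in sorted(parent_of, reverse=True)
    ]


# Revision id -> parent id; revisions 3 and 4 were both based on revision 2.
parent_of = {1: 0, 2: 1, 3: 2, 4: 2, 5: 3}
history = history_with_markers(parent_of, latest=5)
```

In this toy history, revision 4 is the only one marked as a branch, because it is not reachable from the live revision 5.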

In later stages of implementation, we will need to improve the schema in order to support merging and other metadata.