Requests for comment/Branching

This request for comment proposes a MediaWiki extension to allow branching histories for article content.

Background
Encyclopedia projects based on MediaWiki are limited by a linear article history. As a result, every edit depends on the edits before it, and the tools that protect an article from open editing (Flagged Revisions, Pending changes, Page protection) have undesirable side effects, such as discouraging well-intentioned new editors from working on popular or controversial articles. Wikipedia carries a huge anti-vandalism maintenance overhead, mostly attributable to the linear revision model: vandalism must be carefully reverted rather than simply ignored. Even the newest protection and antivandalism tools use a "release tag" concept, which in the world of software development was deprecated in favor of release branches a very long time ago.

There has been steady interest in implementing a branching model for Wikipedia, or even a forkable repository, which would have far-reaching implications ranging from lower maintenance overhead to social policy changes. See Tilman Bayer's excellent Timeline of "distributed Wikipedia" proposals for an overview of some potential embodiments. Many content forks have already been attempted. For example, the Spanish Wikipedia was challenged by the "Enciclopedia Libre Universal en Español" in 2002; a central demand of the seceding group was a guarantee that Wikipedia would not accept paid advertisements (which L. Sanger had promised to introduce). They have claimed that their actions forced WMF to eventually adopt this policy. Therefore, forking is good for your health ;) Since there is no infrastructure supporting forking, most derivative works quickly diverge and cannot be reintegrated easily. In the case of the Spanish fork, tens of thousands of articles were rewritten and synchronized manually.

This proposed change has the potential to kill the "View source" tab: there would be no harm in making every article available for editing, with the exception of a relatively small number of "protected titles".

Use-cases

 * Article is protected
 ** Editor wants to make a change for public review.
 * Sandboxed changes
 ** The edits are incomplete, and will be resumed at a later date.
 ** Provisional edits, such as suggested edits to another user's page.
 * Edit conflicts
 ** An edit was made on an out-of-date revision, and automatic rebase is impossible or not desired.
 ** Decoupled conflict resolution workflow. With branching, resolution does not have to happen immediately.
 * Closed wikis (see below)
 * Offline editorship
 ** Changes are made without internet access. They will be synchronized later.
 * Interwiki synchronization
 ** Merges and comparison between wikis (forks or even between languages) in order to combat or highlight divergence.

Divergence
We should build a system where branches are either merged back into the trunk, or split into a separate article. The possibility of parallel articles is interesting, but would be a usability disaster. Also, merging patches in non-line-based documents is not a well-studied problem, and prose is very sensitive to context, so the difficulty of a merge would increase quickly as branches diverge.

Branching model
There are many options here. I am prejudiced towards a system like Wikimedia's git/gerrit review, where changes are always based on the currently accepted revision, "master". This solves the problem in which vandalism and its rollback are leapfrogged over legitimate changes. A major drawback is that patches can get lost or ignored, and become stale relative to master, causing the divergence problem above.
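As a rough illustration of the "always based on master" model and of how a patch becomes stale, here is a minimal in-memory sketch. The `Article` and `Branch` classes and all of their method names are hypothetical, invented for this example; they are not part of MediaWiki or of Extension:Nonlinear, and revision ids are simply list positions.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """A proposed change, recorded against the trunk revision it was based on."""
    name: str
    base_rev: int   # trunk revision id the edit started from
    text: str       # proposed article text


@dataclass
class Article:
    """Trunk ("master") history plus the branches still awaiting review."""
    revisions: list = field(default_factory=list)   # accepted texts, oldest first
    branches: dict = field(default_factory=dict)    # branch name -> Branch

    @property
    def head(self) -> int:
        # Revision ids are positions in the trunk list; -1 means "no revisions yet".
        return len(self.revisions) - 1

    def commit(self, text: str) -> int:
        """Accept text directly onto the trunk and return its revision id."""
        self.revisions.append(text)
        return self.head

    def propose(self, name: str, text: str) -> Branch:
        """Open a branch based on the currently accepted revision."""
        branch = Branch(name=name, base_rev=self.head, text=text)
        self.branches[name] = branch
        return branch

    def is_stale(self, name: str) -> bool:
        """A branch is stale once the trunk has moved past its base revision."""
        return self.branches[name].base_rev != self.head

    def accept(self, name: str) -> int:
        """Fast-forward the trunk to a branch; stale branches need a rebase first."""
        if self.is_stale(name):
            raise ValueError(f"branch {name!r} is stale; rebase before merging")
        return self.commit(self.branches.pop(name).text)


article = Article()
article.commit("Paris is the capital of France.")
article.propose("fix-a", "Paris is the capital and largest city of France.")
article.propose("fix-b", "Paris is the capital of France, in Europe.")
article.accept("fix-a")            # trunk advances, so "fix-b" is now behind it
print(article.is_stale("fix-b"))
```

In this sketch a vandal's branch never touches the trunk, so it can simply be ignored. The cost is visible too: once "fix-a" is accepted, "fix-b" cannot be merged until someone rebases it, which is exactly how patches get lost.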

Complexity
It's probably best if this feature is mostly transparent to editors and readers. Branching is the cause of much confusion in source control systems, and there is little agreement on the best practices, even for the most basic tasks like development and releases. Ideally, a branch is created by the backend whenever necessary (i.e., whenever the trunk does not track the change), according to predetermined logic.
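The "predetermined logic" could be as small as one decision function run on every save. The sketch below is only one possible rule set, derived from the use-cases listed earlier; the function name, its parameters, and the `BranchReason` values are all assumptions made for illustration. It also simplifies the edit-conflict case by branching on any out-of-date base, without first attempting an automatic rebase.

```python
from enum import Enum
from typing import Optional


class BranchReason(Enum):
    PROTECTED = "page is protected for this editor"
    OUTDATED_BASE = "edit was made on an out-of-date revision"
    SANDBOX = "editor asked to keep the change provisional"


def branch_reason(*, protected: bool, can_bypass_protection: bool,
                  base_rev: int, head_rev: int,
                  provisional: bool) -> Optional[BranchReason]:
    """Return why a save must go to a branch, or None if the trunk can take it."""
    if provisional:
        # Sandboxed or suggested edits never go straight to the trunk.
        return BranchReason.SANDBOX
    if protected and not can_bypass_protection:
        # Instead of refusing the edit, park it on a branch for review.
        return BranchReason.PROTECTED
    if base_rev != head_rev:
        # Simplification: no automatic rebase is attempted here.
        return BranchReason.OUTDATED_BASE
    return None
```

An ordinary edit to an unprotected, up-to-date page returns `None` and is saved exactly as it is today, which keeps the feature invisible in the common case.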

Interwiki / Closed wikis
This concept could be used across wikis, if the schema supported it.

For example, I find a protected page on a wiki which does not have the nonlinear extension. I copy that page onto my home wiki, using a tool which preserves origin metadata. I can then edit it locally, and publish the altered page and its diff for review.
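What "origin metadata" would contain is still an open question. As a starting point for discussion, the sketch below records the source wiki, title, revision id, and a hash of the copied text. Every name here (`Origin`, `LocalBranch`, `import_page`) and the choice of fields are assumptions, not an existing schema.

```python
import hashlib
from dataclasses import dataclass


def _sha1(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Origin:
    """Where an imported page came from (all field names are hypothetical)."""
    wiki: str     # base URL or interwiki prefix of the source wiki
    title: str    # page title on the source wiki
    rev_id: int   # revision that was copied
    sha1: str     # hash of the copied text, to detect local changes


@dataclass
class LocalBranch:
    """A copy of a remote revision that can be edited on the home wiki."""
    origin: Origin
    text: str

    def is_modified(self) -> bool:
        """True once the local text differs from what was imported."""
        return _sha1(self.text) != self.origin.sha1


def import_page(wiki: str, title: str, rev_id: int, text: str) -> LocalBranch:
    """Copy a remote revision into a local branch, keeping its origin."""
    return LocalBranch(Origin(wiki, title, rev_id, _sha1(text)), text)
```

With the source revision id preserved, a later diff can be published against the exact remote revision the edit was based on, even though the remote wiki knows nothing about branches.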

No common ancestor
If revision metadata is lost or damaged, it would still be nice to have some ability to reintegrate the lost changes. For example, a page is copied into a sandbox incorrectly, then edited. We should be able to diff any pair of articles, and merge on a per-chunk basis.
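Without a common ancestor, only a two-way comparison is possible. The sketch below uses Python's standard `difflib.SequenceMatcher` to split two texts into differing chunks and to rebuild a merged text chunk by chunk. Treating each sentence as one unit is an assumption made here because prose is not line-based; `chunks` and `merge` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def chunks(ours: list, theirs: list) -> list:
    """List the differing regions between two texts as (tag, ours, theirs)."""
    matcher = SequenceMatcher(a=ours, b=theirs, autojunk=False)
    return [(tag, ours[i1:i2], theirs[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge(ours: list, theirs: list, accept) -> list:
    """Rebuild the text, taking their side only for chunks where accept(n) is true."""
    matcher = SequenceMatcher(a=ours, b=theirs, autojunk=False)
    merged, n = [], 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(ours[i1:i2])
            continue
        # n is the index of this chunk in the list returned by chunks().
        merged.extend(theirs[j1:j2] if accept(n) else ours[i1:i2])
        n += 1
    return merged


# Each list element is one sentence of the article.
ours = ["A.", "B.", "C."]
theirs = ["A.", "B2.", "C.", "D."]
print(chunks(ours, theirs))
print(merge(ours, theirs, accept=lambda n: n == 1))  # keep only the second chunk
```

A reviewer interface would show the output of `chunks` and let the reviewer tick the ones to keep; `accept` stands in for that choice.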

Implementation
A very basic prototype, Extension:Nonlinear, exists. It can track branches and display them in the history log, but there is no UI to create branches.


 * Gather requirements and community feedback, especially looking for help with the UI and finding the minimal useful scope for an initial release.
 * Write an extension to create and manage branches
 * Integrate with protection and antivandalism extensions