Requests for comment/Branching

This request for comment proposes a branching history feature for article content.

Overview
Branching could play a major role in the following use cases:


 * Collaborative, real-time editing.
 * Branching can be thought of as long-term collaborative editing. Although our future etherpad-like protocol will probably speak to itself in streams of operational transformation tokens, both features use many of the same algorithms to resolve and render content. If real-time collaborative editing is to tolerate large network lag, the problems we face are almost identical.


 * Draft-mode or unpublished editing.
 * We can improve on the "save a copy and edit" experience by tracking ancestors and rebasing on demand. Personal drafts can be made private, or advertised using branching mechanisms.


 * Distributed content storage.
 * Long-term divergence can be tolerated if we have enough information to synchronize again later. This enables offline work, redundant or peer-to-peer decentralized stores, or even competing forks.


 * Independent editor societies.
 * We can agree to disagree, and achieve consensus at any pace. Branches could have their own rules: for example, a "review" branch for original writing about music or legislation, or a "research" branch where scientists share initial results. Where there are issues of force (edit warring), disputed content could, in the extreme case, be held on independent servers during arbitration.
 * Wikipedias can continue in their present, monolithic form, or may decide to introduce and publicize some form of branching. Either way, we can support any configuration of multiple editions.

Implementation
A very basic prototype, Extension:Nonlinear, can track branches and display them in the history log. There is no UI to create branches yet.

Planning

 * 1) Break tasks into phases with well-defined scope.
 * 2) Gather requirements and community feedback.
 * 3) Find support for development.

Prototype
Build an internal-facing prototype.


 * 1) Write an extension which can create and manage branches.
 * 2) Branches should probably be explicitly visible. Balance debugging and conceptual needs against the vision of completely unobtrusive branching.
 * 3) Need ideas for UI design.
 * 4) Integrate with protection and antivandalism extensions.

Pilot
Deploy the MVP.

Future

 * 1) Synchronization between servers.
 * 2) Branch metadata should support: N-way merge, remote servers (unique identifier that includes local db name, etc.)
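
As a rough illustration of the branch metadata item above, the following Python sketch shows one possible shape for a branch identifier that includes the local database name, plus a merge record with any number of parents. Every class and field name here is hypothetical and is not taken from Extension:Nonlinear or the MediaWiki schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BranchRef:
    """Reference to a branch head. All fields are illustrative assumptions."""
    db_name: str       # local database name of the wiki that holds the branch
    page_id: int       # page the branch belongs to, on that wiki
    branch_name: str   # label for the branch, e.g. "trunk" or "draft"
    head_rev_id: int   # latest revision on the branch

    def key(self) -> str:
        # The database name makes the key unique across servers,
        # assuming database names themselves are unique.
        return f"{self.db_name}:{self.page_id}:{self.branch_name}"


@dataclass(frozen=True)
class MergeRecord:
    """Records that `result` was produced by merging two or more parents."""
    result: BranchRef
    parents: Tuple[BranchRef, ...]

    def __post_init__(self) -> None:
        if len(self.parents) < 2:
            raise ValueError("a merge needs at least two parents")
```

Storing parents as a tuple rather than a fixed pair is what leaves room for N-way merges.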

UI changes
The UI is the fun part of this project. Branches will probably take on some of the character of what we call a namespace today, and will be interpreted in various ways. For example, content structured as an essay should be rendered as a sidebar, while ordinary drafts should be displayed as inline differences.

Branch visibility can be toggled. A summary or legend announces the existence of branches.

Some wireframes: (images not reproduced here)

Editing an article in draft mode should be possible everywhere. If the title is protected or you have chosen to make your drafts private, your changes will be saved to your sandbox. Otherwise, your draft will be discoverable from the main article.

If the editor cannot or does not wish to edit the published version of a page, they can "Submit for review" rather than "Save".

There will be a merge workflow, which should feel like an incremental improvement to existing antivandalism tools.

Schema changes
It has been pointed out that the current "revision" table already supports arbitrary directed graphs, so our extensions will simply provide extra information about these links. If the extension is disabled or uninstalled, the remaining links should not corrupt the database. Any revisions not appearing in the live article (trunk) should appear in the article's history page with a branch marker.
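
The branch marker rule can be sketched in Python as follows. The sketch assumes each revision carries a single parent link (called `parent_id` here, a hypothetical name standing in for whatever the revision table provides) and that the trunk is exactly the chain of parents reachable from the live revision. It is an illustration of the idea, not the prototype's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Revision:
    rev_id: int
    parent_id: Optional[int]  # None for the first revision of a page


def trunk_ids(revisions: Dict[int, Revision], head_id: int) -> Set[int]:
    """Follow parent links from the live revision back to the root."""
    seen: Set[int] = set()
    current: Optional[int] = head_id
    while current is not None and current not in seen:
        seen.add(current)
        rev = revisions.get(current)
        # A missing parent simply ends the walk instead of raising.
        current = rev.parent_id if rev is not None else None
    return seen


def history_lines(revisions: Dict[int, Revision], head_id: int) -> List[str]:
    """List revisions newest first, marking those that are not on the trunk."""
    trunk = trunk_ids(revisions, head_id)
    lines = []
    for rev_id in sorted(revisions, reverse=True):
        marker = "" if rev_id in trunk else " [branch]"
        lines.append(f"r{rev_id}{marker}")
    return lines
```

Because the marker is derived from the links alone, nothing extra has to be stored for revisions that stay on the trunk.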

In later stages of implementation, we will need to improve the schema in order to support merging and other metadata.

Review and merge workflow
We'll have to enhance the review workflow in order to support more complex types of merging.

Diff engine
Producing readable differences between arbitrary content is a recurring need in any implementation of branching. See Patchoid.

Performance
It's unacceptable to slow down the site experience.

Extra data and indexes to support branching will add load on the database. Is there a way to avoid this? One possibility is to make the updates non-blocking.

Divergence
We should build a system which encourages frequent merge/split resolution in the common cases. The possibility of parallel articles is interesting, but unintentional forking would be a usability disaster. Also, because merging patches in non-line-based documents is not a well-studied problem, and prose is very sensitive to context, the opportunity for merging diminishes as branches diverge.
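
One way to make divergence measurable is to count how many revisions each branch has accumulated since their nearest common ancestor. The Python sketch below assumes single-parent revision links stored in a plain dictionary. The function names are hypothetical, and a real implementation would also need to handle merge revisions with several parents.

```python
from typing import Dict, List, Optional, Tuple

Parents = Dict[int, Optional[int]]  # rev_id -> parent rev_id (None at the root)


def lineage(parents: Parents, rev_id: int) -> List[int]:
    """Return rev_id followed by its ancestors, nearest first."""
    chain: List[int] = []
    seen = set()
    current: Optional[int] = rev_id
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


def divergence(parents: Parents, a: int, b: int) -> Optional[Tuple[int, int, int]]:
    """Return (common_ancestor, steps_from_a, steps_from_b), or None if unrelated."""
    a_depth = {rev: depth for depth, rev in enumerate(lineage(parents, a))}
    for b_depth, rev in enumerate(lineage(parents, b)):
        if rev in a_depth:
            return rev, a_depth[rev], b_depth
    return None
```

A tool could use the two step counts to prompt for a merge once either side passes some threshold. What that threshold should be is left open here.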

Branching model
We need to choose a representation, such as git, vector clocks, or simple linking.
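
To make one of these options concrete, here is a minimal vector clock sketch in Python. Each clock maps a replica name to a counter. The replica names (for example `"wiki_a"`) and the function names are assumptions made for illustration. The comparison tells us whether one version descends from the other or whether the two have diverged and need a merge.

```python
from typing import Dict

Clock = Dict[str, int]  # replica name -> number of edits seen from that replica


def compare(a: Clock, b: Clock) -> str:
    """Classify the relationship between two versions."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b already contains everything in a
    if b_le_a:
        return "after"       # a already contains everything in b
    return "concurrent"      # neither contains the other: a real branch


def merge(a: Clock, b: Clock) -> Clock:
    """Clock for a version that incorporates both inputs."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```

Unlike simple parent linking, this representation does not record the path between versions, only which versions are comparable.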

Complexity
It's probably best if this feature is mostly transparent to editors and readers. Branching causes much confusion in source control systems, and there is little agreement on best practices, even for basic tasks such as development and release management.

Ideally, a branch is created by the backend whenever necessary (i.e., whenever the trunk does not track the change), according to predetermined logic. No humans should have to maintain branches.
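
The "predetermined logic" could be as small as the following Python sketch. The three conditions are assumptions chosen to match cases mentioned elsewhere on this page (private drafts, protected pages, and edits based on a stale revision). The field names are hypothetical, and the actual rules remain to be decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditContext:
    base_rev_id: int        # revision the editor started from
    trunk_head_id: int      # current live revision of the page
    page_protected: bool
    user_can_publish: bool  # whether this user may change the live page
    saved_as_draft: bool    # editor chose draft mode or "Submit for review"


def needs_branch(ctx: EditContext) -> bool:
    """Decide whether the backend should store this edit on a branch."""
    if ctx.saved_as_draft:
        return True
    if ctx.page_protected and not ctx.user_can_publish:
        return True
    # The trunk moved on while the editor was working, so the edit
    # no longer applies directly to the live revision.
    return ctx.base_rev_id != ctx.trunk_head_id
```

Keeping the decision in one pure function would make it easy to test, and easy to change as the rules are refined.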

Interwiki / Closed wikis
This concept could be used to synchronize with remote articles, even when the remote server is not participating in branching.

For example, I find a protected page on a wiki which does not have the branching extension. I copy that page onto my home wiki, using a tool which preserves origin metadata. I can edit it, then publish the altered page and its diff locally for review. As the page is updated on the protected wiki, I can synchronize with the changes.

Spam and slander
We must not increase the reward for vandalism, so we'll continue to respect existing protection, ban offensive titles, and flag changes to protected pages for review before publishing. However, we can leverage branching by always allowing the edit to take place, and storing the change in a format that can be rebased, merged in any sequence, and so on.

No common ancestor
If revision metadata is lost or damaged, it would still be nice to have some ability to reintegrate the lost changes. For example, a page might be copied into a sandbox incorrectly and then edited. We should be able to diff any pair of articles, and merge on a per-chunk basis.
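
A two-way, per-chunk merge with no ancestor information can be sketched with Python's standard `difflib` module. The sketch compares two texts line by line, and a caller-supplied function (hypothetically named `take_b` here) decides for each differing chunk which side to keep. Line granularity is a simplification, since prose would probably need sentence-level or word-level chunks.

```python
import difflib
from typing import Callable, List


def merge_per_chunk(
    a_lines: List[str],
    b_lines: List[str],
    take_b: Callable[[int], bool],
) -> List[str]:
    """Merge two texts without a common ancestor.

    take_b(i) is called once for the i-th differing chunk and returns
    True to keep version b for that chunk, False to keep version a.
    """
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    merged: List[str] = []
    change_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(a_lines[i1:i2])
            continue
        # "replace", "delete" and "insert" all count as one differing chunk.
        if take_b(change_index):
            merged.extend(b_lines[j1:j2])
        else:
            merged.extend(a_lines[i1:i2])
        change_index += 1
    return merged
```

In a review UI, `take_b` would be driven by the reviewer's choice for each chunk. Without a common ancestor the tool cannot tell which side made a change, so every differing chunk needs a decision.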

It would be great to have partial conflict resolution.

Resources
IdeaLab proposal