Extension:WebDAV

WebDAV is a set of extensions to HTTP to support distributed authoring and versioning. It defines some request methods, message headers and XML message bodies which, at their most basic, add metadata and locking to HTTP. Because it's based on HTTP and XML, it's quite easy to implement a WebDAV server in CGI or PHP. WebDAV maps very cleanly to file system primitives, so most modern operating systems support mounting WebDAV resources as file systems.

The WebDAV article and the WebDAV home page describe WebDAV in more detail. WebDAV is formally defined in RFC 4918. The WebDAV versioning extension, DeltaV, is defined in RFC 3253.

MediaWiki
This is a project to create a WebDAV interface to MediaWiki. This would let users,
 * Manage wiki articles with WebDAV clients like cadaver.
 * Mount MediaWiki and access articles like file system resources, using for instance fusedav. Users could potentially even manage metadata like file system extended attributes.
 * Edit articles directly using editors with WebDAV support, like Emacs and Eclipse.

This project is based on the WebDAV module I contributed to the Gallery project.

Installation
Execute in the top directory of your MediaWiki installation,

$ svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/WebDAV .

This should check out "WebDavServer.php", "webdav.php", "deltav.php", and "lib/". Then point your WebDAV client at "http://.../mediawiki/webdav.php".

Status
Getting articles, putting articles, and directory listings should all work. Right now I'm focusing on support for WebDAV's versioning features, so "history Main_Page" works in the cadaver client. Next I need to figure out how HTTP authentication to MediaWiki works.

I'd like to make this project a MediaWiki extension. The trouble is I'm not sure how requests are delegated to a particular extension. Also, most WebDAV clients don't support query strings. If you want to add a resource which doesn't already exist, WebDAV clients predict the URL of the new resource by appending the new resource name to the URL of its parent, e.g. http://.../parent/New_Name

Currently, this project works using an alternate PHP landing to "index.php". "webdav.php" supports requests to URLs of the form "http://.../mediawiki/webdav.php/New_Name". I'm not yet sure how to ship this landing as a MediaWiki extension.
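The mapping between path-style URLs and page titles can be sketched as follows. This is a minimal illustration in Python rather than the extension's PHP; the function names and the script path "/mediawiki/webdav.php" are assumptions made for the example, not taken from the extension's code.

```python
from urllib.parse import quote, unquote

# Assumed location of the landing script; a real installation would differ.
SCRIPT_PATH = "/mediawiki/webdav.php"


def title_from_path(request_path: str) -> str:
    """Return the page title that follows the landing script ("" for the root)."""
    if not request_path.startswith(SCRIPT_PATH):
        raise ValueError("request is not under the WebDAV landing")
    rest = request_path[len(SCRIPT_PATH):]
    if rest and not rest.startswith("/"):
        raise ValueError("request is not under the WebDAV landing")
    return unquote(rest.lstrip("/"))


def child_url(parent_url: str, name: str) -> str:
    """Predict a new resource's URL the way WebDAV clients do: parent + "/" + name."""
    return parent_url.rstrip("/") + "/" + quote(name, safe="")
```

Because the title travels in the path rather than in a query string, a URL predicted by the client with child_url() maps straight back to the new title with title_from_path().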

The Gallery module works using Gallery's "URL rewrite" facility, so URLs of the form "http://.../gallery2/w/New_Name" are delegated to the module. I don't know if something similar is feasible in MediaWiki?

As of revision 33, Subversion update functionality is working. All checkouts and updates now use the "send-all" style Subversion communication, available I think since Subversion 1.3? "send-all" includes the bodies of every resource in an update-report in svndiff format. This extension implements svndiff version zero. Version one adds support for zlib compression in the svndiff format, but it is slightly more complicated and I haven't figured out why it's necessary, since an entire update-report can be zlib compressed as an HTTP encoding.

Diffs are not actually calculated yet; svndiff is simply used to encode the replacement of the old text with the new text. This could be optimized by implementing Subversion's binary difference system in PHP, or by serializing MediaWiki's Diff system as svndiff.
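A "replace everything" delta of this kind can be sketched in a few lines. The following is an illustration in Python, not the extension's PHP code; the function names are hypothetical, and the layout follows my reading of the svndiff version zero notes: a four-byte header, then a window made of five variable-length integers, the instruction bytes, and the new data.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as svndiff does: base 128, big-endian,
    with the high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("svndiff integers are non-negative")
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def svndiff0_replace(new_text: bytes) -> bytes:
    """Encode new_text as one svndiff0 window that ignores the source entirely."""
    header = b"SVN\x00"  # "SVN" followed by the format version byte (0)
    if not new_text:
        return header  # an empty target needs no windows
    n = len(new_text)
    # A "copy from new data" instruction has the top two bits set to 10.
    # Lengths 1..63 fit in the low six bits; otherwise those bits are zero
    # and the length follows as a variable-length integer.
    if n < 64:
        instruction = bytes([0x80 | n])
    else:
        instruction = b"\x80" + encode_varint(n)
    window = (
        encode_varint(0)                   # source view offset
        + encode_varint(0)                 # source view length (source unused)
        + encode_varint(n)                 # target view length
        + encode_varint(len(instruction))  # length of the instruction section
        + encode_varint(n)                 # length of the new data section
        + instruction
        + new_text
    )
    return header + window
```

This sketch emits a single window for the whole text. As far as I know Subversion itself splits large texts into several windows, so a fuller encoder would probably do the same.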

The next challenge is to get commit working. This is difficult because Subversion commits multiple pages at a time, using multiple requests and the DeltaV "activity" system. Consequently, this extension needs to store the changes somewhere until they are actually merged into the articles. Does this mean we need to add a table to the database? Semantic MediaWiki does this, so it's possible. Could we reuse some other facility already in MediaWiki?

After discussion with Brion Vibber, a "temporary-queue" table for storing DeltaV/Subversion "activities" seems like the best way to go. Extensions can tie into the updater hooks to maintain their custom tables. Brion tied updater hooks into a couple extensions. "checkuser" might have 'em.

One alternative to distributing this interface as an extension would be an extra front end app.

Subversion
Subversion is close to a DeltaV client and is very pervasive (command line, Emacs VC mode, Subclipse). If it's not too difficult to build a Subversion interface on top of this WebDAV/DeltaV interface, it would enable people to edit MediaWiki using the Emacs VC mode, or Eclipse Subclipse plugin.

One divergence from the DeltaV spec is the Subversion update-report. In this report txdelta values are base64 encodings of the svndiff format defined here, http://svn.collab.net/repos/svn/trunk/notes/svndiff

So we simply need to translate MediaWiki revisions to svndiff format.
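To make the txdelta format concrete, here is a sketch of the reverse direction: decoding a base64 txdelta value and applying the svndiff version zero windows it contains. It is written in Python for illustration only; the function names are hypothetical, and the parsing follows my reading of the svndiff notes rather than any code from this extension.

```python
import base64


def read_varint(buf: bytes, pos: int):
    """Read one svndiff variable-length integer; return (value, next position)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def apply_svndiff0(source: bytes, delta: bytes) -> bytes:
    """Apply an svndiff version zero delta to source and return the target."""
    if delta[:4] != b"SVN\x00":
        raise ValueError("not an svndiff version zero stream")
    pos = 4
    target = bytearray()
    while pos < len(delta):
        sview_off, pos = read_varint(delta, pos)
        sview_len, pos = read_varint(delta, pos)
        tview_len, pos = read_varint(delta, pos)
        ins_len, pos = read_varint(delta, pos)
        new_len, pos = read_varint(delta, pos)
        ins_end = pos + ins_len
        new_data = delta[ins_end:ins_end + new_len]
        sview = source[sview_off:sview_off + sview_len]
        tview = bytearray()
        new_pos = 0
        while pos < ins_end:
            opcode = delta[pos] >> 6
            length = delta[pos] & 0x3F
            pos += 1
            if length == 0:  # length did not fit in six bits
                length, pos = read_varint(delta, pos)
            if opcode == 0:  # copy from the source view
                offset, pos = read_varint(delta, pos)
                tview += sview[offset:offset + length]
            elif opcode == 1:  # copy from the target view; ranges may overlap
                offset, pos = read_varint(delta, pos)
                for i in range(length):
                    tview.append(tview[offset + i])
            elif opcode == 2:  # copy from the new data section
                tview += new_data[new_pos:new_pos + length]
                new_pos += length
            else:
                raise ValueError("invalid svndiff instruction")
        if len(tview) != tview_len:
            raise ValueError("window did not produce the declared target length")
        target += tview
        pos = ins_end + new_len
    return bytes(target)


def decode_txdelta(txdelta_b64: str, source: bytes = b"") -> bytes:
    """Decode a base64 txdelta value and apply it to source."""
    return apply_svndiff0(source, base64.b64decode(txdelta_b64))
```

A server receiving svndiff from a Subversion client would need something along these lines to recover the new article text.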

Is it a bug for subversion clients to try getting the baseline-collection property from the version-controlled-configuration resource? Ah, I think it's the Label: header doing this work...

We need a per-MediaWiki installation UUID.

First successful MediaWiki checkout with Subversion,

$ svn co http://localhost/~jablko/mediawiki/webdav.php
A   webdav.php/Main_Page
A   webdav.php/Resume
A   webdav.php/Test
A   webdav.php/Cover_Letter
A   webdav.php/Home.ics
Checked out revision 22.
$

First successful MediaWiki exploration with Subclipse, http://cgi.sfu.ca/~jdbates/tmp/mediawiki/200706110/Screenshot-SVN%20Repository%20Exploring%20-%20Eclipse%20SDK%20.png

Another handy feature of the Subversion interface: if a project maintains its documentation in MediaWiki, as many software projects do, and wants to distribute that documentation with the project, it can use the Subversion externals feature to automatically check out or export the documentation from MediaWiki whenever the project is retrieved from Subversion. The documentation can then optionally be processed to PDF using XSL Formatting Objects as part of the build process.

Experimental organization
For the time being, I use the following layout, based on Subversion,
 * webdav.php/* - WebDAV landing, everything after webdav.php corresponds to a MediaWiki page title.
 * deltav.php/* - DeltaV landing for versioning features.
 * ver/* - Everything after ver corresponds to a MediaWiki rev_id and is a version resource representing the page at rev_id.
 * vcc/default - The version controlled configuration.
 * bln/* - Everything after bln corresponds to a MediaWiki rev_id and is a baseline resource representing the Wiki at rev_id.
 * bc/* - Everything after bc corresponds to a MediaWiki rev_id and is a baseline collection representing the Wiki at rev_id.
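The layout above could be dispatched with a small routing table. This is a sketch in Python, not the extension's PHP; the function name, the resource labels, and the assumption that the ver, vcc, bln and bc paths are all relative to the DeltaV landing are mine.

```python
import re
from typing import Optional, Tuple

# One pattern per resource kind in the layout; paths are assumed to be
# relative to the DeltaV landing.
_ROUTES = [
    ("version", re.compile(r"^ver/(\d+)$")),
    ("version-controlled-configuration", re.compile(r"^vcc/default$")),
    ("baseline", re.compile(r"^bln/(\d+)$")),
    ("baseline-collection", re.compile(r"^bc/(\d+)$")),
]


def classify(path: str) -> Tuple[str, Optional[int]]:
    """Return (resource kind, rev_id) for a path, with rev_id None where unused."""
    path = path.strip("/")
    for kind, pattern in _ROUTES:
        match = pattern.match(path)
        if match:
            rev_id = int(match.group(1)) if match.groups() else None
            return kind, rev_id
    raise LookupError(f"no DeltaV resource at {path!r}")
```

For example, classify("ver/42") yields ("version", 42), which a handler could then resolve to the page text at rev_id 42.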

In the future, we might experiment with making these URLs more like existing MediaWiki URLs (though this would require query string support on the part of clients.)

Code/package documentation
This interface might also be useful for maintaining code/package documentation, like jQuery, http://docs.jquery.com/action/edit/Events/toggle

One could use the package information in OpenWrt's Subversion repository, for instance, to build a package browser like http://packages.debian.org, based on MediaWiki.

The Wiki could also be used to author package information - package metadata could be retrieved from and maintained in MediaWiki.

I'm not sure jQuery's API documentation syntax is in fact the most elegant. XML/XHTML + Markdown might in fact be cleaner?

Commit
When Subversion does a commit, it does an OPTIONS request to get the DeltaV activity-collection-set. Then it does an MKACTIVITY request to create a new DeltaV activity. It does a CHECKOUT on the DeltaV baseline resource, then sets its log property. It does a CHECKOUT on each of the updated resources, then updates their content and properties. Finally it does a MERGE.
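The sequence can be written down as data, which is handy when testing a server against it. The sketch below is in Python and purely illustrative: the function name and target labels are hypothetical, PROPPATCH and PUT are my assumptions for how the log property and the content are set, and the per-resource interleaving of CHECKOUT and PUT is a guess. Property updates on the resources are left out.

```python
from typing import List, Tuple


def commit_requests(titles: List[str]) -> List[Tuple[str, str]]:
    """List the (method, target) pairs of a commit touching the given pages."""
    requests = [
        ("OPTIONS", "repository root (discover activity-collection-set)"),
        ("MKACTIVITY", "new activity"),
        ("CHECKOUT", "baseline"),
        ("PROPPATCH", "working baseline (log message)"),
    ]
    for title in titles:
        requests.append(("CHECKOUT", title))  # creates a working resource
        requests.append(("PUT", title))       # uploads the new content
    requests.append(("MERGE", "activity"))    # turns the activity into revisions
    return requests
```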

A nice thing is that the resources which are checked out depend only on the state of the working copy - not the state of the server/repository/MediaWiki. If the working copy does not match the latest article revisions, the commit will fail.

The checked out baseline is put in /wln and the checked out resources in /wrk. Since they're associated with the activity resource, I had assumed they would be under /act/.../. I wonder what happens if you do a PROPFIND on the activity resource? On a Subversion note, it would be nice for clients which support WebDAV but not DeltaV/Subversion to be able to create an activity through an HTML interface, mount the activity and edit it, then inspect it, enter a log message, and commit it with the HTML interface.

This extension needs to maintain information about activities in the database, and turn it into new article revisions on MERGE. Ideally, the way it stores information should make it as easy as possible to make new article revisions. One thought is to add a column to the revisions table, indicating which activity they belong to. This column could be set to NULL when the activity is merged. Unfortunately, changing the revisions table would surely affect other MediaWiki queries.

Therefore a new table or tables will be created. The easiest thing would be on MERGE to move rows from this table to the revisions table, but in SQL this is no more optimal than an INSERT and DELETE - there is no "MOVE" in SQL. Also, it's certainly more robust to go through the $article->doEdit interface. I assume calling doEdit multiple times in a request is still wrapped in one transaction?

The two arguments to ->doEdit are the new article text and the log message. So these are what the "activity" table needs to store. Note that new article text is shipped to MediaWiki in svndiff format, so a question is whether the activity table should store new text, or svndiff? I think it is better design (simpler), if more expensive, to store new text. This is because we have all the information to build the new text from the svndiff in the request that INSERTs to the activity table. We may not have all the necessary information in the request that SELECTs from the activity table, unless we add more columns to the table, making it more complex... This could be revisited in future.
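One possible shape for such storage is sketched below, using Python's sqlite3 module as a stand-in for MediaWiki's database layer. The table names, column names and helper functions are all hypothetical. The sketch stores the rebuilt new text, as argued above, with one log message per activity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE activity (
    activity_id TEXT PRIMARY KEY,
    log_message TEXT
);
CREATE TABLE activity_text (
    activity_id TEXT NOT NULL,
    page_title  TEXT NOT NULL,
    new_text    TEXT NOT NULL,
    PRIMARY KEY (activity_id, page_title)
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def make_activity(conn: sqlite3.Connection, activity_id: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO activity (activity_id) VALUES (?)", (activity_id,))


def set_log(conn: sqlite3.Connection, activity_id: str, message: str) -> None:
    with conn:
        conn.execute(
            "UPDATE activity SET log_message = ? WHERE activity_id = ?",
            (message, activity_id),
        )


def put_text(conn: sqlite3.Connection, activity_id: str, title: str, new_text: str) -> None:
    """Store the full new text of a page, replacing any earlier upload."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO activity_text (activity_id, page_title, new_text) "
            "VALUES (?, ?, ?)",
            (activity_id, title, new_text),
        )


def pending_edits(conn: sqlite3.Connection, activity_id: str):
    """Return (title, new text, log message) rows, i.e. what MERGE would need."""
    return conn.execute(
        "SELECT t.page_title, t.new_text, a.log_message "
        "FROM activity_text AS t JOIN activity AS a ON a.activity_id = t.activity_id "
        "WHERE t.activity_id = ? ORDER BY t.page_title",
        (activity_id,),
    ).fetchall()
```

Each row returned by pending_edits() carries exactly the two values that would be handed to doEdit, plus the title identifying the article.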

It's interesting that the revision table doesn't store wikitext.

The text is stored in a separate table. So this extension could also use other MediaWiki tables for storing the text of an activity, but I think it's better to keep everything contained in the tables this extension defines.

When we get to MERGE, we need to know the text and the log message, but we also need to know the revision, to fail if it's not the latest revision. This information comes from the CHECKOUT requests. We can check that the working copy was up-to-date using $article->getLatest. If we can get the whole thing to roll back, we can interleave getLatest and doEdit. Otherwise we need to call all the getLatest first. Even then, could we trample a simultaneous wiki edit? Do we need some optimistic or pessimistic locking?
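The "call all the getLatest first" variant can be sketched like this. The code is Python with an in-memory stand-in for the wiki: FakeWiki, get_latest and do_edit are hypothetical names that only mimic the roles of $article->getLatest and $article->doEdit.

```python
from typing import Dict, List, Tuple


class StaleWorkingCopy(Exception):
    """Raised when a checked out revision is no longer the latest one."""


class FakeWiki:
    """In-memory stand-in for MediaWiki, with wiki-wide revision ids."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}
        self._text: Dict[str, str] = {}
        self._next_rev = 1

    def get_latest(self, title: str) -> int:
        return self._latest.get(title, 0)  # 0 means the page does not exist

    def do_edit(self, title: str, text: str, summary: str) -> int:
        # summary is accepted for parity with doEdit but not stored here
        rev_id = self._next_rev
        self._next_rev += 1
        self._latest[title] = rev_id
        self._text[title] = text
        return rev_id


def merge(wiki: FakeWiki, edits: List[Tuple[str, int, str]], summary: str) -> List[int]:
    """Apply (title, base_rev_id, new_text) edits only if every base is current."""
    stale = [title for title, base_rev, _ in edits if wiki.get_latest(title) != base_rev]
    if stale:
        raise StaleWorkingCopy(", ".join(stale))
    return [wiki.do_edit(title, text, summary) for title, _, text in edits]
```

Checking everything up front means a stale page aborts the merge before any article is touched. It does not close the window between the checks and the edits, so the question of locking against a simultaneous wiki edit remains open.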

mod_dav_svn only supports CHECKOUT on the youngest revisions (see its version.c, line 473). Doing likewise would catch a commit of an outdated revision early.

TODO

 * Use global $wgTitle where appropriate, possibly in file scope.
 * Support WebDAV server side search, so "search ..." works in the cadaver client.
 * Decide what to do when someone requests a directory listing for a very large wiki. At some point (millions of articles) does this become a denial of service issue?
 * Decide what URL scheme to use for article revisions (".../mediawiki/deltav.php/Article_Name/42"?)