Extension:GitAccess

(This is just a draft; the extension is not yet done.)

The GitAccess extension allows access to a wiki's content through the Git revision control system. It implements Git's smart HTTP protocol. Because both fetch (upload-pack) and push (receive-pack) operations require a significant number of database queries, hashing, and zlib compression, access is limited to users with the  permission, that is, only administrators by default.
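For readers unfamiliar with the smart HTTP protocol, the following is a minimal sketch of the pkt-line framing it is built on, as described in Git's own protocol documentation. It is written in Python purely for illustration and is not taken from GitAccess; the function names pkt_line and service_announcement are made up for this example.

```python
# Illustrative sketch only: pkt-line framing as documented by Git,
# not code from the GitAccess extension.

FLUSH_PKT = b"0000"  # special packet that marks the end of a section


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of length, then the data."""
    length = len(payload) + 4  # the length prefix counts its own four bytes
    if length > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def service_announcement(service: str) -> bytes:
    """Build the first lines of a smart HTTP info/refs response body."""
    return pkt_line(b"# service=" + service.encode("ascii") + b"\n") + FLUSH_PKT


if __name__ == "__main__":
    print(service_announcement("git-upload-pack"))
```

A fetch begins with the client requesting info/refs?service=git-upload-pack; a push uses git-receive-pack instead. The ref advertisement that follows this announcement is framed the same way.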

"Safe" Configuration Options

 * : String containing the repository name as accessed via Special:GitAccess. Default: . This corresponds to Special:GitAccess/wiki.git.

"Dangerous" Configuration Options

 * : Array of canonical namespace names to skip generating trees for. If modifying this, you may want to use  Default:

Limitations
The following actions may cause hash mismatches to occur, in which case the GitAccess tables will have to be wiped and re-populated:
 * Merging page histories
 * Enabling subpages on a namespace after GitAccess has already recorded history for pages within that namespace

GitAccess circumvents the following features of MediaWiki:
 * Hiding revisions of pages

Other limitations:
 * You can only push the master branch to a wiki with GitAccess. You may, however, merge branches in another repository and then push the result to the wiki. Pushing other branches would not make sense, since MediaWiki is designed to display only one set of data.
 * You may not have a namespace literally called "(Main)", as this is used as a directory (folder) name to store pages in the main namespace.
 * You may not have a page named "Aliases" in the GitAccess_root namespace, as this file is used to map MediaWiki page titles to safe, operating system-friendly file names.
 * Adding files with characters that are disallowed on some operating systems (e.g. Windows) from an operating system that supports such characters is not supported and may cause problems with the Aliases file.
 * The Aliases.xml file must NEVER have invalid syntax, at any time. You may not fix the file in a later commit since GitAccess will still try to use the old revision of Aliases.xml to convert titles for commits after (and including) the commit with bad syntax, up to the revision with corrected syntax. If a pull request is received with a commit to correct syntax mistakes, use a squash commit to avoid old revisions with bad syntax. Any pushes to the repository will be rejected if there are syntax mistakes in any revisions of Aliases.xml.
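The rule about Aliases.xml can be pictured as a well-formedness check over every revision of the file contained in a push. The sketch below is an assumption about how such a check could look, written in Python rather than the extension's own code; the function names and the element names in the usage example are hypothetical, and the real schema of Aliases.xml is not described here.

```python
# Illustrative sketch only: reject a push if ANY revision of Aliases.xml
# in it fails to parse. Names are hypothetical, not GitAccess's API.
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (syntax only, no schema check)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def first_bad_revision(revisions):
    """Return the index of the first revision with bad syntax, or None."""
    for index, text in enumerate(revisions):
        if not is_well_formed(text):
            return index
    return None


def accept_push(revisions) -> bool:
    """A later fix does not help: one bad revision rejects the whole push."""
    return first_bad_revision(revisions) is None


if __name__ == "__main__":
    history = [
        "<aliases><alias/></aliases>",
        "<aliases><alias></aliases>",   # unclosed <alias>: bad syntax
        "<aliases><alias/></aliases>",  # corrected in a later commit
    ]
    print(accept_push(history), first_bad_revision(history))
```

Squashing the three commits in the usage example into one would leave only the final, valid revision, which is why a squash commit is the suggested remedy.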

The GitTree class
The GitTree class is by far the most complicated part of GitAccess. It handles nearly all of the translation between MediaWiki pages and Git's format. (Though serialization and communication with Git clients are of course handled separately.)

When generating a new tree (i.e. MediaWiki to Git conversion), the tree object is subjected to four "filter passes":


 * Subpages pass: GitTree->processSubpages is run, which checks if subpages are enabled for the current namespace. If so, it spawns new GitTrees (GitTree::newFromSubpage) for all files with a forward slash (/) and text in front of the slash. If there is no text in front of the slash, then the slash is left alone for the next filter pass to deal with.
 * Illegal characters pass: The aliases registry checks filenames for illegal characters and attempts to resolve them first by checking if there are already alias registry entries for them, then by generating and storing a new alias if no aliases were found.
 * Capitalization pass: The aliases registry checks for filenames that differ only by capitalization. While this is not a problem on case-sensitive operating systems like Linux, macOS, and the BSDs, it is a problem on Windows. The alias registry attempts to resolve any conflicts first by checking if there are already alias registry entries for them, then by generating and storing a new alias if no aliases were found.
 * Aliases application pass: All the above filter passes only change metadata; however this pass creates a new revision of GitAccess_root:Aliases, modifies the root GitTree accordingly, and updates the corresponding GitCommit. Prior to generating the XML for GitAccess_root:Aliases, however, it first checks for and condenses double aliases, which result from titles with both illegal characters and capitalization conflicts.
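Two of the passes above can be sketched in a few lines. The following Python is a simplified illustration under stated assumptions, not the extension's implementation: subpages_pass and case_conflicts are hypothetical names, plain lists and dictionaries stand in for GitTree objects, and Unicode case folding is used as an approximation of how a case-insensitive file system compares names.

```python
# Illustrative sketch only: plain data structures stand in for GitTree.
from collections import defaultdict


def split_subpage(name: str):
    """Return (parent, rest) if there is text in front of the first slash."""
    head, slash, tail = name.partition("/")
    if slash and head:
        return head, tail
    return None  # no slash, or a leading slash left for the next pass


def subpages_pass(names, subpages_enabled: bool):
    """Separate plain files from names that belong in a subtree."""
    files, subtrees = [], defaultdict(list)
    for name in names:
        parts = split_subpage(name) if subpages_enabled else None
        if parts is None:
            files.append(name)
        else:
            parent, rest = parts
            subtrees[parent].append(rest)  # rest is processed again per subtree
    return files, dict(subtrees)


def case_conflicts(names):
    """Group names that differ only by capitalization."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(subpages_pass(["Help", "Help/Editing", "/dev/null"], True))
    print(case_conflicts(["Readme", "README", "Other"]))
```

Each group returned by case_conflicts would need an alias for all but one of its members, so that the names no longer collide on a case-insensitive file system.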

When generating a diff to apply (i.e. Git to MediaWiki conversion), a similar series of un-filters is run.

The Aliases Registry
MediaWiki is very permissive regarding the characters a title can contain. Many of these characters are disallowed on certain operating systems, especially Windows, for good reasons. In order to translate these page names into file names, there has to be a way of recording how file names that have been stripped of illegal characters map back to MediaWiki titles. Additionally, users should be able to create pages in the Git repository and push them to the wiki with a title containing characters that would be illegal in file names on some platforms.
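The two-way mapping described above could be modelled roughly as follows. This is a hedged Python sketch, not the GitAliasRegistry class itself: the class name AliasRegistry, its methods, the underscore replacement, and the "~2" suffix scheme are all assumptions made for illustration. The character set is the list of printable characters that Windows rejects in file names.

```python
# Illustrative sketch only: a two-way title <-> file name mapping.
# Naming scheme and API are hypothetical, not GitAccess's.

ILLEGAL = set('<>:"/\\|?*')  # printable characters Windows forbids in file names


class AliasRegistry:
    def __init__(self):
        self._by_title = {}     # wiki title -> safe file name
        self._by_filename = {}  # safe file name -> wiki title

    def filename_for(self, title: str) -> str:
        """Reuse an existing alias if there is one, otherwise create one."""
        if title in self._by_title:
            return self._by_title[title]
        if not set(title) & ILLEGAL:
            return title  # already safe, no alias needed
        base = "".join("_" if ch in ILLEGAL else ch for ch in title)
        candidate, counter = base, 2
        while candidate in self._by_filename:  # avoid clashing with other aliases
            candidate = f"{base}~{counter}"
            counter += 1
        self._by_title[title] = candidate
        self._by_filename[candidate] = title
        return candidate

    def title_for(self, filename: str) -> str:
        """Map a file name back to its wiki title (identity if not aliased)."""
        return self._by_filename.get(filename, filename)


if __name__ == "__main__":
    registry = AliasRegistry()
    print(registry.filename_for("What is this?"))
    print(registry.title_for("What is this_"))
```

This sketch only avoids collisions between aliases. A complete registry would also have to check generated names against titles that need no alias, and against names that differ only by capitalization.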

The GitAliasRegistry class is necessarily quite complex. Since the Aliases file is not automatically updated when a new title appears on the wiki, the Aliases page can sometimes have a newer revision ID when updated by GitAccess, despite applying to a Git commit based on revisions many IDs back. Therefore, each GitAliasRegistry instance requires three references: one to the relevant GitCommit, so that it can attach a new revision of GitAccess_root:Aliases to that commit; one to the root GitTree, to change the referenced GitBlob for Aliases.xml; and one to the GitTree being inspected for illegal characters and for titles that are identical except for capitalization (which is disallowed on Windows).