User:MTres19/Extension:GitAccess (draft)

From MediaWiki.org
This is just a draft; the extension is not yet done.
MediaWiki extensions manual
GitAccess
Release status: experimental
Implementation: Data extraction, Database, Special page
Description: Provides smart HTTP Git protocol access
Author(s): Matthew Trescott (MTres19 · talk)
Latest version: 1.0 (2017-01-11)
MediaWiki: 1.28
PHP: 7.2 with this patch
Database changes: Yes
Tables: git_hash, git_edit_hash, git_status_modify_hash
License: GNU Affero General Public License 3.0
Download: Downloads on GitHub; README on GitHub
Added rights: gitaccess
Hooks used: LoadExtensionSchemaUpdates, ArticleMergeComplete, ChangeTagCanCreate, ListDefinedTags, ChangeTagsListActive, FileDeleteComplete, ArticleRevisionUndeleted, UndeleteForm::undelete, FileUndeleteComplete

The GitAccess extension allows access to a wiki's content with the Git revision control system. It implements the smart HTTP Git protocol. Because both fetch (upload-pack) and push (receive-pack) operations require a large number of database queries, as well as hashing and zlib compression, access is limited to users with the gitaccess right, which by default is granted only to administrators.
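To make the protocol part more concrete: in the smart HTTP protocol every message is framed as a "pkt-line", a payload prefixed by four lowercase hex digits giving the total length including those four digits, and "0000" (a flush-pkt) marks the end of a section. The Python sketch below (the extension itself is written in PHP; the function names are illustrative, not GitAccess's API) frames a pkt-line and builds the header a server sends at the start of its reply to GET .../info/refs?service=git-upload-pack.

```python
FLUSH_PKT = b"0000"      # flush-pkt: ends a section of the stream
MAX_PKT_LINE = 65520     # largest pkt-line Git accepts, length prefix included


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length (payload + 4) as 4 lowercase hex digits."""
    length = len(payload) + 4
    if length > MAX_PKT_LINE:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def service_advertisement_header(service: str) -> bytes:
    """Opening of the /info/refs response body for git-upload-pack or git-receive-pack."""
    return pkt_line(b"# service=" + service.encode("ascii") + b"\n") + FLUSH_PKT


if __name__ == "__main__":
    print(service_advertisement_header("git-upload-pack"))
    # b'001e# service=git-upload-pack\n0000'
```

A real advertisement then continues with one pkt-line per ref and is served with the Content-Type application/x-git-upload-pack-advertisement (or application/x-git-receive-pack-advertisement for pushes).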


Installation

  • Download and place the file(s) in a directory called GitAccess in your extensions/ folder.
  • Add the following code at the bottom of your LocalSettings.php:
    wfLoadExtension( 'GitAccess' );
    
  • Run the update script (maintenance/update.php), which automatically creates the database tables this extension needs.
  • Done – Navigate to Special:Version on your wiki to verify that the extension is successfully installed.

Configuration Options

"Safe" Configuration Options

  • $wgGitAccessRepoName: String containing the repository name as accessed via Special:GitAccess. Default: 'wiki'. This corresponds to Special:GitAccess/wiki.git.
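As an illustration of where this setting ends up, the Python sketch below builds the clone URL and the first URL a smart HTTP client requests from it. It assumes the wiki uses short article URLs of the form https://wiki.example.org/wiki/$1; the host and the function names are hypothetical.

```python
from urllib.parse import quote, urlencode


def clone_url(article_base: str, repo_name: str = "wiki") -> str:
    """Repository URL for $wgGitAccessRepoName = repo_name.

    Assumes short article URLs (article_base + "/" + title); other URL
    layouts are not covered by this sketch.
    """
    return f"{article_base.rstrip('/')}/Special:GitAccess/{quote(repo_name)}.git"


def info_refs_url(repo_url: str, service: str = "git-upload-pack") -> str:
    """First request of a smart HTTP fetch (git-upload-pack) or push (git-receive-pack)."""
    return f"{repo_url}/info/refs?{urlencode({'service': service})}"


if __name__ == "__main__":
    url = clone_url("https://wiki.example.org/wiki")
    print(url)
    print(info_refs_url(url))
```

The first value is what you would pass to git clone; Git itself appends /info/refs?service=... when it contacts the server.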

"Dangerous" Configuration Options

  • $wgGitAccessNSIncluded: Array with canonical namespace names as keys and boolean (true/false) values indicating whether each namespace is included. All core MediaWiki talk namespaces, as well as the GitAccess_root_talk namespace, are excluded by default. You can include a namespace like this:
    $wgGitAccessNSIncluded['Talk'] = true;
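The defaulting rule described above can be sketched as follows. This is a Python illustration, not the extension's PHP code; it assumes that every non-talk namespace is included unless overridden, and the list of namespace names is written out by hand for the example.

```python
from __future__ import annotations

# Canonical names of MediaWiki's core namespaces ("" is the main namespace),
# plus the two namespaces GitAccess adds.
CORE_NAMESPACES = [
    "", "Talk", "User", "User_talk", "Project", "Project_talk",
    "File", "File_talk", "MediaWiki", "MediaWiki_talk",
    "Template", "Template_talk", "Help", "Help_talk",
    "Category", "Category_talk",
]
GITACCESS_NAMESPACES = ["GitAccess_root", "GitAccess_root_talk"]


def is_talk(ns: str) -> bool:
    return ns == "Talk" or ns.endswith("_talk")


def effective_inclusion(overrides: dict[str, bool]) -> dict[str, bool]:
    """Merge $wgGitAccessNSIncluded-style overrides onto the assumed defaults."""
    included = {ns: not is_talk(ns) for ns in CORE_NAMESPACES + GITACCESS_NAMESPACES}
    included.update(overrides)
    return included


if __name__ == "__main__":
    # Equivalent of $wgGitAccessNSIncluded['Talk'] = true;
    settings = effective_inclusion({"Talk": True})
    print(sorted(ns for ns, on in settings.items() if not on))
```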
    

Limitations

The following actions may cause hash mismatches, in which case the GitAccess tables must be wiped and repopulated, or may cause errors in the Git history:

  • Enabling subpages on a namespace after GitAccess has already recorded history for pages within that namespace
  • Deleting archived revisions of pages and files (e.g. with deleteArchivedRevisions.php). This permanently wipes the database records of those revisions rather than just hiding them from public view.
  • Changing the default language of the wiki (if you are installing GitAccess on an existing wiki)
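Why such changes are fatal follows from how Git names objects: an object's ID is the SHA-1 of a short header (the type "blob", "tree" or "commit", a space, the body length in bytes, a NUL byte) followed by the body, and loose objects are stored zlib-compressed. Any change to the bytes that would be generated for an already-recorded page or directory therefore yields a different ID. For example, enabling subpages changes the directory layout, and since a tree's body lists its entries' names, the affected tree IDs change as well. The Python sketch below (function names are illustrative) computes an object ID and the compressed loose-object bytes.

```python
import hashlib
import zlib


def object_header(obj_type: str, body: bytes) -> bytes:
    # e.g. b"blob 12\x00" for a 12-byte blob
    return f"{obj_type} {len(body)}".encode("ascii") + b"\0"


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID, matching what `git hash-object` prints for blobs."""
    return hashlib.sha1(object_header(obj_type, body) + body).hexdigest()


def loose_object(obj_type: str, body: bytes) -> bytes:
    """zlib-compressed bytes of a loose object (header followed by body)."""
    return zlib.compress(object_header(obj_type, body) + body)


if __name__ == "__main__":
    print(git_object_id("blob", b"hello world\n"))
    # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(git_object_id("blob", b"hello world"))  # one byte less, entirely different ID
```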

GitAccess circumvents the following features of MediaWiki:

Other limitations:

  • If your wiki was upgraded from an old version of MediaWiki (1.24 or earlier, and possibly some newer versions), you will likely encounter major problems due to changes in MediaWiki's storage format. Your mileage may vary; please don't report bugs about it. You will probably need to make a fresh installation and export and re-import all the pages.
  • You can push only the master branch to a wiki with GitAccess. You may, however, merge branches in another repository and then push the result to the wiki. Pushing other branches would not make sense anyway, since MediaWiki is designed to display only one set of data.
  • You may not have a namespace literally called "(Main)", as this is used as the directory (folder) name for pages in the main namespace.
  • You may not have a regular page named "Aliases" in the GitAccess_root namespace, as this page is used to map MediaWiki page titles to safe, operating-system-friendly file names. You may, however, create this page yourself, provided it uses the XML content format, or let GitAccess generate it automatically when needed.
  • Adding files whose names contain characters disallowed on some operating systems (e.g. Windows) from an operating system that allows them is not supported and may cause problems with the Aliases file. The same applies to case sensitivity: make sure all file names in a directory are unique regardless of capitalization, and edit the Aliases file if needed.
  • The Aliases.xml file must never contain invalid syntax in any commit. Fixing it in a later commit is not enough, because GitAccess would still use the broken revision of Aliases.xml to convert titles for every commit from the one that introduced the bad syntax up to the one that corrects it. If a pull request contains a commit that corrects such mistakes, squash the commits so that no revision with bad syntax remains. Any push containing a revision of Aliases.xml with syntax errors will be rejected.
  • Subpages will not appear as folders in the File and Media namespaces. (Why would you be using subpages in the file namespace anyway?)
  • All files in the File directory (folder) must have an accompanying text-based file with the file extension for wikitext (.wiki by default).
  • Executable files may not be added outside the Media namespace.
  • A single commit may contain at most 1024 edits and at most 1024 deletions. This limit stems from the use of GROUP_CONCAT and is unlikely to matter in practice; it could be raised if needed.
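Several of these restrictions can be checked when a push arrives. The Python sketch below (hypothetical helper names, not the extension's code) checks three of them: that the push updates only refs/heads/master, that every revision of Aliases.xml in the push is well-formed XML, and that a commit stays within the 1024-edit and 1024-deletion limits. It assumes the receive-pack commands have already been unframed from their pkt-lines; each has the form "old-id new-id refname", and the first one may carry a NUL byte followed by the client's capabilities.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

ALLOWED_REF = "refs/heads/master"
MAX_EDITS = MAX_DELETIONS = 1024


def parse_command(line: bytes) -> tuple[str, str, str]:
    """Split '<old-id> <new-id> <refname>' and drop any capability list."""
    command = line.split(b"\0", 1)[0].rstrip(b"\n").decode("utf-8")
    old_id, new_id, ref = command.split(" ", 2)
    return old_id, new_id, ref


def check_refs(lines: list[bytes]) -> list[str]:
    errors = []
    for line in lines:
        ref = parse_command(line)[2]
        if ref != ALLOWED_REF:
            errors.append(f"{ref}: only {ALLOWED_REF} can be pushed to the wiki")
    return errors


def check_aliases(revisions: list[bytes]) -> list[str]:
    """Every Aliases.xml revision in the push must parse; a later fix does not help."""
    errors = []
    for index, xml_bytes in enumerate(revisions):
        try:
            ET.fromstring(xml_bytes)
        except ET.ParseError as exc:
            errors.append(f"Aliases.xml revision {index}: {exc}")
    return errors


def check_commit_size(edits: int, deletions: int) -> list[str]:
    errors = []
    if edits > MAX_EDITS:
        errors.append(f"{edits} edits exceed the limit of {MAX_EDITS}")
    if deletions > MAX_DELETIONS:
        errors.append(f"{deletions} deletions exceed the limit of {MAX_DELETIONS}")
    return errors
```

In this sketch a server would reject the whole push if any of the returned lists is non-empty. Only well-formedness is checked here, not the structure of the aliases themselves.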

Technical Design Information

The GitTree class

The GitTree class is by far the most complicated part of GitAccess. It handles nearly all of the translation between MediaWiki pages and Git's format, although serialization and communication with Git clients are handled separately.

When generating a new tree (i.e. MediaWiki to Git conversion), the tree object is subjected to four "filter passes":

  • Subpages pass: GitTree->processSubpages() is run, which checks whether subpages are enabled for the current namespace. If so, it spawns new GitTrees (GitTree::newFromSubpage()) for all files whose names contain a forward slash (/) with text in front of it. If there is no text in front of the slash, the slash is left alone for the next filter pass to deal with.
  • Illegal characters pass: The aliases registry checks filenames for illegal characters and attempts to resolve them, first by looking for existing entries in the aliases registry and then, if none are found, by generating and storing a new alias.
  • Capitalization pass: The aliases registry checks for filenames that differ only by capitalization. This is not a problem on case-sensitive file systems, such as those typically used on Linux and the BSDs, but it is a problem on Windows and, with its default file system settings, on macOS. The aliases registry resolves any conflicts the same way: first by looking for existing entries, then by generating and storing a new alias.
  • Aliases application pass: The previous passes only change metadata; this pass creates a new revision of GitAccess_root:Aliases, modifies the root GitTree accordingly, and updates the corresponding GitCommit. Before generating the XML for GitAccess_root:Aliases, it checks for and condenses double aliases, which result from titles that have both illegal characters and capitalization conflicts.
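The subpages pass and the conflict detection of the capitalization pass can be sketched in Python as below. The function names, the nested-dict tree representation and the ".wiki" suffix for page files are assumptions made for the illustration; GitTree's real interface is not shown. build_tree nests subpages into folders when subpages are enabled and leaves titles with nothing before the slash whole, and case_conflicts finds names in one folder that differ only by capitalization, using Python's str.casefold() as an approximation of Windows' case-insensitive comparison.

```python
from __future__ import annotations

from collections import defaultdict

Tree = dict  # folder name -> nested Tree, page file name -> None


def build_tree(titles: list[str], subpages_enabled: bool = True, ext: str = ".wiki") -> Tree:
    """Turn the page titles of one namespace into a nested folder structure."""
    tree: Tree = {}
    for title in titles:
        if subpages_enabled:
            _insert(tree, title, ext)
        else:
            tree[title + ext] = None
    return tree


def _insert(tree: Tree, title: str, ext: str) -> None:
    head, sep, rest = title.partition("/")
    if sep and head and rest:
        # "Foo/Bar" becomes folder "Foo" containing page "Bar" (recursively)
        folder = tree.setdefault(head, {})
        if not isinstance(folder, dict):
            raise ValueError(f"{head!r} is both a page file and a folder")
        _insert(folder, rest, ext)
    else:
        # A plain title, or one whose slash lacks text on either side (e.g. "/x"),
        # stays whole; remaining slashes are left for the illegal-characters pass.
        tree[title + ext] = None


def case_conflicts(names: list[str]) -> list[list[str]]:
    """Groups of names in one folder that differ only by capitalization."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```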

When generating a diff to apply (i.e. Git to MediaWiki conversion), a similar series of reverse filters is run.

File Namespace Handling

If files and their description pages were thrown together in a single folder, it would be difficult to tell with certainty which is the file and which is the description page without relying on MIME types or file extensions, and that approach would inevitably break as soon as someone uploads a piece of wikitext as a file. Instead, the files themselves are stored in the Media folder, and GitAccess searches the File folder for the corresponding description pages.
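A minimal Python sketch of such a pairing check follows. It assumes, hypothetically, that the description page for Media/<name> is stored as File/<name>.wiki; the function names are also illustrative. It lists media files that have no description page and entries in the File folder that are not wikitext files.

```python
from __future__ import annotations


def description_path(media_path: str, ext: str = ".wiki") -> str:
    """Map Media/<name> to the assumed description page File/<name><ext>."""
    folder, sep, name = media_path.partition("/")
    if folder != "Media" or not sep or not name:
        raise ValueError(f"not a file in the Media folder: {media_path!r}")
    return f"File/{name}{ext}"


def check_file_namespace(paths: list[str], ext: str = ".wiki") -> dict[str, list[str]]:
    """Report media files lacking a description page and non-wikitext File entries."""
    path_set = set(paths)
    media = sorted(p for p in path_set if p.startswith("Media/"))
    described = [p for p in path_set if p.startswith("File/")]
    return {
        "missing_description": [m for m in media if description_path(m, ext) not in path_set],
        "non_wikitext": sorted(p for p in described if not p.endswith(ext)),
    }
```

Because the pairing is decided purely by folder and name, a wikitext file uploaded to Media/ is still treated as a file, which is the point of the separation described above.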

The Aliases Registry

MediaWiki is very permissive about the characters a title can contain. Many of these characters are disallowed in file names on certain operating systems, especially Windows, for good reasons. To translate such page names into file names, there has to be a record of how file names that have been stripped of illegal characters map back to MediaWiki titles. Additionally, users should be able to create pages in the Git repository and push them to the wiki with titles containing characters that would be illegal in file names on some platforms.

The GitAliasRegistry class is necessarily quite complex. Because the Aliases page is not updated automatically when a new title appears on the wiki, a revision of it created by GitAccess can have a much newer revision ID than the page revisions that the corresponding Git commit is based on. Therefore, each GitAliasRegistry instance requires three references: one to the relevant GitCommit, so that it can attach a new revision of GitAccess_root:Aliases to that commit; one to the root GitTree, to change the GitBlob referenced for Aliases.xml; and one to the GitTree being inspected for illegal characters and for titles that are identical except for capitalization (which is disallowed on Windows).
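The lookup-then-generate behaviour of the aliases registry, together with the reverse lookup needed when applying a push, can be sketched in Python as follows. Everything here is an assumption made for illustration: the class and method names, the use of Windows' rules (the characters <>:"/\|?* and control characters, reserved device names such as CON or COM1, and trailing dots or spaces) as the portability test, and the "_" replacement plus " (2)" counter used to generate aliases. It keeps a plain in-memory mapping and leaves out the GitCommit, root GitTree and Aliases.xml bookkeeping described above.

```python
from __future__ import annotations

import re

# Characters Windows forbids in file names, including control characters.
ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
RESERVED = {"CON", "PRN", "AUX", "NUL"} | {f"{p}{i}" for p in ("COM", "LPT") for i in range(1, 10)}


def is_portable(name: str) -> bool:
    if not name or ILLEGAL.search(name) or name.endswith((".", " ")):
        return False
    # Device names are reserved even with an extension, e.g. "con.txt".
    return name.split(".", 1)[0].upper() not in RESERVED


class AliasRegistry:
    """Hypothetical stand-in for the title <-> file name mapping kept in Aliases.xml."""

    def __init__(self, entries: dict[str, str] | None = None):
        self.to_file = dict(entries or {})
        self.to_title = {alias: title for title, alias in self.to_file.items()}
        self.new_entries: dict[str, str] = {}  # would go into a new Aliases revision

    def filename_for(self, title: str) -> str:
        if is_portable(title):
            return title
        if title in self.to_file:  # 1. reuse an existing entry
            return self.to_file[title]
        alias = self._generate(title)  # 2. otherwise generate and store one
        self.to_file[title] = alias
        self.to_title[alias] = title
        self.new_entries[title] = alias
        return alias

    def title_for(self, filename: str) -> str:
        """Reverse direction, as needed when applying a push to the wiki."""
        return self.to_title.get(filename, filename)

    def _generate(self, title: str) -> str:
        base = ILLEGAL.sub("_", title).rstrip(". ") or "_"
        if not is_portable(base):  # only a reserved device name can remain
            base = "_" + base
        candidate, n = base, 2
        while candidate in self.to_title:
            candidate = f"{base} ({n})"
            n += 1
        return candidate
```

A real registry would also have to avoid aliases that clash with portable titles already present in the same folder; this sketch only checks against other aliases.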