User:Tim Starling/Non-NFS file storage

NFS is bad for a number of reasons:


 * The in-kernel client is inflexible and has some nasty failure modes.
 * The number of network round-trips needed for common operations is very large.
 * There is essentially no locking or other concurrency support.
 * Scaling the system up to several file storage servers would be a hassle for sysadmins.

Discussion
The FileRepo component provides an abstraction over the file storage backend for file uploads. It was designed with NFS removal in mind, but the NFS removal project was never completed.

FileRepo does provide a $file->getPath method for callers that need direct access to the file on the local filesystem. The media module uses it for image scaling and video thumbnailing. WebStore uses a special subclass of File to implement image scaling on temporary files.

wfStreamFile streams a local file out to the client of the current PHP request. No abstract interface for this was ever implemented in the FileRepo architecture, so its callers use $file->getPath.

WebStore was a project to implement non-NFS file storage as a FileRepo module, but it was always experimental, meant to explore the design issues. To become concrete and practical, it will need to be mostly rewritten.

WebStore was written in PHP, but a practical implementation will probably need to be written in a faster language so that the CPU on the file storage server does not become a bottleneck. Also, our storage servers now run Solaris, and making the main Wikimedia instance of MediaWiki work on Solaris as well as on Linux may present extra challenges. Hence there is an argument for using an existing remote file storage protocol for which a fast Solaris implementation already exists.

WebDAV is a good possibility, but may require external locking to prevent concurrency issues. The current NFS scheme does not provide locking either, and there are no outstanding bug reports about concurrency issues, but the reasons for this are subtle and mostly accidental, so it would be good to address concurrency rigorously. The FileRepo interface does not expose locking primitives; rather, it provides atomic operations which can be implemented internally using locking.
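The principle in the last sentence can be illustrated with a minimal sketch, written in Python rather than MediaWiki's PHP. The `LockingRepo` class and its `publish` method are hypothetical and not part of FileRepo: callers ask for one atomic "store unless it already exists" operation, and the backend decides internally how to lock. The locks here are per-process `threading.Lock` objects, so they only serialise callers within one process; across several application servers the same check-then-write would need a lock shared between machines, which is the external locking WebDAV may require.

```python
import os
import tempfile
import threading
from collections import defaultdict


class LockingRepo:
    """Hypothetical backend: callers get atomic operations, never raw locks."""

    def __init__(self, root):
        self.root = root
        self._table_lock = threading.Lock()              # guards the lock table
        self._path_locks = defaultdict(threading.Lock)   # one lock per relative path

    def _lock_for(self, rel_path):
        with self._table_lock:
            return self._path_locks[rel_path]

    def publish(self, rel_path, data, overwrite=False):
        """Store `data` at `rel_path` atomically.

        Returns True if the file was written, or False if it already existed
        and `overwrite` is false. The existence check and the write happen
        under the same per-path lock, so two concurrent publishers in this
        process cannot both succeed.
        """
        dest = os.path.join(self.root, rel_path)
        with self._lock_for(rel_path):
            if not overwrite and os.path.exists(dest):
                return False
            directory = os.path.dirname(dest)
            os.makedirs(directory, exist_ok=True)
            # Write to a temporary file in the destination directory, then
            # rename it into place so readers never see a partial file.
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, dest)
            return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        repo = LockingRepo(root)
        print(repo.publish("a/ab/Example.png", b"v1"))  # True
        print(repo.publish("a/ab/Example.png", b"v2"))  # False: already exists
```

Because the caller never holds a lock itself, the backend is free to swap the in-process lock for a distributed one without changing the interface.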

Modules that use NFS for purposes other than file uploads were not considered during the original project. However, they now seem to be the main users of NFS in terms of request rate, so they should be considered in the current project.

Proposed outline

 * Implement a FileRepo subclass which is a WebDAV client.
 * Add basic support for multiple storage servers. This is not yet being requested by Wikimedia sysadmins, but it would be nice to have, since bottlenecks tend to appear without much warning. There are multiple possible designs:
   * Use Squid ACLs to distribute upload.wikimedia.org traffic across servers based on directory name. Mirror this scheme in MediaWiki by having the WebDAV client connect to the appropriate storage server.
   * Use domain names to direct upload traffic to the appropriate server. This would eliminate the need for complex Squid configuration and put distribution under MediaWiki's control. However, the old URLs would still have to be supported somehow, possibly with a redirect script.
 * Add a stream member function to the File hierarchy and migrate wfStreamFile callers.
 * Add a streamFile member function to the FileRepo hierarchy which streams out a given mwrepo:// virtual URL. File::stream would call this. Some wfStreamFile callers might use this.
 * Add a function to FileRepo for listing directory entries. Several extensions require this.
   * Do we need this, or would better resource tracking be more appropriate -- e.g. keeping a list of which thumbs we've generated instead of looking in the directory at purge time? --brion 20:46, 3 August 2009 (UTC)
     * I think it would be useful to provide an interface similar to opendir/readdir so that extensions, including non-Wikimedia extensions written by developers with zero interest in optimising for slow remote filesystems, can easily be ported to non-NFS systems. WebDAV has interfaces for directory listing, so it shouldn't be hard to implement. -- Tim Starling 02:44, 4 August 2009 (UTC)
 * Migrate non-upload modules to publish their files in a FileRepo.
   * In some cases (e.g. Math) the repo needs to be shared between all wikis on a site, so it makes sense to make it separately configurable, and not part of the RepoGroup singleton.
     * Math does not need to be shared; we just use a shared directory since it seemed convenient and avoided duplicate storage. Note that generated math files are actually tracked in per-wiki database tables. Also, if we add some language-specific rendering options as requested in Bugzilla, we might need to store them separately. --brion 20:49, 3 August 2009 (UTC)
       * It's not just Math, it's also Timeline, ConfirmEdit, ExtensionDistributor and ScanSet -- pretty much all of the extensions which access the filesystem for non-upload purposes. You're not seriously suggesting breaking backwards compatibility with all those extensions by requiring all users to immediately switch to per-wiki directory structures, are you? -- Tim Starling 02:44, 4 August 2009 (UTC)
   * Modules which generate files to be published on the web (Math, Timeline, etc.) should be migrated to generate each file at a local temporary location and then publish it to the shared FileRepo.
   * Modules which make heavy use of existence checks and directory listings should be considered for optimisation. ConfirmEdit, ExtensionDistributor and ScanSet do file operations on rarely-changed data; these would benefit from a simple cache with a fixed expiry time.
   * Math and Timeline would need more aggressive treatment. A scheme very similar to the one used for file transformations would be appropriate: a 404 handler for existence tests and file generation, with directory listings used only for Squid cache purge operations.
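For the Squid-ACL design under the multiple-storage-servers item, the routing rule has to be computable both by the proxy and by MediaWiki's WebDAV client. The following is a minimal Python sketch (not MediaWiki's PHP), assuming hashed upload paths of the form `x/xy/Name` derived from the md5 of the file name, and using hypothetical server names. Each file is assigned to a server by its first-level hash directory, and `directories_for` lists the prefixes a proxy ACL for each server would need to match.

```python
import hashlib

# Hypothetical storage hosts for illustration; not real Wikimedia servers.
STORAGE_SERVERS = ["store1.example.internal", "store2.example.internal"]


def hash_path(name):
    """Return the hashed relative path 'x/xy/<name>' for an upload.

    `name` is assumed to be in database-key form (spaces already replaced by
    underscores). The two directory levels are the first one and first two
    hex digits of md5(name).
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}/{name}"


def server_for(rel_path, servers=STORAGE_SERVERS):
    """Choose a storage server from the first-level hash directory.

    All files under the same first-level directory land on the same server,
    so a proxy rule matching on that directory routes requests identically.
    """
    top = rel_path.split("/", 1)[0]
    return servers[int(top, 16) % len(servers)]


def directories_for(server, servers=STORAGE_SERVERS):
    """List the first-level directories a given server is responsible for."""
    index = servers.index(server)
    return [f"{d:x}" for d in range(16) if d % len(servers) == index]


if __name__ == "__main__":
    path = hash_path("Example.jpg")
    print(path, "->", server_for(path))
    print(directories_for("store1.example.internal"))
    # ['0', '2', '4', '6', '8', 'a', 'c', 'e']
```

The modulo rule keeps the proxy configuration trivial, but adding a server moves most directories to a new host; a static directory-to-server table would avoid that at the cost of more configuration.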
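The proposed FileRepo `streamFile` member function, which streams out a given mwrepo:// virtual URL, could look roughly like the following minimal Python sketch. It assumes virtual URLs of the form `mwrepo://<repo>/<zone>/<rel>` and takes a hypothetical `open_backend` hook in place of a real storage client; for a WebDAV backend that hook would return the body of a GET response. Copying in fixed-size chunks keeps memory use constant regardless of file size.

```python
import io
import shutil

VIRTUAL_PREFIX = "mwrepo://"


def parse_virtual_url(url):
    """Split 'mwrepo://<repo>/<zone>/<rel>' into (repo, zone, rel)."""
    if not url.startswith(VIRTUAL_PREFIX):
        raise ValueError(f"not a virtual URL: {url!r}")
    repo, _, rest = url[len(VIRTUAL_PREFIX):].partition("/")
    zone, _, rel = rest.partition("/")
    if not (repo and zone and rel):
        raise ValueError(f"incomplete virtual URL: {url!r}")
    return repo, zone, rel


def stream_file(url, out, open_backend, chunk_size=64 * 1024):
    """Copy the file named by a virtual URL to `out`, chunk by chunk.

    `open_backend(repo, zone, rel)` is a hypothetical hook that returns a
    readable binary file object for the stored file.
    """
    repo, zone, rel = parse_virtual_url(url)
    with open_backend(repo, zone, rel) as src:
        shutil.copyfileobj(src, out, chunk_size)


if __name__ == "__main__":
    store = {("local", "public", "a/ab/Example.png"): b"\x89PNG..."}
    out = io.BytesIO()
    stream_file("mwrepo://local/public/a/ab/Example.png", out,
                lambda repo, zone, rel: io.BytesIO(store[(repo, zone, rel)]))
    print(out.getvalue())  # b'\x89PNG...'
```

The URL is split by hand rather than with a generic URL parser so that characters such as `?` in a file name stay part of the relative path. A real implementation would also need to send HTTP headers before the body, which this sketch omits.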
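On the directory-listing item and the reply that WebDAV has interfaces for directory listing: in WebDAV (RFC 4918) a collection is listed with a PROPFIND request carrying a `Depth: 1` header, and the server answers with a 207 Multi-Status XML body containing one `response` element per member plus one for the collection itself. Below is a minimal Python sketch of a readdir-like helper; the function names are hypothetical, `dir_path` is assumed to be already URL-encoded, and error handling is limited to checking for the 207 status.

```python
import http.client
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlsplit

DAV = "{DAV:}"  # ElementTree prefix for the DAV: XML namespace

# Ask only for resourcetype, which is enough to tell files from collections.
PROPFIND_BODY = (b'<?xml version="1.0" encoding="utf-8"?>'
                 b'<D:propfind xmlns:D="DAV:"><D:prop><D:resourcetype/>'
                 b'</D:prop></D:propfind>')


def propfind_depth1(host, dir_path):
    """Send PROPFIND with Depth: 1 and return the raw Multi-Status body."""
    conn = http.client.HTTPConnection(host)
    try:
        conn.request("PROPFIND", dir_path, body=PROPFIND_BODY,
                     headers={"Depth": "1", "Content-Type": "application/xml"})
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 207:
            raise OSError(f"PROPFIND {dir_path} failed with HTTP {resp.status}")
        return body
    finally:
        conn.close()


def list_directory(multistatus_xml, dir_path):
    """Return (name, is_dir) pairs for the members of dir_path.

    The response also describes the collection itself; that entry is
    skipped, much as callers of readdir() usually skip '.'.
    """
    root = ET.fromstring(multistatus_xml)
    base = unquote(dir_path).rstrip("/")
    entries = []
    for response in root.iter(DAV + "response"):
        href = response.findtext(DAV + "href", default="")
        # hrefs may be absolute URLs or absolute paths, and are percent-encoded.
        path = unquote(urlsplit(href).path).rstrip("/")
        if path == base:
            continue
        is_dir = response.find(f".//{DAV}resourcetype/{DAV}collection") is not None
        entries.append((posixpath.basename(path), is_dir))
    return entries
```

Usage would be `list_directory(propfind_depth1(host, "/math/a/"), "/math/a/")`. Keeping the network call and the XML parsing separate lets the parser be tested without a server.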
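The migration path for modules that generate files for the web (render at a local temporary location, then publish to the shared FileRepo) can be sketched as a Python context manager. `generate_then_publish` and `local_publisher` are hypothetical names, and the publisher here just copies into a directory tree as a stand-in for a FileRepo store operation over WebDAV. Publishing happens only if the generating code completes without raising, and the scratch file is always removed, so a failed render never leaves a partial file in the shared repo.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def generate_then_publish(publish, dest_rel, suffix=""):
    """Yield a local scratch path; publish it to dest_rel if the body succeeds.

    `publish(local_path, dest_rel)` must copy the file, not move it, because
    the scratch file is deleted afterwards in every case.
    """
    fd, scratch = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the generating code reopens the file by name
    try:
        yield scratch
        publish(scratch, dest_rel)
    finally:
        if os.path.exists(scratch):
            os.unlink(scratch)


def local_publisher(root):
    """Hypothetical stand-in for a repo store operation: copy under root."""
    def publish(local_path, dest_rel):
        dest = os.path.join(root, dest_rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(local_path, dest)
    return publish
```

A module would write its output to the yielded path inside the `with` block; if it raises instead, nothing is published and the exception propagates to the caller.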
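The "simple cache with a fixed expiry time" for rarely-changed data could be as small as the following Python sketch (class name hypothetical). It memoises the result of a slow query, such as a directory listing or existence check, and recomputes it only after `ttl` seconds; the injectable clock exists only to make the behaviour easy to test. On a multi-server site a shared cache such as memcached, whose entries also take an expiry time, would be the more natural home, so that each application server does not repeat the query.

```python
import time


class ExpiringCache:
    """Memoise slow filesystem queries for a fixed number of seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expiry time, cached value)

    def get(self, key, compute):
        """Return the cached value for key, calling compute() if absent or stale."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value
```

For example, a random-image picker could choose from `cache.get("captcha-images", lambda: sorted(os.listdir(image_dir)))` (with `image_dir` a hypothetical path) instead of calling opendir/readdir on every request.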
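The 404-handler scheme suggested for Math and Timeline can be sketched in Python as two halves (all names hypothetical). The parser derives the file URL directly from a hash of the input, with no existence check. The web server serves the file if it exists and otherwise calls the handler, which looks up the source text, renders it, stores it and returns it. The `lookup_source`, `render` and `store` hooks stand in for the module's database table, its external renderer and a FileRepo store operation respectively, and the md5-based naming is an assumption for illustration rather than Math's actual scheme.

```python
import hashlib


def render_url(source, prefix="/math"):
    """Parser side: compute the output URL from the input, without touching storage."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    return f"{prefix}/{digest[0]}/{digest[:2]}/{digest}.png"


def handle_404(path, lookup_source, render, store):
    """Web-server side: called only when `path` was not found in storage.

    Returns (status, body). After a successful render the file is stored,
    so later requests for the same path are served without reaching here.
    """
    name = path.rsplit("/", 1)[-1]
    digest, dot, _ext = name.partition(".")
    source = lookup_source(digest) if dot else None
    if source is None:
        return 404, b"Not Found"
    data = render(source)
    store(path, data)
    return 200, data
```

With this split, parsing a page never stats the shared storage; the only storage traffic on a cache miss is the render-and-store done by the handler.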

Lists
Wikimedia modules which use NFS:


 * ConfirmEdit
   * Uses opendir/readdir to pick a random image
   * Uses wfStreamFile to stream out the image


 * ExtensionDistributor
   * Invokes an external utility (tar) to publish files
   * Uses opendir/readdir to get extension lists
   * Invokes svn on a remote server to act on the shared files locally


 * Math
   * Invokes an external utility (texvc) to publish files
   * Checks for file existence during parse
   * Manages directories in the shared location and migrates files from a previous directory structure


 * FileRepo
   * thumb.php uses getPath. The WebStore module uses fake File objects to interface with the media module, but doesn't provide the same UI.
   * maintenance/archives/populateSha1.php
   * maintenance/checkImages.php
   * LocalFile::getThumbnails uses opendir/readdir


 * ScanSet
   * Uses opendir/readdir to generate indexes on parse


 * Timeline
   * Invokes an external utility (EasyTimeline.pl) to publish files
   * Reads HTML snippet files generated by EasyTimeline.pl during parse
   * Checks for file existence on parse


 * ReaderFeedback
   * Creates SVG/PNG files in the /graphs directory


 * wfStreamFile callers:
   * img_auth.php
   * thumb.php
   * Special:Revisiondelete
   * Special:Undelete

Non-Wikimedia modules:
 * Asksql
 * DynamicPageList
 * Gnuplot
 * LatexDoc
 * PdfBook
 * RandomUsersWithAvatars
 * SemanticResultFormats
 * SocialProfile