FileStore
From MediaWiki.org
The image undeletion archive for MediaWiki 1.7 uses the new FileStore class. My general intention is to migrate all image storage in the future to use this, possibly with some changes.
It's based on earlier musings; see 1.6 image storage.
Filenames
A given file is identified by a storage key, which by amazing coincidence is its filename in the basic filesystem-based implementation.
The key consists of a content hash (SHA-1 encoded as base-36) and the normalized file extension.
Example filename:
- 0224mu8tgnphimr3bksx9r0p1lhfj7u8.png
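The key construction described above can be sketched roughly as follows. This is a minimal Python illustration, not the FileStore implementation: the function names `to_base36` and `storage_key` are hypothetical, padding to 32 characters follows the length quoted on this page, and "normalized" is assumed here to mean lower-cased with no leading dot.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value: int) -> str:
    """Encode a non-negative integer using the digits 0-9 and a-z."""
    if value == 0:
        return "0"
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits))


def storage_key(content: bytes, extension: str) -> str:
    """Build a key of the form <base-36 SHA-1>.<extension> (hypothetical helper)."""
    # Interpret the 160-bit SHA-1 digest as one big-endian integer.
    digest = int.from_bytes(hashlib.sha1(content).digest(), "big")
    # Assumption: left-pad with zeros to the 32 characters quoted on this page.
    encoded = to_base36(digest).rjust(32, "0")
    # Assumption: normalizing the extension means lower-casing it.
    return encoded + "." + extension.lower().lstrip(".")
```

Because the key depends only on the contents and the extension, two uploads of identical bytes with the same extension map to the same key, which is where the deduplication comes from.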
Using a hash of the contents means that:
- most duplicates are stored only once
- front-end file references can be renamed
- existing filesystem restrictions on front-end names (slashes, quotes, etc.) can be lifted
- servers with funny filesystems like NTFS can run compatibly
Keeping the extension means that:
- files can be served straight out of the filesystem by a web server
- files can be mirrored very easily
Base-36 encoding of the 160-bit SHA-1 hash brings the filename down from 40 characters (as hexadecimal) to 32. There is, however, a possibility that some rare filenames will contain filterable words.
Going much beyond base-36 is not practical without making things complicated. Base-64 would be more compact still, but it would break on case-insensitive filesystems (Mac, Windows). Using other characters could also cause portability problems, and would make URLs uglier because of the encoding required.
Subdirectory hashing
Hashed subdirectories are made one character at a time, e.g.:
- 0/2/2/0224mu8tgnphimr3bksx9r0p1lhfj7u8.png
One possibility suggested in the past was to remove the split chars from the filename, so you'd have:
- 0/2/2/4mu8tgnphimr3bksx9r0p1lhfj7u8.png
That could make URLs shorter when serving straight out of a filesystem, so might be considered as a future enhancement.
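Both layouts above can be derived from the storage key with a few lines of string handling. The sketch below is illustrative only; `hashed_path` and its parameters are hypothetical names, and three directory levels are assumed because that is what the examples on this page show.

```python
def hashed_path(key: str, levels: int = 3, strip_prefix: bool = False) -> str:
    """Return the relative path for a storage key (hypothetical helper).

    Each of the first `levels` characters of the key becomes one directory.
    With strip_prefix=True those characters are also removed from the
    filename, as in the shorter variant suggested above.
    """
    if len(key) <= levels:
        raise ValueError("key is too short for the requested directory depth")
    directories = list(key[:levels])
    filename = key[levels:] if strip_prefix else key
    return "/".join(directories + [filename])


key = "0224mu8tgnphimr3bksx9r0p1lhfj7u8.png"
print(hashed_path(key))                     # current layout
print(hashed_path(key, strip_prefix=True))  # suggested shorter layout
```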
Non-file storage
It should be possible to adapt this interface to database or other server storage of data.
Thumbs
Not sure yet how best to handle thumbnails and other derived files.
API
The FileStore class provides a global lock; currently this uses MySQL-specific locking functions and may not be portable. This should be abstracted for the other databases.
By using the lock and some simple transaction classes, basic insert/remove operations can be rolled back or committed along with the database relatively safely (I hope!). Copy operations are performed immediately, and deletions only on commit. By catching a database exception, you can roll back the copies before passing the exception on to the error handler.
(Some fatal error conditions could cause the copies to be left in place while database additions get rolled back on connection drop. This is to be considered the safer alternative to having files vanish when the database insertions are rolled back!)
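The "copy now, delete on commit" idea can be sketched as below. This is a minimal Python illustration of the pattern, not the FileStore transaction classes themselves: `FileTransaction`, its methods, and `move_with_database` are hypothetical names, and `database_step` stands in for whatever database work accompanies the file operation.

```python
import os
import shutil


class FileTransaction:
    """Copies happen immediately; deletions wait for commit (hypothetical sketch)."""

    def __init__(self):
        self._created = []          # files this transaction created
        self._pending_deletes = []  # files to remove once committed

    def add_copy(self, source, destination):
        # Assumption: an existing destination already holds the same contents
        # (keys are content hashes), so it is neither overwritten nor tracked.
        if os.path.exists(destination):
            return
        os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
        shutil.copy2(source, destination)
        self._created.append(destination)

    def add_delete(self, path):
        # Nothing is removed yet; the file survives if the database rolls back.
        self._pending_deletes.append(path)

    def commit(self):
        for path in self._pending_deletes:
            if os.path.exists(path):
                os.remove(path)
        self._created.clear()
        self._pending_deletes.clear()

    def rollback(self):
        # Undo only the copies made by this transaction.
        for path in reversed(self._created):
            if os.path.exists(path):
                os.remove(path)
        self._created.clear()
        self._pending_deletes.clear()


def move_with_database(transaction, source, destination, database_step):
    """Move a file, keeping it consistent with a database change."""
    transaction.add_copy(source, destination)
    transaction.add_delete(source)
    try:
        database_step()
    except Exception:
        transaction.rollback()  # remove the copy, keep the original
        raise                   # pass the exception on to the caller
    transaction.commit()        # the original is deleted only now
```

If the process dies between the copy and the commit, the worst case in this sketch is an extra copy left on disk, which matches the trade-off described above.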
See Image::delete() and friends for examples using this interface.
Note that MySQL only allows your connection to have one lock open at a time. While this can help avoid deadlocks, it does mean we have to be careful about what other things we might use such locks for.
The lock is automatically released when your connection drops.
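For illustration, MySQL's named-lock functions are `GET_LOCK(name, timeout)` and `RELEASE_LOCK(name)`. The sketch below wraps them in a Python context manager; it is not the FileStore code. It assumes a DB-API cursor from a MySQL driver that uses `%s` placeholders, and both the helper name `global_lock` and the lock name `"filestore"` are made up for the example.

```python
from contextlib import contextmanager


@contextmanager
def global_lock(cursor, name, timeout_seconds=10):
    """Hold a MySQL named lock for the duration of a with-block (sketch).

    `cursor` is assumed to be a DB-API cursor on a MySQL connection.
    """
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout_seconds))
    row = cursor.fetchone()
    # GET_LOCK returns 1 on success, 0 on timeout, NULL on error.
    if row is None or row[0] != 1:
        raise RuntimeError("could not obtain lock %r" % (name,))
    try:
        yield
    finally:
        # Release explicitly; the server also frees the lock if the
        # connection drops, as noted above.
        cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cursor.fetchone()
```

Usage would look like `with global_lock(cursor, "filestore"): ...`, with the file and database operations placed inside the block.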

