Manual:External Storage

From MediaWiki.org

External storage is an abstraction for storing the wiki's content (i.e. what would normally go into the text table) outside the normal database, possibly with some kind of compression applied. Some extensions (such as StructuredDiscussions) can use external storage directly for storing other kinds of data.

The contents of external storage are addressed with a URL of the form <protocol>://<location>/<object name>, where the protocol determines what type of storage is used. Before MediaWiki 1.32, these URLs were stored in the old_text field of the text table, with old_flags set to external; since 1.32 they are stored in the content_address field of the content table.

Code

The main class for interacting with external storage is ExternalStore. You can use insert or (more typically) insertToDefault to store a piece of data and receive the URL at which it was stored; that URL can be used with fetchFromURL to retrieve the data.

Internally, ExternalStore delegates to the ExternalStoreMedium subclass corresponding to the protocol. ExternalStoreDB, the most commonly used one, differs from the others in that it provides special handling when the stored data is a serialized HistoryBlob subclass: such objects can be addressed as <protocol>://<location>/<object name>/<item id>, in which case the store unserializes the object and returns the requested item (by calling getItem on it).
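
To make the two address forms described above concrete, here is a minimal sketch that splits an address into its parts. The function name parseExternalAddress and the array keys are hypothetical and are not part of MediaWiki; the pattern also assumes that the location, object name, and item id contain no slashes. It illustrates the address layout only and is not how ExternalStoreDB itself is implemented.

```php
<?php
/**
 * Split an external storage address into its parts.
 * Illustrative helper only; NOT part of MediaWiki.
 *
 * @param string $url For example "DB://demoCluster/123" or "DB://demoCluster/123/4"
 * @return array With the keys protocol, location, object, and item (null when absent)
 * @throws InvalidArgumentException If the address does not match the expected form
 */
function parseExternalAddress( string $url ): array {
	// <protocol>://<location>/<object name>[/<item id>]
	$pattern = '!^([A-Za-z0-9]+)://([^/]+)/([^/]+)(?:/([^/]+))?$!';
	if ( !preg_match( $pattern, $url, $m ) ) {
		throw new InvalidArgumentException( "Not an external storage address: $url" );
	}
	return [
		'protocol' => $m[1],
		'location' => $m[2],
		'object' => $m[3],
		// The item id is only present when addressing one item inside a stored object.
		'item' => $m[4] ?? null,
	];
}

$parts = parseExternalAddress( 'DB://demoCluster/123/4' );
echo $parts['protocol'], ' ', $parts['location'], ' ', $parts['object'], ' ', $parts['item'], "\n";
// prints: DB demoCluster 123 4
```

For an address without an item id, such as DB://demoCluster/123, the item entry is null, which corresponds to fetching the whole stored object.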

In practice, you should rarely use ExternalStore directly; use SqlBlobStore (or an even higher-level abstraction such as RevisionStore) instead.

Configuration

An example LocalSettings.php setup:

// Allow the DB protocol (handled by ExternalStoreDB).
$wgExternalStores = [ 'DB' ];
// One cluster named demoCluster: the first node is the master, the others replicate from it.
$wgExternalServers = [ 'demoCluster' => [
  [ 'host' => 'master.example.org', 'user' => 'userM',  'password' => 'pwdM',  'dbname' => 'dbM',  'type' => 'mysql', 'load' => 1 ],
  [ 'host' => 'slave1.example.org', 'user' => 'userS1', 'password' => 'pwdS1', 'dbname' => 'dbS1', 'type' => 'mysql', 'load' => 1 ],
  [ 'host' => 'slave2.example.org', 'user' => 'userS2', 'password' => 'pwdS2', 'dbname' => 'dbS2', 'type' => 'mysql', 'load' => 1 ],
] ];
// Store new text in demoCluster.
$wgDefaultExternalStore = [ 'DB://demoCluster' ];

  • The $wgExternalStores line states that a DB external store can be used. (The DB part is not an arbitrary name that can be adjusted. It has to be DB.) This corresponds to the ExternalStoreMedium subclass used, and to the protocol of the blob address.
  • The $wgExternalServers line lists all usable clusters and, for each cluster, all of its usable nodes. The keys of the top-level array are the cluster names (the above example defines only one cluster, named demoCluster). The value for each key is again an array, holding the specifications of the individual nodes. The first node is considered the master. All writes to the database are performed through this master node. Zero or more slave nodes may follow (the above example has two). Each node may have its own host, user, password, dbname, and type, as shown in the example. The load parameter specifies how much of the load should pass through this node.
  • The $wgDefaultExternalStore line lists the external stores that may be used for storing new text. If you omit this line, the external store will be read-only and new text will go into the default database (i.e. the same database that holds the page, revision, and image data, not the cluster).
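
Because $wgExternalServers is keyed by cluster name and $wgDefaultExternalStore is a list, the same structure can describe more than one cluster. The following LocalSettings.php fragment is a sketch under assumptions: the cluster names cluster1 and cluster2, the host names, the credentials, and the database names are all placeholders invented for illustration, and each cluster is shown with a single node for brevity.

```php
<?php
// Sketch only: every name, host, and credential below is a placeholder.
$wgExternalStores = [ 'DB' ];

$wgExternalServers = [
	// First cluster; its first (and here only) node is the master.
	'cluster1' => [
		[ 'host' => 'es1.example.org', 'user' => 'userA', 'password' => 'pwdA', 'dbname' => 'dbA', 'type' => 'mysql', 'load' => 1 ],
	],
	// Second cluster, configured the same way.
	'cluster2' => [
		[ 'host' => 'es2.example.org', 'user' => 'userB', 'password' => 'pwdB', 'dbname' => 'dbB', 'type' => 'mysql', 'load' => 1 ],
	],
];

// Both clusters may receive new text. Every name used here must
// also appear as a key in $wgExternalServers.
$wgDefaultExternalStore = [ 'DB://cluster1', 'DB://cluster2' ];
```

Removing a cluster from $wgDefaultExternalStore while keeping it in $wgExternalServers should leave its existing content readable without sending new text to it, in line with the read-only behaviour described above.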

For a multi-master wiki farm setup (like Wikimedia), consider using LBFactory_Multi instead.

Database setup

For the above configuration example, you would have to:

  1. Create the database dbM on the host master.example.org.
  2. Run the SQL script maintenance/storage/blobs.sql on the database dbM on the host master.example.org. Do not use maintenance/sql.php for this task, as it will add the required tables to your default database (i.e. the database holding the page, revision, and image data) and not to dbM. If you are not sure how to run an SQL script on a specific database and host, please consult your database documentation.
  3. Set up replication towards dbS1 on the host slave1.example.org (consult your database's documentation on how to set up replication).
  4. Set up replication towards dbS2 on the host slave2.example.org.

Maintenance scripts

There are several maintenance scripts for moving content to the external store:

  • moveToExternal.php - move old revisions to external storage
  • compressOld.php - compress old revisions and potentially move them to external storage
  • recompressTracked.php - move revisions (or other data) from one external storage to another and recompress them in the process