Extension:OracleTextSearch

From MediaWiki.org
MediaWiki extensions manual
OracleTextSearch
Release status: beta
Implementation: Search
Description: Extends the standard Oracle search with file indexing capabilities.
Author(s): freakolowsky
Latest version: 1.0 (14.06.11)
MediaWiki: Works with 1.17
License: No license specified
Download

What can this extension do?

This extension extends the standard SearchOracle class with Oracle Text indexing of files stored outside the database. Oracle internally limits the external data indexed per index key to 2 GB.

The following kinds of content can be indexed (see Supported Types for the detailed lists):

  • word processing and desktop publishing formats
  • spreadsheet and presentation formats
  • database formats (e.g. Access, dBase)
  • archive formats (archived documents are indexed and combined into a single index key)
  • graphic formats (image metadata, EXIF, ...)
  • other formats (executable or library metadata, text in Macromedia Flash, ID3 tags of MP3 files, vCards, ...)

Installation

  1. Download the files from Git and place them in $IP/extensions/OracleTextSearch/
  2. Add the following line to your wiki's LocalSettings.php:
    require_once("$IP/extensions/OracleTextSearch/OracleTextSearch.php");
    
  3. Enable file uploads ($wgEnableUploads = true;)
  4. Set SearchOracleText as your default search engine:
    $wgSearchType="SearchOracleText";
    
  5. Add the MIME types you want to index to $wgExIndexMIMETypes (see Supported Types)
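Taken together, the LocalSettings.php changes from the steps above might look like the following excerpt. This is a sketch. It assumes the wiki already runs on an Oracle database ($wgDBtype = 'oracle';), and the MIME list is only an example subset.

```php
<?php
// LocalSettings.php (excerpt). Assumes the wiki already runs on Oracle,
// i.e. $wgDBtype = 'oracle'; is set earlier in this file.

// Step 2: load the extension.
require_once "$IP/extensions/OracleTextSearch/OracleTextSearch.php";

// Step 3: enable file uploads (standard MediaWiki setting).
$wgEnableUploads = true;

// Step 4: use the extension's search backend.
$wgSearchType = 'SearchOracleText';

// Step 5: MIME types to index (example subset; see Supported Types).
$wgExIndexMIMETypes = array(
	'application/pdf',
	'application/msword',
	'application/vnd.oasis.opendocument.text',
);
```

The settings come after the require_once line so that they override any defaults the extension file may define.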

Database installation

Because this extension is very specific, for now you have to run the patch script provided with it manually. The script adds a column to the searchindex table and creates a CONTEXT index over that column using URL_DATASTORE and INSO_FILTER (the latter for 9iR2 compatibility).

In newer versions of Oracle, the user creating such an index must also be granted the role designated by the Oracle Text FILE_ACCESS_ROLE parameter.

If you get an ORA-03113 error when creating the index, see the Troubleshooting section.

Configuration

The extension has two global configuration parameters:

$wgExIndexOnHTTP=true;

When true, local https URLs are rewritten to http. Set it to false if the database and web server communicate over a public network, or if you prefer to keep the traffic encrypted in any case. Setting it to false also requires appropriate ACL and Oracle Wallet settings, depending on the Oracle DB version.
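To make the effect of $wgExIndexOnHTTP concrete, here is a minimal sketch of such a rewrite. The function exIndexRewriteUrl and its parameters are hypothetical and are not the extension's actual code. The sketch also assumes that "local" means the URL's host matches the wiki host and that URLs with an explicit port are left alone. It requires PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper (not the extension's actual API) illustrating the
 * effect of $wgExIndexOnHTTP: an https URL pointing at the local wiki host
 * is rewritten to http before it is handed to Oracle for fetching.
 */
function exIndexRewriteUrl( string $url, string $localHost, bool $indexOnHttp ): string {
	if ( !$indexOnHttp ) {
		// Keep https: the DB then needs ACL and Oracle Wallet settings.
		return $url;
	}
	$parts = parse_url( $url );
	if ( $parts === false
		|| strtolower( $parts['scheme'] ?? '' ) !== 'https'
		|| strcasecmp( $parts['host'] ?? '', $localHost ) !== 0
		|| isset( $parts['port'] )
	) {
		// Not a local https URL on the default port: leave it unchanged.
		return $url;
	}
	// Swap only the scheme; "https" and "HTTPS" are both 5 characters long.
	return 'http' . substr( $url, strlen( 'https' ) );
}
```

For example, exIndexRewriteUrl( 'https://wiki.example.org/images/a/ab/Report.pdf', 'wiki.example.org', true ) returns 'http://wiki.example.org/images/a/ab/Report.pdf'. A URL on any other host is returned unchanged.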


$wgExIndexMIMETypes = array(
	'application/pdf',
	'application/xml',
	'text/xml',
	'application/msword',
	'application/vnd.ms-office',
	'application/vnd.oasis.opendocument.text'
);

List of MIME types the search engine will consider for indexing.
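The role of this list can be illustrated with a small membership check. The function exIndexIsIndexable is hypothetical and is not taken from the extension. Its normalization is also an assumption of this sketch: it drops MIME parameters such as "; charset=utf-8" and compares case-insensitively.

```php
<?php
/**
 * Hypothetical check (not the extension's actual code) showing the role of
 * $wgExIndexMIMETypes: a file is considered for indexing only when its MIME
 * type is in the configured list.
 */
function exIndexIsIndexable( string $mime, array $indexTypes ): bool {
	// Drop parameters such as "; charset=utf-8" and compare case-insensitively,
	// since MIME type names are case-insensitive.
	$base = strtolower( trim( explode( ';', $mime, 2 )[0] ) );
	foreach ( $indexTypes as $type ) {
		if ( strtolower( trim( $type ) ) === $base ) {
			return true;
		}
	}
	return false;
}

// Same values as the $wgExIndexMIMETypes example above.
$types = array(
	'application/pdf',
	'application/xml',
	'text/xml',
	'application/msword',
	'application/vnd.ms-office',
	'application/vnd.oasis.opendocument.text',
);
```

With this list, exIndexIsIndexable( 'Text/XML; charset=utf-8', $types ) returns true and exIndexIsIndexable( 'image/png', $types ) returns false.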

Supported Types

The list of supported document types depends on the Oracle DB version. Below are links to the lists for the most widely used supported versions.

  • 11g (11.1) [1]
  • 10gR2 (10.2) [2]
  • 9iR2 (9.2) [3]

Troubleshooting

Oracle Text can sometimes be difficult to set up, especially when the database runs on a less common platform. Below are some known failures and how to resolve them.

  • ORA-03113 (underlying error ORA-07445) when creating the index
    • Usually caused by the ctxhx helper when the database runs on a 64-bit OS, because in certain versions the helper is compiled for 32-bit
    • Can sometimes be resolved by installing 32-bit compatibility libraries and relinking ctx, but in most cases there is no simple fix and you will have to apply the latest patchset or even upgrade the database
    • You can confirm the cause by running the ctxhx helper manually on the OS
  • Search fails to index or return results, with DRG-11222 in the logs
    • Same cause as the error above, but in most cases it can be resolved by installing 32-bit compatibility libraries and relinking ctx

Changelog

1.0 - Initial release - 14.06.11