Extension:FileIndexer
This extension stores its code inside a wiki page. Please be aware that MediaWiki developers do not review or keep track of extensions that put their code on the wiki.
WARNING: the code or configuration described here poses a major security risk.
Problem: Vulnerable to code injection attacks, because it passes user input directly to executable statements, such as exec(), passthru() or include(). This may lead to arbitrary code being run on your server, among other things.
| FileIndexer | Release status: beta |
|---|---|
| Implementation | Special page |
| Description | This extension makes uploaded document files searchable. |
| Author(s) | MHart, Flominator, Hxwiki, raZe |
| Last version | 0.4.5.03 |
| MediaWiki | New version: 1.14+; Flominator's version: 1.9.0 - 1.11 (earlier versions require patching) |
| License | No license specified |
| Download | No link; see section 'Changelog' |
History
MHart modified the standard upload page so that the contents of uploaded Microsoft Word, Microsoft Excel, Microsoft PowerPoint, and Adobe PDF documents become indexable.
He started by downloading and installing several Linux command-line utilities that take one of the above formats and output its text: antiword, xls2csv, ppt2text (catppt), and pdftotext.
He then modified SpecialUpload.php at the point where it tests for a successful upload, just before it inserts the uploaded file information into the database. The modification placed the extracted text of the document in an HTML comment block in the description text of the file page.
Two years later Flominator found the hook UploadForm:BeforeProcessing and turned the modification into an extension. Hxwiki then modified the code to work with MediaWiki 1.11. See section 'Historical Versions'.
Up to MediaWiki version 1.8 (2006) the extension required patches to core MediaWiki code. If you would like to use this extension, we strongly recommend upgrading to a MediaWiki version later than 1.8.
In May 2008 I (raZe) published a new, much more advanced version of this extension as a complete rewrite.
Compatibility with MediaWiki
| MediaWiki version | Status | Tested by | Date |
|---|---|---|---|
| 1.11 | untested | no one | |
| 1.12 | untested | no one | |
| 1.13 | fails | Johannekie | 23.02.2010 |
| 1.14.x | running 100% | raZe | 29.06.2009 |
| 1.15.x | running 100% | raZe | 03.10.2010 |
| 1.16.0-5 | running 100% | SmartK | 16.08.2011 |
| 1.17.0 | running 100% | SmartK | 20.10.2011 |
| 1.18.0 | fails | Swus | 29.11.2011 |
Requirements
- The extension uses various external tools to read the content of the supported file types. The default configuration requires the following Linux tools to be installed on the server:
- /usr/local/bin/pdftotext - http://www.foolabs.com/xpdf/
- /usr/local/bin/iconv - http://www.gnu.org/software/libiconv/
- /usr/local/bin/antiword - http://www.winfield.demon.nl/
- /usr/local/bin/xls2csv - catdoc
- /usr/local/bin/catppt - catdoc
- /usr/local/bin/strings
- /usr/local/bin/unzip
- If you want to use this extension on a Windows server, then you need to find and configure corresponding tools.
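Before installing, it can help to confirm that the tools listed above are actually present. The following is a minimal sketch, not part of the extension: the helper name `fiFindMissingTools` is hypothetical, and the paths are simply the defaults from the list above, so adjust them to your system. Run it as the web server user, since executability depends on who runs the check.

```php
<?php
// Hypothetical helper (not part of FileIndexer): returns the entries of
// $toolPaths that are missing or not executable for the current user.
function fiFindMissingTools(array $toolPaths)
{
    $missing = array();
    foreach ($toolPaths as $name => $path) {
        if (!is_file($path) || !is_executable($path)) {
            $missing[$name] = $path;
        }
    }
    return $missing;
}

// Default paths copied from the requirements list; adjust as needed.
$toolPaths = array(
    'pdftotext' => '/usr/local/bin/pdftotext',
    'iconv'     => '/usr/local/bin/iconv',
    'antiword'  => '/usr/local/bin/antiword',
    'xls2csv'   => '/usr/local/bin/xls2csv',
    'catppt'    => '/usr/local/bin/catppt',
    'strings'   => '/usr/local/bin/strings',
    'unzip'     => '/usr/local/bin/unzip',
);

foreach (fiFindMissingTools($toolPaths) as $name => $path) {
    echo "Missing or not executable: $name ($path)\n";
}
```

The extension's own check is enabled with $wgFiCheckSystem (see section 'Configuration'); this script is only a quick manual alternative.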
Installation
The following steps are needed for installation:
Step 1: Prepare Extension Directory
Create a folder 'extensions/FileIndexer' in MediaWiki's document root.
Step 2: Copy Code
Create all code files (currently four) inside this new folder and make sure the web server can read them.
Files:
Step 3: Configuration
Open the file FileIndexer_cfg.php in an editor and configure the extension to your needs. See section 'Configuration' for details.
Step 4: Temporary Files Directory
Make sure the web server has write access to the directory configured in the parameter '$wgFiRequestIndexCreationFile'.
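A quick way to verify this step is a small script run as the web server user. This is only a sketch: the helper name `fiIsUsableTempDir` is hypothetical, and the value "/tmp" is the developer default listed in section 'Configuration', which may differ from your setup.

```php
<?php
// Assumption: mirrors the developer default from FileIndexer_cfg.php.
$wgFiRequestIndexCreationFile = '/tmp';

// Hypothetical helper: true if $dir exists and the current user may write to it.
function fiIsUsableTempDir($dir)
{
    return is_dir($dir) && is_writable($dir);
}

if (!fiIsUsableTempDir($wgFiRequestIndexCreationFile)) {
    echo "No write access to $wgFiRequestIndexCreationFile\n";
}
```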
Step 5: LocalSettings.php
- Add the following lines to LocalSettings.php:
    # Makes uploaded documents searchable
    include("$IP/extensions/FileIndexer/FileIndexer.php");
- By default, the NS_IMAGE namespace is not searched. To include it in searches, set:
    $wgNamespacesToBeSearchedDefault = array(
        NS_MAIN => true,
        NS_TALK => false,
        NS_USER => false,
        NS_USER_TALK => false,
        NS_PROJECT => false,
        NS_PROJECT_TALK => false,
        NS_IMAGE => true,
        NS_IMAGE_TALK => false,
        NS_MEDIAWIKI => false,
        NS_MEDIAWIKI_TALK => false,
        NS_TEMPLATE => false,
        NS_TEMPLATE_TALK => false,
        NS_HELP => false,
        NS_HELP_TALK => false,
        NS_CATEGORY => false,
        NS_CATEGORY_TALK => false
    );
- New users inherit the above settings, but for pre-existing users each user's options must be updated (NS_IMAGE corresponds to the option searchNs6):
    /path/to/wiki# php maintenance/userOptions.php searchNs6 --new 1 --old ''
Step 6: Template:FileIndex
NOTE: This step is only needed if you use the template 'FileIndex' in the configuration parameters $wgFiPrefix/$wgFiPostfix (see section 'Configuration').
Create a template [[Template:FileIndex]] that fits your needs for the output of indexes.
The following is just a simple example that can be changed at any time later:
== File Index ==
The following index was taken from the files content:
<!-- {{{index}}} -->
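To illustrate how the template relates to $wgFiPrefix and $wgFiPostfix, the sketch below wraps an index string in the two markers and replaces an existing block on a page. It is an illustration under assumptions, not the extension's actual code: the helper names `fiWrapIndex` and `fiUpdateIndexBlock` are hypothetical, and the marker values are the developer defaults from section 'Configuration'.

```php
<?php
// Developer defaults as listed in section 'Configuration'.
$wgFiPrefix  = "<!-- FI:INDEX-START -->{{FileIndex |index=";
$wgFiPostfix = " }}<!-- FI:INDEX-ENDE -->";

// Hypothetical helper: wrap an index string in the configured markers,
// which yields a call to Template:FileIndex with the parameter 'index'.
function fiWrapIndex($index, $prefix, $postfix)
{
    return $prefix . $index . $postfix;
}

// Hypothetical helper: replace an existing index block in $pageText,
// or append a new block if no complete block is found.
function fiUpdateIndexBlock($pageText, $index, $prefix, $postfix)
{
    $block = fiWrapIndex($index, $prefix, $postfix);
    $start = strpos($pageText, $prefix);
    if ($start === false) {
        return rtrim($pageText) . "\n" . $block;
    }
    $end = strpos($pageText, $postfix, $start + strlen($prefix));
    if ($end === false) {
        return rtrim($pageText) . "\n" . $block;
    }
    return substr($pageText, 0, $start) . $block
        . substr($pageText, $end + strlen($postfix));
}

echo fiWrapIndex('budget report 2010', $wgFiPrefix, $wgFiPostfix), "\n";
```

Because the block is located by plain string search, the sketch also shows why both markers need to be unique within the page text.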
Configuration
The following switches and parameters are implemented:
| Parameter name | Type | Developer's default | Description |
|---|---|---|---|
| $wgFiCheckSystem | BOOL | FALSE | If TRUE, the system is checked each time the special page is called or an index creation is started. NOTE: When installing this extension for the first time, set this option to TRUE to check whether all external tools are reachable. |
| $wgFiCommandPaths | ARRAY | see configuration file | Maps the fully qualified call paths of all external tools to short names. |
| $wgFiCommandCalls | ARRAY | see configuration file | Maps file extensions to a command-line template. A template uses the constant WC_FI_FILEPATH for the path to the file to be indexed. External tools are referenced with the constant WC_FI_COMMAND, followed by an opening '[', the mapped name for the fully qualified call path (see parameter $wgFiCommandPaths), and a closing ']'. Example: 'odt' => WC_FI_COMMAND . "[unzip] -p \"" . WC_FI_FILEPATH . "\" content.xml" |
| $wgFiTypesToRemoveTags | ARRAY | see configuration file | Lists all file extensions (file types) that use tags, such as XML files. Tags are stripped from the content of these files. |
| $wgFiRequestIndexCreationFile | STRING | "/tmp" | Required path to a system directory where the web server has write access. It is used to leave a note during uploads when an index is to be created. Without it, any warning during the upload dialog would result in no index being created. |
| $wgFiPrefix | STRING | "<!-- FI:INDEX-START -->{{FileIndex \|index=" | Unique string that marks the head of the index block; it is needed to update the index automatically. It also formats the output of the index. As of version 0.4.5.03 it uses the template 'FileIndex', which should be used to format the final output of an index. |
| $wgFiPostfix | STRING | " }}<!-- FI:INDEX-ENDE -->" | Unique string that marks the tail of the index block; it is needed to update the index automatically. As of version 0.4.5.03 it closes the template 'FileIndex' (see also parameter $wgFiPrefix). |
| $wgFiArticleNamespace | INT | NS_IMAGE | Sets the namespace in which indexes are placed when a file is uploaded with index creation. For example, if this is set to the main namespace and you upload a file X with index creation, the index is saved in article X in the main namespace. The configured namespace is also the default namespace selected on the special page. The namespace is specified by its number, not its name. NOTICE: If you use the namespace NS_IMAGE ('File:') for your indexes, make sure you configure your wiki to search this namespace, too. |
| $wgFiMinWordLen | INT | 3 | The filtering algorithms are still very basic, so this value lets you at least specify the minimum length a string must have to be registered in the index. Values lower than 1 are replaced by 3. |
| $wgFiLowercaseIndex | BOOL | TRUE | Decides whether all words of the index are lowercased or left as in the original (which generally results in bigger indexes). |
| $wgFiSpDefaultWildcardSign | CHAR | "*" | Sets the wildcard sign that can be used on the special page to filter files. |
| $wgFiSpWildcardSignChangeable | BOOL | TRUE | If FALSE, the wildcard sign on the special page cannot be changed per request. |
| $wgFiSpNamespaceChangeable | BOOL | TRUE | If FALSE, the destination namespace on the special page cannot be changed per request, and indexes can only be created in the namespace specified in $wgFiArticleNamespace (and only this namespace is searched for indexes in 'check mode'). |
| $wgFiCreateOnUploadByDefault | BOOL | TRUE | Determines whether the checkbox in the upload form to create/update an index is initially checked. |
| $wgFiUpdateOnEditArticleByDefault | BOOL | FALSE | Determines whether the checkbox in the edit form to update the index of an article that may be the index article of a file is initially checked. |
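The sketch below illustrates how $wgFiCommandPaths, $wgFiCommandCalls, $wgFiMinWordLen and $wgFiLowercaseIndex might fit together. It is not the extension's code. The helper names `fiBuildCommand` and `fiBuildIndex`, the placeholder values of the two constants, the name-to-path layout of $wgFiCommandPaths, the word splitting and the removal of duplicate words are all assumptions made for illustration. Given the security warning at the top of this page, the sketch quotes the file path with PHP's escapeshellarg(); unlike the documented example, the template therefore leaves WC_FI_FILEPATH unquoted.

```php
<?php
// Assumption: the extension defines these constants itself; the placeholder
// values below exist only so that this sketch runs on its own.
if (!defined('WC_FI_COMMAND')) {
    define('WC_FI_COMMAND', '%COMMAND%');
}
if (!defined('WC_FI_FILEPATH')) {
    define('WC_FI_FILEPATH', '%FILEPATH%');
}

// Assumed layout: short tool name => fully qualified call path.
$wgFiCommandPaths = array('unzip' => '/usr/local/bin/unzip');

// The placeholder is left unquoted because escapeshellarg() adds the quotes.
$wgFiCommandCalls = array(
    'odt' => WC_FI_COMMAND . '[unzip] -p ' . WC_FI_FILEPATH . ' content.xml',
);

// Hypothetical helper: expand a command-line template into a shell command.
function fiBuildCommand($template, array $commandPaths, $filePath)
{
    // Replace every WC_FI_COMMAND[name] reference with the mapped path.
    $pattern = '/' . preg_quote(WC_FI_COMMAND, '/') . '\[([^\]]+)\]/';
    $command = preg_replace_callback(
        $pattern,
        function ($match) use ($commandPaths) {
            if (!isset($commandPaths[$match[1]])) {
                throw new InvalidArgumentException('Unknown tool: ' . $match[1]);
            }
            return escapeshellcmd($commandPaths[$match[1]]);
        },
        $template
    );
    // Quote the file path so that shell metacharacters in it stay inert.
    return str_replace(WC_FI_FILEPATH, escapeshellarg($filePath), $command);
}

// Hypothetical helper: turn extracted text into a space-separated word index.
function fiBuildIndex($text, $minWordLen, $lowercase, $removeTags)
{
    if ($minWordLen < 1) {
        $minWordLen = 3; // documented fallback for values lower than 1
    }
    if ($removeTags) {
        $text = preg_replace('/<[^>]*>/', ' ', $text);
    }
    if ($lowercase) {
        $text = strtolower($text);
    }
    // Assumption: words are runs of ASCII letters and digits.
    $words = preg_split('/[^A-Za-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $index = array();
    foreach ($words as $word) {
        if (strlen($word) >= $minWordLen) {
            $index[$word] = true; // array keys drop duplicate words
        }
    }
    return implode(' ', array_keys($index));
}

echo fiBuildCommand($wgFiCommandCalls['odt'], $wgFiCommandPaths, '/tmp/my report.odt'), "\n";
echo fiBuildIndex('<p>Hello World</p><p>hello to a wiki</p>', 3, true, true), "\n";
// second line prints: hello world wiki
```

With $lowercase set to false, "Hello" and "hello" are kept as separate entries, which matches the note that leaving words in their original case generally produces bigger indexes.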
Open shortcomings
- Aborting an upload leaves a small but useless file in the specified directory. A cron job can clean these files up, but this is not an ideal solution.
Historical Versions
Earlier versions of this extension, including the first version developed by Flominator, can be found in the following subarticles:
- Code for MediaWiki 1.11 +
- Code for MediaWiki 1.9 - 1.11
- Code for up to MediaWiki 1.8
Caution: This is just an addition to Code for MediaWiki 1.9 - 1.11 to use with older wikis!
Changelog
| Date | Version | Editor | Changes |
|---|---|---|---|
| 08.08.2007 | n/a | Flominator | |
| 28.11.2007 | n/a | Flominator | |
| 14.05.2008 | n/a | Flominator | |
| 15.05.2008 | v0.1.0.00 | raZe | |
| 29.06.2009 | v0.2.1.00 | raZe | |
| 01.07.2009 | v0.2.2.00 | raZe | |
| 03.10.2010 | v0.4.5.03 | raZe | |
Any Questions
For more hints and a place to ask your questions, see Extension talk:FileIndexer.