Extension talk:FileIndexer

Could you please tell me the exact position at which to insert your script in "SpecialUpload.php"? I run the wiki on a local domain.

Thanks, casio

Unique Words
For efficiency, it might be worth using a script that attaches only the unique words to the wiki, rather than the full file contents. I might give it a go once I learn how.

I took a crack at this. Here is an example of my hack for PDFs; it should work with other formats too:

. . $toexec = "/usr/local/bin/pdftotext " . escapeshellarg($this->mSavedFile) . " - | tr -d ',\"<>/\:\?\;1234567890!@#$%^&*{}[]+' | tr ' ' '\\n' | sort -f | uniq -i"; ..

Works in my 1.8.2.
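
For anyone who would rather not depend on tr/sort/uniq being available, the same idea can be sketched in plain PHP. This is only an illustration: uniqueWords() is a hypothetical helper, not part of the extension, and it assumes the extracted text is UTF-8 and that you want case-insensitive matching with digits and punctuation dropped.

```php
<?php
// Hypothetical helper (not part of the extension): reduce extracted text
// to a sorted list of unique, lower-cased words.
function uniqueWords($text) {
    // Split on anything that is not a letter; this drops digits and
    // punctuation in one step. The /u flag assumes UTF-8 input.
    $words = preg_split('/[^\p{L}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        // preg_split() returns false if the input is not valid UTF-8.
        return array();
    }
    $words = array_values(array_unique($words));
    sort($words, SORT_STRING);
    return $words;
}
```

The result could then be joined with implode(' ', ...) before it is stored as the file's index text.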

Catdoc
I installed catdoc 0.94 and found the xls and ppt executables in /usr/local/bin rather than in /usr/bin as shown above. (The pdf and word executables installed as shown above for me.) --Erik Heidt, 31 July 2005
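
Since the install location evidently varies between systems, one option is to look the executable up instead of hard-coding the path. A minimal sketch, assuming the tools live in one of the two directories mentioned here; findConverter() is a hypothetical helper, not something the patch provides.

```php
<?php
// Hypothetical helper: return the full path of the first executable named
// $name found in $dirs, or null if none of the directories contains it.
function findConverter($name, $dirs = array('/usr/local/bin', '/usr/bin')) {
    foreach ($dirs as $dir) {
        $path = rtrim($dir, '/') . '/' . $name;
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}
```

For example, findConverter('pdftotext') would return whichever of the two locations applies on your machine, and null if the tool is not installed in either.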

Large files
Regarding MySQL error "1153: Got a packet bigger than 'max_allowed_packet' bytes"

While using the above patch to upload large PDFs, I encountered MySQL error 1153. This error occurs when you attempt to execute a SQL statement against MySQL that is larger than the server's max_allowed_packet setting. (For more information, see the error explanation on the MySQL website: MySql Packet-too-large page.)

Rather than increase the size of allowable packets, I decided to truncate the text that is returned in $NewDesc to a value large enough that I "probably" get a sample of text for good searches, but small enough that I don't (1) get this error or (2) commit tons of db storage to a single file's index text.

After some research I set the limit at 512K of text. Here is the code I inserted into the MHart patch from above:
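
A truncation along these lines can be sketched as follows. This is an illustration of the idea only, not the code from the patch: truncateIndexText() is a hypothetical helper, and the limit is taken to be 512 * 1024 bytes.

```php
<?php
// Hypothetical helper: cap the extracted index text at $maxBytes bytes.
// Assumption: "512K" means 512 * 1024 = 524288 bytes.
function truncateIndexText($text, $maxBytes = 524288) {
    if (strlen($text) <= $maxBytes) {
        return $text;
    }
    // substr() counts bytes, so the cut may land in the middle of a
    // multibyte UTF-8 character at the very end of the result.
    return substr($text, 0, $maxBytes);
}
```

In the patch this would be applied just before the text is stored, for example $NewDesc = truncateIndexText($NewDesc);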

MW 1.9.3 Problem
I encountered a problem using this extension with MediaWiki 1.9.3: no description was sent to SpecialUpload.php, and the description field was empty. This was resolved by removing "\r\n" when sending it to $NewDesc.

--Erik Heidt, 1 September 2005