Extension talk:FileIndexer

Please, can you tell me the exact position where to insert your script in "SpecialUpload.php"? I run the wiki on a local domain.

Thanks, casio

Unique Words
Perhaps for efficiency, it might be worth using a script that attaches only the unique words to the wiki page, rather than the full file contents. I might give it a go once I learn how.

I have taken a crack at this. Here is an example of my hack for PDFs. It should work with other formats too:

. . $toexec = "/usr/local/bin/pdftotext " . escapeshellarg($this->mSavedFile) . " - | tr -d ',\"<>/\:\?\;1234567890!@#$%^&*{}[]+' | tr ' ' '\n' | sort -f | uniq -i"; ..

Works on my 1.8.2 install.
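
The unique-words idea above can also be tried on its own from the command line before wiring it into the upload code. The following is only a minimal sketch, not the code used in the patch: the function name unique_words is made up for illustration, and instead of deleting a fixed list of punctuation it splits on anything that is not a letter and lowercases the result.

```shell
#!/bin/sh
# unique_words (hypothetical helper): read text on stdin and print each
# distinct word once, lowercased, one word per line.
unique_words() {
    # 1. replace every run of non-letters with a single newline
    # 2. fold upper case to lower case so "PDF" and "pdf" collapse
    # 3. drop the empty line that a leading separator leaves behind
    # 4. sort and keep one copy of each word
    tr -cs '[:alpha:]' '\n' \
        | tr '[:upper:]' '[:lower:]' \
        | sed '/^$/d' \
        | sort -u
}

# Example call; the file name is a placeholder:
# pdftotext manual.pdf - | unique_words
```

Note that tr works on bytes, so words containing non-ASCII letters (for example German umlauts) may be split at those letters, depending on the tr implementation and locale.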

Catdoc
I installed catdoc 0.94 and found the xls and ppt executables in /usr/local/bin rather than in /usr/bin as shown above. (The pdf and word executables installed as shown above for me.) --Erik Heidt, 31 July 2005

Large files
Regarding MySQL error "1153: Got a packet bigger than 'max_allowed_packet' bytes"

While using the above patch to upload large PDFs, I encountered MySQL error 1153. This error occurs when a SQL statement sent to MySQL is larger than the server's max_allowed_packet setting. (For more information, see the "Packet too large" page on the MySQL website.)

Rather than increase the size of allowable packets, I decided to truncate the text returned in $NewDesc to a length large enough that I "probably" get a good sample of text for searches, but small enough that I don't (1) get this error or (2) commit tons of database storage to a single file's index text.

After some research I set the limit at 512K of text. Here is the code I inserted into the MHart patch from above:
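
What follows is not the original snippet, only a minimal sketch of the same truncation idea, applied at the shell level to the output of the text extractor rather than to $NewDesc in PHP. The names MAX_BYTES and truncate_text are assumptions made for illustration, and 512K is taken to mean 512 * 1024 bytes.

```shell
#!/bin/sh
# Assumed cap: 512K of text, interpreted here as 512 * 1024 bytes.
MAX_BYTES=$((512 * 1024))

# truncate_text (hypothetical helper): pass stdin through unchanged,
# but stop after MAX_BYTES bytes. Shorter input is not modified.
truncate_text() {
    head -c "$MAX_BYTES"
}

# Example call; the file name is a placeholder:
# pdftotext big-manual.pdf - | truncate_text
```

head -c is widely available (GNU and BSD) but is not required by POSIX. The cut is made at a byte boundary, so a multi-byte character at the very end of the kept text may be split.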

MW 1.9.3 Problem
I encountered a problem using this extension with MediaWiki 1.9.3: no description was sent to SpecialUpload.php, and the description field was empty. This was resolved by removing "\r\n" from the text sent to $NewDesc.

--Erik Heidt, 1 September 2005

FileIndexing does not work correctly
I've installed the extension and it only half works.

When I upload a PDF file, some text is inserted into the comments field. It looks correct, but the content is limited to 255 characters. So whenever I upload a PDF, I get something like this in the comments field:

"<!- - Leitfaden zur Nutzung der MP-Protokolle Seite 1 von 6 05.11.2007 Leitfaden zur Nutzung der MP-Protokolle Das Programm zum Anlegen/Bearbeiten der MP-Protokolle ist in Ferryt an der folgenden Stelle zu finden: Menü +Personal ->MP_Protokoll Abbildung"

The problem is that there is much more text than that. When I search for, e.g., "Leitfaden" with the search function (with all checkboxes activated), I get nothing. Any idea how to solve this problem?


 * I was having the same problem, but figured out why it happens. It occurs only when you upload a newer version of an existing file. In that case the text in the comments field is limited and searching does not work, as you pointed out, because the old comments page is preserved. Try uploading that PDF file under a different name. It will be indexed correctly and will be searchable.