Extension talk:FileIndexer

Discussions Pre Version 0.4.5.03
Hi everyone.

To make maintenance work on this extension a bit easier for me, I decided to clear this talk page. All discussions from before this version have been moved to a subpage: Discussions Pre Version 0.4.5.03.

So... party on. --RaZe 16:21, 3 October 2010 (UTC)

German signs make code unreadable + File contents not ending with ?>
Hi, thanks for this great extension! I had some trouble getting it to work. First, I hadn't noticed that the 4 pages with the file contents do not end the code with ?>. Second, the line with German-only characters caused PHP to generate an error; apparently the characters were converted to something PHP could not read after I pasted the content into vi. I had to delete that whole part. -Bob


 * Hi Bob. To be honest, I hadn't seen PHP code without the closing tag until MediaWiki. Even now this is the only place where I use this feature, because it seemed to be normal for MediaWiki developers as I found more and more extensions doing this.
 * About the German characters: I will try to find a better way soon/next time. To let others know, I will add this to the topic.
 * --RaZe 00:03, 20 November 2010 (UTC)


 * Omitting the closing tag is part of our Coding_Conventions because closing tags can cause issues when they are included, and we don't lose anything (benefit-wise) if they aren't. Peachey88 00:54, 20 November 2010 (UTC)
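
For readers wondering what "issues" a closing tag can cause: anything that follows the final ?> in a PHP file (beyond the single newline that PHP swallows) is sent to the browser as output. The following is only a rough sketch in Python for checking a pasted extension file, not part of FileIndexer or MediaWiki. The function name is made up for illustration, and the check is naive: it assumes the last ?> in the file really is a closing tag and not text inside a string.

```python
def trailing_output_after_close_tag(php_source: str) -> str:
    """Return the text after the last '?>' that PHP would emit as output.

    An empty string means the file omits the closing tag, or nothing
    but the one swallowed newline follows it.
    Hypothetical helper for illustration only.
    """
    idx = php_source.rfind("?>")
    if idx == -1:
        # No closing tag at all: nothing can leak after it.
        return ""
    tail = php_source[idx + 2:]
    # PHP consumes a single newline directly after the closing tag.
    if tail.startswith("\r\n"):
        tail = tail[2:]
    elif tail.startswith("\n"):
        tail = tail[1:]
    return tail


if __name__ == "__main__":
    sample = "<?php $x = 1; ?>\n\n"
    leaked = trailing_output_after_close_tag(sample)
    print(repr(leaked))
```

A non-empty result points at stray whitespace that could end up in the page output when the file is included, which is the kind of problem the convention avoids.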

Getting no index
Hi I'm running version 0.4.5.03 on MW 1.16.2 and PHP 5.3.3 on Ubuntu 10.10. I've carefully followed all the installation instructions and installed all the required tools. The SpecialPage all works and it gives a list of articles for which the index update process was started. However, when I search for a word that I know is in one of the documents nothing is found. If I go to a file page it shows:

File Index

The following index was taken from the files content:

Any ideas? How can I debug this? Many thanks. --Mitchelln 11:32, 21 April 2011 (UTC)
 * Fixed. Of course you need to remove the <pre> and </pre> tags from the FileIndex template.
 * -Mitchelln 11:32, 21 April 2011 (UTC)

Anyone have this working with MediaWiki 1.17.0 and PHP 5.3.6?
 * MediaWiki 1.17.0
 * PHP 5.3.6 (cgi-fcgi)
 * MySQL 5.1.48-log

Running with...
Installed software and installed extensions:
 * MediaWiki 1.17.0
 * PHP 5.3.2 (apache2handler)
 * MySQL 5.1.52
 * FCKeditor (Version 1.0.1): Allow editing using the WYSIWYG editor FCKeditor. Authors: Frederico Caldeira Knabben, Wiktor Walc, others and Jack Phoenix
 * FileIndexer (Version 0.4.5.03): Index creation from uploaded files so that search functions can find them. Authors: Ramon Dohle (raZe) | Original: MHart and Flominator
 * poppler-utils

Missing Checkbox "FileIndexer: [ ] Create/update index"
I'm missing the checkbox "FileIndexer: [ ] Create/update index" after installing FileIndexer. It is shown on the page "Special:Upload" but not on "Special:UploadWindow". How can I make the checkbox appear on "Special:UploadWindow"? - stevewilson 11:45, 18 November 2011 (UTC)

No index...
Added FileIndexer to my MediaWiki installation. Uploaded a .docx file, and tried to search for some text that was inside the file... and I got nothing.

Also, when I try to upload a .doc file, it says MIME type doesn't match file extension!?

Epistasis 16:20, 24 November 2011 (UTC)

Any websites running FileIndexer? (22dec2011)
Hi everybody, this extension seems a bit difficult for me to install. I love the idea of being able to search within pdfs, but don't want to expend all of the effort if the extension isn't working. Does anybody have a link to a wiki I can see where it is working? Thanks in advance
 * Hi, you are right. It's not that easy to install, but once it's working it is definitely worth it. I only have it running on 3 internal wikis (1.16 and 1.17). FileIndexer works great there! --SmartK 07:09, 23 December 2011 (UTC)
 * Any chance you can provide a link to any of the wikis so that I could get a sense of how it works?

Word documents not working
I've tried to upload and index .odt, .pdf and .doc files. The .odt and .pdf files get an index, the .doc file doesn't. Does anyone have a solution? - stevewilson 15:58, 29 December 2011 (UTC)
 * Have you installed "antiword"? And is it set up correctly, e.g. "usr/local/bin/antiword"? --SmartK 16:44, 29 December 2011 (UTC)
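
A quick way to follow up on this question is to check whether the helper programs can be found at all. The snippet below is a minimal sketch in Python, not something shipped with FileIndexer. The function name is invented for illustration, and it only checks the PATH of the user who runs it, which may differ from the PATH of the web server user. It also says nothing about whether the paths in the extension's configuration point to the same place.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional


def locate_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not found.

    Hypothetical helper; `which` is injectable so the lookup can be
    replaced in tests.
    """
    return {name: which(name) for name in names}


if __name__ == "__main__":
    # antiword handles .doc files; pdftotext comes with poppler-utils.
    for name, path in locate_tools(["antiword", "pdftotext"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, or is found at a different location than the one configured for the extension, that would be the first thing to fix.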

Any Windows file reader?
This is obviously designed with Linux in mind. But there are still many out there who prefer to use Windows-based web servers. So my question is, does anyone know if there are any Windows file readers for the file types listed in the article? Jamesjiao 22:49, 31 January 2012 (UTC)
 * Hi Jamesjiao, sorry, I haven't researched this, but I am sure there are.
 * Right now I am not really sure whether there is anything in the code that would prevent this extension from running on Windows, except obviously the configuration. Maybe I will take a look at this when I try to fix the incompatibility issue with MediaWiki 1.18.x. --RaZe (talk) 12:36, 24 February 2012 (UTC)

FileIndexer fails in MediaWiki 1.18.1 :-( --> NOW working with update
 * This is the error message I get with 1.18.1: "Fatal error: Call to a member function addMessages on a non-object in /var/www/kcwiki/mykochwiki/extensions/FileIndexer/FileIndexer.php on line 133"
 * Maybe Raze knows something... Would be GREAT!!! --SmartK (talk) 14:53, 22 February 2012 (UTC)
 * Hi SmartK,
 * Swus already asked for help on my talk page. As I told him, I will try that out as soon as possible. Sorry I can't say any more right now. --RaZe (talk) 12:31, 24 February 2012 (UTC)
 * Thank you RaZe. We should remove the phrase from the "extension page": "The author of this extension is no longer maintaining it!" I think it's great that you are still trying to help.... and I know it's a lot of work! --SmartK (talk) 13:47, 24 February 2012 (UTC)
 * SmartK has released a new version that works with 1.18.1 (I have not tested it, but you may try it out now).

Does this extension get all of the text from a pdf?
 * This extension seems great and I had no trouble installing it.
 * However, I notice that only some of the words in the PDF I upload go into the wiki. If I take the same file and run "pdftotext name.pdf text.txt" on the command line, I get a text file with the full content of the PDF. But in the wiki, many words are missing. Is there a switch I'm missing?
 * Ideally, I'd want to just dump in the full text of a PDF the same way it comes out when I run pdftotext by itself. Is that possible? In case it helps, I am also trying to start with a file with charset=binary. Would that potentially be causing a problem?
 * I looked in the source code but could not find this option yet. But I think it's a good idea to implement this. Maybe raZe can help?!? His idea at that time was that an "index" only needs each word once to be searchable, but I understand your idea. So let's hope he is willing to put some time into this.
 * You should change the following in the file "FileIndexer_cfg.php". This will not solve your problem, but you will need it anyway:
 * $wgFiMinWordLen = 1; $wgFiLowercaseIndex = false;
 * --SmartK (talk) 07:25, 19 April 2012 (UTC)
 * Yea, I did figure that out. I'd like to get around the removal of duplicate words though. Also, I'd rather preserve the formatting of the original document (as much as possible!). Since the default pdftotext option does exactly what I want, there has to be a way where I could just comment out a lot of the special indexing and just put the text file in directly.


 * I really hope I can find a way to fix this.
 * Hi, if I get you right the following is what you want.
 * Fragment of codefile FileIndexer.php:


 * Note that this solution limits your requirements to pdf files... feel free to add other filetypes.
 * Right now I don't have the resources to test this, so if there are typos I hope you can fix them. In general I think that should do it.
 * Maybe I will implement this as an option (by file type) in the next version (whenever that comes).
 * Regards --RaZe (talk) 11:07, 20 April 2012 (UTC)
 * Thank you RaZe for your fast answer. I also think it would be great to add this option as a variable in the config file (in the future). Let's hope it works and Mr. Anonymous is now happy ;-) --SmartK (talk) 11:18, 20 April 2012 (UTC)
 * Yes, thanks so much RaZe. I can't thank you enough for your kindness. I, Mr. Anonymous/too lazy to reset password am very very happy now! :)
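
For anyone reading this thread later, the behaviour discussed above (each word appearing only once in the index, as opposed to the full pdftotext output) can be illustrated with a small Python sketch. This is not the FileIndexer code. The function and its parameters are hypothetical and only borrow their meaning from the names $wgFiMinWordLen and $wgFiLowercaseIndex; the exact semantics of those settings in the extension are an assumption here.

```python
import re
from typing import List


def build_word_index(text: str, min_word_len: int = 1,
                     lowercase: bool = False) -> List[str]:
    """Return each word of `text` once, in order of first appearance.

    Hypothetical illustration of a deduplicated index; word order and
    formatting of the original document are lost by design.
    """
    seen = set()
    index = []
    for word in re.findall(r"\w+", text):
        if lowercase:
            word = word.lower()
        if len(word) < min_word_len:
            continue  # skip words shorter than the configured minimum
        if word not in seen:
            seen.add(word)
            index.append(word)
    return index


if __name__ == "__main__":
    full_text = "The cat and the Cat"
    print(build_word_index(full_text))
    print(build_word_index(full_text, lowercase=True))
```

The index is enough for a search to find the page, but it is shorter than the source text and cannot preserve layout, which is why the full-text approach was requested.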

Can't get my index to work
Hey, I hope I can get some help with this problem. I've got the extension to work: I can open the special page and select the files. When I select the "Main" namespace and click "Create", I get this message: "For the following list of articles the index creation process was started:". After that a new page is created with the file name (test.pdf) and an index, but the index is empty and there is no file on the page, and I am not able to search for anything within the PDF file.

If I could get some help I would be very happy :)

regards Mikkel