Extension talk:FileIndexer
Discussions Pre Version 0.4.5.03
Hi everyone.
To make maintenance work on this extension a bit easier for me, I decided to clear this talk page.
All discussions from before this version are placed in a subpage: Discussions Pre Version 0.4.5.03.
So... party on.
--RaZe 16:21, 3 October 2010 (UTC)
German signs make code unreadable + File contents not ending with ?>
Hi, thanks for this great extension! I had some trouble getting it to work, because at first I hadn't noticed that the 4 pages with the file contents did not end the code with ?>. Second, the line for German-only characters caused PHP to generate an error; apparently the characters were converted to something unreadable by PHP after pasting the content into vi. I had to delete that whole part. -Bob
- Hi Bob. To be honest, I hadn't seen PHP code that doesn't end with the closing tag until MediaWiki... Even now this is the only place where I use this feature, because it seemed normal for MediaWiki developers as I found more and more extensions doing this.
- About the German signs: I will try to find a better way soon/next time. To let others know, I will add this to the topic.
- --RaZe 00:03, 20 November 2010 (UTC)
-
- It is a part of our Coding_Conventions#PHP_pitfalls because closing tags can cause issues when they are included, and we don't lose anything (benefit-wise) if they aren't. Peachey88 00:54, 20 November 2010 (UTC)
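A minimal sketch of the convention discussed above, assuming a PHP-only include file. The function name is hypothetical and not taken from FileIndexer. The point is only that the file stops after the last statement, so no stray whitespace after a closing tag can be sent as output when the file is included.

```php
<?php
// Hypothetical PHP-only include file, for illustration only.

function sketchGreeting(string $name): string {
    return "Hello, " . $name;
}

// The file ends here on purpose, with no closing tag.
// Any newline or space after a closing tag would be emitted as output
// the moment this file is included, which can break headers or redirects.
```

If you paste such a file into an editor, it is worth checking that nothing was appended after the last line.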
Getting no index
Hi I'm running version 0.4.5.03 on MW 1.16.2 and PHP 5.3.3 on Ubuntu 10.10. I've carefully followed all the installation instructions and installed all the required tools.
The SpecialPage all works and it gives a list of articles for which the index update process was started. However, when I search for a word that I know is in one of the documents nothing is found. If I go to a file page it shows:
File Index
The following index was taken from the files content:
{{{index}}}
Any ideas? How can I debug this?
Many thanks. :--Mitchelln 11:32, 21 April 2011 (UTC)
- Fixed. Of course you need to remove the <pre> </pre> tags from the FileIndex Template
- -Mitchelln 11:32, 21 April 2011 (UTC)
Anyone have this working with MediaWiki 1.17.0 and PHP 5.3.6?
MediaWiki 1.17.0 PHP 5.3.6 (cgi-fcgi) MySQL 5.1.48-log
Running with...
Installed software
- MediaWiki 1.17.0
- PHP 5.3.2 (apache2handler)
- MySQL 5.1.52
Installed extensions
- FCKeditor (Version 1.0.1) Allows editing using the WYSIWYG editor FCKeditor. Frederico Caldeira Knabben, Wiktor Walc, others and Jack Phoenix
- FileIndexer (Version 0.4.5.03) Index creation from uploaded files so they can be captured by search functions. Ramon Dohle (raZe) | Original: MHart and Flominator
- poppler-utils
Missing Checkbox "FileIndexer: [ ] Create/update index"
I'm missing the checkbox "FileIndexer: [ ] Create/update index" after installing FileIndexer. It is shown on the page "Special:Upload" but not on "Special:UploadWindow". How can I make the checkbox appear on "Special:UploadWindow"? - stevewilson 11:45, 18 November 2011 (UTC)
No index...
Added FileIndexer to my MediaWiki installation. Uploaded a .docx file, and tried to search for some text that was inside the file... and I got nothing.
Also, when I try to upload a .doc file, it says MIME type doesn't match file extension!?
Epistasis 16:20, 24 November 2011 (UTC)
Any websites running FileIndexer? (22dec2011)
Hi everybody, this extension seems a bit difficult for me to install. I love the idea of being able to search within pdfs, but don't want to expend all of the effort if the extension isn't working. Does anybody have a link to a wiki I can see where it is working? Thanks in advance
- Hi, you are right. It's not that easy to install, but once it's working it is definitely worth it. I only have it running on 3 internal wikis (1.16 and 1.17). FileIndexer works great there! --SmartK 07:09, 23 December 2011 (UTC)
- Any chance you can provide a link to any of the wikis so that I could get a sense of how it works?
Word documents not working
I've tried to upload and index .odt, .pdf and .doc files. The .odt and .pdf files get an index, the .doc file doesn't. Does anyone have a solution? - stevewilson 15:58, 29 December 2011 (UTC)
- Have you installed "antiword"? And is it set up correctly, e.g. "usr/local/bin/antiword"? --SmartK 16:44, 29 December 2011 (UTC)
Any Windows file reader?
This is obviously designed with Linux in mind. But there are still many out there who prefer to use Windows-based web servers. So my question is, does anyone know if there are any Windows file readers for the file types listed in the article? Jamesjiao 22:49, 31 January 2012 (UTC)
- Hi Jamesjiao, sorry, I haven't researched this, but I am sure there are.
- Right now I am not really sure whether there is anything in the code that would prevent this extension from running on Windows, except obviously the configuration. Maybe I will take a look at this when I try to fix the incompatibility issue with MediaWiki 1.18.x. --RaZe (talk) 12:36, 24 February 2012 (UTC)
FileIndexer fails in MediaWiki 1.18.1 :-( --> NOW working with update
- This is the error code I get with 1.18.1:
"Fatal error: Call to a member function addMessages() on a non-object in /var/www/kcwiki/mykochwiki/extensions/FileIndexer/FileIndexer.php on line 133"
- Hi SmartK,
- Swus asked for help on my talk page already. As I told him, I will try that out as soon as possible. Sorry I can't say any more right now.--RaZe (talk) 12:31, 24 February 2012 (UTC)
- Thank you RaZe. We should remove the phrase from the "extension page": "The author of this extension is no longer maintaining it!" I think it's great that you are still trying to help.... and I know it's a lot of work! --SmartK (talk) 13:47, 24 February 2012 (UTC)
- SmartK did release a new version that works with 1.18.1 (I have not tested it, but you may try it out now).
Does this extension get all of the text from a pdf?
- This extension seems great and I had no trouble installing it.
- However, I notice that only some of the words in the pdf I upload go into the wiki. If I take the same file and use "pdftotext name.pdf text.txt" in the commandline, I get a text file with the full content of the pdf. But in the wiki, many words are missing. Is there a switch I'm missing?
- Ideally, I'd want to just dump in the full-text of a pdf the same way it comes out when I run pdftotext by itself. Is that possible? In case it helps, I am also trying to start with a file with charset=binary. Would that potentially be causing a problem?
- I looked in the source code but could not find this option yet. But I think it's a good idea to implement this. Maybe raZe can help?!? His idea at the time was that an "index" only needs each word once to be searchable, but I understand your idea. So let's hope he is willing to put some time into this.
- You should change the following in the file "FileIndexer_cfg.php". This will not help to solve your problem but you will need this anyway.
$wgFiMinWordLen = 1;
$wgFiLowercaseIndex = false;
-
- Yea, I did figure that out. I'd like to get around the removal of duplicate words though. Also, I'd rather preserve the formatting of the original document (as much as possible!). Since the default pdftotext option does exactly what I want, there has to be a way where I could just comment out a lot of the special indexing and just put the text file in directly.
-
- I really hope I can find a way to fix this.
- Hi, if I get you right the following is what you want.
- Fragment of codefile FileIndexer.php:
function wfFiGetIndex($sFileHashPath){
    ~~~ CUT ~~~
        foreach ($sDocText as $sDocLine){
            if(in_array($sFileExtension, $wgFiTypesToRemoveTags)){
                // Remove tags... First insert a space before every "<" so that no words run together!
                $sDocLine = strip_tags(str_replace("<", " <", $sDocLine));
            }
            // *** ADD THIS 1ST SHORT BLOCK IF YOU WANT THE FULL PDF CONTENT AS INDEX
            // (comparison with ==, not assignment with =)
            if ($sFileExtension == "pdf"){
                $sReturn .= $sDocLine;
                continue;
            }
            // *** END OF 1ST BLOCK
            ~~~CUT~~~
        }
        // *** ADD THIS SHORT 2ND BLOCK IF YOU WANT THE FULL PDF CONTENT AS INDEX
        if ($sFileExtension == "pdf"){
            return $sReturn . $wgFiPostfix;
        }
        // *** END OF 2ND BLOCK
        // Set the index globally...
        foreach(array_keys($aIndex) as $skeyword){
            $sReturn .= $skeyword . " ";
        }
        $sReturn .= $wgFiPostfix;
    }
    return $sReturn;
}
-
-
- Note that this solution only covers your requirement for PDF files... feel free to add other file types.
- Right now I don't have the resources to test this, so if there are typos I hope you can eliminate them. In general I think that should do it.
- Maybe I will implement this as an option (by file type) in the next version (whenever that comes).
- Regards --RaZe (talk) 11:07, 20 April 2012 (UTC)
- Thank you RaZe for your fast answer. I also think it would be great to add this option as a variable in the config file (in the future). Let's hope it works and Mr. Anonymous is now happy ;-) --SmartK (talk) 11:18, 20 April 2012 (UTC)
- Yes, thanks so much RaZe. I can't thank you enough for your kindness. I, Mr. Anonymous/too lazy to reset password am very very happy now! :)
-
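The difference discussed in this thread, a deduplicated keyword index versus the full extracted text, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the code is not taken from FileIndexer.php, and the way the two parameters behave is an assumption about what $wgFiMinWordLen and $wgFiLowercaseIndex control.

```php
<?php
// Illustrative sketch only; names are hypothetical, not from FileIndexer.php.

/**
 * Build a keyword index in which every word appears only once.
 * $minWordLen and $lowercase are assumed to mirror $wgFiMinWordLen
 * and $wgFiLowercaseIndex.
 */
function sketchKeywordIndex(array $lines, int $minWordLen = 1, bool $lowercase = false): string {
    $index = [];
    foreach ($lines as $line) {
        // Split the line on whitespace and skip empty fragments.
        foreach (preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if ($lowercase) {
                $word = strtolower($word); // ASCII only; multibyte text would need mb_strtolower
            }
            if (strlen($word) >= $minWordLen) { // byte length, not character count
                $index[$word] = true; // array keys keep each word only once
            }
        }
    }
    return implode(' ', array_keys($index));
}

/** Keep the extracted text as it is, duplicates and line breaks included. */
function sketchFullText(array $lines): string {
    return implode('', $lines);
}

$lines = ["the cat sat\n", "the cat ran\n"];
echo sketchKeywordIndex($lines), PHP_EOL; // prints "the cat sat ran"
echo sketchFullText($lines);              // prints both lines unchanged
```

The first function is enough for a search hit on any word, while the second keeps the word order and repetitions that the output of pdftotext has.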
Can't get my index to work
Hey, I hope I can get some help with this problem. I've got the extension to work: I can open the special page and select the files. When I select the "Main" namespace and click "Create", I get this message: "For the following list of articles the index creation process was started:" After that, a new page is created with the file name (test.pdf) and an index, but the index is empty and there is no file on the page, and I am not able to search for anything within the PDF file.
If I could get some help I would be very happy :)
Regards, Mikkel
- Hi Mikkel, there is no file on this page. That's correct. I don't link files to the created pages; these are still only linked in the images namespace (or files namespace, which is an alias).
- That you can reach the special page only tells you that the extension's code is reachable. One problem (most likely) could be that the required tools are not reachable (see Extension:FileIndexer#Requirements). Maybe you have to adjust $wgFiCommandPaths in the file FileIndexer_cfg.php.
- Did you try to set $wgFiCheckSystem = true in the file FileIndexer_cfg.php? Please send me the result.
- Did you follow all instructions from the install description? --RaZe (talk) 14:18, 7 May 2012 (UTC)
-
- Hi, I got all the other programs installed and working, and they should be set correctly in CommandPaths.
- Can you tell me where I can read the output from "$wgFiCheckSystem"?
- Hi, the output is written on the special page when you call it and $wgFiCheckSystem is set to true, but only if the command is not reachable. What I am not really sure about right now is whether this really works as intended (I didn't really test this). I will check asap whether the which command really does what I expect it to do... I will be back on this --RaZe (talk) 11:53, 15 May 2012 (UTC)
- Hi, that sounds good. I really hope you can help me get this plugin to work.
- Mikkel 14:00
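To check by hand whether the required tools are reachable for PHP, something like the following sketch could be run on the server. It is not FileIndexer's own check: the function name is hypothetical, it assumes a Unix-like system where the which command exists and where shell_exec is not disabled, and the two tool names are just the ones mentioned on this page.

```php
<?php
// Hypothetical helper for manual debugging; not part of FileIndexer.

/**
 * Return the full path of a command-line tool as reported by `which`,
 * or null if the tool cannot be found (or shell_exec is unavailable).
 */
function sketchFindTool(string $tool): ?string {
    // escapeshellarg() quotes the name so it is safe to pass to the shell.
    $out = shell_exec('which ' . escapeshellarg($tool) . ' 2>/dev/null');
    $path = is_string($out) ? trim($out) : '';
    return $path === '' ? null : $path;
}

// Tool names taken from this talk page; extend the list as needed.
foreach (['pdftotext', 'antiword'] as $tool) {
    $path = sketchFindTool($tool);
    echo $tool . ': ' . ($path ?? 'NOT FOUND') . PHP_EOL;
}
```

Keep in mind that the web server user can have a different PATH than your login shell, so running this through the web server is more telling than running it on the command line.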
this extension changes my layout when in edit mode
I like this extension, which can index docs: some docx, ppt, pdf and excel files (I do not know why it is not working on some files; maybe the usage of the /usr/bin/xxx tools has a different syntax?). The biggest problem is that when I use this extension, my layout changes in edit mode, like this: wrong layout
But when I disabled this extension in LocalSettings.php, things went back to normal, like this: good_layout
- Hi, I have never heard of that problem (layout changes) and I haven't experienced it myself so far... I will have to look into the code to see whether there are some HTML bugs (there may be).
- But for your 2nd problem (some docs not indexed): this might be a problem with MIME type recognition. I posted a small hack/fix for this earlier (though I don't really know if this is still the actual problem, because I still run an old wiki version). Look on my user page for the link --RaZe (talk) 12:09, 15 May 2012 (UTC)