User talk:RaZe

File indexer
Hi RaZe, thank you very much for this. If it covers the same functionality please feel free to replace "mine" with "yours". If you want to, you can add yourself to the developers in the box. I'd be glad, if someone would be willing to continue developing this thing, since I don't work with MediaWiki at work any more. I changed the company and they're running a Foswiki :(( What do you think? --Flominator 18:54, 30 June 2009 (UTC)
 * Hi Flominator. First of all, thanks for your message. I am very glad that I didn't cause any upset by editing your extension page without asking for permission first.
 * As I saw on your user page, you are a member of the German Wikipedia. Given that, and the fact that the first message on your user talk page was in German, we could also communicate in German, which would be much easier for me :D But since this is an English-language site, I think English would be preferred by everyone else... and it may be good practice.
 * To be honest, I have no experience with your company's type of wiki. My first real contact with wikis in general was during my final thesis at university, which was directly connected with the MediaWiki engine...
 * Back to the topic: I will release a small change to the extension right now, but changing the whole content of the page will have to wait for another day. I will come back to this later. Thanks for your trust in my work :-D --Razqubik 14:21, 1 July 2009 (UTC)
 * Thank you for doing it. Can you maybe translate the comments to English? Regards, --Flominator 19:42, 3 July 2009 (UTC)
 * I will do so next time. --Razqubik 07:50, 6 July 2009 (UTC)

Hi RaZe

It looks like you are the only person who can help me with this extension. I saw that you replied on the discussion page, and from the discussion above it looks like you took over the extension. I would greatly appreciate some help with this. I think I'm misunderstanding how this extension works.


 * I go to the 'Special:FileIndexer' page.
 * In the edit box I put in the following: 'File:SomeDocument.docx', then I click "create".
 * If I then search for a word I know is in the document, it still finds nothing.

Please help me,

--Johannekie 16:05, 14 June 2010

Hi. As a first step, please check the following:
 * Is 'File' a valid namespace that is enabled in your search configuration, and did you select it at search time?
 * Please review the content of your article 'File:SomeDocument.docx'. Did anything change?
 * Please post the following values from your LocalSettings.php: the globals $wgFiPrefix and $wgFiPostfix.
 * Set $wgFiCheckSystem to true and check whether the requirements are installed.

--RaZe 21:19, 24 June 2010 (UTC)

Hi RaZe, the FileIndexer extension does not run on MW 1.16.x. Can you fix it? --Swus 12:46, 30 September 2010 (UTC)
 * Hi Swus. I think you will be pleased to hear that I am really close to releasing a totally new version of this extension, with a much improved special page and much more. It will also run on MW 1.16. Greetings --RaZe 14:26, 30 September 2010 (UTC)
 * New version (for 1.16) available now. Greetings --RaZe 16:13, 3 October 2010 (UTC)

Hi RaZe, there is a new MW version (1.18.x) and Extension:FileIndexer is not running on it :( Can you fix it? The flags on the extension page do not sound good: "major security risk". --Swus (talk) 08:23, 17 February 2012 (UTC)
 * Hi Swus,
 * I would love to have a look at this, but I can't really say when I will be able to. Sorry --RaZe (talk) 10:20, 22 February 2012 (UTC)

Link to a working wiki with FileIndexer?
Hi RaZe,

Thanks for creating this extension. I was wondering if you could provide a link to a wiki that has a working version (perhaps your own). I'm a novice programmer and am trying to find the best way to index pdf files so that they are searchable.

Thanks in advance, Dana (dchandXXler@mit.edu, without the XXs)


 * Hi. I am awfully sorry, but I don't have a list of public wikis using FileIndexer. I only run internal wikis; they use it, but you will not be able to access them. Unfortunately I don't have a demo system running either. :(
 * --RaZe 02:46, 12 January 2012 (UTC)

Hej RaZe,

We are willing to develop and promote a fully functioning demo MediaWiki with FileIndexer and Tesseract (with OCRopus as a further goal) as a GNU OCR solution, and we have already done quite a bit of development. I'd be thrilled if you could talk with us for just a brief moment. It would be great to have you as our advisor. Thank you very much. CollinBloom2 12:16, 30 January 2012 (UTC)
 * Hi CollinBloom2
 * How can I be of service to you? Gladly ;) --RaZe 07:44, 31 January 2012 (UTC)


 * Hi RaZe,
 * Thanks for the quick response. We are trying to index multiple PDF documents and make them searchable with the help of FileIndexer. We have the documents on the server, and we would like to call FileIndexer functions from a script to index all the PDFs in a given location. We would be very grateful if you could give us a hint.


 * Hi,
 * When you say "on the server", do you mean "uploaded to a wiki system"? What version of MediaWiki do you use? It is stated somewhere above that the extension is not compatible with the latest MediaWiki version, though I haven't had the time to look into the problem so far...
 * I need detailed information about what your goal is here. In general it is possible to index these files by script if they are not part of the wiki, since I am not really using wiki functions for the pure indexing part. But if you want to upload them into your wiki at the same time, I wouldn't go that way. In that case do it in two steps: use a multiple-upload extension, and afterwards use the FileIndexer special page to index a batch of files per request (if there are that many, do it in parts so you don't run into server response timeouts). Just list the filenames (using wildcards).
 * If you don't even want to upload these files into a wiki (say, you don't even want the index in it), don't use my extension's functions. I believe there are many indexing engines out there that are far better than this one, though I don't know for sure.
 * I hope this helps. I will try to help further once you give me some more detailed information. --RaZe 10:14, 6 February 2012 (UTC)
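
The "do it in parts" advice above can be sketched roughly as follows. This is only an illustration in Python, not FileIndexer code: the function names `expand_patterns` and `batches`, the file names, and the batch size are all made up, and the wildcard syntax accepted by the special page may differ from the shell-style patterns assumed here.

```python
from fnmatch import fnmatchcase


def expand_patterns(patterns, uploaded_files):
    """Return the uploaded file names that match any wildcard pattern, in upload order."""
    return [name for name in uploaded_files
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


def batches(items, size):
    """Yield consecutive slices of at most `size` items, one slice per request."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical list of files already uploaded to the wiki.
uploaded = ["a.pdf", "b.pdf", "notes.txt", "c.pdf", "d.docx"]
selected = expand_patterns(["*.pdf"], uploaded)

# Each batch would be submitted in its own request to stay below the server timeout.
for batch in batches(selected, 2):
    print(batch)
```

A small batch size keeps each request short; the right value depends on file sizes and on the server's timeout settings.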


 * Hi again,
 * We managed to do the programming to upload and OCR the documents via script; we used the Snoopy class to simulate a browser.
 * We now have just one small problem: the search we use (the basic search or SphinxSearch) does not search the uploaded documents' content; it only searches by title.
 * If you put a link to the uploaded file on a page, it becomes searchable, but we would like the files to be searchable right away even if they are not linked from any page.
 * Is this possible through some configuration, or have you encountered this problem?
 * Thank you for the support. --CollinBloom2 11:25, 08 February 2012 (UTC)
 * Hi there,
 * I am awfully sorry, but I am not really sure I understand what you are doing. As I understand it, you now have files uploaded to your wiki that you want to make content-searchable. OCR sounds like these files are not text-based but image-based. This is something my extension is not prepared for so far, as the required command line is missing from the configuration. But you may add it.
 * What I see as a problem is that you don't want any pages created for your files. This extension does not actually search file contents; it writes an index of an uploaded file into the content of a page. This is normally the page with the name of the file in the file namespace, so that when you search for a keyword from a file you find that page and can directly access the corresponding file.
 * E.g. if the file "x.txt" contains the word "key", the extension adds the word "key" to the content of the page "File:x.txt" in some form. When you then search for the word "key", MediaWiki's built-in search engine (or any other engine that parses the page content directly or indirectly) finds the page "File:x.txt". It did not search the file's content.
 * You said you "OCR the documents via script", which I interpret to mean that you already have an index... where did you put it?
 * Or am I still wrong about your goals? --RaZe (talk) 10:42, 22 February 2012 (UTC)
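
The "x.txt" example above can be modelled with a toy sketch. It is written in Python purely for illustration and is not how FileIndexer is implemented: the dictionary standing in for the wiki, the helper names, and the way the index is formatted are all assumptions.

```python
import re


def build_index(file_text):
    """Return the sorted distinct lowercase words found in the file's text."""
    return sorted(set(re.findall(r"\w+", file_text.lower())))


def index_into_page(pages, page_title, file_text):
    """Append the file's word index to the content of its file page."""
    index = " ".join(build_index(file_text))
    pages[page_title] = pages.get(page_title, "") + "\n" + index


def search(pages, keyword):
    """Return titles of pages whose content contains the keyword.

    Only page content is inspected; the uploaded files are never opened here.
    """
    keyword = keyword.lower()
    return [title for title, content in pages.items()
            if keyword in content.lower().split()]


# A stand-in for the wiki: page title -> page content.
pages = {"File:x.txt": "Description of x.txt"}
before = search(pages, "key")   # the word is only in the file, not on the page
index_into_page(pages, "File:x.txt", "The key is under the mat.")
after = search(pages, "key")    # the page now carries the index, so it is found
print(before, after)
```

The point of the sketch is that a file without an indexed page has nothing for the search engine to find, which is why files that have no page content cannot show up in a content search.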

A barnstar for you!

 * Thank you, I'm really pleased about that! I took the liberty of carrying the star over to this version of the page, since your edit overwrote my two most recent posts (parallel posting on a wiki ;-) May the later one win!)
 * --RaZe (talk) 12:24, 24 February 2012 (UTC)
 * Sorry RaZe. And yes, of course... gladly --SmartK (talk) 13:49, 24 February 2012 (UTC)
 * However you manage to do it... it seems to be reproducible :P I just had to correct it again :D --RaZe (talk) 13:17, 27 February 2012 (UTC)
 * I'm innocent ;-) --SmartK (talk) 15:56, 27 February 2012 (UTC)

Dumping the full output from pdftotext into the wiki directly?
Hi RaZe. This really is an incredible extension. I second the barnstar and am truly thankful you created this. If at all possible, could you please look at the question I posted on the extension talk page and let me know if you have a quick suggestion for how to dump the full output of a basic pdftotext operation into the index? I would be immensely appreciative.

I imagine this is actually pretty easy, but I'm at a loss to figure out the place in the code where I could do that.
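
For reference, the general idea of capturing the complete pdftotext output can be sketched outside the extension like this. It is a Python illustration only and says nothing about where the corresponding place in the FileIndexer code is: the helper names and `append_to_page` are hypothetical, and the sketch assumes the pdftotext command-line tool is installed. Passing "-" as the output file makes pdftotext write the text to standard output.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the argument list; '-' as the output file sends the text to stdout."""
    return ["pdftotext", pdf_path, "-"]


def extract_full_text(pdf_path):
    """Run pdftotext and return everything it prints, without any filtering."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if pdftotext exits with an error
    )
    return result.stdout


def append_to_page(page_text, extracted_text):
    """Append the raw extracted text below the existing page text."""
    return page_text.rstrip("\n") + "\n\n" + extracted_text
```

Calling `append_to_page(page_text, extract_full_text("some.pdf"))` would then give the page text with the full, unfiltered output underneath; the file name here is only a placeholder.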