Extension talk:Lucene-search/LQT Archive 1

Error when editing pages
I followed your tutorial and installed LuceneSearch. All went fine, but when I edit a page, I get this error:

Fatal error: Call to undefined method LuceneSearch::setLimitOffset in /path/to/wiki/includes/SearchEngine.php on line 222

I'm using MediaWiki 1.10.0. Is this a known problem or just a configuration issue? It looks like neither LuceneSearch.php nor LuceneSearch_body.php defines that function at all. The same goes for the LuceneSearch::update function...


 * You're missing $wgDisableSearchUpdate = true; in your LocalSettings.php. It should be placed before the require_once statement. --Rainman 17:48, 12 July 2007 (UTC)
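A minimal sketch of the relevant part of LocalSettings.php, showing the ordering Rainman describes. The extension path is an assumption (adjust it to wherever LuceneSearch is actually installed), and any other LuceneSearch settings from the tutorial are left out.

```php
// LocalSettings.php (excerpt, not a complete file)

// Set this BEFORE loading the extension. It turns off MediaWiki's own
// search-index update on page save, which (judging by the error above)
// is presumably the path that ends up calling LuceneSearch::setLimitOffset.
$wgDisableSearchUpdate = true;

// Assumed install path for the extension; adjust to your setup.
require_once("$IP/extensions/LuceneSearch/LuceneSearch.php");
```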

Installing Lucene on Windows 2003 Server
Is there a way to install LuceneSearch on Windows? I run my wiki on a Windows 2003 Server with XAMPP and I want to use the features of Lucene. I found at http://meta.wikimedia.org/wiki/Installing_lucene_search that Wikipedia uses the C# engine of Lucene.

Is there a compiled version of the C# engine that I could install on my Apache server running on Windows 2003 Server? ----stp-- 13:40, 1 August 2007 (UTC)


 * As far as I know, no. --Rainman 09:54, 3 August 2007 (UTC)

I am also interested in a Windows 2003 tutorial for improving MediaWiki search results. Cedarrapidsboy 14:29, 2 August 2007 (UTC)


 * You can use the old C# daemon by following the tutorial at Installing lucene search. Wikimedia sites used to run it, but have since switched to the latest (Java) version. The new version could in principle run on Windows with some modifications (the main problem is its use of symbolic and hard links), but there is no one around to patch it. --Rainman 09:54, 3 August 2007 (UTC)

RE: Installing Lucene on Windows 2003 Server --jdpond 21:53, 27 August 2007 (UTC)
There is a .dll version available here: http://incubator.apache.org/lucene.net/download/, but I don't know if this helps.
 * The problem is not Lucene itself, but the LSearch daemon, which relies on the Linux filesystem to efficiently fetch new indexes, keep old copies, and swap copies after a background warmup phase. --Rainman 09:18, 28 August 2007 (UTC)

Missing Method?
I installed everything following the instructions (on MediaWiki 1.10.1), but I'm getting this when I hit the search button:

Fatal error: Call to undefined method LuceneSearch::getRedirect in /var/www/mediawiki-1.10.1/includes/SpecialPage.php on line 396

Is this a known issue with 1.10.1, or am I missing something? --217.6.3.114 06:34, 6 August 2007 (UTC)


 * No idea; getRedirect is defined in SpecialPage, and LuceneSearch inherits from SpecialPage. You might be using an unusual PHP version, or something else might be wrong... --Rainman 10:55, 6 August 2007 (UTC)


 * My PHP version is PHP 5.2.0-8+etch7 (cli) (built: Jul 2 2007 21:46:15). Do you really think this might be the problem? I think it is more likely that I forgot something obvious that isn't mentioned in the instructions. For example, I had to download ExtensionFunctions.php from SVN, because it is not shipped with MediaWiki or the extension. Do I need to register the extension anywhere other than in LocalSettings.php? --217.6.3.114 12:55, 6 August 2007 (UTC)
 * I've seen people complain about various MediaWiki features not working with PHP 5.2; switching back to PHP 5.1 usually fixes it. But I'm by no means a PHP expert (I mainly do the Java part), so I can't really tell whether it would help. If you can, give it a try and let us know if it helps. --Rainman 16:48, 6 August 2007 (UTC)


 * There seems to be no PHP 5.1 package available for Debian etch, so I guess there's no way to make search work. --217.6.3.114 12:10, 7 August 2007 (UTC)
 * I submitted a bug report: http://bugzilla.wikimedia.org/show_bug.cgi?id=10835
 * Yep, seen it. I still think it might be a PHP problem, or maybe a broken eAccelerator or something like that... --Rainman 10:33, 21 August 2007 (UTC)
 * Is eAccelerator required for this extension? We do not use it. --217.6.3.114 08:58, 7 September 2007 (UTC)
 * Found the solution! The problem was an incompatibility between the MWSearch extension and LuceneSearch. I forgot that MWSearch was still active when I installed LuceneSearch. After deactivating MWSearch, the problem was gone. --217.6.3.114 08:05, 11 September 2007 (UTC)
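For anyone hitting the same getRedirect error, a sketch of the fix described in this thread: make sure only LuceneSearch is loaded in LocalSettings.php. Both extension paths are assumptions based on the usual extensions/ layout.

```php
// LocalSettings.php (excerpt, not a complete file)

// MWSearch was reported in this thread to conflict with LuceneSearch,
// so keep it disabled while LuceneSearch is in use (assumed path):
// require_once("$IP/extensions/MWSearch/MWSearch.php");

// Assumed install path for LuceneSearch; adjust to your setup.
require_once("$IP/extensions/LuceneSearch/LuceneSearch.php");
```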

Wildcard Search
Is there a way to use wildcards as described on http://lucene.apache.org/java/docs/queryparsersyntax.html#Wildcard%20Searches? --217.6.3.114 12:50, 12 September 2007 (UTC)


 * Yes. Currently only simple prefix wildcards work (e.g. test*), since I haven't gotten around to testing the performance impact of other wildcard schemes. If you want to patch it yourself, look at WikiQueryParser.java around line 669 (function makeQueryFromTokens); you probably want to replace buffer[length-1]=='*' with something that checks whether * or ? appears anywhere in the buffer. --Rainman 16:23, 12 September 2007 (UTC)

dumpBackup.php causes DB connection error: Unknown error
Following the simple index creation tutorial "Building the index", I tried to run

    php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml wikidb

but the script throws the error from the title. After a lot of trouble and a close look at this script, I've found a solution to this problem. The problem occurs because of the file "includes/backup.inc", which dumpBackup.php requires. This file does the main backup work and uses some MediaWiki variables ($wg...). That is no problem when dumpBackup.php runs within MediaWiki, but as a standalone console script it lacks these $wg... parameters. dumpBackup.php then uses empty strings for $wgDBtype, $wgDBadminuser, $wgDBadminpassword, $wgDBname and $wgDebugDumpSql, and this causes the "DB connection error: Unknown error" at runtime. I've solved this with a self-written PHP wrapper script, which only initializes these variables and then simply includes dumpBackup.php; now it works fine. This is my wrapper script:

    <?php
    # dumpBackupInit - Wrapper script to run the MediaWiki XML dump "dumpBackup.php" correctly
    # @author: Stefan Furcht
    # @version: 1.0
    # @require: /srv/www/htdocs/wiki/maintenance/dumpBackup.php

    # The following variables must be set to get dumpBackup.php working;
    # you'll find these values in the DB section of your MediaWiki config, LocalSettings.php
    $wgDBtype = 'mysql';
    $wgDBadminuser = "[MySQL-Username]";
    $wgDBadminpassword = "[MySQL-Usernames-Password]";
    $wgDBname = '[mediaWiki-Database-scheme]';
    $wgDebugDumpSql = true;

    # The XML dumper dumpBackup.php needs the variables set above,
    # so simply include the original dumpBackup script
    require_once("/srv/www/htdocs/wiki/maintenance/dumpBackup.php");
    ?>

Now you can use this script just like dumpBackup.php, except that it will (hopefully) run correctly. Example:

    php dumpBackupInit.php --current > WikiDatabaseDump.xml

I hope this helps. Please excuse my probably bad English.

Regards -Stefan-
 * dumpBackup.php uses AdminSettings.php (not LocalSettings.php), so you need to set that file up (basically, rename AdminSettings.sample to AdminSettings.php and fill in the data). What goes into AdminSettings.php is exactly what you provide in your wrapper; see Manual:System_administration. --Rainman 16:12, 12 September 2007 (UTC)
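As an alternative to the wrapper, a minimal AdminSettings.php along the lines Rainman suggests might look like this. It assumes the MediaWiki 1.10 layout, where AdminSettings.sample sits in the wiki root next to LocalSettings.php, and that $wgDBtype and $wgDBname are already set in LocalSettings.php; the values are the same placeholders used in Stefan's wrapper.

```php
<?php
// AdminSettings.php: copy (or rename) AdminSettings.sample to this name
// in the wiki root directory, then replace the placeholders with real values.
// Assumption: $wgDBtype and $wgDBname already come from LocalSettings.php,
// so only the privileged account used by command-line maintenance scripts
// such as maintenance/dumpBackup.php is defined here.
$wgDBadminuser     = '[MySQL-Username]';
$wgDBadminpassword = '[MySQL-Usernames-Password]';
```

With that file in place, the original command from the "Building the index" tutorial (php maintenance/dumpBackup.php --current --quiet > wikidb.xml) should then run without a wrapper.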

Questions running lsearch
I notice in lsearch.conf there are a number of variables for the Storage backend:


 * Storage.username
 * Storage.password
 * Storage.defaultDB
 * Storage.lib

etc. Do these need to be adapted to my environment, or are they ignored?


 * These are for the incremental updater (it stores article rank info). If you don't use the incremental updater, they are ignored. --Rainman 17:23, 15 September 2007 (UTC)

Also, lsearch appears to be spawning many Java processes, so (I believe) my hosting provider's software kills the lsearchd process. Does this point to a particular kind of configuration error? Thanks! -David


 * There should be one Java process for the indexer/searcher, and another one if you run a cron job for the importer. Apart from that, lsearch relies on some external programs (e.g. ln to make symbolic links), so it will spawn extra processes, but they should finish very quickly. --Rainman 17:23, 15 September 2007 (UTC)

Thank you very much. Some additional data:
 * The search appears to work if I make the search request in the 5 seconds or so before the application is killed. (This is very exciting.)
 * I include output from ps aux taken while the application is running, as well as the final message showing the application being killed:

    USER       PID %CPU %MEM     VSZ   RSS TTY   STAT START  TIME COMMAND
    davidk5  24987  0.0  1.0 1208464 43988 pts/0 SN   15:19  0:00 java -Djava.rmi.server.codebase=file://./LuceneSearch.jar -Djava.rmi.server.hostname=coyote.he.net [...]
    davidk5  13973  0.0  1.0 1208464 43988 pts/0 SN   15:19  0:00 java -Djava.rmi.server.codebase=file://./LuceneSearch.jar -Djava.rmi.server.hostname=coyote.he.net [...]
    davidk5  20700 19.1  1.0 1208464 43988 pts/0 SN   15:19  0:01 java -Djava.rmi.server.codebase=file://./LuceneSearch.jar -Djava.rmi.server.hostname=coyote.he.net [...]
    [25 additional lines identical except for PID deleted]

    davidk5@coyote:~/public_html/wiki/lucene-search-2$ ./lsearchd: line 3: 24987 Killed                 java -Djava.rmi.server.codebase=file://$jardir/LuceneSearch.jar -Djava.rmi.server.hostname=$HOSTNAME -jar $jardir/LuceneSearch.jar $*


 * Based on your previous note, I assume that 28 instances of the running application is not expected, but I'd appreciate your confirmation. I've looked at the -verbose output of the java command, but I didn't see any obvious place where these processes were being spawned.


 * No, it is not normal. Try running java -jar LuceneSearch.jar (which is what lsearchd should be doing); you should get exactly one java process. Do the multiple processes appear only when you run it from the console? Have you confirmed it's not a broken cron job? --Rainman 10:20, 18 September 2007 (UTC)


 * A final data point: I'm running in a virtual hosting environment, so I may have significant resource constraints. I asked the system administrator whether the syslog gave him any insights, and he replied that my Java process is using 1.6 GB of RAM. I currently have a "toy" test wiki with an XML dump of 16 KB.


 * That's not normal either. It should probably be around 64 MB (including the Java VM). --Rainman 10:20, 18 September 2007 (UTC)

Thank you very much for your help. Your documentation repays careful reading and has been extremely helpful. Please let me know if I should take this request for support elsewhere.

Dbkayanda 22:36, 17 September 2007 (UTC)