Extension talk:Lucene-search/LQT Archive 1

Error running the Daemon
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/lsearch.conf
0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
530  [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
602 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
619 [main] ERROR org.wikimedia.lsearch.search.SearcherCache  - I/O Error opening index at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki : /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki/segments (No such file or directory)
621 [main] ERROR org.wikimedia.lsearch.search.SearcherCache  - I/O Error opening index at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki : /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki/segments (No such file or directory)
621 [main] WARN  org.wikimedia.lsearch.search.SearcherCache  - I/O error warming index for kck_wiki
621 [Thread-3] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
623 [Thread-2] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
 * 1)  . lsearchd

I'm getting this error saying "No such file or directory". The directory exists, however I don't know where the "segments" file comes from.

I ran this to create the indexes

php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki

The wikidb.xml file exists in the httpdocs directory

...and then I started the daemon

Am I missing a trick?

Thanks

Andy


 * And what is the output from the importer? It should give you success messages saying that it created the indexes and successfully made a snapshot. --Rainman 01:30, 20 February 2008 (UTC)

I'm most likely doing something dumb (being a bit of a newbie), but this is what I get when I just run java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki:

Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/importer/Importer

--Andy 17:00, 20 February 2008 (GMT)


 * The java command you're running assumes that LuceneSearch.jar is in your current directory, the full command would be java -cp /full/path/to/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki

--Rainman 18:04, 20 February 2008 (UTC)

I'm getting further, thanks, that helped. Sorry, I'm being dumb I know, and I apologise for asking you to hand-hold me in this way, but I now get this:

Trying config file at path /root/.lsearch.conf
Trying config file at path /var/www/vhosts/kidneycancerknol.com/httpdocs/lsearch.conf
0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
3   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
60   [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
175 [main] FATAL org.wikimedia.lsearch.ranks.RankBuilder  - I/O error reading dump while getting titles from wikidb.xml
175 [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
179 [main] FATAL org.wikimedia.lsearch.ranks.RankBuilder  - I/O error reading dump while calculating ranks for from wikidb.xml
Exception in thread "main" java.lang.NullPointerException
    at org.wikimedia.lsearch.importer.Importer.main(Importer.java:114)

Do I need to set the OAI settings in the global config? I've just kept them as the default. --Andy 18:30, 20 February 2008 (GMT)


 * No, you don't need oai.. Seems to me something is wrong with the xml file .. sure would be helpful if exception weren't suppressed :\ unfortunately cannot help you much more than that.. is wikidb.xml a valid xml file? did you give full path to it? --Rainman 01:00, 21 February 2008 (UTC)

Error when editing pages
I followed your tutorial and installed LuceneSearch. All went fine, but when I edit a page, I get this error:

Fatal error: Call to undefined method LuceneSearch::setLimitOffset in /path/to/wiki/includes/SearchEngine.php on line 222

I'm using Mediawiki 1.10.0. Is this a known problem or just a configuration issue? Looks like LuceneSearch.php or LuceneSearch_body.php don't define that function at all. Same with LuceneSearch::update function...


 * You're missing

$wgDisableSearchUpdate = true;
 * in your LocalSettings.php. It should be placed before the require_once statement. --Rainman 17:48, 12 July 2007 (UTC)

Installing Lucene on Windows 2003 Server
Is there a way to install LuceneSearch under Windows? I run my wiki on a Windows 2003 Server with XAMPP and I want to use the features of Lucene. I found at http://meta.wikimedia.org/wiki/Installing_lucene_search that Wikipedia uses the C# engine of Lucene.

Is there a compiled version of the C# engine to install it on my Apache running on Windows 2003 Server?----stp-- 13:40, 1 August 2007 (UTC)


 * As far as I know, no. --Rainman 09:54, 3 August 2007 (UTC)

I am also interested in a Windows 2003 tutorial for improving MediaWiki search results. Cedarrapidsboy 14:29, 2 August 2007 (UTC)


 * You can use the old C# daemon by following the tutorial on Installing lucene search. Wikimedia sites used to use this one, but now use the latest (Java) version. The new version could in principle run on Windows with some modifications (the main problem is its use of symbolic and hard links), but there is no one around to patch it. --Rainman 09:54, 3 August 2007 (UTC)

Could you explain, how to compile old C# daemon under windows with Mono? There is no "make" and "make install" commands under Windows :((( --Konstbel 09:04, 31 March 2008 (UTC)

RE: Installing Lucene on Windows 2003 Server --jdpond 21:53, 27 August 2007 (UTC)
There is a .dll version available here: http://incubator.apache.org/lucene.net/download/, but I don't know if this helps
 * The problem is not in Lucene itself, but in the LSearch daemon, which makes use of the Linux file system to efficiently fetch new indexes, keep old copies, and swap copies after a background warmup phase. --Rainman 09:18, 28 August 2007 (UTC)

Missing Method?
I installed everything following the instructions (on MediaWiki 1.10.1), but I'm getting this when I hit the search-button:

Fatal error: Call to undefined method LuceneSearch::getRedirect in /var/www/mediawiki-1.10.1/includes/SpecialPage.php on line 396

Is this a known issue with 1.10.1, or am I missing something? --217.6.3.114 06:34, 6 August 2007 (UTC)


 * No idea, getRedirect is defined in SpecialPage, and LuceneSearch inherits SpecialPage. You might be using some odd php version, or something else might be wrong... --Rainman 10:55, 6 August 2007 (UTC)


 * My PHP version is (PHP 5.2.0-8+etch7 (cli) (built: Jul 2 2007 21:46:15)). Do you really think this might be a problem? I believe it is more likely that I forgot something obvious that is not mentioned in the instructions. For example: I had to download ExtensionFunctions.php from svn, because it is not shipped with MediaWiki or the extension. Do I need to register the extension anywhere other than in LocalSettings.php? --217.6.3.114 12:55, 6 August 2007 (UTC)
 * I've seen people complain about various mediawiki stuff not working with php 5.2, switching back to php 5.1 usually fixes it. But I'm by no means php expert (I mainly do the java part), so I cannot really tell if it would help. If you can, give it a try, and let us know if it helps. --Rainman 16:48, 6 August 2007 (UTC)


 * There seems to be no php 5.1 package available for debian etch, so I guess there's no chance to make search work.--217.6.3.114 12:10, 7 August 2007 (UTC)
 * I submitted a bugreport: http://bugzilla.wikimedia.org/show_bug.cgi?id=10835
 * Yep, seen it .. I still think it might be a php problem, or maybe a broken eAccelerator or something like that... --Rainman 10:33, 21 August 2007 (UTC)
 * Is eAccelerator required for this extension? We do not use it.--217.6.3.114 08:58, 7 September 2007 (UTC)
 * Found the Solution! The problem was incompatibility between the MWSearch-Extension and LuceneSearch. I forgot that MWSearch was still active when I installed LuceneSearch. After deactivating MWSearch the problem was gone. --217.6.3.114 08:05, 11 September 2007 (UTC)

Wildcard Search
Is there a way to use wildcards as described on http://lucene.apache.org/java/docs/queryparsersyntax.html#Wildcard%20Searches? --217.6.3.114 12:50, 12 September 2007 (UTC)


 * Yes. Currently only simple prefixes work (e.g. test*) since I didn't get to test the performance impact of other wildcard schemes. If you want to patch it yourself, look at WikiQueryParser.java around line 669 (function makeQueryFromTokens), you probably want to replace buffer[length-1]=='*' with something that checks if * or ? are anywhere in the buffer. --Rainman 16:23, 12 September 2007 (UTC)
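The check suggested in the reply above could be sketched roughly as follows. This is only an illustration: `WildcardCheck` and `containsWildcard` are hypothetical names, not code from WikiQueryParser.java, and the sketch assumes the token lives in a `char[] buffer` with `length` valid characters, as the reply implies.

```java
class WildcardCheck {
    /** Returns true if '*' or '?' occurs anywhere in the first {@code length} chars of the buffer. */
    static boolean containsWildcard(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            if (buffer[i] == '*' || buffer[i] == '?') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] prefix = "test*".toCharArray();
        char[] inner = "te?t".toCharArray();
        char[] plain = "test".toCharArray();
        System.out.println(containsWildcard(prefix, prefix.length)); // true
        System.out.println(containsWildcard(inner, inner.length));   // true
        System.out.println(containsWildcard(plain, plain.length));   // false
    }
}
```

Note that this only detects the wildcard; as the reply says, the performance impact of non-prefix wildcards was not tested.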

Query String Syntax
Please document the subset of Lucene query string syntax that has been implemented. -- 216.143.51.66 22:52, 8 February 2008 (UTC)

dumpBackup.php causes DB connection error: Unknown error
Following the simple index creation tutorial "Building the index", I tried to run

php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml wikidb

but the script throws the error mentioned in the title. After a lot of trouble and study of the script, I found a solution. The problem is caused by the file "includes/backup.inc", which dumpBackup.php requires. This file does the main backup work and uses some MediaWiki variables ($wg...). That is no problem when dumpBackup.php runs within MediaWiki, but as a standalone console script it lacks these $wg... parameters. So dumpBackup.php uses empty strings for $wgDBtype, $wgDBadminuser, $wgDBadminpassword, $wgDBname and $wgDebugDumpSql, and this causes the "DB connection error: Unknown error" at run time. I solved the problem with a self-written PHP wrapper script, which only initializes these variables and then includes dumpBackup.php; now it works fine. This is my PHP wrapper script:

<?php
# dumpBackupInit - Wrapper script to run the MediaWiki XML dump "dumpBackup.php" correctly
# @author: Stefan Furcht
# @version: 1.0
# @require: /srv/www/htdocs/wiki/maintenance/dumpBackup.php

# The following variables must be set to get dumpBackup.php to work.
# You'll find these values in the DB section of your MediaWiki config: LocalSettings.php
$wgDBtype = 'mysql';
$wgDBadminuser = "[MySQL-Username]";
$wgDBadminpassword = "[MySQL-Usernames-Password]";
$wgDBname = '[mediaWiki-Database-scheme]';
$wgDebugDumpSql = 'true';

# The XML dumper 'dumpBackup.php' requires the variables set above to run;
# simply include the original dumpBackup script.
require_once("/srv/www/htdocs/wiki/maintenance/dumpBackup.php");
?>

Now you can use this script just like dumpBackup.php, except that it will (hopefully) now run correctly. Example:  php dumpBackupInit.php --current > WikiDatabaseDump.xml

I hope this will help you. Please excuse my probably bad English.

Regards -Stefan-
 * dumpBackup.php uses AdminSettings.php (and not LocalSettings.php), so you need to set it up (basically you would rename AdminSettings.sample and fill-in the data). What would be in AdminSettings.php is exactly what you provide in your wrapper, see Manual:System_administration. --Rainman 16:12, 12 September 2007 (UTC)

Thank you very much. I had never read what 'AdminSettings.php' exactly does. After setting these variables, it works fine. So you can delete my "wrapper script" from this discussion page. But perhaps it would be useful to mention explicitly on the extension page that 'AdminSettings.php' must be set up to run 'dumpBackup.php', because somebody may never have had to deal with this file before. Thanks for this very great extension. -Stefan- 79.211.199.66 08:14, 20 September 2007 (UTC)

lsearchd killed in virtual hosting environment
When running lsearchd in a virtual hosting environment, it would work for 10-20 seconds or so, then it would fail with the message "killed." Thanks to Rainman's help, I verified that the resource requirements of the application exceeded the capacity available in the virtual hosting environment (whether it was the size of the JVM or number of threads, I was never sure.) It runs fine and with modest resource requirements on a dedicated server. Dbkayanda 20:44, 14 October 2007 (UTC)

Also, I notice in lsearch.conf there are a number of variables for the Storage backend:


 * Storage.username
 * Storage.password

etc. Do these need to be modified to my environment, or do they get ignored?


 * These are for the incremental updater (it stores articles rank info). If you don't use it, it gets ignored. --Rainman 17:23, 15 September 2007 (UTC)

Error while initially creating index
I am trying to get the LuceneSearch-Extension running on a mediawiki-1.11.0rc1 installation under opensuse10.2. LuceneSearch.jar and mwdumper.jar were generated from svn sources with ant and javac-version 1.5.0_12. I followed the instructions, but when I try to build the index, I get a Null-pointer exception:

me@mypc:~/var/lucene> java -cp ~/bin/lucene-search-2/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb_TEST.xml wikidb_TEST
MediaWiki Lucene search indexer - index builder from xml database dumps.
Trying config file at path /home/muenzebrock/.lsearch.conf
0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
8   [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
324 pages (1.213,483/sec), 324 revs (1.213,483/sec)
316 [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
375 [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
377  [main] WARN  org.wikimedia.lsearch.util.Localization  - Error processing message file at file:///srv/www/htdocs/php/mediawiki1.11.0rc1/languages/messages/MessagesEn.php
378 [main] WARN  org.wikimedia.lsearch.util.Localization  - Could not load localization for En
324 pages (2.677,686/sec), 324 revs (2.677,686/sec)
465 [main] INFO  org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles...
Exception in thread "main" java.lang.NullPointerException
    at java.io.File.<init>(File.java:194)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:204)
    at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
    at org.wikimedia.lsearch.importer.SimpleIndexWriter.<init>(SimpleIndexWriter.java:49)
    at org.wikimedia.lsearch.importer.DumpImporter.<init>(DumpImporter.java:39)
    at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)

I played with the Indexes.path-variable in lsearch.conf, but with no luck.
 * Do you have permissions to write to directory you set as Indexes.path in /home/muenzebrock/.lsearch.conf ? --Rainman 14:13, 19 September 2007 (UTC)
 * Yes. For debugging, I set it to be world-writable. --205.175.225.24 14:20, 19 September 2007 (UTC)
 * You can do imports only at the indexer, so, did you set your lsearch-global.conf right? i.e. assign the index wikidb_TEST to your host mypc (not localhost or 127.0.0.1) in the Index section? --Rainman 14:47, 19 September 2007 (UTC)
 * This is the part of lsearch-global.conf that I touched (i.e. the rest is similar to the file in svn):

# databases can be writen as {url}, where url contains list of dbs
[Database]
# wikilucene : (single) (language,en) (warmup,0)
# wikidev : (single) (language,sr)
# wikilucene : (nssplit,3) (nspart1,[0]) (nspart2,[4,5,12,13]), (nspart3,[])
# wikilucene : (language,en) (warmup,10)
wikidb_TEST : (single) (language,de) (warmup,100)

# Search groups
# Index parts of a split index are always taken from the node's group
# host : db1.part db2.part
# Mulitple hosts can search multiple dbs (N-N mapping)
[Search-Group]
# oblak : wikilucene wikidev
oblak : wikidb_TEST

# Index nodes
# host: db1.part db2.part
# Each db.part can be indexed by only one host
[Index]
# oblak: wikilucene wikidev
oblak : wikidb_TEST


 * Now I seem to recognize my failure: I should have replaced oblak with my hostname, right? I was wondering what this should mean anyway ;-) Thanks for your quick help on this. --205.175.225.24 15:00, 19 September 2007 (UTC)

This error can also occur if you follow the installation instructions exactly and use a FQDN in the [Search-Group] and [Index] sections. Use only the hostname part of the $HOSTNAME, omitting the domain name part, if it is included. -- 216.143.51.66 15:54, 7 February 2008 (UTC)
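As a small illustration of the advice above, the hostname part can be derived from a fully qualified name by cutting at the first dot. The class and method names here are hypothetical, and the sketch assumes the input is a DNS-style name rather than an IP address.

```java
class ShortHostName {
    /** Strips the domain part of a DNS-style name, e.g. "mypc.example.com" becomes "mypc". */
    static String shortHostName(String hostName) {
        int dot = hostName.indexOf('.');
        return dot < 0 ? hostName : hostName.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println(shortHostName("mypc.example.com")); // mypc
        System.out.println(shortHostName("rainbow"));          // rainbow
    }
}
```

The resulting short name is what would go into the [Search-Group] and [Index] sections of lsearch-global.conf.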

Hi, I've got a similar error:

root@rainbow:/usr/local/search/ls2 # java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /srv/www/htdocs/mwiki/wikidb.xml wikidb
MediaWiki Lucene search indexer - index builder from xml database dumps.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/search/ls2/lsearch.conf
1   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
15  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for De
507  [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
114 pages (118.626/sec), 114 revs (118.626/sec)
1666 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
114 pages (428.571/sec), 114 revs (428.571/sec)
2044 [main] INFO org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles...
Exception in thread "main" java.lang.NullPointerException
    at java.io.File.<init>(File.java:194)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:204)
    at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
    at org.wikimedia.lsearch.importer.SimpleIndexWriter.<init>(SimpleIndexWriter.java:49)
    at org.wikimedia.lsearch.importer.DumpImporter.<init>(DumpImporter.java:39)
    at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)

My configs:

root@rainbow:/usr/local/search/ls2 # cat lsearch-global.conf | grep ^[^#]
[Database]
wikidb : (single) (language,de) (warmup,10)
[Search-Group]
rainbow : wikidb
[Index]
rainbow : wikidb
[Index-Path]
 : /usr/local/search/indexes
[OAI]
wikidd : http://rainbow.local.com/mwiki/index.php
[Properties]
Database.suffix=itowiki_
ExactCase.suffix=itowiki_
[Namespace-Prefix]
all :
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15

and the other config :

root@rainbow:/usr/local/search/ls2 # cat lsearch.conf | grep ^[^#]
MWConfig.global=file:///usr/local/search/ls2/lsearch-global.conf
MWConfig.lib=/usr/local/search/ls2/lib
Indexes.path=/usr/local/search/indexes
Search.updateinterval=1
Search.updatedelay=0
Search.checkinterval=30
Index.snapshotinterval=5
Index.maxqueuecount=5000
Index.maxqueuetimeout=12
Storage.master=rainbow
Storage.username=root
Storage.password=mysecret
Storage.adminuser=root
Storage.adminpass=mysecret
Storage.useSeparateDBs=false
Storage.defaultDB=lsearch
Storage.lib=/usr/local/search/ls2/sql
SearcherPool.size=3
Localization.url=file:///srv/www/htdocs/mwiki/languages/messages
Logging.logconfig=/usr/local/search/ls2/lsearch.log4j
Logging.debug=false

and finally:

root@rainbow:/usr/local/search/ls2 # cat lsearch.log4j | grep ^[^#]
log4j.rootLogger=INFO, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Kind regards, Stefan

Multiple wikis in one database
Is there a way to index and search multiple wikis that are contained within one database? I've tried a few things in the configuration and command lines, and I've not figured out a way to do this.

Thanks! --Laduncan 16:31, 8 October 2007 (UTC)


 * If you want to get search results combined from multiple wikis, that is still not supported (as of v2.0). Next minor release might show some improvements in that direction.. --Rainman 16:55, 8 October 2007 (UTC)


 * Thanks for the quick info! --Laduncan 20:31, 8 October 2007 (UTC)

Requiring less exact matches
It appears that the search in the fulltext is doing an implicit AND -- that is, all the words need to be in the document for it to appear in the results list.

For what I'm doing, I'd like to have the default be "OR," and let the ranking algorithm hopefully bring the most relevant content to the top. (The queries my users will be using will be long and complex, and will generally match nothing with "AND.")

I can manually search with OR between the words, but I wanted to know if I could change the configuration of the extension to have it do that by default.

Thanks in advance, Dbkayanda 00:57, 15 October 2007 (UTC)


 * Personally, I think ranking is not smart enough to give best results if the default operator is OR, but you can change it with hacking the code a bit. In WikiQueryParser.java, on line 112 there is:, replace the last part with  . --Rainman 14:41, 15 October 2007 (UTC)


 * Worked like a charm. Thanks, as always, for your help.

Index of attachments (doc, pdf, xls)
Hi Robert,

I found the cool mediawiki extension for the lucene search engine. Is there a possibility to index all attachments like PDF, HTML, DOC and XLS with this addon?

I found some information in the Lucene FAQ - http://wiki.apache.org/lucene-java/LuceneFAQ#head-37523379241b88fd90bcd1de81b74e7ec8843f72 - on how to index attachments. Is it possible to use such indexed files with the MediaWiki extension you wrote?

Thanks a lot! Alex--14:51, 22 October 2007 (UTC)


 * Yes, there are libraries that can parse pdf, doc,.. that work with lucene, but I haven't got around to include them in the extension yet, and I probably won't have time in next few months ... If you really need it, you can try to hack it yourself, you would probably want Importer to fetch the media file (maybe with ?action=raw), and then construct an Article object whose contents would be the parsed text and pass it to the indexer. --Rainman 21:08, 22 October 2007 (UTC)


 * Are all namespaces indexed by the current LuceneSearch extension? Including the Image namespace that contains all the file data? Does the extension then only index the most recent file description? Where do I have to start in LuceneSearch_body.php?

Thanks! Alex --12:06, 23 October 2007 (UTC)

All articles from the database get indexed. LuceneSearch_body.php is just an interface for the Java daemon that does all the work. So, you'll need to modify the Java code. What currently gets indexed is just the image descriptions; the media files themselves are stored outside the database, in the file system... --Rainman 10:20, 23 October 2007 (UTC)

Binary version of LuceneSearch.jar?
Hello,

Where can I get a binary version of LuceneSearch.jar? I don't have ant on the server this is being installed on, and I tried building LuceneSearch.jar on my desktop computer using ant, but it failed with errors about missing MediaWiki Java classes. I'd prefer a binary, if possible, so I can get this up and running ASAP.

Ben

Soundex searches?
Will this extension support Soundex like searches for spelling mistakes etc..?
 * Probably in the next major release (hopefully end of january). --Rainman 14:11, 13 December 2007 (UTC)

Port 8123 already in use.
Hi again,

I'm still trying to make it run. I've found that most of the problems are due to misconfiguration on my part. The Java error messages are not very helpful at first, but that is the case with any new functionality one comes across.

When I tried to run it, it came up with this:

java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
    java.net.SocketTimeoutException: Read timed out
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
    at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
    at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
    at org.wikimedia.lsearch.interoperability.RMIServer.register(RMIServer.java:24)
    at org.wikimedia.lsearch.interoperability.RMIServer.bindRMIObjects(RMIServer.java:60)
    at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:52)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readByte(DataInputStream.java:248)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)

Further down, it came up with this message:

120488 [Thread-1] FATAL org.wikimedia.lsearch.frontend.HTTPIndexServer - Dying: bind error: Address already in use

Has anybody seen this? I still think it is a trivial error on my part, but I cannot find the cause. --Cartoro 00:00, 20 December 2007 (UTC)
 * The above is RMI complaining it cannot register the networked objects. That should be harmless unless you're using distributed searching. About the below, seems to be what it says: some other app is using the ports (the searcher is by default on 8123, and indexer on 8321) - make sure you don't have any old version of lsearchd still running. Use command: nmap localhost to find out which ports are taken. If those default ports are taken by other apps, change them in lsearch.conf, and in LocalSettings.php ... --Rainman 13:36, 20 December 2007 (UTC)
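As an alternative to nmap, a quick way to see whether the two default ports named above are taken is to try binding to them. The following is a minimal sketch; `PortCheck` and `isPortFree` are hypothetical names and not part of lsearchd.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheck {
    /** Returns true if a server socket can currently be bound to the given port. */
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically "Address already in use": some other process holds the port.
            return false;
        }
    }

    public static void main(String[] args) {
        // 8123 (searcher) and 8321 (indexer) are the defaults mentioned in the reply.
        for (int port : new int[] {8123, 8321}) {
            System.out.println("port " + port + (isPortFree(port) ? " is free" : " is in use"));
        }
    }
}
```

If either port is reported as in use while lsearchd is supposed to be stopped, an old instance is probably still running.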

Special page search complains about "problem with wiki search"
After following the instructions as closely as possible, the plugin renders the special page like this:

[ search_string on text area     ] [   dropdown_list ] [search_button]  There was a problem with the wiki search. This is probably temporary; try again in a few moments, or you can search the wiki through an external search service:

The content in square brackets is just my attempt to recreate the GUI.

Is there something missing in the way it is using the host to do the search? --Cartoro 00:00, 20 December 2007 (UTC)
 * Check your log files for more info about what went wrong ... --Rainman 18:31, 20 December 2007 (UTC)
 * Yes, I wanted to see that... but I couldn't find any log files.... sorry, silly question, but where are they? Could this be a problem with accessing the actual DB? --Cartoro 22:11, 20 December 2007 (UTC)
 * Extension:LuceneSearch --Rainman 22:17, 20 December 2007 (UTC)

Error when running
I am getting the following error when running :

53664-jpbaello:/srv/www/htdocs/search/ls2 # ./lsearchd
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /srv/www/htdocs/search/ls2/lsearch.conf
Error resolving local hostname. Make sure that hostname is setup correctly.
java.net.UnknownHostException: 53664-jpbaello: 53664-jpbaello
    at java.net.InetAddress.getLocalHost(InetAddress.java:1346)
    at org.wikimedia.lsearch.config.GlobalConfiguration.determineInetAddress(GlobalConfiguration.java:124)
    at org.wikimedia.lsearch.config.GlobalConfiguration.<init>(GlobalConfiguration.java:102)
    at org.wikimedia.lsearch.config.GlobalConfiguration.getInstance(GlobalConfiguration.java:112)
    at org.wikimedia.lsearch.config.Configuration.<init>(Configuration.java:105)
    at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
    at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)
Exception in thread "main" java.lang.NullPointerException
    at java.util.Hashtable.get(Hashtable.java:336)
    at org.wikimedia.lsearch.config.GlobalConfiguration.makeIndexIdPool(GlobalConfiguration.java:468)
    at org.wikimedia.lsearch.config.GlobalConfiguration.read(GlobalConfiguration.java:413)
    at org.wikimedia.lsearch.config.GlobalConfiguration.readFromURL(GlobalConfiguration.java:247)
    at org.wikimedia.lsearch.config.Configuration.<init>(Configuration.java:116)
    at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
    at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)

And then it goes back to the command prompt. I believe this is an error, because I cannot get it to create the index. I'm a little new to this, though, and not sure if I am doing things right. Also, sorry if I am not putting this in the right place either! Any ideas?


 * As the error message suggests, your hostname seems to be wrong. Is "53664-jpbaello" really your hostname? Use "echo $HOSTNAME" to verify this. Check if this hostname correctly maps to your IP in /etc/hosts. Or, try using your IP instead of your hostname. --Rainman 12:21, 22 December 2007 (UTC)
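The stack trace in the question shows the failure coming from `InetAddress.getLocalHost()`, so the same lookup can be tried in isolation. This is a sketch only; `HostCheck` and `describeLocalHost` are hypothetical names, not part of lsearchd.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class HostCheck {
    /** Describes what the JVM resolves the local hostname to, or why it cannot. */
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            return "cannot resolve local hostname: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If this prints the "cannot resolve" message, the hostname most likely has no matching entry in /etc/hosts, which is the situation the reply describes.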

Compiling to create lucenesearch.jar failed
I am trying to install the lucene engine for our wiki but the compile of lucene fails.

Ant gives back a lot of error messages during the compilation, errors like:

[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:331: cannot find symbol
[javac] symbol : class Hits
[javac] location: class org.wikimedia.lsearch.SearchState
[javac]            Hits hits = searcher.search(new TermQuery(
[javac]                ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:331: cannot find symbol
[javac] symbol : class TermQuery
[javac] location: class org.wikimedia.lsearch.SearchState
[javac]            Hits hits = searcher.search(new TermQuery(
[javac]                                                ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:332: cannot find symbol
[javac] symbol : class Term
[javac] location: class org.wikimedia.lsearch.SearchState
[javac]                            new Term("key", key)));
[javac]                                    ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 85 errors

Can you help me to solve these error messages or provide a binary?

Many thanks in advance.


 * Are you compiling with a Sun Java 1.5+ compiler? If so, can you provide the beginning of the error log? --Rainman 00:55, 13 January 2008 (UTC)

Yes, I am using the opensuse 10.3 distribution and javac 1.5.0_13. I hope that the error messages I provide below are enough, sorry for my low experience in java build processes.

I'll provide the first part and the last messages here:

Apache Ant version 1.7.0 compiled on September 22 2007
Buildfile: build.xml
Detected Java version: 1.5 in: /usr/lib/jvm/java-1.5.0-sun-1.5.0_update13-sr2/jre
Detected OS: Linux
parsing buildfile /root/lucene/lucene-search/build.xml with URI = file:/root/lucene/lucene-search/build.xml
Project base dir set to: /root/lucene/lucene-search
[antlib:org.apache.tools.ant] Could not load definitions from resource org/apache/tools/ant/antlib.xml. It could not be found.
[property] Loading /root/lucene-search.build.properties
[property] Unable to find property file: /root/lucene-search.build.properties
[property] Loading /root/build.properties
[property] Unable to find property file: /root/build.properties
[property] Loading /root/lucene/lucene-search/build.properties
[property] Unable to find property file: /root/lucene/lucene-search/build.properties
Property "current.year" has not been set
Build sequence for target(s) `default' is [init, compile-core, compile, default]
Complete build sequence is [init, compile-core, compile, default, package-tgz-src, jar-core, javadocs, package, package-zip, package-tgz, package-all-binary, dist, package-zip-src, package-all-src, dist-src, dist-all, jar, jar-src, clean, ]

init: [mkdir] Skipping /root/lucene/lucene-search/bin because it already exists. [mkdir] Skipping /root/lucene/lucene-search/dist because it already exists.

compile-core: [mkdir] Skipping /root/lucene/lucene-search/bin because it already exists. [javac] wikimedia/lsearch/Article.java added as wikimedia/lsearch/Article.class doesn't exist. [javac] wikimedia/lsearch/ArticleList.java added as wikimedia/lsearch/ArticleList.class doesn't exist. [javac] wikimedia/lsearch/Configuration.java added as wikimedia/lsearch/Configuration.class doesn't exist. [javac] wikimedia/lsearch/DatabaseConnection.java added as wikimedia/lsearch/DatabaseConnection.class doesn't exist. [javac] wikimedia/lsearch/EnglishAnalyzer.java added as wikimedia/lsearch/EnglishAnalyzer.class doesn't exist. [javac] wikimedia/lsearch/EsperantoAnalyzer.java added as wikimedia/lsearch/EsperantoAnalyzer.class doesn't exist. [javac] wikimedia/lsearch/EsperantoStemFilter.java added as wikimedia/lsearch/EsperantoStemFilter.class doesn't exist. [javac] wikimedia/lsearch/MWDaemon.java added as wikimedia/lsearch/MWDaemon.class doesn't exist. [javac] wikimedia/lsearch/MWSearch.java added as wikimedia/lsearch/MWSearch.class doesn't exist. [javac] wikimedia/lsearch/NamespaceFilter.java added as wikimedia/lsearch/NamespaceFilter.class doesn't exist. [javac] wikimedia/lsearch/QueryStringMap.java added as wikimedia/lsearch/QueryStringMap.class doesn't exist. [javac] wikimedia/lsearch/SearchClientReader.java added as wikimedia/lsearch/SearchClientReader.class doesn't exist. [javac] wikimedia/lsearch/SearchDbException.java added as wikimedia/lsearch/SearchDbException.class doesn't exist. [javac] wikimedia/lsearch/SearchState.java added as wikimedia/lsearch/SearchState.class doesn't exist. [javac] wikimedia/lsearch/Title.java added as wikimedia/lsearch/Title.class doesn't exist. [javac] wikimedia/lsearch/TitlePrefixMatcher.java added as wikimedia/lsearch/TitlePrefixMatcher.class doesn't exist. 
[javac] Compiling 16 source files to /root/lucene/lucene-search/bin [javac] Using modern compiler dropping /root/lucene/lucene-search/bin/bin from path as it doesn't exist [javac] Compilation arguments: [javac] '-deprecation' [javac] '-d' [javac] '/root/lucene/lucene-search/bin' [javac] '-classpath' [javac] '/root/lucene/lucene-search/bin:/usr/share/java/ant.jar:/usr/share/java/ant-launcher.jar:/usr/share/java/jaxp_parser_impl.jar:/usr/share/java/xml-commons-apis.jar:/usr/share/java/ant/ant-antlr.jar:/usr/share/java/bcel.jar:/usr/share/java/ant/ant-apache-bcel.jar:/usr/share/java/bsf.jar:/usr/share/java/ant/ant-apache-bsf.jar:/usr/share/java/log4j.jar:/usr/share/java/ant/ant-apache-log4j.jar:/usr/share/java/oro.jar:/usr/share/java/ant/ant-apache-oro.jar:/usr/share/java/regexp.jar:/usr/share/java/ant/ant-apache-regexp.jar:/usr/share/java/xml-commons-resolver.jar:/usr/share/java/ant/ant-apache-resolver.jar:/usr/share/java/jakarta-commons-logging.jar:/usr/share/java/ant/ant-commons-logging.jar:/usr/share/java/javamail.jar:/usr/share/java/jaf.jar:/usr/share/java/ant/ant-javamail.jar:/usr/share/java/jdepend.jar:/usr/share/java/ant/ant-jdepend.jar:/usr/share/java/ant/ant-jmf.jar:/usr/share/java/junit.jar:/usr/share/java/ant/ant-junit.jar:/usr/share/java/ant/ant-nodeps.jar:/usr/lib/jvm/java/lib/tools.jar:/usr/share/ant/lib/ant-apache-resolver-1.7.0.jar:/usr/share/ant/lib/ant-apache-bsf.jar:/usr/share/ant/lib/ant-nodeps.jar:/usr/share/ant/lib/ant-commons-logging.jar:/usr/share/ant/lib/ant-junit.jar:/usr/share/ant/lib/ant-javamail-1.7.0.jar:/usr/share/ant/lib/ant-junit-1.7.0.jar:/usr/share/ant/lib/ant-launcher.jar:/usr/share/ant/lib/ant-apache-log4j.jar:/usr/share/ant/lib/ant-apache-oro-1.7.0.jar:/usr/share/ant/lib/ant-javamail.jar:/usr/share/ant/lib/ant-apache-log4j-1.7.0.jar:/usr/share/ant/lib/ant-apache-bcel-1.7.0.jar:/usr/share/ant/lib/ant-nodeps-1.7.0.jar:/usr/share/ant/lib/ant-jmf.jar:/usr/share/ant/lib/ant-jmf-1.7.0.jar:/usr/share/ant/lib/ant-commons-logging-1.7.0.jar
:/usr/share/ant/lib/ant-jdepend-1.7.0.jar:/usr/share/ant/lib/ant-1.7.0.jar:/usr/share/ant/lib/ant-apache-regexp.jar:/usr/share/ant/lib/ant-apache-oro.jar:/usr/share/ant/lib/ant-apache-resolver.jar:/usr/share/ant/lib/ant-jdepend.jar:/usr/share/ant/lib/ant-antlr.jar:/usr/share/ant/lib/ant-antlr-1.7.0.jar:/usr/share/ant/lib/ant-apache-regexp-1.7.0.jar:/usr/share/ant/lib/ant-apache-bcel.jar:/usr/share/ant/lib/ant-apache-bsf-1.7.0.jar:/usr/share/ant/lib/ant-launcher-1.7.0.jar:/usr/share/ant/lib/ant.jar' [javac] '-sourcepath' [javac] '/root/lucene/lucene-search/org' [javac] '-encoding' [javac] 'utf-8' [javac] '-g' [javac] [javac] The ' characters around the executable and arguments are [javac] not part of the command. [javac] Files to be compiled: [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/Article.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/ArticleList.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/Configuration.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/DatabaseConnection.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/MWDaemon.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/MWSearch.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/NamespaceFilter.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/QueryStringMap.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/SearchClientReader.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/SearchDbException.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/Title.java [javac]    
/root/lucene/lucene-search/org/wikimedia/lsearch/TitlePrefixMatcher.java [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:28: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.Analyzer; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:29: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.LowerCaseTokenizer; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:30: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.PorterStemFilter; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:31: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.TokenStream; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:37: cannot find symbol [javac] symbol: class Analyzer [javac] public class EnglishAnalyzer extends Analyzer { [javac]                                     ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:38: cannot find symbol [javac] symbol : class TokenStream [javac] location: class org.wikimedia.lsearch.EnglishAnalyzer [javac]    public final TokenStream tokenStream(String fieldName, Reader reader) { [javac]                     ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:31: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.Analyzer; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:32: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.LowerCaseTokenizer; [javac]                     
             ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:33: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.Token; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:34: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.TokenStream; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:36: cannot find symbol [javac] symbol: class Analyzer [javac] public class EsperantoAnalyzer extends Analyzer{ [javac]                                       ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:37: cannot find symbol [javac] symbol : class TokenStream [javac] location: class org.wikimedia.lsearch.EsperantoAnalyzer [javac]    public final TokenStream tokenStream(String fieldName, Reader reader) { [javac]                     ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:31: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.Token; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:32: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.TokenStream; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:33: package org.apache.lucene.analysis does not exist [javac] import org.apache.lucene.analysis.TokenFilter; [javac]                                  ^ [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:36: cannot find symbol [javac] symbol: class TokenFilter [javac] public class EsperantoStemFilter extends TokenFilter { [javac]                                         ^ [javac] 
/root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:37: cannot find symbol [javac] symbol : class TokenStream [javac] location: class org.wikimedia.lsearch.EsperantoStemFilter [javac]    public EsperantoStemFilter(TokenStream tokenizer) {

--- snip --- some lines cut here --- snip ---

[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:332: cannot find symbol [javac] symbol : class Term [javac] location: class org.wikimedia.lsearch.SearchState [javac]                            new Term("key", key))); [javac]                                    ^ [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. [javac] 85 errors

BUILD FAILED /root/lucene/lucene-search/build.xml:55: Compile failed; see the compiler error output for details. at org.apache.tools.ant.taskdefs.Javac.compile(Javac.java:999) at org.apache.tools.ant.taskdefs.Javac.execute(Javac.java:820) at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105) at org.apache.tools.ant.Task.perform(Task.java:348) at org.apache.tools.ant.Target.execute(Target.java:357) at org.apache.tools.ant.Target.performTasks(Target.java:385) at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329) at org.apache.tools.ant.Project.executeTarget(Project.java:1298) at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) at org.apache.tools.ant.Project.executeTargets(Project.java:1181) at org.apache.tools.ant.Main.runBuild(Main.java:698) at org.apache.tools.ant.Main.startAnt(Main.java:199) at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257) at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)


 * Looks like your ant is broken and cannot find the relevant libraries. I've compiled the package and put it here.--Rainman 11:01, 26 January 2008 (UTC)
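
The `package org.apache.lucene.analysis does not exist` errors in the log above are what javac reports when the Lucene classes are missing from the compile classpath, and the logged '-classpath' value lists only Ant's own jars. A minimal sketch of how one might check a classpath string for a Lucene jar follows. The helper name `jars_matching` and the assumption that the jar's file name contains "lucene" are illustrative and not taken from the build file.

```python
import posixpath

def jars_matching(classpath, needle, sep=":"):
    """Return classpath entries whose file name contains `needle` (case-insensitive)."""
    hits = []
    for entry in classpath.split(sep):
        entry = entry.strip()
        # Compare against the file name only, so a directory called
        # /root/lucene/... is not mistaken for a Lucene jar.
        if entry and needle.lower() in posixpath.basename(entry).lower():
            hits.append(entry)
    return hits

# Shortened excerpt of the '-classpath' value from the log above.
logged = (
    "/root/lucene/lucene-search/bin:/usr/share/java/ant.jar:"
    "/usr/share/java/log4j.jar:/usr/share/ant/lib/ant.jar"
)

# An empty list means no entry looks like a Lucene jar.
print(jars_matching(logged, "lucene"))
```

If the list comes back empty, the Lucene jar has to be supplied to the build (for example through the classpath the build file uses) before the `org.apache.lucene` imports can resolve.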

Cannot bind RMIMessenger exception: non-JRMP server at remote endpoint
Hello everyone,

I'm quite new to Lucene and I have a problem: I can't get Lucene Java working on one of my servers. I've set it up on another server for MediaWiki and it works fine.

It's a GNU/Linux Ubuntu Edgy i686 with kernel 2.6.17-11-server running Apache 2.0 with PHP5 for MediaWiki, plus some other things such as Tomcat and JBoss. Installed Java packages: j2re1.4, j2sdk1.4, java-common, libgcj-common, sun-java5-bin, sun-java5-demo, sun-java5-jdk and sun-java5-jre.

On the first server (a fresh Ubuntu Gutsy 64-bit with almost nothing else running) it worked fine, and I can use Lucene to search my wiki. On the second server, here is the error I get when I try to start the engine:

 www-data@myserver:/usr/local/search/ls2$ ./lsearchd .
 Trying config file at path /var/www/.lsearch.conf
 Trying config file at path /usr/local/search/ls2/lsearch.conf
 0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 java.rmi.ConnectIOException: non-JRMP server at remote endpoint
     at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
     at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
     at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
     at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
     at org.wikimedia.lsearch.interoperability.RMIServer.register(RMIServer.java:24)
     at org.wikimedia.lsearch.interoperability.RMIServer.bindRMIObjects(RMIServer.java:60)
     at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:52)
 76  [main] WARN  org.wikimedia.lsearch.interoperability.RMIServer  - Cannot bind RMIMessenger exception:non-JRMP server at remote endpoint

But NOTHING is using port 8321. I've tried another port and the problem is the same. Any ideas how to solve this, please? Here is my contact :

Thanks, LMJ
 * First verify that jboss, tomcat and lsearchd all run under sun-java5-bin (and not j2re1.4). If this is the case then maybe the RMI registry is colliding with jboss (so try stopping it if you can). If this appears to be the case, then you can either configure jboss not to use the port 1099, or edit RMIRegistry.java to use a different port (replace 1099 there with your port, and provide the port as param to getRegistry calls in RMIRegistry.java and RMIMessengerClient.java). --Rainman 15:05, 15 January 2008 (UTC)


 * Indeed Rainman, thanks for your help! Look at this:
 # lsof +i :1099
 COMMAND  PID    USER   FD   TYPE    DEVICE SIZE NODE NAME
 java   20832 syncron    7u  IPv4 149877937       TCP *:rmiregistry (LISTEN)

 The port is used by the JBoss rmiregistry :-/ I need some extra help to change that port. Can we exchange emails about it, Rainman? I tried to contact you via your personal page, but I only read English and French ;)


 * I've edited /usr/local/jboss-3.2.7/server/default/conf/jboss-service.xml and changed the port to 10999. It seems to work better ;) I've got another problem, but it seems to be an lsearch.conf-related issue.
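
The collision above was found with lsof; the same question ("is something already listening on this port?") can be asked before starting lsearchd. The following is only a sketch: `port_in_use` is a hypothetical helper, and the port numbers are the ones mentioned in this thread (1099 for the RMI registry, 8123 and 8321 from the daemon log). It reports whether a port accepts connections, not which process owns it.

```python
import socket

def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # Ports taken from the discussion above.
    for port in (1099, 8123, 8321):
        state = "in use" if port_in_use(port) else "free"
        print("port %d: %s" % (port, state))
```

If 1099 shows as in use while lsearchd is stopped, another service (JBoss in this case) holds the RMI registry port and one of the two has to be moved.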

Daemon status
On the German Wikipedia, I am often irritated because changes in content are not reflected immediately by the full text search and – at the moment – I cannot see whether and when the changes have already or will be processed by the daemon. Therefore, I would like to know:


 * whether the daemon processes the changes chronologically so one could be certain that if one's changes were made at time T and the daemon has processed all changes up to T + 1, they will be reflected in the full text search, and
 * whether there is any way to obtain the daemon status (all changes up to T, n articles in queue, etc.) from a current or future Wikipedia installation.

Thanks, Tim Landscheidt 19:52, 7 February 2008 (UTC)


 * The index is updated around 5 am GMT every day on wikimedia projects (when nothing goes wrong which is most of the time). About 1) - yes, it processes the changes chronologically. 2) - this interface is available but only for system admins, for everybody else - just wait till tomorrow for changes to be applied. --Rainman 10:07, 8 February 2008 (UTC)


 * Hmmm. If I search for "Lassithi" (note the double "s") now, I see that changes in de:Panagia i Kera (8 days ago), de:Kritsa (7 days ago), de:Ierapetra (10 days ago), de:Kera Kardiotissa (11 days ago), de:Griechische Toponyme (11 days ago), de:Venezianische Kolonien (9 days ago) and de:Sitia (11 days ago) have not been processed. Is that what you mean by "when nothing goes wrong"? :-) Would it be technically feasible to include the last time a change was successfully worked into the index in the result page, i. e. "All changes until T considered."? Tim Landscheidt 17:24, 8 February 2008 (UTC)
 * Yes, this seems to be a case of "if nothing is broken" :) One of the dewiki search servers (srv21) is broken: it stopped updating its index and seems to have a broken logrotate and possibly some other problems. We'll fix it when a sysadmin becomes available. Whenever you see changes not going in for more than a couple of days, you should report it. --Rainman 18:07, 8 February 2008 (UTC)
 * Ok, we tracked this down to a hard drive failure on srv21, now one just needs to wait for cache to expire (~12h) and you should get fresh results - thanks for the report! --Rainman 18:57, 8 February 2008 (UTC)
 * Thanks for the information :-). What would be the proper place to report such things in the future? Tim Landscheidt 21:32, 8 February 2008 (UTC)
 * Technical issues are usually reported via the IRC channel #wikimedia-tech, where all of the sysadmins are. If there's no one online to fix the problem, you could submit a bug. You could also send me an e-mail via this wiki or leave a message on my talk page, since I'm more or less in charge of maintaining the search subsystem. --Rainman 21:44, 8 February 2008 (UTC)
 * Okay, I'll keep that in mind. Thanks again, Tim Landscheidt 22:53, 8 February 2008 (UTC)

Exception in thread "main" java.lang.UnsupportedClassVersionError
Hi, I use the following configuration:


 * MediaWiki: 1.11.0
 * PHP: 5.2.5 (apache2handler)
 * MySQL: 5.0.51

If I call this:

java -cp /usr/local/search/ls2/ls2-bin/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s basiswikidb.xml basiswiki

I get the error:

 Exception in thread "main" java.lang.UnsupportedClassVersionError: org/wikimedia/lsearch/importer/Importer (Unsupported major.minor version 49.0)
     at java.lang.ClassLoader.defineClass0(Native Method)
     at java.lang.ClassLoader.defineClass(ClassLoader.java:539)
     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
     at java.net.URLClassLoader.defineClass(URLClassLoader.java:251)
     at java.net.URLClassLoader.access$100(URLClassLoader.java:55)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:194)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:187)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:289)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)

My Configuration

 * all Files are in /usr/local/search/ls2/
 * MWConfig.global=file:///usr/local/search/ls2/lsearch-global.conf
 * MWConfig.lib=/usr/local/search/ls2/lib
 * Indexes.path=/usr/local/search/indexes
 * Localization.url=file:///opt/lampp/htdocs/basiswiki/languages/messages
 * Logging.logconfig=/usr/local/search/ls2/lsearch.log4j
 * mwdumper.jar => /usr/local/search/ls2/lib
 * lsearch.conf: Storage.lib=/usr/local/search/ls2/sql

lsearch-global.conf
 [Database]
 wikidev : (single) (language,sr)
 wikilucene : (nssplit,3) (nspart1,[0]) (nspart2,[4,5,12,13]), (nspart3,[])
 wikilucene : (language,en) (warmup,10)
 basiswiki : (single) (language,en) (warmup,10)
 [Search-Group]
 : wikilucene wikidev : basiswiki
 # wikilucene : (single) (language,en) (warmup,0)
 # Search groups
 # Index parts of a split index are always taken from the node's group
 # host : db1.part db2.part
 # Multiple hosts can search multiple dbs (N-N mapping)

Please can you help me?!

85.158.226.1 11:03, 31 March 2008 (UTC)


 * Run java -version. You probably have an old Java; you need to update to 1.5 or later. --Rainman 11:57, 31 March 2008 (UTC)
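
For context on the "major.minor version 49.0" in the error above: every Java class file starts with the magic number 0xCAFEBABE followed by a minor and a major version, and major version 49 corresponds to Java 5 (1.5), so an older JVM refuses to load it. The sketch below reads that header. The function names and the lookup table are illustrative, and the table covers only releases up to Java 6.

```python
import struct
import zipfile

# Class-file major versions and the Java release that introduced them.
MAJOR_TO_JAVA = {45: "1.1", 46: "1.2", 47: "1.3", 48: "1.4", 49: "5", 50: "6"}

def class_file_version(header):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor

def required_java(header):
    """Map the class-file major version to the minimum Java release."""
    major, _ = class_file_version(header)
    return MAJOR_TO_JAVA.get(major, "unknown (major %d)" % major)

def jar_entry_header(jar_path, entry):
    """Read the first 8 bytes of one class inside a jar (a jar is a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(entry) as f:
            return f.read(8)

# Header bytes equivalent to the version reported in the error (49.0).
sample = bytes.fromhex("cafebabe00000031")
print(required_java(sample))  # prints "5"
```

Against a real installation one could pass `jar_entry_header("LuceneSearch.jar", "org/wikimedia/lsearch/importer/Importer.class")` to `required_java` and compare the result with the output of `java -version`.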

MediaWiki+Lucene-Search+MWSearch = ZERO search results ??!@#?!
Can someone please assist me? =) I've followed the steps on the Extension:Lucene-search and Extension:MWSearch pages to the T. I've gone over them several times, and I've been to the MediaWiki Forums and the MediaWiki-L mailing list... please help me! =)
 * Slackware 12.0, on i686 Pentium III [Linux 2.6.21.5]
 * MediaWiki: 1.9.1
 * PHP: 5.2.5 (apache2handler)
 * MySQL: 5.0.37
 * MediaWiki Extension(s): MWSearch SVN (05122008), and Lucene-search SVN (05122008), + I downloaded & installed mwdumper.jar into the Lucene2 lib dir.
 * other tools: jre-6u2-i586-1, jdk-1_5_0_09-i586-1, apache-ant-1.7.0-i586-1bj, rsync-2.6.9-i486-1

My Local LuceneSearch configuration

 * LuceneSearch SVN Install dir: /usr/local/search/lucene-search-2svn05112008
 * Indexes stored: /usr/local/search/indexes

/etc/lsearch.conf
 MWConfig.global=file:///etc/lsearch-global.conf
 MWConfig.lib=/usr/local/search/lucene-search-2svn05112008/lib
 Indexes.path=/usr/local/search/indexes
 Search.updateinterval=1
 Search.updatedelay=0
 Search.checkinterval=30
 Index.snapshotinterval=5
 Index.maxqueuecount=5000
 Index.maxqueuetimeout=12
 Storage.master=localhost
 Storage.username=wikiuser
 Storage.password=mypass
 Storage.useSeparateDBs=false
 Storage.defaultDB=wikidb
 Storage.lib=/usr/local/search/lucene-search-2svn05112008/sql
 Localization.url=file:///var/www/htdocs/wiki/languages/messages
 Logging.logconfig=/etc/lsearch.log4j
 Logging.debug=true

/etc/lsearch-global.conf
 [Database]
 wikidb : (single) (language,en) (warmup,10)
 [Search-Group]
 nen-tftp : wikidb
 [Index]
 nen-tftp : wikidb
 [Index-Path]
 : /usr/local/search/indexes
 [OAI]
 wiktionary : http://$lang.wiktionary.org/w/index.php
 wikilucene : http://localhost/wiki-lucene/phase3/index.php
 : http://$lang.wikipedia.org/w/index.php
 [Properties]
 Database.suffix=wiki wiktionary wikidb
 KeywordScoring.suffix=wikidb wiki wikilucene wikidev
 ExactCase.suffix=wikidb wiktionary wikilucene
 [Namespace-Prefix]
 all :
 [0] : 0
 [1] : 1
 [2] : 2
 [3] : 3
 [4] : 4
 [5] : 5
 [6] : 6
 [7] : 7
 [8] : 8
 [9] : 9
 [10] : 10
 [11] : 11
 [12] : 12
 [13] : 13
 [14] : 14
 [15] : 15

/etc/lsearch.log4j
 log4j.rootLogger=INFO, A1
 log4j.appender.A1=org.apache.log4j.ConsoleAppender
 log4j.appender.A1.layout=org.apache.log4j.PatternLayout
 log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

relevant /var/www/htdocs/wiki/LocalSettings.php settings
 $wgSearchType = 'LuceneSearch';
 $wgLuceneHost = 'localhost';
 $wgLucenePort = 8123;
 require_once("extensions/MWSearch/MWSearch.php");

building the index works running dumpBackup(Init).php
> php maintenance/dumpBackupInit.php --current --quiet > wikidb.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki/wikidb.xml wikidb MediaWiki Lucene search indexer - index builder from xml database dumps.

Trying config file at path /root/.lsearch.conf Trying config file at path /var/www/htdocs/wiki/lsearch.conf Trying config file at path /etc/lsearch.conf log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@133056f. log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@133056f class loader. log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource. log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@133056f. log4j: Trying to find [log4j.properties] using sun.misc.Launcher$AppClassLoader@133056f class loader. log4j: Trying to find [log4j.properties] using ClassLoader.getSystemResource. log4j: Could not find resource: [null]. log4j: Parsing for [root] with value=[INFO, A1]. log4j: Level token is [INFO]. log4j: Category root set to INFO log4j: Parsing appender named "A1". log4j: Parsing layout options for "A1". log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n]. log4j: End of parsing for "A1". log4j: Parsed "A1" options. log4j: Finished configuring. 0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer 18  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En 434  [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles... 94 pages (99.576/sec), 94 revs (99.576/sec) 1527 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links... 94 pages (326.389/sec), 94 revs (326.389/sec) 1928 [main] INFO org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles... 94 pages (24.588/sec), 94 revs (24.588/sec) 6005 [main] INFO org.wikimedia.lsearch.importer.Importer  - Closing/optimizing index... 
 Finished indexing in 5s, with final index optimization in 0s
 Total time: 6s
 6530 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Making snapshot for wikidb
 6582 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Made snapshot /usr/local/search/indexes/snapshot/wikidb/20080512024654

That creates a 277KB file @ /var/www/htdocs/wiki/wikidb.xml, which looks just fine to me...

Starting the lsearch daemon is working
When I run my script /usr/local/search/lucene-search-2svn05112008/lsearchd, which starts the lsearch daemon, I get the following, which ALSO looks fine:

 java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeah.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
 RMI registry started.
 Trying config file at path /root/.lsearch.conf
 Trying config file at path /usr/local/search/lucene-search-2svn05112008/lsearch.conf
 log4j: Parsing for [root] with value=[INFO, A1].
 log4j: Level token is [INFO].
 log4j: Category root set to INFO
 log4j: Parsing appender named "A1".
 log4j: Parsing layout options for "A1".
 log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
 log4j: End of parsing for "A1".
 log4j: Parsed "A1" options.
 log4j: Finished configuring.
 0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 2351 [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 2600 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
 2882 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable bound
 2914 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 2928 [Thread-2] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
 2929 [Thread-3] INFO org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
 4246 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 1331 ms
 4246 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 5079 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 833 ms
 5079 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 5861 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 782 ms

From here, I pull up my normal wiki, which has been working fine ALL along, but now I get ZERO search results, no matter what I do! I know I am searching correctly: I just type in a single word (one that I know is on several pages in the wiki). I've even tried editing the file before and after building the index, and starting/stopping the lsearch daemon, yet I get this on my MediaWiki search results page:

Search results From AgentDcooper's Wiki

You searched for wiki

For more information about searching AgentDcooper's Wiki, see Searching AgentDcooper's Wiki.

Showing below 0 results starting with #1. No page text matches

Note: Unsuccessful searches are often caused by searching for common words like "have" and "from", which are not indexed, or by specifying more than one search term (only pages containing all of the search terms will appear in the result).

I notice that the lsearch daemon console output scrolls the following; right after doing a search within the wiki
293744 [pool-2-thread-1] INFO org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10 what:search dbname:wikidb term:wiki 293759 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine  - Using NamespaceFilterWrapper wrap: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} 293786 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine  - search wikidb: query=[wiki] parsed=[contents:wiki (title:wiki^6.0 stemtitle:wiki^2.0) (alttitle1:wiki^4.0 alttitle2:wiki^4.0 alttitle3:wiki^4.0) (keyword1:wiki^0.02 keyword2:wiki^0.01 keyword3:wiki^0.0066666664 keyword4:wiki^0.0050 keyword5:wiki^0.0039999997)] hit=[27] in 16ms using IndexSearcherMul:1210585609666

With Mediawiki Debuging enabled, my /var/log/mediawiki/debug_log.txt shows this
Start request GET /wiki/index.php/Special:Search?search=wiki&fulltext=Search Host: nen-tftp.techiekb.com User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0. 5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Connection: keep-alive Referer: http://nen-tftp.techiekb.com/wiki/index.php/Special:Version Cookie: wikidb_session=3jptdli2pf3nkuq924tq1ihlt0 Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==

 Main cache: FakeMemCachedClient
 Message cache: MediaWikiBagOStuff
 Parser cache: MediaWikiBagOStuff
 Unstubbing $wgParser on call of $wgParser->setHook from require_once
 Fully initialised
 Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
 Language::loadLocalisation: got localisation for en from source
 Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
 Cache miss for user 2
 Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
 Logged in from session
 Unstubbing $wgMessageCache on call of $wgMessageCache->getTransform from wfMsgGetKey
 Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
 MessageCache::load: got from global cache
 Unstubbing $wgOut on call of $wgOut->setPageTitle from SpecialSearch::setupPage
 Fetching search data from http://localhost:8123/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
 total [0] hits
 OutputPage::sendCacheControl: private caching; **
 Request ended normally

Now get this: if I go to the link from the debug log above, http://localhost:8123/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10, I get this page:

 3
 1.0 0 Main_Page
 0.9577699303627014 0 EFFICIENT%2FCISCO%2FNETSCREEN%2FNETOPIA_Router_Command_Matrix
 0.7121278643608093 0 DBU_-_DialBackUp

Which leads me to my question: what am I doing wrong? I have tried everything I can think of, and I just cannot get search within my MediaWiki to work properly. The search itself seems to work when I go to the link directly, yet the "total hits" in the log and in the wiki show ZERO. Manually opening the link from the debug log shows what appears to be a result indicating that 3 PAGES were found, with corresponding result data. Why is MediaWiki not showing this?

Any help would be kindly appreciated, or even a link for reference! -peace- --Agentdcooper


 * I suspect the problem is the MW version. The search front-end has been heavily refactored in MediaWiki 1.13, and MWSearch is designed to run with the latest MediaWiki, so there might be some compatibility issues. Note that MW 1.13 is not yet released and is still in development. Try using Extension:LuceneSearch instead. --Rainman 13:20, 12 May 2008 (UTC)


 * Thanks a TON, I will try this out in just a few, I half suspected it was a MediaWiki versioning issue, I really need to upgrade! =) --Agentdcooper 20:16, 12 May 2008 (UTC)
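
When the daemon answers but the wiki shows nothing, it can help to read the daemon's response independently of MediaWiki. The sketch below parses a response laid out like the one pasted in this thread: a hit count first, then one hit per entry as score, namespace and URL-encoded title. That layout is inferred from the pasted output only and is not taken from lsearchd documentation; `parse_search_response` is a hypothetical helper, and other protocol versions may format results differently.

```python
from urllib.parse import unquote

def parse_search_response(text):
    """Parse 'count' followed by 'score namespace title' lines (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty response")
    total = int(lines[0])
    hits = []
    for ln in lines[1:]:
        score, namespace, title = ln.split(" ", 2)
        # Titles arrive URL-encoded, e.g. %2F for "/".
        hits.append((float(score), int(namespace), unquote(title)))
    return total, hits

# Response from the thread above, one entry per line.
sample = """3
1.0 0 Main_Page
0.9577699303627014 0 EFFICIENT%2FCISCO%2FNETSCREEN%2FNETOPIA_Router_Command_Matrix
0.7121278643608093 0 DBU_-_DialBackUp
"""

total, hits = parse_search_response(sample)
print(total)
for score, namespace, title in hits:
    print(namespace, title, score)
```

If a script like this sees hits while the wiki reports "total [0] hits", the daemon and index are working and the mismatch lies in how the MediaWiki side interprets the response, which fits the version incompatibility suggested above.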