Extension talk:Lucene-search/LQT Archive 1

=2007=

Error when editing pages
I followed your tutorial and installed LuceneSearch. All went fine, but when I edit a page, I get this error:

Fatal error: Call to undefined method LuceneSearch::setLimitOffset in /path/to/wiki/includes/SearchEngine.php on line 222

I'm using MediaWiki 1.10.0. Is this a known problem or just a configuration issue? It looks like neither LuceneSearch.php nor LuceneSearch_body.php defines that function at all. The same goes for the LuceneSearch::update function... --12 July 2007


 * You're missing $wgDisableSearchUpdate = true; in your LocalSettings.php. It should be placed before the require_once statement. --Rainman 17:48, 12 July 2007 (UTC)

Installing Lucene on Windows 2003 Server
Is there a way to install LuceneSearch under Windows? I run my wiki on a Windows 2003 Server with XAMPP and I want to use the features of Lucene. I found at http://meta.wikimedia.org/wiki/Installing_lucene_search that Wikipedia uses the C# engine of Lucene.

Is there a compiled version of the C# engine to install it on my Apache running on Windows 2003 Server?----stp-- 13:40, 1 August 2007 (UTC)


 * As far as I know, no. --Rainman 09:54, 3 August 2007 (UTC)

I am also interested in a Windows 2003 tutorial for improving MediaWiki search results. Cedarrapidsboy 14:29, 2 August 2007 (UTC)


 * You can use the old C# daemon by following the tutorial on Installing lucene search. Wikimedia sites used to use that one, but they now use the latest (Java) version. The new version could in principle run on Windows with some modifications (the main problem is its use of symbolic and hard links), but there is no one around to patch it. --Rainman 09:54, 3 August 2007 (UTC)

Could you explain how to compile the old C# daemon under Windows with Mono? There are no "make" and "make install" commands under Windows :((( --Konstbel 09:04, 31 March 2008 (UTC)


 * Any luck on the patch for Windows? --zhamrock 16:48, 28 July 2008 (SGT)

There is a .dll version available here: http://incubator.apache.org/lucene.net/download/, but I don't know if this helps --jdpond 21:53, 27 August 2007 (UTC)
 * The problem is not in Lucene itself but in the LSearch daemon, which relies on the Linux filesystem to efficiently fetch new indexes, keep old copies, and swap copies after a background warmup phase. --Rainman 09:18, 28 August 2007 (UTC)


 * The 2.1 branch seems to have some support for Windows (see FSUtil.java). Is someone actively working on this?  Any idea what the status is?  --Cneubauer 19:35, 13 January 2009 (UTC)
 * No, no one is actively working on Windows support. The lucene-search-2.1 branch won't work on Windows, although it could with some poking around, e.g. restructuring the indexregistry class. --Rainman 22:08, 13 January 2009 (UTC)
 * I managed to get it running on Windows by patching FSUtil.java. I'm using NTFS hard links and a free Microsoft tool to create directory links (linkd.exe). It may not be as flexible as the Linux version using symbolic links, but it works for me, especially because I'm able to do the development of a wiki search client completely on my Windows machine. If someone is interested, leave a note on my user page. I would commit it to the repository myself, but I guess I'm not allowed to do so. --Kai Kühn. 19:24, 2 February 2009 (UTC)
 * Please put the patch on bugzilla. --Rainman 18:38, 2 February 2009 (UTC)
 * done. Patch is here --Kai Kühn 20:53, 2 February 2009 (UTC)
 * Could someone post the remaining Windows-related steps needed to build the configuration files and indexes and to fetch the updates? After compiling the lucene-search-2.1 branch with Kai's FSUtils.java patch, I tried running the command from the /configure script, but it failed because it still looks for Bash:

    C:\lucene-search-2.1>java -cp LuceneSearch.jar org.wikimedia.lsearch.util.Configure C:\Inetpub\wwwroot\mediawiki
    Exception in thread "main" java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=3, The system cannot find the path specified
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
        at java.lang.Runtime.exec(Runtime.java:593)
        at java.lang.Runtime.exec(Runtime.java:466)
        at org.wikimedia.lsearch.util.Command.exec(Command.java:41)
        at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:84)
        at org.wikimedia.lsearch.util.Configure.main(Configure.java:49)
    Caused by: java.io.IOException: CreateProcess error=3, The system cannot find the path specified
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl. (ProcessImpl.java:81)
        at java.lang.ProcessImpl.start(ProcessImpl.java:30)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
        ... 5 more
 * --User:MadX 12:50, 3 December 2009 (UTC)


 * FSUtils is only the tip of the iceberg; there are many other issues, especially how we handle index updates (which rely on symlinks and such for efficiency and simplicity)... So there is no "simple hack" you can do to make it work... --Rainman 13:17, 3 December 2009 (UTC)
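Rainman's replies above describe the daemon keeping old index copies and swapping in a new one via symbolic links after warmup. Below is a minimal sketch of that general technique, not lsearchd's actual code. It assumes Java 7+ and a POSIX filesystem. A new symlink is created under a temporary name and then renamed over the old one, so anything opening the link always finds either the old or the new index, never a missing one. The class, method, and directory names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class IndexSwap {
    /**
     * Points {@code link} at {@code target} with no window in which the link is missing:
     * a temporary symlink is created next to it and then renamed over it in one step.
     */
    static void repoint(Path link, Path target) throws IOException {
        Path tmp = link.resolveSibling(link.getFileName() + ".tmp");
        Files.deleteIfExists(tmp);               // leftover from an interrupted swap, if any
        Files.createSymbolicLink(tmp, target);   // on Windows this typically needs extra privileges
        // On POSIX systems this is a rename(2), which atomically replaces an existing link
        Files.move(tmp, link, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lsearch-swap-demo");
        Path oldIndex = Files.createDirectory(dir.resolve("index.old"));   // hypothetical names
        Path newIndex = Files.createDirectory(dir.resolve("index.new"));
        Path current = dir.resolve("current");

        repoint(current, oldIndex);   // searchers would open "current"
        // ... build and warm up newIndex in the background ...
        repoint(current, newIndex);   // swap; index.old stays on disk as the previous copy
        System.out.println("current -> " + Files.readSymbolicLink(current));
    }
}
```

The atomic rename is what makes the swap cheap on Linux. Without symlinks and POSIX rename semantics, the same step has to be redesigned rather than just patched in one file.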

Missing Method?
I installed everything following the instructions (on MediaWiki 1.10.1), but I'm getting this when I hit the search-button:

Fatal error: Call to undefined method LuceneSearch::getRedirect in /var/www/mediawiki-1.10.1/includes/SpecialPage.php on line 396

Is this a known issue with 1.10.1, or am I missing something? --217.6.3.114 06:34, 6 August 2007 (UTC)


 * No idea, getRedirect is defined in SpecialPage, and LuceneSearch inherits SpecialPage. You might be using some odd php version, or something else might be wrong... --Rainman 10:55, 6 August 2007 (UTC)


 * My PHP version is PHP 5.2.0-8+etch7 (cli) (built: Jul 2 2007 21:46:15). Do you really think this might be the problem? I believe it is more likely that I forgot something obvious that is not mentioned in the instructions. For example, I had to download ExtensionFunctions.php from svn, because it is shipped neither with MediaWiki nor with the extension. Do I need to register the extension anywhere other than in LocalSettings.php? --217.6.3.114 12:55, 6 August 2007 (UTC)
 * I've seen people complain about various MediaWiki features not working with PHP 5.2; switching back to PHP 5.1 usually fixes it. But I'm by no means a PHP expert (I mainly do the Java part), so I cannot really tell whether it would help. If you can, give it a try and let us know if it helps. --Rainman 16:48, 6 August 2007 (UTC)


 * There seems to be no php 5.1 package available for debian etch, so I guess there's no chance to make search work.--217.6.3.114 12:10, 7 August 2007 (UTC)
 * I submitted a bug report: http://bugzilla.wikimedia.org/show_bug.cgi?id=10835 --7 August 2007
 * Yep, seen it .. I still think it might be a php problem, or maybe a broken eAccelerator or something like that... --Rainman 10:33, 21 August 2007 (UTC)
 * Is eAccelerator required for this extension? We do not use it.--217.6.3.114 08:58, 7 September 2007 (UTC)
 * Found the Solution! The problem was incompatibility between the MWSearch-Extension and LuceneSearch. I forgot that MWSearch was still active when I installed LuceneSearch. After deactivating MWSearch the problem was gone. --217.6.3.114 08:05, 11 September 2007 (UTC)

Wildcard Search
Is there a way to use wildcards as described on http://lucene.apache.org/java/docs/queryparsersyntax.html#Wildcard%20Searches? --217.6.3.114 12:50, 12 September 2007 (UTC)


 * Yes. Currently only simple prefixes work (e.g. test*) since I didn't get to test the performance impact of other wildcard schemes. If you want to patch it yourself, look at WikiQueryParser.java around line 669 (function makeQueryFromTokens), you probably want to replace buffer[length-1]=='*' with something that checks if * or ? are anywhere in the buffer. --Rainman 16:23, 12 September 2007 (UTC)
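The change Rainman suggests replaces the trailing-character test buffer[length-1]=='*' with a scan of the whole token. Below is a minimal sketch of that check. It assumes, as the quoted expression suggests, that the token sits in a char[] named buffer with a separate length. The classify method and Kind enum are hypothetical and are not the real WikiQueryParser code. Turning a WILDCARD token into an actual query (e.g. with Lucene's WildcardQuery) is left out, and patterns with a leading wildcard are the expensive case Rainman is wary of.

```java
final class WildcardCheck {
    /** How a single query token would be treated. */
    enum Kind { TERM, PREFIX, WILDCARD }

    /**
     * Classifies the first {@code length} chars of {@code buffer}. Only a single trailing '*'
     * (e.g. "test*") counts as PREFIX; any other '*' or '?' makes the token a general WILDCARD.
     */
    static Kind classify(char[] buffer, int length) {
        int wildcards = 0;
        for (int i = 0; i < length; i++) {
            if (buffer[i] == '*' || buffer[i] == '?') {
                wildcards++;
            }
        }
        if (wildcards == 0) {
            return Kind.TERM;
        }
        if (wildcards == 1 && length > 1 && buffer[length - 1] == '*') {
            return Kind.PREFIX;   // the case the parser already handles
        }
        return Kind.WILDCARD;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"test", "test*", "te?t", "te*st"}) {
            System.out.println(token + " -> " + classify(token.toCharArray(), token.length()));
        }
    }
}
```

Running it prints test -> TERM, test* -> PREFIX, te?t -> WILDCARD and te*st -> WILDCARD, one per line. The length parameter matters because a parser's buffer is often reused and longer than the current token.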

dumpBackup.php causes DB connection error: Unknown error
Following the simple index creation tutorial "Building the index", I tried to run

    php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml wikidb

but the script throws the error mentioned above. After a lot of trouble and a close look at the script, I found a solution to this problem. The problem exists because dumpBackup.php requires the file "includes/backup.inc". This file does the main backup work and uses some MediaWiki variables ($wg...). That is no problem when dumpBackup.php runs inside MediaWiki, but as a standalone console script it is missing these $wg... parameters. So dumpBackup.php uses empty strings for $wgDBtype, $wgDBadminuser, $wgDBadminpassword, $wgDBname and $wgDebugDumpSql, and this causes "DB connection error: Unknown error" at run time. I solved this with a self-written PHP wrapper script, which only initializes these variables and then simply includes dumpBackup.php, and now it works fine. This is my PHP wrapper script:

    <?php
    # dumpBackupInit - Wrapper script to run the MediaWiki XML dump "dumpBackup.php" correctly
    # @author: Stefan Furcht
    # @version: 1.0
    # @require: /srv/www/htdocs/wiki/maintenance/dumpBackup.php

    # The following variables must be set to get dumpBackup.php to work;
    # you'll find these values in the DB section of your MediaWiki config, LocalSettings.php
    $wgDBtype = 'mysql';
    $wgDBadminuser = "[MySQL-Username]";
    $wgDBadminpassword = "[MySQL-Usernames-Password]";
    $wgDBname = '[mediaWiki-Database-scheme]';
    $wgDebugDumpSql = 'true';

    # The XML dumper 'dumpBackup.php' requires the variables set above to run;
    # simply include the original dumpBackup script
    require_once("/srv/www/htdocs/wiki/maintenance/dumpBackup.php");
    ?>

You can now use this script just like dumpBackup.php, except that it will (hopefully) run correctly. Example:

    php dumpBackupInit.php --current > WikiDatabaseDump.xml

I hope this will help you. Please excuse my probably bad English.

Regards -Stefan- 12 September 2007
 * dumpBackup.php uses AdminSettings.php (and not LocalSettings.php), so you need to set it up (basically you would rename AdminSettings.sample and fill in the data). What would go in AdminSettings.php is exactly what you provide in your wrapper; see Manual:System_administration. --Rainman 16:12, 12 September 2007 (UTC)

Thank you very much. I had never read what 'AdminSettings.php' does exactly. After setting these variables there, it works fine, so you can delete my "wrapper script" from this discussion page. But perhaps it would be useful to mention explicitly on the extension page that 'AdminSettings.php' must be set up to run 'dumpBackup.php', because some people may never have had to deal with this file before. Thanks for this great extension. -Stefan- 79.211.199.66 08:14, 20 September 2007 (UTC)

lsearchd killed in virtual hosting environment
When running lsearchd in a virtual hosting environment, it would work for 10-20 seconds or so, then it would fail with the message "killed." Thanks to Rainman's help, I verified that the resource requirements of the application exceeded the capacity available in the virtual hosting environment (whether it was the size of the JVM or number of threads, I was never sure.) It runs fine and with modest resource requirements on a dedicated server. Dbkayanda 20:44, 14 October 2007 (UTC)

Also, I notice in lsearch.conf there are a number of variables for the Storage backend:


 * Storage.username
 * Storage.password

etc. Do these need to be modified to my environment, or do they get ignored? --15 September 2007


 * These are for the incremental updater (it stores articles rank info). If you don't use it, it gets ignored. --Rainman 17:23, 15 September 2007 (UTC)

Error while initially creating index
I am trying to get the LuceneSearch-Extension running on a mediawiki-1.11.0rc1 installation under opensuse10.2. LuceneSearch.jar and mwdumper.jar were generated from svn sources with ant and javac-version 1.5.0_12. I followed the instructions, but when I try to build the index, I get a Null-pointer exception:

    me@mypc:~/var/lucene> java -cp ~/bin/lucene-search-2/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb_TEST.xml wikidb_TEST
    MediaWiki Lucene search indexer - index builder from xml database dumps.
    Trying config file at path /home/muenzebrock/.lsearch.conf
    0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
    8   [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
    324 pages (1.213,483/sec), 324 revs (1.213,483/sec)
    316 [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
    375 [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
    377 [main] WARN  org.wikimedia.lsearch.util.Localization  - Error processing message file at file:///srv/www/htdocs/php/mediawiki1.11.0rc1/languages/messages/MessagesEn.php
    378 [main] WARN  org.wikimedia.lsearch.util.Localization  - Could not load localization for En
    324 pages (2.677,686/sec), 324 revs (2.677,686/sec)
    465 [main] INFO  org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles...
    Exception in thread "main" java.lang.NullPointerException
        at java.io.File. (File.java:194)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
        at org.apache.lucene.index.IndexWriter. (IndexWriter.java:204)
        at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
        at org.wikimedia.lsearch.importer.SimpleIndexWriter. (SimpleIndexWriter.java:49)
        at org.wikimedia.lsearch.importer.DumpImporter. (DumpImporter.java:39)
        at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)

I played with the Indexes.path-variable in lsearch.conf, but with no luck. --19 September 2007
 * Do you have permissions to write to directory you set as Indexes.path in /home/muenzebrock/.lsearch.conf ? --Rainman 14:13, 19 September 2007 (UTC)
 * Yes. For debugging, I set it to be world-writable. --205.175.225.24 14:20, 19 September 2007 (UTC)
 * You can do imports only at the indexer, so, did you set your lsearch-global.conf right? i.e. assign the index wikidb_TEST to your host mypc (not localhost or 127.0.0.1) in the Index section? --Rainman 14:47, 19 September 2007 (UTC)
 * This is the part of lsearch-global.conf that I touched (i.e. the rest is similar to the file in svn):

    [Database]
    wikidb_TEST : (single) (language,de) (warmup,100)
    # databases can be written as {url}, where url contains list of dbs
    #wikilucene : (single) (language,en) (warmup,0)
    #wikidev : (single) (language,sr)
    #wikilucene : (nssplit,3) (nspart1,[0]) (nspart2,[4,5,12,13]), (nspart3,[])
    #wikilucene : (language,en) (warmup,10)

    # Search groups
    # Index parts of a split index are always taken from the node's group
    # host : db1.part db2.part
    # Multiple hosts can search multiple dbs (N-N mapping)
    [Search-Group]
    oblak : wikidb_TEST
    #oblak : wikilucene wikidev

    # Index nodes
    # host: db1.part db2.part
    # Each db.part can be indexed by only one host
    [Index]
    oblak : wikidb_TEST
    #oblak: wikilucene wikidev


 * Now I seem to recognize my failure: I should have replaced oblak with my hostname, right? I was wondering what this should mean anyway ;-) Thanks for your quick help on this. --205.175.225.24 15:00, 19 September 2007 (UTC)

This error can also occur if you follow the installation instructions exactly and use a FQDN in the [Search-Group] and [Index] sections. Use only the hostname part of the $HOSTNAME, omitting the domain name part, if it is included. -- 216.143.51.66 15:54, 7 February 2008 (UTC)


 * Hmm, I got this same error and fixed it by adding my complete hostname and domain to the various config files and the hostname file. In my case   didn't work but   did.  --Cneubauer


 * Another experience: the installation manual says to use the environment variable $HOSTNAME in global.conf. On SuSE, I found that you need to use the complete hostname as given in /etc/HOSTNAME! --195.216.198.100 10:58, 12 June 2008 (UTC)
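Several posts above come down to the host name in the [Search-Group] and [Index] sections not matching the machine the importer runs on, and they disagree on whether the short name or the full name is needed. Below is a hypothetical diagnostic; it is not part of lucene-search and makes no claim about which name lsearchd expects. It lists the hosts that [Index] assigns a database to and prints the local host name and canonical name for comparison. It assumes Java 7+ and simple "host : db1 db2" lines like the examples on this page, so {url} forms and db.part entries are not handled.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class IndexSectionCheck {
    // Small built-in sample, modelled on the lsearch-global.conf excerpts on this page
    private static final String SAMPLE =
            "[Database]\nwikidb : (single) (language,de) (warmup,10)\n"
            + "[Search-Group]\nrainbow : wikidb\n"
            + "[Index]\nrainbow : wikidb\n";

    /** Hosts that the [Index] section assigns {@code db} to; only plain "host : db1 db2" lines are understood. */
    static List<String> indexHostsFor(String config, String db) {
        List<String> hosts = new ArrayList<>();
        boolean inIndex = false;
        for (String raw : config.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;                                 // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                inIndex = line.equals("[Index]");         // a new section starts
                continue;
            }
            int colon = line.indexOf(':');
            if (!inIndex || colon < 0) {
                continue;
            }
            String host = line.substring(0, colon).trim();
            for (String entry : line.substring(colon + 1).trim().split("\\s+")) {
                if (entry.equals(db)) {
                    hosts.add(host);
                    break;
                }
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        String config = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : SAMPLE;
        String db = args.length > 1 ? args[1] : "wikidb";
        System.out.println("[Index] hosts for " + db + ": " + indexHostsFor(config, db));

        InetAddress local = InetAddress.getLocalHost();   // throws UnknownHostException if unresolvable
        System.out.println("local host name: " + local.getHostName());
        System.out.println("canonical name:  " + local.getCanonicalHostName());
    }
}
```

Run it as java IndexSectionCheck /path/to/lsearch-global.conf wikidb. With no arguments it parses the built-in sample. An empty host list, or a host that matches neither printed name, points at the kind of mismatch reported above.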

Hi, I've got a similar error:

    root@rainbow:/usr/local/search/ls2 # java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /srv/www/htdocs/mwiki/wikidb.xml wikidb
    MediaWiki Lucene search indexer - index builder from xml database dumps.
    Trying config file at path /root/.lsearch.conf
    Trying config file at path /usr/local/search/ls2/lsearch.conf
    1    [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
    15   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for De
    507  [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
    114 pages (118.626/sec), 114 revs (118.626/sec)
    1666 [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
    114 pages (428.571/sec), 114 revs (428.571/sec)
    2044 [main] INFO  org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles...
    Exception in thread "main" java.lang.NullPointerException
        at java.io.File. (File.java:194)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
        at org.apache.lucene.index.IndexWriter. (IndexWriter.java:204)
        at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
        at org.wikimedia.lsearch.importer.SimpleIndexWriter. (SimpleIndexWriter.java:49)
        at org.wikimedia.lsearch.importer.DumpImporter. (DumpImporter.java:39)
        at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)

My configs :

    root@rainbow:/usr/local/search/ls2 # cat lsearch-global.conf | grep ^[^#]
    [Database]
    wikidb : (single) (language,de) (warmup,10)
    [Search-Group]
    rainbow : wikidb
    [Index]
    rainbow : wikidb
    [Index-Path]
     : /usr/local/search/indexes
    [OAI]
    wikidd : http://rainbow.local.com/mwiki/index.php
    [Properties]
    Database.suffix=itowiki_
    ExactCase.suffix=itowiki_
    [Namespace-Prefix]
    all :
    [0] : 0
    [1] : 1
    [2] : 2
    [3] : 3
    [4] : 4
    [5] : 5
    [6] : 6
    [7] : 7
    [8] : 8
    [9] : 9
    [10] : 10
    [11] : 11
    [12] : 12
    [13] : 13
    [14] : 14
    [15] : 15

and the other config :

    root@rainbow:/usr/local/search/ls2 # cat lsearch.conf | grep ^[^#]
    MWConfig.global=file:///usr/local/search/ls2/lsearch-global.conf
    MWConfig.lib=/usr/local/search/ls2/lib
    Indexes.path=/usr/local/search/indexes
    Search.updateinterval=1
    Search.updatedelay=0
    Search.checkinterval=30
    Index.snapshotinterval=5
    Index.maxqueuecount=5000
    Index.maxqueuetimeout=12
    Storage.master=rainbow
    Storage.username=root
    Storage.password=mysecret
    Storage.adminuser=root
    Storage.adminpass=mysecret
    Storage.useSeparateDBs=false
    Storage.defaultDB=lsearch
    Storage.lib=/usr/local/search/ls2/sql
    SearcherPool.size=3
    Localization.url=file:///srv/www/htdocs/mwiki/languages/messages
    Logging.logconfig=/usr/local/search/ls2/lsearch.log4j
    Logging.debug=false

and finally:

    root@rainbow:/usr/local/search/ls2 # cat lsearch.log4j | grep ^[^#]
    log4j.rootLogger=INFO, A1
    log4j.appender.A1=org.apache.log4j.ConsoleAppender
    log4j.appender.A1.layout=org.apache.log4j.PatternLayout
    log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Kind regards, Stefan 17 January 2008

Multiple wikis in one database
Is there a way to index and search multiple wikis that are contained within one database? I've tried a few things in the configuration and command lines, and I've not figured out a way to do this.

Thanks! --Laduncan 16:31, 8 October 2007 (UTC)


 * If you want to get search results combined from multiple wikis, that is still not supported (as of v2.0). Next minor release might show some improvements in that direction.. --Rainman 16:55, 8 October 2007 (UTC)


 * Thanks for the quick info! --Laduncan 20:31, 8 October 2007 (UTC)


 * I have Lucene search running on an installation that contains 3 wikis sharing the same database, using prefixes. The search results give the wrong count: when I search within one wiki, it seems to actually search all 3 wikis but list only the hits belonging to the current wiki. That way I get, for example, only 6 hits listed on the result page, because the other, invisible hits were in the other two wikis. How do I get the first 20 hits for the current wiki listed? (I do not want to see the hits from the other wikis, and I do not want them counted.) --83.202.49.58 10:06, 16 March 2009 (UTC)
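The symptom described above, a page with fewer hits than requested, is what you would expect if the page is cut to N hits before hits from the other wikis are discarded. The toy sketch below contrasts that with filtering first. It is an illustration only and does not describe how lsearchd or the extension actually handles table prefixes. The "wiki:Title" hit labels and the method names are hypothetical, and it assumes Java 7+.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixPaging {
    /** Takes the first {@code limit} hits, then drops hits from other wikis: the page can come out short. */
    static List<String> pageThenFilter(List<String> hits, String wiki, int limit) {
        List<String> page = new ArrayList<>();
        for (String hit : hits.subList(0, Math.min(limit, hits.size()))) {
            if (hit.startsWith(wiki + ":")) {
                page.add(hit);
            }
        }
        return page;
    }

    /** Drops hits from other wikis first, then stops once {@code limit} hits are collected. */
    static List<String> filterThenPage(List<String> hits, String wiki, int limit) {
        List<String> page = new ArrayList<>();
        for (String hit : hits) {
            if (page.size() == limit) {
                break;
            }
            if (hit.startsWith(wiki + ":")) {
                page.add(hit);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        // Ranked hits from three wikis sharing one database (hypothetical labels)
        List<String> hits = Arrays.asList("a:Foo", "b:Foo", "a:Bar", "c:Foo", "a:Baz", "b:Bar");
        System.out.println(pageThenFilter(hits, "a", 3));   // [a:Foo, a:Bar]
        System.out.println(filterThenPage(hits, "a", 3));   // [a:Foo, a:Bar, a:Baz]
    }
}
```

The same ordering question applies to the total count: it should be counted after the per-wiki filter, not before it.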

Requiring less exact matches
It appears that the search in the fulltext is doing an implicit AND -- that is, all the words need to be in the document for it to appear in the results list.

For what I'm doing, I'd like to have the default be "OR," and let the ranking algorithm hopefully bring the most relevant content to the top. (The queries my users will be using will be long and complex, and will generally match nothing with "AND.")

I can manually search with OR between the words, but I wanted to know if I could change the configuration of the extension to have it do that by default.

Thanks in advance, Dbkayanda 00:57, 15 October 2007 (UTC)


 * Personally, I think ranking is not smart enough to give the best results if the default operator is OR, but you can change it by hacking the code a bit. In WikiQueryParser.java, on line 112 there is:, replace the last part with  . --Rainman 14:41, 15 October 2007 (UTC)


 * Worked like a charm. Thanks, as always, for your help. --16 October 2007
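The toy model below makes the AND/OR trade-off discussed above concrete. It is plain Java with no Lucene. The class and method names are hypothetical, and the word-overlap count is only a crude stand-in for real relevance ranking, not how WikiQueryParser or Lucene scores documents. With a long query, AND rejects every document. OR accepts any document sharing at least one word and leaves the ordering entirely to the score, which is why ranking quality matters so much once OR is the default.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DefaultOperatorDemo {
    /** AND semantics: a document matches only if it contains every query word. */
    static boolean matchesAll(Set<String> doc, List<String> query) {
        return doc.containsAll(query);
    }

    /** OR semantics: how many query words the document contains; 0 means no match. */
    static int overlap(Set<String> doc, List<String> query) {
        int n = 0;
        for (String word : query) {
            if (doc.contains(word)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("lucene", "search", "windows", "symlink", "index");
        List<Set<String>> docs = Arrays.asList(
                new HashSet<>(Arrays.asList("lucene", "search", "index", "daemon")),
                new HashSet<>(Arrays.asList("windows", "symlink", "patch")),
                new HashSet<>(Arrays.asList("php", "extension")));
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc " + i + ": AND=" + matchesAll(docs.get(i), query)
                    + ", OR overlap=" + overlap(docs.get(i), query));
        }
    }
}
```

Running it prints "doc 0: AND=false, OR overlap=3", "doc 1: AND=false, OR overlap=2" and "doc 2: AND=false, OR overlap=0". Under AND nothing matches, while under OR the first two documents match and their order depends on the score.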

Index of attachments (doc, pdf, xls)
Hi Robert,

I found the cool mediawiki extension for the lucene search engine. Is there a possibility to index all attachments like PDF, HTML, DOC and XLS with this addon?

I found some information in the Lucene FAQ (http://wiki.apache.org/lucene-java/LuceneFAQ#head-37523379241b88fd90bcd1de81b74e7ec8843f72) on how to index attachments. Is it possible to use such indexed files with the MediaWiki extension you wrote?

Thanks a lot! Alex--14:51, 22 October 2007 (UTC)


 * Yes, there are libraries that can parse pdf, doc,.. that work with lucene, but I haven't got around to include them in the extension yet, and I probably won't have time in next few months ... If you really need it, you can try to hack it yourself, you would probably want Importer to fetch the media file (maybe with ?action=raw), and then construct an Article object whose contents would be the parsed text and pass it to the indexer. --Rainman 21:08, 22 October 2007 (UTC)


 * Are all namespaces indexed in the current LuceneSearch extension, including the Image namespace that contains all the file data? Does the extension then index only the most recent file description? Where do I have to start in LuceneSearch_body.php?
 * Thanks! Alex --12:06, 23 October 2007 (UTC)


 * All articles from the database get indexed. LuceneSearch_body.php is just an interface to the Java daemon that does all the work, so you'll need to modify the Java code. What currently gets indexed is just the image descriptions; the media files themselves are stored outside the database, in the file system... --Rainman 10:20, 23 October 2007 (UTC)

Binary version of LuceneSearch.jar?
Hello,

Where can I get a binary version of LuceneSearch.jar? I don't have ant on the server this is being installed on, and I tried building LuceneSearch.jar on my desktop computer using ant, but it failed with errors about missing MediaWiki Java classes. I'd prefer a binary, if possible, so I can get this up and running ASAP.

Ben --8 December 2007

Soundex searches?
Will this extension support Soundex-like searches for spelling mistakes, etc.? --12 December 2007
 * Probably in the next major release (hopefully end of january). --Rainman 14:11, 13 December 2007 (UTC)

Special page search complains about "problem with wiki search"
After following the instructions as closely as possible, the plugin renders the special page like this:

[ search_string on text area     ] [   dropdown_list ] [search_button]  There was a problem with the wiki search. This is probably temporary; try again in a few moments, or you can search the wiki through an external search service:

The content in square brackets is just my attempt to recreate the GUI.

Is there something missing in the way it is using the host to do the search? --Cartoro 00:00, 20 December 2007 (UTC)
 * Check your log files for more info about what went wrong ... --Rainman 18:31, 20 December 2007 (UTC)
 * Yes, I wanted to see that... but I couldn't find any log files.... sorry, silly question, but where are they? Could this be a problem with accessing the actual DB? --Cartoro 22:11, 20 December 2007 (UTC)
 * Extension:LuceneSearch --Rainman 22:17, 20 December 2007 (UTC)

Port 8123 already in use.
Hi again,

I'm still trying to make it run. I've found that most of the problems are due to misconfiguration on my part. Java error messages are not very helpful at first, but that is the case with any new functionality one comes across.

When I tried to run it, it came up with this:

    java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
            java.net.SocketTimeoutException: Read timed out
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
        at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
        at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
        at org.wikimedia.lsearch.interoperability.RMIServer.register(RMIServer.java:24)
        at org.wikimedia.lsearch.interoperability.RMIServer.bindRMIObjects(RMIServer.java:60)
        at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:52)
    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readByte(DataInputStream.java:248)
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)

Further down, it came up with this message:

    120488 [Thread-1] FATAL org.wikimedia.lsearch.frontend.HTTPIndexServer - Dying: bind error: Address already in use

Has anybody seen this? I still think it is a trivial error on my part, but I cannot find its cause. --Cartoro 00:00, 20 December 2007 (UTC)
 * The first trace is RMI complaining that it cannot register the networked objects. That should be harmless unless you're using distributed searching. The second message means what it says: some other application is using the ports (the searcher is on 8123 by default, and the indexer on 8321). Make sure you don't have an old instance of lsearchd still running. Use the command nmap localhost to find out which ports are taken. If those default ports are taken by other applications, change them in lsearch.conf and in LocalSettings.php... --Rainman 13:36, 20 December 2007 (UTC)
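As an alternative to nmap, here is a minimal Java sketch that tries to bind each default port mentioned in the reply above. Binding the port is essentially the operation that failed with "bind error: Address already in use". It assumes Java 7+, and the class and method names are hypothetical. It checks only the two default ports, so adjust the array if you changed them in lsearch.conf.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortCheck {
    /** Returns true if a listening socket can be bound to {@code port}, i.e. no other process holds it right now. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;    // bound successfully; the socket is closed again when the try block exits
        } catch (IOException e) {
            return false;   // typically java.net.BindException: Address already in use
        }
    }

    public static void main(String[] args) {
        int[] ports = {8123, 8321};   // default searcher and indexer ports, per the reply above
        for (int port : ports) {
            System.out.println(port + (isFree(port) ? ": free" : ": already in use"));
        }
    }
}
```

A "free" result only holds at the moment of the check. If a port shows as in use, look for a leftover lsearchd process before changing the configuration.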

Error when running
I am getting the following error when running :

    53664-jpbaello:/srv/www/htdocs/search/ls2 # ./lsearchd
    RMI registry started.
    Trying config file at path /root/.lsearch.conf
    Trying config file at path /srv/www/htdocs/search/ls2/lsearch.conf
    Error resolving local hostname. Make sure that hostname is setup correctly.
    java.net.UnknownHostException: 53664-jpbaello: 53664-jpbaello
        at java.net.InetAddress.getLocalHost(InetAddress.java:1346)
        at org.wikimedia.lsearch.config.GlobalConfiguration.determineInetAddress(GlobalConfiguration.java:124)
        at org.wikimedia.lsearch.config.GlobalConfiguration. (GlobalConfiguration.java:102)
        at org.wikimedia.lsearch.config.GlobalConfiguration.getInstance(GlobalConfiguration.java:112)
        at org.wikimedia.lsearch.config.Configuration. (Configuration.java:105)
        at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
        at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)
    Exception in thread "main" java.lang.NullPointerException
        at java.util.Hashtable.get(Hashtable.java:336)
        at org.wikimedia.lsearch.config.GlobalConfiguration.makeIndexIdPool(GlobalConfiguration.java:468)
        at org.wikimedia.lsearch.config.GlobalConfiguration.read(GlobalConfiguration.java:413)
        at org.wikimedia.lsearch.config.GlobalConfiguration.readFromURL(GlobalConfiguration.java:247)
        at org.wikimedia.lsearch.config.Configuration. (Configuration.java:116)
        at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
        at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)

And then it goes back to the command prompt. I believe this is an error, because I cannot get it to create the index. I'm a little new to this, though, and not sure whether I am doing things right. Also, sorry if I am not posting this correctly! Any ideas? --Think411 22 December 2007


 * As the error message suggests, your hostname seems to be wrong. Is "53664-jpbaello" really your hostname? Use "echo $HOSTNAME" to verify this. Check if this hostname correctly maps to your IP in /etc/hosts. Or, try using your IP instead of your hostname. --Rainman 12:21, 22 December 2007 (UTC)
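The trace above shows the failure starting in InetAddress.getLocalHost(), called from GlobalConfiguration.determineInetAddress. The minimal Java sketch below makes that same JDK call on its own, so you can check a fix to the hostname or /etc/hosts without starting lsearchd. The class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostnameCheck {
    /** Resolves the local host via InetAddress.getLocalHost(), the call at the top of the trace above. */
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            return "cannot resolve local hostname: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If this prints the "cannot resolve" message, lsearchd will fail the same way. Once the hostname maps to your IP in /etc/hosts, it should print the name and address instead.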

=2008=

Compiling to create lucenesearch.jar failed
I am trying to install the Lucene engine for our wiki, but compiling it fails.

Ant gives back a lot of error messages during the compilation, errors like:

    [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:331: cannot find symbol
    [javac] symbol  : class Hits
    [javac] location: class org.wikimedia.lsearch.SearchState
    [javac]            Hits hits = searcher.search(new TermQuery(
    [javac]                ^
    [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:331: cannot find symbol
    [javac] symbol  : class TermQuery
    [javac] location: class org.wikimedia.lsearch.SearchState
    [javac]            Hits hits = searcher.search(new TermQuery(
    [javac]                                                ^
    [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:332: cannot find symbol
    [javac] symbol  : class Term
    [javac] location: class org.wikimedia.lsearch.SearchState
    [javac]                            new Term("key", key)));
    [javac]                                    ^
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 85 errors

Can you help me to solve these error messages or provide a binary?

Many thanks in advance. --Phaidros 12 January 2008


 * Are you compiling with a Sun Java 1.5+ compiler? If so, can you provide the beginning of the error log? --Rainman 00:55, 13 January 2008 (UTC)

Yes, I am using the openSUSE 10.3 distribution and javac 1.5.0_13. I hope the error messages below are enough; sorry for my limited experience with Java build processes.

I'll provide the first part and the last messages here:

    Apache Ant version 1.7.0 compiled on September 22 2007
    Buildfile: build.xml
    Detected Java version: 1.5 in: /usr/lib/jvm/java-1.5.0-sun-1.5.0_update13-sr2/jre
    Detected OS: Linux
    parsing buildfile /root/lucene/lucene-search/build.xml with URI = file:/root/lucene/lucene-search/build.xml
    Project base dir set to: /root/lucene/lucene-search
    [antlib:org.apache.tools.ant] Could not load definitions from resource org/apache/tools/ant/antlib.xml. It could not be found.
     [property] Loading /root/lucene-search.build.properties
     [property] Unable to find property file: /root/lucene-search.build.properties
     [property] Loading /root/build.properties
     [property] Unable to find property file: /root/build.properties
     [property] Loading /root/lucene/lucene-search/build.properties
     [property] Unable to find property file: /root/lucene/lucene-search/build.properties
    Property "current.year" has not been set
    Build sequence for target(s) `default' is [init, compile-core, compile, default]
    Complete build sequence is [init, compile-core, compile, default, package-tgz-src, jar-core, javadocs, package, package-zip, package-tgz, package-all-binary, dist, package-zip-src, package-all-src, dist-src, dist-all, jar, jar-src, clean, ]

init: [mkdir] Skipping /root/lucene/lucene-search/bin because it already exists. [mkdir] Skipping /root/lucene/lucene-search/dist because it already exists.

 compile-core:
 [mkdir] Skipping /root/lucene/lucene-search/bin because it already exists.
 [javac] wikimedia/lsearch/Article.java added as wikimedia/lsearch/Article.class doesn't exist.
 [javac] wikimedia/lsearch/ArticleList.java added as wikimedia/lsearch/ArticleList.class doesn't exist.
 [javac] wikimedia/lsearch/Configuration.java added as wikimedia/lsearch/Configuration.class doesn't exist.
 [javac] wikimedia/lsearch/DatabaseConnection.java added as wikimedia/lsearch/DatabaseConnection.class doesn't exist.
 [javac] wikimedia/lsearch/EnglishAnalyzer.java added as wikimedia/lsearch/EnglishAnalyzer.class doesn't exist.
 [javac] wikimedia/lsearch/EsperantoAnalyzer.java added as wikimedia/lsearch/EsperantoAnalyzer.class doesn't exist.
 [javac] wikimedia/lsearch/EsperantoStemFilter.java added as wikimedia/lsearch/EsperantoStemFilter.class doesn't exist.
 [javac] wikimedia/lsearch/MWDaemon.java added as wikimedia/lsearch/MWDaemon.class doesn't exist.
 [javac] wikimedia/lsearch/MWSearch.java added as wikimedia/lsearch/MWSearch.class doesn't exist.
 [javac] wikimedia/lsearch/NamespaceFilter.java added as wikimedia/lsearch/NamespaceFilter.class doesn't exist.
 [javac] wikimedia/lsearch/QueryStringMap.java added as wikimedia/lsearch/QueryStringMap.class doesn't exist.
 [javac] wikimedia/lsearch/SearchClientReader.java added as wikimedia/lsearch/SearchClientReader.class doesn't exist.
 [javac] wikimedia/lsearch/SearchDbException.java added as wikimedia/lsearch/SearchDbException.class doesn't exist.
 [javac] wikimedia/lsearch/SearchState.java added as wikimedia/lsearch/SearchState.class doesn't exist.
 [javac] wikimedia/lsearch/Title.java added as wikimedia/lsearch/Title.class doesn't exist.
 [javac] wikimedia/lsearch/TitlePrefixMatcher.java added as wikimedia/lsearch/TitlePrefixMatcher.class doesn't exist.
 [javac] Compiling 16 source files to /root/lucene/lucene-search/bin
 [javac] Using modern compiler
 dropping /root/lucene/lucene-search/bin/bin from path as it doesn't exist
 [javac] Compilation arguments:
 [javac] '-deprecation'
 [javac] '-d'
 [javac] '/root/lucene/lucene-search/bin'
 [javac] '-classpath'
 [javac] '/root/lucene/lucene-search/bin:/usr/share/java/ant.jar:/usr/share/java/ant-launcher.jar:/usr/share/java/jaxp_parser_impl.jar:/usr/share/java/xml-commons-apis.jar:/usr/share/java/ant/ant-antlr.jar:/usr/share/java/bcel.jar:/usr/share/java/ant/ant-apache-bcel.jar:/usr/share/java/bsf.jar:/usr/share/java/ant/ant-apache-bsf.jar:/usr/share/java/log4j.jar:/usr/share/java/ant/ant-apache-log4j.jar:/usr/share/java/oro.jar:/usr/share/java/ant/ant-apache-oro.jar:/usr/share/java/regexp.jar:/usr/share/java/ant/ant-apache-regexp.jar:/usr/share/java/xml-commons-resolver.jar:/usr/share/java/ant/ant-apache-resolver.jar:/usr/share/java/jakarta-commons-logging.jar:/usr/share/java/ant/ant-commons-logging.jar:/usr/share/java/javamail.jar:/usr/share/java/jaf.jar:/usr/share/java/ant/ant-javamail.jar:/usr/share/java/jdepend.jar:/usr/share/java/ant/ant-jdepend.jar:/usr/share/java/ant/ant-jmf.jar:/usr/share/java/junit.jar:/usr/share/java/ant/ant-junit.jar:/usr/share/java/ant/ant-nodeps.jar:/usr/lib/jvm/java/lib/tools.jar:/usr/share/ant/lib/ant-apache-resolver-1.7.0.jar:/usr/share/ant/lib/ant-apache-bsf.jar:/usr/share/ant/lib/ant-nodeps.jar:/usr/share/ant/lib/ant-commons-logging.jar:/usr/share/ant/lib/ant-junit.jar:/usr/share/ant/lib/ant-javamail-1.7.0.jar:/usr/share/ant/lib/ant-junit-1.7.0.jar:/usr/share/ant/lib/ant-launcher.jar:/usr/share/ant/lib/ant-apache-log4j.jar:/usr/share/ant/lib/ant-apache-oro-1.7.0.jar:/usr/share/ant/lib/ant-javamail.jar:/usr/share/ant/lib/ant-apache-log4j-1.7.0.jar:/usr/share/ant/lib/ant-apache-bcel-1.7.0.jar:/usr/share/ant/lib/ant-nodeps-1.7.0.jar:/usr/share/ant/lib/ant-jmf.jar:/usr/share/ant/lib/ant-jmf-1.7.0.jar:/usr/share/ant/lib/ant-commons-logging-1.7.0.jar:/usr/share/ant/lib/ant-jdepend-1.7.0.jar:/usr/share/ant/lib/ant-1.7.0.jar:/usr/share/ant/lib/ant-apache-regexp.jar:/usr/share/ant/lib/ant-apache-oro.jar:/usr/share/ant/lib/ant-apache-resolver.jar:/usr/share/ant/lib/ant-jdepend.jar:/usr/share/ant/lib/ant-antlr.jar:/usr/share/ant/lib/ant-antlr-1.7.0.jar:/usr/share/ant/lib/ant-apache-regexp-1.7.0.jar:/usr/share/ant/lib/ant-apache-bcel.jar:/usr/share/ant/lib/ant-apache-bsf-1.7.0.jar:/usr/share/ant/lib/ant-launcher-1.7.0.jar:/usr/share/ant/lib/ant.jar'
 [javac] '-sourcepath'
 [javac] '/root/lucene/lucene-search/org'
 [javac] '-encoding'
 [javac] 'utf-8'
 [javac] '-g'
 [javac]
 [javac] The ' characters around the executable and arguments are
 [javac] not part of the command.
 [javac] Files to be compiled:
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/Article.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/ArticleList.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/Configuration.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/DatabaseConnection.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/MWDaemon.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/MWSearch.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/NamespaceFilter.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/QueryStringMap.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/SearchClientReader.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/SearchDbException.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/Title.java
 [javac]    /root/lucene/lucene-search/org/wikimedia/lsearch/TitlePrefixMatcher.java
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:28: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.Analyzer;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:29: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.LowerCaseTokenizer;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:30: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.PorterStemFilter;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:31: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.TokenStream;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:37: cannot find symbol
 [javac] symbol: class Analyzer
 [javac] public class EnglishAnalyzer extends Analyzer {
 [javac]                                     ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:38: cannot find symbol
 [javac] symbol : class TokenStream
 [javac] location: class org.wikimedia.lsearch.EnglishAnalyzer
 [javac]    public final TokenStream tokenStream(String fieldName, Reader reader) {
 [javac]                     ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:31: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.Analyzer;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:32: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.LowerCaseTokenizer;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:33: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.Token;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:34: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.TokenStream;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:36: cannot find symbol
 [javac] symbol: class Analyzer
 [javac] public class EsperantoAnalyzer extends Analyzer{
 [javac]                                       ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:37: cannot find symbol
 [javac] symbol : class TokenStream
 [javac] location: class org.wikimedia.lsearch.EsperantoAnalyzer
 [javac]    public final TokenStream tokenStream(String fieldName, Reader reader) {
 [javac]                     ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:31: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.Token;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:32: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.TokenStream;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:33: package org.apache.lucene.analysis does not exist
 [javac] import org.apache.lucene.analysis.TokenFilter;
 [javac]                                  ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:36: cannot find symbol
 [javac] symbol: class TokenFilter
 [javac] public class EsperantoStemFilter extends TokenFilter {
 [javac]                                         ^
 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:37: cannot find symbol
 [javac] symbol : class TokenStream
 [javac] location: class org.wikimedia.lsearch.EsperantoStemFilter
 [javac]    public EsperantoStemFilter(TokenStream tokenizer) {

--- snip --- cut some lines here --- snip ---

 [javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:332: cannot find symbol
 [javac] symbol : class Term
 [javac] location: class org.wikimedia.lsearch.SearchState
 [javac]                            new Term("key", key)));
 [javac]                                    ^
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 85 errors

 BUILD FAILED
 /root/lucene/lucene-search/build.xml:55: Compile failed; see the compiler error output for details.
         at org.apache.tools.ant.taskdefs.Javac.compile(Javac.java:999)
         at org.apache.tools.ant.taskdefs.Javac.execute(Javac.java:820)
         at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:585)
         at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
         at org.apache.tools.ant.Task.perform(Task.java:348)
         at org.apache.tools.ant.Target.execute(Target.java:357)
         at org.apache.tools.ant.Target.performTasks(Target.java:385)
         at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
         at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
         at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
         at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
         at org.apache.tools.ant.Main.runBuild(Main.java:698)
         at org.apache.tools.ant.Main.startAnt(Main.java:199)
         at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
         at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
--Phaidros 24 January 2008
 * Looks like your ant is broken and cannot find the relevant libraries. I've compiled the package and put it here.--Rainman 11:01, 26 January 2008 (UTC)
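The errors above all reduce to "package org.apache.lucene.analysis does not exist", and the `-classpath` that Ant printed contains Ant, log4j, JUnit and similar jars but no Lucene jar. A quick way to confirm this outside of Ant is to try loading the Lucene classes the build needs from a given classpath. The following is a minimal sketch; `ClasspathCheck` is a hypothetical helper, not part of lucene-search, and the class names are simply the ones javac complained about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of lucene-search): reports which
// of the given class names cannot be found on the current classpath.
public class ClasspathCheck {
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<String>();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be located
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Lucene classes named in the javac errors above
        List<String> notFound = missing(
                "org.apache.lucene.analysis.Analyzer",
                "org.apache.lucene.analysis.TokenStream",
                "org.apache.lucene.index.Term",
                "org.apache.lucene.search.Hits",
                "org.apache.lucene.search.TermQuery");
        if (notFound.isEmpty()) {
            System.out.println("All Lucene classes found");
        } else {
            System.out.println("Missing from classpath: " + notFound);
        }
    }
}
```

Run it with the same `-cp` you expect the build to use (for example `.` plus the path to your Lucene jar); if the Lucene classes are reported missing there too, the problem is the classpath and not the lucene-search sources.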

Cannot bind RMIMessenger exception: non-JRMP server at remote endpoint
Hello everyone,

I'm quite new to Lucene and I have a problem: I can't get Lucene Java working on one of my servers. I've set it up on another server for MediaWiki and it works fine.

It's a GNU/Linux Ubuntu Edgy i686 box with kernel 2.6.17-11-server, running Apache 2.0 with PHP5 for MediaWiki and some other stuff like Tomcat and JBoss. Installed Java packages: j2re1.4, j2sdk1.4, java-common, libgcj-common, sun-java5-bin, sun-java5-demo, sun-java5-jdk and sun-java5-jre.

On the first server (a fresh 64-bit Ubuntu Gutsy with almost nothing else running) it worked fine; I can use Lucene to search my wiki. On the second server, here is the error when I try to start the engine:

 www-data@myserver:/usr/local/search/ls2$ ./lsearchd .
 Trying config file at path /var/www/.lsearch.conf
 Trying config file at path /usr/local/search/ls2/lsearch.conf
 0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 java.rmi.ConnectIOException: non-JRMP server at remote endpoint
         at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
         at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
         at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
         at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
         at org.wikimedia.lsearch.interoperability.RMIServer.register(RMIServer.java:24)
         at org.wikimedia.lsearch.interoperability.RMIServer.bindRMIObjects(RMIServer.java:60)
         at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:52)
 76  [main] WARN  org.wikimedia.lsearch.interoperability.RMIServer  - Cannot bind RMIMessenger exception:non-JRMP server at remote endpoint

But NOTHING uses port 8321. I've tried another port and got the same problem. Any ideas how to solve this, please? Here is my contact:

Thanks, LMJ 15 January 2008
 * First verify that jboss, tomcat and lsearchd all run under sun-java5-bin (and not j2re1.4). If this is the case then maybe the RMI registry is colliding with jboss (so try stopping it if you can). If this appears to be the case, then you can either configure jboss not to use the port 1099, or edit RMIRegistry.java to use a different port (replace 1099 there with your port, and provide the port as param to getRegistry calls in RMIRegistry.java and RMIMessengerClient.java). --Rainman 15:05, 15 January 2008 (UTC)


 * Indeed Rainman, thanks for your help! Look at this:
 # lsof +i :1099
 COMMAND  PID    USER   FD   TYPE    DEVICE SIZE NODE NAME
 java   20832 syncron    7u  IPv4 149877937       TCP *:rmiregistry (LISTEN)
The port is used by the JBoss rmiregistry :-/ I need some extra help to change that port. Can we exchange emails about it, Rainman? I tried to contact you via your personal page, but I only read English and French ;) --16 January 2008


 * I've edited /usr/local/jboss-3.2.7/server/default/conf/jboss-service.xml and changed the port to 10999. It seems to work better ;) I've got another problem, but it seems to be an lsearch.conf-related issue. --22 January 2008
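The root cause in this thread was another process (JBoss) already listening on the RMI registry port 1099, even though the lsearchd HTTP ports looked free. Where `lsof` is not available, a minimal Java sketch can test whether a local TCP port can currently be bound. `PortCheck` is a hypothetical helper, not part of lucene-search, and the port list is just the three ports mentioned in this thread; a port that is free when checked is not guaranteed to stay free.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper: returns true if a server socket can be bound to the
// given local port right now, i.e. nothing else is already listening on it.
public class PortCheck {
    static boolean isPortFree(int port) {
        ServerSocket socket = null;
        try {
            socket = new ServerSocket(port); // binds on all local addresses
            return true;
        } catch (IOException e) {
            // java.net.BindException (an IOException) when the port is taken
            return false;
        } finally {
            if (socket != null) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // nothing useful to do here
                }
            }
        }
    }

    public static void main(String[] args) {
        // 1099: default RMI registry port; 8123 and 8321: lsearchd search/index servers
        int[] ports = {1099, 8123, 8321};
        for (int port : ports) {
            System.out.println(port + (isPortFree(port) ? " is free" : " is in use"));
        }
    }
}
```

Run it as the same user that starts lsearchd, while JBoss and Tomcat are running, so that it sees the same situation the daemon does.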

Daemon status
On the German Wikipedia, I am often irritated because changes in content are not reflected immediately by the full text search and – at the moment – I cannot see whether and when the changes have already or will be processed by the daemon. Therefore, I would like to know:


 * whether the daemon processes the changes chronologically so one could be certain that if one's changes were made at time T and the daemon has processed all changes up to T + 1, they will be reflected in the full text search, and
 * whether there is any way to obtain the daemon status (all changes up to T, n articles in queue, etc.) from a current or future Wikipedia installation.

Thanks, Tim Landscheidt 19:52, 7 February 2008 (UTC)


 * The index is updated around 5 am GMT every day on Wikimedia projects (when nothing goes wrong, which is most of the time). About 1): yes, it processes the changes chronologically. About 2): this interface is available, but only to system admins; everybody else should just wait until tomorrow for changes to be applied. --Rainman 10:07, 8 February 2008 (UTC)


 * Hmmm. If I search for "Lassithi" (note the double "s") now, I see that changes in de:Panagia i Kera (8 days ago), de:Kritsa (7 days ago), de:Ierapetra (10 days ago), de:Kera Kardiotissa (11 days ago), de:Griechische Toponyme (11 days ago), de:Venezianische Kolonien (9 days ago) and de:Sitia (11 days ago) have not been processed. Is that what you mean by "when nothing goes wrong"? :-) Would it be technically feasible to include the last time a change was successfully worked into the index in the result page, i. e. "All changes until T considered."? Tim Landscheidt 17:24, 8 February 2008 (UTC)
 * Yes, this seems to be a case of "if nothing is broken" :) One of the dewiki search servers (srv21) is broken: it stopped updating its index, seems to have a broken logrotate and possibly some other problems. We'll fix it when a sysadmin becomes available. Whenever you see changes not going in for more than a couple of days, you should report it. --Rainman 18:07, 8 February 2008 (UTC)
 * Ok, we tracked this down to a hard drive failure on srv21, now one just needs to wait for cache to expire (~12h) and you should get fresh results - thanks for the report! --Rainman 18:57, 8 February 2008 (UTC)
 * Thanks for the information :-). What would be the proper place to report such things in the future? Tim Landscheidt 21:32, 8 February 2008 (UTC)
 * Technical issues are usually reported via the IRC channel #wikimedia-tech, where all of the sysadmins are. If there's no-one online to fix the problem, you could submit a bug. You could also send me an e-mail via this wiki or leave a message on my talk page, since I'm more or less in charge of maintaining the search subsystem. --Rainman 21:44, 8 February 2008 (UTC)
 * Okay, I'll keep that in mind. Thanks again, Tim Landscheidt 22:53, 8 February 2008 (UTC)

Query String Syntax
Please document the subset of Lucene query string syntax that has been implemented. -- 216.143.51.66 22:52, 8 February 2008 (UTC)

Error running the Daemon
 RMI registry started.
 Trying config file at path /root/.lsearch.conf
 Trying config file at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/lsearch.conf
 0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 530  [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 602 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
 619 [main] ERROR org.wikimedia.lsearch.search.SearcherCache  - I/O Error opening index at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki : /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki/segments (No such file or directory)
 621 [main] ERROR org.wikimedia.lsearch.search.SearcherCache  - I/O Error opening index at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki : /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki/segments (No such file or directory)
 621 [main] WARN  org.wikimedia.lsearch.search.SearcherCache  - I/O error warming index for kck_wiki
 621 [Thread-3] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
 623 [Thread-2] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
 # . lsearchd

I'm getting this "No such file or directory" error. The directory exists; however, I don't know where the "segments" file comes from.

I ran this to create the indexes:

php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki

The wikidb.xml file exists in the httpdocs directory.

...and then I started the daemon.

Am I missing a trick?

Thanks

Andy Andy.thomas 19 February 2008


 * And what is the output from the importer? It should give you a success messages that it created the indexes and successfully made a snapshot. --Rainman 01:30, 20 February 2008 (UTC)
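The daemon error that started this thread says that `.../indexes/search/kck_wiki/segments` does not exist, i.e. the index directory exists but holds no Lucene index because the importer never produced one there. A minimal sketch of a sanity check on an index directory follows. It assumes that a Lucene index contains either a plain `segments` file (the older format, which is the file this daemon looked for) or a `segments_N` file, where N is a base-36 generation number (Lucene 2.1 and later); `IndexDirCheck` is a hypothetical helper, and the default path is the one from the log above.

```java
import java.io.File;

// Hypothetical sanity check: does this directory look like a Lucene index?
// Assumes an index holds a "segments" file (older format) or a "segments_N"
// file (Lucene 2.1+, N in base 36). "segments.gen" alone does not count.
public class IndexDirCheck {
    static boolean looksLikeLuceneIndex(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return false; // not a directory, or not readable
        }
        for (String name : names) {
            if (name.equals("segments") || name.matches("segments_[0-9a-z]+")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0]
                : "/var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki");
        if (looksLikeLuceneIndex(dir)) {
            System.out.println(dir + " contains a segments file");
        } else {
            System.out.println(dir + " has no segments file; the importer has not built an index here");
        }
    }
}
```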

I'm most likely doing something dumb (being a bit of a newbie), but this is what I get when I just run java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki:

Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/importer/Importer

--Andy 17:00, 20 February 2008 (GMT)


 * The java command you're running assumes that LuceneSearch.jar is in your current directory, the full command would be java -cp /full/path/to/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki
 * --Rainman 18:04, 20 February 2008 (UTC)

I'm getting further, thanks, that helped. Sorry, I know I'm being dumb and I apologise for asking you to hand-hold me like this, but now I get this:

 Trying config file at path /root/.lsearch.conf
 Trying config file at path /var/www/vhosts/kidneycancerknol.com/httpdocs/lsearch.conf
 0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 3   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 60   [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
 175 [main] FATAL org.wikimedia.lsearch.ranks.RankBuilder  - I/O error reading dump while getting titles from wikidb.xml
 175 [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
 179 [main] FATAL org.wikimedia.lsearch.ranks.RankBuilder  - I/O error reading dump while calculating ranks for from wikidb.xml
 Exception in thread "main" java.lang.NullPointerException
         at org.wikimedia.lsearch.importer.Importer.main(Importer.java:114)

Do I need to set the OAI settings in the global config? I've just kept them as the default. --Andy 18:30, 20 February 2008 (GMT)


 * No, you don't need OAI. It seems to me something is wrong with the XML file; it sure would be helpful if exceptions weren't suppressed :\ Unfortunately I cannot help you much more than that. Is wikidb.xml a valid XML file? Did you give the full path to it? --Rainman 01:00, 21 February 2008 (UTC)
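Rainman's two questions (is wikidb.xml valid XML, and is the path right?) can be checked independently of the importer. A minimal sketch using the JDK's built-in SAX parser (`javax.xml.parsers`, part of the standard library since Java 1.4) follows; it streams the file, prints the absolute path it actually opened, and reports the first well-formedness or I/O error. It only checks that the file is well-formed XML, not that it matches the MediaWiki export schema. `DumpCheck` is a hypothetical helper, not part of lucene-search.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical check: stream-parse a dump with the JDK's SAX parser and
// report the first problem. Streaming keeps memory use low for big dumps.
public class DumpCheck {
    /** Returns null if the file parses cleanly, otherwise a description of the problem. */
    static String checkWellFormed(File file) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(file, new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return "not well-formed XML: " + e.getMessage();
        } catch (IOException e) {
            return "cannot read file: " + e.getMessage();
        } catch (ParserConfigurationException e) {
            return "parser setup failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        File dump = new File(args.length > 0 ? args[0] : "wikidb.xml");
        // Shows which file a relative name actually resolves to
        System.out.println("Checking " + dump.getAbsolutePath());
        String problem = checkWellFormed(dump);
        System.out.println(problem == null ? "OK: well-formed XML" : problem);
    }
}
```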

Exception in thread "main" java.lang.UnsupportedClassVersionError
Hi, I use the following configuration:


 * MediaWiki: 1.11.0
 * PHP: 5.2.5 (apache2handler)
 * MySQL: 5.0.51

If I call this:

java -cp /usr/local/search/ls2/ls2-bin/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s basiswikidb.xml basiswiki

I get the error:

 Exception in thread "main" java.lang.UnsupportedClassVersionError: org/wikimedia/lsearch/importer/Importer (Unsupported major.minor version 49.0)
         at java.lang.ClassLoader.defineClass0(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:539)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
         at java.net.URLClassLoader.defineClass(URLClassLoader.java:251)
         at java.net.URLClassLoader.access$100(URLClassLoader.java:55)
         at java.net.URLClassLoader$1.run(URLClassLoader.java:194)
         at java.security.AccessController.doPrivileged(Native Method)
         at java.net.URLClassLoader.findClass(URLClassLoader.java:187)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:289)
         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)

My Configuration


 * all Files are in /usr/local/search/ls2/
 * MWConfig.global=file:///usr/local/search/ls2/lsearch-global.conf
 * MWConfig.lib=/usr/local/search/ls2/lib
 * Indexes.path=/usr/local/search/indexes
 * Localization.url=file:///opt/lampp/htdocs/basiswiki/languages/messages
 * Logging.logconfig=/usr/local/search/ls2/lsearch.log4j
 * mwdumper.jar => /usr/local/search/ls2/lib
 * lsearch.conf: Storage.lib=/usr/local/search/ls2/sql

lsearch-global.conf

 [Database]
 wikidev : (single) (language,sr)
 wikilucene : (nssplit,3) (nspart1,[0]) (nspart2,[4,5,12,13]), (nspart3,[])
 wikilucene : (language,en) (warmup,10)
 basiswiki : (single) (language,en) (warmup,10)
 [Search-Group]
  : wikilucene wikidev
  : basiswiki
 #wikilucene : (single) (language,en) (warmup,0)
 # Search groups
 # Index parts of a split index are always taken from the node's group
 # host : db1.part db2.part
 # Multiple hosts can search multiple dbs (N-N mapping)

Please can you help me?!

85.158.226.1 11:03, 31 March 2008 (UTC)


 * Run java -version. You probably have an old Java; you need to update to 1.5 or later (class file version 49.0 is Java 1.5). --Rainman 11:57, 31 March 2008 (UTC)
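The "Unsupported major.minor version 49.0" message comes from the class file header: bytes 0 to 3 are the magic number 0xCAFEBABE, bytes 4 and 5 the minor version and bytes 6 and 7 the major version. Major version 48 is Java 1.4, 49 is Java 1.5 and 50 is Java 1.6. As a minimal sketch, the following reads that header from `Importer.class` inside LuceneSearch.jar and prints it next to the running JVM's `java.class.version` system property. `ClassVersion` is a hypothetical helper, and the default jar path is the one from the question above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical diagnostic (not part of lucene-search): shows which Java
// release a class in a jar was compiled for, and what this JVM supports.
public class ClassVersion {
    /** Maps a class-file major version to the Java release that uses it. */
    static String javaRelease(int major) {
        switch (major) {
            case 45: return "1.0/1.1";
            case 46: return "1.2";
            case 47: return "1.3";
            case 48: return "1.4";
            case 49: return "1.5";
            case 50: return "1.6";
            default: return "other (major version " + major + ")";
        }
    }

    /** Reads the major version from the start of a class file. */
    static int readMajor(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        data.readUnsignedShort();        // minor version, ignored here
        return data.readUnsignedShort(); // major version
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args.length > 0 ? args[0]
                : "/usr/local/search/ls2/ls2-bin/LuceneSearch.jar";
        JarFile jar = new JarFile(jarPath);
        try {
            ZipEntry entry = jar.getEntry("org/wikimedia/lsearch/importer/Importer.class");
            if (entry == null) {
                System.out.println("Importer.class not found in " + jarPath);
                return;
            }
            InputStream in = jar.getInputStream(entry);
            try {
                int major = readMajor(in);
                System.out.println("Importer.class: major version " + major
                        + " (needs Java " + javaRelease(major) + " or later)");
            } finally {
                in.close();
            }
        } finally {
            jar.close();
        }
        // A 1.4 JVM reports java.class.version=48.0 and cannot load 49.0 classes
        System.out.println("This JVM: java.version=" + System.getProperty("java.version")
                + ", java.class.version=" + System.getProperty("java.class.version"));
    }
}
```

Compile and run it with the same `java` binary that fails; if that JVM is 1.4, this helper still runs (it only uses Java 1.4 APIs and syntax), and its output shows the mismatch directly.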

MediaWiki+Lucene-Search+MWSearch = ZERO search results ??!@#?!
Can someone please assist me? =)
 * Slackware 12.0, on i686 Pentium III [Linux 2.6.21.5]
 * MediaWiki: 1.9.1
 * PHP: 5.2.5 (apache2handler)
 * MySQL: 5.0.37
 * MediaWiki Extension(s): MWSearch SVN (05122008), and Lucene-search SVN (05122008), + I downloaded & installed mwdumper.jar into the Lucene2 lib dir.
 * other tools: jre-6u2-i586-1, jdk-1_5_0_09-i586-1, apache-ant-1.7.0-i586-1bj, rsync-2.6.9-i486-1

I've followed the steps on the Extension:Lucene-search and Extension:MWSearch pages to a T. I've gone over them several times, and I've been to the MediaWiki forums and the MediaWiki-L mailing list... please help me! =)

My local LuceneSearch configuration:

/etc/lsearch.conf
 MWConfig.global=file:///etc/lsearch-global.conf
 MWConfig.lib=/usr/local/search/lucene-search-2svn05112008/lib
 Indexes.path=/usr/local/search/indexes
 Search.updateinterval=1
 Search.updatedelay=0
 Search.checkinterval=30
 Index.snapshotinterval=5
 Index.maxqueuecount=5000
 Index.maxqueuetimeout=12
 Storage.master=localhost
 Storage.username=wikiuser
 Storage.password=mypass
 Storage.useSeparateDBs=false
 Storage.defaultDB=wikidb
 Storage.lib=/usr/local/search/lucene-search-2svn05112008/sql
 Localization.url=file:///var/www/htdocs/wiki/languages/messages
 Logging.logconfig=/etc/lsearch.log4j
 Logging.debug=true

/etc/lsearch-global.conf
 [Database]
 wikidb : (single) (language,en) (warmup,10)
 [Search-Group]
 nen-tftp : wikidb
 [Index]
 nen-tftp : wikidb
 [Index-Path]
  : /usr/local/search/indexes
 [OAI]
 wiktionary : http://$lang.wiktionary.org/w/index.php
 wikilucene : http://localhost/wiki-lucene/phase3/index.php
  : http://$lang.wikipedia.org/w/index.php
 [Properties]
 Database.suffix=wiki wiktionary wikidb
 KeywordScoring.suffix=wikidb wiki wikilucene wikidev
 ExactCase.suffix=wikidb wiktionary wikilucene
 [Namespace-Prefix]
 all :
 [0] : 0
 [1] : 1
 [2] : 2
 [3] : 3
 [4] : 4
 [5] : 5
 [6] : 6
 [7] : 7
 [8] : 8
 [9] : 9
 [10] : 10
 [11] : 11
 [12] : 12
 [13] : 13
 [14] : 14
 [15] : 15

/etc/lsearch.log4j
 log4j.rootLogger=INFO, A1
 log4j.appender.A1=org.apache.log4j.ConsoleAppender
 log4j.appender.A1.layout=org.apache.log4j.PatternLayout
 log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
 * LuceneSearch SVN Install dir: /usr/local/search/lucene-search-2svn05112008
 * Indexes stored: /usr/local/search/indexes

Relevant /var/www/htdocs/wiki/LocalSettings.php settings:
 $wgSearchType = 'LuceneSearch';
 $wgLuceneHost = 'localhost';
 $wgLucenePort = 8123;
 require_once("extensions/MWSearch/MWSearch.php");

Building the index works. Running dumpBackup(Init).php:
 > php maintenance/dumpBackupInit.php --current --quiet > wikidb.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki/wikidb.xml wikidb
 MediaWiki Lucene search indexer - index builder from xml database dumps.
 Trying config file at path /root/.lsearch.conf
 Trying config file at path /var/www/htdocs/wiki/lsearch.conf
 Trying config file at path /etc/lsearch.conf
 log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@133056f.
 log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@133056f class loader.
 log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource.
 log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@133056f.
 log4j: Trying to find [log4j.properties] using sun.misc.Launcher$AppClassLoader@133056f class loader.
 log4j: Trying to find [log4j.properties] using ClassLoader.getSystemResource.
 log4j: Could not find resource: [null].
 log4j: Parsing for [root] with value=[INFO, A1].
 log4j: Level token is [INFO].
 log4j: Category root set to INFO
 log4j: Parsing appender named "A1".
 log4j: Parsing layout options for "A1".
 log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
 log4j: End of parsing for "A1".
 log4j: Parsed "A1" options.
 log4j: Finished configuring.
 0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 18  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 434  [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
 94 pages (99.576/sec), 94 revs (99.576/sec)
 1527 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
 94 pages (326.389/sec), 94 revs (326.389/sec)
 1928 [main] INFO org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles...
 94 pages (24.588/sec), 94 revs (24.588/sec)
 6005 [main] INFO org.wikimedia.lsearch.importer.Importer  - Closing/optimizing index...
 Finished indexing in 5s, with final index optimization in 0s
 Total time: 6s
 6530 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Making snapshot for wikidb
 6582 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Made snapshot /usr/local/search/indexes/snapshot/wikidb/20080512024654

That creates a 277KB file @ /var/www/htdocs/wiki/wikidb.xml, which looks just fine to me...

Starting the lsearch daemon is working. When I run my script /usr/local/search/lucene-search-2svn05112008/lsearchd, which starts the lsearch daemon, I get the following, which ALSO looks fine:
 java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeah.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
 RMI registry started.
 Trying config file at path /root/.lsearch.conf
 Trying config file at path /usr/local/search/lucene-search-2svn05112008/lsearch.conf
 log4j: Parsing for [root] with value=[INFO, A1].
 log4j: Level token is [INFO].
 log4j: Category root set to INFO
 log4j: Parsing appender named "A1".
 log4j: Parsing layout options for "A1".
 log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
 log4j: End of parsing for "A1".
 log4j: Parsed "A1" options.
 log4j: Finished configuring.
 0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 2351 [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 2600 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
 2882 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable bound
 2914 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 2928 [Thread-2] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
 2929 [Thread-3] INFO org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
 4246 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 1331 ms
 4246 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 5079 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 833 ms
 5079 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 5861 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 782 ms

From here, I pull up my normal wiki, which has been working fine ALL along, but now I get ZERO search results, no matter what I do! I know I am searching correctly: I just type in one single word that I know is on several pages in the wiki. I've even tried editing the file before and after building the index, and starting/stopping the lsearch daemon, yet I get this on my MediaWiki search results page:

Search results
From AgentDcooper's Wiki

You searched for wiki

For more information about searching AgentDcooper's Wiki, see Searching AgentDcooper's Wiki.

Showing below 0 results starting with #1.

No page text matches

Note: Unsuccessful searches are often caused by searching for common words like "have" and "from", which are not indexed, or by specifying more than one search term (only pages containing all of the search terms will appear in the result).

Right after I run a search within the wiki, the lsearch daemon console shows the following:
 293744 [pool-2-thread-1] INFO org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10 what:search dbname:wikidb term:wiki
 293759 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine  - Using NamespaceFilterWrapper wrap: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
 293786 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine  - search wikidb: query=[wiki] parsed=[contents:wiki (title:wiki^6.0 stemtitle:wiki^2.0) (alttitle1:wiki^4.0 alttitle2:wiki^4.0 alttitle3:wiki^4.0) (keyword1:wiki^0.02 keyword2:wiki^0.01 keyword3:wiki^0.0066666664 keyword4:wiki^0.0050 keyword5:wiki^0.0039999997)] hit=[27] in 16ms using IndexSearcherMul:1210585609666
With MediaWiki debugging enabled, my /var/log/mediawiki/debug_log.txt shows this:
 Start request GET /wiki/index.php/Special:Search?search=wiki&fulltext=Search
 Host: nen-tftp.techiekb.com
 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
 Accept-Language: en-us,en;q=0.5
 Accept-Encoding: gzip,deflate
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
 Keep-Alive: 300
 Connection: keep-alive
 Referer: http://nen-tftp.techiekb.com/wiki/index.php/Special:Version
 Cookie: wikidb_session=3jptdli2pf3nkuq924tq1ihlt0
 Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
 Main cache: FakeMemCachedClient
 Message cache: MediaWikiBagOStuff
 Parser cache: MediaWikiBagOStuff
 Unstubbing $wgParser on call of $wgParser->setHook from require_once
 Fully initialised
 Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
 Language::loadLocalisation: got localisation for en from source
 Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
 Cache miss for user 2
 Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
 Logged in from session
 Unstubbing $wgMessageCache on call of $wgMessageCache->getTransform from wfMsgGetKey
 Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
 MessageCache::load: got from global cache
 Unstubbing $wgOut on call of $wgOut->setPageTitle from SpecialSearch::setupPage
 Fetching search data from http://localhost:8123/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
 total [0] hits
 OutputPage::sendCacheControl: private caching; **
 Request ended normally
Now get this: if I go to the link from the debug log above (http://localhost:8123/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10), I get this page:
 3
 1.0 0 Main_Page
 0.9577699303627014 0 EFFICIENT%2FCISCO%2FNETSCREEN%2FNETOPIA_Router_Command_Matrix
 0.7121278643608093 0 DBU_-_DialBackUp
Which leads me to my question: what am I doing wrong? I have tried everything I can think of; I just cannot get search within my MediaWiki to work properly. The search itself seems to work when I go to the link above directly, yet somehow the "total hits" in the log, as well as the wiki, show ZERO. Manually going to the link from the debug log shows what appears to be a result indicating 3 PAGES were found, with corresponding result data!? Why is MediaWiki not showing this?
Any help would be kindly appreciated, even just a link for reference! -peace- --Agentdcooper 12 May 2008
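Judging only from the two direct responses posted in this thread, the daemon's reply looks like plain text: the first line is the total hit count, and each following line is `score namespace title`, with the title URL-encoded (`%2F` for `/`). The Java sketch below parses a body of that shape. The format is inferred from these samples, not taken from the MWSearch or lsearch source, and the class names are hypothetical. It makes one point concrete: a body that carries hits parses to a non-zero count, while an empty body can only yield zero, so a `total [0] hits` line next to a non-empty direct response suggests the front end either did not receive or could not read the body the daemon sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One result line: relevance score, namespace number, decoded page title. */
final class LsearchHit {
    final double score;
    final int namespace;
    final String title;

    LsearchHit(double score, int namespace, String title) {
        this.score = score;
        this.namespace = namespace;
        this.title = title;
    }
}

/**
 * Hypothetical parser for the plain-text body seen in this thread
 * (format inferred from two sample responses, not from lsearch source):
 * line 1 = total hits, then one "score namespace url-encoded-title" line per hit.
 */
final class LsearchResponse {
    final int total;
    final List<LsearchHit> hits;

    private LsearchResponse(int total, List<LsearchHit> hits) {
        this.total = total;
        this.hits = Collections.unmodifiableList(hits);
    }

    static LsearchResponse parse(String body) {
        List<LsearchHit> hits = new ArrayList<>();
        String text = (body == null) ? "" : body.trim();
        if (text.isEmpty()) {
            // Nothing came back: there is no count to read, so report zero hits.
            return new LsearchResponse(0, hits);
        }
        String[] lines = text.split("\\r?\\n");
        int total = Integer.parseInt(lines[0].trim());
        for (int i = 1; i < lines.length; i++) {
            String[] parts = lines[i].trim().split("\\s+", 3);
            if (parts.length < 3) {
                continue; // not a "score namespace title" line; ignore it
            }
            hits.add(new LsearchHit(Double.parseDouble(parts[0]),
                                    Integer.parseInt(parts[1]),
                                    decode(parts[2])));
        }
        return new LsearchResponse(total, hits);
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, `LsearchResponse.parse(body).total` gives 3 for the response above and 1 for the `1 1.0 0 Main_Page` response posted further down this page.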


 * I suspect the problem is the MW version. The search front end has been heavily refactored in MediaWiki 1.13, and MWSearch is designed to run with the latest MediaWiki, so there may be compatibility issues. Note that MW 1.13 has not been released yet; it is still in development. Try using Extension:LuceneSearch instead. --Rainman 13:20, 12 May 2008 (UTC)


 * Thanks a TON, I will try this out in just a few. I half suspected it was a MediaWiki versioning issue; I really need to upgrade! =) --Agentdcooper 20:16, 12 May 2008 (UTC)


 * I moved to LuceneSearch and am getting a strange error. I removed the MWSearch extension entirely, downloaded Extension:LuceneSearch from SVN today, and moved the LuceneSearch directory to /var/www/htdocs/wiki, then chmod'ed it to 755 recursively to make sure it isn't a permissions issue. Then I commented out the MWSearch code in LocalSettings.php:

 
 # $wgSearchType = 'LuceneSearch';
 # $wgLuceneHost = 'localhost';
 # $wgLucenePort = 8123;
 # require_once("extensions/MWSearch/MWSearch.php");
 * I've tried different settings for Extension:LuceneSearch, but ended up with this config:

 $wgDisableInternalSearch = true;
 $wgDisableSearchUpdate = true;
 $wgSearchType = 'LuceneSearch';
 $wgLuceneHost = 'localhost';
 $wgLucenePort = 8123;
 require_once("extensions/LuceneSearch/LuceneSearch.php");
 $wgLuceneSearchVersion = 2;
 $wgLuceneDisableSuggestions = true;
 $wgLuceneDisableTitleMatches = true;
I then ran the indexer, which seemed to go great:
 > php maintenance/dumpBackupInit.php --current --quiet > wikidb.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki/wikidb.xml wikidb
 MediaWiki Lucene search indexer - index builder from xml database dumps.
 Trying config file at path /root/.lsearch.conf
 Trying config file at path /var/www/htdocs/wiki/lsearch.conf
 Trying config file at path /etc/lsearch.conf
 log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@133056f.
 log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@133056f class loader.
 log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource.
 log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@133056f.
 log4j: Trying to find [log4j.properties] using sun.misc.Launcher$AppClassLoader@133056f class loader.
 log4j: Trying to find [log4j.properties] using ClassLoader.getSystemResource.
 log4j: Could not find resource: [null].
 log4j: Parsing for [root] with value=[INFO, A1].
 log4j: Level token is [INFO].
 log4j: Category root set to INFO
 log4j: Parsing appender named "A1".
 log4j: Parsing layout options for "A1".
 log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
 log4j: End of parsing for "A1".
 log4j: Parsed "A1" options.
 log4j: Finished configuring.
 0   [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 17  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 432  [main] INFO  org.wikimedia.lsearch.ranks.RankBuilder  - First pass, getting a list of valid articles...
 94 pages (98.739/sec), 94 revs (98.739/sec)
 1532 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder  - Second pass, calculating article links...
 94 pages (325.26/sec), 94 revs (325.26/sec)
 1934 [main] INFO org.wikimedia.lsearch.importer.Importer  - Third pass, indexing articles...
 94 pages (24.691/sec), 94 revs (24.691/sec)
 5996 [main] INFO org.wikimedia.lsearch.importer.Importer  - Closing/optimizing index...
 Finished indexing in 5s, with final index optimization in 0s
 Total time: 6s
 6515 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Making snapshot for wikidb
 6566 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Made snapshot /usr/local/search/indexes/snapshot/wikidb/20080512134828
Then I started the lsearch daemon from the console:
 > java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeach.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
 RMI registry started.
 Trying config file at path /root/.lsearch.conf
 Trying config file at path /root/lsearch.conf
 Trying config file at path /etc/lsearch.conf
 log4j: Parsing for [root] with value=[INFO, A1].
 log4j: Level token is [INFO].
 log4j: Category root set to INFO
 log4j: Parsing appender named "A1".
 log4j: Parsing layout options for "A1".
 log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
 log4j: End of parsing for "A1".
 log4j: Parsed "A1" options.
 log4j: Finished configuring.
 0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
 2353 [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
 2603 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
 2885 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable bound
 2929 [Thread-2] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
 2930 [Thread-3] INFO org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
 2935 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 4265 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 1329 ms
 4266 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 5110 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 844 ms
 5110 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index wikidb ...
 5922 [main] INFO org.wikimedia.lsearch.search.Warmup  - Warmed up wikidb in 811 ms
My MediaWiki's Special:Version page shows LuceneSearch (version 2.0) is installed properly. Yet whenever I do any type of search in my MediaWiki, the page displays the following error:
 Fatal error: Call to undefined function wfLoadExtensionMessages in /var/www/htdocs/wiki/extensions/LuceneSearch/LuceneSearch_body.php on line 85
The lsearch daemon console shows nothing new since I started it! To me, that indicates the search isn't being passed to the lsearch daemon at all. Reviewing the debug log at /var/log/mediawiki/debug_log.txt, I see this:
 Start request GET /wiki/index.php/Special:Search?search=wiki&fulltext=Search
 Host: nen-tftp.techiekb.com
 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
 Accept-Language: en-us,en;q=0.5
 Accept-Encoding: gzip,deflate
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
 Keep-Alive: 300
 Connection: keep-alive
 Referer: http://nen-tftp.techiekb.com/wiki/index.php/Main_Page
 Cookie: wikidbUserName=Rprior; wikidb_session=buvigq1obd1nd5ulbk1l8d83s7; wikidbUserID=2; wikidbToken=dd6c9b732dba0c94b04ad72044d46d79
 Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
 Main cache: FakeMemCachedClient
 Message cache: MediaWikiBagOStuff
 Parser cache: MediaWikiBagOStuff
 Unstubbing $wgParser on call of $wgParser->setHook from require_once
 Fully initialised
 Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
 Language::loadLocalisation: got localisation for en from source
 Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
 Cache miss for user 2
 Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
 Logged in from session
As an added note, the file /var/www/htdocs/wiki/extensions/LuceneSearch/LuceneSearch_body.php contains the following on lines 85 through 89:
 wfLoadExtensionMessages( 'LuceneSearch' );
 $fname = 'LuceneSearch::execute';
 wfProfileIn( $fname );
 $this->setHeaders();
 $wgOut->addHTML('<!-- titlens = '. $wgTitle->getNamespace() . '- ->');
Any chance you have an idea how to fix this issue? =) I am thinking I may just have to update to MediaWiki SVN and try MWSearch if I cannot get this going on my current MediaWiki install, yet I'd LOVE to fix this if possible. Please help me! =) --Agentdcooper 21:12, 12 May 2008 (UTC)

2008-05-12 :: Installed Mediawiki SVN + Lucene-Search SVN & MWSearch SVN, still getting ZERO search results
I flat-out installed MW from a new checkout of MediaWiki SVN, plus Lucene-search SVN and MWSearch SVN version r34306 (all SVN downloads from 05.12.2008, except lucene-search-2 SVN, which is from 05.11.2008).
 * Base-system: is Slackware 12.0, on i686 Pentium III [Linux 2.6.21.5]
 * Mediawiki 1.13alpha (r34693)
 * PHP: 5.2.5
 * MySQL: 5.0.37
 * packages: jre-6u2-i586-1, jdk-1_5_0_09-i586-1, apache-ant-1.7.0-i586-1bj, rsync-2.6.9-i486-1
 * mwdumper.jar is installed in the /usr/local/search/lucene-search-2svn05112008/lib directory.
 * ExtensionFunctions.php installed @ /var/www/htdocs/wiki-test/extensions
 * Special:Version shows MWSearch (Version r34306) is installed properly...

My config files

/etc/lsearch.conf
 MWConfig.global=file:///etc/lsearch-global.conf
 MWConfig.lib=/usr/local/search/lucene-search-2svn05112008/lib
 Indexes.path=/usr/local/search/indexes
 Search.updateinterval=1
 Search.updatedelay=0
 Search.checkinterval=30
 Index.snapshotinterval=5
 Index.maxqueuecount=5000
 Index.maxqueuetimeout=12
 Storage.master=localhost
 Storage.username=newwikiuser
 Storage.password=testpass
 Storage.useSeparateDBs=false
 Storage.defaultDB=wikidbnew
 Storage.lib=/usr/local/search/lucene-search-2svn05112008/sql
 SearcherPool.size=3
 Localization.url=file:///var/www/htdocs/wiki-test/languages/messages
 Logging.logconfig=/etc/lsearch.log4j
 Logging.debug=true
/etc/lsearch-global.conf
 [Database]
 wikidbnew : (single) (language,en) (warmup,10)
 [Index]
 nen-tftp : wikidbnew
 [Index-Path]
 : /usr/local/search/indexes
 [OAI]
 wiktionary : http://$lang.wiktionary.org/w/index.php
 wikilucene : http://localhost/wiki-lucene/phase3/index.php
 : http://$lang.wikipedia.org/w/index.php
 [Properties]
 Database.suffix=wiki wiktionary wikidbnew
 KeywordScoring.suffix=wikidbnew wiki wikilucene wikidev
 ExactCase.suffix=wikidbnew wiktionary wikilucene
 [Namespace-Prefix]
 all :
 [0] : 0
 [1] : 1
 [2] : 2
 [3] : 3
 [4] : 4
 [5] : 5
 [6] : 6
 [7] : 7
 [8] : 8
 [9] : 9
 [10] : 10
 [11] : 11
 [12] : 12
 [13] : 13
 [14] : 14
 [15] : 15
/etc/lsearch.log4j
 log4j.rootLogger=INFO, A1
 log4j.appender.A1=org.apache.log4j.ConsoleAppender
 log4j.appender.A1.layout=org.apache.log4j.PatternLayout
 log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
Command line for indexing my wiki (now in a script called /var/www/htdocs/wiki-test/dumpBackup.sh):
 php maintenance/dumpBackupInit.php --current --quiet > wikidbnew.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki-test/wikidbnew.xml wikidbnew
Command line to start the lsearch daemon (now in a script called /usr/local/search/lucene-search-2svn05112008/lsearchd):
 java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeach.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
PHP version 5.2.5 was configured with a command line that enabled cURL:
 * switches used '--with-curl=shared' '--with-curlwrappers'
 * cURL support = enabled
 * cURL Information = libcurl/7.16.2 OpenSSL/0.9.8e zlib/1.2.3 libidn/0.6.10


 * The MySQL DB wikidbnew does show a table called searchindex (20.5 KiB), which appears to be populated correctly with search info from my wikidb.

config/install of new mediawiki SVN
I ran through the basic config/install of MediaWiki and put some data into the basic wiki, something I knew would be easily searchable. I build the index, it seems to build without error, and everything just works, but when I issue a search from the main wiki page I get ZERO search results, even though the original MediaWiki search DID find these terms when it was just a basic MediaWiki install, before I installed the Lucene-Search and/or MWSearch extensions.

mediawiki search results = ZERO
What seems strange is that everything seems to work, up to the point of searching through my wiki! When I search in the wiki, I get the following ZERO-results message:
 No page text matches
 Note: Only some namespaces are searched by default. Try prefixing your query with all: to search all content (including talk pages, templates, etc), or use the desired namespace as prefix.
mediawiki debug file
When I look at the MediaWiki debug file (/var/log/mediawiki/debug_mediawiki-wiki-test_log.txt), it shows the following when a search is submitted for 'wiki' (which appears in multiple places on the main page of the wiki):
 Start request GET /wiki-test/index.php?title=Special%3ASearch&search=wiki&ns0=1&ns1=1&ns2=1&ns3=1&ns4=1&ns5=1&ns6=1&ns7=1&ns8=1&ns9=1&ns10=1&ns11=1&ns12=1&ns13=1&ns14=1&ns15=1&fulltext=Search
 Host: nen-tftp.techiekb.com
 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
 Accept-Language: en-us,en;q=0.5
 Accept-Encoding: gzip,deflate
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
 Keep-Alive: 300
 Connection: keep-alive
 Referer: http://nen-tftp.techiekb.com/wiki-test/index.php/Special:Search?search=wiki&fulltext=Search
 Cookie: wikidbUserName=Rprior; wikidb_session=buvigq1obd1nd5ulbk1l8d83s7; wikidbUserID=2; wikidbToken=dd6c9b732dba0c94b04ad72044d46d79; wikidbnew_session=gvchrcs1cf12uvdukl1odpapk7; wikidbnewUserID=1; wikidbnewUserName=Rprior; wikidbnewToken=ef9b27fc68ffacb8c7362b31ea27e292
 Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
 Main cache: FakeMemCachedClient
 Message cache: MediaWikiBagOStuff
 Parser cache: MediaWikiBagOStuff
 session_set_cookie_params: "0", "/", "", "", "1"
 Fully initialised
 Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
 Language::loadLocalisation: got localisation for en from source
 Unstubbing $wgOut on call of $wgOut->setArticleRelated from SpecialPage::setHeaders
 Unstubbing $wgMessageCache on call of $wgMessageCache->get from wfMsgGetKey
 Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
 Unstubbing $wgUser on call of $wgUser->getOption from StubUserLang::_newObject
 Cache miss for user 1
 Connecting to localhost wikidbnew...
 Connected
 Logged in from session
 MessageCache::load: got from global cache
 Unstubbing $wgParser on call of $wgParser->firstCallInit from MessageCache::transform
 Preprocessor_Hash::preprocessToObj $1 -
 Preprocessor_Hash::preprocessToObj $1 -
 Preprocessor_Hash::preprocessToObj You searched for wiki
 Preprocessor_Hash::preprocessToObj For more information about searching, see |.
 Preprocessor_Hash::preprocessToObj Help:Contents
 Fetching search data from http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
 Http::request: GET http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
 total [0] hits
 Preprocessor_Hash::preprocessToObj
 No page text matches
 Preprocessor_Hash::preprocessToObj Note: Only some namespaces are searched by default. Try prefixing your query with all: to search all content (including talk pages, templates, etc), or use the desired namespace as prefix.
 Preprocessor_Hash::preprocessToObj Search in namespaces: $1
 Preprocessor_Hash::preprocessToObj
 Preprocessor_Hash::preprocessToObj
 Preprocessor_Hash::preprocessToObj Search for $1 $2
 Preprocessor_Hash::preprocessToObj
 Preprocessor_Hash::preprocessToObj About
 Preprocessor_Hash::preprocessToObj About
 Preprocessor_Hash::preprocessToObj From
 Preprocessor_Hash::preprocessToObj Search
 OutputPage::sendCacheControl: private caching; **
 Request ended normally
pointing a browser at the link in debug file

Here's the deal, though: if I go to the link from the debug log in lynx or a browser ("http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offsett=0&limit=100&version=2&iwlimit=10"), I get this output:
 1
 1.0 0 Main_Page
HELP :: where am I going wrong?

MediaWiki gives me no results, and the debug log above shows total [0] hits. Why am I getting zero hits? No matter what I do, I get zero hits! Can you see anything I'm doing wrong here? Please help =) --Agentdcooper 00:41, 13 May 2008 (UTC)
 * Just to note: if I grep the file /var/www/htdocs/wiki-test/wikidbnew.xml for the same word I am searching for, I get MANY hits! --Agentdcooper 00:51, 13 May 2008 (UTC)


 * OK then, try adding wfDebug($data); somewhere around line 564 in MWSearch.php. This should print to the MediaWiki debug log the same data you see when you access the search URL directly. If it doesn't print anything, then something is wrong with your curl. --Rainman 09:06, 13 May 2008 (UTC)
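Rainman's check can also be done from outside PHP entirely. The Java sketch below (class and method names are hypothetical) rebuilds a search URL of the same shape as the one in the MediaWiki debug log: database name and term in the path, and `namespaces`, `offset`, `limit`, `version=2` and `iwlimit=10` in the query string, all copied from the log. It then fetches the raw body with a 5-second timeout. If this returns data while the wiki still reports zero hits, the daemon is answering and the problem more likely lies on the PHP/cURL side, which is what the `wfDebug($data);` test is meant to reveal. Note that `URLEncoder` encodes spaces as `+`; whether MWSearch encodes multi-word terms the same way is an assumption here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Hypothetical helper: query the lsearch daemon directly, bypassing PHP/cURL. */
final class LsearchProbe {

    /** Builds a URL of the same shape as the one in the MediaWiki debug log. */
    static String buildUrl(String host, int port, String dbname, String term,
                           int[] namespaces, int offset, int limit) {
        StringBuilder ns = new StringBuilder();
        for (int i = 0; i < namespaces.length; i++) {
            if (i > 0) {
                ns.append(',');
            }
            ns.append(namespaces[i]);
        }
        return "http://" + host + ":" + port + "/search/" + dbname + "/" + encode(term)
                + "?namespaces=" + encode(ns.toString())   // commas become %2C
                + "&offset=" + offset
                + "&limit=" + limit
                + "&version=2&iwlimit=10";                 // copied from the debug log
    }

    /** Returns the raw response body; throws IOException if the daemon is unreachable. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            conn.disconnect();
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, `LsearchProbe.fetch(LsearchProbe.buildUrl("localhost", 8123, "wikidbnew", "wiki", ns, 0, 100))`, with `ns` holding namespaces 0 to 15, should print the same body you see in the browser if the daemon is reachable from that machine.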


 * Well, I think you are on to something there! So here's the deal: I put wfDebug($data); on line #565, by itself. I then re-ran the index command and restarted the lsearch daemon so I could watch the console output via an SSH session. I loaded up the main wiki page and did a basic search for the word "wiki". Here's what happens:


 * After I push the search button within the wiki, it takes me to a blank page: my browser's address bar shows "http://<mydomain.com>/wiki-test/index.php/Special:Search?search=wiki&fulltext=Search", yet the page is completely blank. Meanwhile, the lsearch daemon's console output shows the following:

 629776 [pool-1-thread-5] INFO org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10 what:search dbname:wikidbnew term:wiki
 629780 [pool-1-thread-5] INFO org.wikimedia.lsearch.search.SearchEngine  - Using NamespaceFilterWrapper wrap: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
 629784 [pool-1-thread-5] INFO org.wikimedia.lsearch.search.SearchEngine  - search wikidbnew: query=[wiki] parsed=[contents:wiki (title:wiki^6.0 stemtitle:wiki^2.0) (alttitle1:wiki^4.0 alttitle2:wiki^4.0 alttitle3:wiki^4.0) (keyword1:wiki^0.02 keyword2:wiki^0.01 keyword3:wiki^0.0066666664 keyword4:wiki^0.0050 keyword5:wiki^0.0039999997)] hit=[1] in 5ms using IndexSearcherMul:1210691193858
 * my debug log @ /var/log/mediawiki/debug_mediawiki-wiki-test_log.txt scrolls the following by, right when I do that "wiki" search ;;

 Start request GET /wiki-test/index.php/Special:Search?search=wiki&fulltext=Search
 Host: nen-tftp.techiekb.com
 User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
 Accept-Language: en-us,en;q=0.5
 Accept-Encoding: gzip,deflate
 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
 Keep-Alive: 300
 Connection: keep-alive
 Referer: http://nen-tftp.techiekb.com/wiki-test/index.php/Main_Page
 Cookie: wikidbUserName=Rprior; wikidb_session=buvigq1obd1nd5ulbk1l8d83s7; wikidbUserID=2; wikidbToken=dd6c9b732dba0c94b04ad72044d46d79; wikidbnew_session=gvchrcs1cf12uvdukl1odpapk7; wikidbnewUserID=1; wikidbnewUserName=Rprior; wikidbnewToken=ef9b27fc68ffacb8c7362b31ea27e292
 Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
 Main cache: FakeMemCachedClient
 Message cache: MediaWikiBagOStuff
 Parser cache: MediaWikiBagOStuff
 session_set_cookie_params: "0", "/", "", "", "1"
 Fully initialised
 Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
 Language::loadLocalisation: got localisation for en from source
 Unstubbing $wgOut on call of $wgOut->setArticleRelated from SpecialPage::setHeaders
 Unstubbing $wgMessageCache on call of $wgMessageCache->get from wfMsgGetKey
 Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
 Unstubbing $wgUser on call of $wgUser->getOption from StubUserLang::_newObject
 Cache miss for user 1
 Connecting to localhost wikidbnew...
 Connected
 Logged in from session
 MessageCache::load: got from global cache
 Unstubbing $wgParser on call of $wgParser->firstCallInit from MessageCache::transform
 Preprocessor_Hash::preprocessToObj $1 -
 Preprocessor_Hash::preprocessToObj $1 -
 Preprocessor_Hash::preprocessToObj You searched for wiki
 Preprocessor_Hash::preprocessToObj For more information about searching, see |.
 Preprocessor_Hash::preprocessToObj Help:Contents
 Fetching search data from http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
 Http::request: GET http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
 * If I goto that link at the bottom of the debug log, the following is displayed in my browser;;

 1
 1.0 0 Main_Page
 * So, what are you thinking, boss: is it my cURL install? If that's the case, a new Slackware v12.1 just came out, and it appears they updated Apache to v2.2.8 and PHP to v5.2.6, yet Slackware v12.1 still uses the curl v7.16.2 package, which is the same version I'm running now, just repackaged... hmmmm... what do you think, Rainman? BTW, thanks a million for your assistance! I really can't wait to get this Lucene search functionality working for my MediaWiki project! --Agentdcooper 15:33, 13 May 2008 (UTC)


 * Any ideas, anyone? I am stuck... please help. --Agentdcooper 03:38, 15 May 2008 (UTC)


 * I am going to install Slackware v12.1 as a FRESH install on a new computer and try this all over again, to see whether it is something I messed up along the way. I will report back with my results. In case someone reads the above and can make a suggestion, I'm all ears! I will keep the Slackware 12.0 install separate, and would love to hear from someone on how I might go about fixing it! -peace- --Agentdcooper 20:52, 15 May 2008 (UTC)

currently, I'm updating to a newer OS, but is that necessary, REALLY?

I am downloading the Slackware 12.1 ISOs right now, but it bewilders me why I would need the latest/greatest OS to run MediaWiki. As I understood it, MediaWiki can run on all sorts of Linux-based OSes/distributions and doesn't need the best hardware to run. I've detailed my problems heavily above, and I am hoping someone can help me before my new, rather large 2.0 GB OS download completes (it'll take a couple of days due to my slow 'net connection right now). I'd really like to fix what's broken before updating my entire OS, meh? Thanks for all the help so far! --Agentdcooper 03:24, 19 May 2008 (UTC)

Lucene-search wrecks Special:ListUsers
When using Lucene-search version 2.0.2 (the current version as of this date) under MediaWiki 1.10.x, I found that the special page Special:ListUsers stays blank. Turning on error reporting revealed a fatal error:
 Fatal error: Class 'ApiQueryGeneratorBase' not found in /srv/www/htdocs/mediawiki/extensions/LuceneSearch/ApiQueryLuceneSearch.php on line 33
I found that this can be solved by adding the line
 require_once($IP.'/includes/api/ApiQueryBase.php');
to the file LuceneSearch_body.php (right below the require statement that is already there).

Lexw 12:38, 17 July 2008 (UTC)

Exception resolution
If you get an error such as
 Exception in thread "main" java.lang.NullPointerException
   at java.io.File. (Unknown Source)
   at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
   at org.apache.lucene.index.IndexWriter. (IndexWriter.java:204)
   at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
   at org.wikimedia.lsearch.importer.SimpleIndexWriter. (SimpleIndexWriter.java:49)
   at org.wikimedia.lsearch.importer.DumpImporter. (DumpImporter.java:39)
   at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)
when building the index, it may be because your host name changed (check $HOSTNAME on the command line). In that case, update lsearch-global.conf to match.

Darkoneko m'écrire 13:22, 23 July 2008 (UTC)
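Since the tip above ties this `NullPointerException` to a changed host name, one quick check is whether the machine's current host name appears in the [Index] section of lsearch-global.conf (for example `nen-tftp : wikidbnew` in the config posted earlier on this page). The Java sketch below is a minimal, hypothetical checker that assumes [Index] is the section that matters; it only understands simple `host : db ...` lines, `#` comments and `[Section]` headings, not lsearch's full syntax. `InetAddress.getLocalHost().getHostName()` may return a fully qualified name, so compare against the output of `hostname` if the check fails unexpectedly.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical checker: is this machine listed in the [Index] section? */
final class IndexHostCheck {

    /** Collects the host names on "host : db ..." lines inside [Index]. */
    static Set<String> indexHosts(String globalConf) {
        Set<String> hosts = new LinkedHashSet<>();
        boolean inIndex = false;
        for (String raw : globalConf.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                inIndex = line.equals("[Index]"); // entering a new section
                continue;
            }
            int colon = line.indexOf(':');
            if (inIndex && colon > 0) {
                hosts.add(line.substring(0, colon).trim());
            }
        }
        return hosts;
    }

    /** True if the local host name (as Java resolves it) appears in [Index]. */
    static boolean listsLocalHost(String globalConf) throws UnknownHostException {
        String me = InetAddress.getLocalHost().getHostName();
        return indexHosts(globalConf).contains(me);
    }
}
```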

LuceneSearch is not available anymore?
The LuceneSearch extension was developed for MediaWiki version 1.12, which IS the current version. But the box at the top of the page says it is not to be used with the current version, and the extension is no longer available in SVN. WHY is that? Am I missing something? Oduvan 13:37, 7 August 2008 (UTC)
 * Seems like someone moved around some extensions. I've updated the link on Extension:LuceneSearch to point to right location. --Rainman 19:49, 7 August 2008 (UTC)

Running multiple lsearch daemons
Hi, I am setting up a server which hosts several wikis. We want to use Lucene search for some of them, so I have to configure several lsearch daemons.

Although I changed the Search.Port variable in the lsearch.conf file (Search.port=8124), once the first lsearch daemon is running, the second lsearch daemon complains that port 8123 is already in use.

Log from the first lsearch:
 452 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
 493 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable bound
 495 [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
 495 [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
 497 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index hiflydb ...

Log from the second lsearch:
 471 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
 511 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable bound
 514 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index sgidb ...
 565 [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8322
 565 [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
 565 [Thread-2] FATAL org.wikimedia.lsearch.frontend.SearchServer  - Error: bind error: Address already in use
What am I doing wrong?

Thanks for your help. --2 October 2008
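Before starting a second daemon, it can help to confirm whether the intended search port is actually free. The small Java sketch below (hypothetical class name) tries to bind a `ServerSocket` to a port on all local addresses and reports failure, which is the same "Address already in use" condition the second daemon hit on 8123. It does not change which port lsearch binds; as the replies below point out, that depends on whether SearchServer reads Search.port at all.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical pre-flight check before starting another lsearch daemon. */
final class PortCheck {

    /** Returns true if a TCP listener could be bound to the port right now. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(port)); // all local addresses
            return true;
        } catch (IOException e) {
            // e.g. java.net.BindException: Address already in use
            return false;
        }
    }
}
```

Checking `PortCheck.isFree(8124)` before launching the second daemon tells you whether the port you configured could be used; if the daemon still logs "Binding server to port 8123", the configured value is not being read.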

Hi,

I ran into the same problem, and found that the SearchServer class does not parse the configuration for Search.Port.

HTTPIndexServer, on the other hand, does parse the configuration for Index.port.

I suggest that there should be code like the following (from HTTPIndexServer) in the SearchServer class as well:
 [...]
 public class HTTPIndexServer extends Thread {
 [...]
 int port = config.getInt("Index","port",8321);
 [...]

I will try this out, and hopefully I will post successful results afterwards.

Regards, -- Voglerp 14:21, 20 October 2008 (UTC)

So here are my test results:

I added the following two lines to the SearchServer class:
 [...]
 public class SearchServer extends Thread {
 [...]
 org.apache.log4j.Logger log = Logger.getLogger(SearchServer.class);
 // Read port setting from configfile, if not found set default
 port = config.getInt("Search","port",8123);
 
 log.info("Searcher started on port " + port);
 [...]

Now the searcher listens on the port specified in the configuration, or on the default port 8123. But a new problem is that it is no longer possible to specify the port on the command line with -port.

Is it possible to change the code that both options will work?

Kind regards, Peter --Voglerp 08:06, 23 October 2008 (UTC)
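One way to get both behaviours Peter asks about is an explicit precedence: a `-port N` argument on the command line wins, then `Search.port` from lsearch.conf, then the built-in default 8123. The Java sketch below shows only that decision in isolation, as a hypothetical helper; it does not use lsearch's own configuration object or argument parsing, which are not shown in this thread. Wiring it into SearchServer would mean passing in the real command-line arguments and the configured value; taking the config value as a nullable `String` here is purely to keep the sketch self-contained.

```java
/** Hypothetical helper showing one possible precedence for the search port. */
final class SearchPortResolver {
    static final int DEFAULT_PORT = 8123;

    /**
     * 1. "-port N" on the command line, if given;
     * 2. otherwise the Search.port value read from lsearch.conf (null if absent);
     * 3. otherwise the built-in default 8123.
     */
    static int resolve(String[] args, String configuredPort) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-port".equals(args[i])) {
                return Integer.parseInt(args[i + 1].trim());
            }
        }
        if (configuredPort != null && !configuredPort.trim().isEmpty()) {
            return Integer.parseInt(configuredPort.trim());
        }
        return DEFAULT_PORT;
    }
}
```

With this ordering, `resolve(new String[]{"-port", "8125"}, "8124")` returns 8125, while the same call with no arguments returns 8124, so an explicit command-line flag can still override the file.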

Error when trying to run lsearch daemon
Every time I run lsearchd I get the following error:

RMI registry started.
[java] Trying config file at path /root/.lsearch.conf
[java] Trying config file at path /usr/local/search/ls2-bin/lsearch.conf
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] ERROR in GlobalConfiguration: Default path for index absent. Check section [Index-Path].

and this is what the [Index-Path] section of the global config looks like:

[Index-Path]
<default> : /mwsearch
# Rsync path where indexes are on hosts, after default value put
# hosts where the location differs
# Syntax: host : <path>

any suggestions?

--Dgat16 20:28, 12 October 2008 (UTC)

Try the following:

[Index-Path]
<default> : /mwsearch
127.0.0.1 : mwsearch2
# Path where indexes are on hosts, after default value put hosts where
# the location differs

--Bachenberg 13:15, 27 August 2009 (UTC)
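The error "Default path for index absent" suggests the parser never found a default entry under [Index-Path]. The sketch below is a hypothetical, simplified reader for this "[Section]" / "key : value" layout, written only to illustrate why the <default> line matters and why content before the first heading is skipped. It is not lsearch's GlobalConfiguration class, and its handling of edge cases may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of reading an lsearch-global.conf style file:
 * "[Section]" headings, "key : value" entries and "#" comments.
 * NOT lsearch's GlobalConfiguration; illustration only.
 */
public class GlobalConfSketch {

    /** Parses lines into section -> (key -> value), preserving file order. */
    public static Map<String, Map<String, String>> parse(List<String> lines) {
        Map<String, Map<String, String>> sections = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // A section heading such as [Index-Path]
                String name = line.substring(1, line.length() - 1);
                current = sections.computeIfAbsent(name, k -> new LinkedHashMap<>());
                continue;
            }
            if (current == null) {
                // Loosely mimics the message seen in the log above
                System.err.println("Ignoring a line up to first section heading...");
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Split at the first colon only, so values such as URLs keep their colons
                current.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return sections;
    }

    /** Returns the <default> entry of [Index-Path], or null if it is missing. */
    public static String defaultIndexPath(Map<String, Map<String, String>> conf) {
        Map<String, String> indexPath = conf.get("Index-Path");
        return indexPath == null ? null : indexPath.get("<default>");
    }

    public static void main(String[] args) throws IOException {
        Map<String, Map<String, String>> conf = parse(Files.readAllLines(Paths.get(args[0])));
        String path = defaultIndexPath(conf);
        System.out.println(path == null
                ? "Default path for index absent: add a '<default> : /path' line under [Index-Path]"
                : "Default index path: " + path);
    }
}
```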

need help for small wiki farm
I have a small wiki farm with two wikis, mywiki-en and mywiki-de, running on the same wiki software and sharing the same MySQL database, wikidb.

The mysql tables for both wikis are prefixed, with en_ or with de_ respectively.

mywiki-en is in English

mywiki-de is in German.

I know how to make two separate dump files, wikidb_en.xml and wikidb_de.xml, by using the commands

export REQUEST_URI=/wiki/en && php /wwd/wiki/maintenance/dumpBackup.php --current --quiet > wikidb_en.xml
export REQUEST_URI=/wiki/de && php /wwd/wiki/maintenance/dumpBackup.php --current --quiet > wikidb_de.xml

My question is: how do I configure Lucene and MWSearch so that

- for searches in /wiki/de it uses the indexes created from wikidb_de.xml,

- for searches in /wiki/en it uses the indexes created from wikidb_en.xml

I do not want hits in the English wiki to show up as search results for queries in /wiki/de, or the other way around.

I also need to know how to configure lsearch-global.conf. So far I have written there

[Database]
wikidb : (single) (language,en) (warmup,10)

but this is of course not correct: the database wikidb contains two wikis, one in German and one in English.

I hope that somebody can help me a bit.

Thank you, Alois 16:06, 29 October 2008 (UTC)

Searching what the user sees or searching what's behind the scenes
It seems to make no sense to search the unrendered wikitext rather than the final product. I don't see why wiki comments should be included in the search while the contents of included templates are not; it really should be the other way round. Wikis using the Semantic MediaWiki extension also find that the results of inline queries are excluded from the search, which also seems like something that needs to change. Perhaps there is a place for a search that looks behind the scenes. It may be of interest to a wiki site manager, but for a standard user the search really needs to cover the actual page contents. Pnelnik 17:41, 28 November 2008 (UTC)
 * Agreed that it doesn't. However, it is not a matter of whether it makes sense, but of whether it is difficult or easy to do. There is no easy way to reconstruct articles with templates from very large XML dumps, and no advanced way to integrate updates from OAI with templates, queues and such. This is one of those places where the flexibility of MW in one regard (e.g. syntax and caching) makes a huge trade-off with others (the ability to have a decent search). --Rainman 02:04, 30 November 2008 (UTC)

Lack of sane defaults
This extension suffers from a lack of sane defaults, which makes setting it up unnecessarily confusing. I will give some examples from the instructions.


 * mwdumper.jar: should be IN subversion. There is no reason to have to check out the code for the extension and then get another file.
 * speaking of subversion, the root should be moved up a level. The root should not be 'lucene-search-2' if you are going to ask them to put that in a parent directory called 'search'. The root should be 'search', and it should already contain the 'indexes' subdirectory. The instructions should then read 'svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/search /usr/local/search'.
 * MWConfig.global: specifically asks for a "URL", which has a very specific meaning, and gives only a URL as an example. That's great for a multi-host configuration, which most MediaWiki installations are not. The default path to this file should be /usr/local/search/ls2/lsearch-global.conf. If this is not an acceptable path, you should say so. The file:/// prefix that is used in these wiki instructions is not what people expect to see.
 * MWConfig.lib: Here you use a standard path, which people normally expect. But this is NOT what they expect since you have told them to use 'file:///' in the previous instructions on the wiki (but not in the configuration file). This is confusing!!!!
 * Localization.url: Back to the file:/// prefix. AGHHHH. There is no need to specify that it is a file. File paths are unambiguous without file:///.
 * Logging.logconfig: There is no reason to prompt the user for the location of this file if you put it in the ls2 directory by default, and make that the default location.

I believe that, up to this point, every single configuration step could have been avoided if there had been sane defaults in place. I don't have the energy to do the rest. --Alterego 18:47, 5 January 2009 (UTC)


 * I agree that the configuration is overly complicated, that is why the devel branch has a one-step script that will generate and connect all of the configuration in single-host installs. As for url/local file distinction, it follows a simple rule: everything that is global and shared across the search cluster (e.g. global config and MW files) is url, everything local (e.g. local config, indexes path, local log4j config and library files...) is a local path, although that is probably not obvious from the variable names... --Rainman 19:20, 5 January 2009 (UTC)

LSEARCH Daemon init script for SUSE

From Pierre Boisvert: this is our init script for the daemon. It is simple but works for us, so it could help others as well.

#!/bin/bash
#
# chkconfig: 2345 80 20
# description: Apache Lucene is a high-performance, full-featured text \
#              search engine library written entirely in Java
# processname: lsearchd
# config: /etc/lsearch.conf
# pidfile: /var/run/lsearchd.pid

# Source function library.
. /etc/rc.status

JAVA=/usr/bin/java
PROG=lsearchd
BASEDIR=/usr/local/bin/ls2-bin
LOG_FILE=/var/log/lsearchd.log
PID_FILE=/var/run/lsearchd.pid
PROG_BIN="$JAVA -Djava.rmi.server.codebase=file://$BASEDIR/LuceneSearch.jar -Djava.rmi.server.hostname=$HOSTNAME -jar $BASEDIR/LuceneSearch.jar"
CHECK_PROC=`ps -ef | grep $JAVA | grep -v grep | wc -l`

rc_reset

start() {
    echo -n $"Starting $PROG: "
    if [ ! -f $PID_FILE ]
    then
        $PROG_BIN $* >$LOG_FILE 2>&1 &
        echo $! > $PID_FILE
    else
        if [ $CHECK_PROC -gt 0 ]
        then
            echo "The LSEARCHD Daemon already started"
            rc_failed
        else
            echo "Removing old Pid file..."
            rm $PID_FILE
            $PROG_BIN $* >$LOG_FILE 2>&1 &
            echo $! > $PID_FILE
        fi
    fi
    rc_status -v
}

stop() {
    echo -n $"Stopping $PROG: "
    /sbin/killproc -p $PID_FILE -v $JAVA
    rc_status -v
}

status() {
    echo -n "Checking for Lsearchd daemon "
    checkproc -p $PID_FILE $JAVA
    rc_status -v
}

usage() {
    echo $"Usage: $PROG {start|stop|restart|status|help}"
    exit 1
}

# See how we were called.
case "$1" in
    start)   start;;
    stop)    stop;;
    status)  status;;
    restart) stop && start;;
    *)       usage;;
esac
rc_exit

=2009=

./configure for v. 2.1 does not seem to work
Running Ubuntu 8.04, Ant 1.7, Java 1.6.0_07, using the binary install package:

user@host: ./configure /path/to/mw/install
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /path/to/mw/install && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
	at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
	at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
	at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)

user@host: sudo ./configure /path/to/mw/install
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /path/to/mw/install && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
	at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
	at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
	at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)

user@host: sudo su
root@host: ./configure /path/to/mw/install
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd  /path/to/mw/instal && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
	at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
	at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
	at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)

It seems highly unlikely to me that this is a permissions issue. My MW installation is working just fine otherwise.

I can't even get past the first step of the instructions, which does not bode well. Will try building from source, but doubt that will make any difference.... Any ideas? --Fungiblename 20:38, 18 March 2009 (UTC)


 * You need to replace /path/to/mw/install with the actual path to your mediawiki installation (e.g. something like /var/www/mediawiki/). --Rainman 21:07, 18 March 2009 (UTC)
 * Thanks, I was using my actual path but did not want to reproduce it here in full. I was able to compile the SVN version, however, and even after changing the "hostname" variable to my actual hostname as recognized by Apache, I get the following:

./configure /var/www/mw
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /var/www/mw && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
	at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
	at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
	at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)
--Fungiblename 21:14, 18 March 2009 (UTC)
 * If you go into your MW installation dir (i.e. the one you supplied) and run echo "return \$wgDBname" | php maintenance/eval.php, what do you get? Do you get the name of your database? --Rainman 21:28, 18 March 2009 (UTC)


 * Thanks for the troubleshooting advice! It seems this was a major oversight on my part. I get the same error as above because I'm running a small wiki farm with shared code (symlinks from the install directory to the shared MediaWiki code). Once I wrote "export MW_INSTALL_PATH=/var/www/mw && ./configure /var/www/mw", it wrote all the config files. You may want to add a note on the main page about configuring installations with shared code (at least this very basic step). I'll play around on my own to find a way to have multiple separate indexes (my plan is to set up multiple directories with separate config files, index directories, and a symlink to the main jar). I'll try to get it working with just one first, though. Thanks again for your help and all your hard work on this! --Fungiblename 07:39, 19 March 2009 (UTC)


 * For me, configure sets a wrong value of dbname in config.ini. Here I see "dbname=> DatabaseName>". Note the wrong ">" signs. Calling echo "return \$wgDBname" | php maintenance/eval.php returns

> DatabaseName >
 * On some servers, eval.php prints its prompt to stdout. I found that this happens when the PHP function posix_isatty exists, which is not always the case.
 * configure also expects php to be in the PATH, which is not always true either. --Roma7
 * I had the same problem and solved it. It seems that the ./configure didn't "recognize" the PHP in the LAMP package and so I simply installed PHP CLI and it worked...--Gregra 21:55, 5 December 2009 (UTC)
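The problems above (a prompt character leaking into the captured output, php missing from the PATH, and shared-code installs needing MW_INSTALL_PATH) can all be handled when capturing the value. This is a minimal Java sketch, not lsearch's Configure class: DbNameProbe is hypothetical, the PHP binary is passed in explicitly, and MW_INSTALL_PATH is set as in the shared-code workaround described above. It sends the same "return $wgDBname" line to maintenance/eval.php and strips leading or trailing ">" characters from the first non-empty output line.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

/** Hypothetical helper: reads $wgDBname via eval.php, tolerating a leaked "> " prompt. */
public class DbNameProbe {

    /** Returns the first non-empty line with any leading/trailing ">" prompt characters removed. */
    public static String cleanEvalOutput(String output) {
        for (String line : output.split("\\R")) {
            String value = line.trim();
            while (value.startsWith(">")) {
                value = value.substring(1).trim();
            }
            while (value.endsWith(">")) {
                value = value.substring(0, value.length() - 1).trim();
            }
            if (!value.isEmpty()) {
                return value;
            }
        }
        return "";
    }

    /** Runs eval.php inside mwDir using an explicit PHP binary (no PATH lookup needed if absolute). */
    public static String readDbName(File mwDir, String phpBinary) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(phpBinary, "maintenance/eval.php");
        pb.directory(mwDir);
        pb.environment().put("MW_INSTALL_PATH", mwDir.getAbsolutePath()); // shared-code installs
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep PHP warnings out of the value
        Process p = pb.start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write("return $wgDBname\n".getBytes(StandardCharsets.UTF_8));
        }
        String out;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            out = r.lines().collect(Collectors.joining("\n"));
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("eval.php exited with status " + exit);
        }
        return cleanEvalOutput(out);
    }

    public static void main(String[] args) throws Exception {
        String php = args.length > 1 ? args[1] : "php";
        System.out.println(readDbName(new File(args[0]), php));
    }
}
```

Usage: java DbNameProbe /var/www/mw /usr/bin/php prints the cleaned database name, or fails with the eval.php exit status.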

Here's just a taste of my output from trying to build from source of the STABLE version
user@host:~/common/elements/lucene-SVN-stable-2009-03-18$ ant
Buildfile: build.xml

build:
    [mkdir] Created dir: /home/username/common/elements/lucene-SVN-stable-2009-03-18/bin
    [javac] Compiling 101 source files to /home/username/common/elements/lucene-SVN-stable-2009-03-18/bin
    [javac] /home/username/common/elements/lucene-SVN-stable-2009-03-18/src/org/wikimedia/lsearch/analyzers/WikiQueryParser.java:24: package org.mediawiki.importer does not exist
    [javac] import org.mediawiki.importer.ExactListFilter;
    [javac]                             ^
    [javac] /home/username/common/elements/lucene-SVN-stable-2009-03-18/src/org/wikimedia/lsearch/importer/DumpImporter.java:13: package org.mediawiki.importer does not exist...
....
rTest.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 70 errors

BUILD FAILED
/home/username/common/elements/lucene-SVN-stable-2009-03-18/build.xml:68: Compile failed; see the compiler error output for details.

Total time: 2 seconds

"ant -Xlint:deprecation -f build.xml Unknown argument: -Xlint:deprecation"

Does anyone have any instructions about how to even get this thing running? Are there some hidden instructions/prerequisites that I'm missing? Seems to me this should be pretty easy to run on Linux.... --Fungiblename 20:53, 18 March 2009 (UTC)
 * You must place "mwdumper.jar" in the "lib" directory of the checkout downloaded from SVN. --Fungiblename 21:12, 18 March 2009 (UTC)

Unable to build
When building from the binary I get this error. I am on Ubuntu:

root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1# ./build
Dumping wikidb...
2009-03-19 20:14:42: wikidb 99 pages (143.215/sec), 100 revs (144.661/sec), ETA 2009-03-19 20:14:45 [max 513]
2009-03-19 20:14:42: wikidb 199 pages (192.676/sec), 200 revs (193.645/sec), ETA 2009-03-19 20:14:44 [max 513]
2009-03-19 20:14:43: wikidb 299 pages (222.928/sec), 300 revs (223.674/sec), ETA 2009-03-19 20:14:44 [max 513]
2009-03-19 20:14:43: wikidb 399 pages (230.430/sec), 400 revs (231.008/sec), ETA 2009-03-19 20:14:44 [max 513]
2009-03-19 20:14:43: wikidb 458 pages (243.707/sec), 458 revs (243.707/sec), ETA 2009-03-19 20:14:44 [max 513]
mkdir: cannot create directory `/var/lib/mediawiki/extensions/lucene-search-2.1/indexes/status': No such file or directory
./build: line 19: /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/status/wikidb: No such file or directory
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/lib/mediawiki/extensions/lucene-search-2.1/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

1   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
2799 [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/import/wikidb.links
3208 [main] INFO org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
458 pages (26.889/sec), 458 revs (26.889/sec)
21058 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Making snapshot for wikidb.links
21291 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Made snapshot /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/snapshot/wikidb.links/20090319161516
21405 [main] INFO org.wikimedia.lsearch.search.UpdateThread  - Syncing wikidb.links
21963 [main] INFO org.wikimedia.lsearch.ranks.Links  - Opening for read /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/search/wikidb.links
21973 [main] INFO org.wikimedia.lsearch.related.RelatedBuilder  - Rebuilding related mapping from links
34467 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Making snapshot for wikidb.related
34649 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Made snapshot /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/snapshot/wikidb.related/20090319161529
34661 [main] INFO org.wikimedia.lsearch.importer.Importer  - Indexing articles (index+highlight+titles)...
34663 [main] INFO org.wikimedia.lsearch.ranks.Links  - Opening for read /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/search/wikidb.links
35075 [main] INFO org.wikimedia.lsearch.analyzers.StopWords  - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 329 ms
35077 [main] INFO  org.wikimedia.lsearch.importer.SimpleIndexWriter  - Making new index at /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/import/wikidb
35087 [main] INFO org.wikimedia.lsearch.importer.SimpleIndexWriter  - Making new index at /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/import/wikidb.hl
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(libgcj.so.81)
	at java.io.ByteArrayOutputStream.write(libgcj.so.81)
	at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:514)
	at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:317)
	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:166)
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
	at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
	at org.wikimedia.lsearch.storage.RelatedStorage.getRelated(RelatedStorage.java:56)
	at org.wikimedia.lsearch.importer.DumpImporter.writeEndPage(DumpImporter.java:109)
	at org.mediawiki.importer.PageFilter.writeEndPage(Unknown Source)
	at org.mediawiki.importer.XmlDumpReader.closePage(Unknown Source)
	at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(libgcj.so.81)
	at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
	at org.wikimedia.lsearch.importer.Importer.main(Importer.java:186)
	at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:109)
root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1#

root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1# java -version
java version "1.5.0"
gij (GNU libgcj) version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)

Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1# javac
Eclipse Java Compiler v_774_R33x, 3.3.1
Copyright IBM Corp 2000, 2007. All rights reserved.
Usage:

Now, add the following to LocalSettings.php:

# lsearch
require_once("extensions/MWSearch/MWSearch.php");
$wgSearchType = 'LuceneSearch';
$wgLuceneHost = 'YourHostName'; # <-- change this!
$wgLucenePort = 8123;
# uncomment this if you use lucene-search 2.1
# (MUST be AFTER the require_once!)
$wgLuceneSearchVersion = 2.1;

where YourHostName is the output of the hostname command. The search doesn't work on my machine if I use the default, "192.168.0.1".
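To check that the name put into $wgLuceneHost actually reaches the daemon, a plain TCP connect to port 8123 is enough. Below is a minimal Java sketch with a hypothetical LuceneHostCheck class. Note that InetAddress.getLocalHost().getHostName() usually, but not always, matches the output of hostname, so passing the name explicitly as the first argument is safer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether host:port accepts TCP connections. */
public class LuceneHostCheck {

    /** True if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, unknown host or timeout
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8123;
        System.out.println(host + ":" + port
                + (canConnect(host, port, 2000) ? " is reachable" : " is NOT reachable"));
    }
}
```

If java LuceneHostCheck YourHostName 8123 reports the port as not reachable, MWSearch on the same machine is unlikely to reach the daemon under that name either.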

How to customize synonyms and stop words?
How can I edit the synonyms and stop words in order to bring the engine more in line with our needs?
 * You need to check out the source from SVN. Then edit resources/dist/wordnet-en.txt (for synonyms) and stopwords-en.txt. If this does not work, you could also try making your own Filter class and plugging it into the FilterFactory class. --Rainman 18:31, 1 May 2009 (UTC)

Thanks. I have done as you suggest. However, I do not see any indication that the system is ignoring stop words (e.g. if I search with the word "me", I get results). I also do not know how to confirm that the synonyms are working. Are there some good tests I could run to verify? Marc 14:31, 6 May 2009 (MDT)

Searching Attachments
I am running MediaWiki 1.13.1 PHP 5.2.4-2ubuntu5.6(apache2handler) MySQL 5.0.51a-3ubuntu5.4

I have the FileIndexer

http://www.mediawiki.org/wiki/Extension:FileIndexer

and

http://www.mediawiki.org/wiki/Extension:MWSearch

now installed and running.

The Lucene search capability seems to work far better than the default search, except that it no longer returns results from attachments that were converted to text and then inserted into the image field.

Is this a limitation of the present software? I had hoped the Lucene Search would index the attachments, especially given the use of the FileIndexer.

Is it significant that the FQDN is http://wiki.tesla.local/ (on a local LAN) but the hostname is wiki?

Attached are the configuration files.

lsearch.conf

# By default, will check /etc/lsearch.conf

# Global configuration

# URL to global configuration, this is the shared main config file, it can
# be on a NFS partition or available somewhere on the network
MWConfig.global=file:///home/chris/lucene-search-2.1/lsearch-global.conf

# Local path to root directory of indexes
Indexes.path=/home/chris/lucene-search-2.1/indexes

# Path to rsync
Rsync.path=/usr/bin/rsync

# Extra params for rsync
# Rsync.params=--bwlimit=8192

# Search node related configuration

# Port of http daemon, if different from default 8123
# Search.port=8000

# In minutes, how frequently will the index host be checked for updates
Search.updateinterval=0.1

# In seconds, delay after which the update will be fetched
# used to scatter the updates around the hour
Search.updatedelay=0

# In seconds, how frequently the dead search nodes should be checked
Search.checkinterval=10

# In milliseconds, for how long should the query be executed
# Search.timelimit=1000

# if to wait for aggregates to warm up before deploying the searcher
Search.warmupaggregate=true

# cache *whole* index in RAM
Search.ramdirectory=false

# Disable wordnet aliases
Search.disablewordnet=true

# If this host runs on multiple CPUs maintain a pool of index searchers
# It's good idea to make it number of CPUs+1, or some larger odd number
SearcherPool.size=1

# Indexer related configuration

# In minutes, how frequently is a clean snapshot of index created
Index.snapshotinterval=2880

# Daemon type (http is started by default)
# Index.daemon=xmlrpc

# Port of daemon (default is 8321)
# Index.port=8080

# Maximal queue size after which index is being updated
Index.maxqueuecount=5000

# Maximal time an update can remain in queue before being processed (in seconds)
Index.maxqueuetimeout=12

# If to delete all old snapshots always (default to false - leaves the last good snapshot)
# Index.delsnapshots=true

# Log, ganglia, localization

# URL to MediaWiki message files
Localization.url=file:///home/chris/public_html_3/wiki/languages/messages

# Username/password for password authenticated OAI repo
# OAI.username=user
# OAI.password=pass

# Max queue size on remote indexer after which we wait a bit
OAI.maxqueue=5000

# Number of docs to buffer before sending to inc updater
OAI.bufferdocs=500

# Log configuration
Logging.logconfig=/home/chris/lucene-search-2.1/lsearch.log4j

# Set debug to true to diagnose problems with log4j configuration
Logging.debug=false

# Turn this on to broadcast status to a Ganglia reporting system.
# Requires that 'gmetric' be in the PATH and runnable. You can
# override the default UDP broadcast port and interface if required.
# Ganglia.report=true
# Ganglia.port=8649
# Ganglia.interface=eth0

lsearch-global.conf

# Global search cluster layout configuration

[Database]
MediaWiki : (single) (spell,4,2) (language,en)
[Search-Group]
wiki : *
[Index]
wiki : *
[Index-Path]
<default> : /search
[OAI]
<default> : http://localhost/index.php
[Namespace-Boost]
<default> : (0,2) (1,0.5)
[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15

config.inc

dbname=MediaWiki
wgScriptPath=
hostname=wiki
indexes=/home/chris/lucene-search-2.1/indexes
mediawiki=/home/chris/public_html_3/wiki
base=/home/chris/lucene-search-2.1
wgServer=http://localhost


 * Unfortunately lucene-search won't search attachments no matter what kind of extra extension you use. You could however try Extension:EzMwLucene which is also lucene-based but has a different set of features, doesn't have some lucene-search stuff, but has attachment search. --Rainman 09:31, 26 May 2009 (UTC)


 * Thank you so much for the prompt response. I will try the Extension:EzMwLucene search, as attachment searching is a key feature I would like in our company wiki.


 * Thanks a bunch Rainman. Do you know offhand what the major differences are between both Lucene extensions?  We have Lucene-search installed but would like to enable EzMwLucene but it would be good to know what the feature differences are. --Gkullberg 13:59, 3 July 2009 (UTC)

Search within files?
Is it possible to use Lucene to search within files uploaded to MediaWiki?

On the Lucene page on Wikipedia it says:

"At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of the file format. Text from PDFs, HTML, Microsoft Word, and OpenDocument documents, as well as many others can all be indexed so long as their textual information can be extracted."

It would be great if I could search within PDFs and Docs and whatever else I upload to my MediaWiki instance. --Gkullberg 19:55, 2 July 2009 (UTC)
 * See answer to previous question.... --Rainman 10:08, 3 July 2009 (UTC)

How to use CJKAnalyzer
Is it possible to use CJKAnalyzer for indexing pages written in Japanese?
 * Yes, just change (language,en) to (language,ja) in your config file (and re-run the build process). --Rainman 08:27, 10 July 2009 (UTC)

Periodic fatal errors while rebuilding index - "no segments* file"
I'm running Lucene-search on our local wiki. The build script runs correctly and produces a valid index, which is picked up by the daemon, and everything works fine...for a bit. I've created a cron job that runs the build script hourly, with the output of the script being emailed to me. The cron job runs happily for a spell and then I receive this in the output:

MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /home/system/mymintel-svc/.lsearch.conf
Trying config file at path /data/mymintel/mediawiki/lucene_search/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
582  [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /data/mymintel/mediawiki/lucene_search/indexes/import/it_wiki.links
924 [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
3,759 pages (338.679/sec), 3,759 revs (338.679/sec)
14271 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Making snapshot for it_wiki.links
14645 [main] INFO org.wikimedia.lsearch.index.IndexThread  - Made snapshot /data/mymintel/mediawiki/lucene_search/indexes/snapshot/it_wiki.links/20090731050111
14696 [main] INFO org.wikimedia.lsearch.search.UpdateThread  - Syncing it_wiki.links
15632 [main] INFO org.wikimedia.lsearch.ranks.Links  - Opening for read /data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links
15637 [main] INFO org.wikimedia.lsearch.related.RelatedBuilder  - Rebuilding related mapping from links
15640 [main] FATAL org.wikimedia.lsearch.importer.Importer - Cannot make related mapping: no segments* file found in  org.apache.lucene.store.FSDirectory@/data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links: files:
MediaWiki lucene-search indexer - build spelling suggestion index.

16802 [main] INFO org.wikimedia.lsearch.spell.SuggestBuilder  - Building spell-check for it_wiki
16802 [main] INFO org.wikimedia.lsearch.util.Localization  - Reading localization for En
16931 [main] INFO  org.wikimedia.lsearch.spell.SuggestBuilder  - Rebuilding precursor index...
17037 [main] INFO org.wikimedia.lsearch.analyzers.StopWords  - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 68 ms
17039 [main] INFO  org.wikimedia.lsearch.spell.CleanIndexWriter  - Using phrase stopwords: [only, theirs, some, where, being, after, doing, did, they, herself, as, so, our, than, your, for, down, the, other, of, does, no, ours, with, from, them, by, also, you, hers, until, yourself, has, she, it, up, why, have, this, those, about, between, which, under, these, i, yours, but, his, myself, yourselves, having, more, be, her, into, its, an, he, on, over, was, here, to, such, above, because, nor, had, him, below, and, whoever, during, their, itself, been, most, that, out, each, or, a, own, all, what, in, ourselves, were, themselves, both, not, same, do, am, too, once, any, when, then, who, how, whom, my, through, there, before, very, we, against, few, while, again, me, at, if, himself, are, is, off, further]
17129 [main] INFO org.wikimedia.lsearch.ranks.Links  - Opening for read /data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links
java.io.IOException: no segments* file found in org.apache.lucene.store.FSDirectory@/data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links: files:
	at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)

From this point onwards, the job will not run correctly until I have deleted the indexes directory and started from scratch.
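In the broken listing, search/it_wiki.links is a plain directory full of timestamped symlinks instead of a single symlink into update/, so Lucene finds no segments file there. Below is a hedged Java sketch of a pre-flight check that a cron wrapper could run before the build. IndexDirCheck is hypothetical, and it assumes (as in Lucene 2.x) that a usable index directory contains at least one segments_N file; segments.gen alone does not count.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical pre-flight check (not part of lsearch): verifies that an index
 * directory such as indexes/search/it_wiki.links holds a segments_N file.
 */
public class IndexDirCheck {

    /** True if dir (symlinks are followed) contains at least one regular segments_N file. */
    public static boolean hasSegmentsFile(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) { // follows symlinks by default
            return false;
        }
        // The glob "segments_*" matches segments_h, segments_3t, ... but not segments.gen
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "segments_*")) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            String kind = Files.isSymbolicLink(p) ? "symlink" : "directory";
            System.out.println(arg + " (" + kind + "): "
                    + (hasSegmentsFile(p) ? "ok" : "NO segments_N file"));
        }
    }
}
```

For example, java IndexDirCheck indexes/search/* would flag it_wiki.links in the broken state before the build reaches the FATAL error.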

I've dumped the directory structure of the filesystem when the index is working correctly, and when it's broken; the output is below.

Working config
indexes/
|-- import
|   |-- it_wiki
|   |   |-- _7.cfs
|   |   |-- segments.gen
|   |   `-- segments_h
|   |-- it_wiki.hl
|   |   |-- _7.cfs
|   |   |-- segments.gen
|   |   `-- segments_h
|   |-- it_wiki.links
|   |   |-- _8.cfs
|   |   |-- segments.gen
|   |   `-- segments_j
|   |-- it_wiki.related
|   |   |-- _d.cfs
|   |   |-- segments.gen
|   |   `-- segments_t
|   |-- it_wiki.spell
|   |   |-- _1v.cfs
|   |   |-- segments.gen
|   |   `-- segments_3t
|   `-- it_wiki.spell.pre
|       |-- _8.cfs
|       |-- segments.gen
|       `-- segments_j
|-- index
|   |-- it_wiki
|   |   |-- _7.cfs
|   |   |-- segments.gen
|   |   `-- segments_h
|   |-- it_wiki.hl
|   |   |-- _7.cfs
|   |   |-- segments.gen
|   |   `-- segments_h
|   |-- it_wiki.links
|   |   |-- _8.cfs
|   |   |-- segments.gen
|   |   `-- segments_j
|   `-- it_wiki.spell.pre
|       |-- _8.cfs
|       |-- segments.gen
|       `-- segments_j
|-- search
|   |-- it_wiki -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki/20090730163156
|   |-- it_wiki.hl -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.hl/20090730163156
|   |-- it_wiki.links -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090730163123
|   |-- it_wiki.related -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.related/20090730163127
|   `-- it_wiki.spell -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.spell/20090730163230
|-- snapshot
|   |-- it_wiki
|   |   `-- 20090730163156
|   |       |-- _7.cfs
|   |       |-- segments.gen
|   |       `-- segments_h
|   |-- it_wiki.hl
|   |   `-- 20090730163156
|   |       |-- _7.cfs
|   |       |-- segments.gen
|   |       `-- segments_h
|   |-- it_wiki.links
|   |   `-- 20090730163123
|   |       |-- _8.cfs
|   |       |-- segments.gen
|   |       `-- segments_j
|   |-- it_wiki.related
|   |   `-- 20090730163127
|   |       |-- _d.cfs
|   |       |-- segments.gen
|   |       `-- segments_t
|   |-- it_wiki.spell
|   |   `-- 20090730163230
|   |       |-- _1v.cfs
|   |       |-- segments.gen
|   |       `-- segments_3t
|   `-- it_wiki.spell.pre
|       `-- 20090730163210
|           |-- _8.cfs
|           |-- segments.gen
|           `-- segments_j
|-- status
|   `-- it_wiki
`-- update
    |-- it_wiki
    |   `-- 20090730163156
    |       |-- _7.cfs
    |       |-- segments.gen
    |       `-- segments_h
    |-- it_wiki.hl
    |   `-- 20090730163156
    |       |-- _7.cfs
    |       |-- segments.gen
    |       `-- segments_h
    |-- it_wiki.links
    |   `-- 20090730163123
    |       |-- _8.cfs
    |       |-- segments.gen
    |       `-- segments_j
    |-- it_wiki.related
    |   `-- 20090730163127
    |       |-- _d.cfs
    |       |-- segments.gen
    |       `-- segments_t
    `-- it_wiki.spell
        `-- 20090730163230
            |-- _1v.cfs
            |-- segments.gen
            `-- segments_3t

Broken Config
indexes/
|-- import
|   |-- it_wiki
|   |   |-- _2f.cfs
|   |   |-- segments.gen
|   |   `-- segments_58
|   |-- it_wiki.hl
|   |   |-- _2f.cfs
|   |   |-- segments.gen
|   |   `-- segments_58
|   |-- it_wiki.links
|   |   |-- _5h.cfs
|   |   |-- segments.gen
|   |   `-- segments_bm
|   |-- it_wiki.related
|   |   |-- _4n.cfs
|   |   |-- segments.gen
|   |   `-- segments_9o
|   |-- it_wiki.spell
|   |   |-- _oj.cfs
|   |   |-- segments.gen
|   |   `-- segments_1dh
|   `-- it_wiki.spell.pre
|       |-- _39.fdt
|       |-- _39.fdx
|       |-- segments.gen
|       |-- segments_74
|       `-- write.lock
|-- index
|   |-- it_wiki
|   |   |-- _2f.cfs
|   |   |-- segments.gen
|   |   `-- segments_58
|   |-- it_wiki.hl
|   |   |-- _2f.cfs
|   |   |-- segments.gen
|   |   `-- segments_58
|   |-- it_wiki.links
|   |   |-- _5h.cfs
|   |   |-- segments.gen
|   |   `-- segments_bm
|   `-- it_wiki.spell.pre
|       |-- _39.fdt
|       |-- _39.fdx
|       |-- segments.gen
|       |-- segments_74
|       `-- write.lock
|-- search
|   |-- it_wiki -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki/20090731040228
|   |-- it_wiki.hl -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.hl/20090731040228
|   |-- it_wiki.links
|   |   |-- 20090731050111 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731050111
|   |   |-- 20090731060116 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731060116
|   |   |-- 20090731070104 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731070104
|   |   |-- 20090731080121 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731080121
|   |   |-- 20090731090112 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731090112
|   |   |-- 20090731100113 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731100113
|   |   |-- 20090731110108 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731110108
|   |   |-- 20090731120051 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731120051
|   |   `-- 20090731130055 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731130055
|   |-- it_wiki.related -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.related/20090731040125
|   `-- it_wiki.spell -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.spell/20090731040320
|-- snapshot
|   |-- it_wiki
|   |   |-- 20090731030246
|   |   |   |-- _27.cfs
|   |   |   |-- segments.gen
|   |   |   `-- segments_4r
|   |   `-- 20090731040228
|   |       |-- _2f.cfs
|   |       |-- segments.gen
|   |       `-- segments_58
|   |-- it_wiki.hl
|   |   |-- 20090731030247
|   |   |   |-- _27.cfs
|   |   |   |-- segments.gen
|   |   |   `-- segments_4r
|   |   `-- 20090731040228
|   |       |-- _2f.cfs
|   |       |-- segments.gen
|   |       `-- segments_58
|   |-- it_wiki.links
|   |   |-- 20090731120051
|   |   |   |-- _58.cfs
|   |   |   |-- segments.gen
|   |   |   `-- segments_b3
|   |   `-- 20090731130055
|   |       |-- _5h.cfs
|   |       |-- segments.gen
|   |       `-- segments_bm
|   |-- it_wiki.related
|   |   |-- 20090731030132
|   |   |   |-- _49.cfs
|   |   |   |-- segments.gen
|   |   |   `-- segments_8v
|   |   `-- 20090731040125
|   |       |-- _4n.cfs
|   |       |-- segments.gen
|   |       `-- segments_9o
|   |-- it_wiki.spell
|   |   |-- 20090731030355
|   |   |   |-- _mn.cfs
|   |   |   |-- segments.gen
|   |   |   `-- segments_19o
|   |   `-- 20090731040320
|   |       |-- _oj.cfs
|   |       |-- segments.gen
|   |       `-- segments_1dh
|   `-- it_wiki.spell.pre
|       |-- 20090731030320
|       |   |-- _2z.cfs
|       |   |-- segments.gen
|       |   `-- segments_6c
|       `-- 20090731040253
|           |-- _38.cfs
|           |-- segments.gen
|           `-- segments_6v
|-- status
|   `-- it_wiki
`-- update
    |-- it_wiki
    |   |-- 20090731030246
    |   |   |-- _27.cfs
    |   |   |-- segments.gen
    |   |   `-- segments_4r
    |   `-- 20090731040228
    |       |-- _2f.cfs
    |       |-- segments.gen
    |       `-- segments_58
    |-- it_wiki.hl
    |   |-- 20090731030247
    |   |   |-- _27.cfs
    |   |   |-- segments.gen
    |   |   `-- segments_4r
    |   `-- 20090731040228
    |       |-- _2f.cfs
    |       |-- segments.gen
    |       `-- segments_58
    |-- it_wiki.links
    |   |-- 20090731120051
    |   |   |-- _58.cfs
    |   |   |-- segments.gen
    |   |   `-- segments_b3
    |   `-- 20090731130055
    |       |-- _5h.cfs
    |       |-- segments.gen
    |       `-- segments_bm
    |-- it_wiki.related
    |   |-- 20090731030132
    |   |   |-- _49.cfs
    |   |   |-- segments.gen
    |   |   `-- segments_8v
    |   `-- 20090731040125
    |       |-- _4n.cfs
    |       |-- segments.gen
    |       `-- segments_9o
    `-- it_wiki.spell
        |-- 20090731030355
        |   |-- _mn.cfs
        |   |-- segments.gen
        |   `-- segments_19o
        `-- 20090731040320
            |-- _oj.cfs
            |-- segments.gen
            `-- segments_1dh

As you can see, the contents of indexes/search/it_wiki.links are completely different. I suspect this is what's causing the problem, but I don't know enough about what's going on to diagnose it. Java version is:
java version "1.5.0_14-p8"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-p8-root_04_sep_2008_18_49)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-p8-root_04_sep_2008_18_49, mixed mode)

...and I'm running on FreeBSD 7.0, if that makes a difference. Any ideas what's going on? It'd be nice not to have to delete and rebuild the indexes by hand every day!

So this part

|-- search
|   |-- it_wiki.links
|   |   |-- 20090731050111 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731050111
|   |   |-- 20090731060116 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731060116
|   |   |-- 20090731070104 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731070104
|   |   |-- 20090731080121 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731080121
|   |   |-- 20090731090112 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731090112
|   |   |-- 20090731100113 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731100113
|   |   |-- 20090731110108 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731110108
|   |   |-- 20090731120051 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731120051
|   |   `-- 20090731130055 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731130055

Looks quite wrong... all the entries in search/ should be symlinks and should not have any subdirectories. I'm not sure how these are created. Are you sure that the whole build process takes less than an hour? If you get overlapping jobs trying to do the same thing, they might lock each other's indexes. --Rainman 16:43, 3 August 2009 (UTC)
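One way to rule out overlapping jobs for certain is to have the cron wrapper refuse to start while a previous run is still active. Below is a minimal sketch, assuming a hypothetical lock directory path; it uses `mkdir` as the lock because `mkdir` either creates the directory or fails atomically, so it works from any POSIX shell, FreeBSD included.

```shell
#!/bin/sh
# Guard against overlapping runs of the lucene-search build/update job.
# The lock directory path is hypothetical; any path writable by the cron user works.

# acquire_lock DIR: succeeds only if DIR did not exist and this call created it.
# mkdir is atomic, so two concurrent runs cannot both acquire the lock.
acquire_lock() {
    mkdir "$1" 2>/dev/null
}

# release_lock DIR: drop the lock once the job has finished.
release_lock() {
    rmdir "$1"
}

# run_exclusive DIR CMD [ARGS...]: run CMD only if no other run holds DIR.
# Returns 75 (EX_TEMPFAIL) if a previous run still holds the lock,
# otherwise CMD's own exit status.
run_exclusive() {
    lockdir=$1
    shift
    if ! acquire_lock "$lockdir"; then
        echo "previous run still active ($lockdir exists), skipping" >&2
        return 75
    fi
    if "$@"; then job_rc=0; else job_rc=$?; fi
    release_lock "$lockdir"
    return $job_rc
}
```

In a crontab this would be used as something like `run_exclusive /var/tmp/lsearch-build.lock ./build` (hypothetical path). If a run is killed hard, the lock directory is left behind and has to be removed by hand before the next run can start.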

When I run the process by hand, it never takes more than 5 mins to complete, so I'd be very surprised if jobs are overlapping. Would probably be a good idea to make certain though, so I'll change the cronjob to time the process and send an update next time it fails. -- Mrgroucho 16:48, 3 August 2009 (UTC)

OK, I think we can rule out overlapping jobs. The indexer ran successfully as scheduled for over 12 hours yesterday, and then failed early this morning. The time stats for the job, and the one preceding it, are as follows:

Preceding Job
real    3m24.606s
user    2m15.323s
sys     0m16.361s

Failed Job
real    0m59.046s
user    0m20.773s
sys     0m2.410s

The failed job takes less time, but you'd expect that: it failed. I've noted that the output mentions various Threads - is there any way that this could be some sort of race condition/locking problem between those threads? -- Mrgroucho 13:44, 4 August 2009 (UTC)

Any idea at all on how to fix this? I've put a workaround in place that deletes all of the indexes every midnight and then runs ./build, which means that if it breaks during the day the indexes will never be massively out of date, but it's hardly a pretty fix. -- Mrgroucho 13:33, 7 August 2009 (UTC)

I've had the same problem. The index building job (cronned for every hour) started to fail every couple of weeks, then every week, then every few days, and so on, until I was manually clearing out and rebuilding a couple of times a day! I'd already written a wrapper script around the 'build' script, just to make the process a bit more cron-friendly, to check that the search daemon was running, and to abort altogether while my server-backup routines are running. I've now decided to have it build the indexes in a new location every hour and then 'cut over' if no error is encountered. So far, this seems to be effective.

The indexes folder I was using seemed to be growing exponentially, and I think this may have been related to the problem. However, not being a Java or Lucene expert, I think this is a problem I'm going to have to keep working around instead of solving. --140.131.255.2 05:38, 7 September 2009 (UTC)
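The "build in a new location, then cut over" approach described above can be sketched as follows. This is a minimal illustration, not the poster's actual script: it assumes the search side reads the index through a symlink (LINK), and the build command, LINK and BASE are hypothetical placeholders that should be absolute paths.

```shell
#!/bin/sh
# Build-then-cut-over sketch: each run builds into a fresh directory and the
# live symlink is only repointed when the build command exits successfully.
# The build command, LINK and BASE are hypothetical; use absolute paths.

# cutover LINK NEWDIR: repoint symlink LINK at NEWDIR.
# ln -sfn replaces the link in one step; it is not strictly atomic, but the
# window in which LINK is missing is very short.
cutover() {
    ln -sfn "$2" "$1"
}

# build_and_cutover LINK BASE CMD [ARGS...]: run CMD ARGS... NEWDIR, where
# NEWDIR is BASE.<timestamp>. On success LINK is switched to NEWDIR; on
# failure NEWDIR is removed and LINK keeps pointing at the old index.
build_and_cutover() {
    link=$1
    base=$2
    shift 2
    newdir="$base.$(date +%Y%m%d%H%M%S)"
    mkdir -p "$newdir" || return 1
    if "$@" "$newdir"; then
        cutover "$link" "$newdir"
        return $?
    fi
    echo "build failed; $link still points at $(readlink "$link")" >&2
    rm -rf "$newdir"
    return 1
}
```

Old index directories are deliberately left in place so that searches still using them can finish; pruning them can be done separately once they are a few runs old.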


 * Yeah, I have this problem too. If anyone can post a solution - I'd be grateful.  Even a workaround script.  Thanks.  --Robinson Weijman 09:08, 5 March 2010 (UTC)


 * We too are experiencing this problem, and I am surprised no solution has been posted, although many people have reported this exact problem on the net. The last workaround I tried was to do an "rm -r" of the whole "indexes" directory before each call to "build". The problem still arises, but at least when it fails once, it usually succeeds afterwards. I think I've given LuceneSearch 2.1.3 a fair chance, so I'm going back to 2.0, hoping this problem will disappear. Phil Reid 30 Nov 2010

I also have this problem, and do indeed believe it to be some sort of locking issue. The way I (think) I've "solved" it is a bit of a kluge -- I am killing the search daemon (lsearchd) before the update and restarting it after the update. At the moment there is an interruption of 30 seconds every hour. This is not great, but I figure this is better than the index not being updated. Looking forward to a more elegant fix [Thu Dec 16 21:10:30 GMT 2010]
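A variant of the stop/update/start kluge that guarantees the daemon is started again even when the update fails might look like this. It is a sketch only: the control command (for example an init script such as /etc/init.d/lsearchd) and the update command are passed in and are assumptions about your setup.

```shell
#!/bin/sh
# Sketch of the "stop lsearchd, update, start lsearchd" workaround, written so
# the daemon is restarted even if the update fails. The control command and
# the update command are supplied by the caller (example paths only).

# with_daemon_stopped CTL CMD [ARGS...]: run "CTL stop", then CMD, then
# "CTL start" regardless of whether CMD succeeded; return CMD's status.
with_daemon_stopped() {
    ctl=$1
    shift
    "$ctl" stop || return 1
    # Capture the update's status without letting a failure skip the restart.
    if "$@"; then cmd_rc=0; else cmd_rc=$?; fi
    "$ctl" start || return 1
    return $cmd_rc
}
```

Usage would be along the lines of `with_daemon_stopped /etc/init.d/lsearchd ./update` from the lucene-search directory; the non-zero status from a failed update is still returned so that cron can report it.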


 * So much for that. This kluge worked until the symlink <tt>lucene-search-2.1.3/indexes/search/wikidb.links</tt> (pointing to <tt>lucene-search-2.1.3/indexes/update/wikidb.links/&lt;datestamp&gt;</tt>) was replaced by <tt>lucene-search-2.1.3/indexes/search/wikidb.links</tt> as a directory in its own right, containing symlinks to the datestamped dirs. I've now modified the update script to check whether <tt>indexes/search/wikidb.links</tt> is a symlink and recreate it if not, making the kluge even worse. [Thu Dec 16 23:55:20 GMT 2010]
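The "check whether it is a symlink and recreate it if not" step could be sketched like this. It is an illustration under stated assumptions, not the poster's script: the snapshot directories are assumed to be named with the YYYYMMDDhhmmss datestamps seen above, and it should be run while lsearchd is stopped or between updates.

```shell
#!/bin/sh
# Repair sketch: if indexes/search/<db>.links has turned into a real directory
# of symlinks, replace it with one symlink to the newest datestamped directory
# under indexes/update/<db>.links. Paths are examples only.

# newest_snapshot DIR: print the lexically last subdirectory of DIR whose name
# starts with a digit. Datestamps are YYYYMMDDhhmmss, so lexical order is
# chronological. Returns non-zero and prints nothing if there is none.
newest_snapshot() {
    newest=
    for entry in "$1"/[0-9]*; do
        [ -d "$entry" ] && newest=$entry
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# ensure_symlink LINK UPDATEDIR: leave LINK alone if it is already a symlink,
# otherwise replace it with a symlink to the newest snapshot in UPDATEDIR.
# rm -rf removes the stray symlinks, not the directories they point to.
ensure_symlink() {
    [ -L "$1" ] && return 0
    target=$(newest_snapshot "$2") || return 1
    rm -rf "$1"
    ln -s "$target" "$1"
}
```

For example: `ensure_symlink indexes/search/wikidb.links indexes/update/wikidb.links`, run from the lucene-search directory.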

We also have this stupid problem. I'm planning to change the cron script to remove the index when there is a failure. It would be nice if you could post your dirty cron-script workarounds. At the moment, my cron bash script checks the log file for errors after the update script runs, e.g.:
CHECK=$(grep -i "Rebuild I/O error:" $LOGFILE | head -1 | cut -c10)
If there is a failure, I send a mail... --BSG2000 15:46, 20 December 2010 (UTC)
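The log check above can be extended into a "detect and recover" step. This is a sketch only: it looks for the same "Rebuild I/O error:" marker, and the recovery command (for example a script that removes the indexes, reruns ./build and sends the mail) is a hypothetical placeholder you would supply.

```shell
#!/bin/sh
# Sketch of a cron-side check: after the update script has run, look for the
# "Rebuild I/O error:" marker in its log and, if present, run a recovery
# command. The recovery command is a hypothetical placeholder.

# update_failed LOGFILE: true if the log contains the rebuild error marker.
update_failed() {
    grep -qi "Rebuild I/O error:" "$1"
}

# check_and_recover LOGFILE CMD [ARGS...]: run CMD only when the log shows a
# failure. Returns 0 if nothing had to be done, otherwise CMD's status.
check_and_recover() {
    logfile=$1
    shift
    if update_failed "$logfile"; then
        echo "lucene-search update failed, recovering (see $logfile)" >&2
        "$@"
        return $?
    fi
    return 0
}
```

Typical use: `./update > "$LOGFILE" 2>&1; check_and_recover "$LOGFILE" ./rebuild-from-scratch.sh`, where the rebuild script name is made up for illustration.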

How to fix this error when running ./lsearchd?
0rz </usr/local/search/ls2-bin> # ./lsearchd
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/search/ls2-bin/lsearch.conf
Exception in thread "main" java.lang.NullPointerException
    at org.wikimedia.lsearch.config.GlobalConfiguration.makeIndexIdPool(GlobalConfiguration.java:531)
    at org.wikimedia.lsearch.config.GlobalConfiguration.read(GlobalConfiguration.java:413)
    at org.wikimedia.lsearch.config.GlobalConfiguration.readFromURL(GlobalConfiguration.java:247)
    at org.wikimedia.lsearch.config.Configuration.<init>(Configuration.java:116)
    at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
    at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)
0rz </usr/local/search/ls2-bin> #

My environment is:
0rz </usr/local/search/ls2-bin> # java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)
0rz </usr/local/search/ls2-bin> # ant -version
Apache Ant version 1.7.1 compiled on June 27 2008
0rz </usr/local/search/ls2-bin> #

LSearch Daemon Init Script for Ubuntu

 * This is just a sample. You will need to adjust this based on where you put the lucene-search directory.
#!/bin/sh -e
### BEGIN INIT INFO
# Provides:             lsearchd
# Required-Start:       $syslog
# Required-Stop:        $syslog
# Default-Start:        2 3 4 5
# Default-Stop:         1
# Short-Description:    Start the Lucene Search daemon
# Description:          Provide a Lucene Search backend for MediaWiki
### END INIT INFO

test -x /usr/local/lucene-search-2.1/lsearchd || exit 0

OPTIONS=""
if [ -f "/etc/default/lsearchd" ] ; then
    . /etc/default/lsearchd
fi

. /lib/lsb/init-functions

case "$1" in
  start)
    cd /usr/local/lucene-search-2.1
    log_begin_msg "Starting Lucene Search Daemon..."
    start-stop-daemon --start --quiet --oknodo --chdir /usr/local/lucene-search-2.1 --background --exec /usr/local/lucene-search-2.1/lsearchd -- $OPTIONS
    log_end_msg $?
    ;;
  stop)
    log_begin_msg "Stopping Lucene Search Daemon..."
    start-stop-daemon --stop --quiet --oknodo --retry 2 --chdir /usr/local/lucene-search-2.1 --exec /usr/local/lucene-search-2.1/lsearchd
    log_end_msg $?
    ;;
  restart)
    $0 stop
    sleep 1
    $0 start
    ;;
  reload|force-reload)
    log_begin_msg "Reloading Lucene Search Daemon..."
    start-stop-daemon --stop --signal 1 --chdir /usr/local/lucene-search-2.1 --exec /usr/local/lucene-search-2.1/lsearchd
    log_end_msg $?
    ;;
  status)
    status_of_proc /usr/local/lucene-search-2.1/lsearchd lsearchd && exit 0 || exit $?
    ;;
  *)
    log_success_msg "Usage: /etc/init.d/lsearchd {start|stop|restart|reload|force-reload|status}"
    exit 1
    ;;
esac

exit 0

Error when running configure
Hi, after I run "ant" to build the jar and then try to generate the configuration files, I get the error below. What's wrong?

MediaWiki: 1.15.1; Lucene: 2.1; OS: CentOS
[root@xxx lucene-search-2.1]# ./configure /var/wk/
Exception in thread "main" java.net.UnknownHostException: 00:16:3h:2d:6c:b0-hk0.localdomain: 00:16:3h:2d:6c:b0-hk0.localdomain
    at java.net.InetAddress.getLocalHost(InetAddress.java:1425)
    at org.wikimedia.lsearch.util.Configure.main(Configure.java:52)
--Alpha3 11:26, 27 August 2009 (UTC)
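The trace fails inside `InetAddress.getLocalHost()`, which throws `UnknownHostException` when the machine's own hostname cannot be resolved. A quick way to check that from the shell is sketched below; it assumes `getent` is available (it is on glibc-based systems such as CentOS) and only diagnoses, it changes nothing.

```shell
#!/bin/sh
# Diagnostic sketch for the UnknownHostException above: Java's
# InetAddress.getLocalHost() needs the machine's own hostname to resolve.
# Assumes getent is installed; nothing is modified.

# host_resolves NAME: true if NAME resolves via the system resolver
# (/etc/hosts, DNS, ... as configured in /etc/nsswitch.conf).
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# check_local_hostname: report whether the output of `hostname` resolves.
check_local_hostname() {
    name=$(hostname)
    if host_resolves "$name"; then
        echo "hostname '$name' resolves"
    else
        echo "hostname '$name' does not resolve (check /etc/hosts or DNS)" >&2
        return 1
    fi
}
```

If `check_local_hostname` reports a failure, making the name resolvable (for example with an /etc/hosts entry) is the usual next step before rerunning ./configure.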

In what context is config.inc used?
--Ans 08:22, 4 September 2009 (UTC)
 * It is used in build process "./build" --Ans 09:40, 4 September 2009 (UTC)

Install from SVN or Binary.
I would recommend SVN any day. I've been through several installations of Lucene Search this morning, and the fastest and most problem-free method was the SVN approach. It was also the *easiest* way to get Lucene to work: it just works. It also reads your MW config and produces its own *correct* configuration files.

MWSearch works just fine and dandy on top of this Lucene instance.

The 2.02 installation went badly for me, several times, and it chews a LOT more resources. I did get it working, but when it came to rebuilding indexes, it brought the whole computer down: it used 100% CPU and more than 100% of RAM, which caused a kernel panic and failures. I had to force a hardware reboot. I re-attempted several times and reconfigured the settings to test against a "should be working" configuration, and got the same results, with the machine crawling. So I gave up and went to SVN, and instead of choking on the indexes, it rebuilt them in under 20 seconds. I understand there are Java issues around this. Forget them; it's not worth breaking Java on the system, or putting up with some strange configuration, just to get Lucene working. Gooooo SVN!! :-)

BTW: The Lucene-search extension page should make it clear that the SVN installation DOES work, and works VERY well. I had previously avoided this method because I am SVN-wary; with a bit of prompting, it would have been my first choice.

Cheers, Mike

Brief period with zero search results (using update script)
Our wiki runs the "update" script every 15 minutes, and the update takes about 2 minutes. Updates are done locally on the single wiki server.

Unfortunately, for a brief time while this script runs, searches return zero results. This problem lasts just a few seconds, but our users do encounter it and become confused.

Any advice on eliminating this "zero results" period? We thought about running two different reindexing processes, each writing to a different directory, and switching between them with a symbolic link. Something like:


 * 1) At 6:00, update index #1 in /usr/local/lucene1.
 * 2) Point symbolic link /usr/local/lucene at /usr/local/lucene1.
 * 3) At 6:15, update index #2 in /usr/local/lucene2.
 * 4) Point symbolic link /usr/local/lucene at /usr/local/lucene2.
 * 5) At 6:30, update index #1 in /usr/local/lucene1.
 * 6) Point symbolic link /usr/local/lucene at /usr/local/lucene1.

But I don't see a way to make Lucene reindex one directory while serving out of another. Any better suggestions? Maiden taiwan 15:03, 23 September 2009 (UTC)
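The alternating two-directory scheme in the list above boils down to "rebuild whichever directory is not live, then repoint the link". A minimal sketch of the selection step, using the paths from the example (/usr/local/lucene, lucene1, lucene2) purely for illustration:

```shell
#!/bin/sh
# Sketch of the alternating two-directory scheme: pick the directory the live
# symlink does NOT point at, so it can be rebuilt and then switched in.
# Paths follow the example (/usr/local/lucene, lucene1, lucene2).

# idle_slot LINK SLOT_A SLOT_B: print whichever slot LINK is not pointing at
# (SLOT_A if LINK is missing or points somewhere else entirely).
idle_slot() {
    current=$(readlink "$1" 2>/dev/null || true)
    if [ "$current" = "$2" ]; then
        printf '%s\n' "$3"
    else
        printf '%s\n' "$2"
    fi
}
```

A run would then be `slot=$(idle_slot /usr/local/lucene /usr/local/lucene1 /usr/local/lucene2)`, rebuild the index in "$slot", and finally `ln -sfn "$slot" /usr/local/lucene`. This only covers the directory bookkeeping; it does not address how lsearchd itself is pointed at the link.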


 * This should not happen. Are there any errors in logs during this period? --Rainman 17:20, 23 September 2009 (UTC)


 * Yes: the update script outputs:

MediaWiki lucene-search indexer - build a map of related articles.
...
413 [main] INFO  org.wikimedia.lsearch.related.RelatedBuilder  - Rebuilding related mapping from links
416 [main] FATAL org.wikimedia.lsearch.related.RelatedBuilder  - Rebuild I/O error: no segments* file found in org.apache.lucene.store.FSDirectory@/usr/local/lucene-search-2.1/indexes/search/wikidb.links: files:
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/usr/local/lucene-search-2.1/indexes/search/wikidb.links: files:
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
    at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
    at org.wikimedia.lsearch.ranks.Links.flushForRead(Links.java:213)
    at org.wikimedia.lsearch.ranks.Links.ensureRead(Links.java:239)
    at org.wikimedia.lsearch.ranks.Links.getKeys(Links.java:773)
    at org.wikimedia.lsearch.related.RelatedBuilder.rebuildFromLinks(RelatedBuilder.java:91)
    at org.wikimedia.lsearch.related.RelatedBuilder.main(RelatedBuilder.java:72)


 * The named folder (wikidb.links) contains only symbolic links named after timestamps: 20090924121512, etc., linking to folders. Inside the folders (that exist) are files:

-rw-r--r-- 2 root root 4952161 Sep 24 12:00 _hq.cfs
-rw-r--r-- 2 root root      46 Sep 24 12:00 segments_172
-rw-r--r-- 2 root root      20 Sep 24 12:00 segments.gen


 * --Maiden taiwan 16:29, 24 September 2009 (UTC)

No, this appears to be a separate issue. In any case, the extension does use symbolic links to quickly switch between the new and the old index, and it also allows the new and the old index to coexist for a while until all the old searches finish or time out. So that shouldn't be a problem. What could be a problem is having the indexer and searcher on the same machine with insufficient RAM: the indexer then bogs down the machine, causing high I/O, which slows the searchers down to the point of searches timing out. --Rainman 08:56, 25 September 2009 (UTC)


 * Thanks. We have plenty of RAM (4 GB I believe) on a virtual machine, and while the load average does go up to about 3.0 - 4.0 during indexing, users don't perceive any slowness. That is, the search query returns quickly with zero results.  How long is the timeout? Maiden taiwan 11:49, 25 September 2009 (UTC)
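The "no segments* file found" error above means Lucene found no segments_N file directly inside indexes/search/wikidb.links, which matches the description of that folder holding only timestamp symlinks. A pre-flight check along these lines could be run before the update; it is a sketch, and the path in the usage line is only an example.

```shell
#!/bin/sh
# Sketch of a pre-flight check for the "no segments* file found" error:
# a usable Lucene index directory has a segments_N file directly inside it.
# A directory that only holds timestamp symlinks fails this check.

# has_segments DIR: true if DIR (or the directory it links to) directly
# contains at least one segments_* file.
has_segments() {
    for f in "$1"/segments_*; do
        [ -f "$f" ] && return 0
    done
    return 1
}
```

For example: `has_segments /usr/local/lucene-search-2.1/indexes/search/wikidb.links || echo "wikidb.links has no segments file"`.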

svn revision # for 2.1.2
What is the revision number for the 2.1.2 binary hosted at SourceForge?

I am having trouble getting any search results when building from the HEAD of http://svn.wikimedia.org/svnroot/mediawiki/branches/lucene-search-2.1/ The index builds fine, but when I query I get no results. Returns exception:

<tt>java.lang.IllegalArgumentException: nDocs must be > 0</tt>

Querying directly to http://localhost:8123/search/wikidb/help gives:

267
#info search=[gziebold-15624s.local], highlight=[], suggest=[gziebold-15624s.local] in 46 ms
#no suggestion
#interwiki 0 0
#results 0

However, indexing and running based on 2.1.2 binary works fine.
 * It's the svn revision of the date of release, don't know offhand, you'll have to look it up. I've built some indexes but didn't have problems with latest svn, can you provide a full stack trace? --Rainman 11:09, 10 November 2009 (UTC)

RMI registry started.
Trying config file at path /Users/gziebold/.lsearch.conf
Trying config file at path /Users/gziebold/Projects/mediawiki/lucene-search-2.1.built/lsearch.conf
0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
733  [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
737 [Thread-1] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Indexer started on port 8321
739 [Thread-2] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Searcher started on port 8123
746 [Thread-5] INFO  org.wikimedia.lsearch.search.SearcherCache  - Starting initial deployer for [wikidb, wikidb.hl, wikidb.links, wikidb.related, wikidb.spell]
818 [Thread-5] INFO  org.wikimedia.lsearch.search.SearcherCache  - Caching meta fields for wikidb ...
2522 [Thread-5] INFO  org.wikimedia.lsearch.search.SearcherCache  - Finished caching wikidb in 1705 ms
2554 [Thread-5] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable $0 bound
2562 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.hl>$0 bound
2567 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.links>$0 bound
2575 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.related>$0 bound
2582 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable<wikidb.spell>$0 bound
6879 [Thread-8] INFO org.wikimedia.lsearch.frontend.HttpMonitor  - HttpMonitor thread started
6881 [pool-2-thread-1] INFO org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/wikidb/wind?namespaces=0%2C500&offset=0&limit=20&version=2.1&iwlimit=10 what:search dbname:wikidb term:wind
6919 [pool-2-thread-1] INFO org.wikimedia.lsearch.analyzers.StopWords  - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 21 ms
7052 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  - Using FilterWrapper wrap: {0, 500} []
java.lang.IllegalArgumentException: nDocs must be > 0
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at org.wikimedia.lsearch.search.WikiSearcher.search(WikiSearcher.java:184)
    at org.apache.lucene.search.Searcher.search(Searcher.java:132)
    at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:722)
    at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
    at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
    at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
    at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)
7076 [pool-2-thread-1] WARN org.wikimedia.lsearch.search.SearchEngine  - Retry, temporal error for query: [wind] on wikidb : nDocs must be > 0
java.lang.IllegalArgumentException: nDocs must be > 0
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
    at org.wikimedia.lsearch.search.WikiSearcher.search(WikiSearcher.java:184)
    at org.apache.lucene.search.Searcher.search(Searcher.java:132)
    at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:722)
    at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
    at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
    at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
    at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)

I also confirmed that the index built from the HEAD is fine. If I swap the LuceneSearch.jar from 2.1.2 binary and run it against indexes built from HEAD, it works. Processing query with LuceneSearch.jar from HEAD fails.

Other details: MediaWiki 1.15.1 (r50) MWSearch (Version r45173)

-- Looks like r48153 == 2.1.2, or at least I was able to get that to work successfully with MediaWiki 1.15.1 --GregZ 04:06, 11 November 2009 (UTC)


 * Ah I see.. fixed in svn. Thanks for the report. --Rainman 11:23, 11 November 2009 (UTC)
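When testing the daemon directly, as with the http://localhost:8123/search/wikidb/help query earlier in this thread, it can help to pull the result count out of the raw response. The sketch below assumes the raw response marks its header lines with a leading `#` (for example `#results 0`); host, port and database name are taken from the example, and the search term must already be URL-encoded.

```shell
#!/bin/sh
# Sketch for querying the lsearchd HTTP frontend directly and extracting the
# result count. Assumes header lines in the raw response start with '#'
# (e.g. "#results 0"); host/port/db follow the example in this thread.

# results_count: read a raw search response on stdin and print N from the
# "#results N" line (prints nothing if there is no such line).
results_count() {
    sed -n 's/^#results \([0-9][0-9]*\).*/\1/p'
}

# query_lsearchd HOST:PORT DB TERM: fetch the raw response with curl.
# TERM must already be URL-encoded.
query_lsearchd() {
    curl -s "http://$1/search/$2/$3"
}
```

Usage: `query_lsearchd localhost:8123 wikidb help | results_count`.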

Category search
Can anyone elaborate on the following from OVERVIEW.txt?

searching categories. Syntax is: query incategory:"exact category name". It is important to note that category names are themselves not tokenized. Using logical operators, intersection, union and difference of categories can be searched. Since exact category is needed (only case is not important), it is maybe best to incorporate this somewhere on category page, and have category name put into query by MediaWiki instead manually by user.

The incategory: syntax does not appear to work as described (in 2.1.2)

Also, what is meant by ...it is maybe best to incorporate this somewhere on category page, and have category name put into query by MediaWiki instead manually by user. ? Is the suggestion to put a search form on the Category page and insert the incategory: syntax there?


 * It does work, but only for categories that are not added via templates, but in main article text. E.g. . --Rainman 11:02, 10 November 2009 (UTC)

Ah ha. That explains why my incategory: query was not working: the categories were added via templates. Is this because lucene-search indexes the wikitext of articles and does not expand templates? Has there been discussion of indexing the rendered HTML content instead of only the wikitext?


 * Yes, however, the current mediawiki architecture makes it difficult to do... in fact, what we would want is article not in html, but in wikitext with expanded templates... In any case, you won't find a search extension that does it. --Rainman 15:51, 10 November 2009 (UTC)


 * How about this: keep indexing wikitext, but access the MediaWiki database to determine category relationships. Maiden taiwan 18:30, 16 December 2009 (UTC)


 * On default mysql install that won't scale very well, otherwise it would be implemented a long time ago. --Rainman 22:31, 16 December 2009 (UTC)


 * Thanks. Can you explain why it wouldn't scale? Couldn't it be done while Lucene builds the index -- just read the Mediawiki database tables once and you're done?  Or maybe it could be dynamic like DPL does: it checks category membership dynamically just fine. Finally, even if it's slow, could you make this behavior an option and let the sysadmin decide whether it works on his/her site?  Right now "incategory" produces different results than Mediawiki Category pages do. That seems like a bug.... Thank you. --Maiden taiwan 01:06, 17 December 2009 (UTC)

Also, can you comment on this ...it is maybe best to incorporate this somewhere on category page, and have category name put into query by MediaWiki instead manually by user. ? Is the suggestion to put a search form on the Category page and insert the incategory: syntax there?


 * Well yes... if someone would have done it that would be nice.. You need to take into consideration that the file has probably been written at 2am on some sunday, and not take everything in it very seriously ;) --Rainman 15:51, 10 November 2009 (UTC)

I simply needed the "late-night fog" translation. :) --GregZ 04:03, 11 November 2009 (UTC)

Is incategory going to be fixed?
Is the problem of incategory and transcluded category tags planned to be fixed? If incategory is returning wrong results (missing all articles with transcluded category tags), people are going to see incomplete search results and make wrong decisions ("Well, I guess there are no articles that match my query..."). This just happened on our wiki. Can't the extension just look at the wiki database to discover category relationships? Thanks. Maiden taiwan 18:02, 16 December 2009 (UTC)


 * No, there are no plans to fix it. Lucene-search is primarily developed towards needs of WMF, and because this would never get enabled on WMF projects due to inefficiency it is not planned to be implemented. However, this is an open-source project and if you need this functionality you can make it yourself or pay someone to do it. --Rainman 12:52, 19 December 2009 (UTC)

= 2010 =

Build function doesn't submit an IP address
We are using an IP-based authentication system for our wiki. Unfortunately, the Lucene search engine doesn't submit an IP address when it creates the index.

wikidb is being deployed?
Hi, I want to ask a question about Lucene-search. I received an error message when I tried to access http://localhost:8123/search/wikidb/test. It said "wikidb is being deployed or is not searched by this host". The full error message is listed below:
7754 [pool-2-thread-2] ERROR org.wikimedia.lsearch.search.SearchEngine - Internal error in SearchEngine trying to make WikiSearcher: wikidb is being deployed or is not searched by this host
java.lang.RuntimeException: wikidb is being deployed or is not searched by this host
    at org.wikimedia.lsearch.search.SearcherCache.getLocalSearcher(SearcherCache.java:369)
    at org.wikimedia.lsearch.search.WikiSearcher.<init>(WikiSearcher.java:96)
    at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:686)
    at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:115)
    at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:92)
    at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
    at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Thanks. --PhiLiP 10:41, 6 January 2010 (UTC)


 * I think you should concentrate on "is not searched by this host" part.. check your hostnames in lsearch-global.conf --Rainman 11:32, 6 January 2010 (UTC)


 * I checked: my hostname is "philip-ubuntu-pc", and the hostnames in lsearch-global.conf are also "philip-ubuntu-pc". The $wgDBname in my LocalSettings.php is "wikidb". My MediaWiki otherwise works normally; only Special:Search fails.

// lsearch-global.conf ...

[Search-Group]
philip-ubuntu-pc : *

[Index]
philip-ubuntu-pc : *

...
 * --PhiLiP 11:41, 6 January 2010 (UTC)
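Following the suggestion to check the hostnames in lsearch-global.conf, a small check can confirm that a given host is listed in the [Search-Group] section. This is a sketch that assumes one `host : pattern` entry per line and section headers on their own lines, as in the excerpt above.

```shell
#!/bin/sh
# Sketch: confirm that a hostname is listed in the [Search-Group] section of
# lsearch-global.conf. Assumes "host : pattern" entries, one per line, and
# section headers such as [Index] on their own lines.

# host_in_search_group CONF HOST: true if HOST appears as "HOST :" inside the
# [Search-Group] section (up to the next [Section] header).
host_in_search_group() {
    awk -v host="$2" '
        /^\[/ { in_group = ($0 ~ /^\[Search-Group\]/); next }
        in_group {
            split($0, parts, ":")
            name = parts[1]
            gsub(/[ \t]+/, "", name)
            if (name == host) found = 1
        }
        END { if (found) exit 0; exit 1 }
    ' "$1"
}
```

Typical use: `host_in_search_group lsearch-global.conf "$(hostname)" || echo "this host is not in [Search-Group]"`.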

Seems I've found my way out. Forgot to execute. How stupid I am... --PhiLiP 17:44, 6 January 2010 (UTC)
 * Well that is not your fault, the error message should have suggested it or detected it... /me makes mental note --Rainman 18:07, 7 January 2010 (UTC)
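To rule out the "is not searched by this host" cause Rainman points to, one can compare the machine's hostname with the host names listed under [Search-Group] and [Index]. The class below is a minimal sketch, not part of lucene-search; GlobalConfHostCheck and hostsInSection are hypothetical names, and it assumes the plain `host : pattern` layout shown above, with one entry per line under each section header. Java's idea of the local hostname may also differ from what the `hostname` command prints (short vs. fully qualified name), so treat a mismatch as a hint rather than proof.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of lucene-search): reports whether this machine's
// hostname appears in the [Search-Group] and [Index] sections of lsearch-global.conf.
// Assumes one "host : pattern" entry per line under each [Section] header.
public class GlobalConfHostCheck {

    // Returns the host names listed under the given section, in file order.
    public static List<String> hostsInSection(List<String> lines, String section) {
        List<String> hosts = new ArrayList<>();
        boolean inSection = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // blank line or comment
            if (line.startsWith("[")) {                             // start of a new section
                inSection = line.equals("[" + section + "]");
                continue;
            }
            int colon = line.indexOf(':');
            if (inSection && colon > 0) {                           // " : value" lines name no host
                hosts.add(line.substring(0, colon).trim());
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        List<String> conf = Files.readAllLines(Paths.get(args[0])); // path to lsearch-global.conf
        String me = InetAddress.getLocalHost().getHostName();
        for (String section : new String[] {"Search-Group", "Index"}) {
            List<String> hosts = hostsInSection(conf, section);
            String verdict = hosts.contains(me) ? "includes" : "does NOT include";
            System.out.println("[" + section + "] " + hosts + " " + verdict + " " + me);
        }
    }
}
```

Run it as `java GlobalConfHostCheck /path/to/lsearch-global.conf`. As this thread shows, a matching hostname does not guarantee the index exists; the index still has to be built.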

Initial error when running "./configure", any suggestions? I definitely need help; I've been trying to install this for years ;-).
$:/lucene-search-2.1 # ./configure /srv/www/vhosts/mySubDomain.com/subdomains/sysdoc/httpdocs/
Exception in thread "main" java.io.IOException: Cannot run program "/bin/bash": java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at java.lang.Runtime.exec(Unknown Source)
at org.wikimedia.lsearch.util.Command.exec(Command.java:41)
at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:84)
at org.wikimedia.lsearch.util.Configure.main(Configure.java:49)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
at java.lang.UNIXProcess.<init>(Unknown Source)
at java.lang.ProcessImpl.start(Unknown Source)
... 6 more

I tried it from another directory too:

Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/util/Configure
Caused by: java.lang.ClassNotFoundException: org.wikimedia.lsearch.util.Configure
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
Could not find the main class: org.wikimedia.lsearch.util.Configure. Program will exit.

What can I do???

I've got this from the repository (binary):

configure.inc:
dbname=wikilucene
wgScriptPath=/wiki/phase3
hostname=oblak
indexes=/opt/lucene-search/indexes
mediawiki=/var/www/wiki/phase3
base=/opt/lucene-search
wgServer=http://localhost

This looks like someone's private data. What does it mean/do?

Why is this extension so badly maintained?


 * My first guess would be to google that IOException error you are getting and make sure that you have enough memory etc on your hosting plan. The "private" data is some defaults. Please note that adopting an insulting tone towards developers will not help answer your questions. This is an open-source project developed purely in after-hours and free time and as such is not as polished for commercial use. --Rainman 11:20, 11 January 2010 (UTC)

Sorry for that tone, but it is very, very frustrating: I have really been trying to get this installed for years, on different servers. I'm not a Java guru but a web developer with 10 years of experience, and an open-source developer too. I had already googled that exception and got no answer at all. Any idea why it says it cannot run the bash program? Do I have to set a PATH variable or something else? What are the RAM requirements?


 * From what I gathered, this error appears either when /tmp is full, when there is not enough RAM to create a process, or when there is some kind of limit on the number of processes one can have on shared hosting. In theory, for a smaller wiki 128MB of available RAM should be enough, but lucene-search hasn't been built or optimized to run on scarce resources; on the contrary, it is optimized to make use of large amounts of memory and multiple CPU cores to run most efficiently under heavy load. --Rainman 21:12, 11 January 2010 (UTC)

Is there a standard command to determine the process limit on a SUSE vServer? I found /etc/security/limits.conf but there is nothing in there. My /tmp directory and RAM seem to be OK.
 * It is best to contact your hosting support on this one. --Rainman 10:13, 12 January 2010 (UTC)

I am getting the same error. I have BlueHost (shared hosting). I guess I'll have to use one of the other search extensions. Tisane 06:23, 28 March 2010 (UTC)

Can't search any string/keyword with dot (".")
My Lucene-search 2.1 is working well, except when searching for keywords with a dot (.) in the middle.

For example, I am able to search for "javadoc" and get results for any combination containing it, including "javadoc.abc".

However, if I search for "javadoc.abc" directly, I get nothing.

Any ideas are greatly appreciated.


 * There seems to be a bug currently with parsing phrases with stuff like dots in them, because dots are handled specially in the index. Should work if you drop the quotes and bring up the most relevant result. --Rainman 12:51, 21 January 2010 (UTC)


 * Thank you Rainman for your info, but what do you mean "drop the quotes and bring up the most relevant result"? I am not using any quotes while doing search. Again, if I search for javadoc, I can get all results; if I search for javadoc.abc, I get No page text matches even though there are plenty of pages containing javadoc.abc.


 * Are you sure you're using the lucene-search backend? Do the searches you make come up in the lucene-search logs? --Rainman 14:43, 23 January 2010 (UTC)


 * I am sure I am using the lucene-search 2.1 backend, and am using MWSearch to fetch results. Here is the log from searching javadoc.abc:

Fetching search data from http://192.168.1.20:8123/search/mediawiki/javadocu82eabcu800?namespaces=0&offset=0&limit=20&version=2.1&iwlimit=10&searchall=0
Http::request: GET http://192.168.1.20:8123/search/mediawiki/javadocu82eabcu800?namespaces=0&offset=0&limit=20&version=2.1&iwlimit=10&searchall=0
total [0] hits
 * --Ross Xu 18:52, 5 February 2010 (UTC)

I'm seeing the same issue, a search for 5.5.3 shows results with the standard search engine, and lucene returns nothing. The same search on wikipedia (different content of course) returns results, so it feels like I'm missing something. You can see the query hit lucene in the log:

4051499 [pool-2-thread-9] INFO org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/wikidb/5u800u82e5u800u82e3u800?namespaces=0&offset=0&limit=20&version=2&iwlimit=10&searchall=0 what:search dbname:wikidb term:5u800u82e5u800u82e3u800
4051501 [pool-2-thread-9] INFO org.wikimedia.lsearch.search.SearchEngine  - Using FilterWrapper wrap: {0} []
4051504 [pool-2-thread-9] INFO org.wikimedia.lsearch.search.SearchEngine  - search wikidb: query=[5u800u82e5u800u82e3u800] parsed=[custom(+(+contents:5^0.2 +contents:u800u82e5u800u82e3u800^0.2) relevance ([((P contents:"5 u800u82e5u800u82e3u800"~100) (((P sections:"5") (P sections:"u800u82e5u800u82e3u800") (P sections:"5 u800u82e5u800u82e3u800"))^0.25))^2.0], ((P alttitle:"5 u800u82e5u800u82e3u800"~20^2.5) (P alttitle:"5"^2.5) (P alttitle:"u800u82e5u800u82e3u800"^2.5)) ((P related:"5 u800u82e5u800u82e3u800"^12.0) (P related:"u800u82e5u800u82e3u800"^12.0) (P related:"5"^12.0))) (P alttitle:"5 u800u82e5u800u82e3u800"~20))] hit=[0] in 2ms using IndexSearcherMul:1264699369439

I'm using MediaWiki 1.15.1, lucene-search 2.1 r61642, and MWSearch r62451 (with a small patch to make it work on 1.15.1).

--Nivfreak 19:18, 16 February 2010 (UTC)


 * You need to download the appropriate MWSearch version for your MediaWiki using the "Download snapshot" link on the MWSearch page. Your patch doesn't seem to resolve all the compatibility issues. --Rainman 03:55, 17 February 2010 (UTC)


 * You are absolutely right, and I knew better. I'm not even sure why I moved to the trunk version anymore. That solved my problems. Sorry for wasting your time. Nivfreak 18:38, 17 February 2010 (UTC)
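One way to answer Rainman's question independently of MWSearch is to query the daemon directly with a properly encoded term and compare the result with what MWSearch sends. The sketch below is an assumption-laden helper (DirectDaemonQuery is a hypothetical name): it copies the URL layout and parameters from the MWSearch debug log above, assumes the daemon listens on port 8123 as in these examples, and assumes it accepts a standard percent-encoded term in the path.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: query a lucene-search 2.1 daemon directly, bypassing MWSearch, to see
// whether a term such as "javadoc.abc" reaches it intact and returns hits.
public class DirectDaemonQuery {

    // Builds a URL shaped like the ones in the MWSearch debug log.
    // Assumption: the daemon decodes a percent-encoded term in the path.
    public static String buildSearchUrl(String host, int port, String dbname, String term) {
        String encoded = URLEncoder.encode(term, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder writes spaces as '+', use %20 in a path
        return "http://" + host + ":" + port + "/search/" + dbname + "/" + encoded
                + "?namespaces=0&offset=0&limit=20&version=2.1&iwlimit=10&searchall=0";
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DirectDaemonQuery <host> <dbname> <term>
        String url = buildSearchUrl(args[0], 8123, args[1], args[2]);
        System.out.println("GET " + url);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw daemon response, printed unparsed
            }
        }
    }
}
```

For example, `java DirectDaemonQuery 192.168.1.20 mediawiki javadoc.abc` prints the request and the raw response. If the direct query finds hits while the wiki does not, the problem is on the MWSearch side, as it turned out to be for Nivfreak.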

Red highlight
How can the red highlight in search results be changed to match Wikipedia's way of working? --Robinson Weijman 08:51, 22 January 2010 (UTC)


 * Set the searchmatch CSS style in your Common.css, but before that check you have the latest MWSearch, as far as I remember we haven't been using red for a while now.. --Rainman 11:40, 22 January 2010 (UTC)


 * Thank you for the prompt response! Our MWSearch is r36482 (we have MW 1.13.2). I see that the current MWSearch is r37906 - I'll give that a try. --Robinson Weijman 09:27, 25 January 2010 (UTC)


 * Well, it took a while but I tried an upgrade - no change. I could not find searchmatch CSS style in Common.css.  Any ideas?  --Robinson Weijman 10:07, 11 March 2010 (UTC)


 * Finally solved - it was in the skin's main.css file:

span.searchmatch { color: blue; }
--Robinson weijman 13:50, 19 January 2011 (UTC)

Getting search results
Hi - how can I see what people are searching for? And how can I work out how good the searching is e.g. % hits (page matches) / searches? --Robinson Weijman 14:56, 25 January 2010 (UTC)

How to read lsearchd results

 * Alright, I've figured out how to do that (put the lsearchd results in a file) and what the problem was (too many search daemons running simultaneously!). But now I'm confronted with a new problem: how to read those results. Is it documented anywhere? --Robinson Weijman 15:58, 27 January 2010 (UTC)
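For the question of how good the searching is, a crude measure is the share of logged searches that return at least one hit. The sketch below assumes lsearchd's console output has been saved to a file and that search lines look like the SearchEngine entries quoted elsewhere on this page (`search wikidb: query=[...] parsed=[...] hit=[0] in 2ms ...`). The class name is hypothetical, and the log format may differ between lucene-search versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough search-quality statistic from a saved lsearchd log: how many searches
// returned at least one hit. Assumes SearchEngine lines shaped like
//   ... SearchEngine  - search wikidb: query=[...] parsed=[...] hit=[0] in 2ms ...
public class SearchLogStats {

    private static final Pattern SEARCH_LINE =
            Pattern.compile("search (\\S+): query=\\[.*hit=\\[(\\d+)\\]");

    // Returns {totalSearches, searchesWithHits}.
    public static int[] count(List<String> lines) {
        int total = 0, withHits = 0;
        for (String line : lines) {
            Matcher m = SEARCH_LINE.matcher(line);
            if (!m.find()) continue;                      // HttpHandler lines etc. are skipped
            total++;
            if (Integer.parseInt(m.group(2)) > 0) withHits++;
        }
        return new int[] {total, withHits};
    }

    public static void main(String[] args) throws IOException {
        int[] c = count(Files.readAllLines(Paths.get(args[0])));
        double pct = c[0] == 0 ? 0 : 100.0 * c[1] / c[0];
        System.out.printf("%d searches, %d with hits (%.1f%%)%n", c[0], c[1], pct);
    }
}
```

The same pattern could be extended to collect the query text of zero-hit searches, which is often the more useful list for finding what people search for but do not find.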

Case insensitive?
Hi - how can lucene search be made case insensitive? --Robinson Weijman 10:07, 4 February 2010 (UTC)
 * My mistake, it is case insensitive. What I meant to ask was:

Wildcards
Can wildcards be added by default to a search? --Robinson Weijman 10:41, 4 February 2010 (UTC)


 * Please test first, then ask. Wildcards do work, although they are limited to cases which won't kill the servers (e.g. you cannot do *a*). --Rainman 13:21, 4 February 2010 (UTC)


 * Your statement implies that I did not test first. Of course I did. Perhaps I was unclear: what I meant to ask was whether the DEFAULT can be a wildcard search, e.g. if I search for "Exe", the search is run as "*Exe*" so that it finds "Exeter". So I'm sorry I was unclear. Please don't make assumptions about your customers - I don't appreciate it. --Robinson Weijman 08:27, 5 February 2010 (UTC)


 * You are not my customer, I'm just a random person like you. In any case, this does not work and will not work: this kind of wildcard makes search unacceptably slow for all but sites with very few pages (and lucene-search is designed for big sites). If this doesn't suit your needs, you can either hack it yourself or pay someone to do it. --Rainman 01:23, 8 February 2010 (UTC)


 * Thanks for the info.--Robinson Weijman 08:24, 8 February 2010 (UTC)
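Why leading wildcards are restricted can be illustrated with a sorted term list, which is roughly how a Lucene term dictionary is ordered. This is a simplified illustration, not lucene-search code, and WildcardCost is a hypothetical name: a prefix such as "exe*" maps to one contiguous range of terms, while "*exe*" forces a check of every term, so its cost grows with the size of the whole dictionary rather than with the number of matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustration (not lucene-search code): why "exe*" is cheap but "*exe*" is not.
public class WildcardCost {

    // "prefix*": the sorted dictionary is cut down to the range [prefix, prefix + '\uffff').
    public static List<String> prefixMatches(NavigableSet<String> dict, String prefix) {
        return new ArrayList<>(dict.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    // "*infix*": the ordering does not help, so every term has to be tested.
    // Returns how many terms were examined; matches are appended to out.
    public static int infixTermsExamined(NavigableSet<String> dict, String infix, List<String> out) {
        int examined = 0;
        for (String term : dict) {
            examined++;
            if (term.contains(infix)) out.add(term);
        }
        return examined;
    }

    public static void main(String[] args) {
        // Terms are lowercased here, as an index would typically store them.
        NavigableSet<String> dict = new TreeSet<>(
                Arrays.asList("apple", "exe", "exeter", "exeunt", "index", "texel"));
        System.out.println("exe*  -> " + prefixMatches(dict, "exe"));
        List<String> hits = new ArrayList<>();
        int examined = infixTermsExamined(dict, "exe", hits);
        System.out.println("*exe* -> " + hits + " after examining " + examined + " of " + dict.size() + " terms");
    }
}
```

With six terms the difference is invisible, but a large wiki's dictionary can hold millions of terms, and the infix loop has to visit all of them for every such query.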

Fatal error with searching anything with colon
I am using Lucene-Search 2.1 and MWsearch.

Whenever I search for any keyword with a colon (e.g. searching for "ha:"), I get this fatal error: Fatal error: Call to undefined method Language::getNamespaceAliases in /var/www/html/wiki/extensions/MWSearch/MWSearch_body.php on line 96

It's the same thing with searching anything like "all:something" and "main:something".

Any ideas are appreciated. --Ross Xu 20:10, 10 February 2010 (UTC)

Using 2.1 and "Did You Mean" not appearing
Hi all - as per the title. When I run a search, the "Did you mean" functionality does not appear to be working/showing. Do I need any special configuration to get this to work, or should it just work?

Any ideas are appreciated. --Barramya 09:06, 17 February 2010 (UTC)
 * I am having the same issue right now and I can't fix it... Did you double-check that the line "$wgLuceneSearchVersion = 2.1;" in LocalSettings.php is NOT uncommented? I am looking forward to a solution. Thank you. Roemer2201 20:16, 19 April 2010 (UTC)


 * This line should be uncommented (as the instructions say), and no other special settings are needed. Verify that:
 * you have matching MediaWiki and MWSearch versions
 * the searches actually reach the lucene-search daemon - you should be able to see them in the console log you get when you start ./lsearchd
 * that the lucene-search daemon has started without an error, especially the .spell index (also in the console log)
 * --Rainman 15:22, 20 April 2010 (UTC)

lsearchd doesn't start properly
After a server restart, my lsearchd (version 2.1) stopped working:

# ./lsearchd
Trying config file at path /root/.lsearch.conf
Trying config file at path /vol/sites/jewage.org/search/ls21/lsearch.conf
log4j: Parsing for [root] with value=[INFO, A1].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "A1".
log4j: Parsing layout options for "A1".
log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
log4j: End of parsing for "A1".
log4j: Parsed "A1" options.
log4j: Finished configuring.
0   [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound

After this it just hangs.

Why would this happen?

Java looks normal:
# java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)

Here is my config:
# cat lsearch-global.conf
# Global search cluster layout configuration
[Database]
jewage : (single) (spell,4,2)
# (language,en)
[Search-Group]
jewage.org : *
[Index]
jewage.org : *
[Index-Path]
 : /search
[OAI]
 : http://localhost/w/index.php
[Namespace-Boost]
 : (0,2) (1,0.5) (110,5)
[Namespace-Prefix]
all :

I tried jewage.org : (single) (spell,4,2), without effect. --Eugenem 20:02, 17 February 2010 (UTC)

OK, I needed to change all host addresses (in config.inc and lsearch-global.conf) and rebuild the index. --Eugenem 16:22, 18 February 2010 (UTC)

searching plurals returns result
For example, searching for vpn returns the correct set of results with a short description, but searching for vpns returns the same results with only the title. Is this correct? I thought the search engine did an exact match. Is there a way to change this?

Is there a way to search the raw wikitext?
Is there an option that will search the raw wikitext? For example, suppose I want to find all pages that use the cite extension tag. Is there a way to specify "return a list of pages using the tag "? I tried searching for on Wikipedia, but all I got were references to "ref". Dnessett 18:35, 25 February 2010 (UTC)
 * nope. --Rainman 23:38, 25 February 2010 (UTC)
 * ...but Extension:Replace_Text will do this. Andthepharaohs 07:20, 27 April 2010 (UTC)

External Searches
Is there a way to include external (to the wiki) databases in the search results? By "external database", I mean something like a content management system containing office / PDF documents. Ideally I'd like to see:
 * 1) a list of results within the wiki and then
 * 2) underneath, one or more results for each external database.
--Robinson Weijman 11:28, 4 March 2010 (UTC)
 * No, unless you write a plugin for MediaWiki that queries the external database and then shows the results on the search page. --Rainman 15:42, 4 March 2010 (UTC)


 * Oops, I missed this reply. Thanks.  --Robinson Weijman 10:30, 19 March 2010 (UTC)

Meta Tags
Can Lucene work with meta tags, e.g. Extension:MetaKeywordsTag? That is, can pages with those tags appear higher in searches for those tags? --Robinson Weijman 12:09, 17 March 2010 (UTC)


 * No. --Rainman 13:42, 17 March 2010 (UTC)


 * Is it a good idea to add it? --Robinson Weijman 08:13, 18 March 2010 (UTC)

Ranking Suggestion
Are there any plans to bring out a new Lucene version? I'd like to see functionality to dynamically change the search results based on previous hits and clicks (like Google), or to let users report "this was a useful / useless link", e.g. by clicking on an up or down arrow. --Robinson Weijman 12:11, 17 March 2010 (UTC)


 * This is very unlikely. --Rainman 13:42, 17 March 2010 (UTC)


 * Why is it unlikely? Is nobody continuing to develop this extension?  Wouldn't this be an improvement?  --Robinson Weijman 08:15, 18 March 2010 (UTC)


 * I am doing some maintenance in my free time, but there is no-one working full time, and no future major changes are currently planned. Of course, there are million things that would be good to have, but as I said, they are probably not happening unless someone else does them. --Rainman 17:32, 18 March 2010 (UTC)


 * OK thanks for the info. Let's hope someone steps forward then.  --Robinson Weijman 10:28, 19 March 2010 (UTC)

Searches with commas
We're trying to use Mediawiki and Lucene search but searches for phrases that have commas don't seem to work. Wikipedia demonstrates the problem too: Tom Crean (explorer) contains the sentence “In 1901, while serving on HMS Ringarooma in New Zealand, he volunteered to join Scott's 1901–04 British National Antarctic Expedition on Discovery, thus beginning his exploring career.” Searching for fragments of this sentence that don't involve commas—for example,  or  —turns up the page easily. But if you search for a fragment with a comma—for example,  or  —there are no matches. Taking the comma out of the search query doesn't help: there are still no matches.

Is this a bug? If it is intended, is there a way to disable it so that comma phrases can be found? Our intended use case, the OEIS, is all about comma-separated search strings. Thanks. --Russ Cox 05:50, 18 March 2010 (UTC)


 * yes, unfortunately it is a (known) bug. --Rainman 17:18, 18 March 2010 (UTC)


 * If I wanted to build a customized version without the bug, where should I be looking for it? I'd be happy to try to track it down, fix it, and send the change back, but I don't know where to start.  Thanks again.  --Russ Cox 18:30, 18 March 2010 (UTC)


 * The bug is a byproduct of an undertested "feature" and is actually very easy to fix, but it needs a complete index rebuild. You need to go to FastWikiTokenizerEngine.java and change MINOR_GAP = 2; to MINOR_GAP = 1; --Rainman 21:12, 18 March 2010 (UTC)
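Why a position gap after commas can break phrase search is easiest to see in a small model. This is an assumption about the mechanism, not the FastWikiTokenizerEngine code, and PhraseGapDemo is a hypothetical name: each word gets a position, the word after a comma advances by `minorGap` instead of 1, and an exact phrase query requires the query words at consecutive positions. With a gap of 2 the phrase across the comma no longer lines up, whether or not the query contains the comma, which matches the behaviour reported above; with 1 it does. Any real change still needs the complete index rebuild Rainman mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model (an assumption, not lucene-search's tokenizer) of how a
// position gap after commas makes an exact phrase across the comma unmatchable.
public class PhraseGapDemo {

    public static final class Token {
        public final String word;
        public final int position;
        Token(String word, int position) { this.word = word; this.position = position; }
    }

    // Splits on whitespace, strips punctuation, and assigns positions; the word
    // following a comma advances by minorGap instead of 1.
    public static List<Token> tokenize(String text, int minorGap) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        boolean afterComma = false;
        for (String raw : text.trim().split("\\s+")) {
            String word = raw.replaceAll("[^\\p{L}\\p{N}']", "").toLowerCase(Locale.ROOT);
            if (word.isEmpty()) {              // e.g. a stray "," token: keep its gap effect
                afterComma |= raw.endsWith(",");
                continue;
            }
            position += afterComma ? minorGap : 1;
            tokens.add(new Token(word, position));
            afterComma = raw.endsWith(",");
        }
        return tokens;
    }

    // Exact phrase match: query words (commas dropped) must sit at consecutive positions.
    public static boolean phraseMatches(String text, String phrase, int minorGap) {
        List<Token> doc = tokenize(text, minorGap);
        List<Token> query = tokenize(phrase.replace(",", ""), 1);
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < query.size() && ok; k++) {
                Token t = doc.get(start + k);
                ok = t.word.equals(query.get(k).word) && t.position == doc.get(start).position + k;
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "In 1901, while serving on HMS Ringarooma in New Zealand, he volunteered";
        System.out.println("MINOR_GAP=2: " + phraseMatches(s, "1901, while serving", 2)); // false
        System.out.println("MINOR_GAP=1: " + phraseMatches(s, "1901, while serving", 1)); // true
    }
}
```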

New version? 2.1.2 -> 2.1.3
MarkAHershberger has updated the version number. Is there a new release then? --Robinson Weijman 08:10, 22 March 2010 (UTC)

Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/util/Configure
Caused by: java.lang.ClassNotFoundException: org.wikimedia.lsearch.util.Configure
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
Could not find the main class: org.wikimedia.lsearch.util.Configure. Program will exit.

I did release a new version, but if you've found problems with it, please file a bug at bugzilla and assign it to me. --MarkAHershberger 23:31, 30 March 2010 (UTC)


 * Thanks. So can you provide a brief summary of the changes or a link to the release notes? --Robinson Weijman 06:47, 6 April 2010 (UTC)

Sort results by last modified date?
Is it possible to make Lucene sort its results by the last-modification date of the article? Even better, can this be exposed as an option for the user? Maiden taiwan 19:48, 7 April 2010 (UTC)

How to run as CronJob
Hello there, I am new here and I have problems running (build) and (lsearchd) as cron jobs. I have Ubuntu 10.04 and MediaWiki (1.15.3) with SMW (1.4.3) and SMW+/SMWHalo (1.4.6) on a virtual system. There were some problems running (configure) and (build), but after typing "sudo ..." everything was OK. I also installed the mentioned init script for Ubuntu (3.14 LSearch Daemon Init Script for Ubuntu) and it runs well.

My problem is getting everything to run as a cron job. I got some help by googling for "crontab -e" and "sudo crontab -e" (for admin) or "gnome-schedule" as a GUI. I tried adding the path to the search engine to PATH in the crontab. This, or "export PATH=$PATH:(your path to the program folder)" in the shell, helped to start "lsearchd" @reboot. When I tried to start "build" to reindex my wiki, there were problems finding the file "config.inc". So I added "$(dirname $0)" in front of "config.inc" and "LuceneSearch.jar". This solves the problem of running the script (build) from another directory:

#!/bin/bash
# Resolve the script's own directory once via $(dirname $0), so the later "cd" commands cannot break a relative path
scriptdir=$(cd "$(dirname "$0")" && pwd)
source "$scriptdir/config.inc"
if [ -n "$1" ]; then
  dumpfile="$1"
else
  dumps="$base/dumps"
  [ -e $dumps ] || mkdir $dumps
  dumpfile="$dumps/dump-$dbname.xml"
  timestamp=`date -u +%Y-%m-%d`
  slave=`php $mediawiki/maintenance/getSlaveServer.php \
    $dbname \
    --conf $mediawiki/LocalSettings.php \
    --aconf $mediawiki/AdminSettings.php`
  echo "Dumping $dbname..."
  cd $mediawiki && php maintenance/dumpBackup.php \
    $dbname \
    --conf $mediawiki/LocalSettings.php \
    --aconf $mediawiki/AdminSettings.php \
    --current \
    --server=$slave > $dumpfile
  [ -e $indexes/status ] || mkdir -p $indexes/status
  echo "timestamp=$timestamp" > $indexes/status/$dbname
fi
cd $base && java -cp "$scriptdir/LuceneSearch.jar" org.wikimedia.lsearch.importer.BuildAll $dumpfile $dbname

Now I will try to start these scripts from the system-wide crontab (/etc/crontab), because "build" requires admin rights. In the system-wide crontab it is possible to run a script as root. I hope that's it!

Greetz Benor 16:55, 5 May 2010 (UTC)

Problem with build script (and solution)
I had issues running the build script. The environment setup in config.inc was completely wrong. It turns out the configure script runs maintenance/eval.php, which dumps the contents of AdminSettings.php. This interfered with the configure script. The solution is to put the contents of AdminSettings.php into LocalSettings.php and move or delete AdminSettings.php. Then re-running configure and build should work fine.

My versions: Ubuntu 10.04 x64, MediaWiki 1.16wmf4 (r66614), PHP 5.3.2-1ubuntu4.1 (apache2handler), MySQL 5.1.41-3ubuntu12 --Spt5007 23:06, 19 May 2010 (UTC)
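Because a polluted config.inc is hard to spot by eye, a quick check can flag it before running build. The sketch below is hypothetical (not shipped with lucene-search): it assumes config.inc holds one shell-style `key=value` assignment per line, and it treats the keys from the sample config.inc quoted earlier on this page (dbname, wgScriptPath, hostname, indexes, mediawiki, base, wgServer) as required. The "suspicious value" rule is a heuristic: stray PHP output tends to bring markup or spaces, which a plain path or name should not contain.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sanity check for config.inc (not part of lucene-search).
// Assumes one key=value per line, as sourced by the build script.
public class ConfigIncCheck {

    static final String[] EXPECTED_KEYS =
            {"dbname", "wgScriptPath", "hostname", "indexes", "mediawiki", "base", "wgServer"};

    public static List<String> problems(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Map<String, String> vars = new LinkedHashMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue; // blank line or shell comment
            int eq = t.indexOf('=');
            if (eq <= 0) { problems.add("not key=value: " + t); continue; }
            String key = t.substring(0, eq).trim();
            String value = t.substring(eq + 1).trim();
            // Stray PHP output (e.g. from AdminSettings.php) tends to bring markup or spaces.
            if (value.contains("<") || value.contains(" ")) {
                problems.add("suspicious value for " + key + ": " + value);
            }
            vars.put(key, value);
        }
        for (String k : EXPECTED_KEYS) {
            if (!vars.containsKey(k) || vars.get(k).isEmpty()) problems.add("missing " + k);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        List<String> p = problems(Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "config.inc")));
        if (p.isEmpty()) System.out.println("config.inc looks sane");
        else p.forEach(System.out::println);
    }
}
```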

What does "spell" directive do in lsearch-global.conf?
What is the meaning of the spell directive, e.g.,

wikidb : (single) (spell,4,2) (language,en)

Thank you. Maiden taiwan 18:09, 8 June 2010 (UTC)


 * It means that a spell-check index is going to be built using words occurring in at least 2 articles, and word combinations occurring in at least 4 articles. --Rainman 20:17, 8 June 2010 (UTC)
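Expressed as code, Rainman's description of `(spell,4,2)` amounts to two document-frequency thresholds: the first number applies to word combinations, the second to single words. This is an illustrative sketch of that rule only. The parsing and the class and method names are assumptions, not lucene-search's implementation.

```java
// Illustrative reading of "(spell,4,2)" per the explanation above: word
// combinations need >= 4 articles, single words need >= 2 articles.
// Not lucene-search's own parser; all names here are hypothetical.
public class SpellDirective {
    public final int minPhraseDocs; // first number: word combinations
    public final int minWordDocs;   // second number: single words

    private SpellDirective(int minPhraseDocs, int minWordDocs) {
        this.minPhraseDocs = minPhraseDocs;
        this.minWordDocs = minWordDocs;
    }

    // Parses a directive such as "(spell,4,2)" taken from a [Database] line.
    public static SpellDirective parse(String directive) {
        String[] parts = directive.replaceAll("[()\\s]", "").split(",");
        if (parts.length != 3 || !parts[0].equals("spell")) {
            throw new IllegalArgumentException("not a spell directive: " + directive);
        }
        return new SpellDirective(Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    // docFreq = number of articles the word (or word combination) occurs in.
    public boolean keepWord(int docFreq)   { return docFreq >= minWordDocs; }
    public boolean keepPhrase(int docFreq) { return docFreq >= minPhraseDocs; }

    public static void main(String[] args) {
        SpellDirective d = parse("(spell,4,2)");
        System.out.println("word in 3 articles kept: " + d.keepWord(3));          // true
        System.out.println("combination in 3 articles kept: " + d.keepPhrase(3)); // false
    }
}
```

Raising either number shrinks the spell-check index and drops rare words from suggestions; lowering it does the opposite.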

java.io.IOException: The markup in the document following the root element must be well-formed
When running the lucene  script I get the error:

java.io.IOException: The markup in the document following the root element must be well-formed

Things were working fine until I imported a bunch of new articles into the "en" wiki (below) using, then this error happened on the next reindex. How do you debug a problem like this?

Full output:

Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1/lsearch.conf
0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for De
98   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
150  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for Es
188  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for Fr
234  [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for Nl
275  [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - de_wikidb using base url: http://de.mywiki.com/w/index.php?title=Special:OAIRepository
275 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - de_wikidb using base url: http://de.mywiki.com/w/index.php?title=Special:OAIRepository
275 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of de_wikidb from 2010-06-09T20:00:02Z
644 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - en_wikidb using base url: http://en.mywiki.com/w/index.php?title=Special:OAIRepository
644 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - en_wikidb using base url: http://en.mywiki.com/w/index.php?title=Special:OAIRepository
644 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of en_wikidb from 2010-06-14T14:15:02Z
java.io.IOException: The markup in the document following the root element must be well-formed.
at org.wikimedia.lsearch.oai.OAIParser.parse(OAIParser.java:68)
at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:64)
at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:44)
at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:191)
919 [main] WARN  org.wikimedia.lsearch.oai.IncrementalUpdater  - Retry later: error while processing update for en_wikidb : The markup in the document following the root element must be well-formed.
java.io.IOException: The markup in the document following the root element must be well-formed.
at org.wikimedia.lsearch.oai.OAIParser.parse(OAIParser.java:68)
at org.wikimedia.lsearch.oai.OAIHarvester.read(OAIHarvester.java:64)
at org.wikimedia.lsearch.oai.OAIHarvester.getRecords(OAIHarvester.java:44)
at org.wikimedia.lsearch.oai.IncrementalUpdater.main(IncrementalUpdater.java:191)
920 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - es_wikidb using base url: http://es.mywiki.com/w/index.php?title=Special:OAIRepository
920 [main] INFO  org.wikimedia.lsearch.oai.OAIHarvester  - es_wikidb using base url: http://es.mywiki.com/w/index.php?title=Special:OAIRepository
920 [main] INFO  org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of es_wikidb from 2010-06-08T17:09:38Z
1692 [main] INFO org.wikimedia.lsearch.oai.OAIHarvester  - fr_wikidb using base url: http://fr.mywiki.com/w/index.php?title=Special:OAIRepository
1692 [main] INFO org.wikimedia.lsearch.oai.OAIHarvester  - fr_wikidb using base url: http://fr.mywiki.com/w/index.php?title=Special:OAIRepository
1692 [main] INFO org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of fr_wikidb from 2010-06-10T16:45:04Z
2556 [main] INFO org.wikimedia.lsearch.oai.OAIHarvester  - nl_wikidb using base url: http://nl.mywiki.com/w/index.php?title=Special:OAIRepository
2556 [main] INFO org.wikimedia.lsearch.oai.OAIHarvester  - nl_wikidb using base url: http://nl.mywiki.com/w/index.php?title=Special:OAIRepository
2556 [main] INFO org.wikimedia.lsearch.oai.IncrementalUpdater  - Resuming update of nl_wikidb from 2010-06-08T21:30:11Z

Here is lsearch-global.conf:

[Database]
en_wikidb : (single) (spell,4,2) (language,en)
de_wikidb : (single) (spell,4,2) (language,de)
es_wikidb : (single) (spell,4,2) (language,es)
fr_wikidb : (single) (spell,4,2) (language,fr)
nl_wikidb : (single) (spell,4,2) (language,nl)

[Search-Group]
myhost : *

[Index]
myhost : *

[Index-Path]
 : /search

[OAI]
 : http://myhost/w/index.php
en_wikidb : http://en.mywiki.com/w/index.php
de_wikidb : http://de.mywiki.com/w/index.php
es_wikidb : http://es.mywiki.com/w/index.php
fr_wikidb : http://fr.mywiki.com/w/index.php
nl_wikidb : http://nl.mywiki.com/w/index.php

[Namespace-Boost]
 : (0,2) (1,0.5)

[Namespace-Prefix]
all :
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15

Thanks for any help! Maiden taiwan 18:49, 14 June 2010 (UTC)


 * Update: I noticed that Special:Statistics was showing too few pages, so I ran  and the crash went away. The search results page is still not reporting all matches though, only some. Maiden taiwan 16:41, 15 June 2010 (UTC)


 * Run ./build on your wiki to make a fresh copy of the index, and then start incremental updates from that point. --Rainman 00:54, 16 June 2010 (UTC)


 * Thanks - I did that, and managed to run "build" for each of my five wikis above. But now the update script is throwing the above-mentioned Java error again. (java.io.IOException: The markup in the document following the root element must be well-formed.) Any ideas how I can debug this? Maiden taiwan 04:11, 16 June 2010 (UTC)


 * My first guess would be that OAI extension is somehow mis-configured. Update to latest version (SVN) of lucene-search and the exact URL used should appear in the log. Put that into browser to see what kind of output is returned by OAIRepository. Also check your php error logs. --Rainman 17:47, 16 June 2010 (UTC)
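Following Rainman's suggestion, the response from the exact OAIRepository URL in the log can be run through a standard XML parser to see where it breaks. This sketch uses only the JDK's DOM parser; OaiResponseCheck is a hypothetical name, and the URL must be copied from the log (it is not reconstructed here). The message in the log is what the JDK parser reports when extra markup follows the closing root tag, for example a PHP warning printed after the XML, so the reported line and column are a good place to start looking.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: check whether an OAIRepository response is well-formed XML and,
// if not, report where the parser gave up.
public class OaiResponseCheck {

    // Returns null if the stream is well-formed XML, else "line L, column C: message".
    public static String check(InputStream xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // throws on fatal errors instead of printing to stderr
            builder.parse(xml);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the full OAIRepository URL copied from the lucene-search log
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            String problem = check(in);
            System.out.println(problem == null ? "well-formed" : problem);
        }
    }
}
```

If the check reports a problem, viewing the same URL in a browser (as Rainman suggests) and looking near the reported position, together with the PHP error log, usually shows what was appended.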

Lucene-search on OpenVMS
I've been working to port this extension to OpenVMS and have run across a few snags, only to realize they were Java configuration issues.

The following logicals need to be declared:

$ set proc/parse=extended
$ @SYS$COMMON:[JAVA$150.COM]JAVA$150_SETUP.COM
$ define DECC$ARGV_PARSE_STYLE ENABLE
$ define DECC$EFS_CASE_PRESERVE ENABLE
$ define DECC$POSIX_SEEK_STREAM_FILE ENABLE
$ define DECC$EFS_CHARSET ENABLE
$ define DECC$ENABLE_GETENV_CACHE ENABLE
$ define DECC$FILE_PERMISSION_UNIX ENABLE
$ define DECC$FIXED_LENGTH_SEEK_TO_EOF ENABLE
$ define DECC$RENAME_NO_INHERIT ENABLE
$ define DECC$ENABLE_TO_VMS_LOGNAME_CACHE ENABLE
$ FILE_MASK = %x00000008 + %x00040000
$ DEFINE JAVA$FILENAME_CONTROLS 'file_mask'

Also, the configure and build scripts need to be converted into COM files.

After that, FSUtils.java needs to be edited to recognize OpenVMS, and set up so that for any kind of linking (hard or soft) it calls a C program that converts the file paths to VMS-compliant ones and then copies the file.

I'm still having a few issues with .links files but am working on a fix. --Need to make the C program remove all instances of "^." from the filename after it is converted from a POSIX to a VMS filename.

Indexing works! --Sillas33 18:56, 1 July 2010 (UTC)

How does lsearchd work?
I'm curious how the lsearchd script works. I am trying to port it to OpenVMS and am currently getting a NoClassDefFoundError, mostly because I don't quite understand what the script is passing where.

#!/bin/bash
jardir=`dirname $0` # put your jar dir here!
java -Djava.rmi.server.codebase=file://$jardir/LuceneSearch.jar -Djava.rmi.server.hostname=$HOSTNAME -jar $jardir/LuceneSearch.jar $*

Specifically: why is $0 part of jardir=`dirname $0`? (Figured it out: this sets jardir to the directory containing lsearchd.)

Is it possible to set this up so that it would run from an exploded jar? --Sillas33 15:34, 2 July 2010 (UTC)

It is possible, but you would need to manually include in the classpath all of the libraries (look at build.xml for the list) which are automatically included in the jar. --Rainman 01:59, 3 July 2010 (UTC)

Ended up running it from an exploded jar like so:
$ set def root:[000000]
$ java -cp "''class_path'" -
  "-D java.rmi.server.codebase=file:root/LUCENESEARCH.JAR" -
  "-D java.rmi.server.hostname=hostname" -
  "org.wikimedia.lsearch.config.StartupManager" "/root/"

I left a copy of the original jar in the directory along with the exploded version (I couldn't seem to update the jar with the ONE file I changed to make this work on VMS).
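Along the lines of Rainman's answer, running from an exploded jar means putting the class directory and every library jar on the classpath yourself and naming the main class explicitly. The sketch below assumes the compiled classes sit in one directory and the libraries listed in build.xml have been copied into a separate lib directory; both layouts and the class name ExplodedLauncher are hypothetical. It reuses the main class from the VMS command above and the -D options from lsearchd, and leaves out any arguments lsearchd would forward via $*. The directory-style codebase URL ending in "/" is an assumption about how RMI should locate classes outside a jar.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: assemble and run the java command line for lucene-search from an
// exploded jar instead of "java -jar LuceneSearch.jar". Directory layout is hypothetical.
public class ExplodedLauncher {

    // Class directory first, then every *.jar in libDir in sorted order.
    public static String classpath(File classesDir, File libDir) {
        List<String> entries = new ArrayList<>();
        entries.add(classesDir.getPath());
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars != null) {                 // null if libDir does not exist
            Arrays.sort(jars);              // deterministic order
            for (File jar : jars) entries.add(jar.getPath());
        }
        return String.join(File.pathSeparator, entries);
    }

    // Mirrors the options passed by lsearchd and the VMS command above.
    public static List<String> command(File classesDir, File libDir, String hostname) {
        return Arrays.asList(
                "java", "-cp", classpath(classesDir, libDir),
                "-Djava.rmi.server.codebase=file://" + classesDir.getAbsolutePath() + "/",
                "-Djava.rmi.server.hostname=" + hostname,
                "org.wikimedia.lsearch.config.StartupManager");
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ExplodedLauncher <classesDir> <libDir> <hostname>
        List<String> cmd = command(new File(args[0]), new File(args[1]), args[2]);
        System.out.println(String.join(" ", cmd));
        new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Printing the command before starting it makes it easy to copy the same classpath into a DCL procedure on VMS.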

Skip a namespace
Is it possible to tell Lucene not to index a given namespace? Or can we tell MWSearch not to search a given namespace? Thanks. Maiden taiwan 15:43, 22 July 2010 (UTC)

Lucene and LiquidThreads
On my wiki I use LQT 2.0alpha. Building the Lucene indexes stopped on a page that is an LQT comment. How can I fix this problem? Does anyone else have problems with Lucene and LQT?

My wiki info: here


 * Make sure you use the latest lucene-search version (from SVN). LQT search was designed for this extension and should work. --Rainman 23:52, 31 July 2010 (UTC)
 * I built the .jar from sources and have this problem again: indexing stops on LQT pages :( I don't know where the problem is or why Lucene doesn't work on my wiki. How can I fix this, and who can help me?

Lucene Suggest and Fuzzy Search Problems
I am having some trouble getting the Lucene daemon to give spelling suggestions and to work with fuzzy searches. I have tried on different distros, versions, JDKs, and data sets, but nothing seems to work. The spelling indexes get built when I run the indexer, so that part seems ok. However, I notice that only wiki and wiki.links appear under indexes/search after indexing (wiki.spell appears under snapshots though). I also played around with the Java source to see if I can track it down. As far as I can see, this is the code where the problem happens:

SearchEngine.java

// find host
String host = cache.getRandomHost(iid.getSpell());
if(host == null)
    return; // no available

cache.getRandomHost returns a null value, so the suggestion generation is skipped. Digging in a little more, I found that the following lines in SearcherCache.java pass back the null value:

Hashtable<String,RemoteSearcherPool> pools = remoteCache.get(iid.toString());
if(pools == null)
    return null;

Any idea what is going on here? I feel like it has something to do with how indexes are tied to hosts, but I just can't seem to get a working configuration.

Thanks!


 * It would be helpful if you provided your configuration files. --Rainman 23:51, 28 August 2010 (UTC)


 * I didn't want to clutter up the talk page, so I sent over the config info in an email. Hopefully that works for you.  I have tried a number of things with the config, so this just happens to be the latest that I have.  I have also sent the output from the lsearchd startup logging.  I noticed from an above post about a similar issue, you mention that the lsearchd startup should say something about the spelling index, and mine does not.  I can tell you that some kind of spelling index gets built.  I have actually used a Lucene utility to look at it, and it contains terms that are specific to the wiki I indexed. --Mehle


 * I got your message back, and that was exactly the problem. Suggestions, fuzzy search, and for an added bonus, related articles all work, and they are even better than I hoped.  For anyone who might be having the same problem, here is what I did wrong.  I thought the * in lsearch-global.conf was a placeholder for the database name, so I went with the instructions in the 2.0 docs for doing the Search-Group and Index sections.  The result was that lsearchd was only picking up the main index and skipping the rest.  So if you are having a similar problem, do not follow the 2.0 instructions and make sure to leave the * right where it is.  Thank you once again for the help and for creating such a great search engine. --Mehle
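The failure mode described in this thread can be illustrated with a toy model. The Java sketch below is not lucene-search's actual configuration code. As a simplification, it assumes that a host pattern of * matches every index id (wiki, wiki.links, wiki.spell), while a plain name such as wiki matches only that exact index. In the model, an index left with no host corresponds to the situation where cache.getRandomHost() returns null and suggestion generation is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexAssignment {
    // Simplified pattern rule (an assumption for illustration):
    // a trailing "*" means prefix match ("*" alone matches everything),
    // anything else must equal the index id exactly.
    public static boolean matches(String pattern, String indexId) {
        if (pattern.endsWith("*")) {
            return indexId.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return indexId.equals(pattern);
    }

    // For each index id, collect the hosts that have at least one matching pattern.
    public static Map<String, List<String>> assign(Map<String, List<String>> hostPatterns,
                                                   List<String> indexIds) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String iid : indexIds) {
            List<String> hosts = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : hostPatterns.entrySet()) {
                for (String p : e.getValue()) {
                    if (matches(p, iid)) {
                        hosts.add(e.getKey());
                        break;
                    }
                }
            }
            result.put(iid, hosts);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> iids = List.of("wiki", "wiki.links", "wiki.spell");
        // Wildcard left in place: every index gets a host.
        System.out.println(assign(Map.of("myhost", List.of("*")), iids));
        // "*" replaced by the database name: only the main index gets a host.
        System.out.println(assign(Map.of("myhost", List.of("wiki")), iids));
    }
}
```

Running main prints {wiki=[myhost], wiki.links=[myhost], wiki.spell=[myhost]} for the wildcard case and {wiki=[myhost], wiki.links=[], wiki.spell=[]} for the exact-name case.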

Error running build using lucene search 2.1.3
[root@server~]# PATH=/wiki/usr/local/java/bin:$PATH;export PATH
[root@uswv1app04a ~]# echo $PATH
/wiki/usr/local/java/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
[root@uswv1app04a ~]# java -version
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

[root@server lucene]# ./configure /wiki/www/htdocs/mediawiki-1.9.6
Generating configuration files for wikidb ...
Making lsearch.conf
Making lsearch-global.conf
Making lsearch.log4j
Making config.inc
[root@server lucene]# ./build
Dumping wikidb...
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /wiki/usr/local/lucene-search-2.1.3/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.

0   [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
60  [main] INFO  org.wikimedia.lsearch.ranks.Links  - Making index at /wiki/usr/local/lucene-search-2.1.3/indexes/import/wikidb.links
122 [main] INFO  org.wikimedia.lsearch.ranks.LinksBuilder  - Calculating article links...
224 [main] FATAL org.wikimedia.lsearch.importer.Importer  - Cannot store link analytics: Content is not allowed in prolog.

java.io.IOException: Trying to hardlink nonexisting file /wiki/usr/local/lucene-search-2.1.3/indexes/import/wikidb
    at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
    at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
    at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
    at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)

227 [main] ERROR org.wikimedia.lsearch.importer.BuildAll  - Error during rebuild of wikidb : Trying to hardlink nonexisting file /wiki/usr/local/lucene-search-2.1.3/indexes/import/wikidb

java.io.IOException: Trying to hardlink nonexisting file /wiki/usr/local/lucene-search-2.1.3/indexes/import/wikidb
    at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:97)
    at org.wikimedia.lsearch.util.FSUtils.createHardLinkRecursive(FSUtils.java:81)
    at org.wikimedia.lsearch.importer.BuildAll.copy(BuildAll.java:157)
    at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:112)
Finished build in 0s

I'm not sure what's going wrong here; the /wiki/usr/local/lucene-search-2.1.3/indexes/import directory has read/write access for all users.

Any clue?

- Vicki


 * It looks like the dump process has failed. Look at the dumps/wikidb.xml and verify it doesn't contain errors. --Rainman 15:55, 8 September 2010 (UTC)
 * I get the same errors. My dumps/dump-wikidb.xml file is completely empty. Please help, I do not know what to do. -- Nicole
 * *blush* I made a typo in the username in AdminSettings.php... -- Nicole
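Before rerunning ./build, it can help to check the dump file directly. The JDK's built-in XML parser reports "Content is not allowed in prolog." when the input starts with something other than XML, for example a PHP error message, so that message in the build log usually points to a bad dump. The Java sketch below checks that the file exists, is non-empty, and parses as well-formed XML. The default path dumps/dump-wikidb.xml is taken from the reply above and may differ on your installation.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DumpCheck {
    // Returns "missing", "empty", "ok", or a description of the first XML error.
    public static String check(File dump) {
        if (!dump.isFile()) return "missing";
        if (dump.length() == 0) return "empty"; // e.g. the dump step could not log in to the database
        try {
            // Parse the whole file with a no-op handler; this only checks well-formedness.
            SAXParserFactory.newInstance().newSAXParser().parse(dump, new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            return "XML error at line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "dumps/dump-wikidb.xml";
        System.out.println(path + ": " + check(new File(path)));
    }
}
```

Note that this only checks that the dump is well-formed XML. It does not validate it against the MediaWiki export schema.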

Search database of Word documentation
Has anyone tried to use this to search documentation linked from MediaWiki? That is, if a page contains links to documents A and B, would Lucene search those too? Has anyone tried that with Word documents? It would be a great additional feature. --88.159.118.8 17:17, 30 October 2010 (UTC)

Get the version number?
Is there a way to obtain the Lucene version number from PHP? Maiden taiwan 18:01, 12 November 2010 (UTC)

2 DBs and 4 Wikis
Configuration:

This is the configuration of my wiki farm:

wikidb1
 * -- wiki1   # <= 'wiki1' is table prefix and interwiki link
 * -- wiki2   # <= 'wiki2' is table prefix and interwiki link

wikidb2
 * -- wiki3   # <= 'wiki3' is table prefix and interwiki link
 * -- wiki4   # <= 'wiki4' is table prefix and interwiki link

All 4 wikis use English as their default language.

What is desired:
 * when searching in wiki1 there will also be interwiki results from wiki2, wiki3 and wiki4 (when the searched term is also in these wikis)
 * when searching in wiki2 there will also be interwiki results from wiki1, wiki3 and wiki4 (when the searched term is also in these wikis)

Questions:
 * 1) Is it possible to configure lucene-search to do this?
 * 2) If ( 1. ) is possible: How do I have to set this up? (global configuration, ...)--JBE 09:59, 21 December 2010 (UTC)


 * There is no support for table prefixes. Wikis need to be in separate databases. --Rainman 01:00, 6 January 2011 (UTC)
 * Thank you!--JBE 07:30, 6 January 2011 (UTC)

= 2011 =

How to make Lucene's IndexReader.termDocs work on links field of wiki.links?
I have trouble using the links field in the wiki.links index to find documents that point to a particular article. --kalten, 5 January 2011

For example, to find articles pointing to the "Anarchism" article:

TermDocs tds = indexReader.termDocs(new Term("links", "0:United"));
tds.next(); // returns false

This returns an empty iterator.

However, the same method works as expected with other fields, e.g.:

TermDocs tds = indexReader.termDocs(new Term("article_key", "0:Anarchism"));
tds.next(); // returns true

I have tried numerous things, including examining the source code that creates the index and examining the index with Luke, without success. According to Luke, there are 1460 docs with the term "0:Anarchism" in their term vectors.


 * Always use the Links class, it has all of those things properly implemented. --Rainman 00:56, 6 January 2011 (UTC)

Restricted search and wiki size
1. My wiki will have several parts. I would like my users to be able to restrict their searches to certain sections while still keeping a global site search available. From what I have read here, I infer that this is possible, but I have not seen an explanation of how to do it.
 * My advice would be to put those articles in a namespace and then let your users pick which namespace to search. -- 22:37, 26 January 2011 (UTC)

2. The main text warns that this extension is for "large" wikis and suggests Sphinx for "small" wikis, but what is "small" and what is "large"?

Thank you.
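Following the namespace advice above, one way to offer a per-section search is a link to Special:Search with only the chosen namespaces enabled. MediaWiki's Special:Search accepts nsN=1 query parameters for this. The Java sketch below builds such a URL. The wiki address and the custom namespace number 100 are hypothetical placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NamespaceSearchUrl {
    // Builds a Special:Search URL that searches only the given namespace numbers.
    public static String build(String indexPhp, String query, int... namespaces) {
        StringBuilder url = new StringBuilder(indexPhp)
                .append("?title=Special:Search&search=")
                .append(URLEncoder.encode(query, StandardCharsets.UTF_8)); // spaces become '+'
        for (int ns : namespaces) {
            url.append("&ns").append(ns).append("=1"); // enable namespace ns
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // 0 = main namespace, 100 = a hypothetical custom namespace for one section
        System.out.println(build("http://example.org/w/index.php", "project plan", 0, 100));
    }
}
```

For a global search, leave out the namespace list or enable all content namespaces. For a section search, pass only that section's namespace.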

search 2 wikis
I have one wiki running Lucene-search. I want to add another wiki to Lucene. Is there a manual on how to use multiple wikis with Lucene?

I have read the docs and added a second database to lsearch-global.conf. But where do I configure the path to the second wiki installation, and the username and password for its database?

--Pdcemulator 08:58, 21 January 2011 (UTC)

Search suggest order
Are the search suggest entries built by Lucene? If so, how can the order be changed from alphabetical to based on number of links to the article (see link)? --Robinson weijman 12:50, 21 January 2011 (UTC)

Running the "build" script for a multilanguage farm?
I run a wiki family with multiple languages, say, en.mywiki.com, fr.mywiki.com, de.mywiki.com, etc. Each wiki has a separate database -- en_wikidb, de_wikidb, fr_wikidb, etc. -- with the same username (wikiuser) and password, but they all share a single MediaWiki code tree. In LocalSettings.php, a  statement looks at the incoming URL (e.g., fr.mywiki.com) and dynamically selects the proper database (fr_wikidb).

There is an existing lucene-search 2.1 installation that works correctly for this environment, reindexing correctly with the  script and MWSearch.

The problem:

In this environment, I am having trouble running the lucene-search 2.1.3  and   scripts. The  script produces a config.inc file with , which is not the database, because   is not running in a web context. Then the  script errors out with this:

Dumping wikidb...
2011-02-15 20:48:28: wikidb-vpw_ 5 pages (525.102/sec), 5 revs (525.102/sec), ETA 2011-02-15 20:48:28 [max 5]
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/lucene-search-2.1.3/lsearch.conf
Exception in thread "main" java.lang.RuntimeException: Index wikidb doesn't exist
    at org.wikimedia.lsearch.config.IndexId.get(IndexId.java:176)
    at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:91)

What is the right thing to do?


 * Run the  script 3 times, changing the database line in config.inc by hand each time to point to the right database? When I tried this, I got 3 indexes built, but according to , the indexes contained nothing (i.e., all searches return zero results). Probably because the   script is still hitting wikidb instead of en_wikidb.

# ./build
Dumping en_wikidb...
2011-02-15 20:59:46: wikidb-vpw_ 5 pages (508.906/sec), 5 revs (508.906/sec), ETA 2011-02-15 20:59:46 [max 5]

lsearch-global.conf:

[Database]
en_wikidb : (single) (spell,4,2) (language,en)
de_wikidb : (single) (spell,4,2) (language,de)
fr_wikidb : (single) (spell,4,2) (language,fr)

[Search-Group]
myserver : *

[Index]
kmyserver : *

[Index-Path]
: /search

[OAI]
: http://myserver/w/index.php
en_wikidb : http://en.mywiki.com/w/index.php
de_wikidb : http://de.mywiki.com/w/index.php
fr_wikidb : http://fr.mywiki.com/w/index.php

...

Thanks for any help! --Maiden taiwan 20:56, 15 February 2011 (UTC)

If you are editing files by hand, you should edit the dbname in config.inc (i.e. not in any of the individual files). This ensures the right dbname is substituted wherever it is needed. However, the default scripts that come with the package are really meant to make a single-wiki installation easy. If you have multiple wikis, you will have to do some work yourself, possibly by reusing some of the code already available. Try googling "for loop bash". --Rainman 11:24, 17 February 2011 (UTC)
 * Thanks for your response, but when I changed config.inc (as above) to use en_wikidb, the build script still used wikidb, as you can see from this output:

# ./build
Dumping en_wikidb...
2011-02-15 20:59:46:  wikidb -vpw_ 5 pages (508.906/sec), 5 revs (508.906/sec) ...


 * Where is the script getting "wikidb" from? LocalSettings.php? Thanks. Maiden taiwan 16:32, 17 February 2011 (UTC)


 * I suspect something is wrong with your dbname switching. The build script uses maintenance/dumpBackup.php of your MediaWiki installation to make the dump (this is where the red-highlighted bit comes from), so you need to make sure this works for all of your alternative wikis. --Rainman 23:46, 18 February 2011 (UTC)
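For a multi-database farm, the loop Rainman suggests can also be written in Java. The sketch below rewrites the database name in config.inc for each database, runs ./build, and restores the original file afterwards. It assumes that config.inc stores the database as a line beginning with dbname= (check your own file) and that it is run from the lucene-search directory. The database names are the ones from this thread. It does not address the LocalSettings.php switching problem. maintenance/dumpBackup.php must still select the right database when run from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class BuildAllWikis {
    // Returns a copy of config.inc's lines with the (assumed) "dbname=" line replaced.
    public static List<String> withDbName(List<String> lines, String db) {
        return lines.stream()
                .map(line -> line.startsWith("dbname=") ? "dbname=" + db : line)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path config = Paths.get("config.inc");
        List<String> original = Files.readAllLines(config);
        try {
            for (String db : List.of("en_wikidb", "de_wikidb", "fr_wikidb")) {
                Files.write(config, withDbName(original, db));
                // Run the stock build script for this database, showing its output.
                int exit = new ProcessBuilder("./build").inheritIO().start().waitFor();
                System.out.println(db + ": ./build exited with status " + exit);
            }
        } finally {
            Files.write(config, original); // put the original config.inc back
        }
    }
}
```

If the dump output still says wikidb after this, the database selection in LocalSettings.php is the part to fix.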

Using Lucene Apache French stemmer
Hello, we have a wiki in French with Lucene search. Search works properly, but it uses the English stemmer instead of the French one.

In the conf file we set:

[Database]
wikidb : (single) (spell,4,2) (language,fr)

Is there a way to make the French stemmer work properly?

For now, we have edited the sources with some scrappy code to use our own analyzer. It seems to work, but it's quite bad (for example, result rankings lose precision).


 * Yes, that should work (remember you need to rerun ./build afterwards). As for article ranking, it depends on many factors and might not be ideal with the default settings. Some careful testing and tuning is probably needed to adapt it to your wiki size and language. --Rainman 01:27, 22 February 2011 (UTC)


 * Thanks for your answer. I have another question: why doesn't the stemmer behave like the Apache French stemmer? For example, if you search for the word "continuellement", the stemmer should search for "continuel" with the default source code and this configuration:

[Database]
wikidb : (single) (spell,4,2) (language,fr)

But the stemmer still searches for "continuellement".