Extension talk:Lucene-search
From MediaWiki.org
2007
Error when editing pages
I followed your tutorial and installed LuceneSearch. All went fine, but when I edit a page, I get this error:
Fatal error: Call to undefined method LuceneSearch::setLimitOffset() in /path/to/wiki/includes/SearchEngine.php on line 222
I'm using Mediawiki 1.10.0. Is this a known problem or just a configuration issue? Looks like LuceneSearch.php or LuceneSearch_body.php don't define that function at all. Same with LuceneSearch::update() function... --12 July 2007
- You're missing
$wgDisableSearchUpdate = true;
- in your LocalSettings.php. It should be placed before the require_once statement. --Rainman 17:48, 12 July 2007 (UTC)
Installing Lucene on Windows 2003 Server
Is there a way to install LuceneSearch under Windows? I run my wiki on a Windows 2003 Server with XAMPP and I want to use the features of Lucene. I found at http://meta.wikimedia.org/wiki/Installing_lucene_search that Wikipedia uses the C# engine of Lucene.
Is there a compiled version of the C# engine that I can install on my Apache running on Windows 2003 Server? ----stp-- 13:40, 1 August 2007 (UTC)
- As far as I know, no. --Rainman 09:54, 3 August 2007 (UTC)
I am also interested in a Windows 2003 tutorial for improving MediaWiki search results. Cedarrapidsboy 14:29, 2 August 2007 (UTC)
- You can use the old C# daemon by following the tutorial on m:Installing lucene search. Wikimedia sites used to use this one, but now use the latest (Java) version. The new version could in principle run on Windows with some modifications (the main problem is its use of symbolic and hard links), but there is no one around to patch it. --Rainman 09:54, 3 August 2007 (UTC)
Could you explain, how to compile old C# daemon under windows with Mono? There is no "make" and "make install" commands under Windows :((( --Konstbel 09:04, 31 March 2008 (UTC)
- any luck on the patch for windows? --zhamrock 16:48, 28 July 2008 (SGT)
There is a .dll version available here: http://incubator.apache.org/lucene.net/download/ , but I don't know if this helps --jdpond 21:53, 27 August 2007 (UTC)
- The problem is not in Lucene itself, but in the LSearch daemon, which makes use of the Linux file system to efficiently fetch new indexes, keep old copies, and swap copies after a background warmup phase. --Rainman 09:18, 28 August 2007 (UTC)
- The 2.1 branch seems to have some support for Windows (see FSUtil.java). Is someone actively working on this? Any idea what the status is? --Cneubauer 19:35, 13 January 2009 (UTC)
- No, no-one is actively working on windows support.. the lucene-search-2.1 branch won't work on windows, although it could with some poking around, e.g. restructuring the indexregistry class.. --Rainman 22:08, 13 January 2009 (UTC)
- I managed to get it to run on Windows by patching FSUtil.java. I'm using NTFS hardlinks and a free Microsoft tool to create directory links (linkd.exe). It may not be as flexible as the Linux version using symbolic links, but it works for me, especially because I'm able to do the development of a wiki search client completely on my Windows machine. If someone is interested, leave a note on my user page. I would commit it to the repository myself but I guess I'm not allowed to do so. --Kai Kühn. 19:24, 2 February 2009 (UTC)
Missing Method?
I installed everything following the instructions (on MediaWiki 1.10.1), but I'm getting this when I hit the search-button:
Fatal error: Call to undefined method LuceneSearch::getRedirect() in /var/www/mediawiki-1.10.1/includes/SpecialPage.php on line 396
Is this a known issue with 1.10.1, or am I missing something? --217.6.3.114 06:34, 6 August 2007 (UTC)
- No idea, getRedirect() is defined in SpecialPage, and LuceneSearch inherits SpecialPage. You might be using some odd php version, or something else might be wrong... --Rainman 10:55, 6 August 2007 (UTC)
- My PHP- Version is (PHP 5.2.0-8+etch7 (cli) (built: Jul 2 2007 21:46:15)). Do you really think this might be a problem? I believe it is more likely that I forgot something obvious, not mentioned in the instructions. For example: I had to download ExtensionFunctions.php from svn, because it is not shipped with Mediawiki or the Extension. Do I need to register the Extension anywhere other than in LocalSettings.php? --217.6.3.114 12:55, 6 August 2007 (UTC)
- I've seen people complain about various mediawiki stuff not working with php 5.2, switching back to php 5.1 usually fixes it. But I'm by no means php expert (I mainly do the java part), so I cannot really tell if it would help. If you can, give it a try, and let us know if it helps. --Rainman 16:48, 6 August 2007 (UTC)
- There seems to be no php 5.1 package available for debian etch, so I guess there's no chance to make search work.--217.6.3.114 12:10, 7 August 2007 (UTC)
- I submitted a bugreport: http://bugzilla.wikimedia.org/show_bug.cgi?id=10835 --7 August 2007
- Yep, seen it .. I still think it might be a php problem, or maybe a broken eAccelerator or something like that... --Rainman 10:33, 21 August 2007 (UTC)
- Is eAccelerator required for this extension? We do not use it.--217.6.3.114 08:58, 7 September 2007 (UTC)
- Found the Solution! The problem was incompatibility between the MWSearch-Extension and LuceneSearch. I forgot that MWSearch was still active when I installed LuceneSearch. After deactivating MWSearch the problem was gone. --217.6.3.114 08:05, 11 September 2007 (UTC)
Wildcard Search
Is there a way to use wildcards as described on http://lucene.apache.org/java/docs/queryparsersyntax.html#Wildcard%20Searches? --217.6.3.114 12:50, 12 September 2007 (UTC)
- Yes. Currently only simple prefixes work (e.g. test*) since I didn't get to test the performance impact of other wildcard schemes. If you want to patch it yourself, look at WikiQueryParser.java around line 669 (function makeQueryFromTokens()), you probably want to replace buffer[length-1]=='*' with something that checks if * or ? are anywhere in the buffer. --Rainman 16:23, 12 September 2007 (UTC)
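The check suggested above can be sketched in isolation as follows. This is only an illustration under assumptions: the class name WildcardCheck and both method names are hypothetical, and the real makeQueryFromTokens() in WikiQueryParser.java is not reproduced here. Only the buffer[length-1]=='*' test is taken from the reply.

```java
public class WildcardCheck {
    /** Behaviour described in the reply: only a trailing '*' counts as a wildcard. */
    public static boolean isSimplePrefix(char[] buffer, int length) {
        return length > 0 && buffer[length - 1] == '*';
    }

    /** Suggested replacement: true if '*' or '?' occurs anywhere in the token. */
    public static boolean containsWildcard(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            if (buffer[i] == '*' || buffer[i] == '?') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Compare both checks on a few sample tokens.
        String[] samples = {"test*", "te?t", "*test", "test"};
        for (String s : samples) {
            char[] buf = s.toCharArray();
            System.out.println(s + " prefix=" + isSimplePrefix(buf, buf.length)
                    + " wildcard=" + containsWildcard(buf, buf.length));
        }
    }
}
```

Note that this only detects wildcard tokens; as the reply says, the performance impact of leading or embedded wildcards was not tested.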
dumpBackup.php causes DB connection error: Unknown error
Following the simple index creation tutorial "Building the index", I tried to run
php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml wikidb
But the script throws the error mentioned above. After a lot of trouble and a close look at the script, I found a solution. The problem lies in the file "includes/backup.inc", which dumpBackup.php requires. This file does the main backup work and uses some MediaWiki variables ($wg...). That is no problem when dumpBackup.php runs within MediaWiki, but when it runs as a standalone console script these $wg... parameters are missing. dumpBackup.php then uses empty strings for $wgDBtype, $wgDBadminuser, $wgDBadminpassword, $wgDBname and $wgDebugDumpSql, and this causes the "DB connection error: Unknown error" at run time. I solved the problem with a self-written PHP wrapper script, which only initializes these variables and then simply includes dumpBackup.php; now it works fine. This is my PHP wrapper script:
<?php
## dumpBackupInit - wrapper script to run the MediaWiki XML dump "dumpBackup.php" correctly
## @author: Stefan Furcht
## @version: 1.0
## @require: /srv/www/htdocs/wiki/maintenance/dumpBackup.php
# The following variables must be set for dumpBackup.php to work.
# You will find the values in the database section of your MediaWiki config, LocalSettings.php.
$wgDBtype = 'mysql';
$wgDBadminuser = "[MySQL-Username]";
$wgDBadminpassword = "[MySQL-Usernames-Password]";
$wgDBname = '[mediaWiki-Database-scheme]';
$wgDebugDumpSql = 'true';
# Now simply include the original dumpBackup script, which needs the variables set above.
require_once("/srv/www/htdocs/wiki/maintenance/dumpBackup.php");
?>
Now you can use this script just like dumpBackup.php, except that it will (hopefully) run correctly. Example: php dumpBackupInit.php --current > WikiDatabaseDump.xml
I hope this helps. Please excuse my probably bad English.
Regards -Stefan- 12 September 2007
- dumpBackup.php uses AdminSettings.php (and not LocalSettings.php), so you need to set it up (basically you would rename AdminSettings.sample and fill-in the data). What would be in AdminSettings.php is exactly what you provide in your wrapper, see Manual:System_administration#Maintenance_scripts. --Rainman 16:12, 12 September 2007 (UTC)
Thank you very much. I had never read what 'AdminSettings.php' does exactly. After setting these variables there, it works fine. So you can delete my "wrapper script" from this discussion page. But perhaps it would be useful to mention explicitly on the extension page that 'AdminSettings.php' must be set up to run 'dumpBackup.php', because some people may never have had to deal with this file before. Thanks for this great extension. -Stefan- 79.211.199.66 08:14, 20 September 2007 (UTC)
lsearchd killed in virtual hosting environment
When running lsearchd in a virtual hosting environment, it would work for 10-20 seconds or so, then it would fail with the message "killed." Thanks to Rainman's help, I verified that the resource requirements of the application exceeded the capacity available in the virtual hosting environment (whether it was the size of the JVM or number of threads, I was never sure.) It runs fine and with modest resource requirements on a dedicated server. Dbkayanda 20:44, 14 October 2007 (UTC)
Also, I notice in lsearch.conf there are a number of variables for the Storage backend:
- Storage.username
- Storage.password
etc. Do these need to be modified to my environment, or do they get ignored? --15 September 2007
- These are for the incremental updater (it stores articles rank info). If you don't use it, it gets ignored. --Rainman 17:23, 15 September 2007 (UTC)
Error while initially creating index
I am trying to get the LuceneSearch-Extension running on a mediawiki-1.11.0rc1 installation under opensuse10.2. LuceneSearch.jar and mwdumper.jar were generated from svn sources with ant and javac-version 1.5.0_12. I followed the instructions, but when I try to build the index, I get a Null-pointer exception:
me@mypc:~/var/lucene> java -cp ~/bin/lucene-search-2/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb_TEST.xml wikidb_TEST
MediaWiki Lucene search indexer - index builder from xml database dumps.
Trying config file at path /home/muenzebrock/.lsearch.conf
0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
8 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - First pass, getting a list of valid articles...
324 pages (1.213,483/sec), 324 revs (1.213,483/sec)
316 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - Second pass, calculating article links...
375 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
377 [main] WARN org.wikimedia.lsearch.util.Localization - Error processing message file at file:///srv/www/htdocs/php/mediawiki1.11.0rc1/languages/messages/MessagesEn.php
378 [main] WARN org.wikimedia.lsearch.util.Localization - Could not load localization for En
324 pages (2.677,686/sec), 324 revs (2.677,686/sec)
465 [main] INFO org.wikimedia.lsearch.importer.Importer - Third pass, indexing articles...
Exception in thread "main" java.lang.NullPointerException
at java.io.File.<init>(File.java:194)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:204)
at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
at org.wikimedia.lsearch.importer.SimpleIndexWriter.<init>(SimpleIndexWriter.java:49)
at org.wikimedia.lsearch.importer.DumpImporter.<init>(DumpImporter.java:39)
at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)
I played with the Indexes.path-variable in lsearch.conf, but with no luck. --19 September 2007
- Do you have permissions to write to directory you set as Indexes.path in /home/muenzebrock/.lsearch.conf ? --Rainman 14:13, 19 September 2007 (UTC)
- Yes. For debugging, I set it to be world-writable. --205.175.225.24 14:20, 19 September 2007 (UTC)
- You can do imports only at the indexer, so, did you set your lsearch-global.conf right? i.e. assign the index wikidb_TEST to your host mypc (not localhost or 127.0.0.1) in the Index section? --Rainman 14:47, 19 September 2007 (UTC)
- This is the part of lsearch-global.conf that I touched (i.e. the rest is similar to the file in svn):
# databases can be writen as {url}, where url contains list of dbs
[Database]
#wikilucene : (single) (language,en) (warmup,0)
#wikidev : (single) (language,sr)
#wikilucene : (nssplit,3) (nspart1,[0]) (nspart2,[4,5,12,13]), (nspart3,[])
#wikilucene : (language,en) (warmup,10)
wikidb_TEST : (single) (language,de) (warmup,100)
# Search groups
# Index parts of a split index are always taken from the node's group
# host : db1.part db2.part
# Mulitple hosts can search multiple dbs (N-N mapping)
[Search-Group]
#oblak : wikilucene wikidev
oblak : wikidb_TEST
# Index nodes
# host: db1.part db2.part
# Each db.part can be indexed by only one host
[Index]
#oblak: wikilucene wikidev
oblak : wikidb_TEST
- Now I seem to recognize my failure: I should have replaced oblak with my hostname, right? I was wondering what this should mean anyway ;-) Thanks for your quick help on this. --205.175.225.24 15:00, 19 September 2007 (UTC)
This error can also occur if you follow the installation instructions exactly and use a FQDN in the [Search-Group] and [Index] sections. Use only the hostname part of the $HOSTNAME, omitting the domain name part, if it is included. -- 216.143.51.66 15:54, 7 February 2008 (UTC)
- Hmm, I got this same error and fixed it by adding my complete hostname and domain to the various config files and the hostname file. In my case
<hostname> : wikidb didn't work but <hostname>.domain.com : wikidb did. --Cneubauer
- Another experience: the installation manual says to use the environment variable $HOSTNAME in the global.conf. For SuSE I can say that you need to use the complete hostname as it stands in /etc/HOSTNAME! --195.216.198.100 10:58, 12 June 2008 (UTC)
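The reports above disagree on whether the short hostname or the fully qualified name belongs in the [Search-Group] and [Index] sections, so it may help to have both candidates at hand and try each. A minimal sketch, assuming a hypothetical helper shortName() and a made-up example host wikihost.example.com (neither comes from lucene-search):

```java
public class HostNameForms {
    /** Returns the part of a hostname before the first dot, or the name itself if it has no dot. */
    public static String shortName(String hostname) {
        int dot = hostname.indexOf('.');
        return dot < 0 ? hostname : hostname.substring(0, dot);
    }

    public static void main(String[] args) {
        // Hypothetical example value; substitute the output of `echo $HOSTNAME`.
        String fqdn = args.length > 0 ? args[0] : "wikihost.example.com";
        System.out.println("fully qualified: " + fqdn + " : wikidb");
        System.out.println("short form:      " + shortName(fqdn) + " : wikidb");
    }
}
```

Whichever form the daemon resolves for the local machine is the one the config entries need to match; the sketch does not determine that, it only prints the two spellings.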
Hi
I've got a similar error:
root@rainbow:/usr/local/search/ls2 # java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /srv/www/htdocs/mwiki/wikidb.xml wikidb
MediaWiki Lucene search indexer - index builder from xml database dumps.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/search/ls2/lsearch.conf
1 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
15 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for De
507 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - First pass, getting a list of valid articles...
114 pages (118.626/sec), 114 revs (118.626/sec)
1666 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - Second pass, calculating article links...
114 pages (428.571/sec), 114 revs (428.571/sec)
2044 [main] INFO org.wikimedia.lsearch.importer.Importer - Third pass, indexing articles...
Exception in thread "main" java.lang.NullPointerException
at java.io.File.<init>(File.java:194)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:204)
at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
at org.wikimedia.lsearch.importer.SimpleIndexWriter.<init>(SimpleIndexWriter.java:49)
at org.wikimedia.lsearch.importer.DumpImporter.<init>(DumpImporter.java:39)
at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)
My configs:
root@rainbow:/usr/local/search/ls2 # cat lsearch-global.conf | grep ^[^#]
[Database]
wikidb : (single) (language,de) (warmup,10)
[Search-Group]
rainbow : wikidb
[Index]
rainbow : wikidb
[Index-Path]
<default> : /usr/local/search/indexes
[OAI]
wikidd : http://rainbow.local.com/mwiki/index.php
[Properties]
Database.suffix=itowiki_
ExactCase.suffix=itowiki_
[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15
and the other config:
root@rainbow:/usr/local/search/ls2 # cat lsearch.conf | grep ^[^#]
MWConfig.global=file:///usr/local/search/ls2/lsearch-global.conf
MWConfig.lib=/usr/local/search/ls2/lib
Indexes.path=/usr/local/search/indexes
Search.updateinterval=1
Search.updatedelay=0
Search.checkinterval=30
Index.snapshotinterval=5
Index.maxqueuecount=5000
Index.maxqueuetimeout=12
Storage.master=rainbow
Storage.username=root
Storage.password=mysecret
Storage.adminuser=root
Storage.adminpass=mysecret
Storage.useSeparateDBs=false
Storage.defaultDB=lsearch
Storage.lib=/usr/local/search/ls2/sql
SearcherPool.size=3
Localization.url=file:///srv/www/htdocs/mwiki/languages/messages
Logging.logconfig=/usr/local/search/ls2/lsearch.log4j
Logging.debug=false
and finally:
root@rainbow:/usr/local/search/ls2 # cat lsearch.log4j | grep ^[^#]
log4j.rootLogger=INFO, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
Kind regards Stefan 17 January 2008
Multiple wikis in one database
Is there a way to index and search multiple wikis that are contained within one database? I've tried a few things in the configuration and command lines, and I've not figured out a way to do this.
Thanks! --Laduncan 16:31, 8 October 2007 (UTC)
- If you want to get search results combined from multiple wikis, that is still not supported (as of v2.0). Next minor release might show some improvements in that direction.. --Rainman 16:55, 8 October 2007 (UTC)
- Thanks for the quick info! --Laduncan 20:31, 8 October 2007 (UTC)
- I have Lucene search running on an installation which contains 3 wikis sharing the same database, using prefixes. The search results give the wrong count: when I search within a wiki, it seems to actually search in all 3 wikis, but shows only the hits that belong to the current wiki. That way, I get for example only 6 hits listed on the result page, because the other, invisible hits were in the other two wikis. How do I get the first 20 hits for the current wiki listed? (I do not want to see the hits in the other wikis, and I do not want them counted.) --83.202.49.58 10:06, 16 March 2009 (UTC)
Requiring less exact matches
It appears that the search in the fulltext is doing an implicit AND -- that is, all the words need to be in the document for it to appear in the results list.
For what I'm doing, I'd like to have the default be "OR," and let the ranking algorithm hopefully bring the most relevant content to the top. (The queries my users will be using will be long and complex, and will generally match nothing with "AND.")
I can manually search with OR between the words, but I wanted to know if I could change the configuration of the extension to have it do that by default.
Thanks in advance, Dbkayanda 00:57, 15 October 2007 (UTC)
- Personally, I think ranking is not smart enough to give the best results if the default operator is OR, but you can change it by hacking the code a bit. In WikiQueryParser.java, on line 112 there is:
BooleanClause.Occur boolDefault = BooleanClause.Occur.MUST;
Replace the last part with BooleanClause.Occur.SHOULD. --Rainman 14:41, 15 October 2007 (UTC)
- Worked like a charm. Thanks, as always, for your help. --16 October 2007
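To illustrate what the MUST to SHOULD change means for matching, here is a small plain-Java sketch. It deliberately does not use Lucene's API; the class DefaultOperatorDemo and its methods are hypothetical and only model the semantics: with MUST every query term has to be present, while with SHOULD any term may match and documents with more matching terms would be expected to rank higher.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DefaultOperatorDemo {
    /** MUST-like semantics: every query term has to occur in the document. */
    public static boolean matchesAll(Set<String> docTerms, List<String> query) {
        return docTerms.containsAll(query);
    }

    /** SHOULD-like semantics: counts matching terms; one hit is enough to match. */
    public static int countMatches(Set<String> docTerms, List<String> query) {
        int hits = 0;
        for (String term : query) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("lucene", "search", "daemon"));
        List<String> query = Arrays.asList("lucene", "windows", "daemon");
        System.out.println("AND match: " + matchesAll(doc, query));
        System.out.println("OR hits:   " + countMatches(doc, query));
    }
}
```

For long queries like the ones described in the question, the AND form rejects a document as soon as one term is missing, which is why switching the default to OR returns results at all.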
Index of attachments (doc, pdf, xls)
Hi Robert,
I found the cool mediawiki extension for the lucene search engine. Is there a possibility to index all attachments like PDF, HTML, DOC and XLS with this addon?
I found some information in the Lucene FAQ - http://wiki.apache.org/lucene-java/LuceneFAQ#head-37523379241b88fd90bcd1de81b74e7ec8843f72 - on how to index attachments. Is it possible to use such indexed files with the MediaWiki extension you wrote?
Thanks a lot! Alex--14:51, 22 October 2007 (UTC)
- Yes, there are libraries that can parse pdf, doc,.. that work with lucene, but I haven't got around to include them in the extension yet, and I probably won't have time in next few months ... If you really need it, you can try to hack it yourself, you would probably want Importer to fetch the media file (maybe with ?action=raw), and then construct an Article object whose contents would be the parsed text and pass it to the indexer. --Rainman 21:08, 22 October 2007 (UTC)
- Are all namespaces indexed by the current LuceneSearch extension? Also the Image namespace that contains all the file data? Does the extension then index only the latest file description? Where do I have to start in LuceneSearch_body.php?
- Thanks! Alex --12:06, 23 October 2007 (UTC)
- All articles from the database get indexed. LuceneSearch_body.php is just an interface for the Java daemon that does all the work. So, you'll need to modify the Java code. What currently gets indexed is just the image descriptions; the media files themselves are stored outside the database, in the file system... --Rainman 10:20, 23 October 2007 (UTC)
Binary version of LuceneSearch.jar?
Hello,
Where can I get a binary version of LuceneSearch.jar? I don't have ant on the server this is being installed on, and I tried building LuceneSearch.jar on my desktop computer using ant, but it failed with errors about missing MediaWiki Java classes. I'd prefer a binary, if possible, so I can get this up and running ASAP.
Ben --8 December 2007
Soundex searches?
Will this extension support Soundex like searches for spelling mistakes etc..? --12 December 2007
- Probably in the next major release (hopefully end of january). --Rainman 14:11, 13 December 2007 (UTC)
Special page search complains about "problem with wiki search"
After following the instructions as closely as possible, the plugin renders the special page like this:
[ search_string on text area ] [ dropdown_list ] [search_button]
<noexactmatch-nocreate>
There was a problem with the wiki search. This is probably temporary; try again in a few moments, or you can search the wiki through an external search service:
The content in square brackets is just my attempt to recreate the GUI.
Is there something missing in the way it is using the host to do the search? --Cartoro 00:00, 20 December 2007 (UTC)
- Check your log files for more info about what went wrong ... --Rainman 18:31, 20 December 2007 (UTC)
- Yes, I wanted to see that... but I couldn't find any log files.... sorry, silly question, but where are they? Could this be a problem with accessing the actual DB? --Cartoro 22:11, 20 December 2007 (UTC)
- Extension:LuceneSearch#Troubleshooting --Rainman 22:17, 20 December 2007 (UTC)
Port 8123 already in use.
Hi again,
I'm still trying to make it run. I've found that most of the problems are due to misconfiguration on my part. Java error messages are not very helpful at first, but that is the case with any new functionality one comes across.
When I tried to run ./lsearchd, it came up with this:
java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:
java.net.SocketTimeoutException: Read timed out
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)
at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
at org.wikimedia.lsearch.interoperability.RMIServer.register(RMIServer.java:24)
at org.wikimedia.lsearch.interoperability.RMIServer.bindRMIObjects(RMIServer.java:60)
at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:52)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)
further down, it came up with this message:
120488 [Thread-1] FATAL org.wikimedia.lsearch.frontend.HTTPIndexServer - Dying: bind error: Address already in use
Has anybody seen this? I still think it is a trivial error on my part, but I cannot find the cause. --Cartoro 00:00, 20 December 2007 (UTC)
- The above is RMI complaining it cannot register the networked objects. That should be harmless unless you're using distributed searching. About the below, seems to be what it says: some other app is using the ports (the searcher is by default on 8123, and indexer on 8321) - make sure you don't have any old version of lsearchd still running. Use command: nmap localhost to find out which ports are taken. If those default ports are taken by other apps, change them in lsearch.conf, and in LocalSettings.php ... --Rainman 13:36, 20 December 2007 (UTC)
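As an alternative to nmap, a quick check of whether the two default ports mentioned above (8123 for the searcher, 8321 for the indexer) can still be bound might look like the sketch below. The class PortCheck and the method isFree() are hypothetical names; the check simply tries to open a listening socket and reports failure as "in use".

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortCheck {
    /** Returns true if a listening socket can be opened on the port, false if binding fails. */
    public static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            // Typically "Address already in use", but may also be a permission problem.
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {8123, 8321};
        for (int port : ports) {
            System.out.println("port " + port + (isFree(port) ? " is free" : " is in use or not bindable"));
        }
    }
}
```

If a port is reported as taken, an old lsearchd instance that is still running is the first thing to rule out, as suggested in the reply.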
Error when running ./lsearchd
I am getting the following error when running ./lsearchd:
53664-jpbaello:/srv/www/htdocs/search/ls2 # ./lsearchd
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /srv/www/htdocs/search/ls2/lsearch.conf
Error resolving local hostname. Make sure that hostname is setup correctly.
java.net.UnknownHostException: 53664-jpbaello: 53664-jpbaello
at java.net.InetAddress.getLocalHost(InetAddress.java:1346)
at org.wikimedia.lsearch.config.GlobalConfiguration.determineInetAddress(GlobalConfiguration.java:124)
at org.wikimedia.lsearch.config.GlobalConfiguration.<init>(GlobalConfiguration.java:102)
at org.wikimedia.lsearch.config.GlobalConfiguration.getInstance(GlobalConfiguration.java:112)
at org.wikimedia.lsearch.config.Configuration.<init>(Configuration.java:105)
at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)
Exception in thread "main" java.lang.NullPointerException
at java.util.Hashtable.get(Hashtable.java:336)
at org.wikimedia.lsearch.config.GlobalConfiguration.makeIndexIdPool(GlobalConfiguration.java:468)
at org.wikimedia.lsearch.config.GlobalConfiguration.read(GlobalConfiguration.java:413)
at org.wikimedia.lsearch.config.GlobalConfiguration.readFromURL(GlobalConfiguration.java:247)
at org.wikimedia.lsearch.config.Configuration.<init>(Configuration.java:116)
at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)
And then it goes back to the command prompt. I believe this is an error because I cannot get it to create the index. I'm a little new to this, though, and not sure if I am doing things right. Also, sorry if I am not posting this correctly! Any ideas? --Think411 22 December 2007
- As the error message suggests, your hostname seems to be wrong. Is "53664-jpbaello" really your hostname? Use "echo $HOSTNAME" to verify this. Check if this hostname correctly maps to your IP in /etc/hosts. Or, try using your IP instead of your hostname. --Rainman 12:21, 22 December 2007 (UTC)
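The stack trace above fails inside InetAddress.getLocalHost(), so the same lookup can be reproduced outside lsearchd to test whether a change to /etc/hosts has taken effect. A minimal sketch; the class HostnameCheck and the method describeLocalHost() are hypothetical names, while InetAddress.getLocalHost() is the standard JDK call shown in the trace:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
    /** Performs the same local hostname lookup as the failing call and describes the result. */
    public static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "hostname=" + local.getHostName() + " address=" + local.getHostAddress();
        } catch (UnknownHostException e) {
            // Same failure as in the trace: the hostname does not map to an address.
            return "unresolvable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

If this prints "unresolvable", lsearchd can be expected to fail in the same way until the hostname maps to an IP address.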
2008
Compiling to create lucenesearch.jar failed
I am trying to install the lucene engine for our wiki but the compile of lucene fails.
Ant gives back a lot of error messages during the compilation, errors like:
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:331: cannot find symbol
[javac] symbol : class Hits
[javac] location: class org.wikimedia.lsearch.SearchState
[javac] Hits hits = searcher.search(new TermQuery(
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:331: cannot find symbol
[javac] symbol : class TermQuery
[javac] location: class org.wikimedia.lsearch.SearchState
[javac] Hits hits = searcher.search(new TermQuery(
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:332: cannot find symbol
[javac] symbol : class Term
[javac] location: class org.wikimedia.lsearch.SearchState
[javac] new Term("key", key)));
[javac] ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 85 errors
Can you help me to solve these error messages or provide a binary?
Many thanks in advance. --Phaidros 12 January 2008
- Are you compiling with a Sun Java 1.5+ compiler? If so, can you provide the beginning of the error log? --Rainman 00:55, 13 January 2008 (UTC)
Yes, I am using the openSUSE 10.3 distribution and javac 1.5.0_13. I hope that the error messages I provide below are enough; sorry for my lack of experience with Java build processes.
I'll provide the first part and the last messages here:
Apache Ant version 1.7.0 compiled on September 22 2007
Buildfile: build.xml
Detected Java version: 1.5 in: /usr/lib/jvm/java-1.5.0-sun-1.5.0_update13-sr2/jre
Detected OS: Linux
parsing buildfile /root/lucene/lucene-search/build.xml with URI = file:/root/lucene/lucene-search/build.xml
Project base dir set to: /root/lucene/lucene-search
[antlib:org.apache.tools.ant] Could not load definitions from resource org/apache/tools/ant/antlib.xml. It could not be found.
[property] Loading /root/lucene-search.build.properties
[property] Unable to find property file: /root/lucene-search.build.properties
[property] Loading /root/build.properties
[property] Unable to find property file: /root/build.properties
[property] Loading /root/lucene/lucene-search/build.properties
[property] Unable to find property file: /root/lucene/lucene-search/build.properties
Property "current.year" has not been set
Build sequence for target(s) `default' is [init, compile-core, compile, default]
Complete build sequence is [init, compile-core, compile, default, package-tgz-src, jar-core, javadocs, package, package-zip, package-tgz, package-all-binary, dist, package-zip-src, package-all-src, dist-src, dist-all, jar, jar-src, clean, ]
init:
[mkdir] Skipping /root/lucene/lucene-search/bin because it already exists.
[mkdir] Skipping /root/lucene/lucene-search/dist because it already exists.
compile-core:
[mkdir] Skipping /root/lucene/lucene-search/bin because it already exists.
[javac] wikimedia/lsearch/Article.java added as wikimedia/lsearch/Article.class doesn't exist.
[javac] wikimedia/lsearch/ArticleList.java added as wikimedia/lsearch/ArticleList.class doesn't exist.
[javac] wikimedia/lsearch/Configuration.java added as wikimedia/lsearch/Configuration.class doesn't exist.
[javac] wikimedia/lsearch/DatabaseConnection.java added as wikimedia/lsearch/DatabaseConnection.class doesn't exist.
[javac] wikimedia/lsearch/EnglishAnalyzer.java added as wikimedia/lsearch/EnglishAnalyzer.class doesn't exist.
[javac] wikimedia/lsearch/EsperantoAnalyzer.java added as wikimedia/lsearch/EsperantoAnalyzer.class doesn't exist.
[javac] wikimedia/lsearch/EsperantoStemFilter.java added as wikimedia/lsearch/EsperantoStemFilter.class doesn't exist.
[javac] wikimedia/lsearch/MWDaemon.java added as wikimedia/lsearch/MWDaemon.class doesn't exist.
[javac] wikimedia/lsearch/MWSearch.java added as wikimedia/lsearch/MWSearch.class doesn't exist.
[javac] wikimedia/lsearch/NamespaceFilter.java added as wikimedia/lsearch/NamespaceFilter.class doesn't exist.
[javac] wikimedia/lsearch/QueryStringMap.java added as wikimedia/lsearch/QueryStringMap.class doesn't exist.
[javac] wikimedia/lsearch/SearchClientReader.java added as wikimedia/lsearch/SearchClientReader.class doesn't exist.
[javac] wikimedia/lsearch/SearchDbException.java added as wikimedia/lsearch/SearchDbException.class doesn't exist.
[javac] wikimedia/lsearch/SearchState.java added as wikimedia/lsearch/SearchState.class doesn't exist.
[javac] wikimedia/lsearch/Title.java added as wikimedia/lsearch/Title.class doesn't exist.
[javac] wikimedia/lsearch/TitlePrefixMatcher.java added as wikimedia/lsearch/TitlePrefixMatcher.class doesn't exist.
[javac] Compiling 16 source files to /root/lucene/lucene-search/bin
[javac] Using modern compiler
dropping /root/lucene/lucene-search/bin/bin from path as it doesn't exist
[javac] Compilation arguments:
[javac] '-deprecation'
[javac] '-d'
[javac] '/root/lucene/lucene-search/bin'
[javac] '-classpath'
[javac] '/root/lucene/lucene-search/bin:/usr/share/java/ant.jar:/usr/share/java/ant-launcher.jar:/usr/share/java/jaxp_parser_impl.jar:/usr/share/java/xml-commons-apis.jar:/usr/share/java/ant/ant-antlr.jar:/usr/share/java/bcel.jar:/usr/share/java/ant/ant-apache-bcel.jar:/usr/share/java/bsf.jar:/usr/share/java/ant/ant-apache-bsf.jar:/usr/share/java/log4j.jar:/usr/share/java/ant/ant-apache-log4j.jar:/usr/share/java/oro.jar:/usr/share/java/ant/ant-apache-oro.jar:/usr/share/java/regexp.jar:/usr/share/java/ant/ant-apache-regexp.jar:/usr/share/java/xml-commons-resolver.jar:/usr/share/java/ant/ant-apache-resolver.jar:/usr/share/java/jakarta-commons-logging.jar:/usr/share/java/ant/ant-commons-logging.jar:/usr/share/java/javamail.jar:/usr/share/java/jaf.jar:/usr/share/java/ant/ant-javamail.jar:/usr/share/java/jdepend.jar:/usr/share/java/ant/ant-jdepend.jar:/usr/share/java/ant/ant-jmf.jar:/usr/share/java/junit.jar:/usr/share/java/ant/ant-junit.jar:/usr/share/java/ant/ant-nodeps.jar:/usr/lib/jvm/java/lib/tools.jar:/usr/share/ant/lib/ant-apache-resolver-1.7.0.jar:/usr/share/ant/lib/ant-apache-bsf.jar:/usr/share/ant/lib/ant-nodeps.jar:/usr/share/ant/lib/ant-commons-logging.jar:/usr/share/ant/lib/ant-junit.jar:/usr/share/ant/lib/ant-javamail-1.7.0.jar:/usr/share/ant/lib/ant-junit-1.7.0.jar:/usr/share/ant/lib/ant-launcher.jar:/usr/share/ant/lib/ant-apache-log4j.jar:/usr/share/ant/lib/ant-apache-oro-1.7.0.jar:/usr/share/ant/lib/ant-javamail.jar:/usr/share/ant/lib/ant-apache-log4j-1.7.0.jar:/usr/share/ant/lib/ant-apache-bcel-1.7.0.jar:/usr/share/ant/lib/ant-nodeps-1.7.0.jar:/usr/share/ant/lib/ant-jmf.jar:/usr/share/ant/lib/ant-jmf-1.7.0.jar:/usr/share/ant/lib/ant-commons-logging-1.7.0.jar:/usr/share/ant/lib/ant-jdepend-1.7.0.jar:/usr/share/ant/lib/ant-1.7.0.jar:/usr/share/ant/lib/ant-apache-regexp.jar:/usr/share/ant/lib/ant-apache-oro.jar:/usr/share/ant/lib/ant-apache-resolver.jar:/usr/share/ant/lib/ant-jdepend.jar:/usr/share/ant/lib/ant-antlr.jar:/usr/share/ant/lib/ant-antlr-1.7.0.j
ar:/usr/share/ant/lib/ant-apache-regexp-1.7.0.jar:/usr/share/ant/lib/ant-apache-bcel.jar:/usr/share/ant/lib/ant-apache-bsf-1.7.0.jar:/usr/share/ant/lib/ant-launcher-1.7.0.jar:/usr/share/ant/lib/ant.jar'
[javac] '-sourcepath'
[javac] '/root/lucene/lucene-search/org'
[javac] '-encoding'
[javac] 'utf-8'
[javac] '-g'
[javac]
[javac] The ' characters around the executable and arguments are
[javac] not part of the command.
[javac] Files to be compiled:
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/Article.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/ArticleList.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/Configuration.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/DatabaseConnection.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/MWDaemon.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/MWSearch.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/NamespaceFilter.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/QueryStringMap.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchClientReader.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchDbException.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/Title.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/TitlePrefixMatcher.java
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:28: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.Analyzer;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:29: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.LowerCaseTokenizer;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:30: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.PorterStemFilter;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:31: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.TokenStream;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:37: cannot find symbol
[javac] symbol: class Analyzer
[javac] public class EnglishAnalyzer extends Analyzer {
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EnglishAnalyzer.java:38: cannot find symbol
[javac] symbol : class TokenStream
[javac] location: class org.wikimedia.lsearch.EnglishAnalyzer
[javac] public final TokenStream tokenStream(String fieldName, Reader reader) {
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:31: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.Analyzer;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:32: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.LowerCaseTokenizer;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:33: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.Token;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:34: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.TokenStream;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:36: cannot find symbol
[javac] symbol: class Analyzer
[javac] public class EsperantoAnalyzer extends Analyzer{
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoAnalyzer.java:37: cannot find symbol
[javac] symbol : class TokenStream
[javac] location: class org.wikimedia.lsearch.EsperantoAnalyzer
[javac] public final TokenStream tokenStream(String fieldName, Reader reader) {
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:31: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.Token;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:32: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.TokenStream;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:33: package org.apache.lucene.analysis does not exist
[javac] import org.apache.lucene.analysis.TokenFilter;
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:36: cannot find symbol
[javac] symbol: class TokenFilter
[javac] public class EsperantoStemFilter extends TokenFilter {
[javac] ^
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/EsperantoStemFilter.java:37: cannot find symbol
[javac] symbol : class TokenStream
[javac] location: class org.wikimedia.lsearch.EsperantoStemFilter
[javac] public EsperantoStemFilter(TokenStream tokenizer) {
--- snip ---
(some lines cut here)
--- snip ---
[javac] /root/lucene/lucene-search/org/wikimedia/lsearch/SearchState.java:332: cannot find symbol
[javac] symbol : class Term
[javac] location: class org.wikimedia.lsearch.SearchState
[javac] new Term("key", key)));
[javac] ^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 85 errors
BUILD FAILED
/root/lucene/lucene-search/build.xml:55: Compile failed; see the compiler error output for details.
at org.apache.tools.ant.taskdefs.Javac.compile(Javac.java:999)
at org.apache.tools.ant.taskdefs.Javac.execute(Javac.java:820)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:357)
at org.apache.tools.ant.Target.performTasks(Target.java:385)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
at org.apache.tools.ant.Main.runBuild(Main.java:698)
at org.apache.tools.ant.Main.startAnt(Main.java:199)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
--Phaidros 24 January 2008
- Looks like your ant is broken and cannot find the relevant libraries. I've compiled the package and put it here.--Rainman 11:01, 26 January 2008 (UTC)
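A note on reading the log above: the '-classpath' argument passed to javac lists only Ant's own jars, and the errors shown are all missing org.apache.lucene packages or classes, which fits the diagnosis that the build could not find the Lucene libraries. Below is a minimal Python sketch for checking a classpath string for a given jar. The function name and the lucene-core.jar path are made up for illustration; they are not part of lucene-search or Ant.

```python
import os
from typing import List


def jars_matching(classpath: str, needle: str, sep: str = ":") -> List[str]:
    """Return classpath entries whose file name contains `needle` (case-insensitive)."""
    return [
        entry
        for entry in classpath.split(sep)
        # Compare against the file name only, so a directory such as
        # /root/lucene/lucene-search/bin does not count as a Lucene jar.
        if entry and needle.lower() in os.path.basename(entry).lower()
    ]


if __name__ == "__main__":
    # Shortened copy of the classpath from the log above.
    logged = (
        "/root/lucene/lucene-search/bin:"
        "/usr/share/java/ant.jar:"
        "/usr/share/java/log4j.jar"
    )
    print(jars_matching(logged, "lucene"))
    # Hypothetical classpath with a Lucene jar added (the path is an assumption).
    print(jars_matching(logged + ":/opt/lib/lucene-core.jar", "lucene"))
```

If the first call returns an empty list for the classpath your own build prints, the compiler has no Lucene jar to resolve those imports against.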
[edit] Cannot bind RMIMessenger exception: non-JRMP server at remote endpoint
Hello everyone,
I'm quite new to Lucene and I have a problem: I can't get Lucene Java working on one of my servers. I've set it up on another server for MediaWiki and it works fine.
The problem server is a GNU/Linux Ubuntu Edgy i686 box with kernel 2.6.17-11-server, running Apache 2.0 with PHP5 for MediaWiki plus some other things like Tomcat and JBoss. Java packages installed: j2re1.4, j2sdk1.4, java-common, libgcj-common, sun-java5-bin, sun-java5-demo, sun-java5-jdk and sun-java5-jre.
On the first server (a fresh 64-bit Ubuntu Gutsy with almost nothing running) it worked fine, and I can use Lucene to search my wiki. On the second server, this is the error I get when I try to start the engine:
www-data@myserver:/usr/local/search/ls2$ ./lsearchd .
Trying config file at path /var/www/.lsearch.conf
Trying config file at path /usr/local/search/ls2/lsearch.conf
0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
java.rmi.ConnectIOException: non-JRMP server at remote endpoint
- at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:217)
- at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:171)
- at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:306)
- at sun.rmi.registry.RegistryImpl_Stub.rebind(Unknown Source)
- at org.wikimedia.lsearch.interoperability.RMIServer.register(RMIServer.java:24)
- at org.wikimedia.lsearch.interoperability.RMIServer.bindRMIObjects(RMIServer.java:60)
- at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:52)
76 [main] WARN org.wikimedia.lsearch.interoperability.RMIServer - Cannot bind RMIMessenger exception:non-JRMP server at remote endpoint
But NOTHING uses port 8321. I've tried another port and get the same problem. Any ideas how to solve this, please? Here is my contact:
Thanks, LMJ 15 January 2008
- First verify that jboss, tomcat and lsearchd all run under sun-java5-bin (and not j2re1.4). If this is the case then maybe the RMI registry is colliding with jboss (so try stopping it if you can). If this appears to be the case, then you can either configure jboss not to use the port 1099, or edit RMIRegistry.java to use a different port (replace 1099 there with your port, and provide the port as param to getRegistry() calls in RMIRegistry.java and RMIMessengerClient.java). --Rainman 15:05, 15 January 2008 (UTC)
- Indeed Rainman, thanks for your help! look at this :
- # lsof +i :1099
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
java 20832 syncron 7u IPv4 149877937 TCP *:rmiregistry (LISTEN)
The port is used by Jboss rmiregistry :-/ I need some extra help to change that port. Can we exchange emails about it Rainman? I tried to contact you via your personal page but I just read English & French ;) --16 January 2008
- I've edited /usr/local/jboss-3.2.7/server/default/conf/jboss-service.xml and changed the port to 10999. It seems to work better ;) I've got another problem, but it seems to be an lsearch.conf-related issue. --22 January 2008
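For anyone hitting the same exception: the thread above traced it to another Java service (JBoss) already listening on the RMI registry port 1099. A minimal Python sketch of the same check that lsof performed, assuming you run it on the search host; the function name is made up for illustration.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP listener could bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else owns the port.
            return False
        return True


if __name__ == "__main__":
    # 1099 is the RMI registry port and 8321/8123 the daemon ports named in this thread.
    for port in (1099, 8321, 8123):
        print(port, "free" if port_is_free(port) else "in use")
```

This only tells you that a port is taken, not by whom; lsof, as used above, is still needed to identify the owning process.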
[edit] Daemon status
On the German Wikipedia, I am often irritated because changes in content are not reflected immediately by the full text search and – at the moment – I cannot see whether the changes have already been processed by the daemon, or when they will be. Therefore, I would like to know:
- whether the daemon processes the changes chronologically so one could be certain that if one's changes were made at time T and the daemon has processed all changes up to T + 1, they will be reflected in the full text search, and
- whether there is any way to obtain the daemon status (all changes up to T, n articles in queue, etc.) from a current or future Wikipedia installation.
Thanks, Tim Landscheidt 19:52, 7 February 2008 (UTC)
- The index is updated around 5 am GMT every day on wikimedia projects (when nothing goes wrong which is most of the time). About 1) - yes, it processes the changes chronologically. 2) - this interface is available but only for system admins, for everybody else - just wait till tomorrow for changes to be applied. --Rainman 10:07, 8 February 2008 (UTC)
- Hmmm. If I search for "Lassithi" (note the double "s") now, I see that changes in de:Panagia i Kera (8 days ago), de:Kritsa (7 days ago), de:Ierapetra (10 days ago), de:Kera Kardiotissa (11 days ago), de:Griechische Toponyme (11 days ago), de:Venezianische Kolonien (9 days ago) and de:Sitia (11 days ago) have not been processed. Is that what you mean by "when nothing goes wrong"? :-) Would it be technically feasible to include the last time a change was successfully worked into the index in the result page, i. e. "All changes until T considered."? Tim Landscheidt 17:24, 8 February 2008 (UTC)
- Yes, this seems to be a case of "if nothing is broken" :) One of the dewiki search servers (srv21) is broken: it stopped updating its index and seems to have a broken logrotate and possibly some other problems. We'll fix it when a sysadmin becomes available. Whenever you see changes not going in for more than a couple of days you should report it. --Rainman 18:07, 8 February 2008 (UTC)
- Ok, we tracked this down to a hard drive failure on srv21, now one just needs to wait for cache to expire (~12h) and you should get fresh results - thanks for the report! --Rainman 18:57, 8 February 2008 (UTC)
- Thanks for the information :-). What would be the proper place to report such things in the future? Tim Landscheidt 21:32, 8 February 2008 (UTC)
- Technical issues are usually reported via the IRC channel #wikimedia-tech, where all of the sysadmins are. If there's no-one online to fix the problem then you could submit a bug. You could also send me an e-mail via this wiki or leave a message on my talk page, since I'm more-or-less in charge of maintaining the search subsystem. --Rainman 21:44, 8 February 2008 (UTC)
- Okay, I'll keep that in mind. Thanks again, Tim Landscheidt 22:53, 8 February 2008 (UTC)
[edit] Query String Syntax
Please document the subset of Lucene query string syntax that has been implemented.
-- 216.143.51.66 22:52, 8 February 2008 (UTC)
[edit] Error running the Daemon
# . lsearchd
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/lsearch.conf
0 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
530 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
602 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RMIMessenger bound
619 [main] ERROR org.wikimedia.lsearch.search.SearcherCache - I/O Error opening index at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki : /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki/segments (No such file or directory)
621 [main] ERROR org.wikimedia.lsearch.search.SearcherCache - I/O Error opening index at path /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki : /var/www/vhosts/kidneycancerknol.com/httpdocs/Lucene/ls2-bin/indexes/search/kck_wiki/segments (No such file or directory)
621 [main] WARN org.wikimedia.lsearch.search.SearcherCache - I/O error warming index for kck_wiki
621 [Thread-3] INFO org.wikimedia.lsearch.frontend.SearchServer - Binding server to port 8123
623 [Thread-2] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer - Started server at port 8321
I'm getting this error saying "No such file or directory". The directory exists; however, I don't know where the "segments" file is supposed to come from.
I ran this to create the indexes
php maintenance/dumpBackup.php --current --quiet > wikidb.xml && java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki
The wikidb.xml file exists in the httpdocs directory
...and then I started the daemon
Am I missing a trick?
Thanks
Andy Andy.thomas 19 February 2008
- And what is the output from the importer? It should give you success messages saying that it created the indexes and successfully made a snapshot. --Rainman 01:30, 20 February 2008 (UTC)
I'm most likely doing something dumb (being a bit of a newbie), but this is what I get when I just run:
java -cp LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki
Exception in thread "main" java.lang.NoClassDefFoundError: org/wikimedia/lsearch/importer/Importer
--Andy 17:00, 20 February 2008 (GMT)
- The java command you're running assumes that LuceneSearch.jar is in your current directory, the full command would be
java -cp /full/path/to/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml kck_wiki
- --Rainman 18:04, 20 February 2008 (UTC)
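Some background on why the error was NoClassDefFoundError rather than "file not found": java silently ignores classpath entries that do not exist, so a wrong relative path to the jar only shows up when the class fails to load. A minimal Python sketch for checking that a jar path is readable and contains the class; the function name is made up for illustration.

```python
import zipfile


def jar_has_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar at jar_path contains the fully qualified class."""
    # org.example.Foo is stored in the jar as org/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    try:
        with zipfile.ZipFile(jar_path) as jar:
            return entry in jar.namelist()
    except (OSError, zipfile.BadZipFile):
        # Missing or unreadable jar: java would not find the class either.
        return False


if __name__ == "__main__":
    # Relative path, as in the failing command; resolved against the current directory.
    print(jar_has_class("LuceneSearch.jar", "org.wikimedia.lsearch.importer.Importer"))
```

Running it from the directory where the java command was issued shows whether the relative jar path resolves there.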
I'm getting further, thanks, that helped. Sorry, I know I'm being dumb and I apologise for asking you to hold my hand like this, but I now get this:
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/www/vhosts/kidneycancerknol.com/httpdocs/lsearch.conf
0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
3 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
60 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - First pass, getting a list of valid articles...
175 [main] FATAL org.wikimedia.lsearch.ranks.RankBuilder - I/O error reading dump while getting titles from wikidb.xml
175 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - Second pass, calculating article links...
179 [main] FATAL org.wikimedia.lsearch.ranks.RankBuilder - I/O error reading dump while calculating ranks for from wikidb.xml
Exception in thread "main" java.lang.NullPointerException
at org.wikimedia.lsearch.importer.Importer.main(Importer.java:114)
Do I need to set the OAI settings in the global config? I've just kept them as the default. --Andy 18:30, 20 February 2008 (GMT)
- No, you don't need oai.. Seems to me something is wrong with the xml file .. sure would be helpful if exception weren't suppressed :\ unfortunately cannot help you much more than that.. is wikidb.xml a valid xml file? did you give full path to it? --Rainman 01:00, 21 February 2008 (UTC)
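One way to answer the "is it a valid XML file?" question without opening the dump by hand is to stream it through a parser. This is a minimal Python sketch, not part of lucene-search: the function name is made up, and it only checks that the file can be opened and is well-formed, not that it matches the MediaWiki export schema.

```python
import xml.sax


def first_xml_error(path: str):
    """Return None if the file is well-formed XML, else a short error message."""
    try:
        with open(path, "rb") as dump:
            # A do-nothing handler: we only care whether parsing succeeds.
            xml.sax.parse(dump, xml.sax.ContentHandler())
    except OSError as exc:
        # Wrong path or unreadable file.
        return "cannot read %s: %s" % (path, exc)
    except xml.sax.SAXParseException as exc:
        # Truncated, empty, or otherwise malformed dump.
        return "not well-formed: %s" % exc
    return None


if __name__ == "__main__":
    # Relative path, as passed to the importer above.
    print(first_xml_error("wikidb.xml"))
```

Because the command above runs dumpBackup.php with --quiet, an empty or truncated wikidb.xml is one possibility such a check would expose; whether that is what happened here is not known.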
[edit] Exception in thread "main" java.lang.UnsupportedClassVersionError
Hi
I use following configuration:
- MediaWiki: 1.11.0
- PHP: 5.2.5 (apache2handler)
- MySQL: 5.0.51
If I call this:
java -cp /usr/local/search/ls2/ls2-bin/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s basiswikidb.xml basiswiki
I get the error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/wikimedia/lsearch/importer/Importer (Unsupported major.minor version 49.0)
at java.lang.ClassLoader.defineClass0(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:539)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:251)
at java.net.URLClassLoader.access$100(URLClassLoader.java:55)
at java.net.URLClassLoader$1.run(URLClassLoader.java:194)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:187)
at java.lang.ClassLoader.loadClass(ClassLoader.java:289)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)
My Configuration
- all Files are in /usr/local/search/ls2/
- MWConfig.global=file:///usr/local/search/ls2/lsearch-global.conf
- MWConfig.lib=/usr/local/search/ls2/lib
- Indexes.path=/usr/local/search/indexes
- Localization.url=file:///opt/lampp/htdocs/basiswiki/languages/messages
- Logging.logconfig=/usr/local/search/ls2/lsearch.log4j
- mwdumper.jar => /usr/local/search/ls2/lib
- lsearch.conf: Storage.lib=/usr/local/search/ls2/sql
lsearch-global.conf
[Database]
#wikilucene : (single) (language,en) (warmup,0)
wikidev : (single) (language,sr)
wikilucene : (nssplit,3) (nspart1,[0]) (nspart2,[4,5,12,13]), (nspart3,[])
wikilucene : (language,en) (warmup,10)
basiswiki : (single) (language,en) (warmup,10)
# Search groups
# Index parts of a split index are always taken from the node's group
# host : db1.part db2.part
# Mulitple hosts can search multiple dbs (N-N mapping)
[Search-Group]
<my host> : wikilucene wikidev
<my host> : basiswiki
Please can you help me?!
85.158.226.1 11:03, 31 March 2008 (UTC)
- Run java -version. You probably have an old Java; you need to update to 1.5 or later. --Rainman 11:57, 31 March 2008 (UTC)
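For context, the "49.0" in the error is the class-file version the jar was compiled for: major version 49 corresponds to Java 5 (1.5), so an older JVM refuses to load it. A minimal Python sketch that reads this number from the first bytes of a .class file; the function names are made up for illustration.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def class_file_major(header: bytes) -> int:
    """Return the major version stored in the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class-file header")
    # Layout: u4 magic, u2 minor_version, u2 major_version (all big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major


def required_java(major: int) -> str:
    """Map a class-file major version to the oldest Java release that loads it."""
    if major < 45:
        raise ValueError("unknown class-file version")
    # 45 -> 1.1, 46 -> 1.2, 47 -> 1.3, 48 -> 1.4, 49 -> 1.5, 50 -> 1.6, ...
    return "1.%d" % (major - 44)


if __name__ == "__main__":
    # Header of a class compiled for Java 5: magic, minor 0, major 0x31 (49).
    header = bytes.fromhex("cafebabe00000031")
    major = class_file_major(header)
    print(major, required_java(major))
```

The same eight bytes can be read from any class extracted from LuceneSearch.jar to confirm which Java release the jar expects.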
[edit] MediaWiki+Lucene-Search+MWSearch = ZERO search results ??!@#?!
Can someone please assist me? =)
- Slackware 12.0, on i686 Pentium III [Linux 2.6.21.5]
- MediaWiki: 1.9.1
- PHP: 5.2.5 (apache2handler)
- MySQL: 5.0.37
- MediaWiki Extension(s): MWSearch SVN (05122008), and Lucene-search SVN (05122008), + I downloaded & installed mwdumper.jar into the Lucene2 lib dir.
- other tools: jre-6u2-i586-1, jdk-1_5_0_09-i586-1, apache-ant-1.7.0-i586-1bj, rsync-2.6.9-i486-1
I've followed the steps per Extension:Lucene-search and Extension:MWSearch pages, to the T - I've gone over and over them several times, I've been to MediaWiki Forums, and the MediaWiki-L mailing list ... please help me! =)
My Local LuceneSearch configuration
- LuceneSearch SVN Install dir: /usr/local/search/lucene-search-2svn05112008
- Indexes stored: /usr/local/search/indexes
/etc/lsearch.conf
MWConfig.global=file:///etc/lsearch-global.conf
MWConfig.lib=/usr/local/search/lucene-search-2svn05112008/lib
Indexes.path=/usr/local/search/indexes
Search.updateinterval=1
Search.updatedelay=0
Search.checkinterval=30
Index.snapshotinterval=5
Index.maxqueuecount=5000
Index.maxqueuetimeout=12
Storage.master=localhost
Storage.username=wikiuser
Storage.password=mypass
Storage.useSeparateDBs=false
Storage.defaultDB=wikidb
Storage.lib=/usr/local/search/lucene-search-2svn05112008/sql
Localization.url=file:///var/www/htdocs/wiki/languages/messages
Logging.logconfig=/etc/lsearch.log4j
Logging.debug=true
/etc/lsearch-global.conf
[Database]
wikidb : (single) (language,en) (warmup,10)
[Search-Group]
nen-tftp : wikidb
[Index]
nen-tftp : wikidb
[Index-Path]
<default> : /usr/local/search/indexes
[OAI]
wiktionary : http://$lang.wiktionary.org/w/index.php
wikilucene : http://localhost/wiki-lucene/phase3/index.php
<default> : http://$lang.wikipedia.org/w/index.php
[Properties]
Database.suffix=wiki wiktionary wikidb
KeywordScoring.suffix=wikidb wiki wikilucene wikidev
ExactCase.suffix=wikidb wiktionary wikilucene
[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15
/etc/lsearch.log4j
log4j.rootLogger=INFO, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
relevant /var/www/htdocs/wiki/LocalSettings.php settings
$wgSearchType = 'LuceneSearch';
$wgLuceneHost = 'localhost';
$wgLucenePort = 8123;
require_once("extensions/MWSearch/MWSearch.php");
building the index works running dumpBackup(Init).php
> php maintenance/dumpBackupInit.php --current --quiet > wikidb.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki/wikidb.xml wikidb MediaWiki Lucene search indexer - index builder from xml database dumps. Trying config file at path /root/.lsearch.conf Trying config file at path /var/www/htdocs/wiki/lsearch.conf Trying config file at path /etc/lsearch.conf log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@133056f. log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@133056f class loader. log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource(). log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@133056f. log4j: Trying to find [log4j.properties] using sun.misc.Launcher$AppClassLoader@133056f class loader. log4j: Trying to find [log4j.properties] using ClassLoader.getSystemResource(). log4j: Could not find resource: [null]. log4j: Parsing for [root] with value=[INFO, A1]. log4j: Level token is [INFO]. log4j: Category root set to INFO log4j: Parsing appender named "A1". log4j: Parsing layout options for "A1". log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n]. log4j: End of parsing for "A1". log4j: Parsed "A1" options. log4j: Finished configuring. 0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer 18 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En 434 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - First pass, getting a list of valid articles... 94 pages (99.576/sec), 94 revs (99.576/sec) 1527 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - Second pass, calculating article links... 94 pages (326.389/sec), 94 revs (326.389/sec) 1928 [main] INFO org.wikimedia.lsearch.importer.Importer - Third pass, indexing articles... 
94 pages (24.588/sec), 94 revs (24.588/sec) 6005 [main] INFO org.wikimedia.lsearch.importer.Importer - Closing/optimizing index... Finished indexing in 5s, with final index optimization in 0s Total time: 6s 6530 [main] INFO org.wikimedia.lsearch.index.IndexThread - Making snapshot for wikidb 6582 [main] INFO org.wikimedia.lsearch.index.IndexThread - Made snapshot /usr/local/search/indexes/snapshot/wikidb/20080512024654
That creates a 277KB file @ /var/www/htdocs/wiki/wikidb.xml , which looks just fine to me...
Starting the lsearch daemon is working
When I run my script /usr/local/search/lucene-search-2svn05112008/lsearchd, which starts the lsearch daemon, I get the following, which ALSO looks fine:
java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeah.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/search/lucene-search-2svn05112008/lsearch.conf
log4j: Parsing for [root] with value=[INFO, A1].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "A1".
log4j: Parsing layout options for "A1".
log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
log4j: End of parsing for "A1".
log4j: Parsed "A1" options.
log4j: Finished configuring.
0 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
2351 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
2600 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RMIMessenger bound
2882 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb> bound
2914 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index wikidb ...
2928 [Thread-2] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer - Started server at port 8321
2929 [Thread-3] INFO org.wikimedia.lsearch.frontend.SearchServer - Binding server to port 8123
4246 [main] INFO org.wikimedia.lsearch.search.Warmup - Warmed up wikidb in 1331 ms
4246 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index wikidb ...
5079 [main] INFO org.wikimedia.lsearch.search.Warmup - Warmed up wikidb in 833 ms
5079 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index wikidb ...
5861 [main] INFO org.wikimedia.lsearch.search.Warmup - Warmed up wikidb in 782 ms
From here, I pull up my normal wiki, which has been working fine ALL along, but now I get ZERO search results, no matter what I do! I know I am searching correctly: I just type in a single word that I know is on several pages in the wiki. I've even tried editing the file before and after building the index, and starting/stopping the lsearch daemon, yet I get this on my MediaWiki search results page:
Search results
From AgentDcooper's Wiki
You searched for wiki
For more information about searching AgentDcooper's Wiki, see Searching AgentDcooper's Wiki.
Showing below 0 results starting with #1.
No page text matches
Note: Unsuccessful searches are often caused by searching for common words like "have" and "from", which are not indexed, or by specifying more than one search term (only pages containing all of the search terms will appear in the result).
Right after I do a search within the wiki, the lsearch daemon console output scrolls the following:
293744 [pool-2-thread-1] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10 what:search dbname:wikidb term:wiki
293759 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine - Using NamespaceFilterWrapper wrap: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
293786 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine - search wikidb: query=[wiki] parsed=[contents:wiki (title:wiki^6.0 stemtitle:wiki^2.0) (alttitle1:wiki^4.0 alttitle2:wiki^4.0 alttitle3:wiki^4.0) (keyword1:wiki^0.02 keyword2:wiki^0.01 keyword3:wiki^0.0066666664 keyword4:wiki^0.0050 keyword5:wiki^0.0039999997)] hit=[27] in 16ms using IndexSearcherMul:1210585609666
With MediaWiki debugging enabled, my /var/log/mediawiki/debug_log.txt shows this:
Start request
GET /wiki/index.php/Special:Search?search=wiki&fulltext=Search
Host: nen-tftp.techiekb.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://nen-tftp.techiekb.com/wiki/index.php/Special:Version
Cookie: wikidb_session=3jptdli2pf3nkuq924tq1ihlt0
Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
Unstubbing $wgParser on call of $wgParser->setHook from require_once
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Language::loadLocalisation(): got localisation for en from source
Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
Cache miss for user 2
Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
Logged in from session
Unstubbing $wgMessageCache on call of $wgMessageCache->getTransform from wfMsgGetKey
Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
MessageCache::load(): got from global cache
Unstubbing $wgOut on call of $wgOut->setPageTitle from SpecialSearch::setupPage
Fetching search data from http://localhost:8123/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
total [0] hits
OutputPage::sendCacheControl: private caching; **
Request ended normally
Now get this: if I go to the link from the debug log above, http://localhost:8123/search/wikidb/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10 , I get this page:
3
1.0 0 Main_Page
0.9577699303627014 0 EFFICIENT%2FCISCO%2FNETSCREEN%2FNETOPIA_Router_Command_Matrix
0.7121278643608093 0 DBU_-_DialBackUp
Which leads me to my question: what am I doing wrong?? I have tried everything I can think of; I just cannot get search within my MediaWiki to work properly. The search itself seems to work when going to the link directly above, yet somehow the "total hits" in the log as well as in the wiki show ZERO. Manually going to the link from the debug log shows me what appears to be a result indicating 3 PAGES were found, with corresponding result data!?@# Why is MediaWiki not showing this? Any help would be kindly appreciated, or even a link for reference! -peace- --Agentdcooper 12 May 2008
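For anyone comparing the two outputs: the plain-text reply quoted above looks like a hit count followed by one score/namespace/title triple per hit. That layout is inferred only from the sample in this thread, not from lucene-search documentation, and the class and method names below are hypothetical. A minimal Java sketch that reads such a reply:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses the plain-text reply of the lsearch daemon. */
final class LsearchReply {
    /** One hit: relevance score, namespace number, URL-encoded page title. */
    static final class Hit {
        final double score;
        final int namespace;
        final String title;

        Hit(double score, int namespace, String title) {
            this.score = score;
            this.namespace = namespace;
            this.title = title;
        }
    }

    final int totalHits;
    final List<Hit> hits;

    private LsearchReply(int totalHits, List<Hit> hits) {
        this.totalHits = totalHits;
        this.hits = hits;
    }

    /** Assumed layout: hit count first, then score/namespace/title triples. */
    static LsearchReply parse(String body) {
        // Split on any whitespace so one-line and multi-line replies both work.
        String[] tokens = body.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty reply");
        }
        int total = Integer.parseInt(tokens[0]);
        List<Hit> hits = new ArrayList<Hit>();
        for (int i = 1; i + 2 < tokens.length; i += 3) {
            hits.add(new Hit(Double.parseDouble(tokens[i]),
                             Integer.parseInt(tokens[i + 1]),
                             tokens[i + 2]));
        }
        return new LsearchReply(total, hits);
    }
}
```

If the daemon's reply parses to a non-zero count while the MediaWiki debug log still says total [0] hits for the same URL, that would suggest the index is fine and the difference lies in how the PHP side fetches or reads the reply.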
- I would suspect the problem is the MW version. Search front-end has been heavily refactored in MediaWiki 1.13, and MWSearch is designed to run with latest mediawiki, so there might be some compatibility issues. Note that MW 1.13 is still not released, but is still in development. Try using Extension:LuceneSearch instead. --Rainman 13:20, 12 May 2008 (UTC)
- Thanks a TON, I will try this out in just a few, I half suspected it was a MediaWiki versioning issue, I really need to upgrade! =) --Agentdcooper 20:16, 12 May 2008 (UTC)
- I moved to LuceneSearch and am getting a strange error. I removed the MWSearch extension entirely, then downloaded Extension:LuceneSearch from SVN today and moved the LuceneSearch directory to /var/www/htdocs/wiki (chmod'd to 755 recursively to make sure it isn't a permissions issue). Then I commented out the MWSearch code in LocalSettings.php:
#$wgSearchType = 'LuceneSearch';
#$wgLuceneHost = 'localhost';
#$wgLucenePort = 8123;
#require_once("extensions/MWSearch/MWSearch.php");
- I've tried different settings for Extension:LuceneSearch, but ended up with this config for LuceneSearch ;
$wgDisableInternalSearch = true;
$wgDisableSearchUpdate = true;
$wgSearchType = 'LuceneSearch';
$wgLuceneHost = 'localhost';
$wgLucenePort = 8123;
require_once("extensions/LuceneSearch/LuceneSearch.php");
$wgLuceneSearchVersion = 2;
$wgLuceneDisableSuggestions = true;
$wgLuceneDisableTitleMatches = true;
I then ran the indexer, which seemed to go great ;
> php maintenance/dumpBackupInit.php --current --quiet > wikidb.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki/wikidb.xml wikidb
MediaWiki Lucene search indexer - index builder from xml database dumps.
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/www/htdocs/wiki/lsearch.conf
Trying config file at path /etc/lsearch.conf
log4j: Trying to find [log4j.xml] using context classloader sun.misc.Launcher$AppClassLoader@133056f.
log4j: Trying to find [log4j.xml] using sun.misc.Launcher$AppClassLoader@133056f class loader.
log4j: Trying to find [log4j.xml] using ClassLoader.getSystemResource().
log4j: Trying to find [log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@133056f.
log4j: Trying to find [log4j.properties] using sun.misc.Launcher$AppClassLoader@133056f class loader.
log4j: Trying to find [log4j.properties] using ClassLoader.getSystemResource().
log4j: Could not find resource: [null].
log4j: Parsing for [root] with value=[INFO, A1].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "A1".
log4j: Parsing layout options for "A1".
log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
log4j: End of parsing for "A1".
log4j: Parsed "A1" options.
log4j: Finished configuring.
0 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
17 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
432 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - First pass, getting a list of valid articles...
94 pages (98.739/sec), 94 revs (98.739/sec)
1532 [main] INFO org.wikimedia.lsearch.ranks.RankBuilder - Second pass, calculating article links...
94 pages (325.26/sec), 94 revs (325.26/sec)
1934 [main] INFO org.wikimedia.lsearch.importer.Importer - Third pass, indexing articles...
94 pages (24.691/sec), 94 revs (24.691/sec)
5996 [main] INFO org.wikimedia.lsearch.importer.Importer - Closing/optimizing index...
Finished indexing in 5s, with final index optimization in 0s
Total time: 6s
6515 [main] INFO org.wikimedia.lsearch.index.IndexThread - Making snapshot for wikidb
6566 [main] INFO org.wikimedia.lsearch.index.IndexThread - Made snapshot /usr/local/search/indexes/snapshot/wikidb/20080512134828
And then, started lsearch daemon via console ;
> java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeach.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /root/lsearch.conf
Trying config file at path /etc/lsearch.conf
log4j: Parsing for [root] with value=[INFO, A1].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "A1".
log4j: Parsing layout options for "A1".
log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
log4j: End of parsing for "A1".
log4j: Parsed "A1" options.
log4j: Finished configuring.
0 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
2353 [main] INFO org.wikimedia.lsearch.util.UnicodeDecomposer - Loaded unicode decomposer
2603 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RMIMessenger bound
2885 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb> bound
2929 [Thread-2] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer - Started server at port 8321
2930 [Thread-3] INFO org.wikimedia.lsearch.frontend.SearchServer - Binding server to port 8123
2935 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index wikidb ...
4265 [main] INFO org.wikimedia.lsearch.search.Warmup - Warmed up wikidb in 1329 ms
4266 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index wikidb ...
5110 [main] INFO org.wikimedia.lsearch.search.Warmup - Warmed up wikidb in 844 ms
5110 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index wikidb ...
5922 [main] INFO org.wikimedia.lsearch.search.Warmup - Warmed up wikidb in 811 ms
My mediawiki's Special:Version page shows LuceneSearch (version 2.0) is installed properly. Yet, when I do any type of search in my MediaWiki, the page comes up displaying the following error;
Fatal error: Call to undefined function wfLoadExtensionMessages() in /var/www/htdocs/wiki/extensions/LuceneSearch/LuceneSearch_body.php on line 85
The lsearch daemon console output shows nothing new since I started it! To me that indicates the search isn't being passed to the lsearch daemon?? In reviewing the debug log at /var/log/mediawiki/debug_log.txt, I'm seeing this:
Start request
GET /wiki/index.php/Special:Search?search=wiki&fulltext=Search
Host: nen-tftp.techiekb.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://nen-tftp.techiekb.com/wiki/index.php/Main_Page
Cookie: wikidbUserName=Rprior; wikidb_session=buvigq1obd1nd5ulbk1l8d83s7; wikidbUserID=2; wikidbToken=dd6c9b732dba0c94b04ad72044d46d79
Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
Unstubbing $wgParser on call of $wgParser->setHook from require_once
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Language::loadLocalisation(): got localisation for en from source
Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
Cache miss for user 2
Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
Logged in from session
Just as an added note here, the file /var/www/htdocs/wiki/extensions/LuceneSearch/LuceneSearch_body.php includes the following on line #85 thru #89 ;
wfLoadExtensionMessages( 'LuceneSearch' );
$fname = 'LuceneSearch::execute';
wfProfileIn( $fname );
$this->setHeaders();
$wgOut->addHTML('<!-- titlens = '. $wgTitle->getNamespace() . '-->');
Any chance you got an idea on how to fix this issue? =) --- I am thinking I may just have to update to mediawiki SVN and try MWSearch if I cannot get this going on my current mediawiki install, yet I'd LOVE to fix this if possible. Please help me! =) --Agentdcooper 21:12, 12 May 2008 (UTC)
[edit] 2008-05-12 :: Installed Mediawiki SVN + Lucene-Search SVN & MWSearch SVN, still getting ZERO search results
I flat-out installed a new MediaWiki from SVN, plus Lucene-search SVN and MWSearch SVN version r34306 (all SVN downloads from 05.12.2008, with lucene-search-2 SVN being from 05.11.2008).
- Base-system: is Slackware 12.0, on i686 Pentium III [Linux 2.6.21.5]
- Mediawiki 1.13alpha (r34693)
- PHP: 5.2.5
- MySQL: 5.0.37
- packages: jre-6u2-i586-1, jdk-1_5_0_09-i586-1, apache-ant-1.7.0-i586-1bj, rsync-2.6.9-i486-1
- mwdumper.jar is installed in the /usr/local/search/lucene-search-2svn05112008/lib directory.
- ExtensionFunctions.php installed @ /var/www/htdocs/wiki-test/extensions
- Special:Version shows MWSearch (Version r34306) is installed properly...
My config files
/etc/lsearch.conf
MWConfig.global=file:///etc/lsearch-global.conf
MWConfig.lib=/usr/local/search/lucene-search-2svn05112008/lib
Indexes.path=/usr/local/search/indexes
Search.updateinterval=1
Search.updatedelay=0
Search.checkinterval=30
Index.snapshotinterval=5
Index.maxqueuecount=5000
Index.maxqueuetimeout=12
Storage.master=localhost
Storage.username=newwikiuser
Storage.password=testpass
Storage.useSeparateDBs=false
Storage.defaultDB=wikidbnew
Storage.lib=/usr/local/search/lucene-search-2svn05112008/sql
SearcherPool.size=3
Localization.url=file:///var/www/htdocs/wiki-test/languages/messages
Logging.logconfig=/etc/lsearch.log4j
Logging.debug=true
/etc/lsearch-global.conf
[Database]
wikidbnew : (single) (language,en) (warmup,10)
[Index]
nen-tftp : wikidbnew
[Index-Path]
<default> : /usr/local/search/indexes
[OAI]
wiktionary : http://$lang.wiktionary.org/w/index.php
wikilucene : http://localhost/wiki-lucene/phase3/index.php
<default> : http://$lang.wikipedia.org/w/index.php
[Properties]
Database.suffix=wiki wiktionary wikidbnew
KeywordScoring.suffix=wikidbnew wiki wikilucene wikidev
ExactCase.suffix=wikidbnew wiktionary wikilucene
[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15
/etc/lsearch.log4j
log4j.rootLogger=INFO, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
command-line for indexing my wiki (now in a script called /var/www/htdocs/wiki-test/dumpBackup.sh)
php maintenance/dumpBackupInit.php --current --quiet > wikidbnew.xml && java -cp /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s /var/www/htdocs/wiki-test/wikidbnew.xml wikidbnew
command-line to start lsearch daemon (now in a script called /usr/local/search/lucene-search-2svn05112008/lsearchd)
java -Djava.rmi.server.codebase=file:///usr/local/search/lucene-search-2svn05112008/LuceneSeach.jar -Djava.rmi.server.hostname=nen-tftp -jar /usr/local/search/lucene-search-2svn05112008/LuceneSearch.jar $*
PHP Version 5.2.5 was configured with command line that enabled curl
- switches used '--with-curl=shared' '--with-curlwrappers'
- cURL support = enabled
- cURL Information = libcurl/7.16.2 OpenSSL/0.9.8e zlib/1.2.3 libidn/0.6.10
- the mySQL DB wikidbnew does show a table called searchindex sized 20.5 KiB, which appears to be populated correctly with search info from my wikidb.
config/install of new mediawiki SVN
I ran through the basic config/install of MediaWiki and put some data into the basic wiki, something I knew would be easily searchable. I built the index; it seems to build without error, and everything appears to work. But when I issue a search from the main wiki page, I get ZERO search results, even though MediaWiki's original search DID find these terms when it was just a basic MediaWiki install, before I installed the Lucene-Search and/or MWSearch extensions.
mediawiki search results = ZERO
What seems strange is that everything seems to work up to the point of searching through my wiki! When I search in the wiki, I get the following ZERO results message:
No page text matches Note: Only some namespaces are searched by default. Try prefixing your query with all: to search all content (including talk pages, templates, etc), or use the desired namespace as prefix.
mediawiki debug file
When I look at the MediaWiki debug file (/var/log/mediawiki/debug_mediawiki-wiki-test_log.txt), it shows the following when a search is submitted for 'wiki' (which exists in multiple locations on the main page of the wiki):
Start request
GET /wiki-test/index.php?title=Special%3ASearch&search=wiki&ns0=1&ns1=1&ns2=1&ns3=1&ns4=1&ns5=1&ns6=1&ns7=1&ns8=1&ns9=1&ns10=1&ns11=1&ns12=1&ns13=1&ns14=1&ns15=1&fulltext=Search
Host: nen-tftp.techiekb.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://nen-tftp.techiekb.com/wiki-test/index.php/Special:Search?search=wiki&fulltext=Search
Cookie: wikidbUserName=Rprior; wikidb_session=buvigq1obd1nd5ulbk1l8d83s7; wikidbUserID=2; wikidbToken=dd6c9b732dba0c94b04ad72044d46d79; wikidbnew_session=gvchrcs1cf12uvdukl1odpapk7; wikidbnewUserID=1; wikidbnewUserName=Rprior; wikidbnewToken=ef9b27fc68ffacb8c7362b31ea27e292
Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
session_set_cookie_params: "0", "/", "", "", "1"
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Language::loadLocalisation(): got localisation for en from source
Unstubbing $wgOut on call of $wgOut->setArticleRelated from SpecialPage::setHeaders
Unstubbing $wgMessageCache on call of $wgMessageCache->get from wfMsgGetKey
Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
Unstubbing $wgUser on call of $wgUser->getOption from StubUserLang::_newObject
Cache miss for user 1
Connecting to localhost wikidbnew...
Connected
Logged in from session
MessageCache::load(): got from global cache
Unstubbing $wgParser on call of $wgParser->firstCallInit from MessageCache::transform
Preprocessor_Hash::preprocessToObj
$1 - {{SITENAME}}
Preprocessor_Hash::preprocessToObj
$1 - {{SITENAME}}
Preprocessor_Hash::preprocessToObj
You searched for '''[[:wiki]]'''
Preprocessor_Hash::preprocessToObj
For more information about searching {{SITENAME}}, see [[{{MediaWiki:Helppage}}|{{int:help}}]].
Preprocessor_Hash::preprocessToObj
Help:Contents
Fetching search data from http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
Http::request: GET http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
total [0] hits
Preprocessor_Hash::preprocessToObj
==No page text matches==
Preprocessor_Hash::preprocessToObj
'''Note''': Only some namespaces are searched by default. Try prefixing your query with ''all:'' to search all content (including talk pages, templates, etc), or use the desired namespace as prefix.
Preprocessor_Hash::preprocessToObj
Search in namespaces:<br />$1<br />
Preprocessor_Hash::preprocessToObj
Preprocessor_Hash::preprocessToObj
Preprocessor_Hash::preprocessToObj
Search for $1 $2
Preprocessor_Hash::preprocessToObj
{{SITENAME}} ({{CONTENTLANGUAGE}})
Preprocessor_Hash::preprocessToObj
About {{SITENAME}}
Preprocessor_Hash::preprocessToObj
About {{SITENAME}}
Preprocessor_Hash::preprocessToObj
From {{SITENAME}}
Preprocessor_Hash::preprocessToObj
Search {{SITENAME}}
OutputPage::sendCacheControl: private caching; **
Request ended normally
pointing a browser at the link in debug file
Here's the deal though: if I go to the link from the debug log in lynx or a browser ("http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offsett=0&limit=100&version=2&iwlimit=10"), I get this output:
1
1.0 0 Main_Page
HELP :: where am I going wrong??
Mediawiki gives me no results, and the debug log file above, shows a total [0] hits, why am I getting zero hits? no matter what I do, I am getting zero hits!? can you see anything wrong I am doing here? please help =) --Agentdcooper 00:41, 13 May 2008 (UTC)
- just to note: if I grep the file /var/www/htdocs/wiki-test/wikidbnew.xml for the same word I am searching for, I get MANY hits!? --Agentdcooper 00:51, 13 May 2008 (UTC)
- OK then, try adding wfDebug($data); somewhere around line 564 in MWSearch.php. This should print to the MediaWiki debug log the same data you're seeing when you directly access the search URL. If it doesn't print anything, then something is wrong with your curl. --Rainman 09:06, 13 May 2008 (UTC)
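A way to cross-check the daemon without going through PHP's curl at all is to rebuild the same request URL and fetch it from a small standalone program. The sketch below is only that: the class and method names are hypothetical, and the fixed version=2 and iwlimit=10 parameters are simply copied from the URL in the debug log above, not taken from any MWSearch documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;

/** Hypothetical helper: rebuilds and fetches the lsearch query URL seen in the debug log. */
final class LsearchQuery {
    /** Percent-encodes a value; spaces become %20 rather than '+'. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds http://host:port/search/db/term?namespaces=...&offset=...&limit=... */
    static String buildUrl(String host, int port, String db, String term,
                           int[] namespaces, int offset, int limit) {
        StringBuilder ns = new StringBuilder();
        for (int i = 0; i < namespaces.length; i++) {
            if (i > 0) {
                ns.append(',');
            }
            ns.append(namespaces[i]);
        }
        // version=2 and iwlimit=10 are copied from the logged URL (assumption).
        return "http://" + host + ":" + port + "/search/" + db + "/" + encode(term)
                + "?namespaces=" + encode(ns.toString())
                + "&offset=" + offset + "&limit=" + limit
                + "&version=2&iwlimit=10";
    }

    /** Fetches the reply body as text; needs a running daemon to be useful. */
    static String fetch(String url) throws IOException {
        StringBuilder out = new StringBuilder();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), "UTF-8"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            in.close();
        }
        return out.toString();
    }
}
```

If fetch() returns hits for the URL that MediaWiki logs while the wiki still shows zero, that would be consistent with the problem sitting in the PHP/curl layer rather than in the index.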
- Well, I think you are on to something there! Here's the deal: I put wfDebug($data); on line #565, by itself. I then re-ran the index command and restarted the lsearch daemon so I could watch the console output via an SSH session. I loaded up the main wiki page and did a basic search for the word "wiki". Here's what happens:
- After pushing the search button within the wiki, it takes me to a completely blank page (my browser's address bar shows "http://<mydomain.com>/wiki-test/index.php/Special:Search?search=wiki&fulltext=Search"). Watching the console output from the lsearch daemon, it shows the following:
629776 [pool-1-thread-5] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10 what:search dbname:wikidbnew term:wiki
629780 [pool-1-thread-5] INFO org.wikimedia.lsearch.search.SearchEngine - Using NamespaceFilterWrapper wrap: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
629784 [pool-1-thread-5] INFO org.wikimedia.lsearch.search.SearchEngine - search wikidbnew: query=[wiki] parsed=[contents:wiki (title:wiki^6.0 stemtitle:wiki^2.0) (alttitle1:wiki^4.0 alttitle2:wiki^4.0 alttitle3:wiki^4.0) (keyword1:wiki^0.02 keyword2:wiki^0.01 keyword3:wiki^0.0066666664 keyword4:wiki^0.0050 keyword5:wiki^0.0039999997)] hit=[1] in 5ms using IndexSearcherMul:1210691193858
- My debug log at /var/log/mediawiki/debug_mediawiki-wiki-test_log.txt scrolls the following by, right when I do that "wiki" search:
Start request
GET /wiki-test/index.php/Special:Search?search=wiki&fulltext=Search
Host: nen-tftp.techiekb.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://nen-tftp.techiekb.com/wiki-test/index.php/Main_Page
Cookie: wikidbUserName=Rprior; wikidb_session=buvigq1obd1nd5ulbk1l8d83s7; wikidbUserID=2; wikidbToken=dd6c9b732dba0c94b04ad72044d46d79; wikidbnew_session=gvchrcs1cf12uvdukl1odpapk7; wikidbnewUserID=1; wikidbnewUserName=Rprior; wikidbnewToken=ef9b27fc68ffacb8c7362b31ea27e292
Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==
Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
session_set_cookie_params: "0", "/", "", "", "1"
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Language::loadLocalisation(): got localisation for en from source
Unstubbing $wgOut on call of $wgOut->setArticleRelated from SpecialPage::setHeaders
Unstubbing $wgMessageCache on call of $wgMessageCache->get from wfMsgGetKey
Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
Unstubbing $wgUser on call of $wgUser->getOption from StubUserLang::_newObject
Cache miss for user 1
Connecting to localhost wikidbnew...
Connected
Logged in from session
MessageCache::load(): got from global cache
Unstubbing $wgParser on call of $wgParser->firstCallInit from MessageCache::transform
Preprocessor_Hash::preprocessToObj
$1 - {{SITENAME}}
Preprocessor_Hash::preprocessToObj
$1 - {{SITENAME}}
Preprocessor_Hash::preprocessToObj
You searched for '''[[:wiki]]'''
Preprocessor_Hash::preprocessToObj
For more information about searching {{SITENAME}}, see [[{{MediaWiki:Helppage}}|{{int:help}}]].
Preprocessor_Hash::preprocessToObj
Help:Contents
Fetching search data from http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
Http::request: GET http://nen-tftp.techiekb.com:8123/search/wikidbnew/wiki?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15&offset=0&limit=100&version=2&iwlimit=10
- If I go to that link at the bottom of the debug log, the following is displayed in my browser:
1
1.0 0 Main_Page
- So, what are you thinking, boss: is it my curl install? If that's the case, a new Slackware v12.1 just came out, and it appears they updated Apache to v2.2.8 and PHP to v5.2.6, yet Slackware v12.1 still uses the curl v7.16.2 package, which is the same version I'm running now, only repackaged... hmmmm... What do you think, Rainman? BTW, thanks a million for your assistance! I really can't wait to get this Lucene search functionality working for my MediaWiki project! --Agentdcooper 15:33, 13 May 2008 (UTC)
- Any ideas, anyone? I am stuck... please help. --Agentdcooper 03:38, 15 May 2008 (UTC)
- I am going to install Slackware v12.1 as a FRESH install on a new computer and try this all over again, to see if it may be something I messed up along the way. I will report back with my results... In case someone ends up reading the above and can make a suggestion, I'm all ears! I will be keeping the Slackware 12.0 install separate, and would love to hear from someone on how I might go about fixing it! -peace- --Agentdcooper 20:52, 15 May 2008 (UTC)
currently, i'm updating to newer OS, but is that necessary, REALLY?
I am downloading the Slackware 12.1 ISOs right now, but it bewilders me why I would need the latest and greatest OS to run MediaWiki. As I understood it, MediaWiki can run on all sorts of Linux distributions and doesn't necessarily need the best hardware. I've detailed my problems heavily above, and I am hoping someone can help me before my rather large 2.0 GB OS download completes (it'll take a couple of days, due to my slow 'net connection right now). I'd really like to fix what's broken before updating my entire OS. Thanks for all the help so far! --Agentdcooper 03:24, 19 May 2008 (UTC)
[edit] Lucene-search wrecks Special:ListUsers
When using Lucene-search version 2.0.2 (the current version as of this date) under mediawiki 1.10.x, I found that the special page Special:ListUsers stays blank. Turning on error reporting revealed a fatal error:
Fatal error: Class 'ApiQueryGeneratorBase' not found in /srv/www/htdocs/mediawiki/extensions/LuceneSearch/ApiQueryLuceneSearch.php on line 33
I found that this can be solved by adding the line
require_once($IP.'/includes/api/ApiQueryBase.php');
into the file LuceneSearch_body.php (right below the require statement which is already there).
Lexw 12:38, 17 July 2008 (UTC)
[edit] Exception resolution
If you have an error such as
Exception in thread "main" java.lang.NullPointerException
at java.io.File.<init>(Unknown Source)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:117)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:204)
at org.wikimedia.lsearch.importer.SimpleIndexWriter.openIndex(SimpleIndexWriter.java:67)
at org.wikimedia.lsearch.importer.SimpleIndexWriter.<init>(SimpleIndexWriter.java:49)
at org.wikimedia.lsearch.importer.DumpImporter.<init>(DumpImporter.java:39)
at org.wikimedia.lsearch.importer.Importer.main(Importer.java:128)
when running the index creation, it can be because your host name changed (check $HOSTNAME on command line). In that case, update lsearch-global.conf
Darkoneko m'écrire 13:22, 23 July 2008 (UTC)
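To check this before running the importer, the host names in the [Index] section of lsearch-global.conf can be compared with the machine's current host name. The sketch below assumes [Index] entries have the form "host : dbname", as in the lsearch-global.conf posted earlier on this page; the class and method names are hypothetical and this is not how lsearch itself reads the file.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: lists the hosts named in the [Index] section of lsearch-global.conf. */
final class IndexHostCheck {
    /** Returns the text left of ':' for every entry inside the [Index] section. */
    static Set<String> indexHosts(String globalConf) {
        Set<String> hosts = new LinkedHashSet<String>();
        boolean inIndexSection = false;
        for (String raw : globalConf.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // A section heading such as [Database], [Index], [Index-Path]
                inIndexSection = line.equals("[Index]");
                continue;
            }
            int colon = line.indexOf(':');
            if (inIndexSection && colon > 0) {
                hosts.add(line.substring(0, colon).trim());
            }
        }
        return hosts;
    }

    /** True if hostName appears as a host in the [Index] section. */
    static boolean isListed(String globalConf, String hostName) {
        return indexHosts(globalConf).contains(hostName);
    }
}
```

The current host name could be passed in from $HOSTNAME or from java.net.InetAddress.getLocalHost().getHostName(); whether lsearch uses exactly that lookup is not confirmed in this thread. If isListed() returns false, the [Index] entry is the line to update.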
[edit] LuceneSearch is not available anymore?
LuceneSearch extension was developed for MediaWiki version 1.12 which IS the current version. But the box on the top of the page says it is not to be used with the current version, and the extension is not available in SVN any more. WHY is that? Am I missing something? Oduvan 13:37, 7 August 2008 (UTC)
- Seems like someone moved around some extensions. I've updated the link on Extension:LuceneSearch to point to right location. --Rainman 19:49, 7 August 2008 (UTC)
[edit] Running multiple lsearch daemons
Hi, I am setting up a server which hosts several wikis. We want to use the lucene search for some of them so I have to config several lsearch daemons.
Although I changed the Search.port variable in the lsearch.conf file (Search.port=8124), after starting the first lsearch daemon the second one complains that port 8123 is already in use.
Log from first lsearch:
452 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RMIMessenger bound
493 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<hiflydb> bound
495 [Thread-1] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer - Started server at port 8321
495 [Thread-2] INFO org.wikimedia.lsearch.frontend.SearchServer - Binding server to port 8123
497 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index hiflydb ...
Log from second lsearch:
471 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RMIMessenger bound
511 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<sgidb> bound
514 [main] INFO org.wikimedia.lsearch.search.Warmup - Warming up index sgidb ...
565 [Thread-1] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer - Started server at port 8322
565 [Thread-2] INFO org.wikimedia.lsearch.frontend.SearchServer - Binding server to port 8123
565 [Thread-2] FATAL org.wikimedia.lsearch.frontend.SearchServer - Error: bind error: Address already in use
What am I doing wrong?
Thanks for your help. --2 October 2008
Hi,
I ran into the same problem and found out that the SearchServer class does not parse the configuration for Search.port.
The HTTPIndexServer, on the other hand, parses the configuration for Index.port.
I suggest that there should be code like the following in the SearchServer class as well.
[...]
public class HTTPIndexServer extends Thread {
[...]
int port = config.getInt("Index","port",8321);
[...]
I will try this out and, hopefully, post successful results afterwards.
Regards, -- Voglerp 14:21, 20 October 2008 (UTC)
So here are my test results:
I added the following two lines (the comment and the port assignment) to the SearchServer class:
[...]
public class SearchServer extends Thread {
[...]
org.apache.log4j.Logger log = Logger.getLogger(SearchServer.class);
// Read port setting from configfile, if not found set default
port = config.getInt("Search","port",8123);
log.info("Searcher started on port " + port);
[...]
Now the Searcher listens on the port specified in the configuration, or on the default port 8123 if none is set. The new problem is that it is no longer possible to specify the port on the command line with -port.
Is it possible to change the code so that both options work?
Kind regards, Peter --Voglerp 08:06, 23 October 2008 (UTC)
[edit] Error when trying to run lsearch daemon
Every time I run lsearchd I get the following error:
RMI registry started.
[java] Trying config file at path /root/.lsearch.conf
[java] Trying config file at path /usr/local/search/ls2-bin/lsearch.conf
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] Ignoring a line up to first section heading...
[java] ERROR in GlobalConfiguration: Default path for index absent. Check section [Index-Path].
and this is what the [Index-Path] section of the global config looks like:
# Rsync path where indexes are on hosts, after default value put
# hosts where the location differs
# Syntax: host : <path>
[Index-Path]
<default> : /mwsearch
any suggestions?
--Dgat16 20:28, 12 October 2008 (UTC)
Try the following:
# Path where indexes are on hosts, after default value put hosts where
# the location differs
[Index-Path]
<default> : /mwsearch
127.0.0.1 : mwsearch2
--Bachenberg 13:15, 27 August 2009 (UTC)
[edit] need help for small wiki farm
I have a small wiki farm with two wikis, mywiki-en and mywiki-de, running on the same wiki software and sharing the same MySQL database, wikidb.
The mysql tables for both wikis are prefixed, with en_ or with de_ respectively.
mywiki-en is in English
mywiki-de is in German.
I know how to make two separate dump files, wikidb_en.xml and wikidb_de.xml, by using the commands
export REQUEST_URI=/wiki/en && php /wwd/wiki/maintenance/dumpBackup.php --current --quiet > wikidb_en.xml
export REQUEST_URI=/wiki/de && php /wwd/wiki/maintenance/dumpBackup.php --current --quiet > wikidb_de.xml
My question is: how do I configure Lucene and mwsearch, so that
- for searches in /wiki/de it uses the indexes created from wikidb_de.xml,
- for searches in /wiki/en it uses the indexes created from wikidb_en.xml
I do not want hits from the English wiki to show up as search results for queries in /wiki/de, or the other way around.
I also need to know how to configure lsearch-global.conf. So far I have written there
[Database]
wikidb : (single) (language,en) (warmup,10)
but this is of course not correct: the database wikidb contains two wikis, one in German and one in English.
I hope that somebody can help me a bit.
Thank you, Alois 16:06, 29 October 2008 (UTC)
[edit] Searching what the user sees or searching what's behind the scenes
It seems to make no sense to search the unrendered wikitext rather than the final product. I don't see why wiki comments ( <!-- such as this --> ) should be included in the search while the contents of included templates are not. It really should be the other way round.
Wikis using the Semantic MediaWiki extension also find that the results of inline queries are excluded from the search, which also seems like something that needs to change.
Perhaps there is a place for a search that looks behind the scenes. It may be of interest to a wiki-site manager, but for a standard user the search really needs to be of the actual page contents.
Pnelnik 17:41, 28 November 2008 (UTC)
- Agreed that it doesn't. However, it is not a matter of if it makes sense or not, but whether it is difficult or easy to do. There is no easy way to reconstruct articles with templates from very large xml dumps, and no advanced way to integrate updates from OAI with templates, queues and such. This is one of those places where the flexibility of MW in one regard (e.g. syntax and caching) make a huge trade-off with other (ability to have a decent search). --Rainman 02:04, 30 November 2008 (UTC)
[edit] Lack of sane defaults
This extension suffers from a lack of sane defaults, which makes setting it up unnecessarily confusing. I will give some examples from the instructions.
- mwdumper.jar: should be IN subversion. There is no reason to have to checkout the code for the extension and then get another file
- speaking of subversion, the root should be moved up a level. The root should not be 'lucene-search-2' if you are going to ask them to put that in a parent directory called 'search'. The root should be 'search', and it should already contain the 'indexes' subdirectory. The instructions should then read 'svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/search /usr/local/search'.
- MWConfig.global: specifically asks for a "URL", which has a very specific meaning, and gives an example of only a URL. That's great for a multi-host configuration, which most MediaWiki installations are not. The default path to this file should be /usr/local/search/ls2/lsearch-global.conf. If this is not an acceptable path, you should say so. The file:/// prefix that is used in these wiki instructions is not what people expect to see.
- MWConfig.lib: Here you use a standard path, which people normally expect. But this is NOT what they expect since you have told them to use 'file:///' in the previous instructions on the wiki (but not in the configuration file). This is confusing!!!!
- Localization.url: Back to the file:/// prefix. AGHHHH. There is no need to specify that it is a file. File paths are unambiguous without file:///.
- Logging.logconfig: There is no reason to prompt the user for the location of this file if you put it in the ls2 directory by default, and make that the default location.
I believe that, up to this point, every single configuration step could have been avoided if there had been sane defaults in place. I don't have the energy to do the rest. --Alterego 18:47, 5 January 2009 (UTC)
- I agree that the configuration is overly complicated, that is why the devel branch has a one-step script that will generate and connect all of the configuration in single-host installs. As for url/local file distinction, it follows a simple rule: everything that is global and shared across the search cluster (e.g. global config and MW files) is url, everything local (e.g. local config, indexes path, local log4j config and library files...) is a local path, although that is probably not obvious from the variable names... --Rainman 19:20, 5 January 2009 (UTC)
[edit] LSEARCH Daemon init script for SUSE
- from Pierre Boisvert.
- This is our init script for the daemon. It is simple but works for us, so it could help others as well.
# chkconfig: 2345 80 20
# description: Apache Lucene is a high-performance, full-featured text \
# search engine library written entirely in Java
# processname: lsearchd
# config: /etc/lsearch.conf
# pidfile: /var/run/lsearchd.pid
# Source function library.
. /etc/rc.status
JAVA=/usr/bin/java
PROG=lsearchd
BASEDIR=/usr/local/bin/ls2-bin
LOG_FILE=/var/log/lsearchd.log
PID_FILE=/var/run/lsearchd.pid
PROG_BIN="$JAVA -Djava.rmi.server.codebase=file://$BASEDIR/LuceneSearch.jar -Djava.rmi.server.hostname=$HOSTNAME -jar $BASEDIR/LuceneSearch.jar"
CHECK_PROC=`ps -ef | grep $JAVA | grep -v grep | wc -l`
rc_reset
start() {
echo -n $"Starting $PROG: "
if [ ! -f $PID_FILE ]
then
$PROG_BIN >$LOG_FILE $* 2>&1 & echo $! > $PID_FILE
else
if [ $CHECK_PROC -gt 0 ]
then
echo "The LSEARCHD Daemon already started"
rc_failed
else
echo "Removing old Pid file..."
rm $PID_FILE
$PROG_BIN $* >$LOG_FILE 2>&1 & echo $! > $PID_FILE
fi
fi
rc_status -v
}
stop() {
echo -n $"Stopping $PROG: "
/sbin/killproc -p $PID_FILE -v $JAVA
rc_status -v
}
status(){
echo -n "Checking for Lsearchd daemon "
checkproc -p $PID_FILE $JAVA
rc_status -v
}
usage() {
echo $"Usage: ${PROG} {start|stop|status|restart}"
exit 1
}
# See how we were called.
case "$1" in
start) start;;
stop) stop;;
status) status;;
restart) stop && start;;
*) usage;;
esac
rc_exit
[edit] 2009
[edit] ./configure for v. 2.1 does not seem to work
Running Ubuntu 8.04, Ant 1.7, Java 1.6.0_07, using the Binary install package:
user@host: ./configure /path/to/mw/install
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /path/to/mw/install && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)
user@host: sudo ./configure /path/to/mw/install
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /path/to/mw/install && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)
user@host: sudo su
root@host: ./configure /path/to/mw/install
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /path/to/mw/instal && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)
Seems to me that this is highly unlikely to be a permissions issue. My MW installation is working just fine otherwise.
I can't even get past the first step of the instructions, which does not bode well. Will try building from source, but doubt that will make any difference.... Any ideas? --Fungiblename 20:38, 18 March 2009 (UTC)
- You need to replace /path/to/mw/install with the actual path to your mediawiki installation (e.g. something like /var/www/mediawiki/). --Rainman 21:07, 18 March 2009 (UTC)
- Thanks, I was using my actual path but did not want to reproduce it here in full. I was able to compile the SVN version, however, and even after changing the "hostname" variable to my actual hostname as recognized by Apache, I get the following:
./configure /var/www/mw
0 [main] WARN org.wikimedia.lsearch.util.Command - Got exit value 1 while executing [/bin/bash, -c, cd /var/www/mw && (echo "return \$wgDBname" | php maintenance/eval.php)]
Exception in thread "main" java.io.IOException: Error executing command:
at org.wikimedia.lsearch.util.Command.exec(Command.java:45)
at org.wikimedia.lsearch.util.Configure.getVariable(Configure.java:77)
at org.wikimedia.lsearch.util.Configure.main(Configure.java:42)
--Fungiblename 21:14, 18 March 2009 (UTC)
- If you go into your mw installation dir (i.e. one you supplied) and run echo "return \$wgDBname" | php maintenance/eval.php what do you get? Do you get the name of your database? --Rainman 21:28, 18 March 2009 (UTC)
- Thanks for the troubleshooting advice! It seems this was a major oversight on my part. I get the same error as above because I'm running a small wiki farm with shared code (symlinks from the install directory to the shared MediaWiki code). Once I ran "export MW_INSTALL_PATH=/var/www/mw && ./configure /var/www/mw" it wrote all the config files. You may want to add a note on the main page about configuring installations with shared code (at least this very basic step). I'll play around on my own to try to find a way to have multiple separate indexes (my plan is to set up multiple directories with separate config files, index directories, and a symlink to the main jar). I'll try to get it working with just one first, though. Thanks again for your help and all your hard work on this! --Fungiblename 07:39, 19 March 2009 (UTC)
- For me, configure writes a wrong dbname value to config.inc. There I see "dbname=> DatabaseName>". Note the stray ">" signs. Calling echo "return \$wgDBname" | php maintenance/eval.php returns
> DatabaseName >
- On some servers eval.php prints its prompt to stdout. I found that this happens when the PHP function posix_isatty exists; sometimes it does not.
- configure also expects php to be in PATH, which is not always the case either. --Roma7
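As a rough illustration of the prompt problem described above, the output of eval.php could be cleaned up before it is used. The sketch below assumes the prompt consists only of ">" characters and spaces at the start of a line, as in the output quoted above; the function name strip_eval_prompt is made up for this example and is not part of lucene-search or MediaWiki.

```shell
# Hypothetical helper: remove leading "> " prompt characters that eval.php
# may print, and drop lines that are empty afterwards.
# Assumption: the prompt is made up only of '>' and spaces.
strip_eval_prompt() {
    sed -e 's/^[> ]*//' -e '/^$/d'
}
```

It would be used as a filter, for example: echo "return \$wgDBname" | php maintenance/eval.php | strip_eval_prompt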
[edit] Here's just a taste of my output from trying to build from source of the STABLE version
user@host:~/common/elements/lucene-SVN-stable-2009-03-18$ ant
Buildfile: build.xml
build:
[mkdir] Created dir: /home/username/common/elements/lucene-SVN-stable-2009-03-18/bin
[javac] Compiling 101 source files to /home/username/common/elements/lucene-SVN-stable-2009-03-18/bin
[javac] /home/username/common/elements/lucene-SVN-stable-2009-03-18/src/org/wikimedia/lsearch/analyzers/WikiQueryParser.java:24: package org.mediawiki.importer does not exist
[javac] import org.mediawiki.importer.ExactListFilter;
[javac] ^
[javac] /home/username/common/elements/lucene-SVN-stable-2009-03-18/src/org/wikimedia/lsearch/importer/DumpImporter.java:13: package org.mediawiki.importer does not exist...
.... rTest.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 70 errors
BUILD FAILED
/home/username/common/elements/lucene-SVN-stable-2009-03-18/build.xml:68: Compile failed; see the compiler error output for details.
Total time: 2 seconds
"ant -Xlint:deprecation -f build.xml Unknown argument: -Xlint:deprecation"
Does anyone have any instructions about how to even get this thing running? Are there some hidden instructions/prerequisites that I'm missing? Seems to me this should be pretty easy to run on Linux.... --Fungiblename 20:53, 18 March 2009 (UTC)
- Must place "mwdumper.jar" in "lib" of directory downloaded from SVN. --Fungiblename 21:12, 18 March 2009 (UTC)
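A small pre-build check along these lines might save a failed ant run. It is only a sketch: the function name has_mwdumper is invented for illustration, and the only fact taken from this thread is that mwdumper.jar has to sit in the lib directory of the checkout.

```shell
# Hypothetical pre-build check: succeeds only if lib/mwdumper.jar exists
# under the given checkout directory (defaults to the current directory).
has_mwdumper() {
    checkout_dir=${1:-.}
    if [ -f "$checkout_dir/lib/mwdumper.jar" ]; then
        return 0
    fi
    echo "lib/mwdumper.jar not found in $checkout_dir; copy it there before running ant" >&2
    return 1
}
```

For example, from inside the checkout: has_mwdumper . && ant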
[edit] Unable to build
When building from the binary package I get this error. I am on Ubuntu:
root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1# ./build
Dumping wikidb...
2009-03-19 20:14:42: wikidb 99 pages (143.215/sec), 100 revs (144.661/sec), ETA 2009-03-19 20:14:45 [max 513]
2009-03-19 20:14:42: wikidb 199 pages (192.676/sec), 200 revs (193.645/sec), ETA 2009-03-19 20:14:44 [max 513]
2009-03-19 20:14:43: wikidb 299 pages (222.928/sec), 300 revs (223.674/sec), ETA 2009-03-19 20:14:44 [max 513]
2009-03-19 20:14:43: wikidb 399 pages (230.430/sec), 400 revs (231.008/sec), ETA 2009-03-19 20:14:44 [max 513]
2009-03-19 20:14:43: wikidb 458 pages (243.707/sec), 458 revs (243.707/sec), ETA 2009-03-19 20:14:44 [max 513]
mkdir: cannot create directory `/var/lib/mediawiki/extensions/lucene-search-2.1/indexes/status': No such file or directory
./build: line 19: /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/status/wikidb: No such file or directory
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/lib/mediawiki/extensions/lucene-search-2.1/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.
1 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
2799 [main] INFO org.wikimedia.lsearch.ranks.Links - Making index at /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/import/wikidb.links
3208 [main] INFO org.wikimedia.lsearch.ranks.LinksBuilder - Calculating article links...
458 pages (26.889/sec), 458 revs (26.889/sec)
21058 [main] INFO org.wikimedia.lsearch.index.IndexThread - Making snapshot for wikidb.links
21291 [main] INFO org.wikimedia.lsearch.index.IndexThread - Made snapshot /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/snapshot/wikidb.links/20090319161516
21405 [main] INFO org.wikimedia.lsearch.search.UpdateThread - Syncing wikidb.links
21963 [main] INFO org.wikimedia.lsearch.ranks.Links - Opening for read /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/search/wikidb.links
21973 [main] INFO org.wikimedia.lsearch.related.RelatedBuilder - Rebuilding related mapping from links
34467 [main] INFO org.wikimedia.lsearch.index.IndexThread - Making snapshot for wikidb.related
34649 [main] INFO org.wikimedia.lsearch.index.IndexThread - Made snapshot /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/snapshot/wikidb.related/20090319161529
34661 [main] INFO org.wikimedia.lsearch.importer.Importer - Indexing articles (index+highlight+titles)...
34663 [main] INFO org.wikimedia.lsearch.ranks.Links - Opening for read /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/search/wikidb.links
35075 [main] INFO org.wikimedia.lsearch.analyzers.StopWords - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 329 ms
35077 [main] INFO org.wikimedia.lsearch.importer.SimpleIndexWriter - Making new index at /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/import/wikidb
35087 [main] INFO org.wikimedia.lsearch.importer.SimpleIndexWriter - Making new index at /var/lib/mediawiki/extensions/lucene-search-2.1/indexes/import/wikidb.hl
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(libgcj.so.81)
at java.io.ByteArrayOutputStream.write(libgcj.so.81)
at org.apache.lucene.index.FieldsReader.uncompress(FieldsReader.java:514)
at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:317)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:166)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
at org.apache.lucene.index.IndexReader.document(IndexReader.java:525)
at org.wikimedia.lsearch.storage.RelatedStorage.getRelated(RelatedStorage.java:56)
at org.wikimedia.lsearch.importer.DumpImporter.writeEndPage(DumpImporter.java:109)
at org.mediawiki.importer.PageFilter.writeEndPage(Unknown Source)
at org.mediawiki.importer.XmlDumpReader.closePage(Unknown Source)
at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(libgcj.so.81)
at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
at org.wikimedia.lsearch.importer.Importer.main(Importer.java:186)
at org.wikimedia.lsearch.importer.BuildAll.main(BuildAll.java:109)
root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1#
root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1# java -version
java version "1.5.0"
gij (GNU libgcj) version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@testwiki:/usr/share/mediawiki/extensions/lucene-search-2.1# javac
Eclipse Java Compiler v_774_R33x, 3.3.1
Copyright IBM Corp 2000, 2007. All rights reserved.
Usage: <options> <source files | directories>
If directories are specified, then their source contents are compiled.
Possible options are listed below. Options enabled by default are prefixed
with '+'.
Classpath options:
-cp -classpath <directories and zip/jar files
What is wrong?
- It won't work on GNU java. You can use openjdk6 which is also opensource java and is available as a package for ubuntu. --Rainman 21:14, 19 March 2009 (UTC)
Thanks, I will give that a shot. 166.50.205.143 11:02, 20 March 2009 (UTC)
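To spot this situation before running ./build, the version banner can be inspected. The following is a sketch only: it assumes that a GNU runtime identifies itself with the strings "gij" or "libgcj", as in the java -version output quoted above, and the function name is_gnu_java is hypothetical.

```shell
# Hypothetical helper: read `java -version` output on stdin and succeed
# if it looks like GNU gij/libgcj.
# Assumption: the banner contains "gij" or "libgcj", as in the output above.
is_gnu_java() {
    grep -qiE 'gij|libgcj'
}
```

Usage could look like this (2>&1 is there because some runtimes print the banner on stderr): java -version 2>&1 | is_gnu_java && echo "GNU Java detected, switch to another JVM such as OpenJDK"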
[edit] Newest binary (2.1.1) does not appear to run on Mac OS 10.5.6 - 2.1 did not run either.
$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)
$ export MW_INSTALL_PATH=/Sites/mw/ && ./configure /Sites/mw/
Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)
Any thoughts? I have been using Sphinx in the meantime (which uses about 90-95% less memory), but it does not provide a lot of the features that Lucene does; I would really like to get Lucene running. --Fungiblename 11:16, 26 March 2009 (UTC)
[edit] Solution
Change the Java preference using the Java Preferences app to make sure that Java SE 6 is the top preference, then it runs. Also, this appears to be hard-coded to look for mysql.sock in /var/mysql/mysql.sock (I grepped for it in the ls2.1 directory). I have no desire to recompile to attempt to tweak it for my system though. I run from a non-standard location, so I just made a symbolic link to that location from my actual install. YMMV. --Fungiblename 16:16, 31 March 2009 (UTC)
- For details see (meanwhile) http://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Mac_OS_X --Achimbode 20:33, 9 August 2009 (UTC)
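The symbolic-link workaround mentioned above could look roughly like the sketch below. The expected location /var/mysql/mysql.sock is taken from the post above; the function name link_mysql_socket and any example path for the real socket are assumptions, so substitute wherever your MySQL server actually creates its socket.

```shell
# Hypothetical helper: make a symlink at the path the tools expect,
# pointing at the socket of the actual MySQL install.
#   $1 = path of the real socket (depends on your MySQL install)
#   $2 = path where the socket is expected (e.g. /var/mysql/mysql.sock)
link_mysql_socket() {
    actual_socket=$1
    expected_path=$2
    mkdir -p "$(dirname "$expected_path")" &&
        ln -sfn "$actual_socket" "$expected_path"
}
```

For example (probably needs root, and /tmp/mysql.sock is only a placeholder): link_mysql_socket /tmp/mysql.sock /var/mysql/mysql.sock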
[edit] Hardcoded search port? 8123
Thank you for keeping this up to date.
I recently upgraded to the latest version. This time the configuration was much better; I loved the configuration generator. There is one problem, though: I cannot use search port 8123, so I changed it in lsearch.conf and in LocalSettings.php. However, the daemon ignores the change and is still listening on 8123. Now the "noddy" question: am I missing something? Thanks --Cartoro 16:00, 31 March 2009 (UTC)
I have the same problem. In lsearch.conf I set Search.port=8000, but when I start lsearchd the result is:
646 [Thread-2] INFO org.wikimedia.lsearch.frontend.SearchServer - Searcher started on port 8123
Sébas
- This has been fixed in latest binary (available for download from sourceforge) and svn version. --Rainman 13:52, 15 April 2009 (UTC)
I'm afraid the source is still showing the hardcoded "8123" (May 28, 2009).
- Which file, where? Does changing the default port to some other value not work for you? --Rainman 21:31, 27 May 2009 (UTC)
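When debugging this, it can help to confirm which port the configuration file really requests and compare it with the "Searcher started on port" line in the daemon log. The sketch below assumes plain Search.port=NNNN lines as quoted in this thread, treats lines starting with # as comments, and falls back to 8123; the function name configured_search_port is invented for this example, and how the daemon itself parses the file may differ.

```shell
# Hypothetical helper: print the search port requested in an lsearch.conf file.
# Assumes "Search.port=NNNN" syntax; commented-out lines do not match.
configured_search_port() {
    conf_file=$1
    port=$(sed -n 's/^[[:space:]]*Search\.port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf_file" | tail -n 1)
    # Fall back to the default port mentioned in this thread.
    echo "${port:-8123}"
}
```

For example: configured_search_port /path/to/lsearch.conf (the path here is a placeholder).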
[edit] XML-RPC server incomplete
I've installed the latest Lucene-search and MWSearch on MW 1.13 and found that the updatePage and deletePage actions don't pass through.
Looking at the source code, I found that these handlers were removed in rev 32681 of lucene-search RPCIndexDeamon.java. As far as I understand, there is now a new HTTP daemon available, but MWSearch isn't aware of it.
Am I missing something?
--Eugenem 07:39, 15 April 2009 (UTC)
- Using HTTP to post articles (either via XML-RPC or as a raw HTTP attachment) is an old and deprecated way of updating the index, and thus the methods have been removed. To keep the index up-to-date please use either complete rebuilds (via "./build") or Extension:OAIRepository (via "./update"). --Rainman 09:47, 15 April 2009 (UTC)
- I see. Actually I was interested in these features to make custom updates, such as the output of special pages. On our site we use a lot of special pages to show profiles, so we'd like to index special page output instead of the template. Is there any way to do that? I mean some interface to add a bunch of pages to the index using PHP, without writing a custom Java parser.
- You could include those pages into the xml dump of your database (produced by maintenance/dumpBackup.php) and then index everything. The other way would be to include it into the OAI table, although that could be tricky since you would need to have consistent page_ids for those special pages in order for incremental update to work properly. There might be other ways, but they are bound to break something, so my advice is to stick with these two standard ways. --Rainman 11:03, 15 April 2009 (UTC)
-
[edit] Finally it works (for me)
Nothing but the following worked for my install. Here's what I did:
svn co http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/MWSearch
mv MWSearch extensions
svn co http://svn.wikimedia.org/svnroot/mediawiki/branches/lucene-search-2.1/ lucene-search-2
cd lucene-search-2
ant
./configure
./build
Now, add the following to LocalSettings.php:
# lsearch
require_once("extensions/MWSearch/MWSearch.php");
$wgSearchType = 'LuceneSearch';
$wgLuceneHost = 'YourHostName'; # <-- change this!
$wgLucenePort = 8123;
# uncomment this if you use lucene-search 2.1
# (MUST be AFTER the require_once!)
$wgLuceneSearchVersion = 2.1;
Where YourHostName is the output of the 'hostname' command. The search doesn't work on my machine if I use the default, "192.168.0.1".
# test lucene, now ./lsearchd
[edit] How to customize synonyms and stop words?
How can I edit the synonyms and stop words in order to bring the engine more in line with our needs?
- You need to checkout the source from svn. Then edit resources/dist/wordnet-en.txt (for synonyms) and stopwords-en.txt. If this does not work, then you could also try making your own Filter class and plugging it in into the FilterFactory class. --Rainman 18:31, 1 May 2009 (UTC)
Thanks. I have done as you suggest. However, I do not see any indication that the system is ignoring stop words (e.g. if I search with the word "me", I get results). I also do not know how to confirm that the synonyms are working. Are there some good tests I could run to verify? ----Marc 14:31, 6 May 2009 (MDT)
[edit] Searching Attachments
I am running
MediaWiki 1.13.1
PHP 5.2.4-2ubuntu5.6 (apache2handler)
MySQL 5.0.51a-3ubuntu5.4
I have the FileIndexer
http://www.mediawiki.org/wiki/Extension:FileIndexer
and
http://www.mediawiki.org/wiki/Extension:MWSearch
now installed and running.
The Lucene search capability seems to work far better than the default search capability, except that it no longer generates search results from attachments that were turned into text and then inserted in the image field.
Is this a limitation of the present software? I had hoped the Lucene search would index the attachments, especially given the use of the FileIndexer.
Is it significant that the FQDN is http://wiki.tesla.local/ (on a local LAN) but that the hostname is wiki?
Attached are the configuration files.
lsearch.conf
# By default, will check /etc/lsearch.conf
################################################
# Global configuration
################################################
# URL to global configuration, this is the shared main config file, it can
# be on a NFS partition or available somewhere on the network
MWConfig.global=file:///home/chris/lucene-search-2.1/lsearch-global.conf
# Local path to root directory of indexes
Indexes.path=/home/chris/lucene-search-2.1/indexes
# Path to rsync
Rsync.path=/usr/bin/rsync
# Extra params for rsync
# Rsync.params=--bwlimit=8192
################################################
# Search node related configuration
################################################
# Port of http daemon, if different from default 8123
# Search.port=8000
# In minutes, how frequently will the index host be checked for updates
Search.updateinterval=0.1
# In seconds, delay after which the update will be fetched
# used to scatter the updates around the hour
Search.updatedelay=0
# In seconds, how frequently the dead search nodes should be checked
Search.checkinterval=10
# In milliseconds, for how long should the query be executed
# Search.timelimit=1000
# if to wait for aggregates to warm up before deploying the searcher
Search.warmupaggregate=true
# cache *whole* index in RAM
Search.ramdirectory=false
# Disable wordnet aliases
Search.disablewordnet=true
# If this host runs on multiple CPUs maintain a pool of index searchers
# It's good idea to make it number of CPUs+1, or some larger odd number
SearcherPool.size=1
################################################
# Indexer related configuration
################################################
# In minutes, how frequently is a clean snapshot of index created
Index.snapshotinterval=2880
# Daemon type (http is started by default)
#Index.daemon=xmlrpc
# Port of daemon (default is 8321)
#Index.port=8080
# Maximal queue size after which index is being updated
Index.maxqueuecount=5000
# Maximal time an update can remain in queue before being processed (in seconds)
Index.maxqueuetimeout=12
# If to delete all old snapshots always (default to false - leaves the last good snapshot)
# Index.delsnapshots=true
################################################
# Log, ganglia, localization
################################################
# URL to MediaWiki message files
Localization.url=file:///home/chris/public_html_3/wiki/languages/messages
# Username/password for password authenticated OAI repo
# OAI.username=user
# OAI.password=pass
# Max queue size on remote indexer after which we wait a bit
OAI.maxqueue=5000
# Number of docs to buffer before sending to inc updater
OAI.bufferdocs=500
# Log configuration
Logging.logconfig=/home/chris/lucene-search-2.1/lsearch.log4j
# Set debug to true to diagnose problems with log4j configuration
Logging.debug=false
# Turn this on to broadcast status to a Ganglia reporting system.
# Requires that 'gmetric' be in the PATH and runnable. You can
# override the default UDP broadcast port and interface if required.
#Ganglia.report=true
#Ganglia.port=8649
#Ganglia.interface=eth0
lsearch-global.conf
################################################
# Global search cluster layout configuration
################################################
[Database]
MediaWiki : (single) (spell,4,2) (language,en)
[Search-Group]
wiki : *
[Index]
wiki : *
[Index-Path]
<default> : /search
[OAI]
<default> : http://localhost/index.php
[Namespace-Boost]
<default> : (0,2) (1,0.5)
[Namespace-Prefix]
all : <all>
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15
config.inc
dbname=MediaWiki
wgScriptPath=
hostname=wiki
indexes=/home/chris/lucene-search-2.1/indexes
mediawiki=/home/chris/public_html_3/wiki
base=/home/chris/lucene-search-2.1
wgServer=http://localhost
- Unfortunately lucene-search won't search attachments no matter what kind of extra extension you use. You could however try Extension:EzMwLucene which is also lucene-based but has a different set of features, doesn't have some lucene-search stuff, but has attachment search. --Rainman 09:31, 26 May 2009 (UTC)
-
- Thank you so much for the prompt response. I will try the Extension:EzMwLucene search as attachment searching is key feature I would like in our company wiki.
- Thanks a bunch Rainman. Do you know offhand what the major differences are between the two Lucene extensions? We have Lucene-search installed and would like to enable EzMwLucene, but it would be good to know what the feature differences are. --Gkullberg 13:59, 3 July 2009 (UTC)
[edit] Search within files?
Is it possible to use Lucene to search within files uploaded to MediaWiki?
On the Lucene page on Wikipedia it says:
"At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of the file format. Text from PDFs, HTML, Microsoft Word, and OpenDocument documents, as well as many others can all be indexed so long as their textual information can be extracted."
It would be great if I could search within PDFs and Docs and whatever else I upload to my MediaWiki instance. --Gkullberg 19:55, 2 July 2009 (UTC)
- See answer to previous question.... --Rainman 10:08, 3 July 2009 (UTC)
[edit] How to use CJKAnalyzer
Is it possible to use CJKAnalyzer for indexing pages written in Japanese?
- Yes, just change (language,en) to (language,ja) in your config file (and re-run the build process). --Rainman 08:27, 10 July 2009 (UTC)
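The change described above can be sketched as a small shell helper. This assumes the language option sits on the [Database] line of lsearch-global.conf, as in the sample config posted earlier on this page; the function name and the path in the usage line are hypothetical.

```shell
# Hypothetical helper: switch the (language,xx) option in lsearch-global.conf.
# Assumes the option appears as "(language,en)" on the [Database] line.
set_index_language() {
    conf="$1"   # path to lsearch-global.conf (adjust to your install)
    lang="$2"   # target language code, e.g. ja
    # Keep a backup copy, then rewrite the language option in place.
    cp "$conf" "$conf.bak" &&
    sed "s/(language,[a-z-]*)/(language,$lang)/" "$conf.bak" > "$conf"
}

# Usage (placeholder path), followed by re-running ./build:
# set_index_language /usr/local/lucene-search-2.1/lsearch-global.conf ja
```

The backup file makes it easy to revert if the rebuilt index does not behave as expected.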
[edit] Periodic fatal errors while rebuilding index - "no segments* file"
I'm running Lucene-search on our local wiki. The build script runs correctly and produces a valid index, which is picked up by the daemon, and everything works fine...for a bit. I've created a cron job that runs the build script hourly, with the output of the script being emailed to me. The cron job runs happily for a spell and then I receive this in the output:
MediaWiki lucene-search indexer - rebuild all indexes associated with a database.
Trying config file at path /home/system/mymintel-svc/.lsearch.conf
Trying config file at path /data/mymintel/mediawiki/lucene_search/lsearch.conf
MediaWiki lucene-search indexer - index builder from xml database dumps.
0 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
582 [main] INFO org.wikimedia.lsearch.ranks.Links - Making index at /data/mymintel/mediawiki/lucene_search/indexes/import/it_wiki.links
924 [main] INFO org.wikimedia.lsearch.ranks.LinksBuilder - Calculating article links...
3,759 pages (338.679/sec), 3,759 revs (338.679/sec)
14271 [main] INFO org.wikimedia.lsearch.index.IndexThread - Making snapshot for it_wiki.links
14645 [main] INFO org.wikimedia.lsearch.index.IndexThread - Made snapshot /data/mymintel/mediawiki/lucene_search/indexes/snapshot/it_wiki.links/20090731050111
14696 [main] INFO org.wikimedia.lsearch.search.UpdateThread - Syncing it_wiki.links
15632 [main] INFO org.wikimedia.lsearch.ranks.Links - Opening for read /data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links
15637 [main] INFO org.wikimedia.lsearch.related.RelatedBuilder - Rebuilding related mapping from links
15640 [main] FATAL org.wikimedia.lsearch.importer.Importer - Cannot make related mapping: no segments* file found in org.apache.lucene.store.FSDirectory@/data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links: files:
MediaWiki lucene-search indexer - build spelling suggestion index.
16802 [main] INFO org.wikimedia.lsearch.spell.SuggestBuilder - Building spell-check for it_wiki
16802 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
16931 [main] INFO org.wikimedia.lsearch.spell.SuggestBuilder - Rebuilding precursor index...
17037 [main] INFO org.wikimedia.lsearch.analyzers.StopWords - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 68 ms
17039 [main] INFO org.wikimedia.lsearch.spell.CleanIndexWriter - Using phrase stopwords: [only, theirs, some, where, being, after, doing, did, they, herself, as, so, our, than, your, for, down, the, other, of, does, no, ours, with, from, them, by, also, you, hers, until, yourself, has, she, it, up, why, have, this, those, about, between, which, under, these, i, yours, but, his, myself, yourselves, having, more, be, her, into, its, an, he, on, over, was, here, to, such, above, because, nor, had, him, below, and, whoever, during, their, itself, been, most, that, out, each, or, a, own, all, what, in, ourselves, were, themselves, both, not, same, do, am, too, once, any, when, then, who, how, whom, my, through, there, before, very, we, against, few, while, again, me, at, if, himself, are, is, off, further]
17129 [main] INFO org.wikimedia.lsearch.ranks.Links - Opening for read /data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links
java.io.IOException: no segments* file found in org.apache.lucene.store.FSDirectory@/data/mymintel/mediawiki/lucene_search/indexes/search/it_wiki.links: files:
at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
From this point onwards, the job will not run correctly until I have deleted the indexes directory and started from scratch.
I've dumped the directory structure of the filesystem when the index is working correctly, and when it's broken; the output is below.
[edit] Working config
indexes/
|-- import
| |-- it_wiki
| | |-- _7.cfs
| | |-- segments.gen
| | `-- segments_h
| |-- it_wiki.hl
| | |-- _7.cfs
| | |-- segments.gen
| | `-- segments_h
| |-- it_wiki.links
| | |-- _8.cfs
| | |-- segments.gen
| | `-- segments_j
| |-- it_wiki.related
| | |-- _d.cfs
| | |-- segments.gen
| | `-- segments_t
| |-- it_wiki.spell
| | |-- _1v.cfs
| | |-- segments.gen
| | `-- segments_3t
| `-- it_wiki.spell.pre
| |-- _8.cfs
| |-- segments.gen
| `-- segments_j
|-- index
| |-- it_wiki
| | |-- _7.cfs
| | |-- segments.gen
| | `-- segments_h
| |-- it_wiki.hl
| | |-- _7.cfs
| | |-- segments.gen
| | `-- segments_h
| |-- it_wiki.links
| | |-- _8.cfs
| | |-- segments.gen
| | `-- segments_j
| `-- it_wiki.spell.pre
| |-- _8.cfs
| |-- segments.gen
| `-- segments_j
|-- search
| |-- it_wiki -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki/20090730163156
| |-- it_wiki.hl -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.hl/20090730163156
| |-- it_wiki.links -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090730163123
| |-- it_wiki.related -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.related/20090730163127
| `-- it_wiki.spell -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.spell/20090730163230
|-- snapshot
| |-- it_wiki
| | `-- 20090730163156
| | |-- _7.cfs
| | |-- segments.gen
| | `-- segments_h
| |-- it_wiki.hl
| | `-- 20090730163156
| | |-- _7.cfs
| | |-- segments.gen
| | `-- segments_h
| |-- it_wiki.links
| | `-- 20090730163123
| | |-- _8.cfs
| | |-- segments.gen
| | `-- segments_j
| |-- it_wiki.related
| | `-- 20090730163127
| | |-- _d.cfs
| | |-- segments.gen
| | `-- segments_t
| |-- it_wiki.spell
| | `-- 20090730163230
| | |-- _1v.cfs
| | |-- segments.gen
| | `-- segments_3t
| `-- it_wiki.spell.pre
| `-- 20090730163210
| |-- _8.cfs
| |-- segments.gen
| `-- segments_j
|-- status
| `-- it_wiki
`-- update
|-- it_wiki
| `-- 20090730163156
| |-- _7.cfs
| |-- segments.gen
| `-- segments_h
|-- it_wiki.hl
| `-- 20090730163156
| |-- _7.cfs
| |-- segments.gen
| `-- segments_h
|-- it_wiki.links
| `-- 20090730163123
| |-- _8.cfs
| |-- segments.gen
| `-- segments_j
|-- it_wiki.related
| `-- 20090730163127
| |-- _d.cfs
| |-- segments.gen
| `-- segments_t
`-- it_wiki.spell
`-- 20090730163230
|-- _1v.cfs
|-- segments.gen
`-- segments_3t
[edit] Broken Config
indexes/
|-- import
| |-- it_wiki
| | |-- _2f.cfs
| | |-- segments.gen
| | `-- segments_58
| |-- it_wiki.hl
| | |-- _2f.cfs
| | |-- segments.gen
| | `-- segments_58
| |-- it_wiki.links
| | |-- _5h.cfs
| | |-- segments.gen
| | `-- segments_bm
| |-- it_wiki.related
| | |-- _4n.cfs
| | |-- segments.gen
| | `-- segments_9o
| |-- it_wiki.spell
| | |-- _oj.cfs
| | |-- segments.gen
| | `-- segments_1dh
| `-- it_wiki.spell.pre
| |-- _39.fdt
| |-- _39.fdx
| |-- segments.gen
| |-- segments_74
| `-- write.lock
|-- index
| |-- it_wiki
| | |-- _2f.cfs
| | |-- segments.gen
| | `-- segments_58
| |-- it_wiki.hl
| | |-- _2f.cfs
| | |-- segments.gen
| | `-- segments_58
| |-- it_wiki.links
| | |-- _5h.cfs
| | |-- segments.gen
| | `-- segments_bm
| `-- it_wiki.spell.pre
| |-- _39.fdt
| |-- _39.fdx
| |-- segments.gen
| |-- segments_74
| `-- write.lock
|-- search
| |-- it_wiki -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki/20090731040228
| |-- it_wiki.hl -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.hl/20090731040228
| |-- it_wiki.links
| | |-- 20090731050111 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731050111
| | |-- 20090731060116 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731060116
| | |-- 20090731070104 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731070104
| | |-- 20090731080121 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731080121
| | |-- 20090731090112 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731090112
| | |-- 20090731100113 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731100113
| | |-- 20090731110108 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731110108
| | |-- 20090731120051 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731120051
| | `-- 20090731130055 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731130055
| |-- it_wiki.related -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.related/20090731040125
| `-- it_wiki.spell -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.spell/20090731040320
|-- snapshot
| |-- it_wiki
| | |-- 20090731030246
| | | |-- _27.cfs
| | | |-- segments.gen
| | | `-- segments_4r
| | `-- 20090731040228
| | |-- _2f.cfs
| | |-- segments.gen
| | `-- segments_58
| |-- it_wiki.hl
| | |-- 20090731030247
| | | |-- _27.cfs
| | | |-- segments.gen
| | | `-- segments_4r
| | `-- 20090731040228
| | |-- _2f.cfs
| | |-- segments.gen
| | `-- segments_58
| |-- it_wiki.links
| | |-- 20090731120051
| | | |-- _58.cfs
| | | |-- segments.gen
| | | `-- segments_b3
| | `-- 20090731130055
| | |-- _5h.cfs
| | |-- segments.gen
| | `-- segments_bm
| |-- it_wiki.related
| | |-- 20090731030132
| | | |-- _49.cfs
| | | |-- segments.gen
| | | `-- segments_8v
| | `-- 20090731040125
| | |-- _4n.cfs
| | |-- segments.gen
| | `-- segments_9o
| |-- it_wiki.spell
| | |-- 20090731030355
| | | |-- _mn.cfs
| | | |-- segments.gen
| | | `-- segments_19o
| | `-- 20090731040320
| | |-- _oj.cfs
| | |-- segments.gen
| | `-- segments_1dh
| `-- it_wiki.spell.pre
| |-- 20090731030320
| | |-- _2z.cfs
| | |-- segments.gen
| | `-- segments_6c
| `-- 20090731040253
| |-- _38.cfs
| |-- segments.gen
| `-- segments_6v
|-- status
| `-- it_wiki
`-- update
|-- it_wiki
| |-- 20090731030246
| | |-- _27.cfs
| | |-- segments.gen
| | `-- segments_4r
| `-- 20090731040228
| |-- _2f.cfs
| |-- segments.gen
| `-- segments_58
|-- it_wiki.hl
| |-- 20090731030247
| | |-- _27.cfs
| | |-- segments.gen
| | `-- segments_4r
| `-- 20090731040228
| |-- _2f.cfs
| |-- segments.gen
| `-- segments_58
|-- it_wiki.links
| |-- 20090731120051
| | |-- _58.cfs
| | |-- segments.gen
| | `-- segments_b3
| `-- 20090731130055
| |-- _5h.cfs
| |-- segments.gen
| `-- segments_bm
|-- it_wiki.related
| |-- 20090731030132
| | |-- _49.cfs
| | |-- segments.gen
| | `-- segments_8v
| `-- 20090731040125
| |-- _4n.cfs
| |-- segments.gen
| `-- segments_9o
`-- it_wiki.spell
|-- 20090731030355
| |-- _mn.cfs
| |-- segments.gen
| `-- segments_19o
`-- 20090731040320
|-- _oj.cfs
|-- segments.gen
`-- segments_1dh
As you can see, the contents of indexes/search/it_wiki.links are completely different. I suspect that this is what's causing the problem, but I don't know enough about what's going on to diagnose it. Java version is:
java version "1.5.0_14-p8"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-p8-root_04_sep_2008_18_49)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_14-p8-root_04_sep_2008_18_49, mixed mode)
...and I'm running on FreeBSD 7.0, if that makes a difference. Any ideas what's going on? It'd be nice not to have to delete and rebuild the indexes by hand every day!
So this part
|-- search
| |-- it_wiki.links
| | |-- 20090731050111 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731050111
| | |-- 20090731060116 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731060116
| | |-- 20090731070104 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731070104
| | |-- 20090731080121 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731080121
| | |-- 20090731090112 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731090112
| | |-- 20090731100113 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731100113
| | |-- 20090731110108 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731110108
| | |-- 20090731120051 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731120051
| | `-- 20090731130055 -> /data/mymintel/mediawiki/lucene_search/indexes/update/it_wiki.links/20090731130055
looks quite wrong: everything in search/ should be a symlink, and there should not be any subdirectories. I'm not sure how these were created. Are you sure that the whole build process takes less than an hour? If overlapping jobs try to do the same thing, they might lock each other's indexes. --Rainman 16:43, 3 August 2009 (UTC)
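The rule stated in this reply (every entry in search/ is a symlink, no real subdirectories) can be checked mechanically. The following is only a sketch: the function name is made up for illustration and the path in the usage line is a placeholder.

```shell
# Sketch: report entries under indexes/search that are not symbolic links.
# Prints each offending path and returns non-zero if any are found.
check_search_links() {
    search_dir="$1"   # e.g. /path/to/indexes/search (placeholder)
    bad=0
    for entry in "$search_dir"/*; do
        # Skip the unexpanded glob pattern when the directory is empty.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        if [ ! -L "$entry" ]; then
            echo "not a symlink: $entry"
            bad=1
        fi
    done
    return "$bad"
}

# Usage:
# check_search_links /path/to/indexes/search || echo "search/ layout is broken"
```

Run from the cron wrapper before each build, a check like this would flag the broken layout as soon as it appears rather than at the next failed search.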
When I run the process by hand, it never takes more than 5 mins to complete, so I'd be very surprised if jobs are overlapping. Would probably be a good idea to make certain though, so I'll change the cronjob to time the process and send an update next time it fails. -- Mrgroucho 16:48, 3 August 2009 (UTC)
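One way to make certain that two runs never overlap is to take a lock in the cron wrapper. The sketch below relies on mkdir being atomic; the function name, the lock path and the exit status 75 are arbitrary choices for illustration, not part of lucene-search.

```shell
# Sketch: run a command only if no other run currently holds the lock.
# mkdir is atomic, so only one process can create the lock directory.
run_exclusive() {
    lock_dir="$1"; shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75   # arbitrary "try again later" status
    fi
    status=0
    "$@" || status=$?      # run the wrapped command, remember its exit status
    rmdir "$lock_dir"      # release the lock whether or not the command failed
    return "$status"
}

# Usage from the directory containing the build script (lock path is a placeholder):
# run_exclusive /tmp/lsearch-build.lock ./build
```

If the wrapped process is killed outright, the lock directory stays behind and has to be removed by hand, so a skipped run in the cron mail is also a useful signal that something hung.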
OK, I think we can rule out overlapping jobs. The indexer ran successfully as scheduled for over 12 hours yesterday, and then failed early this morning. The time stats for the job, and the one preceding it, are as follows:
[edit] Preceding Job
real 3m24.606s
user 2m15.323s
sys 0m16.361s
[edit] Failed Job
real 0m59.046s
user 0m20.773s
sys 0m2.410s
The failed job takes less time, but you'd expect that: it failed. I've noted that the output mentions various Threads - is there any way that this could be some sort of race condition/locking problem between those threads? -- Mrgroucho 13:44, 4 August 2009 (UTC)
Any idea at all on how to fix this? I've put a workaround in place that deletes all of the indexes every midnight and then runs ./build, which means that if it breaks during the day the indexes will never be massively out of date, but it's hardly a pretty fix. -- Mrgroucho 13:33, 7 August 2009 (UTC)
I've had the same problem. The index building job (cronned for every hour) started to fail every couple of weeks, and then every week, and then every few days etc., until I was manually clearing out and rebuilding a couple of times a day! I'd already written a wrapper script around the 'build' script, which was just for making the process a bit more cron-friendly, checking the search daemon was running, and for aborting it altogether when my server-backup routines are in operation etc. I've now decided to have it build the indexes in a new location every hour, and then 'cut-over' if no error is encountered. So far, this seems to be effective.
The indexes folder I was using seemed to be growing exponentially, and I think this may have been related to the problem. However, not being a Java or Lucene expert, I think this is a problem I'm gonna have to continue to work around instead of solving. --140.131.255.2 05:38, 7 September 2009 (UTC)
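The poster did not share the wrapper, so the following is only a guess at the shape of a "build elsewhere, then cut over" step. It assumes a build command that writes a complete index into the directory given as its last argument, and a "current" symlink that the search daemon is configured to read from. Both are hypothetical; the stock ./build script is not known to take such an argument.

```shell
# Sketch: build into a fresh directory and switch a symlink only on success.
# The build command is a placeholder that must accept the target directory
# as its last argument; this is NOT an option of the stock ./build script.
build_and_cut_over() {
    base="$1"; shift                              # parent directory for all builds
    new=$(mktemp -d "$base/build-XXXXXX") || return 1
    if "$@" "$new"; then
        # Repoint "current" at the new build (unlink + symlink, not strictly atomic).
        ln -sfn "$new" "$base/current"
    else
        echo "build failed, keeping previous index" >&2
        rm -rf "$new"
        return 1
    fi
}

# Usage (my_build is a hypothetical command or function):
# build_and_cut_over /data/lucene-builds my_build
```

Old build directories are never cleaned up in this sketch, so a real wrapper would also need to prune them, which may matter given the growth in the indexes folder described above.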
[edit] how to fix this when I run ./lsearchd
0rz </usr/local/search/ls2-bin> # ./lsearchd
RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /usr/local/search/ls2-bin/lsearch.conf
Exception in thread "main" java.lang.NullPointerException
at org.wikimedia.lsearch.config.GlobalConfiguration.makeIndexIdPool(GlobalConfiguration.java:531)
at org.wikimedia.lsearch.config.GlobalConfiguration.read(GlobalConfiguration.java:413)
at org.wikimedia.lsearch.config.GlobalConfiguration.readFromURL(GlobalConfiguration.java:247)
at org.wikimedia.lsearch.config.Configuration.<init>(Configuration.java:116)
at org.wikimedia.lsearch.config.Configuration.open(Configuration.java:68)
at org.wikimedia.lsearch.config.StartupManager.main(StartupManager.java:39)
0rz </usr/local/search/ls2-bin> #
my environment is:
0rz </usr/local/search/ls2-bin> # java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode, sharing)
0rz </usr/local/search/ls2-bin> # ant -version
Apache Ant version 1.7.1 compiled on June 27 2008
0rz </usr/local/search/ls2-bin> #
[edit] LSearch Daemon Init Script for Ubuntu
- This is just a sample. You will need to adjust this based on where you put the lucene-search directory.
#!/bin/sh -e
### BEGIN INIT INFO
# Provides: lsearchd
# Required-Start: $syslog
# Required-Stop: $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 1
# Short-Description: Start the Lucene Search daemon
# Description: Provide a Lucene Search backend for MediaWiki
### END INIT INFO
test -x /usr/local/lucene-search-2.1/lsearchd || exit 0
OPTIONS=""
if [ -f "/etc/default/lsearchd" ] ; then
. /etc/default/lsearchd
fi
. /lib/lsb/init-functions
case "$1" in
start)
cd /usr/local/lucene-search-2.1
log_begin_msg "Starting Lucene Search Daemon..."
start-stop-daemon --start --quiet --oknodo --chdir /usr/local/lucene-search-2.1 --background --exec /usr/local/lucene-search-2.1/lsearchd -- $OPTIONS
log_end_msg $?
;;
stop)
log_begin_msg "Stopping Lucene Search Daemon..."
start-stop-daemon --stop --quiet --oknodo --retry 2 --chdir /usr/local/lucene-search-2.1 --exec /usr/local/lucene-search-2.1/lsearchd
log_end_msg $?
;;
restart)
$0 stop
sleep 1
$0 start
;;
reload|force-reload)
log_begin_msg "Reloading Lucene Search Daemon..."
start-stop-daemon --stop --signal 1 --chdir /usr/local/lucene-search-2.1 --exec /usr/local/lucene-search-2.1/lsearchd
log_end_msg $?
;;
status)
status_of_proc /usr/local/lucene-search-2.1/lsearchd lsearchd && exit 0 || exit $?
;;
*)
log_success_msg "Usage: /etc/init.d/lsearchd {start|stop|restart|reload|force-reload|status}"
exit 1
esac
exit 0
[edit] Error here when use configure
Hi,
After running "ant" to build the jar, I tried to generate the configuration files and got the error below. What's wrong?
MediaWiki: 1.15.1; Lucene: 2.1; OS: CentOS
[root@xxx lucene-search-2.1]# ./configure /var/wk/
Exception in thread "main" java.net.UnknownHostException: 00:16:3h:2d:6c:b0-hk0.localdomain: 00:16:3h:2d:6c:b0-hk0.localdomain
at java.net.InetAddress.getLocalHost(InetAddress.java:1425)
at org.wikimedia.lsearch.util.Configure.main(Configure.java:52)
--Alpha3 11:26, 27 August 2009 (UTC)
[edit] In which context config.inc will be used?
--Ans 08:22, 4 September 2009 (UTC)
- It is used in build process "./build" --Ans 09:40, 4 September 2009 (UTC)
[edit] Install from SVN or Binary.
I would recommend SVN any day. I've been through several installations of Lucene Search this morning, and the quickest and most problem-free method was the SVN approach. It was also the *easiest* way to get Lucene to work - it just works. It also reads your MW config and produces its own *correct* configuration files.
MWSearch works just fine and dandy on top of this Lucene instance.
The 2.02 installation went badly, several times for me - and it chews a LOT more resources. I did get it working, but when it came to rebuilding indexes, it spewed up on the whole Computer - chewed 100% Chip, chewed more than 100% RAM, which caused Kernel Panic and failures. Had to reboot forcibly with hardware. Re-attempted several times and re-configured settings to test against "should be working" configuration - got the same results with the machine crawling. So I gave up, went to SVN, and instead of choking on the indexes, it rebuilt them in under 20 seconds. I understand there are Java issues around this. Forget them, it's not worth breaking Java on the System, or putting up with some strange configuration, just to get Lucene working. Gooooo SVN!! :-)
BTW: It should be made apparent on the Lucene-search Extension page that the SVN installation DOES work, and works VERY well. I had previously avoided this method as I am SVN-wary - where with a bit of prompting, that would have been my first choice.
Cheers, Mike
[edit] Brief period with zero search results (using update script)
Our wiki runs the "update" script every 15 minutes, and the update takes about 2 minutes. Updates are done locally on the single wiki server.
Unfortunately, for a brief time while this script runs, searches return zero results. This problem lasts just a few seconds, but our users do encounter it and become confused.
Any advice on eliminating this "zero results" period? We thought about running two different reindexing processes, each writing to a different directory, and switching between them with a symbolic link. Something like:
- At 6:00, update index #1 in /usr/local/lucene1.
- Point symbolic link /usr/local/lucene at /usr/local/lucene1.
- At 6:15, update index #2 in /usr/local/lucene2.
- Point symbolic link /usr/local/lucene at /usr/local/lucene2.
- At 6:30, update index #1 in /usr/local/lucene1.
- Point symbolic link /usr/local/lucene at /usr/local/lucene1.
- ...
But I don't see a way to make Lucene reindex one directory while serving out of another. Any better suggestions? Maiden taiwan 15:03, 23 September 2009 (UTC)
- This should not happen. Are there any errors in logs during this period? --Rainman 17:20, 23 September 2009 (UTC)
- Yes: the update script outputs:
MediaWiki lucene-search indexer - build a map of related articles.
...
413 [main] INFO org.wikimedia.lsearch.related.RelatedBuilder - Rebuilding related mapping from links
416 [main] FATAL org.wikimedia.lsearch.related.RelatedBuilder - Rebuild I/O error:
no segments* file found in org.apache.lucene.store.FSDirectory@/usr/local/lucene-search-2.1/indexes/search/wikidb.links: files:
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/usr/local/lucene-search-2.1/indexes/search/wikidb.links: files:
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
at org.wikimedia.lsearch.ranks.Links.flushForRead(Links.java:213)
at org.wikimedia.lsearch.ranks.Links.ensureRead(Links.java:239)
at org.wikimedia.lsearch.ranks.Links.getKeys(Links.java:773)
at org.wikimedia.lsearch.related.RelatedBuilder.rebuildFromLinks(RelatedBuilder.java:91)
at org.wikimedia.lsearch.related.RelatedBuilder.main(RelatedBuilder.java:72)
- The named folder (wikidb.links) contains only symbolic links named after timestamps: 20090924121512, etc., linking to folders. Inside the folders (that exist) are files:
-rw-r--r-- 2 root root 4952161 Sep 24 12:00 _hq.cfs
-rw-r--r-- 2 root root 46 Sep 24 12:00 segments_172
-rw-r--r-- 2 root root 20 Sep 24 12:00 segments.gen
- --Maiden taiwan 16:29, 24 September 2009 (UTC)
No, this appears to be a separate issue. In any case, the extension does use symbolic links to quickly switch between the new and the old index, and it also allows the new and the old index to co-exist for a while until all the old searches finish or time out. So, that shouldn't be a problem. What could be a problem is that if you have the indexer and searcher on the same machine with insufficient RAM, the indexer bogs down the machine, causing high I/O, which then slows down the searchers to the point of searches timing out. --Rainman 08:56, 25 September 2009 (UTC)
- Thanks. We have plenty of RAM (4 GB I believe) on a virtual machine, and while the load average does go up to about 3.0 - 4.0 during indexing, users don't perceive any slowness. That is, the search query returns quickly with zero results. How long is the timeout? Maiden taiwan 11:49, 25 September 2009 (UTC)
[edit] svn revision # for 2.1.2
What is the revision number for the 2.1.2 binary hosted at SourceForge?
I am having trouble getting any search results when building from the HEAD of http://svn.wikimedia.org/svnroot/mediawiki/branches/lucene-search-2.1/ (the index builds fine, but queries return no results). The daemon reports this exception:
java.lang.IllegalArgumentException: nDocs must be > 0
Querying directly to http://localhost:8123/search/wikidb/help gives:
267
#info search=[gziebold-15624s.local], highlight=[], suggest=[gziebold-15624s.local] in 46 ms
#no suggestion
#interwiki 0 0
#results 0
However, indexing and running based on 2.1.2 binary works fine.
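For anyone who wants to repeat this kind of direct query, here is a small sketch that builds the URL. Port 8123 and the parameter names (namespaces, offset, limit, version) are copied from the request the daemon logs further down in this thread; the function name is made up, and the search term is assumed to be a plain word that needs no URL-encoding.

```shell
# Sketch: build a direct query URL for the search daemon.
# Port and parameter names follow the daemon log posted in this thread;
# the term is assumed to be plain ASCII with no characters needing encoding.
search_url() {
    host="$1"; db="$2"; term="$3"; limit="${4:-20}"
    printf 'http://%s:8123/search/%s/%s?namespaces=0&offset=0&limit=%s&version=2.1\n' \
        "$host" "$db" "$term" "$limit"
}

# Usage, assuming curl is installed and the daemon is running locally:
# curl -s "$(search_url localhost wikidb wind)"
```

Querying the daemon this way takes MediaWiki and MWSearch out of the picture, which helps decide whether a "no results" problem is in the index, the daemon, or the PHP side.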
- It's the svn revision of the date of release, don't know offhand, you'll have to look it up. I've built some indexes but didn't have problems with latest svn, can you provide a full stack trace? --Rainman 11:09, 10 November 2009 (UTC)
RMI registry started.
Trying config file at path /Users/gziebold/.lsearch.conf
Trying config file at path /Users/gziebold/Projects/mediawiki/lucene-search-2.1.built/lsearch.conf
0 [main] INFO org.wikimedia.lsearch.util.Localization - Reading localization for En
733 [main] INFO org.wikimedia.lsearch.interoperability.RMIServer - RMIMessenger bound
737 [Thread-1] INFO org.wikimedia.lsearch.frontend.HTTPIndexServer - Indexer started on port 8321
739 [Thread-2] INFO org.wikimedia.lsearch.frontend.SearchServer - Searcher started on port 8123
746 [Thread-5] INFO org.wikimedia.lsearch.search.SearcherCache - Starting initial deployer for [wikidb, wikidb.hl, wikidb.links, wikidb.related, wikidb.spell]
818 [Thread-5] INFO org.wikimedia.lsearch.search.SearcherCache - Caching meta fields for wikidb ...
2522 [Thread-5] INFO org.wikimedia.lsearch.search.SearcherCache - Finished caching wikidb in 1705 ms
2554 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb>$0 bound
2562 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb.hl>$0 bound
2567 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb.links>$0 bound
2575 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb.related>$0 bound
2582 [Thread-5] INFO org.wikimedia.lsearch.interoperability.RMIServer - RemoteSearchable<wikidb.spell>$0 bound
6879 [Thread-8] INFO org.wikimedia.lsearch.frontend.HttpMonitor - HttpMonitor thread started
6881 [pool-2-thread-1] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/search/wikidb/wind?namespaces=0%2C500&offset=0&limit=20&version=2.1&iwlimit=10 what:search dbname:wikidb term:wind
6919 [pool-2-thread-1] INFO org.wikimedia.lsearch.analyzers.StopWords - Successfully loaded stop words for: [nl, en, it, fr, de, sv, es, no, pt, da] in 21 ms
7052 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine - Using FilterWrapper wrap: {0, 500} []
java.lang.IllegalArgumentException: nDocs must be > 0
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
at org.wikimedia.lsearch.search.WikiSearcher.search(WikiSearcher.java:184)
at org.apache.lucene.search.Searcher.search(Searcher.java:132)
at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:722)
at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
7076 [pool-2-thread-1] WARN org.wikimedia.lsearch.search.SearchEngine - Retry, temporal error for query: [wind] on wikidb : nDocs must be > 0
java.lang.IllegalArgumentException: nDocs must be > 0
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
at org.wikimedia.lsearch.search.WikiSearcher.search(WikiSearcher.java:184)
at org.apache.lucene.search.Searcher.search(Searcher.java:132)
at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:722)
at org.wikimedia.lsearch.search.SearchEngine.search(SearchEngine.java:129)
at org.wikimedia.lsearch.frontend.SearchDaemon.processRequest(SearchDaemon.java:101)
at org.wikimedia.lsearch.frontend.HttpHandler.handle(HttpHandler.java:193)
at org.wikimedia.lsearch.frontend.HttpHandler.run(HttpHandler.java:114)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:637)
I also confirmed that the index built from the HEAD is fine. If I swap the LuceneSearch.jar from 2.1.2 binary and run it against indexes built from HEAD, it works. Processing query with LuceneSearch.jar from HEAD fails.
Other details: MediaWiki 1.15.1 (r50) MWSearch (Version r45173)
-- Looks like r48153 == 2.1.2, or at least I was able to get that revision to work with MediaWiki 1.15.1. --GregZ 04:06, 11 November 2009 (UTC)
- Ah I see.. fixed in svn. Thanks for the report. --Rainman 11:23, 11 November 2009 (UTC)
[edit] Category search
Can anyone elaborate on the following from OVERVIEW.txt?
searching categories. Syntax is: query incategory:"exact category name". It is important to note that category names are themselves not tokenized. Using logical operators, intersection, union and difference of categories can be searched. Since exact category is needed (only case is not important), it is maybe best to incorporate this somewhere on category page, and have category name put into query by MediaWiki instead manually by user.
The incategory: syntax does not appear to work as described (in 2.1.2)
Also, what is meant by ...it is maybe best to incorporate this somewhere on category page, and have category name put into query by MediaWiki instead manually by user. ? Is the suggestion to put a search form on the Category page and insert the incategory: syntax there?
- It does work, but only for categories that are not added via templates, but in main article text. E.g. [1]. --Rainman 11:02, 10 November 2009 (UTC)
Ah ha. That explains why my incategory: query was not working: the categories were added via templates. Is this because lucene-search indexes the wikitext of articles and does not resolve templates? Has there been any discussion of indexing the rendered HTML content instead of only the wikitext?
- Yes, however, the current mediawiki architecture makes it difficult to do... in fact, what we would want is article not in html, but in wikitext with expanded templates... In any case, you won't find a search extension that does it. --Rainman 15:51, 10 November 2009 (UTC)
Also, can you comment on this ...it is maybe best to incorporate this somewhere on category page, and have category name put into query by MediaWiki instead manually by user. ? Is the suggestion to put a search form on the Category page and insert the incategory: syntax there?
- Well yes... it would be nice if someone did it. You need to take into consideration that the file was probably written at 2am on some Sunday, so don't take everything in it very seriously ;) --Rainman 15:51, 10 November 2009 (UTC)
I simply needed the "late-night fog" translation. :) --GregZ 04:03, 11 November 2009 (UTC)