Extension talk:MWSearch

What are the advantages/disadvantages of MWSearch extension over Lucene extension ? Thank you.

MWSearch is being maintained and developed, while LuceneSearch is not. The differences are minor. MWSearch has a somewhat better and more consistent interface, e.g. it has galleries for Image namespace hits. --Rainman 15:01, 24 April 2008 (UTC)

Does MWSearch support Windows + Apache / Windows + IIS ?
 * No, the search backend does not support Windows. --Rainman 13:52, 12 May 2008 (UTC)
 * That's not true; I just managed to compile it under Ubuntu and am now running it on a Windows 2003 server. --193.27.220.82 13:55, 28 May 2008 (UTC)

MWSearch over SSL?
Is it possible to use MWSearch over SSL? If so, is there a special configuration needed? (I ask because I've been unable to get it working and I'm not permitted to open additional ports.)
 * Sorry to answer my own question, but I was able to make this work by editing my hosts file to point the domain to 127.0.0.1 instead of the external IP. I don't really like this solution, so I'm hoping this isn't the preferred method. Does anyone have any suggestions?

MediaWiki_SVN+Lucene-Search2_SVN+MWSearch_SVN = ZERO search results
I've been trying for a very long time now to implement Lucene-search functionality on my MediaWiki site -- I've spent Hours+Days+Weeks+Months troubleshooting this very issue. I'm giving this one last effort and am hoping SOMEONE will be able to help me; otherwise I am thinking of moving away from MediaWiki entirely (which would make me a very sad panda) and painstakingly importing my MediaWiki into Drupal, then trying out their neat-sounding Drupal::Search_Attachments module. I have put over 1 year's worth of work into my MediaWiki-based wiki. I truly need the solution that Extension:Lucene-search+Extension:MWSearch offers, but it has been next to impossible for me to implement on my Slackware server. Who knows, my trouble could very well be something I am doing wrong! I'm human enough to admit that if it turns out to be the case... This is why I come for assistance. I am thinking of posting to the MediaWiki forums too, and linking here (hope that's OK), as I really, really want to get Extension:Lucene-search+Extension:MWSearch working for me! I have been trying to get it working for over 6 months now, and posted a lot of my previous issues HERE at the MediaWiki Lucene-Search Talk page, but was unable to find any solutions to my problem. I even tried the newer, more up-to-date Extension:Lucene-search+Extension:MWSearch setup, to no avail.
I am BACK, now with a brand new computer (well, it's actually older hardware, but with a newly formatted hard drive), a freshly installed OS, and a very basic website with some basic data input to test search functionality. This is my current overall system setup. I've gone over and over and OVER the directions per the Extension:Lucene-search and Extension:MWSearch pages, and I just cannot get this working properly on my box. Now that I have tried on a new install with new everything, I am convinced this is not a problem on my end, but I could be wrong --- I have documented EVERYTHING I did from install to now. Since I've been over this so many times, maybe by me posting my logs here and what I did from beginning to end, someone might "see something" I'm missing?? Please HELP! =)
 * Slackware 12.1, on i686 Pentium III (Linux 2.6.24.5-smp = Slackware 12.1's generic-smp-2.6.24.5-smp kernel)
 * MediaWiki: 1.13alpha (SVN 06-25-2008)
 * PHP: 5.2.6 (I used Slackware 12.1's PHP v5.2.6 update package)
 * MySQL: 5.0.51b
 * MediaWiki Extension(s): [[Extension:lucene-search|MWSearch]] SVN 06-25-2008, and Lucene-search2 SVN 06-25-2008, + I downloaded & installed mwdumper.jar into the Lucene-search2 "lib" dir = /usr/local/search/ls2
 * other tools: jre-6u6-i586-3, jdk-1_5_0_09-i586-1, apache-ant-1.7.0-i486, rsync-3.0.2-i486-1

SVN install of MediaWiki
> svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3
 * Checked out revision 36630
 * I moved everything to my htdocs folder using NEW directory = /var/www/htdocs/wiki-svn060108

> chmod a+x /var/www/htdocs/wiki-svn060108/config
 * I ran first time config via http:// /wiki-svn060108/config (configured it as follows) ;;


 * wikiname: NOC Archive
 * contact email: rprior@newedgenetworks.com
 * language: en - english
 * license:  GNU Free Documentation License 1.2 (Wikipedia-compatible)
 * admin username: rprior
 * admin password: xxxxxxxxx
 * wikiDB name: svnwikidb
 * DB username: svnwikiuser
 * DB password: xxxxxxxxx
 * database character set:  Experimental MySQL 4.1/5.0 UTF-8


 * I created the MySQL DB and gave myself permissions

> mysql -u root -p
mysql> create database svnwikidb character set utf8;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,DROP
    -> ON svnwikidb.*
    -> TO 'svnwikiuser'@'localhost'
    -> IDENTIFIED BY 'testpass';
Query OK, 0 rows affected (0.03 sec)

mysql> exit


 * moved /var/www/htdocs/wiki-svn060108/config/LocalSettings.php TO /var/www/htdocs/wiki-svn060108/

> chown root:apache LocalSettings.php
> chown 700 LocalSettings.php
> rm -r /var/www/htdocs/wiki-svn060108/config


 * pulled up my new wiki via the page http://nen-tftp.techiekb.com/wiki-svn06252008/index.php/Main_Page = and IT WORKS!


 * I put some basic data that I knew would be searchable on the front/1st page

Installation of LuceneSearch2+MWSearch extensions
> cd /var/www/htdocs/wiki-svn06252008/extensions
> svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/MWSearch

> ln -s /usr/lib/jdk1.5.0_09/lib/tools.jar /usr/lib/java/lib

> ls -al /usr/lib/java/lib/
lrwxrwxrwx 1 root root       34 Jun 25 03:19 tools.jar -> /usr/lib/jdk1.5.0_09/lib/tools.jar

> cd /tmp
> mkdir lucene-search-2
> cd lucene-search-2/
> svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/lucene-search-2/
> mkdir /usr/local/search
> mkdir /usr/local/search/ls2
> cd /usr/local/search/ls2
> mv /tmp/lucene-search-2/lucene-search-2/* ./
> cd /usr/local/search/ls2/lib
> wget http://download.wikimedia.org/tools/mwdumper.jar
> mkdir /usr/local/search/indexes
> cd /usr/local/search/ls2

lsearch.conf - my configuration file build
MWConfig.global=file:///etc/lsearch-global.conf
MWConfig.lib=/usr/local/search/ls2/lib
Indexes.path=/usr/local/search/indexes
Search.updateinterval=1
Search.updatedelay=0
Search.checkinterval=30
Index.snapshotinterval=5
Index.maxqueuecount=5000
Index.maxqueuetimeout=12
Storage.master=localhost
Storage.useSeparateDBs=false
Storage.defaultDB=lsearch
Storage.lib=/usr/local/search/ls2/sql
SearcherPool.size=3
Localization.url=file:///var/www/htdocs/wiki-svn06252008/languages/messages
OAI.username=user
OAI.password=pass
OAI.maxqueue=5000
Logging.logconfig=/etc/lsearch.log4j
Logging.debug=true
 * I created a symlink for /etc/lsearch.conf that points to the actual file = /usr/local/search/ls2/lsearch.conf

> ln -s /usr/local/search/ls2/lsearch.conf /etc
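
One quick sanity check on a configuration like the one above is to confirm that every plain filesystem path it names actually exists. The following is only a minimal shell sketch, not part of lucene-search-2: the helper names `conf_get` and `check_conf_paths` are hypothetical, and it assumes the file holds one `key=value` pair per line. The `file://` URL settings are deliberately skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of lucene-search-2): print the value of KEY
# from a key=value file, assuming one setting per line.
conf_get() {
    # $1 = key, $2 = config file
    sed -n "s|^$1=||p" "$2" | head -n 1
}

# Report whether each path-valued setting points at something that exists.
check_conf_paths() {
    # $1 = config file
    for key in MWConfig.lib Indexes.path Storage.lib Logging.logconfig; do
        value=$(conf_get "$key" "$1")
        if [ -e "$value" ]; then
            echo "ok       $key=$value"
        else
            echo "MISSING  $key=$value"
        fi
    done
}
```

After sourcing the functions, it could be run as `check_conf_paths /etc/lsearch.conf`; a `MISSING` line only means the path is absent (or the key is unset), not necessarily that it is the cause of the problem.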

/etc/lsearch.log4j - my configuration file build
log4j.rootLogger=INFO, A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

/etc/lsearch-global.conf - my configuration file build
[Database]
svnwikidb : (single) (language,en) (warmup,10)

[Search-Group]
nen-tftp : svnwikidb

[Index]
Database.suffix=wiki wiktionary svnwikidb
KeywordScoring.suffix=svnwikidb wiki wikilucene wikidev
ExactCase.suffix=svnwikidb wiktionary wikilucene

[Namespace-Prefix]
all :
[0] : 0
[1] : 1
[2] : 2
[3] : 3
[4] : 4
[5] : 5
[6] : 6
[7] : 7
[8] : 8
[9] : 9
[10] : 10
[11] : 11
[12] : 12
[13] : 13
[14] : 14
[15] : 15

built LuceneSearch.jar via ANT
> ln -s /opt/apache-ant/bin/ant /bin
> ant
Buildfile: build.xml

build:
    [mkdir] Created dir: /usr/local/search/ls2/bin
    [javac] Compiling 101 source files to /usr/local/search/ls2/bin
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

alljar:
      [jar] Building jar: /usr/local/search/ls2/LuceneSearch.jar

BUILD SUCCESSFUL
Total time: 24 seconds

LuceneSearch extension added to LocalSettings.php

 * I added the following to my /var/www/htdocs/wiki-svn060108/LocalSettings.php file ;

$wgSearchType = 'LuceneSearch';
$wgLuceneHost = 'localhost';
$wgLucenePort = 8123;
require_once("extensions/MWSearch/MWSearch.php");

created a dumpBackup.sh script to automate building of my index
php /var/www/htdocs/wiki-svn06252008/maintenance/dumpBackupInit.php --current --quiet > wikidb.xml && java -cp /usr/local/search/ls2/LuceneSearch.jar org.wikimedia.lsearch.importer.Importer -s wikidb.xml svnwikidb

> chmod 750 dumpBackup.sh

created file = dumpBackupInit.php

 * created file /var/www/htdocs/wiki-svn06252008/maintenance/dumpBackupInit.php with 755 permissions ;


# dumpBackupInit - Wrapper Script to run the mediaWiki xml-dump "dumpBackup.php" correctly
# @author: Stefan Furcht
# @version: 1.0
# @require: /srv/www/htdocs/wiki-svn06252008/maintenance/dumpBackup.php
# The following Variables musst be set, to get dumpBackup.php at work
# you'll find this Values in the DB-section into your mediaWiki-Config: LocalSettings.php
$wgDBtype           = "mysql";
$wgDBserver         = "localhost";
$wgDBname           = "svnwikidb";
$wgDBuser           = "svnwikiuser";
$wgDBpassword       = "xxxxxxxx";
$wgDBprefix         = "";
# $wgDBport           = "5432";
# XML-Dumper 'dumpBackup.php' requires the setted Vars to run
# simply include the original dumpBackup-Script

 * I then ran my "dumpBackup.sh" file via the command line:

/srv/www/htdocs/wiki-svn06252008/dumpBackup.sh
 * This creates an XML dump of my Wiki DB in a file called wikidb.xml, which seems to work JUST FINE, the file is 3.6Kb, which is pretty small, since I don't have much in my BRAND NEW WIKI, just some text I know will be easily found when the search function is working properly.

starting the lucene-search2 daemon
I start the lucene-search2 daemon using this command-line ; /usr/local/search/lucene-search-2svn05112008/lsearchd
 * The program loads, and spits out some information to the console I am logged into <'some' text follows>

RMI registry started.
Trying config file at path /root/.lsearch.conf
Trying config file at path /var/www/htdocs/wiki-svn06252008/lsearch.conf
Trying config file at path /etc/lsearch.conf
log4j: Parsing for [root] with value=[INFO, A1].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "A1".
log4j: Parsing layout options for "A1".
log4j: Setting property [conversionPattern] to [%-4r [%t] %-5p %c %x - %m%n].
log4j: End of parsing for "A1".
log4j: Parsed "A1" options.
log4j: Finished configuring.
0    [main] INFO  org.wikimedia.lsearch.util.Localization  - Reading localization for En
2804 [main] INFO  org.wikimedia.lsearch.util.UnicodeDecomposer  - Loaded unicode decomposer
3068 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RMIMessenger bound
3351 [main] INFO  org.wikimedia.lsearch.interoperability.RMIServer  - RemoteSearchable bound
3374 [Thread-2] INFO  org.wikimedia.lsearch.frontend.HTTPIndexServer  - Started server at port 8321
3386 [Thread-3] INFO  org.wikimedia.lsearch.frontend.SearchServer  - Binding server to port 8123
3407 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index svnwikidb ...
4737 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warmed up svnwikidb in 1330 ms
4738 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index svnwikidb ...
5629 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warmed up svnwikidb in 891 ms
5630 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warming up index svnwikidb ...
6203 [main] INFO  org.wikimedia.lsearch.search.Warmup  - Warmed up svnwikidb in 573 ms
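
The startup log above reports the index server on port 8321 and the search server on port 8123. An independent way to confirm that both sockets are really open is to look at the listening TCP ports. This is a rough sketch only: the helper name `check_listening` is hypothetical, and it assumes `netstat -tln`-style output where the local address column ends in `:PORT`.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tln`-style output on stdin and report
# whether each expected TCP port shows up in LISTEN state.
check_listening() {
    # "$@" = port numbers to look for
    input=$(cat)
    for port in "$@"; do
        if printf '%s\n' "$input" | grep 'LISTEN' | grep -q ":$port "; then
            echo "port $port: listening"
        else
            echo "port $port: NOT listening"
        fi
    done
}
```

It would be used as `netstat -tln | check_listening 8123 8321` on the machine running the daemon.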

My MediaWiki Special:Version page

 * My MediaWiki site's Special:Version page appears to indicate all modules are being recognized ;;

INSTALLED SOFTWARE
 * MediaWiki 1.13alpha
 * PHP 5.2.6 (apache2handler)
 * MySQL 5.0.51b
INSTALLED EXTENSIONS
 * MWSearch (Version r36482) - MWSearch plugin - Brion Vibber and Kate Turner

The actual PROBLEM is NO SEARCH RESULTS
Now that I have everything set up and the Lucene-search2 daemon running, I tried to search on my website... Fingers crossed.... I type in a known word that IS on the front page, and is also in my XML dump of the MySQL DB (wikidb.xml) --- sure enough, I get ZERO SEARCH RESULTS!! This is what my MediaWiki search results page shows:

Search results
From AgentDcooper's Wiki
You searched for wiki
For more information about searching AgentDcooper's Wiki, see Searching AgentDcooper's Wiki.
Showing below 0 results starting with #1.
No page text matches
Note: Unsuccessful searches are often caused by searching for common words like "have" and "from", which are not indexed, or by specifying more than one search term (only pages containing all of the search terms will appear in the result).

Troubleshooting the ZERO results issue
Since I have a console session open with the lucene-search2 daemon running, I notice that AS SOON as I hit the SEARCH button after typing in my search phrase (loopback) in my MediaWiki search box, the lucene-search2 daemon console output scrolls the following:

893553 [pool-2-thread-1] INFO  org.wikimedia.lsearch.frontend.HttpHandler  - query:/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10 what:search dbname:svnwikidb term:loopback
893567 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  - Using NamespaceFilterWrapper wrap: {0}
893592 [pool-2-thread-1] INFO  org.wikimedia.lsearch.search.SearchEngine  - search svnwikidb: query=[loopback] parsed=[contents:loopback (title:loopback^6.0 stemtitle:loopback^2.0) (alttitle1:loopback^4.0 alttitle2:loopback^4.0 alttitle3:loopback^4.0) (keyword1:loopback^0.02 keyword2:loopback^0.01 keyword3:loopback^0.0066666664 keyword4:loopback^0.0050 keyword5:loopback^0.0039999997)] hit=[1] in 12ms using IndexSearcherMul:1214736931039


 * I've been troubleshooting this issue for a long time, so I do know how to enable MediaWiki debugging --- here is what my /var/log/mediawiki/debug_svn_log.txt shows ;;

Start request
GET /wiki-svn06252008/index.php/Special:Search?search=loopback&fulltext=Search
Host: nen-tftp.techiekb.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://nen-tftp.techiekb.com/wiki-svn06252008/index.php/Special:Search?search=loopback&fulltext=Search
Cookie: wikidbToken=dd6c9b732dba0c94b04ad72044d46d79; wikidbUserName=Rprior; wikidbUserID=2; wikidb_session=5rph1dsoik5dpdlcitc1canlr0; svnwikidb_session=n8btqun31sn6vnubiek79l5br6; svnwikidbUserID=1; svnwikidbUserName=Rprior; svnwikidbToken=baea562c5be4148475a179c94a6868d4
Authorization: Basic ZGNvb3Blcjp0ZXN0cGFzcw==

Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
session_set_cookie_params: "0", "/", "", "", "1"
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Language::loadLocalisation: got localisation for en from source
Unstubbing $wgOut on call of $wgOut->setArticleRelated from SpecialPage::setHeaders
Unstubbing $wgMessageCache on call of $wgMessageCache->get from wfMsgGetKey
Unstubbing $wgLang on call of $wgLang->getCode from MessageCache::get
Unstubbing $wgUser on call of $wgUser->getOption from StubUserLang::_newObject
Cache miss for user 1
Connecting to localhost svnwikidb... Connected
Logged in from session
MessageCache::load: Loading en... got from global cache
Unstubbing $wgParser on call of $wgParser->firstCallInit from MessageCache::transform
Fetching search data from http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10
Http::request: GET http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10
total [0] hits
OutputPage::sendCacheControl: private caching; **
Request ended normally


 * That is the part I am having the most trouble with, IMHO!
 * Follow me here.... Everything actually seems to work, up until the 3rd-to-last line in the debug log! The part that doesn't appear to be working properly is that line = total [0] hits.
 * This is WHY I think that is the case: if I actually pull up the web address on my webserver via lynx or any other web browser = http://localhost:8123/search/svnwikidb/loopback?namespaces=0&offset=0&limit=20&version=2&iwlimit=10 I get the following output!

1 1.0 0 Main_Page


 * That output appears to be saying that THERE IS 1 page that matches!!! The page being Main_Page??!! Does that sound right?? I suspect something MUST be wrong here, it has been pointed out that it may be my CURL library, but that was on an earlier version of Slackware (12.0, I am now running 12.1) - according to my Slackware 12.1 (plain vanilla install, except I upgraded to newer version of PHP) the CURL version/package I am using is curl-7.16.2-i486-1. I don't suspect that CURL is my problem, but I am completely open to anyone's interpretation of my issue at hand, and would love to work with someone on this, and/or come up with a solution... I think it's working, just MWSearch is not passing the data properly to my MediaWiki search???
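
To narrow down whether the daemon or the PHP side is dropping the result, the same URL can be fetched once with the curl command-line client and once from PHP itself. The sketch below is an illustration only: `build_search_url` and `compare_fetch` are hypothetical helper names, the query parameters are copied from the debug log above, and the PHP fetch assumes `allow_url_fopen` is enabled. MediaWiki's own Http class may use a different code path (for example the curl extension), so treat this as a rough comparison rather than a reproduction of what MWSearch does.

```shell
#!/bin/sh
# Hypothetical helper: build the search URL that appears in the debug log
# (parameter values copied from that log).
build_search_url() {
    # $1 = host, $2 = port, $3 = database name, $4 = search term
    echo "http://$1:$2/search/$3/$4?namespaces=0&offset=0&limit=20&version=2&iwlimit=10"
}

# Fetch the URL two ways so the raw responses can be compared side by side.
compare_fetch() {
    url=$1
    echo "== curl command-line client (with response headers) =="
    curl -s -i "$url"
    echo
    echo "== PHP file_get_contents (assumes allow_url_fopen is enabled) =="
    php -r 'var_dump(file_get_contents($argv[1]));' -- "$url"
    echo "== is the curl extension loaded in PHP? =="
    php -m | grep -i '^curl$' || echo "curl extension not listed"
}
```

For example: `compare_fetch "$(build_search_url localhost 8123 svnwikidb loopback)"`. If the command-line fetch shows the Main_Page hit but the PHP fetch returns false or an empty string, the problem is more likely on the PHP/HTTP side than in the index.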

Does anyone have any ideas here? Please help me, I really don't want to move away from MediaWiki, but I very much need this functionality from MediaWiki! Thanks in advance + sorry to be longwinded, just wanted to ensure I give as much details as possible - if you have any questions, feel free to ask!

PS :: I tried ExtensionFunctions.php SVN-06-25-2008
BTW, in my reading and troubleshooting, I saw something that said I should download the file ExtensionFunctions.php, so I pulled this file down from SVN-06-25-2008 ;;

> wget http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/ExtensionFunctions.php

> mv ExtensionFunctions.php /var/www/htdocs/wiki-svn06252008

This did not resolve my issue at all, still seeing the same problem. ANYONE HAVE ANY IDEAS?
 * If you add wfDebug(print_r($data, true)); in MWSearch_body.php file right after the $data = Http::get( $searchUrl ); line, does that give something useful in your debug log? Or does it give a null? 83.81.5.126 15:35, 29 June 2008 (UTC)
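
After adding a debug statement like the one suggested above and repeating the search, the relevant part of the debug log can be pulled out rather than read in full. This is a minimal sketch with a hypothetical helper name, `search_debug_section`; it assumes the log still contains the "Fetching search data" and "total [N] hits" lines shown earlier, and that whatever the added wfDebug call writes lands between them.

```shell
#!/bin/sh
# Hypothetical helper: print the slice of a MediaWiki debug log that runs from
# the "Fetching search data" line through the "total [N] hits" line.
search_debug_section() {
    # $1 = path to the debug log
    sed -n '/Fetching search data/,/total \[[0-9]*] hits/p' "$1"
}
```

For example: `search_debug_section /var/log/mediawiki/debug_svn_log.txt`.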