Extension talk:Zend Search Lucene for MediaWiki

From MediaWiki.org

Link to page containing document

I am maintaining a wiki on a local intranet. Could you tell me whether Zend Search will show a link to the wiki page that contains an indexed document, or only a link to the indexed document itself?

Our current search finds a phrase in 'aaa.pdf', but when you click the link, you can only open 'aaa.pdf'; you cannot see where 'aaa.pdf' is linked on the wiki.

Thanks.

203.5.217.30 04:18, 26 April 2012

It should list the file link and the pages containing "aaa.pdf". Furthermore, it would be able to find phrases inside a PDF document if you extended the code a little.

c u stevie

Steviex2 (talk) 15:50, 30 April 2012

Can you elaborate, or provide links to further information, on how to extend the code to search inside PDF documents? This is the main reason I would like to use Zend Search Lucene for MediaWiki.

Thanks

Rpsteiner (talk) 22:09, 24 May 2012

Hi Rpsteiner,

Open the file PslZendSearchLuceneIndexer.php and go to line 530 ("...We will do this..."). This should be the place to index PDF content. Unfortunately there was no further sponsor to make this happen, but you could do it yourself if you are a coder. The keyword here is XPDF (the name of an external Linux library). This task should be easy, but it would take a few hours. I wish you good luck... maybe you can contribute the necessary code here. I would integrate it into a next release (with special handling for Windows users etc.).

c u stevie

Steviex2 (talk) 22:35, 24 May 2012
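A minimal sketch of that idea, assuming the pdftotext command-line tool (part of Xpdf, also shipped with Poppler) is installed and on the PATH. The function names are hypothetical and not part of the extension; the code only shows how the text of one PDF could be obtained before it is added to the index:

```php
<?php
// Hypothetical helpers (not part of ZSL) for extracting PDF text with pdftotext.

// Build the command line; "-" as the output file makes pdftotext write to stdout.
function buildPdfToTextCommand(string $pdfPath, string $binary = 'pdftotext'): string
{
    return escapeshellarg($binary) . ' -enc UTF-8 ' . escapeshellarg($pdfPath) . ' -';
}

// Run pdftotext and return the extracted text, or '' if the file is unreadable
// or the tool fails, so one broken PDF does not stop the whole indexing run.
function extractPdfText(string $pdfPath, string $binary = 'pdftotext'): string
{
    if (!is_readable($pdfPath)) {
        return '';
    }
    $lines = [];
    $status = 0;
    exec(buildPdfToTextCommand($pdfPath, $binary), $lines, $status);
    return $status === 0 ? implode("\n", $lines) : '';
}
```

The returned string could then go into an unstored field of the file's Lucene document, for example via Zend_Search_Lucene_Field::unStored(); which field name the indexer uses around line 530 is not shown in this thread, so that wiring is left open.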
 
 
 

Question about pslpopularsearches

Hi all, I received the following error message: "1146: Table 'wikidb.mw_mw_pslpopularsearches' doesn't exist"

Does anyone have an idea?

Steviex2 18:40, 10 May 2011

It seems you have a problem with the database table prefix. Could you please post the complete error message and the version of ZSL you are using?

Steviex2 18:42, 10 May 2011

Hello, I think there is a prefix problem in my wiki too (ZSL 2.0):

A database error has occurred. This may be caused by a programming error.
The last database query was:
INSERT IGNORE INTO `wikiwikipslpopularsearches` (searchcon,results,success,triggertime,
user,ip,rawquery,score,namespace,pids,page,category,rating,pageurl,pagpage,sk) 
VALUES ('body_and_title','55','1','2011-05-16 14:13:44',,,'Buch','4.41565438044',,',
840,864,1014,1541,319,1474,842,905,1156,1211,897,715,1548,366,811,421,349,305,1072,426,
480,307,896,846,514,634,306,1551,311,1221,1349,1443,415,371,927,407,327,1248,1402,340,
1530,889,890,360,554,309,145,1470,775,432,1391,1465,304,985,1469',,,,,,
'465e8da6fb92d8a7b7bb24687b89517b')
from the function "PslZendSearchLuceneDbActions". The database reported the error "1146:
Table 'wiki.wikiwikipslpopularsearches' doesn't exist (localhost)".

Thank you.

Agoerlt 08:19, 18 May 2011

Same here - what's the fix?

Pjtait (talk) 01:19, 20 April 2012

Hi,

as mentioned... check the database prefix setup ($wgDBprefix) in your LocalSettings.php.

Steviex2 (talk) 02:23, 20 April 2012
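Both error messages show the prefix twice ("mw_mw_", "wikiwiki"). One plausible cause, stated here as an assumption: MediaWiki's database layer already prepends $wgDBprefix to table names passed through Database::tableName(), so a name that was prefixed by hand beforehand ends up doubled. Running SHOW TABLES LIKE '%pslpopularsearches'; in MySQL shows what the table is actually called. The helper below is hypothetical and only illustrates applying the prefix exactly once:

```php
<?php
// Hypothetical helper (not part of ZSL): add a table prefix exactly once.
// A doubled name such as "mw_mw_pslpopularsearches" is what you get when code
// prepends $wgDBprefix and the database layer prepends it again.
function prefixTableOnce(string $prefix, string $table): string
{
    if ($prefix !== '' && strpos($table, $prefix) === 0) {
        return $table; // already prefixed, leave it alone
    }
    return $prefix . $table;
}

// Example with $wgDBprefix = "mw_" in LocalSettings.php.
echo prefixTableOnce('mw_', 'pslpopularsearches'), "\n";
echo prefixTableOnce('mw_', 'mw_pslpopularsearches'), "\n";
```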
 
 
 
 

Suggestions not working

Hello,

Suggestions are not working, although they are enabled in LocalSettings.php. Do you have any idea why?

212.185.65.91 07:56, 31 August 2011

It is probably an issue with your environment. Usually it works fine.

Steviex2 08:06, 31 August 2011

Do you have any idea what could be missing? Do I need a special PHP library or something like that?

212.185.65.91 09:12, 31 August 2011
Edited by author.
Last edit: 06:40, 1 September 2011

Did you know that users can configure suggestions themselves? First, you should play with these settings and test every mode. As far as I know, no additional PHP modules are necessary. Could you please tell me your MediaWiki version?

Steviex2 17:36, 31 August 2011

Ah, I hadn't configured it for my user... sorry. Then my problem is 'fixed'. Thanks for your help.

212.185.65.91 06:16, 1 September 2011

Hi, could you please clarify this for me? By "suggestions", do you mean the drop-down list that appears below the search box as we type, or a list of suggestions presented after we click the search button (something like "Did you mean?")? Thanks

Capmo (talk) 03:08, 24 March 2012
 
 
 
 
 

Search multiple wikis

The documentation lists unlimited wiki instances as a feature. Can I run a search across all indexes and return results for all the wikis?

Ashex 05:33, 10 February 2012

This could be done by modifying the source code. "Unlimited instances" refers primarily to the indexer: you have a server with several MediaWiki installations and wish to index all of them with a single nightly cron job.

80.187.107.30 01:00, 11 February 2012
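As a sketch of what that means in the indexer configuration (compare the PslZendSearchLuceneIndexerConfig.php examples elsewhere on this page): each wiki gets its own $wikisArray entry, and one cron-triggered run of PslZendSearchLuceneIndexer.php handles all of them. The paths and the validation helper are hypothetical, assuming the four keys used in those examples:

```php
<?php
// One entry per wiki; paths are placeholders for illustration only.
$wikisArray = [];
$wikisArray[0]['xmlSource']         = '/srv/zend/sources/internal_current.xml';
$wikisArray[0]['indexName']         = 'wikidb_internal';
$wikisArray[0]['maintenanceScript'] = '/srv/wikis/internal/maintenance/dumpBackup.php';
$wikisArray[0]['mediaDir']          = '/srv/wikis/internal/images/';
$wikisArray[1]['xmlSource']         = '/srv/zend/sources/sysdoc_current.xml';
$wikisArray[1]['indexName']         = 'wikidb_sysdoc';
$wikisArray[1]['maintenanceScript'] = '/srv/wikis/sysdoc/maintenance/dumpBackup.php';
$wikisArray[1]['mediaDir']          = '/srv/wikis/sysdoc/images/';

// Hypothetical check: report entries that lack one of the keys, so a typo in
// one wiki's block is caught before the nightly run.
function findIncompleteWikis(array $wikis): array
{
    $required = ['xmlSource', 'indexName', 'maintenanceScript', 'mediaDir'];
    $incomplete = [];
    foreach ($wikis as $i => $wiki) {
        $missing = array_diff($required, array_keys($wiki));
        if ($missing !== []) {
            $incomplete[$i] = array_values($missing);
        }
    }
    return $incomplete;
}
```

Searching across those separate indexes from a single wiki is a different matter and is not covered by this.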

Could you possibly elaborate on that? I have a Wiki family (single server running multiple wikis, each has a separate database) and would like to be able to search across them.

Ashex (talk) 21:50, 24 February 2012

This is not a standard feature of ZSL and would require several days of investigation and development. Unfortunately I can't provide this on a non-profit basis. It could also run into performance problems and may fail; we currently have no experience with doing so. If none of this matters to you, you could request a quote at www.wiki-service.biz. You could also look at an alternative enterprise search engine such as Solr.

c u stevie

Steviex2 (talk) 22:16, 24 February 2012

If it's not a standard feature and would require additional development of the extension, then why is it listed as a feature?

Ashex (talk) 01:36, 26 February 2012

Unlimited instances is related to the indexer!

Steviex2 (talk) 17:15, 26 February 2012
 
 
 
 
 

XmlDump won't work on Windows

Hello,

I installed the extension yesterday and have been trying to get it working, but I always get the same error message when running the indexer:

2011-11-09T18:29:52+01:00 INFO (6): LuceneIndexer Error-Message! ERROR dumpXML() 
"C:\Program Files\EasyPHP 3.0\php\php.exe" -c "C:\Program Files\EasyPHP 3.0\conf_files\php.ini" 
D:/Wikis/mediawiki-1.17.0/maintenance/dumpBackup.php --current --quiet --uploads 
> D:/Wikis/search-engine/psl_sources/af_current.xml---Status -> 1

However, when I copy and paste the command above and run it myself in the same cmd.exe window where I ran the indexer (so, normally, under the same conditions), it works fine and my af_current.xml is correctly filled with the dump.

Any idea?

(I'm on MediaWiki 1.17, but I don't think that is related to the problem above.)

193.56.136.200 07:14, 10 November 2011

Hi,

Strange, I have never heard of this before.

dumpXML should be a function in the code. What you could do is go there, see what the function is trying to do, and print out any information available at that point (paths etc.). Also make sure that your WAMP system has the necessary rights for this task.

Steviex2 12:58, 10 November 2011
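Following that suggestion, a minimal debugging sketch (the helper names are hypothetical, not taken from dumpXML()) that records the exact command, its exit status and its output. One possible cause worth ruling out, stated as an assumption: PHP on Windows hands the command to cmd.exe /c, and cmd.exe strips the first and last double quote when a line starts with a quote and contains more than two of them. That would break the command from the log above while it still works when typed at the prompt. Wrapping the whole command in one extra pair of quotes is a common workaround:

```php
<?php
// Hypothetical debugging helper: run a command and return everything needed to
// see what went wrong, instead of just a status flag.
function runAndReport(string $command): array
{
    $output = [];
    $status = 0;
    exec($command, $output, $status);
    return ['command' => $command, 'status' => $status, 'output' => $output];
}

// Hypothetical workaround for cmd.exe quote stripping on Windows: wrap the whole
// command line in one extra pair of double quotes.
function quoteForCmdExe(string $command): string
{
    return '"' . $command . '"';
}

// The command from the log above (single quotes keep the backslashes literal).
$cmd = '"C:\Program Files\EasyPHP 3.0\php\php.exe" -c "C:\Program Files\EasyPHP 3.0\conf_files\php.ini" '
     . 'D:/Wikis/mediawiki-1.17.0/maintenance/dumpBackup.php --current --quiet --uploads '
     . '> D:/Wikis/search-engine/psl_sources/af_current.xml';
if (PHP_OS_FAMILY === 'Windows') {
    $cmd = quoteForCmdExe($cmd);
}
// Inside dumpXML(), printing the report shows what actually ran:
// print_r(runAndReport($cmd));
```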
 

Blank page when searching

SOLVED: The problem was that the directories were owned by the user apache instead of nobody, which was the actual web server user.

Hello everyone,

I've installed ZF (Zend Framework) and the Zend Search Lucene for MediaWiki extension.

The indexing process runs smoothly and without errors.

The problem is that I get a blank page as soon as I hit the search button: no error message and no hint about what's wrong.

Here's my config:

The web root is /opt/lampp/htdocs (with /mediawiki as a symlink to the MediaWiki installation path /opt/lampp/htdocs/mediawiki_1.17.0).

Zend Framework resides in /opt/lampp/zend/.

The relevant parts of the config are:

Index Config

$GuiFlag = 0;

$wikisArray[0]['xmlSource'] = "/opt/lampp/zend/sources/internal_current.xml";
$wikisArray[0]['indexName'] = "wikidb_internal";
$wikisArray[0]['maintenanceScript'] = "/opt/lampp/htdocs/mediawiki/maintenance/dumpBackup.php";
$wikisArray[0]['mediaDir'] = "/opt/lampp/htdocs/mediawiki/images";// maybe httpdocs/images/ if img_auth.php not in use

#$wikisArray[1]['xmlSource'] = "/opt/lampp/zend/sources/sysdoc_current.xml";
#$wikisArray[1]['indexName'] = "wikidb_sysdoc";
#$wikisArray[1]['maintenanceScript'] = "/opt/lampp/htdocs/mediawiki/maintenance/dumpBackup.php";
#$wikisArray[1]['mediaDir'] = "/opt/lampp/htdocs/mediawiki/images/";// maybe httpdocs/images/ if img_auth.php not in use

#[...]

endif;

@preg_match_all("/(Windows)(.*?)/", $_SERVER['OS'], $matched, PREG_SET_ORDER);

/* an index dir above web root */
        $indexDirName = "psl_search_indexes";
        $PhpExecutionStringUnix = "/opt/lampp/bin/php -c /opt/lampp/etc/php.ini";
        $PhpExecutionStringWindows = "c:\\xampp\\php\\php.exe ";
        $email = "info@my.reporting-mailadress.ork";

/* file formats which will be indexed */
$additionalFileFormatsArray = array('pdf','docx','xlsx','pptx','sql','vnd','txt','xml','xmlx','csv');

if(count($matched) > 0 ):
/* modify this to fit your needs, if you are on windows */
        $webServerUser = "";
        $webServerUserGroup = "";
        $zendFrameworkLibraryPath = "C:\\xampp\\htdocs/ZF/library";
        $zendLogPath = "C:\\xampp\\htdocs\\".$indexDirName."\\";
        $applicationPath = "C:\\xampp\\htdocs";
else:
/* modify this to fit your needs, if you are on unix */
        $webServerUser = "apache";
        $webServerUserGroup = "apache";
        $zendFrameworkLibraryPath = "/opt/lampp/zend/library";
        $zendLogPath = "/opt/lampp/zend/".$indexDirName."/";
        $applicationPath = "/opt/lampp/zend/";
endif;

LocalSettings.php from MediaWiki

/* Configuration Zend Search Lucene for MediaWiki - Start */
$PslDomainDir = "internal";
$PslPhpExecutionStringUnix = "/opt/lampp/bin/php -c /opt/lampp/etc/php.ini ";
$PslMaintenancePath = "/opt/lampp/htdocs/mediawiki/maintenance/";
$PslXmlPath = "/opt/lampp/zend/sources/".$PslDomainDir."_current.xml";


$wgPslZslAdminUseAutoReIndex = false;
$wgPslZslAdminDefaultEmail = "<your email address>";
$wgPslZslAdminDumpString = $PslPhpExecutionStringUnix.$PslMaintenancePath."dumpBackup.php --current --quiet --uploads > ".$PslXmlPath;
$wgPslZslAdminMediaDir = "/opt/lampp/htdocs/mediawiki/images/";
$wgPslZslAdminReIndexString = $PslPhpExecutionStringUnix."/opt/lampp/zend/PslZendSearchLuceneIndexer.php ".$PslXmlPath." wikidb_".$PslDomainDir." ". $PslMaintenancePath."dumpBackup.php";

require_once( "$IP/extensions/PslZslAdmin/PslZslAdmin.php");

$wgSearchType = 'PslZendSearchLucene';
$wgPslEnableSuggestions = true;//enables suggestions
$wgPslEnableStopWords = false;//enables stop words
$wgPslStopWords = array('aber','als','am','an');
$wgPslImagePath = "http://172.23.101.63/mediawiki/extensions/PslZendSearchLucene/";

$wgPslWikiUrl = "http://172.23.101.63/mediawiki-1.17.0/index.php/";
$wgPslEntriesPerPage = 20;
$wgPslUtf8DecodeResults = false;//utf8-hint for related display issues, (play around with this if needed)
$wgPslIndexDir = "/opt/lampp/zend/psl_search_indexes/wikidb_".$PslDomainDir;
$wgPslZendLibraryDir = "/opt/lampp/zend/library/";
$wgPslEnablePopularSearches = true;//requires table-create rights for MediaWikis db-account
$wgPslPopularSearchesHistory = 365;//data remains 365 days
$wgPslProtectPopularSearches = false;//
$wgPslHighlightColor = "#ff6900";
$wgPslEnabaleDebugMode = false;//debug mode
$wgPslEnableSuggestions = true;//enables suggestions
$wgPslEnableFileSearch = true;//enables file search
$wgPslEnablePsIpTracking = false;//enables ip tracking for geo lacation services etc. (currently not implemented)
$wgPslEnableAnonKey = true;//anonymous key for science
$wgPslHistoryEntries = 30;//history entries per page
$wgPslHistoryMiniStat = true;



$wgHiddenPrefs[] = 'searchlimit';//this entries are disabling no more needed user preferences of the old/default search
$wgHiddenPrefs[] = 'contextlines';
$wgHiddenPrefs[] = 'contextchars';
$wgHiddenPrefs[] = 'disablesuggest';
$wgHiddenPrefs[] = 'searcheverything';
$wgHiddenPrefs[] = 'searchnamespaces';
$wgPslEnableUserInHistory = false;//enhanced knowledge management feature could tackle your country specific law!
require_once( "$IP/extensions/PslZendSearchLucene/PslZendSearchLucene.php");
/* Configuration Zend Search Lucene for MediaWiki - End */

Does anyone have an idea? I have already tried several combinations of paths, but none of them worked.

Greetings, and thanks in advance...

F

141.84.149.10 08:03, 20 October 2011
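For anyone hitting the same symptom: a blank page usually means PHP stopped with a fatal error while display_errors is switched off (in this case the cause turned out to be directory ownership, see SOLVED above). A common way to make the error visible is to add the following standard PHP and MediaWiki settings temporarily to LocalSettings.php, and to remove them again afterwards; on a live wiki, checking the web server's error log is the safer alternative:

```php
// Temporary debugging aid for LocalSettings.php - remove after troubleshooting.
error_reporting( -1 );              // report every PHP error and notice
ini_set( 'display_errors', 1 );     // print errors into the page instead of a blank screen
$wgShowExceptionDetails = true;     // show MediaWiki exception details and backtraces
```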

Enhanced, more admin friendly Instructions (mediawiki 1.17.0)

I finally got Zend Lucene to work on MediaWiki 1.17.0, but there was virtually no documentation to help along the way.

MediaWiki 1.17.0

PHP 5.3.6 (apache2handler)

MySQL 5.2.8-MariaDB-log


Step 1 - Install / Download Zend Framework: "Download Zend Framework. Unpack and copy the contents of the download file to a webserver folder (commonly not below web root). Zend Framework install is done!" You're NOT exactly done. I had to create a directory called 'zend' above web root (here /var/www/zend), where I extracted the contents of the tarball.

Step 2 - Configure Zend Search Lucene for MediaWiki: Download and extract the extensions PslZslAdmin and PslZendSearchLucene to your Wiki(s) extension directory. Move the files PslZendSearchLuceneIndexer.php and PslZendSearchLuceneIndexerConfig.php to a server directory above web root. Edit the marked parts of the file PslZendSearchLuceneIndexerConfig.php as described in it.

This is where it gets tricky. Unfortunately, the config file (PslZendSearchLuceneIndexerConfig.php) lacks the comments an admin needs to configure Zend intuitively in a reasonable amount of time. Here are some suggestions I would give to any admin who wants to install Lucene:

1.) In my case, I didn't have an XML repository for my database dumps, so I created a 'sources' directory, <full path of Zend installation>/sources, where the XML dumps are stored.

2.) Here's how I would have labeled the parameters in the config file:


Instead of:

$wikisArray[0]['xmlSource']         = "/var/www/vhosts/indi.sexyserver4you.de/subdomains/internal/internal_current.xml";

I would've put:

$wikisArray[0]['xmlSource']         = "<full path of xml dumps>/internal_current.xml";

In my case, again, I created a directory specifically designated for these dumps, which was /var/www/zend/sources


Instead of:

$wikisArray[0]['maintenanceScript'] = "/var/www/vhosts/indi.sexyserver4you.de/subdomains/internal/httpdocs/wiki/maintenance/dumpBackup.php";
$wikisArray[0]['mediaDir']          = "/var/www/vhosts/indi.sexyserver4you.de/subdomains/internal/public/";// maybe httpdocs/images/ if img_auth.php not in use


I would've put:

$wikisArray[0]['maintenanceScript'] = "<full path of mediawiki installation>/maintenance/dumpBackup.php";
$wikisArray[0]['mediaDir']          = "<full path of mediawiki installation>/images/";

In my case, <full path of mediawiki installation> = /var/www/html/wiki/mediawiki-1.17.0/


Instead of:

$wikisArray[1]['xmlSource']         = "/var/www/vhosts/indi.sexyserver4you.de/subdomains/sysdoc/sysdoc_current.xml";

I would've put:

$wikisArray[1]['xmlSource']         = "<full path of xml dumps>/sysdoc_current.xml";

In my case, again, I created a directory specifically designated for these dumps, which was /var/www/zend/sources


Instead of

$wikisArray[1]['maintenanceScript'] = "/var/www/vhosts/indi.sexyserver4you.de/subdomains/sysdoc/httpdocs/maintenance/dumpBackup.php";
$wikisArray[1]['mediaDir']          = "/var/www/vhosts/indi.sexyserver4you.de/subdomains/sysdoc/public/";// maybe httpdocs/images/ if img_auth.php not in use


I would've put:

$wikisArray[1]['maintenanceScript'] = "<full path of mediawiki installation>/maintenance/dumpBackup.php";
$wikisArray[1]['mediaDir']          = "<full path of mediawiki installation>/images/";


Instead of:

$PhpExecutionStringUnix         = "/usr/bin/php -c /etc/php5/cli/php.ini";

I would've put:

$PhpExecutionStringUnix         = "/usr/bin/php -c /<location of php.ini file>/php.ini";

In my case <location of php.ini file> = /etc/


Instead of:

$webServerUser              = "www-data";
$webServerUserGroup         = "psaserv";
$zendFrameworkLibraryPath   = "/PSL_ADD_ONS/ZF/library";
$zendLogPath                = "/PSL_ADD_ONS/".$indexDirName."/";
$applicationPath            = "/PSL_ADD_ONS";


I would've put:

$webServerUser              = "<your web server user>";
$webServerUserGroup         = "<your web server group>";
$zendFrameworkLibraryPath   = "/<installation path of Zend>/ZendFramework-1.11.10/library";
$zendLogPath                = "/<installation path of Zend>/".$indexDirName."/";
$applicationPath            = "/<installation path of Zend>/";
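The placeholders above have to line up with LocalSettings.php: the index name follows the pattern wikidb_<domain> (as in $wgPslIndexDir and $wgPslZslAdminReIndexString), and the dump file is <domain>_current.xml (as in $PslXmlPath). A small hypothetical helper that derives one $wikisArray entry from the base paths, so these names cannot drift apart; the example paths are the ones from this post:

```php
<?php
// Hypothetical helper: build one $wikisArray entry from a domain name and base
// paths, using the naming pattern of the examples on this page.
function buildWikiEntry(string $domain, string $xmlDir, string $wikiDir): array
{
    $xmlDir  = rtrim($xmlDir, '/');
    $wikiDir = rtrim($wikiDir, '/');
    return [
        'xmlSource'         => $xmlDir . '/' . $domain . '_current.xml',
        'indexName'         => 'wikidb_' . $domain,
        'maintenanceScript' => $wikiDir . '/maintenance/dumpBackup.php',
        'mediaDir'          => $wikiDir . '/images/',
    ];
}

$wikisArray = [];
$wikisArray[0] = buildWikiEntry('internal', '/var/www/zend/sources', '/var/www/html/wiki/mediawiki-1.17.0/');
```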


3.) Here's how I would have labeled the config parameters for LocalSettings.php:

/* Configuration Zend Search Lucene for MediaWiki - Start */
$PslDomainDir                   = "sysdoc";
$PslPhpExecutionStringUnix      = "/usr/bin/php -c /<full path to php.ini file>/php.ini ";
$PslMaintenancePath             = "/<full path to mediawiki installation>/maintenance/";
$PslXmlPath                     = "/<full path to xml dumps or sources>/".$PslDomainDir."_current.xml";

$wgPslZslAdminUseAutoReIndex    = false;
$wgPslZslAdminDefaultEmail      = "<your email address>";
$wgPslZslAdminDumpString        = $PslPhpExecutionStringUnix.$PslMaintenancePath."dumpBackup.php --current --quiet --uploads > ".$PslXmlPath;
$wgPslZslAdminMediaDir          = "<full path of mediawiki installation or directory where you store uploaded docs>/images/";
$wgPslZslAdminReIndexString     = $PslPhpExecutionStringUnix."/<full path to Zend Installation>/PslZendSearchLuceneIndexer.php ".$PslXmlPath." wikidb_".$PslDomainDir." ".  $PslMaintenancePath."dumpBackup.php"; 

require_once( "$IP/extensions/PslZslAdmin/PslZslAdmin.php");

$wgSearchType                      = 'PslZendSearchLucene';
$wgPslEnableSuggestions            = true;//enables suggestions
$wgPslEnableStopWords              = false;//enables stop words
$wgPslStopWords                    = array('aber','als','am','an');
$wgPslImagePath                    = "http://<wiki domain or ip address of wiki server>/extensions/PslZendSearchLucene/";

$wgPslWikiUrl                      = "http://<wiki url>/mediawiki-1.17.0/index.php/";
$wgPslEntriesPerPage               = 20;
$wgPslUtf8DecodeResults            = false;//utf8-hint for related display issues, (play around with this if needed)
$wgPslIndexDir                     = "/<full path to Zend Installation>/psl_search_indexes/wikidb_".$PslDomainDir;
$wgPslZendLibraryDir               = "/<full path to Zend Installation>/ZendFramework-1.11.10/library/";
$wgPslEnablePopularSearches        = true;//requires table-create rights for MediaWikis db-account
$wgPslPopularSearchesHistory       = 365;//data remains 365 days
$wgPslProtectPopularSearches       = false;//
$wgPslHighlightColor               = "#ff6900";
$wgPslEnabaleDebugMode             = false;//debug mode
$wgPslEnableSuggestions            = true;//enables suggestions
$wgPslEnableFileSearch             = true;//enables file search
$wgPslEnablePsIpTracking           = false;//enables ip tracking for geo lacation services etc. (currently not implemented)
$wgPslEnableAnonKey                = true;//anonymous key for science
$wgPslHistoryEntries               = 30;//history entries per page
$wgPslHistoryMiniStat              = true;

$wgHiddenPrefs[] = 'searchlimit';//this entries are disabling no more needed user preferences of the old/default search
$wgHiddenPrefs[] = 'contextlines';
$wgHiddenPrefs[] = 'contextchars';
$wgHiddenPrefs[] = 'disablesuggest';
$wgHiddenPrefs[] = 'searcheverything';
$wgHiddenPrefs[] = 'searchnamespaces';
$wgPslEnableUserInHistory          = false;//enhanced knowledge management feature could tackle your country specific law!
require_once( "$IP/extensions/PslZendSearchLucene/PslZendSearchLucene.php");
/* Configuration Zend Search Lucene for MediaWiki - End */
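Because $wgPslZslAdminDumpString (and likewise $wgPslZslAdminReIndexString) is built by plain string concatenation, the trailing space inside $PslPhpExecutionStringUnix matters. A quick standalone check, using the example paths from this post (php.ini in /etc/, MediaWiki in /var/www/html/wiki/mediawiki-1.17.0/, dumps in /var/www/zend/sources), is to build the dump string the same way and print it:

```php
<?php
// Same concatenation as in the LocalSettings.php block above, with example paths.
$PslDomainDir              = "sysdoc";
$PslPhpExecutionStringUnix = "/usr/bin/php -c /etc/php.ini "; // note the trailing space
$PslMaintenancePath        = "/var/www/html/wiki/mediawiki-1.17.0/maintenance/";
$PslXmlPath                = "/var/www/zend/sources/" . $PslDomainDir . "_current.xml";

$dumpString = $PslPhpExecutionStringUnix . $PslMaintenancePath
    . "dumpBackup.php --current --quiet --uploads > " . $PslXmlPath;

echo $dumpString, "\n";
```

Without the trailing space, the php.ini path and the maintenance path would run together into one invalid argument.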


4.) Finally, make sure you check your permissions! Your XML dump/source directory must be owned by the web server user and belong to the web server group; in my case, apache:apache. A simple chown -R apache:apache . in the Zend installation directory should do the trick.
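To confirm that the ownership change took effect, a small hypothetical script can report which directories the PHP process cannot read or write. It has to run as the web server user (for example by opening it in the browser, or with sudo -u apache php check.php), otherwise it only tests your own account. The paths are the example directories from this post:

```php
<?php
// Hypothetical diagnostic: list paths the current PHP process cannot use.
function findInaccessiblePaths(array $paths): array
{
    $problems = [];
    foreach ($paths as $path) {
        if (!file_exists($path)) {
            $problems[$path] = 'missing';
        } elseif (!is_readable($path)) {
            $problems[$path] = 'not readable';
        } elseif (!is_writable($path)) {
            $problems[$path] = 'not writable';
        }
    }
    return $problems;
}

$paths = ['/var/www/zend/sources', '/var/www/zend/psl_search_indexes'];
foreach (findInaccessiblePaths($paths) as $path => $problem) {
    echo $path, ': ', $problem, "\n";
}
```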

Hope this helps!

Ucananduwill 19:12, 22 September 2011

Thank you for your contributions. Nice to hear it's running on 1.17. We have had several hundred downloads and fewer than 20 questions about the setup so far, so this wasn't on my to-do list :-).

Steviex2 20:31, 22 September 2011

Thank you for taking the time for these instructions, they really helped me! It's an awesome extension but so confusing to set up at first glance. Now it breaks my template, but that's a minor thing to fix! :)

21:23, 4 October 2011
 
 

Undefined Method User::getOptions()

When I try to open "Special Pages" I get an error:

Fatal error: Call to undefined method User::getOptions() in /users/lenjo/www/mediawiki-1.15.1/extensions/PslZendSearchLucene/PslZendSearchLucene_body.php on line 116

Where is the method defined? How can I fix that?

Edit: I found the class where getOptions is defined. But that does not solve the problem...

213.70.5.57 09:31, 29 August 2011

You may have the wrong MediaWiki version.

Steviex2 02:48, 30 August 2011

Hmm, doesn't it work with 1.15.1? I don't have the possibility to upgrade, since the wiki is not mine...

213.70.5.57 06:41, 30 August 2011

It was never tested against 1.15 and is declared as a 1.16 extension. I will test it against 1.17 soon.

Steviex2 07:34, 30 August 2011
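Since the extension targets MediaWiki 1.16 and the fatal error shows that 1.15.1 has no User::getOptions(), a guard like the following (a hypothetical addition, not part of ZSL) would turn the fatal error into a readable message. $wgVersion is MediaWiki's global version string:

```php
<?php
// Hypothetical guard: ZSL is declared as a MediaWiki 1.16 extension.
function isSupportedMediaWikiVersion(string $version, string $minimum = '1.16'): bool
{
    return version_compare($version, $minimum, '>=');
}

// Possible use in LocalSettings.php, before the require_once of the extension:
// if ( !isSupportedMediaWikiVersion( $wgVersion ) ) {
//     die( 'PslZendSearchLucene needs MediaWiki 1.16 or later.' );
// }
```

Checking method_exists( 'User', 'getOptions' ) instead would test for the missing method directly rather than relying on the version number.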

Oh shoot, I didn't see that. Well, then I have a problem... :-S

213.70.5.57 08:33, 30 August 2011
 
 
 
 

PslZendSearchLuceneIndexerConfig.php problem

The file PslZendSearchLuceneIndexerConfig.php starts with "<?"; it should probably be "<?php", as not all of us have short_open_tag enabled.

204.137.29.243 20:34, 14 July 2011

Invalid argument supplied for foreach() in C:\xampp\htdocs\PslZendSearchLuceneIndexer.php on line 421

What do I need to fix when indexing produces this result? Thanks...

PHP Warning: Invalid argument supplied for foreach() in C:\xampp\htdocs\PslZendSearchLuceneIndexer.php on line 421

Warning: Invalid argument supplied for foreach() in C:\xampp\htdocs\PslZendSearchLuceneIndexer.php on line 421
PHP Warning: Invalid argument supplied for foreach() in C:\xampp\htdocs\PslZendSearchLuceneIndexer.php on line 430

Warning: Invalid argument supplied for foreach() in C:\xampp\htdocs\PslZendSearchLuceneIndexer.php on line 430

Rien Satori 09:27, 21 June 2011

Hi, it seems $domArr['mediawiki']['page'] is empty... this could mean you have no XML data to parse: either your MediaWiki data extraction fails, or your wiki has no pages.

Steviex2 12:22, 21 June 2011

Thank you so much for your reply. However, the database dump was successful, and the internal_current.xml file was created successfully too. All other settings follow the installation guide almost exactly.

As I understand it, the indexer is independent of LocalSettings.php, so it shouldn't matter if I have some misconfiguration there.

Rien Satori 23:32, 21 June 2011

Yes, the indexer is independent of LocalSettings.php.

Steviex2 17:13, 22 June 2011

Thanks for the reply. I found that my wiki was writing the XML dump with a disallowed character at the beginning; after trimming it, indexing worked.
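The thread does not say which character it was. A UTF-8 byte order mark, or stray output echoed before the XML (for example from a PHP file with text outside its <?php tags), is a common culprit, so the following is a sketch under that assumption: strip such leading bytes before parsing, and guard the loop so that an empty dump produces no foreach() warnings. Both function names are hypothetical; $domArr['mediawiki']['page'] is the array path mentioned in the reply above:

```php
<?php
// Hypothetical pre-processing step: remove a UTF-8 BOM and anything else that
// precedes the first '<' of a dumpBackup.php export.
function cleanDumpHead(string $xml): string
{
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        $xml = substr($xml, 3); // drop the byte order mark
    }
    $start = strpos($xml, '<');
    return $start === false ? '' : substr($xml, $start);
}

// Hypothetical guard used before iterating: return the page list, or an empty
// array when the parsed dump contains no pages.
function pagesOrEmpty($domArr): array
{
    if (isset($domArr['mediawiki']['page']) && is_array($domArr['mediawiki']['page'])) {
        return $domArr['mediawiki']['page'];
    }
    return [];
}
```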

I have a multilingual SMW wiki in English and Japanese in a single database, and for searches in Japanese I get an error in the search results:

Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (? or (?- at offset 2 in /my/path/extensions/PslZendSearchLucene/PslZendSearchLucene_body.php on line 1525

Even so, the Zend extension finds the correct page, but:

  • "Text" is not displayed under the search results for Japanese
  • foreign UTF-8 characters (like other languages in English text) are displayed as ?
  • words inside a Japanese sentence are not indexed (as there are no spaces between words in Japanese)

$wgPslUtf8DecodeResults just turns Japanese page names into ????

Regarding the default MediaWiki search:

  • same result as mentioned above
  • foreign UTF-8 characters are displayed correctly in the normal MediaWiki search...
  • words inside a Japanese sentence are not indexed (the default MediaWiki search probably cannot deal with this)

I don't know which of these points are my own misconfiguration and which are real problems; I just wanted to share the overall results of testing by a normal MediaWiki user (not a PHP expert).

Rien Satori 06:24, 24 June 2011

Hi Satori,


I will try to reply from a developer's point of view, for you and for future visitors. As far as I know, Semantic MediaWiki (SMW) is a completely "different piece of software", or a heavily modified MediaWiki. We never tested ZSL for MediaWiki against this branch. There is another comment describing problems with the Japanese language... so we might say ZSL is currently not ready for Japan ;-). But we have seen many downloads from countries all over the world (without any bug reports), and we use it with UTF-8 in German. So we can say it is a stable ZSL release according to the requirements and test scenarios listed on the main page.

Steviex2 16:28, 24 June 2011
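The preg_replace() warning above hints at a likely mechanism, stated here as an assumption: if a Japanese search term has already been turned into question marks (as $wgPslUtf8DecodeResults does to page names) and is then inserted unescaped into a pattern like "(...)", the pattern starts with "(?", which PCRE rejects at offset 2. A minimal, hypothetical highlighter avoids this by escaping the term with preg_quote() and matching in UTF-8 mode (/u); the default colour is the $wgPslHighlightColor value from the configs on this page. It does not solve the separate word-segmentation problem for Japanese:

```php
<?php
// Hypothetical highlighter: wrap each occurrence of $term in $text with a span.
// preg_quote() escapes regex metacharacters such as "?" and "(", and the /u
// modifier makes PCRE treat pattern and subject as UTF-8.
function highlightTerm(string $text, string $term, string $color = '#ff6900'): string
{
    if ($term === '') {
        return $text;
    }
    $pattern = '/(' . preg_quote($term, '/') . ')/iu';
    $result = preg_replace($pattern, '<span style="color:' . $color . '">$1</span>', $text);
    // preg_replace() returns null on error (e.g. invalid UTF-8 in $text).
    return $result === null ? $text : $result;
}
```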
 
 
 
 
 

$wikisArray xmlSource xml file

In the following configuration of PslZendSearchLuceneIndexerConfig.php, how do you determine the path to the xmlSource file? Do you need to create the XML file yourself, or should it be found somewhere? I'm trying to get this installed on Windows and I'm not having the best of luck.

$wikisArray[0]['xmlSource'] = "D:\xampp\htdocs\mediawiki\internal_current.xml";
$wikisArray[0]['indexName'] = "TESTWiki";
$wikisArray[0]['maintenanceScript'] = "D:\xampp\htdocs\mediawiki\maintenance\dumpBackup.php";
$wikisArray[0]['mediaDir']          = "D:\xampp\htdocs\mediawiki\images";// maybe httpdocs/images/

Daddyd205 20:46, 9 June 2011

Hi there,

on Windows you should use double backslashes... see the post from Philipp.

80.187.106.195 14:32, 10 June 2011
 

How to configure PslZendSearchLuceneIndexerConfig.php

Hi,

could you explain to me how the paths in PslZendSearchLuceneIndexerConfig.php must be configured to get the whole thing working? I made the following configuration:

$GuiFlag = 0;

   $wikisArray[0]['xmlSource']         = "D:\xampp\htdocs\mediawiki\internal_current.xml";
   $wikisArray[0]['indexName']         = "TESTWiki";
   $wikisArray[0]['maintenanceScript'] = "D:\xampp\htdocs\mediawiki\maintenance\dumpBackup.php";
   $wikisArray[0]['mediaDir']          = "D:\xampp\htdocs\mediawiki\images";// maybe httpdocs/images/ if img_auth.php not in use

I get the following error, when I try to run the Indexer

C:\>D:\xampp\php\php.exe -f D:\xampp\htdocs\ZendFramework\PslZendSearchLuceneIndexer.php PHP Warning: require_once(Zend/Search/Lucene.php): failed to open stream: No such file or directory in D:\xampp\htdocs\ZendFramework\PslZendSearchLuceneIndexer.php on line 140

Warning: require_once(Zend/Search/Lucene.php): failed to open stream: No such file or directory in D:\xampp\htdocs\ZendFramework\PslZendSearchLuceneIndexer.php on line 140 PHP Fatal error: require_once(): Failed opening required 'Zend/Search/Lucene.php' (include_path='d: mpp\htdocs\ZendFramework\library') in D:\xampp\htdocs\ZendFramework\PslZendSearchLuceneIndexer.php on line 140

Fatal error: require_once(): Failed opening required 'Zend/Search/Lucene.php' (include_path='d: mpp\htdocs\ZendFramework\library') in D:\xampp\htdocs\ZendFramework\PslZendSearchLuceneIndexer.php on line 140

C:\>

Where is my mistake?

Thank you

Philipp

194.172.26.135 09:58, 23 May 2011

Found my mistake:

   $webServerUser              = "";
   $webServerUserGroup         = "";
   $zendFrameworkLibraryPath   = "d:\\xampp\\htdocs\\ZendFramework\\library";
   $zendLogPath                = "d:\xampp\htdocs\\".$indexDirName."\\";
   $applicationPath            = "d:\xampp\htdocs";

The path in $zendFrameworkLibraryPath was wrong; I had to use double backslashes: "d:\\xampp\\htdocs\\ZendFramework\\library".
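The include_path in the earlier error message, 'd: mpp\htdocs\ZendFramework\library', shows why this matters: inside double-quoted PHP strings, "\x" followed by hex digits is an escape sequence, so "\xa" in "d:\xampp" becomes a line feed and the "a" disappears. A single backslash before other letters, such as "\h", happens to survive, which makes the mistake easy to miss. A small demonstration:

```php
<?php
// "\xa" is a hex escape (0x0A, line feed); "\h" is not an escape and stays as-is.
$broken  = "d:\xampp\htdocs";    // d:<line feed>mpp\htdocs
$escaped = "d:\\xampp\\htdocs";  // doubled backslashes give the literal path
$single  = 'd:\xampp\htdocs';    // single quotes do not interpret \x

var_dump($broken === "d:" . chr(10) . "mpp\\htdocs"); // bool(true)
var_dump($escaped === $single);                       // bool(true)
```

Single-quoted strings, or forward slashes (which PHP's file functions accept on Windows), avoid the problem as well.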

I fixed it, but I still get error messages in my command-line window.

194.172.26.135 10:08, 23 May 2011

Found another missing double backslash in my config -> fixed it -> everything OK

Case closed

So long

Philipp

194.172.26.135 10:20, 23 May 2011

You're welcome / Don't mention it :-).

Steviex2 13:48, 23 May 2011
 
 
 

Special:PslZendSearchLucene, problems with display of Japanese/Chinese !

We are using English, Chinese and Japanese page content. While the standard MediaWiki search and display work correctly, Zend Search produces an error message and does not display the results in the correct character encoding.

Warning: Cannot modify header information - headers already sent by (output started at
 ...extensions\PslZendSearchLucene\PslZendSearchLucene_body.php:452)

MWJames 05:46, 24 February 2011

Hi MWJames, it currently supports UTF-8, English and German (positively tested, see main page).

Steviex2 06:18, 24 February 2011
Edited by author.
Last edit: 15:40, 17 May 2011

@Steviex2: I am testing the PslZendSearchLucene extension (version 2.0) with MediaWiki 1.16.4, PHP 5.2.6 and MySQL 5.0.51. The wiki's content is written in German, but unfortunately I also receive the following error message (presumably caused by encoding issues):

Notice: iconv_strlen() [function.iconv-strlen]: Detected an illegal character in input string in {Server-Path}/ZendFramework-1.11.6/library/Zend/Search/Lucene/Search/QueryLexer.php on line 342

Is there any known solution for this problem yet? Regards!

Agoerlt 15:18, 17 May 2011
Edited by 0 users.
Last edit: 15:33, 17 May 2011

Hi,

could you please tell me the extension version you are using and the full error message?

Steviex2 15:33, 17 May 2011

It's version 2.0 of the extension, and apart from "{Server-Path}" that is the full error message (see my post).

Agoerlt 16:01, 17 May 2011
Edited by 2 users.
Last edit: 02:40, 18 May 2011

I'm not really sure, but I believe I read something about it while developing, maybe in connection with the Zend Framework version used or the iconv configuration in php.ini. There are several German wikis using this extension in production, and I have never heard about it again (sorry ;-)). You could also google this issue, as I did while developing; I'm sure there is an answer. It would be nice if you left a note after fixing it.
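The notice means iconv_strlen() received bytes that are not valid in the expected encoding. With German text this typically points to umlauts arriving as ISO-8859-1 instead of UTF-8, though that is an assumption here. A hypothetical helper that normalises a query before it is handed to Zend_Search_Lucene (Zend Framework 1 also offers Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8') for the query parser itself):

```php
<?php
// Hypothetical helper: make sure a search query is valid UTF-8, assuming any
// non-UTF-8 input is ISO-8859-1 (Latin-1).
function ensureUtf8(string $query): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8.
    if (preg_match('//u', $query) === 1) {
        return $query;
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $query);
    return $converted === false ? '' : $converted;
}
```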

c u

Steviex2 16:24, 17 May 2011
 
 
 
 
 

Search in Microsoft Office documents ( doc, docx, xls, xlsx etc. ) and pdf.

When will it be possible to search inside documents (all Office formats and PDF)? Any idea of the release date? An additional question: when we upload a new document or create a new article, is the index updated automatically, or do we need to run a complete indexing job?

84.37.20.42 09:54, 8 March 2011

There is currently no timeline for implementing the new features listed as to-dos on the extension main page, as long as I don't receive urgent commercial development assignments. I will do this for sure, but please consider that every single file type needs a serious programming effort in search-engine land. Every search engine needs reindexing. You can do it incrementally or in full, and the implementation can vary. As mentioned, a common way to do this is to trigger the indexer script with a cron job. In theory, it could also happen after every edit (maybe a little bit crazy).

UPDATE: All these features are implemented in the upcoming release (see the announcements on the main page).

Steviex2 14:38, 8 March 2011
 

PslZendSearchLuceneIndexer.php, Full update vs. Incremental update?

At the moment PslZendSearchLuceneIndexer.php always runs a full index update, which consumes a lot of system resources and takes a long time to finish. Is there a way to run PslZendSearchLuceneIndexer.php in an incremental update mode, so that incremental updates can be scheduled regularly and full updates only on special occasions?

MWJames 03:00, 24 February 2011

I noticed in my environment that an incremental update takes more time than a full update; however, you can set "private $incrementUpdate" to true.

Steviex2 06:35, 24 February 2011

An update is coming with an easy-to-edit config file for the Lucene indexer. It also provides an admin UI for manual reindexing in full and incremental mode, and a config variable called $wgPslZslAdminUseAutoReIndex, which triggers reindexing on article save events. Count on it... coming within the next few days :-).

Steviex2 04:44, 1 April 2011
 
 

How to install it on a localhost of windows

I am new to MediaWiki. I use MediaWiki on my computer (Windows operating system; PHP is in C:\xampp\, the wiki in D:\www\htdocs\, and SQL in D:\www\mysql\). It works very well, and now I want a search engine. I think this extension is very good. During the installation I have the following questions:

1 In the first step, I downloaded the Zend Framework and put the contents in \htdocs\psl-suche\. Is that right? Are there any paths to be added or changed?

2 In the second step, I moved the file PslZendSearchLuceneIndexer.php to \htdocs\psl-suche\. How do I "Edit the marked parts of this file as described in it"?

Thanks a lot!

Simonlsw 13:43, 16 March 2011

Hi Simonlsw,

1. You can put the Zend Framework wherever you want, as long as you point to it in LocalSettings.php and the directory is accessible to PHP. For a first success "\htdocs\psl-suche\" is a good idea, but keep in mind that you can also put it above web root or use a directory name other than "psl-suche".

2. Open the file "PslZendSearchLuceneIndexer.php" in your preferred IDE or a simple text editor and follow the instructions. Remember to run this file to produce a searchable Lucene index.

Steviex2 18:03, 16 March 2011
 

Special:PslZendSearchLucene, the url api and non-display of redirects?

We noticed that using PslZendSearchLucene as the standard search ($wgSearchType = 'PslZendSearchLucene') comes with a large performance drop compared with the standard MediaWiki search: the same search term takes under 2 seconds with the MediaWiki search and over 1 minute with Zend Search. We would therefore consider Zend Search as an additional search option. We found that the special page can be used through URL parameters:

{{fullurl:Special:PslZendSearchLucene|query= " search term" &PslSearchMode=2}}

Is there a possibility to add an option so that redirects are not shown in the results, similar to the search options (PslSearchMode=1, 2 or 3)?
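For linking to the special page from outside wikitext, the same request can be built in PHP. A sketch with a hypothetical base URL and helper name; the parameter names query and PslSearchMode are the ones from the fullurl example above, and http_build_query() takes care of encoding the space and the colon in the page title:

```php
<?php
// Hypothetical helper: build a link to Special:PslZendSearchLucene.
function buildZslSearchUrl(string $indexPhpUrl, string $query, int $mode): string
{
    return $indexPhpUrl . '?' . http_build_query([
        'title'         => 'Special:PslZendSearchLucene',
        'query'         => $query,
        'PslSearchMode' => $mode,
    ]);
}

echo buildZslSearchUrl('http://wiki.example.org/index.php', 'search term', 2), "\n";
```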

MWJames 05:33, 24 February 2011

I wrote this plugin mainly for a customer. There we have a server with many different MediaWiki instances, and every wiki has about 4000 Lucene documents. So far we have noticed no performance issues. And yes, there is always a possibility to add more options; it's OOP and open source :-). I have some other to-dos first (see main page).

Steviex2 06:28, 24 February 2011
 

PslZendSearchLuceneIndexer.php and Windows

Under Windows we had to write directory paths with double backslashes "\\"; otherwise an error message appeared.

$wikisArray[0]['maintenanceScript'] = "...\\maintenance\\dumpBackup.php";
...
private $PhpExecutionStringWindows  = "...\\php\\php.exe";
MWJames 02:01, 24 February 2011

Sure, I wrote this under Windows and deployed it on Unix, so the examples are obviously mixed :-).

Steviex2 06:39, 24 February 2011
 