Extension talk:SolrStore
Contents
| Thread title | Replies | Last modified |
|---|---|---|
| summary line wrong | 5 | 21:00, 17 May 2013 |
| Multicore support | 3 | 16:55, 26 February 2013 |
| Why Tomcat dependency? | 2 | 23:03, 21 February 2013 |
| Error with undefined method SolrConnectorStore::getConceptCacheStatus | 2 | 09:50, 20 April 2012 |
| a few problems | 2 | 09:49, 20 April 2012 |
| Unexpected XML tag doc/p | 14 | 09:48, 20 April 2012 |
| Probe connectivity to the Solr host | 3 | 07:06, 13 April 2012 |
| Distorted Vector skin while using Special:SolrSearch | 4 | 06:43, 13 April 2012 |
| Error on attribute value & #13; and & lt; | 2 | 14:32, 10 April 2012 |
| Error adding field | 4 | 14:30, 10 April 2012 |
| ERROR: multiple values encountered for non multiValued field | 2 | 14:29, 10 April 2012 |
| Missing Has subobject | 1 | 10:55, 8 April 2012 |
| Article full text vs. property attribution text | 1 | 10:07, 8 April 2012 |
hi,
We're now trying SolrStore with MW 1.19/Solr 4.1/SMW 1.8. It all seems to work, but the summary line is wrong:
Relevance: 27.0% - 2 KB (19 words) - 22:28, 16 May 2013
The search word is in the title, so relevance should be higher? The article is more than 19 words (not sure about KB size), and the date is incorrect since the article was last modified on the 15th. Is there a known fix?
thanks!
Hi David, nice to hear that it's working with Solr 4.1; we haven't tested it yet. The relevance is a bit tricky, because Solr generates a score for each result based on TF-IDF. Normally you cannot convert a TF-IDF score cleanly into a percentage, but the default MediaWiki search form wants a relevance in percent. We often get relevance values far over 100%, so please do not take it as accurate.
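One way to get a percentage that never exceeds 100% is to scale every score against the best hit in the same result list. This is only a sketch of that idea, not what SolrStore does; the function name scoresToPercent is made up for illustration, and the resulting number only ranks hits within one query, it is not comparable across queries.

```php
<?php
// Hypothetical helper, not part of SolrStore: scale raw Solr scores so that
// the best hit in the current result list is shown as 100%.
function scoresToPercent(array $scores): array {
    if ($scores === []) {
        return [];
    }
    $max = max($scores);
    if ($max <= 0) {
        // No usable scores: report 0% for every hit.
        return array_fill_keys(array_keys($scores), 0.0);
    }
    $percent = [];
    foreach ($scores as $key => $score) {
        // Keep the original keys so results can be matched back to documents.
        $percent[$key] = round($score / $max * 100, 1);
    }
    return $percent;
}
```

The sort order is unchanged by this scaling, since every score is divided by the same positive number.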
For the last modified date you have to do 2 things:
- Look at your Solr search result XML and find the actual field name of your modification date. The problem here is that it depends on the language you are using. In an English wiki it should be "Modification date_dt", in German it's "Zuletzt geändert_dt".
- Go into SolrStore/templates/SolrSearchTemplate_Standart.php line 81 and change it from:
if ( $docd[ 'name' ] == 'Zuletzt geändert_dt'){
to your language. If it's English:
if ( $docd[ 'name' ] == 'Modification date_dt'){
EDIT: I just uploaded a fix for the English language to SourceForge: http://sourceforge.net/projects/smwsolrstore/files/SolrStore_0.8.1.zip/download
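To avoid editing the template for every wiki language, the check could accept a list of known field names. This is a sketch only: the function name findModificationDate and the plain-array shape of $fields (with 'name' and 'value' keys) are assumptions for illustration and do not match the template's actual data structure.

```php
<?php
// Hypothetical helper: return the value of the first field whose name is one
// of the known localized "modification date" names, or null if none match.
// The array shape (keys 'name' and 'value') is an assumption for this sketch.
function findModificationDate(
    array $fields,
    array $candidates = ['Modification date_dt', 'Zuletzt geändert_dt']
): ?string {
    foreach ($fields as $field) {
        if (in_array($field['name'], $candidates, true)) {
            return $field['value'];
        }
    }
    return null;
}
```

Other languages would be added to $candidates using whatever field name shows up in your own Solr result XML.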
If you have any other problems, just ask.
Heiya Simon,
it would be nice to have the commit for the new version in Gerrit as well. That way all the translation updates would move into this version, too.
Cheers
Hi,
Thanks for the translation fix, the date is correct now. However, the file size is still incorrect. For example, it shows "212 B (16 words)" for a page that is 752 words, 4534 bytes. As it's different for each entry I presume it's not a translation problem.
How should the "relevance" score be interpreted? Is the sorting correct? I don't want to put something in front of the users that's confusing.
Hi, the score is a correct TF-IDF score: the higher the score, the better, and the sorting is also correct.
I'll look into this bytes/words problem. I currently have no idea where the problem is, but I'll answer you as soon as possible.
One thing you should know about the extension is that we currently don't support searching in selected namespaces. You can only search in all namespaces, but you can exclude some namespaces in your LocalSettings.php with the parameter $wgSolrOmitNS.
The default is:
$wgSolrOmitNS = array( '102' );
You should hide your advanced search options so nobody gets confused. The CSS for that is:
.mw-search-formheader div.search-types, #mw-searchoptions { display: none; }
Thanks very much for your diligence! Let me know if I can help.
What do you think of specifying the name of a solr core that is to be updated or queried?
What do you actually mean?
Defining one url for updating and another for querying?
Or do you just want to specify the Solr core that should be used for both?
cheers,
I realize it would be an extension of SMW, but the thought is to accommodate multiple solr cores. For instance {{#core-ask: |core=name}} and {{#core-set:name|prop=val}}. just a thought! - john
When we started with the extension, we tried to do ask queries with Solr, but we had too much trouble re-implementing the result printer. The SolrStore is currently a better version of Extension:MWSearch. If you have good knowledge of the SMW code, you could implement this feature. I will help everybody who is interested in developing new features for the extension, just PM me.
Why does SolrStore state a dependency on Tomcat? That's only one of various Servlet Containers supported by Solr itself, including Glassfish, JBoss, Jetty (default, included into Solr package), Resin, Weblogic and WebSphere. thanks
Hi Hypergrove, you are absolutely right, you can use whatever you want.
Cheers,
Sorry, I haven't had much time to look at it, but the following keeps turning up while using concepts. SMW_SQLStore2 defines a method called getConceptCacheStatus, but somehow this method is not present in the SolrConnectorStore class, which extends SMWStore.
Fatal error: Call to undefined method SolrConnectorStore::getConceptCacheStatus() in
Thanks for reporting the bug. To fix it, you have to add the following code to your SolrConnectorStore.php at line 39:
/**
* Return status of the concept cache for the given concept as an array
* with key 'status' ('empty': not cached, 'full': cached, 'no': not
* cachable). If status is not 'no', the array also contains keys 'size'
* (query size), 'depth' (query depth), 'features' (query features). If
* status is 'full', the array also contains keys 'date' (timestamp of
* cache), 'count' (number of results in cache).
*
* @param $concept Title or SMWWikiPageValue
*/
public function getConceptCacheStatus( $concept ) {
return self::getBaseStore()->getConceptCacheStatus( $concept );
}
Fixed in SVN now.
great extension! hope to see it 100% soon. Running the 'trunk' version I encountered these:
- fieldset name won't accept spaces
- search form and results are messed up in Vector skin (fixed with /table from below)
- Prompted to 'Create the page [fieldset name] on this wiki!'
- trying to run refreshdata from semantic mediawiki/maintenance:
Warning: DOMDocument::loadHTML(): Unexpected end tag : p in Entity, line: 8 in /home/vid/webs/atip/docs/mediawiki-1.18.1/extensions/SolrStore/SolrTalker.php on line 258
Catchable fatal error: Argument 1 passed to DOMDocument::saveXML() must be an instance of DOMNode, null given, called in /home/vid/webs/atip/docs/mediawiki-1.18.1/extensions/SolrStore/SolrTalker.php on line 209 and defined in /home/vid/webs/atip/docs/mediawiki-1.18.1/extensions/SolrStore/SolrTalker.php on line 261
Hi David, thanks for reporting your problems.
But I'm a bit confused about some of them.
fieldset name won't accept spaces
This should accept spaces; have a look at http://sofis.gesis.org/sofiswiki/Spezial:SolrSearch/Projekte
This is the fieldset definition from our wiki. Maybe there is an error in your definition?
$wgSolrFields = array(
new SolrSearchFieldSet('Projekte', 'Titel; Personen; id', 'Titel; Personen und Authoren; SOFIS-Nr. (Erfassungs-Nr.)', ' AND category:Projekte', 'AND'),
new SolrSearchFieldSet('Institutionen', 'name; Inst-ID;ort', 'Name; Institutions-Nr.;Ort', ' AND category:Institution', 'AND')
);
search form and results are messed up in Vector skin (fixed with /table from below)
This should be fixed now in the SVN. Sometimes my friend User:Schuellersa gets a bit confused by his SolrStore versions and commits the wrong one (this is the third time we have fixed it now :)
Prompted to 'Create the page [fieldset name] on this wiki!'
This is really new to me, could you please post your $wgSolrFields definition?
trying to run refreshdata from semantic mediawiki/maintenance
We are working on that. Could you try the newest SVN version? This seems to be the same error as Extension_talk:SolrStore#Unexpected_XML_tag_doc
Hi David, the refreshdata error should now be fixed in the newest SVN version, please test it.
During a runJob exercise another error occurred. The backtrace says neither which document caused the error nor which XML tag; anyway, please find the backtrace below.
The request sent by the client was syntactically incorrect (unexpected XML tag doc/p).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
This looks like the same error as "Error on attribute value & #13; and & lt;".
There seems to be a tag or something similar in your value. I will have to fix that on Tuesday, when I'm back at work.
If you want to fix it yourself, have a look at SolrDoc.php.
In line 26 is the function addField( $name, $value ). You have to add some string replacements for the field value that remove '<' and '>'; this should probably fix the errors.
But I have to build a better solution for cleaning the values.
For testing purposes, I just did a quick hack so that at least runJobs doesn't break any more.
$value = preg_replace('/<|>/msu', '',$value);
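Stripping '<' and '>' loses data, so an alternative is to escape the value before it is placed into the XML. The following is a sketch under that assumption, not the fix that went into SVN; escapeFieldValue is a made-up name, and where it would be called inside addField() is left open.

```php
<?php
// Hypothetical helper, not SolrStore code: escape a field value for use in
// XML element content or attribute values instead of deleting '<' and '>'.
function escapeFieldValue(string $value): string {
    // Drop control characters that XML 1.0 does not allow (tab, LF, CR stay).
    $value = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $value);
    // ENT_XML1 restricts the output to the five predefined XML entities;
    // ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning ''.
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');
}
```

With this approach a value such as a <b> tag is kept in the index as text, and Solr returns it unescaped in the search result.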
fixed. rev. 114821
Hi James, we should have fixed this error now in the newest SVN version, could you please test it.
When suddenly the Solr host is not available, all article saving goes south. The interface should somehow check if it is able to connect to the Solr host otherwise bail-out.
couldn't connect to host
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(211): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
I love you for testing our extension; we are going to fix this somehow. I could think of retrying to send it to Solr up to 5 times, but after that an error will be thrown.
The bigger problem is that the SMW indexer has to stop until Solr is ready again. I have no idea how to tell it to stop.
You can always re-index your wiki by using the "Repair" button under Spezial:SMW-Administration, but that's no solution for the problem.
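The retry idea could look roughly like the sketch below. It is not SolrStore code: sendWithRetry is a made-up name, the callable stands in for whatever actually sends the document to Solr, and the sketch assumes a failed send throws an Exception.

```php
<?php
// Hypothetical helper, not part of SolrStore: call $send up to $maxAttempts
// times and rethrow the last error if every attempt fails.
function sendWithRetry(callable $send, int $maxAttempts = 5, int $delaySeconds = 1) {
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }
    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $send();
        } catch (Exception $e) {
            $lastError = $e;
            // Wait before the next attempt, but not after the final one.
            if ($attempt < $maxAttempts && $delaySeconds > 0) {
                sleep($delaySeconds);
            }
        }
    }
    throw $lastError;
}
```

Note that retrying inside a page save delays the save by up to the total waiting time, so a short delay or a deferred job would be the safer choice.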
Actually, in the case above Solr was not available because the server was restarted. I'm not sure about the inner workings of Solr, but there must certainly be a method to check whether Solr is ready to receive index values and, if it is not, return true for the hook and mark the document as non-indexed.
Normally, any indexing service would have a status table in which one can track the current status of the documents. I'm sure you don't want to introduce any special handling or create an additional status table, so you could instead track the status by creating a meta-subobject (with a special property) that is annotated to an entity (page) whenever the status comes back as anything other than successful. Then one can either run an #ask query to find those subobjects, or a special status page can pick them up, display them and allow a mass re-index, because running Special:SMWAdmin is not always the best option (in our case we have around 1.1M triples, which makes every Special:SMWAdmin run very costly).
I'll have to think about it, I'll find a nice solution
We have the same problem with re-indexing; it takes us 1-2 days for a full rebuild. This is why we restart Solr only when we have changed our schema, because after most schema changes you have to re-index to have all properties indexed the right way.
A side tip: create your own Solr schema for your wiki for better query results. You can add stemmers, tokenizers and much more for your data types, or copyFields, where you can merge two fields into one. Most of these things are only interesting if you use the field-based search.
A distorted Vector skin appeared while testing SolrStore 0.6 Beta (r114795) with Special:SolrSearch. After selecting a search set in SolrSearch, the whole Vector skin and sidebar became repositioned and distorted. The reason might lie in some Special:SolrSearch fieldsets or divs that are not closed and so cause the repositioning.
In the file SolrSpecialSearch.php, change line 562 to:
$out .= '</table>';
fixed. rev. 114822
Not sure why but with r114866 <table> was introduced again, I had to change it back to </table>.
Hi James, my colleague Schuellersa sometimes gets confused with his different versions of our extension. It's now the third time we have had to re-fix this error :-(
Thanks for the quick response, and sorry, this time we ran into a problem involving & #13; and & lt;.
The request sent by the client was syntactically incorrect (Unexpected '<' in attribute value
at [row,col {unknown-source}]: [4,212]).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
#7 [internal function]: SMWParseData::onLinksUpdateConstructed(Object(LinksUpdate))
#8 D:\xampp\htdocs\...\includes\Hooks.php(216): call_user_func_array('SMWParseData::o...', Array)
#9 D:\xampp\htdocs\...\includes\GlobalFunctions.php(3631): Hooks::run('LinksUpdateCons...', Array)
#10 D:\xampp\htdocs\...\includes\LinksUpdate.php(98): wfRunHooks('LinksUpdateCons...', Array)
#11 D:\xampp\htdocs\...\includes\job\RefreshLinksJob.php(119): LinksUpdate->__construct(Object(Title), Object(ParserOutput), false)
#12 D:\xampp\htdocs\...\maintenance\runJobs.php(78): RefreshLinksJob2->run()
#13 D:\xampp\htdocs\...\maintenance\doMaintenance.php(105): RunJobs->execute()
#14 D:\xampp\htdocs\...\maintenance\runJobs.php(108): require_once('D:\xampp\htdocs...')
#15 {main}
OK, this looks like a bigger problem.
We send the property values as XML to Solr, so '<' and '>' break the XML syntax.
Next week we will try to fix that error.
fixed rev. 114821
Sorry to drag this out, but we just had another issue. An article had been stored before without a particular property (in the case below, File size is a special property). When the article is saved again, this property is annotated and SolrStore causes the following error.
The request sent by the client was syntactically incorrect
(ERROR: [doc=827826dc-2fc4-4615-ba7f-b89ba1f14480.pdf]
Error adding field 'File size_i'='4.699').
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\aris\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
Hi, this is another schema.xml error.
We have defined all semantic fields of the type Number as integer, and you have a float value.
You have to change line 513 in your schema.xml from int to float:
<dynamicField name="*_i" type="float" indexed="true" stored="true" multiValued="true"/>
Thanks, that did the trick. To complete the settings, one should change the lines below as well.
<dynamicField name="*_imin" type="float" indexed="true" />
<dynamicField name="*_imax" type="float" indexed="true" />
Fixed rev. 114820
While testing SolrStore 0.6 Beta (r114795), the system stopped with a fatal dump, and since we couldn't find a SolrStore Bugzilla component, we are posting our findings here.
Our hypothesis is that whenever the property Categories is assigned more than one category value, e.g. [Pages with broken file links, Book], a dump such as the one below is created, whereas with no value assigned to the property Categories (category) no error dump was created.
The request sent by the client was syntactically incorrect
(ERROR: [doc=Porter/1986/Competition in Global Industries]
multiple values encountered for non multiValued field category:
[Pages with broken file links, Book]).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
#7 [internal function]: SMWParseData::onLinksUpdateConstructed(Object(LinksUpdate))
#8 D:\xampp\htdocs\...\includes\Hooks.php(216): call_user_func_array('SMWParseData::o...', Array)
#9 D:\xampp\htdocs\...\includes\GlobalFunctions.php(3631): Hooks::run('LinksUpdateCons...', Array)
#10 D:\xampp\htdocs\...\includes\LinksUpdate.php(98): wfRunHooks('LinksUpdateCons...', Array)
#11 D:\xampp\htdocs\...\includes\WikiPage.php(2021): LinksUpdate->__construct(Object(Title), Object(ParserOutput))
#12 D:\xampp\htdocs\...\includes\WikiPage.php(1200): WikiPage->doEditUpdates(Object(Revision), Object(User), Array)
#13 [internal function]: WikiPage->doEdit('{{Book?|title=C...', '', 98)
#14 D:\xampp\htdocs\...\includes\Article.php(1934): call_user_func_array(Array, Array)
#15 D:\xampp\htdocs\...\includes\EditPage.php(1214): Article->__call('doEdit', Array)
#16 D:\xampp\htdocs\...\includes\EditPage.php(1214): Article->doEdit('{{Book?|title=C...', '', 98)
#17 D:\xampp\htdocs\...\includes\EditPage.php(2855): EditPage->internalAttemptSave(Array, false)
#18 D:\xampp\htdocs\...\includes\EditPage.php(478): EditPage->attemptSave()
#19 D:\xampp\htdocs\...\includes\EditPage.php(353): EditPage->edit()
#20 D:\xampp\htdocs\...\includes\Wiki.php(501): EditPage->submit()
#21 D:\xampp\htdocs\...\includes\Wiki.php(241): MediaWiki->performAction(Object(Article))
#22 D:\xampp\htdocs\...\includes\Wiki.php(626): MediaWiki->performRequest()
#23 D:\xampp\htdocs\...\includes\Wiki.php(533): MediaWiki->main()
#24 D:\xampp\htdocs\...\index.php(57): MediaWiki->run()
#25 {main}
Thanks for reporting this bug. I'll have a closer look at this error on Tuesday, when I'm back at work.
This error normally occurs when Solr tries to insert multiple values into a field that is not defined as multiValued="true".
You have to change line 448 in your Solr schema.xml to
<field name="category" type="text" indexed="true" stored="true" multiValued="true"/>
Fixed rev. 114820
While using the Solr admin interface to inspect the result XML output, I noticed that subobjects haven't been indexed. The result XML shows an empty entry (see below) for all results, even though Has subobject has been automatically assigned values.
<str name="subobjectname"/>
We have mapped the property types we know of, but not all of them.
You can map them easily in SolrTalker.php from line 340 onwards. There is a switch case for each property type, but not all of them are filled with code.
the code would be something like:
$solritem->addField( $propertyName . '_i', $di->getNumber() );
The "_i" is the dynamic base, which defines the data type in the schema.xml. Have a look at the upper part of the schema.xml, where all kinds of data types are defined. The mapping of the dynamic base to the data type is in the lower part (where you changed int to float).
You don't need to add sort fields; we don't actually use them at the moment.
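The naming rule behind this can be sketched as a small lookup from property type to field suffix. Only "_i" and "_dt" appear in this discussion; the "_s" and "_t" suffixes, the type keys and the function name solrFieldName are assumptions for illustration and have to match whatever dynamicField entries your own schema.xml defines.

```php
<?php
// Hypothetical lookup, not SolrStore code. Only '_i' and '_dt' appear in this
// thread; '_s' and '_t' are assumed suffixes that your schema.xml must define.
function solrFieldName(string $propertyName, string $type): string {
    $suffixes = [
        'number' => '_i',
        'date'   => '_dt',
        'string' => '_s',
        'text'   => '_t',
    ];
    if (!isset($suffixes[$type])) {
        // Fail loudly instead of silently indexing under a wrong data type.
        throw new InvalidArgumentException("No suffix mapped for type '$type'");
    }
    return $propertyName . $suffixes[$type];
}
```

The returned name is what would be passed as the first argument to addField() in the switch case for that property type.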
It might seem like a dull question, but I'll try to elaborate on it anyway. While SolrStore is quick to index all property-related fields (numeric, blob etc.), I was wondering how to customize SolrStore so that it would index the article text as well. When I looked at the Solr index files, I could see that the article full text had not been indexed at all, and glancing at http://www.gesis.org/sofiswiki/ I could see that larger textual information was stored within a property called Inhalt de. How would one set up SolrStore so that it also indexes the article full text, since not all text information can or should be stored in a text property?
Cheers
We extend SMWStore with the class SolrConnectorStore.
It does nothing other than use the default SMW store and send each attribute to Solr. To get all the text into Solr you need it in the SMW store, and I don't know how :-( The easiest way would be adding a semantic property for it. In our wiki we use so many templates that we didn't want to get this stuff indexed too, so we built this extension to avoid the problem. All attributes of a page get merged by Solr into a single field called text; this field is used for the "everywhere search".
If you want to add some data, like the non-indexed text, to an existing Solr page, you have to create a new SolrDoc with the page name as ID and just add the fields you want to add to that ID. Solr should merge the data for the ID, if I remember it right.