Topic on Extension talk:SolrStore

Unexpected XML tag doc/p

MWJames (talkcontribs)

During a runJobs run another error occurred. The backtrace does not say which document caused the error or which XML tag was involved; please find the backtrace below.

The request sent by the client was syntactically incorrect (unexpected XML tag doc/p).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
SBachenberg (talkcontribs)

This looks like the same error as "Error on attribute value &#13; and &lt;".

There seems to be a <p> tag or something similar in your value. I will fix that on Tuesday, when I'm back at work.

SBachenberg (talkcontribs)

If you want to fix it yourself, have a look at SolrDoc.php.

Line 26 contains the function addField( $name, $value ). You have to add some string replacements for the field value that remove '<' and '>'; this should probably fix the errors.

But I have to build a better solution for cleaning the values.

MWJames (talkcontribs)

For testing purposes, I just did a quick hack so that at least runJobs doesn't break any more.

$value = preg_replace('/<|>/msu', '',$value);
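
A minimal sketch of an alternative to deleting the characters: escaping them keeps the XML well-formed and preserves the original text. The function name buildField is hypothetical and not part of SolrStore; htmlspecialchars and its flags are PHP built-ins.

```php
<?php
// Hypothetical helper, not part of SolrStore: builds one Solr <field> element
// with name and value escaped for XML instead of having "<" and ">" removed.
function buildField( string $name, string $value ): string {
	// ENT_XML1 selects XML entity rules, ENT_QUOTES escapes both quote styles,
	// ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning an empty string.
	$flags = ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE;
	return '<field name="' . htmlspecialchars( $name, $flags, 'UTF-8' ) . '">'
		. htmlspecialchars( $value, $flags, 'UTF-8' ) . '</field>';
}

echo buildField( 'Abstract', '<p>Fish & chips</p>' ), "\n";
```

With this approach a stray <p> in a value reaches Solr as text rather than as an unexpected child element of doc.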
Schuellersa (talkcontribs)
MWJames (talkcontribs)

Instead of using preg_replace, I now use MW's own XML sanitizer (Sanitizer::normalizeCharReferences), which should make any name/value pair XML-conformant.

$this->output .= '<field name="' .  Sanitizer::normalizeCharReferences ( $name ) . '">' . Sanitizer::normalizeCharReferences ( $value ) . '</field>';
SBachenberg (talkcontribs)

This looks really cool. I didn't know that MW has its own Sanitizer, thank you!

We will add this.

MWJames (talkcontribs)

Having said this, everything should be covered, but somehow Solr still comes back with an error, which means there must be another area where stray XML tags cause a crash.

I have a hypothesis: when a property, for example Abstract (has type::text), contains not only text but also a template call ({{value| ...}}), a crash dump is created while trying to save the article. When I change {{value| ...}} to [[value:: ...]] in the property value text, the same article saves without any trouble.

Schuellersa (talkcontribs)
SBachenberg (talkcontribs)

The best thing would be to remove all HTML tags completely before sending the values to Solr. I think nobody wants to query HTML tags, so you don't need them in your Solr index.

Could you try this piece of code for me?

26 	public function addField( $name, $value ) {
27 	$this->output .= '<field name="' . strip_tags ( $name ) . '">' . strip_tags ( $value ) . '</field>';
28 	}
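
One caveat with strip_tags alone is that it removes tags but leaves characters such as "&" untouched, so a value can still be invalid XML afterwards. A minimal sketch, assuming it is acceptable to escape whatever remains after stripping; cleanFieldValue is a hypothetical name, not an existing SolrStore function.

```php
<?php
// Hypothetical helper, not SolrStore's code: strip markup first,
// then escape whatever is left so the result is safe inside an XML element.
function cleanFieldValue( string $value ): string {
	// Remove HTML/XML tags such as <p> or <span class="...">.
	$text = strip_tags( $value );
	// strip_tags() does not touch "&", so escape the remaining text for XML.
	return htmlspecialchars( $text, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8' );
}

echo cleanFieldValue( '<span class="smw">Fish & chips</span>' ), "\n";
```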
MWJames (talkcontribs)

For the case cited above of {{ }} within property values, the change didn't bring any success; it still runs into a backtrace. Could there be another area where XML fragments are created?

SBachenberg (talkcontribs)

Hi James, this is the only place where we create XML before we send it to Solr.

Maybe you could send me another backtrace?

The normal way to parse the SMW data is:

  1. Read the attributes and values
  2. Add them to a SolrDoc
  3. Send the SolrDoc to Solr with the SolrTalker
  4. Done
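
The steps above can be sketched roughly as follows. SketchSolrDoc is a simplified, hypothetical stand-in written for illustration only; the real SolrDoc and SolrTalker classes differ, and the escaping shown is an assumption rather than what SolrStore does. The <add><doc><field ...> envelope matches the truncated payload visible in the backtrace.

```php
<?php
// Simplified stand-in for illustration only; not the real SolrDoc class.
class SketchSolrDoc {
	private $output = '';

	// Step 2: append one escaped <field> element per attribute/value pair.
	public function addField( string $name, string $value ): void {
		$flags = ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE;
		$this->output .= '<field name="' . htmlspecialchars( $name, $flags, 'UTF-8' ) . '">'
			. htmlspecialchars( $value, $flags, 'UTF-8' ) . '</field>';
	}

	// Wrap the collected fields in the <add><doc> envelope of a Solr update.
	public function toXml(): string {
		return '<add><doc>' . $this->output . '</doc></add>';
	}
}

// Step 1: attribute/value pairs as they might come out of the semantic data
// (made-up example values).
$properties = [ 'title' => 'Example page', 'Abstract' => 'Some <b>bold</b> text' ];

$doc = new SketchSolrDoc();
foreach ( $properties as $name => $value ) {
	$doc->addField( $name, $value );
}

// Step 3: the real code would send this string to Solr; here it is only printed.
echo $doc->toXml(), "\n";
```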
MWJames (talkcontribs)

As I said before, the backtrace is ambiguous, so one can't really tell where, when, and how things go wrong.

I also tried logging any other messages via $wgDebugLogFile, but the log file shows nothing related to this problem.

Maybe some wfDebugLog( 'SolrStore', __METHOD__,... log messages could help shed light on where the problems occur. Using this message type would make it possible to filter all Solr-related messages using $wgDebugLogGroups = ...
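
A rough fragment of what that could look like. The log file path, the placement of the call, and the $title variable are assumptions for illustration; wfDebugLog and $wgDebugLogGroups are the MediaWiki facilities mentioned above. This only runs inside MediaWiki.

```php
// In LocalSettings.php: route the 'SolrStore' log group to its own file.
// The path is a placeholder.
$wgDebugLogGroups['SolrStore'] = '/path/to/solrstore.log';

// In the extension, for example just before a document is sent to Solr
// (placement and $title are hypothetical):
wfDebugLog( 'SolrStore', __METHOD__ . ': sending document for ' . $title );
```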

Anyway, the latest backtrace is based on SVN r114866.

The request sent by the client was syntactically incorrect (unexpected XML tag doc/span).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(211): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(281): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(441): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
#7 [internal function]: SMWParseData::onLinksUpdateConstructed(Object(LinksUpdate))
#8 D:\xampp\htdocs\...\includes\Hooks.php(216): call_user_func_array('SMWParseData::o...', Array)
#9 D:\xampp\htdocs\...\includes\GlobalFunctions.php(3631): Hooks::run('LinksUpdateCons...', Array)
#10 D:\xampp\htdocs\...\includes\LinksUpdate.php(98): wfRunHooks('LinksUpdateCons...', Array)
#11 D:\xampp\htdocs\...\includes\job\RefreshLinksJob.php(49): LinksUpdate->__construct(Object(Title), Object(ParserOutput), false)
#12 D:\xampp\htdocs\...\maintenance\runJobs.php(78): RefreshLinksJob->run()
#13 D:\xampp\htdocs\...\maintenance\doMaintenance.php(105): RunJobs->execute()
#14 D:\xampp\htdocs\...\maintenance\runJobs.php(108): require_once('D:\xampp\htdocs...')
#15 {main}
SBachenberg (talkcontribs)

Hi James, could you send me the source code of one of the pages that causes problems? My mail is simon.bachenberg(at)gmail.com

I have now set up a new MediaWiki to reproduce this error and I need some good data for it ;-)

For testing purposes you can add $wgSolrDebug = true; to your LocalSettings.php to see everything that gets sent to Solr.

SBachenberg (talkcontribs)

Hi James, this error should now be fixed in the newest SVN version. Could you please test it?
