
Topic on Extension talk:CirrusSearch

213.61.173.172

I have one large page that contains a large wikitable (12 columns, 3,300 rows).

This is the only large page on the wiki, and it is also the only page that is not indexed, because HtmlFormatter throws a TypeError:

api.php?action=query&format=json&prop=cirrusbuilddoc&pageids=1231&formatversion=2

{
    "error": {
        "code": "internal_api_error_TypeError",
        "info": "[8c19427845d7c9abfb9f5240] Exception caught: HtmlFormatter\\HtmlFormatter::onHtmlReady(): Argument #1 ($html) must be of type string, null given, called in /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314",
        "errorclass": "TypeError",
        "trace": "TypeError at /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(90)\nfrom /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(90)\n#0 /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(314): HtmlFormatter\\HtmlFormatter->onHtmlReady()\n#1 /var/www/w/includes/content/WikiTextStructure.php(179): HtmlFormatter\\HtmlFormatter->getText()\n#2 /var/www/w/includes/content/WikiTextStructure.php(221): WikiTextStructure->extractWikitextParts()\n#3 /var/www/w/includes/content/WikitextContentHandler.php(167): WikiTextStructure->getOpeningText()\n#4 /var/www/w/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(95): WikitextContentHandler->getDataForSearchIndex()\n#5 /var/www/w/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(70): CirrusSearch\\BuildDocument\\ParserOutputPageProperties->finalizeReal()\n#6 /var/www/w/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php(172): CirrusSearch\\BuildDocument\\ParserOutputPageProperties->finalize()\n#7 /var/www/w/extensions/CirrusSearch/includes/Api/QueryBuildDocument.php(58): CirrusSearch\\BuildDocument\\BuildDocument->finalize()\n#8 /var/www/w/includes/api/ApiQuery.php(671): CirrusSearch\\Api\\QueryBuildDocument->execute()\n#9 /var/www/w/includes/api/ApiMain.php(1904): ApiQuery->execute()\n#10 /var/www/w/includes/api/ApiMain.php(879): ApiMain->executeAction()\n#11 /var/www/w/includes/api/ApiMain.php(850): ApiMain->executeActionWithErrorHandling()\n#12 /var/www/w/api.php(90): ApiMain->execute()\n#13 /var/www/w/api.php(45): wfApiMain()\n#14 {main}"
    }
}
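The cirrusbuilddoc request above can be reproduced from the command line, which makes it easy to re-check a single page after changing something. A minimal sketch, assuming a hypothetical wiki at https://wiki.example.org/w/ (substitute your own api.php URL); the helper names cirrus_builddoc_url and cirrus_builddoc are made up for illustration and are not part of CirrusSearch:

```shell
#!/bin/sh
# Hypothetical helper: print the cirrusbuilddoc query URL for one page.
# $1: full URL of api.php, $2: numeric page ID
cirrus_builddoc_url() {
  printf '%s?action=query&format=json&prop=cirrusbuilddoc&pageids=%s&formatversion=2\n' "$1" "$2"
}

# Hypothetical helper: fetch the search document for that page with curl.
# An internal_api_error_TypeError in the JSON reproduces the problem.
cirrus_builddoc() {
  curl -s "$(cirrus_builddoc_url "$1" "$2")"
}
```

For example, `cirrus_builddoc https://wiki.example.org/w/api.php 1231` would issue the same query as in the post for page ID 1231.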


Through testing I found that the page is indexed without error when its length is below approx. 793,845 bytes. Above 793,978 bytes I get the TypeError.

I assume the page length counts only the wikitext, so the limit seems to be about 1 MB for the rendered HTML of the whole page.
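To see where another page falls relative to the sizes reported here, one can count the bytes of its wikitext, since MediaWiki's page length is the byte size of the wikitext. A minimal sketch; check_wikitext_size is a hypothetical helper, and the two thresholds are the empirical values from this post, not a documented MediaWiki or CirrusSearch limit:

```shell
#!/bin/sh
# Hypothetical helper: read wikitext on stdin, count its bytes, and compare
# the count with the sizes reported in this thread (empirical, approximate).
check_wikitext_size() {
  bytes=$(wc -c | tr -d ' ')          # tr strips padding some wc versions add
  if [ "$bytes" -le 793845 ]; then    # largest size reported to work
    verdict="indexed without error in the reported tests"
  elif [ "$bytes" -ge 793978 ]; then  # smallest size reported to fail
    verdict="TypeError in the reported tests"
  else
    verdict="between the two reported sizes (untested)"
  fi
  echo "$bytes bytes: $verdict"
}
```

Usage against a hypothetical host: `curl -s 'https://wiki.example.org/w/index.php?curid=1231&action=raw' | check_wikitext_size`. Counting bytes rather than characters matters because non-ASCII characters take several bytes in UTF-8.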

Changing $wgMaxArticleSize or $wgAPIMaxResultSize does not solve the issue.

I also looked through the PHP, JVM, MediaWiki and nginx settings but did not find a solution.


Is there a setting to raise this limit?

DCausse (WMF)
213.61.173.172

Thank you very much. That was exactly the issue.

I manually replaced HtmlFormatter.php in my MediaWiki 1.39.7 installation with the new version: https://gerrit.wikimedia.org/r/c/HtmlFormatter/+/997959/2/src/HtmlFormatter.php#b306

Then I rebuilt the index with:

php /var/www/w/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now

php /var/www/w/extensions/CirrusSearch/maintenance/ForceSearchIndex.php

Now everything is fine.
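The two maintenance commands above can be wrapped in a small script so they always run in the same order and stop if the first one fails. A minimal sketch; reindex_cirrus, run, MW_DIR and DRY_RUN are hypothetical names chosen for illustration, and MW_DIR defaults to the /var/www/w path used in this thread:

```shell
#!/bin/sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: recreate the CirrusSearch index, then repopulate it,
# using the same two maintenance scripts and flags as in the post above.
reindex_cirrus() {
  maint="${MW_DIR:-/var/www/w}/extensions/CirrusSearch/maintenance"
  run php "$maint/UpdateSearchIndexConfig.php" --reindexAndRemoveOk --indexIdentifier now || return 1
  run php "$maint/ForceSearchIndex.php"
}
```

Running `DRY_RUN=1 reindex_cirrus` first shows the exact commands without touching the index; plain `reindex_cirrus` then executes them.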

Reply to "page not indexed"