Topic on Extension talk:CirrusSearch

Only minimal indexing, most pages are not indexed, almost as if ForceSearchIndex.php isn't populating

Matija.pu (talkcontribs)

Hi, I have the same problem that was already posted on this talk page, and I have tried every solution that was suggested here. My setup:

Product         Version
MediaWiki       1.36.0
PHP             7.3.19 (cgi-fcgi)
MariaDB         10.5.11-MariaDB
ICU             64.2
Elasticsearch   6.5.4
CirrusSearch    6.5.4 (264629b)
Elastica        6.1.3 (9f6e66a)

I tried Kibana and eventually used the Elasticsearch Head Chrome extension to inspect the state of the indexes in Elasticsearch. I also ran php Saneitize.php, which confirmed that pages are not indexed, but that was already obvious: of 150 pages, only 15 are indexed.

I did notice something interesting while analysing this problem, but in the end I wasn't able to work out what is happening. After the standard set of steps for populating Elasticsearch with MediaWiki pages

Step 0. -> $wgDisableSearchUpdate = true

Step 1. -> php UpdateSearchIndexConfig.php

Step 2. -> #$wgDisableSearchUpdate = true

Step 3. -> php ForceSearchIndex.php --skipLinks --indexOnSkip

Step 4. -> php ForceSearchIndex.php --skipParse

and restarting the Elasticsearch service, after some time a few more pages (2-4, usually fewer than 10) would become indexed.
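
The maintenance-script part of the steps above (Steps 1, 3 and 4) can be sketched as two shell functions. This is only a sketch: the `MW_ROOT` variable and the function names are illustrative assumptions, and the script paths assume CirrusSearch is installed under `extensions/CirrusSearch` in the wiki root. Steps 0 and 2 are manual edits to LocalSettings.php and are not automated here.

```shell
# Assumption: MW_ROOT is the MediaWiki installation root (defaults to the current directory).

cirrus_update_config() {
    # Step 1: create or update the index configuration.
    # Run this while $wgDisableSearchUpdate = true is set (Step 0).
    php "${MW_ROOT:-.}/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php"
}

cirrus_force_index() {
    # Run this after $wgDisableSearchUpdate has been commented out again (Step 2).
    maint="${MW_ROOT:-.}/extensions/CirrusSearch/maintenance"
    # Step 3: first indexing pass.
    php "$maint/ForceSearchIndex.php" --skipLinks --indexOnSkip || return 1
    # Step 4: second indexing pass.
    php "$maint/ForceSearchIndex.php" --skipParse
}
```

Usage would be `cirrus_update_config`, then the Step 2 edit, then `cirrus_force_index`.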

So: initially zero, then 4 pages after some steps and a restart, then 6 pages after some more steps, then 15, and so on, and then I would not be able to repeat it. Strange.

In the end I was not able to find a pattern, or to tell whether this is some Elasticsearch memory, cache, or sharding problem, or some error in CirrusSearch sending pages for indexing. I was not able to catch any PHP error, and the pages that got indexed seemed to be chosen at random.

Any suggestions? Thanks.

DCausse (WMF) (talkcontribs)

I would suggest checking the Elasticsearch logs, but restarting Elasticsearch should not have any impact on the number of indexed pages. Elasticsearch has no cache that could explain what you see; only an index refresh_interval set to a high value could explain this, and CirrusSearch sets it to a low value by default.
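
To rule out the refresh_interval explanation and see how many documents each index actually holds, something like the following sketch may help. It assumes Elasticsearch is reachable at `http://localhost:9200` (override with the hypothetical `ES_URL` variable); the function name is illustrative.

```shell
# Assumption: Elasticsearch listens on localhost:9200 unless ES_URL says otherwise.
es_index_status() {
    es_url="${ES_URL:-http://localhost:9200}"
    # One line per index; the docs.count column is the number of indexed documents.
    curl -s "$es_url/_cat/indices?v"
    # refresh_interval for every index where it has been set explicitly.
    # An index that still uses the built-in default may show no value here.
    curl -s "$es_url/_all/_settings/index.refresh_interval?pretty"
}
```

Calling `es_index_status` before and after a restart shows whether the document counts really change.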

The behavior you describe suggests a JobQueue issue. Please see Manual:Job_queue and check that it is set up properly.
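
A quick way to check for a backlog is to list the pending jobs and then drain the queue from the command line. The sketch below assumes the commands are run against the MediaWiki installation root (the `MW_ROOT` variable and the function names are illustrative). CirrusSearch job types generally have names beginning with `cirrusSearch`; the exact names depend on the extension version.

```shell
# Assumption: MW_ROOT is the MediaWiki installation root (defaults to the current directory).

show_job_queue() {
    # Number of queued jobs per job type; a large cirrusSearch* backlog
    # means index updates are waiting to be executed.
    php "${MW_ROOT:-.}/maintenance/showJobs.php" --group
}

run_job_queue() {
    # Execute all pending jobs now instead of waiting for web requests to run them.
    php "${MW_ROOT:-.}/maintenance/runJobs.php"
}
```

If `show_job_queue` reports pending jobs that never shrink, `run_job_queue` (or a cron job or service that runs runJobs.php regularly, as described in Manual:Job_queue) should clear them.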

Matija.pu (talkcontribs)

Yes! This was a JobQueue issue. After running php runJobs.php, every page and file became indexed in Elasticsearch.