Hi, I have the same problem that has already been posted in these talk threads, and I have tried every solution suggested in them:
- CirrusSearch Only Partially Indexing
- ForceSearchIndex.php isn't populating ES
- Some pages index and other do not in CirrusSearch
I tried Kibana and eventually used the Elasticsearch Head Chrome extension to check the state of the indexes in Elasticsearch. I also ran php Saneitize.php to confirm that pages are not indexed, but that was already obvious: out of 150 pages, only 15 are indexed.
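The same check can be sketched without a browser extension by reading per-index document counts from Elasticsearch's `_cat/indices` API. This is only a minimal sketch: the address `http://localhost:9200` is an assumed default, and the parsing helper is kept separate so it can also be run on saved output.

```python
import json
from urllib.request import urlopen


def doc_counts(cat_rows):
    """Map index name -> document count from parsed `_cat/indices?format=json` rows."""
    counts = {}
    for row in cat_rows:
        # "docs.count" arrives as a string and can be null (e.g. for a closed index).
        raw = row.get("docs.count")
        counts[row["index"]] = int(raw) if raw is not None else 0
    return counts


def fetch_doc_counts(base_url="http://localhost:9200"):
    """Fetch the index list from a node; base_url is an assumption, adjust as needed."""
    with urlopen(f"{base_url}/_cat/indices?format=json") as resp:
        return doc_counts(json.load(resp))
```

Calling `fetch_doc_counts()` before and after each indexing step gives a number that can be compared directly with the page count of the wiki.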
I did notice something interesting while analyzing the problem, but in the end I wasn't able to work out what is happening. After the standard set of steps for populating Elasticsearch with MediaWiki pages
Step 0. -> $wgDisableSearchUpdate = true
Step 1. -> php UpdateSearchIndexConfig.php
Step 2. -> #$wgDisableSearchUpdate = true (line commented out again)
Step 3. -> php ForceSearchIndex.php --skipLinks --indexOnSkip
Step 4. -> php ForceSearchIndex.php --skipParse
and restarting the Elasticsearch service, after some time a few more pages (2-4, usually fewer than 10) would become indexed.
So: initially zero; then, after some steps and a restart, 4 pages; then, after some more steps, 6 pages; then 15, and so on. After that I could not reproduce it. Strange.
In the end I could not find a pattern, or tell whether this is some Elasticsearch memory, cache, or sharding problem or an error in how CirrusSearch sends pages for indexing. I could not catch any PHP error, and the pages that got indexed appeared to be chosen at random.
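One way to test whether the selection really is random is to list exactly which pages are missing. The sketch below compares page IDs from a MediaWiki `list=allpages` API response with the `_id` values in an Elasticsearch `_search` response. It assumes that CirrusSearch uses the numeric page ID as the document `_id`, which may not hold if ID prefixing is enabled. It works on already-parsed JSON, so it makes no assumptions about hosts or index names.

```python
def wiki_page_ids(allpages_response):
    """Page IDs from a parsed `action=query&list=allpages` response."""
    return {int(page["pageid"]) for page in allpages_response["query"]["allpages"]}


def indexed_page_ids(search_response):
    """Document IDs from a parsed Elasticsearch `_search` response.

    Assumption: each document `_id` is the numeric MediaWiki page ID.
    """
    return {int(hit["_id"]) for hit in search_response["hits"]["hits"]}


def missing_from_index(allpages_response, search_response):
    """Sorted page IDs that exist in the wiki but not in the index."""
    return sorted(wiki_page_ids(allpages_response) - indexed_page_ids(search_response))
```

Both responses are paged and default to 10 results (`aplimit` on the wiki side, `size` on the search side), and `list=allpages` covers one namespace per request. For 150 pages, either raise the limits or merge the results before comparing.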
Any suggestions? Thanks.