Topic on Project:Support desk

Intermittent Stoppages/Stalls, Unable to Continue

Summary by GMShimokura

Thank you Bawolff for all the information and support.

I will eventually be forced to update either my installation or hardware (or both) so for now I will keep this in mind.

Gregg

GMShimokura (talkcontribs)

Mediawiki 1.33.1 on a late 2012 Mac Mini Server running Yosemite (10.10.5) using Server Internal Apache Version 2.4.16, PHP 7.2.21, MySQL 5.6.22

I have been successfully running for 4 months now as per the above. All is fine, except for regular but unpredictable behaviour of my server becoming unresponsive, requiring a hard reboot of the server to get access once again.


Symptoms at failure point:

- Browser becomes unresponsive and progress bar stalls

- Closing/reopening browser does not seem to help

- Same behaviour from all access points (tried phone, or another computer) so it appears server related

- Other web installations on the same server such as Piwigo, Webtrees and Avideo continue to work fine


FYI - I am the only user of the server (it is private) so the server load is low.


Debug attempts so far:

- Enabled as much logging as I could understand, but logs do not show any entries

- Checked: error_log, access_log, php_errors.log

- Have experimented with Caching scheme, the following seems to be the most stable:

$wgMainCacheType = CACHE_DB;

$wgSessionsInObjectCache = true;

$wgSessionCacheType = CACHE_DB;

#$wgObjectCacheSessionExpiry = 3600; (so disabled)

$wgParserCacheType = CACHE_DB;

- Tried Chrome/Firefox in addition to Safari with same results

- Tried emptying Browser Cache with no effect

- Tried php maintenance/update.php before reboot with no effect


Clue? Hunch?:

- Previously I had $wgObjectCacheSessionExpiry enabled and the problem would occur at expiry time if the server stayed responsive long enough to reach the Expiry time

Could it be related?


Questions:

  1. Are there any specific debug actions I could take to collect relevant information?
  2. I don't fully understand the different caching schemes, so I would guess that what I have configured is not optimal. Any recommendations?
  3. It seems relevant that my other web installations work fine without interruptions. Avideo is PHP based, so it seems there is something specific to MediaWiki that is problematic with my setup, and I am reluctant to change my PHP installation unless necessary. I could provide any/all LocalSettings.php values as needed. Please suggest any that could help.
  4. Is there a way to reboot/restart Mediawiki without a hard reset/power cycling of my server?


Thank you,

Gregg

Bawolff (talkcontribs)

It's very hard to know without more details about what MediaWiki is getting stuck doing. One possibility is that it is getting stuck doing some sort of recache event (like for the localisation cache) that never finishes, so it tries to redo it on every request. Or, similarly, some job that never finishes and gets retried on every request. It's somewhat odd that rebooting the server would actually fix it but just waiting a while does not, since the PHP part of MediaWiki is mostly stateless. Are you using the same webserver for all apps?

It might be a bit of a pain to set up, but if you could set up profiling (see Manual:Profiling), that might be helpful. As a first step though, the next time this happens you could run top on your server to see which process, if any, is using CPU. Setting $wgDebugLogFile to a file would also be helpful to get debug logging. (It's important that wherever you set it is somewhere that Apache has permission to write to.)
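
A minimal LocalSettings.php sketch of the debug logging mentioned above. The log path is a hypothetical example and does not come from this thread; it has to point somewhere the Apache user can write to, preferably outside the web root.

```php
# LocalSettings.php (fragment)

# Hypothetical path: replace with a file in a directory the Apache user can write to.
$wgDebugLogFile = "/var/log/mediawiki/debug.log";

# Show more detail when MediaWiki itself reports an exception (useful while debugging only).
$wgShowExceptionDetails = true;
```

The debug log grows quickly, so it is probably best to comment these lines out again once the problem has been captured.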

So caching advice generally:

  • Do you have php-apcu installed? If so, CACHE_ACCEL would probably be a better choice for $wgMainCacheType
  • Some people say that CACHE_NONE is better for $wgMainCacheType than CACHE_DB, if CACHE_DB is all you have available.
  • Try to set $wgCacheDirectory to some location that the webserver can write to (preferably not web accessible). This may be more efficient for localization cache.
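
As a rough sketch, the first and third suggestions above could look like this in LocalSettings.php. The directory path is an assumption for illustration only, and CACHE_ACCEL only makes sense if the php-apcu extension is actually loaded.

```php
# LocalSettings.php (fragment)

# Use APCu for the main object cache (requires the php-apcu extension).
$wgMainCacheType = CACHE_ACCEL;

# Hypothetical path: a directory outside the web root that the web server can write to.
# MediaWiki can keep the localisation cache here as files.
$wgCacheDirectory = "/var/cache/mediawiki";
```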
GMShimokura (talkcontribs)

Thanks for the detailed reply. You have given me some things to try.

Re: Are you using the same webserver for all apps?

Ans: Yes - all using same Apache service

Re: Setting $wgDebugLogFile to a file

Ans: Already done, forgot to mention. Nothing recorded on failure.

Here is my plan based on your reply:

Trial 1:

- I will set the $wgCacheDirectory

- For next failure I will wait much longer to see if it fixes itself or an error message appears. If after 10 mins nothing happens I will assume that it can't fix itself - unless you think 10 mins is too short?

If something does appear I will update this issue with what I have found.

- I will check for renegade processes using top during the waiting period

Trial 2:

- I do have php-apcu installed, so I will try $wgMainCacheType = CACHE_ACCEL. I think I did try this before, but will try it again.

Trial 3:

- I will endeavour to set up profiling and hope to provide some meaningful output.


Thanks again,

Gregg


GMShimokura (talkcontribs)

Here is a data point from Trial 1:

- No change in the log files - Nothing logged during 10-15 mins of waiting

[Image: snapshot of top]

Starting Trial #2:

ie. Changed $wgMainCacheType = CACHE_ACCEL


GMShimokura (talkcontribs)

Here is a second data point from Trial #2 with CACHE_ACCEL

- No change in the log files - Nothing logged during 10-15 mins of waiting

Observation: Anecdotally, I would say that on my server CACHE_ACCEL also shortens the time to failure.

Attached is another top output.

Another Observation: The browser seems to be trying to maintain an active connection to the server during the 'stalling' period. When I reboot the server, the browser comes back with a 'lost connection' type of message, saying it can no longer reach the server while the power down is happening. When the server comes back online, the message disappears and the browser goes back to trying to reconnect and do whatever it was doing before the reboot.


Question: Is there a way to soft reboot Mediawiki so that I don't have to power cycle my server?


I will now try to enable Profiling but this will take some time.

Gregg

[Image: top output]


GMShimokura (talkcontribs)

Hi again.

Trial #3 Results:

I have failed to be able to install XHProf and/or Tideways. I get compiler errors when running make.

I have also reverted to CACHE_DB for the MainCacheType as it is more reliable (with CACHE_ACCEL I get the stalling in less than an hour).

Gregg

Ciencia Al Poder (talkcontribs)

You can restart Apache instead of rebooting the computer.

The top output shows a load of 2.06. However, the most CPU-intensive process is top itself, at 2.01%. I also don't see the I/O wait indicator in that top output (which is very useful for detecting contention problems).

You seem to be running PHP as mod_php in Apache. You may get some insight into the Apache subprocesses by looking at this: https://www.tecmint.com/check-apache-httpd-status-and-uptime-in-linux/

Bawolff (talkcontribs)

I don't really know, but it sounds like this is maybe more an issue with apache or php, and not with MediaWiki.


To confirm, it's loading forever (like the webserver keeps the connection open)? It's not just loading a blank page, but stalled trying to load something? It might be interesting to know what the network tab of the web browser developer console says, and whether it says that loading is stalled at a specific step (not that I would know what to do with that information).

GMShimokura (talkcontribs)

Hi.

I will endeavour to read up on mod_php. Is this the recommended method vs the alternatives?

I confirm the browser is loading forever (showing nothing) and it seems the browser/server connection is trying to be (re)established.

It is not loading a blank page. The progress bar/circle runs but does not progress.

I find it interesting that my other applications using the same Apache service all run great without interruption.

Gregg

Bawolff (talkcontribs)

mod_php and FastCGI are the two most popular methods for running PHP. Both are reasonable choices, and the differences are small enough to probably not matter in practice.

GMShimokura (talkcontribs)

Thanks.

Other than restarting Apache, is there some way to reset Mediawiki so that I don't affect all server websites?

Gregg

Ciencia Al Poder (talkcontribs)

mod_php causes it to run inside the webserver worker processes. There's no way to reset "only" one application.

GMShimokura (talkcontribs)

Hi, still trying to debug this problem.

Recently I noticed a stoppage instead of a stall, and this time I got some output in the browser and in the logs. From the Apache error_log:

[Thu Feb 20 09:27:52.552990 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP Notice:  unserialize(): Error at offset 492 of 1505 bytes in .../mediawiki/1.33.1/includes/libs/objectcache/APCUBagOStuff.php on line 111, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553115 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP Stack trace:, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553131 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   1. {main}() .../mediawiki/1.33.1/load.php:0, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553145 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   2. require() .../mediawiki/1.33.1/load.php:31, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553157 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   3. require_once() .../mediawiki/1.33.1/includes/WebStart.php:77, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553170 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   4. ExtensionRegistry->loadFromQueue() .../mediawiki/1.33.1/includes/Setup.php:127, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553183 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   5. ExtensionRegistry->exportExtractedData() .../mediawiki/1.33.1/includes/registration/ExtensionRegistry.php:173, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553196 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   6. Skins\\Chameleon\\Chameleon::init() .../mediawiki/1.33.1/includes/registration/ExtensionRegistry.php:407, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553208 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   7. ExtensionRegistryHelper\\ExtensionRegistryHelper->loadExtensionRecursive() .../mediawiki/1.33.1/skins/chameleon/src/Chameleon.php:62, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553222 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   8. ExtensionRegistryHelper\\ExtensionRegistryHelper->loadModuleRecursive() .../mediawiki/1.33.1/vendor/mediawiki/mw-extension-registry-helper/src/ExtensionRegistryHelper.php:53, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553235 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP   9. ExtensionRegistry->loadFromQueue() .../mediawiki/1.33.1/vendor/mediawiki/mw-extension-registry-helper/src/ExtensionRegistryHelper.php:81, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553263 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP  10. APCUBagOStuff->get() .../mediawiki/1.33.1/includes/registration/ExtensionRegistry.php:168, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553277 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP  11. APCUBagOStuff->doGet() .../mediawiki/1.33.1/includes/libs/objectcache/BagOStuff.php:183, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553289 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP  12. APCUBagOStuff->unserialize() .../mediawiki/1.33.1/includes/libs/objectcache/APCUBagOStuff.php:48, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:52.553301 2020] [php7:notice] [pid 70816] [client 192.168.1.1:52945] PHP  13. unserialize() .../mediawiki/1.33.1/includes/libs/objectcache/APCUBagOStuff.php:111, referer: .../mediawiki/1.33.1/index.php/Main_Page

[Thu Feb 20 09:27:53.510827 2020] [core:notice] [pid 294] AH00052: child pid 70816 exit signal Segmentation fault (11)

[Thu Feb 20 09:27:53.510965 2020] [core:notice] [pid 294] AH00052: child pid 70778 exit signal Segmentation fault (11)

[Thu Feb 20 09:27:54.513008 2020] [core:notice] [pid 294] AH00052: child pid 70779 exit signal Segmentation fault (11)

During this messaging I do get in the Apache access_log:

"OPTIONS * HTTP/1.0" 200 -

A couple more notes:

- I am not using CACHE_ACCEL, still using CACHE_DB

- Other applications on the same Apache server continue to run fine.

- I have edited out my IP and local root directory from the above

Is there somewhere a recommended Apache setup that is known to work well with Mediawiki?

[Image: browser message]
Bawolff (talkcontribs)

Well, the PHP segfaulting is definitely related.

For the unserialize error, I guess there is something corrupt in the cache. If that is somehow related, it could explain why restarting Apache fixes it, since I think that would cause a cache purge.

Try setting $wgMainCacheType = CACHE_NONE; to see if that fixes it (this will slow things down, but better a slow wiki than a dead wiki). The other cache type variables should be fine at their defaults; just comment them out in LocalSettings.php if they are specified. The important thing for this test is that none of them is CACHE_ACCEL, since we want to check whether APCu is the culprit.
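
Roughly, the test configuration described above might look like the following sketch. The commented-out variable names are the ones quoted earlier in this thread; anything else set in the real LocalSettings.php is unknown here.

```php
# LocalSettings.php (fragment) for the CACHE_NONE test

# Disable the main object cache for this test.
$wgMainCacheType = CACHE_NONE;

# Leave the other cache types at their defaults by commenting them out.
#$wgSessionCacheType = CACHE_DB;
#$wgParserCacheType = CACHE_DB;
```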


FWIW, this sounds more like an issue with your version of PHP (or possibly the version of some PHP extension) than with Apache.

Bawolff (talkcontribs)

Sorry, I just saw you said you weren't using CACHE_ACCEL. I also just remembered that MediaWiki will still use APCu in certain circumstances (for data that, in a multi-server setup, you would want to keep per server).

Anyway, can you also try adding the following to LocalSettings.php:

$wgObjectCaches['apcu'] = [ 'class' => EmptyBagOStuff::class, 'reportDupes' => false ];
$wgObjectCaches['apc'] = [ 'class' => EmptyBagOStuff::class, 'reportDupes' => false ];
GMShimokura (talkcontribs)

Thanks I have this set for now:

$wgSessionsInObjectCache = true;

$wgSessionCacheType = CACHE_DB;

$wgMainCacheType = CACHE_DB;

$wgParserCacheType = CACHE_DB;

$wgObjectCaches['apcu'] = [ 'class' => EmptyBagOStuff::class, 'reportDupes' => false ];

$wgObjectCaches['apc'] = [ 'class' => EmptyBagOStuff::class, 'reportDupes' => false ];

Will report if things change.


GMShimokura (talkcontribs)

Hi Bawolff,

Well it has been 4 days without any stalls so I do believe we have found the root cause using the above configuration. Before closing this issue should I be trying to debug using CACHE_ACCEL any further?

For example:


Q1: In the long run do I need CACHE_ACCEL or in other words under what condition would CACHE_ACCEL be of significant benefit?


Q2: How would I determine if it is in fact my version of PHP (currently 7.2.21) that is at fault and more importantly which version of PHP that I should change to?


Q3: How would I determine if it is in fact a specific extension of PHP that is at fault and more importantly which version of the extension that I should change to?


Thank you very much for your patience and expertise,

Gregg

Bawolff (talkcontribs)

I would suspect it is your PHP (it might not even be the version; it could be a broken compile of it or something). It might be the APCu extension, I don't know. Debugging further would require more specialist knowledge of PHP than I have. I guess you could ask in their forums.


Having a working APCu will make MediaWiki faster but is not critical. You can also potentially use memcached instead for a similar purpose.
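
For the memcached option, a minimal LocalSettings.php sketch could look like this. It assumes a memcached daemon is already installed and running on the same machine on its default port; neither the address nor the port comes from this thread.

```php
# LocalSettings.php (fragment)

# Use memcached for the main object cache.
$wgMainCacheType = CACHE_MEMCACHED;

# Assumed address: a memcached daemon on this machine, default port 11211.
$wgMemCachedServers = [ '127.0.0.1:11211' ];
```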