Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 9:19 am
by kd4pyr
Hi Robbie,

I really like the features and ease of use in 1.4.1.

An interesting issue I noticed is that every so often, when using Nagios Core, the page refresh would not work. I would receive a message "Error: Could not read host and service status information!".

When looking at the Nagios Core Alert History log, I noticed that the message "Caught SIGTERM, shutting down..." is logged every five minutes, and about 15 seconds later the message "Nagios 4.3.4 starting... " is logged; after that, refreshes work again. So if a refresh occurs within that 15-second window, you get the error message.
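For anyone who wants to check their own log for the same pattern, here is a minimal sketch in Python. It assumes log lines in the usual Nagios Core form `[epoch_seconds] message`; the sample timestamps and PIDs are made up for illustration, and the function names are hypothetical, not part of NEMS or Nagios.

```python
import re

# Assumed line shape: "[1529328000] Caught SIGTERM, shutting down..."
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def find_restart_windows(lines):
    """Pair each 'Caught SIGTERM' entry with the next 'Nagios ... starting' entry."""
    windows = []
    pending_stop = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like log entries
        ts, msg = int(match.group(1)), match.group(2)
        if msg.startswith("Caught SIGTERM"):
            pending_stop = ts
        elif msg.startswith("Nagios") and "starting" in msg and pending_stop is not None:
            windows.append((pending_stop, ts))
            pending_stop = None
    return windows


def summarize(windows):
    """Return (seconds down per restart, seconds between consecutive SIGTERMs)."""
    downtimes = [start - stop for stop, start in windows]
    gaps = [later[0] - earlier[0] for earlier, later in zip(windows, windows[1:])]
    return downtimes, gaps


# Illustrative sample only; real entries would come from the Nagios log file.
sample = [
    "[1529328000] Caught SIGTERM, shutting down...",
    "[1529328015] Nagios 4.3.4 starting... (PID=1234)",
    "[1529328300] Caught SIGTERM, shutting down...",
    "[1529328315] Nagios 4.3.4 starting... (PID=1301)",
]

downtimes, gaps = summarize(find_restart_windows(sample))
print("downtime per restart (s):", downtimes)
print("interval between SIGTERMs (s):", gaps)
```

A steady interval of about 300 seconds between SIGTERMs, as in the sample, would point to something scheduled rather than a crash.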

Attached are screenshots documenting the issue. I know you are busy with other flavors of NEMS, but wanted to document what I was seeing.

Thanks again for a great upgrade.

Rick

RE: Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 10:10 am
by Robbie Ferguson
How interesting. This sounds like a bug with the monit config. I will look into it!

When you open Monit from the NEMS Dashboard, can you see Nagios as running?

PS - I'm glad to hear you're enjoying NEMS 1.4.1!! Thanks for the compliments. :)

Robbie

RE: Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 10:56 am
by Robbie Ferguson
Bug confirmed. I've connected to @ronjohntaylor 's NEMS server (Thanks Ron for giving me access so I can see a live NEMS server "in the wild"!) and can see that after 5 minutes, Nagios indeed receives a SIGTERM.

I stopped the monit service and watched the logs, and the SIGTERM happened again after 5 minutes, so the problem is not monit.

I'll continue investigating and hope to issue a patch this afternoon.

Great find - thank you!

RE: Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 11:41 am
by Robbie Ferguson
Ohmigosh - found it!

It's Migrator running the backup. It's stopping Nagios to back up the configs, and then restarting it!

I'm going to have to think this through :) For now, I'll issue a patch that leaves Nagios running even during the backup, and will do some testing to ensure backup integrity.
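To illustrate the integrity question, here is a rough sketch in Python of one way to archive a config directory while the service keeps running and then check the result. This is not how Migrator or the patch actually works; `hash_tree`, `backup_live`, and `verify_archive` are hypothetical names, and the archive is assumed to be written outside the directory being backed up.

```python
import hashlib
import tarfile
from pathlib import Path, PurePosixPath


def hash_tree(root):
    """Map each file's relative path (POSIX style) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def backup_live(config_dir, archive_path):
    """Archive config_dir without stopping anything.

    Returns True if no file changed while the archive was being written.
    """
    before = hash_tree(config_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(config_dir), arcname=".")
    after = hash_tree(config_dir)
    return before == after


def verify_archive(config_dir, archive_path):
    """Check that the archived files match what is currently on disk."""
    expected = hash_tree(config_dir)
    found = {}
    with tarfile.open(str(archive_path), "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                # Normalise "./objects/hosts.cfg" to "objects/hosts.cfg".
                name = str(PurePosixPath(member.name))
                found[name] = hashlib.sha256(data).hexdigest()
    return found == expected
```

If `backup_live` returns False, a file changed mid-backup and the run could simply be repeated, which avoids taking the service down at all.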

Thanks again,
Robbie

RE: Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 11:43 am
by Robbie Ferguson
Patch issued. Either run: sudo nems-quickfix
Or wait 24 hours for your system to receive the update.

RE: Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 1:59 pm
by kd4pyr
Thanks, Robbie. Your support is nothing short of amazing. I am putting you in for a raise. ;)

Rick

RE: Caught SIGTERM, shutting down

Posted: Mon Jun 18, 2018 3:04 pm
by Robbie Ferguson
Ha - thanks! Just don't base it on percentage, okay? :D