copy collector config failed

Found something amiss in NEMS? Confirm first that you are running the latest version, and then post your bug report here.
sbrock
Junior Member
Posts: 5
Joined: Mon Aug 14, 2017 5:55 pm

RE: copy collector config failed

Post by sbrock »

Hi Robbie,

I have re-generated the config and it's all working now. Thanks for your help.
Robbie Ferguson
Posting Freak
Posts: 835
Joined: Wed Mar 07, 2012 3:23 pm
Location: Ontario, Canada

RE: copy collector config failed

Post by Robbie Ferguson »

That's great news. Thanks sbrock.

I'll begin writing the patch to fix this on all NEMS servers. I'll release the patch this weekend.

Robbie
Robbie Ferguson // The Bald Nerd

CWynter
Junior Member
Posts: 1
Joined: Fri Jul 06, 2018 5:54 am

RE: copy collector config failed

Post by CWynter »

Hi Robbie,

I appear to have a similar problem on 1.4.1. I applied sudo nems-quickfix and rebooted, but that did not resolve it.

The sequence is that I installed a clean 1.4.1 and then restored the backup from 1.3.

NagVis fails at logon with: Unable to connect to the /usr/local/nagios/var/rw/live.sock in backend nagios: Connection refused
NConf - Generate Nagios config > Deploy: 
copy collector config - Failed (Source file does not exist (/tmp/Default_collector/)
PHP copy - OK
copy global config - Failed (sudo /bin/systemctl restart nagios
Job for nagios.service failed because the control process exited with error code.)

$ systemctl status nagios.service
   Loaded: loaded (/etc/init.d/nagios; generated; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2018-07-06 11:50:25 SAST; 9s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 8227 ExecStart=/etc/init.d/nagios start (code=exited, status=8)
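
For anyone comparing notes on the same errors, here is a minimal sketch of a pre-deploy check, assuming only the two paths quoted in the messages above. COLLECTOR_DIR and LIVE_SOCK are hypothetical override variables for illustration, not NEMS settings, and the hints it prints are guesses based on this thread rather than an official fix.

```shell
#!/bin/sh
# Sketch only: checks whether the two paths named in the errors above exist.
# Paths are copied from the error messages. COLLECTOR_DIR and LIVE_SOCK are
# hypothetical override variables, not part of NEMS.

check_path() {
  # $1 = label, $2 = path, $3 = test operator (-d directory, -S socket)
  if test "$3" "$2"; then
    echo "OK      $1: $2"
  else
    echo "MISSING $1: $2"
    return 1
  fi
}

if ! check_path "collector source" "${COLLECTOR_DIR:-/tmp/Default_collector}" -d; then
  # Regenerating the config is what worked for sbrock earlier in the thread.
  echo "  -> Try Generate Nagios config in NConf again before deploying."
fi

if ! check_path "livestatus socket" "${LIVE_SOCK:-/usr/local/nagios/var/rw/live.sock}" -S; then
  echo "  -> Nagios is probably not running; see: systemctl status nagios.service"
fi
# Note: an OK for the socket only means the file exists. A stale socket file
# can remain while Nagios is down, which would still give "Connection refused".
```

A MISSING line for the collector source would match the "Source file does not exist" message from the Deploy step, so it only confirms the symptom and does not explain why the directory was not generated.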


:D
As of 21 July 2018, the bug appears to have been resolved.

Downloaded the latest build and imaged the SD card.
Ran the nems-init and reboot procedure, followed by nems-upgrade and another reboot.
Imported the 1.3 backup and all is working.

Thank you for your efforts in resolving the problem.

Last edited by CWynter on Sun Jul 22, 2018 5:45 am, edited 1 time in total.
QACjason
Junior Member
Posts: 9
Joined: Mon May 21, 2018 2:07 pm

RE: copy collector config failed

Post by QACjason »

NEMS version is 1.4.1. After upgrading I restored my 1.3.1 backup. The hosts all seem to have changed to the Linux OS and the Linux-server host preset. This also happened when adding my 1.1 backup to a 1.3 server.

That being said, I get the same copy collector config FAILED - source_file FAILED Source file does not exist {/tmp/Default_Collector} when trying to deploy the Nagios config.
Tech Dave
Junior Member
Posts: 11
Joined: Fri Jun 15, 2018 6:11 am

RE: copy collector config failed

Post by Tech Dave »

QACjason wrote: NEMS version is 1.4.1. After upgrading I restored my 1.3.1 backup. The hosts all seem to have changed to the Linux OS and the Linux-server host preset. This also happened when adding my 1.1 backup to a 1.3 server.

That being said, I get the same copy collector config FAILED - source_file FAILED Source file does not exist {/tmp/Default_Collector} when trying to deploy the Nagios config.
This is very odd for me as well. About 3 to 4 weeks ago I upgraded a standalone platform from 1.3.1 to 1.4 with no problems at all, but today, when upgrading a different platform from 1.3.1 to 1.4.1, I too get:
source_file FAILED

Source file does not exist (/tmp/Default_collector/)

This happens when trying to deploy what appears to be a successfully generated Nagios config; there are no errors until the deploy step. The restore had failed to update the email address contact from the defaultnagios@localhost, which is no big issue for the restore and I can live with that, but it is why I had to change it, regenerate and then try to deploy. So there is a bug there for me as well.

However, all other information is correct. Host details are spot-on and have not changed.

Unfortunately this problem appears to render the upgrade useless, as no deployment changes can be made, so it's a rebuild from scratch for me. Fortunately this unit was simply monitoring the main NEMS system. In fact I might simply go back to 1.3.1 for a few days until Robbie can cast his wisdom, and then try again. At least with this build I am not crying if it fails; the NEMS it is monitoring is a different story.
Tech Dave
Junior Member
Posts: 11
Joined: Fri Jun 15, 2018 6:11 am

RE: copy collector config failed

Post by Tech Dave »

OK - So I thought, why not run nems-init, change the default email, and then try to generate a config. Interestingly, it appears to take some time on the first try, presumably generating the required files, and then deploys OK.

So then I tried a restore and then a deploy, and Bingo, it works fine. So to prove the theory, I will flash to a fresh state, then:

1) SST - place decryption key in place
2) Change default email address
3) Generate and Deploy a config, and then;
4) Restore, generate and deploy

It should work and I will report back - hopefully in 15 minutes or so.....

Dave.
Tech Dave
Junior Member
Posts: 11
Joined: Fri Jun 15, 2018 6:11 am

RE: copy collector config failed

Post by Tech Dave »

So I tried this going from 1.3.1 to 1.4.1:

1) Flash image
2) Boot and nems-init
3) Open SST and place decryption key (if applicable)
4) Open NConf and change the default@localhost email address of my admin contact
5) Generate and Deploy Config in NConf
6) Run a restore of local backup
7) Generate and Deploy – No errors

But it fails at the deployment of the config, as before, so there is definitely something wrong with the process.

So at the moment the choice is a blank 1.4.1, or 1.3.1 with my config, as 1.4.1 is unable to import it.

Hoping Robbie can fix this in due course, but going back to 1.3.1 for the moment on the particular device in question.

All the best,
 
Dave
Tech Dave
Junior Member
Posts: 11
Joined: Fri Jun 15, 2018 6:11 am

RE: copy collector config failed

Post by Tech Dave »

OK - So I am bored - NOT, but what the heck. I thought, let's try a fresh install of 1.4 and then restore. Bingo, it works fine, without fault. That is:

1) nems-init
2) SST - Decryption Password
3) NConf - change default@localhost email
4) Generate Config
5) Deploy Config
6) SSH and sudo nems-restore
7) Check details - all perfect
8) Generate Config
9) Deploy Config

Bingo and no problems and no errors - so something has definitely found its way into 1.4.1 and is causing a problem with restore, but 1.4 doesn't have it.
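
The sequence above can be sketched as a dry-run checklist, which may help anyone repeating the test on another unit. This is only an illustration: NEMS_RUN is a hypothetical switch invented for this sketch, the SST and NConf steps are web UI actions and are only printed as reminders, and whether nems-init needs sudo on your setup is an assumption left to the reader.

```shell
#!/bin/sh
# Sketch of the nine steps above as a dry-run checklist.
# NEMS_RUN is a hypothetical switch: unless it is set to 1, nothing is executed.

step_no=0

manual() {
  # Web UI or eyeball steps: printed as a reminder only.
  step_no=$((step_no + 1))
  echo "$step_no) [manual] $*"
}

shell_step() {
  # Shell steps: always printed, executed only when NEMS_RUN=1.
  step_no=$((step_no + 1))
  echo "$step_no) [shell]  $*"
  if [ "${NEMS_RUN:-0}" = "1" ]; then
    "$@"
  fi
}

restore_sequence() {
  step_no=0
  shell_step nems-init            # prefix with sudo if your setup needs it (assumption)
  manual "SST: enter the decryption password"
  manual "NConf: change the default@localhost email"
  manual "NConf: Generate Config"
  manual "NConf: Deploy Config"
  shell_step sudo nems-restore
  manual "Check host details"
  manual "NConf: Generate Config"
  manual "NConf: Deploy Config"
}

restore_sequence
```

Run as-is it only prints the checklist. With NEMS_RUN=1 the two shell steps are executed in order without pausing for the manual steps in between, so the dry-run mode is the safer way to use it.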

So I upgraded to 1.4.1 manually (-update and -upgrade) instead of waiting for 24 hours, even though it was already reporting 1.4.1 (nice work Robbie during the init).

Then run generate and deploy and......

IT FAILS - so 1.4.1 doesn't restore properly, and with rolling updates every day it will fail within 24 hours even without forcing the update. BACK TO 1.3.1 - HOPE THAT RESTORES :-/

Hope this is useful to anyone, and of course Robbie.

All the best,

Dave.
Tech Dave
Junior Member
Posts: 11
Joined: Fri Jun 15, 2018 6:11 am

RE: copy collector config failed

Post by Tech Dave »

Just for completeness.....

Back on 1.3.1 - latest patches and restore, generate, deploy works fine.

Will stay here until 1.4.1 is capable of accepting configurations from 1.3.1. Not because it is impossible to move on this unit, but so I have an easy build unit to test with.

Have a good weekend everyone.

Dave.
Robbie Ferguson
Posting Freak
Posts: 835
Joined: Wed Mar 07, 2012 3:23 pm
Location: Ontario, Canada

RE: copy collector config failed

Post by Robbie Ferguson »

Just a note for those who are not Patrons (as Patrons are being updated on https://patreon.com/nems/ through both text and the Vlog) - I'm making headway on this issue, and I thank each of you for your detailed descriptions of the problem(s).

I will be rolling out the fix very, very soon. Almost there!

THANK YOU for your patience and understanding. This one has really prompted me to modify my development methodology. Once this is fixed and you've all confirmed, I will breathe a sigh of relief, and then rework my testing and deployment process to avoid ever having something like this happen again.

Thanks all.
Robbie
Robbie Ferguson // The Bald Nerd
