Pollers or peers are disconnected

Problem

mon node status shows the status of one or more pollers and/or peers as inactive or disconnected.

Possible Cause(s)

ist diagnose can quickly diagnose most aspects of the above-mentioned errors.

Possible Solution(s)

Basic troubleshooting

Run mon restart on all nodes first. If the restart does not fix the issue, continue with the steps below.

Ensure that the nodes can communicate with each other. Tools such as ssh, ping, or nc can verify whether a connection can be established.

An example using nc is shown below. Merlin runs on port 15551 by default.

[root@mon9-mas01 ~]# nc -zv mon9-mas02peer 15551
Connection to mc-rocky-mon9-mas02peer (xx.xx.xx.xx) 15551 port [tcp/*] succeeded!
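To check several nodes in one pass, the nc test above can be wrapped in a small function. This is a minimal sketch, not an OP5 tool: the function name check_merlin_port is made up for illustration, and it assumes the installed nc supports the -z and -w options.

```shell
# check_merlin_port PORT HOST...
# Tries a TCP connection to PORT on each HOST and prints one result line per host.
# Returns non-zero if at least one host could not be reached.
check_merlin_port() {
    port=$1
    shift
    failed=0
    for host in "$@"; do
        # -z: connect without sending data; -w 3: give up after 3 seconds
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_merlin_port 15551 mon9-mas02peer tests the peer from the example above. Run it from every node, since a firewall may block only one direction.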

Verify OS and OP5 versions

Clustering in OP5 requires the same OS and OP5 versions. Run this command on all devices, and make sure that all devices are running the same version:

cat /etc/op5-monitor-release

It should give output such as this:

[Screenshot: example output of cat /etc/op5-monitor-release]

If there are differences, please rectify the situation by getting all devices on the same version.
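With more than a few nodes, comparing the release files by eye is error-prone. The sketch below compares copies of the file by checksum. The function name same_release is hypothetical, and it assumes you have already collected one copy of /etc/op5-monitor-release per node into local files.

```shell
# same_release FILE...
# Succeeds only if every given file has identical content,
# judged by the checksum and size that cksum reports.
same_release() {
    cksum "$@" | awk '
        { seen[$1 " " $2] = 1 }          # key: checksum + byte count
        END {
            n = 0
            for (k in seen) n++
            exit (n == 1 ? 0 : 1)        # exactly one distinct key means all match
        }'
}
```

One possible way to collect the copies, assuming SSH access to each node and placeholder hostnames, is ssh <hostname> cat /etc/op5-monitor-release > release.<hostname> for each node, followed by same_release release.* && echo "versions match".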

Troubleshooting SSH issues

Check the /var/log/secure file for errors pertaining to SSH. If there are any, run these commands on the server having the issue, once for each other server in the cluster. An example:

# mon sshkey push <hostname1>
# asmonitor mon sshkey push <hostname1>
# mon sshkey push <hostname2>
# asmonitor mon sshkey push <hostname2>

This pushes the SSH keys to the other servers in the cluster. OP5 uses password-less SSH connections for some communications, so the keys must be present on every node.

Check the Merlin log file /var/log/op5/merlin/neb.log as well. In some instances, you may see errors like below:

[1676376045] 4: stdout: Offending RSA key in /opt/monitor/.ssh/known_hosts:1
[1676376045] 4: stdout: RSA host key for monitor1_peer has changed and you have requested strict checking.
[1676376045] 4: stdout: Host key verification failed.
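To find these entries quickly, the log can be filtered for the messages shown above. This is a minimal sketch: the function name find_hostkey_errors is made up, and the patterns are taken only from the sample lines in this article, so other SSH failures will not match.

```shell
# find_hostkey_errors [LOGFILE]
# Prints log lines that indicate a changed or rejected SSH host key.
# LOGFILE defaults to the Merlin log named in this article.
find_hostkey_errors() {
    log=${1:-/var/log/op5/merlin/neb.log}
    grep -E 'Host key verification failed|host key for .* has changed|Offending .* key in' "$log"
}
```

The "host key for ... has changed" line names the node whose known_hosts entry is stale. grep exits non-zero when nothing matches, so find_hostkey_errors || echo "no host key errors found" can be used as a quick check.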

For scenarios where an IP address or hostname has changed, you will need to first remove the known_hosts entry of the affected node before running the mon sshkey push commands. On all affected nodes, run the command below to remove the ssh key for the monitor user:

runuser -l monitor -c 'ssh-keygen -R hostname'

After removing the known_hosts entry and re-running mon sshkey push, restart Merlin on all nodes:

systemctl restart merlind

Then check the status with mon node status.
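The recovery steps in this section can be collected into one helper for the local node. This is a sketch under assumptions, not an official procedure: the function names run and reset_hostkey and the DRY_RUN variable are invented for illustration, and the helper only acts on the node where it is run, so it still has to be repeated on every affected node.

```shell
# run CMD...
# Executes CMD, or only prints it when DRY_RUN=1 (quoting is not preserved in the printout).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reset_hostkey HOSTNAME
# Removes the stale known_hosts entry for HOSTNAME, pushes the SSH keys again,
# restarts Merlin locally, and shows the node status. Stops at the first failure.
reset_hostkey() {
    node=$1
    run runuser -l monitor -c "ssh-keygen -R $node" &&
    run mon sshkey push "$node" &&
    run asmonitor mon sshkey push "$node" &&
    run systemctl restart merlind &&
    run mon node status
}
```

Setting DRY_RUN=1 before calling reset_hostkey <hostname> prints the commands without executing them, which is a safe way to review the sequence first.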

Verify Naemon configuration
