NetProbe - Netprobe is crashing
Related topics
Netprobe crashing, disconnecting, empty dataviews, Gateway-probedata shows down probes
Problems
- You notice a Netprobe has disconnected from the Gateway and monitored data is missing.
- You have set up a monitor-of-monitors, and it reports that a Netprobe is down in the Gateway-Probe dataview.
Possible causes
- Root Cause 1 - The server that the Netprobe runs on has crashed or been disconnected from the network.
- Root Cause 2 - The Netprobe has triggered a Memory Protection restart.
- Root Cause 3 - The Netprobe has terminated because of an error condition.
- Root Cause 4 - The Netprobe depends on many external APIs and libraries for its plugins, such as database libraries, middleware, and Java. A fault in one of these can also cause the Netprobe to fail.
- Root Cause 5 - The server is short of resources, so the Netprobe cannot continue to run. Typically this is a lack of disk space, but it can also be a memory shortage, where the operating system terminates processes using a selection algorithm over which the Netprobe has little control (for example, the Linux out-of-memory killer).
- Root Cause 6 - Antivirus (AV) software or SELinux is enforcing policies that interfere with the Netprobe.
Possible solutions
- Solution to Root Cause 1 - Establish whether the server the Netprobe normally runs on is up and accessible. If you have a Gateway Probe plugin configured in your Gateway, check its connectionStatus column. Normally it reads Up; any other status indicates a problem.
The plugin documentation describes each value in full. The possible values are: Unknown, Up, Down, Unreachable, Rejected, Removed, and Suspended.
Note
The connection state Unreachable indicates that the probe is unreachable, but not necessarily the server hosting it. For example, the probe might be unresponsive or its port might be in use. This means that you cannot use this status to check the health of the server itself. If the connectionStatus is Down or Unreachable, check the server health and the Netprobe process; the other non-Up states suggest that there is a different issue.
- Solution to Root Cause 2 - If the log file contains a message such as ERROR: NetProbe Restart Message, this is most likely the Netprobe Memory Protection feature protecting your server from a sudden increase in memory usage by the Netprobe. The Memory Protection documentation explains more, including how to monitor and, if necessary, tune the settings.
- Solution to Root Cause 3 - If you find that the server is OK but the Netprobe process itself is not running, first check the Netprobe log file. The last entries in the log should hint at the cause of the termination. This could be a crash, but it may also be intentional, such as a SIGTERM signal from a kill command or, on Windows, the service being stopped.
- Solution to Root Cause 4 - Check the version of the Netprobe (and other Geneos components) to ensure that you are up to date with releases and meet the system requirements. We operate a continuous release cycle with fixes and improvements.
- Solution to Root Cause 5 - Check the server in general to ensure that it is healthy and has enough disk space, memory, file descriptors, and so on.
- Solution to Root Cause 6 - Make sure that your antivirus solution or SELinux has been configured with the correct exclusions or policies to allow the Netprobe to operate.
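If you feed Gateway data into your own tooling, the connectionStatus rules from the Solution to Root Cause 1 can be written down as a small lookup. This is a minimal Python sketch under stated assumptions: it assumes you have already obtained the connectionStatus value as a string, and the function name triage_connection_status and its advice strings are hypothetical, not part of Geneos.

```python
# Hypothetical helper, not a Geneos API: suggests a next step for a
# connectionStatus value read from the Gateway Probe plugin's dataview.

CONNECTION_STATUSES = {"Unknown", "Up", "Down", "Unreachable",
                       "Rejected", "Removed", "Suspended"}


def triage_connection_status(status: str) -> str:
    """Return a suggested next step for one connectionStatus value."""
    if status not in CONNECTION_STATUSES:
        raise ValueError(f"unexpected connectionStatus: {status!r}")
    if status == "Up":
        return "ok"
    if status in ("Down", "Unreachable"):
        # Unreachable does not prove the host is down, so check both
        # the server health and the Netprobe process.
        return "check server health and the Netprobe process"
    # Unknown, Rejected, Removed, Suspended: another issue is more likely
    # than a failed host or process; see the plugin documentation.
    return "look for another issue (see the plugin documentation)"


if __name__ == "__main__":
    for value in ("Up", "Unreachable", "Suspended"):
        print(f"{value}: {triage_connection_status(value)}")
```

Raising on an unrecognised value, rather than treating it as a problem state, makes a mistyped or newly introduced status visible instead of silently misclassifying it.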
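For Root Causes 2, 3 and 5, a first look on the Netprobe host usually means reading the end of the Netprobe log, searching it for the restart message, and checking free disk space. The sketch below shows one way to do this with only the Python standard library. It is not a Geneos tool: the log path you pass in is an assumption (use the log location configured for your Netprobe), the marker string is taken from the message quoted in the Solution to Root Cause 2 (the exact wording in your log may differ), and the helper names, 20-line tail and 5% free-space threshold are hypothetical choices.

```python
from __future__ import annotations

import shutil
import sys
from collections import deque
from pathlib import Path

# Assumed marker text, based on the message quoted for Root Cause 2.
RESTART_MARKER = "NetProbe Restart Message"


def tail_lines(path: Path, n: int = 20) -> list[str]:
    """Return the last n lines of a text file, without trailing newlines."""
    with path.open("r", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def memory_protection_restarts(path: Path) -> int:
    """Count log lines that contain the assumed restart marker."""
    with path.open("r", errors="replace") as fh:
        return sum(1 for line in fh if RESTART_MARKER in line)


def disk_free_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def triage(log_path: Path, min_free: float = 0.05) -> dict:
    """Collect the log and disk checks into one report."""
    return {
        "memory_protection_restarts": memory_protection_restarts(log_path),
        "last_lines": tail_lines(log_path, 20),
        # Check the filesystem the log directory lives on.
        "low_disk": disk_free_fraction(str(log_path.parent)) < min_free,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    report = triage(Path(sys.argv[1]))
    print("Memory Protection restarts:", report["memory_protection_restarts"])
    print("Low disk space:", report["low_disk"])
    print("\n".join(report["last_lines"]))
```

Saved as, for example, a hypothetical netprobe_triage.py, it would be run with the log path as its only argument. It deliberately checks only disk space, the most common resource shortage; memory and file-descriptor limits need operating-system-specific tools.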