The end of life (EOL) date for this module is 31 January 2020.
Self-Monitoring
Description
Self-monitoring allows you to obtain information about the state of the cluster such as node membership, gateway connectivity, client connections and more.
Prerequisites
To use self-monitoring you will require:
- A Gateway capable of running Compute Engine rules
- A Netprobe capable of running the API plugin
For more information about configuring these components, see the Gateway 2 Reference Guide.
Quick start
First, nominate a Netprobe for self-monitoring. In this example it runs on localhost:7036 and is called ‘Cluster Monitoring Probe’.
Make sure the probe is specified in the gateway setup file:
...
<probes>
  ...
  <probe name="Cluster Monitoring Probe">
    <hostname>localhost</hostname>
    <port>7036</port>
  </probe>
  ...
</probes>
...
Now add an entity pointed at the nominated probe, with a sampler called Cluster Monitoring Sampler (we will add the sampler later):
...
<managedEntities>
  ...
  <managedEntity name="Cluster Monitoring Entity">
    <probe ref="Cluster Monitoring Probe"/>
    <sampler ref="Cluster Monitoring Sampler"/>
  </managedEntity>
  ...
</managedEntities>
...
For your convenience, we include a gateway setup include file with a sampler called ‘Cluster Monitoring Sampler’ and some rules. This can be found in the node directory: resources/self-monitoring-gateway-include.xml (see Include File).
Warning
The included gateway file is intended as a guide. The contents should be changed to suit the needs of each installation.
To use it, add it as an include file.
Note: The priority must have a higher value than the main setup file.
Assuming that the node is installed in /opt/openaccess-node/latest:
...
<includes>
  <priority>1</priority> <!-- Main setup priority -->
  ...
  <include>
    <priority>2</priority> <!-- Make sure this is higher than the main setup priority -->
    <required>true</required>
    <location>/opt/openaccess-node/latest/resources/self-monitoring-gateway-include.xml</location>
  </include>
  ...
</includes>
...
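The priority requirement can be checked mechanically. The following is a minimal Python sketch, not part of the product: the function name `include_priority_problems` is hypothetical, and it assumes the `<includes>` element has been copied out of the gateway setup file with the elided (`...`) parts removed.

```python
import xml.etree.ElementTree as ET


def include_priority_problems(includes_xml):
    """Return the locations of include files whose priority is not
    higher than the main setup priority (hypothetical helper)."""
    root = ET.fromstring(includes_xml)
    # The first-level <priority> element is the main setup priority.
    main_priority = int(root.findtext("priority"))
    problems = []
    for include in root.findall("include"):
        if int(include.findtext("priority")) <= main_priority:
            problems.append(include.findtext("location"))
    return problems


SAMPLE = """
<includes>
  <priority>1</priority>
  <include>
    <priority>2</priority>
    <required>true</required>
    <location>/opt/openaccess-node/latest/resources/self-monitoring-gateway-include.xml</location>
  </include>
</includes>
"""

print(include_priority_problems(SAMPLE))  # prints []
```

An empty list means every include has a higher priority value than the main setup file, as the note requires.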
Finally, in config/settings.conf edit the monitoring section, change enabled to ‘true’ and set the host and port to your nominated netprobe:
monitoring {
  enabled = true
  metrics {
    ...
    host = "localhost"
    port = 7036
    ...
...
For the full configuration see Configuration.
If everything works, you should see a set of dataviews for each node in the cluster.
Configuration
Minimum configuration that is required for monitoring to work:
- IP address of a Netprobe that is configured with a Netprobe API sampler
- Port of the Netprobe
- Name of the Netprobe API sampler
- Name of the Managed Entity containing the Netprobe API sampler
Monitoring configuration example (settings.conf):
...
monitoring {
  enabled = false
  metrics {
    # these settings can be overwritten for any metric type if necessary
    host = "localhost"
    port = 7036
    # Set this to true if connecting to a Secure Netprobe, default is false
    secure = false
    sampler = "Cluster Monitoring"
    entity = "OA Cluster"
    cluster-nodes {
      enabled = true
      view = clusterNodes
    }
    node-gateways {
      enabled = true
      view = gateways
    }
    node-clients {
      enabled = true
      view = clients
    }
    ###############################################################################################
    # queries and orb queue stats are useful for diagnostics only, enable on a 'need to have' basis
    ###############################################################################################
    cluster-queries {
      enabled = false
      view = clusterQueries
    }
    node-orb-queue-stats {
      enabled = false
      view = queues
    }
  }
}
...
Note: By default all metrics are displayed under a single sampler, but it is possible to configure a dedicated sampler, or even a dedicated Netprobe, for each metric type. To do this, override the host, port, sampler and entity configuration values for that metric type.
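The override behaviour in this note amounts to "metric-level values win over the shared defaults". The Python sketch below only illustrates that lookup order; it is not the node's implementation. The function name `effective_settings` and the override values (`Query Diagnostics`, port 7037) are hypothetical, while the defaults are taken from the example settings.conf.

```python
# Shared defaults, as in the metrics block of the example settings.conf.
DEFAULTS = {
    "host": "localhost",
    "port": 7036,
    "sampler": "Cluster Monitoring",
    "entity": "OA Cluster",
}


def effective_settings(defaults, metric_overrides):
    """Return the settings a metric type would use: its own values
    where given, otherwise the shared defaults."""
    merged = dict(defaults)          # copy, so the defaults stay untouched
    merged.update(metric_overrides)  # metric-level values take precedence
    return merged


# Hypothetical override: publish cluster-queries to a dedicated sampler
# on a second Netprobe.
queries = effective_settings(DEFAULTS, {"sampler": "Query Diagnostics", "port": 7037})
print(queries["host"], queries["port"], queries["sampler"], queries["entity"])
```

Any key left out of the override, here host and entity, falls back to the value shared by all metric types.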
Include File
Warning
The included gateway file is intended as a guide. The contents should be changed to suit the needs of each installation.
The gateway include file can be found in resources/self-monitoring-gateway-include.xml. It contains:
- A sampler called Cluster Monitoring Sampler
- The sampler contains a computed view called Diagnostics. This is used by the compute engine rules to check the state of the cluster.
In addition to this the include file has several rules:
Rule name | View | Description |
---|---|---|
Cluster Nodes Status | Cluster Nodes | Checks the member status of each node |
Cluster Nodes CPU Utilisation | Cluster Nodes | Checks the overall CPU utilisation on the node box. Adjust the values as required |
Cluster Nodes Memory Usage | Cluster Nodes | Checks the used JVM heap memory. Adjust the values as required |
Cluster Connected Gateway Status | Assigned gateways | Checks the connection status of the node to its assigned gateways |
Cluster Integrity | Cluster Nodes | Checks that the number of nodes is the same as the number of cluster node dataviews |
Stale cluster monitoring views | All | Checks the last update time of each view. Adjust the times as required |
Cluster integrity - number of nodes | NA | Keeps track of the number of expected nodes |
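The rules themselves are defined in the include file as Gateway rules. The Python below is not that rule syntax; it is only a sketch of the logic behind a check such as "Stale cluster monitoring views". The function name `stale_views`, the timestamps and the two-minute threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_views(last_updates, now, max_age):
    """Return the names of views whose last update is older than max_age."""
    return sorted(
        name for name, updated in last_updates.items() if now - updated > max_age
    )


now = datetime(2020, 1, 1, 12, 0, 0)
last_updates = {
    "clusterNodes": now - timedelta(seconds=10),
    "gateways": now - timedelta(minutes=5),  # has not updated for 5 minutes
    "clients": now - timedelta(seconds=30),
}

print(stale_views(last_updates, now, timedelta(minutes=2)))  # prints ['gateways']
```

The threshold plays the role of the "times" that the table says should be adjusted for each installation.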
Monitoring Views
Cluster Nodes
This view is obtained from the cluster-nodes metric. Each node lists nodes that are currently part of the cluster with some resource usage information and the configured gateways count.
Normally each node will publish the same information in this view, but if the cluster disintegrates, each node can have a different view of the cluster.
Such situations should be detected by the self-monitoring rules provided. See Include File for details on where to find default rules.
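To make the failure mode concrete: a disintegrated cluster shows up as nodes publishing different membership lists. The Python sketch below illustrates that comparison only; it is not how the provided rules are implemented, and the function name `cluster_views_agree` is hypothetical. The row names are taken from the example view in this section.

```python
def cluster_views_agree(views_by_node):
    """views_by_node maps each publishing node to the set of node row
    names it lists in its Cluster Nodes view. Returns True when every
    node reports the same membership."""
    distinct = {frozenset(members) for members in views_by_node.values()}
    return len(distinct) <= 1


healthy = {
    "192-168-10-146-2551": {"192-168-10-146-2551", "192-168-10-146-2552"},
    "192-168-10-146-2552": {"192-168-10-146-2551", "192-168-10-146-2552"},
}
# After a split, each node only sees itself.
split = {
    "192-168-10-146-2551": {"192-168-10-146-2551"},
    "192-168-10-146-2552": {"192-168-10-146-2552"},
}

print(cluster_views_agree(healthy))  # prints True
print(cluster_views_agree(split))    # prints False
```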
Example view:
+---------------------+----------------+------+-----------------+------------+-----------+--------------------+--------+
| nodes               | host           | port | machineCpuUsage | usedMemory | maxMemory | gatewaysConfigured | status |
+=====================+================+======+=================+============+===========+====================+========+
| 192-168-10-146-2551 | 192.168.10.146 | 2551 | 0.36%           | 4.47%      | 981 MB    | 1                  | Up     |
+---------------------+----------------+------+-----------------+------------+-----------+--------------------+--------+
| 192-168-10-146-2552 | 192.168.10.146 | 2552 | 2.62%           | 2.07%      | 800 MB    | 0                  | Up     |
+---------------------+----------------+------+-----------------+------------+-----------+--------------------+--------+
Fields description:
Column name | Description |
---|---|
nodes | Unique row name (hostname + port) |
host | Node hostname |
port | Node port |
machineCpuUsage | CPU utilisation percentage, sum of User + Sys + Nice + Wait, weights based on remaining cpu capacity: 1 - utilisation |
usedMemory | Used JVM heap memory |
maxMemory | Maximum JVM heap memory |
gatewaysConfigured | Number of gateways from settings.conf to which this node will try to connect |
status | Node status in the cluster (e.g. Joining, Up, Down, Removed) |
Connected Clients
This view is obtained from the node-clients metric. Since Open Access clients can connect to any node in the cluster there is a separate view for each cluster node.
Example view:
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
| uniqName                                     | host           | port  | status           | user | client              | version | roles     |
+==============================================+================+=======+==================+======+=====================+=========+===========+
| aclite-1-1-192-168-220-81-58724              | 192.168.220.81 | 58724 | Logged in        | user | aclite              | 1-1     | ROLE_USER |
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
| OpenAccessApiClient-1-1-192-168-10-146-40741 | 192.168.10.146 | 40741 | Authenticating   | user | OpenAccessApiClient | 1-1     |           |
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
| OpenAccessApiClient-1-1-192-168-10-146-56178 | 192.168.10.146 | 56178 | Unable to log in |      |                     |         |           |
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
Fields description:
Column name | Description |
---|---|
uniqName | Unique name of connected client (format: client type + version + hostname + port) |
host | Client hostname |
port | Client port |
status | Authentication status |
user | Client username (a username will be displayed even if authentication is disabled but a username was specified when connecting to the node) |
client | Client type |
version | Version of Open Access client library |
roles | Roles that are configured for the logged in user (only displayed when authentication is enabled) |
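The uniqName format (client type + version + hostname + port) can be reproduced from the other columns. The sketch below is an assumption drawn from the example rows, where the parts are joined with dashes and the dots in the host are also written as dashes; the function name `client_uniq_name` is hypothetical and is not part of any Open Access API.

```python
def client_uniq_name(client, version, host, port):
    """Build a row name in the style of the example view:
    client type + version + hostname + port, joined with dashes."""
    return "-".join([client, version, host.replace(".", "-"), str(port)])


print(client_uniq_name("aclite", "1-1", "192.168.220.81", 58724))
# prints aclite-1-1-192-168-220-81-58724
```

This can be handy when matching a row in the view against connection details known from the client side.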
Assigned gateways
Note: The status field changed in 2.0, and the state field was added.
This view is obtained from the node-gateways metric.
Example view:
+---------------------+--------------+----------------+-------------+----------------+---------------+------------+--------+
| uniqName | name | primaryHost | primaryPort | secondaryHost | secondaryPort | status | state |
+=====================+==============+================+=============+================+===============+============+========+
| 192-168-10-146-7039 | Demo Gateway | 192.168.10.146 | 7039 | 192.168.10.147 | 7039 | Connected | Active |
+---------------------+--------------+----------------+-------------+----------------+---------------+------------+--------+
| 192-168-10-146-7040 | | 192.168.10.146 | 7040 | | | Assigned | |
+---------------------+--------------+----------------+-------------+----------------+---------------+------------+--------+
Fields description:
Column name | Description |
---|---|
uniqName | Unique name of configured gateway |
name | Configured gateway name (available after connecting) |
primaryHost | Gateway primary host (from configuration) |
primaryPort | Gateway primary port (from configuration) |
secondaryHost | Gateway secondary host (available after connecting) |
secondaryPort | Gateway secondary port (available after connecting) |
status | Gateway connection status (e.g. Assigned, Connected) |
state | Gateway state (e.g. Active); added in 2.0 |
Note: The following views are useful for diagnostics and should only be enabled on a ‘need to have’ basis:
Running queries
This view is obtained by enabling the cluster-queries metric. The source column includes client hostname/IP address, port and unique query number.
Example view:
source | queryType | path |
---|---|---|
ClusterSystem-1990008510 | DataSet | //gateway |
192-168-220-81-57578--1940838693 | DataSet | //dataview |
192-168-220-81-57578--606738513 | DataSet | //managedEntity |
192-168-220-81-57578--1775872277 | DataSet | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")] |
192-168-220-81-57578--562344160 | DataSet | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]/sampler |
192-168-220-81-57578--79799599 | DataSet | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]/sampler[(@name="API Sampler")][(@type="")]//dataview |
192-168-220-81-57578--1172427576 | DataSet | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")] |
192-168-220-81-57578-779710210 | DataView | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]/sampler[(@name="API Sampler")][(@type="")]/dataview[(@name="cluster-queries")] |
Fields description:
Column name | Description |
---|---|
source | Unique name of source that is publishing messages for this subscription to the client |
queryType | Type of the query (e.g. DataSet, DataView, etc.) |
path | Query path |
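When reading this view it can help to split the source column back into its parts. The Python sketch below assumes the layout seen in the example rows: a dashed IPv4 address, then the port, then a query number that may be negative (which would explain the double dash). That reading is an inference from the examples, not a documented format, and the names `SOURCE_PATTERN` and `parse_source` are hypothetical. Sources that do not fit, such as `ClusterSystem-1990008510` or clients shown by hostname, return `None`.

```python
import re

# Dashed IPv4 address, port, then a (possibly negative) query number.
SOURCE_PATTERN = re.compile(r"^(\d+(?:-\d+){3})-(\d+)-(-?\d+)$")


def parse_source(source):
    """Split a source value into (host, port, query_number),
    or return None when it does not follow the assumed layout."""
    match = SOURCE_PATTERN.match(source)
    if match is None:
        return None
    host, port, query_number = match.groups()
    return host.replace("-", "."), int(port), int(query_number)


print(parse_source("192-168-220-81-57578--1940838693"))
# prints ('192.168.220.81', 57578, -1940838693)
print(parse_source("ClusterSystem-1990008510"))
# prints None
```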
ORB queue stats
This view is obtained by enabling the node-orb-queue-stats metric.
Example view:
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| name                         | typeT                  | connId | bdoExportId | numMsgs | dataSize | addRate | queueProcessingTime1Min | percentProcTime1Min | oldestMessageAge |
+==============================+========================+========+=============+=========+==========+=========+=========================+=====================+==================+
| from-comms-queue             | QT_InboundSystemQueue  | 0      | -1          | 0       | 0        | 12      | 257869                  | 1.1885600410436674  | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| to-comms-queue               | QT_OutboundSystemQueue | 0      | -1          | 0       | 0        | 5       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-0-directory              | QT_BdoQueue            | 2      | 0           | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-1-distributedmetadata    | QT_BdoQueue            | 2      | 1           | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-4-gatewaycontrol         | QT_BdoQueue            | 2      | 4           | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-5-distributedcommandlist | QT_BdoQueue            | 2      | 5           | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-7-distributedkbalist     | QT_BdoQueue            | 2      | 7           | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-8-dataview1              | QT_BdoQueue            | 2      | 8           | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-12-dataview1             | QT_BdoQueue            | 2      | 12          | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-13-dataview1             | QT_BdoQueue            | 2      | 13          | 0       | 0        | 0       | 0                       | 0.0                 | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-14-dataview1             | QT_BdoQueue            | 2      | 14          | 0       | 0        | 11      | 11713510                | 53.9894672347797    | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-15-dataview1             | QT_BdoQueue            | 2      | 15          | 0       | 0        | 22      | 5939016                 | 27.37388790711174   | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-16-dataview1             | QT_BdoQueue            | 2      | 16          | 0       | 0        | 10      | 3785522                 | 17.44808481706489   | 0.0              |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
Fields description: