Geneos


The end of life (EOL) date for this module is 31 January 2020.

Self-Monitoring

Description

Self-monitoring allows you to obtain information about the state of the cluster such as node membership, gateway connectivity, client connections and more.

Prerequisites

To use self-monitoring you will require:

  • A Gateway capable of running Compute Engine rules
  • A Netprobe capable of running the API plugin

For more information about configuring these components, see the Gateway 2 Reference Guide.

Quick start

First, nominate a Netprobe for self-monitoring. In this example it runs on localhost:7036 and is called ‘Cluster Monitoring Probe’.

Make sure the probe is specified in the gateway setup file:

...
<probes>
  ...
  <probe name="Cluster Monitoring Probe">
    <hostname>localhost</hostname>
    <port>7036</port>
  </probe>
  ...
</probes>
...

Now add a managed entity that points at the nominated probe and references a sampler called ‘Cluster Monitoring Sampler’ (the sampler itself is added in a later step):

...
<managedEntities>
  ...
  <managedEntity name="Cluster Monitoring Entity">
    <probe ref="Cluster Monitoring Probe"/>
    <sampler ref="Cluster Monitoring Sampler"/>
  </managedEntity>
  ...
</managedEntities>
...

For your convenience, we include a gateway setup include file with a sampler called ‘Cluster Monitoring Sampler’ and some rules. This can be found in the node directory: resources/self-monitoring-gateway-include.xml (see Include File).

Warning

The included gateway file is intended as a guide. The contents should be changed to suit the needs of each installation.

To use it, add it as an include file.

Note: The priority of the include file must have a higher value than the priority of the main setup file.

The following example assumes that the node is installed in /opt/openaccess-node/latest:

...
<includes>
  <priority>1</priority> <!-- Main setup priority-->
  ...
  <include>
    <priority>2</priority> <!-- Make sure this is higher than the main setup priority -->
    <required>true</required>
    <location>/opt/openaccess-node/latest/resources/self-monitoring-gateway-include.xml</location>
  </include>
  ...
</includes>
...
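
The priority requirement can be checked mechanically before the setup is loaded. The following is a minimal Python sketch, not part of the product: it assumes the includes fragment has the shape shown above, and the function name low_priority_includes is hypothetical.

```python
import xml.etree.ElementTree as ET


def low_priority_includes(includes_xml: str) -> list[str]:
    """Return the locations of includes whose priority value is not
    higher than the priority of the main setup file."""
    root = ET.fromstring(includes_xml.strip())
    main_priority = int(root.findtext("priority"))
    offenders = []
    for include in root.findall("include"):
        if int(include.findtext("priority")) <= main_priority:
            offenders.append(include.findtext("location"))
    return offenders


EXAMPLE = """
<includes>
  <priority>1</priority>
  <include>
    <priority>2</priority>
    <required>true</required>
    <location>/opt/openaccess-node/latest/resources/self-monitoring-gateway-include.xml</location>
  </include>
</includes>
"""

# An empty list means every include has a higher value than the main setup.
print(low_priority_includes(EXAMPLE))
```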

Finally, in the monitoring section of config/settings.conf, set enabled to true and set host and port to match your nominated Netprobe:

monitoring {
  enabled = true
  metrics {
    ...
    host = "localhost"
    port = 7036
    ...
  ...

For the full configuration see Configuration.

If everything works you should see a set of dataviews for each node in the cluster:

../../ImportedGeneosImages/self-monitoring-example.png

Configuration

The minimum configuration required for monitoring to work is:

  • IP address of a Netprobe that is configured with a Netprobe API sampler
  • Port of the Netprobe
  • Name of the Netprobe API sampler
  • Name of the Managed Entity containing the Netprobe API sampler

Monitoring configuration example (settings.conf):

...
monitoring {
  enabled = false
  metrics {
    # These settings can be overridden for any metric type if necessary
    host = "localhost"
    port = 7036
    # Set this to true if connecting to a Secure Netprobe, default is false
    secure = false
    sampler = "Cluster Monitoring"
    entity = "OA Cluster"

    cluster-nodes {
      enabled = true
      view = clusterNodes
    }

    node-gateways {
      enabled = true
      view = gateways
    }

    node-clients {
      enabled = true
      view = clients
    }

    ###############################################################################################
    # queries and orb queue stats are useful for diagnostics only, enable on a 'need to have' basis
    ###############################################################################################
    cluster-queries {
      enabled = false
      view = clusterQueries
    }

    node-orb-queue-stats {
      enabled = false
      view = queues
    }
  }
}
...

Note: By default all metrics are displayed under a single sampler, but it is possible to configure a dedicated sampler, or even a dedicated Netprobe, for each metric type. To achieve this, the host, port, sampler and entity configuration values can be overridden for each metric type.
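
The fallback behaviour described in the note can be illustrated with a small Python model. This is only a sketch of the idea, not the node's actual implementation: it assumes an override is written inside the block of the metric type it applies to, and the host diag-host, the port 7037 and the sampler name Cluster Diagnostics are hypothetical values.

```python
# Keys that a metric type may override, as listed in the note above.
SHARED_KEYS = ("host", "port", "sampler", "entity")


def effective_settings(metrics: dict, metric_type: str) -> dict:
    """Resolve the connection settings for one metric type: use the
    metric's own value when present, otherwise the shared value."""
    metric = metrics[metric_type]
    return {key: metric.get(key, metrics[key]) for key in SHARED_KEYS}


metrics = {
    "host": "localhost",
    "port": 7036,
    "sampler": "Cluster Monitoring",
    "entity": "OA Cluster",
    "cluster-nodes": {"enabled": True, "view": "clusterNodes"},
    # Hypothetical override: publish query diagnostics through a
    # dedicated sampler on a different Netprobe.
    "cluster-queries": {
        "enabled": True,
        "view": "clusterQueries",
        "host": "diag-host",
        "port": 7037,
        "sampler": "Cluster Diagnostics",
    },
}

print(effective_settings(metrics, "cluster-nodes"))
print(effective_settings(metrics, "cluster-queries"))
```

In this model cluster-nodes inherits all four shared values, while cluster-queries keeps only the shared entity.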

Include File

Warning

The included gateway file is intended as a guide. The contents should be changed to suit the needs of each installation.

The gateway include file can be found in resources/self-monitoring-gateway-include.xml. It contains:

  • A sampler called Cluster Monitoring Sampler
  • The sampler contains a computed view called Diagnostics. This is used by the compute engine rules to check the state of the cluster.

In addition to this the include file has several rules:

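+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Rule name                           | View              | Description                                                                         |
+=====================================+===================+=====================================================================================+
| Cluster Nodes Status                | Cluster Nodes     | Checks the member status of each node                                               |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Cluster Nodes CPU Utilisation       | Cluster Nodes     | Checks the overall CPU utilisation on the node box. Adjust the values as required   |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Cluster Nodes Memory Usage          | Cluster Nodes     | Checks the used JVM heap memory. Adjust the values as required                      |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Cluster Connected Gateway Status    | Assigned gateways | Checks the connection status of the node to its assigned gateways                   |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Cluster Integrity                   | Cluster Nodes     | Checks that the number of nodes is the same as the number of cluster node dataviews |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Stale cluster monitoring views      | All               | Checks the last update time of each view. Adjust the times as required              |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+
| Cluster integrity - number of nodes | NA                | Keeps track of the number of expected nodes                                         |
+-------------------------------------+-------------------+-------------------------------------------------------------------------------------+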

Monitoring Views

Cluster Nodes

This view is obtained from the cluster-nodes metric. Each node lists the nodes that are currently part of the cluster, together with some resource usage information and the configured gateways count. Normally each node publishes the same information in this view, but if the cluster disintegrates, each node can have a different view of the cluster. Such situations should be detected by the self-monitoring rules provided. See Include File for details on where to find the default rules.

Example view:

+---------------------+----------------+------+-----------------+------------+-----------+--------------------+--------+
| nodes               | host           | port | machineCpuUsage | usedMemory | maxMemory | gatewaysConfigured | status |
+=====================+================+======+=================+============+===========+====================+========+
| 192-168-10-146-2551 | 192.168.10.146 | 2551 | 0.36%           | 4.47%      | 981 MB    | 1                  | Up     |
+---------------------+----------------+------+-----------------+------------+-----------+--------------------+--------+
| 192-168-10-146-2552 | 192.168.10.146 | 2552 | 2.62%           | 2.07%      | 800 MB    | 0                  | Up     |
+---------------------+----------------+------+-----------------+------------+-----------+--------------------+--------+

Fields description:

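+--------------------+------------------------------------------------------------------------------+
| Column name        | Description                                                                  |
+====================+==============================================================================+
| nodes              | Unique row name (hostname + port)                                            |
+--------------------+------------------------------------------------------------------------------+
| host               | Node hostname                                                                |
+--------------------+------------------------------------------------------------------------------+
| port               | Node port                                                                    |
+--------------------+------------------------------------------------------------------------------+
| machineCpuUsage    | CPU utilisation percentage, sum of User + Sys + Nice + Wait,                 |
|                    | weights based on remaining CPU capacity: 1 - utilisation                     |
+--------------------+------------------------------------------------------------------------------+
| usedMemory         | Used JVM heap memory                                                         |
+--------------------+------------------------------------------------------------------------------+
| maxMemory          | Maximum JVM heap memory                                                      |
+--------------------+------------------------------------------------------------------------------+
| gatewaysConfigured | Number of gateways from settings.conf to which this node will try to connect |
+--------------------+------------------------------------------------------------------------------+
| status             | Node status in the cluster (e.g. Joining, Up, Down, Removed)                 |
+--------------------+------------------------------------------------------------------------------+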
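
As an illustration of the disintegration case described for this view, the following Python sketch compares the membership that each node publishes. It does not reflect how the provided gateway rules work internally; the function name and the input shape (a mapping from the publishing node to the set of row names it reports) are assumptions made for the example.

```python
def views_agree(views: dict[str, set[str]]) -> bool:
    """Return True when every node reports the same set of member rows."""
    distinct = {frozenset(members) for members in views.values()}
    return len(distinct) <= 1


# Both nodes see the same two members: the cluster is intact.
healthy = {
    "192-168-10-146-2551": {"192-168-10-146-2551", "192-168-10-146-2552"},
    "192-168-10-146-2552": {"192-168-10-146-2551", "192-168-10-146-2552"},
}

# Each node only sees itself: the views have diverged.
split = {
    "192-168-10-146-2551": {"192-168-10-146-2551"},
    "192-168-10-146-2552": {"192-168-10-146-2552"},
}

print(views_agree(healthy), views_agree(split))
```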

Connected Clients

This view is obtained from the node-clients metric. Since Open Access clients can connect to any node in the cluster, there is a separate view for each cluster node.

Example view:

+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
|                   uniqName                   |      host      |  port |      status      | user |        client       | version |   roles   |
+==============================================+================+=======+==================+======+=====================+=========+===========+
| aclite-1-1-192-168-220-81-58724              | 192.168.220.81 | 58724 | Logged in        | user | aclite              | 1-1     | ROLE_USER |
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
| OpenAccessApiClient-1-1-192-168-10-146-40741 | 192.168.10.146 | 40741 | Authenticating   | user | OpenAccessApiClient | 1-1     |           |
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+
| OpenAccessApiClient-1-1-192-168-10-146-56178 | 192.168.10.146 | 56178 | Unable to log in |      |                     |         |           |
+----------------------------------------------+----------------+-------+------------------+------+---------------------+---------+-----------+

Fields description:

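+-------------+-----------------------------------------------------------------------------------+
| Column name | Description                                                                       |
+=============+===================================================================================+
| uniqName    | Unique name of connected client (format: client type + version + hostname + port) |
+-------------+-----------------------------------------------------------------------------------+
| host        | Client hostname                                                                   |
+-------------+-----------------------------------------------------------------------------------+
| port        | Client port                                                                       |
+-------------+-----------------------------------------------------------------------------------+
| status      | Authentication status                                                             |
+-------------+-----------------------------------------------------------------------------------+
| user        | Client username (a username will be displayed even if authentication is disabled  |
|             | but a username was specified when connecting to the node)                         |
+-------------+-----------------------------------------------------------------------------------+
| client      | Client type                                                                       |
+-------------+-----------------------------------------------------------------------------------+
| version     | Version of Open Access client library                                             |
+-------------+-----------------------------------------------------------------------------------+
| roles       | Roles that are configured for the logged in user                                  |
|             | (only displayed when authentication is enabled)                                   |
+-------------+-----------------------------------------------------------------------------------+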
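
The uniqName format can be reproduced as follows. This Python sketch infers the separator from the example view above (hyphens between the parts, with the dots in the host replaced by hyphens); the function name is hypothetical.

```python
def client_uniq_name(client: str, version: str, host: str, port: int) -> str:
    """Join client type, version, host and port in the style of the
    uniqName column, replacing dots in the host with hyphens."""
    return f"{client}-{version}-{host.replace('.', '-')}-{port}"


print(client_uniq_name("aclite", "1-1", "192.168.220.81", 58724))
```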

Assigned gateways

Note: The status field changed in 2.0, and the state field was added.

This view is obtained from the node-gateways metric. Example view:

+---------------------+--------------+----------------+-------------+----------------+---------------+------------+--------+
|       uniqName      |     name     |  primaryHost   | primaryPort | secondaryHost  | secondaryPort |   status   | state  |
+=====================+==============+================+=============+================+===============+============+========+
| 192-168-10-146-7039 | Demo Gateway | 192.168.10.146 |        7039 | 192.168.10.147 |          7039 | Connected  | Active |
+---------------------+--------------+----------------+-------------+----------------+---------------+------------+--------+
| 192-168-10-146-7040 |              | 192.168.10.146 |        7040 |                |               | Assigned   |        |
+---------------------+--------------+----------------+-------------+----------------+---------------+------------+--------+

Fields description:

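+---------------+-------------------------------------------------------------------------------------+
| Column name   | Description                                                                         |
+===============+=====================================================================================+
| uniqName      | Unique name of configured gateway                                                   |
+---------------+-------------------------------------------------------------------------------------+
| name          | Configured gateway name (available after connecting)                                |
+---------------+-------------------------------------------------------------------------------------+
| primaryHost   | Gateway primary host (from configuration)                                           |
+---------------+-------------------------------------------------------------------------------------+
| primaryPort   | Gateway primary port (from configuration)                                           |
+---------------+-------------------------------------------------------------------------------------+
| secondaryHost | Gateway secondary host (available after connecting)                                 |
+---------------+-------------------------------------------------------------------------------------+
| secondaryPort | Gateway secondary port (available after connecting)                                 |
+---------------+-------------------------------------------------------------------------------------+
| status        | • Assigned - The gateway has been assigned to this node in the cluster              |
|               | • Connected - Gateway is connected (name and secondary host/port will be populated) |
|               | • Error - Node is unable to communicate with gateway                                |
+---------------+-------------------------------------------------------------------------------------+
| state         | • If status is ‘Assigned’, this field will be blank                                 |
|               | • If status is ‘Connected’, can be Active, Inactive, Pending, Standby,              |
|               |   InactiveLicensePending, InactiveLicenseDenied, Error                              |
|               | • If status is ‘Disconnected’, can be NoEMFComponent, Disconnected, Unreachable,    |
|               |   NotTrusted, WrongComponentType, WrongVersion, GatewayIdConflict                   |
+---------------+-------------------------------------------------------------------------------------+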
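
The relationship between status and state can be written down as a lookup table. The following Python sketch copies the values from the descriptions above; the function name is hypothetical, and treating a status with no documented state values (such as Error) as unconstrained is an assumption of the example.

```python
# State values documented for each status; an empty string means blank.
ALLOWED_STATES = {
    "Assigned": {""},
    "Connected": {
        "Active", "Inactive", "Pending", "Standby",
        "InactiveLicensePending", "InactiveLicenseDenied", "Error",
    },
    "Disconnected": {
        "NoEMFComponent", "Disconnected", "Unreachable", "NotTrusted",
        "WrongComponentType", "WrongVersion", "GatewayIdConflict",
    },
}


def is_consistent(status: str, state: str) -> bool:
    """Check a row's state against the values documented for its status.
    A status without documented state values is not constrained."""
    allowed = ALLOWED_STATES.get(status)
    return True if allowed is None else state in allowed


print(is_consistent("Connected", "Active"))
print(is_consistent("Assigned", "Active"))
```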

Note: The following views are useful for diagnostics and should only be enabled on a ‘need to have’ basis:

Running queries

This view is obtained by enabling the cluster-queries metric. The source column includes client hostname/IP address, port and unique query number.

Example view:

+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|              source              | queryType |                                                                                         path                                                                                        |
+==================================+===========+=====================================================================================================================================================================================+
| ClusterSystem-1990008510         | DataSet   | //gateway                                                                                                                                                                           |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578--1940838693 | DataSet   | //dataview                                                                                                                                                                          |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578--606738513  | DataSet   | //managedEntity                                                                                                                                                                     |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578--1775872277 | DataSet   | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]                                                                                |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578--562344160  | DataSet   | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]/sampler                                                                        |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578--79799599   | DataSet   | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]/sampler[(@name="API Sampler")][(@type="")]//dataview                           |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578--1172427576 | DataSet   | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]                                                                                |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 192-168-220-81-57578-779710210   | DataView  | /geneos/gateway[(@name="Demo Gateway")]/directory/probe[(@name="Probe")]/managedEntity[(@name="VM")]/sampler[(@name="API Sampler")][(@type="")]/dataview[(@name="cluster-queries")] |
+----------------------------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Fields description:

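+-------------+---------------------------------------------------------------------------------------+
| Column name | Description                                                                           |
+=============+=======================================================================================+
| source      | Unique name of source that is publishing messages for this subscription to the client |
+-------------+---------------------------------------------------------------------------------------+
| queryType   | Type of the query (e.g. DataSet, DataView, etc.)                                      |
+-------------+---------------------------------------------------------------------------------------+
| path        | Query path                                                                            |
+-------------+---------------------------------------------------------------------------------------+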
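
For diagnostics it can help to split the source column into its parts. The following Python sketch is based only on the example rows above: it assumes that the last two hyphen-separated fields are the port and the query number, and that a double hyphen means the query number is negative. Sources that do not follow this pattern, such as ClusterSystem-1990008510, are returned as None. The names QuerySource and parse_source are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class QuerySource(NamedTuple):
    host: str          # client hostname/IP address as written in the view
    port: int          # client port
    query_number: int  # unique query number, which may be negative


# host, then "-port", then "-number" where the number may carry its own sign.
_SOURCE = re.compile(r"^(?P<host>.+)-(?P<port>\d+)-(?P<number>-?\d+)$")


def parse_source(source: str) -> Optional[QuerySource]:
    """Split a source value into host, port and query number, or return
    None when the value does not match the assumed pattern."""
    match = _SOURCE.match(source)
    if match is None:
        return None
    return QuerySource(match["host"], int(match["port"]), int(match["number"]))


print(parse_source("192-168-220-81-57578--1940838693"))
print(parse_source("ClusterSystem-1990008510"))
```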

ORB queue stats

This view is obtained by enabling the node-orb-queue-stats metric. An example view:

+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
|             name             |         typeT          | connId | bdoExportId | numMsgs | dataSize | addRate | queueProcessingTime1Min | percentProcTime1Min | oldestMessageAge |
+==============================+========================+========+=============+=========+==========+=========+=========================+=====================+==================+
| from-comms-queue             | QT_InboundSystemQueue  |      0 |          -1 |       0 |        0 |      12 |                  257869 |  1.1885600410436674 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| to-comms-queue               | QT_OutboundSystemQueue |      0 |          -1 |       0 |        0 |       5 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-0-directory              | QT_BdoQueue            |      2 |           0 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-1-distributedmetadata    | QT_BdoQueue            |      2 |           1 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-4-gatewaycontrol         | QT_BdoQueue            |      2 |           4 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-5-distributedcommandlist | QT_BdoQueue            |      2 |           5 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-7-distributedkbalist     | QT_BdoQueue            |      2 |           7 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-8-dataview1              | QT_BdoQueue            |      2 |           8 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-12-dataview1             | QT_BdoQueue            |      2 |          12 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-13-dataview1             | QT_BdoQueue            |      2 |          13 |       0 |        0 |       0 |                       0 |                 0.0 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-14-dataview1             | QT_BdoQueue            |      2 |          14 |       0 |        0 |      11 |                11713510 |    53.9894672347797 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-15-dataview1             | QT_BdoQueue            |      2 |          15 |       0 |        0 |      22 |                 5939016 |   27.37388790711174 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+
| bdo-16-dataview1             | QT_BdoQueue            |      2 |          16 |       0 |        0 |      10 |                 3785522 |   17.44808481706489 |              0.0 |
+------------------------------+------------------------+--------+-------------+---------+----------+---------+-------------------------+---------------------+------------------+

Fields description: