Terracotta Universal Messaging Solution
Introduction
As technology evolves, web and mobile platforms are becoming increasingly essential to clients and businesses. The Universal Messaging Solution enhances productivity and provides real-time alerts to support staff. Although the status of Terracotta Universal Messaging can be viewed through the supplied Enterprise Manager tool, it provides only a static view of the Cluster, with no automated alerts and no integration with other monitoring tools. As more global FX e-platforms and banks rely on Terracotta Universal Messaging for streaming, this plug-in gives them more visibility and lets them deal with issues faster than before.
Glossary
- Cluster - a group of at least three Universal Messaging Realms that work together to supply a messaging service in a fault-tolerant, load-balanced way. If a Realm fails for whatever reason, the other Realms in the cluster continue to work as if it had not failed. To achieve this, one Realm is identified as the Master Node, which keeps track of the status of the other Realms. If the master Realm fails, the other Realms vote to select a new master Realm.
- Channel - a generic name for a Topic or Queue, as defined by the JMS standard.
- Connection - a physical connection over which a message is sent.
- Realm - identifies an individual instance of a Universal Messaging Server.
- RNAME - the key to all information in the Universal Messaging API. An RNAME uniquely identifies a Universal Messaging instance, or Realm. It has three parts and takes the form <protocol>://<hostname>:<port>, where:
  - <protocol> - one of the four native communication protocol identifiers: nsp (socket), nhp (HTTP), nsps (SSL), or nhps (HTTPS).
  - <hostname> - the hostname or IP address of the machine on which the Universal Messaging Realm is running.
  - <port> - the TCP port on that host to which the Universal Messaging Realm is bound, using the same wire protocol.
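The three-part RNAME format can be illustrated with a small parser. This is a minimal sketch, not part of the Universal Messaging API: the class name RnameParts is hypothetical, and the only validation it performs is what the glossary describes (one of the four protocol identifiers, a hostname, and a TCP port).

```java
import java.util.Set;

/** Hypothetical helper that splits an RNAME of the form <protocol>://<hostname>:<port>. */
final class RnameParts {
    // The four native protocol identifiers listed in the glossary.
    static final Set<String> PROTOCOLS = Set.of("nsp", "nhp", "nsps", "nhps");

    final String protocol;
    final String hostname;
    final int port;

    private RnameParts(String protocol, String hostname, int port) {
        this.protocol = protocol;
        this.hostname = hostname;
        this.port = port;
    }

    /** Parses an RNAME; throws IllegalArgumentException if any part is missing or invalid. */
    static RnameParts parse(String rname) {
        int schemeEnd = rname.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Missing '://' in " + rname);
        }
        String protocol = rname.substring(0, schemeEnd);
        if (!PROTOCOLS.contains(protocol)) {
            throw new IllegalArgumentException("Unknown protocol '" + protocol + "'");
        }
        String rest = rname.substring(schemeEnd + 3);
        int colon = rest.lastIndexOf(':'); // the last colon separates hostname from port
        if (colon <= 0 || colon == rest.length() - 1) {
            throw new IllegalArgumentException("Expected <hostname>:<port> in " + rname);
        }
        String hostname = rest.substring(0, colon);
        int port;
        try {
            port = Integer.parseInt(rest.substring(colon + 1));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Port is not a number in " + rname, e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of TCP range: " + port);
        }
        return new RnameParts(protocol, hostname, port);
    }

    @Override
    public String toString() {
        return protocol + "://" + hostname + ":" + port;
    }
}
```

For example, RnameParts.parse("nsps://umhost:9443") yields protocol nsps, hostname umhost and port 9443 (the host and port are example values only).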
Technology
The Universal Messaging plug-in makes extensive use of the Terracotta Universal Messaging admin API to provide detailed, real-time statistics about the performance of Universal Messaging Clusters. Combining this information with Geneos FKM (File Keyword Monitoring) of the Universal Messaging log gives a comprehensive view of the health of your Universal Messaging Cluster. Rules can be set up to send the support team automated alerts, which frees up resources and allows issues to be resolved quickly.
Many people successfully use Universal Messaging as a middleware delivery mechanism for low-latency applications; however, from a monitoring point of view the technology is often a blind spot. To close that gap, they need to gather more sophisticated metrics from Terracotta Universal Messaging. The following sections briefly describe the views that are available.
Prerequisites
- Instrumentation XML-RPC API.
- Universal Messaging Solution package with dependent libs (these are included in the lib subdirectory).
- Universal Messaging plug-in licence: UMMonitor.lic. Please contact ITRS Support for a trial licence.
Installation
Sampler
Set up a sampler as an API plug-in.
- Set the name to “Cluster”. If you change the name, make sure the new value is also used in the UMMonitor.properties file.
- Set the plug-in type to API.
<sampler name="Cluster">
  <plugin>
    <api></api>
  </plugin>
</sampler>
Netprobe
Select a netprobe, preferably on the machine where you will be running the plug-in code. In this example, it’s called “UM probe”.
Managed Entity
Set up a managed entity that joins the probe and the sampler.
- Set the name to “Universal Messaging”. If you change the name, make sure the new value is also used in the UMMonitor.properties file.
- Set Options to probe, and select the probe you set up in Netprobe.
- Reference the sampler you set up in Sampler.
<managedEntity name="Universal Messaging">
  <probe ref="UM probe"></probe>
  <sampler ref="Cluster"></sampler>
</managedEntity>
Universal Messaging Permissions
Ensure that the user has full access permissions. You can grant them in the Enterprise Manager or with the command-line tools:
./naddrealmacl <user> <server where plugin is running> full
Universal Messaging Plug-in with Dependent Libs
Create a directory on the server that runs the Netprobe you want to use to monitor Universal Messaging, and extract the contents of the tar file into it.
You should see the following:
UMMonitor/
    UMMonitor.jar
    lib/
        log4j-1.2.16.jar
        vim25.jar
        ws-commons-util-1.0.2.jar
        xmlrpc-client-3.1.3.jar
        xmlrpc-common-3.1.3.jar
        xmlrpc-server-3.1.3.jar
Plug-in Configuration
By default, the plug-in uses a config file called UMMonitor.properties. You can specify a different config file on the command line, or use Java properties on the command line to override properties in the config file. If no config file is found, running the plug-in for the first time generates a default one. Confirm that the UMMonitor.properties file has the correct settings, especially:
netprobeServer=localhost
netprobePort=7036
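The override behaviour described above can be pictured as two layers: values read from the properties file, then any Java system property with the same name replacing the file value. The sketch below shows that pattern with java.util.Properties. It is an assumption about how the layering works, not the plug-in's actual source; in particular, it assumes the system property names match the keys in UMMonitor.properties, and LayeredConfig is a hypothetical class name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: file-based settings overridden by same-named JVM system properties. */
final class LayeredConfig {
    static Properties load(Reader fileContents, Properties jvmProperties) {
        Properties config = new Properties();
        try {
            config.load(fileContents); // standard key=value properties syntax
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // stringPropertyNames() returns a snapshot, so updating config inside the loop is safe.
        for (String key : config.stringPropertyNames()) {
            String override = jvmProperties.getProperty(key);
            if (override != null) {
                config.setProperty(key, override);
            }
        }
        return config;
    }
}
```

In a running JVM the second argument would be System.getProperties(), so under this assumption java -DnetprobePort=7037 -jar UMMonitor.jar would replace the netprobePort value from the file (the -D option must come before -jar). Only keys already present in the file are overridden, which keeps unrelated JVM properties such as java.version out of the configuration.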
Logging Configuration
Logging is configured using log4j. By default, it logs to the console and to a log file (UMMonitor.log) that rolls twice a day (AM and PM).
Initialisation
To run the UMMonitor.jar file:
java -jar UMMonitor.jar
Views
Cluster Monitor
The Cluster Monitor dataview contains the state, master, online, and can be master columns.
Realm Monitor
Interface Monitor
Channel Monitor
Connection Monitor
Thread Monitor
Default Rules
Once Universal Messaging monitoring is in place, the real value of Geneos is that it lets you create alerts that identify a Realm in trouble, so your support teams can spot a problem before it affects your business service. To create these alerts, set up rules based on the information available in the dataviews.
Cluster Monitoring
These values are available from the Cluster Monitor view. Build rules that work for both the master Realm and each individual Realm, because the master's view of the cluster may differ from an individual Realm's view.
Rule Description | Alert Level |
---|---|
Highlight if fewer than 51% of the Realms are available; this means that quorum cannot be reached | Red |
Highlight if the Master Node's view differs from an individual node's view. For example, if the master loses its connection to one Realm, it shows that Realm as offline, even though the Realm may still be running and only the connection is broken | Amber |
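The two Cluster Monitor rules reduce to simple checks. The sketch below only illustrates the logic; in Geneos these would normally be configured as Gateway rules. The class name ClusterRules, the Severity enum, and the representation of each Realm's view as a map from Realm name to an online flag are assumptions, not part of the plug-in or Geneos. The quorum check uses integer arithmetic so the 51% threshold is not affected by floating-point rounding.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the two Cluster Monitor rules. */
final class ClusterRules {
    enum Severity { OK, AMBER, RED }

    /** Red when fewer than 51% of the Realms are available, i.e. quorum cannot be reached. */
    static Severity quorum(int availableRealms, int totalRealms) {
        if (totalRealms <= 0 || availableRealms < 0 || availableRealms > totalRealms) {
            throw new IllegalArgumentException("Invalid realm counts");
        }
        // available / total < 0.51, rewritten with integers to avoid rounding errors
        return availableRealms * 100 < totalRealms * 51 ? Severity.RED : Severity.OK;
    }

    /** Realms whose online flag differs between the master's view and another Realm's view. */
    static Set<String> disagreements(Map<String, Boolean> masterView, Map<String, Boolean> realmView) {
        Set<String> realms = new TreeSet<>(masterView.keySet());
        realms.addAll(realmView.keySet());
        Set<String> differing = new TreeSet<>();
        for (String realm : realms) {
            // A Realm missing from one view (null) also counts as a disagreement.
            if (!Objects.equals(masterView.get(realm), realmView.get(realm))) {
                differing.add(realm);
            }
        }
        return differing;
    }

    /** Amber when the master and an individual Realm disagree about any Realm. */
    static Severity viewCheck(Map<String, Boolean> masterView, Map<String, Boolean> realmView) {
        return disagreements(masterView, realmView).isEmpty() ? Severity.OK : Severity.AMBER;
    }
}
```

With three Realms, one failure leaves 2 of 3 (about 67%) available and quorum(2, 3) returns OK; a second failure leaves 1 of 3 (about 33%) and quorum(1, 3) returns RED.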
Realm Monitoring
On each individual Realm, it is worth setting up monitoring for the following fields.
Rule Description | Alert Level |
---|---|
If getUsedMemory is greater than the Warning memory level (default 85%) | Amber |
If getUsedMemory is greater than the Error memory level (default 95%) | Red |
If getUsedMemory increases rapidly over three consecutive sample periods | Red |
Consumed per sec should not be 0 | Amber |
The number of messages Consumed should have a near-linear relationship with the Published value. You can use Breach Predictor to calculate this relationship. | Amber |
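The threshold rules in the Realm table can be expressed as a few small checks. This is a minimal sketch under several assumptions: the table does not say what getUsedMemory is measured against, so maxMemory here is assumed to be the Realm's maximum memory in the same unit; "three consecutive sample periods" is read as three consecutive increases (four samples); and minGrowth is a hypothetical tuning value, since "rapidly" is not defined. RealmRules is a hypothetical class name. The near-linear Consumed/Published relationship is left to Breach Predictor and is not modelled.

```java
/** Hypothetical sketch of the Realm Monitor memory and throughput rules. */
final class RealmRules {
    enum Severity { OK, AMBER, RED }

    static final double WARNING_PERCENT = 85.0; // default Warning memory level
    static final double ERROR_PERCENT = 95.0;   // default Error memory level

    /** Compares used memory with the Warning and Error levels; both values in the same unit. */
    static Severity memoryLevel(long usedMemory, long maxMemory) {
        if (maxMemory <= 0) {
            throw new IllegalArgumentException("maxMemory must be positive");
        }
        double percentUsed = 100.0 * usedMemory / maxMemory;
        if (percentUsed > ERROR_PERCENT) return Severity.RED;
        if (percentUsed > WARNING_PERCENT) return Severity.AMBER;
        return Severity.OK;
    }

    /**
     * Rapid-growth check: true when used memory grew by more than minGrowth in each of the
     * last three sample periods (four samples). minGrowth is a hypothetical tuning value.
     */
    static boolean growingRapidly(long[] usedMemorySamples, long minGrowth) {
        int n = usedMemorySamples.length;
        if (n < 4) return false;
        for (int i = n - 3; i < n; i++) {
            if (usedMemorySamples[i] - usedMemorySamples[i - 1] <= minGrowth) return false;
        }
        return true;
    }

    /** Amber when nothing is being consumed. */
    static Severity consumedRate(double consumedPerSec) {
        return consumedPerSec == 0 ? Severity.AMBER : Severity.OK;
    }
}
```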
Channel Monitoring
The following rules relate to the Channel views.
Rule Description | Alert Level |
---|---|
Consumed per sec should not be 0 | Amber |
The number of messages Consumed should have a near-linear relationship with the Published value. For example, if the Consumed value increases by 10%, the Published value should also increase. Try using Breach Predictor to calculate this relationship. | Amber |
If Published Rate or Consumed Rate increases sharply (by more than 10%) | Amber |
If getUsedSpace > 1GB, the storage may be persisting incorrectly. Universal Messaging supports 5 channel types: | Red |
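The numeric Channel rules can be sketched in the same way. This is an illustration under stated assumptions, not the plug-in's logic: getUsedSpace is assumed to be reported in bytes, 1GB is taken as 1024^3 bytes, the "over 10%" comparison is assumed to be between consecutive samples of a per-second rate, and ChannelRules is a hypothetical class name. The Consumed/Published relationship is again left to Breach Predictor.

```java
/** Hypothetical sketch of the Channel Monitor threshold rules. */
final class ChannelRules {
    enum Severity { OK, AMBER, RED }

    // Assumption: getUsedSpace is in bytes and 1GB means 1024^3 bytes.
    static final long ONE_GB = 1024L * 1024L * 1024L;

    /** True when a rate rose by more than 10% since the previous sample. */
    static boolean sharpIncrease(double previousRate, double currentRate) {
        if (previousRate <= 0) {
            return false; // no baseline to compare against
        }
        return (currentRate - previousRate) / previousRate > 0.10;
    }

    /** Applies the used-space, zero-consumption and rate-jump rules to one pair of samples. */
    static Severity evaluate(double previousPublished, double currentPublished,
                             double previousConsumed, double currentConsumed,
                             long usedSpaceBytes) {
        if (usedSpaceBytes > ONE_GB) {
            return Severity.RED; // storage may be persisting incorrectly
        }
        if (currentConsumed == 0
                || sharpIncrease(previousPublished, currentPublished)
                || sharpIncrease(previousConsumed, currentConsumed)) {
            return Severity.AMBER;
        }
        return Severity.OK;
    }
}
```

Checking Red before Amber means a channel that is both over the space limit and idle is reported at the more serious level.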
Connection Monitoring
The following rules relate to the Connection views.
Rule Description | Alert Level |
---|---|
Alert if the getQueueSize value is continually increasing | Amber |
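In practice, "always increasing" has to be judged over a finite number of samples. The sketch below keeps a sliding window of recent getQueueSize values and flags the connection when every value in the window is larger than the one before it. The window size and the class name QueueGrowthDetector are hypothetical; one call to record is assumed per sample period.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: flags a connection whose queue size rises in every recent sample. */
final class QueueGrowthDetector {
    private final int window;
    private final Deque<Long> recent = new ArrayDeque<>();

    QueueGrowthDetector(int window) {
        if (window < 2) {
            throw new IllegalArgumentException("window must be at least 2 samples");
        }
        this.window = window;
    }

    /** Records the latest queue size; returns true (Amber) if the last `window` values strictly increase. */
    boolean record(long queueSize) {
        recent.addLast(queueSize);
        if (recent.size() > window) {
            recent.removeFirst(); // keep only the most recent `window` samples
        }
        if (recent.size() < window) {
            return false; // not enough history yet
        }
        Long previous = null;
        for (long size : recent) {
            if (previous != null && size <= previous) {
                return false;
            }
            previous = size;
        }
        return true;
    }
}
```

A larger window reduces false alarms from short bursts at the cost of alerting later.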
Log Monitoring
Use the File Keyword Monitor (FKM) to examine the Universal Messaging log and raise alerts based on the following information. These are the suggested Red and Amber FKM keys to watch.
Grep | Description | Test | Alert Level |
---|---|---|---|
Disconnected\s+from
|
This means a Realm is disconnected and should be picked up by the Cluster Monitor. | Shut down realm. | Amber |
Inactive\s+drivers\s+bound\s+to
|
This occurs when the network stack does not correctly report a connection closure. | Block traffic using firewall. | Amber |
Driver\s+inactive\s+on\s+adapter
|
This occurs when the network stack does not correctly report a connection closure. | Block traffic using firewall. | Amber |
Fatal
|
This is a serious error. | This is hard to reproduce. A security error is easier to test: create a channel from one host, remove all permissions for the *@* ACL entry in the ACL panel, and run the npubchan example program against the channel from another host. A security log message is written. | Red |
Logged\s+Out\s+using\s+(nsp|nhp|nsps|nhps)\s+session\s+established\s+for\s[0-5]?[0-9]\s+Seconds
|
Look for the text “Logged Out using <protocol> session established for <x> Seconds”. If <x> is less than 60, alert. | Start up an application and shut it down before 60 seconds have elapsed. | Amber |
ThreadPool:<“+myName+”>(“+myIdleThreadCount+”)hasbeenactiveforover60seconds
|
Alert if a thread pool has been active for over 60 seconds. | This is hard to reproduce. | Red |
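Before adding the keys to FKM, it can help to test them offline against sample log lines. The sketch below does this with java.util.regex; check that the Geneos FKM regular-expression syntax treats \s and alternation the same way before relying on the results. The key names, the class name FkmKeyCheck and the sample log lines used with it are hypothetical. The Logged Out key here uses (nsp|nhp|nsps|nhps) to cover the four protocols in the glossary and [0-5]?[0-9] so that it matches only sessions shorter than 60 seconds. The ThreadPool row is omitted because the table shows the message template rather than a ready-made pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical offline check of the suggested FKM keys, written with java.util.regex. */
final class FkmKeyCheck {
    // Insertion order is kept so results are reported in table order.
    static final Map<String, Pattern> KEYS = new LinkedHashMap<>();

    static {
        KEYS.put("Disconnected", Pattern.compile("Disconnected\\s+from"));
        KEYS.put("InactiveDrivers", Pattern.compile("Inactive\\s+drivers\\s+bound\\s+to"));
        KEYS.put("DriverInactive", Pattern.compile("Driver\\s+inactive\\s+on\\s+adapter"));
        KEYS.put("Fatal", Pattern.compile("Fatal"));
        // Four protocols from the glossary; [0-5]?[0-9] matches 0 to 59 seconds only.
        KEYS.put("ShortSession", Pattern.compile("Logged\\s+Out\\s+using\\s+(nsp|nhp|nsps|nhps)"
                + "\\s+session\\s+established\\s+for\\s[0-5]?[0-9]\\s+Seconds"));
    }

    /** Names of every key whose pattern occurs anywhere in the line (find, not a full match). */
    static List<String> matchingKeys(String logLine) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> key : KEYS.entrySet()) {
            if (key.getValue().matcher(logLine).find()) {
                hits.add(key.getKey());
            }
        }
        return hits;
    }
}
```

For example, a line containing "Logged Out using nsps session established for 12 Seconds" matches only ShortSession, while the same text with 75 seconds matches nothing.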