Geneos

Hot Standby

Overview

Because the Gateway is responsible for consolidating all monitoring data for distribution, it introduces a single point of failure. To alleviate this, two Gateways can be run as a hot standby pair, so that if one Gateway fails the other remains in operation until the fault is rectified.

A Gateway can be in one of three roles:

  • Stand Alone.
  • Primary.
  • Secondary.

A single Gateway runs in Stand Alone mode. A hot standby pair consists of one primary and one secondary Gateway.

In hot standby mode, the primary Gateway is responsible for all Gateway operations, collecting and analysing data from Netprobes, and distributing the results to connected Active Consoles. The secondary Gateway remains in an idle state, ready to take over monitoring duties should the primary Gateway fail.

If the primary Gateway fails, the secondary Gateway connects to Netprobes and assumes the Gateway duties until the primary Gateway becomes operational again.

Operation

Hot standby failback

Under normal operation, restarting the primary Gateway results in the secondary Gateway releasing the Netprobes and the primary regaining control.

If you do not want this to happen automatically, both Gateways must be started with the -manual-failback option on the command line.

If the two Gateways are not started with the same option, the secondary Gateway cannot connect to the primary, and the primary runs stand alone until this is corrected.

In manual failback mode, when a secondary Gateway becomes active it does not relinquish control until the manual failback command is issued. This allows you to restart your Gateway and transfer control at a convenient time.
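The failback behaviour described above can be summarised as two small rules. The following Python sketch is only a model of the documented behaviour; the function names are illustrative and are not part of Geneos.

```python
def pair_can_form(primary_manual: bool, secondary_manual: bool) -> bool:
    """Both Gateways must be started with the same failback option.

    If the options differ, the secondary cannot connect and the
    primary runs stand alone.
    """
    return primary_manual == secondary_manual


def secondary_releases_control(manual_failback: bool,
                               failback_command_issued: bool) -> bool:
    """Decide whether an active secondary hands control back once the
    primary Gateway is running again."""
    if not manual_failback:
        # Default behaviour: control returns to the primary automatically.
        return True
    # Manual failback: wait until the failback command has been issued.
    return failback_command_issued
```

For example, `secondary_releases_control(True, False)` is false: in manual failback mode the secondary keeps control until the command is run.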

Note: A Gateway configured for manual failback does not allow connections from clients that do not support this feature. This includes old versions of the Active Console and OpenAccess. Active Console informs the user that the connection has been rejected with the following reason: "Connection to hot-standby Gateway rejected. Gateway is in manual failback mode and client does not support this."

Scheduling failback

The failback command (/Gateway:failback) can be scheduled like any other internal command. This is useful for scheduling an automatic failback outside business hours, should a secondary take over the role of primary at some point.

A suggested XPath for the target that would ensure that the command is only run on an active secondary in a manual-failback mode is shown below:

/geneos/gateway/directory[(param("HotStandbyRole")="Secondary")][(param("HotStandbyManualFailbackActive")="true")][(rparam("HotStandbyState")="Active")]
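The target above combines three conditions that must all hold. As a reading aid, the same logic can be sketched in Python, assuming the parameter values are available in a plain dictionary (the dictionary and the function name are hypothetical and are not a Geneos API):

```python
def should_run_failback(params: dict) -> bool:
    """Mirror the three predicates of the suggested target XPath."""
    return (
        params.get("HotStandbyRole") == "Secondary"
        and params.get("HotStandbyManualFailbackActive") == "true"
        and params.get("HotStandbyState") == "Active"
    )
```

A primary Gateway, an inactive secondary, or a pair not running in manual failback mode fails at least one of the checks, so the scheduled command would not be targeted at it.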

File synchronisation

Hot standby Gateways can also be configured with a set of external files to keep synchronised. These files are transferred from the active (typically primary) Gateway if the file sizes are different between Gateways, or if the active Gateway detects that the file has been modified.

External files are synchronised at two points:

  • When a setup change is applied.
  • If the Gateway detects a change while checking external files.

The frequency at which external files are checked can be controlled using the hotStandby > syncFiles > externalFilesCheckInterval setting.

Included setup files (see File merging) can also optionally be synchronized between Gateways, although these files are synchronized only when a setup change is applied, and not on file modification or size changes.
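The decision of whether an external file needs to be transferred can be sketched as follows. This is a minimal Python model of the two documented triggers (a size difference, or a modification since the last check); the `FileInfo` type and `needs_sync` function are illustrative names, not the Gateway's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    size: int      # file size in bytes on the active Gateway
    mtime: float   # last modification time, seconds since the epoch


def needs_sync(active: FileInfo, standby_size: int, last_check: float) -> bool:
    """Return True if the file should be sent from the active Gateway."""
    size_differs = active.size != standby_size
    modified_since_check = active.mtime > last_check
    return size_differs or modified_since_check
```

Note that an edit which leaves the file size unchanged is still picked up, because the modification time is checked as well.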

Hot Standby with secure ports

To use secure ports with Gateways, configure the Gateway with a secure listen port in operatingEnvironment > listenPorts. The port configured for both the primary and secondary Gateway must match this secure port. If you are using different secure ports on the primary and secondary Gateways, see Using different ports for primary and secondary Gateways.

Note: It is possible to specify both secure and insecure ports with Hot Standby. However, if you do specify both secure and insecure ports, you cannot specify different ports for primary and secondary Gateways.

Using different ports for primary and secondary Gateways

The Gateway setup is synchronised between primary and secondary Gateways. Therefore, configuring the settings with a different value depending upon the Gateway role must be done externally to the Gateway setup file.

This also applies to the listenPort setting for the Gateway. Therefore, to use different ports for primary and secondary Gateways, you must set the listen port using the -port command line option. The -port command line option overrides the secure listen port if set, otherwise it overrides the insecure port. For example:

To start the primary Gateway:

gateway2.linux -setup hotStandbyExample.xml -port 22040

To start the secondary Gateway:

gateway2.linux -setup hotStandbyExample.xml -port 22045

[Figure: the Gateway hotStandby section as displayed in the Gateway Setup Editor (GSE).]

Note: You can only override one port setting using the command line option -port. Therefore, you cannot specify different ports for primary and secondary Gateways if you also need to configure both secure and insecure ports.
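The override rule for -port can be modelled in a few lines of Python. This is a sketch of the rule as stated in this section, not Gateway code; the function name is illustrative and the port numbers 7038 and 7039 in the usage note are arbitrary example values.

```python
from typing import Optional, Tuple


def effective_ports(secure: Optional[int],
                    insecure: Optional[int],
                    port_option: Optional[int]
                    ) -> Tuple[Optional[int], Optional[int]]:
    """Return the (secure, insecure) listen ports after applying -port.

    secure / insecure: ports from the setup file (None if not set).
    port_option: value passed with -port (None if the option is not used).
    """
    if port_option is None:
        return secure, insecure
    if secure is not None:
        # -port overrides the secure listen port when one is configured.
        return port_option, insecure
    # Otherwise it overrides the insecure port.
    return secure, port_option
```

With both ports configured, `effective_ports(7038, 7039, 22040)` changes only the secure port, which is why the two Gateways cannot use different ports when both a secure and an insecure port are needed.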

Paired operation

It is recommended that primary and secondary Gateways run on different hosts. That way, failure of the primary host (or of its network connectivity) does not affect the secondary Gateway, and monitoring can continue.

Once the hot-standby configuration is completed, the setup file should be copied from the primary to the secondary host. This setup should then be applied to the primary Gateway first, and then to the secondary Gateway.

The primary Gateway must be started first because the secondary Gateway attempts to connect to it to synchronise operation. This connection is maintained while both Gateways are in operation. Should the connection fail, the secondary Gateway becomes active after a brief time and takes over monitoring duties from the failed primary Gateway.

While the secondary Gateway is active, you should not notice a difference in services between Gateways.

In particular it is still possible to alter the Gateway setup. When the primary Gateway is running again the synchronisation between Gateways will ensure that the primary Gateway receives an up-to-date copy of the setup file before it takes up monitoring operations again.

When the primary Gateway is reactivated, monitoring is paused for a short period while the secondary Gateway releases control of the Netprobes and the primary Gateway acquires them again. You can avoid this pause by altering the Gateway setup to switch the (active) secondary Gateway to the primary role before the primary Gateway is restarted. When performing this switch, ensure that the setup for the old primary Gateway is also edited so that it runs as a secondary Gateway.

A parameter, HotStandbyRole, and a runtime parameter, HotStandbyState, are available on the directory object. These can be used to alert when hot standby failovers occur. The following rule is an example of how they could be used.

Target:

/geneos/gateway/directory

Rule:

if param "HotStandbyRole" = "Secondary" and rparam "HotStandbyState" = "Active" then
    run "myFailoverAction"
endif

Parameter HotStandbyRole and runtime parameter HotStandbyState can be described as follows:

If HotStandbyRole = Primary then HotStandbyState = Active
If HotStandbyRole = Secondary then HotStandbyState = Active or Inactive
If HotStandbyRole = Stand Alone then HotStandbyState = Active
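The relationship between the two parameters can be written as a lookup table. The Python sketch below restates the three lines above and the failover condition used in the example rule; the names `ALLOWED_STATES`, `is_valid` and `failover_in_progress` are illustrative only.

```python
# Possible HotStandbyState values for each HotStandbyRole.
ALLOWED_STATES = {
    "Primary": {"Active"},
    "Secondary": {"Active", "Inactive"},
    "Stand Alone": {"Active"},
}


def is_valid(role: str, state: str) -> bool:
    """Check that a role/state combination is one of those listed above."""
    return state in ALLOWED_STATES.get(role, set())


def failover_in_progress(role: str, state: str) -> bool:
    """A secondary Gateway that is Active has taken over from the primary."""
    return role == "Secondary" and state == "Active"
```

Only the secondary role has two possible states, which is why the example rule tests both the role and the state.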

Configuration

Basic configuration for a Hot Standby pair of Gateways consists of specifying the hostname and port for each Gateway.

Hostnames can be specified as the machine hostname (obtained by running the hostname command), as an IP address, or as a fully qualified domain name (e.g. "somehost.somedomain.com").

When Hot Standby Gateway setup is applied, a Gateway determines whether it is running as the primary or secondary host. It does this by comparing first the primary then the secondary hostname with the identity of the host the Gateway is running on. The first match found determines the Gateway role; if no settings match the Gateway will run as Stand Alone.

Each comparison is performed in the following order:

  • First, the Gateway checks if a configured hostname is an IP address (e.g. 192.168.1.135); this is then compared with the IP address(es) of the host.
  • Second, if a configured hostname is a fully qualified domain name (e.g. "somehost.somedomain.com"), the whole name as given and then the first part of the configured hostname is compared with the Gateway host name. For example, "somehost.somedomain.com" matches if the Gateway host name is also "somehost.somedomain.com", or "somehost" but not if the Gateway host name is "somehost.anotherdomain.com".
  • Otherwise, the Gateway compares the configured hostname with the full Gateway host name and then with the first part of its name. Therefore, if the configured host name is "somehost", it matches if the Gateway host name is "somehost.somedomain.com" or "somehost", as well as "somehost.anotherdomain.com".
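The comparison order above, together with the role selection described before it, can be sketched in Python. This is an illustration of the documented rules under simplifying assumptions: names are compared as exact strings, and the host's own name and IP addresses are passed in as arguments rather than discovered. The function names are hypothetical.

```python
import ipaddress
from typing import Iterable


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def hostname_matches(configured: str, host_name: str,
                     host_ips: Iterable[str]) -> bool:
    """Compare one configured hostname with the Gateway host's identity."""
    if _is_ip(configured):
        # 1. IP address: compare with the IP address(es) of the host.
        wanted = ipaddress.ip_address(configured)
        return any(ipaddress.ip_address(ip) == wanted for ip in host_ips)
    if "." in configured:
        # 2. Fully qualified name: the whole name, then its first part,
        #    is compared with the Gateway host name.
        return host_name in (configured, configured.split(".")[0])
    # 3. Plain name: compare with the full Gateway host name,
    #    then with the first part of that name.
    return configured in (host_name, host_name.split(".")[0])


def determine_role(primary: str, secondary: str, host_name: str,
                   host_ips: Iterable[str]) -> str:
    """Primary is checked first, then secondary; no match means Stand Alone."""
    host_ips = list(host_ips)
    if hostname_matches(primary, host_name, host_ips):
        return "Primary"
    if hostname_matches(secondary, host_name, host_ips):
        return "Secondary"
    return "Stand Alone"
```

Because the primary hostname is tested first, a host that matches both configured names runs as the primary.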

A secondary Gateway attempts to connect to the primary Gateway on the configured hostname and port. If the hostname is not specified as an IP address, Gateway attempts to resolve this using a forward DNS lookup.

When the primary Gateway receives a connection from another Gateway, it attempts to authenticate this Gateway by matching the incoming connection against the configured secondary Gateway. The way this is done is controlled by the hotStandby > primarySecurityCheck setting, which by default resolves the connecting IP to a hostname using a reverse DNS lookup (if required) and matches this against the secondary Gateway configuration.

Basic tab

These settings are found under the Basic tab.

hotStandby

Holds the primary and secondary Gateway settings.

hotStandby > primary

Contains settings for the primary Gateway in a hot-standby pair.

hotStandby > primary > hostname

The primary Gateway host, specified as either the machine hostname or an IP address.

The secondary Gateway uses this setting to connect to the primary Gateway, and so the hostname must also resolve to an IP address using a forward DNS lookup.

hotStandby > primary > port

The listen port of the primary Gateway. If the Gateway is listening on both a secure and an insecure port, this value must match the secure port the Gateway is listening on.

If the primary and secondary Gateways need to listen on different ports, one or both of the Gateways must be started with the command line -port option to override the port value set in operatingEnvironment > listenPorts. This value must match the port set on the command line if that is used. See Using different ports for primary and secondary Gateways.

hotStandby > secondary

Contains settings for the secondary Gateway in a hot standby pair.

hotStandby > secondary > hostname

The secondary Gateway host, specified either as the machine hostname or an IP address.

The primary Gateway checks any incoming Gateway connections against this setting, either comparing the IP address directly or by using a reverse DNS lookup to resolve the connecting host to a hostname for comparison.

hotStandby > secondary > port

The listen port of the secondary Gateway. If the Gateway is listening on both a secure and an insecure port, this value must match the secure port the Gateway is listening on.

If the primary and secondary Gateways need to listen on different ports, one or both of the Gateways must be started with the command line -port option to override the port value set in operatingEnvironment > listenPorts. This value must match the port set on the command line if that is used. See Using different ports for primary and secondary Gateways.

Advanced tab

These settings are found under the Advanced tab.

hotStandby > syncFiles

Hot standby File synchronisation is configured using this section.

hotStandby > syncFiles > setupIncludes

Boolean setting controlling whether included setup files are synchronised between Gateways.

If set to true, include files are synchronised.

However, this should only be done if there is a separate copy of the include files for both the primary and secondary Gateways in the hot standby pair.

It is recommended that you locate include files in a shared location, so that they can be shared between multiple Gateways (e.g. a global users file).

Mandatory: No
Default: false

hotStandby > syncFiles > disabledSetupIncludes

Boolean setting controlling whether disabled included setup files are synchronised between Gateways.

If set to true, disabled include files are synchronised.

Mandatory: No
Default: false

hotStandby > syncFiles > externalFilesCheckInterval

Defines how frequently the Gateway checks the external files to determine whether they need to be re-synchronised. A file needs to be re-synchronised if its size is different between Gateways, or if its modification time has been updated since the last check.

The check interval is specified in seconds. After this many seconds since the previous check was made, Gateway checks the external files again. External files are also checked when a setup change is applied.

Mandatory: No
Default: 300 (5 minutes)

hotStandby > syncFiles > externalFiles

A list of external files can be defined here, which the Gateway attempts to keep in sync across the hot standby pair.

Mandatory: No

hotStandby > syncFiles > externalFiles > externalFile

Defines a single external file, which the Gateway attempts to keep in sync between Gateways in a hot standby pair.

hotStandby > syncFiles > externalFiles > externalFile > primaryPath

Specifies the path of the external file on the primary Gateway. The Gateway searches here for the file when performing synchronisation.

This location should be both readable and writeable by the Gateway.

Mandatory: Yes

hotStandby > syncFiles > externalFiles > externalFile > secondaryPath

Specifies the path of the external file on the secondary Gateway. This optional setting allows you to configure external files that are present on both Gateways, but in different locations.

If a secondary path is not specified, the path is assumed to be the same as the configured primary path.
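The defaulting rule for secondaryPath can be expressed as a small helper. This Python sketch is illustrative only; `path_for_role` is a hypothetical name and the role strings follow the HotStandbyRole values used earlier in this document.

```python
from typing import Optional


def path_for_role(role: str, primary_path: str,
                  secondary_path: Optional[str] = None) -> str:
    """Return the location of an external file on the given Gateway."""
    if role == "Secondary" and secondary_path:
        return secondary_path
    # The primary always uses primaryPath; the secondary falls back to it
    # when no secondaryPath is configured.
    return primary_path
```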

hotStandby > primarySecurityCheck

This setting controls how the primary Gateway verifies that a connection is from the configured secondary Gateway when using hot standby functionality.

The possible values and their effects are:

  • reverseDns: The IP address of the connecting secondary is resolved to a hostname using a reverse DNS lookup. This hostname is then compared against the configured secondary.
  • forwardDns: If the configured secondary Gateway is specified as a hostname, then this is resolved to an IP address. This address is then compared with the address of the connecting secondary Gateway.
  • disabled: Security checks are disabled. A connection from a Gateway is assumed to be from the configured secondary Gateway.

Mandatory: No
Default: reverseDns
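The three modes can be compared side by side in a short Python sketch. It is a model of the behaviour described for this setting, not the Gateway's implementation: the function name is hypothetical, the resolver functions are passed in so that the logic can be exercised without a DNS server, and the comparison of resolved names is simplified to an exact string match.

```python
import ipaddress
import socket
from typing import Callable


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to a hostname (reverse DNS)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> str:
    """Resolve a hostname to an IPv4 address (forward DNS)."""
    return socket.gethostbyname(hostname)


def is_configured_secondary(mode: str, connecting_ip: str,
                            configured: str,
                            reverse: Callable[[str], str] = reverse_lookup,
                            forward: Callable[[str], str] = forward_lookup
                            ) -> bool:
    """Decide whether a connecting Gateway is accepted as the secondary."""
    if mode == "disabled":
        return True  # no check is made
    if _is_ip(configured):
        # Configured as an IP address: compare directly, no lookup needed.
        return configured == connecting_ip
    if mode == "reverseDns":
        return reverse(connecting_ip) == configured
    if mode == "forwardDns":
        return forward(configured) == connecting_ip
    raise ValueError(f"unknown primarySecurityCheck value: {mode}")
```

The reverseDns mode depends on a reverse (PTR) record existing for the secondary host, whereas forwardDns only needs the configured name to resolve.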

hotStandby > primaryInactiveTimeout

Time in seconds a primary Gateway waits at start-up for a secondary Gateway to connect to it.

Mandatory: No
Default: 30

hotStandby > primaryActivePendingTimeout

Time in seconds a primary Gateway waits for the secondary Gateway to release any Netprobes it controls, after the secondary Gateway has connected.

Mandatory: No
Default: 30

hotStandby > secondaryActivePendingTimeout

Time in seconds a secondary Gateway waits to reconnect to the primary Gateway after the connection between them is lost.

Mandatory: No
Default: 30