Operating Environment
Overview
The Gateway Operating environment top-level section contains settings that affect the Gateway as a whole, and do not belong to any other section. Almost all of the settings in this section are optional.
Operation
The only mandatory setting in this section is operatingEnvironment > gatewayName. This name is displayed to all users that connect to the Gateway, and is also used in name lookup and database logging. It is strongly recommended that this name be unique for each Gateway on a particular site.
The Gateway listen ports can also be set in the operating environment (operatingEnvironment > listenPorts). The listen ports are used by components connecting to gateway such as Active Console or Webslinger to request monitoring data for display to users.
Note: This does not include Netprobe connections, as configuration for these is contained within Probes. If not configured, the gateway listen port defaults to port 7039 for the insecure channel and 7038 for the secure channel.
Data quality options
Settings in Operating environment control the data quality algorithm that Gateways use to maintain a consistent level of service under excessive load. This algorithm runs throughout the lifetime of a gateway (unless operatingEnvironment > dataQueues > disableChecks is set) and operates as follows:
- A gateway monitors dataview updates to determine if the oldest pending update becomes stale (as controlled by operatingEnvironment > dataQueues > maxDataAgeMs). If this occurs, a probe connection is dropped to reduce the gateway's load and restore timely data processing.
- The gateway determines which connection to drop based on the CPU usage spent processing the incoming data over the previous minute. The connection with the highest load is then suspended for a period (see operatingEnvironment > dataQueues > connectionSuspensionDuration) before the gateway reconnects.
- Once a connection has been suspended, no further suspensions will occur until a grace period (see operatingEnvironment > dataQueues > suspendGracePeriod) has elapsed, allowing time to evaluate the effect of the suspension on the quality of the data.
- Setup changes represent a special case where the data age metrics may spike in the gateway. During setup application no incoming data from netprobes is processed, leading to a backlog of updates to be applied. To avoid unnecessary netprobe suspensions, the algorithm is disabled during setup changes and for operatingEnvironment > dataQueues > setupGracePeriod seconds afterwards.
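The steps above can be sketched in Python as a single decision function. This is an illustrative model only, not the Gateway's implementation: the names `DataQueueSettings` and `pick_connection_to_suspend` are hypothetical, the load figures are assumed to be supplied by the caller, and the comparison against maxDataAgeMs follows the statement in the settings reference that the limit is inclusive.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataQueueSettings:
    # Hypothetical container; values mirror the documented dataQueues defaults.
    max_data_age_ms: int = 77_000
    connection_suspension_duration_s: int = 300
    suspend_grace_period_s: int = 60
    setup_grace_period_s: int = 60
    disable_checks: bool = False


def pick_connection_to_suspend(
    settings: DataQueueSettings,
    now_s: float,
    oldest_pending_age_ms: int,
    load_by_connection: Dict[str, float],
    last_suspension_s: Optional[float] = None,
    last_setup_change_s: Optional[float] = None,
    setup_in_progress: bool = False,
) -> Optional[str]:
    """Return the connection to suspend, or None if no suspension is due."""
    if settings.disable_checks or setup_in_progress:
        return None
    # The algorithm stays off for setupGracePeriod seconds after a setup change.
    if (last_setup_change_s is not None
            and now_s - last_setup_change_s < settings.setup_grace_period_s):
        return None
    # No further suspensions until suspendGracePeriod has elapsed.
    if (last_suspension_s is not None
            and now_s - last_suspension_s < settings.suspend_grace_period_s):
        return None
    # The limit is inclusive: only strictly older updates count as stale.
    if oldest_pending_age_ms <= settings.max_data_age_ms:
        return None
    if not load_by_connection:
        return None
    # Suspend the connection that caused the highest load.
    return max(load_by_connection, key=load_by_connection.get)
```

For example, with the default settings and loads of `{"probeA": 0.2, "probeB": 0.7}`, an oldest pending update aged 77001 ms selects `probeB`, while one aged exactly 77000 ms selects nothing.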
Probe suspension may additionally be controlled by the suspend probe and unsuspend probe commands. See the commands appendix for details.
For more information regarding Data quality, see Data Quality User Guide.
Memory protection
Settings for memory protection are found in the Data Quality tab.
When data quality is disabled, or in extreme situations when it cannot suspend sufficient probes to prevent the gateway becoming overloaded, the gateway throttles the reading of TCP data to prevent the backlogged data-queues from unbounded growth. This is necessary but less preferable than a managed data-quality suspension because, if it continues without recovery, netprobe connections either flow-control or time out and netprobes are dropped at random.
There are two threshold levels:
- Low-priority threshold (operatingEnvironment > dataQueues > memoryProtection > lowPriorityThresholdMB).
- High-priority threshold (operatingEnvironment > dataQueues > memoryProtection > highPriorityThresholdMB).
When the low-priority threshold is breached, the gateway throttles reads from all importing (netprobe) connections but remains responsive to downstream components, such as Active Console. In the unlikely event that this fails to prevent memory growth and the high-priority threshold is breached, the gateway throttles reads from all connections and becomes unresponsive until it recovers.
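The two-level behaviour can be sketched as a small classification function. This is an illustrative model only: `Throttle` and `throttle_level` are hypothetical names, the thresholds are passed in explicitly rather than assumed, and "breached" is taken here to mean strictly greater than the threshold, which is an assumption.

```python
from enum import Enum


class Throttle(Enum):
    NONE = "no throttling"
    IMPORTING_ONLY = "reads from importing (netprobe) connections throttled"
    ALL = "reads from all connections throttled"


def throttle_level(backlog_mb: float, low_mb: float, high_mb: float) -> Throttle:
    """Classify the backlog of unprocessed EMF messages against both thresholds."""
    if low_mb > high_mb:
        raise ValueError("low-priority threshold must not exceed the high-priority one")
    if backlog_mb > high_mb:
        return Throttle.ALL
    if backlog_mb > low_mb:
        return Throttle.IMPORTING_ONLY
    return Throttle.NONE
```

Because only unprocessed EMF messages count towards `backlog_mb`, a normally operating gateway would be expected to stay at `Throttle.NONE` regardless of its overall memory footprint.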
The default for the low-priority threshold is 250 MB. This is calculated to be enough to buffer 77 seconds (see operatingEnvironment > dataQueues > maxDataAgeMs) worth of data on a high bandwidth gateway (approximately 30,000 cell updates per second). This is in order to give the data-quality algorithm time to step in and save the situation before the threshold is reached.
Note: These thresholds govern memory usage by unprocessed EMF messages only. The gateway memory footprint as a whole is typically far more influenced by other factors and could potentially exceed these thresholds without issue. It is unusual for unprocessed EMF messages to account for more than a few megabytes in a normally operating gateway.
Conflation
Settings for conflation are found in the Data Quality tab.
Conflation is an optional and less drastic method of coping with an overloaded gateway than a data quality suspension. When the data queues (containing incoming sampler updates from netprobes) become backlogged due to the gateway being unable to process them as fast as they arrive, conflation allows the gateway to discard out-of-date cell updates and only process and publish the latest cell values. As this could potentially result in the gateway discarding important updates, or missing short-lived events, conflation is disabled by default and should only be used with care.
Rapid cell updates
When a Netprobe has published several updates to the same cell before the gateway has processed the first update:
- Update cell from 1 to 2
- Update cell from 2 to 3
- Update cell from 3 to 4
- Update cell from 4 to 5
With conflation active, gateway only publishes the latest value:
- Update cell from 1 to 5
Updates to a recently created row
When a netprobe updates values in a recently created row before the gateway has processed the create:
- Create row newRow with three cells: 100,200,300.
- Update first cell in newRow from 100 to 111.
- Update second cell in newRow from 200 to 222.
With conflation active, gateway adds the row to the dataview with the latest values:
- Create row newRow with three cells: 111,222,300.
Updates to a row that is then removed
When a netprobe updates values in a row and then removes the row before the gateway has processed the updates:
- Update first cell in row1 from 100 to 111.
- Update second cell in row1 from 200 to 222.
- Remove row row1.
With conflation active, gateway discards the updates and processes only the row removal:
- Remove row row1.
Short lived rows
When a netprobe creates a row and then removes it again before the gateway has processed the create:
- Create row newRow with four cells: 100,200,300,400.
- Update first cell in newRow from 100 to 111.
- Remove row newRow.
With conflation active, gateway conflates away the update as normal but does not conflate away the entire row:
- Create row newRow with four cells: 111,200,300,400.
- Remove row newRow.
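The four scenarios above can be reproduced with a short Python sketch. It is a simplified model under stated assumptions, not the Gateway's algorithm: events are plain tuples in a hypothetical format (`("create", row, values)`, `("update", row, column_index, old, new)`, `("remove", row)`), and the whole backlog is conflated in one pass.

```python
from typing import Dict, List, Optional, Tuple


def conflate(events: List[tuple]) -> List[tuple]:
    """Conflate a backlog of row and cell events, keeping only the latest values."""
    out: List[Optional[tuple]] = []            # conflated events in arrival order
    pending_create: Dict[str, int] = {}        # row -> index of its create in out
    pending_update: Dict[Tuple[str, int], int] = {}  # (row, column) -> index in out

    for ev in events:
        kind, row = ev[0], ev[1]
        if kind == "create":
            pending_create[row] = len(out)
            out.append(("create", row, list(ev[2])))
        elif kind == "update":
            _, _, col, old, new = ev
            if row in pending_create:
                # Fold the update into the not-yet-processed create.
                out[pending_create[row]][2][col] = new
            elif (row, col) in pending_update:
                # Keep the original old value and the latest new value.
                i = pending_update[(row, col)]
                out[i] = ("update", row, col, out[i][3], new)
            else:
                pending_update[(row, col)] = len(out)
                out.append(("update", row, col, old, new))
        elif kind == "remove":
            # Updates to a row that is then removed are discarded...
            for key in [k for k in pending_update if k[0] == row]:
                out[pending_update.pop(key)] = None
            # ...but a pending create is kept, so short-lived rows still appear.
            pending_create.pop(row, None)
            out.append(("remove", row))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    return [e for e in out if e is not None]
```

Feeding in the "Rapid cell updates" backlog, for instance, yields a single update from 1 to 5, and the "Short lived rows" backlog yields the create with the conflated value followed by the remove.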
Potential Issues
Conflation can prevent a gateway from becoming overloaded, and ensure that published values are always up-to-date, but there are a number of potential issues which you should be aware of.
Lost Spikes
A dataview cell that updates from 32% to 34% to 33% is unlikely to cause issues by having the intermediate update conflated away, but one that updates from 32% to 99% to 33% may miss an important spike.
Similarly, a cell that goes from OK to ERROR to OK again could cause an alert to be missed if conflation is enabled.
This might also affect compute-engine rules that use statistical functions such as maximum or minimum.
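A minimal Python sketch of the effect on a maximum calculation is given below. The function name `values_processed` is hypothetical, and the sketch assumes that every update in the backlog was queued before the gateway processed any of them.

```python
from typing import List


def values_processed(initial: float, backlog: List[float], conflation: bool) -> List[float]:
    """Return the values the gateway actually processes for one cell."""
    if conflation and backlog:
        # Only the latest backlogged value survives conflation.
        return [initial, backlog[-1]]
    return [initial] + list(backlog)
```

With an initial value of 32 and a backlog of `[99, 33]`, the maximum over the processed values is 99 without conflation and 33 with it, so a rule watching for values above 90 would not fire.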
Rate Function
The rate function triggers off the time an update is processed, rather than the time it is generated, and therefore its general performance is likely to be improved by conflation.
Note: Spikes in the rate-of-change in a cell may be conflated away.
sampleTime and logNetprobeSampleTimeForDataItems
If logNetprobeSampleTimeForDataItems is configured, cell updates may be logged with sample-times later than those they were produced with. This is because the sample-time is published by the netprobe along with the sample-data and will be conflated to the latest value.
E.g. A series of updates produced at twenty second intervals by the netprobe:
- Update cell1 @ 09:25:02
- Update cell2 @ 09:25:22
- Update cell3 @ 09:25:42
Might be conflated into a single update with the latest sample-time:
- Update cell1, cell2, and cell3 @ 09:25:42
The updates to cell1 and cell2 may be logged with this later sample-time. Similarly, rules that reference the sample-time may only see the later value.
Configuration
Basic tab
These settings are found under the Basic tab.
operatingEnvironment > gatewayName
A short name identifying the Gateway.
When using database logging functionality, this name is also logged to the database, and is used to identify records for the Gateway.
Mandatory: Yes
If the following error message appears:
<<timestamp>> FATAL: Gateway Mandatory 'gatewayName' has not been specified. [/gateway/operatingEnvironment]
the Gateway name setting in the configuration file with the highest priority has no value. To resolve this:
- Look for any enabled Gateway name settings in the Gateway Setup Editor, and check the main and include files.
- Click Includes > Priority field, and increase the priority of the configuration file (main or include files) that has a Gateway name setting with a value.
- Click Save current document to apply the changes.
operatingEnvironment > licensingGroup
Group that the Gateway requests licences from on the Licence Daemon.
operatingEnvironment > listenPorts
The gateway listen ports for incoming connections.
- If operatingEnvironment > listenPorts > secure is set and operatingEnvironment > listenPorts > insecure is not set, then the Gateway only listens on a secure port.
- If operatingEnvironment > listenPorts > secure is not set then the gateway only listens on an insecure port.
- If both operatingEnvironment > listenPorts > secure and operatingEnvironment > listenPorts > insecure are set, the gateway listens on two ports.
See Secure Communications for more details.
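The three rules above, combined with the documented default ports, can be sketched as follows. The function `resolve_listen_ports` is a hypothetical illustration, not part of the product, and it does not model the command-line port option, which takes precedence over the setup file.

```python
from typing import Dict, Optional

DEFAULT_SECURE_PORT = 7038    # documented default for the secure channel
DEFAULT_INSECURE_PORT = 7039  # documented default for the insecure channel


def resolve_listen_ports(
    secure_set: bool,
    insecure_set: bool,
    secure_port: Optional[int] = None,
    insecure_port: Optional[int] = None,
) -> Dict[str, int]:
    """Return the ports the gateway would listen on for a given configuration."""
    for port in (secure_port, insecure_port):
        if port is not None and not 1 <= port <= 65535:
            raise ValueError("listenPort must be in the range 1-65535")
    ports: Dict[str, int] = {}
    if secure_set:
        ports["secure"] = DEFAULT_SECURE_PORT if secure_port is None else secure_port
    # Without a secure setting, the gateway listens on an insecure port only.
    if insecure_set or not secure_set:
        ports["insecure"] = DEFAULT_INSECURE_PORT if insecure_port is None else insecure_port
    return ports
```

For example, `resolve_listen_ports(True, False)` gives `{"secure": 7038}`, and `resolve_listen_ports(True, True, secure_port=8443)` gives `{"secure": 8443, "insecure": 7039}`.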
The listen port can also be specified as a command-line argument to gateway. If this is done, then the command-line value is used for the lifetime of the gateway process - it cannot be overridden or altered by editing the gateway setup file. An example of using this command-line option is shown below:
gateway2 -port <12345>
operatingEnvironment > listenPorts > secure
This specifies that the gateway should listen securely. To listen securely, an SSL certificate must be provided using either the -ssl-certificate or -ssl-certificate-key command line option. By default, if configured to be secure, the gateway listens on port 7038. This can be overridden using the child setting operatingEnvironment > listenPorts > secure > listenPort.
operatingEnvironment > listenPorts > secure > listenPort
This value overrides the default secure listenPort. Specify an integer in the range 1-65535.
operatingEnvironment > listenPorts > insecure
This specifies that the Gateway should listen insecurely. By default, if configured to allow insecure connections, the gateway listens on port 7039. This can be overridden using the child setting operatingEnvironment > listenPorts > insecure > listenPort.
operatingEnvironment > listenPorts > insecure > listenPort
This value overrides the default insecure listenPort. Specify an integer in the range 1-65535.
operatingEnvironment > var
List of user environment variable definitions. See User Variables and Environments for details on how to configure environment variables.
Advanced tab
These settings are found under the Advanced tab.
operatingEnvironment > maxLogFileSizeMb
Maximum size in Megabytes of the Gateway log file before it rolls that log file over.
Valid values are 1-2047 inclusive for 32-bit Gateways.
Default: 10
operatingEnvironment > logArchiveScript
The name of a batch file or shell script that should be executed when the log file is rolled over.
Note: Using operatingEnvironment > logArchiveScript overrides LOG_ARCHIVE_SCRIPT (if set).
operatingEnvironment > timezone
This sets the TZ environment variable, which determines the time zone the Gateway runs in. This allows a Gateway in one country to monitor Netprobes in another country whilst keeping the time zones the same.
The time zone is specified in the format:
std[+/-offset]
Where std represents one of the standard time zones. Available valid time zones can be found by examining the system time zone database, found in /usr/share/zoneinfo.
If you specify an offset explicitly, it overrides the definition in the system time zone database, including the rules to automatically adjust for daylight saving time. It is interpreted as the number of hours to add or subtract to get Coordinated Universal Time (UTC).
For example, if you want your Gateway to run in US Eastern Standard Time, you can:
- (Recommended) Specify the time zone as America/New_York.
- Specify the time zone as EST.
- (Not recommended) Specify the time zone as EST+5EDT,M3.2.0/2,M11.1.0/2 (see POSIX documentation for the TZ variable).
- (Not recommended) Specify the time zone as EST+5 when DST is not in effect and change it to EST+6 when DST is in effect.
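The difference between a TZ value that carries DST rules and one with only a fixed offset can be observed with a short Python sketch. It sets TZ for the current process and relies on `time.tzset()`, which is available on Unix only. The helper `local_hour` and the two sample instants are illustrative choices, not part of the product.

```python
import calendar
import os
import time


def local_hour(tz: str, utc_tuple: tuple) -> tuple:
    """Return (hour, is_dst) for a UTC instant as seen under the given TZ value."""
    os.environ["TZ"] = tz
    time.tzset()  # re-read TZ; Unix only
    local = time.localtime(calendar.timegm(utc_tuple))
    return local.tm_hour, local.tm_isdst


# Two sample instants, both at 12:00 UTC.
WINTER = (2024, 1, 15, 12, 0, 0)  # DST not in effect in US Eastern time
SUMMER = (2024, 7, 15, 12, 0, 0)  # DST in effect in US Eastern time
```

With the full POSIX rule `EST+5EDT,M3.2.0/2,M11.1.0/2`, `local_hour` returns `(7, 0)` for `WINTER` and `(8, 1)` for `SUMMER`. With the fixed offset `EST+5` it returns `(7, 0)` for both, which is why a fixed offset has to be changed by hand when DST starts and ends.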
The Gateway attempts to validate your choice of time zone against the TZ environment variable. If TZ is not set, Gateway uses the local time of its host machine.
operatingEnvironment > timezoneabbreviation
A list of time zone abbreviations and their default time zone regions. This is used to override the time zone abbreviations when parsing dates in rules and when parsing information from dataviews for standard formatting.
operatingEnvironment > internalQueueSizeLimit
Controls the maximum length of the internal update queue. Updates to data-items (e.g. a severity change as the result of running a rule) are placed in the queue temporarily between data updates.
The default maximum limit should be adequate for normal gateway operation. If a pair (or more) of rules are configured such that an update caused by rule A makes rule B fire and update, then this can cause the internal queue to fill faster than it is processed. If the queue is completely filled an error message is logged, and gateway performance is likely to be affected. The solution is to write rules A and B to be more selective, so that they do not fire each other.
Certain compute engine rules (typically involving wildcarded paths) can also fill the processing queue during gateway startup. The queue limit can be increased to prevent warning messages if required, however this should only be done if it is known that this situation is a "one off".
Default: 4000
operatingEnvironment > numRuleEvaluationThreads
Specifies the maximum number of rule evaluation threads the Gateway can run. These threads are used to execute rules on data changes, and can be enabled if rule execution is becoming a bottleneck on a busy gateway.
It is recommended that this is not set too high as doubling the number of threads does not double throughput. It should not be set higher than the number of CPU cores available.
To set the number of rule evaluation threads, the Gateway determines the number of available processors (using the taskset command on Linux systems or the psrset command on Solaris systems) and evaluates the numRuleEvaluationThreads variable. The lower of the two values is the number of threads the Gateway uses to execute rules.
The number of rule evaluation threads used is recorded in the gateway log. If the available number of processors is changed while the gateway is running then the number of threads to use is re-evaluated at the next setup change.
For more detailed information about the optimum value to use, see Gateway Performance Tuning.
A hard limit can also be placed on the number of rule threads by setting the environment variable MAX_RULE_THREADS to a positive integer. This will override the value specified in the Gateway Setup Editor.
Default: 0 (no threads used)
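The thread-count selection described for this setting can be sketched as below. The function `effective_rule_threads` is hypothetical, and the way MAX_RULE_THREADS combines with the processor count is an assumption: here the environment value replaces the configured value, and the result is still capped by the number of available processors.

```python
from typing import Optional


def effective_rule_threads(
    configured: int,
    available_processors: int,
    max_rule_threads_env: Optional[str] = None,
) -> int:
    """Return the number of rule evaluation threads under the stated assumptions."""
    requested = configured
    if max_rule_threads_env is not None:
        limit = int(max_rule_threads_env)
        if limit > 0:
            # Assumption: a positive MAX_RULE_THREADS overrides the configured value.
            requested = limit
    # The lower of the requested value and the processor count is used.
    return max(0, min(requested, available_processors))
```

On Linux, a caller could pass `len(os.sched_getaffinity(0))` as `available_processors`, since that reflects the CPU affinity that taskset manipulates.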
operatingEnvironment > historyFiles
The maximum number of history files that the gateway is allowed to create when receiving set-up changes.
Valid values are 0-9999 inclusive.
To suppress history file creation altogether, set this to zero.
Default: 10
operatingEnvironment > dataDirectory
Allows you to specify where temporary files which the gateway may produce while running should be stored.
If not set, files are stored in the current working directory. If the directory specified already contains any of these temporary files, they are over-written.
The data directory must have read, write, and execute permissions, as the Gateway needs to be able to read, write, and search within it.
operatingEnvironment > duplicateRowAlerts
When duplicate rows in a single dataview are detected, gateway alerts the user of this fact as it indicates a configuration error. These alerts can be adjusted using this setting.
Value | Effect |
---|---|
NONE | No alerts regarding duplicate rows are produced. |
STATUS | The dataview samplingStatus headline is updated with an error message warning about duplicate rows. |
TICKER_AND_STATUS | The samplingStatus headline is updated as above, and additionally an event ticker event is created regarding the duplicate rows. |
Default: TICKER_AND_STATUS
operatingEnvironment > insecurePasswordLevel
In a number of places throughout the Gateway configuration, passwords have to be specified.
Examples of this might be plugins that require logins to systems to retrieve the data they need, the gateway's connection to the database, or the configuration of users. In most of these places it is possible to enter the password in a number of different formats (depending on context), from a cleartext format, to more secure formats such as AES (for two way), and crypt (for one way).
While it may be useful to use a cleartext format in a UAT or testing environment, you may prefer to ensure that a secure format is used when in a production environment. This setting helps locate these, by causing each insecure password to generate an issue at the specified level. This is shown when validating or saving the setup.
Note: We have deemed standard encoded passwords (std) to be insecure since they are encoded rather than encrypted, and these will be flagged in the same way as cleartext passwords.
The setting has the following effects:
- None: no checks are performed on the security of the passwords and no issues are reported.
- Critical: the setup cannot be saved and the Gateway cannot be started with any insecure passwords present.
With Warning or Error set, the ability to save the setup with insecure passwords present depends on whether the -max-severity command line parameter is set. See the Gateway Installation Guide.
The Gateway data reports the level of this setting.
Default: None
operatingEnvironment > allowComputeEngine
Specifies whether the Gateway compute engine feature is available to add additional data to existing dataviews.
It is allowed by default, but administrators and users can use this setting to disallow compute engine features. See Compute Engine.
operatingEnvironment > writeStatsToFile
The "write stats to file" section contains settings controlling how load monitoring statistics are written out from the gateway.
These statistics can then be read by the Gateway load. Also see the Gateway Performance Tuning for more information.
Connections tab
These settings are found under the Connections tab.
operatingEnvironment > heartbeatInterval
Number of seconds before a Gateway sends a heartbeat message to a connected component if it does not receive any communication from the component.
Gateway expects a reply within the number of seconds specified by the connectWait setting. If the reply is not received within this time, the connection is terminated and re-established.
The valid range for the heartbeat interval is 20-300 seconds inclusive.
Default: 75 (seconds)
operatingEnvironment > connectWait
Time in seconds to wait for a connection to Netprobe to be established. That is, the maximum duration the gateway waits after sending the initial TCP SYN segment for a SYN/ACK reply from the Netprobe.
The valid range is 1-300 seconds inclusive.
Default: 30 (seconds)
operatingEnvironment > dnsCacheExpiryTime
Time in minutes that the Gateway caches the result of resolving a hostname to an IP address.
Valid values are 0-2880 inclusive.
If set to 0, hostnames are cached indefinitely.
Default: 720 (12 hours)
operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > minimumForAllComponents
This instructs the Gateway to reject connections from every component with versions older than the specified version.
You can specify the minimum version using the:
- Version number. For example, GA4.7.0, or GA2011.2.1.
- Version number with the build date. For example, GA4.7.0-180529.
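The two accepted version formats can be parsed with a short Python sketch. This is illustrative only: `parse_version` and `meets_minimum` are hypothetical names, and the comparison rules are assumptions, since how the Gateway orders versions is not described here. In particular, the sketch compares numbers naively, compares build dates only when both sides have one, and does not model how year-based numbers such as GA2011.2.1 relate to later numbering.

```python
import re
from typing import Optional, Tuple

# "GA" + dotted numbers, optionally followed by "-" and a six-digit build date.
_VERSION_RE = re.compile(r"^GA(\d+(?:\.\d+)*)(?:-(\d{6}))?$")


def parse_version(text: str) -> Tuple[Tuple[int, ...], Optional[int]]:
    """Split a version string into its numeric parts and optional build date."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version format: {text!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    build = int(match.group(2)) if match.group(2) else None
    return numbers, build


def meets_minimum(component: str, minimum: str) -> bool:
    """Return True if the component version is not older than the minimum."""
    c_numbers, c_build = parse_version(component)
    m_numbers, m_build = parse_version(minimum)
    if c_numbers != m_numbers:
        return c_numbers > m_numbers
    if c_build is None or m_build is None:
        # Assumption: without a build date on both sides, equal numbers pass.
        return True
    return c_build >= m_build
```

For example, `parse_version("GA4.7.0-180529")` returns `((4, 7, 0), 180529)`.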
operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > components > component > name
Name of a Geneos component type. The drop-down list has the following options:
- Active Console
- Gateway
- Licence Daemon
- Netprobe
- Web Dashboard
- Webslinger
operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > components > component > version
The minimum version of the component selected in operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > components > component > name that the Gateway accepts connections from.
You can specify the minimum version using the:
- Version number. For example, GA4.7.0, or GA2011.2.1.
- Version number with the build date. For example, GA4.7.0-180529.
operatingEnvironment > clientConnectionRequirements > requireCertificates
This allows the gateway to require certificates when connections are made to the gateway for certain connection types. This can be enabled/disabled for each supported connection type. The following connection types are supported:
- Netprobe: Incoming connections from netprobes (this will include Floating Netprobes and Self-announcing Netprobes).
- Importing Gateways: Incoming Gateway connections from Importing Gateways (to which this gateway exports data).
- Secondary Gateways: Incoming Gateway connections from the secondary Gateway in a Hot Standby configuration.
operatingEnvironment > httpConnectionRequirements
This group of settings allows HTTP requests made to the Gateway (e.g. from a web browser) to be restricted.
operatingEnvironment > httpConnectionRequirements > internalData
The internal data web pages provide low level information about various parts of the system.
They may be requested by ITRS support when debugging issues. They do not form part of the normal operation of the Gateway so can safely be restricted to Geneos administrators.
The internal data pages available, and the information available on each page, can vary by version. However, the connection requirements cover all internal data pages, so will secure any pages added in future versions.
operatingEnvironment > httpConnectionRequirements > internalData > acceptHosts
This allows the internal data web pages, used for debugging issues, to be viewed only from particular locations. The available settings are:
- All: allow access from any host.
- Local: allow access only from the local loopback interface where the Gateway is running (127.0.0.1).
- None: prevent access completely.
- Specific: a list of locations may be entered. Each item in the list can be specified as a hostname (if a reverse DNS entry is available for the remote host) or as an IP address. The source of any HTTP requests must match at least one item in the list otherwise they are rejected. If no items are specified, access is prevented completely.
The remote hostname and IP address are written to the Gateway log file, along with the URL requested, for any attempts that are blocked. This can be useful to see if the Gateway host is able to access a reverse DNS entry for the remote host and therefore what would need to be added to the 'specific' list for the request to be accepted. If a hostname is not available then the IP address is seen instead of the name in the log file, so will appear twice.
Default: local
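The four acceptHosts options can be sketched as a single check. The function `is_request_accepted` is a hypothetical illustration of the rules described for this setting. It assumes the caller has already attempted the reverse DNS lookup and passes `None` as the hostname when no entry is available.

```python
from typing import Iterable, Optional


def is_request_accepted(
    mode: str,
    remote_ip: str,
    remote_hostname: Optional[str] = None,
    specific: Iterable[str] = (),
) -> bool:
    """Decide whether an HTTP request for an internal data page is accepted."""
    mode = mode.lower()
    if mode == "all":
        return True
    if mode == "none":
        return False
    if mode == "local":
        # Only the local loopback interface is allowed.
        return remote_ip == "127.0.0.1"
    if mode == "specific":
        allowed = set(specific)
        # An empty list prevents access completely.
        if remote_ip in allowed:
            return True
        return remote_hostname is not None and remote_hostname in allowed
    raise ValueError(f"unknown acceptHosts mode: {mode!r}")
```

For example, in `specific` mode with the list `["adminhost"]`, a request from 10.0.0.5 is accepted only if the reverse lookup resolves that address to `adminhost`.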
operatingEnvironment > DNS > maxAcceptableDNSLookupTime
The maximum time in seconds that the Gateway is allowed to perform a reverse DNS lookup.
If this time is exceeded, reverse DNS lookups are disabled for the IP address for the number of units of time specified in operatingEnvironment > DNS > DNSReverseLookupDisableTime > value.
For non-Gateway components, this setting defaults to the time specified in the environment variable $HR_TIMEOUT.
Default: 1
operatingEnvironment > DNS > DNSReverseLookupDisableTime > value
Number of units of time that reverse DNS lookups are disabled for after exceeding operatingEnvironment > DNS > maxAcceptableDNSLookupTime.
The unit of time is specified using operatingEnvironment > DNS > DNSReverseLookupDisableTime > units.
Once the time has elapsed, reverse DNS lookups are re-enabled for the IP address.
For non-Gateway components, DNSReverseLookupDisableTime can be specified using the environment variable $HR_REVERSE_LOOKUP_DISABLE_TIME.
Default: 5
operatingEnvironment > DNS > DNSReverseLookupDisableTime > units
The unit of time used to determine how long DNS lookups are disabled for after exceeding operatingEnvironment > DNS > maxAcceptableDNSLookupTime.
There are two options:
- minutes
- seconds
Default: minutes
Debug tab
These settings are found under the Debug tab.
operatingEnvironment > debug
A list of gateway debug settings. These settings are only intended for debugging error conditions and should be enabled with care.
Caution: Use of these settings is likely to adversely impact the performance of the Gateway. They should only be enabled when debugging a particular configuration and in coordination with ITRS support staff.
Data Quality tab
These settings are found under the Data Quality tab.
operatingEnvironment > dataQueues > disableChecks
Enable this setting to disable the data quality checking algorithm.
Default: false (algorithm is run)
operatingEnvironment > dataQueues > maxDataAgeMs
Time in milliseconds of the maximum acceptable age for a pending update to a dataview. The limit is inclusive, so an update must be older than the set value to cause a connection to be suspended.
For more details on how this setting is used, see Data quality options.
Note: The default value is set to approximate the behaviour of Gateway versions prior to the introduction of the data quality feature.
Default: 77000 (77 seconds)
operatingEnvironment > dataQueues > connectionSuspensionDuration
Time in seconds that a connection (to a Netprobe) is suspended before the gateway reconnects.
For more details on how this setting is used, see Data quality options.
Default: 300 (5 minutes)
operatingEnvironment > dataQueues > suspendGracePeriod
Time in seconds specifying how long the gateway waits after suspending a connection before allowing further connections to be suspended.
For more details on how this setting is used, see Data quality options.
Default: 60 (1 minute)
operatingEnvironment > dataQueues > setupGracePeriod
Time in seconds specifying how long the gateway suspends the data quality algorithm for after a setup change.
For more details on how this setting is used, see Data quality options.
Default: 60 (1 minute)
operatingEnvironment > dataQueues > memoryProtection
Allows overriding the default data-queue memory protection settings.
For more details on how this setting is used, see Memory protection.
operatingEnvironment > dataQueues > memoryProtection > lowPriorityThresholdMB
Threshold size in MB for backlogged EMF messages at which the gateway throttles read-data from low-priority connections.
Low priority connections are importing EMF connections (Netprobe connections only). All other gateway connections continue to operate normally.
For more details on how this setting is used, see Memory protection.
Default: 500
operatingEnvironment > dataQueues > memoryProtection > highPriorityThresholdMB
Threshold size in MB for backlogged EMF messages at which the gateway throttles read-data from all connections.
In practice it is very unlikely even for a heavily overloaded gateway to hit this threshold, as the low-priority threshold is hit first.
For more details on how this setting is used, see Memory protection.
Default: 750
operatingEnvironment > conflation
Settings to control conflation of incoming monitoring data.
For more details on how this setting is used, see Conflation.
operatingEnvironment > conflation > enabled
Whether conflation is enabled.
Conflation can significantly aid an overloaded gateway and ensure that all published data is as up-to-date as possible. However, it does this by discarding out-of-date cell updates and should not be enabled if this is unacceptable.
For more details on how this setting is used, see Conflation.
operatingEnvironment > conflation > strategy
Specify the strategy for controlling gateway conflation, so that conflation is only enabled when required and no updates are unnecessarily discarded.
operatingEnvironment > conflation > strategy > maxDataAgeThreshold
Under this strategy the gateway does not enable conflation unless the maximum age of backlogged updates (as displayed by the Probe data) exceeds a certain threshold.
Conflation works best when it is preventing stale data from building up rather than clearing large backlogs: it has fewer backlogged messages to process and it minimises the number of updates conflated away. The threshold should therefore not be set too high.
An ideal value for the threshold is the minimum samplers > sampler > sampleInterval used in the setup. For this reason it defaults to the default sampleInterval of twenty seconds.
operatingEnvironment > conflation > strategy > maxDataAgeThreshold > threshold
Time in milliseconds for the threshold maxDataAge above which conflation is enabled. An ideal value for this setting is the minimum samplers > sampler > sampleInterval used in the setup.
Default: 20000
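The maxDataAgeThreshold strategy and the suggested way of choosing its threshold can be sketched as follows. Both function names are hypothetical, and "exceeds" is taken to mean strictly greater than the threshold, which is an assumption.

```python
from typing import Iterable


def suggested_threshold_ms(sample_intervals_s: Iterable[float]) -> int:
    """Return the minimum sampleInterval in the setup, converted to milliseconds."""
    return int(min(sample_intervals_s) * 1000)


def conflation_active(enabled: bool, max_data_age_ms: int, threshold_ms: int = 20_000) -> bool:
    """Conflate only when enabled and the oldest backlogged update is too old."""
    return enabled and max_data_age_ms > threshold_ms
```

For a setup whose samplers use intervals of 20, 60 and 5 seconds, `suggested_threshold_ms([20, 60, 5])` returns 5000. With that threshold, a backlog whose oldest update is 6 seconds old would activate conflation.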