Gateway Rules, Actions, and Alerts
Rules
Introduction
Rules are a key part of the Geneos monitoring system. They allow run-time information to be updated and actions to be fired in response to specific Gateway events. Typically the updates apply to the severity of cells, reflected in the Active Console by red, amber and green cell backgrounds.
The rules top-level section contains a number of named rule groups, which can be nested. These in turn contain a number of named rules. The rule groups, apart from grouping the rules, can provide defaults to apply to all the rules they contain. If no defaults are required then rules may also be placed directly into the rules top-level section.
Rules can vary in complexity. To see the rules defined for an item, right-click any cell or any item in the directory (Gateway, Probes, Managed Entities, etc.) and run the Show Rules command. This lists the rules in the order in which the Gateway will execute them.
The rule code which is the heart of a rule is described first. This is followed by a description of how to apply the rule to items in the system.
Rules can also use additional functionality that is not part of basic Gateway operation. This is described in the Compute Engine section.
Rule Code
Here is an example of a rule that might be set on the CPU utilisation of a host.
if value > 90 then
severity critical
elseif value > 70 then
severity warning
else
severity ok
endif
The following sections describe how to construct a rule.
Data Types
There are a number of data types that can be used with the rules. These are string, integer, double (floating point/decimal), Boolean and null. Severity can also be used.
String
Strings represent textual items. When they are used in rules, they must always be enclosed in double quotes:
"the quick brown fox"
If you need to use double quotes inside a string then they must be escaped with a backslash. To use a literal backslash it must be escaped with a second backslash. Forward slashes do not need escaping:
"he said \"I like foxes\" and \"I like cats\""
"C:\\Program Files\\ActiveConsole"
"/usr/local/geneos/gateway"
Integer
Integers are whole numbers:
0
234
-45
Double
Doubles are floating point numbers - numbers that have a decimal point:
0.1
15.1876
23.42
Boolean
Booleans are logical true/false values:
true
false
Null
Null represents the absence of a value:
null
Severity
Literals can be used to indicate the severities an item can have:
undefined
ok
warning
critical
Severities are simply aliases for readability, and internally are treated the same as integers. The values are: undefined 0, ok 1, warning 2 and critical 3. This allows rules such as:
if severity > ok then ...
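Because severities are plain integers, ordering comparisons between them behave like integer comparisons. The Python sketch below is for illustration only and is not Gateway code; the Severity class and the above_ok function are hypothetical names that mirror the values listed above.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical model of the severity literals and the integers they alias."""
    UNDEFINED = 0
    OK = 1
    WARNING = 2
    CRITICAL = 3


def above_ok(severity: int) -> bool:
    """Equivalent of the rule condition `if severity > ok then ...`."""
    return severity > Severity.OK


print(above_ok(Severity.WARNING))  # True
print(above_ok(Severity.OK))       # False
```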
Conversion between types
Data types will automatically be converted between types as necessary. The following table shows how the conversion takes place:
From / To | To String | To Integer | To Double | To Boolean |
---|---|---|---|---|
From String | (same type) | "10" => 10; "10.3" => 10; "10 x" => 10; "-10" => -10; "x 3" => 0; "x" => 0 | "10.5" => 10.5; "74" => 74.0; "-1.3e3" => -1.3e3; "10.3 x" => 10.3; ".45" => 0.45; "x" => 0.0 | "" => false; "0" => false; "x" => true |
From Integer | 1 => "1"; -1 => "-1" | (same type) | 1 => 1.0; -1 => -1.0 | 0 => false; x => true; -x => true |
From Double | 3.2 => "3.2"; 1.0 => "1"; -5.0 => "-5"; -5.1 => "-5.1" | 5.1 => 5; 5.9 => 5; -5.1 => -5; -5.9 => -5 | (same type) | 0.0 => false; x.x => true; -x.x => true |
From Boolean | true => "1"; false => "" | true => 1; false => 0 | true => 1.0; false => 0.0 | (same type) |
From null | null => "" | null => 0 | null => 0.0 | null => false |
In the table, x stands for any other value.
Note
Nothing will ever convert to a null - null will always be converted to the relevant data type during comparisons.
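The string and double conversions in the table can be approximated in Python as below. This is a sketch that reproduces the table's examples, assuming the Gateway reads the longest leading number from a string (in the style of C's strtol/strtod) and truncates doubles toward zero. The function names are hypothetical, and inputs not shown in the table may behave differently in the Gateway.

```python
import re

# Assumption: conversion reads the longest leading number, in the style of C's strtol/strtod.
_INT_PREFIX = re.compile(r"\s*[+-]?\d+")
_DOUBLE_PREFIX = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def string_to_integer(text: str) -> int:
    """Leading integer of the string; text that does not start with a number gives 0."""
    match = _INT_PREFIX.match(text)
    return int(match.group()) if match else 0


def string_to_double(text: str) -> float:
    """Leading floating point number of the string, or 0.0 if there is none."""
    match = _DOUBLE_PREFIX.match(text)
    return float(match.group()) if match else 0.0


def string_to_boolean(text: str) -> bool:
    """"" and "0" are false, as in the table; other strings are assumed to be true."""
    return text not in ("", "0")


def double_to_string(number: float) -> str:
    """Whole numbers lose their trailing .0, as in 1.0 => "1"."""
    return str(int(number)) if number.is_integer() else repr(number)


def double_to_integer(number: float) -> int:
    """Truncate toward zero, so 5.9 => 5 and -5.9 => -5."""
    return int(number)


print(string_to_integer("10 x"), string_to_double(".45"), double_to_string(-5.0))  # 10 0.45 -5
```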
Properties
It is possible to query the properties of the item that the rule is set on.
if value > 10 then...
if severity = critical then...
if attribute "LOCATION" = "London" then...
The available properties are listed in the table at the end of this section.
Additionally, the keyword previous may be used to refer to the previous value of a property. For example:
if value > previous value + 10 then...
if severity <> previous severity then...
Only one property changes at a time, so it is not possible to use the previous keyword to refer to multiple properties, like this:
if value > previous value and severity <> previous severity then...
However, it is possible to refer to the same property multiple times:
if value > previous value or value < previous value - 10 then...
When an item is first created, or when a rule is first added, the previous value of any property will be null.
Note
The previous keyword gives access to the value as it was the last time the rules were run, not the last time the value changed. For a given rule evaluation, previous will access the previous value of the attribute whose change triggered that rule evaluation. For any other attribute, previous will access the current value of the attribute. For more details and an example of the implications of this, see Re-Evaluating Rules. In general, a rule should not use previous for attributes which the rule itself changes.
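The note above can be modelled with a small Python function. It is a sketch, assuming the Gateway keeps a snapshot of property values from the last rule run (empty for a new item or newly added rule) and knows which property triggered the current evaluation; previous_of and its parameters are hypothetical names, not a Gateway API. The example shows why a condition combining previous value and previous severity cannot both be true in one evaluation.

```python
from typing import Any, Dict


def previous_of(prop: str, current: Dict[str, Any], last_run: Dict[str, Any], trigger: str) -> Any:
    """Resolve `previous <prop>` for one rule evaluation.

    Only the property whose change triggered the evaluation sees its value from
    the last rule run; every other property resolves to its current value.
    A property missing from last_run (new item or new rule) resolves to None (null).
    """
    if prop == trigger:
        return last_run.get(prop)
    return current.get(prop)


last_run = {"value": 50, "severity": 1}
current = {"value": 80, "severity": 2}

# Evaluation triggered by the value changing from 50 to 80:
value_rose = current["value"] > previous_of("value", current, last_run, "value")
severity_changed = current["severity"] != previous_of("severity", current, last_run, "value")
print(value_rose, severity_changed)  # True False
```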
Property | Target | Description |
---|---|---|
value | Cells | Returns the current value of the cell (Any) |
severity | DataItem | Returns the severity of the DataItem (Severity) |
active | Cells | Returns the active status of the cell (Boolean) |
snoozed | Data-items | Returns the snooze state of the DataItem (Boolean) |
state "userAssigned" | Data-items | Returns the userAssigned state of the DataItem (Boolean) |
attribute "<name>" | Managed Entities | Replace <name> with the name of the managed entity attribute you wish to query, e.g. attribute "LOCATION". If the managed entity has this attribute, its value is returned; otherwise it evaluates to a blank string. |
param "HostName" | Gateways, Probes | Returns the hostname on which the gateway/probe is running (String) |
param "Port" | Gateways, Probes | Returns the port on which the gateway/probe is listening. (Integer) |
param "HotStandbyRole" | Directory | Returns the role of the gateway. (One of "Stand Alone", "Primary" or "Secondary") |
param "Group" | Directories, Probes, Managed Entities, Samplers | Returns the group name of the data-item. Dataviews and cells do not have a group name. (String) |
param "Description" | Managed Entities, Samplers | Returns the description of the data-item. (String) |
param "BannerVar" | Managed Entities | Returns the path to the banner variable defined on the entity. (String) |
param "Virtual", param "Floating", param "SelfAnnounced" | Probes | Returns true if the probe is of the type specified. (Boolean) |
param "Secure" | Probes | Returns true if the probe is connected to Gateway securely. (Boolean) |
param "Imported" | Probes | Returns true if the probe is imported. (Boolean) |
param "ExportingGateway" | Probes | Returns the name of the exporting gateway which the probe is imported from (if it is). (String) |
param "PluginName" | Samplers | Returns the plugin name of the sampler. (String) |
param "SampleInterval" | Samplers | Returns the sample interval of the sampler in seconds. (Integer) |
param "UsingSampleInterval", param "UsingFileTrigger", param "UsingSampleTime" | Samplers | Returns true if the sampler is using the specified sample type. (Boolean) |
rparam "ConState" | Probes | Returns the connection state of the probe ("Unknown", "Up", "Down", "Unreachable", "Rejected", "Removed", "Unannounced", "Suspended", "WaitingForProbe") |
rparam "ImportedConState" | Probes | Returns the imported connection status of the probe ("Unknown", "Up", "Down", "Suspended", "Rejected", "Unreachable") |
rparam "Rejection Reason" | Probes | Returns a numerical code passed by the probe to the gateway to explain the reason that it rejected the connection from the gateway (Integer) |
rparam "Rejection Message" | Probes | Returns a human readable string that explains the reason that probe rejected the connection from the gateway (String) |
rparam "Version" | Probes | Returns the version of the probe (String) |
rparam "OS" | Probes | Returns the OS string as reported by the operating system. (String) |
rparam "AssignedUser" | Data-items | Returns the name of the user to whom the item has been assigned. (String) |
rparam "SampleIntervalActive" | Sampler | Returns true if the sampler is active, otherwise false. (Boolean) |
rparam "SampleTime" | Dataview | Returns the time of the last sample in a human readable form. (String) |
rparam "SampleInfo" | Sampler | Returns a human readable string published by the probe about the sampling. This is normally blank. (String) |
rparam "ImportingConnectionName" | Probes | Returns the configured connection name for importing Gateway to Gateway connections. |
Data Items and Paths
Paths can also be used to refer to the properties of other data-items, termed secondary variables. The paths themselves are defined separately in the path aliases section of the rule, and are each given names.
if value > path "mypath" value then...
The paths may be absolute, in that they refer to an exact item in the system. They can also be relative, for example “the same row, but column X”. More information about properties and paths can be found in Geneos XPaths.
Other live system data
It is also possible to check the state of other parts of the Geneos system.
It is possible to test against the time of the gateway:
if within activetime "time1" then...
This may be used for changing the thresholds at different times of the day, for example:
if value > 70 and within activetime "trading hours" then
...
elseif value > 90 then
...
else
...
endif
Variables
Managed Entity variables are accessible from within rules. If a variable does not exist, an empty value will be returned (the empty value converts to 0 if used as a numeric value). In most cases, non-simple variables cannot be used in rules. The simple variable types are boolean, string, integer, and double. The inList() function is a special case, as it can reference stringList variables as well as simple variables.
if value > $(variable) then ...
Local variables can also be set, which are then accessible as per the above.
set $(variable) "literal string value"
If repeatedly using a path lookup value (e.g. in a sequence of if statements) it is more efficient to store the looked-up value in a variable and reference this instead.
set $(variable) path "aliasName" value
Variable Active Time
It is possible to use variable active time within rules.
if within activetime $(vTime) then...
where “vTime” may be defined as a variable of type active time, with different values for each managed entity.
This may be used for changing the thresholds at different times of the day for each geographic location, for example:
if value > 70 and within activetime $(vTime) then
...
elseif value > 90 then
...
else
...
endif
Target Names
Parts of a unique name can be extracted from the target data-item when a rule is executed. These names can then be used for comparison as for a normal value. For example, to set a variable to the row name of the target item, use the following:
set $(row) target "rowName"
The names which can be extracted are as follows:
- gatewayName
- netprobeName
- netprobeHost
- netprobePort
- managedEntityName
- samplerName
- samplerType
- dataviewName
- rowName
- headlineName
- columnName
Operators
There are a number of operators that can be used to manipulate the data. Operators typically appear between two expressions (called the left hand side and right hand side), for example:
5 + 3
true and false
Some operators only operate on a single expression, for example:
not true
Comparison
Comparison operators allow two values to be compared. The result will be a Boolean.
Operator | Description |
---|---|
= | Equality operator. The result is true if both sides are equivalent. Both sides will be converted to the same type before comparison. Text comparisons are case sensitive. |
<> | Not equal operator. The result will be true if = would return false. |
> | The result will be true if the left hand side is greater than the right hand side. Both sides are converted to numbers before the comparison takes place. |
< | The result will be true if the left hand side is less than the right hand side. Both sides are converted to numbers before the comparison takes place. |
>= | The result will be true if the left hand side is greater than or equal to the right hand side. Both sides are converted to numbers before the comparison takes place. |
<= | The result will be true if the left hand side is less than or equal to the right hand side. Both sides are converted to numbers before the comparison takes place. |
like | Similar to =, but the right hand side should contain a string which may have wildcards in it. An asterisk (*) matches 0 or more characters and a question mark (?) matches a single character. The comparison is case insensitive. For example: "hello" like "h*o" |
unlike | The result will be true if like would return false. |
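A minimal Python sketch of the like and unlike matching rules described above, assuming that only * and ? are special (no character classes or escape sequences). The functions are hypothetical; every other pattern character is escaped so that it matches literally, and the whole string must match.

```python
import re


def like(text: str, pattern: str) -> bool:
    """`text like pattern`: * matches 0 or more characters, ? exactly one; case insensitive."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None


def unlike(text: str, pattern: str) -> bool:
    """True whenever like would return false."""
    return not like(text, pattern)


print(like("hello", "h*o"), like("HELLO", "h*o"), like("hello", "h?o"))  # True True False
```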
Logic
The logical operators operate on Boolean values and result in Boolean values.
Operator | Description |
---|---|
and | The result will be true if both sides evaluate to true. If the left hand side evaluates to false then the right hand side will not be evaluated, because it is irrelevant to the outcome. This is called short circuiting. |
or | The result will be true if either side evaluates to true. If the left hand side evaluates to true then the right hand side will not be evaluated, because it is irrelevant to the outcome. This is called short circuiting. |
not | The result will be true if the right hand side is false, false if the right hand side is true. |
Arithmetic
The arithmetic operators operate on numbers and produce numbers as results.
Operator | Description |
---|---|
+ | Adds two numbers together. If either side is a double then the result will be a double, otherwise it will be an integer. |
- | Subtracts the right hand side from the left hand side. If either side is a double then the result will be a double, otherwise it will be an integer. |
* | Multiplies two numbers together. If either side is a double then the result will be a double, otherwise it will be an integer. |
/ | Divides the left hand side by the right hand side. Will always result in a double value. Note: Dividing by zero will result in 0.0. |
% | Modulo operator. Gets the remainder after the left hand side is divided by the right hand side. Will always result in an integer value. Note: Modulo 0 will result in 0. |
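The two special cases in the table, division always giving a double and zero divisors giving 0.0 or 0 instead of an error, can be sketched in Python as below. The names rule_divide and rule_modulo are hypothetical. The table does not say how % treats double or negative operands, so truncating operands to integers first and using a C-style remainder (sign follows the left operand) are assumptions. Python already yields a float from +, - and * when either operand is a float, matching the first three rows.

```python
import math
from typing import Union

Number = Union[int, float]


def rule_divide(lhs: Number, rhs: Number) -> float:
    """`/` always produces a double; dividing by zero produces 0.0 rather than an error."""
    if rhs == 0:
        return 0.0
    return float(lhs) / float(rhs)


def rule_modulo(lhs: Number, rhs: Number) -> int:
    """`%` always produces an integer; modulo 0 produces 0."""
    left, right = int(lhs), int(rhs)  # assumption: operands are truncated to integers first
    if right == 0:
        return 0
    # Assumption: C-style remainder whose sign follows the left operand.
    return int(math.fmod(left, right))


print(rule_divide(7, 2), rule_divide(5, 0))  # 3.5 0.0
print(rule_modulo(7, 3), rule_modulo(7, 0))  # 1 0
```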
Order of precedence
Normal mathematical order of precedence rules apply, so given the following:
5 + 3 * 6
3 * 6 will be evaluated first, producing 18. 5 will then be added to this, producing 23. The order of evaluation can be controlled using parentheses, for example:
(5 + 3) * 6
This will cause 5 + 3 to be evaluated first, producing 8. This will then be multiplied by 6, producing 48.
The following table shows the order of precedence for all the operators. Items nearer the top of the table will bind tighter (will be evaluated first). This explains the above example, as * appears higher than + in the table. Items at the same level will evaluate from left to right.
Operators (highest precedence first) |
---|
not |
* / % |
+ - |
< > <= >= |
= <> like unlike |
and |
or |
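To show how a precedence table drives evaluation, here is a generic precedence-climbing evaluator in Python. It is an illustrative sketch, not the Gateway's parser: it only covers integer literals, + - * and parentheses (no unary minus), and the function names are hypothetical. Giving * a higher number than + and - reproduces the worked examples above, and passing precedence + 1 to the right operand makes operators on the same level group from left to right.

```python
import re
from typing import List

# Binding strength from the precedence table: a larger number binds tighter.
PRECEDENCE = {"*": 2, "+": 1, "-": 1}


def tokenize(expr: str) -> List[str]:
    """Split an expression into integer literals, operators and parentheses."""
    return re.findall(r"\d+|[-+*()]", expr)


def parse_atom(tokens: List[str]) -> int:
    token = tokens.pop(0)
    if token == "(":
        value = parse_expression(tokens, 1)
        tokens.pop(0)  # discard the matching ")"
        return value
    return int(token)


def parse_expression(tokens: List[str], min_prec: int) -> int:
    """Precedence climbing: consume operators that bind at least as tightly as min_prec."""
    lhs = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # The right operand may only contain operators that bind more tightly,
        # so operators on the same level are applied left to right.
        rhs = parse_expression(tokens, PRECEDENCE[op] + 1)
        if op == "*":
            lhs = lhs * rhs
        elif op == "+":
            lhs = lhs + rhs
        else:
            lhs = lhs - rhs
    return lhs


def evaluate(expr: str) -> int:
    return parse_expression(tokenize(expr), 1)


print(evaluate("5 + 3 * 6"))    # 23
print(evaluate("(5 + 3) * 6"))  # 48
```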
Functions
A range of functions exist to manipulate the data. Functions take one or more parameters, depending on the function being used. The parameters are enclosed in parentheses and separated by commas:
function(parameter1)
function(parameter1, parameter2)
function(parameter1, parameter2, ...)
For example:
abs(-8)
The available functions are listed in the Functions configuration section.
Control Statements
if/else
The if/else construct allows choices to be made based on multiple conditions. One of the following forms can be used:
if [condition] then
[optional additional if/else statements]
[updates and actions] or [set variables]
[optional additional if/else statements]
endif
if [condition] then
[optional additional if/else statements]
[updates and actions] or [set variables]
[optional additional if/else statements]
else
[optional additional if/else statements]
[updates and actions] or [set variables]
[optional additional if/else statements]
endif
if [condition 1] then
[optional additional if/else statements]
[updates and actions] or [set variables]
[optional additional if/else statements]
elseif [condition 2] then
[optional additional if/else statements]
[updates and actions] or [set variables]
[optional additional if/else statements]
else
[optional additional if/else statements]
[updates and actions] or [set variables]
[optional additional if/else statements]
endif
The conditions are expressions that must evaluate to a Boolean value. Typically this will be the case because a comparison operator will be used.
The code block for each if or else condition in an if/else construct can contain [updates and actions] or it can [set variables]. It cannot do both. However, it can contain as many other if/else constructs as needed. Each code block in an if/else construct is independent so the contents of one block will not enforce any requirements on the contents of any other block. For example:
if value <> previous value then
set $(changed) "true"
if value > 75 then
severity critical
else
severity warning
endif
endif
Here the outer if block contains a [set variables] operation, but this does not restrict the inner if/else from performing an [updates and actions] operation.
Updates and actions are grouped into a transaction so that they all occur together. This is detailed further in the Rule Evaluation section. Set statements are not part of the transaction, so, to avoid confusion, they cannot be included in the same if block as the transaction. If the same Boolean expression should be used both to set variables and to fire a transaction, then two separate if statements are required.
Updates and Actions
If the condition of a rule is true then a number of updates or actions can be applied to the system.
Updates
Updates have the following format:
[property] [value]
e.g.
severity critical
The most common property to set is severity, but it is also possible to change the active properties of items and, with the compute engine, the value.
It is only possible to change properties of the target of a rule. This is similar to the behaviour of Microsoft Excel, where the result of a calculation is shown in the cell where the formula is defined.
Actions
Actions have the following format:
run [action name]
e.g.
run "email support team"
Action Throttle
The default throttles for an action can be overridden in rules where the action is called. An example would be in rules where a single action “email” may be throttled more for CPU problems than for security problems with UNIX-USERS. A rule set on CPU utilisation may look like:
if value > 10 then
severity warning
run "email support" throttle "ten a minute"
run "email managers" throttle "three in five mins"
endif
Whereas a rule on a security issue might look like:
if value > 10 then
severity critical
run "email support" throttle "twenty a minute"
run "email managers" throttle "ten in five mins"
endif
Although the same could be achieved by creating several actions, allowing the throttles to be overridden like this prevents duplication of data in the setup file.
For more information on throttling, refer to Action Throttling.
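The idea of a per-rule override can be sketched in Python: a throttle named in the rule's run statement takes the place of the action's default. The Throttle class below uses a generic sliding-window limiter purely for illustration; it is not necessarily how Geneos throttles work (see Action Throttling for that), and the class, its limit/window fields and effective_throttle are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class Throttle:
    """Hypothetical sliding-window throttle: at most `limit` firings per `window` seconds."""
    limit: int
    window: float
    fired_at: Deque[float] = field(default_factory=deque)

    def allow(self, now: float) -> bool:
        # Forget firings that have slid out of the window.
        while self.fired_at and now - self.fired_at[0] >= self.window:
            self.fired_at.popleft()
        if len(self.fired_at) < self.limit:
            self.fired_at.append(now)
            return True
        return False


def effective_throttle(action_default: Throttle, rule_override: Optional[Throttle]) -> Throttle:
    """A throttle named in the rule's run statement replaces the action's default."""
    return rule_override if rule_override is not None else action_default


default_throttle = Throttle(limit=5, window=60.0)  # hypothetical action default
ten_a_minute = Throttle(limit=10, window=60.0)     # rule override, like "ten a minute"

throttle = effective_throttle(default_throttle, ten_a_minute)
fired = [throttle.allow(now=float(second)) for second in range(11)]
print(fired.count(True))  # 10: the eleventh call within the minute is suppressed
```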
User Data
When actions are specified, additional data can be passed to them. These userdata variables are set as environment variables when the script runs.
The userdata variables are created by Actions. They exist only for the duration of a rule execution, and they are not available after a rule has completed execution. For example, they are not available in an Effect called from the Alerting subsystem.
Note
userdata variables are not passed between nested if/else statements. Each if/else statement is a separate transaction, and userdata variables are made in the context of each transaction. Therefore, when if statements are nested, each outer if statement has its own transaction that becomes active separately from the transaction of any inner if statement.
You can set the following types of information:
Name/value pairs
For shared libraries, environment variables are passed as arguments. See Shared library actions.
Values of variables
userdata "variable" $(var)
The value of the variable named var will be available in the environment variable “variable”.
Data item properties such as value, active, snoozed, etc.
userdata "dataItem" value
The environment variable “dataItem” will contain the value of the data item that is the subject of the rule.
A selection of properties from several data items based on an XPath
userdata "dataItems" %%//cell value
The environment variable “dataItems” will contain a list of cell values in the form:
[false, 1, 2.3, "four"]
A target such as a dataview name
userdata "view_of" target "dataviewName"
The environment variable “view_of” will be set to the name of the dataview.
Whether the system is in a particular active time
userdata "active" within activetime "timeActive"
The environment variable “active” is set to true or false depending on whether the system is within the active time.
The value of managed entity attributes
userdata "failover_attribute" attribute "FAILOVER"
Sets the environment variable “failover_attribute” to the value of the managed entity attribute “FAILOVER”.
Current timeseries values
userdata "usual-at-this-time" timeseries "MyTimeSeries"
The environment variable “usual-at-this-time” is set to the value of the time series at the present moment.
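Since userdata arrives as environment variables, a script started by an action can read it with ordinary environment lookups. The Python sketch below is a hypothetical action script using the names from the examples above; it assumes the documented list form (for example [false, 1, 2.3, "four"]) is valid JSON, which holds for that example but may not for every string a Gateway could send.

```python
import json
import os
from typing import Any, Dict, List, Mapping


def parse_cell_list(raw: str) -> List[Any]:
    """Parse a list such as [false, 1, 2.3, "four"].

    Assumption: the list is valid JSON, so json.loads can read it.
    """
    return json.loads(raw)


def read_userdata(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect the userdata values named in the examples above (hypothetical script)."""
    return {
        "view_of": env.get("view_of", ""),
        "dataItem": env.get("dataItem", ""),
        "dataItems": parse_cell_list(env.get("dataItems", "[]")),
    }


if __name__ == "__main__":
    print(read_userdata())
```

Passing the mapping in as a parameter keeps the parsing testable without a running Gateway.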
Delay
A delay may be specified that will queue the updates and actions for the specified number of seconds or samples. If the condition is no longer true before the end of the delay then the updates and actions will be cancelled. This allows rules such as “if the value is greater than 70 for more than a minute then the severity is critical”. By default there is no delay. The format is:
delay [length] (seconds/samples)
Example:
delay 60
delay 60 seconds
delay 2 samples
The delay can be specified in terms of seconds or samples. If no unit is specified, then seconds is assumed. When the delay is specified in samples, this refers to the sampling of the target of the rule.
Note
It is pointless to set a delay, in seconds, that is shorter than the sample interval of the sampler producing the data. This is because the value does not have a chance to change, so the delay will not prevent the severity change. If the delay is set in seconds, it should therefore be at least two and a half times the sample interval. This issue can be avoided altogether by setting the delay in terms of samples.
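The queue-and-cancel behaviour of a delay, and the reason a delay shorter than the sample interval does nothing, can be sketched in Python. DelayedUpdate is a hypothetical model, not Gateway code: it assumes evaluate is called at each sample and at the moment the delay timer would expire, and only models delays in seconds.

```python
from typing import Optional


class DelayedUpdate:
    """Hypothetical model of `delay N` (seconds) on an if block."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay = delay_seconds
        self.pending_since: Optional[float] = None
        self.applied = False

    def evaluate(self, condition: bool, now: float) -> bool:
        """Call at each evaluation or timer expiry; True once the queued updates apply."""
        if not condition:
            # The condition stopped being true before the delay ran out: cancel.
            self.pending_since = None
            self.applied = False
            return False
        if self.pending_since is None:
            self.pending_since = now  # start queuing the updates and actions
        if not self.applied and now - self.pending_since >= self.delay:
            self.applied = True
        return self.applied


# Samples every 20 s with `delay 60`: the condition must hold for three intervals.
rule = DelayedUpdate(60)
print([rule.evaluate(True, t) for t in (0, 20, 40, 60)])  # [False, False, False, True]

# `delay 10` with the same 20 s samples: at the 10 s timer expiry the value cannot
# have changed yet, so the condition is still true and the delay has no effect.
short = DelayedUpdate(10)
print(short.evaluate(True, 0), short.evaluate(True, 10))  # False True
```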
Rules
An individual rule contains the rule code (which may contain multiple if statements), and has a number of other attributes such as the targets that it applies to and the priority within the system.
Targets
Rules must specify one or more targets that they apply to. A target is an XPath expression which specifies one or more parts of the system, for example individual cells, all cells in a column, the same cell on all managed entities, the samplingStatus headline of every dataview, etc. Rules that change the value of cells or headlines should only target computed data items you have created using the Compute Engine.
Rules may optionally have additional XPath expressions, called contexts, which are typically used to restrict the targets of a rule to particular managed entities or types. These are particularly useful when set using rule group defaults (see rule group defaults below).
For example, if a rule’s target was “all headlines named samplingStatus” and it had a context selecting “all headlines in the FixedIncomeLondon managed entity”, then the rule would apply only to sampling status headlines in that entity. It could be extended to other managed entities by adding more context XPaths or to other named headlines by adding more target XPaths.
Briefly, if no contexts exist, a rule applies to a particular item if it matches this expression:
(target1 or target2 or target3 …)
If contexts do exist for a rule, the rule applies to items that match this expression:
(target1 or target2 or target3 …) and (context1 or context2 or context3 …)
So at least one target and at least one context must match an item for the rule to apply to it.
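The matching expressions above translate directly into Python. In this sketch each target or context is a plain predicate standing in for an XPath; rule_applies, the predicates and the FixedIncomeNewYork entity are hypothetical, used only to replay the samplingStatus example.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]
Matcher = Callable[[Item], bool]  # stands in for an XPath expression


def rule_applies(item: Item, targets: List[Matcher], contexts: List[Matcher]) -> bool:
    """(target1 or target2 ...) and, only if contexts exist, (context1 or context2 ...)."""
    if not any(target(item) for target in targets):
        return False
    return not contexts or any(context(item) for context in contexts)


def sampling_status(item: Item) -> bool:
    return item.get("headline") == "samplingStatus"


def in_fixed_income_london(item: Item) -> bool:
    return item.get("entity") == "FixedIncomeLondon"


london = {"entity": "FixedIncomeLondon", "headline": "samplingStatus"}
other = {"entity": "FixedIncomeNewYork", "headline": "samplingStatus"}  # hypothetical entity
print(rule_applies(london, [sampling_status], [in_fixed_income_london]))  # True
print(rule_applies(other, [sampling_status], [in_fixed_income_london]))   # False
```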
Priority
Multiple rules can apply to a single item. Priority is used to determine the order that the rules are evaluated. Rules with a higher overall priority are evaluated first. Every rule must set a priority and may optionally specify a priority group. 1 is the highest priority that can be set, 2 is lower than this, etc.
The overall priority of a rule is determined first by its priority group and then by its priority. If no group is set then it is treated as 0, i.e. the priority will be higher than any rule where a priority group is set. Priority groups may be set using rule group defaults (see rule group defaults below).
An optional setting on each rule, Stop further evaluation, allows lower priority rules to be ignored altogether. The ‘Show Rules’ command shows all rules applying to an item in order of priority; if the ‘Stop further evaluation’ setting applies, a dividing line with the text ‘EVALUATION STOPS HERE’ is shown so that you can see which are being used and which are not.
When priority is displayed (for example by the ‘Show Rules’ command), it will be shown as group:priority, so a rule with a priority group of 7 and a priority of 3 will be shown as 7:3.
Here are some examples of some priorities, shown with the highest priority first and lowest priority last:
0:2 0:5 0:6 1:1 1:8 3:4 4:2
If two rules have the same overall priority (that is the same priority and priority group) then the system may evaluate them in any order; this order may vary between target items and may change when the configuration is changed or reloaded.
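Ordering by priority group first and priority second is a sort on a (group, priority) pair, with a missing group counting as 0. The Python sketch below reproduces the example ordering above and cuts the list at the first rule that has Stop further evaluation set. The Rule class and its fields are hypothetical; note that Python's sort is stable, whereas the Gateway gives no ordering guarantee for ties.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    priority: int                   # 1 is the highest priority
    group: Optional[int] = None     # an unset priority group is treated as 0
    stop_further_evaluation: bool = False

    def sort_key(self) -> Tuple[int, int]:
        return (self.group if self.group is not None else 0, self.priority)

    def label(self) -> str:
        group, priority = self.sort_key()
        return f"{group}:{priority}"


def evaluation_order(rules: List[Rule]) -> List[Rule]:
    """Order rules by (group, priority), dropping everything after a stopping rule."""
    ordered = sorted(rules, key=Rule.sort_key)
    for index, rule in enumerate(ordered):
        if rule.stop_further_evaluation:
            return ordered[: index + 1]  # "EVALUATION STOPS HERE"
    return ordered


rules = [Rule("a", 8, 1), Rule("b", 2), Rule("c", 4, 3), Rule("d", 5, 0),
         Rule("e", 2, 4), Rule("f", 1, 1), Rule("g", 6)]
print(" ".join(r.label() for r in evaluation_order(rules)))
# 0:2 0:5 0:6 1:1 1:8 3:4 4:2
```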
ActiveTime
The active time of a rule determines when it applies to the system. Outside of the active time it will be as if the rule did not exist at all. Multiple active times can apply to each rule, and these will be combined together using the same rules as other active times. See Active Times for more details.
By default, the active times set on rules also affect the active state of the rule’s target cells. The target cell will go inactive when all rules that apply to it go inactive (the cell's active state is a logical OR of the active states of all rules that have activeTimeAffectsCell set to true). This can be turned off on a rule-by-rule basis using the activeTimeAffectsCell setting, or overridden by explicitly setting the active state of a cell from within a rule.
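The logical OR described above can be written as a short Python function. This is a hypothetical model: cell_active and its tuple layout are illustrative names, and returning true when no participating rules exist is an assumption the text does not cover.

```python
from typing import Iterable, Optional, Tuple


def cell_active(rules: Iterable[Tuple[bool, bool]], explicit: Optional[bool] = None) -> bool:
    """rules holds (rule_is_active, active_time_affects_cell) for each rule targeting the cell."""
    if explicit is not None:
        return explicit  # a rule set the cell's active state directly
    affecting = [is_active for is_active, affects_cell in rules if affects_cell]
    if not affecting:
        return True  # assumption: with no participating rules the cell stays active
    return any(affecting)  # logical OR across participating rules


print(cell_active([(False, True), (True, True)]))   # True
print(cell_active([(False, True), (True, False)]))  # False
```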
Rule commands
You can perform various commands from the Active Console, which allow you to examine and modify rules.
Show rules
This command allows you to display the available information about the rules set on a specified target.
The information is displayed in a new Output window, which shows:
Property | Description |
---|---|
Gateway | Name of the current Gateway |
Rule name | Name of each rule |
Group | Name of the Rule Group each rule belongs to |
Priority | Priority level of each rule |
Active | Shows if the rule is active or inactive |
Context | Lists all the contexts defined for this rule. Contexts that apply to the target appear in boldface |
Targets | Path to the object each rule targets |
Rule | The code that implements the rule |
Note
You can see the current value of any variable used in the rule code by hovering the cursor over it.
The possible targets are:
- Cell
- Dataview
- Sampler
- Managed Entity
- Probe
- Directory
Show variables
This command allows you to display the available variables that can be used to create rules for a given target.
The information is displayed in a new Output window, which shows:
Variables
Column | Description |
---|---|
Variable | Name of the variable |
Type | Data type of the variable (for example: string, integer, double) |
Source | Name of the file where the variable is defined |
Section | State tree section where the variable is defined |
Macros
Column | Description |
---|---|
Macro | Name of the macro |
Value | Value stored in the macro |
The possible targets are:
- Cell
- Dataview
- Sampler
- Managed Entity
- Probe
- Directory
Show rule contentions
This command allows you to display a list of rules ordered by priority. This is useful when examining conflicts between rules.
The information is displayed in a new Output window, which shows:
Column | Description |
---|---|
rule | Name of the rule |
file | Name of the file where the rule is defined |
priority group | Value of the rule's priority group (lower is more important). The overall priority of a rule is determined first by its priority group and then by its priority. |
priority | The relative priority of the rule (lower is more important) |
The possible targets are:
- Dataview
- Sampler
- Managed Entity
- Probe
- Directory
Rule Evaluation
As already mentioned, rules are evaluated from top to bottom in priority order. However, it is important to understand what happens when updates to the system are applied and actions are fired.
The most important factors to remember are:
- Each property can only be set once per rule evaluation.
- Updates are transactional.
- Updates are applied once rules have finished evaluating.
- Rules are evaluated whenever necessary.
Single Property Updates
Consider a rule like:
if value > 20 then
severity warning
endif
if value > 30 then
severity critical
endif
If the value is greater than 30 then the two updates conflict. The first ‘if’ will set the severity to warning; the second ‘if’ will also try to set the severity, but will not be able to, because it is already being set. (There is also a problem here in that the rule has no ‘else’ clause to reset the severity to ‘ok’ or undefined when the value falls below 20; see if without else.)
Transactions
The transactional element means that updates and actions that are grouped together will either all be applied, or none of them will be applied. Expanding on the problematic rule from the previous section:
if value > 20 then
severity warning
run "email support"
endif
if value > 30 then
severity critical
run "email boss"
endif
Since the severity has been set to warning, the action “email support” can fire, but since the severity cannot be set to critical, the action “email boss” will not fire.
Note
As long as the value remains over 20, the first transaction is ‘reachable’ or ‘active’ (i.e. if you run the rule again then you will still set severity to warning and run email support). This means that ‘email support’ will be eligible for escalation/repetition (see the Actions section for more details).
Applying Updates
Take the following rule
if severity = critical then
severity warning
endif
if severity = critical then
run "email support"
endif
If the severity turns critical then it will be changed to warning, but only after the rule has finished evaluating. That means that the second condition will also still be true, and support will be emailed.
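The three behaviours described in this section (each property set at most once, transactions applied all-or-nothing, updates applied only after evaluation) can be combined into one small Python model. This is a hypothetical sketch, not the Gateway's implementation: evaluate_rule, State and Block are illustrative names, each if block is reduced to a condition, a dict of property updates and a list of actions, and throttling, delays and escalation are ignored. The two runs replay the Transactions and Applying Updates examples.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
# One if block: (condition, property updates, actions to run)
Block = Tuple[Callable[[State], bool], Dict[str, object], List[str]]


def evaluate_rule(state: State, blocks: List[Block]) -> List[str]:
    """Hypothetical model of a single rule evaluation."""
    snapshot = dict(state)  # conditions see the state as it was before this evaluation
    pending: Dict[str, object] = {}
    fired: List[str] = []
    for condition, updates, actions in blocks:
        if not condition(snapshot):
            continue
        if any(prop in pending for prop in updates):
            continue  # a property is already set: the whole transaction is dropped
        pending.update(updates)
        fired.extend(actions)
    state.update(pending)  # updates are applied only once evaluation has finished
    return fired


# Transactions: with value 35 only the first block's transaction goes through.
state_a = {"value": 35, "severity": "ok"}
fired_a = evaluate_rule(state_a, [
    (lambda s: s["value"] > 20, {"severity": "warning"}, ["email support"]),
    (lambda s: s["value"] > 30, {"severity": "critical"}, ["email boss"]),
])
print(fired_a, state_a["severity"])  # ['email support'] warning

# Applying updates: the second condition still sees the original critical severity.
state_b = {"severity": "critical"}
fired_b = evaluate_rule(state_b, [
    (lambda s: s["severity"] == "critical", {"severity": "warning"}, []),
    (lambda s: s["severity"] == "critical", {}, ["email support"]),
])
print(fired_b, state_b["severity"])  # ['email support'] warning
```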
Re-Evaluating Rules
When writing rules, it should be assumed that they can run at any point, and will be run whenever the state of the system changes. Take the following rule:
if value <> previous value then
severity critical
else
severity ok
endif
When the value of the item changes then the severity will be set to critical. As soon as the rule has finished and the state of the system has been changed then the rule will be re-evaluated, and since the value has not changed, the severity will be immediately returned to ok. In this case the severity change may not be visible to the user, but an event will be generated that can be seen in the event ticker.
Actions can also be applied to the above:
if value <> previous value then
severity critical
run "email support"
else
severity ok
endif
In this case, as well as the ticker event being generated, the action will fire whenever the value of the target changes.
Note
Since the rule will be re-evaluated the action will not be eligible for escalation or repetition.
Note
It is important to understand that when part of a rule is triggered and fires an action, resets an action or changes some attribute, the rule will be re-evaluated. This is of particular significance when using the previous keyword, as it will only access the previous value of the attribute whose change triggered the rule evaluation. For any other attribute, previous will access the current value. Using the previous keyword for attributes which are changed by the rule itself may cause duplicate actions, as the rule will be re-evaluated multiple times.
Disabling Rules at Start Up
In a large Gateway setup, the overall time taken for the Gateway to start up can sometimes be reduced by delaying the application of rules. This is controlled by the startup delay setting, which allows you to specify an interval in seconds between the Gateway becoming active and the rules being applied.
This may be useful because, when a data item is added to the Gateway, all existing rules must be checked to determine whether they apply to that item. Conversely, when a rule is added, all existing data items must be checked to see whether the rule applies to them. In some cases (for instance, if there are a huge number of data items and many rules, most of which apply to only a few items) it is faster to check each rule against all the data items than vice versa. The startup delay setting lets the Gateway start with no rules and wait until most of the data items are present before applying them.
The length of the delay interval, and whether an interval is useful at all, must be determined by experiment in a non-production environment. This is because it is not possible for the Gateway to detect when it has finished connecting to all available Netprobes, since it does not know which Netprobes are about to come up, or to determine when all samplers have finished sending their initial data, since this depends on the nature of each sampler and its configuration.
When rules are enabled after a startup delay, the ‘fire on configuration change’ settings for Actions, Alerts, Ticker Events and database logging are respected. The initial application of a rule is considered to take place in the context of a configuration change so that, by default, any actions, events or alerts triggered as the rule is first applied will be suppressed.
Rule Groups
Rule groups may be used to group rules together in logical sets for display purposes in the setup editor. However, they can also be used to set defaults that apply to all rules that they contain.
Multiple sets of defaults can also be specified, usually using contexts so that each set of defaults applies to a different set of items. For example, different default active times could be used for separate sets of managed entities.
Rule defaults
Defaults can be set for contexts, priority group and active times. These will each be set on any rules that do not have these set already.
For example, suppose a set of defaults is configured with default priority group 5, active time “London business hours” and a context of ‘managed entities with attribute “Region” set to “London”’:
- A rule in the group with priority group 3, no active time and no context would get the active time and context specified in the defaults
- A rule with no priority group, active time “London evenings” and no context would get the priority and context specified in the defaults
- A rule with priority group 2, no active time and the context of ‘managed entities with attribute “Division” set to “Fixed Income”’ would get the active time “London business hours” from the defaults. Since the default context would not apply, this rule would apply to all managed entities in the “Fixed Income” division, regardless of their “Region” attribute. This may not be what was intended: contexts should be specified either via defaults or directly on rules, but not both ways in the same rule group.
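The way rule-level defaults fill in only the settings a rule leaves unset can be sketched in Python as below. The `RuleSettings` class and `apply_defaults` function are hypothetical, and contexts are reduced to plain strings for illustration. The three results reproduce the three cases in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuleSettings:
    """Hypothetical representation of the defaultable settings of a rule."""
    priority_group: Optional[int] = None
    active_time: Optional[str] = None
    context: Optional[str] = None      # simplified: one context as a string


def _pick(own, default):
    # A setting already present on the rule always wins over the default.
    return own if own is not None else default


def apply_defaults(rule: RuleSettings, defaults: RuleSettings) -> RuleSettings:
    return RuleSettings(
        priority_group=_pick(rule.priority_group, defaults.priority_group),
        active_time=_pick(rule.active_time, defaults.active_time),
        context=_pick(rule.context, defaults.context),
    )


london = RuleSettings(5, "London business hours", 'Region = "London"')

first = apply_defaults(RuleSettings(priority_group=3), london)
second = apply_defaults(RuleSettings(active_time="London evenings"), london)
third = apply_defaults(
    RuleSettings(priority_group=2, context='Division = "Fixed Income"'), london)
```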
Transaction defaults
Transaction defaults can extend some or all of the transactions in all the rules in the group. The specification for transaction defaults has two parts: a ‘match’ section and a ‘data’ section; both consist of statements in the rule language.
The ‘match’ section specifies which transactions will be extended, based on the updates they perform. For example, it could specify “severity critical” or “active false”. If no match conditions are specified then the defaults will apply to all transactions. If multiple match conditions are specified then the defaults will apply only to transactions which meet all the conditions.
The ‘data’ section specifies one or more statements, each of which will be added to the matching transactions, as long as it does not conflict with a statement already present in the transaction.
- An update statement specified as a default will be added to matching transactions which do not update any properties.
- A delay statement specified as a default will be added to matching transactions which do not set a delay.
- A run action statement specified as a default will be added to matching transactions which do not run an action.
- A userdata statement specified as a default will be added to all matching transactions. If a transaction sets the same user data variable, its setting will take priority over the default.
For example, consider a transaction default which matches “severity critical” and sets “run “email support””: this will add the “email support” action to any transactions that set the severity to critical and do not already run any actions.
A set of defaults can contain multiple transaction defaults, each with their own ‘match’ and ‘data’ sections. If a transaction in a rule matches more than one transaction default, each ‘data’ section is applied in turn.
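The ‘match’ and ‘data’ logic can be sketched in Python as below. The dictionary layout, the `apply_transaction_default` and `apply_all` helpers and the action name "page on-call" are all hypothetical. The sketch assumes a transaction runs at most one action and sets at most one delay, and it follows the four rules listed above.

```python
import copy


def matches(txn, match):
    """A transaction matches when it performs every update in `match`;
    an empty match selects all transactions."""
    return all(txn["updates"].get(prop) == val for prop, val in match.items())


def apply_transaction_default(txn, match, data):
    """Return a copy of `txn` extended with the non-conflicting parts of `data`."""
    if not matches(txn, match):
        return txn
    out = copy.deepcopy(txn)
    if data.get("updates") and not out["updates"]:
        out["updates"] = dict(data["updates"])   # only if txn updates nothing
    if data.get("delay") is not None and out.get("delay") is None:
        out["delay"] = data["delay"]             # only if txn sets no delay
    if data.get("action") is not None and out.get("action") is None:
        out["action"] = data["action"]           # only if txn runs no action
    # User data is always merged; the transaction's own values take priority.
    out["userdata"] = {**data.get("userdata", {}), **out.get("userdata", {})}
    return out


def apply_all(txn, defaults):
    # Each transaction default is tried in turn; matching ones extend txn.
    for match, data in defaults:
        txn = apply_transaction_default(txn, match, data)
    return txn


email_on_critical = ({"severity": "critical"}, {"action": "email support"})

plain = {"updates": {"severity": "critical"}}
paging = {"updates": {"severity": "critical"}, "action": "page on-call"}
healthy = {"updates": {"severity": "ok"}}
```

With these definitions, `apply_all(plain, [email_on_critical])` gains the "email support" action. The `paging` transaction keeps its own action, and `healthy` does not match, so it is returned unchanged.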
Nested rule groups
Rule groups can be nested, and sets of defaults can be specified at each level of nesting. Defaults on groups at an inner level of nesting are not merged with defaults on the groups that contain them. If a set of defaults defined on a group has the same name as a set defined at an outer level, the innermost set of defaults applies; otherwise it is as if the sets of defaults defined at the outer level were copied to the inner level.
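A minimal Python sketch of the name-based override described above follows. The `effective_defaults` helper, the set names `hours` and `alerts` and the dictionary layout are hypothetical. Note that the inner `hours` set does not inherit the outer priority group, because same-named sets replace each other rather than merging.

```python
def effective_defaults(levels):
    """`levels` lists the named default sets of each enclosing rule group,
    outermost first. A set with the same name at an inner level replaces
    the outer one wholesale (no field-by-field merge); other outer sets are
    inherited unchanged."""
    result = {}
    for named_sets in levels:
        result.update(named_sets)   # same name: inner set replaces outer set
    return result


outer = {
    "hours": {"activeTime": "London business hours", "priorityGroup": 5},
    "alerts": {"priorityGroup": 1},
}
inner = {"hours": {"activeTime": "London evenings"}}

combined = effective_defaults([outer, inner])
```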
Common Pitfalls
if without else
It is not necessary to put an else in an if statement, but in most cases it is sensible to do so. A rule such as the following may cause problems:
if value > 10 then
severity critical
endif
If the value starts off as 5 then the cell will be grey. If the value becomes 11 then the cell will go red. If it goes back to 5 then it will still be red, because it has not been told to do anything different. In fact, it will continue to be red until the Gateway or Netprobe is restarted, or the rule is changed. What was probably meant was:
if value > 10 then
severity critical
else
severity ok
endif
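The difference between the two rules can be simulated in Python. This is a hypothetical model: each rule is a function from the new value and the current severity to the next severity, and "undefined" stands for the grey cell.

```python
def rule_without_else(value, severity):
    # Only the "then" branch updates severity; otherwise it is left as it was.
    if value > 10:
        return "critical"
    return severity


def rule_with_else(value, severity):
    # Every evaluation sets a severity, so the cell always tracks the value.
    return "critical" if value > 10 else "ok"


def simulate(rule, values, initial="undefined"):
    """Feed successive values to a rule and record the resulting severities."""
    severity, history = initial, []
    for value in values:
        severity = rule(value, severity)
        history.append(severity)
    return history


without_else = simulate(rule_without_else, [5, 11, 5])  # grey, red, still red
with_else = simulate(rule_with_else, [5, 11, 5])        # green, red, green
```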
Overlapping defaults
If two or more sets of rule group defaults apply to the same rule, it is as if the rule were specified once for each set of defaults.
For example, suppose Rule Group G contains these two sets of defaults and a relevant rule:
- Default “x”: In transactions that set severity critical, run action X
- Default “y”: In transactions that set severity critical, run action Y
- Rule 1: When value > 90 then severity critical
In this scenario, both defaults will apply to Rule 1, effectively creating two instances of rules within the gateway:
- Rule 1, defaults “x”: When value > 90 then severity critical, run action X
- Rule 1, defaults “y”: When value > 90 then severity critical, run action Y
Both the rules will apply to the rule targets, and the “Show Rules” output for a given target will show both rules. Because both rule transactions set severity, only one will run. If the priority group is the same in each case, then the choice of rule is unpredictable.
To avoid this ambiguity, if multiple sets of defaults apply to a group of rules, each set of defaults should specify a different set of contexts. As long as these contexts do not overlap and as long as none of the rules to which they apply specifies any contexts, at most one rule will apply to each item.
Alternatively, (if some overlap of contexts is unavoidable) each set of defaults could specify a different priority group, or they could have different (and non-overlapping) active times. As long as none of the rules specifies a priority group (or an active time), this will ensure that at most one rule applies at a time.
Configuration
Rules
rules > ruleGroup
Rule groups allow rules to be grouped together, and can also provide default values for a number of settings. See Rule Groups.
rules > ruleGroup > name
Specifies the rule group name.
Mandatory: Yes
rules > ruleGroup > default
Specifies defaults that apply to rules (rather than the rule code block contained within a rule), for example setting an active time. More than one default setting can be configured.
rules > ruleGroup > default > name
Specifies the default setting name.
Mandatory: Yes
rules > ruleGroup > default > rule
Specifies defaults that apply to rules (rather than the rule code block contained within a rule), for example setting an active time.
rules > ruleGroup > default > rule > contexts
Specifies default contexts that will apply to any rules that don’t already have at least one context. A default context cannot include any runtime information in its filters. If it does then you will see an error like:
WARN: RuleManager Ignored context for default 'Broken default' as XPath contains non-identifying predicate
Mandatory: No
rules > ruleGroup > default > rule > priorityGroup
Specifies a default priority group that will apply to any rule that doesn’t already have a priority group set.
Mandatory: No
rules > ruleGroup > default > rule > activeTime
Specifies an active time that will apply to any rule that doesn’t already have an active time set. It can be specified using an active time name or a variable active time.
Mandatory: No
rules > ruleGroup > default > transaction
Specifies defaults that apply to transactions. For example, setting a default action to run.
rules > ruleGroup > default > transaction > match
Specifies which transactions will receive the defaults. Any parts specified here must be present and match those in the rule for the defaults to be applied.
Mandatory: No
Default: All transactions will receive the defaults.
rules > ruleGroup > default > transaction > data
Specifies the defaults to apply to the transactions that match. Each part specified here will be added to each matching transaction, as long as it does not conflict with the existing content of the transaction.
rules > rule
An individual rule provides all the relevant information necessary to provide alerting on parts of the Geneos system. See the Rules section for more details.
rules > rule > name
Specifies the name of the rule.
Mandatory: Yes
rules > rule > contexts
Specifies the data-items that this rule applies to. These are normally more general than targets, and typically restrict the targets (e.g. a context may specify all cells inside two managed entities, and the targets may specify all cells in a particular column). A context cannot include any runtime information in its filters. If it does then you will see an error like:
WARN: RuleManager Ignored context for rule 'Broken rule' as XPath contains non-identifying predicate
Mandatory: No
Default: All items are valid (if no rule group defaults apply)
rules > rule > targets
Specifies the data-items that this rule applies to, i.e. the items that will be affected by property updates. A target cannot include any runtime information in its filters. If it does then you will see an error like:
Rule 'Broken rule' ignored as a target contains non-identifying predicate
More information about identifying predicates and non-identifying predicates can be found in Geneos XPaths especially in the section on Predicates.
Mandatory: Yes
rules > rule > priorityGroup
The priority group of the rule. Higher priority (lower numbered) rules will be evaluated before lower priority ones. A rule with a priority group of 2 and priority of 3 has a higher overall priority than (and will be evaluated before) a rule with a priority group of 3 and a priority of 2.
Mandatory: No
Default: 0 (if no rule group defaults apply)
rules > rule > priority
The priority of the rule. Higher priority (lower numbered) rules will be evaluated before lower priority ones. Rules with the same priority may be evaluated in any order.
Mandatory: Yes
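The ordering described by priorityGroup and priority amounts to sorting on the pair (priority group, priority), with lower numbers first. The Python sketch below uses hypothetical rule names. Python's sort is stable, but the Gateway makes no such guarantee for rules with equal values, so the order of ties should not be relied on.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    priority_group: int
    priority: int


def evaluation_order(rules):
    """Lower numbers mean higher priority: compare the priority group
    first, then the priority within the group."""
    return sorted(rules, key=lambda r: (r.priority_group, r.priority))


rules = [Rule("group 3, priority 2", 3, 2), Rule("group 2, priority 3", 2, 3)]
order = [r.name for r in evaluation_order(rules)]
# The group 2 rule is evaluated first, despite its larger priority number.
```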
rules > rule > activeTime
Active times specify when this rule is active. When the rule is outside the active times then it is as if the rule was not in the setup file at all. By default the active times set on a rule will also affect the active state of the rule’s targets. See activeStateAffectsCell for details.
Mandatory: No
Default: Active all the time (if no rule group defaults apply)
rules > rule > activeStateAffectsCell
If set, the active state of the rule affects the active state of the cell. The active state of the cell is set from a logical OR of all the rules that apply to it that have this setting set. If this setting is set to false then the active state of the rule will not affect the active state of its target cells in any way.
Mandatory: No
Default: true
rules > rule > stopFurtherEvaluation
If set then any lower priority rules on the same item will not be evaluated. This is indicated in the ‘Show Rules’ command with a horizontal line with the text ‘EVALUATION STOPS HERE’.
Mandatory: No
Default: false
rules > rule > pathVariables
Path variables are used in pathAliases to dynamically specify parts of a path.
The most common usage of path variables is to extract the row name from a data-item, and then use this in a path alias to extract the value of a cell in a different dataview, which has a corresponding name.
To do this, configure a path variable named pathVar to reference the row name using the following syntax:
target "rowName"
Then configure a path alias (e.g. by dragging a cell from the relevant dataview) and alter the row name to reference the path variable above by setting the name comparison value to $pathVar.
Note
Path variables can only be used in path aliases for rules and cannot be used to replace column or headline names. Path variables are invalid everywhere else.
Mandatory: No
rules > rule > pathAliases
Path aliases may be used in rules to refer to secondary data-items.
Mandatory: No
rules > rule > evaluateOnDataviewSample
Enabling this causes the rule to be evaluated only when the dataview of the item specified in “target” samples.
This would be useful in a situation where a rule is being used to populate the value of a cell. Typically such a rule would populate the value of a cell based on a calculation involving the values from a number of cells. E.g.:
value total(wpath "s" value)
In the above rule, the rule target is being populated with the total of the values from the cells denoted by wildcarded path “s”. (For example, “s” might refer to all the cells in a column).
By default, the rule evaluates whenever the value of any of the cells denoted by “s” changes. If this flag is enabled, the rule will only be evaluated when the dataview of the “target” of the rule does a sample.
Enabling this flag has two advantages:
- Performance: the rule is guaranteed to be evaluated only once per sample interval. The benefits are more apparent in rules that calculate a value based on a large number of cells, e.g. the total of 100 cells.
- When performing a rate calculation based on a computed cell: the rate function uses the current and previous value of a cell to calculate the rate of change per second. When a rate is calculated from a computed cell, such as a total, the result is not particularly useful if the total is updated each time one of its source cells changes. Enabling this flag on the rule that calculates the total makes the rule that calculates the rate produce a more useful value. (In fact, the Gateway will produce a warning if it detects this situation. The warning can be disabled using rules > rule > disableRateWarning on the rule that does the rate calculation.)
Mandatory: No
Default: false
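The rate problem can be reproduced with a small Python model. The timings are hypothetical: three source cells each rise by 10 during one 10-second sample, and their updates reach the Gateway 100 ms apart. The `rate_per_second` helper is an assumed simplification of the rate function: the change between the last two values divided by the elapsed time.

```python
def rate_per_second(history):
    """Rate of change from the last two (time_ms, value) points of a cell."""
    (t0, v0), (t1, v1) = history[-2], history[-1]
    return (v1 - v0) * 1000 / (t1 - t0)


# (time in ms, change in one source cell) within a single sample
source_updates = [(10_000, 10), (10_100, 10), (10_200, 10)]

# Default behaviour: the total is recomputed on every source-cell change,
# so its history gains a new point only 100 ms after the previous one.
total = 600
eager_history = [(200, total)]          # total at the end of the previous sample
for t, delta in source_updates:
    total += delta
    eager_history.append((t, total))
eager_rate = rate_per_second(eager_history)

# evaluateOnDataviewSample: one total per sample, 10 s apart.
sampled_history = [(200, 600), (10_200, 630)]
sampled_rate = rate_per_second(sampled_history)

print(eager_rate, sampled_rate)  # 100.0 3.0
```

The total actually grew by 30 over 10 seconds, i.e. 3 per second. The eager model reports 100 per second because its last two points are only 100 ms apart.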
rules > rule > disableRateWarning
Disables the warning mentioned in evaluateOnDataviewSample.
Mandatory: No
Default: false (i.e. produce a warning if needed)
rules > rule > block
This is where the rule code goes. This gets evaluated each time any relevant data changes. The right-click menu provides some of the most common keywords and functions that can be used. For a full list of what can appear in a code block, and details of how each can be used, please refer to Rule Code.
Mandatory: Yes
rules > startupDelay
This section controls whether the Gateway will delay applying rules for a number of seconds after it is started or becomes active in the course of a Hot Standby failover or failback. See Disabling Rules at Start Up.
rules > startupDelay > interval
The number of seconds that the Gateway will wait after becoming active before it applies the rule configuration.
Mandatory: No
Default: 0 (i.e. no delay)
Functions
This section contains the standard function definitions. Compute Engine functions are listed separately.
Numeric Functions
abs
abs(number)
Absolute value operator. Any negative values are converted to positive values.
abs(3) => 3
abs(-3) => 3
sqrt
sqrt(number)
Calculates the square root of the given number. Negative inputs result in null being returned.
sqrt(4) => 2
pow
pow(base, exponent)
Raises the given base to the power of the given exponent.
pow(10, 2) => 100
String Functions
stringBefore
stringBefore(string: haystack, string: needle)
Gets the substring of haystack that starts at the beginning and ends immediately before the first instance of needle. If needle is not in haystack then the whole of haystack will be returned.
stringBefore("abcdefg", "d") => "abc"
stringBefore("abcabcabc", "ca") => "ab"
stringBefore("abcdefg", "p") => "abcdefg"
stringAfter
stringAfter(string: haystack, string: needle)
Gets the substring of haystack that starts immediately after the first instance of needle. If needle is not in haystack then the whole of haystack will be returned.
stringAfter("abcdefg", "d") => "efg"
stringAfter("abcabcabc", "ca") => "bcabc"
stringAfter("abcdefg", "p") => "abcdefg"
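The documented behaviour of stringBefore and stringAfter can be mirrored in Python for checking edge cases. This is a sketch for experimentation, not the Gateway implementation. The function names are hypothetical, and the behaviour for an empty needle is not documented above, so the sketch makes no attempt to match it.

```python
def string_before(haystack: str, needle: str) -> str:
    """Text before the first `needle`, or all of `haystack` if absent."""
    idx = haystack.find(needle)
    return haystack if idx == -1 else haystack[:idx]


def string_after(haystack: str, needle: str) -> str:
    """Text after the first `needle`, or all of `haystack` if absent."""
    idx = haystack.find(needle)
    return haystack if idx == -1 else haystack[idx + len(needle):]
```

For example, `string_after("abcabcabc", "ca")` finds the first "ca" at offset 2 and returns "bcabc", matching the stringAfter example above.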
toUpper
toUpper(string)
Converts all the characters of string to upper case. Non-alphabetic characters are not changed.
toUpper("hello") => "HELLO"
toUpper("HELLO") => "HELLO"
toUpper("Hello World") => "HELLO WORLD"
toUpper("Hello 123") => "HELLO 123"
toLower
toLower(string)
Converts all the characters of string to lower case. Non-alphabetic characters are not changed.
toLower("hello") => "hello"
toLower("HELLO") => "hello"
toLower("Hello World") => "hello world"
toLower("Hello 123") => "hello 123"
concat
concat(string: left, string: right)
Joins two strings together, returning a string value. Function arguments which are not strings (i.e. numeric values) will be converted to a string to make the function call.
concat("hello", " world!") => "hello world!"
concat(123, 456) => "123456"
replace
replace(string:original, string:replaceWhat, string:replaceWith)
Replaces every occurrence of replaceWhat with replaceWith in the original string. Function arguments which are not strings (i.e. numeric values) will be converted to a string to make the function call.
replace("1,000",",","") =>"1000"
replace("1,000,000",",",".") =>"1.000.000"
inList
inList(String:needle, String:haystack1, String:haystack2, ...)
Returns true if the first string (the needle) is found in any of the other strings provided to inList(). The strings can be passed using Geneos variables as well as static strings. The variables that are supported in inList() are integer, double, string and stringList. If a stringList variable is used, inList() will test the needle against all the strings that have been defined in the stringList. For more information, see environments > environment > var > stringList in User Variables and Environments.
if inList(value, "one","two","three") then ...
if inList(value, $(string1), $(string2)) then ...
if inList(value, $(stringList)) then ...
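A Python sketch of how a stringList argument expands into individual candidates follows. The `in_list` helper and the list contents are hypothetical. The sketch also assumes an exact, whole-string comparison (not a substring search) and that values are compared as strings. Both points are assumptions, not documented Gateway behaviour.

```python
def in_list(needle, *haystacks):
    """True if `needle` equals any of the other arguments. A list argument
    stands in for a stringList variable and is expanded so that each of its
    strings is tested. Values are compared as strings (an assumption)."""
    for item in haystacks:
        candidates = item if isinstance(item, list) else [item]
        if any(str(needle) == str(c) for c in candidates):
            return True
    return False


string_list = ["alpha", "beta", "gamma"]   # hypothetical stringList contents
```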
substr
substr(String:source, int:start[,int:end])
Extracts part of a string given the start and end position within the string. The end is optional and defaults to the length of the source. String indexes start at 1 (one).
substr("Mary had a little lamb",1,4) =>"Mary"
substr("Mary had a little lamb",19) =>"lamb"
trim
trim(String:source)
Removes whitespace from either side of a string.
trim(" Hello ") =>"Hello"
ltrim
ltrim(String:source)
Removes whitespace from the left side of a string.
ltrim(" Hello ") =>"Hello "
rtrim
rtrim(String:source)
Removes whitespace from the right side of a string.
rtrim(" Hello ") =>" Hello"
strpos
strpos(String:haystack, String:needle)
Returns the position of the first occurrence of needle in haystack, or 0 (zero) if not found.
strpos("one,two, three","one") => 1
strrpos
strrpos(String:haystack, String:needle)
Returns the position of the last occurrence of needle in haystack, or 0 (zero) if not found.
strrpos("one,two, three and back to one","one") => 27
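The 1-based indexing used by substr and strpos can be illustrated with a Python sketch. The function names are hypothetical. Python strings are 0-based, so the sketch shifts positions by one, and it assumes the end position of substr is inclusive, as the "Mary" example implies.

```python
from typing import Optional


def substr(source: str, start: int, end: Optional[int] = None) -> str:
    """1-based start, inclusive end; `end` defaults to the source length."""
    if end is None:
        end = len(source)
    return source[start - 1:end]


def strpos(haystack: str, needle: str) -> int:
    """1-based position of the first `needle`, or 0 when it is absent."""
    return haystack.find(needle) + 1   # find() returns -1 when absent
```

Because positions start at 1, a result of 0 can safely mean "not found" in strpos.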
regMatch
regMatch(string:haystack, string: needle, [string:flags])
Returns whether the given regular expression (needle) matches the given string (haystack). Accepts Perl Compatible Regular Expressions. The following optional flag can be specified:
i - case-insensitive comparison
Function arguments which are not strings will be converted to a string to make the function call. If an invalid regular expression is specified then null will be returned.
regMatch("One Two Three", ".*Two.*") => true
Time Functions
now
now()
Gets the current time at the point of execution expressed as seconds since the UNIX epoch.
startOfMinute
startOfMinute(dateTime: time)
Gets the datetime timestamp representing the start of the minute in the given datetime timestamp. If no datetime timestamp is provided, the current timestamp will be used for evaluation.
startOfMinute(1298937659) => 1298937600
startOfHour
startOfHour(dateTime: time)
Gets the datetime timestamp representing the start of the hour in the given datetime timestamp. If no datetime timestamp is provided, the current timestamp will be used for evaluation.
startOfHour(1298941199) => 1298937600
startOfDay
startOfDay(dateTime: time)
Gets the datetime timestamp representing the start of the day in the given datetime timestamp. If no datetime timestamp is provided, the current timestamp will be used for evaluation.
startOfDay(1299023999) => 1298937600
startOfMonth
startOfMonth(dateTime: time)
Gets the datetime timestamp representing the start of the month in the given datetime timestamp. If no datetime timestamp is provided, the current timestamp will be used for evaluation.
startOfMonth(1301615998) => 1301612400
startOfYear
startOfYear(dateTime: time)
Gets the datetime timestamp representing the start of the year in the given datetime timestamp. If no datetime timestamp is provided, the current timestamp will be used for evaluation.
startOfYear(1304121600) => 1293840000
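The truncation performed by these functions can be sketched in Python with the standard datetime module. The Gateway evaluates them in its own timezone; the sketch instead takes an explicit `tz` argument (hypothetical) that defaults to UTC. The startOfMonth example above is consistent with a gateway running one hour ahead of UTC on that date (for instance, London during British Summer Time). The checks therefore pass a fixed UTC+1 offset for that one example.

```python
from datetime import datetime, timedelta, timezone


def _truncate(ts, tz, **fields):
    """Replace the given fields of the local time in `tz`; return epoch seconds."""
    local = datetime.fromtimestamp(ts, tz)
    return int(local.replace(**fields).timestamp())


def start_of_minute(ts, tz=timezone.utc):
    return _truncate(ts, tz, second=0, microsecond=0)


def start_of_hour(ts, tz=timezone.utc):
    return _truncate(ts, tz, minute=0, second=0, microsecond=0)


def start_of_day(ts, tz=timezone.utc):
    return _truncate(ts, tz, hour=0, minute=0, second=0, microsecond=0)


def start_of_month(ts, tz=timezone.utc):
    return _truncate(ts, tz, day=1, hour=0, minute=0, second=0, microsecond=0)


def start_of_year(ts, tz=timezone.utc):
    return _truncate(ts, tz, month=1, day=1, hour=0, minute=0, second=0,
                     microsecond=0)


BST = timezone(timedelta(hours=1))   # fixed UTC+1 offset for the month example
```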
parseDate
parseDate(string format, string date, [string:timezoneRegion])
Returns a time value, expressed as the number of seconds since the UNIX epoch, generated by parsing the specified string using the format provided. Any components of the date that cannot be determined from the format will be populated using a value as would be returned from startOfDay(now()). A full list of the available Time Formatting Parsing Codes is available in Time Zones and Time Formats.
parseDate("%d %B %Y %H:%M:%S", "1 January 2010 15:42:59") => 1262360579
parseDate("%H", "14") => the timestamp corresponding to 14:00 on the day of execution of the rule
An optional third parameter can be used to specify the timezone region of the date. A full list of timezone regions with their GMT offsets is available in Time Zones and Time Formats. This will override any use of the %Z specifier in the format.
parseDate("%d %B %Y %H:%M:%S %Z", "25 February 2013 13:14:15 KST")
parseDate("%d %B %Y %H:%M:%S %Z", "25 February 2013 13:14:15 KST", "Asia/Qyzylorda")
If the %Z specifier is used, remember that many common timezone abbreviations can collide. A list of how the Gateway interprets abbreviations can be obtained by running it with the -display-timezone-defaults command line switch. Users can override the default meanings, or create new abbreviations, by specifying them in the Operating Environment section of the gateway configuration. In all cases, if the optional third parameter is provided, it will override any timezone parsed from the date.
If %Z is not used but the optional third parameter is provided, the date is interpreted as if it were from the specified region rather than from the Gateway’s timezone:
parseDate("%d %B %Y %H:%M:%S", "25 February 2013 13:14:15", "Asia/Qyzylorda")
printDate
printDate(string format, [dateTime timestamp, [string:timezoneRegion]])
Formats the dateTime value timestamp, expressed as a count of seconds since the UNIX epoch, according to the format format. If no timestamp value is specified, the time returned from startOfDay(now()) will be used. The format conforms to the list of the Time Formatting Printing Codes in Time Zones and Time Formats.
printDate("%d %B %Y %H:%M:%S", 1262360579) => "01 January 2010 15:42:59"
printDate("%d %B %Y") => the date on which the rule is executed
printDate("%d %B %Y %H:%M:%S", "InvalidTimeStamp") => "01 January 1970 00:00:00"
An optional third parameter can be used to specify the timezone region for which to format the date. If either the %Z or %z format specifiers are used then the timezone abbreviation or UTC offset respectively will be printed in accordance with this parameter. A full list of timezone regions with their GMT offsets is available in Time Zones and Time Formats. If the third parameter is not specified then the date will be formatted according to the timezone of the gateway operating environment.
printDate("%d %B %Y %H:%M:%S %Z", 1262360579, "America/Panama") => "01 January 2010 10:42:59 EST"
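A rough Python analogue of parseDate and printDate, built on strptime and strftime, is shown below. The function names are hypothetical, and the sketch differs from the Gateway in at least one documented respect: Python fills missing date components from 1 January 1900 rather than from startOfDay(now()). The `EST` value is a fixed UTC-5 offset standing in for the America/Panama region. Month names assume Python's default C locale.

```python
from datetime import datetime, timedelta, timezone


def parse_date(fmt, text, tz=timezone.utc):
    """Parse `text` with `fmt` and read it as wall-clock time in `tz`."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=tz).timestamp())


def print_date(fmt, ts, tz=timezone.utc):
    """Format epoch seconds `ts` as wall-clock time in `tz`."""
    return datetime.fromtimestamp(ts, tz).strftime(fmt)


EST = timezone(timedelta(hours=-5), "EST")   # fixed offset named "EST"
```

Here `parse_date` and `print_date` reproduce the first parseDate and printDate examples above when read in UTC, and `print_date(..., EST)` reproduces the America/Panama example.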
Compute Engine
Introduction
The Compute Engine functionality allows additional value to be added to the monitored data from the Geneos system by giving you the ability to add computed data items. A computed data item is a row, column or headline added to an existing dataview and populated with calculated data, or a dataview consisting entirely of added rows, columns and headlines. Computed dataviews enable you to create summaries across multiple parts of the system.
Some possible uses are:
- Create a new column in the CPU plugin that shows a 1-minute moving average of the %usertime column.
- Create a new headline in the PROCESSES plugin that counts the number of process instances being run by user ‘fidessa’
- Create a new total row in a plugin that adds up all the values in relevant columns.
- Create a new view, summarising the lines to the exchanges, with a column that will be green if all the lines are up, amber if some of the lines are down and red if all the lines are down.
- Create a new view using data from other views via the Gateway-sql plugin (see Gateway SQL plugin).
There are also a number of extensions to the rules that are part of the compute engine functionality. These are explained below.
Adding computed data to existing dataviews
To add rows, columns and headlines to existing dataviews, the relevant sampler must be selected from the samplers section of the setup file. A dataview must then be added (in the Advanced tab) with the same name as the view you wish to add to.
You can specify the new headlines, columns and rows using stringList variables. For more information, see environments > environment > var > stringList in User Variables and Environments.
Pressing the additions button then opens a dialog that can be used to add the additional information.
Creating new views
To create entirely new views, the following steps should be taken:
- Create a new sampler
- Do not set a plugin type
- Add a dataview row (as above)
- Use “Create on gateway” and set the name of the first column.
- Press “additions” and define the rows and columns that you want
- Add the sampler to a managed entity
Populating computed cells with data
Newly created computed cells are populated using rules. This means that any rule operators and functions can be used on these cells.
The target of the rule should be one or more computed cells to populate. The value of data items that are not created using the Compute Engine should not be changed by rules.
The format of the rule code is:
value [value to set]
e.g. to set a cell to always have the value 7, the rule would be
value 7
Paths can also be used to get the value of another cell. The paths themselves are defined separately, and are each given names.
value path "other cell" value
The paths may be absolute, in that they refer to an exact item in the system. They can also be relative, for example “the same row, but column X”. Please refer to the path editor documentation for details of how to set up paths.
To perform a calculation simply specify it as you do with normal rules:
value path "other cell" value / 2
or
value path "other cell" value + path "third cell" value
Many more calculations can be performed, which are described below.
The values returned from calculations will typically be Integer or Double values, but these will normally have meaning (units) associated with them such as MB, MB/s, %, etc. When writing values out, the numbers can be formatted with the format function. The number of decimal places can also be specified.
For example:
value format("%.2f Mb", path "other cell" value)
which may output
3.15 Mb
rather than simply
3.15242
Extended Rule Syntax
The following sections describe some more functions and features that are enabled as part of the compute engine.
Using functions with ranges of items
Some useful functions exist to help summarise data, such as average, total, max, min, count and standard deviation. As well as normal parameters, these operate on wildcard paths which return sets of items rather than a single item.
Note
‘wpath’ (wildcard path) is used instead of ‘path’.
value total(wpath "line throughput" value)
or
value count(wpath "lines up" value)
These functions can take multiple parameters if required:
value average(path "cell a" value, path "cell b" value)
or
value average(wpath "line throughput" value, path "additional" value)
Available functions are listed in the Statistical Functions configuration section, and include maximum, minimum, average, total, count, standard deviation and rate.
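How these functions combine wildcard and single-item arguments can be sketched in Python. A list stands in for the set of cells matched by a wpath, and a scalar stands in for a single path. The helper names mirror the rule functions but are hypothetical, and the sketch does not model how the Gateway treats empty or non-numeric cells.

```python
from statistics import mean


def _flatten(*args):
    # A list represents a wpath (many cells); anything else is a single path.
    for arg in args:
        if isinstance(arg, list):
            yield from arg
        else:
            yield arg


def total(*args):
    return sum(_flatten(*args))


def average(*args):
    return mean(list(_flatten(*args)))


def count(*args):
    return sum(1 for _ in _flatten(*args))


line_throughput = [120, 80, 100]   # hypothetical values of the wpath cells
additional = 300                   # hypothetical value of a single path
```

Mixing both kinds of argument, as in `average(wpath "line throughput" value, path "additional" value)`, corresponds to `average(line_throughput, additional)`. That averages all four values together, giving 150.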
Using historic time-based functions
Using the normal rule syntax, previous values can be compared against the current value. The time-based functions allow a summary to be created from a period of time.
For example, to populate a cell with the highest the CPU has been over the last minute:
value maximum(path "cpu" value for "1 minute")
The “1 minute” refers to a named history period (see History Periods). History periods are defined at the top level of the rules section, at the same level as rule groups.
Note
‘wpath’ cannot be used with historical functions.
All the functions listed under Statistical Functions and Duration-Weighted Statistical Functions can be used in this way.
Local Variables
In addition to Managed Entity variables, temporary local variables can be set and then accessed using the same syntax:
set $(temp) value + 1
if $(temp) > 0 then ...
Local variables override Managed Entity variables. Once a rule sets a variable with the same name as a Managed Entity variable then the Managed Entity variable will no longer be accessible from within that rule. Local variables have global scope but cease to exist at the end of each rule evaluation.
Tips
Conditions
It is also possible to use the conditional functionality of rules when setting values. For example:
if path "other cell" value > 10 then
value path "other cell" value
else
value path "third cell" value
endif
Setting other properties in addition
Although this section has dealt only with values, it is possible to set other properties at the same time in the same rule, just as with normal rules. For example:
value path "other cell" value
severity path "other cell" severity
Items outside ‘if’ blocks are independent of each other (i.e. they each have their own transaction). For the above example, this means that the value will be set even if a previous rule has already set the severity. If you need them to be in the same transaction, enclose them in an if block:
if true then
value path "other cell" value
severity path "other cell" severity
endif
Feedback
Rules that update their own target cell can behave dramatically if written incorrectly. Take a counter, for example:
if path "my cell" severity = critical then
value value + 1
endif
This will increase the value of the current cell whenever “my cell” is red, rather than when “my cell” turns red, so the cell will continually increase in value. A condition that detects the change in severity is needed to fix it, for example:
if path "my cell" severity = critical
and path "my cell" previous severity <> critical then
value value + 1
endif
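The difference between the two rules is the difference between level-triggering and edge-triggering. The Python sketch below simulates both over a hypothetical sequence of severities for “my cell”, assuming for simplicity that the rule is evaluated once per sample; it is an illustration only, not Gateway code.

```python
def count_level(severities):
    """First rule: increments on every evaluation while the severity is critical."""
    counter = 0
    for severity in severities:
        if severity == "critical":
            counter += 1
    return counter


def count_transitions(severities):
    """Second rule: increments only when the severity changes to critical."""
    counter = 0
    previous = None  # no previous severity before the first evaluation
    for severity in severities:
        if severity == "critical" and previous != "critical":
            counter += 1
        previous = severity
    return counter


# Hypothetical history: "my cell" goes red twice, staying red for several samples.
history = ["ok", "critical", "critical", "critical", "ok", "critical", "critical"]
print(count_level(history))        # 5
print(count_transitions(history))  # 2
```

In a real Gateway the first rule is worse than this simulation suggests, because each increment changes the cell and can cause further evaluations.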
Configuration
History Periods
rules > historyPeriod
History periods provide the means to aggregate the values of a target over a given period of time. This collection of values can then be used with historic time-based functions.
History periods are named, so for example if you are collecting values on a daily basis it may make sense to simply call your history period “day”. This gives you the ability to refer to the maximum (or any other aggregate function) value for a day on a given target.
Alternatively, you may be responding to a particular event, such as an extremely slow network, and want to analyze the data from when the incident occurred, in which case you can configure a one-off history period and call it something like “The day the earth stood still”.
There are two types of history periods - “rolling” and “fixed”. The following example illustrates the difference.
Say you are calculating the average of a cell for a period of a day. You can either use a “rolling period” or a “fixed period”.
“Rolling period of 1 day”:
The calculation will always be based on the values of the last 24 hours. For example, at 2 pm today, the average will be calculated using the values collected between 2 pm yesterday and now. The average is never reset as such; at 2.30 pm today, the values collected before 2.30 pm yesterday will no longer be considered.
“Fixed period of 1 day”:
The calculation will be based on the values collected since 12.00 am (midnight) today. At the next midnight the average will reset and the calculation will start again, based on the values collected from that point onwards. (You can optionally change the start time of the period.)
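The two kinds of one-day window can be sketched with Python's datetime module. The function names and the start_hour/start_minute parameters are hypothetical stand-ins for the start settings described later; the sketch ignores daylight-saving changes and is not how the Gateway is implemented.

```python
from datetime import datetime, timedelta


def rolling_day_window(now):
    """Rolling period of 1 day: always the 24 hours ending now."""
    return now - timedelta(days=1), now


def fixed_day_window(now, start_hour=0, start_minute=0):
    """Fixed period of 1 day: from the most recent start time (midnight by default) until now."""
    start = now.replace(hour=start_hour, minute=start_minute, second=0, microsecond=0)
    if start > now:
        # Today's start time has not been reached yet, so the period began yesterday.
        start -= timedelta(days=1)
    return start, now


now = datetime(2024, 5, 14, 14, 0)  # 2 pm
print(rolling_day_window(now)[0])  # 2024-05-13 14:00:00
print(fixed_day_window(now)[0])    # 2024-05-14 00:00:00
```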
Mandatory: No
rules > historyPeriod > calculationPeriod
This is a choice between a rolling period of time e.g. day, month, quarter… and a fixed period such as a particular day starting at a particular time.
See rules > historyPeriod for a detailed explanation.
Mandatory: Yes
rules > historyPeriod > calculationPeriod > rollingPeriod
A rolling period allows you to define intervals such as days, months, the past 3 months, etc. For example, if you had a history period called quarter, you could refer to the average value for a target over the last quarter, two quarters, etc.
See rules > historyPeriod for a detailed explanation.
rules > historyPeriod > calculationPeriod > rollingPeriod > measure
This is the unit of time by which this period is measured. A choice of month, week, day, hour, minute or second.
Mandatory: Yes
Default: Month
rules > historyPeriod > calculationPeriod > rollingPeriod > length
This is the number of measure units that make up an individual period.
Mandatory: Yes
Default: 1 unit
rules > historyPeriod > calculationPeriod > rollingPeriod > maxValues
This is the maximum number of values to record for the purpose of a given historical calculation.
In the case of historical calculations using rolling periods, the gateway must record the historical values used as input. This is needed so that outdated values can be dropped out of the calculation when needed.
For example, say we are calculating the average over an hour for a rule target that is sampled every 3 seconds. If we were to store each input value, we would have to store 1200 values (3600 seconds / 3 seconds). Every 3 seconds, we would have to drop the oldest recorded input value and recalculate the average by going through all 1200 stored values. If we run this rule for a large number of targets, the resource cost could be significant.
Hence, instead of recording each and every input value in the period, the gateway breaks down the period into a set of sub-periods and records just a single representative value for each of those periods. The maxValues setting denotes the maximum such values to record (i.e. the maximum number of sub-periods).
Returning to the previous example, if we set maxValues to 120, the gateway will store only a single input value for each 30-second period (3600 seconds / 120). More importantly, the gateway will only need to recalculate the average every 30 seconds, and will only need to go through 120 values for each recalculation.
This setting should be set according to the accuracy required. For example, for averages, the default setting of 100 generally results in computed values that are accurate to the first 2 digits.
This setting is ignored if the number of samples collected during the period is less than the specified value. For example, for a 5-minute period with a 10-second sample interval, only 30 values will ever be stored. Typically, there is no need to modify this setting for shorter periods.
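The sub-period idea can be sketched in Python as a rolling average that keeps one bucket per sub-period. The class name is hypothetical, and so is the choice of representative value: the text above does not say how the Gateway picks it, so this sketch assumes the mean of the samples in each sub-period. Unlike the Gateway, it recalculates on demand rather than once per sub-period.

```python
from collections import deque


class BucketedRollingAverage:
    """Rolling average over `period` seconds that keeps at most `max_values` sub-period values."""

    def __init__(self, period, max_values=100):
        self.max_values = max_values
        self.sub_period = period / max_values  # e.g. 3600 s / 120 = 30 s
        self.buckets = deque()  # (sub-period index, running sum, sample count), oldest first

    def add(self, timestamp, value):
        index = int(timestamp // self.sub_period)
        if self.buckets and self.buckets[-1][0] == index:
            # Same sub-period as the previous sample: fold the value into its bucket.
            _, total, count = self.buckets[-1]
            self.buckets[-1] = (index, total + value, count + 1)
        else:
            self.buckets.append((index, value, 1))
        # Drop whole sub-periods that have rolled out of the period.
        oldest_allowed = index - self.max_values + 1
        while self.buckets[0][0] < oldest_allowed:
            self.buckets.popleft()

    def average(self):
        # Assumption: each sub-period is represented by the mean of its samples.
        representatives = [total / count for _, total, count in self.buckets]
        return sum(representatives) / len(representatives)


avg = BucketedRollingAverage(period=3600, max_values=120)
for t in range(0, 7200, 3):  # two hours of 3-second samples
    avg.add(t, 5.0)
print(len(avg.buckets))  # 120, rather than 1200 individual samples
print(avg.average())     # 5.0
```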
Mandatory: No
Default: 100
rules > historyPeriod > calculationPeriod > fixedPeriod
A fixed period defines a fixed period in time from which you want to reference values.
See rules > historyPeriod for a detailed explanation.
rules > historyPeriod > calculationPeriod > fixedPeriod > length
This is the unit of time by which this period is measured. A choice of month, week, day, hour or minute is available. The following describes how the average of a cell, calculated over a fixed period, behaves for each choice:
- Minute option: Average will reset at the boundary of each minute and calculation of Average will start again based on the values collected from that point onwards.
- Hour option: Average will reset at the boundary of each hour and calculation of Average will start again based on the values collected from that point onwards.
- Day option: Average will reset each midnight and it will start again based on the values collected from that point onwards.
- Week option: Average will reset each Sunday midnight and it will start again based on the values collected from that point onwards.
- Month option: Average will reset at midnight at the end of each month and it will start again based on the values collected from that point onwards.
See rules > historyPeriod for a detailed explanation.
Mandatory: Yes
rules > historyPeriod > calculationPeriod > fixedPeriod > start
Specifies when the time period starts.
Mandatory: No
rules > historyPeriod > calculationPeriod > fixedPeriod > start > month
A non-negative integer value representing the month in which the history period starts.
Mandatory: No
Default: January
rules > historyPeriod > calculationPeriod > fixedPeriod > start > dayOfMonth
The day of the month on which the fixed period starts, represented as a non-negative integer.
Mandatory: No
Default: 1st day of month
rules > historyPeriod > calculationPeriod > fixedPeriod > start > dayOfWeek
Deprecated: Use rules > historyPeriod > calculationPeriod > fixedPeriod > start > weekDay instead.
The day of the week on which the fixed period starts, represented as a non-negative integer.
Mandatory: No
Default: 1st day of week i.e. Sunday
rules > historyPeriod > calculationPeriod > fixedPeriod > start > weekDay
The day of the week on which the fixed period starts. This should be one of the values:
- Monday
- Tuesday
- Wednesday
- Thursday
- Friday
- Saturday
- Sunday
Mandatory: No
Default: Sunday
rules > historyPeriod > calculationPeriod > fixedPeriod > start > hour
This allows you to further specify to the hour when the fixed period started.
Mandatory: No
Default: 00 hours (00-23 being the range)
rules > historyPeriod > calculationPeriod > fixedPeriod > start > minute
This allows you to further specify to the minute when the fixed period started.
Mandatory: No
Default: 00 mins (00-59 being the range)
rules > historyPeriod > calculationPeriod > fixedPeriod > start > second
This allows you to further specify to the second when the fixed period started.
Mandatory: No
Default: 00 secs (00-59 being the range)
rules > historyPeriod > activeTime
This is a reference to an active time to determine when this historical period is in use.
Mandatory: No
Obcerv Time Series for Dynamic Thresholds
Obcerv Time Series builds on the ideas behind Database-driven Time Series and Gateway Hub-driven Time Series. However, unlike these two methods, you do not define a dataset; instead, you define a Query Template. This template is specified in the Rules section of the GSE. For more information, see Obcerv Time Series for Dynamic Thresholds.
Creating a new Obcerv Time Series template
rules > obcervTimeSeriesTemplate > name
Specifies the name of the Obcerv Time Series.
Mandatory: Yes
rules > obcervTimeSeriesTemplate > periodicity
The period of the cycle that defines seasonality. This can be set to either one day or one week.
Mandatory: Yes
rules > obcervTimeSeriesTemplate > periods
Number of periods to query.
If the periodicity has been set to one week, and periods is set to six, then the time series will be created using up to six weeks of historic data. If no data is available for a specific bucket, then the data returned from Obcerv for that bucket will be empty. Otherwise, Obcerv will build values with the available data.
Mandatory: Yes
rules > obcervTimeSeriesTemplate > bucketSize
This setting enables the creation of time series buckets of a chosen size, that is, how long an interval each value in the time series represents. It can take any of the following values:
- 5 mins
- 10 mins
- 15 mins
- 30 mins
- 1 hour
- 2 hours
- 3 hours
- 4 hours
- 6 hours
- 12 hours
Mandatory: Yes
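Together, periodicity, periods and bucketSize determine how many values make up one seasonal cycle and how far back the query reaches. The arithmetic is sketched below in Python; the helper names and the dictionary of periodicity labels are hypothetical, and the sketch assumes one value per bucketSize interval within each cycle.

```python
from datetime import timedelta

# Hypothetical labels for the two allowed periodicity settings.
PERIODICITY = {"one day": timedelta(days=1), "one week": timedelta(weeks=1)}


def buckets_per_cycle(periodicity, bucket_size):
    """Number of time series values (buckets) in one seasonal cycle."""
    return PERIODICITY[periodicity] // bucket_size


def lookback(periodicity, periods):
    """Maximum span of historic data queried: periods x periodicity."""
    return PERIODICITY[periodicity] * periods


print(buckets_per_cycle("one day", timedelta(minutes=15)))  # 96
print(buckets_per_cycle("one week", timedelta(hours=1)))    # 168
print(lookback("one week", 6))                              # 42 days, 0:00:00
```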
rules > obcervTimeSeriesTemplate > reloadTime
Designated time for the Gateway to reload data from Obcerv. The time is interpreted in the Gateway's timezone. Reloading is done once every day.
Mandatory: Yes
rules > obcervTimeSeriesTemplate > timezone
Timezone of the data. This can either be Gateway, Netprobe, or a specific timezone.
Data will be queried from the previous midnight in the specified timezone, and the day and week periods will be associated with the end time.
Mandatory: No
Default: Gateway Timezone
Creating a new rule that uses the Obcerv Time Series Template
Once a template has been defined, a rule can be created that uses a path alias to reference the time series. When the rule is applied to a cell, the Gateway will construct a query and generate a new time series for the data referenced in the path alias.
A time series path alias consists of an xpath that appends the time series information to an xpath referencing a single cell. For example, the following xpath:
./timeSeries[(@obcervTimeSeriesTemplate="Daily")][(@aggregation="avg")]
will cause the Gateway to create an Obcerv time series and query the data from Obcerv. The query will include all the information in the obcervTimeSeriesTemplate together with the source data to use, which in this case is the current cell the rule runs on. Other xpaths can be used:
- ../cell[(@column="latency")]/timeSeries[(@obcervTimeSeriesTemplate="Daily")][(@aggregation="max")]
  This gets the Obcerv Time Series that returns the maximum value of the data for the column called latency in the same row of the dataview.
- /geneos/gateway/directory/probe[(@name="P")]/managedEntity[(@name="me")]/sampler[(@name="s")]/dataview[(@name="dv")]/rows/row[(@name="r")]/cell[(@column="c")]/timeSeries[(@obcervTimeSeriesTemplate="Weekly")][(@aggregation="count")]
  This gets the Obcerv Time Series that returns the number of metric values that are in a specific bucket for a specific cell.
To see the value, we can use Compute Engine to set the value of a cell. For example, if we create a new column on the dataview next to the cell we are interested in (here, in column Col1), we can set the xpath to:
../cell[(@column="Col1")]/timeSeries[(@obcervTimeSeriesTemplate="Daily")][(@aggregation="avg")]
When cells matching rules that refer to Obcerv time series arrive, the Gateway initiates queries to Obcerv and keeps the resulting time series data up to date.
Visualising the time series data on a chart
It is also possible to plot Obcerv time series data in a chart, in a similar way to a database time series.
When you right-click on a cell that is the source of time series data queried by the Gateway, an Obcerv Time Series item is added to the context menu. This option displays all the time series queried by the Gateway for that specific cell. You can then choose a destination, which can be either a new or an existing chart, after which a dialog box is displayed.
This dialog allows you to set the start time (Time from) and the end time (Time to) of the chart, together with the choice of query result (Graph Type).
Note that it is permissible, and often correct, for the end time (Time to) to precede the start time (Time from). In this case, the graph runs from the start time to the end of the day, and then on to the end time.
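The wrap-around rule for an end time that precedes the start time can be expressed as a small Python check. The function name is hypothetical and only the time of day is considered; it illustrates the rule above rather than Active Console's implementation.

```python
from datetime import time


def in_chart_window(t, time_from, time_to):
    """True if time of day `t` falls in the chart window.

    When time_to precedes time_from, the window runs from time_from to the
    end of the day and then continues from midnight until time_to.
    """
    if time_from <= time_to:
        return time_from <= t <= time_to
    return t >= time_from or t <= time_to


# Hypothetical overnight window: 22:00 to 06:00.
print(in_chart_window(time(23, 30), time(22, 0), time(6, 0)))  # True
print(in_chart_window(time(12, 0), time(22, 0), time(6, 0)))   # False
```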
The data is then queried and placed in an Active Chart. The chart will re-query its data if any of the following occur:
- Gateway connection drops and reconnects
- Template parameters change
- Time moves from before the start time to after the start time
The workspace saves enough data that, if Active Console restarts and reconnects to the Gateway, it re-queries the Obcerv data and resets the chart.
Functions
This section details the functions specific to compute engine. The standard function definitions are also available.
String Functions
format
format(string, anything, ...)
The string specifies a format and then, depending on the format, more arguments may be present.
The format is composed of zero or more directives: ordinary characters (excluding %) that are copied directly to the result; and conversion specifications, each of which requires an additional argument to be passed to the format function.
Each conversion specification consists of a percent sign (%), followed by the following in order:
- An optional precision specifier that says how many decimal digits should be displayed for floating point numbers. This consists of a period (.) followed by the number of digits to display.
- A type specifier that says what type the argument data should be treated as. Possible types:
  - d: the argument is treated as an Integer, and presented as a (signed) decimal number.
  - f: the argument is treated as a Double, and presented as a floating-point number.
  - s: the argument is treated and presented as a String.
  - %: a % followed by another % character will write % to the output. No argument is required for this.
This function is most useful when writing values into cells.
format("%d Mb", 5) => "5 Mb"
format("%d %%", 6) => "6 %"
format("%f Mb", 5.346) => "5.346 Mb"
format("%.2f Mb", 5.348) => "5.35 Mb"
format("%.5f Mb", 5.348) => "5.34800 Mb"
format("There are %d files with %d in error", 6, 4) => "There are 6 files with 4 in error"
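The conversion rules above can be illustrated with a small Python sketch. It implements only the documented subset (%d, %f, %.Nf, %s and %%) and is not the Gateway's implementation; geneos_format is a hypothetical name, and rendering %f without a precision in its shortest form (so that 5.346 prints as 5.346) is an assumption based on the examples above.

```python
import re

# %, an optional ".N" precision, then one of the documented type letters.
_SPEC = re.compile(r"%(?:\.(\d+))?([dfs%])")


def geneos_format(fmt, *args):
    """Sketch of the documented format() subset: %d, %f, %.Nf, %s and %%."""
    remaining = iter(args)

    def convert(match):
        precision, kind = match.group(1), match.group(2)
        if kind == "%":
            return "%"  # literal percent sign; consumes no argument
        try:
            arg = next(remaining)
        except StopIteration:
            raise ValueError("not enough arguments for format string") from None
        if kind == "d":
            return str(int(arg))
        if kind == "f":
            if precision is None:
                return str(float(arg))  # assumption: shortest form, e.g. 5.346
            return f"{float(arg):.{int(precision)}f}"
        return str(arg)  # kind == "s"

    return _SPEC.sub(convert, fmt)


print(geneos_format("%.2f Mb", 5.348))  # 5.35 Mb
```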
Statistical Functions
Note
All statistical functions, including duration-weighted statistical functions, ignore empty strings and treat their value as null.
maximum
maximum(number, ...)
Gives the highest value from a set of values. Can be used with ranges or history periods.
maximum(1, 8, 2, -10, 6) => 8
maximum(3) => 3
minimum
minimum(number, ...)
Gives the lowest value from a set of values. Can be used with ranges or history periods.
minimum(1, -8, 2, 10, -6) => -8
minimum(3) => 3
average
average(number, ...)
Gives the average (mean) of a set of values. Can be used with ranges or history periods.
average(1, 8, 2, -10, 6) => 1.4
average(3) => 3
total
total(number, ...)
Gives the total (sum) of a set of values. Can be used with ranges or history periods.
total(1, 3, 2, 2, 4) => 12
total(3) => 3
count
count(anything, ...)
Gives the number of items in the list. Can be used with ranges or history periods.
If you want to count a range of cells using a wpath, and you want to include blank cells, you can count the severity property instead of the value to avoid empty strings, for example: count(wpath "alias" severity)
standardDeviation
standardDeviation(anything, ...)
Calculates the population standard deviation from a set of values. Can be used with ranges or history periods.
See evaluateOnDataviewSample for how to improve the performance of rules with statistical calculations.
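The statistical functions above can be mirrored in Python to check the examples. In this sketch a list argument stands in for the set of items returned by a wpath, so single values and ranges can be mixed as in average(wpath "line throughput" value, path "additional" value); the snake_case names are hypothetical equivalents, not Gateway functions, and non-numeric handling is left out.

```python
import math


def _flatten(args):
    """Expand list arguments (standing in for wpath results) into individual values."""
    values = []
    for arg in args:
        if isinstance(arg, list):
            values.extend(arg)
        else:
            values.append(arg)
    return values


def maximum(*args):
    return max(_flatten(args))


def minimum(*args):
    return min(_flatten(args))


def total(*args):
    return sum(_flatten(args))


def average(*args):
    values = _flatten(args)
    return sum(values) / len(values)


def count(*args):
    return len(_flatten(args))


def standard_deviation(*args):
    """Population standard deviation: divide by N, not N - 1."""
    values = _flatten(args)
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


print(average(1, 8, 2, -10, 6))  # 1.4
print(average([10, 20], 30))     # 20.0: a "wpath" range plus a single value
```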
Duration-Weighted Statistical Functions
The following statistical functions perform calculations while weighting each historical value by the duration for which the particular value was present. Only historical items can be used as parameters to these functions. E.g.:
The following is valid:
value durationWeightedAverage(path "cpu" value for "1 minute")
The following is not:
value durationWeightedAverage(wpath "cpu" value)
One point to note about weighting by duration is that, at the point a new value is seen by the gateway, that value has an associated duration of 0 and is therefore effectively not considered in the calculation.
For example, consider the case of a rule calculating a duration-weighted average of a cell X. As with all rules containing historical functions, the rule will be re-evaluated each time the dataview containing cell X samples. However, when this happens and a new value is seen for cell X, the associated duration for that new value will be 0 seconds as cell X only changed to that new value at this very point in time. Therefore, the new value will not be reflected in the calculated result.
At a later point in time, when the rule is evaluated again (e.g. at the next sample) the previously-mentioned value will now have an associated duration and hence will be reflected in the calculated result.
This effectively means that the results of duration-weighted calculations appear to be “one sample behind”. This is the expected behaviour, given the way such rule calculations are invoked.
durationWeightedAverage
durationWeightedAverage(historical item)
Calculates the average, but with each source value weighted according to the duration for which the value was present.
For example, given the following values over 1 minute:
Value | Duration |
---|---|
100 | 10 seconds |
200 | 10 seconds |
300 | 40 seconds |
The duration-weighted 1 minute average for these values would be:
(100*10 + 200*10 + 300*40) / (10 + 10 + 40) = 15000 / 60 = 250
durationWeightedTotal
durationWeightedTotal(historical item)
Calculates the total, but with each source value weighted according to the duration for which the value was present.
For example, given the following values over 1 minute:
Value | Duration |
---|---|
100 | 10 seconds |
200 | 10 seconds |
300 | 40 seconds |
The duration-weighted 1 minute total for these values would be:
(100*10 + 200*10 + 300*40) = 15000
durationWeightedStandardDeviation
durationWeightedStandardDeviation(historical item)
Calculates the population standard deviation, but with each source value weighted according to the duration for which the value was present.
For example, given the following values over 1 minute:
Value | Duration |
---|---|
100 | 10 seconds |
200 | 10 seconds |
300 | 40 seconds |
The duration-weighted 1 minute standard deviation for these values would be calculated as follows:
Total duration = 10 + 10 + 40 = 60
Average = (100*10 + 200*10 + 300*40) / 60 = 250
Standard deviation = square-root (((100 - 250)^2 * 10 + (200 - 250)^2 * 10 + (300 - 250)^2 * 40) / 60) ≈ 76.38
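The three duration-weighted calculations, and the “one sample behind” effect described earlier, can be reproduced in Python. The helper names are hypothetical; durations_from_samples assumes each value lasts until the next sample, so a value that arrives at the moment of evaluation gets a duration of 0 and does not affect the result.

```python
import math


def durations_from_samples(samples, now):
    """Turn (timestamp, value) samples into (value, duration) pairs.

    Each value lasts until the next sample; the newest value lasts until `now`,
    which is 0 seconds if it arrived at this very moment.
    """
    next_times = [t for t, _ in samples[1:]] + [now]
    return [(value, t_next - t) for (t, value), t_next in zip(samples, next_times)]


def duration_weighted_total(pairs):
    return sum(value * duration for value, duration in pairs)


def duration_weighted_average(pairs):
    return duration_weighted_total(pairs) / sum(duration for _, duration in pairs)


def duration_weighted_standard_deviation(pairs):
    total_duration = sum(duration for _, duration in pairs)
    mean = duration_weighted_average(pairs)
    variance = sum((value - mean) ** 2 * duration for value, duration in pairs) / total_duration
    return math.sqrt(variance)


pairs = [(100, 10), (200, 10), (300, 40)]  # the values from the table above
print(duration_weighted_average(pairs))                       # 250.0
print(duration_weighted_total(pairs))                         # 15000
print(round(duration_weighted_standard_deviation(pairs), 2))  # 76.38

# A new value of 400 arriving at t=60 has zero duration, so the result is unchanged.
samples = [(0, 100), (10, 200), (20, 300), (60, 400)]
print(duration_weighted_average(durations_from_samples(samples, now=60)))  # 250.0
```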
Handling of non-numeric values
For statistical functions mentioned above, strings are treated as a valid number if they begin with a numerical value. All other strings, except for empty strings, are treated as 0 valued.
For example:
- “10kb” is treated as 10.
- “number10” is treated as 0.
- Blank values " " (including white-space only values of any length) are ignored.
For example:
minimum(20, "10boxes", " ") evaluates to 10
average(20, "10boxes", " ") evaluates to (20 + 10) / 2
count("one", "two", "", " ") evaluates to 2
The above logic applies to historical calculations as well. A blank value will be considered as a lack of a value for that particular duration.
For example, given the following values, where the second value is blank:
Value | Duration |
---|---|
100 | 10 seconds |
(blank) | 15 seconds |
300a | 30 seconds |
Average = (100 + 300) / 2 = 200
Duration weighted average = (100*10 + 300*30) / 40 = 250
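The coercion rules can be sketched in Python as follows. The regular expression encodes an assumption about what “begins with a numerical value” means (an optional sign, digits and an optional fraction); the function names are hypothetical.

```python
import re

# Assumption: a leading number is an optional sign, digits and an optional fraction.
_LEADING_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def to_number(value):
    """Return the numeric value of `value`, or None if it is blank and should be ignored."""
    if isinstance(value, (int, float)):
        return value
    if value.strip() == "":
        return None  # empty or white-space only: ignored
    match = _LEADING_NUMBER.match(value)
    return float(match.group()) if match else 0  # "10kb" -> 10.0, "number10" -> 0


def numeric_values(*values):
    """Coerce every value and drop the blank ones."""
    return [n for n in map(to_number, values) if n is not None]


def duration_weighted_average(pairs):
    """pairs: (value, duration in seconds); blank values contribute neither value nor duration."""
    kept = [(to_number(v), d) for v, d in pairs if to_number(v) is not None]
    return sum(v * d for v, d in kept) / sum(d for _, d in kept)


print(min(numeric_values(20, "10boxes", " ")))                          # 10.0
print(len(numeric_values("one", "two", "", " ")))                       # 2
print(duration_weighted_average([(100, 10), (" ", 15), ("300a", 30)]))  # 250.0
```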
Other Functions
rate
rate(item)
Calculates the rate of change of an individual item. It can be applied to only one data-item. E.g.
rate(path "s" value)
The rate of value change is calculated at a particular point in time as: (new value - previous value) / sampling interval (in secs)
The following table gives an example of rate change for cell values:
Time (in secs) | 0 | 2 | 4 | 6 | 8 | 10 |
---|---|---|---|---|---|---|
Cell Value | 1 | 2 | 8 | 4 | 3 | 5 |
Rate | 0 | 0.5 | 3.0 | -2.0 | -0.5 | 1.0 |
Note
When the new value is lower than the previous value, the rate of change is negative.
This function cannot be used with history periods or ranges as it only deals with current value change. The function recalculates the rate of value change whenever a new sample is seen, e.g. at sample intervals, sample times or during manual sampling.
Normally, rules are evaluated for computed cells whenever any secondary variable changes or the target data-items change. The rate function is an exception: for computed cells, the rate of value change for a target data-item is calculated only at sample intervals. For this to happen, you need to set the evaluateOnDataviewSample flag on that rule; otherwise, the gateway issues a warning. For more details, refer to evaluateOnDataviewSample.
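The rate values in the table can be reproduced with a short Python sketch of the formula (new value - previous value) / sampling interval. The function name is hypothetical, the interval is taken as the time between consecutive samples, and the first sample, which has no previous value, is reported as 0 as in the table.

```python
def rates(samples):
    """rate() sketch: (new value - previous value) / sampling interval, per sample."""
    result = [0.0]  # the first sample has no previous value
    for (t_prev, v_prev), (t, v) in zip(samples, samples[1:]):
        result.append((v - v_prev) / (t - t_prev))
    return result


samples = [(0, 1), (2, 2), (4, 8), (6, 4), (8, 3), (10, 5)]  # (seconds, cell value)
print(rates(samples))  # [0.0, 0.5, 3.0, -2.0, -0.5, 1.0]
```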
first
first(path/wpath/literal, ...)
Returns the first concrete item encountered when evaluating the function parameters in the order specified.
Note
For a given wpath matching multiple items it’s not possible for the user to predict which item will be considered as ‘first’ by the gateway. For example, for a wpath matching all the cells in a particular column, the item returned from this function may not necessarily be the one from the first row as seen by the user, nor from the row that was created first.
Hence, for a deterministic outcome, this function should only be used with paths that match at most one item at run time.
One use case would be to access the first matching item from a set of unique paths in a user-specified order. E.g.:
first(path "x" value, path "y" value, path "z" value)
This expression returns the value of “x” if it exists, otherwise the value of “y” if it exists, and so on, returning empty if none of them exist.
Note
The example uses the prefix “path” rather than “wpath”, since each path is expected to match at most one item.
The above rule can be further enhanced to return a pre-determined value if none of the cells exists:
first(path "x" value, path "y" value, path "z" value, 100)
Another use case for this function is to bypass a path uniqueness warning as explained below:
For example, take a rule that applies to table cells.
Rule contents:
value path "s" value
Path alias “s” defined as:
../../rows/row[wild(@name,"row1_*")]/cell[(@column="c")]
Given this rule the gateway will produce a validation warning saying path “s” is not uniquely specified as it can potentially match multiple cells (as row1_* can theoretically match multiple rows). However, if it’s known by the user that at runtime there will only ever be at most a single row that matches the pattern row1_*, then the rule can be changed as follows:
value first(wpath "s" value)
This rule will not produce the same warning as by saying wpath we are explicitly declaring the path can match multiple items and then asking the gateway to take the first match.
Note
You cannot use a historical path (i.e. path “p” value for “period”) as a parameter to this function.
The following table summarises the forms this function can take:
Function | Return value |
---|---|
first(wpath "p" value) | Returns the value of the first cell that matches path "p". If no matching cell, returns empty. |
first(wpath "p" value, wpath "q" value, …) | Returns the value of the first cell that matches path "p". If no matching cell, then checks path "q" and so on. |
first(wpath "p" value, 10) | Returns the value of the first cell that matches path "p". If no matching cell, then returns 10. |
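The evaluation order in the table can be sketched in Python. Here a list stands in for the items matched by a wpath (in an unspecified order, as the note above warns), a MISSING marker stands in for a path that matched nothing, any other value is a literal, and the empty result is modelled as an empty string. All of these are assumptions for illustration only.

```python
MISSING = object()  # hypothetical marker for a path that matched no item


def first(*params):
    """Return the first parameter that resolves to a concrete item, or empty."""
    for param in params:
        if isinstance(param, list):  # a wpath: any matching item may come first
            if param:
                return param[0]
        elif param is not MISSING:   # a literal, or a path that matched an item
            return param
    return ""  # nothing matched


print(first(MISSING, 7, 9))                  # 7: "x" missing, so "y" is used
print(first(MISSING, MISSING, MISSING, 100)) # 100: the pre-determined fallback
```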
Persistence
The compute engine component of the gateway adds value to monitoring data, by allowing the publishing of additional data-items which contain derived (computed) values. Some of these values may be computed from historical data, such as an average value over a period of time.
The persistence feature of compute engine ensures that if the gateway is restarted for some reason, then these values are restored and that the data required to perform a computation is still accessible.
Persistence Configuration
persistence
This top level section contains the configuration necessary for data persistence. If this section is not enabled, then no persistence data will be saved between gateway restarts.
persistence > writePeriod
This is the interval, in seconds, at which the persistence store is updated. Each update writes only the difference between what was stored before and what has changed.
Mandatory: No
Default: 10.
persistence > rewritePeriod
This is the interval, in seconds, at which the persistence store is completely rewritten. A rewrite of a persistence store that has grown over time will most likely result in a smaller file.
Mandatory: No
Default: 60.
Actions
Introduction
Actions provide further processing to be performed on gateway events, as controlled by a user‑defined configuration. At present, the events which can trigger an action are limited to rules, but this may be extended at a later date.
Actions allow the gateway to interface with other external systems, so that monitoring data can trigger other events in addition to being displayed in Active Console. For instance, using actions the gateway can send emails or pager messages to inform users of events, or add a failure to a user-assignment system to be investigated.
Actions can also be used to automatically resolve problems. For example, an action could be configured to restart a process when it is detected the process is not running (e.g. by Geneos process monitoring), and if this fails then the action can notify a user.
Operation
Actions are fired in response to gateway events, according to the configuration. When an action is fired it is run in the context of the specific data-item which caused the event, such as a Managed Variable that triggered a rule.
The value and other attributes of this item are then made available to the action, which allows for an action to have a customised operation depending upon these values. Depending upon the type of action being fired, the values will be passed to the action in different ways. Please refer to the appropriate action configuration section for further details.
Values passed to actions include the following:
- Data identifying the data-item and action being fired.
- If the data-item is from a dataview table, then additional values from the dataview row.
- Any managed entity attributes which have been configured.
- Additional user data as configured in an environment.
- A list of knowledge-base articles which apply to the data-item.
Action Configuration
Basic Configuration
Actions are configured within the Actions top-level section of the gateway setup. Configuration consists of a list of action definitions, each of which specifies what will be done when the action is fired. As actions are referenced by name in other parts of the gateway setup, each action must have a unique name among all other actions to prevent ambiguity.
Script actions
A script action can run a shell-script or executable file. The minimum required configuration for this type of action is the executable to run, the command-line (which may be empty, but must still be specified) and the location to run this action.
Depending upon the configured runLocation, this action will run either on the Gateway or Netprobe hosts. Netprobe actions will run on the Netprobe containing the data-item that triggered the action, unless another Netprobe has been explicitly specified with the probe setting.
An action run on Netprobe requires that the probe's encoded password is specified in the probe configuration. If it is not specified, the Netprobe will return the error: “Remote Command not executed - invalid password”. If there is no password to configure, run the Netprobe with the -nopassword flag to avoid this problem.
For an action which executes on the Gateway, the value of the exeFile setting is checked to ensure that the executable is accessible by the Gateway. If this is not the case, the Gateway will be unable to execute the action and a setup validation error is produced. If an absolute path to the executable is not specified, the Gateway prepends ./ to the path.
Note
This validation cannot be performed by actions which run on Netprobe.
The behaviour for Actions changed in Geneos v4.7 to provide consistent behaviour between actions run on the Gateway and on the Netprobe. The default shell is now used to run script actions on the Gateway. As an alternative to running a script, an action can execute a user-defined command that is set to run an external program. In this scenario, the program does not use the shell.
Note
This behaviour change also applies to Effects. Some factors to consider when deciding whether to use the shell to run a script (or other external program) are listed in Script effects.
When executing a script action, the script / executable being run is passed the values and attributes of the data-item which triggered the action. These are passed in environment variables, which the script can then read and respond as required. The environment variables which are passed are listed below. If there is a name clash, an item that is further down this list will take precedence over an item further up. For example, a userdata setting takes precedence over an entity attribute with the same name and the name of the rule takes precedence over the value in a column named RULE.
See Appendix for an example action script file. An example configuration using the setup editor is shown below.
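As a complement to the appendix example, the Python sketch below shows how a script action might read some of the environment variables listed in the following table. The script itself, the describe_event name and the fallback strings are hypothetical; how the script is registered as the action's executable is not shown and would follow the configuration described above.

```python
#!/usr/bin/env python3
import os


def describe_event(env=os.environ):
    """Build a one-line summary from the variables passed to a script action."""
    path = env.get("_VARIABLEPATH", "<unknown item>")
    severity = env.get("_SEVERITY", "UNDEFINED")
    rule = env.get("_RULE", "<no rule>")
    return f"[{severity}] {path} (rule: {rule}, action: {env.get('_ACTION', '')})"


if __name__ == "__main__":
    print(describe_event())
```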
Environment Variable | Description |
---|---|
_ACTION | The name of the action being triggered. |
_GATEWAY | The name of the gateway firing the action. |
_VARIABLEPATH | The full gateway path to the data-item. |
_NETPROBE_HOST | The hostname of the probe the data-item belongs to (if any). |
_PROBE | The name of the probe the data-item belongs to (if any). |
_MANAGED_ENTITY | The name of the managed entity the data-item belongs to (if any). |
_SAMPLER | The name of the sampler the data-item belongs to (if any). |
_DATAVIEW | The name of the dataview the data-item belongs to (if any). |
_VARIABLE | Short name of the data-item if it is a managed variable, in the form <!>name for headlines or row.col for table cells. |
<attribute name> | The values of any managed entity attributes which have been specified. Environment variables are named with the managed entity attribute names, and the values contain the attribute values. |
_PLUGINNAME | The plugin name of the sampler the data-item belongs to (if any). |
_SAMPLER_TYPE | The type of the sampler the data-item belongs to (if any). |
_SAMPLER_GROUP | The group of the sampler the data-item belongs to (if any). |
_<column name> | The values of cells in the same row as the data-item, if it is a managed variable. Environment variables are named with the column name (prefixed with an underscore), and the values are the values of the cell in that column. |
_ROWNAME | The row name of the dataview cell the data-item belongs to (if any). |
_COLUMN | The column name of the dataview cell the data-item belongs to (if any). |
_HEADLINE | The headline name of the dataview cell the data-item belongs to (if any). |
_FIRSTCOLUMN | The name of the first column of the dataview the data-item belongs to (if any). |
_RULE | The rule that triggered this action. This is the full path to the rule including the rule groups, for example group1 > group2 > rulename. |
_KBA_URLS | A list of application knowledge base article URLs, separated by newlines. |
_SEVERITY | The data-item severity. One of UNDEFINED, OK, WARNING, CRITICAL or USER. |
_VALUE | The value of the dataview cell the data-item belongs to (if any). |
_REPEATCOUNT | The number of times this action has been repeated for the triggering data-item. |
rmTransactionId | The unique ID of the rule transaction with the current setup. |
<user data> | Additional user data as configured in the rule which triggered the action. Environment variables are named with the configured name, and contain the user-specified value. |
_HOSTNAME | Alias for _MANAGED_ENTITY, provided for backwards compatibility. |
_REALHOSTID | Alias for _NETPROBE_HOST, provided for backwards compatibility. |
User assignment script actions
In the authentication section of the setup you can define actions for user assignment and unassignment of items. These actions have the following additional variables:
Environment Variable | Description |
---|---|
_ASSIGNEE_USERNAME | Name of the Geneos user assigned this item. The name is taken from the user definition in the authentication section. |
_ASSIGNER_USERNAME | Name of the Geneos user assigning this item to another. The name is taken from the user definition in the authentication section. |
_ASSIGNEE_EMAIL | Email address of the Geneos user assigned this item. The address is taken from the user definition in the authentication section. |
_COMMENT | Comment entered by the assigner or the user who unassigned the item. |
_PREVIOUS_COMMENT | Contents of the _COMMENT environment variable from the previous assign/unassign event. |
_PERIOD_TYPE | Period for which the item is assigned: |
Shared library actions
Shared library actions execute functions from within a shared library. Library actions are more versatile than script actions since they can store state between different executions of an action; however, they also require more effort from the user to create.
Library actions currently only execute on the Gateway, and require a minimum of the library file and function name to be configured. As with script actions, these settings are checked by the Gateway during setup validation to ensure the function can be found, so that an invalid configuration is detected immediately rather than when the action is run.
Shared library functions must have the following prototype (similar to the main function of a basic C program).
extern "C" int functionName(int argc, char** argv);
When a library action is executed, the values and attributes of the data-item which triggered the action are passed to it. These are passed in the argv parameter as an array of strings of the form NAME=VALUE. The number of values passed is given in the argc parameter. These variables are named as for script actions above.
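A minimal C++ sketch of a library action, assuming the documented prototype. The function name `exampleAction`, the helper `parseActionArgs` and the return-value convention (0 for success) are assumptions made for illustration. The counter shows the kind of state a library action can keep between executions. It is atomic because runThreaded defaults to true, so calls may overlap.

```cpp
#include <atomic>
#include <cstring>
#include <map>
#include <string>

namespace {
// State that survives between executions of the action. Atomic because
// runThreaded defaults to true, so invocations may overlap.
std::atomic<long> invocationCount{0};
}  // namespace

// Splits the NAME=VALUE strings passed in argv into a name -> value map.
// Entries without an '=' are ignored; only the first '=' separates name from value.
std::map<std::string, std::string> parseActionArgs(int argc, char** argv) {
    std::map<std::string, std::string> values;
    for (int i = 0; i < argc; ++i) {
        const char* entry = argv[i];
        if (entry == nullptr) continue;
        const char* eq = std::strchr(entry, '=');
        if (eq == nullptr) continue;
        values[std::string(entry, eq)] = std::string(eq + 1);
    }
    return values;
}

// Hypothetical action function matching the documented prototype.
extern "C" int exampleAction(int argc, char** argv) {
    const std::map<std::string, std::string> values = parseActionArgs(argc, argv);
    ++invocationCount;
    const auto severity = values.find("_SEVERITY");
    if (severity != values.end() && severity->second == "CRITICAL") {
        // React to a critical item here, e.g. call an in-house notification API.
    }
    return 0;  // Assumption: 0 signals success.
}
```

On Linux, a file like this would typically be built with `g++ -std=c++17 -shared -fPIC`. The resulting library path and `exampleAction` would then go in the libraryFile and functionName settings.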
See Appendix for an example action shared library function. An example configuration using the setup editor is shown below.
Command actions
Command-type actions can run any command supported by the Gateway. These commands are referenced by name (as commands are uniquely named) and the configuration must supply all arguments expected by the command in order to be valid. The number and type of arguments expected vary according to the command being referenced.
Arguments can be specified with a static value, a text value or a parameter value. A static value is the same every time the action is executed. A text value varies with the values of the Geneos variables it references, which are evaluated in the environment of the command's target data-item. A parameter value lets users select a variable value which is populated from the data-item that triggered the action, in the same way as the environment variables passed to script actions.
An example is shown below using the /SNOOZE:time internal command. This command snoozes a data-item for a specified time period, and takes the arguments specified in the table below.
Of the arguments listed, three are user-input arguments: those at indexes 1, 4 and 5. To execute the command, these arguments must have values specified. For this command, arguments 4 and 5 have defaults specified, and so take these values if they are not overridden.
Index | Description | Argument Type | (Default) Value |
---|---|---|---|
1 | User comment | User input: MultiLineString | |
2 | Item severity | XPath | state/@severity |
3 | Snooze type | Static | time |
4 | Snooze duration | User input: Float | 24 |
5 | Snooze units | User input: Options | Hours (3600 - default), minutes (60), days (86400). |
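The rule that every user-input argument must end up with a value can be sketched in C++. That value is either supplied by the action or taken from the command's default. This is a hypothetical model rather than Gateway code, and `UserInputArg`, `resolveUserInputs` and `snoozeTimeUserInputs` are invented names. Only the user-input arguments of /SNOOZE:time (indexes 1, 4 and 5) are modelled, and the units default is represented by its value 3600 (hours).

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical description of one user-input argument of a command.
struct UserInputArg {
    std::string description;
    std::optional<std::string> defaultValue;  // empty if the command gives no default
};

// Resolves user-input arguments, keyed by 1-based index. A value supplied by the
// action overrides the default; an argument with neither makes the action invalid,
// which corresponds to the check made at setup validation.
std::optional<std::map<int, std::string>> resolveUserInputs(
        const std::map<int, UserInputArg>& userInputs,
        const std::map<int, std::string>& supplied) {
    std::map<int, std::string> resolved;
    for (const auto& [index, arg] : userInputs) {
        if (auto it = supplied.find(index); it != supplied.end()) {
            resolved[index] = it->second;
        } else if (arg.defaultValue) {
            resolved[index] = *arg.defaultValue;
        } else {
            return std::nullopt;
        }
    }
    return resolved;
}

// The user-input arguments of /SNOOZE:time from the table above.
std::map<int, UserInputArg> snoozeTimeUserInputs() {
    return {
        {1, {"User comment", std::nullopt}},
        {4, {"Snooze duration", "24"}},
        {5, {"Snooze units", "3600"}},  // hours
    };
}
```

If only the comment is supplied, the duration and units fall back to 24 and 3600, which is a 24-hour snooze. If the comment is omitted, no value can be found for argument 1 and the configuration is rejected.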
Effect actions
Effect actions call Gateway effects. Currently an effect merely duplicates the basic functionality of the Script, Shared-library and Command action types, but allows the same effect to be shared with Alerts.
Calling an effect has exactly the same result as calling the same type of action; there is no difference between configuring a script action and configuring an action that calls a script effect.
Advanced Features
When an action is triggered, the action remains valid until it is cancelled by the system. While an action is valid, it is eligible for execution, repetition or escalation as configured by the user.
In GSE, go to Actions > Advanced.
In particular, an action triggered by a rule remains valid until the transaction in that rule is no longer active. For example, the following sequence of events is quite common:
- The value of a variable changes to outside of the allowable range specified in a rule.
- The rule is triggered, and has an associated action. This action now becomes valid.
- The action remains valid until the variable value changes back to inside the allowable range, at which point the action is then cancelled by the system.
A triggered action is linked to a particular trigger (for rules, a specific transaction definition) on a particular variable. For example, if the active transaction in a rule changes to a different transaction with the same action, then the current action is cancelled and the new action triggered. Similarly, different data-items which trigger the same transaction of the same rule are considered distinct. The cancellation of one triggered action for a particular variable does not cancel any other actions (except for actions in an escalation chain).
An action that remains valid for a period of time can be configured to repeat or escalate to another action. These are called repeating and escalating actions respectively. An action can be both repeating and escalating if required.
There are a number of situations where actions will be suppressed by default, such as the Gateway starting up (see Advanced Action Settings). In these cases, the initial action will not fire but any repeats or escalations will be scheduled and fire later if the action continues to be valid.
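The validity rules above can be modelled with a map from (data-item, rule) to the currently active transaction and its action. The C++ sketch below is a hypothetical illustration, and `ValidActions` and its event strings are invented. It ignores repeats, escalations, restrictions and start-up suppression.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of which triggered actions are currently valid.
// Each (data-item, rule) pair has at most one active transaction, and the
// triggered action is tied to that pair and transaction.
class ValidActions {
public:
    // Records that `transaction` of `rule` is now active for `item`. An empty
    // `action` means the new transaction has no action. Returns the events
    // produced: a cancel for the previous action (if any), then a trigger.
    std::vector<std::string> onTransaction(const std::string& item, const std::string& rule,
                                           const std::string& transaction,
                                           const std::string& action) {
        std::vector<std::string> events;
        const Key key{item, rule};
        auto it = active_.find(key);
        if (it != active_.end()) {
            if (it->second.first == transaction) return events;  // same trigger: still valid
            events.push_back("cancel " + it->second.second + " for " + item);
            active_.erase(it);
        }
        if (!action.empty()) {
            active_[key] = {transaction, action};
            events.push_back("trigger " + action + " for " + item);
        }
        return events;
    }

    std::size_t validCount() const { return active_.size(); }

private:
    using Key = std::pair<std::string, std::string>;              // (item, rule)
    std::map<Key, std::pair<std::string, std::string>> active_;   // -> (transaction, action)
};
```

A change to a different transaction with the same action produces a cancel followed by a new trigger. Other data-items matching the same rule keep their own entries and are not affected.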
Repeating Actions
A repeating action is an action which repeats (i.e. runs again) after a configured time period, provided the action remains valid for that time. When an action fires for the first time the _REPEATCOUNT environment variable (or equivalent for non-script type actions) has an initial value of 0. This value is incremented for each subsequent repetition.
Repeating actions could be used, for example, to inform users at regular intervals that a problem still exists, or to attempt to fix a problem (e.g. restart a process) before escalating if unsuccessful.
In GSE, go to Actions > Advanced > Repeat interval.
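As a worked example of the timing, the hypothetical C++ helper below lists when a repeating action would fire and which _REPEATCOUNT value each firing sees. It assumes the first firing happens at time zero and ignores restrictions and active times. It also treats an action cancelled exactly at a repeat time as not repeating at that time.

```cpp
#include <vector>

// One firing of a repeating action: seconds after the first firing, and the
// _REPEATCOUNT value it would see.
struct Firing {
    long atSeconds;
    int repeatCount;
};

// Lists the firings of an action that stays valid for `validForSeconds`,
// assuming it fires immediately and then every `repeatIntervalSeconds`.
// A non-positive interval means the action does not repeat.
std::vector<Firing> firingsWhileValid(long validForSeconds, long repeatIntervalSeconds) {
    std::vector<Firing> firings;
    firings.push_back({0, 0});  // first firing: _REPEATCOUNT is 0
    if (repeatIntervalSeconds <= 0) return firings;
    int count = 1;
    for (long t = repeatIntervalSeconds; t < validForSeconds; t += repeatIntervalSeconds) {
        firings.push_back({t, count++});
    }
    return firings;
}
```

Take an action with a 300-second repeat interval that stays valid for 650 seconds. It fires at 0, 300 and 600 seconds, with _REPEATCOUNT values of 0, 1 and 2.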
Escalating Actions
An escalating action is an action which triggers another action. Each action can optionally be configured with one action to escalate to, which is called the escalation action. A valid action fires this escalation action once it has been valid for the configured escalation interval, after which it will not escalate again (contrast this with a repeating action, which repeats every interval).
Escalation actions are useful for situations where the same action cannot resolve the issue (unlike repeating actions). For example, an error condition could trigger an action which fires an email. If this error is not then rectified within the escalation interval (perhaps because a technician is away from their desk unable to receive email) the action could then escalate to fire a pager message.
When an escalation action is triggered an “escalation chain” is formed, with a link from the triggering action to the escalation action. E.g. if action A escalates to (fires) action B, then the chain is A -> B. If action B then escalates to action C, the chain becomes A -> B -> C.
The chain only remains valid so long as all actions in the chain are valid. If an action in the chain is invalidated then actions in the chain are reset. Typically, it is the first action in the chain which is invalidated, normally due to a rule transaction changing when a dataview value changes.
In the example above, invalidating action A also invalidates actions B and C. If action A is then fired again, action B is fired again once the escalation interval has expired, since the chain was reset.
When configuring escalating actions, forming a cycle of escalations is a checked error. For instance, it is invalid to configure an action A which escalates to action B (escalation A -> B), and also configure escalations B -> C and C -> A. This would create a cyclic escalation chain (A -> B -> C -> A), meaning the actions would escalate indefinitely in an infinite loop. Gateway resolves this during setup validation by removing the last detected escalation (in this case the escalation C -> A) and issuing a setup validation message.
In GSE, go to Actions > Advanced > Escalation interval.
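One way to perform this check is sketched below in C++. The sketch assumes escalations are examined in configuration order and that the escalation which closes a cycle is the one dropped, which is consistent with the C -> A example above. `EscalationCheck` and `validateEscalations` are hypothetical, and the Gateway's actual validation may differ.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Result of the hypothetical validation step.
struct EscalationCheck {
    std::map<std::string, std::string> accepted;               // action -> escalation action
    std::vector<std::pair<std::string, std::string>> removed;  // escalations dropped as cyclic
};

// Accepts escalations in configuration order and drops any escalation that
// would close a cycle. Each action has at most one escalation action, and the
// accepted set is kept acyclic, so following a chain is a finite walk.
EscalationCheck validateEscalations(
        const std::vector<std::pair<std::string, std::string>>& configured) {
    EscalationCheck result;
    for (const auto& [from, to] : configured) {
        // Walk the accepted chain starting at `to`; reaching `from` means a cycle.
        bool cycle = false;
        std::string current = to;
        for (;;) {
            if (current == from) {
                cycle = true;
                break;
            }
            auto next = result.accepted.find(current);
            if (next == result.accepted.end()) break;
            current = next->second;
        }
        if (cycle) {
            result.removed.emplace_back(from, to);
        } else {
            result.accepted[from] = to;
        }
    }
    return result;
}
```

With the configuration A -> B, B -> C, C -> A, the first two escalations are accepted. Following the chain from A then reaches C, so C -> A is removed.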
Restricted Actions
Actions can be configured with restrictions which prevent them from firing, depending upon the condition of the data-item that the action is fired with. Currently, the conditions which can be checked include the snooze and active state of the data-item or its parent items. For actions configured with multiple restrictions, the action fires only if none of the restrictions apply.
Specifying a restriction on items can help prevent unwanted actions from firing. Snoozing is typically used to ignore an error while it is being investigated, whereas active state changes based on an active time. Depending upon the action, it may be helpful to ignore a condition if either of these conditions is true. Since this is a common activity, these are the default restrictions for actions.
For example, an action which sends emails could be restricted to firing only if an item is not snoozed, since a snoozed item indicates that someone is investigating the problem. Similarly, an action which restarts a process could be restricted to firing only if an item is active, since the process may not be required outside of the active time specified.
- In GSE, go to Actions > Advanced > Snoozing.
- Select Fire if item not snoozed.
The queueUntilUnrestricted setting controls what happens when an action which was triggered while restricted later has the restrictions lifted (e.g. is unsnoozed). If the action is still valid and this setting is set to true, then the action will then be fired and repeat / escalate as normal. Otherwise the action will remain un-fired until it is invalidated.
Active Times
Actions can optionally reference an active time by name using the activeTime setting, allowing time-based control of an action. Similar to the inactivity restriction described above, setting an active time prevents the action from being fired if the time is inactive; however, this applies to the entire action rather than to the item the action is fired upon. See Active Times for details on this Gateway feature.
Using active times on an action can be useful for controlling common / shared actions which may be fired on several data-items, for which restrictions are not appropriate or which do not themselves have active times configured. For example, an active time could be configured on an action which emails users. Outside of office hours there will be nobody to respond to the alert, and so the action can be disabled at these times.
The queueUntilActive setting controls what happens when an action which was triggered outside of an active time later re-enters that time (i.e. becomes active). If the action is still valid and this setting is set to true, then the action will then be fired and repeat / escalate as normal. Otherwise the action will remain un-fired until it is invalidated.
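The C++ sketch below shows one way the two queue settings might combine when an action is triggered while blocked. The combined behaviour is an assumption: the action is only queued if every condition blocking it has its queue setting enabled. `Outcome` and `onTrigger` are hypothetical names.

```cpp
// Outcome for a valid action at the moment it is triggered.
enum class Outcome { FireNow, Queue, StayUnfired };

// Hypothetical decision combining the activeTime and restriction settings.
// The action fires at once only if its active time (if any) is active and no
// restriction applies. If it is blocked, it is queued only when every blocking
// condition has its queue setting enabled (queueUntilActive for the active
// time, queueUntilUnrestricted for restrictions); otherwise it stays un-fired
// until it is invalidated.
Outcome onTrigger(bool activeTimeActive, bool queueUntilActive,
                  bool restricted, bool queueUntilUnrestricted) {
    const bool blockedByTime = !activeTimeActive;
    if (!blockedByTime && !restricted) return Outcome::FireNow;
    const bool canQueue = (!blockedByTime || queueUntilActive) &&
                          (!restricted || queueUntilUnrestricted);
    return canQueue ? Outcome::Queue : Outcome::StayUnfired;
}
```

For instance, an email action triggered outside office hours with queueUntilActive set to true is queued, and it fires when the active time is re-entered if it is still valid.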
Configuration Settings
Actions are configured in the Actions top-level section of the gateway setup. Configuration consists of a list of action definitions, each of which contains the minimum required configuration for its type. Each action is identified by a user-supplied name, which must be unique among all other actions to prevent ambiguity, as actions are referenced by name.
Grouping settings
actions
The top-level actions section contains configuration for the actions feature of the Gateway. This consists of a list of action definitions, optionally grouped for easier management.
actions > actionGroup
Action groups allow grouping of actions in the gateway setup, allowing for easier management of actions. Grouping has no effect on the function of an action.
actions > actionGroup > name
The action group name is not used by the Gateway. It should be used to describe the purpose or contents of the actions in the group.
Mandatory: Yes
Common action settings
The settings below are common to all types of action.
actions > action
An action definition contains the configuration required for a single action. The minimum configuration required will vary depending upon the type of action being configured.
actions > action > name
Actions are referenced by other parts of the gateway setup by name. In order to avoid ambiguity, actions are required to have a unique name among all other actions.
Mandatory: Yes
actions > action > escalationAction
Actions can be configured with an optional escalation action. The escalation action will be fired if the action remains valid for the escalationInterval. See the section on escalating actions for more details.
Mandatory: No
Default: (no escalation action)
actions > action > escalationInterval
The escalation interval controls how long (in seconds) an action must remain valid before its escalation action is called (if configured). See the section on escalating actions for more details.
Mandatory: No
Default: 300 (5 minutes)
actions > action > repeatInterval
The repeat interval controls how long (in seconds) an action must remain valid before it repeats (fires again). See the section on repeating actions for more details.
Mandatory: No
Default: Will not repeat
actions > action > repeatBehaviour
Specifies whether repeats continue after an escalation is triggered. By default (continueAfterEscalation), the action continues to repeat after an escalation; if set to cancelAfterEscalation, repeats stop once the escalation is triggered.
Mandatory: No
Default: continueAfterEscalation
actions > action > activeTime
Optionally specifies an active time for this action. If the action is triggered outside of this time the action will not fire. Firing may optionally be delayed until the time is entered again using the queueUntilActive setting.
See Active Times.
Mandatory: No
Default: (no associated active time - the action will always fire, subject to restrictions)
actions > action > queueUntilActive
This Boolean setting allows firing of the action to be deferred until the action's associated active time is active.
See Active Times.
Mandatory: No
Default: false
actions > action > restrictions
Restrictions can be applied to an action to prevent the action from firing under certain conditions. Conditions currently include the snoozing / active state of the triggering data-item and action throttling. If multiple conditions are configured, the action fires only if none of the restrictions apply.
For more details see the restrictions section.
Mandatory: No
actions > action > restrictions > snoozing
The snoozing restriction can be used to prevent an action firing depending upon the snooze state of the data-item which triggered the action. Allowable values are listed below:
Mandatory: No
Default: fireIfItemAndAncestorsNotSnoozed
Value | Effect |
---|---|
alwaysFire | The action is always fired, regardless of snooze state. |
fireIfItemNotSnoozed | The action is fired if the triggering data-item is not snoozed. |
fireIfItemAndAncestorsNotSnoozed | The action is fired if the triggering data-item and all of its ancestor data-items are not snoozed. |
actions > action > restrictions > inactivity
The inactivity restriction can be used to prevent an action firing depending upon the active state of the data-item which triggered the action. Allowable values are listed below:
Mandatory: No
Default: fireIfItemAndAncestorsActive
Value | Effect |
---|---|
alwaysFire | The action is always fired, regardless of active state. |
fireIfItemActive | The action is fired if the triggering data-item is active. |
fireIfItemAndAncestorsActive | The action is fired if the triggering data-item and all of its ancestor data-items are active. |
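The C++ sketch below shows how the three values of each restriction can be evaluated together, so that the action fires only if neither restriction applies. It is a hypothetical model (`ItemState`, `restrictionAllows` and `actionMayFire` are invented). It also simplifies the item hierarchy to a flat list of ancestors, such as the dataview, sampler and managed entity.

```cpp
#include <string>
#include <vector>

// Snooze and active state of one item in the hierarchy.
struct ItemState {
    bool snoozed = false;
    bool active = true;
};

// Evaluates one restriction setting. `blocked` says whether a single item
// blocks firing (e.g. it is snoozed, or it is inactive). `itemOnlyValue` is the
// value that checks only the triggering item; any value other than it and
// alwaysFire is treated as the item-and-ancestors check (the default).
template <typename Blocked>
bool restrictionAllows(const std::string& setting, const std::string& itemOnlyValue,
                       const ItemState& item, const std::vector<ItemState>& ancestors,
                       Blocked blocked) {
    if (setting == "alwaysFire") return true;
    if (blocked(item)) return false;
    if (setting == itemOnlyValue) return true;
    for (const ItemState& ancestor : ancestors) {
        if (blocked(ancestor)) return false;
    }
    return true;
}

// The action may fire only if neither the snoozing nor the inactivity restriction applies.
bool actionMayFire(const std::string& snoozing, const std::string& inactivity,
                   const ItemState& item, const std::vector<ItemState>& ancestors) {
    auto isSnoozed = [](const ItemState& s) { return s.snoozed; };
    auto isInactive = [](const ItemState& s) { return !s.active; };
    return restrictionAllows(snoozing, "fireIfItemNotSnoozed", item, ancestors, isSnoozed) &&
           restrictionAllows(inactivity, "fireIfItemActive", item, ancestors, isInactive);
}
```

With the default values, a snoozed managed entity blocks actions for every cell beneath it. Changing snoozing to fireIfItemNotSnoozed ignores snoozed ancestors and checks only the cell itself.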
actions > action > restrictions > queueUntilUnrestricted
This Boolean setting allows firing of the action to be deferred until all the configured snoozing and inactivity restrictions have been lifted.
Mandatory: No
Default: false
actions > action > restrictions > throttle
This is a reference to a configured throttle. See Action Throttling.
Script action settings
The settings below define a script type action.
actions > action > script
Script type actions allow the gateway to run a shell-script or executable file in response to gateway events. See the script action section above for more details.
Mandatory: Yes (one of script, sharedLibrary, command or effect must be specified for an action)
actions > action > script > exeFile
Specifies the shell script or executable file which will be run when the action is fired. For script actions which run on the Gateway (configured using the runLocation setting), this parameter is checked at setup validation time to ensure that the file exists.
Mandatory: Yes
actions > action > script > arguments
This setting specifies the command-line arguments which will be passed to the script or executable when the action is run.
Mandatory: Yes
actions > action > script > runLocation
The run location specifies where the action should be run. Valid values are detailed below.
Mandatory: Yes
Value | Effect |
---|---|
gateway | The action is run on the gateway. |
netprobe | The action is run on the netprobe from which the triggering data-item came, unless this is overridden using the probe setting. An action run on Netprobe requires that probes > probe > encodedPassword is specified in the probe configuration. |
actions > action > script > probe
This setting allows users to configure a specific netprobe to run the action on, when the action has been configured to run on netprobes using the runLocation setting.
Mandatory: No
Shared library action settings
The settings below define a shared library type action.
actions > action > sharedLibrary
Shared library type actions allow the gateway to run a function from a shared library in response to gateway events. See the shared library section above for more details.
Mandatory: Yes (one of script, sharedLibrary, command or effect must be specified for an action)
actions > action > sharedLibrary > libraryFile
Specifies the location of the shared library to use for this action. This setting is checked during setup validation to ensure that gateway can access this library.
Mandatory: Yes
actions > action > sharedLibrary > functionName
Specifies the name of the shared library function to run, inside the library specified by the libraryFile setting. This setting is checked during setup validation to ensure that this function exists within the shared library.
Mandatory: Yes
actions > action > sharedLibrary > runThreaded
Optional Boolean setting specifying whether to run the shared library function within a thread or not. Running an action in a thread is slightly less efficient but is recommended for library actions which take some time to complete, to ensure that execution does not interfere with normal gateway operation.
Note
Shared library functions using this setting which maintain state should be written to be thread-safe to avoid potential problems.
Mandatory: No
Default: true
actions > action > sharedLibrary > staticParameters
Defines static parameters which are always passed to the shared library function along with any values defined by the action.
Mandatory: No
actions > action > sharedLibrary > staticParameters > staticParameter > name
Name of the static parameter. This must be unique. If a parameter of the same name is defined by the action, then this setting is overridden.
Mandatory: Yes
actions > action > sharedLibrary > staticParameters > staticParameter > value
Value of static parameter.
Mandatory: Yes
Command action settings
The settings below define a command type action.
actions > action > command
Command type actions allow the gateway to run an internal or user-defined command in response to gateway events. See the command section above for more details.
Mandatory: Yes (one of script, sharedLibrary, command or effect must be specified for an action)
actions > action > command > ref
This setting specifies which command will be executed when this command type action is fired. Commands are referenced using the unique command name.
Mandatory: Yes
actions > action > command > args
This section allows the action to supply arguments to the command. If a command has any arguments without default values, then these must be specified so that the command can be run. This condition is checked during setup validation.
Mandatory: No (unless the command has arguments without default values)
actions > action > command > args > arg
Each arg definition specifies a single argument to pass to the command.
actions > action > command > args > arg > target
The target setting specifies which argument in the command this definition applies to. Command arguments are numbered from one. For example, a target value of four means that the contents of this definition will be supplied as the fourth argument to the specified command.
Mandatory: Yes
actions > action > command > args > arg > static
Specifies a static value for the command argument. This value will be the same for all executions of this action.
Mandatory: Yes (mutually exclusive with text or parameter below)
actions > action > command > args > arg > text
A variable argument value for the command. This can include static text or Geneos variables, which are evaluated to their respective values depending upon the target data-item the command is being executed on. For example, if a Geneos variable “OS” is defined with different values at two different Managed Entities, and the command is run on data-items in both of these Managed Entities, then each command instance gets the value of “OS” defined for the Managed Entity of the data-item it is run on. The argument type is singleLineStringVar and can consist of static data and/or any number of Geneos variables, interleaved with or without static data. For example, “Host:$(OS)-$(VERSION)”, where “OS” and “VERSION” are two predefined Geneos variables.
Mandatory: Yes (mutually exclusive with static or parameter below)
Currently only the values of the following variable types can be properly converted to a string:
Variable Type | Value |
---|---|
boolean | "true" if checked, "false" otherwise |
double | The actual double value specified |
integer | The actual integer value specified |
externalConfigFile | The name of an external configuration file. |
macro | The value of the macro selected - gateway name or gateway port or managed entity name or probe host or probe name or probe port or sampler name. |
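The C++ sketch below illustrates how a text argument such as “Host:$(OS)-$(VERSION)” resolves differently per Managed Entity. It expands $(NAME) references from a map of variable values that are assumed to have already been converted to strings as in the table above. `expandTextArgument` is hypothetical. Its treatment of undefined variables and unterminated references as errors is an assumption, not documented Gateway behaviour.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Expands $(NAME) references in a text argument using the variable values
// visible to the target data-item. Returns std::nullopt if a referenced
// variable is undefined or a "$(" is not closed.
std::optional<std::string> expandTextArgument(const std::string& text,
                                              const std::map<std::string, std::string>& variables) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t start = text.find("$(", pos);
        if (start == std::string::npos) {
            out.append(text, pos, std::string::npos);  // trailing static text
            break;
        }
        out.append(text, pos, start - pos);  // static text before the reference
        const std::size_t end = text.find(')', start + 2);
        if (end == std::string::npos) return std::nullopt;
        const std::string name = text.substr(start + 2, end - start - 2);
        const auto it = variables.find(name);
        if (it == variables.end()) return std::nullopt;
        out += it->second;
        pos = end + 1;
    }
    return out;
}
```

Suppose OS is defined as Linux on one Managed Entity and Windows on another, and VERSION is 7 on both. The same argument then expands to “Host:Linux-7” for one and “Host:Windows-7” for the other.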
actions > action > command > args > arg > stdAES
A secure password type for commands that take password arguments.
actions > action > command > args > arg > parameter
Specifies a parameterised value for the command argument. This value is obtained from the data-item which triggered the action, and so can change on every execution. Possible values are listed below.
Mandatory: Yes (mutually exclusive with static or text above)
Value | Effect |
---|---|
action | The name of the action being triggered. |
dataview | The name of the dataview the data-item belongs to (if any). |
gateway | The name of the gateway firing the action. |
managedEntity | The name of the managed entity the data-item belongs to (if any). |
probe | The name of the probe the data-item belongs to (if any). |
probeHost | The hostname of the probe the data-item belongs to (if any). This value is provided for backwards compatibility. |
repeatCount | The number of times this action has been repeated for the triggering item. |
sampler | The name of the sampler the data-item belongs to (if any). |
severity | The data-item severity. One of UNDEFINED, OK, WARNING, CRITICAL or USER. |
variable | Short name of the data-item if it is a managed variable, in the form <!>name for headlines or row.col for table cells. This value is provided for backwards compatibility. |
variablePath | The full gateway path to the data-item. |
Internal Command action settings
The settings below define an internal command type action. Most of the configuration options are identical to those for a command.
actions > action > internalCommand
See actions > action > command.
actions > action > internalCommand > name
Specifies the internal command.
actions > action > internalCommand > args
See actions > action > command > args.
actions > action > internalCommand > args > arg
See actions > action > command > args > arg.
actions > action > internalCommand > args > arg > target
See actions > action > command > args > arg > target.
actions > action > internalCommand > args > arg > static
See actions > action > command > args > arg > static.
actions > action > internalCommand > args > arg > text
See actions > action > command > args > arg > text.
actions > action > internalCommand > args > arg > stdAES
See actions > action > command > args > arg > stdAES.
actions > action > internalCommand > args > arg > parameter
See actions > action > command > args > arg > parameter.
Effect action settings
The settings below define an effect type action.
actions > action > effect
Effect type actions call a gateway effect. See the effects section below for more details.
Mandatory: Yes (one of script, sharedLibrary, command or effect must be specified for an action)
actions > action > effect > ref
This setting specifies which effect will be called when this effect type action is fired. Effects are referenced using the unique effect name.
Mandatory: Yes
Advanced Actions settings
These settings can be found on the Advanced tab of the Actions section in the Gateway Setup Editor.
actions > fireOnComponentStartup
Actions may be fired when a Gateway or Netprobe is first started.
Note
By default, a Gateway action does not fire when it is triggered during the start-up of a Gateway, a Netprobe, or a sampler. This avoids false alerts.
To change this, enable Fire on component startup on the Advanced tab of the Actions section.
The following Gateway log output shows this default behaviour. If you enable fireOnComponentStartup, the action fires instead of being stopped.
<Thu Jul 5 16:17:08> INFO: ActionManager Action DataItem 'dummy action' generated (variable=/geneos/gateway[(@name="MNL_PABO_GATEWAY_888800")]/directory/probe[(@name="pabo_prod")]/managedEntity[(@name="CentOS 7")]/sampler[(@name="dummy toolkit")][(@type="")]/dataview[(@name="dummy toolkit")]/rows/row[(@name="Row*")]/cell[(@column="Value")])
<Thu Jul 5 16:17:08> INFO: ActionManager Action 'dummy action' would have fired, but stopped as this was during the startup of a component
Mandatory: No
Default: false
actions > fireOnConfigurationChange
Actions may be fired following a change of the gateway configuration file.
Mandatory: No
Default: false
actions > fireOnCreateWithOkSeverity
Actions may be fired as the result of a dataview item being created and transitioning from undefined to OK severity.
Mandatory: No
Default: false
actions > escalateIfFiringSuppressed
Actions may be escalated on component startup, on configuration change, or on a dataview item being created with OK severity only if the original action is also firing under those conditions. If the original action is suppressed for any of these reasons, then escalation is also suppressed for the same reasons. If firing of the action is not suppressed for any reason, then the action is always escalated and this setting has no effect.
Mandatory: No
Default: true
Action Throttling
In a number of scenarios it is necessary to throttle the actions that occur, so that some of them are not sent. To do this a throttle needs to be defined and referenced from the restriction settings of an action.
Gateway allows you to configure rolling throttles to restrict the number of actions. With a rolling throttle it is possible to, for example, only allow one of these actions to be fired within twenty four hours or alternatively five actions within a five minute period.
Throttles can be applied to actions through configuration or as part of a rule’s transaction. When applied as part of a transaction, the throttle overrides any existing throttle.
A throttle can fire a summary action at configurable periods. This action could be used to send an email or text message summarising the number of actions throttled since the first action was fired or the first action was blocked, then subsequently since the last summary was sent. If no actions were throttled during this period then no summary is sent.
A summary action cannot run a Netprobe command unless it explicitly specifies the Netprobe the command should run on. This is because a throttle can apply to data items from more than one Netprobe.
The summary action has the following information set in the environment.
- _SEVERITY — “UNDEFINED”
- _MANAGED_ENTITY — “UNDEFINED”
- _NETPROBE_HOST — “UNDEFINED”
- _VARIABLE — “THROTTLER”
- _VALUE — The number of throttled actions
- _THROTTLER — The name of the throttle
- _USERDATA — is not set
This ensures that existing Gateway throttling scripts can be reused without change. New scripts will be able to identify the throttle responsible.
Basic Configuration
actions > throttle > name
This is a name to uniquely identify the throttle.
Mandatory: Yes
actions > throttle > noOfActions
This is the number of actions allowed before throttling.
Mandatory: Yes
actions > throttle > per
This is the number of time units used to define the throttling duration. For example, if you were setting a throttle of one action per ten-minute interval, this would be “10”.
Mandatory: Yes
actions > throttle > interval
This is the unit of time in use (seconds, minutes or hours), allowing the throttle to be defined as a number of actions per interval.
Mandatory: Yes
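To make the rolling window concrete, the following C++ sketch counts allowed actions over a sliding period. It is an illustration only, not the Gateway's implementation: the class RollingThrottle is hypothetical, and the window length is assumed to be per multiplied by the interval unit, expressed in seconds (so five actions per five minutes becomes RollingThrottle(5, 300)).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical rolling throttle: at most noOfActions actions may fire
// within any window of windowSeconds (assumed to be per * interval unit).
class RollingThrottle {
public:
    RollingThrottle(std::size_t noOfActions, long windowSeconds)
        : limit_(noOfActions), window_(windowSeconds) {}

    // Returns true if an action at time 'now' (in seconds) may fire,
    // false if it is throttled. Blocked actions are counted.
    bool allow(long now) {
        // Forget allowed actions that have rolled out of the window.
        while (!fired_.empty() && now - fired_.front() >= window_) {
            fired_.pop_front();
        }
        if (fired_.size() < limit_) {
            fired_.push_back(now);
            return true;
        }
        ++blocked_;
        return false;
    }

    std::size_t blocked() const { return blocked_; }

private:
    std::size_t limit_;
    long window_;
    std::deque<long> fired_;  // fire times of allowed actions, oldest first
    std::size_t blocked_ = 0;
};
```

Whether an action arriving exactly one window after an earlier one is allowed is a choice made by this sketch, not something the configuration reference specifies.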
Advanced Configuration
Grouping
Groupings allow a throttle to keep different counters for different logical groups. Each group is defined by a collection of XPaths which are evaluated when the action is fired. There is also a default group to throttle items that do not match the grouping criteria.
The results of evaluating each of these XPaths are gathered together to uniquely identify the throttling group.
Note
To be part of a group all of the grouping criteria must be met. If the grouping criteria are not all met the default group will be used.
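The way grouping selects a counter can be sketched as follows, assuming each configured XPath has already been evaluated for the data-item (std::nullopt stands for a path that did not match). The names groupKey and GroupedCounters are hypothetical; the sketch only shows that every path must match for an item to join a specific group, and that any miss sends it to the default group. It requires C++17 for std::optional.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Key shared by every item that fails at least one grouping path.
const std::string kDefaultGroup = "<default>";

// Builds a throttling group key from the evaluated grouping paths.
std::string groupKey(const std::vector<std::optional<std::string>>& results) {
    std::string key;
    for (const auto& result : results) {
        if (!result) return kDefaultGroup;  // every path must match
        key += *result;
        key += '\x1f';  // separator so {"ab","c"} and {"a","bc"} differ
    }
    return key;
}

// Keeps one action counter per throttling group.
class GroupedCounters {
public:
    int record(const std::vector<std::optional<std::string>>& results) {
        return ++counts_[groupKey(results)];
    }

private:
    std::map<std::string, int> counts_;
};
```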
Some examples of grouping are outlined below.
Throttling per dataview
ancestor::dataview
This evaluates to the dataview of the data-item that triggered the action, effectively keeping a separate throttling counter for each dataview to which the throttle is applied.
If you have FKM and CPU dataviews triggering actions, they would each fire actions up to the configured limit within the configured time period.
Throttling separately for one specific plugin
ancestor::dataview[@name="cpu"]
This will throttle actions triggered by dataviews named “cpu” separately to all other actions to which the throttle is applied. There is an implicit default throttle for data-items that do not belong to a configured group.
Throttling by row for one specific plugin
ancestor::sampler[(param("PluginName")="FKM")]/dataview
@rowname
This will throttle actions triggered by each row of an FKM dataview separately. This is useful for throttling actions on columns like status, where the value is associated with the file. It is not useful for throttling trigger rows as these should all be throttled together.
Note
There are two XPaths. Both have to be satisfied. This effectively defines a group for each row of the FKM dataview. When the action is fired the questions asked are “Is this part of the FKM dataview?” and “Which row does it belong to?”
Throttle each data item separately
.
(dot) The current data-item; every data-item is throttled separately.
Throttling by set of plugin types
ancestor::dataview[@name="disk"]/ancestor::managedEntity
ancestor::dataview[@name="cpu"]/ancestor::managedEntity
ancestor::dataview[@name="network"]/ancestor::managedEntity
ancestor::dataview[@name="hardware"]/ancestor::managedEntity
This will throttle “system” actions together in one group.
Throttling by FKM dataviews per filename
ancestor::sampler[(param("PluginName")="FKM")]/dataview
../cell[(@column="filename")]/@value
This will throttle actions triggered by each file from each FKM dataview separately. Actions fired from cells associated with the same filename will be throttled into the same group.
Valid Grouping Paths
You may receive a warning about parts of a configured grouping path not uniquely identifying a gateway item. Paths that go upwards (using ancestor or “..”) are fine and will not generate a warning. The problem occurs when a path goes “downwards”; for example, suppose your XPath is defined as:
ancestor::probe[@name="Holly"]//sampler
The intention being to throttle actions for all samplers on that probe. This will work for a while, until samplers are added or taken away from the probe. When the next action is fired the set of active samplers will be different to the previous set. This will lead to the action being throttled by a different group. This probably isn’t the intended behaviour and is why the gateway issues a warning.
If the configured XPath were simply:
ancestor::probe[@name="Holly"]
This would throttle every action originating from that probe.
actions > throttle > grouping
Groupings allow a throttle to keep different counters for different logical groups.
Mandatory: No
actions > throttle > grouping > paths
Groupings allow a throttle to keep different counters for different logical groups. Each group is defined by a collection of XPaths which are evaluated when the action is fired. See Geneos XPaths for more information on XPaths.
Mandatory: No
actions > throttle > grouping > paths > > path
Groupings allow a throttle to keep different counters for different logical groups. Each group is defined by a collection of XPaths which are evaluated when the action is fired. There is also a default group to throttle items that do not match the grouping criteria.
See the Configuring Grouping and Grouping sections for more information.
Mandatory: No
Summarising Throttling
actions > throttle > summary
Defines when summary actions should be fired.
Mandatory: No
actions > throttle > summary > send
This is the number of time units after which the summary action should be fired.
Mandatory: Yes
actions > throttle > summary > interval
This is the unit of time in use (seconds, minutes or hours).
Mandatory: Yes
actions > throttle > summary > strategy
The strategy to use: fire the summary action a configured time after either the first allowed action or the first blocked action.
Mandatory: Yes
actions > throttle > summary > action
The summary action that should be fired.
Mandatory: Yes
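A hypothetical SummaryScheduler illustrates the summary timing described in this section. It assumes that, once the first summary period has started, later summaries fall due at fixed multiples of the send period, and that a due time with no throttled actions simply produces no summary; these are reading assumptions, not documented Gateway internals.

```cpp
#include <cassert>

enum class SummaryStrategy { AfterFirstAllowed, AfterFirstBlocked };

// Hypothetical sketch of throttle summary timing (not the Gateway's code).
// periodSeconds must be greater than zero.
class SummaryScheduler {
public:
    SummaryScheduler(SummaryStrategy strategy, long periodSeconds)
        : strategy_(strategy), period_(periodSeconds) {}

    void onAllowed(long now) {
        if (strategy_ == SummaryStrategy::AfterFirstAllowed) start(now);
    }

    void onBlocked(long now) {
        ++blocked_;
        if (strategy_ == SummaryStrategy::AfterFirstBlocked) start(now);
    }

    // Returns the number of throttled actions to report if a summary is
    // due at 'now'; returns 0 when nothing should be sent.
    int poll(long now) {
        if (!started_ || now < nextDue_) return 0;
        int count = blocked_;
        blocked_ = 0;
        while (nextDue_ <= now) nextDue_ += period_;  // start the next period
        return count;
    }

private:
    // The first qualifying action starts the summary clock.
    void start(long now) {
        if (!started_) {
            started_ = true;
            nextDue_ = now + period_;
        }
    }

    SummaryStrategy strategy_;
    long period_;
    bool started_ = false;
    long nextDue_ = 0;
    int blocked_ = 0;
};
```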
Action examples
Script
An example UNIX script which accesses action parameters is shown below.
#!/bin/sh
echo _ACTION = ${_ACTION}
echo _SEVERITY = ${_SEVERITY}
echo _VARIABLEPATH = ${_VARIABLEPATH}
echo _GATEWAY = ${_GATEWAY}
echo _PROBE = ${_PROBE}
echo _NETPROBE_HOST = ${_NETPROBE_HOST}
echo _MANAGED_ENTITY = ${_MANAGED_ENTITY}
echo _SAMPLER = ${_SAMPLER}
echo _DATAVIEW = ${_DATAVIEW}
echo _VARIABLE = ${_VARIABLE}
echo _REPEATCOUNT = ${_REPEATCOUNT}
An equivalent Windows batch file is as follows:
@echo off
echo _ACTION = %_ACTION%
echo _SEVERITY = %_SEVERITY%
echo _VARIABLEPATH = %_VARIABLEPATH%
echo _GATEWAY = %_GATEWAY%
echo _PROBE = %_PROBE%
echo _NETPROBE_HOST = %_NETPROBE_HOST%
echo _MANAGED_ENTITY = %_MANAGED_ENTITY%
echo _SAMPLER = %_SAMPLER%
echo _DATAVIEW = %_DATAVIEW%
echo _VARIABLE = %_VARIABLE%
echo _REPEATCOUNT = %_REPEATCOUNT%
Multi Line Variables In Actions
Outputting multi-line messages stored in environment variables is not supported by the built-in echo commands on Windows and Linux. To work around this issue, you can use the following examples to output multi-line messages from an environment variable.
Windows
VBScript has been used for this example as it is present in Windows 2000 and later.
Option Explicit
' Print each NAME=VALUE entry of this process's environment.
Dim objWindowsShell
Dim objEnvironment
Set objWindowsShell = WScript.CreateObject("WScript.Shell")
For Each objEnvironment In objWindowsShell.Environment("PROCESS")
    WScript.Echo objEnvironment
Next
Set objWindowsShell = Nothing
Linux
#!/bin/sh
/bin/echo _ACTION = "${_ACTION}"
/bin/echo _SEVERITY = "${_SEVERITY}"
/bin/echo _VARIABLEPATH = "${_VARIABLEPATH}"
/bin/echo _GATEWAY = "${_GATEWAY}"
/bin/echo _PROBE = "${_PROBE}"
/bin/echo _NETPROBE_HOST = "${_NETPROBE_HOST}"
/bin/echo _MANAGED_ENTITY = "${_MANAGED_ENTITY}"
/bin/echo _SAMPLER = "${_SAMPLER}"
/bin/echo _DATAVIEW = "${_DATAVIEW}"
/bin/echo _VARIABLE = "${_VARIABLE}"
/bin/echo _REPEATCOUNT = "${_REPEATCOUNT}"
/bin/echo _VALUE = "${_VALUE}"
Shared library
Shared library actions are more powerful than script or executable actions. This is because shared library actions can store state information between invocations of the action, since the library is loaded as a part of gateway and will remain loaded until removed from the gateway configuration.
Gateway can call any function within the library with the correct function signature, which is displayed below:
extern "C" int functionName(int, char**);
When called, the function will be passed the number of variable strings in the first parameter, and an array of null-terminated strings in the second parameter, each string of the form NAME=VALUE. The return value is only reported to the gateway log and is not otherwise used.
#include <stdio.h>
// Prints each NAME=VALUE string passed by the gateway on its own line.
extern "C" int logAction(int argc, char* argv[])
{
    for (int ix = 0; ix < argc; ++ix)
    {
        printf("%s\n", argv[ix]);
    }
    return 0;
}
The example above can be saved to a file (e.g. logAction.cpp) and compiled with the following command on a Linux system. This produces the library file logAction.so which contains the function logAction.
g++ logAction.cpp -fpic -Wl,-G -shared -o logAction.so
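Because the library stays loaded, a function can keep state between invocations. The C++ sketch below (the names countingAction and parseArgs are hypothetical) uses the signature shown above, splits each NAME=VALUE string at the first '=', and keeps a per-data-item invocation count in a static map. The mutex is a precaution only; this section does not say whether the gateway may call a library function from more than one thread.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <mutex>
#include <string>

// Splits "NAME=VALUE" strings at the first '='. A string without '='
// is stored with an empty value.
static std::map<std::string, std::string> parseArgs(int argc, char** argv) {
    std::map<std::string, std::string> vars;
    for (int ix = 0; ix < argc; ++ix) {
        std::string entry(argv[ix]);
        std::string::size_type eq = entry.find('=');
        if (eq == std::string::npos) {
            vars[entry] = "";
        } else {
            vars[entry.substr(0, eq)] = entry.substr(eq + 1);
        }
    }
    return vars;
}

// Hypothetical action: counts how often it has fired for each data-item.
extern "C" int countingAction(int argc, char** argv) {
    static std::map<std::string, int> firesPerItem;  // survives between calls
    static std::mutex guard;
    std::map<std::string, std::string> vars = parseArgs(argc, argv);
    std::lock_guard<std::mutex> lock(guard);
    int count = ++firesPerItem[vars["_VARIABLEPATH"]];
    std::printf("%s fired %d time(s) for %s\n", vars["_ACTION"].c_str(), count,
                vars["_VARIABLEPATH"].c_str());
    return count;  // only reported in the gateway log
}
```

Such a file would be built into a shared library in the same way as logAction.cpp.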
Libemail
Overview
The gateway is packaged with a shared library called libemail.so that provides simple SMTP mail functionality. It has one exported function, SendMail, that sends an e-mail via a configured SMTP server. Values are passed into the function using the NAME=VALUE syntax described above and can be set using the shared library static parameters settings or from Action user data. Any issues encountered while running are output to stderr.
The library works by having a set of predefined formats (you can think of these as templates) which can be overridden. The format contains the text of the message that would be sent when the action is triggered.
The template can contain a number of macros that are substituted when %(NAME_OF_MACRO) is found in the text of the format. Some macros are defaulted for you and listed in the configuration section below. The default message formats can be overridden by setting a static parameter with the same name and supplying the new text. Message formats are listed in the message format section below.
You can define any macro name you want and use it in your message format. In addition, the library is supplied with a number of pre-configured macros which are defaulted with useful values. These can be overridden in the static parameters section of the actions configuration.
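The %(NAME_OF_MACRO) substitution can be pictured with the following C++ sketch. The function expandMacros is hypothetical and is not libemail's code; in particular, leaving unknown macro names untouched is an assumption, since this section does not say what libemail does with an undefined macro.

```cpp
#include <cassert>
#include <map>
#include <string>

// Replaces every %(NAME) in 'format' with the value of macro NAME.
// Assumption: unknown names and an unterminated "%(" are left as-is.
std::string expandMacros(const std::string& format,
                         const std::map<std::string, std::string>& macros) {
    std::string out;
    std::string::size_type pos = 0;
    while (pos < format.size()) {
        std::string::size_type start = format.find("%(", pos);
        if (start == std::string::npos) break;
        std::string::size_type end = format.find(')', start + 2);
        if (end == std::string::npos) break;
        out.append(format, pos, start - pos);  // text before the macro
        std::string name = format.substr(start + 2, end - start - 2);
        auto it = macros.find(name);
        if (it != macros.end()) {
            out += it->second;
        } else {
            out.append(format, start, end - start + 1);
        }
        pos = end + 1;
    }
    out.append(format, pos, std::string::npos);  // remaining text
    return out;
}
```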
Configuration
To configure libemail, set up a sharedLibrary-type action or effect with the libraryFile set to libemail.so and the functionName set to SendMail. For Alerting this is the minimum set-up required, although typically users will want to specify a server (_SMTP_SERVER) and the return path and name (_FROM and _FROM_NAME). When using the library from an action it will also be necessary to set the _TO field in the user data.
All supported configuration parameters are listed below:
Note
You will see below a number of message formats which use Gateway supplied parameters such as %(_VALUE), %(_SEVERITY) and %(_ASSIGNEE_EMAIL), the last of which is populated for events triggered by user assignment. You can use Gateway supplied parameters such as these in the values of the configuration parameters listed in this section. This allows you to tailor the subject, addressing, etc.
Note
You may also use your own macro names in your own message formats.
Message Formats
If an _ALERT parameter is present, libemail assumes it is being called as part of a gateway alert and will use the appropriate format depending on the value of _ALERT_TYPE (Alert, Clear, Suspend, or Resume). If no _ALERT parameter is specified, libemail assumes it is being called as part of an action and uses _FORMAT.
A user defined format will always override the default format. If the _FORMAT parameter is specified by the user then this will override any default formats, whether or not _ALERT is present.
Subjects behave in the same way as formats.
Formats and subjects have a simple parameter expansion capability using the %(NAME_OF_MACRO) syntax; for example, %(_ALERT) will expand to the name of the alert.
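The precedence rules for choosing a format can be summarised as a small decision function. This is a sketch of the stated rules, not libemail's source: formatParameter is a hypothetical name, the ThrottleSummary mapping comes from the parameter table, and falling back to _ALERT_FORMAT when _ALERT_TYPE is missing or unrecognised is an assumption. The chosen name would then be looked up in the user's static parameters first, falling back to the built-in default; subjects would follow the same pattern.

```cpp
#include <cassert>
#include <map>
#include <string>

// Returns the name of the format parameter to use for one invocation,
// following the precedence rules described above (sketch only).
std::string formatParameter(const std::map<std::string, std::string>& params) {
    // A user-supplied _FORMAT overrides every default, alert or not.
    if (params.count("_FORMAT") != 0) return "_FORMAT";
    // Without _ALERT the call is treated as an ordinary action.
    if (params.count("_ALERT") == 0) return "_FORMAT";

    static const std::map<std::string, std::string> byType = {
        {"Alert", "_ALERT_FORMAT"},
        {"Clear", "_CLEAR_FORMAT"},
        {"Suspend", "_SUSPEND_FORMAT"},
        {"Resume", "_RESUME_FORMAT"},
        {"ThrottleSummary", "_SUMMARY_FORMAT"},
    };
    auto type = params.find("_ALERT_TYPE");
    if (type != params.end()) {
        auto it = byType.find(type->second);
        if (it != byType.end()) return it->second;
    }
    return "_ALERT_FORMAT";  // assumption: missing or unknown _ALERT_TYPE
}
```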
Default _FORMAT
This is an automatically generated mail from Geneos Gateway: %(_GATEWAY)
Action "%(_ACTION)" is being fired against Geneos DataItem %(_VARIABLEPATH)
The dataitem value is "%(_VALUE)" and its severity is %(_SEVERITY)
Default _ALERT_FORMAT
This is an automatically generated mail from Geneos Gateway: %(_GATEWAY)
Alert "%(_ALERT)" is being fired because Geneos DataItem %(_VARIABLE) in dataview %(_DATAVIEW) in Managed Entity %(_MANAGED_ENTITY) is at %(_SEVERITY) severity.
The cell value is "%(_VALUE)"
This Alert was created at %(_ALERT_CREATED) and has been fired %(_REPEATCOUNT) times.
The item's XPath is %(_VARIABLEPATH)
This alert is controlled by throttle: "%(_THROTTLER)".
The default _ALERT_FORMAT also lists the values of all matched alert levels.
Default _CLEAR_FORMAT
This is an automatically generated mail from Geneos Gateway: %(_GATEWAY).
Alert "%(_ALERT)" is being cancelled because Geneos DataItem %(_VARIABLE) in dataview %(_DATAVIEW) in Managed Entity %(_MANAGED_ENTITY) is at %(_SEVERITY) severity.
The cell value is "%(_VALUE)"
This Alert was created at %(_ALERT_CREATED) and has been fired %(_REPEATCOUNT) times.
The item's XPath is %(_VARIABLEPATH)
This alert is controlled by throttle: "%(_THROTTLER)".
The default _CLEAR_FORMAT also lists the values of all matched alert levels.
Default _SUSPEND_FORMAT
This is an automatically generated mail from Geneos Gateway: %(_GATEWAY).
Alert "%(_ALERT)" is being suspended because of: "%(_SUSPEND_REASON)". No notifications will be fired for this alert until it is resumed. If the alert is cancelled before it is resumed no further notifications will be fired.
The cell value is "%(_VALUE)"
This Alert was created at %(_ALERT_CREATED) and has been fired %(_REPEATCOUNT) times.
The item's XPath is %(_VARIABLEPATH)
This alert is controlled by throttle: "%(_THROTTLER)".
The default _SUSPEND_FORMAT also lists the values of all matched alert levels.
Default _RESUME_FORMAT
This is an automatically generated mail from Geneos Gateway: %(_GATEWAY).
Alert "%(_ALERT)" is being resumed because of: "%(_RESUME_REASON)". Geneos DataItem %(_VARIABLE) in dataview %(_DATAVIEW) in Managed Entity %(_MANAGED_ENTITY) is %(_SEVERITY) severity.
The cell value is "%(_VALUE)"
This Alert was created at %(_ALERT_CREATED) and has been fired %(_REPEATCOUNT) times.
The item's XPath is %(_VARIABLEPATH)
This alert is controlled by throttle: "%(_THROTTLER)".
The default _RESUME_FORMAT also lists the values of all matched alert levels.
Default _SUMMARY_FORMAT
This is an automatically generated mail from Geneos Gateway: %(_GATEWAY)
Summary for alert throttle "%(_THROTTLER)"
%(_VALUE) Alerts have been throttled in the last %(_SUMMARY_PERIOD), including:
%(_DROPPED_ALERTS) Alert(s)
%(_DROPPED_CLEARS) Clear(s)
%(_DROPPED_SUSPENDS) Suspend(s)
%(_DROPPED_RESUMES) Resume(s)
The default _SUMMARY_FORMAT also lists all current alerts controlled by the throttle.
Parameter | Effect |
---|---|
_SMTP_SERVER | Address of SMTP server to connect to (defaults to localhost). |
_SMTP_PORT | Port of SMTP server (defaults to 25) |
_SMTP_TIMEOUT | Timeout value for communications between the SMTP server and the library. Specifies how long the library should wait for each interaction with the SMTP server, e.g. connect, reads and writes. The time is for each operation and not the overall time. The time is specified as <seconds.microseconds>, e.g. 1.500 which states one second and 500 microseconds. The default is zero, which means wait indefinitely or until the system timeout (implementation dependent). |
_FROM | Return path e-mail address (defaults to geneos@localhost) |
_FROM_NAME | Return path name (defaults to Geneos) |
_FORMAT | Format of mail message (see Message Formats for the default) |
_ALERT_FORMAT | Format of alert-type Alert mail message (see Message Formats for the default) |
_CLEAR_FORMAT | Format of alert-type Clear mail message (see Message Formats for the default) |
_SUSPEND_FORMAT | Format of alert-type Suspend mail message (see Message Formats for the default) |
_RESUME_FORMAT | Format of alert-type Resume mail message (see Message Formats for the default) |
_SUMMARY_FORMAT | Format of alert-type ThrottleSummary mail message (see Message Formats for the default) |
_SUBJECT | Subject of mail message (defaults to "Geneos Alert") |
_ALERT_SUBJECT | Subject of alert-type Alert message (defaults to "Geneos Alert Fired") |
_CLEAR_SUBJECT | Subject of alert-type Clear message (defaults to "Geneos Alert Cancelled") |
_SUSPEND_SUBJECT | Subject of alert-type Suspend message (defaults to "Geneos Alert Suspended") |
_RESUME_SUBJECT | Subject of alert-type Resume message (defaults to "Geneos Alert Resumed") |
_SUMMARY_SUBJECT | Subject of alert-type ThrottleSummary message (defaults to "Geneos Alert Throttle Summary") |
_TO | Comma delimited list of recipient addresses. This has no default and must be set (the Alerting functionality of the gateway will set it automatically, along with the CC and BCC lists, and associated info types and names) |
_TO_INFO_TYPE | Corresponding comma delimited list of recipient information types. Addresses that do not have corresponding information types matching "Email" or "e-mail" (case insensitive) will be excised from the list. This list must be the same length as the _TO list if it is provided. (If it is absent then all addresses are assumed to be e-mail addresses). |
_TO_NAME | Corresponding comma delimited list of recipient names. If present, this list must be the same length as the _TO list. |
_CC | As with _TO but for Carbon Copy address list. |
_CC_INFO_TYPE | As with _TO_INFO_TYPE but for Carbon Copy address list. |
_CC_NAME | As with _TO_NAME but for Carbon Copy address list. |
_BCC | As with _TO but for Blind Carbon Copy address list. |
_BCC_INFO_TYPE | As with _TO_INFO_TYPE but for Blind Carbon Copy address list. |
_BCC_NAME | As with _TO_NAME but for Blind Carbon Copy address list. |
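The _TO_INFO_TYPE rules in the table (a case-insensitive match on "Email" or "e-mail", equal list lengths, and all addresses kept when the type list is absent) can be sketched as follows. emailRecipients is a hypothetical helper, trimming spaces around list items is an assumption, and returning an empty list when the lengths differ is an illustrative choice; this section does not say how libemail reports that error. It requires C++17 for std::optional.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma delimited list, trimming spaces and tabs around each item.
std::vector<std::string> splitList(const std::string& list) {
    std::vector<std::string> items;
    std::istringstream in(list);
    std::string item;
    while (std::getline(in, item, ',')) {
        auto first = item.find_first_not_of(" \t");
        auto last = item.find_last_not_of(" \t");
        items.push_back(first == std::string::npos
                            ? std::string()
                            : item.substr(first, last - first + 1));
    }
    return items;
}

// True for "Email" or "e-mail" in any letter case.
bool isEmailType(std::string type) {
    std::transform(type.begin(), type.end(), type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return type == "email" || type == "e-mail";
}

// Keeps the addresses whose corresponding information type is e-mail.
std::vector<std::string> emailRecipients(const std::string& to,
                                         const std::optional<std::string>& toInfoType) {
    std::vector<std::string> addresses = splitList(to);
    if (!toInfoType) return addresses;  // no types given: all are e-mail
    std::vector<std::string> types = splitList(*toInfoType);
    if (types.size() != addresses.size()) return {};  // illustrative error handling
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < addresses.size(); ++i) {
        if (isEmailType(types[i])) kept.push_back(addresses[i]);
    }
    return kept;
}
```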
Alerting
Overview
The Alerting feature of gateway automatically notifies users when a cell severity goes to Warning or Critical. It is completely separate from the rule logic that sets the cell severity, and can instead issue alerts based on arbitrary criteria such as system or server location.
Alerts are configured in hierarchy trees based on the properties of the item being alerted on. A hierarchy has a defined set of levels specifying what property to match on. Properties can be part of the Geneos directory structure (e.g. plug-in or sampler) or a user defined managed entity attribute (e.g. COUNTRY or CITY). Alerts can be defined at any level of a hierarchy.
When the severity on a cell changes to Warning or Critical the Alert Manager walks down the hierarchy matching the cell properties at each level. If a matching alert is found it is fired.
Alerts can vary in complexity. A ‘Show Alerts’ command, run by right clicking on any cell, will show the Alerts that would fire were a severity change to occur.
Hierarchies
A hierarchy tree has a set depth, with each level defined as matching a particular data-item parameter. The following parameter types are available:
Parameter | Meaning |
---|---|
Managed Entity Parameter | A user defined managed entity parameter (e.g. COUNTRY). |
Managed Entity Name | The name of the managed entity. |
Row Name | The table row name of the cell. |
Column Name | The table column name of the cell. |
Headline Name | The name of the headline. |
Dataview Name | The name of the dataview. |
Sampler Name | The name of the sampler. |
Sampler Type | The type of the sampler. |
Sampler Group | The group of the sampler. |
Plug-In Name | The name of the plug-in. |
The tree can be formed of any number of branches, not descending below the defined levels. Every branch (or alert level) has a name value which is matched to its corresponding level match parameter; for example, if the first level has a match parameter of pluginName then all level one branches will have names corresponding to different plugins. Matching is exact (wildcards are not supported) and case sensitive. There is no need to provide alert levels for every possible value.
Each alert level can define both warning and critical notifications. When a dataview cell’s severity changes to warning or critical, the cell is compared against the hierarchy: the Alert Manager walks down the matching alert levels and the most specific (bottommost) notification found is fired. Less specific notifications further up the tree can also optionally be fired.
Any number of hierarchies can be created. The default behaviour is to fire a single notification from every matching hierarchy, but hierarchies have priorities and it is possible to configure only the highest priority matching alert to fire.
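The walk down a single hierarchy can be sketched as follows. The types AlertLevel and Hierarchy and the function matchingNotifications are hypothetical; the sketch matches one data-item property per level, exactly and case sensitively, stops at the first level with no matching branch, and returns the notifications found from least to most specific, so the last entry is the one fired by default.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One branch of a hierarchy; empty strings mean "no notification defined".
struct AlertLevel {
    std::string warning;
    std::string critical;
    std::map<std::string, std::unique_ptr<AlertLevel>> children;  // exact names
};

struct Hierarchy {
    std::vector<std::string> levelParams;  // match parameter for each level
    AlertLevel root;
};

// Walks down the hierarchy matching item properties level by level.
std::vector<std::string> matchingNotifications(
        const Hierarchy& h, const std::map<std::string, std::string>& item,
        bool critical) {
    std::vector<std::string> found;
    const AlertLevel* node = &h.root;
    for (const auto& param : h.levelParams) {
        auto prop = item.find(param);
        if (prop == item.end()) break;
        auto child = node->children.find(prop->second);  // case sensitive
        if (child == node->children.end()) break;
        node = child->second.get();
        const std::string& n = critical ? node->critical : node->warning;
        if (!n.empty()) found.push_back(n);
    }
    return found;
}
```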
Notifications
At each alert level, notifications can be defined for warning and critical severity. Notifications are specified in an escalation ladder: if the alert is still valid after the escalation interval, the next escalation level is fired. Each notification can fire multiple effects to separate distribution lists, each with its own repeat interval.
Distribution Lists
Each notification effect has three distribution lists: To, CC (Carbon Copy), and BCC (Blind Carbon Copy). When adding a user or user group to a distribution list, the information type to add can be specified; this corresponds to the generic user information defined in the Gateway Authentication. By default this will be “Email”, but any information can be specified and passed to the effect, and because an effect can be a user defined script, any information can be interpreted.
Effects
When a notification fires it calls an effect, passing it information about the alert, the distribution lists, and about the data-item the alert has been created for.
Variables Passed To Effect
The following variables are passed to the called effect. The exact form in which they are passed will depend on the type of effect.
Note
userdata variables are only available in Effects called from Actions. They are not available in Effects called from Alerts. userdata variables exist only for the duration of a rule execution and are not available after the rule has completed execution.
Alert information
_ALERT
The name of the alert. This is formed from the hierarchy name, the name of each alert level, the severity, and the level down the escalation chain; for example, for a hierarchy matching on plugin name and row name: myHierarchy/FKM/myFile.txt/WARNING/0
_ALERT_CREATED
The time the alert was created.
_ALERT_TYPE
The type of alert being fired. This can be one of five values: Alert, Clear, Suspend, Resume, or ThrottleSummary. See below for more details on Suspend and Resume alerts.
_CLEAR
Whether or not this alert is a clear.
_ALERT_CLEARED
The time the alert was cleared. This variable is present only if the alert type is Clear.
_SUSPEND_REASON
Only present for Suspend alerts. The reason why the alert is being suspended.
_RESUME_REASON
Only present for Resume alerts. The reason why the alert is being resumed.
_HIERARCHY
The name of the hierarchy
_HIERARCHY_LEVEL
The hierarchy level on which the alert was fired (depth down the tree). Zero-based.
_LEVEL_
The matching parameter of the hierarchy level (e.g. _ROWNAME). Level is zero-based.
_MATCHED_
The matching parameter of the hierarchy level in a human readable form (e.g. rowName)
_THROTTLER
Name of the throttle controlling this alert, blank if not throttled.
Data item information
_SEVERITY
The data-item severity. One of UNDEFINED, OK, WARNING, CRITICAL or USER.
_VALUE
Value of data-item
_VARIABLEPATH
The full gateway path to the data-item.
_GATEWAY
The name of the gateway firing the action.
_PROBE
The name of the probe the data-item belongs to
_NETPROBE_HOST
The hostname of the probe the data-item belongs to (if any).
_MANAGED_ENTITY
The name of the managed entity the data-item belongs to.
_SAMPLER
The name of the sampler the data-item belongs to.
_SAMPLER_TYPE
The type of the sampler the data-item belongs to.
_SAMPLER_GROUP
The group of the sampler the data-item belongs to.
_DATAVIEW
The name of the dataview the data-item belongs to.
_PLUGINNAME
The name of the plugin that created the dataview.
_ROWNAME
The name of the row the data-item belongs to (if any).
_COLUMN
The name of the column the data-item belongs to (if any).
_HEADLINE
The name of the headline (if the data-item is a headline).
_VARIABLE
Short name of the data-item if it is a managed variable, in the form <!>name for headlines or row.col for table cells. This value is provided for backwards compatibility with older action scripts.
_REPEATCOUNT
The number of times this notification has been repeated for the triggering data-item.
_FIRSTCOLUMN
The name of the first column of the dataview the data-item belongs to (if any).
_
The values of cells in the same row as the data-item. Environment variables are named with the column name (prefixed with an underscore), and the values are the values of the cell in that column.
The values of any managed entity attributes which have been specified. Environment variables are named with the managed entity attribute names, and the values contain the attribute values.
_KBA_URLS
A list of application knowledge base article URLs, separated by newlines.
Distribution Lists
_TO
Comma delimited list of message recipients, populated from the values in the user information in the authentication section (e.g. auser@itrsgroup.com, 12345678)
_TO_INFO_TYPE
Corresponding comma delimited list of information types. The effect can use this to treat each element in the _TO list differently if necessary. (e.g. Email, Pager).
_TO_NAME
Corresponding comma delimited list of users’ full names, as specified in the authentication section (e.g. Alan User, Alan User).
_CC
As with the _TO list, a comma delimited list of Carbon Copy recipients.
_CC_INFO_TYPE
As with the _TO_INFO_TYPE list, a comma delimited list of Carbon Copy recipient information types.
_CC_NAME
As with the _TO_NAME list, a comma delimited list of Carbon Copy recipient names.
_BCC
As with the _TO list, a comma delimited list of Blind Carbon Copy recipients.
_BCC_INFO_TYPE
As with the _TO_INFO_TYPE list, a comma delimited list of Blind Carbon Copy recipient information types.
_BCC_NAME
As with the _TO_NAME list, a comma delimited list of Blind Carbon Copy recipient names.
libemail
The gateway ships with a pre-built shared library called libemail.so, designed to interpret alert or action parameters and send e-mails using an SMTP server. See the appendix for more details.
Repeating Notifications
A repeating notification is a notification which repeats (i.e. is sent again) after a configured time period, provided the alert remains valid for that time. When a notification fires for the first time the _REPEATCOUNT environment variable (or equivalent for non-script type alert) has an initial value of 0. This value is incremented for each subsequent repetition.
Repeating notifications could be used for example to inform users at regular intervals that a problem still exists. It is possible to set repeating notifications to cancel after the notification has escalated.
There are a number of situations where notifications will be suppressed by default, such as the Gateway starting up (see fireOnComponentStartup and fireOnConfigurationChange). In these cases, the initial action will not fire but any repeats or escalations will be scheduled and fire later if the alert continues to be valid.
Escalation
Notifications are configured in an escalation chain where each step on the chain has a separate escalation interval (specified in seconds). If the alert is still valid (i.e. the problem has not been fixed) after the escalation interval has passed the next notification is fired.
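A sketch of the escalation timing follows. It assumes that each step's interval is the time the alert must stay valid after that step fires before the next step fires; the function currentEscalationStep is hypothetical, and the interval of the last step is ignored because nothing follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// intervals[k] is the escalation interval (seconds) of step k: how long
// the alert must remain valid after step k fires before step k+1 fires.
// Returns the index of the latest step fired after validSeconds.
std::size_t currentEscalationStep(const std::vector<long>& intervals,
                                  long validSeconds) {
    std::size_t step = 0;
    long nextAt = 0;
    while (step + 1 < intervals.size()) {
        nextAt += intervals[step];
        if (validSeconds < nextAt) break;  // next step not reached yet
        ++step;
    }
    return step;
}
```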
Clears
A notification with clear set will be fired one final time when the triggering data-item’s severity drops below the alert severity, to inform the recipients that the alert has been cancelled. The variables passed to a clear notification are the same as those passed to a normal notification, except that:
- The _CLEAR variable will be set to TRUE.
- The _ALERT_TYPE variable will be set to Clear.
- An additional _ALERT_CLEARED variable will be set to the time the alert was cleared.
A data-item of severity Warning changing to Critical will not produce a clear for the Warning alert. The Alert is instead put on hold and when the data-item’s severity eventually drops back below warning a clear will be issued.
Severity Transition Example 1
- OK -> Warning - WARNING alert fires. Repeats and escalations scheduled.
- Warning -> Critical - WARNING alert repeats and escalations cancelled. - CRITICAL alert fires. Repeats and escalations scheduled.
- Critical -> Warning - CRITICAL alert repeats and escalations cancelled. - CRITICAL clear fired. - WARNING alert fires (at first escalation level and with a repeat count of 0). Repeats and escalations scheduled.
- Warning -> OK - WARNING alert repeats and escalations cancelled. - WARNING clear fired.
Severity Transition Example 2
- OK -> Warning - WARNING alert fires. Repeats and escalations scheduled.
- Warning -> Critical - WARNING alert repeats and escalations cancelled. - CRITICAL alert fires. Repeats and escalations scheduled.
- Critical -> OK - CRITICAL alert repeats and escalations cancelled. - CRITICAL clear fired. - WARNING clear fired.
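The two transition examples can be reproduced with a small state machine. AlertTracker is a hypothetical illustration that assumes clear is set on both the WARNING and CRITICAL notifications; a direct OK to Critical transition is not covered by the examples, and treating it as firing only the CRITICAL alert is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Sev { Ok, Warning, Critical };

// Reports the alert events produced by each severity change of one data-item.
class AlertTracker {
public:
    std::vector<std::string> change(Sev to) {
        std::vector<std::string> events;
        Sev from = current_;
        current_ = to;
        if (from == to) return events;
        if (from == Sev::Critical) {
            events.push_back("cancel CRITICAL repeats/escalations");
            events.push_back("CRITICAL clear");
        }
        if (from == Sev::Warning) {
            events.push_back("cancel WARNING repeats/escalations");
            if (to == Sev::Critical) warningOnHold_ = true;  // no clear yet
        }
        if (to == Sev::Critical) events.push_back("CRITICAL alert");
        if (to == Sev::Warning) {
            warningOnHold_ = false;
            events.push_back("WARNING alert");  // first escalation level, repeat 0
        }
        if (to == Sev::Ok && (from == Sev::Warning || warningOnHold_)) {
            warningOnHold_ = false;
            events.push_back("WARNING clear");
        }
        return events;
    }

private:
    Sev current_ = Sev::Ok;
    bool warningOnHold_ = false;  // WARNING alert held while Critical
};
```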
Suspended Alerts
Alerts are suspended when any of the following occur:
- The alert hierarchy goes out of active time.
- The target cell becomes inactive (due to rule activeTime).
- The target cell is snoozed.
When an alert is suspended it is put into a special state where no notifications are sent. If the Clear flag is set, a single Suspend notification (_ALERT_TYPE=Suspend) is sent to inform the users that the alert has been suspended and the reason why (_SUSPEND_REASON).
If the alert is still valid (i.e. the target cell is still red/orange) once none of the three suspension conditions applies, the alert is resumed. A resumed alert has its escalation chain and repeat count reset.
When the alert is resumed, notifications are only fired if the alert first occurred during the inactive or snoozed state and the first notification of the alert was suppressed. Resume notifications have _ALERT_TYPE set to Resume and include the reason the alert has been resumed (_RESUME_REASON).
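The suspend and resume rules can be condensed into a decision function. SuspensionState and notificationOnChange are hypothetical names; the sketch only decides whether a Suspend or Resume notification is sent when the suspension state changes, and assumes the caller tracks whether the alert is still valid and whether its first notification was suppressed.

```cpp
#include <cassert>

// The three conditions that suspend an alert.
struct SuspensionState {
    bool hierarchyInactive = false;  // hierarchy out of active time
    bool cellInactive = false;       // target cell inactive (rule activeTime)
    bool cellSnoozed = false;        // target cell snoozed
    bool suspended() const {
        return hierarchyInactive || cellInactive || cellSnoozed;
    }
};

enum class Notice { None, Suspend, Resume };

// Decides which notification, if any, a suspension state change produces.
Notice notificationOnChange(const SuspensionState& before,
                            const SuspensionState& after, bool alertValid,
                            bool clearFlag, bool firstNotificationSuppressed) {
    if (!alertValid) return Notice::None;
    if (!before.suspended() && after.suspended()) {
        return clearFlag ? Notice::Suspend : Notice::None;
    }
    if (before.suspended() && !after.suspended()) {
        // Only alerts whose first notification was suppressed announce a resume.
        return firstNotificationSuppressed ? Notice::Resume : Notice::None;
    }
    return Notice::None;  // still suspended, or still active
}
```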
There are six possible reasons why an alert can be suspended or resumed; they are listed in the Reason and Description table below.
The following examples illustrate the precise suspend and resume behaviour. They use the hierarchy activetime but are equally valid for the snoozed and active state of the target cell.
The examples below use the following symbols:
- Red triangles mark notifications that are fired.
- Yellow triangles mark Suspend notifications that are fired.
- Green triangles mark Clear notifications that are fired.
- White (empty) triangles mark where notifications would have been fired but are not, as explained in each example.
Example 1: Repeating
While the hierarchy is inactive, repeats stop. When the hierarchy goes active, repeats start again with a repeat count of zero. When the hierarchy returns to active, the initial notification is not fired again. This is because the alert for the cell going critical already fired during the previous active period.
Example 2: Escalations
When the cell becomes critical in the inactive period, an alert is not fired. The time when the cell severity changed is recorded and included in the Resume notification.
The alert is queued until active, and the escalation chain begins only in the active period.
A Resume notification is fired because the cell became critical during the inactive period. The notification includes the timestamp of the severity change.
Example 3: Resetting Escalations Copied
The escalation notification fires after the escalation interval, and is cancelled when the alert is suspended on entering the inactive period. When the hierarchy returns to active, a notification is not fired. This is because the alert for the cell going critical already fired during the previous active period.
When the hierarchy goes active, the escalation chain is restarted. A notification fires if the escalation interval elapses and the cell is still critical.
Example 4: Clears Copied
When the cell is critical on entering the inactive period, a Suspend notification is fired.
When the cell returns to OK during the inactive period, a Clear notification is not fired.
Example 5: Cells Changing State While Alert Is Suspended Copied
When the cell is critical on entering the inactive period, a Suspend notification is fired.
While the alert is suspended, notifications are not fired.
When alert is resumed, cell state is re-evaluated and notifications are restarted. A Resume notification is fired because the cell became critical during the inactive period.
Clear is later fired when cell returns to OK.
Example 6: Intervals That Span Entire Suspended Period Copied
When an alert is suspended, all scheduled repeats and escalations are cancelled. Even if the alert is resumed before they would have fired, a notification does not occur. When the hierarchy returns to active, a notification is not fired. This is because the alert for the cell going critical already fired during the previous active period.
When the hierarchy goes active, scheduled repeats and escalations are restarted. A notification fires if the escalation interval elapses and the cell is still critical.
Reason | Description |
---|---|
Hierarchy Active Time | The hierarchy has gone in or out of active time. |
Cell Active State | The target cell has gone active or inactive. |
Cell Snooze State | The target cell has been snoozed or unsnoozed. |
Ancestor Active State | An ancestor of the target cell has gone active or inactive. |
Ancestor Snooze State | An ancestor of the target cell has been snoozed or unsnoozed. |
Configuration Change | The configuration specifying the activeTime or snoozing/active state restrictions for this alert has been altered. |
Active Times
Alert hierarchies can optionally reference an active time by name using the activeTime setting, allowing time-based control of alerts. Setting an active time will suspend all alerts from that hierarchy, preventing them from being fired while the time is inactive. Once the active time period resumes, and if the alert is still valid, it will fire and repeat / escalate as normal.
See the section on active times for details on this gateway feature.
For example, an active time could be configured on a hierarchy which emails users. Outside of office hours there will be nobody to respond to the notification, and so the alert can be disabled at these times.
Restricted Alerts
Alert hierarchies can be configured with restrictions which suspend them, depending upon the condition of the data-item that the alert is fired on. Currently, the conditions which can be checked include the snooze and active states of the data-item or its parent items. For alerts configured with multiple restrictions, the alert fires only if none of the restrictions apply.
Specifying a restriction on items can help prevent unwanted alerts from firing. Snoozing is typically used to ignore an error while it is being investigated, whereas an item's active state typically changes according to an active time. Depending upon the alert, it may be helpful to suppress it if either of these conditions is true. Since this is a common requirement, these are the default restrictions for alerts.
For example, an alert which sends emails could be restricted to firing only if an item is not snoozed, since if the item is snoozed someone is investigating the problem.
Alert Throttling
In a number of scenarios it is necessary to throttle alert notifications so that some of them are not sent. To do this, a throttle needs to be defined and referenced from the throttling section of an Alert or Hierarchy.
Gateway2 allows you to configure rolling throttles to restrict the number of notifications. For example, a rolling throttle can allow only one notification to be fired within twenty-four hours, or five notifications within a five-minute period.
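A rolling throttle can be pictured as a sliding time window that remembers when recent notifications were let through. The following Python sketch illustrates that idea under stated assumptions; it is not the Gateway's implementation. The `RollingThrottle` class and the unit names are hypothetical (the constructor arguments are only named after the noOfAlerts, per and interval throttle settings), and treating a notification exactly one window old as expired is an assumption about boundary behaviour.

```python
from collections import deque

# Seconds per unit. The unit names mirror the seconds/minutes/hours choices of
# the throttle's interval setting; the exact spelling here is an assumption.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


class RollingThrottle:
    """Allow at most no_of_alerts notifications in any rolling window."""

    def __init__(self, no_of_alerts: int, per: int, interval: str) -> None:
        self.limit = no_of_alerts
        self.window = per * UNIT_SECONDS[interval]
        self._allowed: deque = deque()  # timestamps of notifications let through

    def allow(self, now: float) -> bool:
        """Return True if a notification at time `now` (in seconds) may be sent."""
        # Drop notifications that have slid out of the window.
        while self._allowed and now - self._allowed[0] >= self.window:
            self._allowed.popleft()
        if len(self._allowed) < self.limit:
            self._allowed.append(now)
            return True
        return False  # throttled: the notification is not sent


five_per_five_minutes = RollingThrottle(no_of_alerts=5, per=5, interval="minutes")
print([five_per_five_minutes.allow(t) for t in range(6)])
# [True, True, True, True, True, False]
```

Because the window rolls rather than resetting at fixed boundaries, the sixth notification in this model becomes possible again 300 seconds after the first one was allowed, not at the start of the next clock-aligned five-minute block.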
Summaries
A throttle can fire a summary effect at configurable periods. This effect could be used to send an email or text message summarising the number of alerts throttled since the first alert was fired or the first alert was blocked, and subsequently since the last summary was sent. If no alerts were throttled during a period, no summary is sent.
As alerts are throttled, their distribution lists are collected, and the summary effect is fired to the combined distribution list of every user who missed an alert controlled by that throttle.
Summary Effect
The summary effect has the following information set in the environment.
Alert information
Variable | Value |
---|---|
_ALERT | “UNDEFINED” |
_ALERT_CREATED | Time and date at which the summary is fired |
_ALERT_TYPE | “ThrottleSummary” |
_CLEAR | “FALSE” |
_HIERARCHY | “UNDEFINED” |
_HIERARCHY_LEVEL | “UNDEFINED” |
_CURRENT_ALERTS | Number of currently valid alerts controlled by this throttle. |
_CURRENT_ALERT_ | The name of the current alert. One for every currently valid alert controlled by this throttle. |
_CURRENT_ALERT_HOLD_ | Whether or not the current alert is on hold (an alert of a higher severity has been raised against the DataItem). One for every currently valid alert controlled by this throttle. |
_CURRENT_ALERT_SUSPEND_ | Whether or not the current alert is suspended. One for every currently valid alert controlled by this throttle. |
_CURRENT_ITEM_ | XPath of the alert DataItem. One for every currently valid alert controlled by this throttle. |
_THROTTLER | The name of the throttle |
DataItem Information
Variable | Value |
---|---|
_GATEWAY | Name of the gateway |
_SEVERITY | “UNDEFINED” |
_MANAGED_ENTITY | “UNDEFINED” |
_NETPROBE_HOST | “UNDEFINED” |
_VARIABLE | “THROTTLER” |
Throttling Information
Variable | Value |
---|---|
_VALUE | The total number of throttled notifications. |
_DROPPED_ALERTS | The number of throttled Alert notifications. |
_DROPPED_CLEARS | The number of throttled Clear notifications. |
_DROPPED_SUSPENDS | The number of throttled Suspend notifications. |
_DROPPED_RESUMES | The number of throttled Resume notifications. |
_SUMMARY_PERIOD | String describing the summary period. |
_SUMMARY_STRATEGY | String describing the summary strategy. |
Distribution Lists
Variable | Value |
---|---|
_TO | Comma-delimited list of all _TO recipients who had a notification throttled by this throttle. |
_TO_INFO_TYPE | Corresponding comma-delimited list of information types (e.g. Email, Pager). The effect can use this to treat each element in the _TO list differently if necessary. |
_TO_NAME | Corresponding comma-delimited list of users' full names, as specified in the authentication section (e.g. Alan User, Alan User). |
_CC | Comma-delimited list of all _CC recipients who had a notification throttled by this throttle. |
_CC_INFO_TYPE | As with the _TO_INFO_TYPE list, a comma-delimited list of Carbon Copy recipient information types. |
_CC_NAME | As with the _TO_NAME list, a comma-delimited list of Carbon Copy recipient names. |
_BCC | Comma-delimited list of all _BCC recipients who had a notification throttled by this throttle. |
_BCC_INFO_TYPE | As with the _TO_INFO_TYPE list, a comma-delimited list of Blind Carbon Copy recipient information types. |
_BCC_NAME | As with the _TO_NAME list, a comma-delimited list of Blind Carbon Copy recipient names. |
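Since the summary effect receives its information through environment variables, a script-based effect can read the parallel comma-delimited lists and pair them up. The following Python sketch assumes the effect runs a script that inherits these variables; the functions `split_list`, `recipients` and `summary_text` are hypothetical, and it also assumes that no individual address, information type or name contains a comma.

```python
import os
from collections.abc import Mapping


def split_list(value: str) -> list[str]:
    """Split a comma-delimited variable; an unset or empty variable gives []."""
    return [part.strip() for part in value.split(",")] if value else []


def recipients(env: Mapping[str, str], prefix: str = "_TO") -> list[tuple[str, str, str]]:
    """Pair each address with its information type and full name.

    prefix is one of "_TO", "_CC" or "_BCC". zip() stops at the shortest list,
    so lists of different lengths silently lose their trailing entries.
    """
    addresses = split_list(env.get(prefix, ""))
    info_types = split_list(env.get(f"{prefix}_INFO_TYPE", ""))
    names = split_list(env.get(f"{prefix}_NAME", ""))
    return list(zip(addresses, info_types, names))


def summary_text(env: Mapping[str, str]) -> str:
    """Build a one-line summary from the throttling information variables."""
    return (
        f"Throttle {env.get('_THROTTLER', '?')} on gateway {env.get('_GATEWAY', '?')} "
        f"dropped {env.get('_VALUE', '0')} notification(s): "
        f"{env.get('_DROPPED_ALERTS', '0')} alerts, {env.get('_DROPPED_CLEARS', '0')} clears"
    )


if __name__ == "__main__":
    print(summary_text(os.environ))
    for prefix in ("_TO", "_CC", "_BCC"):
        for address, info_type, name in recipients(os.environ, prefix):
            print(f"{prefix[1:]}: {name} <{address}> via {info_type}")
```

Keeping the three lists aligned by position is what lets the script send, for example, an email to one recipient and a pager message to another, as the _TO_INFO_TYPE description suggests.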
Grouping
Groupings allow a throttle to keep different counters for different logical groups. Each group is defined by a collection of XPaths which are evaluated when the action is fired. See Geneos XPaths for more information on XPaths. There is also a default group to throttle items that do not match the grouping criteria.
The results of evaluating each of these XPaths are gathered together to uniquely identify the throttling group. To be part of a group, all of the grouping criteria must be met; if any of the grouping criteria are not met, the default group is used.
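The group-selection rule (every grouping path must yield a result, otherwise the default group is used) can be sketched in Python. Real grouping paths are XPaths evaluated by the Gateway; here each path is replaced by a hypothetical Python function over a dictionary standing in for a data-item, so this only illustrates how the results combine into a group key. `group_key`, `DEFAULT_GROUP`, `fkm_dataview` and `row_name` are illustrative names.

```python
from typing import Callable, Hashable, Optional, Sequence

# Hypothetical stand-in for an XPath: given a data-item (here a plain dict),
# return the value the path would select, or None when it selects nothing.
GroupingPath = Callable[[dict], Optional[Hashable]]

DEFAULT_GROUP = ("<default>",)


def group_key(item: dict, paths: Sequence[GroupingPath]) -> tuple:
    """Combine the result of every grouping path into one group identifier."""
    results = []
    for path in paths:
        value = path(item)
        if value is None:        # one criterion not met: use the default group
            return DEFAULT_GROUP
        results.append(value)
    return tuple(results)


# Rough equivalents of the two paths in the "throttling by row for one
# specific plugin" example: the FKM dataview, then the row name.
def fkm_dataview(item: dict) -> Optional[str]:
    return item.get("dataview") if item.get("plugin") == "FKM" else None


def row_name(item: dict) -> Optional[str]:
    return item.get("row")
```

Each distinct key would then get its own counter (for example its own rolling window), so two rows of the same FKM dataview are throttled independently while every non-FKM item shares the default group.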
Examples
Throttling per dataview
ancestor::dataview
This evaluates to the dataview of the data-item that triggered the alert, effectively defining separate throttling for each dataview to which the throttle is applied.
If you have FKM and cpu dataviews triggering alerts, they each fire alerts up to the configured limit within the configured time period.
Throttling separately for one specific plugin
ancestor::dataview[@name="cpu"]
This throttles actions triggered by dataviews named “cpu” separately from all other actions to which the throttle is applied. There is an implicit default throttle for data-items that do not belong to a configured group.
Throttling by row for one specific plugin
ancestor::sampler[(param("PluginName")="FKM")]/dataview
@rowname
This throttles alerts triggered by each row of an FKM dataview separately.
Note
There are two XPaths, and both have to be satisfied. This effectively defines a group for each row of the FKM dataview. When the alert is fired, the questions asked are “Is this part of the FKM dataview?” and “Which row does it belong to?”
Throttle each data item separately
.
(dot) The current data-item; throttle every data-item separately.
Throttling by set of plugin types
ancestor::dataview[@name="disk"]
ancestor::dataview[@name="cpu"]
ancestor::dataview[@name="network"]
ancestor::dataview[@name="hardware"]
This throttles “system” alerts together in one group.
Throttling FKM dataviews per filename
ancestor::sampler[(param("PluginName")="FKM")]/dataview
../cell[(@column="filename")]/@value
This throttles alerts triggered by each FKM file from each FKM dataview separately. Alerts fired from cells associated with the same filename are throttled into the same group.
Valid Grouping Paths
You may receive a warning about parts of a configured grouping path not uniquely identifying a gateway item. Going in an upward direction (i.e. ancestor or “…”) is fine and will not generate a warning. The problem occurs when going “downwards”. For example, suppose your XPath is defined as:
ancestor::probe[@name="Holly"]//sampler
The intention is to throttle actions for all samplers on that probe. This works until samplers are added to or removed from the probe. When the next action is fired, the set of active samplers will differ from the previous set, so the action will be throttled in a different group. This is probably not the intended behaviour, which is why the gateway issues a warning.
If the configured XPath were simply:
ancestor::probe[@name="Holly"]
This would throttle every action originating from that probe.
When a notification fires, it calls an effect (as defined in the effects section), passing it information about the alert, the distribution lists, and the data-item the alert has been created for.
Alert Commands
Show Alerts
This command shows all the alerts that are applicable to a selected data-item. Each applicable alert’s configuration (as configured in the Gateway Setup Editor) is presented as formatted output in the Active Console Output pane. This command applies only to dataview data cells.
Evaluate Alerts
This command shows all the data-items in the system that currently match the alert criteria in that part of the hierarchy tree. It can be executed by right-clicking an Alert in the Alerting hierarchy in the Gateway Setup Editor. The matching criteria are those specified in the hierarchy tree. For example, if you configure an Alerting hierarchy by managed entity name, and configure an Alert underneath it with a managed entity name to report all Warning and Critical alerts, only that managed entity (or its descendant data-items) currently at warning or critical severity will be displayed.
This command is useful when the first severity change caused the Alert to be raised and notified to users, but subsequent severity changes went un-notified because they merely kept the existing Alert valid. It lets you find all the severity changes in all managed variables that currently match the Alert criteria.
For each data-item, it presents the following information:
Field | Description |
---|---|
Name | The name of the data-item |
Display Name | The display name of the data-item |
Type | The data-item type, e.g. ManagedVariable |
User Readable Path | The XPath of the data-item. Note: Beginning with Geneos 5.5.x, the Managed Entity display name is used in the user readable paths throughout the Gateway Setup Editor, except when the GSE is opened as a standalone application. This only applies if you open the GSE within the Active Console. |
Severity | The severity of the data-item (0 - Undefined, 1 - Ok, 2 - Warning, 3 - Critical) |
Snoozed | Whether the data-item is snoozed (true, false) |
Knowledge Base | Whether the data-item has a knowledge base (true, false) |
Active | Whether the data-item is active (true, false) |
DirectKnowledgeBase | Whether the data-item has a direct knowledge base (true, false) |
Snoozed Parents | The number of snoozed parents |
User Assigned | Whether the data-item is assigned to any user |
ManagedVariable Legacy Name | The legacy name of the managed variable |
ManagedVariable Value | The value of the managed variable |
ManagedVariable Cell Column Name | The managed variable column name |
ManagedVariable Cell Row Name | The managed variable row name |
Configuration
alerting > hierarchy
A hierarchy specifies the criteria on which an alert is fired and defines the alerts to be fired along with their distribution lists. Any number of hierarchies can be specified.
Mandatory: No
alerting > hierarchyGroup
Hierarchy groups are used to group sets of hierarchies, making the setup easier to manage.
Mandatory: No
alerting > hierarchyGroup > name
Specifies the name of the hierarchy group. Although the name is not used internally by the gateway, it is recommended to give the group a descriptive name so that users editing the setup file can easily determine the purpose of the group.
Mandatory: Yes
alerting > hierarchy > name
Unique name that identifies the hierarchy.
Mandatory: Yes
alerting > hierarchy > priority
Specifies the priority of the hierarchy. When stopWhenMatched is set, hierarchies are processed in priority order. If two hierarchies are specified with the same priority, the gateway determines the order in which they are processed.
Mandatory: Yes
alerting > hierarchy > levels
Specifies the matching criteria for every level of the hierarchy tree.
Mandatory: No
alerting > hierarchy > levels > level > match
The parameter of the data-item that must match the alert name in order to match this level of the hierarchy tree.
Mandatory: Yes
alerting > hierarchy > levels > level > match > managedEntityAttribute
Specifies the managed entity attribute to match on.
alerting > hierarchy > levels > level > match > managedEntityName
Specifies that the alert level should match on the cell's managed entity name.
alerting > hierarchy > levels > level > match > rowName
Specifies that the alert level should match on the cell's row name.
alerting > hierarchy > levels > level > match > columnName
Specifies that the alert level should match on the cell's column name.
alerting > hierarchy > levels > level > match > headlineName
Specifies that the alert level should match on the headline name.
alerting > hierarchy > levels > level > match > dataviewName
Specifies that the alert level should match on the cell's dataview name.
alerting > hierarchy > levels > level > match > samplerName
Specifies that the alert level should match on the cell's sampler name.
alerting > hierarchy > levels > level > match > samplerType
Specifies that the alert level should match on the cell's sampler type.
alerting > hierarchy > levels > level > match > samplerGroup
Specifies that the alert level should match on the cell's sampler group.
alerting > hierarchy > levels > level > match > pluginName
Specifies that the alert level should match on the cell's plugin name.
alerting > hierarchy > alert
An alert describes a single branch of a hierarchy tree. It is evaluated if its name matches the match parameter of the target data-item for the appropriate level, where the appropriate level is the alert's depth in the tree.
Hierarchies can contain any number of alert branches, and alert branches can contain any number of child alert branches. The alert tree cannot be deeper than the number of defined levels.
Mandatory: No
alerting > hierarchy > alert > name
Used to match against the level match parameter of the target data-item. Matching is case sensitive and does not allow wildcards. The appropriate level is the one corresponding to the depth of this alert in the hierarchy tree.
For example, if the corresponding level in the hierarchy tree matches on managed entity name, then the name specified here must exactly match the name of a managed entity.
Mandatory: Yes
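Alert branch matching can be illustrated with a Python sketch that walks the alert tree one level at a time. The `Alert` class and `matching_alerts` function are hypothetical; a data-item is represented as a dictionary from level match parameter (for example managedEntityName) to value, and alwaysNotify is simplified to a single flag per alert, whereas the Gateway configures it separately under warning and critical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical alert branch: a name plus child branches keyed by name."""

    name: str
    always_notify: bool = False
    children: dict[str, "Alert"] = field(default_factory=dict)


def matching_alerts(roots: dict[str, Alert], levels: list[str], item: dict[str, str]) -> list[str]:
    """Names of the alerts that would fire for item, most specific last."""
    path: list[Alert] = []
    branches = roots
    for level in levels:                        # the tree is never deeper than the levels
        alert = branches.get(item.get(level))   # exact, case-sensitive, no wildcards
        if alert is None:
            break
        path.append(alert)
        branches = alert.children
    if not path:
        return []
    # Only the most specific match fires, plus any ancestor that sets alwaysNotify.
    return [a.name for a in path[:-1] if a.always_notify] + [path[-1].name]
```

For example, with levels `["managedEntityName", "samplerName"]`, an entity-level alert named ProdServer with a child CPU alert, a critical cell on the CPU sampler of ProdServer fires only the CPU alert by default, whereas the same entity name spelled prodserver matches nothing.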
alerting > hierarchy > alert > alert
Specifies a child alert.
Mandatory: No
alerting > hierarchyProcessing
Specifies how hierarchies are processed. Can take two values: processAll or stopWhenMatched.
Mandatory: No
Default: processAll
alerting > hierarchyProcessing > processAll
Process all hierarchies, firing all alerts that match.
Mandatory: Yes (must be one of processAll or stopWhenMatched)
alerting > hierarchyProcessing > stopWhenMatched
Process hierarchies in priority order and stop after the first notification is fired.
Mandatory: Yes (must be one of processAll or stopWhenMatched)
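The difference between processAll and stopWhenMatched can be sketched in Python. `Hierarchy` and `process` are hypothetical, each hierarchy's matching logic is reduced to a callable, and the code assumes that lower priority values are processed first; check that direction against your Gateway before relying on it. Python's stable sort keeps equal priorities in list order, whereas the Gateway chooses its own order for ties.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hierarchy:
    """Hypothetical hierarchy: a name, a priority and a matching predicate."""

    name: str
    priority: int
    fires_for: Callable[[dict], bool]   # would this hierarchy notify for the item?


def process(hierarchies: list[Hierarchy], item: dict, mode: str = "processAll") -> list[str]:
    """Return the names of the hierarchies that fire, in processing order."""
    if mode not in ("processAll", "stopWhenMatched"):
        raise ValueError(f"unknown hierarchyProcessing value: {mode}")
    fired = []
    # ASSUMPTION: lower priority values are processed first. sorted() is stable,
    # so ties keep their list order here; the Gateway picks its own order for ties.
    for hierarchy in sorted(hierarchies, key=lambda h: h.priority):
        if hierarchy.fires_for(item):
            fired.append(hierarchy.name)
            if mode == "stopWhenMatched":
                break   # stop after the first hierarchy that fires a notification
    return fired
```

With a paging hierarchy that fires only for critical items and an email hierarchy that fires for everything, processAll sends both notifications for a critical item, while stopWhenMatched sends only the one from the hierarchy processed first.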
alerting > hierarchy > activeTime
Specifies an optional activeTime, outside of which the hierarchy will not fire any notifications. See Active Times.
Mandatory: No
alerting > hierarchy > restrictions > snoozing
The snoozing restriction can be used to prevent an alert notification firing depending upon the snooze state of the data-item which triggered the alert. Allowable values are listed below:
Mandatory: No
Default: fireIfItemAndAncestorsNotSnoozed
Value | Effect |
---|---|
alwaysFire | The alert is always fired, regardless of snooze state. |
fireIfItemNotSnoozed | The alert is fired if the triggering data-item is not snoozed. |
fireIfItemAndAncestorsNotSnoozed | The alert is fired if the triggering data-item and all of its ancestor data-items are not snoozed. |
alerting > hierarchy > restrictions > inactivity
The inactivity restriction can be used to prevent an alert notification firing depending upon the active state of the data-item which triggered the alert. Allowable values are listed below:
Mandatory: No
Default: fireIfItemAndAncestorsActive
Value | Effect |
---|---|
alwaysFire | The alert is always fired, regardless of active state. |
fireIfItemActive | The alert is fired if the triggering data-item is active. |
fireIfItemAndAncestorsActive | The alert is fired if the triggering data-item and all of its ancestor data-items are active. |
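The snoozing and inactivity values above can be read as two predicates that must both hold before a notification fires. This Python sketch uses a hypothetical `Item` class in place of a Gateway data-item; it models only the snooze and active flags plus the list of ancestors, and does not describe how the Gateway itself evaluates restrictions. The default arguments mirror the documented defaults.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical data-item: its own flags plus its ancestors (parent first)."""

    snoozed: bool = False
    active: bool = True
    ancestors: list["Item"] = field(default_factory=list)


def snooze_allows(item: Item, snoozing: str = "fireIfItemAndAncestorsNotSnoozed") -> bool:
    if snoozing == "alwaysFire":
        return True
    if snoozing == "fireIfItemNotSnoozed":
        return not item.snoozed
    if snoozing == "fireIfItemAndAncestorsNotSnoozed":
        return not item.snoozed and not any(a.snoozed for a in item.ancestors)
    raise ValueError(f"unknown snoozing value: {snoozing}")


def activity_allows(item: Item, inactivity: str = "fireIfItemAndAncestorsActive") -> bool:
    if inactivity == "alwaysFire":
        return True
    if inactivity == "fireIfItemActive":
        return item.active
    if inactivity == "fireIfItemAndAncestorsActive":
        return item.active and all(a.active for a in item.ancestors)
    raise ValueError(f"unknown inactivity value: {inactivity}")


def may_fire(item: Item, snoozing: str = "fireIfItemAndAncestorsNotSnoozed",
             inactivity: str = "fireIfItemAndAncestorsActive") -> bool:
    """The notification fires only if neither restriction applies."""
    return snooze_allows(item, snoozing) and activity_allows(item, inactivity)
```

Under the defaults, snoozing a managed entity silences notifications for every cell beneath it; switching snoozing to fireIfItemNotSnoozed would only look at the cell's own snooze state.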
alerting > fireOnComponentStartup
Alerts may be fired when a gateway or netprobe is first started.
Mandatory: No
Default: false
alerting > fireOnConfigurationChange
Alerts may be fired following a change of the gateway configuration file.
Mandatory: No
Default: false
alerting > hierarchy > throttle
Specify a default throttle for all alerts in this hierarchy. This must be the name of a throttle defined in the Alerting section of the gateway setup.
Mandatory: No
alerting > hierarchy > alert > warning
Defines the warning notification for this alert branch. This alert is fired when a matching cell’s severity is set to warning by a rule, and remains valid until the cell’s severity drops below warning, rises to critical, or the cell is deleted.
Any number of notifications can be defined in an escalation chain. Initially only the first is fired. This escalates to the second after the escalation interval has passed, which then escalates to the third, if defined, and so on until either all notifications have been fired or the alert is no longer valid.
Mandatory: No
alerting > hierarchy > alert > warning > level
Defines a single notification.
Mandatory: No
alerting > hierarchy > alert > warning > level > escalationInterval
Period in seconds after which the alert, if still valid, escalates to the next notification in the escalation chain, if one has been defined. This defaults to 300 (5 minutes).
Mandatory: No
Default: 300
alerting > hierarchy > alert > warning > level > notification
Defines a single notification in the escalation chain of notifications. A notification can contain any number of effects, each with a separate distribution list and repeat interval.
Mandatory: No
alerting > hierarchy > alert > warning > level > notification > clear
Whether or not to fire a clear notification when the alert is cancelled. Clears are fired using the same effect and distribution lists as the alert, but with the variable _CLEAR set to true.
Mandatory: No
Default: false
alerting > hierarchy > alert > warning > level > notification > repeat
Repeat settings for the notification.
Mandatory: No
alerting > hierarchy > alert > warning > level > notification > repeat > interval
The repeat interval for the notification in seconds. The effect is fired each interval while the alert is valid and the restrictions are met.
Mandatory: Yes
alerting > hierarchy > alert > warning > level > notification > repeat > behaviour
Specifies how repeats behave after an escalation is triggered. By default (continueAfterEscalation), repeats continue firing even after an escalation; if set to cancelAfterEscalation, repeats stop once the alert escalates.
Mandatory: No
Default: continueAfterEscalation
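To make the interaction between escalationInterval, the repeat interval and the repeat behaviour concrete, the following Python sketch computes when each notification would fire for an alert that stays valid for a given number of seconds. It is a simplified model, not Gateway code: `Level` and `schedule` are hypothetical names, each level is assumed to hold a single notification with at most one repeat interval, suspension and restrictions are ignored, nothing fires at or after the moment the alert stops being valid, and under cancelAfterEscalation a repeat falling exactly at the moment of escalation is assumed not to fire.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Level:
    """Hypothetical escalation level holding a single notification."""

    escalation_interval: int = 300          # seconds; the documented default
    repeat_interval: Optional[int] = None   # seconds; None means no repeats


def schedule(levels: list[Level], valid_for: int,
             behaviour: str = "continueAfterEscalation") -> list[tuple[int, int]]:
    """Return sorted (seconds_since_alert, level_index) pairs for fired notifications.

    The alert is assumed to stop being valid valid_for seconds after it is raised.
    """
    if behaviour not in ("continueAfterEscalation", "cancelAfterEscalation"):
        raise ValueError(f"unknown repeat behaviour: {behaviour}")
    starts: list[int] = []
    t = 0
    for level in levels:
        if t >= valid_for:                   # cleared before reaching this level
            break
        starts.append(t)
        t += level.escalation_interval
    fired = []
    for i, start in enumerate(starts):
        fired.append((start, i))
        step = levels[i].repeat_interval
        if not step:
            continue
        stop = valid_for
        if behaviour == "cancelAfterEscalation" and i + 1 < len(starts):
            stop = min(stop, starts[i + 1])  # repeats end when the next level fires
        fired.extend((when, i) for when in range(start + step, stop, step))
    return sorted(fired)
```

With a 300-second escalation interval and a 120-second repeat on the first level, an alert valid for 700 seconds would, under this model, send the first notification at 0, 120, 240, 360, 480 and 600 seconds and escalate at 300 seconds; with cancelAfterEscalation the first-level repeats stop after 240 seconds.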
alerting > hierarchy > alert > warning > alwaysNotify
Specifies that the alert should fire even if other, more specific, matching alerts were fired further down this branch of the hierarchy tree. The default behaviour is for only the most specific matching alert to fire.
Mandatory: No
Default: false
alerting > hierarchy > alert > critical
Defines the critical notification for this alert branch. This alert is fired if a matching cell goes to critical severity, and remains valid until the cell drops below critical severity or is deleted.
Any number of notifications can be defined in an escalation chain. Initially only the first is fired. This escalates to the second after the escalation interval has passed, which then escalates to the third, if defined, and so on until either all notifications have been fired or the alert is no longer valid.
Mandatory: No
alerting > hierarchy > alert > critical > level
Defines a single notification.
Mandatory: No
alerting > hierarchy > alert > critical > level > escalationInterval
Period in seconds after which the alert, if still valid, escalates to the next notification in the escalation chain, if one has been defined. This defaults to 300 (5 minutes).
Mandatory: No
Default: 300
alerting > hierarchy > alert > critical > level > notification
Defines a single notification in the escalation chain of notifications. A notification can contain any number of effects, each with a separate distribution list and repeat interval.
Mandatory: No
alerting > hierarchy > alert > critical > level > notification > clear
Whether or not to fire a clear notification when the alert is cancelled. Clears are fired using the same effect and distribution lists as the alert, but with the variable _CLEAR set to true.
Mandatory: No
Default: false
alerting > hierarchy > alert > critical > level > notification > repeat
Repeat settings for the notification.
Mandatory: No
alerting > hierarchy > alert > critical > level > notification > repeat > interval
The repeat interval for the notification in seconds. The effect is fired each interval while the alert is valid and the restrictions are met.
Mandatory: Yes
alerting > hierarchy > alert > critical > level > notification > repeat > behaviour
Specifies how repeats behave after an escalation is triggered. By default (continueAfterEscalation), repeats continue firing even after an escalation; if set to cancelAfterEscalation, repeats stop once the alert escalates.
Mandatory: No
Default: continueAfterEscalation
alerting > hierarchy > alert > critical > alwaysNotify
Specifies that the alert should fire even if other, more specific, matching alerts were fired further down this branch of the hierarchy tree. The default behaviour is for only the most specific matching alert to fire.
Mandatory: No
Default: false
alerting > hierarchy > alert > warning > level > notification > effect
The effect that is to be fired for this notification.
Mandatory: Yes
alerting > hierarchy > alert > critical > level > notification > effect
The effect that is to be fired for this notification.
Mandatory: Yes
alerting > hierarchy > alert > warning > level > notification > user
Defines the user information that will be passed to the effect. Any number of users can be passed to an effect in one of three distribution lists: To, CC, and BCC.
Mandatory: No
alerting > hierarchy > alert > warning > level > notification > user > user
The name of the User, as defined in the Authentication section.
Mandatory: Yes
alerting > hierarchy > alert > warning > level > notification > user > infoType
The user information to include on the distribution list, as defined in the user information of the authentication section. Defaults to “Email” if not set.
Mandatory: No
Default: Email
alerting > hierarchy > alert > warning > level > notification > user > list
The distribution list to include the user on. Each notification has three distribution lists: To, CC, and BCC. Defaults to “To” if not set.
Mandatory: No
Default: To
alerting > hierarchy > alert > warning > level > notification > group
Defines group information that will be passed to the effect.
Groups have been deprecated in favour of roles, as authentication user groups have also been deprecated in favour of roles.
If you still want to configure groups, see the documentation for the role settings: role, infoType, list, and include.
Mandatory: No
Deprecated: See Roles settings.
alerting > hierarchy > alert > warning > level > notification > role
Defines the Role information that will be passed to the effect. Any number of roles can be passed to an effect in one of three distribution lists: To, CC, and BCC.
Mandatory: No
alerting > hierarchy > alert > warning > level > notification > role > role
The name of the Role, as defined in the Authentication section.
Mandatory: Yes
alerting > hierarchy > alert > warning > level > notification > role > infoType
The Role information to include on the distribution list, as defined in the Gateway Authentication. Defaults to “Email” if not set.
Mandatory: No
Default: Email
alerting > hierarchy > alert > warning > level > notification > role > list
The distribution list to include the role on. Each notification has three distribution lists: To, CC, and BCC. Defaults to “To” if not set.
Mandatory: No
Default: To
alerting > hierarchy > alert > warning > level > notification > role > include
What information from the role to include in the list. Can be one of three values:
Mandatory: No
Default: ROLE
Value | Effect |
---|---|
ROLE | Include only the information from the actual role section. |
MEMBERS | Include the information from all the role's individual member user sections. |
ROLE+MEMBERS | Include information from both the role section and all the role's individual member user sections. |
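The three include values can be illustrated with a Python sketch that builds the entries a role contributes to a distribution list for one infoType. The `User` and `Role` classes and the `distribution_entries` function are hypothetical stand-ins for the Authentication section, and skipping a role or user that lacks the requested infoType is an assumption rather than documented Gateway behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user: infoType name -> value, e.g. {"Email": "..."}."""

    name: str
    info: dict[str, str]


@dataclass
class Role:
    """Hypothetical role with its own information and its member users."""

    name: str
    info: dict[str, str]
    members: list[User] = field(default_factory=list)


def distribution_entries(role: Role, info_type: str = "Email",
                         include: str = "ROLE") -> list[str]:
    """Values a role contributes to a To/CC/BCC list for one infoType."""
    if include not in ("ROLE", "MEMBERS", "ROLE+MEMBERS"):
        raise ValueError(f"unknown include value: {include}")
    sources = []
    if include != "MEMBERS":                # ROLE or ROLE+MEMBERS
        sources.append(role.info)
    if include != "ROLE":                   # MEMBERS or ROLE+MEMBERS
        sources.extend(user.info for user in role.members)
    # ASSUMPTION: entries without the requested infoType are skipped.
    return [source[info_type] for source in sources if info_type in source]
```

For a role with a shared team mailbox, include=ROLE sends one message to that mailbox, while MEMBERS addresses each member individually and ROLE+MEMBERS does both.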
alerting > hierarchy > alert > critical > level > notification > user
Defines the user information that will be passed to the effect. Any number of users can be passed to an effect in one of three distribution lists: To, CC, and BCC.
Mandatory: No
alerting > hierarchy > alert > critical > level > notification > user > user
The name of the User, as defined in the Gateway Authentication.
Mandatory: Yes
alerting > hierarchy > alert > critical > level > notification > user > infoType
The user information to include on the distribution list, as defined in the user information of the authentication section. Defaults to “Email” if not set.
Mandatory: No
Default: Email
alerting > hierarchy > alert > critical > level > notification > user > list
The distribution list to include the user on. Each notification has three distribution lists: To, CC, and BCC. Defaults to “To” if not set.
Mandatory: No
Default: To
alerting > hierarchy > alert > critical > level > notification > group
Defines group information that will be passed to the effect.
Groups have been deprecated in favour of roles, as authentication user groups have also been deprecated in favour of roles.
If you still want to configure groups, see the documentation for the role settings: role, infoType, list, and include.
Mandatory: No
Deprecated: See Roles settings.
alerting > hierarchy > alert > critical > level > notification > role
Defines the Role information that will be passed to the effect. Any number of roles can be passed to an effect in one of three distribution lists: To, CC, and BCC.
Mandatory: No
alerting > hierarchy > alert > critical > level > notification > role > role
The name of the Role, as defined in the Authentication section.
Mandatory: Yes
alerting > hierarchy > alert > critical > level > notification > role > infoType
The Role information to include on the distribution list, as defined in the authentication section. Defaults to “Email” if not set.
Mandatory: No
Default: Email
alerting > hierarchy > alert > critical > level > notification > role > list
The distribution list to include the role on. Each notification has three distribution lists: To, CC, and BCC. Defaults to “To” if not set.
Mandatory: No
Default: To
alerting > hierarchy > alert > critical > level > notification > role > include
What information from the role to include in the list. Can be one of three values:
Mandatory: No
Default: ROLE
Value | Effect |
---|---|
ROLE | Include only the information from the actual role section. |
MEMBERS | Include the information from all the role's individual member user sections. |
ROLE+MEMBERS | Include information from both the role section and all the role's individual member user sections. |
alerting > hierarchy > alert > throttle Copied
Specify a throttle to apply to all notifications at this alert level, and all alert levels below this level unless overridden.
Mandatory: No
alerting > throttle Copied
Specifies an AlertThrottle for throttling alerts.
Mandatory: No
alerting > throttle > name Copied
This is a name to uniquely identify the throttle.
Mandatory: Yes
alerting > throttle > noOfAlerts Copied
This is the number of alerts allowed within the time interval.
Mandatory: Yes
alerting > throttle > per Copied
This is the number of time units used to define throttling duration. For example if you were setting a throttle of one action per ten minute interval. It would be “10”.
Mandatory: Yes
alerting > throttle > interval
The time unit used by the per setting: seconds, minutes or hours. Together with noOfAlerts and per, this defines the throttle as a number of alerts per interval.
Mandatory: Yes
alerting > throttle > grouping
Groupings allow a throttle to keep different counters for different logical groups.
Mandatory: No
alerting > throttle > grouping > paths
Groupings allow a throttle to keep different counters for different logical groups. Each group is defined by a collection of XPaths which are evaluated when the alert is fired. See Geneos XPaths for more information on XPaths.
Mandatory: No
alerting > throttle > grouping > paths > path
Groupings allow a throttle to keep different counters for different logical groups. Each group is defined by a collection of XPaths which are evaluated when the alert is fired. There is also a default group to throttle items that do not match the grouping criteria.
The results of evaluating these XPaths are combined to uniquely identify the throttling group.
See the Grouping section for more information.
Mandatory: No
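The throttle settings combine into a rate: noOfAlerts alerts in every period of per interval units. For example, noOfAlerts = 1, per = 10 and interval = minutes gives one alert per 600 seconds. The C++ sketch below is a minimal fixed-window throttle that keeps one counter per grouping key. The fixed-window strategy, the `Throttle` class and the use of a single string to stand for the combined XPath results are all assumptions for illustration; the Gateway's internal algorithm is not described here.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical fixed-window throttle: allows `noOfAlerts` alerts per period,
// counting each grouping key separately.
class Throttle {
public:
    Throttle(int noOfAlerts, std::chrono::seconds period)
        : limit_(noOfAlerts), period_(period) {}

    // groupKey stands in for the combined grouping XPath results; an empty
    // string can model the default group. Returns true if the alert is allowed.
    bool allow(const std::string& groupKey, std::chrono::seconds now) {
        Counter& c = counters_[groupKey];
        // Start a new window for a new group, or when the old window has expired.
        if (!c.started || now - c.windowStart >= period_) {
            c.started = true;
            c.windowStart = now;
            c.count = 0;
        }
        if (c.count < limit_) {
            ++c.count;
            return true;
        }
        return false;  // blocked: limit reached within the current window
    }

private:
    struct Counter {
        bool started = false;
        std::chrono::seconds windowStart{0};
        int count = 0;
    };
    int limit_;
    std::chrono::seconds period_;
    std::map<std::string, Counter> counters_;
};
```

With one counter per key, a burst of alerts from one group cannot use up the allowance of another group.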
alerting > throttle > summary
Defines when summary effects should be fired.
Mandatory: No
alerting > throttle > summary > send
This is the number of time units after which the summary effect should be fired.
Mandatory: Yes
alerting > throttle > summary > interval
The time unit used by the send setting: seconds, minutes or hours.
Mandatory: Yes
alerting > throttle > summary > strategy
The strategy used to decide when the summary effect is fired: either a configured time after the first allowed alert, or a configured time after the first blocked alert.
Mandatory: Yes
alerting > throttle > summary > effect
The effect to fire for an Alert summary.
Mandatory: Yes
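The two summary strategies differ only in which event starts the clock. The hypothetical C++ sketch below computes when the summary effect would fire, with the delay standing for send combined with interval. The `SummaryStrategy` enumerator names are invented for this example and are not the configuration values.

```cpp
#include <chrono>
#include <optional>

// Hypothetical names for the two documented strategies.
enum class SummaryStrategy { AfterFirstAllowed, AfterFirstBlocked };

// Returns when the summary effect would fire, or std::nullopt if the event
// that starts the clock for the chosen strategy has not happened yet.
std::optional<std::chrono::seconds> summaryFireTime(
    SummaryStrategy strategy,
    std::optional<std::chrono::seconds> firstAllowed,
    std::optional<std::chrono::seconds> firstBlocked,
    std::chrono::seconds delay) {  // send, expressed in seconds
    const std::optional<std::chrono::seconds>& start =
        strategy == SummaryStrategy::AfterFirstAllowed ? firstAllowed
                                                       : firstBlocked;
    if (!start) {
        return std::nullopt;
    }
    return *start + delay;
}
```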
Effects
Introduction
Effects are defined routines that can be performed by the gateway. They cannot be called directly, but are called as part of an action or alert. An effect can be thought of as a cut-down action.
Effects allow the gateway to interface with other external systems, so that monitoring data can trigger other events in addition to being displayed in Active Console. For instance, effects can be created to send emails or pager messages.
Operation
Effects are called by actions and alerts, but are always run in the context of the specific data-item which caused the event, such as a Managed Variable that triggered a rule.
The value and other attributes of this item are made available to the effect, allowing its operation to be customised depending on these values. Depending on the type of effect being fired, the values are passed to the effect in different ways. Please refer to the appropriate effect configuration section for further details. The information passed to an effect also differs slightly depending on whether it is called by an action or an alert.
Values passed to effects include the following:
- Data identifying the data-item and action or alert being fired.
- If the data-item is from a dataview table, then additional values from the dataview row.
- Any managed entity attributes which have been configured.
- For effects called from actions, additional user data as configured in an environment.
- A list of knowledge-base articles which apply to the data-item.
Effect Configuration
Basic Configuration
Effects are configured within the Effects top-level section of the gateway setup. Configuration consists of a list of effect definitions, each specifying what will be done when the effect is fired. As effects are referenced by name in other parts of the gateway setup, each effect must have a name that is unique among all effects to prevent ambiguity.
Script effects
A script effect can run a shell-script or executable file. The minimum required configuration for this type of effect is the executable to run, the command-line (which may be empty, but must still be specified) and the location to run this effect.
Depending upon the configured runLocation, this effect will run either on the Gateway or Netprobe hosts. Netprobe effects will run on the Netprobe containing the data-item that triggered the effect, unless another Netprobe has been explicitly specified with the probe setting.
An effect run on Netprobe requires that probes > probe > encodedPassword is specified in the probe configuration. If it is not specified, the Netprobe returns the error “Remote Command not executed - invalid password”. If there is no password to configure, run the Netprobe with the -nopassword flag to avoid this problem.
For an effect which executes on the Gateway, the value of the exeFile setting is checked to ensure that the executable is accessible by the Gateway. If it is not, the Gateway will be unable to execute the effect and a setup validation error is produced. If an absolute path to the executable is not specified, the Gateway prepends ./ to the path.
Note
This validation cannot be performed for effects which run on Netprobe.
The behaviour for Effects changed in Geneos v4.7 to provide consistent behaviour between effects run on the Gateway and on the Netprobe. The default shell is now used to run script effects on the Gateway. As an alternative to running a script, an action can execute a user-defined command that is set to run an external program. In this scenario, the program does not use the shell.
Some factors to consider when deciding whether to use the shell to run a script (or other external program) are:
- The shell enables full control of the arguments passed to the script and allows redirection of the output.
- The shell involves the cost of an additional process.
Note
This behaviour change also applies for Actions. See Script actions.
Below is an example of an effect with quoted arguments and a redirect. When executed, it writes [He][said]["Hello, World"] to action.log.
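The output format suggests an executable that prints each of its arguments in brackets. A minimal C++ sketch of such a helper, assuming the shell has already split the command line into the three arguments He, said and "Hello, World" (the double quotes being part of the third argument); `bracketArgs` is a hypothetical name:

```cpp
#include <string>

// Wraps each command-line argument in brackets so that the shell's word
// splitting and quote handling become visible in the output.
std::string bracketArgs(int argc, char** argv) {
    std::string out;
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
        out += "[";
        out += argv[i];
        out += "]";
    }
    return out;
}
```

A `main` function would write `bracketArgs(argc, argv)` to standard output, and the redirect in the effect's arguments would then send it to action.log.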
When executing a script effect, the script / executable being run is passed the values and attributes of the data-item which triggered the alert or action that called the effect. These are passed as environment variables, which the script can then read and respond as required. The environment variables which are passed are listed in the actions and alert sections.
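A compiled script effect can read these values with `std::getenv`. The sketch below assumes the variable names `_SEVERITY` and `_VARIABLEPATH`; confirm the exact names in the actions and alert sections before relying on them. `envOr` and `describeEvent` are hypothetical helpers.

```cpp
#include <cstdlib>
#include <string>

// Returns the named environment variable, or fallback if it is not set
// (for example when the program is run by hand for testing).
std::string envOr(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : fallback;
}

// Builds a one-line description of the triggering data-item.
// _SEVERITY and _VARIABLEPATH are assumed variable names.
std::string describeEvent() {
    return envOr("_SEVERITY", "UNKNOWN") + " " +
           envOr("_VARIABLEPATH", "<no path>");
}
```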
Shared library
Shared library effects execute functions from within a shared library. Library effects are more versatile than script effects since they can store state between different executions of an effect; however, they also require more effort on the part of the user to create.
Library effects currently only execute on the gateway, and require at minimum the library file and function name to be configured. As with script effects, these settings are checked by the gateway during setup validation to ensure the function can be found, so that an invalid configuration is detected immediately rather than when the effect is run.
Shared library functions must have the following prototype (similar to the main function of a basic C program):
extern "C" int functionName(int argc, char** argv);
When a library effect is executed, the values and attributes of the data-item which triggered the action or alert are passed to it. These are passed as an array of strings of the form NAME=VALUE in the argv parameter. The number of values passed is given in the argc parameter. These variables are listed in the actions and alert sections.
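Because every value arrives as a NAME=VALUE string, a library function will usually start by splitting argv into a lookup table. The C++ sketch below does that. `parseArgs`, `exampleEffect` and the `_SEVERITY` name are hypothetical, and returning non-zero to signal a problem is an assumption of this sketch rather than documented behaviour.

```cpp
#include <map>
#include <string>

// Splits NAME=VALUE strings into a lookup table. Only the first '=' is
// treated as the separator, so values may themselves contain '='.
// Entries without '=' are skipped.
std::map<std::string, std::string> parseArgs(int argc, char** argv) {
    std::map<std::string, std::string> params;
    for (int i = 0; i < argc; ++i) {
        std::string entry(argv[i]);
        std::string::size_type eq = entry.find('=');
        if (eq == std::string::npos) {
            continue;
        }
        params[entry.substr(0, eq)] = entry.substr(eq + 1);
    }
    return params;
}

// Hypothetical effect function matching the documented prototype.
extern "C" int exampleEffect(int argc, char** argv) {
    std::map<std::string, std::string> params = parseArgs(argc, argv);
    // Assumed convention for this sketch: 0 on success, 1 if a value is missing.
    return params.count("_SEVERITY") != 0 ? 0 : 1;
}
```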
See the Shared library section for an example of a shared library action.
Command effects
Command-type effects can run any command supported by the gateway. These commands are referenced by name (as commands are uniquely named), and the configuration must supply all arguments expected by the command in order to be valid. The number and type of arguments expected will vary according to the command being referenced.
Arguments can be specified with a static value, text value or parameter value. A static value is the same every time the effect is executed. A text value can vary, depending on the values of Geneos variables (evaluated in the environment of the command's target data-item). A parameter value lets users select a variable value, which is populated from the data-item that triggered the alert or action, similar to the environment variables passed in by actions and alerts.
An example is shown below using the /SNOOZE:time command.
This command snoozes a data-item for a specified time period, and takes arguments as specified in the table below:
Index | Description | Argument Type | (Default) Value |
---|---|---|---|
1 | User comment | User input: MultiLineString | |
2 | Item severity | XPath | state/@severity |
3 | Snooze type | Static | time |
4 | Snooze duration | User input: Float | 24 |
5 | Snooze units | User input: Options | Hours (3600 - default), minutes (60), days (86400). |
Of the arguments listed, three are user-input arguments: those at indexes 1, 4 and 5. To execute the command, these arguments must have values specified. Arguments 2 and 3 have defaults specified, and so take these values if they are not overridden.
Configuration Settings
Effects are configured in the Effects top-level section of the gateway setup. Configuration consists of a list of effect definitions, each of which contains the minimum required configuration for its type. Each effect is identified by a user-supplied name, which must be unique among all effects to prevent ambiguity, as effects are referenced by name.
Mandatory: Yes
Common effect settings
The settings below are common to all types of effect.
effects > effect
An effect definition contains the configuration required for a single effect. The minimum configuration required will vary depending upon the type of effect being configured.
effects > effect > name
Effects are referenced by other parts of the gateway setup by name. In order to avoid ambiguity, effects are required to have a unique name among all other effects.
Mandatory: Yes
Script effect settings
The settings below define a script type effect.
effects > effect > script
Script type effects allow the gateway to run a shell-script or executable file in response to gateway events. See the script effects section above for more details.
Mandatory: Yes (one of script, sharedLibrary or command must be specified for an effect)
effects > effect > script > exeFile
Specifies the shell script or executable file which will be run when the effect is called. For script effects which run on the gateway (configured using the runLocation setting), this parameter is checked at setup validation time to ensure that the file exists.
Mandatory: Yes
effects > effect > script > arguments
This setting specifies the command-line arguments which will be passed to the script or executable when the effect is called.
Mandatory: Yes
effects > effect > script > runLocation
The run location specifies where the script should be run. Valid values are detailed below.
Mandatory: Yes
Value | Effect |
---|---|
gateway | The script is run on the gateway. |
netprobe | The script is run on the netprobe from which the triggering data-item came, unless this is overridden using the probe setting. An effect run on Netprobe requires that probes > probe > encodedPassword is specified in the probe configuration. |
effects > effect > script > probe
This setting allows users to configure a specific netprobe to run the script on, when the script has been configured to run on netprobes using the runLocation setting.
Mandatory: No
Shared library effect settings
The settings below define a shared library type effect.
effects > effect > sharedLibrary
Shared library type effects allow the gateway to run a function from a shared library in response to gateway events.
Mandatory: Yes (one of script, sharedLibrary or command must be specified for an effect)
effects > effect > sharedLibrary > libraryFile
Specifies the location of the shared library to use for this effect. This setting is checked during setup validation to ensure that gateway can access this library.
Mandatory: Yes
effects > effect > sharedLibrary > functionName
Specifies the name of the shared library function to run, inside the library specified by the libraryFile setting. This setting is checked during setup validation to ensure that this function exists within the shared library.
Mandatory: Yes
effects > effect > sharedLibrary > runThreaded
Optional Boolean setting specifying whether to run the shared library function within a thread or not. Running an effect in a thread is slightly less efficient but is recommended for library effects which take some time to complete, to ensure that execution does not interfere with normal gateway operation.
Note
Shared library functions using this setting which maintain state should be written to be thread-safe to avoid potential problems.
Mandatory: No
Default: true
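As an illustration of the thread-safety note, the hypothetical library effect below keeps a counter between executions and guards it with a `std::mutex`, so overlapping threaded invocations cannot corrupt it. `countingEffect` and `currentFireCount` are invented names for this sketch.

```cpp
#include <mutex>

namespace {
std::mutex stateMutex;  // guards fireCount
long fireCount = 0;     // state kept between executions of the effect
}

// Hypothetical stateful effect matching the documented prototype.
extern "C" int countingEffect(int argc, char** argv) {
    (void)argc;  // the passed-in values are not used in this sketch
    (void)argv;
    std::lock_guard<std::mutex> lock(stateMutex);
    ++fireCount;
    return 0;
}

// Reads the counter under the same lock.
long currentFireCount() {
    std::lock_guard<std::mutex> lock(stateMutex);
    return fireCount;
}
```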
effects > effect > sharedLibrary > staticParameters
Defines static parameters which are always passed to the shared library function along with any values passed in by the action or alert.
Mandatory: No
effects > effect > sharedLibrary > staticParameters > staticParameter > name
Name of the static parameter. This must be unique. If a parameter of the same name is passed in by the calling action or alert, that value overrides this setting.
Mandatory: Yes
effects > effect > sharedLibrary > staticParameters > staticParameter > value
Value of static parameter.
Mandatory: Yes
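The override rule for static parameters amounts to a map merge in which the caller's values win. A minimal C++ sketch, with `mergeParameters` as a hypothetical helper and the parameter names invented for the example:

```cpp
#include <map>
#include <string>

using Params = std::map<std::string, std::string>;

// Starts from the static parameters and lets values passed in by the
// calling action or alert replace any static parameter with the same name.
Params mergeParameters(Params staticParams, const Params& passedIn) {
    for (const auto& [name, value] : passedIn) {
        staticParams[name] = value;
    }
    return staticParams;
}
```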
Command effect settings
The settings below define a command type effect.
effects > effect > command
Command type effects allow the gateway to run an internal or user-defined command in response to gateway events. See the command effects section above for more details.
Mandatory: Yes (one of script, sharedLibrary or command must be specified for an effect)
effects > effect > command > ref
This setting specifies which command will be executed when this command type effect is fired. Commands are referenced using the unique command name.
Mandatory: Yes
effects > effect > command > args
This section allows the effect to supply arguments to the command. If a command has any arguments without default values, then these must be specified so that the command can be run. This condition is checked during setup validation.
Mandatory: No (unless the command has arguments without default values)
effects > effect > command > args > arg
Each arg definition specifies a single argument to pass to the command.
effects > effect > command > args > arg > target
The target setting specifies which argument in the command this definition applies to. Command arguments are numbered from one. For example, a target value of four means that the contents of this definition will be supplied as the fourth argument to the specified command.
Mandatory: Yes
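Taken together, the target numbering and the setup-validation check for missing arguments can be sketched as follows. `buildArgs` and `CommandArg` are hypothetical, and argument types (XPath, user input and so on) are ignored: each argument is either supplied by the effect, taken from the command's default, or reported missing. In the /SNOOZE:time example from the Command effects section, arguments 1, 4 and 5 would be supplied and arguments 2 and 3 would fall back to their defaults.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// One command argument as seen by this sketch: its default, if any.
struct CommandArg {
    std::optional<std::string> defaultValue;
};

// Builds the argument list for a command. `supplied` maps target numbers
// (counted from one) to values configured in the effect. Returns
// std::nullopt if an argument has neither a supplied value nor a default.
std::optional<std::vector<std::string>> buildArgs(
    const std::vector<CommandArg>& commandArgs,
    const std::map<int, std::string>& supplied) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < commandArgs.size(); ++i) {
        int target = static_cast<int>(i) + 1;  // targets are numbered from one
        auto it = supplied.find(target);
        if (it != supplied.end()) {
            out.push_back(it->second);
        } else if (commandArgs[i].defaultValue) {
            out.push_back(*commandArgs[i].defaultValue);
        } else {
            return std::nullopt;  // required argument missing
        }
    }
    return out;
}
```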
effects > effect > command > args > arg > static
Specifies a static value for the command argument. This value will be the same for all executions of this effect.
Mandatory: Yes (mutually exclusive with text or parameter below)
effects > effect > command > args > arg > text
A variable argument value for the command. This can include static text or Geneos variables, which are evaluated to their respective values depending on the target data-item the command is executed on. For example, if a Geneos variable “OS” is defined with different values on two Managed Entities, and the command is run on data-items in both, each command instance receives the value of “OS” for its own Managed Entity. The argument type is singleLineStringVar and can consist of static data, any number of Geneos variables, or both interleaved together. For example, “Host:$(OS)-$(VERSION)” where “OS” and “VERSION” are two pre-defined Geneos variables. Currently only the values of the following variable types can be properly converted to a string:
Mandatory: Yes (mutually exclusive with static or parameter below)
Variable Type | Value |
---|---|
boolean | "true" if checked, "false" otherwise |
double | The actual double value specified |
integer | The actual integer value specified |
externalConfigFile | The name of an external configuration file. |
macro | The value of the macro selected - gateway name or gateway port or managed entity name or probe host or probe name or probe port or sampler name. |
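A minimal C++ sketch of the $(NAME) substitution described above. It assumes the variable values have already been converted to strings as in the table, and it leaves unknown or unterminated references unchanged; how the Gateway treats undefined variables is not specified here, so that choice is an assumption. `expandVariables` is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Replaces each $(NAME) in text with vars[NAME].
std::string expandVariables(const std::string& text,
                            const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t start = text.find("$(", pos);
        if (start == std::string::npos) {
            out += text.substr(pos);  // no more references
            break;
        }
        std::size_t end = text.find(')', start + 2);
        if (end == std::string::npos) {
            out += text.substr(pos);  // unterminated reference: copy as-is
            break;
        }
        out += text.substr(pos, start - pos);  // static text before $(
        std::string name = text.substr(start + 2, end - start - 2);
        auto it = vars.find(name);
        out += (it != vars.end()) ? it->second
                                  : text.substr(start, end - start + 1);
        pos = end + 1;
    }
    return out;
}
```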
effects > effect > command > args > arg > parameter
Specifies a parameterised value for the command argument. This value is obtained from the data-item which triggered the action or alert that called the effect, and so can change on every execution. Possible values are listed below.
Mandatory: Yes (mutually exclusive with static or text above)
Value | Effect |
---|---|
action | The name of the action being triggered. |
severity | The data-item severity. One of UNDEFINED, OK, WARNING, CRITICAL or USER. |
variablePath | The full gateway path to the data-item. |
gateway | The name of the gateway firing the action. |
probe | The name of the probe the data-item belongs to (if any). |
probeHost | The hostname of the probe the data-item belongs to (if any). This value is provided for backwards compatibility. |
managedEntity | The name of the managed entity the data-item belongs to (if any). |
sampler | The name of the sampler the data-item belongs to (if any). |
dataview | The name of the dataview the data-item belongs to (if any). |
variable | Short name of the data-item if it is a managed variable, in the form <!>name for headlines or row.col for table cells. This value is provided for backwards compatibility. |
repeatCount | The number of times this action has been repeated for the triggering item. |
Annotations
Annotations solve the following problems.
Conditional text in email templates
libemail, like any Geneos shared library, takes a number of parameters.
Anywhere you can use a standard parameter, you can use an annotation.
The parameters are set up with the action or effect definition. Additionally, you could add user data from a rule that causes it to fire.
However, if you want parameters that are available only on particular data-items, or that contain different text on different data-items, while keeping your configuration simple, annotations are the answer.
With annotations you can define key/value pairs and target them to data-items for use in your email templates for alerting and actions.
Optional environment settings for executable actions
Similarly, annotations can reduce complexity in action configuration. The annotations defined in this section become environment variables for an executable script triggered through actions or effects.
Note
Annotations are added before user data and rule-specific variables. This could lead to annotations being overwritten or, on some platforms, duplicated. For this reason, it is advisable to use keys that differ from the names of environment variables and shared library parameters.
annotations > annotation > name
A unique name for the annotation within the configuration.
annotations > annotation > key
This is the name by which the annotation is known when substituted for an environment variable or a static command argument. It does not have to be unique; if more than one annotation with the same key applies, the values are combined.
The values are separated by new lines and the order is undefined. If ordering is important then separate annotations should be used to enforce order.
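The combining rule, together with the ordering issue in the note above, can be sketched in C++ as two steps: annotations sharing a key are joined with newlines, and user data applied afterwards overwrites any annotation with the same key. Both functions are hypothetical, and this sketch keeps input order even though the Gateway does not guarantee any order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Env = std::map<std::string, std::string>;

// Joins the values of annotations that share a key with newlines.
Env combineAnnotations(
    const std::vector<std::pair<std::string, std::string>>& annotations) {
    Env combined;
    for (const auto& [key, value] : annotations) {
        auto it = combined.find(key);
        if (it == combined.end()) {
            combined.emplace(key, value);
        } else {
            it->second += "\n" + value;
        }
    }
    return combined;
}

// Applied after the annotations, so a clashing key replaces the annotation.
void applyUserData(Env& env, const Env& userData) {
    for (const auto& [key, value] : userData) {
        env[key] = value;
    }
}
```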
annotations > annotation > value
The text to be substituted for the given key.
annotations > annotation > targets
A collection of XPaths used to identify applicable targets for the annotation.
annotations > annotation > targets > targets
An XPath identifying targets for the annotation.