Email Alert not Received or Delayed
Related topics
Email alert not sent, email action not sent, email action not triggering, email alert delay, email action delay, emails arriving late, email not arriving on time, email alert not received
Problems
- Problem 1 - A user sets up a rule that calls an email action when a severity threshold is reached. The rule target hits the severity threshold, but the user does not receive the email alert.
- Problem 2 - The email action called by a rule is delayed and is not sent at the time the issue occurs.
Possible causes
To help determine which of the causes listed below is responsible for the problem you are experiencing, first find out which rule generated the email alert. To do that, right-click the cell or headline cell that has the severity set, and select the Show Rules option in the right-click menu.
This brings up the Output window, which shows the name of the rule and, highlighted in yellow, the rule logic that the data item in question met. Once you confirm that the intended rule logic and action are highlighted, search for the email action name in the Gateway log and check whether you see any of the following INFO: ActionManager messages.
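The log search described above can be sketched in a few lines of Python. This is a minimal illustration, not a Geneos tool: the function name `find_action_lines` is hypothetical, and it assumes the action name appears in single quotes in the log, as it does in the messages quoted in this article.

```python
def find_action_lines(log_lines, action_name):
    """Return the ActionManager log lines that mention the given action name."""
    quoted = f"'{action_name}'"  # assumption: the log quotes the action name
    return [
        line.rstrip("\n")
        for line in log_lines
        if "INFO: ActionManager" in line and quoted in line
    ]


# Sample input: the first line is quoted from this article, the others are
# placeholders that should be filtered out.
sample = [
    "INFO: ActionManager Firing action 'Email Alert'",
    "INFO: ActionManager Firing action 'Some Other Action'",
    "an unrelated line that mentions 'Email Alert'",
]
matches = find_action_lines(sample, "Email Alert")
for line in matches:
    print(line)
```

To run it against a real log, pass an open file object instead of the sample list, for example `find_action_lines(open(path), "Email Alert")`, where `path` is the location of your Gateway log.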
Problem 1
- Possible Cause 1 - If you see the INFO: ActionManager messages below, it means the critical severity occurred after a configuration change. By default, actions are not fired when a rule condition is true after a configuration change.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' would have fired, but stopped as this was as a result of a configuration change
- Possible Cause 2 - If the Gateway or Netprobe was just started, by default the action does not trigger, to avoid false alerts. The Gateway log shows the following INFO: ActionManager messages if this is the case.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' would have fired, but stopped as this was during the startup of a component
- Possible Cause 3 - By default, an action triggers only if the target of the rule and all its ancestors (any levels above it, such as the dataview and managed entity) are not snoozed. If you see the INFO: ActionManager messages below, it means the data item or one of its ancestors has been snoozed, which prevents the action from triggering.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' did not fire because variable '/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]' is snoozed
- Possible Cause 4 - If you see the INFO: ActionManager messages below, it means the action did not trigger because a dataview item was created and transitioned from undefined to OK severity.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Action 'Email Alert' would have fired, but stopped as this was as a result of transition from undefined to OK state
- Possible Cause 5 - If the Gateway log shows that the Netprobe went down or disconnected, the email action is removed and is not sent.
- Possible Cause 6 - If the Gateway log shows INFO: ActionManager messages saying the action has fired, as seen below, first check which Gateway version and CentOS/RHEL version the server is running. If the server is running CentOS/RHEL 8, the Gateway version is lower than GA5.9.x, and /var/log/maillog does not show any entries for the email being sent from the server, then the cause is the Gateway version you are using. We found that the CentOS/RHEL 8 system version of libcrypto.so.1.1 conflicts with the one shipped with the Gateway.
INFO: ActionManager Action DataItem 'Email Alert' generated (variable=/geneos/gateway[(@name="gateway_name")]/directory/probe[(@name="netprobe_name")]/managedEntity[(@name="managed_entity_name")]/sampler[(@name="sample_name")][(@type="")]/dataview[(@name="Log Files")]/rows/row[(@name="example_log.txt")]/cell[(@column="status")])
INFO: ActionManager Firing action 'Email Alert'
2021-09-10 16:50:51.085-0400 INFO: ActionManager Finished executing '/export/home/itrs/geneos-utils-master/system/scripts/email.pl' with arguments ''.
2021-09-10 16:50:51.085-0400 INFO: ActionManager Completed action 'Email Alert', Exit code: 0
- Possible Cause 7 - If the Gateway log shows the email action completed with an exit code other than 0, the exit code comes from the email script itself.
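As a rough aid for matching a log line to one of the causes above, the following Python sketch looks for the phrases quoted in this article. It is an illustration under stated assumptions: the function name `classify` and the returned labels are hypothetical, and the phrases are copied from the sample messages here, so other Gateway versions may word them differently. Cause 5 is not covered because this article does not quote a log message for it.

```python
import re

# Phrases copied from the log messages quoted in this article, each mapped
# to the cause it indicates. The labels are this sketch's own wording.
SUPPRESSION_PHRASES = [
    ("as a result of a configuration change", "Cause 1: configuration change"),
    ("during the startup of a component", "Cause 2: component startup"),
    ("is snoozed", "Cause 3: item or ancestor snoozed"),
    ("transition from undefined to OK state", "Cause 4: undefined to OK transition"),
]

# Matches lines such as: Completed action 'Email Alert', Exit code: 0
EXIT_CODE = re.compile(r"Completed action '.*', Exit code: (-?\d+)")


def classify(line):
    """Return a label for the likely cause, or None if nothing is recognized."""
    for phrase, cause in SUPPRESSION_PHRASES:
        if phrase in line:
            return cause
    match = EXIT_CODE.search(line)
    if match:
        code = int(match.group(1))
        if code == 0:
            # The script reported success, so check the Cause 6 conditions.
            return "Action completed with exit code 0: check Cause 6"
        return f"Cause 7: email script exited with code {code}"
    return None


line = ("INFO: ActionManager Action 'Email Alert' would have fired, "
        "but stopped as this was during the startup of a component")
print(classify(line))
```

A line that returns None is not necessarily irrelevant; it only means none of the phrases listed in this article appear in it.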
Problem 2
- Possible Cause 1 - If the highlighted block in the Show Rules output contains a delay, the rule itself has a delay, so the action is not triggered right away.
- Possible Cause 2 - If the highlighted block in the Show Rules output references a throttle, the action itself is being throttled. To read more about throttles, see Action Throttling.
Possible solutions
Possible solutions to Problem 1
- Possible Solution to Cause 1 - To change the default behavior so that an action fires following a configuration change, go to the main Actions folder in the Gateway Setup Editor, open the Advanced tab, and select the Fire on configuration change setting. Please note this setting affects all actions.
- Possible Solution to Cause 2 - To change the default behavior so that an action triggers right after Gateway or Netprobe startup, go to the main Actions folder in the Gateway Setup Editor, open the Advanced tab, and select the Fire on component startup setting. Please note this setting affects all actions.
- Possible Solution to Cause 3 - You can change this default behavior in the Action record itself: go to the Action's Advanced tab and change the Snoozing setting from the default Fire if item and ancestors not snoozed to Always Fire or Fire if item not snoozed.
To learn more about the differences between these settings, check our documentation. Please note this setting affects all rules that call this action.
- Possible Solution to Cause 4 - To change this default behavior so that an action can fire when a dataview item is created and transitions from undefined to OK severity, go to the main Actions folder in the Gateway Setup Editor, open the Advanced tab, and select the Fire on create with ok severity setting. Please note this setting affects all actions.
- Possible Solution to Cause 5 - When the Netprobe instance comes back up, you should see the email alert generated in the Gateway log.
- Possible Solution to Cause 6 - Upgrade your Gateway to version GA5.9.x or higher. If you are unable to upgrade, as a workaround you may remove or rename the libcrypto.so* files in the Gateway lib64 directory so that the Gateway picks up the system version. However, this may have consequences for other Gateway functionality.
- Possible Solution to Cause 7 - In this case, we recommend executing the email script directly on the Gateway server to confirm whether the script exits with the same code. If it does, you will need to troubleshoot and fix the email script.
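For Cause 7, the direct run can be sketched as below: run a command outside the Gateway and report the exit code it returns. This is a minimal illustration. The helper name `run_and_report` is hypothetical, and the command shown is a stand-in that deliberately exits with code 3 so the example runs anywhere; substitute the path of your own email script.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command, print anything it wrote, and return its exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.stdout:
        print("stdout:", result.stdout.strip())
    if result.stderr:
        print("stderr:", result.stderr.strip())
    return result.returncode


# Stand-in command (not an email script): a Python one-liner that exits with 3.
exit_code = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
print("exit code:", exit_code)
```

Compare the printed exit code with the one in the Completed action line of the Gateway log. If the script relies on environment variables supplied by the Gateway, a direct run may need them set by hand before the results are comparable.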
Possible solutions to Problem 2
- Possible Solution to Cause 1 - You can go back to the rule record and remove the delay from the rule block and save the change. The next time the rule triggers you should no longer see the delay taking place. You can read more about rule delay in our documentation.
- Possible Solution to Cause 2 - You can go back to the rule record and remove the throttle reference from the rule block and save the change. The next time the rule triggers, the action should fire immediately.
Related documentation
Gateway Rules, Actions, and Alerts