How to monitor a log file for X number of errors within a specific timeframe using the FKM sampler
In some cases, a log file monitored via FKM should send an alert or set the status to critical only after the trigger keyword has appeared a certain number of times within a given period. This can be achieved by using a combination of rules and actions.
Follow these steps:
- Create the FKM sampler.
In the Advanced tab, enable Trigger count and Seconds since last trigger. Both columns are essential for occurrence-based and time-based monitoring:
For more information, see FKM Overview and FKM Configuration Guide.
- Create a few actions which the rule can run.
For this example, three actions have been set up:
  - AcceptFile: accepts the file to reset monitoring. For more details, see FKM plug-in.
  - AssignToMe
  - Unassign
These serve as sample actions to run when the conditions are satisfied and when they are no longer satisfied. You can set up your own actions instead, such as scripts, internal (Active Console) commands, or Linux commands. You can find out more from this guide.
- Create rules to run the actions.
Be sure to set the path of the initial value to be monitored. Based on the current setup, the following data view is shown:
The cell secondsSinceLastTrigger is the initial target of the rule.
A path alias named “triggerCount”, found on the Advanced tab, was also added so that this value can be evaluated in the rule:
The following rule block is defined to achieve the requirement:
if value > 60 and path "triggerCount" value < 5 then
    severity ok
    run "Action-AcceptFile"
    run "Action-Unassign"
elseif value < 60 and path "triggerCount" value > 5 then
    severity critical
    run "Action-AssignToMe"
endif
Here, value refers to secondsSinceLastTrigger (the time), and path “triggerCount” refers to the number of occurrences. If more than 60 seconds have passed since the last trigger and the keyword has occurred fewer than 5 times, the rule sets the severity to OK and accepts the file so that the previous trigger details displayed on the data view are cleared, as shown in the screens below:
If triggerCount (the number of occurrences) exceeds 5 and the last trigger occurred less than 60 seconds ago, the severity is set to critical and the configured action is run, in this case AssignToMe:
The AssignToMe action can be replaced with one that sends out alerts. The status may remain critical until the necessary actions, including accepting the file to clear the data view, are taken.
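
To check which combinations of values each branch of the rule covers, the decision logic can be sketched outside the Gateway. The Python below is only an illustrative model of the two conditions in the rule block; it is not Geneos code, and the function name evaluate_rule and the two constants are hypothetical names introduced for this example. It also does not model how the Gateway treats a cell when neither branch matches.

```python
from typing import List, Optional, Tuple

# Thresholds copied from the rule block; the constant names are hypothetical.
TIME_WINDOW_SECONDS = 60   # compared against secondsSinceLastTrigger
TRIGGER_THRESHOLD = 5      # compared against triggerCount


def evaluate_rule(seconds_since_last_trigger: float,
                  trigger_count: int) -> Tuple[Optional[str], List[str]]:
    """Mirror the two branches of the rule; return (severity, actions to run)."""
    if (seconds_since_last_trigger > TIME_WINDOW_SECONDS
            and trigger_count < TRIGGER_THRESHOLD):
        # Quiet period with few occurrences: clear the view and unassign.
        return "ok", ["Action-AcceptFile", "Action-Unassign"]
    if (seconds_since_last_trigger < TIME_WINDOW_SECONDS
            and trigger_count > TRIGGER_THRESHOLD):
        # Many occurrences and a recent trigger: escalate.
        return "critical", ["Action-AssignToMe"]
    # Neither branch matched, so this rule sets no severity and runs no action.
    return None, []


if __name__ == "__main__":
    samples = [(120, 2), (10, 8), (10, 5), (60, 8), (120, 8)]
    for seconds, count in samples:
        print(seconds, count, evaluate_rule(seconds, count))
```

Because both comparisons are strict, a triggerCount of exactly 5 or a secondsSinceLastTrigger of exactly 60 falls into neither branch, as does a high count with an old last trigger. If the alert should fire on the fifth occurrence, the comparison in the rule would need to be adjusted accordingly.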