About the Dynamic Thresholds app

Note

The Dynamic Thresholds app version 1.7.0 requires:

  • Web Console version 3.9.0 or later
  • ITRS Analytics Platform version 2.18.0 or later

The Dynamic Thresholds app is designed to help you monitor your systems more intelligently and reduce alert fatigue. Traditional static thresholds often generate unnecessary alerts when normal fluctuations occur, overwhelming teams and diverting attention from real issues. In contrast, the Dynamic Thresholds app provides adaptive, data-driven anomaly detection by learning from historical data and automatically adjusting to expected behavior. This ensures that alerts focus on statistically significant deviations, allowing you to respond only to genuine anomalies.

Use the Dynamic Thresholds app to set up and manage dynamic thresholds for various metrics. The app leverages a deviation model that learns the historical behavior of your metrics, automatically establishing and adjusting flexible upper and lower boundaries.
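The app's deviation model is internal, so its exact mathematics is not documented here. As a rough illustration of the general idea of learning historical behavior and deriving flexible upper and lower boundaries, the sketch below computes an adaptive band from recent samples using a mean plus/minus k standard deviations; the function name, window size, and multiplier are all hypothetical and not part of the app.

```python
import statistics

def dynamic_band(history, window=24, k=3.0):
    """Illustrative only: derive upper/lower bounds from the last
    `window` samples as mean +/- k standard deviations."""
    recent = history[-window:]
    mean = statistics.fmean(recent)
    stdev = statistics.pstdev(recent)
    return mean - k * stdev, mean + k * stdev

# A steady metric hovering around 50: normal fluctuation stays
# inside the band, so only a genuine anomaly would breach it.
samples = [50, 52, 49, 51, 50, 48, 51, 50, 49, 52]
low, high = dynamic_band(samples)
print(low < 50 < high)   # normal value: inside the band
print(95 > high)         # sudden spike to 95: outside the band
```

Because the band is recomputed from recent history, it widens or tightens as the metric's normal behavior changes, which is what lets alerts focus on statistically significant deviations rather than fixed cut-offs.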

The app also offers:

Dynamic Thresholds main screen

From the app’s main screen, you can:

Define a new dynamic threshold configuration

Tip

Watch this product demo tour in full screen to learn how to create a dynamic threshold configuration for your metrics.

Dry run results

Once a metric is selected, a chart is automatically generated on the right-hand side of the screen. This chart shows the metrics and thresholds for any matching entity within the last 24 hours and provides a comprehensive view of your metric behavior.

This chart also displays:

Use this chart to preview the behavior of your dynamic threshold configuration. Changing the selected metric or adjusting the thresholds and training window sliders will automatically update the chart.

Dynamic Thresholds preview chart

Click Analyse Results to simulate the configuration for matching entities. The Dry Run Results table then gives you an overview of how the configuration affects the matching entities. You can click a row to preview the dynamic threshold behavior for a specific entity and adjust your configuration based on these results.

Dynamic Thresholds dry run results

When you adjust your configuration, a banner will appear indicating that the results are no longer valid. Click Refresh Results to show the updated results based on the new configuration.

Dynamic Thresholds refresh results

Import and export configurations

The Dynamic Thresholds app supports importing and exporting configurations in JSON format. This is useful for moving configurations across different environments.

Import

With the Import feature, you can:

When importing files, you may encounter the following status messages:

Valid

The configuration has been successfully validated and is ready for import.

Uploaded

The configuration has been successfully uploaded to the server.

Conflict

The configuration file is valid, but the filename conflicts with an existing configuration. You can choose to overwrite the existing configuration.

Error

The file could not be imported due to invalid JSON or failed ThresholdConfig validation.

Failed

The file upload to the server failed.

Oversized

The file exceeds the 2MB size limit.

Skipped

The file was skipped because the maximum of 100 configurations was reached.
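The documented limits (2MB per file, at most 100 configurations, valid JSON) can be checked locally before uploading. The sketch below is a hypothetical pre-flight helper, not part of the app: it only mirrors the limits stated above, and the server-side ThresholdConfig validation can still reject a file that passes these checks.

```python
import json
from pathlib import Path

MAX_SIZE = 2 * 1024 * 1024   # 2MB per-file limit from the app
MAX_FILES = 100              # maximum number of configurations

def preflight(paths):
    """Classify configuration files against the documented import
    limits. Returns {Path: status}; server-side ThresholdConfig
    validation may still reject files marked Valid here."""
    statuses = {}
    for i, path in enumerate(map(Path, paths)):
        if i >= MAX_FILES:
            statuses[path] = "Skipped"
            continue
        if path.stat().st_size > MAX_SIZE:
            statuses[path] = "Oversized"
            continue
        try:
            json.loads(path.read_text())
        except (ValueError, OSError):
            statuses[path] = "Error"
            continue
        statuses[path] = "Valid"
    return statuses
```

Running this over a directory of exported configurations before a bulk import avoids discovering Oversized or Error statuses one file at a time in the UI.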

Export

With the Export feature, you can:

Threshold metric staleness

When a dynamic threshold configuration is enabled, the app publishes threshold metrics for every matching entity. These metrics flow through the platform like ordinary source metrics and contribute to entity activity. As a result, threshold metrics can keep entities appearing active in Entity Viewer even when the original source metric has stopped producing data.

Helm settings

Each dynamic threshold configuration defines a refreshInterval that controls how often the data source asks the platform for fresh aggregation data. When that refresh runs, the platform’s latest-seen timestamps for source metrics are applied to the cached threshold metric. That refresh is the point at which the cached metric’s timestamp is updated from live source activity.

The Helm value thresholdMetricStaleAfter sets how long the app continues to re-publish a cached threshold metric after the underlying source metric stops sending new data. When that duration is exceeded, the cached threshold metric is evicted and is no longer sent.

thresholdMetricStaleAfter is a signal generator daemon setting defined under the daemon key in the signal generator Helm chart values (for example in values.yaml or a values override file). The chart then renders it into the deployment ConfigMap. These values use ISO 8601 duration format (for example PT5M for 5 minutes, PT1H for 1 hour).

Setting                                 Default   Role
daemon.thresholdMetricRefreshInterval   PT30S     How often the daemon re-publishes cached threshold metrics and runs staleness checks.
daemon.thresholdMetricStaleAfter        PT1H      Maximum age of a cached threshold metric before it is evicted and no longer re-sent.

thresholdMetricStaleAfter must be significantly larger than the largest refreshInterval across all threshold configurations. If thresholdMetricStaleAfter is equal to or smaller than a configuration's refreshInterval, a race can occur: the staleness logic may evict the metric and clear its signal just before the next scheduled refresh repopulates the cache, even though the source is still healthy.

It is recommended to set thresholdMetricStaleAfter to at least three times the largest refreshInterval.

thresholdMetricStaleAfter   Example refreshInterval   Safe?
PT1H (default)              300s (5 minutes)          Yes — ample margin
PT15M                       300s                      Yes
PT5M                        300s                      No — same window; risk of spurious clears
PT10M                       600s (10 minutes)         No — same window
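The safety rule above can be checked mechanically. Python's standard library has no ISO 8601 duration parser, so the sketch below includes a minimal one for the PT...H/M/S forms used in these settings; both helper functions are hypothetical illustrations of the documented three-times rule, not part of the app or chart.

```python
import re

def iso_seconds(duration):
    """Parse the time portion of an ISO 8601 duration
    (e.g. PT30S, PT5M, PT1H30M) into seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if not m or not any(m.groups()):
        raise ValueError(f"unsupported duration: {duration}")
    h, mins, s = (int(g or 0) for g in m.groups())
    return h * 3600 + mins * 60 + s

def stale_after_is_safe(stale_after, refresh_intervals):
    """Documented rule of thumb: thresholdMetricStaleAfter should be
    at least 3x the largest refreshInterval across configurations."""
    return iso_seconds(stale_after) >= 3 * max(map(iso_seconds, refresh_intervals))

print(stale_after_is_safe("PT1H", ["PT5M"]))   # True: 3600 >= 3 * 300
print(stale_after_is_safe("PT5M", ["PT5M"]))   # False: same window
```

Passing the refreshInterval values of all enabled threshold configurations to stale_after_is_safe reproduces the Safe? column of the table above.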

Configuring thresholdMetricStaleAfter

You can configure this setting in three ways:

Note

Replace <namespace> with your ITRS Analytics namespace (often itrs). Replace chart names, repository URLs, and release names with those your organization uses for the signal generator.

Option A: Helm upgrade with --reuse-values

This updates the value for the current release. The override is kept across subsequent helm upgrade --reuse-values runs, but is lost if helm upgrade is run without --reuse-values (for example, during a fresh install or a CI job that supplies its own values file).

helm upgrade --install iax-app-signal-generator itrs-snapshots/iax-app-signal-generator \
  --devel --reuse-values -n <namespace> \
  --set daemon.thresholdMetricStaleAfter=PT5M

Restart the deployment so the daemon picks up the new ConfigMap:

kubectl rollout restart deployment/iax-app-signal-generator -n <namespace>
kubectl rollout status deployment/iax-app-signal-generator -n <namespace>

Option B: Direct ConfigMap edit

Use this for a fast change without going through the chart, for example when testing or debugging.

Warning

Editing the ConfigMap directly is temporary. The next helm upgrade usually rebuilds the ConfigMap from values.yaml and overwrites your changes.
kubectl edit configmap iax-app-signal-generator -n <namespace>

Find the thresholdMetricStaleAfter entry under grpcServices and set the value, for example:

        thresholdMetricStaleAfter: PT5M

Save the file, then restart the deployment:

kubectl rollout restart deployment/iax-app-signal-generator -n <namespace>
kubectl rollout status deployment/iax-app-signal-generator -n <namespace>

Option C: Values override file

This is the recommended method, as it persists all changes across future Helm upgrades.

Add the daemon block to your deployment's values override file or to the chart's values.yaml in source control:

daemon:
  thresholdMetricStaleAfter: PT5M

Deploy with your override file:

helm upgrade --install iax-app-signal-generator itrs-snapshots/iax-app-signal-generator \
  --devel -n <namespace> -f custom-values.yaml

Restart the deployment if the chart does not roll pods automatically.

Verifying the setting

Check the value stored on the release by running:

helm get values iax-app-signal-generator -n <namespace> -a | grep "thresholdMetricStaleAfter"

Confirm that the daemon started with the expected configuration by running:

kubectl logs deployment/iax-app-signal-generator -n <namespace> \
  | grep -i "staleAfter\|Resending thresholds"

Verifying that eviction is working

Tail logs and watch Send batch lines. Batch sizes should shrink and eventually stop once stale metrics are evicted.

kubectl logs -f deployment/iax-app-signal-generator -n <namespace> \
  | grep -E "Send batch|Evicted stale"

With DEBUG logging enabled, you should also see Evicted stale threshold metric for each evicted entity or metric. To raise the log level via Helm:

helm upgrade --install iax-app-signal-generator itrs-snapshots/iax-app-signal-generator \
  --devel --reuse-values -n <namespace> \
  --set loglevel.app=DEBUG

Signals when a threshold metric is evicted

When a cached threshold metric is evicted because thresholdMetricStaleAfter has been exceeded, the app publishes a clear so the dynamic threshold signal moves to severity NONE, with a message indicating that the signal was cleared due to source metric staleness. You do not need to wait for entity age-out alone to drop a stale severity.

Note

After the app clears a signal to NONE, the visible signal updates immediately. In some cases, when the source metric resumes and values are within thresholds, the Data Pipeline Daemon may not immediately re-publish an OK state and the signal can remain at NONE until the metric next crosses a threshold (for example, Warning or Critical).

Troubleshooting
