Preflight checks
What are preflight checks?
Preflight checks are automated validation tests that run before installing or upgrading ITRS Analytics. They examine your Kubernetes cluster environment to identify potential issues that could affect deployment success, system stability, or performance. The checks cover areas such as:
- Storage configuration and performance
- Cluster resource availability
- Backup and snapshot support
- Service mesh compatibility
- Network and security settings
Each check evaluates specific conditions and reports results as pass, warning, or error. Some checks are informational and allow you to proceed with caution, while others are strict (blocking) and must be resolved before installation can continue.
When to use preflight checks
Preflight checks run automatically in the following scenarios:
- Before deploying ITRS Analytics for the first time
- When updating to a new version of ITRS Analytics
- After modifying settings in the KOTS Admin Console
Although preflight checks run automatically during installation and upgrade, you can also run them manually before installation to validate your environment in advance.
To do this, use the Troubleshoot preflight tooling. In summary, you must:
- Download the `preflight` binary, or install it as a `kubectl` plugin by using Krew.
- Run the preflight application manually by using `curl` or Krew.
This approach is useful when you want to identify and address environment issues before starting the actual ITRS Analytics installation. Preflight checks help you avoid problems by catching issues early:
- Identify insufficient resources or incompatible configurations before they cause installation to fail.
- Detect storage latency, throughput, or stability issues that could degrade system performance.
- Verify backup and snapshot capabilities are properly configured to safeguard your data.
- Catch problems in minutes rather than discovering them hours into a deployment.
By addressing preflight check findings before proceeding, you ensure a smoother deployment experience and a more stable, reliable ITRS Analytics environment.
How preflight checks work in ITRS Analytics
When you initiate an installation or upgrade through the KOTS Admin Console, preflight checks execute automatically in the background. Here’s what happens:
- Each check runs against your Kubernetes cluster, testing specific requirements.
- Results are displayed in the Admin Console with clear pass/warning/error indicators.
- Failed checks include detailed error messages and recommended actions.
- Depending on the severity, you can proceed despite warnings, but you must resolve blocking errors before continuing.
Warning
Do not skip or intentionally ignore blocking preflight errors unless you fully understand the impact or are acting under direct guidance from ITRS. The bypass option in the KOTS Admin Console and CLI is intended only for exceptional cases, such as a known preflight defect or a supervised deployment. Ignoring blocking errors can cause the ITRS Analytics installation or upgrade to fail.
Some preflight checks, such as the Disk I/O performance check, take a long time to complete. You can skip them by toggling the checkboxes under Advanced Settings > Preflight and Support Bundle Settings in the Admin Console.
You can skip this check if alternative performance testing tools are available, or if ITRS Analytics is being reconfigured incrementally over short periods (within minutes or hours), where significant disk degradation is not expected. ITRS Analytics requires highly performant disks to operate well.
Preflight check types
The following sections describe the specific preflight checks that ITRS Analytics performs and how to interpret and address their results.
Preflight checks for Trident-based storage
This check ensures that the storage class assigned to each workload meets best practices and avoids potential issues that could impact stability and performance.
- Check the storage class associated with each workload. If the provisioner is `csi.trident.netapp.io`, validate the parameters for the presence of any of the identified issues.
- The `backendType` is set to `ontap-nas-economy`. According to Trident documentation, `ontap-nas-economy` is not recommended for production.
- The `provisioningType` is set to `thin`. When configured as `thin`, there is no strict guarantee that the requested storage will always be available.
- The `snapshots` parameter must be set to `false`. If set to `true`, this setting can lead to rapid disk utilization increases, potentially reaching 100% due to snapshot storage.

Recommended actions:
- If any of these issues are detected, update the storage class parameters accordingly.
- Consult Trident documentation for best practices on configuring backend storage.
- Monitor storage utilization closely, especially when enabling snapshots.
Preflight checks for Disk I/O performance
The preflight Disk I/O performance check is designed to validate the performance of all configured storage classes using the fio (Flexible I/O Tester) benchmarking tool. The objective is to identify latency, throughput, or stability issues before they impact application workloads.
Note
This preflight test uses `fio` to measure disk sync latency and IOPS (Input/Output Operations Per Second) by creating short-lived PVCs for each configured storage class. Enabling the Disk I/O Performance test can considerably extend the overall runtime of the preflight checks.
For each storage class, the test suite runs four predefined fio profiles, designed to simulate a broad range of I/O workloads:
- Random reads
- Random writes
- Single-path access patterns
- Multi-path access patterns
Each fio profile is configured to run for 60 seconds, resulting in a minimum test duration of 4 minutes per storage class (4 tests x 60 seconds). The total runtime of the preflight check scales linearly with the number of configured storage classes in the environment.
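As a quick sanity check, the minimum runtime arithmetic above can be expressed as a small sketch (the profile count and per-profile duration are taken from this section; the function name is illustrative):

```python
# Estimate the minimum Disk I/O preflight runtime:
# 4 fio profiles x 60 seconds each, per configured storage class.
def min_runtime_seconds(num_storage_classes: int,
                        profiles: int = 4,
                        seconds_per_profile: int = 60) -> int:
    return num_storage_classes * profiles * seconds_per_profile

# Example: 3 storage classes -> at least 12 minutes of fio testing.
print(min_runtime_seconds(3) // 60)  # -> 12
```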
Each storage class is evaluated against the following key Disk I/O performance metrics, along with their associated thresholds and severity levels.
| Metric | Warning Threshold | Error Threshold | Interpretation |
|---|---|---|---|
| Data Sync Latency (99th percentile) | > 10 ms | > 100 ms | High latency indicates potential queuing or slow disk response |
| IOPS (Read/Write Average) | < 3000 | < 2500 | Lower IOPS values suggest inadequate throughput |
| Coefficient of Variation (IOPS) | ≥ 10% | ≥ 20% | High CV suggests unstable or erratic disk performance |
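The thresholds in the table above can be expressed as a small evaluation routine. This is a sketch of the grading logic only; the function and field names are illustrative and not the actual preflight implementation:

```python
import statistics

def evaluate_disk_io(sync_latency_p99_ms, iops_samples):
    """Grade one storage class against the documented thresholds.

    Returns "pass", "warning", or "error" per metric.
    """
    avg_iops = statistics.mean(iops_samples)
    # Coefficient of variation of IOPS: stdev / mean, as a percentage.
    cv_pct = 100 * statistics.pstdev(iops_samples) / avg_iops
    return {
        # Latency: warning above 10 ms, error above 100 ms (99th percentile).
        "sync_latency": ("error" if sync_latency_p99_ms > 100
                         else "warning" if sync_latency_p99_ms > 10 else "pass"),
        # IOPS: lower is worse; warning below 3000, error below 2500.
        "iops": ("error" if avg_iops < 2500
                 else "warning" if avg_iops < 3000 else "pass"),
        # Stability: warning at CV >= 10%, error at CV >= 20%.
        "iops_cv": ("error" if cv_pct >= 20
                    else "warning" if cv_pct >= 10 else "pass"),
    }

# Example: moderately high latency, healthy and stable IOPS.
print(evaluate_disk_io(15, [3100, 3100, 3100]))
```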
Non-strict preflight mode
Preflight checks for both Warning and Error levels are currently configured in a non-strict mode. This means they will not block the installation process, allowing for greater flexibility during deployment.
Managing settings for preflight and support bundle
The Disk I/O performance check is enabled by default for both preflight checks and support bundles. The configuration options for both features are available in the KOTS Admin Console under Advanced Settings > Preflight and Support Bundle Settings.
Preflight checks
During system reconfiguration, the check is automatically skipped if an existing installation is detected, unless it has been explicitly enabled through configuration.
To disable this, clear the selection for the Run Disk I/O Performance Test checkbox.
Support bundle
This check is managed independently using the Include Disk I/O Test for Support Bundle option. When enabled, it adds the disk performance test to the support bundle, helping capture disk latency and throughput.
It operates independently of the preflight check configuration and remains enabled by default unless explicitly disabled. To modify this setting, simply select or clear the corresponding checkbox.
Preflight checks for CSI VolumeSnapshot support in StorageClass
Ensure that a given StorageClass in a Kubernetes cluster supports CSI VolumeSnapshots, which are essential for consistent backups. This check must be performed prior to initiating any backup operations to ensure support and backup consistency.
Note
Support for backup and restore in ITRS Analytics is disabled by default. You can modify the Enable IAX backup and restore setting located under Advanced Settings > Backup and Restore in the KOTS Admin Console. For more information, see Backup and restore documentation.
Strict preflight check for CSI VolumeSnapshot support
Beginning with ITRS Analytics version 2.12.6, the preflight check for CSI VolumeSnapshotClass support has been made strict (blocking) to ensure consistent and reliable backup and restore functionality. To prevent potential data loss and restore failures, this preflight check blocks installation if no supported VolumeSnapshotClass is detected.
If a compatible VolumeSnapshotClass is not found, the installation will be blocked unless you resolve the issue or explicitly disable backups in the KOTS Admin Console under Advanced Settings > Backup and Restore.
Conceptual flow and example
- Retrieve the provisioner or driver for the selected StorageClass (for example, `ebs-sc`):

  ```shell
  kubectl get storageclasses.storage.k8s.io ebs-sc -o jsonpath='{.provisioner}'
  ```

- Check for a matching VolumeSnapshotClass:

  ```shell
  kubectl get volumesnapshotclass -o json | jq '.items[] | select(.driver == "<provisioner>") | .metadata.name'
  ```
If no VolumeSnapshotClass is found that matches the provisioner of the specified StorageClass, the following error is displayed:
Error
The following StorageClasses do not support CSI VolumeSnapshots: <SC Name> (provisioner: <provisioner>). Without CSI VolumeSnapshot support, consistent backups cannot be guaranteed and are not supported. To proceed with the installation, you must resolve this issue or disable the backup option in Advanced Settings under Backup and Restore.
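The matching rule behind the two kubectl steps above can be sketched as follows. The data shapes are illustrative simplifications, not the real Kubernetes API objects:

```python
# A StorageClass supports CSI VolumeSnapshots only if some VolumeSnapshotClass
# declares a driver equal to that StorageClass's provisioner.
def unsupported_storage_classes(storage_classes, snapshot_class_drivers):
    """storage_classes: dict of name -> provisioner.
    snapshot_class_drivers: set of drivers declared by VolumeSnapshotClasses.
    Returns the classes that would trigger the blocking error."""
    return sorted(
        f"{name} (provisioner: {prov})"
        for name, prov in storage_classes.items()
        if prov not in snapshot_class_drivers
    )

# Example: ebs-sc has a matching snapshot driver; local-sc does not.
print(unsupported_storage_classes(
    {"ebs-sc": "ebs.csi.aws.com", "local-sc": "kubernetes.io/no-provisioner"},
    {"ebs.csi.aws.com"},
))  # -> ['local-sc (provisioner: kubernetes.io/no-provisioner)']
```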
Preflight check to verify available pod capacity
This preflight check verifies that your Kubernetes cluster has enough available pod slots to successfully deploy ITRS Analytics.
The check ensures that the cluster can accommodate all required IAX components without exceeding its pod capacity or running into scheduling issues. Insufficient pod availability may lead to failed or incomplete deployments.
How it works
- The preflight calculates the number of pods required based on the selected cluster size configuration.
- It then compares this value against the current available pod capacity in the Kubernetes cluster.
- If the available capacity is insufficient, the deployment will not proceed until resources are freed or the cluster capacity is increased.
Note
This is a strict preflight check. Deployment is blocked if the cluster lacks enough pod slots. The check prevents resource contention and ensures a stable, fully functional ITRS Analytics deployment.
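The comparison described above amounts to the following sketch. The names and data shapes are illustrative; the real check derives required pods from the selected cluster size configuration and available capacity from node status:

```python
# Strict check: required pods for the selected cluster size must fit within the
# cluster's remaining pod capacity (allocatable slots minus running pods, per node).
def pod_capacity_check(required_pods, nodes):
    """nodes: list of (allocatable_pods, running_pods) tuples, one per node.
    Returns available slots, or raises to model the blocking failure."""
    available = sum(alloc - running for alloc, running in nodes)
    if required_pods > available:
        raise RuntimeError(
            f"Blocking preflight failure: need {required_pods} pod slots, "
            f"only {available} available. Scale the cluster or free resources."
        )
    return available

# Example: two nodes with a default 110-pod limit, already partially used.
print(pod_capacity_check(50, [(110, 80), (110, 60)]))  # -> 80
```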
Recommended action
If the check fails:
- Verify your cluster’s node and pod limits.
- Scale up your cluster or free up unused resources.
- Re-run the deployment after ensuring sufficient capacity.
Preflight check to validate Linkerd compatibility with native sidecars
Beginning with ITRS Analytics 2.16.0, workloads that use native sidecars require a supported version of Linkerd.
The Linkerd native sidecars support preflight check verifies that clusters running Linkerd are compatible with ITRS Analytics workloads that use native sidecars. Running the preflight check confirms that clusters with a pre-installed Linkerd meet these requirements before deployment. This helps prevent deployment issues caused by incompatible service mesh versions.
How it validates compatibility
- Detects Linkerd installation:
  - Searches for the `linkerd-config` ConfigMap in the default `linkerd` namespace.
  - If not found, searches across all namespaces.
- Validates Linkerd version:
  - If Linkerd is present, its version is parsed and the `proxy.nativeSidecar` flag is checked. Full support requires both version 2.15.0 or later and the flag to be enabled.
  - The version is then compared against the minimum required version (2.15.0).
- Reports compatibility:
  - If Linkerd is not installed but internal TLS is enabled, or the version is below 2.15.0, the check reports that native sidecar support is unavailable.
  - If Linkerd is version 2.15.0 or later and the `proxy.nativeSidecar` flag is enabled, native sidecar support is considered compatible.
In summary, here’s how it works:
| Condition | Result |
|---|---|
| Linkerd not installed but internal TLS is enabled | Native sidecar support unavailable |
| Linkerd installed, version < 2.15.0 | Native sidecar support unavailable |
| Linkerd installed, version ≥ 2.15.0, nativeSidecar flag disabled | Native sidecar support unavailable |
| Linkerd installed with version ≥ 2.15.0 and nativeSidecar flag enabled | Native sidecar support available and compatible |
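The decision table above can be sketched as a small function. This is illustrative only; the real check parses the version from the `linkerd-config` ConfigMap rather than taking it as an argument:

```python
MIN_VERSION = (2, 15, 0)  # minimum Linkerd version for native sidecar support

def parse_version(version: str) -> tuple:
    """Turn a version string like "2.15.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def native_sidecar_support(linkerd_version, native_sidecar_flag, internal_tls):
    """linkerd_version is None when Linkerd is not installed."""
    if linkerd_version is None:
        # Not installed: unavailable only when internal TLS expects a mesh.
        return not internal_tls
    # Installed: need version >= 2.15.0 AND the proxy.nativeSidecar flag enabled.
    return parse_version(linkerd_version) >= MIN_VERSION and native_sidecar_flag
```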
Preflight check for Inotify watcher limits
Beginning with ITRS Analytics 2.17.x, this preflight check validates the fs.inotify.max_user_instances setting on cluster nodes. This kernel parameter controls the maximum number of inotify watchers a node can support, which is critical for workloads that monitor file system events.
The check runs on every node to verify that the parameter meets the recommended threshold, helping to identify potential configuration issues before installations, upgrades, or runtime operations.
How it works
- The check runs as a DaemonSet on all cluster nodes.
- It validates that the `fs.inotify.max_user_instances` value meets the recommended minimum.
- Any deviations are reported, allowing administrators to address configuration issues proactively.
Some environments may not allow the check to execute fully:
- Security policies may restrict DaemonSet access to the host `/proc` filesystem.
- Managed cloud environments may prevent modification of node kernel parameters.
- In these cases, the check may fail or generate warnings without an actionable resolution.
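The per-node comparison can be sketched as follows. The threshold value here is an assumption for illustration; the shipped check defines its own recommended minimum:

```python
RECOMMENDED_MIN = 1024  # illustrative threshold, not the product's actual value

def check_inotify_limit(value: int, minimum: int = RECOMMENDED_MIN):
    """Grade one node's fs.inotify.max_user_instances against the minimum."""
    if value < minimum:
        return ("warn",
                f"fs.inotify.max_user_instances={value} is below the recommended "
                f"minimum of {minimum}; pods that watch files may fail with "
                f"'too many open files' errors")
    return ("pass", f"fs.inotify.max_user_instances={value}")

# On a Linux node, the current value can be read with:
#   sysctl fs.inotify.max_user_instances
print(check_inotify_limit(128)[0])  # -> warn
```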
Configuration options
You can control the behavior of this preflight check using the Run Node Sysctl Checks toggle, located under Advanced Settings > Preflight and Support Bundle Settings. The toggle is enabled by default.
When enabled (default):
- The inotify DaemonSet collector runs automatically during preflight checks.
- Any detected issues with node watcher limits are reported in the preflight results.
- Analyzer results are included in support bundles for troubleshooting purposes.
- Recommended: Keep this enabled in environments where the DaemonSet has permission to run and kernel parameters can be modified.
When disabled:
- The inotify DaemonSet collector is skipped, and the analyzer is excluded to prevent misleading warnings.
- The collector and analyzer are not included in support bundles.
- A contextual warning informs administrators that pods may fail at runtime with `too many open files` errors if the kernel parameter is insufficient.
- Only disable this option in environments where DaemonSet execution is restricted or kernel parameters cannot be modified, and ensure that workloads relying on inotify watchers are monitored carefully.
Note
Keep the Run Node Sysctl Checks option enabled whenever possible to proactively detect configuration issues. If the toggle must be disabled, you should monitor pods that rely on inotify watchers and be prepared to address runtime errors caused by insufficient kernel limits.
This check provides technical validation of node-level system settings, helping prevent installation, upgrade, and runtime issues related to inotify watcher limits.
Preflight check for Enforce Zone Spread setting
Beginning with ITRS Analytics 2.17.x, the Enforce Zone Spread preflight check validates whether this setting is compatible with the availability zone topology of your Kubernetes cluster.
This check runs automatically before installation or upgrade and is useful when deploying or updating ITRS Analytics in environments where workloads may be distributed across one or more availability zones. It helps detect configuration mismatches early and prevents scheduling issues that could block critical workloads.
What it checks
The check reads the topology.kubernetes.io/zone label from cluster nodes to determine the cluster’s zone topology. It then compares that topology against whether Enforce Zone Spread is enabled in your ITRS Analytics configuration.
Configuration options
For installations deployed through the KOTS Admin Console, configure Enforce Zone Spread on the Config page under Advanced Settings > Preflight and Support Bundle Settings > Zone Spread. This setting is not enabled by default.
Possible outcomes
| Cluster topology | Enforce Zone Spread | Result | Action required |
|---|---|---|---|
| Multiple zones detected | Enabled | Pass | No action needed |
| Multiple zones detected | Disabled | Warning | Consider enabling it for high availability across zones |
| Single zone detected | Enabled | Warning | Consider disabling it because it is unnecessary in single-zone clusters |
| Single zone detected | Disabled | Pass | No action needed |
| No zone labels found | Enabled | Fail (blocking) | Disable Enforce Zone Spread, or add topology.kubernetes.io/zone labels to cluster nodes |
| No zone labels found | Disabled | Pass | No action needed |
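The outcome table above can be sketched as a small decision function (names are illustrative; the real check reads zones from the topology.kubernetes.io/zone node labels):

```python
def zone_spread_outcome(zones: set, enforce_zone_spread: bool) -> str:
    """zones: distinct zone label values found on nodes (empty if unlabeled).
    Returns "pass", "warning", or "fail" per the documented outcome table."""
    if not zones:
        # No zone labels: blocking failure only if the setting is enabled.
        return "fail" if enforce_zone_spread else "pass"
    if len(zones) == 1:
        # Single zone: enforcing spread provides no benefit.
        return "warning" if enforce_zone_spread else "pass"
    # Multiple zones: enabling spread is the recommended configuration.
    return "pass" if enforce_zone_spread else "warning"
```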
Fail condition
The check fails and blocks the installation or upgrade if Enforce Zone Spread is enabled but no nodes carry the topology.kubernetes.io/zone label. In this state, ITRS Analytics critical workloads cannot be scheduled.
Resolve this before proceeding by doing one of the following:
- Disable Enforce Zone Spread in the ITRS Analytics configuration.
- Add `topology.kubernetes.io/zone` labels to cluster nodes so they reflect the actual zone layout.
The check also fails if node information cannot be retrieved, for example because of RBAC restrictions. If this occurs, verify that the preflight service account has permission to list cluster nodes.
Warning conditions
Warnings are non-blocking but indicate a suboptimal configuration.
- If the cluster spans multiple availability zones and Enforce Zone Spread is disabled, enabling it improves high-availability guarantees across zones.
- If Enforce Zone Spread is enabled on a single-zone cluster, it provides no benefit and may cause scheduling issues if nodes are later removed.