Upgrading to version 2.18.0 for the ClickHouse migration
ITRS Analytics 2.18.0 introduces the platform-wide ClickHouse migration. During the upgrade, ITRS Analytics deploys new ClickHouse-backed workloads, migrates retained data from the legacy stores, and then retires the old data path once the migration has completed successfully.
Because this upgrade changes the underlying storage and query path for major platform services, it requires more preparation than a standard maintenance release. Review this guidance before starting the upgrade.
Important
Version 2.18.0 is a one-way upgrade. Downgrading is not supported.
Do not manually delete TimescaleDB, Loki, or migration jobs during the upgrade unless ITRS Support explicitly instructs you to do so.
What changes in version 2.18.0
Version 2.18.0 moves additional platform workloads to ClickHouse, including metrics, logs, signals, audit events, entities, and related query paths. As part of the upgrade:
- New ClickHouse workloads are deployed for platform data.
- Historical metrics, signals, and logs are migrated from the legacy backends.
- Configuration service data is moved away from TimescaleDB.
- Query handling for DPD, Entity Service, Latest Metrics, and other platform services is aligned with the ClickHouse data model.
This means the upgrade can take longer than usual and may temporarily require both the old and new storage backends to coexist while migration jobs are running.
Before you upgrade
Complete the following checks before starting the standard upgrade procedure.
Verify cluster support and maintenance window
- Make sure your Kubernetes version is still supported by ITRS Analytics. Do not plan the upgrade on clusters that are still running Kubernetes 1.27 or 1.28.
- Schedule a maintenance window that allows time for data migration. The duration depends on the amount of retained metrics, signals, and logs in the system.
- If you are upgrading a busy production system, perform the upgrade in a staging environment first with representative retained data.
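The version and readiness checks above can be run from any workstation with cluster access. This is a sketch only; confirm the supported Kubernetes versions against the current ITRS Analytics requirements.

```shell
# Report the Kubernetes server version. Do not proceed if the cluster
# is still on 1.27 or 1.28.
kubectl version

# Confirm all nodes are Ready and running the expected kubelet version.
kubectl get nodes -o wide
```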
Confirm spare capacity for the migration
During the upgrade, the legacy data stores and the new ClickHouse workloads can run at the same time. Before upgrading, verify that the cluster has enough spare:
- CPU
- memory
- storage capacity
- storage performance
Use the latest resource and hardware requirements and the current sample configuration files as your baseline.
Pay particular attention to storage performance. Slow or undersized storage can significantly lengthen migration time and affect ClickHouse query performance after the upgrade.
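A quick way to gauge spare capacity is to compare current node usage against allocatable resources. These commands are illustrative; `kubectl top` requires the metrics server to be installed, and `<namespace>` is a placeholder for your ITRS Analytics namespace.

```shell
# Current CPU and memory usage per node (requires metrics-server).
kubectl top nodes

# Allocated vs. allocatable resources per node.
kubectl describe nodes | grep -A 7 "Allocated resources"

# Existing persistent volume claims and their sizes.
kubectl get pvc -n <namespace>
```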
Review your retained data volume
Migration effort is directly affected by the amount of data that must be copied from the legacy stores.
Before upgrading, review:
- metrics retention
- signal retention
- log retention
- current TimescaleDB and Loki disk usage
If historical data volume is unusually large, plan for a longer upgrade and higher temporary resource usage.
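To estimate how much data will be migrated, check actual disk usage inside the legacy store pods. The pod names and mount paths below are placeholders; substitute the names used in your deployment.

```shell
# Requested sizes of the legacy store volumes.
kubectl get pvc -n <namespace>

# Actual usage inside a legacy store pod (pod name and mount path
# are examples only - adjust to your deployment).
kubectl exec -n <namespace> <timescaledb-pod> -- df -h /var/lib/postgresql/data
kubectl exec -n <namespace> <loki-pod> -- df -h /data
```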
Review HA sizing and replica settings
For HA deployments, review the latest HA sample configuration before upgrading.
In version 2.18.0, the IAM daemon becomes part of the authentication path for platform services. If you run HA, make sure your target configuration reflects the current HA recommendations for stateless services and that your cluster has enough nodes to place the additional replicas safely.
At minimum, review the newly introduced IAMD replica setting and the current HA replica recommendations for services such as kvStore and licenced in the latest sample configuration before upgrading.
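As a sketch of the kind of settings to check, an HA values file might carry replica counts along these lines. The key names and values here are assumptions for illustration only; always take the actual names and recommended counts from the latest HA sample configuration shipped with 2.18.0.

```yaml
# Illustrative only: key names and replica counts are assumptions.
# Use the values from the official 2.18.0 HA sample configuration.
iamd:
  replicas: 2
kvStore:
  replicas: 3
licenced:
  replicas: 2
```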
Keep data-type enablement settings intentional and stable
Version 2.18.0 introduces separate ClickHouse-backed workloads for multiple data types. Before upgrading:
- confirm whether logs are meant to stay enabled
- confirm whether traces are meant to stay enabled
- keep those settings consistent throughout the upgrade
Avoid changing log or trace enablement during the upgrade window unless you are intentionally reconfiguring the target deployment and have validated the final configuration first.
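For example, if your values file carries enablement flags along these lines (the key names are hypothetical and shown only to illustrate the principle), the values you start the upgrade with should be the values you finish with:

```yaml
# Hypothetical keys for illustration; use the enablement settings from
# your own values file and keep them unchanged during the upgrade.
logs:
  enabled: true
traces:
  enabled: false
```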
Decide whether you need multi-disk ClickHouse storage
If you use additional filesystems for ClickHouse data placement, set the disk count before upgrading so the target layout is in place from the start.
Example:
```yaml
clickhouse:
  metrics:
    diskCount: 4
  logs:
    diskCount: 4
  platform:
    diskCount: 4
  traces:
    diskCount: 4
```
If you do not use multiple filesystems, no action is required.
Breaking changes and compatibility considerations
The ClickHouse migration introduces behavior changes that can affect custom integrations, automation, and API clients.
Metrics API changes
- Requests to metric query endpoints now require a namespace for metric lookups such as GetMetrics and GetStatusMetrics.
- Metric metadata and query behavior now follow the ClickHouse-backed model. If you have custom consumers of metric APIs, re-test them before production rollout.
- Metric query results now align with the normalized unit model used by ClickHouse-backed storage. Validate any automation that depends on previously returned units.
- Query services now preserve nanosecond timestamp precision.
Data view and metric type changes
- Metric handling now distinguishes gauge and counter data more explicitly.
- If you create DPD tasks or data view definitions programmatically, review those payloads before upgrading.
Entity filter behavior changes
- NOT_EQUALS and NOT IN now match only when the key is present.
- Greater-than and less-than entity expression operators are no longer supported.
If you rely on saved filters, generated filters, or application-side query builders, validate them before upgrading.
Ingestion normalization changes
- Backticks in dimension keys, namespaces, and attribute names are normalized to underscores during ingestion.
- Some log data is normalized more aggressively during ingestion and migration, including timestamp and severity parsing and event identifier handling.
If your data pipeline or downstream tooling depends on exact field names or raw log formatting, verify the output after upgrading.
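As a minimal sketch of the backtick normalization described above, a dimension key containing a backtick comes out of ingestion with an underscore in its place. The `tr` command here only illustrates the effect; it is not the platform's actual implementation.

```shell
# A backtick in a dimension key is replaced with an underscore.
printf '%s\n' 'host`name' | tr '`' '_'
# prints: host_name
```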
Required actions during the upgrade
After you start the standard upgrade procedure:
- Allow the ClickHouse schema and migration jobs to run to completion.
- Do not manually scale down or remove legacy data stores while migration is still in progress.
- Monitor the new ClickHouse workloads, migration jobs, and core platform services until the deployment stabilizes.
Expect the operator to clean up legacy TimescaleDB and Loki workloads only after the migration has completed successfully.
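While the migration is running, the jobs and workloads can be watched directly. These commands are illustrative; `<namespace>` is a placeholder for your ITRS Analytics namespace.

```shell
# Watch migration jobs until they reach completion.
kubectl get jobs -n <namespace> -w

# Follow pod status for the new ClickHouse workloads and core services.
kubectl get pods -n <namespace> --watch
```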
Post-upgrade validation
After the upgrade finishes, validate both platform health and data continuity.
Check migration and platform health
Verify that:
- all migration jobs completed successfully
- ClickHouse workloads are healthy
- core services such as platformd, dpd, kvStore, licenced, and IAM are running normally
- no critical pods are stuck restarting
Example checks:
```shell
kubectl get jobs -n <namespace>
kubectl get pods -n <namespace>
```
Validate data access
Confirm that users can still access:
- current and historical metrics
- signal timelines and latest signals
- log search and log source discovery
- audit event queries
- dashboards and data views that depend on entity and metric queries
If you use custom integrations, validate them against a representative sample of:
- metric queries
- status metric queries
- entity filters
- DPD subscriptions or task definitions
Validate retention and storage behavior
After the platform is stable, confirm that:
- expected ClickHouse PVCs exist and are sized correctly
- retention settings are being applied as expected
- storage growth is tracking normally after the migration
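The PVC checks above can be scripted. The grep pattern is an assumption; adjust it to match the naming or labels your deployment actually uses for ClickHouse volumes.

```shell
# List ClickHouse PVCs with their requested sizes and storage classes.
# The name filter is an example - match it to your deployment's naming.
kubectl get pvc -n <namespace> | grep -i clickhouse
```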
Note
Version 2.18.0 relaxes upgrade blocking around changed diskSize and storageClass values, but changing those configuration values does not resize existing volumes automatically. If you need larger volumes, expand the underlying PVCs separately using your storage platform's supported procedure.
When to postpone the upgrade
Delay the upgrade and resolve prerequisites first if any of the following are true:
- the cluster does not meet the current supported Kubernetes version requirements
- the cluster does not have enough spare CPU, memory, or storage for the migration window
- storage performance is below the recommended baseline
- you have custom API clients or automations that have not been tested against the 2.18.0 query behavior changes
- you are unsure whether logs or traces should remain enabled after the upgrade
Once these checks are complete, continue with the standard ITRS Analytics upgrade procedure for your deployment type.