ITRS Analytics deployment planning and resiliency

This guide helps you understand the resiliency characteristics and trade-offs of different ITRS Analytics deployment options. Your choice of deployment model directly affects high availability, continuous operations, and your ability to meet uptime and compliance requirements.

Note
Your resiliency options depend on your primary deployment model. ITRS Analytics can be deployed as SaaS (hosted and managed by ITRS on AWS) or Self-Hosted (deployed and managed by your organization on your own infrastructure). If you have not yet selected a model, see Choosing your ITRS Analytics deployment model before continuing.

Designing a resilient ITRS Analytics deployment

ITRS Analytics is built on a Kubernetes-native architecture, designed for continuous high availability, scalable deployments, and resilient operations. For SaaS deployments, ITRS manages the underlying infrastructure, including high availability across two availability zones and daily automated backups. For Self-Hosted deployments, a key decision that drives most resiliency characteristics is the choice of Kubernetes deployment method: Bring Your Own (Kubernetes) Cluster (BYOC) or Embedded (Kubernetes) Cluster (EC).

BYOC vs EC

Note
Bring Your Own (Kubernetes) Cluster (BYOC) is the recommended deployment model, offering customers maximum flexibility, control, and enterprise-grade resiliency. Embedded Cluster can be suitable for small-scale or trial deployments, but it comes with specific trade-offs and limitations. This guide explains the implications of choosing an Embedded Cluster, rather than treating BYOC and EC as equivalent options.

ITRS Analytics achieves resiliency through high availability and continuous operations mechanisms. By running redundant services across the cluster with intelligent load balancing and automated failover, the platform ensures uninterrupted access to observability data even during component or node failures.

With a Kubernetes-native design, monitoring and alerting workflows continue seamlessly, meeting strict compliance and uptime requirements without requiring manual intervention. Understanding these characteristics and how they differ between BYOC and EC is essential for designing a deployment that aligns with your organization’s uptime, compliance, and continuous operations goals.

Key resiliency concepts

When planning your ITRS Analytics deployment, two fundamental concepts work together to define the platform’s operational characteristics.

High availability (HA)

High availability ensures that your observability platform continues to operate without interruption, even if individual components fail. This is achieved by deploying redundant services, load balancers, and failover mechanisms so that if one workload becomes unavailable, another seamlessly takes over.

Key characteristics:

Focuses on minimizing downtime within the same site or region.
In both multi-node BYOC and Embedded Cluster deployments, no node should have a round-trip time (RTT) greater than 10 ms to any other node in the Kubernetes cluster. This requirement applies to the Kubernetes cluster itself and to all workloads running within it.
As a result, architectures that span multiple data centers or availability zones are supported only if the Kubernetes cluster can span them and the inter-node latency requirements are met. Multiple availability zones within a single region are supported, and multiple data centers may also be supported if they are sufficiently close and meet the required latency thresholds.
Supports deployments across subnets in availability zones.

Note
While HA configurations can be deployed across multiple availability zones within a region, the round-trip latency between every pair of nodes must stay below 10 ms to ensure proper cluster operation.
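The latency requirement above can be checked before deployment. The following is a minimal sketch, not an ITRS tool: it assumes you have already measured pairwise RTT yourself (for example with ping), and the function name, node names, and measurements are all hypothetical. It treats an RTT of exactly 10 ms as acceptable, which is an assumption; tighten the comparison if you read the requirement as strictly below 10 ms.

```python
from itertools import combinations

MAX_RTT_MS = 10.0  # inter-node RTT limit stated in this guide


def find_rtt_violations(nodes, rtt_ms, max_rtt_ms=MAX_RTT_MS):
    """Return (node_a, node_b, rtt) for every node pair that fails the check.

    rtt_ms maps (node_a, node_b) tuples to a measured RTT in milliseconds.
    A pair with no measurement is reported with rtt=None so that it is not
    silently assumed to pass.
    """
    violations = []
    for a, b in combinations(sorted(nodes), 2):
        # Accept a measurement recorded in either direction.
        rtt = rtt_ms.get((a, b), rtt_ms.get((b, a)))
        if rtt is None or rtt > max_rtt_ms:
            violations.append((a, b, rtt))
    return violations


# Hypothetical measurements for a three-node cluster.
nodes = ["node-a", "node-b", "node-c"]
measured = {
    ("node-a", "node-b"): 0.8,
    ("node-a", "node-c"): 12.5,
    ("node-b", "node-c"): 1.1,
}
print(find_rtt_violations(nodes, measured))
# [('node-a', 'node-c', 12.5)]
```

Because the requirement applies to every pair of nodes, the check walks all pairs rather than comparing each node against a single reference node.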

Continuous operations

ITRS Analytics is designed to maintain continuous operations during localized failures within a single cluster. Through high availability configurations, the platform automatically handles pod failures, node outages, and individual service disruptions without manual intervention. Kubernetes orchestration ensures workloads are rescheduled, traffic is rerouted, and services remain accessible even as infrastructure components fail and recover.

This continuous operations model focuses on keeping the platform available within a single site or region during common failure scenarios, which are precisely the situations where your observability data is most critical for troubleshooting and incident response.

Important
ITRS Analytics does not provide built-in cross-site or cross-region disaster recovery capabilities. The platform is designed for continuous operations during localized failures (pods, nodes, services) within a single deployment, not for automatic failover between geographically separated deployments.

If your organization requires protection against large-scale or catastrophic events, such as complete data center outages, regional cloud failures, or severe cyber incidents, you must implement your own disaster recovery strategy by:

Running two or more independent ITRS Analytics deployments in separate locations
Implementing your own data synchronization mechanisms between deployments
Managing failover procedures and traffic redirection during regional failures
Maintaining recovery runbooks and testing DR processes regularly

This approach allows you to design disaster recovery that aligns with your specific Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO), while the platform itself focuses on maximizing uptime within each deployment.
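As a rough illustration of how a self-managed DR design maps to RTO and RPO targets, the sketch below estimates worst-case figures for a periodic synchronization scheme. It is a simplification under stated assumptions: synchronization runs at a fixed interval, and recovery time is the sum of detection and failover time. The class, field names, and numbers are hypothetical and are not part of ITRS Analytics.

```python
from dataclasses import dataclass


@dataclass
class DrPlan:
    sync_interval_min: float  # time between the start of consecutive syncs
    sync_duration_min: float  # time one sync takes to complete
    detection_min: float      # time to detect and declare a site failure
    failover_min: float       # time to redirect traffic to the standby site


def worst_case_rpo_min(plan):
    # Data written just after a sync starts waits one full interval for the
    # next sync, and is only protected once that sync has finished.
    return plan.sync_interval_min + plan.sync_duration_min


def worst_case_rto_min(plan):
    # Recovery cannot start before the failure has been detected.
    return plan.detection_min + plan.failover_min


def meets_objectives(plan, rpo_target_min, rto_target_min):
    return (
        worst_case_rpo_min(plan) <= rpo_target_min
        and worst_case_rto_min(plan) <= rto_target_min
    )


# Hypothetical plan: sync every 15 minutes, each sync takes 5 minutes,
# 10 minutes to detect a failure, 30 minutes to fail over.
plan = DrPlan(sync_interval_min=15, sync_duration_min=5,
              detection_min=10, failover_min=30)
print(meets_objectives(plan, rpo_target_min=30, rto_target_min=60))
# True
```

Running the same check against a tighter RPO target, such as 15 minutes, fails, which would indicate that the synchronization interval needs to be shortened.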

Kubernetes deployment methods

ITRS Analytics supports two primary methods for deploying into a Kubernetes platform. Your choice impacts the resiliency features available to you.

Bring Your Own (Kubernetes) Cluster (BYOC)

In this scenario, customers have a dedicated team or the in-house expertise to deploy and operate a standard Kubernetes service, either from a hyperscaler or on an on-premises system. This is the recommended approach for production deployments.

Examples include:

Self-hosted Kubernetes or OpenShift
AWS Elastic Kubernetes Service (EKS)
Azure Kubernetes Service (AKS)
Google Kubernetes Engine (GKE)
Other managed Kubernetes offerings

Native Bring Your Own Cluster (BYOC) environments typically offer broader capabilities and operational advantages compared to the Embedded Cluster deployment model.

Embedded (Kubernetes) Cluster (EC)

This scenario is for customers who don’t have access to a Kubernetes platform and want ITRS to deploy the Embedded Cluster (packaged k0s) with ITRS Analytics.

Examples include:

Deployment directly on virtual machines (VMs)
Deployment directly on bare metal servers

Advantages of native Bring Your Own Cluster deployments

Native Bring Your Own Cluster (BYOC) environments provide several operational advantages over Embedded Cluster (EC) deployments. The scenarios below illustrate how these advantages play out in real-world ITRS Analytics operations.

Ensuring resilient access with load balancers

Scenario: Your organization runs multiple ITRS Analytics ingestion services and UIs that must remain accessible even during traffic spikes.

A load balancer is required in all deployment models to ensure resilient access to ITRS Analytics services.

In a Bring Your Own Cluster environment, especially in cloud-based setups, a load balancer is typically readily available and integrates seamlessly with Kubernetes. It distributes traffic across multiple service replicas and often integrates with DNS services, helping maintain stable URLs and endpoints during scaling events or network changes.

In Embedded Cluster deployments, a load balancer is still required but is not provided as part of the deployment. Customers must supply and manage their own load balancer, which can be hardware-based or software-based. This usually requires additional planning and coordination with the network or infrastructure team.
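To make the load balancer's role concrete, the sketch below shows the basic behavior described above: requests are spread across service replicas, and replicas marked unhealthy are skipped. This is a toy round-robin model for illustration only. It is not how any particular hardware or cloud load balancer is implemented, and the class and replica names are hypothetical.

```python
class RoundRobinBalancer:
    """Toy round-robin selection that skips replicas marked as down."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self._replicas = list(replicas)
        self._healthy = set(self._replicas)
        self._next = 0  # index of the next replica to try

    def mark_down(self, replica):
        self._healthy.discard(replica)

    def mark_up(self, replica):
        if replica in self._replicas:
            self._healthy.add(replica)

    def pick(self):
        # Try each replica at most once, starting from the rotation point.
        for _ in range(len(self._replicas)):
            candidate = self._replicas[self._next]
            self._next = (self._next + 1) % len(self._replicas)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy replicas available")


# Hypothetical UI replicas behind the load balancer.
lb = RoundRobinBalancer(["ui-0", "ui-1", "ui-2"])
first_round = [lb.pick() for _ in range(4)]
print(first_round)
# ['ui-0', 'ui-1', 'ui-2', 'ui-0']

lb.mark_down("ui-1")  # simulate a failed replica
after_failure = [lb.pick() for _ in range(3)]
print(after_failure)
# ['ui-2', 'ui-0', 'ui-2']
```

In a real deployment, health state would come from the load balancer's own health checks rather than manual calls, and clients would keep using the same stable URL throughout.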