Embedded Cluster shutdown and startup procedures

This guide provides the correct procedures for safely shutting down and starting up Embedded Clusters. These procedures apply to both single-node and multi-node deployments, including High Availability (HA) configurations.

Following these procedures ensures system stability, maintains data integrity, and minimizes unplanned downtime during maintenance operations. The Embedded Cluster uses systemd units for service management, which handle automatic shutdown and startup processes during normal operations.

General guidelines

Pre-maintenance support bundle generation

Note
Always generate and save a support bundle before performing any maintenance operations. This is a critical step that provides essential diagnostic information for troubleshooting potential issues that may arise during or after maintenance.

Before proceeding with any shutdown or maintenance procedure:

1. Generate a support bundle using the KOTS Admin Console or the CLI (preferred, as it includes node resources).

a. Navigate to the ITRS Analytics folder.

b. Run the following command:
```
sudo ./itrs-analytics support-bundle
```
Example output:
```
✔ Support bundle saved at /home/ec2-user/iax-2.12.0+4/support-bundle-2025-09-19T11_12_23.tar.gz
```
2. Upload the support bundle to a secure location such as:
- A dedicated server accessible to your team
- An administrator’s PC or laptop
- A network-accessible storage location

Warning
You must never skip this step, as the support bundle captures the last known good state of your system before maintenance activities begin.
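
The generate-and-save steps can be scripted. The following is a minimal sketch, not part of the product: the helper names `extract_bundle_path` and `archive_bundle` and the backup directory are hypothetical, and it assumes the CLI prints the "Support bundle saved at" message on standard output in the format shown in the example output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither function is part of the itrs-analytics CLI.

# Read CLI output on stdin and print the path that follows
# "Support bundle saved at " (if the message appears twice, the last one wins).
extract_bundle_path() {
  awk -F'Support bundle saved at ' 'NF > 1 { path = $2 } END { if (path != "") print path }'
}

# Copy the bundle into a backup directory and print the destination path.
archive_bundle() {
  local bundle_path="$1"
  local backup_dir="$2"
  if [ ! -f "$bundle_path" ]; then
    echo "bundle not found: $bundle_path" >&2
    return 1
  fi
  mkdir -p "$backup_dir" || return 1
  cp -- "$bundle_path" "$backup_dir/" || return 1
  echo "$backup_dir/$(basename -- "$bundle_path")"
}

# Example (the backup directory is an assumption; use your own secure location):
#   bundle=$(sudo ./itrs-analytics support-bundle | extract_bundle_path)
#   archive_bundle "$bundle" /mnt/backup/support-bundles
```

If the bundle file is owned by root, the copy step may itself need elevated privileges.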

Automatic service management

The Embedded Cluster requires minimal manual intervention. The cluster is managed through systemd units that automatically handle proper service shutdown and restart during node reboots. In most scenarios, manual stop or start operations are unnecessary.

Benefits of automatic management:

- Ensures all services start in the correct order.
- Maintains service dependencies and health checks.
- Reduces risk of human error during maintenance.
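
Because automatic management depends on the units being enabled at boot, it can be worth confirming that before a reboot. The sketch below is illustrative only: `report_enabled_units` is a hypothetical helper, and the unit names are taken from the manual service commands in this guide. A given node may have only some of these units, depending on its role.

```shell
#!/usr/bin/env bash
# Unit names as used in the manual service commands of this guide.
CLUSTER_UNITS="local-artifact-mirror k0scontroller k0sworker"

# Print "<unit>: <state>" for each unit, where <state> is whatever
# `systemctl is-enabled` prints (for example enabled or disabled).
# If systemctl prints nothing for a unit, the state is shown as "unknown".
report_enabled_units() {
  local unit state
  for unit in $CLUSTER_UNITS; do
    state=$(systemctl is-enabled "$unit" 2>/dev/null)
    echo "$unit: ${state:-unknown}"
  done
}

# Example:
#   report_enabled_units
```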

Clean reboot procedures

Always perform clean system reboots when maintenance requires a restart:

```
sudo systemctl reboot
```

Warning
Avoid hard reboots (power button, forced shutdowns) as they can lead to data corruption, service instability, extended recovery times, or potential cluster state inconsistencies.

Single-node clusters

No additional procedures are required for single-node clusters beyond performing clean reboots with systemctl reboot.

Since all services, worker processes, and applications run on the same node, shutting down this node makes the entire cluster unavailable until the restart completes and all services are restored.

Multi-node and High Availability clusters

Multi-node deployments, particularly in High Availability (HA) configurations, require coordinated maintenance procedures to preserve cluster availability and maintain service redundancy.

Worker nodes

Worker nodes operate independently and can be rebooted individually without significant cluster impact.
No coordination between worker nodes is necessary, though clean reboots using systemctl reboot remain the recommended procedure for proper service management.

Control-plane nodes

Control-plane nodes provide essential cluster management services including API coordination, scheduling decisions, and maintaining distributed cluster state.

In HA deployments, the first three nodes typically function as control-plane nodes.
Critical maintenance requirements for preserving cluster availability:
- One-at-a-time maintenance — reboot control-plane nodes individually to maintain cluster management capabilities.
- Maintain majority consensus — ensure at least two control-plane nodes remain active and joined to the cluster during maintenance.
- Avoid concurrent downtime — simultaneous control-plane node outages can trigger cluster-wide service interruption and require complex recovery procedures.

Proper sequencing prevents quorum loss, which would otherwise disable cluster management functions and potentially impact all running workloads.
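
The quorum rule above reduces to simple arithmetic: a majority of N control-plane nodes is N / 2 + 1 (integer division), so a three-node control plane needs two active nodes. A minimal sketch, using the hypothetical helper names `quorum_size` and `safe_to_reboot`:

```shell
#!/usr/bin/env bash
# quorum_size TOTAL: smallest majority of TOTAL control-plane nodes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# safe_to_reboot TOTAL ACTIVE: succeed only if a majority would remain
# after taking one of the ACTIVE control-plane nodes down.
safe_to_reboot() {
  local total="$1"
  local active="$2"
  [ $(( active - 1 )) -ge "$(quorum_size "$total")" ]
}

# Example: with 3 control-plane nodes, all active, one reboot is safe;
# with only 2 active, rebooting another would lose quorum.
#   safe_to_reboot 3 3 && echo "ok to reboot one node"
#   safe_to_reboot 3 2 || echo "wait until the third node rejoins"
```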

Best practices

- Use systemctl reboot for all planned restarts.
- Avoid hard shutdowns to prevent data corruption.
- Do not manually stop or start services, unless absolutely necessary.
- Schedule maintenance during appropriate windows.
- Document maintenance activities for operational records.

Multi-node specific practices

- Reboot control-plane nodes sequentially.
- Keep a majority of control-plane nodes active at all times (at least 2 out of 3).
- Verify cluster health after each node maintenance.
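
One way to check cluster health after each node returns is to confirm that every node reports Ready before moving on to the next one. This guide does not describe how to obtain kubectl access, so the sketch below assumes you already have a working `kubectl` for the cluster. The helper name `not_ready_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Read `kubectl get nodes --no-headers` output on stdin and print the name
# of every node whose STATUS column is not exactly "Ready" (a cordoned node
# shows "Ready,SchedulingDisabled" and is therefore listed too).
# Exit status is 0 only if at least one node was listed and all are Ready.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1; bad = 1 } END { if (NR == 0 || bad) exit 1 }'
}

# Example:
#   if kubectl get nodes --no-headers | not_ready_nodes; then
#     echo "all nodes Ready; safe to continue with the next node"
#   fi
```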

Proper Embedded Cluster shutdown and startup procedures are essential for maintaining system reliability and data integrity. The automated service management through systemd units provides efficient operation with minimal administrative overhead.

Manual service management

Manual stopping or starting of cluster services is possible but generally discouraged. Use this approach only when advised by ITRS Support, as incomplete or incorrect service management can leave the cluster in an unhealthy state.

Service stop commands


```
sudo systemctl stop k0sworker
sudo systemctl stop k0scontroller
sudo systemctl stop local-artifact-mirror
```

Service start commands

```
sudo systemctl start k0sworker
sudo systemctl start k0scontroller
sudo systemctl start local-artifact-mirror
```
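
After a manual start, it may help to confirm that each service actually reached the active state rather than assuming the start command succeeded. The following is a sketch only: `wait_for_unit` is a hypothetical helper, and the 60-second default timeout is an arbitrary assumption rather than a documented value.

```shell
#!/usr/bin/env bash
# wait_for_unit UNIT [TIMEOUT_SECONDS]: poll `systemctl is-active` once per
# second until the unit reports active. Returns 1 if the timeout expires.
wait_for_unit() {
  local unit="$1"
  local timeout="${2:-60}"
  local waited=0
  until systemctl is-active --quiet "$unit"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Example (run only for the units present on the node):
#   wait_for_unit k0scontroller 120 || echo "k0scontroller did not become active" >&2
```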
