“Liveness probe failed” seen in pod events

This FAQ explains how the liveness probe failures manifested on a pod, why they occurred, and the steps taken to restore stability.

Problem

The pod entered a restart loop, restarting frequently over several hours. kubectl get pods showed it as 0/1 Running (the container was running but not ready) with a steadily increasing restart count.

Pod events:

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Warning  Unhealthy  10h (x3 over 10h)    kubelet            Liveness probe failed: Get "http://**HIDDEN**:8080/healthcheck": dial tcp **HIDDEN**:8080: connect: connection refused
  Warning  Unhealthy  10h (x162 over 10h)  kubelet            Readiness probe failed: Get "http://**HIDDEN**:8080/healthcheck": dial tcp **HIDDEN**:8080: connect: connection refused
  Normal   Killing    10h (x9 over 10h)    kubelet            Container iax-app-capacity-daemon failed liveness probe, will be restarted

The events show repeated “Liveness probe failed” and “Readiness probe failed” warnings, followed by a Normal Killing event each time the kubelet restarted the container.

The container logs did not show a hard error; instead, they stopped at arbitrary points during startup.
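The observations in this section can be gathered with standard kubectl commands. The following is a minimal sketch, not the procedure used in the original investigation: the function name inspect_pod is hypothetical, and the namespace and pod name are passed in as arguments.

```shell
#!/bin/sh
# Collect the status, events, and previous-instance logs for one pod.
# $1: namespace, $2: pod name (both supplied by the caller).
inspect_pod() {
  namespace="$1"
  pod="$2"
  # READY and RESTARTS columns (for example 0/1 with a growing restart count).
  kubectl get pod "$pod" -n "$namespace"
  # The Events section lists the liveness and readiness probe failures.
  kubectl describe pod "$pod" -n "$namespace"
  # Logs of the container instance that ran before the latest restart.
  kubectl logs "$pod" -n "$namespace" --previous
}
```

Calling inspect_pod with the namespace and pod name prints all three outputs in sequence. The --previous flag matters here because the current container instance may not have logged much yet after a restart.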

Possible cause

The pod’s startup time exceeded the default liveness probe window. While the service was still initializing, the liveness probe called /healthcheck and the request timed out or was refused, so Kubernetes marked the container unhealthy and restarted it before startup could finish, which created a restart loop. The defaults in the deployment (initialDelaySeconds: 120, timeoutSeconds: 5, failureThreshold: 3) were too aggressive for this environment.

This behavior was reproducible in busy/demo-heavy environments and was recognized as an issue with the default probe timing.
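As a rough illustration of the timing, the startup time a container gets before a liveness-triggered restart can be estimated from the probe settings. The sketch below uses a simplified model, which is an assumption and not exact kubelet timing: the first probe runs after initialDelaySeconds, one probe follows every periodSeconds, each failure is immediate (as with connection refused), and the container is restarted on the failureThreshold-th consecutive failure. It also assumes periodSeconds was 10 under the previous settings. The function name liveness_window is illustrative.

```shell
#!/bin/sh
# Estimate how many seconds after container start the kubelet would
# restart a container whose liveness probe never succeeds.
# $1: initialDelaySeconds, $2: periodSeconds, $3: failureThreshold
liveness_window() {
  initial_delay="$1"
  period="$2"
  failure_threshold="$3"
  # First failure at initial_delay, then one more failure per period.
  echo $(( initial_delay + (failure_threshold - 1) * period ))
}

# Previous values (120 s delay, threshold 3) versus the extended values
# (300 s delay, threshold 5); the period of 10 s is assumed for both.
echo "previous settings: $(liveness_window 120 10 3)s"
echo "extended settings: $(liveness_window 300 10 5)s"
```

Under this model, the window grows from about 140 seconds to about 340 seconds, so a service that needs a few minutes to initialize on a busy cluster would only fit within the extended settings.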

Possible solution

Extending the liveness probe window allowed the service to finish initialization and expose a healthy endpoint before the kubelet judged it. Specifically:

livenessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
  initialDelaySeconds: 300   # increased from 120
  timeoutSeconds: 10         # increased from 5
  failureThreshold: 5        # increased from 3
  periodSeconds: 10

After initialDelaySeconds was increased to 300 (together with the optional timeout and failure threshold changes), the pod reached a healthy state, the health probe began responding (HTTP health probe server listening … /healthcheck), and the restarts stopped.

Steps:

  1. Identify the pod’s parent deployment.
    • kubectl get deployment -n <namespace>
  2. Increase or adjust the livenessProbe settings.
    • kubectl edit deployment <deployment_name> -n <namespace>
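As a non-interactive alternative to kubectl edit, the same change could be made with kubectl patch and a JSON Patch. This is a sketch under stated assumptions, not the documented procedure: the helper name build_probe_patch is hypothetical, the container is assumed to be the first one in the pod template (index 0), and the replace operations assume the deployment already sets these livenessProbe fields.

```shell
#!/bin/sh
# Print a JSON Patch that sets the liveness probe values used in this FAQ.
# $1: index of the container within the pod template (assumed to be 0).
build_probe_patch() {
  base="/spec/template/spec/containers/$1/livenessProbe"
  printf '[{"op":"replace","path":"%s/initialDelaySeconds","value":300},' "$base"
  printf '{"op":"replace","path":"%s/timeoutSeconds","value":10},' "$base"
  printf '{"op":"replace","path":"%s/failureThreshold","value":5}]\n' "$base"
}

# Show the patch so it can be reviewed before it is applied.
build_probe_patch 0
```

To apply the patch, pass the output to kubectl patch deployment <deployment_name> -n <namespace> --type=json -p "$(build_probe_patch 0)". Changing the pod template triggers a new rollout, which can be followed with kubectl rollout status deployment/<deployment_name> -n <namespace>.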