“Liveness probe failed” seen in pod events
Related to
This FAQ explains how the liveness probe failures manifested on a pod, why they occurred, and the steps taken to restore stability.
Problem
The pod entered a restart loop with frequent restarts over several hours. kubectl get pods showed the pod in 0/1 Running with an increasing restart count.
Pod events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 10h (x3 over 10h) kubelet Liveness probe failed: Get "http://**HIDDEN**:8080/healthcheck": dial tcp **HIDDEN**:8080: connect: connection refused
Warning Unhealthy 10h (x162 over 10h) kubelet Readiness probe failed: Get "http://**HIDDEN**:8080/healthcheck": dial tcp **HIDDEN**:8080: connect: connection refused
Normal Killing 10h (x9 over 10h) kubelet Container iax-app-capacity-daemon failed liveness probe, will be restarted
Kubernetes events reported repeated Liveness probe failed and Readiness probe failed messages, followed by Normal Killing when the kubelet restarted the container:
- “Liveness probe failed … connection refused”
- “Readiness probe failed … connection refused”
- “Container … failed liveness probe, will be restarted”
Container logs did not show a hard error; instead, they stopped at random startup lines, e.g.:
Starting metric indexer for config …
HikariPool-7 - Starting…
then
HikariPool-7 - Start completed.
The service had not yet made its health endpoint responsive when the liveness probe fired.
Possible cause
The pod’s startup time exceeded the liveness probe window configured by default in the deployment. While the service was still initializing, the liveness probe called /healthcheck and the connection was refused (or the request timed out), so the kubelet marked the container unhealthy and restarted it before startup could finish. Each restart began the same slow startup again, which created a restart loop. The defaults in the deployment were too aggressive for this environment:
initialDelaySeconds: 120
timeoutSeconds: 5
failureThreshold: 3
This behavior was reproducible in busy/demo-heavy environments and was recognized as an issue with the default probe timing.
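As a rough illustration of why the original window was too short, the time before the kubelet's first restart can be estimated from the probe settings. This is a sketch only: it assumes periodSeconds is 10 for the original settings (the Kubernetes default, not stated above), that each probe fails immediately with connection refused, and it ignores kubelet scheduling jitter. The function name restart_after is hypothetical.

```shell
# Estimate when a container that never becomes healthy is first restarted.
# The first probe fires after initialDelaySeconds; the container is killed
# once failureThreshold consecutive probes have failed, one per period.
restart_after() {
  # $1 = initialDelaySeconds, $2 = periodSeconds, $3 = failureThreshold
  echo $(( $1 + $2 * ($3 - 1) ))
}

# Original settings (periodSeconds=10 is an assumption, see above).
old_window=$(restart_after 120 10 3)
# Adjusted settings: 300 / 10 / 5.
new_window=$(restart_after 300 10 5)

echo "original settings: first restart after about ${old_window}s"
echo "adjusted settings: first restart after about ${new_window}s"
```

Under these assumptions the service had roughly 140 seconds to start listening on port 8080 with the original settings, and roughly 340 seconds with the values 300 / 10 / 5.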
Possible solution
Extending the liveness probe window allowed the service to finish initialization and expose a healthy endpoint before the kubelet judged it. Specifically:
livenessProbe:
  httpGet:
    path: /healthcheck
    port: 8080
  initialDelaySeconds: 300   # increased from 120
  timeoutSeconds: 10         # increased from 5
  failureThreshold: 5        # increased from 3
  periodSeconds: 10
After increasing initialDelaySeconds to 300 (with the optional timeout and failure-threshold adjustments), the pod stabilized: it reached a healthy state, the health endpoint began responding (the log showed HTTP health probe server listening … /healthcheck), and restarts ceased.
Steps:
- Identify the pod’s parent deployment.
kubectl get deployment -n <namespace>
- Increase or adjust the livenessProbe settings.
kubectl edit deployment <deployment_name> -n <namespace>
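If an interactive edit is inconvenient, the same change could in principle be applied with kubectl patch. The sketch below is not taken from the original fix: the function name apply_probe_patch is hypothetical, and the patch assumes the affected container is the first one in the pod template and already defines a livenessProbe with these three fields (a JSON Patch "replace" fails on a missing path).

```shell
# JSON Patch that raises the liveness probe timings to the values used above.
# Assumption: the target is container index 0 in the deployment's pod template.
PATCH='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/initialDelaySeconds", "value": 300},
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/timeoutSeconds", "value": 10},
  {"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/failureThreshold", "value": 5}
]'

# Apply the patch, then wait for the resulting rollout to finish.
# $1 = deployment name, $2 = namespace
apply_probe_patch() {
  kubectl patch deployment "$1" -n "$2" --type=json -p "$PATCH" &&
    kubectl rollout status deployment "$1" -n "$2"
}
```

It would be called as apply_probe_patch <deployment_name> <namespace>. Changing the pod template triggers a new rollout, so the pod is recreated with the new probe settings.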