Fix: Docker container health status unhealthy
Quick Answer
How to fix Docker container health check failing with unhealthy status, including HEALTHCHECK syntax, timing issues, missing curl/wget, endpoint problems, and Compose healthcheck configuration.
The Error
You run docker ps and see your container stuck in an unhealthy state:
CONTAINER ID IMAGE STATUS PORTS
a1b2c3d4e5f6 my-app Up 2 minutes (unhealthy) 0.0.0.0:8080->8080/tcp
Or you check the container health explicitly:
docker inspect --format='{{.State.Health.Status}}' my-app
unhealthy
Docker Compose may refuse to start dependent services entirely:
dependency failed to start: container my-app is unhealthy
This means Docker ran the HEALTHCHECK instruction defined in your Dockerfile (or docker-compose.yml) and it failed on as many consecutive attempts as the configured retry count. The container is running, but Docker considers it unfit to serve traffic or satisfy dependency conditions.
Why This Happens
Docker health checks are commands that Docker runs inside the container at regular intervals. When the command exits with code 0, the container is healthy. When it exits with code 1, it counts as a failure. After a configured number of consecutive failures, the container transitions to unhealthy, whether it was still starting or previously healthy.
Several things can cause health check failures:
- Wrong command syntax in the HEALTHCHECK instruction.
- Timing issues where the health check runs before the application is ready.
- Missing tools like curl or wget inside the container image.
- Wrong endpoint or the application listening on a different address than what the health check targets.
- Network misconfiguration where the health check uses localhost but the app binds to a specific interface.
- Compose-specific configuration errors in the healthcheck section.
- Dependency chains where depends_on with condition: service_healthy fails because an upstream service is itself unhealthy.
The key insight is that the health check runs inside the container’s filesystem and network namespace. What works from your host machine may not work inside the container.
Diagnostic Timeline
The reflexive fix for unhealthy is to raise --timeout or --start-period until the container goes green. That works for thirty minutes and then breaks differently. Here is how to debug it properly.
Minute 0 — Resist the timeout reflex. Bumping --timeout from 10s to 60s masks the real issue: either the healthcheck endpoint depends on something that is not ready yet, or the command itself is broken, or it is hitting the wrong address. A slow healthcheck on a healthy app is a misconfiguration, not a tuning problem.
Minute 1 — Read the actual healthcheck output. Skip straight to:
docker inspect --format='{{json .State.Health}}' my-app | python -m json.tool
The Log array contains the last five attempts with their stdout and stderr. This is the single most important debugging artifact, and most engineers never look at it. The output tells you whether the command ran, whether it found curl/wget, whether the port was reachable, and what the HTTP response was.
Minute 3 — Classify the failure mode. The ExitCode and Output from the log narrow it down to one of:
- Command not found: exec: "curl": executable file not found in $PATH. The image (alpine, distroless, scratch) does not ship the binary you assumed was there.
- Connection refused: the application has not bound the port yet, or it listens only on a specific interface (for example the container IP) while the check targets 127.0.0.1, or the reverse.
- HTTP non-2xx: the endpoint exists but returns 503 because a downstream dependency (database, cache, upstream service) is not ready.
- Timeout: the command ran but did not complete inside --timeout. This usually means the healthcheck endpoint does synchronous I/O (database query, external API call) on every hit.
- Permission denied: the container runs as a non-root user and that user cannot exec the healthcheck binary, or cannot bind the socket.
Each of these has a different fix. Raising --timeout only helps the timeout case, and only if you actually want the long latency.
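The classification above can be sketched as a small helper that reads one entry from the Log array. This is a rough illustration, not an exhaustive parser: the function name and category labels are hypothetical, and the substrings it matches are assumptions based on typical curl, shell, and Docker messages, so adjust them to the output you actually see.

```python
def classify_failure(exit_code: int, output: str) -> str:
    """Map one health check log entry to a rough failure category.

    The matched substrings are assumptions drawn from common curl, shell,
    and Docker messages; they are not a complete list.
    """
    text = output.lower()
    if exit_code == 0:
        return "healthy"
    if "exceeded timeout" in text:
        return "timeout"
    if "permission denied" in text:
        return "permission-denied"
    if "returned error" in text:
        # curl -f reports HTTP 4xx/5xx responses with this phrase.
        return "http-error"
    if "connection refused" in text:
        return "connection-refused"
    if "executable file not found" in text or ": not found" in text:
        return "command-not-found"
    return "unknown"
```

Feeding it the ExitCode and Output of the most recent log entry gives a starting point for picking the matching fix before touching any timing parameter.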
Minute 5 — The wrong path: chicken-and-egg healthcheck endpoints. A common antipattern is making /health query the database to “verify the app works end-to-end.” Then the container starts before the database is ready, the healthcheck fails, Docker marks the container unhealthy, and your orchestrator restarts it — over and over. Healthchecks should be liveness (is this process responding to HTTP?) not readiness (is the entire system functional?). Strip database calls out of /health and put them in a separate /ready endpoint that your load balancer consumes, not Docker.
Minute 7 — Run the healthcheck manually. Get a shell in the container and run the exact command Docker is running:
docker exec -it my-app sh
# Then run the literal healthcheck command from your Dockerfile
curl -f http://localhost:8080/health
echo "exit: $?"
If it succeeds inside the container but fails as a healthcheck, the difference is usually environment (your interactive shell may have variables or a PATH that the healthcheck process does not) or timing (the application is ready when you exec into the container, but not when the healthcheck runs at startup).
Minute 10 — Check --start-period, not --timeout. Java and .NET applications routinely take 30-60 seconds to boot. If --start-period is 0s (the default when you do not set it; the option itself was added in Docker 17.05), the first few failures during boot count toward the retry limit and the container goes unhealthy before it has even finished starting. Set --start-period=60s for slow-booting apps. Failures during the start period do not count against --retries.
Real root-cause distribution: roughly 30% wrong binary in image (curl missing on alpine/distroless), 25% bind-address mismatch (0.0.0.0 vs localhost), 20% healthcheck endpoint querying a not-yet-ready dependency, 15% missing --start-period for slow boot, 10% exec form syntax error in the Dockerfile. Walk the list in that order before changing timing parameters.
Fix 1: Check Health Check Command Syntax in Dockerfile
The most common cause is a syntax error in the HEALTHCHECK instruction. There are two valid forms:
Shell form (runs through /bin/sh -c):
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
Exec form (runs directly, no shell):
HEALTHCHECK CMD ["curl", "-f", "http://localhost:8080/health"]
A frequent mistake is mixing the two forms:
# Wrong - this passes the entire string as argv[0]
HEALTHCHECK CMD ["curl -f http://localhost:8080/health"]
Each argument must be a separate array element in exec form. If you need shell features like || or pipes, use the shell form or explicitly invoke the shell:
HEALTHCHECK CMD ["/bin/sh", "-c", "curl -f http://localhost:8080/health || exit 1"]
Also verify the endpoint path is correct. If your app exposes /healthz but your health check hits /health, it will get a 404 and fail.
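To see why the mixed form fails, it can help to model how the text after HEALTHCHECK CMD turns into an argument vector. The sketch below is a simplified model, not Docker's actual parser, and the function name is hypothetical: a valid JSON array of strings is used as argv directly, and anything else is handed to /bin/sh -c.

```python
import json


def healthcheck_argv(cmd_text: str) -> list[str]:
    """Return the argv a simplified model of Docker would run.

    Exec form: a JSON array of strings is used as-is.
    Shell form: everything else is passed to /bin/sh -c as one string.
    """
    stripped = cmd_text.strip()
    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list) and all(isinstance(a, str) for a in parsed):
            return parsed
    return ["/bin/sh", "-c", stripped]


# The mixed form yields a single element, so the "program name" is the
# whole command line and no such executable exists.
print(healthcheck_argv('["curl -f http://localhost:8080/health"]'))
```

With the mixed form, the returned list has one element, which is why the resulting error is an executable-not-found failure rather than an HTTP error.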
Check what health check is currently configured:
docker inspect --format='{{json .Config.Healthcheck}}' my-app | python -m json.tool
This shows the exact command, interval, timeout, and retry settings Docker is using.
Pro Tip: The -f flag on curl makes it return a non-zero exit code on HTTP errors (4xx, 5xx). Without it, curl exits 0 even on a 500 response, and your health check would pass when it should fail.
Fix 2: Fix Health Check Timing (interval, timeout, start-period, retries)
Your application might need time to start up. If the health check runs before the app is ready, it will fail during the startup window. Docker provides four timing parameters:
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1
Here is what each parameter controls:
- --interval: Time between health check attempts (default: 30s).
- --timeout: Maximum time for a single health check to complete (default: 30s). If the command does not finish within this time, it counts as a failure.
- --start-period: Grace period after container start during which failed checks do not count toward the retry limit (default: 0s). This is the one most people miss.
- --retries: Number of consecutive failures required to mark the container unhealthy (default: 3).
If your Java application takes 45 seconds to boot, set --start-period=60s to give it room. During the start period, health check failures are not counted. Only after the start period do failures begin counting toward the retry limit.
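The interaction between the start period and the retry count is easier to follow with a small simulation. This is a simplified model under stated assumptions, not Docker's implementation: each check is a hypothetical (seconds since start, exit code) pair, failures inside the start period are ignored until the first success, and the function name is made up for illustration.

```python
def simulate_health(checks, retries=3, start_period=0.0):
    """Return the status after each check in a simplified health model.

    checks: list of (elapsed_seconds, exit_code) tuples, in order.
    Failures during the start period are not counted, unless a check
    has already succeeded once.
    """
    status = "starting"
    streak = 0
    started = False
    history = []
    for elapsed, exit_code in checks:
        if exit_code == 0:
            streak = 0
            started = True
            status = "healthy"
        else:
            in_grace = (not started) and elapsed < start_period
            if not in_grace:
                streak += 1
                if streak >= retries:
                    status = "unhealthy"
        history.append(status)
    return history


# An app that needs about 45 seconds to boot, checked every 15 seconds.
boot = [(15, 1), (30, 1), (45, 1), (60, 0)]
print(simulate_health(boot, retries=3, start_period=0))
print(simulate_health(boot, retries=3, start_period=60))
```

Without a start period the third failure already marks the container unhealthy, even though the very next check succeeds; with a 60 second start period the same sequence never leaves starting until it turns healthy.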
A common mistake is setting --timeout too low. If your health endpoint queries a database or performs initialization on first call, it may take longer than expected:
# Too aggressive for a heavy app
HEALTHCHECK --interval=5s --timeout=2s --retries=1 CMD curl -f http://localhost:8080/health
# More forgiving
HEALTHCHECK --interval=15s --timeout=10s --start-period=30s --retries=3 CMD curl -f http://localhost:8080/health
Fix 3: Fix curl/wget Not Available in Container
Minimal base images often lack the tool your check calls: alpine ships BusyBox wget but not curl, and distroless and scratch images include neither. If your health check uses a tool that is not installed, the check fails immediately.
Check by running the command manually inside the container:
docker exec my-app which curl
If it returns nothing, you have a few options.
Option A: Install curl in the Dockerfile:
# For Alpine
RUN apk add --no-cache curl
# For Debian/Ubuntu
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
Option B: Use wget instead (pre-installed on Alpine):
HEALTHCHECK CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1
Alpine ships with BusyBox wget, so this works without installing anything extra.
Option C: Use a built-in language tool:
For Node.js apps, avoid adding curl entirely:
HEALTHCHECK CMD node -e "require('http').get('http://localhost:8080/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1) })"
For Python apps:
HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"
Option D: Write a tiny health check binary:
For distroless or scratch images, compile a small static binary (in Go, Rust, or C) and copy it into the image. This avoids installing a shell or any extra runtime.
This approach keeps your image small while still supporting health checks.
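For images that already contain a Python interpreter, the one-liner from Option C can be expanded into a small script with a timeout and a readable failure message. This is a minimal sketch: the check function name and the 5 second default timeout are illustrative choices, and it treats any 2xx response as healthy.

```python
import urllib.error
import urllib.request


def check(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    # An empty ProxyHandler makes this local probe ignore proxy variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # HTTP 4xx/5xx, refused connections, and timeouts all land here.
        # The message ends up in the Output field of the health log.
        print(f"health check failed: {exc}")
        return 1
```

To use it as the health check command, end the file with raise SystemExit(check("http://localhost:8080/health")) and point HEALTHCHECK CMD at the script; Docker only looks at the exit status.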
Fix 4: Fix Health Check Endpoint Not Ready
Sometimes the health check command is correct, but the endpoint it targets is not available. This differs from a timing issue because the endpoint might never become available due to a code or configuration problem.
Check whether the endpoint responds from inside the container:
docker exec my-app curl -v http://localhost:8080/health
If you get Connection refused, the application is either not running, not listening on that port, or listening on a different interface (see Fix 8).
If you get a non-200 response, check the application logs:
docker logs my-app
Common causes:
- The application crashed after starting but Docker has not restarted it yet.
- The health endpoint depends on a database connection that is not available. If your /health route queries the database, a database outage will make your container unhealthy. Consider splitting into a liveness check (is the process alive) and a readiness check (can it serve traffic). For Docker’s HEALTHCHECK, use the liveness check: something lightweight like returning 200 OK if the HTTP server is responding.
- The application listens on a different port internally than what you expect. Verify with:
docker exec my-app ss -tlnp
Or if ss is not available:
docker exec my-app netstat -tlnp
This shows exactly which ports have listeners inside the container.
Fix 5: Fix Docker Compose healthcheck Config
Docker Compose has its own healthcheck syntax that overrides any HEALTHCHECK in the Dockerfile. The YAML structure trips people up:
services:
  web:
    image: my-app
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      start_period: 60s
      retries: 3
Note these common mistakes:
Wrong: Using CMD-SHELL without a string:
# Wrong
test: ["CMD-SHELL", "curl", "-f", "http://localhost:8080/health"]
# Right - CMD-SHELL takes a single string
test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
CMD-SHELL passes the second element as a single shell command string. CMD passes each element as a separate argument.
Easy to miss: The plain string short form is valid and runs through the shell:
# This works - short form uses CMD-SHELL implicitly
test: curl -f http://localhost:8080/health || exit 1
Wrong: Indentation or field name errors:
# Wrong - Compose field names use underscores, unlike the hyphenated CLI flags
healthcheck:
  start-period: 60s
# Right
healthcheck:
  start_period: 60s
To disable a health check inherited from the base image:
healthcheck:
  disable: true
If you are migrating from a Compose build that failed, double-check that your rebuilt image still has the correct health check configuration.
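The mistakes above are mechanical enough to catch with a small lint pass over a healthcheck block that has already been parsed into a dictionary. The sketch below is illustrative only: lint_healthcheck is a hypothetical helper, and the set of accepted keys is an assumption based on the Compose specification, so verify it against the Compose version you run.

```python
# Keys assumed valid in a Compose healthcheck block (check your version).
VALID_KEYS = {
    "test", "interval", "timeout", "retries",
    "start_period", "start_interval", "disable",
}


def lint_healthcheck(cfg: dict) -> list[str]:
    """Return a list of likely mistakes in a parsed healthcheck mapping."""
    problems = []
    for key in cfg:
        if key in VALID_KEYS:
            continue
        hint = key.replace("-", "_")
        if hint in VALID_KEYS:
            problems.append(f"unknown key '{key}', did you mean '{hint}'?")
        else:
            problems.append(f"unknown key '{key}'")

    test = cfg.get("test")
    if isinstance(test, list) and test:
        if test[0] == "CMD-SHELL" and len(test) != 2:
            problems.append("CMD-SHELL expects exactly one command string")
        elif test[0] == "CMD" and len(test) < 2:
            problems.append("CMD needs at least one argument")
        elif test[0] not in ("CMD", "CMD-SHELL", "NONE"):
            problems.append("list form must start with CMD, CMD-SHELL, or NONE")
    return problems


print(lint_healthcheck({
    "test": ["CMD-SHELL", "curl", "-f", "http://localhost:8080/health"],
    "start-period": "60s",
}))
```

A missing test key is deliberately not flagged, since a Compose healthcheck block may override only the timing of a check defined in the image.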
Fix 6: Fix depends_on with condition: service_healthy
Docker Compose lets you delay starting a service until its dependency is healthy:
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
  web:
    image: my-app
    depends_on:
      db:
        condition: service_healthy
If db never becomes healthy, web will never start, and you will see:
dependency failed to start: container db is unhealthy
Debug this by checking the dependency container first:
docker compose ps
docker inspect --format='{{json .State.Health}}' project-db-1 | python -m json.tool
Common issues with depends_on health chains:
- The dependency container’s health check is misconfigured (apply Fixes 1-5 to that container).
- The dependency container has no health check defined at all. In that case condition: service_healthy can never be satisfied: recent Compose releases fail with a "has no healthcheck configured" error, and older ones may wait indefinitely.
- Circular dependencies where service A depends on B and B depends on A.
If your dependency exits unexpectedly, the container might be running into OOM kills before it can ever become healthy. Look at the container’s exit code with docker inspect --format='{{.State.ExitCode}}' container_name: 137 means SIGKILL (most often the kernel OOM killer; confirm with .State.OOMKilled), 139 is a segfault, and 143 is a SIGTERM shutdown.
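The exit codes mentioned here follow the shell convention of 128 plus the signal number, which a short helper can decode. This is a small sketch with a hypothetical function name; it assumes a Unix-like host, where Python's signal module knows the standard signal numbers.

```python
import signal


def describe_exit(code: int) -> str:
    """Translate a container exit code into a short description."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {number}"
        return f"killed by {name}"
    return f"application error (exit status {code})"


for code in (137, 139, 143):
    print(code, describe_exit(code))
```

A SIGKILL result alone does not prove an out-of-memory kill, so it is worth pairing this with the OOMKilled field from docker inspect.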
Common Mistake: Defining depends_on with condition: service_healthy but forgetting to add a healthcheck block to the dependency service. Without a health check, Docker Compose has no way to determine if the service is healthy, so the dependent service never starts.
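Both this mistake and circular dependencies can be spotted before running anything by walking the services mapping of a parsed Compose file. The following is a sketch under assumptions: find_dependency_problems is a hypothetical helper, it expects the services section as a plain dictionary, and it only sees healthchecks declared in the Compose file, not ones baked into an image.

```python
def find_dependency_problems(services: dict) -> list[str]:
    """Report service_healthy waits on services without a Compose
    healthcheck, plus circular depends_on chains."""
    problems = []

    def deps_of(name):
        # depends_on may be a list (short form) or a mapping (long form).
        return list(services.get(name, {}).get("depends_on", []))

    for name, svc in services.items():
        deps = svc.get("depends_on", {})
        if not isinstance(deps, dict):
            continue  # short form carries no condition
        for dep, opts in deps.items():
            if (opts or {}).get("condition") != "service_healthy":
                continue
            target = services.get(dep)
            if target is None:
                problems.append(f"{name}: depends on undefined service '{dep}'")
            elif "healthcheck" not in target:
                problems.append(
                    f"{name}: waits for '{dep}' to be healthy, but '{dep}' "
                    "defines no healthcheck in the Compose file"
                )

    state = {}  # service name -> "visiting" or "done"

    def visit(node, path):
        state[node] = "visiting"
        for dep in deps_of(node):
            if dep not in services:
                continue
            if state.get(dep) == "visiting":
                cycle = path[path.index(dep):] + [dep]
                problems.append("circular dependency: " + " -> ".join(cycle))
            elif dep not in state:
                visit(dep, path + [dep])
        state[node] = "done"

    for name in services:
        if name not in state:
            visit(name, [name])
    return problems
```

Loading the file with a YAML parser of your choice and passing its services mapping to this function is enough to run the check in a pre-commit hook or CI step.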
Fix 7: Debug with docker inspect --format health
When the previous fixes do not resolve the issue, you need detailed diagnostic information. Docker stores the last five health check results, including stdout and stderr from each attempt.
Get the full health check state:
docker inspect --format='{{json .State.Health}}' my-app | python -m json.tool
This returns something like:
{
  "Status": "unhealthy",
  "FailingStreak": 5,
  "Log": [
    {
      "Start": "2026-03-10T10:00:00.000000000Z",
      "End": "2026-03-10T10:00:01.500000000Z",
      "ExitCode": 1,
      "Output": "curl: (7) Failed to connect to localhost port 8080: Connection refused\n"
    }
  ]
}
Key fields to examine:
- FailingStreak: How many consecutive failures have occurred. Once it reaches your retry count, the container is unhealthy.
- ExitCode: 0 means success, 1 means failure, 2 means reserved (do not use exit code 2 in your health check commands).
- Output: The stdout/stderr from the health check command. This is where you find the actual error.
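When the same inspection has to happen across many containers, the JSON can be condensed programmatically. The helper below is a minimal sketch with a hypothetical name; it takes the text printed by the docker inspect command above and assumes the most recent attempt is the last entry of Log.

```python
import json


def summarize_health(raw_json: str) -> dict:
    """Condense the .State.Health JSON into the fields worth reading first."""
    health = json.loads(raw_json)
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }


sample = """
{
  "Status": "unhealthy",
  "FailingStreak": 5,
  "Log": [
    {
      "Start": "2026-03-10T10:00:00.000000000Z",
      "End": "2026-03-10T10:00:01.500000000Z",
      "ExitCode": 1,
      "Output": "curl: (7) Failed to connect to localhost port 8080: Connection refused\\n"
    }
  ]
}
"""
print(summarize_health(sample))
```

The sample input is the example output shown above, so the summary reports the connection-refused message as the last output.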
If the output is empty, the health check command might be failing to execute at all. Check that the binary exists and has execute permissions:
docker exec my-app ls -la /usr/bin/curl
If you see permission denied errors, the container might be running as a non-root user without access to the health check binary. Either fix the permissions or switch to a tool available to that user.
You can also watch health check results in real time using Docker events:
docker events --filter container=my-app --filter event=health_status
This streams health status changes as they happen, which is useful when you are adjusting timing parameters and want to see the effect immediately.
Fix 8: Fix Network Issues in Health Checks (localhost vs 0.0.0.0)
A subtle but common issue: your application binds to one address and the health check targets another. Inside a container, localhost resolves to 127.0.0.1. If your application only listens on the container’s assigned IP (not the loopback interface), localhost will not reach it.
Check what address your application binds to:
docker exec my-app ss -tlnp
You might see:
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:8080        0.0.0.0:*          users:(("node",pid=1,fd=3))
If Local Address shows 0.0.0.0:8080, then localhost:8080 will work because 0.0.0.0 includes all interfaces.
But if it shows 172.17.0.2:8080 (a specific container IP), then localhost:8080 will fail because the app is not listening on 127.0.0.1.
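The rule for reading the Local Address column can be written down as a short function. This is a sketch with a hypothetical name: it takes one Local Address:Port value as printed by ss and reports whether a check against the loopback interface could reach that listener. For IPv6 entries the answer also depends on whether localhost resolves to ::1 or 127.0.0.1 in your image, which this sketch does not examine.

```python
import ipaddress


def reachable_via_loopback(local_address: str) -> bool:
    """True if a listener on this ss 'Local Address:Port' value accepts
    connections arriving on the loopback interface."""
    host, _, _port = local_address.rpartition(":")
    # Drop IPv6 brackets and any interface scope such as %lo.
    host = host.strip("[]").split("%", 1)[0]
    if host in ("*", ""):
        return True  # wildcard listener
    addr = ipaddress.ip_address(host)
    return addr.is_unspecified or addr.is_loopback


for value in ("0.0.0.0:8080", "127.0.0.1:8080", "172.17.0.2:8080"):
    print(value, reachable_via_loopback(value))
```

A False result means the health check must target the address the application actually binds, or the application should bind to all interfaces instead.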
Fix for common frameworks:
Node.js/Express:
// Wrong - loopback only, unreachable through the container IP or published ports
app.listen(8080, '127.0.0.1');
// Right - binds to all interfaces
app.listen(8080, '0.0.0.0');
Python/Flask:
# Wrong - Flask's default host is 127.0.0.1 (loopback only)
app.run(port=8080)
# Right - binds to all interfaces
app.run(host='0.0.0.0', port=8080)
Go:
// Wrong - loopback only
http.ListenAndServe("localhost:8080", handler)
// Right - binds to all interfaces
http.ListenAndServe(":8080", handler)
Another network issue arises with health checks that call external services. If your health check hits an external URL, it depends on container DNS resolution and outbound network access. Prefer checking local endpoints only. A health check should verify that this container is working, not that the internet is available.
If your container uses a custom Docker network and you reference other services by name, make sure the DNS resolution works inside the container:
docker exec my-app nslookup db
If DNS fails, the container might not be connected to the right network. Verify with:
docker network inspect my-network
Still Not Working?
If none of the fixes above resolved the issue, try these less obvious solutions:
Check for filesystem issues. Some health checks write to a file to signal readiness. If the container’s filesystem is read-only or a volume mount has wrong permissions, the check fails silently:
docker exec my-app touch /tmp/test-write
Check container resource limits. If the container is CPU-throttled or near its memory limit, the health check command itself may time out. Check resource usage:
docker stats my-app --no-stream
If the container is consistently at its memory limit, the kernel may be terminating it intermittently with exit code 137, which presents as repeated unhealthy transitions punctuated by restarts.
Inspect cgroup throttling. Even without OOM kills, CPU throttling can cause the health check process to stall beyond the timeout:
docker exec my-app cat /sys/fs/cgroup/cpu.stat
Look for a high nr_throttled value. On cgroup v1 hosts the file is at /sys/fs/cgroup/cpu/cpu.stat instead.
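Whether nr_throttled is high depends on how many scheduling periods have elapsed, so the ratio of the two counters is more telling than the raw number. The helper below is a minimal sketch with a hypothetical name; it parses the text of cpu.stat and assumes the nr_periods and nr_throttled fields are present, as they are when a CPU limit is set.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Return the fraction of CPU periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit configured, or no periods recorded yet
    return stats.get("nr_throttled", 0) / periods


# Illustrative values only, not taken from a real container.
sample = "usage_usec 1000\nnr_periods 200\nnr_throttled 50\nthrottled_usec 12345\n"
print(throttle_ratio(sample))
```

A ratio that stays well above zero while health checks time out suggests raising the CPU limit before raising --timeout.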
Try overriding the health check at runtime. To test whether the Dockerfile health check is the problem, override it at run time:
docker run --health-cmd="curl -f http://localhost:8080/health || exit 1" \
    --health-interval=10s \
    --health-timeout=5s \
    --health-start-period=30s \
    --health-retries=3 \
    my-app
Or disable it entirely to confirm the container works without it:
docker run --no-healthcheck my-app
Check Docker version compatibility. The --start-period flag was added in Docker 17.05. Older versions do not support it, so failures count toward the retry limit from the first check. Check your version:
docker version
Review the application’s graceful shutdown. If the container is being restarted by an orchestrator (like Kubernetes) due to being unhealthy, and the application does not shut down gracefully, it may leave stale PID files or lock files that prevent the next start from succeeding, creating a cycle of unhealthy restarts.
Use a dedicated health check script. Instead of inlining the check in the Dockerfile, create a script with better error handling:
#!/bin/sh
set -e
# Check if the HTTP server responds
curl -f http://localhost:8080/health > /dev/null 2>&1 || exit 1
# Optionally check other conditions
if [ ! -f /tmp/app-ready ]; then
  exit 1
fi
exit 0
Copy it into the image and reference it:
COPY healthcheck.sh /usr/local/bin/healthcheck.sh
RUN chmod +x /usr/local/bin/healthcheck.sh
HEALTHCHECK --interval=15s --timeout=10s --start-period=30s --retries=3 CMD /usr/local/bin/healthcheck.sh
This gives you a single place to add logging, check multiple conditions, and debug failures.
Verify Docker Desktop’s recent behavior change on macOS and Windows. Docker Desktop changed how it reports health status during container restart in 4.30+. A container that transitions through starting -> healthy -> exited (1) -> starting now sometimes shows as unhealthy for a few seconds during the second starting phase, even though it later becomes healthy. If your monitoring or depends_on check fires during that window, it sees a false unhealthy. Either add a settle delay or filter on a stable health window (3 consecutive checks) rather than the instantaneous status.
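Filtering on a stable window rather than the instantaneous status can be sketched in a few lines. The function name is hypothetical, and the input is assumed to be a list of status strings you have collected yourself, oldest first, for example by polling docker inspect.

```python
from typing import Optional


def stable_status(statuses: list, window: int = 3) -> Optional[str]:
    """Return the status only if the last `window` samples all agree."""
    if window < 1 or len(statuses) < window:
        return None
    recent = statuses[-window:]
    return recent[0] if len(set(recent)) == 1 else None


print(stable_status(["healthy", "unhealthy", "healthy", "healthy", "healthy"]))
print(stable_status(["healthy", "unhealthy", "healthy"]))
```

Alerting only when this returns unhealthy ignores a single transient sample at the cost of reacting a few check intervals later.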
Watch for healthcheck inheritance from base images. If your base image declares a HEALTHCHECK, that check runs even if you do not define one yourself. Combined with a custom CMD that overrides the base image’s startup command, the inherited healthcheck may target a service that is no longer running. Either set your own HEALTHCHECK to replace the inherited one, or explicitly disable it:
HEALTHCHECK NONE
To find out whether a base image declares one, run docker image inspect --format='{{json .Config.Healthcheck}}' on it; null means nothing is inherited. Docker Official Images such as nginx, redis, and postgres generally do not declare a HEALTHCHECK, so inherited checks are more likely to come from vendor or community images.
Confirm the healthcheck PID does not leak processes. A healthcheck spawned every 30 seconds that does not clean up its child processes leaks PIDs. After a few hours the container hits the PID limit and the next healthcheck fails with fork: retry: Resource temporarily unavailable. Inside the container, run ps -ef | wc -l over time. If the count grows monotonically, your healthcheck script is leaking. The common culprit is a shell-form healthcheck with a pipeline (curl ... | grep ...) where one side of the pipe dies and orphans the other.
Check whether HEALTHCHECK runs at all under BuildKit or Podman. BuildKit only records the HEALTHCHECK in the image; the check fires when a container runs, never during docker build. Podman builds images in the OCI format by default, and that format drops Dockerfile-level HEALTHCHECK instructions. Build with --format docker, pass --health-cmd to podman run, or use a Quadlet unit. If your container is healthy under Docker but unhealthy (or has no health status) under Podman, the runtime is the difference.
Inspect the cgroup version mismatch. On hosts that have migrated from cgroup v1 to v2 (Ubuntu 22.04+, RHEL 9+), older Docker versions (pre-20.10.17) sometimes report inaccurate CPU usage to healthcheck commands. The check itself succeeds, but docker stats shows artificially low numbers and you misdiagnose a CPU-throttled container as healthy. Upgrade Docker on hosts with cgroup v2.
Trace systemd or supervisord interactions. If your container runs a process supervisor (systemd, supervisord, s6, runit), the supervisor restarts crashed child processes but PID 1 stays alive. Docker sees a healthy container (PID 1 is up) while your application is crash-looping inside it. Healthcheck on the actual application port, not just on PID 1 liveness. For Kubernetes deployments this manifests differently — the orchestrator-side symptom is repeated pod restarts and a higher-level back-off behavior instead of a single container being marked unhealthy.
Confirm depends_on is wired up correctly across the stack. If your healthcheck is fine in isolation but a dependent service still refuses to start, the issue may be in how Compose evaluates the dependency graph. See Docker Compose depends_on not working for the Compose-side gotchas around condition: service_healthy and condition: service_started.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.