How do I fix "Docker container health status unhealthy"?

How to fix Docker container health check failing with unhealthy status, including HEALTHCHECK syntax, timing issues, missing curl/wget, endpoint problems, and Compose healthcheck configuration.

Fix: Docker container health status unhealthy

The Container That Refuses to Go Green

Personally, I think unhealthy is one of the most under-investigated Docker statuses because the obvious first move (bump the timeout) usually masks the real problem rather than solving it. The fix depends entirely on WHY the healthcheck is failing: missing binary, bind-address mismatch, slow startup, or a dependency that is not ready. I learned to read the Health.Log output before changing anything.

You run docker ps and see your container stuck in an unhealthy state:

CONTAINER ID   IMAGE       STATUS                     PORTS
a1b2c3d4e5f6   my-app      Up 2 minutes (unhealthy)   0.0.0.0:8080->8080/tcp

Or you check the container health explicitly:

docker inspect --format='{{.State.Health.Status}}' my-app

unhealthy

Docker Compose may refuse to start dependent services entirely:

dependency failed to start: container my-app is unhealthy

This means Docker ran the HEALTHCHECK instruction defined in your Dockerfile (or docker-compose.yml) and it failed more times than the allowed retry count. The container is running, but Docker considers it unfit to serve traffic or satisfy dependency conditions.

How Docker Decides a Container Is Unhealthy

Docker health checks are commands that Docker runs inside the container at regular intervals. An exit code of 0 counts as a success and marks the container healthy. An exit code of 1 counts as a failure (exit code 2 is reserved). After the configured number of consecutive failures, the container transitions from starting (or healthy) to unhealthy.

Several things can cause health check failures:

Wrong command syntax in the HEALTHCHECK instruction.
Timing issues where the health check runs before the application is ready.
Missing tools like curl or wget inside the container image.
Wrong endpoint or the application listening on a different address than what the health check targets.
Network misconfiguration where the health check uses localhost but the app binds to a specific interface.
Compose-specific configuration errors in the healthcheck section.
Dependency chains where depends_on with condition: service_healthy fails because an upstream service is itself unhealthy.

The key insight is that the health check runs inside the container’s filesystem and network namespace. What works from your host machine may not work inside the container.

Quick Reference Before You Dive In

If you arrived here from Google with a fresh unhealthy status, the five facts that resolve roughly 90 percent of cases:

Read docker inspect --format='{{json .State.Health}}' my-app FIRST. It contains the last 5 healthcheck attempts with stdout / stderr. Most engineers never look at it. The Docker HEALTHCHECK reference and the Compose healthcheck docs are the canonical sources.
Most Alpine / distroless images do NOT have curl. If exec: "curl": executable file not found appears in the log, use wget (Alpine ships it) or a tiny Node / Python one-liner.
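A bind-address mismatch is the second most common cause. The healthcheck runs inside the container, so a check against localhost only works if the app listens on the loopback interface or on all interfaces. If the app binds to one specific address (for example the container IP), or localhost resolves to ::1 while the app listens on IPv4 only, the check gets connection refused. Bind to 0.0.0.0 inside containers and, if in doubt, target 127.0.0.1 explicitly.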
Set --start-period=60s for slow-booting apps (Java, .NET, Rails). Failures during this window do not count toward retries. Default is 0 seconds.
Healthcheck endpoints should NOT call the database. Liveness (process responding to HTTP) is what Docker wants; readiness (full system functional) belongs at the load balancer.

The rest of this article walks through each cause in detail, plus the failure modes most other guides skip.

Diagnostic Timeline

The reflexive fix for unhealthy is to raise --timeout or --start-period until the container goes green. That works for thirty minutes and then breaks differently. Here is how to debug it properly.

Minute 0: Resist the timeout reflex. Bumping --timeout from 10s to 60s masks the real issue: either the healthcheck endpoint depends on something that is not ready yet, or the command itself is broken, or it is hitting the wrong address. A slow healthcheck on a healthy app is a misconfiguration, not a tuning problem.

Minute 1: Read the actual healthcheck output. Skip straight to:

docker inspect --format='{{json .State.Health}}' my-app | python -m json.tool

The Log array contains the last five attempts with their stdout and stderr. This is the single most important debugging artifact and most engineers never look at it. The output tells you whether the command ran, whether it found curl/wget, whether the port was reachable, and what the HTTP response was.

Minute 3: Classify the failure mode. The ExitCode and Output from the log narrow it down to one of:

Command not found: exec: "curl": executable file not found in $PATH. The image (alpine, distroless, scratch) does not ship the binary you assumed was there.
Connection refused: the application has not bound the port yet, or it is listening on an address the check does not target (a specific interface IP while the check uses localhost, or IPv4 only while localhost resolves to ::1).
HTTP non-2xx: the endpoint exists but returns 503 because a downstream dependency (database, cache, upstream service) is not ready.
Timeout: the command ran but did not complete inside --timeout. Usually means the healthcheck endpoint does synchronous I/O (database query, external API call) on every hit.
Permission denied: the container runs as a non-root user and that user cannot exec the healthcheck binary, or cannot bind the socket.

Each of these has a different fix. Raising --timeout only helps the timeout case, and only if you actually want the long latency.
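The classification above can be sketched as a small Python helper that takes the ExitCode and Output of one Health.Log entry. The function name and category labels are hypothetical, and the substrings it matches are assumptions based on typical curl and Docker messages; treat it as a triage heuristic, not an exhaustive parser.

```python
def classify_health_failure(exit_code: int, output: str) -> str:
    """Map one Health.Log entry to a coarse failure category (heuristic)."""
    text = output.lower()
    if exit_code == 0:
        return "ok"
    # The binary named in the healthcheck does not exist in the image.
    if "executable file not found" in text or "command not found" in text:
        return "command-not-found"
    # The container user may not exec the binary or open the socket.
    if "permission denied" in text:
        return "permission-denied"
    # Nothing is listening on the address/port the check targets.
    if "connection refused" in text:
        return "connection-refused"
    # The command did not finish inside --timeout.
    if "timed out" in text or "exceeded timeout" in text:
        return "timeout"
    # curl -f reports HTTP >= 400 as "The requested URL returned error: NNN".
    if "returned error" in text:
        return "http-error"
    return "unknown"
```

Anything that lands in "unknown" is a cue to read the raw Output yourself rather than to extend the timeout.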

Minute 5: The wrong path: chicken-and-egg healthcheck endpoints. A common antipattern is making /health query the database to “verify the app works end-to-end.” Then the container starts before the database is ready, the healthcheck fails, Docker marks the container unhealthy, and your orchestrator restarts it, over and over. Healthchecks should be liveness (is this process responding to HTTP?) not readiness (is the entire system functional?). Strip database calls out of /health and put them in a separate /ready endpoint that your load balancer consumes, not Docker.
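To make the liveness/readiness split concrete, here is a minimal sketch using only the Python standard library. The dependencies_ready flag and make_server helper are hypothetical names; a real application would set the flag once its database or cache connection is established.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flag, flipped by the app once its dependencies respond.
dependencies_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Liveness: answered from memory, no I/O, safe for Docker's HEALTHCHECK.
            self._reply(200, b"alive\n")
        elif self.path == "/ready":
            # Readiness: reflects dependencies; meant for a load balancer, not Docker.
            ok = dependencies_ready.is_set()
            self._reply(200 if ok else 503, b"ready\n" if ok else b"waiting\n")
        else:
            self._reply(404, b"not found\n")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep periodic health polling out of the request log


def make_server(port: int = 8080) -> HTTPServer:
    # Bind to all interfaces so checks inside and outside the container can connect.
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

With this split, a database outage turns /ready into a 503 while /health keeps returning 200, so Docker does not flag a process that is alive and merely waiting.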

Minute 7: Run the healthcheck manually. Get a shell in the container and run the exact command Docker is running:

docker exec -it my-app sh
# Then run the literal healthcheck command from your Dockerfile
curl -f http://localhost:8080/health
echo "exit: $?"

If it succeeds inside the container but fails as a healthcheck, the difference is usually environment (your interactive shell may have variables or a PATH that the healthcheck process does not) or timing (the application is ready when you exec into the container, but not when the healthcheck runs at startup).

Minute 10: Check --start-period, not --timeout. Java and .NET applications routinely take 30-60 seconds to boot. If --start-period is left at its default of 0s (the flag itself was added in Docker 17.05), the first few failures during boot count toward the retry limit and the container goes unhealthy before it has even finished starting. Set --start-period=60s for slow-booting apps. Failures during the start period do not count against --retries.

Real root-cause distribution: roughly 30% wrong binary in image (curl missing on alpine/distroless), 25% bind-address mismatch (0.0.0.0 vs localhost), 20% healthcheck endpoint querying a not-yet-ready dependency, 15% missing --start-period for slow boot, 10% exec form syntax error in the Dockerfile. Walk the list in that order before changing timing parameters.

When to Use Which Fix

The next eight sections cover the fixes in detail. The table below maps your situation to the recommended fix.

Your situation	Recommended fix	Why
Healthcheck syntax in Dockerfile looks wrong	Fix 1: verify shell vs exec form, check endpoint path	Wrong syntax = silent failure
App takes 30+ seconds to boot	Fix 2: tune `--start-period`, `--interval`, `--timeout`, `--retries`	Default 0s start-period is too short
`curl: not found` in healthcheck log	Fix 3: use `wget`, install `curl`, or use language built-in	Alpine / distroless ship without curl
Endpoint returns non-2xx	Fix 4: verify the app, separate liveness from readiness	Endpoint must exist and be light
Compose `healthcheck` not recognized	Fix 5: check `test:` array form, `CMD-SHELL` syntax	YAML structure trips many up
`depends_on: condition: service_healthy` hangs	Fix 6: ensure dependency has its own healthcheck	Compose cannot judge health without one
Need detailed diagnostics	Fix 7: `docker inspect` `.State.Health.Log` array	Last 5 attempts with output
Connection refused on localhost	Fix 8: bind app to `0.0.0.0`, not `127.0.0.1`	Container network namespace

If multiple rows apply, pick the topmost one.

Fix 1: Check Health Check Command Syntax in Dockerfile

The most common cause is a syntax error in the HEALTHCHECK instruction. There are two valid forms:

Shell form (runs through /bin/sh -c):

HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1

Exec form (runs directly, no shell):

HEALTHCHECK CMD ["curl", "-f", "http://localhost:8080/health"]

A frequent mistake is mixing the two forms:

# Wrong - this passes the entire string as argv[0]
HEALTHCHECK CMD ["curl -f http://localhost:8080/health"]

Each argument must be a separate array element in exec form. If you need shell features like || or pipes, use the shell form or explicitly invoke the shell:

HEALTHCHECK CMD ["/bin/sh", "-c", "curl -f http://localhost:8080/health || exit 1"]
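The difference between the two forms is easier to see as the argument vector that ends up being executed. The sketch below is a simplified Python model (healthcheck_argv is a hypothetical helper, not part of Docker): text that parses as a JSON array of strings is used as argv directly, and anything else is wrapped in /bin/sh -c, which is the default shell on Linux images.

```python
import json


def healthcheck_argv(cmd_text: str) -> list[str]:
    """Return the argv that would run for the text after `HEALTHCHECK CMD`."""
    stripped = cmd_text.strip()
    if stripped.startswith("["):
        try:
            parsed = json.loads(stripped)
        except json.JSONDecodeError:
            parsed = None  # not valid JSON, so it is treated as shell form
        if isinstance(parsed, list) and all(isinstance(a, str) for a in parsed):
            return parsed  # exec form: each element is one argument
    # Shell form: the whole string is handed to the shell.
    return ["/bin/sh", "-c", stripped]


# The mixed-up form yields a single element, so Docker would look for a
# binary literally named "curl -f http://localhost:8080/health".
print(healthcheck_argv('["curl -f http://localhost:8080/health"]'))
```

Seen this way, the "wrong" example fails for the same reason a missing binary does: no executable with that name exists in the image.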

Also verify the endpoint path is correct. If your app exposes /healthz but your health check hits /health, it will get a 404 and fail.

Check what health check is currently configured:

docker inspect --format='{{json .Config.Healthcheck}}' my-app | python -m json.tool

This shows the exact command, interval, timeout, and retry settings Docker is using.

A specific flag I always remember when writing healthchecks: curl -f. Without -f, curl exits 0 even on a 500 response, and your healthcheck would pass when the app is actually broken. With -f, curl returns non-zero on any HTTP error (4xx or 5xx) which is what Docker needs to mark the container unhealthy.

Fix 2: Fix Health Check Timing (interval, timeout, start-period, retries)

Your application might need time to start up. If the health check runs before the app is ready, it will fail during the startup window. Docker provides four timing parameters:

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

Here is what each parameter controls:

--interval: Time between health check attempts (default: 30s).
--timeout: Maximum time for a single health check to complete (default: 30s). If the command does not finish within this time, it counts as a failure.
--start-period: Grace period after container start during which failed checks do not count toward the retry limit (default: 0s). This is the one most people miss.
--retries: Number of consecutive failures required to mark the container unhealthy (default: 3).

If your Java application takes 45 seconds to boot, set --start-period=60s to give it room. During the start period, health check failures are not counted. Only after the start period do failures begin counting toward the retry limit.
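The interaction between --start-period and --retries can be checked with a small simulation. This is a simplified Python model under stated assumptions, not Docker's implementation: the first probe is assumed to run one interval after container start, timeouts are ignored, and simulate_health is a hypothetical name.

```python
def simulate_health(results, interval=30.0, start_period=0.0, retries=3):
    """Replay probe results (True = exit code 0) and return (time, status) pairs."""
    status, streak, timeline = "starting", 0, []
    for n, ok in enumerate(results, start=1):
        t = n * interval  # assumption: probes run at interval, 2*interval, ...
        if ok:
            status, streak = "healthy", 0
        else:
            # Failures are forgiven only while the container has never been
            # healthy AND the probe started inside the start period.
            in_grace = status == "starting" and t < start_period
            if not in_grace:
                streak += 1
                if streak >= retries:
                    status = "unhealthy"
        timeline.append((t, status))
    return timeline


# An app that needs about 45s to boot, probed every 15s.
boot = [False, False, False, True]
print(simulate_health(boot, interval=15, start_period=0, retries=3))
print(simulate_health(boot, interval=15, start_period=60, retries=3))
```

In the first run the third failure at 45s marks the container unhealthy before the app has finished booting; in the second run the same failures fall inside the start period and the container goes straight from starting to healthy.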

A common mistake is setting --timeout too low. If your health endpoint queries a database or performs initialization on first call, it may take longer than expected:

# Too aggressive for a heavy app
HEALTHCHECK --interval=5s --timeout=2s --retries=1 CMD curl -f http://localhost:8080/health

# More forgiving
HEALTHCHECK --interval=15s --timeout=10s --start-period=30s --retries=3 CMD curl -f http://localhost:8080/health

Fix 3: Fix curl/wget Not Available in Container

Minimal base images often lack the tools your health check assumes. alpine ships BusyBox wget but not curl; distroless and scratch include neither (and no shell). If your health check uses a tool that is not installed, the check fails immediately.

Check by running the command manually inside the container:

docker exec my-app which curl

If it returns nothing, you have a few options.

Option A: Install curl in the Dockerfile:

# For Alpine
RUN apk add --no-cache curl

# For Debian/Ubuntu
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

Option B: Use wget instead (pre-installed on Alpine):

HEALTHCHECK CMD wget --no-verbose --tries=1 --spider http://localhost:8080/health || exit 1

Alpine ships with BusyBox wget, so this works without installing anything extra.

Option C: Use a built-in language tool:

For Node.js apps, avoid adding curl entirely:

HEALTHCHECK CMD node -e "require('http').get('http://localhost:8080/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

For Python apps:

HEALTHCHECK CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"
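If the one-liner grows, a short script is easier to read and gives you a timeout and a useful message in Health.Log. The following is a minimal sketch using only the standard library; the check function name and the 5-second default are assumptions, not anything Docker prescribes.

```python
import sys
import urllib.error
import urllib.request


def check(url: str, timeout: float = 5.0) -> int:
    """Return 0 if `url` answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except urllib.error.HTTPError as exc:
        # 4xx / 5xx responses; the message ends up in Health.Log Output.
        print(f"healthcheck: HTTP {exc.code}", file=sys.stderr)
        return 1
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, or timeout.
        print(f"healthcheck: {exc}", file=sys.stderr)
        return 1


# Run as: python healthcheck.py http://127.0.0.1:8080/health
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(check(sys.argv[1]))
```

Assuming the script is copied to /usr/local/bin/healthcheck.py (a hypothetical path), the instruction becomes HEALTHCHECK CMD ["python", "/usr/local/bin/healthcheck.py", "http://127.0.0.1:8080/health"].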

Option D: Write a tiny health check binary:

For distroless or scratch images, compile a small static binary (in Go, Rust, or C) and copy it into the image. This avoids installing a shell or any extra runtime.

This approach keeps your image small while still supporting health checks.

Fix 4: Fix Health Check Endpoint Not Ready

Sometimes the health check command is correct, but the endpoint it targets is not available. This differs from a timing issue because the endpoint might never become available due to a code or configuration problem.

Check whether the endpoint responds from inside the container:

docker exec my-app curl -v http://localhost:8080/health

If you get Connection refused, the application is either not running, not listening on that port, or listening on a different interface (see Fix 8).

If you get a non-200 response, check the application logs:

docker logs my-app

Common causes:

The application crashed after starting but Docker has not restarted it yet.
The health endpoint depends on a database connection that is not available. If your /health route queries the database, a database outage will make your container unhealthy. Consider splitting into a liveness check (is the process alive) and a readiness check (can it serve traffic). For Docker’s HEALTHCHECK, use the liveness check (something lightweight like returning 200 OK if the HTTP server is responding).
The application listens on a different port internally than what you expect. Verify with:

docker exec my-app ss -tlnp

Or if ss is not available:

docker exec my-app netstat -tlnp

This shows exactly which ports have listeners inside the container.

Fix 5: Fix Docker Compose healthcheck Config

Docker Compose has its own healthcheck syntax that overrides any HEALTHCHECK in the Dockerfile. The YAML structure trips people up:

services:
  web:
    image: my-app
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      start_period: 60s
      retries: 3

Note these common mistakes:

Wrong: Using CMD-SHELL without a string:

# Wrong
test: ["CMD-SHELL", "curl", "-f", "http://localhost:8080/health"]

# Right - CMD-SHELL takes a single string
test: ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]

CMD-SHELL passes the second element as a single shell command string. CMD passes each element as a separate argument.
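The rules for the test value are mechanical enough to lint. Below is a hedged Python sketch that works on the already-parsed YAML value (a string or a list); validate_compose_test is a hypothetical helper and covers only the rules described here.

```python
def validate_compose_test(test) -> list[str]:
    """Return a list of problems found in a Compose `healthcheck.test` value."""
    if isinstance(test, str):
        # A plain string is run through the shell, like CMD-SHELL.
        return [] if test.strip() else ["empty command string"]
    if not isinstance(test, list) or not test:
        return ["test must be a string or a non-empty list"]
    if not all(isinstance(item, str) for item in test):
        return ["every list element must be a string"]
    kind, args = test[0], test[1:]
    if kind == "NONE":
        return [] if not args else ["NONE takes no arguments"]
    if kind == "CMD":
        return [] if args else ["CMD needs at least one argument"]
    if kind == "CMD-SHELL":
        return [] if len(args) == 1 else ["CMD-SHELL takes exactly one command string"]
    return [f"first element must be NONE, CMD or CMD-SHELL, got {kind!r}"]


print(validate_compose_test(["CMD-SHELL", "curl", "-f", "http://localhost:8080/health"]))
```

Running it against the "wrong" example above reports that CMD-SHELL takes exactly one command string, which is the mistake to look for when a Compose healthcheck behaves differently from the Dockerfile one.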

Short form: a plain string is run through the shell, exactly like CMD-SHELL:

# This works - short form uses CMD-SHELL implicitly
test: curl -f http://localhost:8080/health || exit 1

Wrong: Indentation or field name errors:

# Wrong - hyphenated name copied from the Dockerfile/CLI flag
healthcheck:
  start-period: 60s   # Not a valid Compose key

# Right - Compose keys use underscores
healthcheck:
  start_period: 60s

To disable a health check inherited from the base image:

healthcheck:
  disable: true

If you are rebuilding after a failed Compose build, double-check that the rebuilt image still has the correct health check configuration.

Fix 6: Fix depends_on with condition: service_healthy

Docker Compose lets you delay starting a service until its dependency is healthy:

services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  web:
    image: my-app
    depends_on:
      db:
        condition: service_healthy

If db never becomes healthy, web will never start, and you will see:

dependency failed to start: container db is unhealthy

Debug this by checking the dependency container first:

docker compose ps
docker inspect --format='{{json .State.Health}}' project-db-1 | python -m json.tool

Common issues with depends_on health chains:

The dependency container’s health check is misconfigured (apply Fixes 1-5 to that container).
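The dependency container has no health check defined at all. In that case condition: service_healthy can never be satisfied: recent Compose v2 releases fail with an error saying the container has no healthcheck configured, while older versions may wait indefinitely.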
Circular dependencies where service A depends on B and B depends on A.
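The last two issues can be caught before running anything. The sketch below assumes you have already loaded the Compose file into a dict that maps service names to their definitions; find_dependency_problems is a hypothetical helper. It only sees healthchecks declared in the Compose file, so a dependency that inherits a HEALTHCHECK from its image will be reported even though it would work.

```python
def find_dependency_problems(services: dict) -> list[str]:
    """Report service_healthy conditions without a healthcheck, and cycles."""
    problems = []

    def deps_of(name):
        deps = services.get(name, {}).get("depends_on") or {}
        if isinstance(deps, list):  # short form carries no condition
            deps = {d: {"condition": "service_started"} for d in deps}
        return deps

    for name in services:
        for dep, opts in deps_of(name).items():
            if dep not in services:
                problems.append(f"{name}: depends on undefined service {dep}")
            elif (opts or {}).get("condition") == "service_healthy" \
                    and "healthcheck" not in services[dep]:
                problems.append(
                    f"{name}: waits for {dep} to be healthy, but {dep} has no healthcheck")

    visiting, done = set(), set()

    def visit(name, path):
        # Depth-first walk; meeting a node that is still "visiting" is a cycle.
        if name in done:
            return
        if name in visiting:
            cycle = path[path.index(name):] + [name]
            problems.append("cycle: " + " -> ".join(cycle))
            return
        visiting.add(name)
        for dep in deps_of(name):
            if dep in services:
                visit(dep, path + [name])
        visiting.discard(name)
        done.add(name)

    for name in services:
        visit(name, [])
    return problems


stack = {
    "db": {"image": "postgres:16"},
    "web": {"image": "my-app",
            "depends_on": {"db": {"condition": "service_healthy"}}},
}
print(find_dependency_problems(stack))
```

For the sample stack it reports that web waits for db to be healthy while db defines no healthcheck, which is the misconfiguration described in the list above.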

If your dependency exits unexpectedly, the container might be running into OOM kills before it can ever become healthy. Look at the container’s exit code with docker inspect --format='{{.State.ExitCode}}' container_name; 137 means SIGKILL (most often the kernel OOM killer, which docker inspect --format='{{.State.OOMKilled}}' container_name confirms), 139 is a segfault, 143 is a clean SIGTERM.
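Those numbers follow the shell convention of 128 plus the signal number, so they can be decoded rather than memorized. A small Python sketch (describe_exit_code is a hypothetical helper; signal numbers are those of the platform it runs on, which match Linux for the three codes above):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short human-readable cause."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            return f"exit code {code} (no matching signal)"
        return f"killed by {name} (signal {code - 128})"
    return f"application error (exit code {code})"


for code in (137, 139, 143):
    print(code, describe_exit_code(code))
```

A SIGKILL result still needs the OOMKilled field to tell an out-of-memory kill apart from a manual docker kill or a stop that ran past its grace period.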

A specific mistake I have personally triggered: defining depends_on: condition: service_healthy on a service whose healthcheck block I forgot to write. The dependent service never starts (it hangs or errors out, depending on the Compose version) because Compose has no way to know if the dependency is healthy. Either add a healthcheck to the dependency or use condition: service_started if you only need the container to be running.

Fix 7: Debug with docker inspect --format health

When the previous fixes do not resolve the issue, you need detailed diagnostic information. Docker stores the last five health check results, including stdout and stderr from each attempt.

Get the full health check state:

docker inspect --format='{{json .State.Health}}' my-app | python -m json.tool

This returns something like:

{
    "Status": "unhealthy",
    "FailingStreak": 5,
    "Log": [
        {
            "Start": "2026-03-10T10:00:00.000000000Z",
            "End": "2026-03-10T10:00:01.500000000Z",
            "ExitCode": 1,
            "Output": "curl: (7) Failed to connect to localhost port 8080: Connection refused\n"
        }
    ]
}

Key fields to examine:

FailingStreak: How many consecutive failures have occurred. Once it reaches your retry count, the container is marked unhealthy.
ExitCode: 0 means success, 1 means failure, 2 means reserved (do not use exit code 2 in your health check commands).
Output: The stdout/stderr from the health check command. This is where you find the actual error.
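When you check many containers, a one-line-per-attempt summary is quicker to scan than the raw JSON. This Python sketch assumes the JSON shape shown above; summarize_health and inspect_health are hypothetical helper names, and inspect_health simply shells out to the same docker inspect command.

```python
import json
import subprocess


def inspect_health(container: str) -> str:
    """Return the raw .State.Health JSON for a container (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "inspect", "--format={{json .State.Health}}", container],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def summarize_health(inspect_json: str) -> list[str]:
    """Condense the Health object into one header line plus one line per attempt."""
    health = json.loads(inspect_json)
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for entry in health.get("Log") or []:  # Log can be null before the first probe
        output = (entry.get("Output") or "").strip() or "<no output>"
        lines.append(f"{entry['Start']} exit={entry['ExitCode']} {output}")
    return lines
```

Calling summarize_health(inspect_health("my-app")) prints the status, the failing streak, and the exit code and output of each stored attempt.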

If the output is empty, the health check command might be failing to execute at all. Check that the binary exists and has execute permissions:

docker exec my-app ls -la /usr/bin/curl

If you see permission denied errors, the container might be running as a non-root user without access to the health check binary. Either fix the permissions or switch to a tool available to that user.

You can also watch health check results in real time using Docker events:

docker events --filter container=my-app --filter event=health_status

This streams health status changes as they happen, which is useful when you are adjusting timing parameters and want to see the effect immediately.

Fix 8: Fix Network Issues in Health Checks (localhost vs 0.0.0.0)

A subtle but common issue: the health check targets an address the application is not listening on. Inside a container, localhost resolves to 127.0.0.1 (and, in some images, to ::1 first). If your application only listens on the container’s assigned IP (not the loopback interface), or only on IPv4 while localhost resolves to ::1, localhost will not reach it.

Check what address your application binds to:

docker exec my-app ss -tlnp

You might see:

State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:8080        0.0.0.0:*          users:(("node",pid=1,fd=3))

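If Local Address shows 0.0.0.0:8080, then 127.0.0.1:8080 will work because 0.0.0.0 covers every IPv4 interface, loopback included. If localhost resolves to ::1 in your image, use 127.0.0.1 in the check instead of localhost.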

But if it shows 172.17.0.2:8080 (a specific container IP), then localhost:8080 will fail because the app is not listening on 127.0.0.1.
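The rule being applied to the ss output can be written down in a few lines. This is a simplified Python model with a hypothetical reachable_from helper; it ignores dual-stack sockets (a listener on :: that also accepts IPv4), so treat an IPv6 wildcard as a case to verify by hand.

```python
import ipaddress


def reachable_from(listen_addr: str, target_addr: str) -> bool:
    """Would a connection to target_addr reach a socket bound to listen_addr?"""
    listen = ipaddress.ip_address(listen_addr)
    target = ipaddress.ip_address(target_addr)
    if listen.version != target.version:
        return False  # simplification: dual-stack listeners are not modelled
    if listen.is_unspecified:
        return True  # 0.0.0.0 or :: accepts connections on every local address
    return listen == target  # a specific bind only accepts that exact address


print(reachable_from("0.0.0.0", "127.0.0.1"))     # wildcard bind, loopback check
print(reachable_from("172.17.0.2", "127.0.0.1"))  # container-IP bind, loopback check
```

The second call models the failing case from the ss output: the app is bound to the container IP, so a check against 127.0.0.1 is refused.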

Fix for common frameworks:

Node.js/Express:

// Wrong - loopback only: published ports and other containers cannot reach it
app.listen(8080, '127.0.0.1');

// Right - binds to all interfaces
app.listen(8080, '0.0.0.0');

Python/Flask:

# Wrong
app.run(port=8080)

# Right
app.run(host='0.0.0.0', port=8080)

Go:

// Wrong
http.ListenAndServe("localhost:8080", handler)

// Right
http.ListenAndServe(":8080", handler)

Another network issue arises with health checks that call external services. If your health check hits an external URL, it depends on container DNS resolution and outbound network access. Prefer checking local endpoints only. A health check should verify that this container is working, not that the internet is available.

If your container uses a custom Docker network and you reference other services by name, make sure the DNS resolution works inside the container:

docker exec my-app nslookup db

If DNS fails, the container might not be connected to the right network. Verify with:

docker network inspect my-network

Stranger Causes I Have Tracked Down

If none of the fixes above resolved the issue, try these less obvious solutions:

Check for filesystem issues. Some health checks write to a file to signal readiness. If the container’s filesystem is read-only or a volume mount has wrong permissions, the check fails silently:

docker exec my-app touch /tmp/test-write

Check container resource limits. If the container is CPU-throttled or near its memory limit, the health check command itself may time out. Check resource usage:

docker stats my-app --no-stream

If the container is consistently at its memory limit, the kernel may be terminating it intermittently with exit code 137, which presents as repeated unhealthy transitions punctuated by restarts.

Inspect cgroup throttling. Even without OOM kills, CPU throttling can cause the health check process to stall beyond the timeout:

docker exec my-app cat /sys/fs/cgroup/cpu.stat

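Look for a high nr_throttled value relative to nr_periods. The path above is for cgroup v2; on cgroup v1 hosts the file is /sys/fs/cgroup/cpu/cpu.stat.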
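"High" is easier to judge as a ratio. The Python sketch below parses the text of cpu.stat and returns the share of scheduling periods in which the container was throttled; throttle_ratio is a hypothetical helper and the sample values are made up for illustration.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Return nr_throttled / nr_periods from the contents of cpu.stat."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    # No CPU limit configured means no periods are counted; report 0.0.
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Illustrative sample, not real measurements.
sample = """usage_usec 5000000
user_usec 4000000
system_usec 1000000
nr_periods 200
nr_throttled 50
throttled_usec 1200000
"""
print(throttle_ratio(sample))
```

A ratio that stays well above zero while health checks time out suggests the CPU limit, not the healthcheck timing, is what needs adjusting.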

Try overriding the health check at runtime. To test whether the Dockerfile health check is the problem, override it at run time:

docker run --health-cmd="curl -f http://localhost:8080/health || exit 1" \
  --health-interval=10s \
  --health-timeout=5s \
  --health-start-period=30s \
  --health-retries=3 \
  my-app

Or disable it entirely to confirm the container works without it:

docker run --no-healthcheck my-app

Check Docker version compatibility. The --start-period flag was added in Docker 17.05. If you are running an older version, the option is not available, and your health checks start counting failures immediately. Check your version:

docker version

Review the application’s graceful shutdown. If the container is being restarted by an orchestrator (like Kubernetes) due to being unhealthy, and the application does not shut down gracefully, it may leave stale PID files or lock files that prevent the next start from succeeding, creating a cycle of unhealthy restarts.

Use a dedicated health check script. Instead of inlining the check in the Dockerfile, create a script with better error handling:

#!/bin/sh
set -e

# Check if the HTTP server responds
curl -f http://localhost:8080/health > /dev/null 2>&1 || exit 1

# Optionally check other conditions
if [ ! -f /tmp/app-ready ]; then
  exit 1
fi

exit 0

Copy it into the image and reference it:

COPY healthcheck.sh /usr/local/bin/healthcheck.sh
RUN chmod +x /usr/local/bin/healthcheck.sh
HEALTHCHECK --interval=15s --timeout=10s --start-period=30s --retries=3 CMD /usr/local/bin/healthcheck.sh

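This gives you a single place to add logging and check multiple conditions. Because the script is copied into the image, changes still require a rebuild; bind-mount it over /usr/local/bin/healthcheck.sh during development if you want to iterate without rebuilding.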

Verify Docker Desktop’s recent behavior change on macOS and Windows. Docker Desktop changed how it reports health status during container restart in 4.30+. A container that transitions through starting -> healthy -> exited (1) -> starting now sometimes shows as unhealthy for a few seconds during the second starting phase, even though it later becomes healthy. If your monitoring or depends_on check fires during that window, it sees a false unhealthy. Either add a settle delay or filter on a stable health window (3 consecutive checks) rather than the instantaneous status.

Watch for healthcheck inheritance from base images. If the base image in your FROM line declares a HEALTHCHECK, that check runs even if you do not define one yourself. Combined with a custom CMD that overrides the base image’s startup command, the inherited healthcheck may target a service that is no longer running. Either set your own HEALTHCHECK to replace the inherited one, or explicitly disable it:

HEALTHCHECK NONE

This is more common than people realize with vendor and community images. The Docker Official Images for nginx, redis, and postgres generally do not declare one, so confirm what you actually inherited with docker inspect --format='{{json .Config.Healthcheck}}' on the image before assuming.

Confirm the healthcheck PID does not leak processes. A healthcheck spawned every 30 seconds that does not clean up its child processes leaks PIDs. After a few hours the container hits the PID limit and the next healthcheck fails with fork: retry: Resource temporarily unavailable. Inside the container, run ps -ef | wc -l over time. If the count grows monotonically, your healthcheck script is leaking. The common culprit is a shell-form healthcheck with a pipeline (curl ... | grep ...) where one side of the pipe dies and orphans the other.

Check whether HEALTHCHECK runs at all under BuildKit or Podman. HEALTHCHECK is image metadata: nothing executes it during docker build, and it only fires once a container is running. Podman and Buildah build images in OCI format by default, and that format has no HEALTHCHECK field, so the instruction is dropped unless you build with --format docker; alternatively, pass --health-cmd to podman run or use a Quadlet unit. If your container is healthy under Docker but unhealthy (or has no health status) under Podman, the runtime is the difference.

Inspect the cgroup version mismatch. On hosts that have migrated from cgroup v1 to v2 (Ubuntu 22.04+, RHEL 9+), older Docker versions (pre-20.10.17) sometimes report inaccurate CPU usage to healthcheck commands. The check itself succeeds, but docker stats shows artificially low numbers and you misdiagnose a CPU-throttled container as healthy. Upgrade Docker on hosts with cgroup v2.

Trace systemd or supervisord interactions. If your container runs a process supervisor (systemd, supervisord, s6, runit), the supervisor restarts crashed child processes but PID 1 stays alive. Docker sees a healthy container (PID 1 is up) while your application is crash-looping inside it. Healthcheck on the actual application port, not just on PID 1 liveness. For Kubernetes deployments this manifests differently: the orchestrator-side symptom is repeated pod restarts and a higher-level back-off behavior instead of a single container being marked unhealthy.

Confirm depends_on is wired up correctly across the stack. If your healthcheck is fine in isolation but a dependent service still refuses to start, the issue may be in how Compose evaluates the dependency graph. See Docker Compose depends_on not working for the Compose-side gotchas around condition: service_healthy and condition: service_started.

What Other Tutorials Get Wrong About Container Healthchecks

Most Docker tutorials list the same fixes but frame them in ways that produce subtle bugs.

They jump to raising --timeout. This masks the real problem: either the command is broken, the binary is missing, or the endpoint is hitting a not-ready dependency. A slow healthcheck on a healthy app is a misconfiguration, not a tuning opportunity.

They use curl without flagging the missing-binary issue. Alpine, distroless, and scratch images do not ship curl. Articles that show HEALTHCHECK CMD curl ... examples leave Alpine users with “command not found” errors. Use wget (Alpine has it), or a language built-in.

They miss --start-period. Java, Rails, .NET apps take 30 to 60 seconds to boot. Without --start-period, the first few failures count toward retries and the container goes unhealthy before it has finished starting. Articles that focus on --retries and --timeout miss this critical flag.

They confuse liveness with readiness. Docker’s HEALTHCHECK is for LIVENESS (is the process responding?). Readiness (is the whole system functional?) belongs at the load balancer. Articles that show /health querying the database mix the two and produce containers that flap.

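They miss bind-address mismatches. The healthcheck runs inside the container, so it must target an address the app actually listens on. An app bound to one specific interface, or an IPv4-only listener checked through a localhost that resolves to ::1, refuses the connection. Bind to 0.0.0.0 inside containers and check 127.0.0.1 explicitly.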

They miss inherited healthchecks from base images. A FROM line inherits any HEALTHCHECK the base image declares. If your CMD runs something else, the inherited check still fires against a service that no longer exists. Tutorials that focus on writing your own check miss the inheritance trap.

Frequently Asked Questions

What is the difference between liveness and readiness?

Liveness: is the process responding to HTTP at all? Readiness: is the full system (this process + dependencies) ready to serve traffic? Docker’s HEALTHCHECK is for liveness. Readiness belongs at the load balancer or orchestrator. Mixing them produces containers that flap during dependency outages.

Why does my healthcheck pass locally but fail in Docker?

Three common reasons. First, the check targets an address the app is not listening on inside the container (for example, the app binds to a specific interface, or localhost resolves to ::1 while the app listens on IPv4 only). Second, your local machine has curl installed but the container image (Alpine, distroless) does not. Third, your local check uses the host’s network namespace while the container has its own. Test the exact command inside the container with docker exec.

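What does exit code 137 mean?

Exit code 137 means the process was killed with SIGKILL (128 + 9). In containers the usual sender is the kernel OOM killer: the container exceeded its memory limit and was terminated. docker inspect --format='{{.State.OOMKilled}}' my-app confirms it. The healthcheck did not “fail”; the whole container died. Raise --memory or fix the leak; healthcheck tuning will not help.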

Should I write a custom healthcheck script?

For non-trivial checks, yes. A script file makes it easy to add logging and check multiple conditions in one place. Copy the script into the image and reference it in HEALTHCHECK CMD /usr/local/bin/healthcheck.sh.

What’s the difference between CMD and CMD-SHELL in Compose test:?

CMD passes each array element as a separate argv. CMD-SHELL passes the second element as a single shell command string (with ||, pipes, etc.). Use CMD for simple commands; use CMD-SHELL when you need shell features.

Why does depends_on: condition: service_healthy hang forever?

The dependency service does not have a healthcheck defined. Without it, Docker Compose has no way to determine if the service is healthy; older versions wait indefinitely, while recent Compose v2 releases fail with an error saying the container has no healthcheck configured. Either add a healthcheck to the dependency, or use condition: service_started if you only need to wait for the container to start.