Fix: Docker HEALTHCHECK Failing — Container Marked Unhealthy Despite Running

FixDevs ·

Quick Answer

How to fix Docker HEALTHCHECK failures — command syntax, curl vs wget availability, start period, interval tuning, health check in docker-compose, and debugging unhealthy containers.

The Problem

A Docker container starts and runs correctly, but its status shows unhealthy:

docker ps

CONTAINER ID   IMAGE      STATUS
a1b2c3d4e5f6   myapp     Up 2 minutes (unhealthy)
# Container is running, but health check is failing

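Or the container keeps restarting because an orchestrator replaces unhealthy containers. Docker Swarm does this automatically; plain docker run only reports the status, and Kubernetes ignores the Dockerfile HEALTHCHECK in favor of its own liveness and readiness probes: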

docker inspect myapp --format='{{.State.Health.Status}}'
# unhealthy

docker inspect myapp --format='{{json .State.Health.Log}}'
# [{"Start":"...","End":"...","ExitCode":1,"Output":"curl: (7) Failed to connect to localhost port 8080: Connection refused"}]

Or the start_period isn’t long enough, causing the container to be marked unhealthy before the application finishes starting:

# Container starts, health check runs immediately, app not ready yet
# Health check fails 3 times → container marked unhealthy
# But app would have been healthy 10 seconds later

Why This Happens

Docker’s HEALTHCHECK instruction runs a command inside the container on a schedule. The container is marked unhealthy after a specified number of consecutive failures. Common causes:

  • curl or wget not in the image: a typical health check command is curl http://localhost:8080/health, but minimal images often lack curl. Alpine ships only BusyBox wget, and distroless images have neither tool nor a shell.
  • Wrong port or path: the health check targets a port or path that doesn’t exist or isn’t reachable from inside the container.
  • Localhost vs 0.0.0.0: an app listening on 0.0.0.0 is reachable at 127.0.0.1, so this normally works. It fails when the app binds to one specific interface, or when localhost resolves to ::1 and the app listens on IPv4 only.
  • start_period too short: the default start period is 0s, so failures count from the first check. Slow-starting applications (JVM, large Node.js apps) can use up all retries before they are ready.
  • Exit code not zero: the health check command must exit 0 to be healthy. If the HTTP request succeeds but the command returns a non-zero exit code for other reasons, Docker marks it as failed.
  • Shell unavailable: the exec form HEALTHCHECK CMD ["executable", "arg"] runs the command directly without a shell, so shell features (&&, ||, pipes) don’t work. Only the shell form (a plain string in a Dockerfile, or ["CMD-SHELL", "..."] in Compose) runs through /bin/sh -c, and it needs a shell in the image.

Fix 1: Install curl or Use Alternatives

Minimal images often lack curl. Either install it or use an alternative:

# Alpine — install curl (adds ~1MB)
FROM node:20-alpine

RUN apk add --no-cache curl

HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

# Alternatively — use wget (included in busybox/Alpine)
HEALTHCHECK CMD wget -qO- http://localhost:3000/health || exit 1

# Or use nc (netcat) to check if port is open (no HTTP check)
HEALTHCHECK CMD nc -z localhost 3000 || exit 1

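Distroless images: there is no shell and no curl, so use a compiled health check binary in exec form:

# Stage 1: build a static health check binary.
# grpc-health-probe only works if the app exposes the gRPC health checking
# protocol. For a plain HTTP endpoint, compile your own small HTTP probe the same way.
# Pin a release that matches the builder's Go version instead of @latest if the build fails.
FROM golang:1.22-alpine AS healthcheck-builder
RUN CGO_ENABLED=0 go install github.com/grpc-ecosystem/grpc-health-probe@latest

# Stage 2: copy the binary into the distroless runtime image
FROM gcr.io/distroless/nodejs20-debian12
COPY --from=healthcheck-builder /go/bin/grpc-health-probe /healthcheck

# Exec form is required here: there is no /bin/sh to run a shell-form command
HEALTHCHECK --interval=30s --timeout=10s \
  CMD ["/healthcheck", "-addr=:3000"]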

Node.js — use a JavaScript health check script:

FROM node:20-alpine

# health-check.js included in the image
COPY health-check.js .

HEALTHCHECK --interval=30s --timeout=10s --start-period=30s \
  CMD node health-check.js

// health-check.js
const http = require('http');

const options = {
  host: 'localhost',
  port: process.env.PORT || 3000,
  path: '/health',
  timeout: 5000,
};

const req = http.request(options, (res) => {
  process.exit(res.statusCode === 200 ? 0 : 1);
});

req.on('error', () => process.exit(1));
// req.abort() is deprecated; destroy() closes the socket before exiting
req.on('timeout', () => { req.destroy(); process.exit(1); });
req.end();

Fix 2: Set Correct Timing Parameters

The default HEALTHCHECK timing often produces false unhealthy results for real-world applications:

# Default values (if not specified):
# --interval=30s     Check every 30 seconds
# --timeout=30s      Fail if check takes longer than 30 seconds
# --start-period=0s  Start counting failures immediately
# --retries=3        Mark unhealthy after 3 consecutive failures

# Optimized for a typical web application:
#   --interval=10s       Check frequently during development
#   --timeout=5s         Fail fast if unresponsive
#   --start-period=60s   Wait 60s before counting failures (startup time)
#   --retries=5          5 failures before marking unhealthy
# Comments cannot follow a trailing backslash, so they are listed up here.
HEALTHCHECK \
  --interval=10s \
  --timeout=5s \
  --start-period=60s \
  --retries=5 \
  CMD curl -f http://localhost:3000/health || exit 1

# For a JVM/Spring Boot application (slow startup):
#   --start-period=120s  JVM apps often take 30-90s to start
HEALTHCHECK \
  --interval=20s \
  --timeout=10s \
  --start-period=120s \
  --retries=5 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1
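
To get a feel for how these flags interact, the sketch below estimates how long a container keeps its healthy status after the app stops responding. It assumes the app fails right after a passing check, that each check starts one interval after the previous one finishes, and that a failing check takes anywhere between almost no time and the full timeout. The function name timeToUnhealthy is made up for this illustration, and the numbers are rough estimates, not guarantees.

```javascript
// Rough estimate of how long a container stays "healthy" after the app stops
// responding. Assumptions: the app fails right after a passing check, each
// check starts `interval` seconds after the previous one finishes, and a
// failing check takes between ~0s (connection refused) and `timeout` seconds.
function timeToUnhealthy({ interval = 30, timeout = 30, retries = 3 } = {}) {
  return {
    fastFailSeconds: retries * interval,             // checks fail instantly
    slowFailSeconds: retries * (interval + timeout), // checks hang until timeout
  };
}

console.log('defaults:', timeToUnhealthy());
console.log('web app :', timeToUnhealthy({ interval: 10, timeout: 5, retries: 5 }));
console.log('JVM app :', timeToUnhealthy({ interval: 20, timeout: 10, retries: 5 }));
```

Under these assumptions the defaults let a hung app stay healthy for roughly 90 to 180 seconds, while the web application settings above bring that down to about 50 to 75 seconds.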

start-period explained:

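During the start period (--start-period in a Dockerfile, start_period in Compose), failed health checks don’t count toward retries and the container stays in the starting state. As soon as one check succeeds, the container is considered started, and every later failure counts even if the start period has not elapsed yet. After the start period elapses, all failures count. This prevents a false unhealthy status during legitimate startup.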
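
The rule is easier to see in a small simulation. The sketch below is a simplified model of the behavior described above, not Docker's actual implementation; simulateHealth and the probe timings are invented for illustration.

```javascript
// Simplified model of the health state machine.
// probes: [{ at: secondsSinceContainerStart, ok: boolean }, ...] in time order.
function simulateHealth(probes, { startPeriod = 0, retries = 3 } = {}) {
  let status = 'starting';
  let failures = 0;
  let started = false; // true after the first successful check
  const timeline = [];
  for (const { at, ok } of probes) {
    if (ok) {
      started = true;
      failures = 0;
      status = 'healthy';
    } else if (started || at >= startPeriod) {
      // Failures only count once the app has started or the grace period is over
      failures += 1;
      if (failures >= retries) status = 'unhealthy';
    }
    timeline.push(`${at}s:${status}`);
  }
  return timeline;
}

// An app that needs 40 seconds to come up, probed every 10 seconds
const slowApp = [
  { at: 10, ok: false },
  { at: 20, ok: false },
  { at: 30, ok: false },
  { at: 40, ok: true },
];

console.log(simulateHealth(slowApp, { startPeriod: 0 }).join(' '));
// 10s:starting 20s:starting 30s:unhealthy 40s:healthy
console.log(simulateHealth(slowApp, { startPeriod: 60 }).join(' '));
// 10s:starting 20s:starting 30s:starting 40s:healthy
```

With no start period the container is briefly reported unhealthy at 30 seconds, which is enough for an orchestrator to replace it. With a 60 second start period the same app goes straight from starting to healthy.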

# Check the health status timeline (docker inspect returns a JSON array, hence .[0])
docker inspect myapp | jq '.[0].State.Health.Log[-5:]'
# Look at Start timestamps to see when checks began
# Compare against the container's start time in .[0].State.StartedAt

Fix 3: Fix Health Check Command Syntax

The HEALTHCHECK CMD instruction has two forms with different behavior:

# Exec form (JSON array): runs directly, no shell, shell features not available
HEALTHCHECK CMD ["curl", "-f", "http://localhost:3000/health"]
# Equivalent to: docker exec container curl -f http://localhost:3000/health

# Shell form (plain string): runs through /bin/sh -c
HEALTHCHECK CMD curl -f http://localhost:3000/health || exit 1
# Equivalent to: docker exec container /bin/sh -c "curl -f ... || exit 1"
# The || exit 1 requires a shell, so it only works in this form

# CMD-SHELL is how docker-compose.yml and docker inspect spell the shell form:
#   test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
# Don't write ["CMD-SHELL", ...] in a Dockerfile: the exec form would look for
# an executable named CMD-SHELL and the check would always fail
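
The difference between running a command directly and running it through a shell can be reproduced outside Docker. The sketch below assumes a POSIX system with echo and /bin/sh available; it uses Node's child_process module to mimic the two forms and is only an analogy for what Docker does inside the container.

```javascript
const { spawnSync } = require('node:child_process');

// Exec form: the program receives every array element as a literal argument.
// Nothing interprets "||", so echo simply prints it.
const execForm = spawnSync('echo', ['probe', '||', 'exit', '1'], { encoding: 'utf8' });

// Shell form: /bin/sh parses the string, so "||" is an operator.
// `false` fails, which triggers `exit 1`.
const shellForm = spawnSync('/bin/sh', ['-c', 'false || exit 1'], { encoding: 'utf8' });

console.log(execForm.stdout.trim()); // probe || exit 1
console.log(shellForm.status);       // 1
```

This is why a health check that relies on ||, && or pipes silently misbehaves in exec form: the operators are handed to the program as ordinary arguments.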

curl -f flag: -f (--fail) makes curl return exit code 22 for HTTP error responses (4xx, 5xx). Without it, curl exits 0 even on 404 or 500 responses:

# WRONG — exits 0 even on 500 Internal Server Error
HEALTHCHECK CMD curl http://localhost:3000/health

# CORRECT — exits non-zero on 4xx/5xx HTTP responses
HEALTHCHECK CMD curl -f http://localhost:3000/health

Test the health check command manually:

# Run the exact command inside the container
docker exec myapp curl -f http://localhost:3000/health
echo "Exit code: $?"  # Should be 0 for healthy

# Run as root to rule out permission issues
docker exec -u root myapp curl -f http://localhost:3000/health

# Check if curl is available
docker exec myapp which curl || echo "curl not found"

Fix 4: Configure Health Checks in docker-compose

docker-compose.yml supports overriding or adding health checks:

# docker-compose.yml
services:
  api:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 30s

  # Service that waits for api to be healthy
  worker:
    image: myworker:latest
    depends_on:
      api:
        condition: service_healthy  # Wait for api to pass health check
    # Without this, worker starts immediately — api may not be ready

  # Database with built-in health check
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 3

Disable health check for a service (override Dockerfile’s HEALTHCHECK):

services:
  myservice:
    image: myapp:latest
    healthcheck:
      disable: true   # Ignore the HEALTHCHECK from the Dockerfile

Fix 5: Implement a Proper Health Endpoint

The health check’s value depends on what /health actually checks. A proper health endpoint verifies that the application can serve requests:

// Express.js — comprehensive health endpoint
app.get('/health', async (req, res) => {
  const health: Record<string, unknown> = {
    status: 'ok',
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  };

  // Check database connection
  try {
    await db.query('SELECT 1');
    health.database = 'ok';
  } catch (err) {
    health.database = 'error';
    health.status = 'degraded';
  }

  // Check Redis connection
  try {
    await redis.ping();
    health.cache = 'ok';
  } catch (err) {
    health.cache = 'error';
    health.status = 'degraded';
  }

  const statusCode = health.status === 'ok' ? 200 : 503;
  res.status(statusCode).json(health);
});
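
One gap in an endpoint like this is that a hanging dependency makes the whole request hang, so the health check hits its --timeout instead of getting a clear 503. A possible refinement, sketched below without Express, runs the checks in parallel and caps each one with a time limit. The names checkWithTimeout and runChecks and the stand-in checks are hypothetical; real checks would call db.query and redis.ping as above.

```javascript
// Resolve to 'ok' or 'error' within limitMs, even if the check never settles.
function checkWithTimeout(check, limitMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve('error'), limitMs);
    Promise.resolve()
      .then(check)
      .then(() => 'ok', () => 'error')
      .then((result) => {
        clearTimeout(timer);
        resolve(result);
      });
  });
}

// checks: { name: () => Promise }. All checks run in parallel.
async function runChecks(checks, limitMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.all(
    names.map((name) => checkWithTimeout(checks[name], limitMs))
  );
  const body = { status: 'ok' };
  names.forEach((name, i) => {
    body[name] = results[i];
    if (results[i] !== 'ok') body.status = 'degraded';
  });
  return { statusCode: body.status === 'ok' ? 200 : 503, body };
}

// Stand-in checks: the database answers, the cache hangs forever.
runChecks(
  { database: async () => {}, cache: () => new Promise(() => {}) },
  100
).then((result) => console.log(result.statusCode, result.body));
```

In a route handler the result would be passed on with res.status(result.statusCode).json(result.body). Keep the per-check limit well below the HEALTHCHECK --timeout so the endpoint always answers first.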

Separate liveness vs readiness (Kubernetes pattern, useful in Docker too):

// Liveness — "is the process alive?" (simple, rarely fails)
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'alive' });
});

// Readiness — "can it serve traffic?" (checks dependencies)
app.get('/health/ready', async (req, res) => {
  try {
    await Promise.all([
      db.query('SELECT 1'),
      redis.ping(),
    ]);
    res.status(200).json({ status: 'ready' });
  } catch (err) {
    res.status(503).json({ status: 'not ready', error: err.message });
  }
});

# Use the liveness check for Docker HEALTHCHECK (so a database outage does not get the container killed)
HEALTHCHECK CMD curl -f http://localhost:3000/health/live || exit 1

Fix 6: Debug an Unhealthy Container

When a container is unhealthy, inspect the recent health check history:

# View last 5 health check results
docker inspect myapp --format='{{json .State.Health.Log}}' | \
  python3 -m json.tool | head -50

# Each log entry contains:
# Start: when the check began
# End: when it finished
# ExitCode: 0=healthy, non-zero=unhealthy
# Output: stdout/stderr from the check command

# Watch health status in real time
watch -n 2 'docker inspect myapp --format="Status: {{.State.Health.Status}}"'

# Get the full health section (docker inspect returns a JSON array, hence .[0])
docker inspect myapp | jq '.[0].State.Health'
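
When the raw JSON is hard to read, a short script can condense it. The sketch below works on a hard-coded sample shaped like the Health.Log entries described above; the timestamps and outputs are invented for illustration, and summarizeHealthLog is a hypothetical helper, not part of any Docker tooling.

```javascript
// Sample data shaped like `docker inspect --format='{{json .State.Health.Log}}'`.
// The timestamps and outputs are invented for illustration.
const sampleLog = [
  { Start: '2024-01-01T10:00:00Z', End: '2024-01-01T10:00:00.120Z', ExitCode: 0, Output: 'ok' },
  { Start: '2024-01-01T10:00:30Z', End: '2024-01-01T10:00:30.050Z', ExitCode: 1, Output: 'curl: (7) Failed to connect' },
  { Start: '2024-01-01T10:01:00Z', End: '2024-01-01T10:01:05Z', ExitCode: 1, Output: 'curl: (28) Operation timed out' },
];

function summarizeHealthLog(log) {
  const entries = log.map((entry) => ({
    start: entry.Start,
    durationMs: new Date(entry.End) - new Date(entry.Start),
    exitCode: entry.ExitCode,
    output: entry.Output.trim(),
  }));
  // Count the failures at the end of the log: consecutive failures are
  // what gets compared against --retries.
  let failingStreak = 0;
  for (let i = entries.length - 1; i >= 0 && entries[i].exitCode !== 0; i--) {
    failingStreak += 1;
  }
  return { entries, failingStreak };
}

const summary = summarizeHealthLog(sampleLog);
for (const e of summary.entries) {
  console.log(`${e.start}  exit=${e.exitCode}  ${e.durationMs}ms  ${e.output}`);
}
console.log('consecutive failures:', summary.failingStreak);
```

A check duration close to the configured timeout points at a hanging endpoint, while near-instant failures usually mean nothing is listening on the port.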

Common output messages and their meanings:

"curl: (7) Failed to connect to localhost port 3000: Connection refused"
→ App not listening on port 3000 yet, or crashed
→ Fix: Increase start-period or check app startup

"curl: (22) The requested URL returned error: 500"
→ App is running but returning 500 error
→ Fix: Debug the health endpoint — check app logs

"curl: (6) Could not resolve host: localhost"
→ Unusual — network configuration issue
→ Fix: Use 127.0.0.1 instead of localhost

"OCI runtime exec failed: exec: 'curl': executable file not found"
→ curl not in the image
→ Fix: Install curl or use alternative
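
The same lookup can be automated when many containers need checking. The sketch below matches a health check Output string against the four messages listed above; the patterns cover only those cases, and diagnose is a hypothetical helper name.

```javascript
// Known failure outputs and what they usually mean (taken from the list above)
const knownFailures = [
  {
    pattern: /curl: \(7\)/,
    cause: 'App not listening on the port yet, or crashed',
    fix: 'Increase start-period or check app startup',
  },
  {
    pattern: /curl: \(22\)/,
    cause: 'App is running but returning an HTTP error',
    fix: 'Debug the health endpoint and check app logs',
  },
  {
    pattern: /curl: \(6\)/,
    cause: 'Network configuration issue',
    fix: 'Use 127.0.0.1 instead of localhost',
  },
  {
    pattern: /executable file not found/,
    cause: 'Health check binary not in the image',
    fix: 'Install curl or use an alternative',
  },
];

function diagnose(output) {
  const match = knownFailures.find((failure) => failure.pattern.test(output));
  return match
    ? { cause: match.cause, fix: match.fix }
    : { cause: 'Unknown', fix: 'Run the check manually with docker exec' };
}

console.log(diagnose('curl: (7) Failed to connect to localhost port 3000: Connection refused'));
console.log(diagnose("OCI runtime exec failed: exec: 'curl': executable file not found"));
```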

Fix 7: Use Health Checks in Production Orchestration

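In production, health checks drive automatic recovery. Docker Swarm acts on the HEALTHCHECK directly; Kubernetes ignores it and needs equivalent liveness and readiness probes in the pod spec: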

Docker Swarm — restart unhealthy replicas:

# docker-compose.yml (Swarm mode)
services:
  api:
    image: myapp:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s

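Health check during rolling updates: Swarm waits for each updated task to pass its health check before moving on to the next one, and applies failure_action (here, rollback) if it doesn’t. By default the old task is stopped before the new one starts; setting order: start-first under update_config starts the new task first, which is what enables zero-downtime deployments.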

Still Not Working?

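Different user inside the container: the health check runs as the container’s user (often non-root). Connecting to a port never needs root, but binding to a port below 1024 (privileged) may, depending on the Docker version and kernel settings. A non-root app configured for such a port can fail to listen at all, and every check then gets connection refused. Listen on a port above 1024, and make sure the user is allowed to execute the health check binary or script.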

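IPv6 vs IPv4 binding: if the app binds only to ::1 (IPv6 localhost) but the check targets 127.0.0.1 (IPv4), the connection fails. The reverse also happens: localhost can resolve to ::1 while the app listens on IPv4 only. Use an explicit address in the check (http://127.0.0.1:3000/health or http://[::1]:3000/health) or bind the app to 0.0.0.0.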

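Health check timing with depends_on: in Docker Compose, condition: service_healthy only works if the dependency defines a healthcheck, either in its image or in the Compose file. Without one, Compose has nothing to wait for; recent versions fail with a “has no healthcheck configured” error instead of starting the dependent service.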

Misleading exit codes — some commands return non-zero for reasons unrelated to the actual health. Test the command manually inside the container with docker exec to verify the exit code and output before trusting it in a HEALTHCHECK.

For related Docker issues, see Fix: Docker Build Cache Invalidated and Fix: Docker Multi-Stage Build Failed.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
