Fix: Docker Compose Healthcheck Not Working — depends_on Not Waiting or Always Unhealthy

FixDevs

Quick Answer

How to fix Docker Compose healthcheck issues — depends_on condition service_healthy, healthcheck command syntax, start_period, custom health scripts, and debugging unhealthy containers.

The Problem

A service starts before its dependency is ready, despite depends_on being configured:

services:
  app:
    depends_on:
      - db  # App starts before DB is accepting connections
  db:
    image: postgres:16

Or depends_on with condition: service_healthy causes the dependent service to never start:

services:
  app:
    depends_on:
      db:
        condition: service_healthy  # App never starts: db never reports healthy
  db:
    image: postgres:16
    # No healthcheck defined — condition never satisfied

Or a healthcheck is defined but the container shows (unhealthy) despite the service being fine:

docker ps
# CONTAINER  STATUS
# my-db      Up 2 minutes (unhealthy)

Why This Happens

Docker’s depends_on by default only waits for the container to start, not for the service inside it to be ready. This is a deliberate design choice — Docker has no way to know what “ready” means for an arbitrary service, so it defers to you to express it via healthcheck. The short-form depends_on: [db] is equivalent to depends_on: db: condition: service_started, which only verifies the container is in the running state. The database process inside that container may still be initializing.

The healthcheck mechanism itself runs inside the container. Docker executes the test: command once per interval, records its stdout/stderr and exit code, and moves the container between starting, healthy, and unhealthy based on consecutive results. Three pitfalls follow from this design. First, the command must exist inside the container: curl is missing from many minimal images, including the official Alpine-based Node and Python ones. Second, the command runs in the container’s own root filesystem, so it cannot reach paths or binaries on the host. Third, the command shares the container’s network namespace, so localhost:5432 from inside the Postgres container is the Postgres server, not the host’s Postgres.

The start_period parameter is another recurring source of confusion. It is not a delay before checks begin: checks run from the first interval. It is a grace period during which failed checks do not count toward the retries threshold. A Postgres container with interval: 5s, start_period: 30s, and retries: 3 can fail every check in its first 30 seconds without becoming unhealthy. Two details matter. A single successful check, even inside the window, marks the container healthy and ends the grace period. Once the window has passed, consecutive failures count normally toward retries.
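To make those rules concrete, here is a simplified Python model of the transitions just described. It is an illustration, not Docker’s code: it assumes checks fire exactly every interval seconds after the container starts and ignores how long each check takes. The function name simulate_health is hypothetical.

```python
from typing import List, Tuple


def simulate_health(results: List[bool], interval: float,
                    start_period: float, retries: int) -> List[Tuple[float, str, int]]:
    """Replay health check results (True = exit 0) and record the state after each.

    Returns one (seconds since start, status, failing streak) tuple per check.
    """
    status = "starting"
    failing_streak = 0
    history = []
    for n, passed in enumerate(results, start=1):
        t = n * interval  # simplification: check n runs at n * interval
        if passed:
            # Any success marks the container healthy and resets the streak.
            status = "healthy"
            failing_streak = 0
        elif status == "starting" and t < start_period:
            # Grace period: a failure before the first success is not counted.
            pass
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        history.append((t, status, failing_streak))
    return history


# Postgres-like startup: six failed checks, then the server is ready.
for t, status, streak in simulate_health([False] * 6 + [True], interval=5,
                                         start_period=30, retries=3):
    print(f"t={t:>4}s  status={status:<9}  failing_streak={streak}")
```

In this run the check at 30 s is the first failure that counts, and the success at 35 s flips the container to healthy before the streak reaches 3.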

  • depends_on without condition — the default condition: service_started means “wait for the container to start,” not “wait for the database to accept connections.” Your app may start while Postgres is still initializing.
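  • No healthcheck defined: the service_healthy condition requires an explicit healthcheck on the dependency, either a healthcheck block in Compose or a HEALTHCHECK in its image. Without one, the dependency never gets a health status at all, so the condition can never be satisfied and the dependent service never starts.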
  • Wrong healthcheck command — if the health command uses a binary not available in the container, it fails immediately with exit code 1 or 127.
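  • start_period too short: Postgres, MySQL, and other databases can take several seconds (sometimes 30+) to initialize on first boot. Failed checks start counting once start_period ends, so if initialization takes longer than start_period plus roughly retries × interval, the container is marked unhealthy before the database is ready.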

Fix 1: Add a Healthcheck to the Dependency

condition: service_healthy only works when the dependency has a healthcheck:

services:
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy  # Wait until db is healthy
      redis:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/mydb

  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 5s       # Check every 5 seconds
      timeout: 5s        # Fail if no response within 5 seconds
      retries: 5         # Mark unhealthy after 5 consecutive failures
      start_period: 10s  # Don't count failures during first 10s (init time)

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 5s
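The mismatch this fix addresses (a service_healthy dependency on a service with no healthcheck) is easy to catch before starting anything. Below is a minimal Python sketch, assuming you feed it the JSON printed by docker compose config --format json. The function names and the check_health_deps.py script name are hypothetical. It only sees healthchecks declared in the Compose file, so a service whose image bakes in a HEALTHCHECK will be flagged even though it is fine; treat the results as warnings.

```python
import json
import sys
from typing import Any, Dict, List, Tuple


def has_healthcheck(service: Dict[str, Any]) -> bool:
    """True if the service declares an enabled healthcheck test in the Compose file."""
    hc = service.get("healthcheck")
    if not hc or hc.get("disable"):
        return False
    return hc.get("test") not in (None, "NONE", ["NONE"])


def find_missing_healthchecks(config: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (service, dependency) pairs where service_healthy cannot be met."""
    services = config.get("services", {})
    problems = []
    for name, svc in services.items():
        deps = svc.get("depends_on", {})
        if isinstance(deps, list):
            continue  # short form means service_started, so no healthcheck is needed
        for dep, opts in deps.items():
            if (opts or {}).get("condition") != "service_healthy":
                continue
            if not has_healthcheck(services.get(dep, {})):
                problems.append((name, dep))
    return sorted(problems)


if __name__ == "__main__":
    # Usage: docker compose config --format json | python check_health_deps.py
    for svc, dep in find_missing_healthchecks(json.load(sys.stdin)):
        print(f"warning: {svc} waits for {dep} to be healthy, "
              f"but {dep} declares no healthcheck")
```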

Fix 2: Healthcheck Commands for Common Services

Correct health commands for popular services:

# PostgreSQL
healthcheck:
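  # Compose fills in ${...} from the host shell/.env when it parses the file;
  # use $${POSTGRES_USER} to read the container's own environment instead
  test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DB:-postgres}"]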
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 15s

# MySQL / MariaDB
healthcheck:
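  # 127.0.0.1 forces TCP: the image's temporary init server listens only on the socket,
  # so a "localhost" (socket) ping can pass before the real server is up.
  # $$ leaves the variable for the container's shell, which sees the service's environment.
  # MariaDB 11+ images may lack mysqladmin; use mariadb-admin there.
  test: ["CMD-SHELL", "mysqladmin ping -h 127.0.0.1 -u root -p$$MYSQL_ROOT_PASSWORD"]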
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 30s  # MySQL takes longer to initialize

# MongoDB
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 20s

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 3

# RabbitMQ
healthcheck:
  test: ["CMD", "rabbitmq-diagnostics", "ping"]
  interval: 10s
  timeout: 10s
  retries: 5
  start_period: 30s

# Elasticsearch
healthcheck:
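  # Assumes security is disabled (xpack.security.enabled=false); Elasticsearch 8
  # enables TLS and authentication by default, which makes this plain-HTTP check fail
  test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health | grep -vq '\"status\":\"red\"'"]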
  interval: 10s
  timeout: 10s
  retries: 5
  start_period: 60s

# Custom HTTP service
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:8080/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s

CMD vs CMD-SHELL syntax:

# CMD — exec form, no shell, each word is a separate array element
test: ["CMD", "pg_isready", "-U", "postgres"]

# CMD-SHELL — runs via /bin/sh -c, supports shell features
test: ["CMD-SHELL", "pg_isready -U postgres && echo healthy"]

# String form — equivalent to CMD-SHELL
test: "pg_isready -U postgres"

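Note: Prefer CMD over CMD-SHELL when possible. The exec form does not need a shell inside the image (distroless and scratch-based images have none) and avoids quoting surprises. Use CMD-SHELL only when you need shell features like &&, pipes, or expansion of the container’s environment variables.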
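The distinction is the same one Python’s subprocess module makes between an argument list and a shell string, which gives a quick way to see it locally without Docker. This sketch assumes a POSIX system with /bin/sh and an echo binary on PATH; it is an analogy for how the two forms behave, not Docker’s implementation.

```python
import subprocess
from typing import List


def run_exec_form(argv: List[str]) -> str:
    """Like CMD: argv goes straight to the program; no shell parses it."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


def run_shell_form(command: str) -> str:
    """Like CMD-SHELL: the whole string is handed to /bin/sh -c."""
    return subprocess.run(["/bin/sh", "-c", command],
                          capture_output=True, text=True).stdout


# Exec form: "&&" and "$HOME" reach echo as literal arguments.
print(repr(run_exec_form(["echo", "first", "&&", "echo", "$HOME"])))
# Shell form: the shell runs two commands and expands $HOME.
print(repr(run_shell_form("echo first && echo $HOME")))
```

This is also why a pipeline such as the Elasticsearch check above has to use CMD-SHELL: in the exec form, the | would simply be passed to curl as another argument.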

Fix 3: Tune start_period for Slow Services

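start_period prevents failures during initialization from counting toward retries. Because the first successful check marks the container healthy even inside the window, a generous start_period does not slow down a service that starts quickly; it only delays the verdict on one that never comes up: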

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 5s       # How often to run the check
  timeout: 5s        # How long to wait for each check
  retries: 5         # Failures after start_period before marking unhealthy
  start_period: 30s  # Grace period — failures here don't count

When to increase start_period:

# Postgres with large schemas / initial data — 30s+
# MySQL with InnoDB recovery — 60s+
# Elasticsearch with large indices — 60-120s
# Kafka/Zookeeper cluster — 60s+
# Services with slow JVM startup (Spring Boot) — 30-60s

# For development, longer start_period reduces false unhealthy states
start_period: 60s

# For production, a shorter start_period flags a service that never comes up sooner
start_period: 10s
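When picking values, it helps to know how long a service that never becomes ready survives before it is declared unhealthy. Under a simplified model (checks every interval seconds from container start, check duration ignored), the first failure that counts is the first check at or after start_period, and the container turns unhealthy retries checks later. The helper name below is hypothetical:

```python
import math


def seconds_until_unhealthy(interval: float, start_period: float, retries: int) -> float:
    """Time of the check that marks a never-ready container unhealthy.

    Simplified model: checks run at interval, 2*interval, ... and a failure
    counts only if its check starts at or after start_period.
    """
    if interval <= 0 or retries < 1:
        raise ValueError("interval must be > 0 and retries >= 1")
    # Index of the first check whose failure counts toward retries.
    first_counted = max(1, math.ceil(start_period / interval))
    return (first_counted + retries - 1) * interval


# The settings from the start of this fix: interval 5s, start_period 30s, retries 5
print(seconds_until_unhealthy(interval=5, start_period=30, retries=5))
```

This gives 50 seconds for those settings, and 30 seconds for the Postgres settings in Fix 1 (5s interval, 10s start_period, 5 retries). Real timings run a little longer because each check’s own runtime adds to the gap before the next one.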

Fix 4: Debug Unhealthy Containers

When a container shows (unhealthy), inspect the health check output:

# See health status and last check output
docker inspect --format='{{json .State.Health}}' my-db | python3 -m json.tool

# Example output:
# {
#   "Status": "unhealthy",
#   "FailingStreak": 3,
#   "Log": [
#     {
#       "Start": "2026-03-26T10:00:00Z",
#       "End": "2026-03-26T10:00:05Z",
#       "ExitCode": 2,
#       "Output": "/var/run/postgresql:5432 - no response\n"
#     }
#   ]
# }

# Or use docker events to watch health transitions
docker events --filter "type=container" --filter "event=health_status"

Run the health check command manually inside the container:

# Connect to the container and run the health check manually
docker exec my-db pg_isready -U postgres -d mydb

# Run the exact CMD from the healthcheck
docker exec my-db sh -c "pg_isready -U postgres -d mydb"

# Check if the command exists in the container
docker exec my-db which pg_isready
docker exec my-db which redis-cli

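Common unhealthy causes by exit code (Docker counts any non-zero exit as a failed check):

Exit 0: healthy
Exit 1: health check failed (service not ready)
Exit 127: command not found (the binary doesn't exist in the container; reported by the shell in CMD-SHELL checks)
Exit -1: Docker could not run the check, or it exceeded timeout; the Output field says which (for example "Health check exceeded timeout (5s)")
Other codes (such as pg_isready's 2 for "no response"): the check command's own failure code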
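When you are checking several containers, reading the raw JSON gets tedious. A small Python sketch can pull out the fields shown in the docker inspect output above (Status, FailingStreak, and the Log entries with ExitCode and Output). The function names and the health_summary.py script name are hypothetical, and inspect_health assumes the Docker CLI is on PATH.

```python
import json
import subprocess
import sys
from typing import Any, Dict


def summarize_health(health_json: str) -> Dict[str, Any]:
    """Reduce `{{json .State.Health}}` output to the fields worth reading first."""
    health = json.loads(health_json)
    if health is None:  # Docker prints "null" when no healthcheck is configured
        return {"status": "none", "failing_streak": 0, "last_exit": None,
                "last_output": "", "hint": "no healthcheck configured"}
    log = health.get("Log") or []
    last = log[-1] if log else {}  # entries are appended, so the newest is last
    exit_code = last.get("ExitCode")
    if not last:
        hint = "no checks run yet"
    elif exit_code == 0:
        hint = "last check passed"
    elif exit_code == 127:
        hint = "command not found in the container"
    else:
        hint = "last check failed; read last_output"
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit": exit_code,
        "last_output": (last.get("Output") or "").strip(),
        "hint": hint,
    }


def inspect_health(container: str) -> Dict[str, Any]:
    """Fetch and summarize health for one container via the Docker CLI."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State.Health}}", container],
        capture_output=True, text=True, check=True,
    ).stdout
    return summarize_health(out)


if __name__ == "__main__":
    for name in sys.argv[1:]:  # e.g. python health_summary.py my-db my-redis
        print(name, inspect_health(name))
```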

Fix 5: Healthcheck in Custom Application Images

Add healthcheck to your own Dockerfiles:

# Dockerfile: add a HEALTHCHECK instruction
FROM node:20-alpine

# Alpine images lack curl (BusyBox wget is an alternative). Installing it before
# COPY keeps this layer cached when only application code changes.
RUN apk add --no-cache curl

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

EXPOSE 3000

HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
  CMD curl -fs http://localhost:3000/health || exit 1

CMD ["node", "server.js"]

# Python / FastAPI
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

EXPOSE 8000

HEALTHCHECK --interval=10s --timeout=5s --start-period=20s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
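The inline python -c check works, but a small script file is easier to read, sets its own timeout, and always exits with 0 or 1 (Docker reserves exit code 2). A minimal sketch, assuming a hypothetical healthcheck.py copied into the image and an optional HEALTH_URL environment variable of our own choosing, not a Docker convention:

```python
"""healthcheck.py (hypothetical): exit 0 if the app's /health answers 2xx, else 1."""
import http.client
import os
import sys
import urllib.request

# HEALTH_URL is an assumed, optional environment variable.
DEFAULT_URL = "http://localhost:8000/health"


def check(url: str, timeout: float = 3.0) -> int:
    """Return the exit code Docker should see: 0 = healthy, 1 = unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (OSError, http.client.HTTPException, ValueError):
        # Refused or reset connections, timeouts, DNS errors, and HTTP 4xx/5xx
        # (urllib's HTTPError is an OSError subclass) all count as unhealthy.
        return 1


if __name__ == "__main__":
    # Keep the timeout below HEALTHCHECK --timeout so this script reports first.
    sys.exit(check(os.environ.get("HEALTH_URL", DEFAULT_URL)))
```

It would be wired up with HEALTHCHECK --interval=10s --timeout=5s --start-period=20s --retries=3 CMD ["python", "healthcheck.py"]; the exec form needs neither a shell nor curl.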

Implement a /health endpoint in your app:

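// Express health endpoint (minimal runnable sketch)
const express = require('express');
const app = express();

// Hypothetical stand-ins: replace with real checks from your DB and Redis clients
const db = { isConnected: () => true };
const redis = { isReady: () => true };

app.get('/health', (req, res) => {
  // Check critical dependencies; keep this fast, Docker kills the check at timeout
  const healthy = db.isConnected() && redis.isReady();
  if (healthy) {
    res.json({ status: 'ok' });
  } else {
    res.status(503).json({ status: 'unhealthy', reason: 'dependency unavailable' });
  }
});

app.listen(3000);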
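The Python image above expects the same kind of endpoint on port 8000. The sketch below uses only the standard library’s http.server rather than FastAPI, so it runs anywhere; dependencies_ok() is a hypothetical placeholder for your real checks, which should stay fast because a slow response counts as a failed check.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def dependencies_ok() -> bool:
    """Hypothetical placeholder: replace with fast checks of your real dependencies."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    # Stored as a staticmethod so a subclass can swap in a different check.
    check = staticmethod(dependencies_ok)

    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        healthy = self.check()
        body = json.dumps({"status": "ok" if healthy else "unhealthy"}).encode()
        # 200 lets curl -f or urlopen succeed; 503 makes them fail.
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logs from frequent health probes


if __name__ == "__main__":
    # Port 8000 matches the EXPOSE line in the Python Dockerfile above.
    ThreadingHTTPServer(("0.0.0.0", 8000), HealthHandler).serve_forever()
```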

Fix 6: Full Example with All Conditions

A complete docker-compose.yml with health checks on every service. There is no top-level version: key, because current Compose ignores it and warns that it is obsolete:

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/appdb
      REDIS_URL: redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      # Requires curl in the app image (see the Dockerfile in Fix 5)
      test: ["CMD-SHELL", "curl -fs http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 15s
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 5s
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:

Version History: Healthchecks Across Docker and Compose Releases

Healthchecks have existed in Docker Engine since 1.12 (July 2016), when the HEALTHCHECK Dockerfile instruction and the docker inspect health fields shipped together. The original implementation supported --interval, --timeout, and --retries. --start-period came later, in Docker 17.05, to handle slow-starting services. After 1.13, Docker moved to year.month version numbers under the Docker CE name, and the healthcheck semantics carried forward unchanged.

Docker Compose v1 (the Python-based docker-compose) added healthcheck support in version 1.10 (early 2017), together with Compose file format 2.1, which also introduced the long form of depends_on with condition:. There is an important historical caveat: the v3 file format (used widely for Swarm) dropped condition: support, and no 3.x format version restored it. Many tutorials from 2017-2019 say “you cannot use service_healthy with version 3”, and that stayed accurate until Compose 1.27 (2020) merged the 2.x and 3.x formats into the Compose Specification. The point is no longer relevant in 2026, because the version key itself is obsolete.

Compose v2 (the Go-based docker compose subcommand) entered beta in mid-2021, was declared generally available in 2022, and is the default in Docker Desktop and current Linux installs. v2 treats depends_on: condition: service_healthy as a first-class feature regardless of the (now obsolete) version: key. Compose v2.0 also fixed the long-standing bug where docker-compose up would return success even if a service marked service_healthy was still in the starting state.

condition: service_completed_successfully arrived in Compose 1.29 (2021) and is supported throughout Compose v2. This is the right tool for one-shot initialization containers, such as a migration runner that must exit 0 before the app starts. Before it existed, the pattern required a sleep or a side-channel signal file.

Compose v2.20 (2023) introduced the required: false field on dependencies. With it, Compose only warns, instead of failing, when the named dependency is not started or not available. This matters in multi-file compose setups with -f base.yml -f optional-tracing.yml.

Compose v2.17 (2023) added the restart: true flag inside depends_on. When Compose itself restarts or updates the dependency (for example through docker compose restart db), it restarts the dependent service as well. The flag does not react to automatic restarts by the container runtime after a crash, so it couples services during Compose operations rather than acting as a general restart policy.

Compose v2.22 (October 2023) integrated Compose Watch with healthchecks. When watch: triggers a rebuild and restart, the dependent services correctly wait for the new container’s healthcheck to pass before being marked ready. Before 2.22, a watch-driven restart could leave the dependent service connected to a dying container.

Docker Engine 25 (early 2024) added the --start-interval healthcheck parameter. Inside the start_period, checks normally run on the same interval as steady-state. With start_interval, you can run more frequent checks during startup (say, every 1 second) and fall back to the normal interval once the start period ends or the container becomes healthy. This shortens the time from “container started” to “healthy” by avoiding the worst-case wait of one full interval. Recent Compose v2 releases expose this as start_interval: in the healthcheck block.

# Modern healthcheck with start_interval (Docker Engine 25+)
healthcheck:
  test: ["CMD", "pg_isready", "-U", "postgres"]
  interval: 30s
  start_interval: 1s   # Check every 1s during startup
  start_period: 60s
  retries: 3
  timeout: 5s
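The latency win is easy to quantify with a rough model. Assuming the next check starts start_interval seconds after the previous one while the container is inside start_period, and interval seconds afterwards, and ignoring check duration, the hypothetical helper below finds the first check that can see a service that became ready at ready_at seconds:

```python
def first_passing_check(ready_at: float, interval: float,
                        start_period: float, start_interval: float = 0.0) -> float:
    """Seconds after container start when the first check can pass.

    start_interval=0 models engines without the option: every gap is `interval`.
    """
    if interval <= 0 or start_interval < 0:
        raise ValueError("interval must be > 0 and start_interval >= 0")
    t = 0.0
    while True:
        in_start_period = t < start_period
        # The gap to the next check depends on where the previous one ran.
        step = start_interval if (in_start_period and start_interval > 0) else interval
        t += step
        if t >= ready_at:
            return t


# Postgres ready 3.2 s after start, with the settings shown above
print(first_passing_check(3.2, interval=30, start_period=60))                    # no start_interval
print(first_passing_check(3.2, interval=30, start_period=60, start_interval=1))  # start_interval: 1s
```

In this model the first check that can pass moves from 30 s to 4 s. A service that is not ready until after start_period gets no benefit, because checks are back on the 30 s interval by then.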

The progression is consistent: releases have steadily tightened the contract between healthcheck state and dependent service startup. If you are still hand-rolling wait-for-it.sh or dockerize -wait in your entrypoints, the modern equivalent is condition: service_healthy plus a proper healthcheck, and it has been supported for years. The legacy patterns remain useful mainly where depends_on is not honored, such as docker stack deploy in Swarm mode.

Still Not Working?

depends_on is ignored by docker compose up --no-deps — the --no-deps flag skips dependency resolution. Remove it if you need health check waiting.

Service marked healthy but app still fails: a service_healthy condition only ensures that the dependency’s health check has passed. Your app may still need a retry loop for the actual connection, since TCP acceptance and application readiness aren’t always the same, and the dependency can restart later. Add retry logic in your app’s startup code.
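A minimal sketch of such a retry loop in Python, with exponential backoff. The helper name is hypothetical, and the example connects with a plain TCP socket to the db:5432 address used in the compose files above. Real database drivers raise their own exception types (not necessarily OSError), so adjust the except clause for your client.

```python
import socket
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 10,
                       base_delay: float = 0.5, max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as exc:  # refused, reset, DNS not resolvable yet, timeout
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"connect attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    # "db" and 5432 match the compose examples; swap in your driver's connect call.
    conn = connect_with_retry(lambda: socket.create_connection(("db", 5432), timeout=3))
    print("connected to", conn.getpeername())
    conn.close()
```

The sleep parameter exists so tests can record delays instead of waiting; capping the delay keeps a long outage from pushing retries minutes apart.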

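Healthcheck passes but container restarts: the process exiting (crash) is separate from the health check. A container can be healthy but still restart if the main process crashes. Check docker logs for the crash reason. The reverse also holds: plain Docker and Compose do not restart a container just because it is unhealthy, since restart policies react only to the main process exiting.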

Health check works locally but fails in CI — CI environments often have less CPU/memory, causing services to start slower. Increase start_period and retries in CI, or use a separate docker-compose.ci.yml with longer timeouts.

Healthcheck inherited from base image keeps overriding yours: the official Postgres, MySQL, and Redis images do not ship a HEALTHCHECK by default, but some third-party base images do. A healthcheck: block in docker-compose.yml replaces the image’s check, but any field you leave out, including test:, is taken from the image, so a block that only tunes interval or retries still runs the image’s command. Run docker inspect <container> | grep -A 10 Healthcheck to see which configuration Docker resolved. To replace an inherited check, give your own test:; to turn it off entirely, set disable: true (or test: ["NONE"]) in your service config.

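Variable interpolation in test: resolves to the wrong values. ${VAR} expansion happens when Compose parses the file, using your shell environment and .env file, not inside the container when the check runs. Variables set only in the service’s environment: block are therefore invisible to it. If the variable is unset, Compose prints a warning and substitutes an empty string, so the test becomes ["CMD-SHELL", "pg_isready -U  -d "] and runs with arguments you never intended. Pin defaults with ${VAR:-fallback} syntax, write $${VAR} in a CMD-SHELL test so the container’s shell expands the variable at check time, or move sensitive parts of the check into a script file copied into the image.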
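docker compose config prints the file after interpolation, which is the quickest way to see what Docker will actually run. To illustrate the rules themselves, here is a simplified Python model of ${VAR}, ${VAR:-default}, $VAR, and $$ as an escaped dollar sign. It is not Compose’s parser and ignores other forms Compose supports, such as ${VAR:?error}; the function name is hypothetical.

```python
import re
from typing import Mapping

# Order matters: "$$" must be tried before the variable forms.
_PATTERN = re.compile(
    r"\$\$"                                               # escaped dollar sign
    r"|\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}"      # ${VAR} or ${VAR:-default}
    r"|\$([A-Za-z_][A-Za-z0-9_]*)"                        # $VAR
)


def interpolate(text: str, env: Mapping[str, str]) -> str:
    """Simplified model of Compose's parse-time interpolation."""
    def replace(m: re.Match) -> str:
        if m.group(0) == "$$":
            return "$"  # left for the container's shell to expand at check time
        name = m.group(1) or m.group(3)
        default = m.group(2)
        value = env.get(name, "")
        if default is not None and value == "":
            return default  # ":-" also applies when the variable is set but empty
        return value  # unset with no default: empty string (Compose warns)
    return _PATTERN.sub(replace, text)


host_env: dict = {}  # POSTGRES_USER is set only in the service's environment: block
print(repr(interpolate("pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}", host_env)))
print(repr(interpolate("pg_isready -U ${POSTGRES_USER:-postgres}", host_env)))
print(repr(interpolate("pg_isready -U $${POSTGRES_USER}", host_env)))
```

The last form survives parsing as ${POSTGRES_USER}, which a CMD-SHELL check then expands from the service’s own environment.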

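Healthcheck on Windows containers needs different commands: pg_isready, redis-cli, and shell pipelines that work on Linux base images do not exist on Windows Server Core or Nano Server bases. On Server Core, use PowerShell-native checks and make the exit code reflect the result, because Test-NetConnection reports failure in its output rather than its exit code: powershell -Command "if ((Test-NetConnection localhost -Port 5432).TcpTestSucceeded) { exit 0 } else { exit 1 }". Accept longer interval values, because PowerShell startup overhead is non-trivial. Nano Server images generally ship without PowerShell, so checks there need a different executable.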

For related Docker issues, see Fix: Docker Container Keeps Restarting, Fix: Docker Compose depends_on Not Working, Fix: Docker Container Unhealthy, and Fix: Docker Healthcheck Failing.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
