Fix: Docker Compose Healthcheck Not Working — depends_on Not Waiting or Always Unhealthy
Quick Answer
How to fix Docker Compose healthcheck issues — depends_on condition service_healthy, healthcheck command syntax, start_period, custom health scripts, and debugging unhealthy containers.
The Problem
A service starts before its dependency is ready, despite depends_on being configured:
services:
  app:
    depends_on:
      - db  # App starts before DB is accepting connections
  db:
    image: postgres:16
Or depends_on with condition: service_healthy causes the dependent service to never start:
services:
  app:
    depends_on:
      db:
        condition: service_healthy  # App never starts: db never reports healthy
  db:
    image: postgres:16
    # No healthcheck defined, so the condition is never satisfied
Or a healthcheck is defined but the container shows (unhealthy) despite the service being fine:
docker ps
# CONTAINER   STATUS
# my-db       Up 2 minutes (unhealthy)
Why This Happens
Docker’s depends_on by default only waits for the container to start, not for the service inside it to be ready. This is a deliberate design choice — Docker has no way to know what “ready” means for an arbitrary service, so it defers to you to express it via healthcheck. The short-form depends_on: [db] is equivalent to depends_on: db: condition: service_started, which only verifies the container is in the running state. The database process inside that container may still be initializing.
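The short-form and long-form equivalence described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Compose's own code; the function name normalize_depends_on is hypothetical, and it models nothing beyond the condition field.

```python
def normalize_depends_on(depends_on):
    """Return depends_on in long form (hypothetical helper for illustration).

    A short-form list such as ["db"] carries no condition, so every entry
    gets the default "service_started". A long-form mapping is returned as is.
    """
    if isinstance(depends_on, list):
        return {name: {"condition": "service_started"} for name in depends_on}
    return dict(depends_on)


# The short form only waits for the container to be running.
print(normalize_depends_on(["db"]))
# The long form keeps whatever condition was written.
print(normalize_depends_on({"db": {"condition": "service_healthy"}}))
```

Seen this way, the short form never asks for health at all, which is why the app can start while the database is still initializing.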
The healthcheck mechanism itself runs inside the container. Docker exec’es the test: command on the interval you specify, captures stdout/stderr/exit code, and transitions the container between starting, healthy, and unhealthy based on consecutive results. Three pitfalls follow from this design. First, the command must exist inside the container — curl is missing from many minimal images, including the official Alpine-based Node and Python ones. Second, the command runs in the container’s own root filesystem, so it cannot reach paths or binaries on the host. Third, the command shares the container’s network namespace, so localhost:5432 from inside the Postgres container is the Postgres server, not the host’s Postgres.
The start_period parameter is another recurring source of confusion. It is not a “wait this long before starting checks” timer: checks run from the first interval. Instead, it is a grace period during which failed checks do not count toward the retries threshold. A Postgres container with start_period: 30s and retries: 3 may show three failing checks in the first 30 seconds without transitioning to unhealthy. Once the 30 seconds have elapsed, failures start counting and the normal retry logic applies. A single passing check during the grace period marks the container healthy immediately and ends the grace period early.
- depends_on without condition: the default condition: service_started means “wait for the container to start,” not “wait for the database to accept connections.” Your app may start while Postgres is still initializing.
- No healthcheck defined: condition: service_healthy requires a healthcheck on the dependency, either a healthcheck block in Compose or a HEALTHCHECK inherited from the image. Without one, the container never reports a healthy status, so the condition cannot be met.
- Wrong healthcheck command: if the health command uses a binary not available in the container, every check fails immediately (exit code 127 when the command runs through a shell).
- start_period too short: Postgres, MySQL, and other databases take several seconds (sometimes 30+) to initialize on first boot. Checks that fail after the grace period count toward retries, so a slow first boot can push the container to unhealthy before the service is ready.
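The start_period behavior is easier to reason about as a small state machine. The sketch below is a simplified model written for this article, not Docker's implementation; the function name simulate_health and the (seconds_since_start, passed) input format are assumptions made for illustration.

```python
def simulate_health(checks, start_period, retries):
    """Replay health check results and return the final status.

    checks: (seconds_since_start, passed) tuples in time order.
    Simplified model: a failure is ignored only while the container is
    still "starting" and the check began inside the start period.
    """
    status = "starting"
    failing_streak = 0
    for started_at, passed in checks:
        if passed:
            failing_streak = 0
            status = "healthy"
            continue
        if status == "starting" and started_at < start_period:
            continue  # Grace period: this failure does not count.
        failing_streak += 1
        if failing_streak >= retries:
            status = "unhealthy"
    return status


# Five failures inside a 30s grace period leave the container in "starting".
slow_boot = [(5, False), (10, False), (15, False), (20, False), (25, False)]
print(simulate_health(slow_boot, start_period=30, retries=3))
```

In this model a passing check ends the grace period early, so later failures count even if they happen before start_period has elapsed.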
Fix 1: Add a Healthcheck to the Dependency
condition: service_healthy only works when the dependency has a healthcheck:
services:
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy  # Wait until db is healthy
      redis:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/mydb
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d mydb"]
      interval: 5s       # Check every 5 seconds
      timeout: 5s        # Fail if no response within 5 seconds
      retries: 5         # Mark unhealthy after 5 consecutive failures
      start_period: 10s  # Don't count failures during first 10s (init time)
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 5s
Fix 2: Healthcheck Commands for Common Services
Correct health commands for popular services:
# PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-postgres} -d ${POSTGRES_DB:-postgres}"]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 15s
# MySQL / MariaDB
# $$ defers expansion to the container's shell, so the container's own
# MYSQL_ROOT_PASSWORD is used. Recent MariaDB images may ship mariadb-admin
# or healthcheck.sh instead of mysqladmin.
healthcheck:
  test: ["CMD-SHELL", "mysqladmin ping -h localhost -u root -p\"$$MYSQL_ROOT_PASSWORD\""]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 30s  # MySQL takes longer to initialize
# MongoDB
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 5s
  timeout: 5s
  retries: 5
  start_period: 20s
# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 3
# RabbitMQ
healthcheck:
  test: ["CMD", "rabbitmq-diagnostics", "ping"]
  interval: 10s
  timeout: 10s
  retries: 5
  start_period: 30s
# Elasticsearch
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health | grep -vq '\"status\":\"red\"'"]
  interval: 10s
  timeout: 10s
  retries: 5
  start_period: 60s
# Custom HTTP service
healthcheck:
  test: ["CMD-SHELL", "curl -fs http://localhost:8080/health || exit 1"]
  interval: 10s
  timeout: 5s
  retries: 3
  start_period: 20s
CMD vs CMD-SHELL syntax:
# CMD: exec form, no shell, each word is a separate array element
test: ["CMD", "pg_isready", "-U", "postgres"]
# CMD-SHELL: runs via /bin/sh -c, supports shell features
test: ["CMD-SHELL", "pg_isready -U postgres && echo healthy"]
# String form: equivalent to CMD-SHELL
test: "pg_isready -U postgres"
Note: Prefer CMD over CMD-SHELL when possible. It avoids shell quoting and injection pitfalls and does not depend on a shell being present in the image. Use CMD-SHELL only when you need shell features like &&, pipes, or variable expansion.
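How the three forms of test: turn into a command line can be modeled roughly as follows. This is a sketch for Linux containers, assuming /bin/sh as the shell; resolve_healthcheck_argv is a hypothetical name and does not correspond to any Docker or Compose API.

```python
def resolve_healthcheck_argv(test):
    """Map a Compose `test:` value to the argv that would be executed.

    Simplified model for Linux containers. Returns None when the
    healthcheck is disabled with ["NONE"].
    """
    if isinstance(test, str):
        # String form behaves like CMD-SHELL.
        return ["/bin/sh", "-c", test]
    kind, *args = test
    if kind == "NONE":
        return None
    if kind == "CMD":
        # Exec form: arguments are passed through untouched, no shell.
        return args
    if kind == "CMD-SHELL":
        return ["/bin/sh", "-c", args[0]]
    raise ValueError(f"unsupported healthcheck form: {kind!r}")


print(resolve_healthcheck_argv(["CMD", "pg_isready", "-U", "postgres"]))
print(resolve_healthcheck_argv("pg_isready -U postgres && echo healthy"))
```

The exec form reaches the binary with its arguments intact, while the other two forms hand a single string to the shell, which is where &&, pipes, and variable expansion come from.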
Fix 3: Tune start_period for Slow Services
start_period prevents failures during initialization from counting toward retries:
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 5s       # How often to run the check
  timeout: 5s        # How long to wait for each check
  retries: 5         # Consecutive counted failures before marking unhealthy
  start_period: 30s  # Grace period: failures here don't count
When to increase start_period:
# Postgres with large schemas / initial data: 30s+
# MySQL with InnoDB recovery: 60s+
# Elasticsearch with large indices: 60-120s
# Kafka/Zookeeper cluster: 60s+
# Services with slow JVM startup (Spring Boot): 30-60s
# For development, a longer start_period reduces false unhealthy states
start_period: 60s
# In production, a shorter start_period surfaces a genuinely broken service sooner
start_period: 10s
Fix 4: Debug Unhealthy Containers
When a container shows (unhealthy), inspect the health check output:
# See health status and last check output
docker inspect --format='{{json .State.Health}}' my-db | python3 -m json.tool
# Example output:
# {
#   "Status": "unhealthy",
#   "FailingStreak": 3,
#   "Log": [
#     {
#       "Start": "2026-03-26T10:00:00Z",
#       "End": "2026-03-26T10:00:05Z",
#       "ExitCode": 2,
#       "Output": "/var/run/postgresql:5432 - no response\n"
#     }
#   ]
# }
# Or use docker events to watch health transitions
docker events --filter "type=container" --filter "event=health_status"
Run the health check command manually inside the container:
# Run the health check command directly in the running container
docker exec my-db pg_isready -U postgres -d mydb
# Run it through a shell, the way a CMD-SHELL healthcheck does
docker exec my-db sh -c "pg_isready -U postgres -d mydb"
# Check if the command exists in the container
docker exec my-db which pg_isready
docker exec my-db which redis-cli
Common unhealthy causes by exit code:
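Exit 0: healthy
Exit 1: health check failed (service not ready)
Exit 127: command not found (binary doesn't exist in container)
Exit -1: Docker could not complete the check, for example because the timeout was exceeded (the log output reads "Health check exceeded timeout")
Fix 5: Healthcheck in Custom Application Images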
Add healthcheck to your own Dockerfiles:
# Dockerfile: add HEALTHCHECK instruction
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
# Install curl for the health check (alpine doesn't have it by default)
RUN apk add --no-cache curl
HEALTHCHECK --interval=10s --timeout=5s --start-period=15s --retries=3 \
  CMD curl -fs http://localhost:3000/health || exit 1
CMD ["node", "server.js"]
# Python / FastAPI
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
HEALTHCHECK --interval=10s --timeout=5s --start-period=20s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Implement a /health endpoint in your app:
// Express health endpoint
app.get('/health', (req, res) => {
  // Check critical dependencies (db and redis stand for your app's own clients)
  const healthy = db.isConnected() && redis.isReady();
  if (healthy) {
    res.json({ status: 'ok' });
  } else {
    res.status(503).json({ status: 'unhealthy', reason: 'dependency unavailable' });
  }
});
Fix 6: Full Example with All Conditions
A production-ready docker-compose.yml with proper health checks:
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/appdb
      REDIS_URL: redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -fs http://localhost:3000/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 20s
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: appdb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d appdb"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 15s
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 5s
    restart: unless-stopped
volumes:
  postgres_data:
  redis_data:
Version History: Healthchecks Across Docker and Compose Releases
Healthchecks have existed in Docker Engine since 1.12 (July 2016), when the HEALTHCHECK Dockerfile instruction and the docker inspect health fields shipped together. The original implementation supported --interval, --timeout, and --retries. After the 1.x lineage was renamed to Docker CE with date-based versions, --start-period was added in 17.05 (May 2017) to handle slow-starting services. The healthcheck semantics have otherwise carried forward unchanged.
Docker Compose v1 (the Python-based docker-compose) added healthcheck support and the depends_on long form with condition: in file format 2.1, around version 1.10 (early 2017). But there is an important historical caveat: the v3 file format (used widely for Swarm) dropped condition: support. Many tutorials from 2017-2019 say “you cannot use service_healthy with version 3”. That was true of the 3.x schema until docker-compose 1.27 (2020) adopted the unified Compose Specification, and it is no longer relevant in 2026 because the version key is itself obsolete.
Compose v2 (the Go-based docker compose subcommand) shipped its 2.0.0 release in 2021, was declared generally available in April 2022, and is the default in Docker Desktop and current Linux installs. v2 brought depends_on: condition: service_healthy back as a first-class feature regardless of the (now obsolete) version: key. Compose v2.0 also fixed the long-standing bug where docker-compose up would return success even if a service marked service_healthy was still in the starting state.
condition: service_completed_successfully arrived in docker-compose 1.29 (April 2021) and is supported throughout Compose v2. This is the right tool for one-shot initialization containers, such as a migration runner that must exit 0 before the app starts. Before it existed, the pattern required a sleep or a side-channel signal file.
Compose v2.17 (early 2023) added the restart: true flag inside depends_on. When Compose itself restarts or updates the dependency, for example through docker compose restart, it restarts the dependent service as well; automatic restarts by the container runtime after a crash do not trigger it. This is the closest Compose comes to Kubernetes’ pod-level restart policies and is the right answer for stateful application coupling.
Compose v2.20 (mid-2023) introduced the required: false field on dependencies. Setting it lets you declare optional dependencies: Compose only warns, instead of failing, when the named service is not started or not available. This matters in multi-file compose setups with -f base.yml -f optional-tracing.yml.
Compose v2.22 (October 2023) integrated Compose Watch with healthchecks. When watch: triggers a rebuild and restart, the dependent services correctly wait for the new container’s healthcheck to pass before being marked ready. Before 2.22, a watch-driven restart could leave the dependent service connected to a dying container.
Docker Engine 25.0 (January 2024) added the --start-interval healthcheck parameter. Inside the start_period, checks normally run on the same interval as steady-state. With start_interval, you can run more frequent checks during startup (say, every 1 second) and back off to the normal interval once the container is healthy. This shortens the time from “container started” to “healthy” by avoiding the worst-case wait of one full interval. Compose exposes this as start_interval: in the healthcheck block, starting with Compose 2.20.2.
# Modern healthcheck with start_interval (Docker Engine 25+, Compose 2.20.2+)
healthcheck:
  test: ["CMD", "pg_isready", "-U", "postgres"]
  interval: 30s
  start_interval: 1s  # Check every 1s during startup
  start_period: 60s
  retries: 3
  timeout: 5s
The progression is consistent: every major release tightened the contract between healthcheck state and dependent service startup. If you are still hand-rolling wait-for-it.sh or dockerize -wait in your entrypoints, the modern equivalent is condition: service_healthy plus a proper healthcheck, and it has been supported for years. The legacy patterns are no longer necessary in 2026 unless you target a Docker Engine older than 23.
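To see why start_interval shortens startup, it helps to estimate when the first passing check can run. The function below is a rough model, assuming checks take no time and pass as soon as the service is ready; seconds_until_healthy and its parameters are names chosen for this sketch, and real scheduling in Docker will differ in detail.

```python
def seconds_until_healthy(ready_at, interval, start_period=0, start_interval=None):
    """Estimate when the first passing health check runs, in seconds.

    ready_at: seconds after container start when the service begins to
    respond. Assumes instantaneous checks. start_interval applies only
    while elapsed time is still inside start_period.
    """
    elapsed = 0
    while True:
        in_start_period = elapsed < start_period
        if start_interval is not None and in_start_period:
            elapsed += start_interval
        else:
            elapsed += interval
        if elapsed >= ready_at:
            return elapsed


# Service ready after 8s, steady-state interval of 30s.
print(seconds_until_healthy(8, interval=30))
print(seconds_until_healthy(8, interval=30, start_period=60, start_interval=1))
```

Under these assumptions, a service that is ready after 8 seconds waits for the first 30-second check without start_interval, but is picked up almost immediately when checks run every second during startup.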
Still Not Working?
depends_on is ignored by docker compose up --no-deps — the --no-deps flag skips dependency resolution. Remove it if you need health check waiting.
Service marked healthy but app still fails — depends_on: condition: service_healthy only ensures the container’s health check passes. Your app may still need a retry loop for the actual connection, since TCP acceptance and application readiness aren’t always the same. Add retry logic in your app’s startup code.
Healthcheck passes but container restarts — the process exiting (crash) is separate from the health check. A container can be healthy but still restart if the main process crashes. Check docker logs for the crash reason.
Health check works locally but fails in CI — CI environments often have less CPU/memory, causing services to start slower. Increase start_period and retries in CI, or use a separate docker-compose.ci.yml with longer timeouts.
Healthcheck inherited from a base image runs instead of the one you expect — the official Postgres, MySQL, and Redis images do not ship a HEALTHCHECK by default, but some third-party base images do. A healthcheck: block in docker-compose.yml replaces the image’s HEALTHCHECK, so if the container still uses an unexpected check, run docker inspect <container> | grep -A 10 Healthcheck to see which one Docker resolved. To turn off an inherited HEALTHCHECK without defining your own, set test: ["NONE"] or disable: true in the service’s healthcheck block.
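Variable interpolation in test: fails silently — ${VAR} expansion happens at compose parse time, using the environment of the docker compose command and the .env file, not the container’s environment at health check execution time. If the variable is unset, the test becomes ["CMD-SHELL", "pg_isready -U -d "], which no longer checks the user and database you intended and may fail outright. Pin defaults with ${VAR:-fallback} syntax inside the test command, write $$VAR in a CMD-SHELL test so the container’s shell expands the variable when the check runs, or move sensitive parts of the check into a script file copied into the image.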
Healthcheck on Windows containers needs different commands — pg_isready, redis-cli, and shell pipelines that work on Linux base images do not exist on Windows Server Core or Nano Server bases. For Windows, use PowerShell-native checks (powershell -Command "Test-NetConnection localhost -Port 5432") and accept longer interval values because PowerShell startup overhead is non-trivial.
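The retry loop suggested above for a dependency that is marked healthy but not yet usable could look like the following. This is a minimal sketch with exponential backoff; connect_with_retry, its defaults, and the injectable sleep parameter are assumptions for illustration, and connect stands for whatever callable opens your real connection.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds or attempts run out.

    Retries only on OSError (which includes ConnectionRefusedError),
    doubling the delay after each failure up to max_delay.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

At startup this might be called as connect_with_retry(lambda: socket.create_connection(("db", 5432), timeout=2)), where db is the Compose service name. Keeping the loop in the app means it also survives a dependency restart that depends_on never sees.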
For related Docker issues, see Fix: Docker Container Keeps Restarting, Fix: Docker Compose depends_on Not Working, Fix: Docker Container Unhealthy, and Fix: Docker Healthcheck Failing.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.