Fix: Docker Container Keeps Restarting
Quick Answer
How to fix a Docker container that keeps restarting — reading exit codes, debugging CrashLoopBackOff, fixing entrypoint errors, missing env vars, out-of-memory kills, and restart policy misconfiguration.
The Error
A Docker container exits immediately after starting and keeps restarting in a loop:
docker ps
# CONTAINER ID IMAGE COMMAND STATUS PORTS
# a1b2c3d4e5f6 myapp "node ..." Restarting (1) 5 seconds ago
Or in Docker Compose:
myapp | Error: Cannot find module '/app/dist/server.js'
myapp exited with code 1
myapp | Error: Cannot find module '/app/dist/server.js'
myapp exited with code 1
The container starts and crashes, Docker restarts it because of the restart policy, and it crashes again, indefinitely.
Why This Happens
Docker’s restart policy (--restart always or restart: always in Compose) automatically restarts containers that exit. When the app crashes on startup, this creates a restart loop: the process inside the container exits with a non-zero code, the Docker daemon waits a backoff interval (100 ms at first, doubling after each restart), then starts it again. With restart: always or restart: unless-stopped, this continues forever unless you stop the container manually or fix the underlying crash.
A one-time crash turns into an infinite loop because these policies have no retry limit: the delay between attempts is capped at one minute, and Docker never gives up on its own. That is the right behavior for transient failures (network blips, dependency restarts) but the wrong behavior for permanent ones (missing files, bad config). To tell which category you’re in, read the exit code and the application logs, not just the restart count.
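To see why a crash loop settles into one attempt per minute, it helps to write the schedule down. This is a minimal Python sketch modeled on the behavior the `docker run` reference describes (start at 100 ms, double after each restart, cap at one minute, reset to 100 ms once a container has stayed up for at least 10 seconds). It is an illustration of that documented behavior, not the daemon's code, and the function names are hypothetical.

```python
import sys
from typing import List, Optional

# Sketch of the documented restart backoff; not the daemon's implementation.
INITIAL_DELAY_MS = 100      # first wait before a restart
MAX_DELAY_MS = 60_000       # the delay never exceeds one minute
RESET_AFTER_SECONDS = 10    # a run at least this long resets the delay


def next_delay_ms(previous_ms: Optional[int], ran_for_seconds: float) -> int:
    """Return the wait before the next restart attempt, in milliseconds."""
    if previous_ms is None or ran_for_seconds >= RESET_AFTER_SECONDS:
        return INITIAL_DELAY_MS
    return min(previous_ms * 2, MAX_DELAY_MS)


def delay_schedule(attempts: int, ran_for_seconds: float = 1.0) -> List[int]:
    """Delays for a container that crashes `ran_for_seconds` after every start."""
    delays: List[int] = []
    delay: Optional[int] = None
    for _ in range(attempts):
        delay = next_delay_ms(delay, ran_for_seconds)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    attempts = int(sys.argv[1]) if len(sys.argv) > 1 else 12
    print(delay_schedule(attempts))
    # With 12 attempts:
    # [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200, 60000, 60000]
```

Under this model, a container that crashes within a second of starting burns through its first ten retries in about 100 seconds of cumulative delay, then retries once a minute for as long as the policy allows.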
Concrete root causes ordered by frequency:
- Application error on startup — a missing file, bad environment variable, or uncaught exception crashes the app before it becomes healthy.
- Wrong entrypoint or command — the CMD or ENTRYPOINT points to a file that doesn’t exist in the image, or the command syntax is wrong.
- Missing required environment variables — the app reads an env var at startup and throws if it’s undefined.
- Port already in use — the app tries to bind a port that’s occupied, fails, and exits.
- Out-of-memory kill — the container hits its memory limit and the kernel kills the process (exit code 137).
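- Signal handling: the app ignores or mishandles SIGTERM, so a shutdown ends in SIGKILL after the stop timeout (exit code 137) or in another non-zero exit. docker stop itself does not trigger the restart policy, but these codes make intentional shutdowns look like crashes in logs and monitoring.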
- Dependency not ready — the app tries to connect to a database or external service at startup before it’s available.
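- Healthcheck failing: Docker marks a container unhealthy after `retries` consecutive failed probes (failures during `start_period` don't count). Plain Docker and Compose only report that status and don't restart the container; Swarm services replace unhealthy tasks, and Compose won't start services that depend on the container with `condition: service_healthy`.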
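For the signal-handling cause in the list above, the usual fix is an explicit SIGTERM handler so the app shuts down cleanly and exits 0. A minimal Python sketch, assuming a simple polling main loop; the `state` dict, `install_handlers`, and the `main()` loop are illustrative placeholders for your app's own structure.

```python
import signal
import sys
import time

# Flag flipped by the handler; the main loop checks it between units of work.
state = {"shutdown_requested": False}


def handle_termination(signum, frame):
    # Keep handlers tiny: record the request and return.
    state["shutdown_requested"] = True


def install_handlers() -> None:
    # Explicit handlers matter most when the app runs as PID 1: signals
    # without a handler are not delivered to a namespace's init process,
    # so SIGTERM would otherwise be ignored until Docker sends SIGKILL.
    signal.signal(signal.SIGTERM, handle_termination)
    signal.signal(signal.SIGINT, handle_termination)


def main() -> None:
    install_handlers()
    while not state["shutdown_requested"]:
        time.sleep(0.5)  # placeholder for the app's real work
    # Close connections and flush buffers here, then exit cleanly.
    sys.exit(0)


if __name__ == "__main__":
    main()
```

With a handler like this, docker stop produces exit code 0 instead of 137 or 143, so intentional shutdowns stop looking like crashes.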
Version History That Changes the Failure Mode
Docker’s restart and healthcheck behavior has evolved across Engine versions, and the licensing change to Docker Desktop in August 2021 reshuffled how teams run Docker on macOS and Windows. If your container restarts on one machine but not another, version drift is a real suspect.
- Docker 20.10 (Dec 8, 2020) consolidated several long-standing fixes into a single release line. It promoted rootless mode from experimental to stable, improved the --restart semantics for containers that exit during startup, and kept BuildKit opt-in (DOCKER_BUILDKIT=1). The health.start_period healthcheck field was already available.
- Docker Desktop license change (Aug 31, 2021) required paid subscriptions for commercial use by larger companies (more than 250 employees or more than $10 million in annual revenue). This pushed many teams to Colima, Rancher Desktop, or Podman. Each uses slightly different VM defaults, which surfaces as “the same image restarts on Colima but runs fine on Docker Desktop”. The cause is almost always memory limits or filesystem mount semantics, not the image itself.
- Compose v2 (released 2021, default in Desktop 2022) rewrote Compose in Go and implements the Compose Specification. With v2, depends_on: condition: service_healthy actually waits for the dependency's healthcheck to pass; with v1, the condition worked only in 2.x file formats and was rejected in 3.x files. If your container “keeps restarting” on Compose v2 but worked on v1, the difference is often that v2 enforces the healthcheck contract.
- Docker Engine 23 (Feb 2023) made BuildKit the default builder on Linux. BuildKit builds only the stages the final target depends on and caches differently from the legacy builder, so if an image that used to work crashes after a rebuild, confirm that COPY --from=builder still copies the files your CMD expects (for example, dist/server.js).
- Docker Engine 24 (Jun 2023) arrived as Compose v1 reached end-of-life. The legacy builder, deprecated since Engine 23, is still available. The deploy.restart_policy block in Compose v2 applies to standalone containers (not just Swarm), matching documentation that was misleading for years.
- Docker Engine 25 (Jan 2024) added the healthcheck start_interval option, which sets how often probes run during start_period. The State.Health.FailingStreak field in docker inspect counts consecutive failed probes.
- Docker Engine 27 (Jun 2024) improved cgroups v2 handling, which affects OOM kill detection on Ubuntu 22.04+ and other systemd-based hosts.
If you are on Compose v2.x and inherit a project written for v1, expect surprises around depends_on, healthchecks, and the way restart interacts with stop_grace_period. A container that “keeps restarting” on v2 may simply be a container that v1 was silently letting fail.
Step 1: Read the Exit Code
The exit code tells you the category of failure:
docker inspect <container_name_or_id> --format='{{.State.ExitCode}}'
Common exit codes:
| Exit Code | Meaning |
|---|---|
| 1 | Application error (check your app’s logs) |
| 2 | Misuse of shell command |
| 125 | docker run itself failed (the container never started) |
| 126 | Command found but not executable |
| 127 | Command not found (wrong path in CMD/ENTRYPOINT) |
| 137 | Killed by signal 9, SIGKILL (OOM kill, docker kill, or a docker stop that hit its timeout) |
| 139 | Segmentation fault (signal 11, SIGSEGV) |
| 143 | Killed by signal 15, SIGTERM (graceful shutdown, e.g. docker stop) |
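Exit code 127 → fix your CMD path. Exit code 137 → confirm that .State.OOMKilled is true (see Fix 4) before raising the memory limit. Exit code 1 → read the application logs. In general, codes above 128 mean the process was killed by a signal, and the signal number is the code minus 128.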
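Because codes above 128 encode a signal number, a small helper can turn the raw number into something readable. A Python sketch; the helper name and the short descriptions are illustrative, taken from the table above, and signal names come from the standard `signal` module of the machine running the script.

```python
import signal
import sys

# Codes that aren't "128 + signal", described as in the table above.
KNOWN_CODES = {
    0: "clean exit",
    1: "application error: read the logs",
    2: "misuse of shell command",
    125: "docker run itself failed",
    126: "command found but not executable",
    127: "command not found: check CMD/ENTRYPOINT",
}


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short explanation."""
    if code in KNOWN_CODES:
        return KNOWN_CODES[code]
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:  # not a signal number this platform knows
            name = f"signal {signum}"
        return f"killed by {name} (128 + {signum})"
    return f"exit code {code}: application-defined"


if __name__ == "__main__":
    print(describe_exit_code(int(sys.argv[1])))
```

For example, `python exit_codes.py "$(docker inspect myapp --format='{{.State.ExitCode}}')"` prints `killed by SIGKILL (128 + 9)` for a 137. Remember that SIGKILL alone doesn't prove an OOM kill; check `.State.OOMKilled` as well.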
Step 2: Read the Logs
# Show the container's logs (kept across restarts of the same container)
docker logs <container_name>
# Follow logs in real time
docker logs -f <container_name>
# Show last 50 lines
docker logs --tail 50 <container_name>
# For Docker Compose
docker compose logs myapp
docker compose logs --tail 50 myapp
The log output almost always contains the specific error. Read it before changing anything else.
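When you triage many containers, a quick pattern scan can point you at the right fix below. A minimal Python sketch; the pattern list is hypothetical, built from the error messages quoted in this article, and you would extend it with your app's own messages.

```python
import re
import sys
from typing import List

# Hypothetical mapping from log patterns (errors quoted in this article)
# to the fix that usually applies.
PATTERNS = [
    (re.compile(r"Cannot find module|ModuleNotFoundError"),
     "missing file or dependency in the image (Fix 1)"),
    (re.compile(r"executable file not found"),
     "wrong CMD or ENTRYPOINT (Fix 2)"),
    (re.compile(r"environment variable is required|Missing required environment variable"),
     "missing environment variable (Fix 3)"),
    (re.compile(r"ECONNREFUSED|Connection refused"),
     "dependency not ready or wrong host (Fix 5)"),
    (re.compile(r"EADDRINUSE|Address already in use"),
     "port already in use"),
]


def classify(log_text: str) -> List[str]:
    """Return the likely causes whose patterns appear in the log text."""
    return [cause for pattern, cause in PATTERNS if pattern.search(log_text)]


if __name__ == "__main__":
    causes = classify(sys.stdin.read())
    for cause in causes or ["no known pattern: read the log manually"]:
        print(cause)
```

Because docker logs writes the container's stderr to your stderr, merge the streams when piping: `docker logs --tail 50 myapp 2>&1 | python classify_logs.py`.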
Fix 1: Fix Application Startup Errors
The most common cause is the application crashing during initialization. The log will show the specific error:
Node.js (module not found):
Error: Cannot find module '/app/dist/server.js'
The build step never ran during the image build, so dist/ doesn’t exist in the image:
# Ensure build runs inside the Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build # ← Must be here
CMD ["node", "dist/server.js"]
Python (import error):
ModuleNotFoundError: No module named 'fastapi'
Dependencies weren’t installed in the image:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt # ← Required
COPY . .
CMD ["python", "main.py"]
Fix 2: Fix Wrong CMD or ENTRYPOINT
Exit code 127 means the command in CMD or ENTRYPOINT could not be found: the binary isn’t on the PATH, or the file doesn’t exist at the given path:
# Check what command the container tried to run
docker inspect <container> --format='{{.Config.Cmd}}'
docker inspect <container> --format='{{.Config.Entrypoint}}'
Run the container interactively to debug:
# Override the entrypoint to get a shell
docker run -it --entrypoint /bin/sh myapp
# Inside the container, check if the file exists
ls -la /app/dist/server.js
which node
Fix the Dockerfile CMD:
# Wrong — file doesn't exist at this path
CMD ["node", "server.js"]
# Correct — use the actual path
CMD ["node", "/app/dist/server.js"]
# Or use the working directory
WORKDIR /app
CMD ["node", "dist/server.js"]
Common Mistake: Using a shell script as the entrypoint but forgetting chmod +x:
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh # ← Required, otherwise exit code 126
ENTRYPOINT ["/entrypoint.sh"]
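The difference between 126 and 127 comes down to two checks: can the command be found (at its path, or on PATH for a bare name), and is it executable? A rough Python sketch of that logic for reasoning about which code you will see; it approximates the runtime's lookup rather than reproducing Docker's code, and `diagnose_command` is a hypothetical name.

```python
import os
import shutil
import sys
from typing import Optional


def diagnose_command(command: str, search_path: Optional[str] = None) -> int:
    """Predict the exit code for running `command` as CMD/ENTRYPOINT.

    Returns 127 if it can't be found, 126 if it exists but can't be
    executed, and 0 if it looks runnable. This is an approximation.
    """
    if "/" in command:
        # An explicit path, as in ENTRYPOINT ["/entrypoint.sh"]
        if not os.path.exists(command):
            return 127
        if os.path.isfile(command) and os.access(command, os.X_OK):
            return 0
        return 126  # a directory, or a file without the execute bit
    # A bare name, as in CMD ["node", ...]: look it up on PATH
    return 0 if shutil.which(command, path=search_path) else 127


if __name__ == "__main__":
    print(diagnose_command(sys.argv[1]))
```

A script copied without `chmod +x` lands in the 126 branch; a typo in the path, or a binary missing from the image, lands in 127.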
Fix 3: Provide Missing Environment Variables
If the app reads a required env var at startup and it’s missing, it crashes with exit code 1. The log will show something like:
Error: DATABASE_URL environment variable is required
TypeError: Cannot read properties of undefined (reading 'split')
Pass environment variables to the container:
# Single variable
docker run -e DATABASE_URL=postgres://... myapp
# From a .env file
docker run --env-file .env myapp
In Docker Compose:
services:
  myapp:
    image: myapp
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/myapp
      REDIS_URL: redis://redis:6379
      NODE_ENV: production
    # Or load from a file
    env_file:
      - .env.production
Make the app fail clearly when required variables are missing:
// Node.js: validate env at startup
const requiredEnv = ['DATABASE_URL', 'JWT_SECRET', 'PORT'];
for (const key of requiredEnv) {
  if (!process.env[key]) {
    console.error(`Missing required environment variable: ${key}`);
    process.exit(1);
  }
}
Fix 4: Fix Out-of-Memory Kills (Exit Code 137)
Exit code 137 means the process was killed with SIGKILL. The most common cause is the kernel’s OOM killer, triggered when the container exceeds its memory limit; confirm that before changing limits:
# Check if it was OOM killed
docker inspect <container> --format='{{.State.OOMKilled}}'
# true = OOM killed
Increase the memory limit:
docker run -m 512m myapp # 512 MB limit
docker run -m 1g myapp # 1 GB limit
In Docker Compose:
services:
  myapp:
    image: myapp
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
To find what’s consuming memory, run without a limit and watch usage:
docker stats <container>If memory grows without bound, the app has a memory leak. Fix the leak rather than just raising the limit.
Fix 5: Wait for Dependencies
If the app connects to a database or other service at startup, and that service isn’t ready yet, the connection fails and the app exits:
Error: connect ECONNREFUSED 127.0.0.1:5432
(If the address is 127.0.0.1 or localhost, the app is connecting to its own container rather than the database, and waiting won’t help; in Compose, use the service name, such as db, as the host.)
Use depends_on with health checks in Docker Compose:
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  myapp:
    image: myapp
    depends_on:
      db:
        condition: service_healthy # Wait until db passes health check
    environment:
      DATABASE_URL: postgres://postgres:secret@db:5432/myapp
Or use a wait script inside the container:
# wait-for-it.sh (common utility)
./wait-for-it.sh db:5432 --timeout=30 -- node dist/server.js
Implement retry logic in the app itself; this is the most resilient approach:
// Retry the initial database connection instead of crashing on the first failure.
// `db` stands in for your database client.
async function connectWithRetry(maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await db.connect();
      console.log('Database connected');
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up and let the process exit non-zero
      console.log(`Connection attempt ${attempt} failed, retrying in 5s...`);
      await new Promise((r) => setTimeout(r, 5000));
    }
  }
}
Fix 6: Adjust the Restart Policy
If the app exits intentionally (e.g., a one-time migration task), restart: always keeps restarting it unnecessarily. Use the right policy:
services:
  # Long-running server: restart after crashes
  myapp:
    image: myapp
    restart: unless-stopped # restarts after any exit; "on-failure" skips clean exits
  # One-time task: never restart
  migrate:
    image: myapp
    command: ["node", "migrate.js"]
    restart: "no" # Default: don't restart after exit
Restart policies:
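| Policy | Behavior |
|---|---|
| no | Never restart (default) |
| on-failure[:max-retries] | Restart only on a non-zero exit code, optionally at most max-retries times |
| unless-stopped | Restart after any exit unless you stopped the container manually; a manually stopped container stays stopped after a daemon restart |
| always | Restart after any exit, including a clean one; a manually stopped container is started again when the daemon restarts |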
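The table reduces to a small decision function for a container that has just exited. This Python sketch encodes those rules under stated simplifications: it ignores what happens when the daemon itself restarts (the only point where always and unless-stopped differ), and the function name and parameters are illustrative.

```python
def should_restart(policy: str, exit_code: int, manually_stopped: bool,
                   restart_count: int) -> bool:
    """Decide whether Docker would restart a container that just exited.

    `policy` uses the docker run syntax: "no", "always", "unless-stopped",
    "on-failure", or "on-failure:N".
    """
    if manually_stopped:
        # After docker stop, the policy is ignored until the container is
        # started again by hand or the daemon restarts.
        return False
    name, _, max_retries = policy.partition(":")
    if name == "no":
        return False
    if name in ("always", "unless-stopped"):
        return True  # any exit code, including 0
    if name == "on-failure":
        if exit_code == 0:
            return False
        return not max_retries or restart_count < int(max_retries)
    raise ValueError(f"unknown restart policy: {policy}")
```

The `always` branch is what turns a permanent startup failure into an endless loop; `on-failure:5` gives up after five restarts and leaves the container stopped with its last exit code for inspection.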
restart: always combined with a crashing app creates an infinite loop. Switch to on-failure with a max count:
docker run --restart on-failure:5 myapp # Max 5 restarts
Still Not Working?
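Check the healthcheck if the container is flapping or its dependents won’t start. Plain Docker and Compose only mark a container with a failing healthcheck as unhealthy; they don’t restart it. Swarm services do replace unhealthy tasks, and Compose won’t start services that depend on it with condition: service_healthy. Inspect the healthcheck history: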
docker inspect <container> --format='{{json .State.Health}}' | jq
# Shows the last 5 healthcheck attempts and exit codes
If the healthcheck command itself is wrong (missing curl, wrong port, wrong path), the container is reported unhealthy even when the app works.
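If jq isn’t available, or you want one line per probe, a short script can summarize the same JSON. A Python sketch, assuming the input is the output of `docker inspect <container> --format='{{json .State.Health}}'` (an object with `Status`, `FailingStreak`, and a `Log` list whose entries carry `ExitCode` and `Output`, or `null` when no healthcheck is configured); `summarize_health` is a hypothetical name.

```python
import json
import sys


def summarize_health(health_json: str) -> str:
    """Summarize docker inspect's .State.Health JSON, one line per probe."""
    health = json.loads(health_json)
    if health is None:
        return "no healthcheck configured"
    lines = [f"status={health['Status']} failing_streak={health.get('FailingStreak', 0)}"]
    for probe in health.get("Log") or []:
        # Probe output can span several lines; the first is usually enough.
        output = (probe.get("Output") or "").strip().splitlines()
        first_line = output[0] if output else ""
        lines.append(f"exit={probe['ExitCode']} {first_line}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(summarize_health(sys.stdin.read()))
```

Usage: `docker inspect myapp --format='{{json .State.Health}}' | python summarize_health.py`. A steady run of non-zero exits with the same message usually points at the healthcheck command rather than the app.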
Inspect OOM events on cgroup v2 hosts. Ubuntu 22.04+, RHEL 9, and Debian 12 use cgroup v2 by default. Besides the kernel log, check the Docker daemon’s journal:
journalctl -u docker.service --since "10 minutes ago" | grep -i oom
journalctl _SYSTEMD_UNIT=docker.service | grep -i oom
Check PID 1 signal and zombie handling. A Node.js or Python process running as PID 1 gets no default signal behavior: it ignores SIGTERM unless it installs a handler, so docker stop waits out its timeout (10 seconds by default) and then sends SIGKILL, which shows up as exit code 137. PID 1 is also expected to reap zombie child processes, which most application runtimes don’t do. Use tini or dumb-init as PID 1, or start the container with docker run --init (init: true in Compose):
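# Debian/Ubuntu-based image; on Alpine, use: RUN apk add --no-cache tini (installed as /sbin/tini)
RUN apt-get update && apt-get install -y --no-install-recommends tini && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "--", "node", "dist/server.js"]
Run the container without the restart policy to inspect the exit more carefully: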
docker run --rm myapp
# Container exits once, you see the full output
Run with a shell to explore the container filesystem:
docker run -it --rm --entrypoint /bin/sh myapp
# Inside: ls, env, cat /app/dist/server.js, etc.
Check system-level OOM kills (outside Docker):
dmesg | grep -i "out of memory"
dmesg | grep -i "oom"
journalctl -k | grep -i "killed process"
Check if a port conflict is causing the crash:
# See what's using port 3000 on the host
ss -tlnp | grep :3000
lsof -i :3000
For related issues, see Fix: Docker Container Already in Use, Fix: Kubernetes CrashLoopBackOff, Fix: Docker Exited 137 OOMKilled, and Fix: Docker Compose Healthcheck Not Working.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.