Fix: Docker Container Keeps Restarting

FixDevs

Quick Answer

How to fix a Docker container that keeps restarting — reading exit codes, debugging CrashLoopBackOff, fixing entrypoint errors, missing env vars, out-of-memory kills, and restart policy misconfiguration.

The Error

A Docker container exits immediately after starting and keeps restarting in a loop:

docker ps
# CONTAINER ID   IMAGE     COMMAND      STATUS                        PORTS
# a1b2c3d4e5f6   myapp     "node ..."   Restarting (1) 5 seconds ago

Or in Docker Compose:

myapp | Error: Cannot find module '/app/dist/server.js'
myapp exited with code 1
myapp | Error: Cannot find module '/app/dist/server.js'
myapp exited with code 1

The container starts, crashes, Docker restarts it due to the restart policy, it crashes again — indefinitely.

Why This Happens

Docker’s restart policy (--restart always or restart: always in Compose) automatically restarts containers that exit. When the app crashes on startup, this creates a restart loop. The container starts, the process inside exits with a non-zero code, the Docker daemon waits a backoff interval (which grows over time), then starts it again. With restart: always or restart: unless-stopped, this continues forever unless you stop the container manually or fix the underlying crash.

The reason a one-time crash turns into an infinite loop is that the restart backoff is bounded — Docker waits at most 60 seconds between attempts by default and never gives up on its own. This is the right behavior for transient failures (network blips, dependency restarts) but the wrong behavior for permanent ones (missing files, bad config). Knowing which category you’re in requires reading the exit code and the application logs, not just the restart count.
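
To get a feel for how quickly that backoff saturates, the sketch below models the delay schedule. It assumes the behavior described in Docker's documentation (a delay that starts at 100 ms and doubles before each restart) together with the 60-second ceiling mentioned above. restartDelays is an illustrative helper, not part of any Docker API.

```javascript
// Simplified model of Docker's restart backoff (assumption: the delay starts
// at 100 ms, doubles before each restart, and is capped at 60 s).
const INITIAL_DELAY_MS = 100;
const MAX_DELAY_MS = 60_000;

// Returns the delay (in ms) before each of the first `count` restarts.
function restartDelays(count) {
  const delays = [];
  let delay = INITIAL_DELAY_MS;
  for (let i = 0; i < count; i++) {
    delays.push(Math.min(delay, MAX_DELAY_MS));
    delay *= 2;
  }
  return delays;
}

console.log(restartDelays(12).map((ms) => `${ms / 1000}s`).join(", "));
```

Under these assumptions the ceiling is reached at the eleventh restart, so a permanently broken container settles into roughly one attempt per minute after about two minutes of crashing.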

Concrete root causes ordered by frequency:

  • Application error on startup — a missing file, bad environment variable, or uncaught exception crashes the app before it becomes healthy.
  • Wrong entrypoint or command — the CMD or ENTRYPOINT points to a file that doesn’t exist in the image, or the command syntax is wrong.
  • Missing required environment variables — the app reads an env var at startup and throws if it’s undefined.
  • Port already in use — the app tries to bind a port that’s occupied, fails, and exits.
  • Out-of-memory kill — the container hits its memory limit and the kernel kills the process (exit code 137).
  • Signal handling — the app doesn’t handle SIGTERM properly and exits with a non-zero code, triggering a restart even during intentional shutdown.
  • Dependency not ready — the app tries to connect to a database or external service at startup before it’s available.
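  • Healthcheck failing: the container is marked unhealthy after retries consecutive failed checks (failures during start_period are not counted). Docker Engine and Compose do not restart an unhealthy container by themselves, but Swarm replaces unhealthy tasks, and services that use depends_on: condition: service_healthy will not start until the check passes.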
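
For the signal-handling cause in the list above, a graceful shutdown handler is usually a few lines. The following is a minimal sketch for a Node.js app, not a definitive implementation: installShutdownHandler is a hypothetical helper name, and the injectable proc parameter exists only so the handler can be exercised without sending a real signal.

```javascript
// Registers a SIGTERM handler that stops accepting connections and exits 0.
// `server` is anything with a Node-style close(callback) method (for example
// an http.Server); `proc` defaults to the real process object.
function installShutdownHandler(server, proc = process) {
  const onSigterm = () => {
    console.log("SIGTERM received, shutting down");
    server.close(() => proc.exit(0)); // exit code 0 = clean shutdown
  };
  proc.on("SIGTERM", onSigterm);
  return onSigterm;
}

// Usage sketch (hypothetical server on port 3000):
// const http = require("node:http");
// const server = http.createServer((req, res) => res.end("ok")).listen(3000);
// installShutdownHandler(server);
```

Exiting with code 0 matters here because restart: on-failure only reacts to non-zero exit codes.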

Version History That Changes the Failure Mode

Docker’s restart and healthcheck behavior has evolved across Engine versions, and the licensing change to Docker Desktop in August 2021 reshuffled how teams run Docker on macOS and Windows. If your container restarts on one machine but not another, version drift is a real suspect.

  • Docker 20.10 (Dec 8, 2020) consolidated several long-standing fixes into a single release line. This version added rootless mode as stable, improved the --restart semantics for containers that exit during startup, and stabilized BuildKit as opt-in. The health.start_period healthcheck field was already present but became more reliable here.
  • Docker Desktop license change (Aug 31, 2021) required paid subscriptions for commercial use at companies with more than 250 employees or more than $10 million in annual revenue. This pushed many teams to Colima, Rancher Desktop, or Podman. Each of these uses slightly different VM defaults, which surfaces as “the same image restarts on Colima but runs fine on Docker Desktop”. The cause is almost always memory limits or filesystem mount semantics, not the image itself.
  • Compose v2 (released 2021, default in Desktop 2022) rewrote Compose in Go and implements the Compose Specification. The long form depends_on: condition: service_healthy is honored there and actually waits. It also worked in Compose v1 with version 2.1+ files, but the version 3 file format dropped it, so many v3-era projects started dependents without waiting. If your container “keeps restarting” on Compose v2 but worked on v1, check whether a healthcheck contract that used to be skipped is now being enforced.
  • Docker Engine 23 (Feb 2023) made BuildKit the default builder on Linux. BuildKit builds only the stages that the target stage depends on, whereas the legacy builder executed every stage in order. A previously “working” image that suddenly restarts because dist/server.js is missing can trace to a build that relied on the old behavior.
  • Docker Engine 24 (May 2023) arrived as Compose v1 reached end-of-life. The legacy builder is deprecated but still available with DOCKER_BUILDKIT=0. The restart_policy block under deploy is applied by Compose v2 in standalone mode (not just Swarm), which older documentation did not make clear.
  • Docker Engine 25 (Jan 2024) added the start_interval healthcheck option (--health-start-interval), which controls how often checks run during start_period.
  • Docker Engine 27 (Jun 2024) improved cgroups v2 handling, which affects OOM kill detection on Ubuntu 22.04+ and other systemd-based hosts.

If you are on Compose v2.x and inherit a project written for v1, expect surprises around depends_on, healthchecks, and the way restart interacts with stop_grace_period. A container that “keeps restarting” on v2 may simply be a container that v1 was silently letting fail.

Step 1: Read the Exit Code

The exit code tells you the category of failure:

docker inspect <container_name_or_id> --format='{{.State.ExitCode}}'

Common exit codes:

Exit Code | Meaning
1 | Application error (check your app’s logs)
2 | Misuse of shell command
125 | Docker run itself failed
126 | Command found but not executable
127 | Command not found (wrong path in CMD/ENTRYPOINT)
137 | Killed by signal 9 (OOM kill or docker kill)
139 | Segmentation fault
143 | Killed by signal 15 (SIGTERM, graceful shutdown)

Exit code 127 → fix your CMD path. Exit code 137 → check State.OOMKilled, then raise the memory limit if it was an OOM kill. Exit code 1 → read the application logs.
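
Codes above 128 follow a pattern: subtract 128 to get the signal number that ended the process. The sketch below encodes the codes listed above plus that rule. describeExitCode and its small signal map are illustrative, and the signal numbers assume Linux.

```javascript
// Exit codes from the list above; names for 128+N are derived from the
// signal number N. The signal map is deliberately tiny (assumption: Linux
// signal numbering).
const KNOWN_CODES = {
  0: "clean exit",
  1: "application error",
  2: "misuse of shell command",
  125: "docker run itself failed",
  126: "command found but not executable",
  127: "command not found",
};
const SIGNALS = { 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM" };

function describeExitCode(code) {
  if (Object.hasOwn(KNOWN_CODES, code)) return KNOWN_CODES[code];
  if (code > 128 && code < 256) {
    const signal = code - 128;
    return `killed by signal ${signal} (${SIGNALS[signal] ?? "unknown signal"})`;
  }
  return "application-defined exit code";
}

console.log(describeExitCode(137));
console.log(describeExitCode(143));
```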

Step 2: Read the Logs

# Show the container's logs (output from earlier restarts is included)
docker logs <container_name>

# Follow logs in real time
docker logs -f <container_name>

# Show last 50 lines
docker logs --tail 50 <container_name>

# For Docker Compose
docker compose logs myapp
docker compose logs --tail 50 myapp

The log output almost always contains the specific error. Read it before changing anything else.

Fix 1: Fix Application Startup Errors

The most common cause is the application crashing during initialization. The log will show the specific error:

Node.js — module not found:

Error: Cannot find module '/app/dist/server.js'

The build step wasn’t run before building the image, so dist/ doesn’t exist:

# Ensure build runs inside the Dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build          # ← Must be here
CMD ["node", "dist/server.js"]

Python — import error:

ModuleNotFoundError: No module named 'fastapi'

Dependencies weren’t installed in the image:

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # ← Required
COPY . .
CMD ["python", "main.py"]

Fix 2: Fix Wrong CMD or ENTRYPOINT

Exit code 127 means Docker ran the command but the shell couldn’t find the binary:

# Check what command the container tried to run
docker inspect <container> --format='{{.Config.Cmd}}'
docker inspect <container> --format='{{.Config.Entrypoint}}'

Run the container interactively to debug:

# Override the entrypoint to get a shell
docker run -it --entrypoint /bin/sh myapp

# Inside the container, check if the file exists
ls -la /app/dist/server.js
which node

Fix the Dockerfile CMD:

# Wrong — file doesn't exist at this path
CMD ["node", "server.js"]

# Correct — use the actual path
CMD ["node", "/app/dist/server.js"]

# Or use the working directory
WORKDIR /app
CMD ["node", "dist/server.js"]

Common Mistake: Using a shell script as entrypoint but forgetting chmod +x:

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh    # ← Required, otherwise exit code 126
ENTRYPOINT ["/entrypoint.sh"]

Fix 3: Provide Missing Environment Variables

If the app reads a required env var at startup and it’s missing, it crashes with exit code 1. The log will show something like:

Error: DATABASE_URL environment variable is required
TypeError: Cannot read properties of undefined (reading 'split')

Pass environment variables to the container:

# Single variable
docker run -e DATABASE_URL=postgres://... myapp

# From a .env file
docker run --env-file .env myapp

In Docker Compose:

services:
  myapp:
    image: myapp
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/myapp
      REDIS_URL: redis://redis:6379
      NODE_ENV: production
    # Or load from a file
    env_file:
      - .env.production

Make the app fail clearly when required variables are missing:

// Node.js — validate env at startup
const requiredEnv = ['DATABASE_URL', 'JWT_SECRET', 'PORT'];
for (const key of requiredEnv) {
  if (!process.env[key]) {
    console.error(`Missing required environment variable: ${key}`);
    process.exit(1);
  }
}
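
A small variation on the check above reports every missing variable in one pass, which saves a restart cycle per variable. This is a sketch only: missingEnv is a hypothetical helper, and the variable names are the same illustrative ones used above.

```javascript
// Returns the names in `required` that are unset or empty in `env`.
function missingEnv(required, env = process.env) {
  return required.filter((key) => !env[key]);
}

// Demo against a hand-written environment object (not real configuration).
const demoEnv = { DATABASE_URL: "postgres://db/app", PORT: "" };
console.log(missingEnv(["DATABASE_URL", "JWT_SECRET", "PORT"], demoEnv));

// At real startup you would check process.env and exit non-zero:
// const missing = missingEnv(["DATABASE_URL", "JWT_SECRET", "PORT"]);
// if (missing.length > 0) {
//   console.error(`Missing required environment variables: ${missing.join(", ")}`);
//   process.exit(1);
// }
```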

Fix 4: Fix Out-of-Memory Kills (Exit Code 137)

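Exit code 137 means the process was killed with SIGKILL (128 + 9). The most common cause is the kernel OOM killer firing because the process exceeded the container’s memory limit, but docker kill and a docker stop that times out produce the same code: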

# Check if it was OOM killed
docker inspect <container> --format='{{.State.OOMKilled}}'
# true = OOM killed
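
Since exit code 137 alone does not prove an OOM kill, it helps to read both fields together. The sketch below interprets one element of the JSON array that docker inspect prints. explainExit is an illustrative helper and the sample object is hand-written, not real output.

```javascript
// Interprets one element of the JSON array printed by `docker inspect`.
// Only State.ExitCode and State.OOMKilled are read.
function explainExit(container) {
  const { ExitCode: exitCode, OOMKilled: oomKilled } = container.State;
  if (oomKilled) return "OOM kill: the container hit its memory limit";
  if (exitCode === 137) {
    return "SIGKILL without OOM flag: docker kill, a docker stop timeout, or a host-level kill";
  }
  return `exit code ${exitCode}: not a SIGKILL, check the application logs`;
}

// Hand-written sample shaped like `docker inspect` output (not real data).
const sample = [{ State: { ExitCode: 137, OOMKilled: true } }];
console.log(explainExit(sample[0]));
```

To run it against a real container, save the output with docker inspect <container> > state.json and pass JSON.parse(fs.readFileSync("state.json", "utf8"))[0] to the function.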

Increase the memory limit:

docker run -m 512m myapp          # 512 MB limit
docker run -m 1g myapp            # 1 GB limit

In Docker Compose:

services:
  myapp:
    image: myapp
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M

Find what’s consuming memory — run without a limit and check usage:

docker stats <container>

If memory grows without bound, the app has a memory leak. Fix the leak rather than just raising the limit.
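
docker stats shows the container total. To see whether the growth is inside the Node.js process itself, one option is to log process.memoryUsage() from the app. This is a minimal sketch: toMiB and formatMemory are hypothetical helpers, and the 30-second interval in the comment is an arbitrary choice.

```javascript
// Converts a byte count to mebibytes, rounded to one decimal place.
function toMiB(bytes) {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

// Builds a one-line summary from the object returned by process.memoryUsage().
function formatMemory(usage) {
  return `rss=${toMiB(usage.rss)}MiB heapUsed=${toMiB(usage.heapUsed)}MiB heapTotal=${toMiB(usage.heapTotal)}MiB`;
}

console.log(formatMemory(process.memoryUsage()));

// In a long-running app, log on an interval; unref() keeps the timer from
// holding the process open during shutdown.
// setInterval(() => console.log(formatMemory(process.memoryUsage())), 30_000).unref();
```

A heapUsed value that climbs steadily across many log lines under constant load is a hint that the leak is in JavaScript objects rather than in native memory.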

Fix 5: Wait for Dependencies

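If the app connects to a database or other service at startup, and that service isn’t ready yet, the connection fails and the app exits. Inside a container, 127.0.0.1 is the container itself, so an error like the one below can also mean the app is pointed at localhost instead of the service name (db in the Compose example):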

Error: connect ECONNREFUSED 127.0.0.1:5432

Use depends_on with health checks in Docker Compose:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  myapp:
    image: myapp
    depends_on:
      db:
        condition: service_healthy   # Wait until db passes health check
    environment:
      DATABASE_URL: postgres://postgres:secret@db:5432/myapp

Or use a wait script inside the container:

# wait-for-it.sh (common utility)
./wait-for-it.sh db:5432 --timeout=30 -- node dist/server.js

Implement retry logic in the app itself — this is the most resilient approach:

async function connectWithRetry(maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await db.connect();
      console.log('Database connected');
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      console.log(`Connection attempt ${attempt} failed, retrying in 5s...`);
      await new Promise(r => setTimeout(r, 5000));
    }
  }
}
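
The same idea can be written so that it does not depend on a particular database client and so that the wait grows between attempts. The following is a sketch under assumptions: retryWithBackoff is a hypothetical helper, the doubling delay is one common choice rather than a requirement, and flakyConnect stands in for a real db.connect().

```javascript
// Calls `connect` until it resolves or `maxAttempts` is reached, doubling the
// wait after each failure. `connect` is any function returning a promise.
async function retryWithBackoff(connect, { maxAttempts = 5, baseDelayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt === maxAttempts) throw err; // give up: let the process exit non-zero
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      console.log(`Attempt ${attempt} failed (${err.message}), retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Demo with a stand-in for db.connect() that fails twice, then succeeds.
let calls = 0;
const flakyConnect = async () => {
  calls += 1;
  if (calls < 3) throw new Error("ECONNREFUSED");
  return "connected";
};

retryWithBackoff(flakyConnect, { baseDelayMs: 10 }).then((result) => {
  console.log(`${result} after ${calls} attempts`);
});
```

Rethrowing after the last attempt is deliberate: the container then exits with a non-zero code and the restart policy takes over, instead of the app hanging in a half-started state.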

Fix 6: Adjust the Restart Policy

If the app exits intentionally (e.g., a one-time migration task), restart: always keeps restarting it unnecessarily. Use the right policy:

services:
  # Long-running server — restart on failure, not on clean exit
  myapp:
    image: myapp
    restart: on-failure        # or "unless-stopped", which also restarts after a clean exit

  # One-time task — never restart
  migrate:
    image: myapp
    command: ["node", "migrate.js"]
    restart: "no"             # Default — don't restart after exit

Restart policies:

Policy | Behavior
no | Never restart (default)
on-failure | Restart only on non-zero exit code
unless-stopped | Restart unless manually stopped
always | Always restart, even on clean exit

restart: always combined with a crashing app creates an infinite loop. Switch to on-failure with a max count:

docker run --restart on-failure:5 myapp  # Max 5 restarts
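
The policy rules above can be written down as a small decision function. This is a simplified model for reasoning about the policies, not Docker's implementation: shouldRestart and its parameter names are illustrative, and it only covers the moment the main process exits (not what happens after a daemon restart).

```javascript
// Simplified model of the restart policies listed above. It answers:
// "the main process just exited; will Docker start it again?"
// Assumptions: `manuallyStopped` means the container was stopped with
// docker stop, and a `maxRetries` of 0 means "no limit" for on-failure.
function shouldRestart({ policy, exitCode, manuallyStopped = false, restartCount = 0, maxRetries = 0 }) {
  if (manuallyStopped) return false; // a manual stop is not undone immediately
  switch (policy) {
    case "no":
      return false;
    case "on-failure":
      return exitCode !== 0 && (maxRetries === 0 || restartCount < maxRetries);
    case "always":
    case "unless-stopped":
      return true; // these two differ only after a daemon restart
    default:
      throw new Error(`unknown restart policy: ${policy}`);
  }
}

console.log(shouldRestart({ policy: "always", exitCode: 0 }));
console.log(shouldRestart({ policy: "on-failure", exitCode: 0 }));
console.log(shouldRestart({ policy: "on-failure", exitCode: 1, restartCount: 5, maxRetries: 5 }));
```

The three calls print true, false, and false: always restarts even a clean exit, on-failure ignores exit code 0, and on-failure:5 stops after the fifth restart.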

Still Not Working?

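Check the healthcheck if the container is being replaced while its process is still running. Docker Engine and Compose only mark a container unhealthy; they do not restart it. Swarm services and tools that watch health status do act on it, and Compose services waiting on condition: service_healthy will fail to start. Inspect the healthcheck history: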

docker inspect <container> --format='{{json .State.Health}}' | jq
# Shows the last 5 healthcheck attempts and exit codes

If the healthcheck command itself is wrong (missing curl, wrong port, wrong path), the container is reported unhealthy even when the app works, and anything that reacts to health status will restart or block it.
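
When the health log is long, pulling out the last failing probe makes a broken check command easy to spot. The sketch below reads the object printed by the docker inspect command above. summarizeHealth is an illustrative helper and the sample data is hand-written, not real output.

```javascript
// Summarizes the object printed by
//   docker inspect <container> --format='{{json .State.Health}}'
// Fields used: Status, FailingStreak, Log[].ExitCode, Log[].Output.
function summarizeHealth(health) {
  const failures = (health.Log ?? []).filter((entry) => entry.ExitCode !== 0);
  const lastFailure = failures[failures.length - 1];
  return {
    status: health.Status,
    failingStreak: health.FailingStreak,
    lastFailureOutput: lastFailure ? lastFailure.Output.trim() : null,
  };
}

// Hand-written sample in the shape of .State.Health (not real output).
const sampleHealth = {
  Status: "unhealthy",
  FailingStreak: 3,
  Log: [
    { ExitCode: 0, Output: "ok\n" },
    { ExitCode: 127, Output: "/bin/sh: curl: not found\n" },
  ],
};
console.log(summarizeHealth(sampleHealth));
```

An output such as "curl: not found" points at the healthcheck command rather than the application.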

Inspect cgroup v2 OOM events on systemd hosts. On Ubuntu 22.04+, RHEL 9, and Debian 12, OOM kills are surfaced through cgroup v2 and may not appear in dmesg. Use journalctl with the unit filter:

journalctl -u docker.service --since "10 minutes ago" | grep -i oom
journalctl _SYSTEMD_UNIT=docker.service | grep -i oom

Check PID 1 signal and zombie handling. A Node.js or Python process running as PID 1 inside the container does not reap orphaned child processes, and the kernel does not apply default signal handlers to PID 1. If the app has no SIGTERM handler, docker stop waits out the grace period (10 seconds by default) and then sends SIGKILL, which shows up as exit code 137. Use tini or dumb-init as PID 1:

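# Debian/Ubuntu-based images; on Alpine use "apk add --no-cache tini" (installs /sbin/tini)
RUN apt-get update && apt-get install -y --no-install-recommends tini && rm -rf /var/lib/apt/lists/*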
ENTRYPOINT ["/usr/bin/tini", "--", "node", "dist/server.js"]

Run the container without the restart policy to inspect the exit more carefully:

docker run --rm myapp
# Container exits once, you see the full output

Run with a shell to explore the container filesystem:

docker run -it --rm --entrypoint /bin/sh myapp
# Inside: ls, env, cat /app/dist/server.js, etc.

Check system-level OOM kills (outside Docker):

dmesg | grep -i "out of memory"
dmesg | grep -i "oom"
journalctl -k | grep -i "killed process"

Check if a port conflict is causing the crash:

# See what's using port 3000 on the host
ss -tlnp | grep :3000
lsof -i :3000

For related issues, see Fix: Docker Container Already in Use, Fix: Kubernetes CrashLoopBackOff, Fix: Docker Exited 137 OOMKilled, and Fix: Docker Compose Healthcheck Not Working.

FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
