Fix: Kubernetes Pod OOMKilled — Out of Memory Error

FixDevs ·

Quick Answer

How to fix Kubernetes OOMKilled errors — understanding memory limits, finding memory leaks, setting correct resource requests and limits, and using Vertical Pod Autoscaler.

The Error

A Kubernetes pod terminates with OOMKilled (exit code 137):

$ kubectl get pods
NAME                    READY   STATUS      RESTARTS   AGE
api-deployment-7d9fb   0/1     OOMKilled   3          12m

Or in kubectl describe pod:

State:          Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Thu, 20 Mar 2026 10:00:00 +0000
  Finished:     Thu, 20 Mar 2026 10:02:34 +0000
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

Or the pod keeps restarting with increasing RESTARTS:

NAME               READY   STATUS             RESTARTS   AGE
worker-pod-xk2mp   0/1     CrashLoopBackOff   8          40m

Why This Happens

Kubernetes enforces memory limits at the container level using Linux cgroups. When a container’s memory usage exceeds its configured limits.memory, the kernel immediately terminates the process with SIGKILL (exit code 137):

  • Memory limit too low — the application legitimately needs more memory than the limit allows. Common after traffic spikes, processing large payloads, or loading bigger datasets than tested locally.
  • Memory leak — the application allocates memory and never frees it. Memory usage grows slowly until it hits the limit — then OOMKill happens. The pod restarts, memory grows again, repeating.
  • JVM heap size exceeds container limit — Java apps with -Xmx set higher than the container’s memory limit. The JVM is allowed to grow the heap past what the container can hold, and the kill lands as soon as it does, often right after startup or under the first heavy load.
  • No memory limit set — if limits.memory isn’t set, there’s no ceiling. The pod can consume all node memory, causing the node itself to OOM and kill random processes.
  • Sidecar containers add up — each container has its own limit and can be OOMKilled individually, but every container in the pod (app, log agent, proxy) draws from the same node’s memory, so a memory-hungry sidecar can be the one that gets killed or can push the node into memory pressure.
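
Exit code 137 is 128 + 9, i.e. the process was terminated by SIGKILL. A quick way to confirm the kill really came from the OOM killer rather than from the application itself is to read the recorded termination state:

# Reason and exit code of the last termination for the first container
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
# An OOM kill prints: OOMKilled 137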

Fix 1: Diagnose Which Container Is Being Killed

First, identify the container and confirm it’s a memory issue:

# Check pod status and restart count
kubectl get pods -n <namespace>

# Describe the pod for detailed OOMKill information
kubectl describe pod <pod-name> -n <namespace>

# Check previous container logs (before the crash)
kubectl logs <pod-name> -n <namespace> --previous

# Check events for the pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'

In kubectl describe pod, look for:

Containers:
  api:
    State:          Running
    Last State:     Terminated
      Reason:       OOMKilled     # ← Confirmed OOM
      Exit Code:    137
    Limits:
      memory:       256Mi         # ← Current limit
    Requests:
      memory:       128Mi

Check node-level memory pressure:

# Check if the node is under memory pressure
kubectl describe node <node-name> | grep -A 5 "Conditions:"
# MemoryPressure: True means the node itself is running low

# Check actual memory usage on the node
kubectl top node <node-name>

# Check pod memory usage
kubectl top pods -n <namespace>
kubectl top pods -n <namespace> --containers  # Per-container breakdown
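
To triage a whole namespace at once, a jsonpath scan over the pod list shows which pods were last terminated for OOM (the field names come from the standard pod status API):

# Print each pod with the reason for its last container termination
kubectl get pods -n <namespace> -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' | grep OOMKilled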

Fix 2: Increase Memory Limits

If the application legitimately needs more memory than currently allocated, increase the limit:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      containers:
      - name: api
        image: myapp:latest
        resources:
          requests:
            memory: "256Mi"   # Minimum guaranteed memory
            cpu: "250m"
          limits:
            memory: "512Mi"   # Maximum — OOMKill fires if exceeded
            cpu: "500m"
kubectl apply -f deployment.yaml
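
After applying, watch the rollout and confirm the new limit actually landed on the pod template:

# Wait for the updated pods to roll out
kubectl rollout status deployment/api -n <namespace>

# Print the memory limit now set on the first container
kubectl get deployment api -n <namespace> -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}{"\n"}'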

Or patch the deployment directly:

kubectl patch deployment api -n <namespace> --patch '
{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "api",
          "resources": {
            "limits": {"memory": "512Mi"},
            "requests": {"memory": "256Mi"}
          }
        }]
      }
    }
  }
}'
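
kubectl set resources is a shorter alternative to a JSON patch for a one-off change (here assuming both the deployment and the container are named api):

kubectl set resources deployment api -n <namespace> -c api --requests=memory=256Mi --limits=memory=512Mi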

Memory unit reference:

Value    Meaning
128Mi    128 mebibytes (134 MB) — use Mi for binary units
512Mi    512 mebibytes (536 MB)
1Gi      1 gibibyte (1.07 GB)
256M     256 megabytes (decimal) — avoid, use Mi instead

Common Mistake: Setting limits.memory equal to requests.memory. The request guarantees the minimum; the limit is the maximum. A tight limit with no headroom causes OOMKill on any usage spike. A reasonable ratio is 2:1 (limit = 2× request) for stable apps, higher for bursty workloads.

Fix 3: Fix Java/JVM Memory Configuration

Java applications are a frequent cause of OOMKilled because older JVMs (before Java 10, or 8u191 on the Java 8 line) don’t respect container memory limits by default:

# Wrong — JVM reads host total RAM, not container limit
# On a 16GB node with a 512Mi container limit, JVM sets heap to ~4GB
java -jar app.jar

# OOMKill fires when JVM tries to use the 4GB heap inside a 512Mi container

Fix for Java 11+ — use container-aware JVM flags:

# Dockerfile
FROM eclipse-temurin:21-jre

# UseContainerSupport is on by default in Java 10+
# MaxRAMPercentage controls heap as a fraction of container memory
ENTRYPOINT ["java", \
  "-XX:+UseContainerSupport", \
  "-XX:MaxRAMPercentage=75.0", \
  "-jar", "/app.jar"]

Or set explicit heap limits that fit within the container (this only works if your image’s entrypoint actually passes JAVA_OPTS to the java command):

# If container limit is 512Mi, set heap to at most 400Mi
# (leaving room for JVM overhead, metaspace, etc.)
containers:
- name: api
  env:
  - name: JAVA_OPTS
    value: "-Xms128m -Xmx400m -XX:+UseContainerSupport"
  resources:
    limits:
      memory: "512Mi"

For Node.js — set --max-old-space-size:

containers:
- name: node-api
  command: ["node", "--max-old-space-size=400", "dist/index.js"]
  resources:
    limits:
      memory: "512Mi"

For Python — configure gunicorn worker memory:

containers:
- name: python-api
  command: ["gunicorn", "--workers=2", "--worker-class=uvicorn.workers.UvicornWorker",
            "--max-requests=1000", "--max-requests-jitter=50",
            "app:app"]

--max-requests restarts workers after N requests, preventing slow memory leaks from accumulating.

Fix 4: Find and Fix Memory Leaks

If memory grows continuously until OOMKill, it’s likely a leak. Profile memory usage before it crashes:

Enable memory profiling for Node.js:

// Add to your Node.js app
const v8 = require('v8');
const fs = require('fs');

// Trigger heap snapshot via HTTP endpoint
app.get('/debug/heap-snapshot', (req, res) => {
  const filename = `/tmp/heapdump-${Date.now()}.heapsnapshot`;
  const snapshotStream = v8.writeHeapSnapshot(filename);
  res.json({ snapshot: filename });
});

// Monitor heap size
setInterval(() => {
  const { heapUsed, heapTotal } = process.memoryUsage();
  console.log(`Heap: ${Math.round(heapUsed / 1024 / 1024)}MB / ${Math.round(heapTotal / 1024 / 1024)}MB`);
}, 30000);

For Python — use tracemalloc:

import tracemalloc
import linecache

def display_top(snapshot, key_type='lineno', limit=10):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print(f"#{index}: {frame.filename}:{frame.lineno}: {stat.size / 1024:.1f} KiB")

tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
display_top(snapshot)

Use kubectl exec to check memory inside a running container:

# Get shell in the container
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

# Check process memory
cat /proc/meminfo
cat /sys/fs/cgroup/memory/memory.usage_in_bytes     # Current usage
cat /sys/fs/cgroup/memory/memory.limit_in_bytes     # Container limit
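
Those paths are for cgroup v1. On nodes running cgroup v2 (the default on recent distributions), the equivalent files are:

cat /sys/fs/cgroup/memory.current   # Current usage
cat /sys/fs/cgroup/memory.max       # Container limit ("max" = no limit)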

Common leak patterns:

// Node.js — event listener leak
// WRONG — listener added on every request, never removed
app.get('/data', (req, res) => {
  emitter.on('data', handleData);  // Leak: listener accumulates
  emitter.emit('data', someData);
  res.send('ok');
});

// CORRECT — use once() or remove the listener
app.get('/data', (req, res) => {
  emitter.once('data', handleData);  // Fires once, auto-removed
  emitter.emit('data', someData);
  res.send('ok');
});
# Python — cache without eviction
# WRONG — unbounded cache grows forever
cache = {}
def get_data(key):
    if key not in cache:
        cache[key] = expensive_fetch(key)
    return cache[key]

# CORRECT — use LRU cache with size limit
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_data(key):
    return expensive_fetch(key)

Fix 5: Set Memory Requests and Limits Correctly

Kubernetes scheduling depends on requests, but OOMKill depends on limits. Both must be set correctly:

resources:
  requests:
    memory: "128Mi"   # Scheduler uses this to find a node with enough memory
  limits:
    memory: "256Mi"   # Kernel kills the container if it exceeds this

Rules for setting values:

  1. Requests = steady-state memory usage at normal load (measure with kubectl top pods)
  2. Limits = peak memory usage under high load, plus a safety buffer (20–50%)
  3. Never set limits lower than requests (Kubernetes rejects this)
  4. Avoid setting limits equal to requests — any spike causes OOMKill

Use kubectl top to find actual usage:

# Watch memory usage over time
watch kubectl top pods -n <namespace>

# For a specific pod
kubectl top pod <pod-name> -n <namespace> --containers
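
kubectl top is only a point-in-time snapshot. To size limits against the real peak, query your metrics backend over a longer window. With Prometheus and the standard cAdvisor metrics, a query like this (adjust the label selectors to your own names) reports the highest working-set usage over the past week:

max_over_time(container_memory_working_set_bytes{namespace="<namespace>", pod=~"api-.*", container="api"}[7d])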

LimitRange — set default limits for a namespace:

# limitrange.yaml — applies when containers don't specify resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      memory: "256Mi"
      cpu: "500m"
    defaultRequest:
      memory: "128Mi"
      cpu: "250m"
    type: Container
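
Apply it and confirm the defaults are active; new containers created without an explicit resources block pick them up automatically:

kubectl apply -f limitrange.yaml
kubectl describe limitrange default-limits -n production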

Fix 6: Use Vertical Pod Autoscaler (VPA)

If you’re unsure of the right memory values, VPA can recommend or automatically set them based on observed usage:

# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"    # "Off" = recommend only, don't auto-update
    # updateMode: "Auto" = automatically update pod resources
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        memory: "64Mi"
      maxAllowed:
        memory: "2Gi"
# Install VPA (if not already installed) from the kubernetes/autoscaler repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Apply the VPA
kubectl apply -f vpa.yaml

# Check VPA recommendations
kubectl describe vpa api-vpa -n <namespace>
# Look for: Recommendation > Container Recommendations > Target
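
The target can also be pulled straight from the VPA object’s status (assuming the autoscaling.k8s.io/v1 schema shown above):

kubectl get vpa api-vpa -n <namespace> -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}{"\n"}'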

Pro Tip: Run VPA in "Off" mode first for a week to collect usage data and get recommendations. Only switch to "Auto" mode once you’ve validated the recommendations match your expectations. Auto mode restarts pods to apply new limits, which can cause brief downtime.

Still Not Working?

Check if it’s the init container being OOMKilled — init containers run before the main container and can also be killed:

kubectl describe pod <pod-name> | grep -A 10 "Init Containers:"

Check node-level OOM events, not just pod events:

# SSH to the node or check system logs
kubectl get events --all-namespaces | grep OOM
journalctl -k | grep -i "oom\|killed process"

For persistent memory growth despite restarts, the issue may be an external resource (Redis, database connection pool) that isn’t cleaned up on restart. Check for connection leaks in your application.

Set terminationMessagePolicy: FallbackToLogsOnError to capture logs from OOMKilled containers:

containers:
- name: api
  terminationMessagePolicy: FallbackToLogsOnError
kubectl describe pod <pod-name> | grep -A 5 "Last State:"
# Termination Message section may contain the last log lines before OOMKill

For related Kubernetes issues, see Fix: Kubernetes CrashLoopBackOff and Fix: Kubernetes Pod Pending.
