Fix: Kubernetes Pod OOMKilled — Out of Memory Error
Quick Answer
How to fix Kubernetes OOMKilled errors — understanding memory limits, finding memory leaks, setting correct resource requests and limits, and using Vertical Pod Autoscaler.
The Error
A Kubernetes pod terminates with OOMKilled (exit code 137):
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
api-deployment-7d9fb 0/1 OOMKilled 3 12m

Or in kubectl describe pod:
State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Fri, 20 Mar 2026 10:00:00 +0000
Finished: Fri, 20 Mar 2026 10:02:34 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137

Or the pod keeps restarting with increasing RESTARTS:
NAME READY STATUS RESTARTS AGE
worker-pod-xk2mp 0/1 CrashLoopBackOff 8 40m

Why This Happens
Kubernetes enforces memory limits at the container level using Linux cgroups. When a container’s memory usage exceeds its configured limits.memory, the kernel immediately terminates the process with SIGKILL (exit code 137):
- Memory limit too low — the application legitimately needs more memory than the limit allows. Common after traffic spikes, processing large payloads, or loading bigger datasets than tested locally.
- Memory leak — the application allocates memory and never frees it. Memory usage grows slowly until it hits the limit — then OOMKill happens. The pod restarts, memory grows again, repeating.
- JVM heap size exceeds container limit — Java apps with -Xmx set higher than the container’s memory limit. The JVM reserves heap memory up front, so the container is killed shortly after start.
- No memory limit set — if limits.memory isn’t set, there’s no ceiling. The pod can consume all node memory, causing the node itself to OOM and kill arbitrary processes.
- Sidecar containers counted together — containers in a pod are scheduled together, so a sidecar (log agent, proxy) consuming memory adds to the pod’s total footprint on the node. A leaking sidecar can be the container that gets OOMKilled, not the app.
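Exit code 137 is not arbitrary: a shell encodes death-by-signal as 128 plus the signal number, and SIGKILL is signal 9. You can verify the encoding locally, without a cluster (a local illustration only; in the cluster the SIGKILL comes from the kernel OOM killer, not from kill):

```python
import subprocess

# A shell reports a child killed by signal N as exit status 128 + N.
# SIGKILL is signal 9, so an OOM-killed process surfaces as 128 + 9 = 137.
proc = subprocess.run(
    ["bash", "-c", "bash -c 'kill -9 $$'; echo $?"],
    capture_output=True, text=True,
)
# The inner bash SIGKILLs itself; the outer bash echoes its exit status.
print(proc.stdout.strip())  # 137
```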
Fix 1: Diagnose Which Container Is Being Killed
First, identify the container and confirm it’s a memory issue:
# Check pod status and restart count
kubectl get pods -n <namespace>
# Describe the pod for detailed OOMKill information
kubectl describe pod <pod-name> -n <namespace>
# Check previous container logs (before the crash)
kubectl logs <pod-name> -n <namespace> --previous
# Check events for the pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'

In kubectl describe pod, look for:
Containers:
api:
State: Running
Last State: Terminated
Reason: OOMKilled # ← Confirmed OOM
Exit Code: 137
Limits:
memory: 256Mi # ← Current limit
Requests:
memory: 128Mi

Check node-level memory pressure:
# Check if the node is under memory pressure
kubectl describe node <node-name> | grep -A 5 "Conditions:"
# MemoryPressure: True means the node itself is running low
# Check actual memory usage on the node
kubectl top node <node-name>
# Check pod memory usage
kubectl top pods -n <namespace>
kubectl top pods -n <namespace> --containers # Per-container breakdown

Fix 2: Increase Memory Limits
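Before raising a limit, it helps to know how close each container actually runs to its current one. A small sketch that filters kubectl top pods --containers output (the pod names, limits, and 80% threshold here are illustrative assumptions, not kubectl features):

```python
# Hypothetical parser for `kubectl top pods --containers` output; flags
# containers using more than `threshold` of their configured limit (MiB).
SAMPLE_TOP = """\
POD                    NAME    CPU(cores)   MEMORY(bytes)
api-deployment-7d9fb   api     120m         230Mi
api-deployment-7d9fb   proxy   30m          40Mi
"""

def near_limit(top_output: str, limits_mib: dict, threshold: float = 0.8):
    hot = []
    for line in top_output.splitlines()[1:]:   # skip the header row
        _pod, name, _cpu, mem = line.split()
        used_mib = int(mem.rstrip("Mi"))       # "230Mi" -> 230
        limit = limits_mib.get(name)
        if limit and used_mib / limit >= threshold:
            hot.append(name)
    return hot

# api runs at 230Mi against a 256Mi limit (~90%) — a likely OOMKill candidate.
print(near_limit(SAMPLE_TOP, {"api": 256, "proxy": 128}))  # ['api']
```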
If the application legitimately needs more memory than currently allocated, increase the limit:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
spec:
template:
spec:
containers:
- name: api
image: myapp:latest
resources:
requests:
memory: "256Mi" # Minimum guaranteed memory
cpu: "250m"
limits:
memory: "512Mi" # Maximum — OOMKill fires if exceeded
cpu: "500m"

kubectl apply -f deployment.yaml

Or patch the deployment directly:
kubectl patch deployment api -n <namespace> --patch '
{
"spec": {
"template": {
"spec": {
"containers": [{
"name": "api",
"resources": {
"limits": {"memory": "512Mi"},
"requests": {"memory": "256Mi"}
}
}]
}
}
}
}'

Memory unit reference:
| Value | Meaning |
|---|---|
| 128Mi | 128 mebibytes (~134 MB) — use Mi for binary units |
| 512Mi | 512 mebibytes (~537 MB) |
| 1Gi | 1 gibibyte (~1.07 GB) |
| 256M | 256 megabytes (decimal) — avoid; prefer Mi |
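The suffix arithmetic is easy to get wrong by roughly 5% (the gap between 1000-based and 1024-based units). A small converter makes the table concrete; the function name is my own, not a kubectl feature:

```python
# Convert Kubernetes memory quantities to bytes. Binary suffixes (Ki, Mi,
# Gi) use powers of 1024; decimal suffixes (K, M, G) use powers of 1000.
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
          "K": 1000, "M": 1000**2, "G": 1000**3}

def parse_quantity(q: str) -> int:
    # Check two-letter suffixes before one-letter ones ("Mi" before "M").
    for suffix, mult in sorted(_UNITS.items(), key=lambda kv: -len(kv[0])):
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * mult
    return int(q)  # bare number = bytes

print(parse_quantity("128Mi"))  # 134217728 (~134 MB)
print(parse_quantity("256M"))   # 256000000 (256 MB decimal)
```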
Common Mistake: Setting limits.memory equal to requests.memory. The request guarantees the minimum; the limit is the maximum. A tight limit with no headroom causes OOMKill on any usage spike. A reasonable ratio is 2:1 (limit = 2× request) for stable apps, higher for bursty workloads.
Fix 3: Fix Java/JVM Memory Configuration
Java applications are a frequent cause of OOMKilled because the JVM doesn’t respect container memory limits by default in older versions:
# Wrong — JVM reads host total RAM, not container limit
# On a 16GB node with a 512Mi container limit, JVM sets heap to ~4GB
java -jar app.jar
# OOMKill fires when the JVM tries to use the 4GB heap inside a 512Mi container

Fix for Java 11+ — use container-aware JVM flags:
# Dockerfile
FROM eclipse-temurin:21-jre
# UseContainerSupport is on by default in Java 10+
# MaxRAMPercentage controls heap as a fraction of container memory
ENTRYPOINT ["java", \
"-XX:+UseContainerSupport", \
"-XX:MaxRAMPercentage=75.0", \
"-jar", "/app.jar"]

Or set explicit heap limits that fit within the container:
# If container limit is 512Mi, set heap to at most 400Mi
# (leaving room for JVM overhead, metaspace, etc.)
containers:
- name: api
env:
- name: JAVA_OPTS
value: "-Xms128m -Xmx400m -XX:+UseContainerSupport"
resources:
limits:
memory: "512Mi"

For Node.js — set --max-old-space-size:
containers:
- name: node-api
command: ["node", "--max-old-space-size=400", "dist/index.js"]
resources:
limits:
memory: "512Mi"

For Python — configure gunicorn worker memory:
containers:
- name: python-api
command: ["gunicorn", "--workers=2", "--worker-class=uvicorn.workers.UvicornWorker",
"--max-requests=1000", "--max-requests-jitter=50",
"app:app"]

--max-requests restarts workers after N requests, preventing slow memory leaks from accumulating.
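The recycling idea behind --max-requests can be sketched in a few lines. This is an illustration of the mechanism, not gunicorn’s actual implementation:

```python
MAX_REQUESTS = 1000

def worker_loop(handle, requests, max_requests=MAX_REQUESTS):
    """Serve requests until the recycle threshold, then signal an exit.

    In gunicorn, the worker process exits at this point and the master
    forks a replacement, discarding any memory the old worker leaked.
    """
    served = 0
    for req in requests:
        handle(req)
        served += 1
        if served >= max_requests:
            return "recycle"   # worker exits; master re-forks a fresh one
    return "drained"

print(worker_loop(lambda r: None, range(1500)))  # recycle
```

Because leaked memory dies with the worker process, even a slow leak never accumulates past roughly max_requests worth of traffic.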
Fix 4: Find and Fix Memory Leaks
If memory grows continuously until OOMKill, it’s likely a leak. Profile memory usage before it crashes:
Enable memory profiling for Node.js:
// Add to your Node.js app
const v8 = require('v8');
const fs = require('fs');
// Trigger heap snapshot via HTTP endpoint
app.get('/debug/heap-snapshot', (req, res) => {
const filename = `/tmp/heapdump-${Date.now()}.heapsnapshot`;
const snapshotStream = v8.writeHeapSnapshot(filename);
res.json({ snapshot: filename });
});
// Monitor heap size
setInterval(() => {
const { heapUsed, heapTotal } = process.memoryUsage();
console.log(`Heap: ${Math.round(heapUsed / 1024 / 1024)}MB / ${Math.round(heapTotal / 1024 / 1024)}MB`);
}, 30000);

For Python — use tracemalloc:
import tracemalloc
import linecache
def display_top(snapshot, key_type='lineno', limit=10):
snapshot = snapshot.filter_traces((
tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
tracemalloc.Filter(False, "<unknown>"),
))
top_stats = snapshot.statistics(key_type)
for index, stat in enumerate(top_stats[:limit], 1):
frame = stat.traceback[0]
print(f"#{index}: {frame.filename}:{frame.lineno}: {stat.size / 1024:.1f} KiB")
tracemalloc.start()
# ... your code ...
snapshot = tracemalloc.take_snapshot()
display_top(snapshot)

Use kubectl exec to check memory inside a running container:
# Get shell in the container
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
# Check process memory
cat /proc/meminfo
cat /sys/fs/cgroup/memory/memory.usage_in_bytes # Current usage (cgroup v1)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes # Container limit (cgroup v1)
# On cgroup v2 nodes the equivalent files are:
cat /sys/fs/cgroup/memory.current # Current usage (cgroup v2)
cat /sys/fs/cgroup/memory.max # Container limit (cgroup v2)

Common leak patterns:
// Node.js — event listener leak
// WRONG — listener added on every request, never removed
app.get('/data', (req, res) => {
emitter.on('data', handleData); // Leak: listener accumulates
emitter.emit('data', someData);
res.send('ok');
});
// CORRECT — use once() or remove the listener
app.get('/data', (req, res) => {
emitter.once('data', handleData); // Fires once, auto-removed
emitter.emit('data', someData);
res.send('ok');
});

# Python — cache without eviction
# WRONG — unbounded cache grows forever
cache = {}
def get_data(key):
if key not in cache:
cache[key] = expensive_fetch(key)
return cache[key]
# CORRECT — use LRU cache with size limit
from functools import lru_cache
@lru_cache(maxsize=1000)
def get_data(key):
return expensive_fetch(key)

Fix 5: Set Memory Requests and Limits Correctly
Kubernetes scheduling depends on requests, but OOMKill depends on limits. Both must be set correctly:
resources:
requests:
memory: "128Mi" # Scheduler uses this to find a node with enough memory
limits:
memory: "256Mi" # Kernel kills the container if it exceeds this

Rules for setting values:
- Requests = steady-state memory usage at normal load (measure with kubectl top pods)
- Limits = peak memory usage under high load, plus a safety buffer (20–50%)
- Never set limits lower than requests (Kubernetes rejects this)
- Avoid setting limits equal to requests — any spike causes OOMKill
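Those rules can be turned into a rough sizing heuristic (my own assumption, not an official formula): request ≈ median observed usage, limit ≈ peak plus a 30% buffer:

```python
import math

def suggest_resources(samples_mib):
    """Suggest request/limit from memory samples (MiB) collected over
    time with `kubectl top pods`. Heuristic only: request = median
    sample, limit = peak * 1.3, both rounded up to whole Mi."""
    samples = sorted(samples_mib)
    request = math.ceil(samples[len(samples) // 2])  # median ≈ steady state
    limit = math.ceil(samples[-1] * 1.3)             # peak + 30% buffer
    return f"{request}Mi", f"{limit}Mi"

# e.g. five samples spanning normal and peak load:
print(suggest_resources([100, 110, 120, 130, 200]))  # ('120Mi', '260Mi')
```

Collect samples across at least one full traffic cycle (a day or a week) before trusting the peak; a quiet-hour peak badly underestimates the real one.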
Use kubectl top to find actual usage:
# Watch memory usage over time
watch kubectl top pods -n <namespace>
# For a specific pod
kubectl top pod <pod-name> -n <namespace> --containers

LimitRange — set default limits for a namespace:
# limitrange.yaml — applies when containers don't specify resources
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: production
spec:
limits:
- default:
memory: "256Mi"
cpu: "500m"
defaultRequest:
memory: "128Mi"
cpu: "250m"
type: Container

Fix 6: Use Vertical Pod Autoscaler (VPA)
If you’re unsure of the right memory values, VPA can recommend or automatically set them based on observed usage:
# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api
updatePolicy:
updateMode: "Off" # "Off" = recommend only, don't auto-update
# updateMode: "Auto" = automatically update pod resources
resourcePolicy:
containerPolicies:
- containerName: api
minAllowed:
memory: "64Mi"
maxAllowed:
memory: "2Gi"

# Install VPA (if not already installed)
kubectl apply -f https://github.com/kubernetes/autoscaler/releases/latest/download/vertical-pod-autoscaler.yaml
# Apply the VPA
kubectl apply -f vpa.yaml
# Check VPA recommendations
kubectl describe vpa api-vpa -n <namespace>
# Look for: Recommendation > Container Recommendations > Target

Pro Tip: Run VPA in "Off" mode first for a week to collect usage data and get recommendations. Only switch to "Auto" mode once you’ve validated the recommendations match your expectations. Auto mode restarts pods to apply new limits, which can cause brief downtime.
Still Not Working?
Check if it’s the init container being OOMKilled — init containers run before the main container and can also be killed:
kubectl describe pod <pod-name> | grep -A 10 "Init Containers:"

Check node-level OOM events, not just pod events:
# SSH to the node or check system logs
kubectl get events --all-namespaces | grep OOM
journalctl -k | grep -i "oom\|killed process"

For persistent memory growth despite restarts, the issue may be an external resource (Redis, database connection pool) that isn’t cleaned up on restart. Check for connection leaks in your application.
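To check many pods programmatically, you can filter the JSON from kubectl get pods -o json for OOMKilled termination states. A sketch of the status fields involved (the sample object below mirrors the shape of real pod status; the function name is my own):

```python
def oomkilled_containers(pod: dict) -> list:
    """Return names of containers whose current or last state is
    OOMKilled, including init containers."""
    names = []
    status = pod.get("status", {})
    for cs in status.get("containerStatuses", []) + \
              status.get("initContainerStatuses", []):
        for state in (cs.get("lastState", {}), cs.get("state", {})):
            term = state.get("terminated")
            if term and term.get("reason") == "OOMKilled":
                names.append(cs["name"])
                break   # one hit per container is enough
    return names

pod = {"status": {"containerStatuses": [
    {"name": "api",
     "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
    {"name": "proxy", "lastState": {}},
]}}
print(oomkilled_containers(pod))  # ['api']
```

Feed it each item from the items list of kubectl get pods -A -o json to sweep a whole cluster.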
Set terminationMessagePolicy: FallbackToLogsOnError to capture logs from OOMKilled containers:
containers:
- name: api
terminationMessagePolicy: FallbackToLogsOnError

kubectl describe pod <pod-name> | grep -A 5 "Last State:"
# Termination Message section may contain the last log lines before OOMKill

For related Kubernetes issues, see Fix: Kubernetes CrashLoopBackOff and Fix: Kubernetes Pod Pending.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.