Fix: Kubernetes Pod OOMKilled (Exit Code 137)
Quick Answer
How to fix the Kubernetes OOMKilled pod status caused by exceeded memory limits, container memory leaks, JVM heap misconfiguration, and incorrect resource requests/limits settings.
The Error
You check your pod status and see:
$ kubectl get pods
NAME                     READY   STATUS      RESTARTS   AGE
my-app-7b9f4d8c5-x2k9l   0/1     OOMKilled   3          5m
Or in the pod description:
$ kubectl describe pod my-app-7b9f4d8c5-x2k9l
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Or the pod is in CrashLoopBackOff and the previous termination reason is OOMKilled:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Kubernetes killed your container because it exceeded its memory limit. The Linux kernel's OOM (Out Of Memory) killer terminated the process, and Kubernetes reports it as OOMKilled with exit code 137 (128 + signal 9 = SIGKILL).
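The following POSIX shell sketch shows where 137 comes from. It is not specific to Kubernetes. It kills a child shell with SIGKILL, then decodes the resulting status using the 128 + signal convention. The function names are illustrative.

```shell
#!/bin/sh
# Start a child shell that sends SIGKILL (signal 9) to itself and print
# the exit status the parent shell observes.
demo_sigkill_exit() {
  sh -c 'kill -9 $$'
  echo "$?"
}

# Decode an exit status using the shell convention: values above 128
# mean "terminated by signal (status - 128)".
decode_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "exited with status $code"
  fi
}

demo_sigkill_exit 2>/dev/null   # 137
decode_exit_code 137            # killed by signal 9
decode_exit_code 1              # exited with status 1
```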
Why This Happens
Kubernetes enforces the memory limits set in the pod specification. When a container tries to use more memory than its limit, the kernel first tries to reclaim memory, for example by dropping clean page cache. If it cannot reclaim enough, it kills the process. There is no warning and no graceful shutdown: the process is killed with SIGKILL. Exit code 137 is the standard shell convention for a process terminated by signal 9 (128 + 9). Kubernetes also records Reason: OOMKilled in the container's termination state (lastState after a restart). In automation, check that reason rather than the exit code alone.
What actually happens under the hood is a kernel-level decision, not a Kubernetes-level one. Each container’s process tree runs inside a Linux memory cgroup with a hard limit set to the value of resources.limits.memory. When any allocation inside the cgroup would push usage past that ceiling, the kernel’s OOM killer selects a victim process within that cgroup (typically the largest one, scored by oom_score) and sends SIGKILL. Because the kill is synchronous with the failing allocation, your application does not get to flush logs, complete in-flight requests, or run shutdown handlers. The kubelet only learns about the kill afterward when it reaps the exit status, which is why dashboards sometimes show the pod as healthy for a few seconds after it has already died.
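You can look at the same numbers the kernel compares. This minimal sketch makes two assumptions. First, the node runs cgroups v2, which is covered in the version history below. Second, the container sees its own cgroup mounted at /sys/fs/cgroup, which is the usual setup with a private cgroup namespace. memory.current holds current usage in bytes. memory.max holds the hard limit in bytes, or the literal string max when no limit is set. The directory argument exists only so the function can be pointed at test fixtures.

```shell
#!/bin/sh
# Report cgroup v2 memory usage against the hard limit the OOM killer enforces.
# Usage: cgroup_headroom [cgroup_dir]   (defaults to /sys/fs/cgroup)
cgroup_headroom() {
  dir=${1:-/sys/fs/cgroup}
  current=$(cat "$dir/memory.current") || return 1
  max=$(cat "$dir/memory.max") || return 1
  if [ "$max" = "max" ]; then
    # No hard limit on this cgroup: only node-level memory bounds it.
    echo "current=$current max=unlimited"
  else
    echo "current=$current max=$max headroom=$((max - current))"
  fi
}

# Inside the container (for example via kubectl exec):
# cgroup_headroom
```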
Common causes:
- Memory limit is too low. Your application genuinely needs more memory than the limit allows.
- Memory leak. The application gradually consumes more memory until it hits the limit.
- JVM heap misconfiguration. The JVM’s max heap exceeds the container’s memory limit, or non-heap memory (metaspace, thread stacks, native memory) is not accounted for.
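- No memory limit set. Without a limit, the container can use node memory until the node runs out and the kernel's node-level OOM killer steps in. That is worse, because the victim may be a process in a different pod.
- Sidecar and init containers. Sidecars (like Istio's Envoy proxy) and init containers have their own memory limits, so the container being OOMKilled may not be your application at all.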
- Temporary spikes. The application handles a burst of traffic that causes temporary memory spikes.
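- Page cache and tmpfs pressure. Heavy file I/O and writes to emptyDir: { medium: Memory } count against the container's memory cgroup on both cgroups v1 and v2. Clean page cache can usually be reclaimed before the OOM killer runs. Tmpfs contents and dirty pages cannot be dropped as easily, so they can push usage over the limit even when your application's RSS looks fine.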
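To check whether file-backed memory rather than your heap is filling the cgroup, you can split cgroups v2 memory.stat into three values. The sketch below prints anon (roughly, process heap and stacks), file (page cache), and shmem (swap-backed file memory such as tmpfs, including emptyDir with medium: Memory). The kernel's cgroup v2 documentation counts shmem within file, so the three values do not simply add up. The default path assumes the container's own cgroup is at /sys/fs/cgroup.

```shell
#!/bin/sh
# Print anon, file, and shmem from a cgroup v2 memory.stat file, in MiB.
# Usage: memory_breakdown_mi [memory.stat path]
memory_breakdown_mi() {
  stat_file=${1:-/sys/fs/cgroup/memory.stat}
  awk '$1 == "anon" || $1 == "file" || $1 == "shmem" {
    printf "%s=%dMi\n", $1, $2 / 1048576
  }' "$stat_file"
}
```

A large shmem value next to a modest anon value points at tmpfs rather than a leak in your code.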
Version History That Changes the Failure Mode
The OOMKilled symptom looks the same across Kubernetes versions, but the underlying behavior, the tools you have, and even the kernel accounting changed materially over time. Knowing which version stack you are on prevents wasted debugging:
- Kubernetes 1.18 (Mar 2020): kubectl debug (alpha). Before this, you could not attach a debugging shell to a crashed container without rebuilding the image. The kubectl debug command (with ephemeral debug containers) reached beta in 1.23 and graduated to stable in 1.25 (Aug 2022). For OOMKilled investigation, this is the cleanest way to attach htop, jcmd, or memory profilers to a running pod without modifying the workload.
- Kubernetes 1.22 (Aug 2021): cgroups v2 support (alpha). Before cgroups v2, swap accounting per container was effectively impossible, and PSI (Pressure Stall Information) was not exposed in a useful form.
- Kubernetes 1.25 (Aug 2022): cgroups v2 GA. This is the big inflection point. On nodes running cgroups v2 (Ubuntu 22.04+, RHEL 9+, recent Bottlerocket builds), memory accounting is unified (memory.current, memory.max) and PSI metrics are available at /sys/fs/cgroup/<cgroup>/memory.pressure. The MemoryQoS alpha feature (1.22+) only works on cgroups v2.
- Kubernetes 1.22+ with the NodeSwap feature gate. Swap on Linux nodes became beta in 1.28 (Aug 2023). Before this, the kubelet refused to start if swap was enabled. Now you can opt in to swap usage for Burstable QoS pods. That can buy headroom for short spikes, but it also masks real leaks. On cgroups v2, memory.swap.max limits swap alone, while memory.memsw.limit_in_bytes on v1 limits memory plus swap combined. Verify which one your tooling reports.
- JVM container awareness: JDK 8u131 (Apr 2017) and JDK 10 (Mar 2018). Before 8u131, the JVM read host memory from /proc/meminfo and would happily set a multi-GB heap inside a 512Mi container. 8u131 added the experimental -XX:+UseCGroupMemoryLimitForHeap, but it was clunky and required -XX:+UnlockExperimentalVMOptions. JDK 10 made -XX:+UseContainerSupport the default and added -XX:MaxRAMPercentage. Both were backported to 8u191, so anything older than 8u191 in a container should be considered broken for memory sizing.
- Cgroups v1 vs v2 OOM event surfacing. On cgroups v1, in-cgroup OOM kills sometimes failed to propagate a clean Reason: OOMKilled, and you only saw Exit Code: 137 with Reason: Error. On cgroups v2 plus Kubernetes 1.25+, the kubelet reliably reports OOMKilled in pod events. If your describe output shows exit 137 but no OOMKilled reason, suspect an old cgroups v1 node.
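The last point suggests a simple rule for alerting scripts: trust Reason: OOMKilled when it is present, and treat a bare 137 as ambiguous. An old cgroups v1 node is only one explanation. A container that ignores SIGTERM and is force-killed at the end of its termination grace period (for example, after a failed liveness probe) also exits with 137. The function below is a hypothetical helper that illustrates the rule.

```shell
#!/bin/sh
# Classify a container termination from its reason and exit code.
# Usage: classify_termination <reason> <exit_code>
classify_termination() {
  reason=$1
  code=$2
  if [ "$reason" = "OOMKilled" ]; then
    echo "oom-killed"
  elif [ "$code" = "137" ]; then
    # SIGKILL without an OOM reason: old cgroups v1 node, a grace-period
    # kill, or an external kill. Check node events and probe settings.
    echo "sigkill-unconfirmed"
  else
    echo "other"
  fi
}

classify_termination OOMKilled 137   # oom-killed
classify_termination Error 137       # sigkill-unconfirmed
```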
Fix 1: Increase the Memory Limit
The simplest fix. If your application legitimately needs more memory, increase the limit:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
Key concepts:
- requests: the amount of memory Kubernetes reserves for the container on a node. The scheduler uses it to decide where the pod fits.
- limits: the maximum memory the container can use. Exceeding it triggers an OOMKill.
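The memory values above are Kubernetes quantities, and the suffix matters. Mi and Gi are binary (powers of 1024), while M and G are decimal (powers of 1000), so 512M is about 5% smaller than 512Mi. The minimal conversion sketch below handles only integer values with the common suffixes. It does not support fractional or exponent forms such as 1.5Gi or 1e9.

```shell
#!/bin/sh
# Convert an integer Kubernetes memory quantity to bytes.
# Usage: k8s_mem_to_bytes 512Mi
k8s_mem_to_bytes() {
  q=$1
  num=${q%%[!0-9]*}        # leading digits
  suffix=${q#"$num"}       # everything after the digits
  [ -n "$num" ] || { echo "unsupported quantity: $q" >&2; return 1; }
  case $suffix in
    '') mult=1 ;;
    Ki) mult=1024 ;;
    Mi) mult=$((1024 * 1024)) ;;
    Gi) mult=$((1024 * 1024 * 1024)) ;;
    k)  mult=1000 ;;
    M)  mult=1000000 ;;
    G)  mult=1000000000 ;;
    *)  echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
  echo $((num * mult))
}

k8s_mem_to_bytes 512Mi   # 536870912
k8s_mem_to_bytes 512M    # 512000000
```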
Set requests to what the application typically uses and limits to handle peak usage:
resources:
  requests:
    memory: "512Mi"  # Normal usage
  limits:
    memory: "1Gi"    # Peak usage ceiling
Pro Tip: Do not set requests equal to limits unless you want the Guaranteed QoS (quality of service) class. A pod is Guaranteed when every container in it has equal requests and limits for both CPU and memory. Guaranteed pods are the last to be evicted under node pressure, but they also get no burst capacity. For most workloads, set limits to 1.5–2x the requests value.
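The 1.5–2x rule of thumb is simple arithmetic. This tiny sketch, with a hypothetical function name, turns a request given in Mi into a suggested limit range. Shell arithmetic is integer-only, so 1.5x is computed as request * 3 / 2 and rounded down.

```shell
#!/bin/sh
# Suggest a memory limit range of 1.5x to 2x a request given in Mi.
suggest_limit_range_mi() {
  request_mi=$1
  echo "$((request_mi * 3 / 2))Mi-$((request_mi * 2))Mi"
}

suggest_limit_range_mi 512   # 768Mi-1024Mi (1024Mi is 1Gi)
```

For a 512Mi request this gives the 768Mi–1Gi range used in the sizing example in Fix 2.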
Fix 2: Monitor Actual Memory Usage
Before changing limits, understand how much memory your application actually uses:
Real-time memory usage:
kubectl top pod my-app-7b9f4d8c5-x2k9l
Per-container usage (a point-in-time reading from metrics-server, not a history):
kubectl top pod --containers
Check the OOM event details:
kubectl describe pod my-app-7b9f4d8c5-x2k9l | grep -A 5 "Last State"
Check node-level memory pressure:
kubectl describe node <node-name> | grep -A 5 "Conditions"
kubectl top node
If kubectl top does not work, install the metrics-server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Use this data to set appropriate limits. If the application uses 400Mi normally and peaks at 600Mi, set requests to 512Mi and limits to 768Mi–1Gi.
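kubectl top shows only a single reading, so to find a peak you need to sample repeatedly. The sketch below aggregates kubectl top pod --containers --no-headers output (columns: pod, container, CPU, memory) into a per-container peak in Mi. The sampling loop in the comment is one assumed way to collect the data. Readings come at metrics-server resolution, so spikes shorter than the sampling interval will not show up.

```shell
#!/bin/sh
# Read `kubectl top pod --containers --no-headers` lines from stdin and print
# the highest memory reading seen for each pod/container pair, in Mi.
peak_memory_mi() {
  awk '{
    mem = $4
    if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mem = mem * 1024 }
    else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); mem = mem + 0 }
    else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); mem = mem / 1024 }
    else                  { mem = mem + 0 }
    key = $1 "/" $2
    if (!(key in peak) || mem > peak[key]) peak[key] = mem
  }
  END { for (k in peak) printf "%s %dMi\n", k, peak[k] }' | sort
}

# Example: one sample every 15 seconds for 15 minutes.
# for i in $(seq 60); do
#   kubectl top pod my-app-7b9f4d8c5-x2k9l --containers --no-headers
#   sleep 15
# done | peak_memory_mi
```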
Fix 3: Fix JVM Memory in Containers
Java applications are the most common source of OOMKilled in Kubernetes. The JVM allocates memory for heap, metaspace, thread stacks, code cache, and native memory — all of which count toward the container limit.
Use container-aware JVM flags:
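containers:
  - name: my-app
    image: my-java-app:latest
    resources:
      limits:
        memory: "1Gi"
    env:
      - name: JAVA_OPTS
        value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"
-XX:MaxRAMPercentage=75.0 sets the max heap to 75% of the container's memory limit. The remaining 25% is for metaspace, thread stacks, and native memory. The JVM does not read JAVA_OPTS itself; the variable only takes effect if your entrypoint script passes it to java. The JVM does read JAVA_TOOL_OPTIONS directly, so use that variable if your image runs java without a wrapper script.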
Do NOT use -Xmx with a value close to the memory limit:
# WRONG: leaves no room for non-heap memory
env:
  - name: JAVA_OPTS
    value: "-Xmx1g"  # Container limit is also 1Gi, so this gets OOMKilled
The JVM uses more than just the heap. With -Xmx1g in a 1Gi container, total memory exceeds the limit once the heap fills up, and the container gets killed.
Rule of thumb: Set -Xmx to about 70–75% of the container memory limit. Or better, use -XX:MaxRAMPercentage which calculates automatically.
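To make the percentage concrete, the sketch below does the arithmetic for a given limit (in Mi) and MaxRAMPercentage value (integers only here). The JVM derives its actual heap size from the byte value and may round it, so treat the output as an estimate. To confirm the effective value inside the container, run java -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version and look for MaxHeapSize.

```shell
#!/bin/sh
# Estimate heap vs. non-heap budget for a JVM sized with -XX:MaxRAMPercentage.
# Usage: jvm_heap_budget <limit_mi> <percent>
jvm_heap_budget() {
  limit_mi=$1
  pct=$2
  heap_mi=$((limit_mi * pct / 100))
  echo "heap=${heap_mi}Mi non_heap=$((limit_mi - heap_mi))Mi"
}

jvm_heap_budget 1024 75   # heap=768Mi non_heap=256Mi
```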
For Java OutOfMemoryError within the JVM itself (not container-level), see Fix: Java OutOfMemoryError.
Fix 4: Fix Node.js Memory in Containers
Node.js has a default heap limit (around 1.5–2GB depending on the version). In a container with a lower memory limit, Node.js might try to use more memory than allowed.
Set the Node.js heap limit explicitly:
containers:
  - name: my-app
    image: my-node-app:latest
    resources:
      limits:
        memory: "512Mi"
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"
Set --max-old-space-size to about 75% of the container memory limit (in MB). The remaining 25% covers native memory, buffers, and other overhead. Newer Node.js versions size the default heap from available memory. Whether they see the container limit depends on the Node.js version and the cgroup layout, and older versions read host memory and may over-allocate. Setting the flag explicitly avoids relying on that detection.
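The sketch below applies the same 75% rule to build the NODE_OPTIONS value from a limit in Mi (integer division rounds down). Inside the running container, node -e "console.log(require('v8').getHeapStatistics().heap_size_limit / 1048576)" reports the heap limit V8 actually applied. Expect that number to be somewhat higher than the flag value, since heap_size_limit covers more than the old space.

```shell
#!/bin/sh
# Build a NODE_OPTIONS heap flag at about 75% of a container memory limit in Mi.
node_heap_flag() {
  limit_mi=$1
  echo "--max-old-space-size=$((limit_mi * 75 / 100))"
}

node_heap_flag 512    # --max-old-space-size=384
node_heap_flag 1024   # --max-old-space-size=768
```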
Fix 5: Debug Memory Leaks
If the container is OOMKilled after running for hours or days (not immediately on startup), you likely have a memory leak.
Identify the pattern:
# Watch memory usage over time
watch kubectl top pod my-app-7b9f4d8c5-x2k9l --containers
If memory grows steadily without dropping, it is likely a leak.
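Eyeballing watch output works for a quick look. For a longer window, the sketch below reads one memory sample per line (plain numbers in Mi, with the suffix already stripped) and reports whether the series ever dropped. Garbage-collected apps normally show a sawtooth, so a series that only climbs is the suspicious pattern. The sampling pipeline in the comment is one assumed way to produce the input.

```shell
#!/bin/sh
# Read memory samples (numbers in Mi, one per line) and report whether they
# rose overall without ever dropping. Exit status: 0 = steady growth,
# 1 = no steady growth, 2 = fewer than two samples.
steady_growth() {
  awk '
    { v = $1 + 0 }
    NR == 1 { first = v }
    NR > 1 && v < prev { dropped = 1 }
    { prev = v }
    END {
      if (NR < 2) { print "not enough samples"; exit 2 }
      if (!dropped && prev > first) { print "steady growth"; exit 0 }
      print "no steady growth"; exit 1
    }'
}

# Example input, one sample per minute (stop with Ctrl-C):
# while sleep 60; do
#   kubectl top pod my-app-7b9f4d8c5-x2k9l --containers --no-headers |
#     awk '$2 == "my-app" { sub(/Mi$/, "", $4); print $4 }'
# done > samples.txt
# steady_growth < samples.txt
```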
For Node.js:
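# A "node -e" command would snapshot a new, empty process, not your app.
# Instead, start the app so it writes a heap snapshot on SIGUSR2 (Node.js 12+),
# e.g. (server.js stands in for your entrypoint):
#   node --heapsnapshot-signal=SIGUSR2 server.js
# Then signal the running process (assumed to be PID 1 in the container)
kubectl exec my-pod -- sh -c 'kill -USR2 1'
# kubectl cp does not expand wildcards, so look up the exact file name first
kubectl exec my-pod -- ls /app
kubectl cp my-pod:/app/<snapshot-file>.heapsnapshot ./heapdump.heapsnapshot
Open the heapsnapshot in Chrome DevTools (Memory tab) to identify leaked objects. The Node.js documentation notes that writing a snapshot needs roughly twice the current heap size. The snapshot itself can therefore trigger an OOMKill in a container that is already close to its limit.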
For Java:
# Trigger a heap dump (jcmd ships with the JDK, so JRE-only images may lack it)
kubectl exec my-pod -- jcmd 1 GC.heap_dump /tmp/heapdump.hprof
kubectl cp my-pod:/tmp/heapdump.hprof ./heapdump.hprof
Analyze with Eclipse MAT or VisualVM.
For Python:
Use tracemalloc or objgraph to track memory allocations.
Common leak sources:
- Unbounded caches or in-memory stores
- Event listeners that are never removed
- Database connection pools that grow without cleanup
- Large objects stored in session state
- Circular references preventing garbage collection
Fix 6: Handle Sidecar Container Memory
If your pod has sidecar containers (Istio envoy, log collectors, monitoring agents), their memory counts toward the pod’s total. But each container has its own resources section:
spec:
  containers:
    - name: my-app
      resources:
        limits:
          memory: "512Mi"
    - name: istio-proxy
      resources:
        limits:
          memory: "128Mi"
    - name: log-collector
      resources:
        limits:
          memory: "64Mi"
Check which container was OOMKilled:
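kubectl get pod my-app-pod -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'
The output lists each container with the reason for its last termination. kubectl describe pod shows the same information under each container's Last State. However, grepping describe output for OOMKilled with only a couple of lines of context usually cuts off the container name. Fix the limit for the container that shows OOMKilled, not the others.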
Common Mistake: Increasing the main container's memory limit when the sidecar is the one being OOMKilled. Always check kubectl describe pod to identify which container hit its limit.
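In scripts, you can extract just the offending container names instead of reading describe output. The sketch below assumes tab-separated name and reason lines, as produced by the jsonpath query in its comment. It checks lastState, which is populated after the container restarts. A container that is currently terminated reports its reason under state.terminated instead.

```shell
#!/bin/sh
# Print the names of containers whose last termination reason was OOMKilled.
# Input: lines of "<container-name><TAB><reason>" on stdin.
oomkilled_containers() {
  awk -F '\t' '$2 == "OOMKilled" { print $1 }'
}

# Live usage:
# kubectl get pod my-app-pod -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' \
#   | oomkilled_containers
```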
Fix 7: Set LimitRange and ResourceQuota
LimitRange sets default and maximum limits for containers in a namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-limits
  namespace: default
spec:
  limits:
    - default:
        memory: "512Mi"
      defaultRequest:
        memory: "256Mi"
      max:
        memory: "2Gi"
      min:
        memory: "64Mi"
      type: Container
This ensures that every container has reasonable memory limits even if the deployment spec does not specify them.
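The defaulting rules are easy to get wrong. Per the Kubernetes documentation, a container that sets only a limit gets a request equal to that limit (not defaultRequest). A container that sets only a request gets the namespace default limit. Below is a simplified model of how the LimitRange above would treat a single container's memory. Values are in Mi, and an empty string means "not set". It is an illustration only, not the API server's admission logic, and it ignores CPU and Pod-level limits.

```shell
#!/bin/sh
# Model the memory defaults and bounds of the LimitRange above for one container.
# Usage: apply_memory_limitrange <request_mi|""> <limit_mi|"">
apply_memory_limitrange() {
  req=$1
  lim=$2
  if [ -z "$req" ]; then
    if [ -n "$lim" ]; then
      req=$lim            # only a limit set: request defaults to the limit
    else
      req=256             # neither set: defaultRequest
    fi
  fi
  [ -n "$lim" ] || lim=512   # no limit set: default
  # Bounds: min 64Mi, max 2Gi (2048Mi), and request must not exceed limit.
  if [ "$req" -lt 64 ] || [ "$lim" -gt 2048 ] || [ "$req" -gt "$lim" ]; then
    echo "rejected: request=${req}Mi limit=${lim}Mi" >&2
    return 1
  fi
  echo "request=${req}Mi limit=${lim}Mi"
}
```

Note the request-only case: a request above 512Mi with no limit is rejected, because the defaulted limit would be lower than the request.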
ResourceQuota limits total memory for an entire namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota
  namespace: default
spec:
  hard:
    requests.memory: "8Gi"
    limits.memory: "16Gi"
Fix 8: Use Vertical Pod Autoscaler (VPA)
If you are unsure what the right memory values should be, use VPA to recommend them or set them automatically based on actual usage. VPA is an add-on, so its components and CRDs must be installed in the cluster first:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # "Off" = recommend only, "Auto" = apply automatically
With updateMode: "Off", check the recommendations:
kubectl describe vpa my-app-vpa
VPA analyzes historical usage and recommends requests. When it applies them, it scales limits proportionally by default, preserving the request-to-limit ratio from your original spec.
Still Not Working?
If the pod continues to be OOMKilled after increasing limits:
Check for node-level memory pressure. The node itself might be running out of memory, causing the kubelet to evict pods:
kubectl describe node <node-name> | grep -A 10 "Conditions"
Look for MemoryPressure: True. If the node is under pressure, scale up the cluster or add more nodes.
Check for init container memory. Init containers run before the main container and have their own memory limits. If an init container uses too much memory, the pod fails before the main container starts.
Check for ephemeral storage eviction. Kubernetes also evicts pods that exceed their ephemeral storage limits (ephemeral-storage in resources). The symptoms look similar to an OOMKill, but kubectl describe reports Evicted rather than OOMKilled.
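Check kernel overcommit settings. The node's vm.overcommit_memory kernel parameter changes how memory allocation failures show up. The kubelet normally sets it to 1 (always overcommit). With 2 (strict accounting), allocations beyond the commit limit fail with ENOMEM. Those failures surface as application-level out-of-memory errors or crashes rather than an OOMKill. With 0 (heuristic), the kernel may refuse unusually large allocations up front.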
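vm sysctls are node-wide, so you can read the current mode from any shell on the node, for example via kubectl debug node/<node-name>. The sketch below translates the value into its behavior. The path argument exists only so the function can be tested against a fixture file.

```shell
#!/bin/sh
# Explain the node's vm.overcommit_memory mode.
# Usage: describe_overcommit [path]   (defaults to /proc/sys/vm/overcommit_memory)
describe_overcommit() {
  mode=$(cat "${1:-/proc/sys/vm/overcommit_memory}") || return 1
  case $mode in
    0) echo "0: heuristic overcommit; obviously excessive allocations are refused" ;;
    1) echo "1: always overcommit; the value the kubelet normally sets" ;;
    2) echo "2: strict accounting; allocations past the commit limit fail with ENOMEM" ;;
    *) echo "unexpected value: $mode" >&2; return 1 ;;
  esac
}
```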
Use kubectl debug with an ephemeral container. When the pod restarts in a loop, you cannot exec into the dead container, but you can attach a debug container that shares the pod’s PID and network namespaces. This lets you inspect /proc/<pid>/status and smaps from a known-good image:
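kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-app
cat /proc/1/status | grep -E "VmRSS|VmHWM|VmSwap"
VmHWM is the process's peak resident set size since it started. VmPeak, by contrast, is peak virtual memory and is routinely far above the limit without any problem. Suppose VmHWM is close to limits.memory while kubectl top shows much lower numbers. Then your app is spiking faster than metrics-server samples (every 15 seconds with its standard manifest), and you are blind to the real peak.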
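The sketch below turns that check into a single number. It reads VmHWM (reported in kB) from a /proc/<pid>/status file and expresses it as a percentage of a limit given in Mi. It defaults to PID 1, which it assumes is the target container's main process as seen from a debug container started with --target. VmHWM covers only that one process's resident memory. On cgroups v2 with a recent kernel, the container's memory.peak file gives the peak for the whole cgroup, including page cache.

```shell
#!/bin/sh
# Compare a process's peak resident set size (VmHWM) with a memory limit.
# Usage: peak_rss_vs_limit <limit_mi> [status_file]   (defaults to /proc/1/status)
peak_rss_vs_limit() {
  limit_mi=$1
  status_file=${2:-/proc/1/status}
  hwm_kb=$(awk '$1 == "VmHWM:" { print $2 }' "$status_file")
  [ -n "$hwm_kb" ] || { echo "VmHWM not found in $status_file" >&2; return 1; }
  hwm_mi=$((hwm_kb / 1024))
  echo "VmHWM=${hwm_mi}Mi limit=${limit_mi}Mi ($((hwm_mi * 100 / limit_mi))% of limit)"
}

# From the debug shell: peak_rss_vs_limit 512
```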
Check cgroups version on the node. Tooling that worked on cgroups v1 can silently misreport on v2 and vice versa. Confirm with:
kubectl debug node/<node-name> -it --image=busybox -- stat -fc %T /sys/fs/cgroup/
A return of cgroup2fs means cgroups v2; tmpfs typically means v1. The JVM and Node.js read cgroup memory limits differently between the two layouts. Go does not read them at all: its GOMEMLIMIT soft limit (added in Go 1.19, Aug 2022) must be set explicitly. JVM support for cgroups v2 arrived in JDK 15 and was backported to 11.0.16 and 8u372. An older Java 8 JVM on a cgroups v2 node may not see the limit at all and will OOMKill almost immediately under load.
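From inside a running container (for example via kubectl exec), you can get the same answer by checking for a cgroup.controllers file. Every cgroups v2 directory has one, and a v1 mount does not. The minimal sketch below takes the mount point as a parameter so it can be pointed at a fixture. Hybrid setups that mount v2 under /sys/fs/cgroup/unified are reported as v1 here.

```shell
#!/bin/sh
# Report the cgroup layout visible at a cgroup mount point.
# Usage: cgroup_version [mount_point]   (defaults to /sys/fs/cgroup)
cgroup_version() {
  root=${1:-/sys/fs/cgroup}
  if [ -f "$root/cgroup.controllers" ]; then
    echo v2
  else
    echo v1
  fi
}
```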
Check for kernel page cache accounting. On cgroups v2, page cache pages allocated by your container count against memory.current. Clean cache is normally reclaimed before the OOM killer runs. Dirty pages under heavy write load and tmpfs contents cannot simply be dropped, though, so they can trigger an OOMKill even when RSS looks healthy. Heavy dd runs, log writes, or tmpfs-backed emptyDir volumes are common culprits. Move large temporary files to a PersistentVolume or to an emptyDir without the medium: Memory setting.
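Check for PodDisruptionBudget interplay. When OOMKills leave replicas unready, the PodDisruptionBudget counts them as unhealthy, and its allowed disruptions can drop to 0. That blocks the node drains and cluster-autoscaler scale-downs you may need while adding capacity. PDBs do not affect the OOMKills themselves or Deployment rolling updates. Confirm with kubectl get pdb (check the ALLOWED DISRUPTIONS column) and loosen the budget temporarily while you fix the leak.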
If the pod is crashing for reasons other than OOM, see Fix: Kubernetes CrashLoopBackOff. If the pod cannot start because the image cannot be pulled, see Fix: Kubernetes ImagePullBackOff.
For the Docker-level equivalent of this error (outside Kubernetes), see Fix: Docker exited with code 137 OOMKilled.
Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.