
Fix: Kubernetes Pod OOMKilled (Exit Code 137)

FixDevs


Quick Answer

How to fix the Kubernetes OOMKilled pod status caused by an exceeded memory limit, container memory leaks, JVM heap misconfiguration, or poorly sized resource requests and limits.

The Error

You check your pod status and see:

$ kubectl get pods
NAME                     READY   STATUS      RESTARTS   AGE
my-app-7b9f4d8c5-x2k9l  0/1     OOMKilled   3          5m

Or in the pod description:

$ kubectl describe pod my-app-7b9f4d8c5-x2k9l
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Or the pod is in CrashLoopBackOff and the previous termination reason is OOMKilled:

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Kubernetes killed your container because it exceeded its memory limit. The Linux kernel’s OOM (Out Of Memory) killer terminated the process, and Kubernetes reports it as OOMKilled with exit code 137 (128 + signal 9 = SIGKILL).

Why This Happens

Kubernetes enforces memory limits set in the pod specification. When a container tries to use more memory than its limit, the Linux kernel kills it immediately. There is no warning, no graceful shutdown — the process is killed with SIGKILL. The exit code 137 is the standard shell convention for a process terminated by signal 9 (128 + 9). Kubernetes also records Reason: OOMKilled in the container’s lastState, which is what you should grep for in automation rather than the exit code alone.
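The 128 + 9 arithmetic is easy to check in a shell. The helper below is a minimal sketch; the name decode_exit_code is made up for illustration and is not part of kubectl.

```shell
# decode_exit_code is a hypothetical helper, not a kubectl feature.
# Exit codes above 128 follow the shell convention 128 + signal number.
decode_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "signal $((code - 128))"
  else
    echo "exit status $code"
  fi
}

decode_exit_code 137   # prints: signal 9
```

Signal 9 is SIGKILL, which matches what the pod description reports. In automation, the Reason field is still the more reliable thing to match on.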

What actually happens under the hood is a kernel-level decision, not a Kubernetes-level one. Each container’s process tree runs inside a Linux memory cgroup with a hard limit set to the value of resources.limits.memory. When any allocation inside the cgroup would push usage past that ceiling, the kernel’s OOM killer selects a victim process within that cgroup (typically the largest one, scored by oom_score) and sends SIGKILL. Because the kill is synchronous with the failing allocation, your application does not get to flush logs, complete in-flight requests, or run shutdown handlers. The kubelet only learns about the kill afterward when it reaps the exit status, which is why dashboards sometimes show the pod as healthy for a few seconds after it has already died.
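To see the ceiling the kernel enforces, you can read the cgroup files from inside the container. The sketch below assumes the usual mount point /sys/fs/cgroup and the standard file names for cgroups v2 and v1; the function name cgroup_memory_usage is hypothetical.

```shell
# cgroup_memory_usage is a hypothetical helper. It takes the cgroup directory
# as an argument (normally /sys/fs/cgroup inside the container) and prints
# current usage and the hard limit in bytes.
cgroup_memory_usage() {
  dir=${1:-/sys/fs/cgroup}
  if [ -f "$dir/memory.max" ]; then
    # cgroups v2 layout
    echo "current=$(cat "$dir/memory.current") limit=$(cat "$dir/memory.max")"
  elif [ -f "$dir/memory/memory.limit_in_bytes" ]; then
    # cgroups v1 layout
    echo "current=$(cat "$dir/memory/memory.usage_in_bytes") limit=$(cat "$dir/memory/memory.limit_in_bytes")"
  else
    echo "no memory cgroup files found under $dir" >&2
    return 1
  fi
}
```

Run it inside the container with cgroup_memory_usage and no arguments. On cgroups v2, a limit of max means no memory limit is set for the container.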

Common causes:

  • Memory limit is too low. Your application genuinely needs more memory than the limit allows.
  • Memory leak. The application gradually consumes more memory until it hits the limit.
  • JVM heap misconfiguration. The JVM’s max heap exceeds the container’s memory limit, or non-heap memory (metaspace, thread stacks, native memory) is not accounted for.
  • No memory limit set. Without a limit, the container uses node memory until the node’s OOM killer steps in, which is worse because it can kill processes belonging to other pods on the same node.
  • Sidecar containers. Sidecars (like Istio’s envoy) and init containers have their own limits and consume memory that was not accounted for when the pod was sized.
  • Temporary spikes. The application handles a burst of traffic that causes temporary memory spikes.
  • Page cache and tmpfs pressure. Heavy file I/O or writes to emptyDir: { medium: Memory } count against the container’s memory cgroup on both cgroups v1 and v2, and can push usage over the limit even when your application’s RSS looks fine.

Version History That Changes the Failure Mode

The OOMKilled symptom looks the same across Kubernetes versions, but the underlying behavior, the tools you have, and even the kernel accounting changed materially over time. Knowing which version stack you are on prevents wasted debugging:

  • Kubernetes 1.18 (Mar 2020) — kubectl debug (alpha). Before this, you could not attach a debugging shell to a crashed container without rebuilding the image. The kubectl debug command (with ephemeral debug containers) reached beta in 1.23 and graduated to stable in 1.25 (Aug 2022). For OOMKilled investigation, this is the cleanest way to attach htop, jcmd, or memory profilers to a running pod without modifying the workload.
  • Kubernetes 1.22 (Aug 2021): cgroups v2 support (beta). Before cgroups v2, swap accounting per container was effectively impossible, and PSI (Pressure Stall Information) was not exposed in a useful form.
  • Kubernetes 1.25 (Aug 2022) — cgroups v2 GA. This is the big inflection point. On nodes running cgroups v2 (Ubuntu 22.04+, RHEL 9+, Bottlerocket recent builds), memory accounting is unified (memory.current, memory.max) and PSI metrics are available at /sys/fs/cgroup/<cgroup>/memory.pressure. The MemoryQoS alpha feature (1.22+) only works on cgroups v2.
  • Kubernetes 1.22+ with NodeSwap feature gate. Swap support on Linux nodes arrived as alpha in 1.22 and became beta in 1.28 (Aug 2023). Before 1.22, and still by default, the kubelet refuses to start if swap is enabled. Now you can opt in to swap usage for Burstable QoS pods, which can buy headroom for short spikes but also masks real leaks. The memory.swap.max semantic on cgroups v2 differs from memory.memsw.limit_in_bytes on v1, so verify which one your tooling reports.
  • JVM container awareness: JDK 8u131 (Apr 2017) and JDK 10 (Mar 2018). Before 8u131, the JVM sized its heap from host memory and would happily set a multi-GB heap inside a 512Mi container. 8u131 added experimental -XX:+UseCGroupMemoryLimitForHeap, but it was clunky and required -XX:+UnlockExperimentalVMOptions. JDK 10 made -XX:+UseContainerSupport the default and added -XX:MaxRAMPercentage; both were backported to JDK 8u191 (Oct 2018). Anything older than 8u191 in a container should be considered broken for memory sizing.
  • Cgroups v1 vs v2 OOM event surfacing. On cgroups v1, in-cgroup OOM kills sometimes failed to propagate a clean Reason: OOMKilled and you only saw Exit Code: 137 with Reason: Error. On cgroups v2 plus Kubernetes 1.25+, the kubelet reliably reports OOMKilled in pod events. If your describe output shows exit 137 but no OOMKilled reason, suspect an old cgroups v1 node.

Fix 1: Increase the Memory Limit

The simplest fix. If your application legitimately needs more memory, increase the limit:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Key concepts:

  • requests — the amount of memory Kubernetes guarantees to the container. Used for scheduling.
  • limits — the maximum memory the container can use. Exceeding this triggers OOMKill.

Set requests to what the application typically uses and limits to handle peak usage:

resources:
  requests:
    memory: "512Mi"    # Normal usage
  limits:
    memory: "1Gi"      # Peak usage ceiling

Pro Tip: Do not set requests equal to limits unless you want Guaranteed QoS (quality of service). A pod lands in the Guaranteed class only when every container has both CPU and memory requests equal to its limits. Such pods are the last to be evicted under node pressure, but they have no burst capacity. For most workloads, set limits to 1.5–2x the requests value.
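The QoS rules can be sketched as a small shell function. This is a simplified model for a single-container pod: qos_class is a hypothetical name, and it compares quantities as strings, so "1Gi" and "1024Mi" would be treated as different values.

```shell
# qos_class is a hypothetical helper for a single-container pod.
# Arguments: memory request, memory limit, cpu request, cpu limit.
# Pass an empty string for a value that is not set.
qos_class() {
  mem_req=$1 mem_lim=$2 cpu_req=$3 cpu_lim=$4
  if [ -z "$mem_req$mem_lim$cpu_req$cpu_lim" ]; then
    echo "BestEffort"
  elif [ -n "$mem_lim" ] && [ -n "$cpu_lim" ] &&
       [ "${mem_req:-$mem_lim}" = "$mem_lim" ] &&
       [ "${cpu_req:-$cpu_lim}" = "$cpu_lim" ]; then
    # An unset request defaults to the limit, hence the ${var:-default} form
    echo "Guaranteed"
  else
    echo "Burstable"
  fi
}

qos_class 256Mi 512Mi 250m 500m   # prints: Burstable
```

The class Kubernetes actually assigned is recorded in the pod's status.qosClass field, which is the authoritative answer for a real pod.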

Fix 2: Monitor Actual Memory Usage

Before changing limits, understand how much memory your application actually uses:

Real-time memory usage:

kubectl top pod my-app-7b9f4d8c5-x2k9l

Per-container memory usage (requires metrics-server; kubectl top shows current values only, not history):

kubectl top pod --containers

Check the OOM event details:

kubectl describe pod my-app-7b9f4d8c5-x2k9l | grep -A 5 "Last State"

Check node-level memory pressure:

kubectl describe node <node-name> | grep -A 5 "Conditions"
kubectl top node

If kubectl top does not work, install the metrics-server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Use this data to set appropriate limits. If the application uses 400Mi normally and peaks at 600Mi, set requests to 512Mi and limits to 768Mi–1Gi.
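The sizing step can be written down as arithmetic. The sketch below reproduces the 400Mi/600Mi example; the 25% headroom and the rounding up to a multiple of 64Mi are assumptions chosen for illustration, and suggest_memory is a hypothetical name.

```shell
# suggest_memory is a hypothetical helper. The 25% headroom and the 64Mi
# rounding step are assumptions for illustration, not Kubernetes rules.
# Arguments: typical usage and peak usage, both in Mi.
suggest_memory() {
  typical=$1 peak=$2
  request=$(( (typical * 125 / 100 + 63) / 64 * 64 ))
  limit=$(( (peak * 125 / 100 + 63) / 64 * 64 ))
  echo "request=${request}Mi limit=${limit}Mi"
}

suggest_memory 400 600   # prints: request=512Mi limit=768Mi
```

Treat the output as a starting point and re-check it against observed usage after a few days of production traffic.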

Fix 3: Fix JVM Memory in Containers

Java applications are a frequent source of OOMKilled in Kubernetes. The JVM allocates memory for heap, metaspace, thread stacks, code cache, and native memory, and all of it counts toward the container limit.

Use container-aware JVM flags:

containers:
  - name: my-app
    image: my-java-app:latest
    resources:
      limits:
        memory: "1Gi"
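    env:
      # JAVA_OPTS only takes effect if the image's entrypoint passes it to java;
      # the JVM itself reads JAVA_TOOL_OPTIONS
      - name: JAVA_OPTS
        value: "-XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"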

-XX:MaxRAMPercentage=75.0 sets the max heap to 75% of the container’s memory limit. The remaining 25% is for metaspace, thread stacks, and native memory.

Do NOT use -Xmx with a value close to the memory limit:

# WRONG — leaves no room for non-heap memory
env:
  - name: JAVA_OPTS
    value: "-Xmx1g"  # Container limit is also 1Gi — OOMKilled!

The JVM uses more than just the heap. With -Xmx1g in a 1Gi container, total memory exceeds the limit and the container gets killed.

Rule of thumb: Set -Xmx to about 70–75% of the container memory limit. Or better, use -XX:MaxRAMPercentage which calculates automatically.
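As a worked example of the rule of thumb, the sketch below splits a container limit into heap and non-heap budgets. The function name jvm_memory_budget is hypothetical, and the integer percentage is a simplification of the JVM's floating-point flag.

```shell
# jvm_memory_budget is a hypothetical helper.
# Arguments: container memory limit in Mi, MaxRAMPercentage as an integer.
jvm_memory_budget() {
  limit_mi=$1 pct=$2
  heap=$(( limit_mi * pct / 100 ))
  echo "heap=${heap}Mi non_heap=$(( limit_mi - heap ))Mi"
}

jvm_memory_budget 1024 75   # prints: heap=768Mi non_heap=256Mi
```

To see what a JVM would actually pick inside the container, running java -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version and looking for MaxHeapSize is a common check. If the non-heap budget looks too small for your thread count and metaspace, lower the percentage rather than raising -Xmx.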

For Java OutOfMemoryError within the JVM itself (not container-level), see Fix: Java OutOfMemoryError.

Fix 4: Fix Node.js Memory in Containers

Node.js has a default heap limit (around 1.5–2GB depending on the version). In a container with a lower memory limit, Node.js might try to use more memory than allowed.

Set the Node.js heap limit explicitly:

containers:
  - name: my-app
    image: my-node-app:latest
    resources:
      limits:
        memory: "512Mi"
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=384"

Set --max-old-space-size to about 75% of the container memory limit (in MB). The remaining 25% handles native memory, buffers, and other overhead. Node.js 12+ honours cgroup limits when computing the default heap size, but older versions read host memory and may over-allocate before you ever set the flag.
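The 75% rule can be turned into a flag value with a few lines of shell. This is a sketch under simple assumptions: node_options_for_limit is a hypothetical name, and it only understands limits written as a whole number followed by Mi or Gi.

```shell
# node_options_for_limit is a hypothetical helper. It accepts a limit written
# as <n>Mi or <n>Gi and reserves 25% for non-heap memory.
node_options_for_limit() {
  case $1 in
    *Gi) limit_mi=$(( ${1%Gi} * 1024 )) ;;
    *Mi) limit_mi=${1%Mi} ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
  echo "--max-old-space-size=$(( limit_mi * 75 / 100 ))"
}

node_options_for_limit 512Mi   # prints: --max-old-space-size=384
```

The result matches the NODE_OPTIONS value used in the manifest above for a 512Mi limit.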

Fix 5: Debug Memory Leaks

If the container is OOMKilled after running for hours or days (not immediately on startup), you likely have a memory leak.

Identify the pattern:

# Watch memory usage over time
watch kubectl top pod my-app-7b9f4d8c5-x2k9l --containers

If memory grows steadily without dropping, it is a leak.
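If you log samples to a file, a rough version of the "grows steadily without dropping" check can be automated. The sketch below is deliberately naive: looks_like_leak is a hypothetical name, it expects one sample per line (oldest first), and it needs at least three samples.

```shell
# looks_like_leak is a hypothetical helper. It reads one memory sample per
# line on stdin (for example "412Mi" or "412"), oldest first.
looks_like_leak() {
  awk '
    { v = $1 + 0 }                       # "412Mi" becomes 412
    NR == 1 { first = v; prev = v; next }
    v < prev { drops++ }
    { prev = v }
    END {
      if (NR >= 3 && drops == 0 && prev > first) print "steady growth"
      else print "no clear trend"
    }'
}

printf '100Mi\n120Mi\n150Mi\n' | looks_like_leak   # prints: steady growth
```

One way to collect samples, assuming metrics-server is installed, is to append the third column of kubectl top pod my-app-7b9f4d8c5-x2k9l --no-headers to a file on a timer. A garbage-collected runtime will show dips after each collection, so a real analysis should look at the trend of the low points rather than require zero drops.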

For Node.js:

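# Generate a heap snapshot of the running process. Running `node -e` through exec
# would snapshot a fresh process instead, so start the app with
# --heapsnapshot-signal=SIGUSR2 (Node.js 12+). Assumes node is PID 1 and runs in /app.
kubectl exec my-pod -- kill -USR2 1
kubectl exec my-pod -- sh -c 'ls /app/Heap.*.heapsnapshot'   # kubectl cp does not expand wildcards
kubectl cp my-pod:/app/<snapshot-file-name> ./heapdump.heapsnapshot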

Open the heapsnapshot in Chrome DevTools (Memory tab) to identify leaked objects.

For Java:

# Trigger a heap dump
kubectl exec my-pod -- jcmd 1 GC.heap_dump /tmp/heapdump.hprof
kubectl cp my-pod:/tmp/heapdump.hprof ./heapdump.hprof

Analyze with Eclipse MAT or VisualVM.

For Python:

Use tracemalloc or objgraph to track memory allocations.

Common leak sources:

  • Unbounded caches or in-memory stores
  • Event listeners that are never removed
  • Database connection pools that grow without cleanup
  • Large objects stored in session state
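  • Circular references in runtimes that rely on reference counting (tracing collectors such as the JVM’s and V8’s reclaim cycles, so look for a long-lived root holding the objects instead)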

Fix 6: Handle Sidecar Container Memory

If your pod has sidecar containers (Istio envoy, log collectors, monitoring agents), their memory counts toward the pod’s total. But each container has its own resources section:

spec:
  containers:
    - name: my-app
      resources:
        limits:
          memory: "512Mi"
    - name: istio-proxy
      resources:
        limits:
          memory: "128Mi"
    - name: log-collector
      resources:
        limits:
          memory: "64Mi"
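Adding up the per-container limits gives the worst-case memory footprint of the whole pod on its node. The sketch below does that sum for the three limits above; pod_memory_limit_total is a hypothetical name, and it only understands whole numbers followed by Mi or Gi.

```shell
# pod_memory_limit_total is a hypothetical helper that adds up per-container
# limits written as <n>Mi or <n>Gi and prints the total in Mi.
pod_memory_limit_total() {
  total=0
  for q in "$@"; do
    case $q in
      *Gi) total=$(( total + ${q%Gi} * 1024 )) ;;
      *Mi) total=$(( total + ${q%Mi} )) ;;
      *)   echo "unsupported quantity: $q" >&2; return 1 ;;
    esac
  done
  echo "${total}Mi"
}

pod_memory_limit_total 512Mi 128Mi 64Mi   # prints: 704Mi
```

Each container is still killed against its own limit, so the total matters for node capacity planning rather than for deciding which limit to raise.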

Check which container was OOMKilled:

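kubectl get pod my-app-pod -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}'

The output lists each container with its last termination reason. Fix the limit for the container that shows OOMKilled, not the others.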

Common Mistake: Increasing the main container’s memory limit when the sidecar is the one being OOMKilled. Always check kubectl describe pod to identify which container hit its limit.

Fix 7: Set LimitRange and ResourceQuota

LimitRange sets default and maximum limits for containers in a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: memory-limits
  namespace: default
spec:
  limits:
    - default:
        memory: "512Mi"
      defaultRequest:
        memory: "256Mi"
      max:
        memory: "2Gi"
      min:
        memory: "64Mi"
      type: Container

This ensures that every container has reasonable memory limits even if the deployment spec does not specify them.

ResourceQuota limits total memory for an entire namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: memory-quota
  namespace: default
spec:
  hard:
    requests.memory: "8Gi"
    limits.memory: "16Gi"

Fix 8: Use Vertical Pod Autoscaler (VPA)

If you are unsure what the right memory limit should be, use VPA to automatically recommend or set memory limits based on actual usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # "Off" = recommend only, "Auto" = apply automatically

With updateMode: "Off", check the recommendations:

kubectl describe vpa my-app-vpa

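VPA analyzes historical memory usage and recommends appropriate requests. When it applies changes, it scales limits to keep the request-to-limit ratio from your original spec. VPA is an add-on, so it must be installed in the cluster before this resource is accepted.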

Still Not Working?

If the pod continues to be OOMKilled after increasing limits:

Check for node-level memory pressure. The node itself might be running out of memory, causing the kubelet to evict pods:

kubectl describe node <node-name> | grep -A 10 "Conditions"

Look for MemoryPressure: True. If the node is under pressure, scale up the cluster or add more nodes.

Check for init container memory. Init containers run before the main container and have their own memory limits. If an init container uses too much memory, the pod fails before the main container starts.

Check for ephemeral storage eviction. Kubernetes also evicts pods that exceed ephemeral storage limits (ephemeral-storage in resources). The symptoms look similar to OOMKill, but the reason in kubectl describe will say Evicted rather than OOMKilled.

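Check kernel overcommit settings. The node’s vm.overcommit_memory kernel parameter changes how memory exhaustion surfaces. The kubelet expects a value of 1 (always overcommit). With 0 (heuristic) or 2 (strict), large allocations can be refused with ENOMEM before the cgroup limit is reached, so the application may crash with an allocation error that looks like an OOM kill but is not reported as OOMKilled.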

Use kubectl debug with an ephemeral container. When the pod restarts in a loop, you cannot exec into the dead container, but you can attach a debug container that shares the pod’s PID and network namespaces. This lets you inspect /proc/<pid>/status and smaps from a known-good image:

kubectl debug -it my-pod --image=nicolaka/netshoot --target=my-app
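grep -E "VmRSS|VmHWM|VmSwap" /proc/1/status

If VmHWM (the peak resident set size) sits close to limits.memory while kubectl top reports much lower usage, your app is spiking faster than metrics-server samples (which defaults to 15-second intervals) and you are blind to the real peak. VmPeak is not useful here: it tracks virtual address space, which routinely exceeds the limit without harm.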
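Peak resident memory is easier to compare with the limit once it is converted to MiB. The sketch below reads VmHWM, the kernel's high-water mark for resident set size, from a status file; peak_rss_mib is a hypothetical name.

```shell
# peak_rss_mib is a hypothetical helper. It reads a /proc/<pid>/status file
# (path given as the argument) and prints VmHWM, the peak resident set size,
# converted from kB to MiB.
peak_rss_mib() {
  awk '$1 == "VmHWM:" { print int($2 / 1024) }' "$1"
}

# Inside the debug container: peak_rss_mib /proc/1/status
```

The value covers only the current process, so after a restart it reflects the new instance, not the one that was killed.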

Check cgroups version on the node. Tooling that worked on cgroups v1 can silently misreport on v2 and vice versa. Confirm with:

kubectl debug node/<node-name> -it --image=busybox -- stat -fc %T /sys/fs/cgroup/

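A return of cgroup2fs means cgroups v2; tmpfs typically means v1. The JVM and Node.js read cgroup memory files whose names and locations differ between the two layouts, so the runtime version matters. Go’s GOMEMLIMIT (added in Go 1.19, Aug 2022) does not read cgroups at all; you have to set it yourself from the container limit. An older Java 8 JVM (before 8u372, which added cgroups v2 support) on a cgroups v2 node may not respect the limit at all and will OOMKill almost immediately under load.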
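In scripts, the filesystem type can be mapped to a version label. This is a small sketch; cgroup_version is a hypothetical name, and anything other than the two known values is reported as unknown rather than guessed.

```shell
# cgroup_version is a hypothetical helper that maps the filesystem type
# printed by `stat -fc %T /sys/fs/cgroup/` to a cgroups version.
cgroup_version() {
  case $1 in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# On a Linux node: cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
```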

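Check for kernel page cache accounting. On both cgroups v1 and v2, page cache pages charged to your container count against its memory usage (memory.current on v2). The kernel reclaims clean cache before it kills anything, but tmpfs data cannot be dropped and dirty pages may not be written back fast enough, so cache can trigger OOMKill even when RSS looks healthy. Heavy dd, log writes, or tmpfs-backed emptyDir volumes are common culprits. Move large temporary files to a PersistentVolume or to emptyDir without the medium: Memory setting.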
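On a cgroups v2 node, memory.stat shows how usage splits between anonymous memory, file cache, and shared memory (which includes tmpfs). The sketch below summarizes those three counters; memory_breakdown is a hypothetical name, and the path you pass is normally /sys/fs/cgroup/memory.stat inside the container.

```shell
# memory_breakdown is a hypothetical helper. It reads a cgroups v2 memory.stat
# file (path given as the argument) and prints anon, file and shmem in MiB.
memory_breakdown() {
  awk '
    $1 == "anon"  { anon  = $2 }
    $1 == "file"  { file  = $2 }
    $1 == "shmem" { shmem = $2 }
    END {
      printf "anon=%dMi file=%dMi shmem=%dMi\n",
        anon / 1048576, file / 1048576, shmem / 1048576
    }' "$1"
}

# Inside the container: memory_breakdown /sys/fs/cgroup/memory.stat
```

A large shmem value points at tmpfs or shared memory segments, while a large file value with small shmem points at ordinary page cache from file I/O.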

Check PodDisruptionBudget interplay. When OOMKills cascade across replicas, a too-tight PodDisruptionBudget plus an autoscaler trying to roll new pods can leave the service effectively down. Confirm with kubectl get pdb and loosen the budget temporarily while you fix the leak.

If the pod is crashing for reasons other than OOM, see Fix: Kubernetes CrashLoopBackOff. If the pod cannot start because the image cannot be pulled, see Fix: Kubernetes ImagePullBackOff.

For the Docker-level equivalent of this error (outside Kubernetes), see Fix: Docker exited with code 137 OOMKilled.


FixDevs

Solo developer based in Japan. Every solution is cross-referenced with official documentation and tested before publishing.
