What is OOMKilled? | DevOps Dictionary

OOMKilled — Extended Technical Detail

What Does OOMKilled Mean in Simple Terms?

OOM stands for Out Of Memory. When your container tries to use more RAM than its configured memory limit, the Linux kernel's OOM killer steps in and forcefully kills a process inside it: no warning, no graceful shutdown. Think of it like a security guard at a Swiggy data centre saying: "You were allocated 512MB of memory, you tried to take 800MB, you are out."

◈ DIAGRAM

+------------------------------------------+
| Container Memory Limit: 512Mi            | <- Set in deployment.yaml
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Container reaches 512Mi usage            | <- Memory pressure event
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Linux kernel sends SIGKILL (signal 9)    | <- Instant hard kill, no grace period
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Pod status: OOMKilled, Exit Code: 137    | <- Kubelet reports to API server
+------------------------------------------+
                    |
                    v
+------------------------------------------+
| Kubelet restarts the container           | <- If restartPolicy: Always (default)
+------------------------------------------+
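The limit in the first box is enforced by the kernel through the container's cgroup. As a minimal sketch, assuming the usual cgroup mount points (`/sys/fs/cgroup/memory.max` on cgroup v2, `/sys/fs/cgroup/memory/memory.limit_in_bytes` on cgroup v1), the helper below prints that value when run inside a container. The function name `container_memory_limit` is illustrative, not part of any tool.

```shell
# container_memory_limit prints the memory limit the kernel enforces for the
# current cgroup: a byte count, "max" (cgroup v2, no limit), or "unknown"
# when neither file is readable at the assumed paths.
container_memory_limit() {
  local f
  for f in /sys/fs/cgroup/memory.max \
           /sys/fs/cgroup/memory/memory.limit_in_bytes; do
    if [ -r "$f" ]; then
      cat "$f"
      return 0
    fi
  done
  echo "unknown"
}

container_memory_limit
```

With a 512Mi limit the value would be 536870912 bytes (512 * 1024 * 1024).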

Why Exit Code 137?

137 = 128 + 9. In Linux, when a process is killed by a signal, the exit code is 128 + signal number. Signal 9 is SIGKILL — the kernel's hard kill signal. Unlike SIGTERM (signal 15, graceful shutdown), SIGKILL cannot be caught, blocked, or ignored by the process.

Signal	Number	Exit Code	Meaning
SIGTERM	15	143	Graceful shutdown requested — app can handle it
SIGKILL	9	137	Hard kill by kernel — OOMKilled or `kill -9`
SIGSEGV	11	139	Segmentation fault — bad memory access in app code
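The 128 + signal rule can be checked on any Linux shell without a cluster. The sketch below starts a child shell that sends a signal to itself and prints the exit status the parent observes; `signal_exit_status` is a hypothetical helper name used only for this illustration.

```shell
# signal_exit_status SIGNAL
# Runs a child shell that kills itself with SIGNAL, then prints the exit
# status reported to the parent shell (128 + signal number).
signal_exit_status() {
  local sig="$1" status=0
  sh -c "kill -s $sig \$\$" 2>/dev/null || status=$?
  echo "$status"
}

signal_exit_status KILL   # prints 137 (128 + 9)
signal_exit_status TERM   # prints 143 (128 + 15)
```

The parent shell may also print a "Killed" or "Terminated" notice on stderr; the number on stdout is the exit status.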

How to Confirm OOMKilled

Bash

# Describe the pod and look for OOMKilled under Last State
kubectl describe pod api-server-7d9f8b-xkp2q -n production

# Look for this exact block in the output:
# Last State:     Terminated
#   Reason:       OOMKilled
#   Exit Code:    137
#   Started:      Mon, 12 Feb 2024 14:22:01 +0530
#   Finished:     Mon, 12 Feb 2024 14:22:01 +0530

# Quick one-liner to extract the exit code of the first container directly
kubectl get pod api-server-7d9f8b-xkp2q -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# Output: 137

# Check all pods in a namespace for recent OOMKills
# (any(...) lists each pod once, even if several of its containers were killed)
kubectl get pods -n production -o json | \
  jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | .metadata.name'

Correct Memory Configuration

YAML

# deployment.yaml: correct memory resource configuration
containers:
  - name: api-server
    image: registry.razorpay.in/api-server:v2.4.1
    resources:
      requests:
        memory: "256Mi"    # Minimum guaranteed memory reserved on the node
        cpu: "200m"
      limits:
        memory: "512Mi"    # Hard ceiling: exceed this = OOMKilled instantly
        cpu: "1000m"

◈ DIAGRAM

+-----------------------------+
| limits.memory: 512Mi        | <- Hard ceiling — kernel kills at this threshold
+-----------------------------+
| requests.memory: 256Mi      | <- Guaranteed reservation on the node
+-----------------------------+
| Actual usage: varies        | <- Should peak comfortably below the limit
+-----------------------------+

How to Find the Right Memory Limit

Never guess memory limits. Observe real usage first, then add headroom:

Bash

# Show current memory usage across all pods in a namespace
# (kubectl top needs the metrics-server add-on and reports a point-in-time sample)
kubectl top pods -n production

# Sort by memory descending to find the biggest consumers
kubectl top pods -n production --sort-by=memory

# Get a specific pod's current memory usage
kubectl top pod api-server-7d9f8b-xkp2q -n production

# Formula: set limit = peak observed usage + 30% headroom
# Example: peak = 380Mi -> limit = 380 * 1.3 = ~494Mi -> round up to 512Mi
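The headroom formula can be written as a small helper so the arithmetic is repeatable. This is a sketch, not a standard tool: the function name `recommend_limit_mi` and the choice to round up to a multiple of 64Mi are assumptions made for illustration.

```shell
# recommend_limit_mi PEAK_MI [HEADROOM_PCT] [STEP_MI]
# Prints a suggested memory limit in Mi: peak plus headroom, rounded up to
# the next multiple of STEP_MI (defaults: 30% headroom, 64Mi steps).
recommend_limit_mi() {
  local peak_mi="$1" headroom_pct="${2:-30}" step_mi="${3:-64}"
  # Ceiling of peak * (1 + headroom/100), using integer arithmetic only.
  local padded=$(( (peak_mi * (100 + headroom_pct) + 99) / 100 ))
  # Round up to the next multiple of step_mi.
  echo $(( (padded + step_mi - 1) / step_mi * step_mi ))
}

recommend_limit_mi 380        # prints 512 (380 * 1.3 = 494, rounded up)
recommend_limit_mi 380 30 1   # prints 494 (no rounding step)
```

Feed it the highest value seen across a representative period, not a single `kubectl top` sample.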

Language-Specific Memory Tuning

OOMKills are especially common for runtimes that manage their own heap and may not size it from the container memory limit automatically:

YAML

# Node.js: the default V8 heap limit depends on the Node.js version and is not
# guaranteed to match the container limit (older releases default to roughly 1.5GB)
# Set max-old-space-size to 75-80% of your memory limit
# Container limit: 512Mi -> Node heap max: 400MB
containers:
  - name: node-api
    image: registry.razorpay.in/node-api:v3.1.0
    env:
      - name: NODE_OPTIONS
        value: "--max-old-space-size=400"
    resources:
      limits:
        memory: "512Mi"

# Java: the default max heap is 25% of available RAM. JVMs without container
# support (before JDK 10 / 8u191) measure the node's RAM, not the container limit
# Use -XX:MaxRAMPercentage instead of -Xmx for container-aware heap sizing
# Note: JAVA_OPTS only takes effect if the image's start script passes it to java;
# JAVA_TOOL_OPTIONS is read by the JVM itself
containers:
  - name: java-service
    image: registry.hotstar.com/stream-api:v1.8.2
    env:
      - name: JAVA_OPTS
        value: "-XX:MaxRAMPercentage=75.0"
    resources:
      limits:
        memory: "2Gi"
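The percentages above translate into concrete heap sizes with simple integer arithmetic. A minimal sketch, where `heap_budget_mb` is a hypothetical helper name and the result is in the same units as the limit passed in:

```shell
# heap_budget_mb LIMIT_MI PERCENT
# Prints PERCENT of LIMIT_MI, rounded down, as a heap size to give the runtime.
heap_budget_mb() {
  local limit_mi="$1" pct="$2"
  echo $(( limit_mi * pct / 100 ))
}

heap_budget_mb 512 75    # prints 384  (lower bound for a 512Mi Node.js container)
heap_budget_mb 512 80    # prints 409  (upper bound; 400 sits inside this range)
heap_budget_mb 2048 75   # prints 1536 (heap cap implied by MaxRAMPercentage=75.0 at 2Gi)
```

The remaining 20-25% is left for memory outside the managed heap, such as thread stacks, native buffers, and the runtime itself.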

OOMKilled Troubleshooting Reference

Symptom	Likely Cause	Fix
Exit code 137, reason OOMKilled	Memory limit set too low	Run `kubectl top pod`, measure peak, add 30% headroom
OOMKill on every deploy despite high limit	Memory leak in application code	Profile the app — heap grows unbounded
OOMKill only under traffic spikes	Limit too tight for peak load	Use HPA to scale out before memory pressure hits
Node-level OOMKill (whole node)	No memory limits set on pods	Apply LimitRange to all namespaces immediately
Java app OOMKilled despite large limit	JVM heap not bounded to container	Add `-XX:MaxRAMPercentage=75.0` to JAVA_OPTS
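For the LimitRange fix in the table, a namespace-wide default can be generated and piped to `kubectl`. This is a sketch under stated assumptions: the helper `limitrange_manifest`, the object name `default-memory`, and the 512Mi/256Mi values are illustrative and should be replaced with values measured for your workloads.

```shell
# limitrange_manifest NAMESPACE DEFAULT_LIMIT DEFAULT_REQUEST
# Prints a LimitRange manifest that gives every container in NAMESPACE a
# default memory limit and request when the pod spec does not set its own.
limitrange_manifest() {
  local namespace="$1" default_limit="$2" default_request="$3"
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory
  namespace: ${namespace}
spec:
  limits:
    - type: Container
      default:
        memory: ${default_limit}
      defaultRequest:
        memory: ${default_request}
EOF
}

# Print the manifest for review
limitrange_manifest production 512Mi 256Mi

# To apply it: limitrange_manifest production 512Mi 256Mi | kubectl apply -f -
```

The defaults apply only to containers created after the LimitRange exists; running pods keep their current settings until they are recreated.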

💡 Tip: For Node.js apps, always set NODE_OPTIONS=--max-old-space-size to 75–80% of your memory limit. A container limited to 512Mi should have --max-old-space-size=400 — this lets Node.js trigger its own GC before the kernel kills the process.

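🔴 Common Mistake: Setting limits.memory equal to requests.memory when that value leaves zero headroom over peak usage. Any traffic spike or allocation burst that pushes usage even 1MB over the limit can OOMKill the container. Equal requests and limits are fine in themselves (that is how a pod gets the Guaranteed QoS class); what matters is that the limit sits at least 30–50% above what the app normally needs.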

📌 Remember: With restartPolicy: Always (the default), the kubelet restarts an OOMKilled container automatically, with an increasing back-off delay after repeated failures. If your pod is in CrashLoopBackOff with exit code 137, restarting it again will not help: raise the memory limit, or fix the leak if usage grows without bound.

⚠️ Security: If a container is repeatedly OOMKilled despite seemingly reasonable limits, investigate for a memory leak or an injection attack that is causing unbounded data accumulation. Sudden memory spikes that trigger OOMKills on Zerodha-scale trading platforms during market open are often the first indicator of a runaway query or payload exploit.

Related Terms

Namespace

Pod

Deployment