OOMKilled: Extended Technical Detail
What Does OOMKilled Mean in Simple Terms?
OOM stands for Out Of Memory. When your container tries to use more RAM than its configured limit, the Linux kernel steps in and forcefully kills the process: no warning, no graceful shutdown. Think of it like a security guard at a Swiggy data centre saying: "You were allocated 512MB of memory, you tried to take 800MB, you are out."
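The limit the kernel enforces is exposed to the container through its cgroup, so it can be read from inside a running pod. The following is a minimal sketch, assuming the usual mount points (`/sys/fs/cgroup/memory.max` on cgroup v2, `/sys/fs/cgroup/memory/memory.limit_in_bytes` on cgroup v1); the function name is hypothetical and not part of any tool.

```shell
#!/bin/sh
# container_memory_limit_bytes [CGROUP_ROOT]
# Hypothetical helper: prints the memory limit the kernel enforces for the
# current cgroup. CGROUP_ROOT defaults to /sys/fs/cgroup (an assumption
# about where the cgroup filesystem is mounted).
container_memory_limit_bytes() {
  root=${1:-/sys/fs/cgroup}
  if [ -r "$root/memory.max" ]; then
    # cgroup v2: a byte count, or the literal string "max" when unlimited
    cat "$root/memory.max"
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    # cgroup v1: a byte count
    cat "$root/memory/memory.limit_in_bytes"
  else
    echo "unknown"
  fi
}

# Run inside the container, for example through kubectl exec.
container_memory_limit_bytes
```

In a container limited to 512Mi this would typically print 536870912 (512 * 1024 * 1024 bytes).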
    +------------------------------------------+
    | Container Memory Limit: 512Mi            |  <- Set in deployment.yaml
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Container reaches 512Mi usage            |  <- Memory pressure event
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Linux kernel sends SIGKILL (signal 9)    |  <- Instant hard kill, no grace period
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Pod status: OOMKilled, Exit Code: 137    |  <- Kubelet reports to API server
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Kubelet restarts the container           |  <- If restartPolicy: Always (default)
    +------------------------------------------+

Why Exit Code 137?
137 = 128 + 9. In Linux, when a process is killed by a signal, the exit code is 128 + signal number. Signal 9 is SIGKILL, the kernel's hard kill signal. Unlike SIGTERM (signal 15, graceful shutdown), SIGKILL cannot be caught, blocked, or ignored by the process.
| Signal | Number | Exit Code | Meaning |
|---|---|---|---|
| SIGTERM | 15 | 143 | Graceful shutdown requested; the app can handle it |
| SIGKILL | 9 | 137 | Hard kill by the kernel: OOMKilled or kill -9 |
| SIGSEGV | 11 | 139 | Segmentation fault: bad memory access in app code |
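The 128 + signal rule is easy to check locally, without a cluster. The sketch below assumes a shell such as bash or dash, which report a signal-killed child as 128 plus the signal number; `signal_from_exit_code` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# signal_from_exit_code CODE
# Hypothetical helper: prints the signal number encoded in an exit code,
# or 0 when the process exited on its own (codes up to 128).
signal_from_exit_code() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}

# Start a child shell that sends SIGKILL to itself, then capture its status.
status=0
sh -c 'kill -9 $$' || status=$?

echo "exit code: $status"                             # 137 in bash and dash
echo "signal:    $(signal_from_exit_code "$status")"  # 9
```

The same helper applied to 143 gives 15 (SIGTERM), matching the table above.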
How to Confirm OOMKilled
    # Describe the pod and look for OOMKilled under Last State
    kubectl describe pod api-server-7d9f8b-xkp2q -n production

    # Look for this exact block in the output:
    #   Last State:     Terminated
    #     Reason:       OOMKilled
    #     Exit Code:    137
    #     Started:      Mon, 12 Feb 2024 14:22:01 +0530
    #     Finished:     Mon, 12 Feb 2024 14:22:01 +0530

    # Quick one-liner to extract the exit code of the first container directly
    kubectl get pod api-server-7d9f8b-xkp2q -n production \
      -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
    # Output: 137

    # Check all pods in a namespace for recent OOMKills
    # (any() lists each pod once, even if several of its containers were killed)
    kubectl get pods -n production -o json | \
      jq -r '.items[] | select(any(.status.containerStatuses[]?; .lastState.terminated.reason == "OOMKilled")) | .metadata.name'

Correct Memory Configuration
    # deployment.yaml: correct memory resource configuration
    containers:
      - name: api-server
        image: registry.razorpay.in/api-server:v2.4.1
        resources:
          requests:
            memory: "256Mi"   # Minimum guaranteed memory reserved on the node
            cpu: "200m"
          limits:
            memory: "512Mi"   # Hard ceiling: exceed this = OOMKilled instantly
            cpu: "1000m"

    +-----------------------------+
    | limits.memory: 512Mi        |  <- Hard ceiling: kernel kills at this threshold
    +-----------------------------+
    | requests.memory: 256Mi      |  <- Guaranteed reservation on the node
    +-----------------------------+
    | Actual usage: varies        |  <- Should stay between request and limit
    +-----------------------------+

How to Find the Right Memory Limit
Never guess memory limits. Observe real usage first, then add headroom:
    # Watch live memory usage across all pods in a namespace
    kubectl top pods -n production

    # Sort by memory descending to find the biggest consumers
    kubectl top pods -n production --sort-by=memory

    # Get a specific pod's current memory usage
    kubectl top pod api-server-7d9f8b-xkp2q -n production

    # Formula: set limit = peak observed usage + 30% headroom
    # Example: peak = 380Mi -> limit = 380 * 1.3 = ~500Mi -> round up to 512Mi

Language-Specific Memory Tuning
OOMKills are especially common for runtimes that manage their own heap and, depending on the runtime version, may not size it from the container memory limit automatically:
    # Node.js: the default heap limit depends on the Node.js version (about 1.5GB on
    # older releases) and is not guaranteed to track the container limit
    # Set max-old-space-size to 75-80% of your memory limit
    # Container limit: 512Mi -> Node heap max: 400MB
    containers:
      - name: node-api
        image: registry.razorpay.in/node-api:v3.1.0
        env:
          - name: NODE_OPTIONS
            value: "--max-old-space-size=400"
        resources:
          limits:
            memory: "512Mi"

    # Java: older JVMs size the heap from total node RAM, not the container limit;
    # container-aware JVMs (JDK 10+, 8u191+) default to 25% of the container limit
    # Use -XX:MaxRAMPercentage instead of -Xmx for container-aware heap sizing
    # JAVA_OPTS is only a convention: the entrypoint must pass it to java
    # (the JVM itself reads JAVA_TOOL_OPTIONS)
    containers:
      - name: java-service
        image: registry.hotstar.com/stream-api:v1.8.2
        env:
          - name: JAVA_OPTS
            value: "-XX:MaxRAMPercentage=75.0"
        resources:
          limits:
            memory: "2Gi"

OOMKilled Troubleshooting Reference
| Symptom | Likely Cause | Fix |
|---|---|---|
| Exit code 137, reason OOMKilled | Memory limit set too low | Run kubectl top pod, measure peak, add 30% headroom |
| OOMKill on every deploy despite high limit | Memory leak in application code | Profile the app to find where the heap grows unbounded |
| OOMKill only under traffic spikes | Limit too tight for peak load | Use HPA to scale out before memory pressure hits |
| Node-level OOMKill (whole node) | No memory limits set on pods | Apply LimitRange to all namespaces immediately |
| Java app OOMKilled despite large limit | JVM heap not bounded to container | Add -XX:MaxRAMPercentage=75.0 to JAVA_OPTS |
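For the "No memory limits set on pods" row, a LimitRange gives every container in a namespace default memory values when its manifest sets none. The following is a minimal sketch, assuming the production namespace used in the earlier commands and reusing the 256Mi/512Mi values from the deployment example; the object name and file path are hypothetical, and real defaults should come from measured usage.

```shell
#!/bin/sh
# Write a LimitRange manifest to a temporary file (path is an assumption).
manifest="${TMPDIR:-/tmp}/memory-limitrange.yaml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-limits   # hypothetical name
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:
        memory: 256Mi           # applied when a container sets no request
      default:
        memory: 512Mi           # applied when a container sets no limit
EOF

echo "wrote $manifest"

# Review the file, then apply it against your cluster:
# kubectl apply -f "$manifest"
```

A LimitRange is namespaced, so covering "all namespaces" means applying one per namespace. The defaults are filled in when pods are created, so existing pods keep their current settings until they are recreated.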
Tip: For Node.js apps, always set NODE_OPTIONS=--max-old-space-size to 75-80% of your memory limit. A container limited to 512Mi should have --max-old-space-size=400; this lets Node.js trigger its own GC before the kernel kills the process.
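The two sizing rules used in this guide (limit = peak observed usage + 30% headroom, Node.js heap = 75-80% of the limit) are plain arithmetic, so they can be sketched as shell functions. This is an illustrative sketch only: the function names, and the choice to round the limit up to a multiple of 64Mi, are assumptions rather than anything prescribed above.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the 64Mi rounding step are assumptions.

# recommend_limit_mi PEAK_MI [HEADROOM_PCT] [STEP_MI]
# Adds headroom to the observed peak, then rounds up to a multiple of STEP_MI.
recommend_limit_mi() {
  peak_mi=$1
  headroom_pct=${2:-30}
  step_mi=${3:-64}
  # Integer ceiling of peak * (1 + headroom / 100)
  raw_mi=$(( (peak_mi * (100 + headroom_pct) + 99) / 100 ))
  echo $(( (raw_mi + step_mi - 1) / step_mi * step_mi ))
}

# heap_mb_for_limit LIMIT_MI [HEAP_PCT]
# Returns a value for --max-old-space-size, rounded down.
heap_mb_for_limit() {
  limit_mi=$1
  heap_pct=${2:-75}
  echo $(( limit_mi * heap_pct / 100 ))
}

limit_mi=$(recommend_limit_mi 380)        # 380Mi peak -> 494 -> 512
heap_mb=$(heap_mb_for_limit "$limit_mi")  # 75% of 512 -> 384

echo "limits.memory: ${limit_mi}Mi"
echo "NODE_OPTIONS=--max-old-space-size=${heap_mb}"
```

For a 512Mi limit the 75-80% band works out to 384-409MB, which is where the 400 used in this guide sits.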
Common Mistake: Setting limits.memory equal to requests.memory with zero headroom over real usage. Any traffic spike or burst of allocation that pushes usage even 1MB over the limit can get the container OOMKilled instantly. Always set limits at least 30-50% above requests.
Remember: OOMKilled containers are restarted automatically if restartPolicy: Always (the default), with an increasing back-off delay between attempts. If your pod is in CrashLoopBackOff with exit code 137, the fix is almost always to raise the memory limit, not to keep restarting it.

Security: If a container is repeatedly OOMKilled despite seemingly reasonable limits, investigate for a memory leak or an injection attack that is causing unbounded data accumulation. Sudden memory spikes that trigger OOMKills on Zerodha-scale trading platforms during market open are often the first indicator of a runaway query or payload exploit.