Toleration: Extended Technical Detail
What is a Toleration in Simple Terms?
A Toleration is the pod's way of saying "I am allowed on that restricted node." It is the matching key to the Taint lock on the node. Without a matching Toleration, the scheduler will not place a pod on a node that carries a NoSchedule or NoExecute Taint.
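As a rough illustration of the "lock" side, the taint used in this section's examples would appear on the Node object roughly as shown below. This is a sketch for reading, not a manifest to apply: taints are normally added with kubectl taint, and the node name gpu-node-1 and the workload=gpu key/value are simply the example values used throughout this section.

```yaml
# Sketch of how the example taint appears on the Node object
# (as seen with: kubectl get node gpu-node-1 -o yaml).
# It is usually added with: kubectl taint nodes gpu-node-1 workload=gpu:NoSchedule
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
spec:
  taints:
    - key: workload
      value: gpu
      effect: NoSchedule   # new pods without a matching toleration are not scheduled here
```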
```
+------------------------------------------+
| Node: gpu-node-1                         |
| Taint: workload=gpu:NoSchedule           |  <- Node repels all pods by default
+------------------------------------------+

+------------------------------------------+
| Regular API Pod                          |
| No Toleration                            |  <- REPELLED: stays on untainted nodes
+------------------------------------------+

+------------------------------------------+
| GPU Transcoding Pod                      |
| Toleration: workload=gpu:NoSchedule      |  <- ALLOWED: can land on tainted node
+------------------------------------------+
```

Toleration in a Pod Spec
```yaml
# Pod spec for a GPU encoding job that tolerates the GPU node taint
apiVersion: v1
kind: Pod
metadata:
  name: video-encoder
  namespace: streaming-prod
spec:
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"   # Must match the taint effect exactly
  containers:
    - name: encoder
      image: registry.hotstar.in/video-encoder:v3.1.0
      resources:
        limits:
          nvidia.com/gpu: 1
```

Toleration Operators: Equal vs Exists
```
+------------------------+          +------------------------------+
| operator: Equal        |          | operator: Exists             |
|                        |          |                              |
| key=workload           | <------> | key=workload                 |
| value=gpu              |          | (value is ignored)           |
| effect=NoSchedule      |          | effect=NoSchedule            |
|                        |          |                              |
| Matches ONLY taints    |          | Matches ANY taint with       |
| where key=workload     |          | key=workload regardless      |
| AND value=gpu          |          | of its value                 |
+------------------------+          +------------------------------+
```

```yaml
# Three alternative tolerations blocks, separated as YAML documents.

# Equal: key AND value must both match the taint exactly
tolerations:
  - key: "workload"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
---
# Exists: only the key needs to match, value is ignored
tolerations:
  - key: "workload"
    operator: "Exists"
    effect: "NoSchedule"
---
# Universal: tolerates ALL taints on ALL nodes (empty key + Exists)
tolerations:
  - operator: "Exists"   # No key: matches every taint in the cluster
```

Toleration with tolerationSeconds (for NoExecute)
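When a node gets a NoExecute taint (e.g., during a node failure), Kubernetes evicts running pods that do not tolerate it. A NoExecute toleration with tolerationSeconds keeps the pod bound to the node for that many seconds after the taint is added; a NoExecute toleration without tolerationSeconds keeps it bound indefinitely.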
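For context, Kubernetes normally adds tolerations like the following to any pod that does not declare its own for these two taints (this is done by the DefaultTolerationSeconds admission plugin). That is why pods on a failed node are typically evicted after about five minutes rather than instantly. The 300-second value shown is the upstream default and may differ in a cluster whose API server has been configured otherwise.

```yaml
# Sketch of the tolerations Kubernetes adds by default when a pod sets none for these keys.
# Declaring either toleration explicitly in the pod spec replaces the default for that key.
tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300   # upstream default; cluster configuration may change it
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
```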
```yaml
spec:
  tolerations:
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300   # Stay on the node for 5 minutes before eviction
      # Useful for Zerodha's trading pods during brief node blips
```

```
+------------------------------------------+
| Node becomes NotReady                    |  <- Hardware hiccup on mumbai-worker-3
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| NoExecute taint auto-applied             |  <- Kubernetes adds system taint
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Pod with tolerationSeconds: 300          |  <- Waits 5 min before evicting
| Pod with no matching Toleration          |  <- Evicted immediately
+------------------------------------------+
```

Toleration + NodeAffinity: The Complete Pattern
A Toleration only allows a pod onto a tainted node; it does not guarantee the pod lands there. To ensure the pod runs only on the intended nodes, combine the Toleration with NodeAffinity, which matches node labels rather than taints.
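Because nodeAffinity and nodeSelector match node labels, not taints, the GPU nodes need a label in addition to their taint. A minimal sketch of that label on the Node object follows, assuming the label key and value mirror the taint (workload=gpu) and reusing the example node name gpu-node-1 from earlier in this section.

```yaml
# Sketch: the label an affinity rule on workload=gpu would look for.
# It is usually added with: kubectl label nodes gpu-node-1 workload=gpu
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1
  labels:
    workload: gpu   # assumed label; it is independent of the workload=gpu taint
```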
```yaml
# deployment.yaml (pod template spec): video transcoder exclusively on GPU nodes at Hotstar
spec:
  tolerations:
    - key: "workload"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"        # Step 1: Allow past the taint repel
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload     # Node label (not the taint); nodes must be labeled workload=gpu
                operator: In
                values: ["gpu"]   # Step 2: Pin to GPU nodes only
  containers:
    - name: transcoder
      image: registry.hotstar.in/transcoder:v4.2.1
      resources:
        limits:
          nvidia.com/gpu: 1
```

```
+------------------------------------------+
| Toleration on pod                        |  <- Unlocks the tainted node
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| NodeAffinity on pod                      |  <- Forces pod to GPU nodes only
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| GPU node exclusively runs GPU workloads  |  <- Isolation achieved
+------------------------------------------+
```

System Tolerations on DaemonSets
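Critical system DaemonSets (kube-proxy, CNI plugins, log collectors) are typically deployed with broad Tolerations so they run on every node, including tainted ones. In addition, the DaemonSet controller automatically adds tolerations for built-in node-condition taints such as node.kubernetes.io/not-ready and node.kubernetes.io/unreachable.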
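A DaemonSet you deploy yourself has to declare the toleration in its pod template if it should reach nodes with custom taints such as workload=gpu. The manifest below is a minimal sketch of that idea; the name log-collector, its labels, and the image reference are hypothetical placeholders, not a real collector's published manifest.

```yaml
# Sketch: a log-collector DaemonSet that tolerates every taint.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector            # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector       # must match spec.selector.matchLabels
    spec:
      tolerations:
        - operator: "Exists"     # empty key + Exists: tolerate all taints and effects
      containers:
        - name: collector
          image: registry.example.com/log-collector:1.0.0   # placeholder image
```

If the collector only needs to reach one class of tainted nodes, a toleration scoped to that taint's key is the narrower choice.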
```shell
# Check the built-in tolerations on a system DaemonSet
kubectl get daemonset kube-proxy -n kube-system -o yaml | grep -A 20 tolerations

# Output: kube-proxy tolerates everything
# tolerations:
# - operator: Exists   <-- matches ALL taints, runs everywhere
```

Troubleshooting Common Toleration Problems
| Problem | Symptom | Fix |
|---|---|---|
| Pod stuck in Pending | 0/5 nodes available: node(s) had taint | Pod is missing a Toleration: add a tolerations block with matching key, value, and effect |
| Toleration set but pod still Pending | Pod has Toleration but won't schedule | Toleration effect doesn't match taint effect: they must be identical, or the Toleration must omit effect so that it matches all effects |
| Pod not landing on intended node | Pod schedules on random untainted nodes | Toleration allows the node but doesn't pin the pod: add nodeAffinity or nodeSelector |
| Non-GPU pods landing on GPU nodes | Cost spike on GPU instance billing | Pod has operator: Exists with no key, a wildcard Toleration that bypasses all taints: scope the Toleration to a specific key |
| Pod evicted unexpectedly after node event | Running pod disappears during node hiccup | No matching NoExecute Toleration, or its tolerationSeconds is too short: add a NoExecute Toleration with a suitable tolerationSeconds |
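
The effect-mismatch row is the easiest one to miss when reading a manifest, so here is a small sketch of it. It assumes, purely for illustration, a node tainted workload=gpu:NoExecute; the first block would leave the pod Pending on that node, and the second is the corrected form.

```yaml
# Assumed node taint for this illustration: workload=gpu:NoExecute

# Mismatch: key and value are right, but the effect differs from the taint
tolerations:
  - key: "workload"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"   # does not tolerate a NoExecute taint
---
# Corrected: effect is identical to the taint's effect
tolerations:
  - key: "workload"
    operator: "Equal"
    value: "gpu"
    effect: "NoExecute"
```
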
Tip: Use operator: Exists with an empty key to create a pod that tolerates ALL taints on ALL nodes. This is useful for critical system DaemonSets like log collectors (Fluentd, Filebeat) that must run on every node, including tainted GPU and dedicated nodes.

Remember: A Toleration is permission, not a directive. It tells Kubernetes "this pod is allowed on tainted nodes", but without a nodeAffinity or nodeSelector, the scheduler may still place the pod on an untainted node. Always pair Toleration with affinity rules for hard node dedication.
Security: At Razorpay, PCI-DSS compliance requires that card-processing pods run only on dedicated, isolated nodes. Set operator: Equal with the exact taint key and value, never operator: Exists, to prevent these pods from accidentally tolerating unintended taints and landing on shared infrastructure.
Common Mistake: Using operator: Exists without a key in a Toleration. This is a wildcard that matches every taint on every node in the cluster, effectively defeating all taint-based isolation for that pod. Always scope Tolerations to a specific key unless you explicitly need universal placement (system DaemonSets only).