Kubelet: Extended Technical Detail
What is the Kubelet in Simple Terms?
Think of the kubelet as the local manager on every worker node. The Kubernetes control plane records what should run ("run this pod with these containers") in the API server, and the scheduler assigns each pod to a node. The kubelet on each node watches the API server for pods assigned to it and makes sure they actually run on that specific machine.
```text
+------------------------------------------+
| Kubernetes API Server                    |  <- Control plane: issues PodSpecs
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Kubelet (on each worker node)            |  <- Watches API, enforces PodSpecs locally
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Container Runtime (containerd)           |  <- Actually starts/stops containers
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Running Pod                              |  <- Final result: container is alive
+------------------------------------------+
```

What the Kubelet Does (Reconciliation Loop)
The kubelet runs a continuous reconciliation loop, comparing the desired state from the API server against the actual state on the node:
| Step | Action |
|---|---|
| 1. Watch | Watches the API server for PodSpecs assigned to its node |
| 2. Pull | Pulls container images if not already cached on the node |
| 3. Start | Starts containers via the container runtime (containerd) |
| 4. Probe | Runs startup, liveness and readiness probes at configured intervals |
| 5. Report | Sends pod status updates back to the API server |
| 6. Evict | Evicts pods when the node hits memory or disk pressure thresholds |
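Steps 1 to 3 boil down to "compare desired against actual, then act on the difference". The toy model below illustrates only that pattern; it is not kubelet code (the real kubelet is written in Go and talks to containerd through the Container Runtime Interface). NodeState and reconcile are hypothetical names, a pod is reduced to a single image, and "pulling" and "starting" just update in-memory collections.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical in-memory view of one node (not a real kubelet type)."""
    image_cache: set[str] = field(default_factory=set)
    running: dict[str, str] = field(default_factory=dict)  # pod name -> image


def reconcile(desired: dict[str, str], node: NodeState) -> list[str]:
    """One pass of a toy reconciliation loop that makes `node` match `desired`.

    `desired` maps pod name -> image and stands in for the PodSpecs the
    API server has assigned to this node. Returns the actions taken.
    """
    actions: list[str] = []
    for pod, image in sorted(desired.items()):
        if node.running.get(pod) == image:
            continue  # already matches the spec: nothing to do
        if image not in node.image_cache:
            node.image_cache.add(image)  # step 2: pull only if not cached
            actions.append(f"pull {image}")
        node.running[pod] = image  # step 3: (re)start with the desired image
        actions.append(f"start {pod}")
    for pod in sorted(set(node.running) - set(desired)):
        del node.running[pod]  # no longer assigned to this node: stop it
        actions.append(f"stop {pod}")
    return actions


node = NodeState(image_cache={"nginx:1.25"}, running={"old-job": "busybox"})
desired = {"web": "nginx:1.25", "api": "payment-api:v2"}
print(reconcile(desired, node))
# ['pull payment-api:v2', 'start api', 'start web', 'stop old-job']
print(reconcile(desired, node))
# []
```

The second pass returns no actions: once the node matches the spec, running the loop again is harmless, which is what lets the kubelet repeat it continuously.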
How to Check Kubelet Status on a Node
```bash
# SSH into a worker node
ssh rahul@mumbai-prod-node-1

# Check if kubelet service is running
sudo systemctl status kubelet

# Show the last 10 minutes of kubelet logs, then keep following new entries
sudo journalctl -u kubelet -f --since "10 minutes ago"

# Check kubelet version and config
kubelet --version
sudo cat /var/lib/kubelet/config.yaml
```

Node NotReady: The Most Common Kubelet Issue
```bash
# Spot a NotReady node from the control plane
kubectl get nodes
# NAME                 STATUS     ROLES    AGE
# mumbai-prod-node-1   Ready      <none>   12d
# mumbai-prod-node-2   NotReady   <none>   12d   <- problem node

# Get details on why it is NotReady
kubectl describe node mumbai-prod-node-2
# Look for these entries in the Conditions section:
#   MemoryPressure   False
#   DiskPressure     True    <- node filesystem is running out of space
#   PIDPressure      False
#   Ready            False   <- kubelet reports the node unhealthy (Unknown = kubelet stopped reporting)

# If the kubelet itself has crashed, SSH into the node and restart it
ssh rahul@10.0.1.51
sudo systemctl restart kubelet
sudo systemctl status kubelet
```

Kubelet Eviction: When Nodes Run Low on Resources
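The kubelet automatically evicts pods when the node crosses resource pressure thresholds. On Linux the default hard thresholds include memory.available < 100Mi, nodefs.available < 10%, nodefs.inodesFree < 5% and imagefs.available < 15%. They can be tuned in the kubelet config; this example raises the memory threshold to 200Mi and adds a soft threshold: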
```yaml
# /var/lib/kubelet/config.yaml: kubelet eviction thresholds
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"   # Evict pods if less than 200Mi of memory is available
  nodefs.available: "10%"     # Evict if free space on the node filesystem drops below 10%
  nodefs.inodesFree: "5%"     # Evict if free inodes drop below 5%
evictionSoft:
  memory.available: "500Mi"   # Soft threshold: evict only if it stays breached for the grace period
evictionSoftGracePeriod:
  memory.available: "1m30s"   # The soft threshold must be breached for 90 seconds before eviction
# Apply changes with: sudo systemctl restart kubelet
```

Kubelet Probes: How Pods Get Health-Checked
The kubelet is responsible for running three types of probes against containers:
```text
+---------------------------+ +---------------------------+ +---------------------------+
| Liveness Probe            | | Readiness Probe           | | Startup Probe             |
|                           | |                           | |                           |
| Is the app still alive?   | | Is the app ready to recv  | | Did the app start OK?     |
| Fail -> kubelet restarts  | | traffic? Fail -> removed  | | Holds off liveness and    |
| the container             | | from Service endpoints    | | readiness until it passes |
+---------------------------+ +---------------------------+ +---------------------------+
```

```yaml
# Example: all three probes on a Razorpay payment API container
containers:
  - name: payment-api
    image: registry.razorpay.in/payment-api:v2.4.1
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30    # Allow up to 30 x 10s = 5 minutes to start
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 15
      failureThreshold: 3     # Restart container after 3 consecutive failures
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 10
      failureThreshold: 2     # Remove from Service endpoints after 2 consecutive failures
```

Kubelet Troubleshooting Reference
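A frequent cause of restart loops is a startup or liveness probe whose failure budget is shorter than the app really needs. Before working through the symptom table, it can help to compute those budgets from the probe config. The sketch below is plain arithmetic, not a Kubernetes API: Probe, failure_window and startup_budget are hypothetical names, the defaults mirror the Kubernetes probe defaults (periodSeconds 10, failureThreshold 3, initialDelaySeconds 0), and timeoutSeconds is ignored, so the results are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """A few Kubernetes probe fields; the defaults match the Kubernetes defaults."""
    period_seconds: int = 10
    failure_threshold: int = 3
    initial_delay_seconds: int = 0


def failure_window(p: Probe) -> int:
    """Roughly how long a probe can keep failing before the kubelet acts.

    Ignores timeoutSeconds and scheduling jitter, so read it as an
    approximate upper bound, not an exact deadline.
    """
    return p.failure_threshold * p.period_seconds


def startup_budget(p: Probe) -> int:
    """Roughly the longest a container may take to pass its startup probe."""
    return p.initial_delay_seconds + failure_window(p)


# Values copied from the payment-api probe example.
startup = Probe(period_seconds=10, failure_threshold=30)
liveness = Probe(period_seconds=15, failure_threshold=3, initial_delay_seconds=5)
readiness = Probe(period_seconds=10, failure_threshold=2)

print(f"startup budget: {startup_budget(startup)}s")  # startup budget: 300s
print(f"liveness restart after ~{failure_window(liveness)}s")  # ~45s
print(f"readiness removal after ~{failure_window(readiness)}s")  # ~20s
```

If the startup budget is shorter than the app's real cold-start time, the kubelet kills the container before it ever becomes healthy, which shows up as repeated restarts.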
| Symptom | Likely Cause | Fix |
|---|---|---|
| Node shows NotReady | Kubelet crashed or lost API connectivity | systemctl restart kubelet on the node |
| DiskPressure on node | /var/lib/kubelet, /var/lib/containerd or /var/log full | Remove unused images: crictl rmi --prune |
| MemoryPressure on node | Too many pods, no resource limits set | Add LimitRange to namespaces |
| Pods evicted unexpectedly | Kubelet hit eviction threshold | Check kubectl describe node for pressure conditions |
| Container runtime is down | containerd service crashed | systemctl restart containerd, then systemctl restart kubelet |
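When pods are evicted unexpectedly, it helps to check which signal actually crossed its threshold. Percentage thresholds are relative to capacity, so the same "10%" means different amounts on different nodes. The sketch below does that comparison in plain Python; the function names and node numbers are illustrative assumptions, it only understands binary suffixes such as Mi and Gi, and it takes the available amounts as given. (The Kubernetes docs describe the kubelet's own memory.available as capacity minus the node's working set, derived from cgroupfs, which is why it can differ from what free -m reports.)

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(q: str) -> float:
    """Parse a binary-suffixed quantity such as "200Mi" into bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)  # no suffix: already an absolute amount


def threshold_amount(spec: str, capacity: float) -> float:
    """Turn "10%" (of capacity) or "200Mi" into an absolute amount."""
    if spec.endswith("%"):
        return capacity * float(spec[:-1]) / 100
    return parse_quantity(spec)


def crossed_signals(
    thresholds: dict[str, str],
    available: dict[str, float],
    capacity: dict[str, float],
) -> list[str]:
    """Return the eviction signals whose available amount is below threshold."""
    return sorted(
        signal
        for signal, spec in thresholds.items()
        if available[signal] < threshold_amount(spec, capacity[signal])
    )


# evictionHard values from the config example; node stats are illustrative.
GiB, MiB = 2**30, 2**20
hard = {"memory.available": "200Mi", "nodefs.available": "10%", "nodefs.inodesFree": "5%"}
capacity = {"memory.available": 16 * GiB, "nodefs.available": 100 * GiB, "nodefs.inodesFree": 6_553_600}
available = {"memory.available": 150 * MiB, "nodefs.available": 20 * GiB, "nodefs.inodesFree": 100_000}
print(crossed_signals(hard, available, capacity))
# ['memory.available', 'nodefs.inodesFree']
```

Here 20Gi free out of 100Gi is above the 10% line, but 100,000 free inodes is below 5% of 6,553,600 (327,680), so a node can hit DiskPressure from inode exhaustion even with plenty of free space.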
Remember: If a node shows NotReady, the kubelet on that node has usually crashed, lost connectivity to the API server, or is reporting a problem such as a failed container runtime. SSH into the node and run sudo systemctl status kubelet immediately; do not wait for the node to self-recover.
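The node's Ready condition tells these cases apart: Unknown means the node controller has stopped hearing from the kubelet, while False means the kubelet is alive and reporting that the node is unhealthy. A small triage helper can read the conditions from kubectl get node NAME -o json. The diagnose function and its suggested next steps are a hypothetical sketch, not an official mapping, and the sample JSON is trimmed to the fields it reads.

```python
import json


def diagnose(node_json: str) -> str:
    """Suggest a first step from the output of `kubectl get node NAME -o json`."""
    node = json.loads(node_json)
    status = {c["type"]: c["status"] for c in node["status"]["conditions"]}
    pressure = sorted(t for t, s in status.items() if t.endswith("Pressure") and s == "True")
    hint = f" (pressure: {', '.join(pressure)})" if pressure else ""
    ready = status.get("Ready", "Unknown")
    if ready == "Unknown":
        # The node controller sets Unknown when the kubelet stops posting status.
        return "kubelet not reporting: check the kubelet service and API connectivity" + hint
    if ready == "False":
        # The kubelet is still reporting, but says the node is unhealthy.
        return "kubelet reports NotReady: check the container runtime and kubelet logs" + hint
    return "Ready" + hint


# Conditions matching the kubectl describe example; real JSON has more fields.
sample = json.dumps({"status": {"conditions": [
    {"type": "MemoryPressure", "status": "False"},
    {"type": "DiskPressure", "status": "True"},
    {"type": "PIDPressure", "status": "False"},
    {"type": "Ready", "status": "False"},
]}})
print(diagnose(sample))
# kubelet reports NotReady: check the container runtime and kubelet logs (pressure: DiskPressure)
```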
Security: The kubelet serves its read-write API on port 10250. Unauthorized access to port 10250 gives full control over every pod on that node, including exec into running containers. Always firewall this port and restrict access to the API server and monitoring agents only. Port 10255 (read-only, deprecated) should be disabled entirely by setting readOnlyPort: 0 in the kubelet config.
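Firewalling is one layer; the kubelet config itself should also refuse anonymous and unauthorized requests. The sketch below audits a KubeletConfiguration that has already been loaded into a Python dict (for example with PyYAML's yaml.safe_load, an assumed extra dependency). The field names authentication.anonymous.enabled, authentication.webhook.enabled, authorization.mode and readOnlyPort are real KubeletConfiguration fields; the audit helper and its list of hardened values are a hypothetical starting point, not an official benchmark. Because defaults differ between kubelet command-line flags and the config file, it flags anything not set explicitly.

```python
from typing import Any

# Hypothetical hardening checklist: (path in the KubeletConfiguration, expected value).
HARDENED = [
    (("authentication", "anonymous", "enabled"), False),  # no anonymous access on 10250
    (("authentication", "webhook", "enabled"), True),     # verify tokens via the API server
    (("authorization", "mode"), "Webhook"),              # not AlwaysAllow
    (("readOnlyPort",), 0),                              # 0 disables port 10255
]


def lookup(cfg: dict[str, Any], path: tuple[str, ...]) -> Any:
    """Walk nested dict keys; return None if any key is missing."""
    node: Any = cfg
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def audit(cfg: dict[str, Any]) -> list[str]:
    """List settings that are missing or differ from the hardened value."""
    problems = []
    for path, want in HARDENED:
        got = lookup(cfg, path)
        if got != want:
            problems.append(f"{'.'.join(path)}: expected {want!r}, found {got!r}")
    return problems


cfg = {
    "authentication": {"anonymous": {"enabled": True}, "webhook": {"enabled": True}},
    "authorization": {"mode": "Webhook"},
    "readOnlyPort": 10255,
}
for problem in audit(cfg):
    print(problem)
# authentication.anonymous.enabled: expected False, found True
# readOnlyPort: expected 0, found 10255
```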
Tip: On Hotstar-scale clusters running 500+ nodes, kubelet log noise can be overwhelming. During an incident, filter for warnings, errors and eviction messages with sudo journalctl -u kubelet --since "5 minutes ago" | grep -E ' [EW][0-9]{4} |[Ee]vict'. The kubelet logs in klog format, which prefixes each warning or error line with W or E plus the date (for example E0612), so a plain grep for "ERROR|WARN" misses most of them.

Common Mistake: Rebooting a node to fix a NotReady status without first draining it. Always run kubectl drain mumbai-prod-node-2 --ignore-daemonsets --delete-emptydir-data before rebooting, and kubectl uncordon mumbai-prod-node-2 once it is back; otherwise the pods on it can be killed with no graceful termination.
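The log filter from the tip can also be scripted, for example to scan a saved journal excerpt after an incident. This sketch assumes the kubelet's default klog text format, whose header starts with a severity letter (I, W, E or F) followed by the month and day; KLOG_HEADER, interesting and filter_lines are hypothetical names, the sample lines are illustrative rather than captured from a real node, and JSON-formatted kubelet logs would need a different parser.

```python
import re

# klog text header: severity letter, MMDD, then a microsecond timestamp,
# e.g. "E0612 10:00:01.000001". The letter is I, W, E or F.
KLOG_HEADER = re.compile(r"(?:^|\s)([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d{6}")


def interesting(line: str) -> bool:
    """Keep warning, error and fatal klog lines, plus anything about eviction."""
    match = KLOG_HEADER.search(line)
    if match and match.group(1) != "I":
        return True
    return "evict" in line.lower()


def filter_lines(lines: list[str]) -> list[str]:
    """Return only the lines worth reading during an incident."""
    return [line for line in lines if interesting(line)]


# Illustrative journalctl-style lines (not captured from a real node).
SAMPLE = [
    'Jun 12 10:00:00 node-2 kubelet[1234]: I0612 10:00:00.123456    1234 kubelet.go:100] "SyncLoop ADD"',
    'Jun 12 10:00:01 node-2 kubelet[1234]: E0612 10:00:01.000001    1234 kubelet.go:200] "Container runtime not ready"',
    'Jun 12 10:00:02 node-2 kubelet[1234]: I0612 10:00:02.000002    1234 eviction_manager.go:300] "Eviction manager: attempting to reclaim"',
]
for line in filter_lines(SAMPLE):
    print(line)
```

The info-level eviction line is kept on purpose: the eviction manager logs much of its activity at info level, so filtering on severity alone would hide it.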