DaemonSet: One Agent Per Node, Always
What is a DaemonSet in Simple Terms?
A DaemonSet is a standing order to the scheduler: "This pod must always run on every node, one copy per node, no more, no less." Not one total replica. One per node. When a new node joins the cluster, the DaemonSet pod is automatically placed on it. When a node is removed, the pod is cleaned up automatically.
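As a minimal sketch of that standing order, the manifest below declares a DaemonSet with nothing but a selector and a pod template. The name `node-agent`, its label, and the image are hypothetical placeholders, not part of any real project.

```yaml
# Minimal DaemonSet sketch; the name, label, and image are hypothetical placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  # There is no "replicas" field here: the pod count follows the node count.
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent          # Must match spec.selector.matchLabels
    spec:
      containers:
        - name: agent
          image: example.com/node-agent:1.0   # Hypothetical image
```

The absence of a replica count is the point of the sketch: the number of pods is derived from the nodes, not declared by you.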
DaemonSet vs Deployment: The Key Difference

    +------------------------------------------+  +------------------------------------------+
    | Deployment                               |  | DaemonSet                                |
    |                                          |  |                                          |
    | You control replica count                |  | Cluster controls replica count           |
    | replicas: 3 (scheduler picks the nodes)  |  | 1 pod per node (on ALL nodes)            |
    |                                          |  |                                          |
    | Use for: application workloads           |  | Use for: infrastructure agents           |
    +------------------------------------------+  +------------------------------------------+

One Pod Per Node: How It Looks

    +---------------+  +---------------+  +---------------+  +---------------+
    | mumbai-node-1 |  | mumbai-node-2 |  | mumbai-node-3 |  | mumbai-node-4 |
    |               |  |               |  |               |  |               |
    |  [fluentd-0]  |  |  [fluentd-1]  |  |  [fluentd-2]  |  |  [fluentd-3]  |
    +---------------+  +---------------+  +---------------+  +---------------+
                                                                     ^
                                                                     |
           New node added to cluster -----> DaemonSet auto-schedules pod here

When to Use a DaemonSet
Use DaemonSet for infrastructure agents:
- Log collection: Fluentd, Filebeat, Promtail (must read log files from every node's disk)
- Metrics scraping: Node Exporter (must collect CPU/memory/disk from every node)
- Network plugins: Calico, Cilium CNI agents (must configure networking on every node)
- Security agents: Falco, CrowdStrike (must inspect every node's syscalls and processes)
- Storage drivers: CSI node drivers that attach and mount volumes
Do NOT use DaemonSet for:
- Application workloads (use Deployment)
- Batch processing (use Jobs or CronJobs)
- Anything where you want explicit control over replica count
A Real DaemonSet: Node Exporter for Prometheus

    # node-exporter-daemonset.yaml
    # Runs Prometheus Node Exporter on every node in mumbai-prod-cluster
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-exporter
      namespace: monitoring
    spec:
      selector:
        matchLabels:
          app: node-exporter
      template:
        metadata:
          labels:
            app: node-exporter
        spec:
          hostNetwork: true   # Uses the node's network namespace directly
          hostPID: true       # Sees all processes on the node (required for metrics)
          tolerations:
            - operator: Exists     # No key: tolerates every NoSchedule taint, so it runs on control-plane nodes too
              effect: NoSchedule
          containers:
            - name: node-exporter
              image: prom/node-exporter:v1.7.0
              ports:
                - containerPort: 9100
                  hostPort: 9100   # Binds directly to the node's port 9100
              args:
                - '--path.procfs=/host/proc'
                - '--path.sysfs=/host/sys'
              volumeMounts:
                - name: proc
                  mountPath: /host/proc
                  readOnly: true
                - name: sys
                  mountPath: /host/sys
                  readOnly: true
          volumes:
            - name: proc
              hostPath:
                path: /proc   # Mounts the node's /proc filesystem
            - name: sys
              hostPath:
                path: /sys    # Mounts the node's /sys filesystem

Targeting Specific Nodes: Not Always Every Node
You do not always want a DaemonSet on every node. A GPU monitoring agent should only run on GPU nodes:

    # nodeSelector: simple label match
    spec:
      template:
        spec:
          nodeSelector:
            accelerator: nvidia-gpu   # Only schedule on nodes labelled as GPU nodes

For more complex targeting, use nodeAffinity:

    # nodeAffinity: skip control-plane, run only on worker nodes
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node-role
                  operator: In
                  values:
                    - worker   # Excludes control-plane nodes explicitly

Tolerations: Getting onto Tainted Nodes
Control-plane nodes usually carry a default taint: node-role.kubernetes.io/control-plane:NoSchedule. Without a matching toleration, your DaemonSet will skip them. A toleration with operator: Exists and no key matches every taint (or, if effect is also set, every taint with that effect). Reserve that blanket form for critical agents that must run everywhere, like Falco or CNI plugins.
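The blanket form described above can be sketched as a single toleration entry. This is a fragment meant to sit under spec.template.spec of a DaemonSet, not a complete manifest.

```yaml
# Fragment of spec.template.spec in a DaemonSet.
tolerations:
  - operator: Exists   # No key and no effect: matches every taint, including NoExecute ones
```

Because this entry also tolerates NoExecute taints, pods using it are not evicted from nodes that become tainted later, so it is worth treating as an exception rather than a default.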
To tolerate only the control-plane taint, name its key:

    tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule   # Specific: only bypass this one taint

Key DaemonSet Commands
| Task | Command |
|---|---|
| List DaemonSet status | kubectl get ds -n monitoring |
| See which nodes have the pod | kubectl get pods -o wide -n monitoring |
| Check rollout status | kubectl rollout status ds/node-exporter -n monitoring |
| Force restart all pods | kubectl rollout restart ds/node-exporter -n monitoring |
| Describe DaemonSet events | kubectl describe ds/node-exporter -n monitoring |
| Check pod count vs node count | kubectl get ds node-exporter -n monitoring |

    # Verify a DaemonSet pod is running on every node
    kubectl get pods -n monitoring -l app=node-exporter -o wide

    # Output should show one pod per node:
    # NAME                  READY   NODE
    # node-exporter-4xk2p   1/1     mumbai-node-1
    # node-exporter-7hq9r   1/1     mumbai-node-2
    # node-exporter-m2pzn   1/1     mumbai-node-3

⚠️ Security: Setting hostNetwork: true and hostPID: true gives the container full visibility into the host's network stack and every process running on the node. Only use these flags for trusted infrastructure agents like Node Exporter or Falco, never for application workloads. In Hotstar or PhonePe production clusters, DaemonSet pods with host access should be reviewed as part of every security audit.
Remember: DaemonSet pod count equals the number of eligible nodes. If your cluster has 12 nodes and your DaemonSet shows 10 pods, two nodes are not covered: either they carry a taint the pod does not tolerate, they are excluded by a nodeSelector or affinity rule, or the pods are failing on those nodes. Use kubectl get pods -o wide to identify which nodes are missing coverage.

Common Mistake: Using a Deployment with replicas matching your node count as a substitute for a DaemonSet. If a node is added later, the Deployment will not automatically place a pod on it, and you are back to manual scaling. Always use a DaemonSet for anything that must run on every node.

Tip: In Zerodha or Swiggy-scale clusters, DaemonSets for log collection (Fluentd/Promtail) are often the highest-volume pods in the cluster. Set proper resources.requests and resources.limits on them. An unthrottled Fluentd pod can consume enough CPU to starve application pods on the same node during log bursts.
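As a rough sketch of the tip above, requests and limits go on each container in the DaemonSet's pod template. The container name and all four values are illustrative assumptions, not recommendations; real numbers should come from observed usage on your own nodes.

```yaml
# Fragment of a log-collector DaemonSet; the values are illustrative assumptions.
spec:
  template:
    spec:
      containers:
        - name: fluentd
          resources:
            requests:
              cpu: 100m        # Reserved on every node the pod lands on
              memory: 200Mi
            limits:
              cpu: 500m        # CPU above this is throttled
              memory: 500Mi    # Exceeding this gets the container OOM-killed
```

Since a DaemonSet runs on every node, whatever it requests is reserved once per node, so small per-pod numbers add up across the cluster.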