Running StatefulSets for Databases on Kubernetes
Overview and What You Will Learn
Regular Deployments treat every pod as identical and interchangeable, which is perfect for stateless APIs but catastrophic for databases where pod identity, startup order, and storage persistence are critical. StatefulSets solve this by giving each pod a stable, predictable identity, its own dedicated PersistentVolumeClaim, and strict ordered deployment and termination guarantees. This lab walks you through deploying PostgreSQL, Redis, and a multi-node database cluster on Kubernetes using StatefulSets with production-grade configuration.
By the end of this guide you will be able to:
- Understand the core differences between Deployments and StatefulSets and when to use each
- Deploy a single-instance PostgreSQL database using a StatefulSet with persistent storage
- Configure a Redis cluster using StatefulSets with stable network identities
- Set up a primary-replica PostgreSQL configuration with ordered pod startup
- Troubleshoot common StatefulSet failures including stuck termination and PVC binding issues
Why This Matters in Production
Zerodha runs PostgreSQL for trade records and MySQL for user accounts directly on Kubernetes using StatefulSets. The ordered startup guarantee means the primary database pod always initialises and becomes Ready before replica pods attempt to connect and begin replication, which removes a class of startup races that plague manually managed database clusters.
At Razorpay, Redis is deployed as a StatefulSet cluster where each node has a stable DNS name (redis-0.redis, redis-1.redis, redis-2.redis) that never changes even after pod restarts. Application code hardcodes these stable names rather than dynamic pod IPs, which is impossible with a regular Deployment.
Core Principles
StatefulSet vs Deployment: the critical differences

| Aspect | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random suffix (api-7d9f8b-xkp2q, api-7d9f8b-mn3lp) | Stable ordinal (postgres-0, postgres-1, postgres-2) |
| Pod identity | Interchangeable | Unique and stable |
| Storage | Shared or none | Each pod gets its own dedicated PVC (postgres-data-postgres-0, postgres-data-postgres-1) |
| Startup order | All pods start simultaneously | Ordered: pod-0 must be Ready before pod-1 starts |
| Termination | All pods stop simultaneously | Reverse order: pod-2 → pod-1 → pod-0 |
| DNS | Service IP only | Per-pod DNS: pod-0.service.ns.svc.cluster.local |
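As a rough illustration of where each StatefulSet property in the comparison comes from, the sketch below annotates a minimal manifest. The name demo and the busybox placeholder container are assumptions made up for this example, not part of the lab, and the headless Service it points at would have to be created separately.

```yaml
# Minimal sketch only: "demo" is a hypothetical name, busybox is a stand-in workload
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo                          # pods are named demo-0, demo-1, ...
spec:
  serviceName: demo                   # headless Service that provides per-pod DNS
  replicas: 2
  podManagementPolicy: OrderedReady   # the default: start 0..N-1, stop N-1..0
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: main
          image: busybox:1.35
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:               # one PVC per pod: data-demo-0, data-demo-1
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

PVC names follow the pattern template name, then pod name, which is why the lab's PostgreSQL claim shows up later as postgres-data-postgres-0.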
When to use StatefulSet vs Deployment:

| Use StatefulSet when | Use Deployment when |
|---|---|
| Databases (PostgreSQL, MySQL) | REST APIs |
| Message queues (Kafka, RabbitMQ) | Web servers (NGINX, Express) |
| Caches with persistence (Redis) | Background workers (stateless) |
| Search engines (Elasticsearch) | Any app with no local state |
| Any app needing stable pod DNS | Any app that is truly stateless |
Detailed Step-by-Step Practical Lab
Step 1: Create the Headless Service for Stable Pod DNS
StatefulSets require a Headless Service, that is, a Service with clusterIP: None, which creates individual DNS entries for each pod instead of a single load-balanced IP:
    # headless-service-postgres.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: postgres
      namespace: production
      labels:
        app: postgres
    spec:
      clusterIP: None          # This makes it a Headless Service
      selector:
        app: postgres
      ports:
        - name: postgres
          port: 5432
          targetPort: 5432

Apply the manifest, then add a second Service for client connections:

    kubectl apply -f headless-service-postgres.yaml

    # Once the StatefulSet pods exist, each one gets its own DNS entry:
    # postgres-0.postgres.production.svc.cluster.local → pod IP of postgres-0
    # postgres-1.postgres.production.svc.cluster.local → pod IP of postgres-1
    # postgres-2.postgres.production.svc.cluster.local → pod IP of postgres-2

    # Also create a regular Service for client connections to the primary
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: postgres-primary
      namespace: production
    spec:
      selector:
        app: postgres
        role: primary          # Only route to pods labelled as primary
      ports:
        - port: 5432
          targetPort: 5432
    EOF

Remember: The Headless Service name must match the serviceName field in your StatefulSet spec. This is what enables the stable per-pod DNS names, and a mismatch is a common StatefulSet configuration mistake.

Step 2: Deploy Single-Instance PostgreSQL StatefulSet
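The StatefulSet in this step refers to several objects that the lab never defines: the postgres-credentials Secret, a postgres-config volume, and the gp3-encrypted StorageClass. A minimal sketch of what they might look like follows. Every value is an assumption for illustration: the credentials are placeholders, the postgresql.conf settings are arbitrary examples, the ConfigMap name is a guess that mirrors the volume name, and the StorageClass assumes the AWS EBS CSI driver is installed.

```yaml
# prerequisites-postgres.yaml (hypothetical file name)
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: production
type: Opaque
stringData:
  username: app_user            # placeholder value
  password: change-me           # placeholder value; never commit real secrets
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config         # assumed name, matching the volume name
  namespace: production
data:
  postgresql.conf: |
    # Example settings only; tune for the real workload
    listen_addresses = '*'
    max_connections = 200
    shared_buffers = 2GB
    wal_level = replica
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com    # assumes the AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
allowVolumeExpansion: true
```

Note that the official postgres image only reads a custom configuration file when the server is started with -c config_file=/etc/postgresql/postgresql.conf, so mounting the ConfigMap on its own does not change any settings.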
    # statefulset-postgres.yaml: production PostgreSQL on Kubernetes
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
      namespace: production
    spec:
      serviceName: "postgres"          # Must match the Headless Service name
      replicas: 1                      # Start single; add replicas for HA
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
            role: primary              # Applied to every pod from this template
        spec:
          terminationGracePeriodSeconds: 60   # Give PostgreSQL time to flush WAL
          securityContext:
            fsGroup: 999               # postgres UID/GID in the official image; sets volume ownership
            runAsUser: 999
            runAsNonRoot: true
          initContainers:
            # Fix permissions on the data directory before PostgreSQL starts
            - name: fix-permissions
              image: busybox:1.35
              command: ["sh", "-c", "chown -R 999:999 /var/lib/postgresql/data"]
              volumeMounts:
                - name: postgres-data
                  mountPath: /var/lib/postgresql/data
              securityContext:
                runAsUser: 0           # Run as root for chown only
                runAsNonRoot: false    # Override the pod-level setting, which would reject UID 0
          containers:
            - name: postgres
              image: postgres:15.4
              ports:
                - containerPort: 5432
                  name: postgres
              env:
                - name: POSTGRES_DB
                  value: "zerodha_trading"
                - name: POSTGRES_USER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: username
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
                - name: PGDATA
                  value: "/var/lib/postgresql/data/pgdata"   # Subdirectory avoids lost+found
                - name: POSTGRES_INITDB_ARGS
                  value: "--encoding=UTF8 --auth-host=scram-sha-256"
              resources:
                requests:
                  cpu: "500m"
                  memory: "1Gi"
                limits:
                  cpu: "4"
                  memory: "8Gi"
              livenessProbe:
                exec:
                  # Run through a shell so the container's environment variables
                  # (including the Secret-backed user) are resolved at probe time
                  command:
                    - sh
                    - -c
                    - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
                initialDelaySeconds: 30
                periodSeconds: 10
                failureThreshold: 6
              readinessProbe:
                exec:
                  command:
                    - sh
                    - -c
                    - pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB"
                initialDelaySeconds: 5
                periodSeconds: 5
                failureThreshold: 3
              volumeMounts:
                - name: postgres-data
                  mountPath: /var/lib/postgresql/data
                - name: postgres-config
                  mountPath: /etc/postgresql/postgresql.conf
                  subPath: postgresql.conf
          volumes:
            # Assumption: postgresql.conf comes from a ConfigMap named postgres-config
            - name: postgres-config
              configMap:
                name: postgres-config
      volumeClaimTemplates:            # Each pod gets its own PVC automatically
        - metadata:
            name: postgres-data
            labels:
              app: postgres
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: gp3-encrypted
            resources:
              requests:
                storage: 100Gi

Apply it and watch the pod start:

    kubectl apply -f statefulset-postgres.yaml

    # Watch ordered pod startup
    kubectl get pods -n production -w
    # NAME         READY   STATUS              RESTARTS
    # postgres-0   0/1     ContainerCreating   0    ← starts first
    # postgres-0   0/1     Running             0
    # postgres-0   1/1     Running             0    ← must be Ready before replicas start

    # Verify PVC was automatically created
    kubectl get pvc -n production
    # NAME                       STATUS   VOLUME             CAPACITY
    # postgres-data-postgres-0   Bound    pvc-a1b2c3d4-...   100Gi

Step 3: Deploy Redis as a StatefulSet Cluster
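The Redis StatefulSet in this step sets serviceName to redis, so a headless Service with that name has to exist for the per-pod DNS names to resolve. The lab only shows the PostgreSQL one; a sketch of the Redis equivalent, modelled directly on Step 1, might look like this. The file name is an assumption.

```yaml
# headless-service-redis.yaml (hypothetical file name)
apiVersion: v1
kind: Service
metadata:
  name: redis                  # must equal serviceName in the StatefulSet
  namespace: production
  labels:
    app: redis
spec:
  clusterIP: None              # headless: one DNS record per pod
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
```

Applying it before the StatefulSet means redis-0.redis.production.svc.cluster.local resolves as soon as the first pod is Ready.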
    # statefulset-redis.yaml: Redis nodes with stable pod identities
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: redis-config
      namespace: production
    data:
      redis.conf: |
        maxmemory 2gb
        maxmemory-policy allkeys-lru
        appendonly yes
        appendfsync everysec
        save 900 1
        save 300 10
        save 60 10000
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: redis
      namespace: production
    spec:
      serviceName: "redis"             # Requires a Headless Service named redis
      replicas: 3                      # 3 nodes; independent until replication or cluster mode is configured
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          terminationGracePeriodSeconds: 30
          containers:
            - name: redis
              image: redis:7.2
              command: ["redis-server", "/etc/redis/redis.conf"]
              ports:
                - containerPort: 6379
                  name: redis
              resources:
                requests:
                  cpu: "250m"
                  memory: "512Mi"
                limits:
                  cpu: "1"
                  memory: "2Gi"
              livenessProbe:
                exec:
                  command: ["redis-cli", "ping"]
                initialDelaySeconds: 15
                periodSeconds: 10
              readinessProbe:
                exec:
                  command: ["redis-cli", "ping"]
                initialDelaySeconds: 5
                periodSeconds: 5
              volumeMounts:
                - name: redis-data
                  mountPath: /data
                - name: redis-config
                  mountPath: /etc/redis
          volumes:
            - name: redis-config
              configMap:
                name: redis-config
      volumeClaimTemplates:
        - metadata:
            name: redis-data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: gp3-encrypted
            resources:
              requests:
                storage: 20Gi

Apply it and watch the pods start:

    kubectl apply -f statefulset-redis.yaml

    # Watch all 3 Redis pods start in strict order
    kubectl get pods -n production -w
    # redis-0   1/1   Running   0    ← starts and becomes Ready first
    # redis-1   1/1   Running   0    ← starts only after redis-0 is Ready
    # redis-2   1/1   Running   0    ← starts only after redis-1 is Ready

    # Connect to Redis and verify the node responds
    kubectl exec -it redis-0 -n production -- redis-cli ping
    # PONG

    # Each pod has a stable DNS name; the application connects using these
    # redis-0.redis.production.svc.cluster.local:6379
    # redis-1.redis.production.svc.cluster.local:6379
    # redis-2.redis.production.svc.cluster.local:6379

Step 4: Perform a Rolling Update on a StatefulSet
    # Update PostgreSQL image version
    kubectl set image statefulset/postgres \
      postgres=postgres:15.5 \
      -n production

    # Watch the ordered rolling update: highest ordinal first (pod-2 first, pod-0 last)
    kubectl rollout status statefulset/postgres -n production
    # Waiting for 1 pods to be ready...
    # statefulset rolling update complete 1 pods at revision postgres-6d8f9b...

    # Check rollout history
    kubectl rollout history statefulset/postgres -n production

    # Rollback if needed
    kubectl rollout undo statefulset/postgres -n production

Tip: StatefulSet rolling updates go in reverse ordinal order: pod-2 is updated first, then pod-1, then pod-0. For primary-replica databases where pod-0 is the primary, this means replicas are updated before the primary, which is the safe order. Always verify that replication lag is zero before the rollout moves on to the next pod.
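One way to get a pause between pods, so that replication lag can be checked before the rollout continues, is the partition field of the RollingUpdate strategy. The fragment below is a sketch that assumes the postgres StatefulSet has been scaled to 3 replicas; it shows only the fields to merge into the existing spec and is not a complete manifest.

```yaml
# Fragment to merge into the existing postgres StatefulSet (assumes replicas: 3)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2    # only pods with ordinal >= 2 receive the new template
```

With partition set to 2, an image change updates postgres-2 and leaves postgres-1 and postgres-0 on the old revision. Lowering the value to 1 and then 0 releases the remaining pods one at a time.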
Step 5: Scale a StatefulSet Up and Down Safely
    # Scale up: new pods start in order (pod-1 after pod-0 is Ready)
    kubectl scale statefulset postgres -n production --replicas=3

    # Watch ordered scale-up
    kubectl get pods -n production -w
    # postgres-0   1/1   Running   0
    # postgres-1   0/1   Pending   0    ← starts after postgres-0 is Ready
    # postgres-1   1/1   Running   0
    # postgres-2   0/1   Pending   0    ← starts after postgres-1 is Ready
    # postgres-2   1/1   Running   0

    # Scale down: pods terminate in reverse order (pod-2 first)
    kubectl scale statefulset postgres -n production --replicas=1

    # CRITICAL: Scaling down does NOT delete PVCs
    # PVCs for postgres-1 and postgres-2 still exist after scale-down
    kubectl get pvc -n production | grep postgres
    # postgres-data-postgres-0   Bound   100Gi   ← active
    # postgres-data-postgres-1   Bound   100Gi   ← orphaned; delete manually if not needed
    # postgres-data-postgres-2   Bound   100Gi   ← orphaned; delete manually if not needed

Warning: Never delete orphaned PVCs automatically. Kubernetes intentionally keeps them to prevent accidental data loss. Review and manually delete them only after confirming the data is either replicated elsewhere or no longer needed.
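The retention behaviour described in the warning can be made explicit through the StatefulSet's persistentVolumeClaimRetentionPolicy field. The fragment below is a sketch that spells out the default; it assumes a Kubernetes version where this field is available (older releases do not have it) and is meant to be merged into the existing postgres spec rather than applied on its own.

```yaml
# Fragment to merge into the existing postgres StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Retain     # keep PVCs when replicas is reduced (the default)
    whenDeleted: Retain    # keep PVCs when the StatefulSet is deleted (the default)
```

Either field also accepts Delete, which removes the claims automatically. For a database that trades safety for convenience, so stating Retain explicitly documents the intent for whoever edits the manifest next.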
Step 6: Troubleshoot Common StatefulSet Failures
    # Problem 1: Pod stuck in Terminating state
    kubectl get pods -n production
    # postgres-0   1/1   Terminating   0   48m   ← stuck

    # Cause: The pod has a finalizer or the node is unresponsive
    # Check for finalizers
    kubectl get pod postgres-0 -n production -o jsonpath='{.metadata.finalizers}'

    # Force delete as last resort (data loss risk; only if the node is dead)
    kubectl delete pod postgres-0 -n production --force --grace-period=0

    # Problem 2: PVC stuck in Pending after scale-up
    kubectl describe pvc postgres-data-postgres-1 -n production
    # Events: ProvisioningFailed: no nodes available in zone ap-south-1a
    # Cause: with WaitForFirstConsumer the volume is provisioned only after the pod
    #        is scheduled, so an unschedulable pod leaves the PVC Pending
    # Fix: find out why the pod is not scheduling, then recheck the PVC status
    kubectl describe pod postgres-1 -n production

    # Problem 3: postgres-1 never starts because postgres-0 is not Ready
    kubectl get pods -n production
    # postgres-0   0/1   Running   0   ← not Ready yet (probe failing)
    # postgres-1 is not listed: with ordered startup the controller does not
    # create it until postgres-0 is Running and Ready

    # Check why postgres-0 is not passing its readiness probe
    kubectl describe pod postgres-0 -n production
    kubectl logs postgres-0 -n production

Production Best Practices & Common Pitfalls
- Always set `terminationGracePeriodSeconds` to at least 60 for databases. The default of 30 seconds can be too short for PostgreSQL to complete a shutdown checkpoint and flush WAL; if the container is killed first, the next start has to go through crash recovery.
- Use `podManagementPolicy: Parallel` only for StatefulSets where pods are truly independent, like Elasticsearch data nodes. Never use it for primary-replica databases where order matters.
- Monitor replication lag on all replica pods. If the primary fails while a replica is behind, promoting that replica loses every transaction it had not yet received.
- Back up PVCs using Velero with volume snapshots on a schedule: at minimum daily, ideally every hour for financial transaction databases.
- Use `updateStrategy: RollingUpdate` with `partition` during database upgrades. Only pods whose ordinal is at or above the partition are updated, so you can upgrade one pod at a time and pause the rollout to verify replication before continuing.
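The hourly backup recommendation could be expressed as a Velero Schedule along the following lines. This is a sketch under several assumptions: Velero is installed in the velero namespace with a volume snapshot provider configured, the schedule name postgres-hourly is made up for this example, and the seven-day retention is an arbitrary choice.

```yaml
# velero-schedule-postgres.yaml (hypothetical file name)
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: postgres-hourly        # hypothetical name
  namespace: velero            # assumes the default Velero install namespace
spec:
  schedule: "0 * * * *"        # cron syntax: at minute 0 of every hour
  template:
    includedNamespaces:
      - production
    labelSelector:
      matchLabels:
        app: postgres          # matches the pods and PVCs from Step 2
    snapshotVolumes: true
    ttl: 168h0m0s              # keep each backup for 7 days (assumed retention)
```

A snapshot of a running database volume is taken without coordinating with PostgreSQL, so restores should be tested regularly and, for stricter guarantees, combined with the database's own backup tooling.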
Common Mistake: Deleting a StatefulSet with `kubectl delete statefulset postgres` thinking it will also clean up PVCs. It does not: the PVCs are intentionally kept. The pods are deleted, however, leaving your database inaccessible until the StatefulSet is recreated and the pods rebind to the existing PVCs. Always scale to zero first, verify, then delete.

Quick Reference & Troubleshooting Commands
| Command | Purpose |
|---|---|
| `kubectl get statefulset -n <ns>` | List all StatefulSets and replica counts |
| `kubectl describe statefulset <name> -n <ns>` | Full StatefulSet config and events |
| `kubectl get pods -n <ns> -w` | Watch ordered pod startup and termination |
| `kubectl scale statefulset <name> --replicas=<n> -n <ns>` | Scale StatefulSet up or down |
| `kubectl rollout status statefulset <name> -n <ns>` | Watch rolling update progress |
| `kubectl rollout undo statefulset <name> -n <ns>` | Rollback to previous StatefulSet revision |
| `kubectl exec -it <name>-0 -n <ns> -- bash` | Shell into the primary pod (ordinal 0) |
| `kubectl get pvc -n <ns> \| grep <statefulset-name>` | List PVCs created by a StatefulSet |
| `kubectl delete pod <name>-0 -n <ns> --force --grace-period=0` | Force delete stuck Terminating pod |
| `kubectl get pod <name>-0 -n <ns> -o jsonpath='{.metadata.finalizers}'` | Check for blocking finalizers |