HPA - Automatic Scaling Based on Real Traffic
What is HPA in Simple Terms?
HPA (Horizontal Pod Autoscaler) is your automatic traffic manager. During Swiggy's dinner rush (7pm–9pm), order volume spikes 10x. HPA detects that average CPU utilization has climbed to 80%, above its configured target, and adds pods to handle the load. It then scales back down at midnight when traffic drops, saving infrastructure costs without any human intervention.
How HPA Decides to Scale
    +--------------------------------------------------+
    | HPA controller polls Metrics Server every 15s    |  <- continuous monitoring
    +--------------------------------------------------+
                             |
                             v
    +--------------------------------------------------+
    | Current avg CPU across pods: 85%                 |
    | Target avg CPU configured:   70%                 |
    | Pods needed = ceil(current / target * replicas)  |
    +--------------------------------------------------+
                             |
                +------------+------------+
                |                         |
                v                         v
    +----------------------+  +----------------------+
    | CPU > target         |  | CPU < target         |
    | -> Scale UP          |  | -> Scale DOWN        |
    | Add pods (up to max) |  | Remove pods (to min) |
    +----------------------+  +----------------------+

The scale formula Kubernetes uses:

    desiredReplicas = ceil(currentReplicas * (currentMetric / targetMetric))
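As a worked example of the formula, here is a minimal shell sketch. It is not the controller's actual implementation: the function name `desired_replicas` and its argument order are hypothetical, chosen only for illustration. It applies the ceiling formula with integer arithmetic, then clamps the result to the minimum and maximum replica counts. It deliberately ignores details the real controller also handles, such as its tolerance band around the target and pods that are not yet ready.

```shell
# Hypothetical helper (not part of kubectl or Kubernetes).
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
desired_replicas() {
  # Integer ceil(a / b) computed as (a + b - 1) / b
  result=$(( ($1 * $2 + $3 - 1) / $3 ))
  # Clamp to the minReplicas / maxReplicas bounds
  if [ "$result" -lt "$4" ]; then result=$4; fi
  if [ "$result" -gt "$5" ]; then result=$5; fi
  echo "$result"
}

desired_replicas 8 85 70 3 20     # ceil(8 * 85 / 70) = ceil(9.71) -> prints 10
desired_replicas 8 35 70 3 20     # ceil(8 * 35 / 70) = 4          -> prints 4
desired_replicas 3 20 70 3 20     # formula gives 1, raised to min -> prints 3
desired_replicas 15 140 70 3 20   # formula gives 30, capped at max -> prints 20
```

With 8 pods averaging 85% CPU against a 70% target, the formula asks for 10 pods, so HPA would add two.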
Example HPA Manifest
    # hpa.yaml: autoscale the order-service based on CPU and memory
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: order-service-hpa
      namespace: production
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: order-service
      minReplicas: 3    # Always keep at least 3 pods; never scale to zero
      maxReplicas: 20   # Hard ceiling; prevents runaway scaling
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # Scale when avg CPU across all pods hits 70%
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80   # Also scale if memory pressure builds up

Deployment Must Have Resource Requests Defined
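HPA expresses utilization as a percentage of what each container requested, so it cannot calculate anything when requests are missing. Setting resources.requests on the target Deployment's containers is therefore a mandatory prerequisite.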
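To make the percentage concrete, here is a simplified per-pod sketch in shell. The function name `utilization_pct` is hypothetical, and the real controller aggregates usage and requests across all pods rather than working one pod at a time. The point it illustrates is that utilization is usage divided by the request (not the limit), and that without a request there is no baseline to divide by.

```shell
# Hypothetical helper (not part of kubectl or Kubernetes).
# Usage: utilization_pct USAGE_MILLICORES REQUEST_MILLICORES
utilization_pct() {
  if [ "$2" -le 0 ]; then
    # No request set: there is no baseline for a percentage
    echo "unknown"
    return 0
  fi
  echo $(( $1 * 100 / $2 ))
}

utilization_pct 175 250   # 175m used of a 250m request -> prints 70
utilization_pct 300 250   # usage above the request     -> prints 120
utilization_pct 175 0     # no request defined          -> prints unknown
```

Because the baseline is the request, utilization can exceed 100% whenever the CPU limit is higher than the request.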
    # deployment.yaml (fragment): resources.requests MUST be set for HPA to work
    spec:
      template:
        spec:
          containers:
            - name: order-service
              image: registry.swiggy.in/order-service:v3.1.0
              resources:
                requests:
                  cpu: "250m"       # HPA uses this as the baseline for % calculation
                  memory: "256Mi"
                limits:
                  cpu: "1000m"
                  memory: "512Mi"

Checking HPA Status
    # See current replica count, targets, and scaling activity
    kubectl get hpa -n production

    # Output:
    # NAME                REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS
    # order-service-hpa   Deployment/order-service   68%/70%   3         20        8

    # Detailed status including last scale event
    kubectl describe hpa order-service-hpa -n production

    # Watch HPA react to live traffic in real time
    kubectl get hpa order-service-hpa -n production -w

Scaling Behavior: Controlling Scale-Up and Scale-Down Speed
By default, HPA is aggressive on scale-up (no stabilization window) and conservative on scale-down (a 300-second stabilization window). You can tune both directions with a behavior block:
    # Add behavior block to control scaling velocity
    spec:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0     # Scale up immediately when needed
          policies:
            - type: Pods
              value: 4                      # Add at most 4 pods per scaling event
              periodSeconds: 60
        scaleDown:
          stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down
          policies:
            - type: Percent
              value: 10                     # Remove at most 10% of pods per minute
              periodSeconds: 60

The stabilizationWindowSeconds on scale-down prevents flapping: HPA won't remove pods immediately after a traffic spike drops, giving headroom for the next wave.
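The scale-up policy also puts a floor on how long a large scale-out takes. The following shell sketch is a back-of-the-envelope estimate under simplifying assumptions: the function name `periods_to_scale_up` is hypothetical, and it assumes HPA wants the full target immediately and adds the maximum allowed pods in every period. It ignores pod startup time and the controller's sync loop.

```shell
# Hypothetical helper (not part of kubectl or Kubernetes).
# Usage: periods_to_scale_up FROM_REPLICAS TO_REPLICAS PODS_PER_PERIOD
periods_to_scale_up() {
  if [ "$3" -le 0 ]; then
    echo "pods per period must be positive" >&2
    return 1
  fi
  current=$1
  periods=0
  # Add up to PODS_PER_PERIOD pods each period until the target is reached
  while [ "$current" -lt "$2" ]; do
    current=$(( current + $3 ))
    periods=$(( periods + 1 ))
  done
  echo "$periods"
}

periods_to_scale_up 3 20 4   # 3 -> 7 -> 11 -> 15 -> 19 -> 20, prints 5
periods_to_scale_up 8 10 4   # fits in a single period, prints 1
```

With a 60-second period and at most 4 pods per period, growing from 3 to 20 pods takes roughly five minutes at best. If a traffic spike builds faster than that, a larger `value` or a shorter `periodSeconds` may be worth considering.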
Troubleshooting HPA
| Symptom | Likely Cause | Fix |
|---|---|---|
| HPA shows `<unknown>/70%` | Metrics Server not installed | Install metrics-server in kube-system |
| HPA not scaling up | resources.requests not set on deployment | Add CPU/memory requests to pod spec |
| HPA stuck at minReplicas | Current CPU below threshold | Check actual usage with kubectl top pods |
| HPA scaling too aggressively | Stabilization window missing or too short | Set or raise behavior.scaleDown.stabilizationWindowSeconds |
| Replicas hit maxReplicas and stop | Max ceiling reached | Raise maxReplicas or investigate pod performance |
    # Verify Metrics Server is working (prerequisite for HPA)
    kubectl top pods -n production

    # Check why HPA is not scaling; the Events section is key
    kubectl describe hpa order-service-hpa -n production | grep -A 20 Events

    # Manually simulate load to test HPA behavior
    kubectl run load-gen --image=busybox -it --rm -- \
      /bin/sh -c "while true; do wget -q -O- http://order-service.production; done"

📌 Remember: HPA requires the Metrics Server to be installed in the cluster. Without it, HPA cannot read CPU or memory metrics and will report `<unknown>` targets, staying completely inactive. Run `kubectl top pods` to verify Metrics Server is working before creating any HPA.
🔴 Common Mistake: Setting `minReplicas: 1` in production. If that single pod is replaced (for example after a crash or a node drain), your service has zero availability for the seconds it takes to start a new pod. Always set `minReplicas: 3` or higher for any production workload at Razorpay or PhonePe scale.
💡 Tip: Keep `scaleDown.stabilizationWindowSeconds` at 300 (the Kubernetes default) or higher in the HPA behavior block. With a short or zero window, HPA removes pods soon after a traffic spike drops, only to add them back 2 minutes later when the next spike arrives. The stabilization window prevents this flapping and keeps your pod count stable during volatile traffic patterns like Hotstar's live streaming events.

⚠️ Security: HPA with `maxReplicas` set too high can become a cost explosion vector. A traffic spike, or a DDoS, can trigger HPA to spin up hundreds of pods, consuming all cluster node capacity and triggering expensive cloud autoscaling. Always set a sensible `maxReplicas` ceiling and configure cluster-level resource quotas per namespace to cap total pod resource consumption.
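One way to judge whether a `maxReplicas` ceiling is sensible is to compute the footprint at that ceiling before deploying. The shell sketch below does that arithmetic; the function name `worst_case` is hypothetical, and the inputs are the example values used earlier (20 replicas, requests of 250m/256Mi, limits of 1000m/512Mi).

```shell
# Hypothetical helper (not part of kubectl or Kubernetes).
# Usage: worst_case MAX_REPLICAS PER_POD_CPU_MILLICORES PER_POD_MEMORY_MI
worst_case() {
  echo "cpu=$(( $1 * $2 ))m memory=$(( $1 * $3 ))Mi"
}

worst_case 20 250 256     # total requests at the ceiling: prints cpu=5000m memory=5120Mi
worst_case 20 1000 512    # total limits at the ceiling:   prints cpu=20000m memory=10240Mi
```

Totals like these can be compared against node capacity and used as a starting point for the namespace's resource quota, so that a single autoscaled workload cannot claim more than was budgeted for it.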