ImagePullBackOff: Why Your Pod Can't Start
What Is ImagePullBackOff in Simple Terms?
Before a pod can run, Kubernetes must pull the container image onto the node. ImagePullBackOff means that pull is failing, and Kubernetes is retrying with increasing wait times (10s, 20s, 40s... up to 5 minutes) rather than hammering the registry endlessly.
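The retry spacing described above is a capped exponential backoff. The sketch below only reproduces the numbers quoted in this section (a 10-second start, doubling after each failure, a 300-second cap). It illustrates the pattern and is not kubelet source code. backoff_schedule is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the wait (in seconds) before each of the first N
# retries, assuming a 10s initial delay that doubles and is capped at 300s.
backoff_schedule() {
  local attempts="$1"
  local delay=10   # assumed initial backoff
  local max=300    # assumed cap (5 minutes)
  local out=()
  local i
  for ((i = 0; i < attempts; i++)); do
    out+=("$delay")
    delay=$((delay * 2))
    if ((delay > max)); then
      delay=$max
    fi
  done
  echo "${out[*]}"
}

backoff_schedule 7   # prints: 10 20 40 80 160 300 300
```

The cap is reached after five failures. After that, a pod that keeps failing retries only about once every five minutes. This is why, once the cause is fixed, deleting the pod (step 5 of the diagnostic sequence later in this section) is usually faster than waiting.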
```
+------------------------------------------+
| Pod scheduled on mumbai-worker-1         |  <- Step 1: Scheduler picks a node
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Kubelet tries to pull the image          |  <- Step 2: Contacts the registry
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Pull fails                               |  <- Step 3: Registry rejects or is unreachable
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Status: ErrImagePull                     |  <- Step 4: First failure label
+------------------------------------------+
                     |
                     v
+------------------------------------------+
| Kubelet retries with backoff timer       |  <- Step 5: Retry after 10s, 20s, 40s...
| Status: ImagePullBackOff                 |
+------------------------------------------+
```

📌 Remember: ErrImagePull and ImagePullBackOff are the same root problem. ErrImagePull is the status shown right after a pull fails. ImagePullBackOff is what you see while Kubernetes waits before the next retry. Always fix the underlying cause, because retries will not resolve it on their own.
The Four Root Causes (In Order of Frequency)
```
+-------------------------------------------+
| Cause 1: Wrong image name or tag          |  <- Most common. Typo in deployment YAML
+-------------------------------------------+
| Cause 2: Image does not exist in registry |  <- Tag was deleted or never pushed
+-------------------------------------------+
| Cause 3: Missing registry credentials     |  <- Private registry, no imagePullSecret
+-------------------------------------------+
| Cause 4: Registry unreachable from node   |  <- Network issue, VPC firewall, DNS
+-------------------------------------------+
```

How to Diagnose: Step by Step
```bash
# Step 1: Read the exact error message from the pod events
kubectl describe pod <pod-name> -n production

# Look for this section at the bottom of the output:
# Events:
#   Type     Reason  Message
#   ----     ------  -------
#   Warning  Failed  Failed to pull image "registry.razorpay.in/api:v2.5":
#                    rpc error: code = Unknown desc = pulling image:
#                    unauthorized: authentication required

# Step 2: Confirm which image name is in the spec
kubectl get pod <pod-name> -n production \
  -o jsonpath='{.spec.containers[*].image}'
# Output: registry.razorpay.in/api:v2.5
```

Fix 1: Wrong Image Name or Tag
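This is the most common cause, and the same fix covers Cause 2. If the tag in the spec does not exist in the registry (a typo, or a tag that was deleted or never pushed), the pull fails with a manifest unknown error.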
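Before querying a registry, it helps to split the image string into its registry host, repository and tag. Below is a simplified sketch that assumes references of the form [host/]repository[:tag]. It follows Docker's convention that the first path component is a registry host only if it contains a "." or ":" or equals localhost. It defaults the host to docker.io (adding the library/ prefix for single-name Docker Hub images) and the tag to latest. It does not handle @sha256 digests. split_image_ref is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split an image reference into registry host, repository
# and tag. Simplified: digests (@sha256:...) are not handled.
split_image_ref() {
  local ref="$1" host repo tag first rest last

  first="${ref%%/*}"   # text before the first "/"
  if [[ "$ref" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    host="$first"
    rest="${ref#*/}"
  else
    host="docker.io"   # Docker Hub is the default registry
    rest="$ref"
  fi

  # A ":" after the last "/" separates the tag; without one the tag is "latest".
  last="${rest##*/}"
  if [[ "$last" == *:* ]]; then
    tag="${last##*:}"
    repo="${rest%:*}"
  else
    tag="latest"
    repo="$rest"
  fi

  # Docker Hub "official" images live under library/.
  if [[ "$host" == docker.io && "$repo" != */* ]]; then
    repo="library/$repo"
  fi

  echo "$host $repo $tag"
}

split_image_ref registry.razorpay.in/api:v2.5.1   # registry.razorpay.in api v2.5.1
split_image_ref nginx:1.25-alpine                 # docker.io library/nginx 1.25-alpine
```

Each of the three printed fields must match what actually exists in the registry. A wrong host, repository or tag produces the same ImagePullBackOff.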
```yaml
# deployment.yaml (excerpt): in a Deployment, containers live under spec.template.spec
spec:
  template:
    spec:
      containers:
        - name: api
          # Format: <registry host>/<repository>:<tag>, all three must be exact
          # Here: registry host = registry.razorpay.in, repository = api, tag = v2.5.1
          image: registry.razorpay.in/api:v2.5.1
```

```bash
# Verify the tag exists in the registry before deploying
# For Docker Hub:
docker pull nginx:1.25-alpine

# For a private ECR registry:
aws ecr describe-images \
  --repository-name api \
  --region ap-south-1 \
  --query 'imageDetails[*].imageTags'
```

Fix 2: Private Registry Needs Authentication
When pulling from a private registry (ECR, GCR, self-hosted Harbor), the node needs credentials. These are stored as a Kubernetes Secret of type kubernetes.io/dockerconfigjson and referenced in the pod spec as imagePullSecrets.
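To see what such a Secret actually holds, here is a sketch that builds a .dockerconfigjson payload by hand. It assumes the usual layout written by kubectl create secret docker-registry: an auths object keyed by registry host, where auth is the base64 encoding of username:password. make_dockerconfigjson is a hypothetical helper. It does no JSON escaping, so inputs containing quotes or backslashes would break it. The password "token" below is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a .dockerconfigjson payload for one registry.
# Assumes values contain no double quotes or backslashes (no JSON escaping is done).
make_dockerconfigjson() {
  local server="$1" user="$2" pass="$3" auth
  # "auth" is base64("username:password"); tr strips the line wrapping base64 may add.
  auth="$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')"
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$pass" "$auth"
}

make_dockerconfigjson 905418385260.dkr.ecr.ap-south-1.amazonaws.com AWS token
# {"auths":{"905418385260.dkr.ecr.ap-south-1.amazonaws.com":{"username":"AWS","password":"token","auth":"QVdTOnRva2Vu"}}}
```

Decoding the real Secret (step 4 of the diagnostic sequence later in this section) should show the same shape. If the key under auths does not match the registry host in the image string, the kubelet will not use those credentials for that image.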
```bash
# Create the registry credential Secret in the pod's namespace
kubectl create secret docker-registry ecr-credentials \
  --docker-server=905418385260.dkr.ecr.ap-south-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region ap-south-1)" \
  -n production
```

```yaml
# deployment.yaml (excerpt): reference the Secret in the pod template spec
spec:
  template:
    spec:
      imagePullSecrets:
        - name: ecr-credentials   # Must exist in the same namespace as the pod
      containers:
        - name: api
          image: 905418385260.dkr.ecr.ap-south-1.amazonaws.com/api:v2.5.1
```

⚠️ Security: ECR authorization tokens are valid for only 12 hours. A Secret created with the command above keeps the token it was given, so 12 hours later every new pull that relies on it fails with ImagePullBackOff. For production clusters at Razorpay or Zerodha, don't rely on a static Secret. On EKS, let the kubelet authenticate to ECR through the node's IAM role (see the tip at the end of this section; self-managed clusters can use the kubelet's ECR credential provider plugin), or run an automated job that refreshes the Secret well before the token expires.

Fix 3: Registry Unreachable from Node
If the image name and credentials are correct but the pull still fails, the node most likely cannot reach the registry. This is common when a private registry sits inside a VPC and the worker nodes' subnet has no route to it, or a security group or firewall blocks port 443.
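On the node, it helps to know which layer is failing before digging further. Below is a minimal sketch that checks DNS first and then TCP. It assumes bash, getent and timeout are available on the node (true on most glibc-based distributions, but an assumption here). check_registry is a hypothetical helper, and its TCP probe uses bash's built-in /dev/tcp redirection rather than nc.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether DNS or TCP is the failing layer.
# Usage: check_registry <host> [port]   (port defaults to 443)
check_registry() {
  local host="$1" port="${2:-443}"

  # Layer 1: can the node resolve the registry hostname?
  if ! timeout 5 getent hosts "$host" >/dev/null 2>&1; then
    echo "dns: cannot resolve $host"
    return 1
  fi

  # Layer 2: can the node open a TCP connection to the registry port?
  # bash's /dev/tcp opens a socket; timeout stops it hanging on dropped packets.
  if ! timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "tcp: cannot connect to $host:$port (refused or timed out)"
    return 2
  fi

  echo "ok: $host:$port is reachable"
}

# Example: check_registry registry.razorpay.in 443
```

A dns failure points to resolver or VPC DNS settings. A tcp failure after successful resolution points to route tables, security groups, NAT or proxy configuration.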
```bash
# SSH into the affected node and test registry connectivity directly
ssh rahul@10.0.1.50

# Test TCP connectivity to the registry
nc -zv registry.razorpay.in 443
# Expected: Connection to registry.razorpay.in 443 port [tcp/https] succeeded!

# Test DNS resolution of the registry hostname
nslookup registry.razorpay.in
# If this fails, check your VPC DNS settings and /etc/resolv.conf on the node

# Attempt a manual pull from the node (crictl talks to the container runtime, so it usually needs root)
# For a private registry, pass credentials with --creds <user>:<password>
sudo crictl pull registry.razorpay.in/api:v2.5.1
# The error output here is far more detailed than what kubectl shows
```

Quick Troubleshooting Reference
| Error Message | Root Cause | Fix |
|---|---|---|
| manifest unknown | Tag does not exist in the registry | Push the correct tag or fix the tag name in the deployment |
| unauthorized: authentication required | Missing or expired credentials | Create or refresh the imagePullSecrets Secret |
| pull access denied | Image is private and the credentials are missing or wrong, or the repository does not exist | Verify the username and password in the docker-registry Secret, and the repository name |
| no such host | Registry DNS not resolving on the node | Check the node's /etc/resolv.conf and VPC DNS configuration |
| connection refused | Registry host reachable but refusing connections on port 443 | Check the registry address and port, and firewall rules that reject traffic to it |
| context deadline exceeded | Network timeout to the registry | Check VPC route tables, security groups, NAT gateway, or proxy configuration |
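The table maps naturally onto a small lookup for runbooks or alerting scripts. Below is a rough sketch that matches the same message fragments. classify_pull_error is a hypothetical helper. Exact wording varies by container runtime and registry, so anything it does not recognize is reported as unknown rather than guessed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a kubelet pull-failure message to a likely cause,
# using the message fragments from the troubleshooting table.
classify_pull_error() {
  local msg="$1"
  case "$msg" in
    *"manifest unknown"*)          echo "tag does not exist in the registry" ;;
    *unauthorized*)                echo "missing or expired credentials" ;;
    *"pull access denied"*)        echo "private image, wrong credentials, or missing repository" ;;
    *"no such host"*)              echo "registry DNS not resolving on the node" ;;
    *"connection refused"*)        echo "registry refusing connections on the port" ;;
    *"context deadline exceeded"*) echo "network timeout to the registry" ;;
    *)                             echo "unknown: read the full event message" ;;
  esac
}

classify_pull_error 'rpc error: code = Unknown desc = pulling image: unauthorized: authentication required'
# missing or expired credentials
```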
```bash
# Full diagnostic sequence: run these in order
# 1. Read the exact error
kubectl describe pod <pod-name> -n production | grep -A 10 Events

# 2. Confirm the image string in the spec
kubectl get pod <pod-name> -n production -o jsonpath='{.spec.containers[*].image}'

# 3. Check that the imagePullSecret exists in the pod's namespace
kubectl get secret ecr-credentials -n production

# 4. Decode the Secret and check that it names the right registry host
#    (the output contains the credentials, so don't paste it into tickets or chat)
kubectl get secret ecr-credentials -n production \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .

# 5. Force a fresh pull attempt by deleting the pod
kubectl delete pod <pod-name> -n production
# The Deployment's ReplicaSet recreates it immediately
```

🔴 Common Mistake: Creating the imagePullSecrets Secret in the default namespace but deploying the pod to production. Secrets are namespace-scoped, so the Secret must exist in the same namespace as the pod that references it. A Secret in default is completely invisible to pods in production.
💡 Tip: On EKS clusters, attach the AmazonEC2ContainerRegistryReadOnly IAM policy to your node group's IAM role. The kubelet then authenticates to ECR automatically using the node's IAM role credentials from instance metadata, so no imagePullSecrets are needed for repositories in the same AWS account (cross-account pulls also need a repository policy that grants the node role access). This is the cleanest solution at Hotstar or Swiggy scale, where dozens of services pull from the same ECR registry.