Production Best Practices & Common Pitfalls
- Always test cross-namespace connectivity after adding any NetworkPolicy. Once a policy selects a pod for ingress, that pod denies all inbound traffic the policy does not explicitly allow. This includes traffic from other namespaces that previously connected without any restriction.
- Use Cilium with Hubble in production. Hubble's real-time policy verdicts and drop visibility are worth the migration cost; without it, debugging NetworkPolicies is largely guesswork.
- Select namespaces in NetworkPolicies by the `kubernetes.io/metadata.name` label. Kubernetes 1.21+ applies it automatically to every namespace, with the namespace's own name as the value, which makes it the reliable key for namespace-based selectors. On older clusters, add it explicitly.
- Monitor CoreDNS with Prometheus and alert on the rate of `coredns_dns_responses_total{rcode="SERVFAIL"}`. Before CoreDNS 1.7.0 this metric was named `coredns_dns_response_rcode_count_total`. A spike usually indicates an upstream DNS failure, which cascades into service discovery failures across the entire cluster.
- Never run tcpdump directly on a production node without approval. On a financial services cluster, packet capture is a compliance event that must be logged and justified.
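The namespace-selector and re-testing practices can be combined in a minimal bash sketch. The first function prints a NetworkPolicy that admits ingress into one namespace only from a named source namespace, matched through the auto-applied `kubernetes.io/metadata.name` label. The second probes a URL from a throwaway netshoot pod so you can re-test connectivity after applying the policy. Several details are hypothetical choices for illustration:

- the function names
- the `allow-from-<ns>` policy name
- the `np-probe-` pod prefix
- the 5-second timeout

The probe relies only on the exit status that `kubectl run` reports when the pod's command fails.

```shell
#!/usr/bin/env bash
# Sketch: helpers for namespace-scoped NetworkPolicies (hypothetical names).

# Print a NetworkPolicy that admits ingress to every pod in TARGET_NS only
# from pods in SOURCE_NS. The namespaceSelector matches the
# kubernetes.io/metadata.name label, which Kubernetes 1.21+ sets to each
# namespace's own name.
render_allow_from_ns() {
  local target_ns="$1" source_ns="$2"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-${source_ns}
  namespace: ${target_ns}
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ${source_ns}
EOF
}

# Probe a URL from a throwaway netshoot pod in FROM_NS. Prints REACHABLE if
# curl inside the pod succeeds within 5 seconds, otherwise UNREACHABLE.
# With -i --rm --restart=Never, kubectl run reports an error when the pod
# ends in phase Failed, so only the exit status is inspected.
probe_service() {
  local from_ns="$1" url="$2"
  if kubectl run "np-probe-$$" -n "$from_ns" --image=nicolaka/netshoot \
      --restart=Never --rm -i --command -- \
      curl -sS -o /dev/null --max-time 5 "$url" >/dev/null 2>&1; then
    echo REACHABLE
  else
    echo UNREACHABLE
  fi
}

# Example:
#   render_allow_from_ns production frontend | kubectl apply -f -
#   probe_service frontend   http://web.production.svc.cluster.local:80/
#   probe_service monitoring http://web.production.svc.cluster.local:80/
```

Because `podSelector: {}` selects every pod in `production`, the probe from `monitoring` is now denied. So is traffic between pods inside `production` itself, unless another policy allows it. This is exactly the regression the first bullet warns about.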
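One way to express the CoreDNS alert is as a bash function that prints a Prometheus rule file. This is a minimal sketch assuming CoreDNS 1.7.0+ metric names. It alerts on the SERVFAIL share of all responses rather than on the raw counter, because a counter only ever grows.

The following values are illustrative assumptions, not tuned recommendations:

- the 5% default threshold
- the 10-minute `for` window
- the group and alert names
- the `severity` label

```shell
#!/usr/bin/env bash
# Print a Prometheus alerting rule that fires when SERVFAIL responses exceed
# THRESHOLD (a fraction, default 0.05) of all CoreDNS responses for 10 minutes.
# Names, threshold, and window are illustrative assumptions.
render_servfail_rule() {
  local threshold="${1:-0.05}"
  cat <<EOF
groups:
  - name: coredns
    rules:
      - alert: CoreDNSServfailRatioHigh
        expr: |
          sum(rate(coredns_dns_responses_total{rcode="SERVFAIL"}[5m]))
            / sum(rate(coredns_dns_responses_total[5m])) > ${threshold}
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "CoreDNS SERVFAIL ratio above ${threshold}"
EOF
}

# Example: write the rule file and validate it before loading it.
#   render_servfail_rule 0.05 > coredns-rules.yml
#   promtool check rules coredns-rules.yml
```

A ratio keeps the alert meaningful as cluster query volume grows. On clusters still running CoreDNS older than 1.7.0, substitute the older metric name.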
🔴 Common Mistake: Checking pod logs to diagnose networking issues. Application logs say "connection refused" or "timeout", but they cannot tell you where the failure lies. It could be in pod-to-pod routing (the CNI plugin), Service routing (kube-proxy), name resolution (DNS), or a NetworkPolicy drop. Always use network-level tools such as netshoot, not application logs, for network debugging.
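The layered approach can be made concrete with a minimal bash sketch meant to run inside a netshoot pod. It checks three things in order:

1. Name resolution
2. A backend pod IP directly
3. The Service's ClusterIP

It then names the component that the first failing step points at. Some parts are assumptions for illustration:

- the `triage` function name
- its three arguments (values you would take from `kubectl get svc` and `kubectl get pods -o wide`)
- the output labels
- the assumption that the backend speaks HTTP

A failing pod-IP check alone cannot separate a CNI fault from a NetworkPolicy drop. Hubble or `kubectl get networkpolicies` narrows that down.

```shell
#!/usr/bin/env bash
# Layered connectivity triage; run inside a netshoot pod, e.g.
#   kubectl run netshoot --image=nicolaka/netshoot -it --rm -n <ns> -- bash
# Usage: triage <service-fqdn> <clusterip:port> <pod-ip:port>
# Assumes the backend answers plain HTTP on the given ports.
triage() {
  local fqdn="$1" svc_addr="$2" pod_addr="$3" answer

  # 1. Name resolution. dig exits non-zero when no server replies and prints
  #    nothing for NXDOMAIN, so require both success and a non-empty answer.
  if ! answer="$(dig +short "$fqdn" 2>/dev/null)" || [[ -z "$answer" ]]; then
    echo "DNS: $fqdn did not resolve; check CoreDNS pods and logs"
    return 1
  fi

  # 2. Pod IP directly. This bypasses DNS and kube-proxy entirely.
  if ! curl -sS -o /dev/null --max-time 5 "http://$pod_addr/" 2>/dev/null; then
    echo "POD-NETWORK: $pod_addr unreachable; suspect a NetworkPolicy drop or CNI routing"
    return 1
  fi

  # 3. ClusterIP. This exercises kube-proxy's Service translation.
  if ! curl -sS -o /dev/null --max-time 5 "http://$svc_addr/" 2>/dev/null; then
    echo "SERVICE: pod reachable but $svc_addr is not; check endpoints, ports, and kube-proxy"
    return 1
  fi

  echo "OK: DNS, pod network, and Service routing all respond"
}
```

Checking the pod IP before the ClusterIP is deliberate. If the pod answers directly but the ClusterIP does not, the fault is isolated to Service routing rather than the pod network.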
Quick Reference & Troubleshooting Commands
| Command | Purpose |
|---|---|
| `kubectl run netshoot --image=nicolaka/netshoot -it --rm -n <ns> -- bash` | Launch a throwaway network debug pod |
| `kubectl debug -it <pod> -n <ns> --image=nicolaka/netshoot --target=<container> -- bash` | Attach an ephemeral debug container that shares the pod's network namespace |
| `kubectl get endpoints <service> -n <ns>` | Verify pods are registered as Service backends |
| `kubectl get networkpolicies -n <ns>` | List all NetworkPolicies in a namespace |
| `kubectl get pods -n kube-system -l k8s-app=kube-dns` | Check CoreDNS pod health |
| `kubectl logs -n kube-system -l k8s-app=kube-dns` | CoreDNS logs for DNS failure diagnosis |
| `nslookup <service>.<ns>.svc.cluster.local` | Test DNS resolution from inside a pod |
| `curl -v http://<clusterip>:<port>/path` | Test the Service IP directly, bypassing DNS |
| `kubectl get pods -n <ns> -o wide` | Show pod IPs and the node each pod runs on |
| `kubectl logs -n kube-system kube-proxy-<id>` | kube-proxy logs for Service routing issues |
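Several rows of the table can be chained into a first-pass health report for one Service. The bash sketch below runs the endpoints, NetworkPolicy, and CoreDNS checks and summarizes each. The `svc_check` name and output format are hypothetical. The sketch assumes CoreDNS pods carry the `k8s-app=kube-dns` label used in the logs row. It reports only ready endpoint addresses and ignores not-ready ones.

```shell
#!/usr/bin/env bash
# First-pass check for one Service, built from the table's kubectl commands.
# Usage: svc_check <namespace> <service>
svc_check() {
  local ns="$1" svc="$2" ips policies dns_ready

  # Ready backend IPs registered for the Service (empty means no backends).
  ips="$(kubectl get endpoints "$svc" -n "$ns" \
          -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [[ -z "$ips" ]]; then
    echo "ENDPOINTS: none ready for $svc; check the Service selector and pod readiness"
  else
    echo "ENDPOINTS: $ips"
  fi

  # Any NetworkPolicy in the namespace may restrict inbound traffic.
  policies="$(kubectl get networkpolicies -n "$ns" -o name)"
  echo "POLICIES: ${policies:-none}"

  # Readiness flag of every CoreDNS container, e.g. "true true".
  dns_ready="$(kubectl get pods -n kube-system -l k8s-app=kube-dns \
               -o jsonpath='{.items[*].status.containerStatuses[*].ready}')"
  if [[ -z "$dns_ready" || "$dns_ready" == *false* ]]; then
    echo "COREDNS: not all pods ready (${dns_ready:-no pods found})"
  else
    echo "COREDNS: ready"
  fi
}
```

A report that shows ready endpoints, no surprising policies, and healthy CoreDNS points the investigation toward the network path itself. That is where netshoot and Hubble take over.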