CKA Road Trip: Deployment Has 0 Pods — How to Actually Diagnose It

After fixing a controller manager crash, I assumed 0 pods always meant a broken controller manager. Wrong. Events: <none> is the specific signal. Here's the full diagnostic flow.


Not Always the Controller Manager

A deployment stuck at 0 pods has multiple causes. The controller manager is one of them, but not the only one. Getting the diagnosis right means reading the signals in order.


Step 1 — Check the Obvious First

k get deploy video-app -o yaml | grep -E 'replicas|paused'

Replicas set to 0:

spec:
  replicas: 0   # someone scaled it down

Not a bug. Fix: k scale deploy video-app --replicas=2

Deployment paused:

spec:
  paused: true   # deployment is paused, won't create pods

Fix: k rollout resume deploy video-app
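The grep above also matches status.replicas, so its output can be noisy. A tighter check pulls just the two spec fields that matter. A sketch (the sample YAML below uses hypothetical values — on a live cluster you'd pipe `k get deploy video-app -o yaml` into the same awk):

```shell
# Sample of what `k get deploy video-app -o yaml` might return (hypothetical values)
cat > /tmp/video-app.yaml <<'EOF'
spec:
  paused: true
  replicas: 0
status:
  replicas: 0
EOF

# Print spec-level replicas and paused only; stop matching once the status block starts
awk '/^spec:/{s=1} /^status:/{s=0} s && /^  (replicas|paused):/ {print $1, $2}' /tmp/video-app.yaml
# → paused: true
#   replicas: 0
```

If both fields look sane (replicas > 0, paused unset or false), move on to the events.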


Step 2 — Read the Events

k describe deploy video-app
# look at the Events section at the bottom

Events: <none> with replicas > 0 and not paused: Nobody is acting on the deployment. The controller manager isn't running. Go check it:

k get pods -n kube-system | grep controller-manager

Events present — scheduling failure:

FailedScheduling — 0/2 nodes are available: 2 Insufficient memory
Pod objects were created but couldn't be scheduled. Node issue, resource issue, taint/toleration mismatch.

Events present — quota exceeded:

FailedCreate — exceeded quota: pods, requested: 2, used: 10, limited: 10
ResourceQuota in the namespace is blocking pod creation.
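The numbers in that event are plain arithmetic: the request is rejected when used plus requested would exceed the hard limit. A sketch of the check the quota admission logic effectively performs, using the values from the event above:

```shell
# Values from the FailedCreate event: 10 pods used, hard limit 10, 2 more requested
used=10; hard=10; requested=2

if [ $((used + requested)) -gt "$hard" ]; then
  echo "blocked: $((used + requested)) pods would exceed the hard limit of $hard"
else
  echo "allowed"
fi
# → blocked: 12 pods would exceed the hard limit of 10
```

k get resourcequota -n <namespace> shows the same used/hard numbers, so you can confirm which quota is full before raising it or scaling something else down.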

Events present — image pull failure:

Failed to pull image "nginx:wrongtag": not found
The pod was created and scheduled, but the container can't start.


The Diagnostic Flow

deployment has 0 pods
check replicas field — is it 0?    yes → scale up
        ↓ no
check paused field — is it true?   yes → rollout resume
        ↓ no
k describe deploy → read Events
Events: <none>
→ controller manager down
→ k get pods -n kube-system | grep controller-manager

Events: FailedScheduling
→ node/resource/taint issue
→ k describe pod, k get nodes

Events: FailedCreate
→ quota exceeded
→ k get resourcequota -n <namespace>

Events: image pull error
→ wrong image tag or missing registry credentials
→ k describe pod → check image name
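The whole flow fits in a small shell function. A sketch — the argument shape, event names beyond those shown above, and the messages are my own; in practice you'd feed it values pulled from jsonpath and describe output:

```shell
# diagnose REPLICAS PAUSED FIRST_EVENT_REASON — walks the decision tree above
diagnose() {
  replicas=$1; paused=$2; event=$3
  if [ "$replicas" = "0" ]; then
    echo "scaled down: k scale deploy <name> --replicas=N"
  elif [ "$paused" = "true" ]; then
    echo "paused: k rollout resume deploy <name>"
  elif [ -z "$event" ]; then
    echo "no events: controller manager down, check kube-system"
  else
    case "$event" in
      FailedScheduling)              echo "node/resource/taint issue: describe the pod and the nodes" ;;
      FailedCreate)                  echo "quota exceeded: check resourcequota in the namespace" ;;
      ErrImagePull|ImagePullBackOff) echo "image pull failure: check the image name" ;;
      *)                             echo "unhandled event: $event" ;;
    esac
  fi
}

diagnose 2 false ""                  # → no events: controller manager down, check kube-system
diagnose 2 false FailedScheduling    # → node/resource/taint issue: describe the pod and the nodes
```

Note the branch order mirrors the flow: the cheap spec checks come first, and silence (an empty event) is only meaningful once replicas and paused are ruled out.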

The One Signal Worth Memorising

Events: <none> on a deployment with replicas > 0 and not paused = controller manager is the problem. Every other cause leaves events. Silence is the specific fingerprint of a dead controller manager.

Everything else — read the events. They tell you exactly what went wrong.