CKA Road Trip: Node NotReady + etcd Backup

Two tasks, one exercise.

Part 1: Node NotReady

k get nodes
# controlplane   NotReady

k describe node controlplane
# Conditions:
#   Ready   Unknown   NodeStatusUnknown   Kubelet stopped posting node status.

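The condition message is the signal. "Kubelet stopped posting node status" means the control plane has stopped receiving status updates from the node's kubelet. In an exam scenario the usual cause is that the kubelet service is stopped, so check it first.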
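To read only that message without scanning the full describe output, a jsonpath query can help. This is a minimal sketch, assuming kubectl is on the PATH (the k alias used above may not be available in scripts); ready_message is a hypothetical helper name, not part of kubectl.

```shell
# Hypothetical helper: print the message of the node's Ready condition.
ready_message() {
  node_name="$1"
  # The jsonpath filter selects the condition whose type is "Ready".
  kubectl get node "$node_name" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}

# Usage: ready_message controlplane
```

On the broken node in this exercise it would be expected to print the same "Kubelet stopped posting node status." text shown by k describe node.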

ssh controlplane
systemctl status kubelet
# Active: inactive (dead)

systemctl start kubelet
systemctl status kubelet
# Active: active (running)

exit
k get nodes
# controlplane   Ready

The kubelet was stopped. Start it, node recovers.
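The check-then-start sequence can be wrapped so it is safe to rerun. The sketch below assumes a systemd-managed kubelet, as in this exercise; ensure_kubelet is a hypothetical name, and the journalctl fallback is only a first place to look if the service does not stay up.

```shell
# Hypothetical helper: start kubelet only if it is not already active,
# and show recent unit logs if it still is not active afterwards.
ensure_kubelet() {
  if systemctl is-active --quiet kubelet; then
    echo "kubelet already running"
    return 0
  fi
  systemctl start kubelet
  if systemctl is-active --quiet kubelet; then
    echo "kubelet started"
  else
    # The unit journal usually names the cause of a failed start.
    journalctl -u kubelet --no-pager -n 20
    return 1
  fi
}

# Usage (on the node, as root): ensure_kubelet
```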

Part 2: etcd Backup

Verify etcd is running first:

k get pods -n kube-system | grep etcd
# etcd-controlplane   1/1   Running

Take the snapshot:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  snapshot save /opt/cluster_backup.db > backup.txt 2>&1
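Because the command's output goes to backup.txt rather than the terminal, a failure is easy to miss. A minimal sketch of a follow-up check, assuming the two file names used above; check_backup is a hypothetical helper, and it only confirms the snapshot file is non-empty, not that its contents are intact.

```shell
# Hypothetical helper: confirm the snapshot file is non-empty;
# if it is not, print the captured etcdctl output to show why.
check_backup() {
  snapshot_file="$1"
  output_file="$2"
  if [ -s "$snapshot_file" ]; then
    echo "snapshot present: $snapshot_file"
  else
    echo "snapshot missing or empty: $snapshot_file"
    [ -f "$output_file" ] && cat "$output_file"
    return 1
  fi
}

# Usage: check_backup /opt/cluster_backup.db backup.txt
```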

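All three files are required: the CA certificate, the client certificate, and the client key. On a kubeadm cluster, etcd enforces mutual TLS and rejects clients that do not present them. Find them at: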

/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/apiserver-etcd-client.key
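If the paths are not memorized, they can usually be read from the API server's static pod manifest, which passes them as --etcd-cafile, --etcd-certfile and --etcd-keyfile. The sketch below assumes the kubeadm default manifest location; etcd_client_flags is a hypothetical helper name.

```shell
# Hypothetical helper: list the etcd client TLS flags found in a
# kube-apiserver manifest (default path assumes a kubeadm cluster).
etcd_client_flags() {
  manifest="${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  # -o prints only the matching flag=value part of each line.
  grep -oE -e '--etcd-(cafile|certfile|keyfile)=[^[:space:]]+' "$manifest"
}

# Usage (on the control plane node): etcd_client_flags
```

The values map onto the etcdctl flags in order: --etcd-cafile to --cacert, --etcd-certfile to --cert, and --etcd-keyfile to --key.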

The trailing > backup.txt 2>&1 redirects both stdout and stderr to backup.txt. Without the > before backup.txt, etcdctl receives backup.txt as a second positional argument and fails with "snapshot save expects one argument".
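The redirect behaviour can be reproduced without touching etcd. In the sketch below, fake_save is a hypothetical stand-in for etcdctl snapshot save: it writes one line to each stream and reports how many arguments it received.

```shell
# Stand-in for etcdctl: one line to stdout, one to stderr,
# plus a count of the arguments it was given.
fake_save() {
  echo "stdout: got $# argument(s)"
  echo "stderr: log line" >&2
}

demo_log=$(mktemp)

# With the redirect: one argument, both streams end up in the file.
fake_save /tmp/snap.db > "$demo_log" 2>&1
cat "$demo_log"

# Without the '>': the file name is passed as a second argument instead.
fake_save /tmp/snap.db "$demo_log" 2>/dev/null
```

Order matters as well: 2>&1 has to come after > backup.txt, otherwise stderr is pointed at the terminal before stdout is redirected to the file.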

The Diagnostic Chain

node NotReady
    ↓
k describe node → "Kubelet stopped posting node status"
    ↓
ssh into node
    ↓
systemctl status kubelet → inactive
    ↓
systemctl start kubelet
    ↓
node Ready

On a task like this, "Kubelet stopped posting node status" points at the kubelet. Check it first, and look at networking or the node itself only if the kubelet turns out to be running.
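The first link in the chain, spotting which nodes need attention, can be sketched as a small filter. It assumes the default column layout of kubectl get nodes (name first, status second); not_ready_nodes is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the name of every node whose STATUS does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage: kubectl get nodes --no-headers | not_ready_nodes
```

Matching on the prefix keeps cordoned nodes (Ready,SchedulingDisabled) out of the list, since their kubelet is still reporting.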