CKA Road Trip: Node NotReady + etcd Backup

Two tasks, one exercise.


Part 1 — Node NotReady

k get nodes
# controlplane   NotReady
k describe node controlplane
# Conditions:
#   Ready   Unknown   NodeStatusUnknown   Kubelet stopped posting node status.

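The condition message is the signal. "Kubelet stopped posting node status" means the control plane is no longer receiving heartbeats from the node's kubelet. The most common cause, and the first thing to check, is that the kubelet process is not running.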

ssh controlplane
systemctl status kubelet
# Active: inactive (dead)

systemctl start kubelet
systemctl status kubelet
# Active: active (running)

exit
k get nodes
# controlplane   Ready

The kubelet was stopped. Start it, node recovers.
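
On a cluster with more than one node, it can help to list only the nodes that are not Ready before picking one to ssh into. A minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name first, status second); the helper name not_ready_nodes is made up for illustration:

```shell
# Print the names of nodes whose STATUS column is not Ready.
# Reads `kubectl get nodes --no-headers` output on stdin.
# STATUS can carry extra flags such as "Ready,SchedulingDisabled",
# so only the part before the first comma is compared.
not_ready_nodes() {
  awk 'NF { split($2, s, ","); if (s[1] != "Ready") print $1 }'
}

# Usage on a live cluster (k is the kubectl alias used in this post):
#   k get nodes --no-headers | not_ready_nodes
```

Reading from stdin keeps the filter separate from the cluster call, so it can be tried against pasted output as well.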


Part 2 — etcd Backup

Verify etcd is running first:

k get pods -n kube-system | grep etcd
# etcd-controlplane   1/1   Running

Take the snapshot:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  snapshot save /opt/cluster_backup.db > backup.txt 2>&1

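The CA certificate, the client certificate, and the client key are all required: on a kubeadm cluster etcd enforces client certificate authentication (mTLS) and rejects connections without them. Find them at: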

/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/apiserver-etcd-client.key
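
These are the same files the API server uses to reach etcd, so on a kubeadm-style cluster they can be read from the kube-apiserver static pod manifest rather than memorised. A minimal sketch, assuming the manifest lives at /etc/kubernetes/manifests/kube-apiserver.yaml (the kubeadm default; other installers may differ) and using a hypothetical helper name, apiserver_etcd_flags:

```shell
# Print the etcd connection flags from a kube-apiserver manifest.
# Argument: manifest path (defaults to the kubeadm location, which is an assumption).
apiserver_etcd_flags() {
  grep -oE -- '--etcd-(cafile|certfile|keyfile|servers)=[^[:space:]]+' \
    "${1:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
}

# Usage on the control plane node:
#   apiserver_etcd_flags
```

The --etcd-cafile, --etcd-certfile, and --etcd-keyfile values map to etcdctl's --cacert, --cert, and --key, and --etcd-servers gives the endpoint.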

The trailing "> backup.txt 2>&1" redirects both stdout and stderr to backup.txt. Without the ">" before backup.txt, etcdctl treats the filename as a second argument and fails with "snapshot save expects one argument".
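
Because all output goes to backup.txt, nothing on screen confirms that the snapshot succeeded. A small check along these lines may help; snapshot_written is a hypothetical helper name, and it only tests that the file exists and is non-empty, not that the snapshot is valid:

```shell
# Succeed only if the given snapshot file exists and is non-empty.
snapshot_written() {
  [ -s "$1" ]
}

# Usage after the snapshot command:
#   snapshot_written /opt/cluster_backup.db && echo "snapshot file present"
#   cat backup.txt   # etcdctl's own messages, including any error
#   ETCDCTL_API=3 etcdctl snapshot status /opt/cluster_backup.db --write-out=table
```

The snapshot status call reads the file directly, so it needs no endpoint or certificates. Newer etcd releases mark it as deprecated in favour of etcdutl, so treat that last line as version-dependent.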


The Diagnostic Chain

node NotReady
k describe node → "Kubelet stopped posting node status"
ssh into node
systemctl status kubelet → inactive
systemctl start kubelet
node Ready

The message "Kubelet stopped posting node status" points directly at the kubelet. Go there first instead of spending time elsewhere.
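
The chain above can be sketched as a small decision helper for the step on the node. This is an illustration rather than a complete troubleshooting tool: kubelet_next_step is a made-up name, and the suggestions are only the usual first moves for each state reported by `systemctl is-active`:

```shell
# Map a `systemctl is-active kubelet` result to a suggested next step.
kubelet_next_step() {
  case "$1" in
    active)          echo "running: read logs with journalctl -u kubelet" ;;
    inactive|failed) echo "stopped: systemctl start kubelet" ;;
    *)               echo "state '$1': inspect with systemctl status kubelet" ;;
  esac
}

# Usage on the node:
#   kubelet_next_step "$(systemctl is-active kubelet)"
```

In this exercise the state is inactive, so the helper lands on the same fix as Part 1.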