CKA Road Trip: SSH Into a Node - Troubleshooting Commands
Once a node is NotReady, kubectl can tell you little beyond the node's conditions and events (kubectl describe node). You SSH in, and from there it's ordinary Linux troubleshooting.
The Commands, In Order
# is the kubelet alive?
systemctl status kubelet
# is the container runtime alive?
systemctl status containerd
# kubelet logs — what's it complaining about?
journalctl -u kubelet -n 50 --no-pager
# disk space — is the node full?
df -h
# memory — is it under pressure?
free -m
# are containers actually running at the OS level?
crictl ps
# what does the kubelet config look like?
cat /var/lib/kubelet/config.yaml
# static pod manifests — anything broken here?
ls /etc/kubernetes/manifests/
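Fifty log lines can still be a lot to scan. As a rough sketch, the helper below keeps only the lines that look like failures. kubelet_errors is a made-up name, and the patterns are assumptions about what kubelet and systemd failure lines typically contain (klog marks error and fatal lines with an E or F followed by the date), so expect it to miss some errors and match some noise.

```shell
#!/bin/sh
# Sketch: filter kubelet journal output down to failure-looking lines.
# kubelet_errors is a hypothetical helper, not a standard tool.
kubelet_errors() {
  # Pattern 1: klog error/fatal prefix such as "E0101 " or "F1231 ".
  # Patterns 2-4: common wording in systemd and kubelet failure messages.
  grep -iE '(^|[^[:alnum:]])[EF][0-9]{4} |fail|no such file|unable to'
}
```

On a node it could be used as: journalctl -u kubelet -n 200 --no-pager | kubelet_errors | tail -n 10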
What Each One Tells You
systemctl status kubelet: is the kubelet service running, restarting in a loop, or dead? First thing to check, every time.
systemctl status containerd: is the container runtime up? If containerd is dead, no containers can start even if the kubelet is fine.
journalctl -u kubelet -n 50 --no-pager: the last 50 kubelet log lines. This is usually where the actual error is: a typo in the binary path, a missing config file, a certificate error.
df -h: disk usage. When free space drops below the kubelet's eviction threshold, the node reports the DiskPressure condition and the kubelet starts evicting pods. A completely full disk can stop the kubelet or the runtime from working at all, and then kubectl get nodes shows only NotReady, not the reason.
free -m: memory. Same idea: low available memory shows up as the MemoryPressure condition, and severe exhaustion can take the kubelet down and leave the node NotReady.
crictl ps: lists containers running at the CRI runtime level (containerd here), bypassing the Kubernetes API entirely. Useful when kubectl shows nothing but containers might still be running. Think of it as docker ps for the CRI layer. Add -a to include exited containers.
cat /var/lib/kubelet/config.yaml: the kubelet's own config file (the path kubeadm uses). If it's malformed or missing, the kubelet won't start.
ls /etc/kubernetes/manifests/: static pod manifests for the control plane components on a kubeadm control plane node. A broken YAML file here means the component it defines (apiserver, etcd, scheduler, or controller-manager) won't start.
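Reading df and free by eye works, but the two resource checks can also be reduced to a number and a threshold. The sketch below assumes POSIX-format df output (df -P) and a Linux /proc/meminfo. full_mounts and mem_available_mb are hypothetical helper names, and the 85% default is an arbitrary cut-off rather than the kubelet's configured eviction threshold.

```shell
#!/bin/sh
# Sketch: turn df and /proc/meminfo output into simple answers.
# full_mounts and mem_available_mb are hypothetical helper names.

# Reads `df -P` output on stdin and prints "mountpoint use%" for every
# filesystem at or above the given use percentage (default 85).
# Assumes mount points contain no spaces.
full_mounts() {
  awk -v limit="${1:-85}" '
    NR > 1 {
      pct = $5
      sub(/%/, "", pct)              # "95%" -> "95"
      if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Reads /proc/meminfo-style text on stdin and prints MemAvailable in MiB.
mem_available_mb() {
  awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }'
}
```

On a node these could be fed with df -P | full_mounts 85 and mem_available_mb < /proc/meminfo. An empty result from full_mounts means no filesystem crossed the limit.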
The Pattern
The first two commands tell you if the key processes are up. journalctl tells you why if they're not. df and free rule out resource pressure. Everything else is digging deeper once you know the direction.
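The same ordering can be written down as a small decision function. This is only a sketch of the pattern, not a diagnostic tool: next_step is a hypothetical name, the inputs are passed in as arguments so the logic stays separate from the commands that produce them, and the 90% disk and 100 MiB memory cut-offs are illustrative assumptions.

```shell
#!/bin/sh
# Sketch of the triage order: services first, then resources, then dig deeper.
# next_step is a hypothetical helper; thresholds are illustrative assumptions.
next_step() {
  kubelet_state=$1   # e.g. output of: systemctl is-active kubelet
  runtime_state=$2   # e.g. output of: systemctl is-active containerd
  disk_pct=$3        # use% of the filesystem being checked, digits only
  mem_free_mb=$4     # available memory in MiB, digits only

  if [ "$kubelet_state" != "active" ]; then
    echo "kubelet down: journalctl -u kubelet -n 50 --no-pager"
  elif [ "$runtime_state" != "active" ]; then
    echo "runtime down: journalctl -u containerd -n 50 --no-pager"
  elif [ "$disk_pct" -ge 90 ]; then
    echo "disk nearly full: find what is using the space"
  elif [ "$mem_free_mb" -lt 100 ]; then
    echo "memory low: find what is using it"
  else
    echo "basics look fine: check crictl ps, kubelet config, static pod manifests"
  fi
}
```

On a node, the first two arguments could come from systemctl is-active kubelet and systemctl is-active containerd. Anything other than active, including the activating state of a unit that keeps restarting, is treated as down.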
Fixing comes after. Troubleshoot first, understand what's broken, then fix it.