alexsusanu@docs:Kubernetes Commands & Troubleshooting $
alexsusanu@docs
:~$ cat Kubernetes Commands & Troubleshooting.md

HomeNOTES → Kubernetes Commands & Troubleshooting

Kubernetes Commands & Troubleshooting

category: DevOps
tags: kubernetes, k8s, kubectl, troubleshooting, debugging, monitoring

Essential kubectl Commands

What kubectl is: The command-line tool for interacting with Kubernetes clusters, providing a way to deploy applications, inspect and manage cluster resources, and view logs.

Why it matters: kubectl is your primary interface to Kubernetes. Mastering kubectl commands is essential for daily operations, troubleshooting, and cluster management.

Basic Resource Operations

Get resources:

# List resources
kubectl get pods                                   # List pods in current namespace
kubectl get pods -A                               # List pods in all namespaces
kubectl get pods -o wide                          # Additional info (node, IP, etc.)
kubectl get pods --show-labels                    # Show pod labels
kubectl get pods -l app=nginx                     # Filter by labels
kubectl get pods --field-selector=status.phase=Running  # Filter by field

# Different output formats
kubectl get pods -o yaml                          # YAML output
kubectl get pods -o json                          # JSON output
kubectl get pods -o jsonpath='{.items[*].metadata.name}'  # Custom output
kubectl get pods -o custom-columns=NAME:.metadata.name,STATUS:.status.phase

Describe resources:

kubectl describe pod <pod-name>                   # Detailed pod information
kubectl describe node <node-name>                 # Node details
kubectl describe service <service-name>           # Service details
kubectl describe deployment <deployment-name>     # Deployment details

Create and apply resources:

kubectl create -f manifest.yaml                   # Create resources from file
kubectl apply -f manifest.yaml                    # Apply configuration (create or update)
kubectl apply -f directory/                       # Apply all YAML files in directory
kubectl apply -R -f manifests/                    # Recursively apply files

# Create resources imperatively
kubectl create deployment nginx --image=nginx:1.21
kubectl create service clusterip nginx --tcp=80:80
kubectl create configmap app-config --from-literal=key=value
kubectl create secret generic app-secret --from-literal=password=secret123

Delete resources:

kubectl delete pod <pod-name>                     # Delete specific pod
kubectl delete -f manifest.yaml                   # Delete resources defined in file
kubectl delete deployment,service nginx           # Delete multiple resource types
kubectl delete pods --all                         # Delete all pods in namespace
kubectl delete pods -l app=nginx                  # Delete pods matching label
kubectl delete namespace <namespace-name>         # Delete namespace and all resources

Advanced kubectl Operations

Edit resources:

kubectl edit pod <pod-name>                       # Edit pod in default editor
kubectl edit deployment <deployment-name>         # Edit deployment
kubectl patch pod <pod-name> -p '{"spec":{"containers":[{"name":"app","image":"nginx:1.22"}]}}'

Scale resources:

kubectl scale deployment <deployment-name> --replicas=5
kubectl scale --replicas=3 -f deployment.yaml
kubectl autoscale deployment <deployment-name> --min=2 --max=10 --cpu-percent=80

Resource usage:

kubectl top nodes                                 # Node resource usage
kubectl top pods                                  # Pod resource usage
kubectl top pods --containers                     # Container resource usage
kubectl top pods -l app=nginx                     # Resource usage for labeled pods

Logs and Debugging

View logs:

kubectl logs <pod-name>                           # Pod logs
kubectl logs <pod-name> -c <container-name>       # Specific container logs
kubectl logs -f <pod-name>                        # Follow logs (tail -f)
kubectl logs --previous <pod-name>                # Previous container instance logs
kubectl logs -l app=nginx                         # Logs from all pods with label
kubectl logs --since=1h <pod-name>                # Logs from last hour
kubectl logs --tail=100 <pod-name>                # Last 100 lines

Execute commands in containers:

kubectl exec <pod-name> -- ls /                   # Execute single command
kubectl exec -it <pod-name> -- /bin/bash          # Interactive shell
kubectl exec -it <pod-name> -c <container> -- /bin/sh  # Specific container
kubectl exec <pod-name> -- env                    # View environment variables

Port forwarding:

kubectl port-forward pod/<pod-name> 8080:80       # Forward local port to pod
kubectl port-forward service/<service-name> 8080:80  # Forward to service
kubectl port-forward deployment/<deployment-name> 8080:80  # Forward to deployment
kubectl port-forward --address 0.0.0.0 pod/<pod-name> 8080:80  # Listen on all interfaces

Copy files:

kubectl cp <pod-name>:/path/to/file ./local-file  # Copy from pod to local
kubectl cp ./local-file <pod-name>:/path/to/file  # Copy from local to pod
kubectl cp <pod-name>:/path/to/directory ./local-directory -c <container-name>

Troubleshooting Techniques

What troubleshooting involves: Systematic approach to identifying, diagnosing, and resolving issues in Kubernetes clusters and applications.

Why it's important: Kubernetes complexity means issues can occur at multiple layers - infrastructure, networking, storage, application, and configuration. Effective troubleshooting minimizes downtime and maintains reliability.

Pod Troubleshooting

Pod states and issues:

# Check pod status
kubectl get pods                                  # Overall pod status
kubectl describe pod <pod-name>                   # Detailed status and events

# Common pod states
# Pending - Pod accepted but not scheduled
# Running - Pod is executing
# Succeeded - All containers terminated successfully  
# Failed - All containers terminated, at least one failed
# Unknown - Pod state cannot be determined

ImagePullBackOff / ErrImagePull:

# Check image name and tag
kubectl describe pod <pod-name> | grep -i image

# Verify image exists
docker pull <image-name>

# Check image pull secrets
kubectl get pods <pod-name> -o yaml | grep imagePullSecrets
kubectl describe secret <image-pull-secret>

# Fix common issues:
# - Incorrect image name/tag
# - Missing or incorrect registry credentials
# - Network connectivity to registry
# - Registry authentication issues

CrashLoopBackOff:

# Check container logs
kubectl logs <pod-name> --previous               # Previous container logs
kubectl logs <pod-name> -c <container-name>     # Specific container

# Check resource limits
kubectl describe pod <pod-name> | grep -A 5 Limits

# Common causes:
# - Application startup failures
# - Missing environment variables or secrets
# - Resource constraints (CPU/memory limits)
# - Health check failures
# - Dependency issues (database connectivity, etc.)

Pending pods:

# Check why pod is not scheduled
kubectl describe pod <pod-name> | grep Events -A 10

# Check node resources
kubectl top nodes
kubectl describe nodes

# Common issues:
# - Insufficient resources (CPU, memory)
# - Node selector constraints
# - Taints and tolerations
# - Affinity/anti-affinity rules
# - PVC binding issues

Service and Networking Troubleshooting

Service connectivity issues:

# Check service endpoints
kubectl get endpoints <service-name>
kubectl describe service <service-name>

# Test service connectivity
kubectl run test-pod --image=busybox -it --rm -- nslookup <service-name>
kubectl run test-pod --image=busybox -it --rm -- wget -qO- <service-name>:<port>

# Check service selector matches pod labels  
kubectl get pods --show-labels
kubectl get service <service-name> -o yaml | grep selector -A 5

DNS resolution issues:

# Test DNS resolution
kubectl run test-pod --image=busybox -it --rm -- nslookup kubernetes.default
kubectl run test-pod --image=busybox -it --rm -- nslookup <service-name>.<namespace>.svc.cluster.local

# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

# Check DNS configuration in pods
kubectl exec <pod-name> -- cat /etc/resolv.conf

NetworkPolicy troubleshooting:

# Check if NetworkPolicies are applied
kubectl get networkpolicies
kubectl describe networkpolicy <policy-name>

# Test connectivity between pods
kubectl exec <source-pod> -- nc -zv <target-service> <port>
kubectl exec <source-pod> -- telnet <target-ip> <port>

# Check CNI plugin logs (varies by plugin)
kubectl logs -n kube-system -l k8s-app=calico-node  # Calico example

Storage Troubleshooting

PVC binding issues:

# Check PVC status
kubectl get pvc
kubectl describe pvc <pvc-name>

# Check available PVs
kubectl get pv
kubectl describe pv <pv-name>

# Common issues:
# - No matching PV available
# - Storage class not found
# - Access mode mismatch
# - Size requirements not met
# - Node affinity constraints

Volume mount failures:

# Check pod events for mount errors
kubectl describe pod <pod-name> | grep Events -A 20

# Check volume mounts in pod spec
kubectl get pod <pod-name> -o yaml | grep -A 10 volumeMounts

# Verify file permissions
kubectl exec <pod-name> -- ls -la /mounted/path
kubectl exec <pod-name> -- df -h  # Check if volume is mounted

Application Troubleshooting

Container startup issues:

# Check container logs
kubectl logs <pod-name> -c <container-name>
kubectl logs <pod-name> --previous  # Previous instance

# Check container resource usage
kubectl top pod <pod-name> --containers

# Check environment variables
kubectl exec <pod-name> -- env

# Check mounted secrets and configmaps
kubectl exec <pod-name> -- ls -la /etc/secrets
kubectl exec <pod-name> -- cat /etc/config/app.conf

Health check failures:

# Check liveness and readiness probes
kubectl describe pod <pod-name> | grep -A 5 Liveness
kubectl describe pod <pod-name> | grep -A 5 Readiness

# Test health endpoints manually
kubectl exec <pod-name> -- curl -f http://localhost:8080/health
kubectl port-forward <pod-name> 8080:8080
curl http://localhost:8080/health

Resource constraints:

# Check resource requests and limits
kubectl describe pod <pod-name> | grep -A 10 Limits
kubectl describe pod <pod-name> | grep -A 10 Requests

# Check if pod is being throttled or OOMKilled
kubectl describe pod <pod-name> | grep -i oom
kubectl describe pod <pod-name> | grep -i killed

# Check node resources
kubectl top nodes
kubectl describe node <node-name> | grep -A 5 "Allocated resources"

Monitoring and Observability

What monitoring involves: Collecting, analyzing, and acting on metrics, logs, and traces from Kubernetes clusters and applications to ensure reliability and performance.

Why it's essential: Kubernetes complexity requires proactive monitoring to detect issues early, understand system behavior, optimize performance, and maintain reliability.

Cluster Monitoring

Cluster health checks:

# Check cluster component status
kubectl get componentstatuses
kubectl cluster-info
kubectl get nodes
kubectl top nodes

# Check system pods
kubectl get pods -n kube-system
kubectl get pods -n kube-system -o wide

# Check cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector type=Warning

Resource monitoring:
```bash

Resource usage across cluster

kubectl top nodes
kubectl top pods --all-namespaces
kubectl top pods --

Last updated: 2025-08-26 20:00 UTC