Home → NOTES → CKA Guide: Kubernetes Cluster Components - Control Plane & Worker Node Architecture

CKA Guide: Kubernetes Cluster Components - Control Plane & Worker Node Architecture

category: Kubernetes Certification
tags: cka, kubernetes, exam, kubectl, certification

Fundamental Conceptual Understanding

The Distributed Systems Architecture Philosophy

From Monolithic to Distributed Control Plane:

Traditional Infrastructure Management:
├── Single management server (SPOF)
├── Centralized state and decision making
├── Manual scaling and failover
├── Vendor-specific management interfaces
└── Limited automation capabilities

Kubernetes Distributed Architecture:
├── Multiple independent, coordinating components
├── Shared state through distributed consensus (etcd)
├── Automatic leader election and failover
├── API-driven, vendor-neutral interfaces
└── Declarative automation at scale

The Microservices Pattern Applied to Infrastructure:

Each Kubernetes component follows microservices principles:
├── Single Responsibility: Each component has one primary function
├── API-First: All communication through well-defined APIs
├── Stateless Design: Components don't store state locally
├── Independent Deployment: Components can be updated separately
├── Fault Isolation: Component failure doesn't cascade
└── Horizontal Scaling: Multiple instances for high availability

This design enables:
├── Operational flexibility and easier maintenance
├── Independent component scaling and optimization
├── Clear separation of concerns and debugging
├── Extensibility through plugin architectures
└── Vendor neutrality and cloud portability

Control Theory and Feedback Loops

The Control Plane as a Control System:

Kubernetes implements a classic control theory model:

Input (Desired State) → Controller → System (Cluster) → Output (Actual State)
        ↑                                                        │
        └──────── Feedback Loop (Continuous Monitoring) ←────────┘

Components in the Control Loop:
├── API Server: Receives desired state declarations
├── Controllers: Implement control logic and take corrective actions
├── Scheduler: Optimizes resource allocation decisions
├── kubelet: Executes control decisions on worker nodes
└── etcd: Maintains authoritative record of desired and actual state

Feedback Mechanisms:
├── Continuous state observation through kubelet reports
├── Event-driven reactions to state changes
├── Reconciliation loops to correct drift
├── Health monitoring and automatic recovery
└── Metrics collection for performance optimization

State Convergence Philosophy:

Kubernetes operates on eventual consistency principles:

1. Desired State Declaration:
   User submits YAML → API Server validates → etcd stores

2. State Observation:
   Controllers watch etcd → Compare desired vs actual → Identify drift

3. Corrective Action:
   Controllers take action → Update cluster state → Report back to etcd

4. Convergence:
   Repeat until actual state matches desired state

This model provides:
├── Self-healing capabilities (automatic error correction)
├── Declarative simplicity (describe what, not how)
├── Idempotent operations (safe to retry)
├── Distributed coordination (multiple controllers working together)
└── Resilient operations (continues working despite component failures)

Control Plane Components Deep Dive

kube-apiserver: The Central Nervous System

API Server Architecture and Responsibilities:

kube-apiserver Core Functions:
├── RESTful API Gateway: HTTP/gRPC interface for all cluster operations
├── Authentication & Authorization: Identity verification and permission checking
├── Admission Control: Request validation, mutation, and policy enforcement
├── Resource Validation: Schema validation and business logic enforcement
├── etcd Interface: Persistent storage backend communication
├── Watch API: Event streaming for real-time state monitoring
└── Aggregation Layer: Extension API integration point

Request Processing Pipeline:
HTTP Request → Authentication → Authorization → Admission Controllers → 
Validation → Serialization → etcd Storage → Response

Watch Streaming:
etcd Change → API Server → Watch Stream → Controller → Action

API Server Configuration and Tuning:

# API Server configuration options (kubeadm example)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver

    # Core API settings
    - --advertise-address=10.0.1.100
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction,DefaultStorageClass,ResourceQuota

    # etcd connection
    - --etcd-servers=https://127.0.0.1:2379
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key

    # Service account and RBAC
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key

    # Network configuration
    - --service-cluster-ip-range=10.96.0.0/12
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key

    # Performance and reliability
    - --request-timeout=60s
    - --max-requests-inflight=400
    - --max-mutating-requests-inflight=200
    - --watch-cache-sizes=default=100

    # Security
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    - --audit-log-path=/var/log/audit.log
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml

    image: k8s.gcr.io/kube-apiserver:v1.25.0
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 10.0.1.100
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 10.0.1.100
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15

API Server Health and Monitoring:

# Health check endpoints
curl -k https://localhost:6443/healthz
curl -k https://localhost:6443/livez
curl -k https://localhost:6443/readyz

# Detailed health components
curl -k https://localhost:6443/livez?verbose=1
curl -k https://localhost:6443/readyz?verbose=1

# API server metrics (if enabled)
curl -k https://localhost:6443/metrics

# Common API server issues:
# 1. Certificate expiration
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout | grep "Not After"

# 2. etcd connectivity
kubectl logs -n kube-system kube-apiserver-master | grep -i etcd

# 3. High request latency
kubectl logs -n kube-system kube-apiserver-master | grep -i "request took"

# 4. Authentication/authorization failures
kubectl logs -n kube-system kube-apiserver-master | grep -i "forbidden\|unauthorized"

etcd: The Distributed Database

etcd Architecture and Data Model:

etcd Design Principles:
├── Consistency: Strong consistency through Raft consensus algorithm
├── Availability: Highly available with leader election and failover
├── Partition Tolerance: Continues operating with network partitions
├── Simplicity: Simple key-value API with hierarchical keys
└── Performance: Optimized for read-heavy workloads with watch API

Raft Consensus Algorithm:
├── Leader Election: One node becomes leader, others are followers
├── Log Replication: Leader replicates entries to followers
├── Commitment: Entries committed when majority acknowledges
├── Safety: Ensures consistency even during network partitions
└── Efficiency: Optimizes for normal case performance

etcd Data Organization:
├── Key-Value Store: Hierarchical keys like filesystem paths
├── Revisions: Every change gets monotonically increasing revision
├── Compaction: Old revisions removed to reclaim space
├── Transactions: Multi-key atomic operations
└── Watches: Real-time notification of key changes

etcd Operations and Maintenance:

# etcd cluster health check
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health

# Cluster member management
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member list

# Database size and performance monitoring
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint status --write-out=table

# Backup creation
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save /backup/etcd-snapshot-$(date +%Y%m%d%H%M%S).db

# Database defragmentation (maintenance operation)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  defrag

# Compaction (remove old revisions)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  compact $(etcdctl endpoint status --write-out="json" | jq '.[0].Status.header.revision')

etcd Performance Optimization:

# etcd configuration for performance
apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --name=master
    - --data-dir=/var/lib/etcd

    # Cluster configuration
    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.1.100:2379
    - --advertise-client-urls=https://10.0.1.100:2379
    - --listen-peer-urls=https://10.0.1.100:2380
    - --initial-advertise-peer-urls=https://10.0.1.100:2380

    # Performance tuning
    - --heartbeat-interval=100        # Leader heartbeat interval (ms)
    - --election-timeout=1000         # Election timeout (ms)
    - --max-snapshots=5              # Maximum number of snapshots
    - --max-wals=5                   # Maximum number of WAL files
    - --quota-backend-bytes=8589934592  # 8GB backend size limit

    # Compaction settings
    - --auto-compaction-mode=periodic
    - --auto-compaction-retention=1h  # Retain 1 hour of history

    # Security
    - --client-cert-auth=true
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key

    image: k8s.gcr.io/etcd:3.5.4-0
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1000m
        memory: 2Gi
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs

kube-scheduler: The Resource Optimizer

Scheduler Architecture and Algorithm:

Two-Phase Scheduling Process:

Phase 1: Filtering (Feasible Nodes)
├── NodeResourcesFit: Node has sufficient CPU/memory
├── NodeAffinity: Node matches affinity expressions
├── PodTopologySpread: Ensures even distribution
├── Taints/Tolerations: Pod tolerates node taints
├── VolumeBinding: Required volumes can be provisioned
└── Custom Filters: Extension points for custom logic

Phase 2: Scoring (Optimal Selection)
├── NodeResourcesFit: Prefer nodes with more available resources
├── InterPodAffinity: Satisfy pod affinity/anti-affinity preferences
├── NodeAffinity: Prefer nodes matching affinity preferences
├── ImageLocality: Prefer nodes with required images already present
├── Custom Scorers: Extension points for custom optimization
└── Final Selection: Choose node with highest weighted score

Scheduling Framework:
├── QueueSort: Order pods by priority for scheduling
├── PreFilter: Early validation and setup for filtering
├── Filter: Eliminate nodes that cannot run the pod
├── PostFilter: Handle scheduling failures (preemption)
├── PreScore: Preparation for scoring phase
├── Score: Assign scores to feasible nodes
├── NormalizeScore: Normalize scores to 0-100 range
├── Reserve: Reserve resources on chosen node
├── Permit: Final approval before binding
├── PreBind: Pre-binding operations
├── Bind: Actually bind pod to node
└── PostBind: Post-binding cleanup and notification

Scheduler Configuration and Customization:

# Scheduler configuration
apiVersion: kubescheduler.config.k8s.io/v1beta3
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    # Enable/disable specific plugins
    score:
      enabled:
      - name: NodeResourcesFit
        weight: 100
      - name: InterPodAffinity
        weight: 50
      - name: NodeAffinity
        weight: 25
      disabled:
      - name: PodTopologySpread  # Disable if not needed

    filter:
      enabled:
      - name: NodeResourcesFit
      - name: NodeAffinity
      - name: PodTopologySpread

  pluginConfig:
  # Configure NodeResourcesFit scoring
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: LeastAllocated     # Prefer nodes with more free resources
        # Alternative: MostAllocated, RequestedToCapacityRatio

  # Configure InterPodAffinity
  - name: InterPodAffinity
    args:
      hardPodAffinityWeight: 100

  # Configure PodTopologySpread
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway

# Multiple scheduler profiles for different workload types
- schedulerName: high-performance-scheduler
  plugins:
    score:
      enabled:
      - name: NodeResourcesFit
        weight: 100
      disabled:
      - name: InterPodAffinity  # Disable for performance
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated     # Pack workloads tightly

Scheduler Debugging and Optimization:

# Scheduler health and status
kubectl get events | grep FailedScheduling
kubectl logs -n kube-system kube-scheduler-master

# Debug scheduling decisions
kubectl get events --field-selector reason=FailedScheduling
kubectl describe pod <pending-pod-name>

# Scheduler performance metrics
kubectl logs -n kube-system kube-scheduler-master | grep "Attempting to schedule pod"
kubectl logs -n kube-system kube-scheduler-master | grep "Successfully assigned"

# Common scheduling issues:
# 1. Insufficient resources
kubectl describe nodes | grep -A 5 "Allocated resources"

# 2. Affinity/anti-affinity conflicts
kubectl get pod <pod-name> -o yaml | grep -A 20 affinity:

# 3. Taints preventing scheduling
kubectl describe nodes | grep -A 3 Taints:

# 4. Resource quotas blocking scheduling
kubectl describe quota --all-namespaces

# 5. Priority class issues
kubectl get priorityclasses
kubectl describe priorityclass high-priority

kube-controller-manager: The Automation Engine

Controller Manager Architecture:

Controller Manager Components:

Core Controllers:
├── Node Controller: Manages node lifecycle and health
├── Replication Controller: Ensures desired replica count
├── Endpoints Controller: Populates service endpoints
├── Service Account Controller: Creates default service accounts
├── Token Controller: Manages service account tokens
├── Resource Quota Controller: Enforces resource limits
├── Namespace Controller: Manages namespace lifecycle
└── Persistent Volume Controller: Handles PV/PVC binding

Workload Controllers:
├── Deployment Controller: Manages deployment rollouts
├── ReplicaSet Controller: Ensures pod replica count
├── StatefulSet Controller: Manages stateful applications
├── DaemonSet Controller: Ensures pod runs on each node
├── Job Controller: Manages batch job execution
├── CronJob Controller: Schedules periodic jobs
└── HorizontalPodAutoscaler: Automatic pod scaling

Each Controller Implements:
├── Watch API: Monitor resource changes in etcd
├── Work Queue: Buffer and rate-limit operations
├── Reconciliation Loop: Compare desired vs actual state
├── Error Handling: Retry failed operations with backoff
├── Leader Election: Ensure only one active instance
└── Metrics: Report controller performance and health

Controller Manager Configuration:

# kube-controller-manager configuration
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager

    # Core settings
    - --bind-address=127.0.0.1
    - --cluster-cidr=10.244.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key

    # Controller settings
    - --controllers=*,bootstrapsigner,tokencleaner
    - --concurrent-deployment-syncs=5        # Deployment controller parallelism
    - --concurrent-replicaset-syncs=5        # ReplicaSet controller parallelism
    - --concurrent-service-syncs=1           # Service controller parallelism
    - --concurrent-namespace-syncs=10        # Namespace controller parallelism

    # Node controller settings
    - --node-monitor-period=5s               # How often to check node health
    - --node-monitor-grace-period=40s        # Grace period before marking unhealthy
    - --pod-eviction-timeout=5m0s           # Time to wait before evicting pods
    - --unhealthy-zone-threshold=0.55        # Threshold for unhealthy zone

    # Resource management
    - --kube-api-qps=20                      # API server request rate
    - --kube-api-burst=30                    # API server burst capacity

    # Leader election
    - --leader-elect=true
    - --leader-elect-lease-duration=15s
    - --leader-elect-renew-deadline=10s
    - --leader-elect-retry-period=2s

    # Service account management
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true

    image: k8s.gcr.io/kube-controller-manager:v1.25.0
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10257
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 200m
        memory: 512Mi

Controller Performance and Debugging:

# Controller manager health
curl -k https://localhost:10257/healthz

# Controller metrics
curl -k https://localhost:10257/metrics

# Check controller logs
kubectl logs -n kube-system kube-controller-manager-master

# Monitor specific controller activity
kubectl logs -n kube-system kube-controller-manager-master | grep "deployment-controller"
kubectl logs -n kube-system kube-controller-manager-master | grep "replicaset-controller"
kubectl logs -n kube-system kube-controller-manager-master | grep "node-controller"

# Common controller issues:
# 1. High API server load
kubectl logs -n kube-system kube-controller-manager-master | grep "rate limit"

# 2. Controller loop delays
kubectl logs -n kube-system kube-controller-manager-master | grep "slow"

# 3. Leader election issues
kubectl logs -n kube-system kube-controller-manager-master | grep "leader"

# 4. Resource conflicts
kubectl get events | grep "conflict"

Worker Node Components

kubelet: The Node Agent

kubelet Architecture and Responsibilities:

kubelet Core Functions:
├── Pod Lifecycle Management: Create, start, stop, and delete pods
├── Container Runtime Interface: Manage containers through CRI
├── Volume Management: Mount and unmount pod volumes
├── Network Setup: Configure pod networking through CNI
├── Health Monitoring: Execute liveness and readiness probes
├── Resource Management: Enforce resource limits and QoS
├── Node Status Reporting: Report node health to API server
├── Static Pod Management: Run control plane pods
├── Device Plugin Management: Expose specialized hardware
└── Certificate Management: Rotate kubelet certificates

kubelet Watch Loops:
├── Pod Watch: Monitor assigned pods from API server
├── ConfigMap/Secret Watch: Update mounted configurations
├── Node Watch: React to node updates and taints
├── Volume Watch: Handle volume attachment/detachment
└── Image Watch: Preload required container images

Resource Management:
├── CPU CFS Quotas: Enforce CPU limits through cgroups
├── Memory Limits: OOM killer integration for memory limits
├── Disk Quotas: Ephemeral storage management
├── Device Allocation: GPU and specialized hardware assignment
└── QoS Classes: Priority-based resource allocation

kubelet Configuration:

# kubelet configuration file (/var/lib/kubelet/config.yaml)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

# Basic settings
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
staticPodPath: /etc/kubernetes/manifests

# Container runtime
containerRuntimeEndpoint: unix:///var/run/containerd/containerd.sock

# Resource management
maxPods: 110                        # Maximum pods per node
podPidsLimit: 4096                  # Maximum PIDs per pod
enforceNodeAllocatable:
- pods
- system-reserved
- kube-reserved

# System reserved resources
systemReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi

# Kubernetes reserved resources
kubeReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi

# Eviction settings
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"

evictionSoft:
  memory.available: "300Mi"
  nodefs.available: "15%"

evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "1m30s"

# Image management
imageMinimumGCAge: 2m
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80

# Logging
logging:
  format: json
  verbosity: 2

# Container log management
containerLogMaxSize: 50Mi
containerLogMaxFiles: 5

# Node status update frequency
nodeStatusUpdateFrequency: 10s
nodeStatusReportFrequency: 5m

# Health check settings
streamingConnectionIdleTimeout: 4h
syncFrequency: 1m

# Feature gates
featureGates:
  RotateKubeletServerCertificate: true
  CSIDriverRegistry: true

kubelet Debugging and Monitoring:

# kubelet service status
sudo systemctl status kubelet
sudo journalctl -u kubelet -f

# kubelet configuration
sudo cat /var/lib/kubelet/config.yaml
sudo cat /etc/kubernetes/kubelet.conf

# kubelet health endpoints
curl -k https://localhost:10250/healthz
curl -k https://localhost:10250/metrics

# Common kubelet issues:
# 1. Certificate expiration
sudo journalctl -u kubelet | grep certificate
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text -noout

# 2. Container runtime issues
sudo crictl version
sudo crictl info
sudo systemctl status containerd

# 3. Network plugin issues
sudo journalctl -u kubelet | grep CNI
ls -la /etc/cni/net.d/

# 4. Resource pressure
kubectl describe node $(hostname) | grep -A 10 Conditions:
kubectl describe node $(hostname) | grep -A 15 "Allocated resources"

# 5. Pod eviction events
sudo journalctl -u kubelet | grep evict
kubectl get events | grep Evicted

Container Runtime: The Execution Engine

Container Runtime Interface (CRI) Architecture:

CRI Design:
├── Image Service: Pull, list, remove, and inspect images
├── Runtime Service: Create, start, stop, and remove containers
├── Streaming Service: Exec, attach, and port-forward operations
├── Stats Service: Container and image filesystem statistics
└── Event Service: Container lifecycle event notifications

Runtime Implementations:
├── containerd: CNCF graduated project, Docker's successor
├── CRI-O: Lightweight CRI implementation for Kubernetes
├── Docker Engine: Legacy runtime with dockershim (deprecated)
├── gVisor: Sandboxed runtime for improved security
└── Kata Containers: VM-based containers for isolation

CRI Protocol Flow:
kubelet → gRPC → CRI Runtime → OCI Runtime → Container
   │        │         │            │           │
Request   API     containerd    runc/crun   Process

containerd Configuration:

# /etc/containerd/config.toml
version = 2

[grpc]
  address = "/run/containerd/containerd.sock"
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[debug]
  level = "info"

[metrics]
  address = "127.0.0.1:1338"
  grpc_histogram = false

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    enable_selinux = false
    enable_tls_streaming = false
    max_container_log_line_size = 16384
    disable_cgroup = false
    disable_apparmor = false
    restrict_oom_score_adj = false
    max_concurrent_downloads = 3
    disable_proc_mount = false
    unset_seccomp_profile = ""
    tolerate_missing_hugetlb_controller = true
    disable_hugetlb_controller = true
    ignore_image_defined_volumes = false

    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      no_pivot = false
      disable_snapshot_annotations = true
      discard_unpacked_layers = false

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
          base_runtime_spec = ""

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true

    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      max_conf_num = 1
      conf_template = ""

    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]

Container Runtime Operations:

# containerd operations
sudo crictl version
sudo crictl info

# Container management
sudo crictl ps                     # List running containers
sudo crictl ps -a                  # List all containers
sudo crictl inspect <container-id> # Inspect container
sudo crictl logs <container-id>    # View container logs
sudo crictl exec -it <container-id> /bin/sh  # Execute in container

# Image management
sudo crictl images                 # List images
sudo crictl pull nginx:latest      # Pull image
sudo crictl rmi <image-id>         # Remove image
sudo crictl imagefsinfo            # Image filesystem info

# Pod management (crictl)
sudo crictl pods                   # List pod sandboxes
sudo crictl inspectp <pod-id>      # Inspect pod sandbox
sudo crictl stopp <pod-id>         # Stop pod sandbox
sudo crictl rmp <pod-id>           # Remove pod sandbox

# Runtime debugging
sudo systemctl status containerd
sudo journalctl -u containerd -f

# Performance monitoring
sudo crictl stats                  # Container resource usage
sudo crictl stats <container-id>   # Specific container stats

kube-proxy: The Network Proxy

kube-proxy Architecture and Modes:

kube-proxy Responsibilities:
├── Service Discovery: Implement ClusterIP virtual networking
├── Load Balancing: Distribute traffic across service endpoints
├── Network Address Translation: DNAT from service IP to pod IP
├── Health Checking: Route traffic only to healthy endpoints
└── External Access: Implement NodePort and LoadBalancer services

Proxy Modes:

1. iptables Mode (Default):
   ├── Creates iptables rules for each service
   ├── DNAT traffic from service IP to pod IPs
   ├── Random selection among healthy endpoints
   ├── Lower resource usage
   └── Higher latency with many services

2. IPVS Mode (Advanced):
   ├── Uses Linux IPVS for in-kernel load balancing
   ├── Multiple load balancing algorithms
   ├── Better performance with many services
   ├── Requires IPVS kernel modules
   └── More sophisticated connection tracking

3. userspace Mode (Legacy):
   ├── User-space proxy process
   ├── Higher latency due to context switching
   ├── No longer recommended
   └── Maintained for compatibility

Load Balancing Algorithms (IPVS):
├── Round Robin (rr): Distribute requests sequentially
├── Least Connection (lc): Route to least connected endpoint
├── Destination Hashing (dh): Consistent routing based on destination
├── Source Hashing (sh): Session affinity based on source IP
├── Shortest Expected Delay (sed): Minimize expected delay
└── Never Queue (nq): No queuing, immediate processing

kube-proxy Configuration:

# kube-proxy configuration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration

# Basic settings
bindAddress: 0.0.0.0
clientConnection:
  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf

# Cluster configuration
clusterCIDR: 10.244.0.0/16
configSyncPeriod: 15m0s

# Proxy mode selection
mode: "ipvs"                       # "iptables", "ipvs", or "userspace"

# IPVS configuration
ipvs:
  scheduler: "rr"                  # Load balancing algorithm
  syncPeriod: 30s
  minSyncPeriod: 5s
  strictARP: true                  # Important for metallb compatibility
  tcpTimeout: 0s
  tcpFinTimeout: 0s
  udpTimeout: 0s

# iptables configuration
iptables:
  masqueradeAll: false
  masqueradeBit: 14
  minSyncPeriod: 1s
  syncPeriod: 30s

# Connection tracking
conntrack:
  maxPerCore: 32768
  min: 131072
  tcpCloseWaitTimeout: 1h0m0s
  tcpEstablishedTimeout: 24h0m0s

# Node port configuration
nodePortAddresses: []              # Bind to all interfaces
portRange: "30000-32767"

# Health and metrics
healthzBindAddress: 0.0.0.0:10256
metricsBindAddress: 127.0.0.1:10249

# Feature gates
featureGates:
  EndpointSliceProxying: true      # Use EndpointSlices instead of Endpoints

kube-proxy Debugging:

# kube-proxy status and logs
kubectl get pods -n kube-system | grep kube-proxy
kubectl logs -n kube-system <kube-proxy-pod>

# Check proxy mode
kubectl logs -n kube-system <kube-proxy-pod> | grep "Using proxy mode"

# iptables rules (iptables mode)
sudo iptables -t nat -L KUBE-SERVICES
sudo iptables -t nat -L KUBE-NODEPORTS
sudo iptables -t nat -L | grep <service-name>

# IPVS rules (IPVS mode)
sudo ipvsadm -L -n
sudo ipvsadm -L -n -t <service-cluster-ip>:<port>

# Service connectivity testing
kubectl run test-pod --image=busybox --rm -it -- wget -qO- http://<service-name>
kubectl run test-pod --image=busybox --rm -it -- nslookup <service-name>

# Common kube-proxy issues:
# 1. Service endpoints not updating
kubectl get endpoints <service-name>
kubectl describe endpoints <service-name>

# 2. iptables rules not created
sudo iptables-save | grep <service-name>

# 3. IPVS modules not loaded
lsmod | grep ip_vs
sudo modprobe ip_vs
sudo modprobe ip_vs_rr
sudo modprobe ip_vs_wrr
sudo modprobe ip_vs_sh

# 4. Network policy conflicts
kubectl get networkpolicies
kubectl describe networkpolicy <policy-name>

High Availability and Scaling Patterns

Control Plane High Availability

Multi-Master Architecture:

HA Control Plane Design:

Load Balancer (External)
├── API Server 1 (Active)
├── API Server 2 (Active)
└── API Server 3 (Active)

etcd Cluster (Odd Number)
├── etcd Node 1 (Leader/Follower)
├── etcd Node 2 (Leader/Follower)
└── etcd Node 3 (Leader/Follower)

Controllers (Leader Election)
├── Controller Manager (Leader + Standby)
└── Scheduler (Leader + Standby)

Best Practices:
├── Odd number of etcd nodes (3, 5, 7)
├── Separate etcd from other components
├── Use external load balancer for API servers
├── Ensure network partitions don't split brain
└── Monitor leader election and failover

kubeadm HA Setup Pattern:

# Initialize first control plane node
sudo kubeadm init --control-plane-endpoint "cluster-endpoint:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16

# Join additional control plane nodes
sudo kubeadm join cluster-endpoint:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>

# Join worker nodes
sudo kubeadm join cluster-endpoint:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

Component Scaling and Performance

Resource Requirements by Component:

Component Resource Guidelines:

API Server:
├── CPU: 1-2 cores + 0.1 cores per 1000 pods
├── Memory: 1-2GB + 50MB per 1000 pods
├── Disk: 50GB + growth for audit logs
└── Network: High bandwidth for client requests

etcd:
├── CPU: 2-4 cores for production
├── Memory: 2-8GB depending on cluster size
├── Disk: SSD required, 50GB+ with low latency
└── Network: Low latency between members

Controller Manager:
├── CPU: 0.5-1 cores + scaling with cluster size
├── Memory: 256MB-1GB depending on controllers
├── Disk: Minimal (stateless)
└── Network: Moderate API server communication

Scheduler:
├── CPU: 0.2-0.5 cores + scaling with pods
├── Memory: 128MB-512MB
├── Disk: Minimal (stateless)
└── Network: Moderate API server communication

kubelet:
├── CPU: 0.1-0.2 cores per node
├── Memory: 100-200MB per node
├── Disk: Container image storage
└── Network: Pod networking bandwidth

Exam Tips & Quick Reference

⚡ Essential Component Commands

# Cluster component health
kubectl cluster-info
kubectl get componentstatuses

# Control plane pod logs
kubectl logs -n kube-system kube-apiserver-<master>
kubectl logs -n kube-system etcd-<master>
kubectl logs -n kube-system kube-scheduler-<master>
kubectl logs -n kube-system kube-controller-manager-<master>

# Node component debugging
sudo systemctl status kubelet
sudo journalctl -u kubelet -f
sudo crictl ps
sudo crictl logs <container-id>

# etcd operations
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health

🎯 Common Exam Scenarios

Scenario 1: Debug Control Plane Component

# Check component status
kubectl get pods -n kube-system
kubectl describe pod <component-pod> -n kube-system

# Check component logs
kubectl logs <component-pod> -n kube-system

# Check configuration files
sudo cat /etc/kubernetes/manifests/<component>.yaml

Scenario 2: Worker Node Issues

# Check node status
kubectl get nodes
kubectl describe node <node-name>

# Check kubelet
sudo systemctl status kubelet
sudo journalctl -u kubelet | tail -50

# Check container runtime
sudo crictl version
sudo systemctl status containerd

🚨 Critical Gotchas

etcd Backup: Always backup etcd before cluster operations
Certificate Expiration: Monitor certificate expiration dates
Resource Limits: Set appropriate resource limits for components
Network Connectivity: Ensure components can communicate
Leader Election: Only one active controller-manager/scheduler
Static Pod Path: Control plane pods in /etc/kubernetes/manifests/
Service Account Tokens: Required for component authentication

WHY This Matters - The Deeper Philosophy

Distributed Systems Architecture Principles

The Microservices Architecture Applied to Infrastructure:

Traditional Monolithic Infrastructure:
├── Single management server
├── Centralized state and decision making
├── Single point of failure
├── Difficult to scale individual functions
└── Vendor lock-in and limited flexibility

Kubernetes Distributed Architecture:
├── Multiple independent, coordinating components
├── Shared state through consensus
├── No single point of failure
├── Independent scaling and optimization
└── Vendor-neutral and extensible

This architectural choice enables:
├── Operational resilience through redundancy
├── Horizontal scaling of control plane functions
├── Independent component evolution and updates
├── Clear separation of concerns for debugging
└── Ecosystem extensibility through well-defined APIs

The Control Theory Model:

Kubernetes implements classical control system principles:

Feedback Control Loop:
Desired State → Controller → System Action → Actual State → Feedback
     ↑                                                        │
     └────────── Error Correction (Reconciliation) ←─────────┘

This model provides:
├── Self-healing: Automatic correction of system drift
├── Declarative operation: Specify what, not how
├── Idempotent actions: Safe to retry operations
├── Eventual consistency: System converges to desired state
└── Distributed coordination: Multiple controllers working together

Production Engineering Implications

The Operational Excellence Model:

Component Isolation Benefits:
├── Fault isolation: Component failure doesn't cascade
├── Independent scaling: Scale components based on load
├── Rolling updates: Update components without full outage
├── Resource optimization: Tune resources per component
├── Debugging simplicity: Clear component boundaries
└── Team ownership: Different teams can own different components

High Availability Design:
├── No single points of failure in any component
├── Leader election for stateful components
├── Consensus algorithms for consistent state
├── Health monitoring and automatic failover
└── Geographic distribution for disaster recovery

The Economic Model of Distributed Infrastructure:

Cost Structure:
├── Infrastructure costs: Hardware and cloud resources
├── Operational costs: Monitoring, maintenance, support
├── Reliability costs: Redundancy and disaster recovery
├── Performance costs: Over-provisioning for peak load
└── Innovation costs: Training and tool development

ROI of Kubernetes Architecture:
├── Reduced operational overhead through automation
├── Improved resource utilization through efficient scheduling
├── Lower downtime costs through self-healing
├── Faster feature delivery through standardized platforms
└── Reduced vendor lock-in through portable abstractions

Career Development Implications

For the Exam:
- Component Understanding: Know what each component does and how they interact
- Debugging Skills: Systematically troubleshoot component failures
- Configuration Knowledge: Understand key configuration parameters
- Architecture Thinking: Understand how components work together

For Production Systems:
- High Availability: Design and operate resilient control planes
- Performance Optimization: Tune components for scale and efficiency
- Security: Secure component communication and access
- Monitoring: Implement comprehensive observability

For Your Career:
- Systems Architecture: Understand distributed systems design patterns
- Platform Engineering: Build and operate infrastructure platforms
- Site Reliability: Ensure system reliability through proper design
- Technical Leadership: Guide teams in adopting cloud-native architectures

Understanding Kubernetes components deeply teaches you how distributed systems actually work in practice. This knowledge is fundamental to the CKA exam and essential for anyone designing, operating, or debugging complex distributed infrastructure.

These components represent the state of the art in distributed systems engineering - understanding them gives you insight into how to build resilient, scalable systems that can handle real-world production demands while maintaining simplicity and operational excellence.

Last updated: 2025-08-26 20:00 UTC