
CKA Study Guide: Application Scaling in Kubernetes

category: Kubernetes Certification
tags: cka, kubernetes, exam, kubectl, certification

The Economics of Scale: Why Application Scaling Matters

Application scaling in Kubernetes isn't just about handling more traffic—it's about optimizing resource utilization, cost efficiency, and user experience. Poor scaling strategies can lead to over-provisioning (wasted money) or under-provisioning (poor performance and lost revenue).

Understanding Scaling Trade-offs

Resource Efficiency vs Responsiveness:
- Aggressive scaling: Quick response to load changes, higher resource costs
- Conservative scaling: Lower resource costs, potential performance degradation during spikes
- Predictive scaling: Optimal efficiency but requires understanding traffic patterns

Horizontal vs Vertical Scaling:
- Horizontal (scale out): Add more instances, better fault tolerance, Kubernetes-native
- Vertical (scale up): Increase instance resources, simpler but limited and requires restarts

Cost Implications:
- Under-scaling: Lost revenue from poor performance (e.g., 100ms latency increase = 1% revenue loss for e-commerce)
- Over-scaling: Wasted infrastructure costs (typical organizations waste 30-50% of cloud resources)
- Scaling delays: Slow scaling reactions can cascade into system-wide outages

Kubernetes Scaling Architecture

Kubernetes provides multiple scaling mechanisms that work together:

┌─────────────────────────────────────────────────────────────┐
│                     Cluster Autoscaler                      │
│                   (scales nodes)                           │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────┴───────────────────────────────────────┐
│              Horizontal Pod Autoscaler (HPA)               │
│                 (scales pod replicas)                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────┴───────────────────────────────────────┐
│              Vertical Pod Autoscaler (VPA)                 │
│                (scales pod resources)                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────┴───────────────────────────────────────┐
│                   Application                              │
│                (deployments/pods)                          │
└─────────────────────────────────────────────────────────────┘

This hierarchical approach allows optimization at multiple levels simultaneously.


Manual Scaling Operations

Basic Scaling Commands

Immediate Scaling:

# Scale deployment to specific replica count
kubectl scale deployment web-app --replicas=5

# Scale multiple deployments
kubectl scale deployment web-app api-server --replicas=3

# Scale by label selector
kubectl scale deployments -l app=frontend --replicas=4

# Scale a ReplicaSet directly (not recommended — if a Deployment owns it, the change is reverted)
kubectl scale rs web-app-5d4f8c7b9 --replicas=2

# Scale StatefulSet
kubectl scale statefulset database --replicas=3

Current Scaling Information:

# Check current replica status
kubectl get deployments
kubectl get deployment web-app -o wide

# Detailed scaling information
kubectl describe deployment web-app | grep -A5 "Replicas:"

# Historical scaling events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i scale

Understanding Scaling Behavior

How Kubernetes Handles Scaling:
1. Scale Up: Creates new pods immediately, subject to resource availability
2. Scale Down: Respects graceful termination periods; evictions (node drains, autoscaler node removal) additionally respect PodDisruptionBudgets (see the PDB sketch below)
3. Rolling Updates: Scaling can happen simultaneously with deployments
4. Resource Constraints: Scaling blocked if insufficient cluster resources
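
A PodDisruptionBudget is the main guardrail on point 2 above. A minimal sketch, assuming the web-app pods carry the label app=web-app:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2          # voluntary disruptions may never drop below 2 ready pods
  selector:
    matchLabels:
      app: web-app

Note that PDBs constrain evictions (kubectl drain, Cluster Autoscaler node removal, VPA-triggered restarts); replica-count changes made by kubectl scale or the HPA delete pods directly and are not blocked by a PDB.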

Scaling Events and Timeline:

# Monitor scaling in real-time
kubectl get pods -l app=web-app --watch

# Check resource allocation during scaling
kubectl top nodes
kubectl top pods -l app=web-app

# Verify scaling completion
kubectl rollout status deployment/web-app

Scale-Down Behavior and Pod Selection:

# When a ReplicaSet scales down, pods are preferred for deletion in roughly this order:
# 1. Pods that are unscheduled/Pending
# 2. Pods with a lower controller.kubernetes.io/pod-deletion-cost annotation value
# 3. Pods on nodes running more replicas of the same workload
# 4. Pods that have been Ready for less time
# 5. Pods with more container restarts
# 6. More recently created pods
# (Priority classes affect scheduling and preemption, not ReplicaSet scale-down order.)

# Influence scale-down order with the pod-deletion-cost annotation
apiVersion: v1
kind: Pod
metadata:
  name: protected-pod
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "100"  # higher cost = deleted later
spec:
  containers:
  - name: app
    image: nginx:1.21

Horizontal Pod Autoscaler (HPA)

HPA Fundamentals and Architecture

How HPA Works:
1. Metrics Collection: HPA controller queries the metrics API (served by metrics-server) every 15 seconds (verification commands below)
2. Target Calculation: Compares current metrics to target values
3. Scaling Decision: Calculates desired replica count using algorithms
4. Scale Execution: Updates deployment replica count if change needed
5. Stabilization: Waits for cooldown periods to prevent thrashing
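
Step 1 above only works if the resource metrics API is actually being served. Quick checks, assuming metrics-server is installed in kube-system (the usual default):

# Is the metrics API registered and available?
kubectl get apiservices v1beta1.metrics.k8s.io

# Is metrics-server running?
kubectl -n kube-system get deployment metrics-server

# Can the API return data?
kubectl top nodes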

HPA Controller Algorithm:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]

Example:
- Current replicas: 3
- Current CPU utilization: 80%
- Target CPU utilization: 50%
- Desired replicas = ceil[3 * (80/50)] = ceil[4.8] = 5
- Note: scaling is skipped when the current/target ratio falls within the tolerance (default 0.1), i.e. between 0.9 and 1.1

Basic HPA Configuration

CPU-Based HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 minutes
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60   # 1 minute
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
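
For the exam, the imperative shortcut is worth memorizing; it creates an equivalent CPU-based HPA (without the custom behavior block), which you can then inspect or edit:

kubectl autoscale deployment web-app --min=2 --max=10 --cpu-percent=70

# The generated HPA is named after the deployment
kubectl get hpa web-app -o yaml
kubectl edit hpa web-app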

Memory-Based HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: memory-intensive-app
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Advanced HPA Metrics

Custom Metrics HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60

  # Custom application metric (requests per second)
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

  # External metric (SQS queue length)
  - type: External
    external:
      metric:
        name: sqs_queue_length
        selector:
          matchLabels:
            queue: "processing-queue"
      target:
        type: Value
        value: "100"

Object Metrics HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: object-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 2
  maxReplicas: 15
  metrics:
  # Scale based on ingress requests per second
  - type: Object
    object:
      metric:
        name: requests_per_second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: web-frontend-ingress
      target:
        type: Value
        value: "10k"

HPA Behavior and Tuning

Understanding HPA Behavior Settings:

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      # Don't scale down more than 50% of current replicas in 1 minute
      - type: Percent
        value: 50
        periodSeconds: 60
      # Don't scale down more than 2 pods in 1 minute
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Min  # Use the most restrictive policy

    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      # Allow doubling replicas in 1 minute
      - type: Percent
        value: 100
        periodSeconds: 60
      # Allow adding up to 4 pods in 1 minute
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Max  # Use the most aggressive policy

HPA Tuning Parameters:

- --horizontal-pod-autoscaler-sync-period (default 15s): how often the HPA controller evaluates metrics; decrease for faster reactions
- --horizontal-pod-autoscaler-downscale-stabilization (default 5m): how long the controller waits before scaling down; increase for stable workloads
- --horizontal-pod-autoscaler-tolerance (default 0.1): deviation from the target ratio that is ignored; increase to reduce scaling frequency
- The legacy --horizontal-pod-autoscaler-upscale-delay / --horizontal-pod-autoscaler-downscale-delay flags are deprecated; per-HPA behavior settings (autoscaling/v2) replace them

These are kube-controller-manager flags, so on managed clusters you typically tune the per-HPA behavior block instead.

HPA Status and Monitoring:

# Check HPA status
kubectl get hpa
kubectl describe hpa web-app-hpa

# Monitor HPA events
kubectl get events --sort-by=.metadata.creationTimestamp | grep HorizontalPodAutoscaler

# Debug HPA issues
kubectl logs -n kube-system deployment/metrics-server
kubectl top pods -l app=web-app

# Manual HPA testing
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://web-app-service; done"

Vertical Pod Autoscaler (VPA)

VPA Concepts and Components

VPA Architecture:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  VPA Recommender │    │  VPA Updater    │    │  VPA Admission  │
│  (analyzes      │    │  (evicts pods   │    │  Controller     │
│   resource      │    │   for resize)   │    │  (sets new      │
│   usage)        │    │                 │    │   requests)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                        │                        │
         └────────────────────────┼────────────────────────┘
                                  │
                         ┌─────────────────┐
                         │  Metrics Server │
                         │  (resource      │
                         │   usage data)   │
                         └─────────────────┘

VPA Modes:
- "Off": Only generates recommendations, never modifies pods
- "Initial": Sets resource requests only when pods are created, never afterwards
- "Recreate": Evicts running pods so updated requests are applied when they are recreated
- "Auto": Currently equivalent to Recreate; may use in-place updates once available

VPA Configuration

Basic VPA Setup:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]

VPA with Resource Bounds:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: bounded-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 200m
        memory: 256Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledValues: RequestsAndLimits  # RequestsOnly, RequestsAndLimits
      mode: Auto  # Auto, Off

VPA Recommendation Only Mode:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: recommendation-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: database
  updatePolicy:
    updateMode: "Off"  # Only generate recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: postgres
      controlledResources: ["cpu", "memory"]

VPA Monitoring and Recommendations

Viewing VPA Recommendations:

# Get VPA status and recommendations
kubectl get vpa
kubectl describe vpa web-app-vpa

# View detailed recommendations
kubectl get vpa web-app-vpa -o yaml

# Example output interpretation:
# recommendation:
#   containerRecommendations:
#   - containerName: web
#     lowerBound:     # Minimum recommended values
#       cpu: 50m
#       memory: 100Mi
#     target:         # Optimal recommended values
#       cpu: 100m
#       memory: 200Mi
#     upperBound:     # Maximum recommended values; requests above this are likely over-provisioned
#       cpu: 200m
#       memory: 400Mi
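
To pull just the target recommendation out of the status (handy for scripting), a jsonpath sketch against the web-app-vpa object above:

kubectl get vpa web-app-vpa \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
# Returns the target map for the first container, matching the recommendation shown above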

VPA Events and Troubleshooting:

# Monitor VPA events
kubectl get events --sort-by=.metadata.creationTimestamp | grep VerticalPodAutoscaler

# Check VPA controller logs
kubectl logs -n kube-system deployment/vpa-recommender
kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller

# Debug pod evictions
kubectl get events --field-selector reason=Evicted

Cluster Autoscaler

Cluster Autoscaler Fundamentals

How Cluster Autoscaler Works:
1. Pod Scheduling Monitoring: Watches for pods that can't be scheduled due to resource constraints
2. Node Group Evaluation: Determines which node groups can accommodate pending pods
3. Scale-Up Decision: Adds nodes to node groups when pods are unschedulable
4. Scale-Down Decision: Removes underutilized nodes when resources are wasted
5. Cloud Provider Integration: Communicates with cloud APIs to manage node lifecycle
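
A simple way to exercise the scale-up path in a test cluster is to create pods whose aggregate requests cannot fit on the existing nodes and watch the autoscaler react (the deployment name and request sizes here are purely illustrative):

# Create a workload whose total requests exceed current capacity
kubectl create deployment scale-test --image=nginx --replicas=0
kubectl set resources deployment scale-test --requests=cpu=1,memory=1Gi
kubectl scale deployment scale-test --replicas=20

# Pending pods are the trigger; watch them and the autoscaler's decisions
kubectl get pods -l app=scale-test --watch
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=50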

Scale-Up Triggers:
- Pods in Pending state due to insufficient resources
- Pods with resource requests that don't fit on existing nodes
- Pods with specific node affinity/anti-affinity rules

Scale-Down Triggers:
- Node utilization below threshold (default 50%) for scale-down delay period
- All pods on node can be rescheduled elsewhere
- No pods with local storage or specific node requirements
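
When a node you expect to be removed stays around, the autoscaler's status ConfigMap (written to kube-system by default) usually explains why, listing node group health and the reasons nodes are considered unremovable:

kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml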

Cluster Autoscaler Configuration

AWS Cluster Autoscaler Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --max-node-provision-time=15m

Node Annotations for Scaling Control:

# Prevent a specific node from being scaled down
kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Utilization threshold and scale-down parallelism are cluster-wide autoscaler flags,
# not per-node annotations (flag names can vary between autoscaler versions):
#   --scale-down-utilization-threshold=0.5
#   --max-scale-down-parallelism=10

Pod Annotations for Scaling Behavior:

apiVersion: v1
kind: Pod
metadata:
  name: important-pod
  annotations:
    # Prevent this pod from triggering scale-down of its node
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
spec:
  containers:
  - name: app
    image: my-app:1.0
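
The same annotation can be applied to a running pod without editing its manifest (for pods owned by a Deployment, also add it to the pod template so replacements keep it):

kubectl annotate pod important-pod \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false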

Cluster Autoscaler Tuning

Key Configuration Parameters:

# Scale-up parameters
--max-node-provision-time=15m     # Max time to wait for a new node to become ready
--new-pod-scale-up-delay=0s       # Ignore pods younger than this when deciding to scale up
# (Scaling a node group up from zero is automatic when its minimum size is 0; some cloud
#  providers require resource tags on the group so capacity can be estimated.)

# Scale-down parameters
--scale-down-delay-after-add=10m      # Wait time after scale-up before considering scale-down
--scale-down-delay-after-delete=10s   # Wait time after node deletion
--scale-down-delay-after-failure=3m   # Wait time after failed scale-down
--scale-down-unneeded-time=10m        # How long node should be unneeded before removal
--scale-down-utilization-threshold=0.5 # Node utilization threshold for scale-down

# Expander strategies
--expander=least-waste           # Choose node group that wastes least resources
--expander=most-pods            # Choose node group that can schedule most pending pods
--expander=priority             # Use priority-based node group selection
--expander=random               # Random selection among valid node groups

Monitoring Cluster Autoscaler:

# Check cluster autoscaler status
kubectl get pods -n kube-system -l app=cluster-autoscaler
kubectl logs -n kube-system deployment/cluster-autoscaler

# Monitor scaling events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i "cluster-autoscaler"

# Check node status and capacity
kubectl get nodes -o wide
kubectl describe nodes | grep -A5 "Allocatable:"

# View pending pods (triggers for scale-up)
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

Multi-Dimensional Scaling Strategies

Combining HPA, VPA, and Cluster Autoscaler

Coordinated Scaling Architecture:

# 1. VPA for right-sizing containers
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web
      maxAllowed:
        cpu: 2
        memory: 4Gi

---
# 2. HPA for horizontal scaling based on load
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

---
# 3. Cluster Autoscaler handles node scaling automatically
# (configured at cluster level)

Scaling Interaction Patterns:
1. VPA optimizes resource requests → More efficient pod packing
2. HPA scales pod replicas → Increased resource demand
3. Cluster Autoscaler adds nodes → Provides capacity for new pods
4. Cluster Autoscaler removes nodes → Optimizes costs during low load
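
A quick way to see all three layers at once (the vpa short name assumes the VPA CRDs are installed):

# Replica-level and resource-level autoscalers, plus node capacity
kubectl get hpa,vpa -A
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory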

Advanced Scaling Patterns

Predictive Scaling with CronJobs:

# Pre-scale for known traffic patterns
apiVersion: batch/v1
kind: CronJob
metadata:
  name: morning-scale-up
spec:
  schedule: "0 8 * * 1-5"  # 8 AM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment web-app --replicas=10
              kubectl scale deployment api-server --replicas=8
          restartPolicy: OnFailure

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: evening-scale-down
spec:
  schedule: "0 20 * * 1-5"  # 8 PM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment web-app --replicas=3
              kubectl scale deployment api-server --replicas=2
          restartPolicy: OnFailure
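
As written, these Jobs run under the namespace's default service account, which cannot scale deployments. They need RBAC plus serviceAccountName: deployment-scaler in each job template's pod spec; a minimal sketch (all names are illustrative):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-scaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: deployment-scaler
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io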

Multi-Tier Scaling Strategy:

# Frontend tier - aggressive scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Aggressive scaling
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30

---
# Backend tier - conservative scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Conservative scaling
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60

Scaling Troubleshooting and Optimization

Common Scaling Issues

HPA Not Scaling:

# Debug HPA issues
kubectl describe hpa web-app-hpa

# Common issues and solutions:
# 1. Missing metrics server
kubectl get deployment metrics-server -n kube-system

# 2. Pods without resource requests
kubectl get deployment web-app -o yaml | grep -A10 resources:

# 3. Incorrect metric values
kubectl top pods -l app=web-app

# 4. HPA controller issues (the HPA controller runs inside kube-controller-manager)
kubectl logs -n kube-system -l component=kube-controller-manager --tail=200 | grep -i horizontal

Cluster Autoscaler Not Scaling:

# Debug cluster autoscaler
kubectl logs -n kube-system deployment/cluster-autoscaler

# Check for common issues:
# 1. Pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# 2. Node group configuration
kubectl describe nodes | grep -E "(Taints|Labels)" | grep autoscaler

# 3. Pod disruption budgets blocking scale-down
kubectl get pdb --all-namespaces

# 4. Pods preventing node drainage
kubectl get pods --all-namespaces -o wide | grep <node-name>

Resource Contention During Scaling:

# Monitor resource usage during scaling
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory

# Check for resource limits preventing scaling
kubectl describe nodes | grep -A5 "Allocated resources:"

# Identify pods consuming excessive resources
kubectl get pods --all-namespaces -o custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,CPU:.spec.containers[*].resources.requests.cpu,MEMORY:.spec.containers[*].resources.requests.memory

Performance Optimization

Scaling Performance Metrics:

#!/bin/bash
# scaling-performance-test.sh

DEPLOYMENT="web-app"
TARGET_REPLICAS=20
START_TIME=$(date +%s)

echo "Starting scaling performance test..."
echo "Target: $TARGET_REPLICAS replicas for $DEPLOYMENT"

# Trigger scaling
kubectl scale deployment $DEPLOYMENT --replicas=$TARGET_REPLICAS

# Monitor scaling progress
while true; do
    CURRENT_REPLICAS=$(kubectl get deployment $DEPLOYMENT -o jsonpath='{.status.readyReplicas}')
    CURRENT_TIME=$(date +%s)
    ELAPSED=$((CURRENT_TIME - START_TIME))

    echo "Time: ${ELAPSED}s, Ready replicas: ${CURRENT_REPLICAS:-0}/$TARGET_REPLICAS"

    if [ "${CURRENT_REPLICAS:-0}" -eq "$TARGET_REPLICAS" ]; then
        echo "Scaling completed in ${ELAPSED} seconds"
        break
    fi

    if [ "$ELAPSED" -gt 300 ]; then
        echo "Scaling timed out after 5 minutes"
        exit 1
    fi

    sleep 5
done

# Verify all pods are ready
kubectl wait --for=condition=ready pod -l app=$DEPLOYMENT --timeout=300s
echo "All pods are ready"

Scaling Latency Optimization:

# Optimize pod startup time for faster scaling
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fast-scaling-app
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: app
        image: my-app:1.0
        # Optimize resource requests for faster scheduling
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        # Fast readiness probe for quicker traffic serving
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
        # Fast liveness probe
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
      # Use init containers for pre-warming
      initContainers:
      - name: cache-warmer
        image: cache-warmer:1.0
        command: ['warm-cache.sh']

Cost Optimization and Resource Management

Right-Sizing Strategies

Resource Utilization Analysis:

#!/bin/bash
# resource-utilization-report.sh

echo "=== Resource Utilization Report ==="

# Node utilization
echo "Node Resource Utilization:"
kubectl top nodes

# Pod resource requests (actual usage comes from the metrics API, not pod status)
echo -e "\nPod Resource Requests:"
kubectl get pods --all-namespaces -o custom-columns=\
"NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CPU_REQUEST:.spec.containers[*].resources.requests.cpu,\
MEMORY_REQUEST:.spec.containers[*].resources.requests.memory"

echo -e "\nPod Resource Usage:"
kubectl top pods --all-namespaces --containers

# Flag containers with very low CPU usage (right-sizing candidates; compare against
# their requests in the pod spec to confirm over-provisioning)
echo -e "\nContainers using less than 10m CPU:"
kubectl top pods --all-namespaces --containers --no-headers | awk '
{
    cpu = $4 + 0                    # strip the "m" suffix, e.g. "5m" -> 5
    if ($4 ~ /m$/ && cpu < 10) {
        print $1 "/" $2 "/" $3 ": " $4
    }
}'

# Cluster-wide allocation summary (requests/limits per node)
echo -e "\nCluster Resource Summary:"
kubectl describe nodes | grep -A8 "Allocated resources:" | grep -E "^\s+(cpu|memory)\s"

Cost-Aware Scaling Policies:

# Cost-optimized HPA with conservative scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cost-optimized-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # Higher threshold to reduce over-provisioning
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # 10 minutes - slow scale-down
      policies:
      - type: Percent
        value: 25
        periodSeconds: 120
    scaleUp:
      stabilizationWindowSeconds: 120  # 2 minutes - moderate scale-up
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60

Spot Instance Integration

Spot Instance Node Pools for Scaling:

# Node affinity for spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-tolerant-app
spec:
  replicas: 5
  template:
    spec:
      # Tolerate spot instance interruptions
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

      # Prefer spot instances but allow regular nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["spot"]

      containers:
      - name: app
        image: resilient-app:1.0
        # App must handle graceful shutdowns
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 30"]

Spot Instance Drain Handling:

# DaemonSet to handle spot instance termination
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spot-termination-handler
spec:
  selector:
    matchLabels:
      app: spot-termination-handler
  template:
    metadata:
      labels:
        app: spot-termination-handler
    spec:
      tolerations:
      - key: "spot"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: spot-handler
        image: spot-termination-handler:1.0
        command:
        - /bin/sh
        - -c
        - |
          while true; do
            # Check for a spot termination notice (-f makes a 404 response count as failure)
            if curl -sf http://169.254.169.254/latest/meta-data/spot/instance-action >/dev/null 2>&1; then
              echo "Spot termination notice received, draining node..."
              kubectl drain $(hostname) --ignore-daemonsets --delete-emptydir-data --force --grace-period=30
              break
            fi
            sleep 5
          done
        securityContext:
          privileged: true
        volumeMounts:
        - name: kubectl
          mountPath: /usr/local/bin/kubectl
      volumes:
      - name: kubectl
        hostPath:
          path: /usr/local/bin/kubectl

Production Best Practices and Patterns

Scaling SLAs and Monitoring

Scaling Performance SLAs:

# ServiceLevelObjective for scaling performance
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scaling-slo
spec:
  groups:
  - name: scaling-performance
    rules:
    - alert: SlowScaleUp
      expr: |
        (
          increase(kube_deployment_status_replicas[5m]) > 0
        ) and (
          (kube_deployment_status_replicas - kube_deployment_status_replicas_ready) > 5
        )
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Deployment scaling is slow"
        description: "Deployment {{ $labels.deployment }} has been scaling for more than 2 minutes"

    - alert: HPANotScaling
      expr: |
        (
          kube_horizontalpodautoscaler_status_desired_replicas != kube_horizontalpodautoscaler_status_current_replicas
        ) and (
          changes(kube_horizontalpodautoscaler_status_current_replicas[10m]) == 0
        )
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA not responding to scaling needs"
        description: "HPA {{ $labels.horizontalpodautoscaler }} has desired {{ $value }} replicas but hasn't scaled in 10 minutes"

Scaling Metrics Dashboard:

# Grafana dashboard queries for scaling metrics
scaling_responsiveness: |
  rate(kube_deployment_status_replicas[5m])

scaling_efficiency: |
  (
    sum by (deployment) (kube_deployment_status_replicas_ready) / 
    sum by (deployment) (kube_deployment_status_replicas)
  ) * 100

resource_utilization: |
  (
    sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) / 
    sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
  ) * 100

scaling_events: |
  changes(kube_horizontalpodautoscaler_status_current_replicas[5m])

Disaster Recovery and Scaling

Multi-Region Scaling Strategy:

# Primary region deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-primary
  labels:
    region: primary
spec:
  replicas: 10
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: ["us-west-2"]

---
# Disaster recovery region deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-dr
  labels:
    region: dr
spec:
  replicas: 2  # Minimal capacity in DR region
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: ["us-east-1"]

Emergency Scaling Procedures:

#!/bin/bash
# emergency-scale.sh

INCIDENT_TYPE="$1"  # traffic-spike, node-failure, region-outage

case "$INCIDENT_TYPE" in
  "traffic-spike")
    echo "Executing emergency scale-up for traffic spike..."
    kubectl scale deployment web-app --replicas=50
    kubectl scale deployment api-server --replicas=20
    kubectl patch hpa web-app-hpa -p '{"spec":{"maxReplicas":100}}'
    ;;

  "node-failure")
    echo "Responding to node failure..."
    # Force rescheduling of affected pods
    kubectl get pods --field-selector=status.phase=Pending -o name | xargs kubectl delete
    # Temporarily increase replica count to compensate
    kubectl scale deployment web-app --replicas=15
    ;;

  "region-outage")
    echo "Activating disaster recovery scaling..."
    kubectl scale deployment web-app-dr --replicas=10
    # Update ingress to route traffic to DR region
    kubectl patch ingress web-app-ingress -p '{"spec":{"rules":[{"host":"app.example.com","http":{"paths":[{"path":"/","pathType":"Prefix","backend":{"service":{"name":"web-app-dr-service","port":{"number":80}}}}]}}]}}'
    ;;

  *)
    echo "Unknown incident type. Available types: traffic-spike, node-failure, region-outage"
    exit 1
    ;;
esac

echo "Emergency scaling procedure completed"

Exam Tips

Essential Commands to Master

# Manual scaling
kubectl scale deployment app --replicas=5
kubectl scale deployment app --replicas=10 --timeout=300s

# HPA management
kubectl autoscale deployment app --min=2 --max=10 --cpu-percent=70
kubectl get hpa
kubectl describe hpa app-hpa

# Monitoring scaling
kubectl get deployments -w
kubectl top pods -l app=web-app
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i scale

Key Concepts for Exam

  • Manual scaling is immediate but requires active management
  • HPA requires metrics server and resource requests on containers
  • VPA and HPA should not target the same resource (CPU/memory) simultaneously
  • Cluster Autoscaler works at node level, triggered by unschedulable pods
  • PodDisruptionBudgets can prevent the Cluster Autoscaler from draining and removing nodes

Common Exam Scenarios

  1. Manually scale a deployment to specific replica count
  2. Create HPA for deployment based on CPU utilization
  3. Troubleshoot HPA that's not scaling (missing metrics, no resource requests)
  4. Configure HPA with custom scaling behavior (min/max replicas, scale policies)
  5. Debug why cluster autoscaler isn't adding nodes

Time-Saving Shortcuts

# Quick HPA creation
kubectl autoscale deploy app --min=2 --max=10 --cpu-percent=70

# Fast scaling check
kubectl get deploy,hpa,pods -l app=myapp

# Quick resource verification
kubectl top pods -l app=myapp
kubectl describe deploy app | grep -A5 "Replicas:"

# Monitor scaling
kubectl get pods -l app=myapp --watch

Critical Details to Remember

  • HPA requires resource requests on containers to function
  • HPA default sync period is 15 seconds for metric collection
  • VPA in "Auto" mode will recreate pods to apply new resource requests
  • Cluster Autoscaler won't remove nodes with pods that have local storage
  • PodDisruptionBudgets block evictions (drains, autoscaler node removal), not replica-count changes
  • Default HPA behavior includes stabilization windows to prevent thrashing
  • Use kubectl wait to wait for scaling operations to complete
  • HPA metrics are averaged across all pods in the deployment
Last updated: 2025-08-26 20:00 UTC