CKA Study Guide: Application Scaling in Kubernetes
category: Kubernetes Certification
tags: cka, kubernetes, exam, kubectl, certification
The Economics of Scale: Why Application Scaling Matters
Application scaling in Kubernetes isn't just about handling more traffic—it's about optimizing resource utilization, cost efficiency, and user experience. Poor scaling strategies can lead to over-provisioning (wasted money) or under-provisioning (poor performance and lost revenue).
Understanding Scaling Trade-offs
Resource Efficiency vs Responsiveness:
- Aggressive scaling: Quick response to load changes, higher resource costs
- Conservative scaling: Lower resource costs, potential performance degradation during spikes
- Predictive scaling: Optimal efficiency but requires understanding traffic patterns
Horizontal vs Vertical Scaling:
- Horizontal (scale out): Add more instances, better fault tolerance, Kubernetes-native
- Vertical (scale up): Increase instance resources, simpler but limited and requires restarts
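In kubectl terms the two directions look like this (a sketch; web-app and the container name web are placeholders):
# Horizontal: add replicas
kubectl scale deployment web-app --replicas=5
# Vertical: raise the container's resources (rolls pods to apply the change)
kubectl set resources deployment web-app -c=web --requests=cpu=500m,memory=512Mi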
Cost Implications:
- Under-scaling: Lost revenue from poor performance (e.g., 100ms latency increase = 1% revenue loss for e-commerce)
- Over-scaling: Wasted infrastructure costs (typical organizations waste 30-50% of cloud resources)
- Scaling delays: Slow scaling reactions can cascade into system-wide outages
Kubernetes Scaling Architecture
Kubernetes provides multiple scaling mechanisms that work together:
┌───────────────────────────────────────────┐
│            Cluster Autoscaler             │
│              (scales nodes)               │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────┴─────────────────────┐
│      Horizontal Pod Autoscaler (HPA)      │
│           (scales pod replicas)           │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────┴─────────────────────┐
│       Vertical Pod Autoscaler (VPA)       │
│          (scales pod resources)           │
└─────────────────────┬─────────────────────┘
                      │
┌─────────────────────┴─────────────────────┐
│                Application                │
│            (deployments/pods)             │
└───────────────────────────────────────────┘
This hierarchical approach allows optimization at multiple levels simultaneously.
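A quick way to check which of these layers are active in a given cluster (VPA and the Cluster Autoscaler only appear if they have been installed):
kubectl get hpa --all-namespaces
kubectl get vpa --all-namespaces 2>/dev/null || echo "VPA not installed"
kubectl get deployment -n kube-system cluster-autoscaler 2>/dev/null || echo "Cluster Autoscaler not found"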
Manual Scaling Operations
Basic Scaling Commands
Immediate Scaling:
# Scale deployment to specific replica count
kubectl scale deployment web-app --replicas=5
# Scale multiple deployments
kubectl scale deployment web-app api-server --replicas=3
# Scale by label selector
kubectl scale deployments -l app=frontend --replicas=4
# Scale a ReplicaSet directly (not recommended: the owning Deployment will revert it)
kubectl scale rs web-app-5d4f8c7b9 --replicas=2
# Scale StatefulSet
kubectl scale statefulset database --replicas=3
Current Scaling Information:
# Check current replica status
kubectl get deployments
kubectl get deployment web-app -o wide
# Detailed scaling information
kubectl describe deployment web-app | grep -A5 "Replicas:"
# Historical scaling events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i scale
Understanding Scaling Behavior
How Kubernetes Handles Scaling:
1. Scale Up: Creates new pods immediately, subject to resource availability
2. Scale Down: Respects graceful termination periods; note that PodDisruptionBudgets guard evictions (drains, autoscaler actions), not replica reduction itself (see the PDB sketch below)
3. Rolling Updates: Scaling can happen simultaneously with deployments
4. Resource Constraints: Scaling blocked if insufficient cluster resources
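For reference, a PodDisruptionBudget that protects capacity during drains and autoscaler-driven evictions looks like this (a sketch; names are illustrative):
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2  # keep at least 2 pods up through voluntary disruptions
  selector:
    matchLabels:
      app: web-app
EOF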
Scaling Events and Timeline:
# Monitor scaling in real-time
kubectl get pods -l app=web-app --watch
# Check resource allocation during scaling
kubectl top nodes
kubectl top pods -l app=web-app
# Verify scaling completion
kubectl rollout status deployment/web-app
Scale-Down Behavior and Pod Selection:
# When a ReplicaSet scales down, the controller prefers to delete:
# 1. Pods that are unscheduled (Pending)
# 2. Pods that are not yet Ready
# 3. Pods with a lower controller.kubernetes.io/pod-deletion-cost annotation
# 4. Pods on nodes that run more replicas of the same ReplicaSet
# 5. More recently created pods (older pods are kept)
# Influence termination order with the pod-deletion-cost annotation
apiVersion: v1
kind: Pod
metadata:
  name: keep-longer-pod
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "100"  # higher cost = deleted later
spec:
  containers:
  - name: app
    image: nginx:1.21
Horizontal Pod Autoscaler (HPA)
HPA Fundamentals and Architecture
How HPA Works:
1. Metrics Collection: HPA controller queries metrics server every 15 seconds
2. Target Calculation: Compares current metrics to target values
3. Scaling Decision: Calculates desired replica count using algorithms
4. Scale Execution: Updates deployment replica count if change needed
5. Stabilization: Waits for cooldown periods to prevent thrashing
HPA Controller Algorithm:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
Example:
- Current replicas: 3
- Current CPU utilization: 80%
- Target CPU utilization: 50%
- Desired replicas = ceil[3 * (80/50)] = ceil[4.8] = 5
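Note that a basic CPU-based HPA can also be created imperatively, which is the fastest route on the exam; the YAML form below gives full control:
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10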
Basic HPA Configuration
CPU-Based HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 minutes
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60   # 1 minute
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
Memory-Based HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: memory-intensive-app
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
Advanced HPA Metrics
Custom Metrics HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  # Custom application metric (requests per second)
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  # External metric (SQS queue length)
  - type: External
    external:
      metric:
        name: sqs_queue_length
        selector:
          matchLabels:
            queue: "processing-queue"
      target:
        type: Value
        value: "100"
Object Metrics HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: object-metrics-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 2
  maxReplicas: 15
  metrics:
  # Scale based on ingress requests per second
  - type: Object
    object:
      metric:
        name: requests_per_second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: web-frontend-ingress
      target:
        type: Value
        value: "10k"
HPA Behavior and Tuning
Understanding HPA Behavior Settings:
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      # Don't scale down more than 50% of current replicas in 1 minute
      - type: Percent
        value: 50
        periodSeconds: 60
      # Don't scale down more than 2 pods in 1 minute
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Min  # Use the most restrictive policy
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      # Allow doubling replicas in 1 minute
      - type: Percent
        value: 100
        periodSeconds: 60
      # Allow adding up to 4 pods in 1 minute
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Max  # Use the most aggressive policy
HPA Tuning Parameters:
| Parameter | Default | Purpose | Tuning Guidelines |
|---|---|---|---|
| --horizontal-pod-autoscaler-sync-period | 15s | How often HPA evaluates metrics | Decrease for faster response |
| --horizontal-pod-autoscaler-upscale-delay | 3m | Delay before scale-up (deprecated; use scaleUp behavior) | Increase for bursty workloads |
| --horizontal-pod-autoscaler-downscale-delay | 5m | Delay before scale-down (deprecated; use scaleDown stabilizationWindowSeconds) | Increase for stable workloads |
| --horizontal-pod-autoscaler-tolerance | 0.1 | Tolerance band around the target before scaling acts | Increase to reduce scaling frequency |
These are kube-controller-manager flags; the upscale/downscale delay flags are deprecated in favor of the per-HPA behavior field shown above.
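On kubeadm-style clusters you can inspect which of these flags are set on the controller manager (a sketch assuming the default static-pod labels):
kubectl -n kube-system get pods -l component=kube-controller-manager \
  -o jsonpath='{.items[0].spec.containers[0].command}' | tr ',' '\n' | grep horizontal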
HPA Status and Monitoring:
# Check HPA status
kubectl get hpa
kubectl describe hpa web-app-hpa
# Monitor HPA events
kubectl get events --sort-by=.metadata.creationTimestamp | grep HorizontalPodAutoscaler
# Debug HPA issues
kubectl logs -n kube-system deployment/metrics-server
kubectl top pods -l app=web-app
# Manual HPA testing
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://web-app-service; done"
Vertical Pod Autoscaler (VPA)
VPA Concepts and Components
VPA Architecture:
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ VPA Recommender │   │   VPA Updater   │   │  VPA Admission  │
│   (analyzes     │   │  (evicts pods   │   │   Controller    │
│    resource     │   │   for resize)   │   │   (sets new     │
│     usage)      │   │                 │   │    requests)    │
└────────┬────────┘   └────────┬────────┘   └────────┬────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
                      ┌─────────────────┐
                      │  Metrics Server │
                      │   (resource     │
                      │   usage data)   │
                      └─────────────────┘
VPA Modes:
- "Off": Only provides recommendations, doesn't modify pods
- "Initial": Sets resource requests only when pods are created
- "Recreate": Evicts and recreates pods to apply updated requests
- "Auto": Currently equivalent to Recreate; reserved for in-place updates once available
VPA Configuration
Basic VPA Setup:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
VPA with Resource Bounds:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: bounded-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 200m
        memory: 256Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
      controlledValues: RequestsAndLimits  # RequestsOnly, RequestsAndLimits
      mode: Auto  # Auto, Off
VPA Recommendation Only Mode:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: recommendation-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: database
  updatePolicy:
    updateMode: "Off"  # Only generate recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: postgres
      controlledResources: ["cpu", "memory"]
VPA Monitoring and Recommendations
Viewing VPA Recommendations:
# Get VPA status and recommendations
kubectl get vpa
kubectl describe vpa web-app-vpa
# View detailed recommendations
kubectl get vpa web-app-vpa -o yaml
# Example output interpretation:
# recommendation:
#   containerRecommendations:
#   - containerName: web
#     lowerBound:        # Minimum recommended values
#       cpu: 50m
#       memory: 100Mi
#     target:            # Optimal recommended values
#       cpu: 100m
#       memory: 200Mi
#     upperBound:        # Upper bound of recommendations; requests above this are likely wasted
#       cpu: 200m
#       memory: 400Mi
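For scripting, the target recommendation can be pulled directly with jsonpath (assuming the VPA above has produced a recommendation):
kubectl get vpa web-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'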
VPA Events and Troubleshooting:
# Monitor VPA events
kubectl get events --sort-by=.metadata.creationTimestamp | grep VerticalPodAutoscaler
# Check VPA controller logs
kubectl logs -n kube-system deployment/vpa-recommender
kubectl logs -n kube-system deployment/vpa-updater
kubectl logs -n kube-system deployment/vpa-admission-controller
# Debug pod evictions
kubectl get events --field-selector reason=Evicted
Cluster Autoscaler
Cluster Autoscaler Fundamentals
How Cluster Autoscaler Works:
1. Pod Scheduling Monitoring: Watches for pods that can't be scheduled due to resource constraints
2. Node Group Evaluation: Determines which node groups can accommodate pending pods
3. Scale-Up Decision: Adds nodes to node groups when pods are unschedulable
4. Scale-Down Decision: Removes underutilized nodes when resources are wasted
5. Cloud Provider Integration: Communicates with cloud APIs to manage node lifecycle
Scale-Up Triggers:
- Pods in Pending state due to insufficient resources
- Pods with resource requests that don't fit on existing nodes
- Pods with specific node affinity/anti-affinity rules
Scale-Down Triggers:
- Node utilization below threshold (default 50%) for scale-down delay period
- All pods on node can be rescheduled elsewhere
- No pods with local storage or specific node requirements
Cluster Autoscaler Configuration
AWS Cluster Autoscaler Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --max-node-provision-time=15m
Node Annotations for Scaling Control:
# Node annotation (applied to individual nodes)
metadata:
  annotations:
    # Prevent this node from being scaled down
    "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true"
# Utilization thresholds and scale-down batch sizes are not per-node
# annotations; they are cluster-wide autoscaler flags, e.g.:
#   --scale-down-utilization-threshold=0.3
#   --max-empty-bulk-delete=2
Pod Annotations for Scaling Behavior:
apiVersion: v1
kind: Pod
metadata:
  name: important-pod
  annotations:
    # Prevent Cluster Autoscaler from evicting this pod (blocks scale-down of its node)
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
spec:
  containers:
  - name: app
    image: my-app:1.0
Cluster Autoscaler Tuning
Key Configuration Parameters:
# Scale-up parameters
--max-node-provision-time=15m           # Max time to wait for a new node to become ready
--scan-interval=10s                     # How often cluster state is re-evaluated (default 10s)
# Scale-down parameters
--scale-down-delay-after-add=10m        # Wait time after scale-up before considering scale-down
--scale-down-delay-after-delete=10s     # Wait time after node deletion
--scale-down-delay-after-failure=3m     # Wait time after failed scale-down
--scale-down-unneeded-time=10m          # How long a node must be unneeded before removal
--scale-down-utilization-threshold=0.5  # Node utilization threshold for scale-down
# Expander strategies
--expander=least-waste  # Choose node group that wastes least resources
--expander=most-pods    # Choose node group that can schedule most pending pods
--expander=priority     # Use priority-based node group selection
--expander=random       # Random selection among valid node groups
Monitoring Cluster Autoscaler:
# Check cluster autoscaler status
kubectl get pods -n kube-system -l app=cluster-autoscaler
kubectl logs -n kube-system deployment/cluster-autoscaler
# Monitor scaling events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i "cluster-autoscaler"
# Check node status and capacity
kubectl get nodes -o wide
kubectl describe nodes | grep -A5 "Allocatable:"
# View pending pods (triggers for scale-up)
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
Multi-Dimensional Scaling Strategies
Combining HPA, VPA, and Cluster Autoscaler
Coordinated Scaling Architecture:
# 1. VPA for right-sizing containers
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web
      # Let the HPA below own CPU; VPA right-sizes memory only
      controlledResources: ["memory"]
      maxAllowed:
        memory: 4Gi
---
# 2. HPA for horizontal scaling based on load
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
---
# 3. Cluster Autoscaler handles node scaling automatically
# (configured at cluster level)
Scaling Interaction Patterns:
1. VPA optimizes resource requests → More efficient pod packing
2. HPA scales pod replicas → Increased resource demand
3. Cluster Autoscaler adds nodes → Provides capacity for new pods
4. Cluster Autoscaler removes nodes → Optimizes costs during low load
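One way to watch this chain react end to end during a load test (names taken from the examples above):
# HPA layer: replica decisions
kubectl get hpa app-hpa --watch
# Node layer: capacity being added or removed
kubectl get nodes --watch
# Pending pods are the signal that triggers node scale-up
kubectl get pods --field-selector=status.phase=Pending --watch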
Advanced Scaling Patterns
Predictive Scaling with CronJobs:
# Pre-scale for known traffic patterns
apiVersion: batch/v1
kind: CronJob
metadata:
  name: morning-scale-up
spec:
  schedule: "0 8 * * 1-5"  # 8 AM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment web-app --replicas=10
              kubectl scale deployment api-server --replicas=8
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: evening-scale-down
spec:
  schedule: "0 20 * * 1-5"  # 8 PM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              kubectl scale deployment web-app --replicas=3
              kubectl scale deployment api-server --replicas=2
          restartPolicy: OnFailure
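Note the scaler pods also need RBAC rights on the deployments' scale subresource, referenced from the pod specs above via serviceAccountName. A minimal sketch (all names illustrative):
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-scaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
subjects:
- kind: ServiceAccount
  name: deployment-scaler
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io
EOF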
Multi-Tier Scaling Strategy:
# Frontend tier - aggressive scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # Aggressive scaling
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
---
# Backend tier - conservative scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Conservative scaling
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
Scaling Troubleshooting and Optimization
Common Scaling Issues
HPA Not Scaling:
# Debug HPA issues
kubectl describe hpa web-app-hpa
# Common issues and solutions:
# 1. Missing metrics server
kubectl get deployment metrics-server -n kube-system
# 2. Pods without resource requests
kubectl get deployment web-app -o yaml | grep -A10 resources:
# 3. Incorrect metric values
kubectl top pods -l app=web-app
# 4. HPA controller issues (the HPA controller runs inside kube-controller-manager)
kubectl logs -n kube-system -l component=kube-controller-manager | grep -i horizontal
Cluster Autoscaler Not Scaling:
# Debug cluster autoscaler
kubectl logs -n kube-system deployment/cluster-autoscaler
# Check for common issues:
# 1. Pending pods
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
# 2. Node group configuration
kubectl describe nodes | grep -E "(Taints|Labels)" | grep autoscaler
# 3. Pod disruption budgets blocking scale-down
kubectl get pdb --all-namespaces
# 4. Pods preventing node drainage
kubectl get pods --all-namespaces -o wide | grep <node-name>
Resource Contention During Scaling:
# Monitor resource usage during scaling
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory
# Check for resource limits preventing scaling
kubectl describe nodes | grep -A5 "Allocated resources:"
# Identify pods consuming excessive resources
kubectl get pods --all-namespaces -o custom-columns=NAME:.metadata.name,NAMESPACE:.metadata.namespace,CPU:.spec.containers[*].resources.requests.cpu,MEMORY:.spec.containers[*].resources.requests.memory
Performance Optimization
Scaling Performance Metrics:
#!/bin/bash
# scaling-performance-test.sh
DEPLOYMENT="web-app"
TARGET_REPLICAS=20
START_TIME=$(date +%s)
echo "Starting scaling performance test..."
echo "Target: $TARGET_REPLICAS replicas for $DEPLOYMENT"
# Trigger scaling
kubectl scale deployment $DEPLOYMENT --replicas=$TARGET_REPLICAS
# Monitor scaling progress
while true; do
CURRENT_REPLICAS=$(kubectl get deployment $DEPLOYMENT -o jsonpath='{.status.readyReplicas}')
CURRENT_TIME=$(date +%s)
ELAPSED=$((CURRENT_TIME - START_TIME))
echo "Time: ${ELAPSED}s, Ready replicas: ${CURRENT_REPLICAS:-0}/$TARGET_REPLICAS"
if [ "${CURRENT_REPLICAS:-0}" -eq "$TARGET_REPLICAS" ]; then
echo "Scaling completed in ${ELAPSED} seconds"
break
fi
if [ "$ELAPSED" -gt 300 ]; then
echo "Scaling timed out after 5 minutes"
exit 1
fi
sleep 5
done
# Verify all pods are ready
kubectl wait --for=condition=ready pod -l app=$DEPLOYMENT --timeout=300s
echo "All pods are ready"
Scaling Latency Optimization:
# Optimize pod startup time for faster scaling
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fast-scaling-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: fast-scaling-app
  template:
    metadata:
      labels:
        app: fast-scaling-app
    spec:
      containers:
      - name: app
        image: my-app:1.0
        # Optimize resource requests for faster scheduling
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        # Fast readiness probe for quicker traffic serving
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
        # Fast liveness probe
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
      # Use init containers for pre-warming
      initContainers:
      - name: cache-warmer
        image: cache-warmer:1.0
        command: ['warm-cache.sh']
Cost Optimization and Resource Management
Right-Sizing Strategies
Resource Utilization Analysis:
#!/bin/bash
# resource-utilization-report.sh
echo "=== Resource Utilization Report ==="
# Node utilization
echo "Node Resource Utilization:"
kubectl top nodes
# Pod resource requests (actual usage lives in the metrics API, not pod status)
echo -e "\nPod Resource Requests:"
kubectl get pods --all-namespaces -o custom-columns=\
"NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
CPU_REQUEST:.spec.containers[*].resources.requests.cpu,\
MEMORY_REQUEST:.spec.containers[*].resources.requests.memory"
# Actual usage; compare with the requests above to spot over-provisioned
# pods (e.g., requests more than 2x observed usage)
echo -e "\nPod Resource Usage:"
kubectl top pods --all-namespaces --containers
# Cluster-wide resource summary
echo -e "\nCluster Resource Summary (per node):"
kubectl describe nodes | grep -A5 "Allocated resources:"
Cost-Aware Scaling Policies:
# Cost-optimized HPA with conservative scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cost-optimized-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # Higher threshold to reduce over-provisioning
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # 10 minutes - slow scale-down
      policies:
      - type: Percent
        value: 25
        periodSeconds: 120
    scaleUp:
      stabilizationWindowSeconds: 120  # 2 minutes - moderate scale-up
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
Spot Instance Integration
Spot Instance Node Pools for Scaling:
# Node affinity for spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-tolerant-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: spot-tolerant-app
  template:
    metadata:
      labels:
        app: spot-tolerant-app
    spec:
      # Tolerate spot instance interruptions
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      # Prefer spot instances but allow regular nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
              - key: "node.kubernetes.io/instance-type"
                operator: In
                values: ["spot"]
      containers:
      - name: app
        image: resilient-app:1.0
        # App must handle graceful shutdowns
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 30"]
Spot Instance Drain Handling:
# DaemonSet to handle spot instance termination
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spot-termination-handler
spec:
  selector:
    matchLabels:
      app: spot-termination-handler
  template:
    metadata:
      labels:
        app: spot-termination-handler
    spec:
      tolerations:
      - key: "spot"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: spot-handler
        image: spot-termination-handler:1.0
        command:
        - /bin/sh
        - -c
        - |
          while true; do
            # Check for a spot termination notice (-f makes curl fail on 404)
            if curl -sf http://169.254.169.254/latest/meta-data/spot/instance-action >/dev/null 2>&1; then
              echo "Spot termination notice received, draining node..."
              # Assumes the hostname matches the node name
              kubectl drain $(hostname) --ignore-daemonsets --delete-emptydir-data --force --grace-period=30
              break
            fi
            sleep 5
          done
        securityContext:
          privileged: true
        volumeMounts:
        - name: kubectl
          mountPath: /usr/local/bin/kubectl
      volumes:
      - name: kubectl
        hostPath:
          path: /usr/local/bin/kubectl
Production Best Practices and Patterns
Scaling SLAs and Monitoring
Scaling Performance SLAs:
# ServiceLevelObjective for scaling performance
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scaling-slo
spec:
  groups:
  - name: scaling-performance
    rules:
    - alert: SlowScaleUp
      expr: |
        (
          increase(kube_deployment_status_replicas[5m]) > 0
        ) and (
          (kube_deployment_status_replicas - kube_deployment_status_replicas_ready) > 5
        )
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Deployment scaling is slow"
        description: "Deployment {{ $labels.deployment }} has been scaling for more than 2 minutes"
    - alert: HPANotScaling
      expr: |
        (
          kube_horizontalpodautoscaler_status_desired_replicas != kube_horizontalpodautoscaler_status_current_replicas
        ) and (
          changes(kube_horizontalpodautoscaler_status_current_replicas[10m]) == 0
        )
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA not responding to scaling needs"
        description: "HPA {{ $labels.horizontalpodautoscaler }} wants {{ $value }} replicas but hasn't scaled in 10 minutes"
Scaling Metrics Dashboard:
# Grafana dashboard queries for scaling metrics
scaling_responsiveness: |
  rate(kube_deployment_status_replicas[5m])
scaling_efficiency: |
  (
    sum by (deployment) (kube_deployment_status_replicas_ready) /
    sum by (deployment) (kube_deployment_status_replicas)
  ) * 100
resource_utilization: |
  (
    sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) /
    sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod)
  ) * 100
scaling_events: |
  changes(kube_horizontalpodautoscaler_status_current_replicas[1m])
Disaster Recovery and Scaling
Multi-Region Scaling Strategy:
# Primary region deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-primary
  labels:
    region: primary
spec:
  replicas: 10
  selector:
    matchLabels:
      app: web-app
      region: primary
  template:
    metadata:
      labels:
        app: web-app
        region: primary
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: ["us-west-2"]
---
# Disaster recovery region deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-dr
  labels:
    region: dr
spec:
  replicas: 2  # Minimal capacity in DR region
  selector:
    matchLabels:
      app: web-app
      region: dr
  template:
    metadata:
      labels:
        app: web-app
        region: dr
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/region
                operator: In
                values: ["us-east-1"]
Emergency Scaling Procedures:
#!/bin/bash
# emergency-scale.sh
INCIDENT_TYPE="$1" # traffic-spike, node-failure, region-outage
case "$INCIDENT_TYPE" in
"traffic-spike")
echo "Executing emergency scale-up for traffic spike..."
kubectl scale deployment web-app --replicas=50
kubectl scale deployment api-server --replicas=20
kubectl patch hpa web-app-hpa -p '{"spec":{"maxReplicas":100}}'
;;
"node-failure")
echo "Responding to node failure..."
# Force rescheduling of affected pods
kubectl get pods --field-selector=status.phase=Pending -o name | xargs kubectl delete
# Temporarily increase replica count to compensate
kubectl scale deployment web-app --replicas=15
;;
"region-outage")
echo "Activating disaster recovery scaling..."
kubectl scale deployment web-app-dr --replicas=10
# Update ingress to route traffic to DR region
kubectl patch ingress web-app-ingress -p '{"spec":{"rules":[{"host":"app.example.com","http":{"paths":[{"path":"/","pathType":"Prefix","backend":{"service":{"name":"web-app-dr-service","port":{"number":80}}}}]}}]}}'
;;
*)
echo "Unknown incident type. Available types: traffic-spike, node-failure, region-outage"
exit 1
;;
esac
echo "Emergency scaling procedure completed"
Exam Tips
Essential Commands to Master
# Manual scaling
kubectl scale deployment app --replicas=5
kubectl scale deployment app --replicas=10 --timeout=300s
# HPA management
kubectl autoscale deployment app --min=2 --max=10 --cpu-percent=70
kubectl get hpa
kubectl describe hpa app-hpa
# Monitoring scaling
kubectl get deployments -w
kubectl top pods -l app=web-app
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i scale
Key Concepts for Exam
- Manual scaling is immediate but requires active management
- HPA requires metrics server and resource requests on containers
- VPA and HPA should not target the same resource (CPU/memory) simultaneously
- Cluster Autoscaler works at node level, triggered by unschedulable pods
- PodDisruptionBudgets can prevent Cluster Autoscaler (and node drains) from evicting pods during scale-down
Common Exam Scenarios
- Manually scale a deployment to specific replica count
- Create HPA for deployment based on CPU utilization
- Troubleshoot HPA that's not scaling (missing metrics, no resource requests)
- Configure HPA with custom scaling behavior (min/max replicas, scale policies)
- Debug why cluster autoscaler isn't adding nodes
Time-Saving Shortcuts
# Quick HPA creation
kubectl autoscale deploy app --min=2 --max=10 --cpu-percent=70
# Fast scaling check
kubectl get deploy,hpa,pods -l app=myapp
# Quick resource verification
kubectl top pods -l app=myapp
kubectl describe deploy app | grep -A5 "Replicas:"
# Monitor scaling
kubectl get pods -l app=myapp --watch
Critical Details to Remember
- HPA requires resource requests on containers to function
- HPA default sync period is 15 seconds for metric collection
- VPA in "Auto" mode will recreate pods to apply new resource requests
- Cluster Autoscaler won't remove nodes with pods that have local storage
- PodDisruptionBudgets block evictions (drains, Cluster Autoscaler scale-down), not replica scale-down
- Default HPA behavior includes stabilization windows to prevent thrashing
- Use kubectl wait to wait for scaling operations to complete
- HPA metrics are averaged across all pods in the deployment