Kubernetes Storage: Complete Deep Technical Guide
category: DevOps
tags: kubernetes, storage, persistent-volumes, pvc, storage-classes, statefulsets, volumes
Introduction to Kubernetes Storage
Kubernetes storage solves the fundamental problem that containers are ephemeral - when a container restarts, all data inside it is lost. Storage in Kubernetes provides ways to persist data beyond the container lifecycle and share data between containers.
The Container Storage Problem
Without Persistent Storage:
Container starts → Creates data → Container dies → Data is LOST forever
With Persistent Storage:
Container starts → Mounts persistent volume → Creates data → Container dies → Data SURVIVES
New container starts → Mounts same volume → Sees previous data → Continues working
Kubernetes Storage Concepts Hierarchy
Physical Storage (Disk, NFS, Cloud Storage)
↓
Persistent Volume (PV) - Cluster resource representing actual storage
↓
Persistent Volume Claim (PVC) - User's request for storage
↓
Pod Volume Mount - How pods access the claimed storage
Key Insight: Kubernetes separates the implementation (how storage actually works) from the interface (how users request storage). This allows the same application to work with different storage backends.
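To make the separation concrete, here is a minimal sketch of the user-facing side only: a PVC stating requirements and a Pod mounting the claim. The names and image are illustrative, and it assumes the cluster has a default StorageClass; nothing in it says whether the bytes end up on EBS, NFS, or a local disk.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim                  # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                  # only requirements, no backend details
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-claim       # the Pod references the claim, never the PV directly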
Persistent Volumes (PV) Deep Dive
What Persistent Volumes Actually Are
A Persistent Volume (PV) is a Kubernetes API object that represents a piece of actual storage that exists somewhere - on a local disk, network file system, or cloud storage service. Think of it like a "storage ID card" that tells Kubernetes where to find real storage.
PV vs Actual Storage:
- Actual Storage - The real disk/SSD/NFS share that stores bytes
- Persistent Volume - Kubernetes object that points to that storage and describes its properties
What PVs Contain:
- Capacity - How much storage space is available
- Access Modes - How the storage can be accessed (read-write, read-only, etc.)
- Volume Plugin - Which technology provides the storage (NFS, AWS EBS, etc.)
- Connection Details - How to actually connect to the storage
- Reclaim Policy - What happens to data when PV is released
PV Lifecycle States
Available - PV exists and is ready to be claimed
apiVersion: v1
kind: PersistentVolume
metadata:
name: my-pv
status:
phase: Available # Ready for use, no PVC bound to it
Bound - PV is claimed by a PVC and in use
status:
phase: Bound
claimRef:
name: my-pvc
namespace: production
Released - PVC was deleted but PV still contains data
status:
phase: Released # PVC gone, but data still exists
Failed - PV has a problem and can't be used
status:
phase: Failed
message: "Failed to mount volume"
Access Modes Explained
ReadWriteOnce (RWO) - Can be mounted read-write by one node only
- Most common for databases, single-instance applications
- Examples: AWS EBS, Google Persistent Disk, Azure Disk
- Important: "One node" not "one pod" - multiple pods on same node can share
apiVersion: v1
kind: PersistentVolume
metadata:
name: database-pv
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce # Only one node can mount this
awsElasticBlockStore:
volumeID: vol-12345678
fsType: ext4
ReadOnlyMany (ROX) - Can be mounted read-only by multiple nodes
- Good for static content, configuration files, shared data
- Examples: NFS, CephFS (read-only mode)
apiVersion: v1
kind: PersistentVolume
metadata:
name: static-content-pv
spec:
capacity:
storage: 50Gi
accessModes:
- ReadOnlyMany # Multiple nodes can read
nfs:
server: nfs-server.company.com
path: /exports/static-content
ReadWriteMany (RWX) - Can be mounted read-write by multiple nodes
- Required for shared file systems, multi-writer scenarios
- Examples: NFS, CephFS, Azure Files
- Warning: Not all storage types support this!
apiVersion: v1
kind: PersistentVolume
metadata:
name: shared-data-pv
spec:
capacity:
storage: 200Gi
accessModes:
- ReadWriteMany # Multiple nodes can read AND write
nfs:
server: nfs-server.company.com
path: /exports/shared-data
ReadWriteOncePod (RWOP) - Can be mounted read-write by one pod only
- Newest access mode, strictest restriction
- Ensures only one pod can write at a time
- Good for databases that can't handle multiple writers
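ReadWriteOncePod only works with CSI volumes and needs a reasonably recent cluster (introduced in Kubernetes v1.22, GA in v1.29). A minimal PVC sketch requesting it, with a hypothetical StorageClass name:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-db            # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod              # only one pod in the whole cluster may use this volume
  resources:
    requests:
      storage: 20Gi
  storageClassName: csi-backed-ssd  # assumed CSI-backed StorageClass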
Reclaim Policies Explained
Retain - Keep data when PVC is deleted (manual cleanup required)
apiVersion: v1
kind: PersistentVolume
metadata:
name: important-data-pv
spec:
persistentVolumeReclaimPolicy: Retain # Data survives PVC deletion
capacity:
storage: 100Gi
# When PVC is deleted:
# 1. PV status becomes "Released"
# 2. Data still exists on storage
# 3. Admin must manually clean up and make PV "Available" again
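One common way to perform that cleanup is to verify or archive the old data and then clear the stale claim reference, which returns the PV to Available; a sketch using the PV name from the example above:
# 1. Confirm the PV is Released and see which PVC it still references
kubectl describe pv important-data-pv
# 2. After handling the old data, remove the stale claimRef so the PV becomes Available again
kubectl patch pv important-data-pv -p '{"spec":{"claimRef": null}}'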
Delete - Delete underlying storage when PVC is deleted
apiVersion: v1
kind: PersistentVolume
metadata:
name: temp-data-pv
spec:
persistentVolumeReclaimPolicy: Delete # Storage gets deleted too
capacity:
storage: 50Gi
# When PVC is deleted:
# 1. PV gets deleted
# 2. Underlying storage (AWS EBS, etc.) gets deleted
# 3. Data is PERMANENTLY LOST
Recycle - Wipe data and make PV available again (deprecated)
# DON'T USE - deprecated and dangerous
persistentVolumeReclaimPolicy: Recycle # Runs "rm -rf" on data
PV Examples by Storage Type
Note: several examples below use in-tree volume plugins (awsElasticBlockStore, gcePersistentDisk) and the failure-domain.beta.kubernetes.io/zone label. Both are deprecated in current Kubernetes releases in favor of CSI drivers and the topology.kubernetes.io/zone label, but the overall PV structure is the same.
Local Storage PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: local-storage-pv
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
storageClassName: local-storage
local:
path: /mnt/ssd1 # Path on the node
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-node-1 # Only available on this specific node
NFS PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv
spec:
capacity:
storage: 500Gi
accessModes:
- ReadWriteMany # NFS supports multiple writers
persistentVolumeReclaimPolicy: Retain
nfs:
server: 192.168.1.100
path: /exports/kubernetes-data
mountOptions:
- hard
- nfsvers=4.1
- rsize=1048576
- wsize=1048576
AWS EBS PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: aws-ebs-pv
spec:
capacity:
storage: 20Gi
accessModes:
- ReadWriteOnce # EBS only supports single node
persistentVolumeReclaimPolicy: Delete
awsElasticBlockStore:
volumeID: vol-0123456789abcdef0
fsType: ext4
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: failure-domain.beta.kubernetes.io/zone
operator: In
values:
- us-west-2a # EBS volumes are zone-specific
Google Persistent Disk PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: gcp-disk-pv
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
gcePersistentDisk:
pdName: my-data-disk
fsType: ext4
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: failure-domain.beta.kubernetes.io/zone
operator: In
values:
- us-central1-a
Persistent Volume Claims (PVC) Deep Dive
What PVCs Actually Do
A Persistent Volume Claim (PVC) is a user's request for storage. It's like going to a restaurant and saying "I want a table for 4 people" - you're not specifying which exact table, just your requirements. Kubernetes finds a PV that matches your requirements and "binds" it to your PVC.
PVC as Storage Request:
- "I need 10GB of storage"
- "I need ReadWriteOnce access"
- "I need fast SSD storage class"
- "Find me a PV that matches these requirements"
Binding Process:
1. User creates PVC - Specifies storage requirements
2. Kubernetes searches - Looks for PV that matches requirements
3. Binding occurs - PVC gets connected to suitable PV
4. Pod uses PVC - Pod mounts the claimed storage
5. Exclusive relationship - One PVC can only bind to one PV
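You can watch the binding happen with kubectl (claim name is illustrative):
# Create the claim, then watch its STATUS move from Pending to Bound
kubectl apply -f my-pvc.yaml
kubectl get pvc my-pvc -w
# See which PV it bound to, and check events if it stays Pending
kubectl get pvc my-pvc
kubectl describe pvc my-pvc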
PVC Binding Requirements
For a PVC to bind to a PV, ALL of these must match:
Storage Capacity - PV must have at least as much capacity as requested
# PVC requests 10Gi
spec:
resources:
requests:
storage: 10Gi
# PV has 20Gi - MATCH (PV has enough)
spec:
capacity:
storage: 20Gi
# PV has 5Gi - NO MATCH (PV too small)
Access Modes - PV must support requested access mode
# PVC wants ReadWriteOnce
spec:
accessModes:
- ReadWriteOnce
# PV supports ReadWriteOnce - MATCH
spec:
accessModes:
- ReadWriteOnce
# PV only supports ReadOnlyMany - NO MATCH
Storage Class - Must match (or both empty)
# PVC requests specific storage class
spec:
storageClassName: fast-ssd
# PV has same storage class - MATCH
spec:
storageClassName: fast-ssd
# PV has different storage class - NO MATCH
Node Affinity - Pod's node must be able to access the PV
# PV only available on specific node
spec:
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- worker-node-1
# Pod scheduled on worker-node-1 - MATCH
# Pod scheduled on worker-node-2 - NO MATCH
Static vs Dynamic Provisioning
Static Provisioning - Pre-Created PVs
How it works: Administrator creates PVs in advance, users claim them with PVCs.
Example Workflow:
# 1. Admin creates PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: manual-pv-1
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
nfs:
server: nfs-server.company.com
path: /exports/pv-1
---
# 2. User creates PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-storage-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
# 3. Kubernetes binds PVC to PV automatically
When to use static provisioning:
- Small clusters with predictable storage needs
- Using storage systems that don't support dynamic provisioning
- Need specific configuration or pre-created storage
- Maximum control over storage allocation
Dynamic Provisioning - Automatic PV Creation
How it works: When PVC is created, Kubernetes automatically creates a matching PV using a StorageClass.
Example Workflow:
# 1. User creates PVC with storageClassName
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: dynamic-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: aws-ebs-gp2 # Tells K8s how to create PV
# 2. Kubernetes automatically creates PV using StorageClass
# 3. PVC gets bound to the new PV
# 4. Underlying storage (AWS EBS volume) gets created
Benefits of dynamic provisioning:
- No pre-planning required
- Storage created on-demand
- Right-sized storage (no waste)
- Scales automatically
PVC Examples
Basic Application Storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: webapp-storage
namespace: production
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: fast-ssd
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
replicas: 1 # Only 1 replica because RWO
template:
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: app-data
mountPath: /var/lib/app
volumes:
- name: app-data
persistentVolumeClaim:
claimName: webapp-storage
Database Storage with Specific Requirements
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-storage
namespace: database
labels:
app: postgres
tier: database
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: high-iops-ssd
selector:
matchLabels:
type: database-storage # Only bind to PVs with this label
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres
replicas: 1
template:
spec:
containers:
- name: postgres
image: postgres:14
env:
- name: POSTGRES_DB
value: myapp
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumes:
- name: postgres-data
persistentVolumeClaim:
claimName: postgres-storage
Shared Storage for Multiple Pods
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-files
namespace: production
spec:
accessModes:
- ReadWriteMany # Multiple pods can mount
resources:
requests:
storage: 200Gi
storageClassName: nfs-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: file-processors
spec:
replicas: 3 # Multiple replicas can share RWX volume
template:
spec:
containers:
- name: processor
image: file-processor:latest
volumeMounts:
- name: shared-data
mountPath: /shared
volumes:
- name: shared-data
persistentVolumeClaim:
claimName: shared-files
Storage Classes Deep Dive
What Storage Classes Actually Do
A StorageClass is like a "storage profile" that defines how to dynamically create storage when needed. It's a template that tells Kubernetes:
- What storage technology to use (AWS EBS, GCP Persistent Disk, etc.)
- What parameters to use (disk type, replication, encryption, etc.)
- How fast or durable the storage should be
StorageClass as Storage Menu:
Think of StorageClasses like a restaurant menu:
- "fast-ssd" = Premium option, expensive but fast
- "standard-disk" = Regular option, balanced price/performance
- "slow-archive" = Budget option, slow but cheap
Dynamic Provisioning Process
Complete Dynamic Provisioning Flow:
1. User creates PVC with storageClassName: fast-ssd
2. Kubernetes finds StorageClass named "fast-ssd"
3. StorageClass calls provisioner (e.g., AWS EBS CSI driver)
4. Provisioner creates actual storage (e.g., creates AWS EBS volume)
5. Kubernetes creates PV representing the new storage
6. PVC binds to new PV automatically
7. Pod can mount the storage through PVC
Example with real AWS resources:
# 1. StorageClass defines how to create storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: aws-ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "3000"
throughput: "125"
encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# 2. PVC requests storage using this class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: my-app-storage
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
storageClassName: aws-ebs-gp3
# 3. Result: AWS EBS GP3 volume gets created automatically
# - Type: gp3
# - Size: 50GB
# - IOPS: 3000
# - Throughput: 125 MiB/s
# - Encrypted: Yes
StorageClass Parameters by Provider
AWS EBS StorageClass Examples
# High Performance SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: aws-ebs-io1-high-perf
provisioner: ebs.csi.aws.com
parameters:
type: io1 # Provisioned IOPS SSD
iopsPerGB: "50" # 50 IOPS per GB
encrypted: "true"
kmsKeyId: "arn:aws:kms:us-west-2:123456789:key/12345678-1234-1234-1234-123456789012"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Balanced Performance
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: aws-ebs-gp3-standard
provisioner: ebs.csi.aws.com
parameters:
type: gp3
iops: "3000" # Baseline IOPS
throughput: "125" # MiB/s
encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Budget Option
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: aws-ebs-gp2-budget
provisioner: ebs.csi.aws.com
parameters:
type: gp2 # Previous generation (gp3 is usually cheaper and faster; gp2 mainly for older clusters)
encrypted: "false"
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
Google Cloud StorageClass Examples
# High Performance Regional SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gcp-ssd-regional
provisioner: pd.csi.storage.gke.io
parameters:
type: pd-ssd
replication-type: regional-pd # Replicated across zones
zones: us-central1-a,us-central1-b,us-central1-c
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Standard Persistent Disk
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gcp-standard
provisioner: pd.csi.storage.gke.io
parameters:
type: pd-standard # Traditional spinning disks
replication-type: none
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Azure StorageClass Examples
# Premium SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: azure-premium-ssd
provisioner: disk.csi.azure.com
parameters:
skuName: Premium_LRS # Premium Locally Redundant Storage
kind: Managed
cachingmode: ReadOnly
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# Standard HDD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: azure-standard-hdd
provisioner: disk.csi.azure.com
parameters:
skuName: Standard_LRS
kind: Managed
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
NFS StorageClass Example
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
server: nfs-server.company.com
path: /exports/kubernetes
archiveOnDelete: "true" # Move to archive instead of delete
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: false # NFS doesn't support expansion
Volume Binding Modes
Immediate Binding
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: immediate-binding
provisioner: ebs.csi.aws.com
volumeBindingMode: Immediate # Create volume as soon as PVC is created
# Pros: Fast PVC binding
# Cons: Volume might be created in wrong availability zone
Immediate binding process:
1. PVC created → Storage provisioned immediately
2. Pod scheduled → Might be in different zone than storage
3. Pod fails to start → Storage and pod in different zones
WaitForFirstConsumer Binding
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: wait-for-consumer
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer # Wait for pod to be scheduled
# Pros: Storage created in same zone as pod
# Cons: Slightly slower pod startup
WaitForFirstConsumer process:
1. PVC created → Storage NOT provisioned yet
2. Pod scheduled → Kubernetes knows which zone pod will run in
3. Storage provisioned → Created in same zone as pod
4. Pod starts successfully → Storage and pod in same zone
Default StorageClass
Setting Default StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: default-storage
annotations:
storageclass.kubernetes.io/is-default-class: "true" # Makes this default
provisioner: ebs.csi.aws.com
parameters:
type: gp3
How Default Works:
# PVC without storageClassName
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: auto-storage
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 10Gi
# No storageClassName specified - uses default StorageClass
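To change which StorageClass is the default on an existing cluster, the usual approach is to patch the annotation (class names are illustrative):
# Remove the default marker from the old class
kubectl patch storageclass old-default -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
# Mark the new class as default
kubectl patch storageclass default-storage -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
# Verify - the default class is listed with "(default)" after its name
kubectl get storageclass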
StatefulSets Storage Deep Dive
Why StatefulSets Need Special Storage
StatefulSets are designed for applications that need:
- Stable storage identity - Each pod gets its own persistent storage
- Ordered deployment - Pods start and stop in sequence
- Stable network identity - Predictable DNS names
Problem with Deployments and Storage:
# This DOESN'T WORK for databases
apiVersion: apps/v1
kind: Deployment
metadata:
name: database
spec:
replicas: 3 # 3 database pods
template:
spec:
volumes:
- name: db-data
persistentVolumeClaim:
claimName: shared-db-storage # ALL pods share same storage!
Problems:
- All 3 database pods write to same storage = corruption
- Pods have random names = can't identify which is primary
- Pods can start in any order = split-brain scenarios
StatefulSet Storage Architecture
How StatefulSets Solve Storage:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: database
spec:
serviceName: database
replicas: 3
volumeClaimTemplates: # Creates separate PVC for each pod
- metadata:
name: db-data
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 100Gi
storageClassName: fast-ssd
template:
spec:
containers:
- name: postgres
image: postgres:14
volumeMounts:
- name: db-data
mountPath: /var/lib/postgresql/data
What StatefulSet Creates:
database-0 pod → db-data-database-0 PVC → PV-1 → Storage-1
database-1 pod → db-data-database-1 PVC → PV-2 → Storage-2
database-2 pod → db-data-database-2 PVC → PV-3 → Storage-3
Storage Identity Properties:
- Each pod gets unique storage - database-0 has different storage than database-1
- Storage survives pod restart - If database-0 pod dies, new database-0 pod gets same storage
- Ordered scaling - database-1 won't start until database-0 is ready
- Stable storage names - db-data-database-0 PVC name never changes
VolumeClaimTemplates Explained
VolumeClaimTemplate is like a PVC factory - it creates a new PVC for each StatefulSet pod using the same template.
Template Expansion:
# StatefulSet has this template:
volumeClaimTemplates:
- metadata:
name: data-volume
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
# Kubernetes creates these actual PVCs:
# data-volume-myapp-0 (for pod myapp-0)
# data-volume-myapp-1 (for pod myapp-1)
# data-volume-myapp-2 (for pod myapp-2)
PVC Naming Pattern:
{volumeClaimTemplate.name}-{statefulset.name}-{ordinal}
Examples:
data-volume-postgres-0
data-volume-postgres-1
logs-volume-elasticsearch-0
config-volume-kafka-2
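A quick way to confirm the naming pattern on a live cluster, assuming the claim template carries an app label as in the PostgreSQL example below:
# PVCs generated from volumeClaimTemplates follow {template}-{statefulset}-{ordinal}
kubectl get pvc -l app=postgres
# Each pod's spec shows which generated claim it mounts
kubectl get pod postgres-0 -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'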
StatefulSet Storage Examples
PostgreSQL Primary-Replica Cluster
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres-cluster
namespace: database
spec:
serviceName: postgres-cluster
replicas: 3
selector:
matchLabels:
app: postgres
volumeClaimTemplates:
# Main database storage
- metadata:
name: postgres-data
labels:
app: postgres
type: data
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 200Gi
storageClassName: high-iops-ssd
# Write-Ahead Log storage (separate for performance)
- metadata:
name: postgres-wal
labels:
app: postgres
type: wal
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
storageClassName: ultra-fast-ssd
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:14
env:
- name: POSTGRES_DB
value: myapp
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
- name: postgres-wal
mountPath: /var/lib/postgresql/wal
# Configuration based on pod ordinal
command:
- sh
- -c
- |
if [ "$HOSTNAME" = "postgres-cluster-0" ]; then
# This is the primary
echo "Starting as primary"
postgres -c wal_level=replica -c max_wal_senders=3
else
# This is a replica
echo "Starting as replica"
pg_basebackup -h postgres-cluster-0 -D /var/lib/postgresql/data -U postgres -v -P -W
postgres -c hot_standby=on
fi
Elasticsearch Cluster with Different Storage Types
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch
namespace: search
spec:
serviceName: elasticsearch
replicas: 3
selector:
matchLabels:
app: elasticsearch
volumeClaimTemplates:
# Fast storage for Elasticsearch data
- metadata:
name: es-data
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 500Gi
storageClassName: nvme-ssd
# Separate storage for logs
- metadata:
name: es-logs
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 100Gi
storageClassName: standard-ssd
template:
metadata:
labels:
app: elasticsearch
spec:
initContainers:
# Set proper permissions
- name: fix-permissions
image: busybox
command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
volumeMounts:
- name: es-data
mountPath: /usr/share/elasticsearch/data
containers:
- name: elasticsearch
image: elasticsearch:8.5.0
env:
- name: cluster.name
value: "elasticsearch-cluster"
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: discovery.seed_hosts
value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
- name: cluster.initial_master_nodes
value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
- name: ES_JAVA_OPTS
value: "-Xms2g -Xmx2g"
volumeMounts:
- name: es-data
mountPath: /usr/share/elasticsearch/data
- name: es-logs
mountPath: /usr/share/elasticsearch/logs
resources:
requests:
memory: 4Gi
cpu: 1000m
limits:
memory: 4Gi
cpu: 2000m
Kafka Cluster with Separate Log Storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: kafka
namespace: messaging
spec:
serviceName: kafka
replicas: 3
selector:
matchLabels:
app: kafka
volumeClaimTemplates:
# Kafka log segments (the actual message data)
- metadata:
name: kafka-logs
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 1Ti
storageClassName: high-throughput-ssd
# Kafka metadata and configuration
- metadata:
name: kafka-data
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
storageClassName: standard-ssd
template:
metadata:
labels:
app: kafka
spec:
containers:
- name: kafka
image: confluentinc/cp-kafka:7.0.1
env:
- name: KAFKA_BROKER_ID
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: KAFKA_ZOOKEEPER_CONNECT
value: "zookeeper:2181"
- name: KAFKA_ADVERTISED_LISTENERS
value: "PLAINTEXT://$(POD_NAME).kafka:9092"
- name: KAFKA_LOG_DIRS
value: "/var/kafka-logs"
- name: KAFKA_LOG_RETENTION_HOURS
value: "168" # 1 week
- name: KAFKA_LOG_SEGMENT_BYTES
value: "1073741824" # 1GB segments
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
volumeMounts:
- name: kafka-logs
mountPath: /var/kafka-logs
- name: kafka-data
mountPath: /var/kafka-data
ports:
- containerPort: 9092
resources:
requests:
memory: 2Gi
cpu: 500m
limits:
memory: 4Gi
cpu: 1000m
StatefulSet Storage Scaling
Scaling Up (Adding Pods)
# Scale from 3 to 5 replicas
kubectl scale statefulset postgres-cluster --replicas=5
# What happens:
# 1. postgres-cluster-3 pod created with new PVC: postgres-data-postgres-cluster-3
# 2. postgres-cluster-4 pod created with new PVC: postgres-data-postgres-cluster-4
# 3. New pods start in order: 3 then 4
Scaling Down (Removing Pods)
# Scale from 5 to 3 replicas
kubectl scale statefulset postgres-cluster --replicas=3
# What happens:
# 1. postgres-cluster-4 pod deleted (highest ordinal first)
# 2. postgres-cluster-3 pod deleted
# 3. PVCs postgres-data-postgres-cluster-3 and postgres-data-postgres-cluster-4 remain!
# (Data is preserved even though pods are gone)
Important: PVCs are NOT automatically deleted when scaling down StatefulSets. This prevents accidental data loss.
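That is still the default, but newer clusters can opt in to automatic cleanup through the StatefulSet's persistentVolumeClaimRetentionPolicy (the StatefulSetAutoDeletePVC feature, beta and enabled by default since Kubernetes v1.27). A sketch of just the relevant fields:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-cluster
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete     # delete the PVC of a pod removed by a scale-down
    whenDeleted: Retain    # keep all PVCs if the whole StatefulSet is deleted
  # ... serviceName, replicas, selector, volumeClaimTemplates, template as before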
Manual PVC Cleanup
# After scaling down, manually delete PVCs if you want to free storage
kubectl delete pvc postgres-data-postgres-cluster-3
kubectl delete pvc postgres-data-postgres-cluster-4
# Or delete all PVCs for a StatefulSet (DANGEROUS!)
kubectl delete pvc -l app=postgres
Volume Types Deep Dive
emptyDir - Temporary Shared Storage
What emptyDir Actually Is:
- Temporary directory created when pod starts
- Shared between all containers in the pod
- Deleted when pod is removed from node
- Can be stored on disk or in memory (tmpfs)
emptyDir Storage Location:
Default: /var/lib/kubelet/pods/{pod-uid}/volumes/kubernetes.io~empty-dir/{volume-name}
Memory: tmpfs mounted in RAM
Disk-Based emptyDir
apiVersion: v1
kind: Pod
metadata:
name: shared-storage-pod
spec:
containers:
- name: writer
image: busybox
command: ["sh", "-c", "while true; do echo $(date) >> /shared/log.txt; sleep 5; done"]
volumeMounts:
- name: shared-data
mountPath: /shared
- name: reader
image: busybox
command: ["sh", "-c", "while true; do tail -f /shared/log.txt; done"]
volumeMounts:
- name: shared-data
mountPath: /shared
volumes:
- name: shared-data
emptyDir: {} # Stored on node's disk
Memory-Based emptyDir (tmpfs)
apiVersion: v1
kind: Pod
metadata:
name: memory-cache-pod
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: cache-volume
mountPath: /tmp/cache
volumes:
- name: cache-volume
emptyDir:
medium: Memory # Stored in RAM
sizeLimit: 1Gi # Limit memory usage
When to use emptyDir:
- Temporary file processing
- Cache that doesn't need to persist
- Communication between containers in same pod
- Scratch space for applications
hostPath - Direct Node Access
What hostPath Does:
Mounts a file or directory from the node's filesystem directly into the pod. The data persists on the node even when pods are deleted.
Security Warning: hostPath gives pods direct access to the node's filesystem, which can be a security risk.
Basic hostPath Examples
apiVersion: v1
kind: Pod
metadata:
name: hostpath-pod
spec:
containers:
- name: app
image: busybox
command: ["sleep", "3600"]
volumeMounts:
- name: host-data
mountPath: /host-data
- name: host-logs
mountPath: /var/log/host
volumes:
# Mount directory from node
- name: host-data
hostPath:
path: /data/app-storage
type: DirectoryOrCreate # Create directory if it doesn't exist
# Mount existing file from node
- name: host-logs
hostPath:
path: /var/log/messages
type: File # Must be an existing file
hostPath Types
volumes:
- name: example-volume
hostPath:
path: /path/on/node
type: Directory # Must be existing directory
# type: DirectoryOrCreate # Create directory if missing
# type: File # Must be existing file
# type: FileOrCreate # Create file if missing
# type: Socket # Must be existing Unix socket
# type: CharDevice # Must be existing character device
# type: BlockDevice # Must be existing block device
Common hostPath Use Cases
Docker Socket Access (running the Docker CLI against the node's Docker daemon):
apiVersion: v1
kind: Pod
metadata:
name: docker-client
spec:
containers:
- name: docker
image: docker:20.10
command: ["docker", "ps"] # List containers on node
volumeMounts:
- name: docker-socket
mountPath: /var/run/docker.sock
volumes:
- name: docker-socket
hostPath:
path: /var/run/docker.sock
type: Socket
System Monitoring (Access to /proc and /sys):
apiVersion: v1
kind: Pod
metadata:
name: node-monitor
spec:
containers:
- name: monitor
image: monitoring-agent:latest
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
volumes:
- name: proc
hostPath:
path: /proc
type: Directory
- name: sys
hostPath:
path: /sys
type: Directory
ConfigMap and Secret Volumes
ConfigMap Volumes
How ConfigMap volumes work: Mount configuration data as files in the pod.
# Create ConfigMap with configuration files
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
app.properties: |
database.host=postgres.database.svc.cluster.local
database.port=5432
log.level=INFO
nginx.conf: |
server {
listen 80;
location / {
proxy_pass http://backend:8080;
}
}
---
apiVersion: v1
kind: Pod
metadata:
name: app-with-config
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: config-volume
mountPath: /etc/config
volumes:
- name: config-volume
configMap:
name: app-config
items: # Optional: map specific keys to specific paths
- key: app.properties
path: application.properties
- key: nginx.conf
path: nginx/nginx.conf
Result in pod:
/etc/config/
├── application.properties (content from app.properties key)
└── nginx/
└── nginx.conf (content from nginx.conf key)
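If you only need one key projected as a single file into a directory that must otherwise stay as shipped in the image, a subPath mount is the usual pattern; note that subPath-mounted files do not pick up later ConfigMap updates, while whole-volume mounts are refreshed eventually. A sketch reusing the app-config ConfigMap (image and names are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-single-file
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      volumeMounts:
        - name: config-volume
          mountPath: /etc/nginx/conf.d/default.conf # one file; the rest of the directory comes from the image
          subPath: nginx.conf                       # key (file name) inside the ConfigMap volume
  volumes:
    - name: config-volume
      configMap:
        name: app-config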
Secret Volumes
How Secret volumes work: Mount sensitive data as files, with additional security features.
# Create Secret with sensitive data
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
type: Opaque
data:
database-password: cGFzc3dvcmQxMjM= # base64 encoded "password123"
api-key: YWJjZGVmZ2hpams= # base64 encoded "abcdefghijk"
stringData:
ssl-cert.pem: |
-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAKlwmMhJlJb...
-----END CERTIFICATE-----
---
apiVersion: v1
kind: Pod
metadata:
name: app-with-secrets
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: secret-volume
mountPath: /etc/secrets
readOnly: true # Secrets should be read-only
volumes:
- name: secret-volume
secret:
secretName: app-secrets
defaultMode: 0400 # Read-only for owner only
items:
- key: database-password
path: db-password
mode: 0400
- key: ssl-cert.pem
path: ssl/cert.pem
mode: 0444
Result in pod:
/etc/secrets/
├── db-password (contains "password123", mode 0400)
└── ssl/
└── cert.pem (contains SSL certificate, mode 0444)
Network Storage Types
NFS (Network File System)
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs-pv
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteMany # NFS supports multiple writers
persistentVolumeReclaimPolicy: Retain
nfs:
server: nfs-server.company.com
path: /exports/shared-data
mountOptions:
- hard # Hard mount (retries on failure)
- nfsvers=4.1 # Use NFSv4.1
- rsize=1048576 # Read buffer size
- wsize=1048576 # Write buffer size
- timeo=600 # Timeout in deciseconds (60 seconds)
- retrans=2 # Number of retries
CephFS (Ceph File System)
apiVersion: v1
kind: PersistentVolume
metadata:
name: cephfs-pv
spec:
capacity:
storage: 200Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
cephfs:
monitors:
- 192.168.1.10:6789
- 192.168.1.11:6789
- 192.168.1.12:6789
path: /kubernetes-volumes
user: admin
secretRef:
name: ceph-secret
readOnly: false
iSCSI Storage
apiVersion: v1
kind: PersistentVolume
metadata:
name: iscsi-pv
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteOnce # iSCSI typically single-writer
persistentVolumeReclaimPolicy: Retain
iscsi:
targetPortal: 192.168.1.100:3260
iqn: iqn.2019-04.com.company:storage.target01
lun: 0
fsType: ext4
readOnly: false
chapAuthDiscovery: true
chapAuthSession: true
secretRef:
name: iscsi-chap-secret
Volume Snapshots Deep Dive
What Volume Snapshots Actually Are
Volume Snapshots are point-in-time copies of persistent volumes. Think of them like "save points" in a video game - you can create a snapshot of your data and restore back to that exact state later.
Snapshot vs Backup:
- Snapshot - Point-in-time copy, usually stored on same storage system
- Backup - Copy moved to different storage system/location
Snapshot API Objects
VolumeSnapshotClass
Defines how snapshots are created:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: aws-ebs-snapshot-class
driver: ebs.csi.aws.com # CSI driver that handles snapshots
deletionPolicy: Delete # Delete snapshot when VolumeSnapshot is deleted
parameters:
encrypted: "true" # Driver-specific parameters
VolumeSnapshot
Request to create a snapshot:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: database-snapshot-20240101
namespace: production
spec:
volumeSnapshotClassName: aws-ebs-snapshot-class
source:
persistentVolumeClaimName: postgres-data-pvc # PVC to snapshot
VolumeSnapshotContent
Represents the actual snapshot (like PV for snapshots):
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
name: snapcontent-12345
spec:
deletionPolicy: Delete
driver: ebs.csi.aws.com
source:
snapshotHandle: snap-0123456789abcdef0 # Actual cloud snapshot ID
volumeSnapshotRef:
name: database-snapshot-20240101
namespace: production
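To follow a snapshot through these three objects on a live cluster:
# User-facing object: shows READYTOUSE, the source PVC, and the bound content name
kubectl get volumesnapshot -n production
# Cluster-scoped object that records the real snapshot handle on the storage system
kubectl get volumesnapshotcontent
# Full status and events for one snapshot
kubectl describe volumesnapshot database-snapshot-20240101 -n production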
Snapshot Workflow Examples
Manual Database Backup
# 1. Create snapshot before maintenance
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: postgres-before-upgrade
namespace: database
spec:
volumeSnapshotClassName: fast-snapshot-class
source:
persistentVolumeClaimName: postgres-data-postgres-0
---
# 2. Wait for snapshot to be ready
# kubectl wait --for=jsonpath='{.status.readyToUse}'=true volumesnapshot/postgres-before-upgrade --timeout=300s
# 3. Perform maintenance/upgrade
# kubectl apply -f new-postgres-version.yaml
# 4. If something goes wrong, restore from snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-restored-data
namespace: database
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: fast-ssd
dataSource:
name: postgres-before-upgrade # Restore from snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
Automated Backup with CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
name: database-backup
namespace: database
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
serviceAccount: snapshot-creator
containers:
- name: backup
image: bitnami/kubectl:latest # any image that ships the kubectl binary
command:
- sh
- -c
- |
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
SNAPSHOT_NAME="postgres-backup-${TIMESTAMP}"
# Create snapshot
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: ${SNAPSHOT_NAME}
namespace: database
labels:
backup-type: automated
backup-date: $(date +%Y-%m-%d)
spec:
volumeSnapshotClassName: aws-ebs-snapshot-class
source:
persistentVolumeClaimName: postgres-data-postgres-0
EOF
# Wait for snapshot to complete
kubectl wait --for=condition=ReadyToUse \
volumesnapshot/${SNAPSHOT_NAME} --timeout=600s
# Clean up old snapshots (keep last 7 days)
kubectl get volumesnapshot -l backup-type=automated \
--sort-by=.metadata.creationTimestamp \
-o name | head -n -7 | xargs -r kubectl delete
restartPolicy: OnFailure
---
# RBAC for snapshot creation
apiVersion: v1
kind: ServiceAccount
metadata:
name: snapshot-creator
namespace: database
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: snapshot-manager
namespace: database
rules:
- apiGroups: ["snapshot.storage.k8s.io"]
resources: ["volumesnapshots"]
verbs: ["create", "get", "list", "watch", "delete"] # watch is required by kubectl wait
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: snapshot-creator-binding
namespace: database
subjects:
- kind: ServiceAccount
name: snapshot-creator
namespace: database
roleRef:
kind: Role
name: snapshot-manager
apiGroup: rbac.authorization.k8s.io
Cross-Namespace Restore
# Snapshot in production namespace
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: production-data-snapshot
namespace: production
spec:
volumeSnapshotClassName: aws-ebs-snapshot-class
source:
persistentVolumeClaimName: app-data-pvc
---
# Restore in staging namespace for testing
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: staging-data-from-prod
namespace: staging
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: standard-ssd # Can use different storage class
dataSourceRef: # plain dataSource is namespace-local; cross-namespace restore needs dataSourceRef
name: production-data-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
namespace: production # requires the CrossNamespaceVolumeDataSource feature gate and a ReferenceGrant in the source namespace
CSI Driver Support
AWS EBS CSI Driver Snapshots
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ebs-with-snapshots
provisioner: ebs.csi.aws.com
parameters:
type: gp3
encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: ebs-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Delete
parameters:
encrypted: "true"
# AWS-specific: copy snapshot to different region
# sourceRegion: us-west-2
# destinationRegion: us-east-1
Google Cloud Persistent Disk Snapshots
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: gcp-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Retain # Keep snapshots even if VolumeSnapshot is deleted
parameters:
storage-locations: us-central1 # Store snapshot in specific region
snapshot-type: regional # Regional snapshot for durability
Azure Disk Snapshots
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: azure-snapshot-class
driver: disk.csi.azure.com
deletionPolicy: Delete
parameters:
incremental: "true" # Use incremental snapshots to save space
Key Concepts Summary
- Persistent Volumes (PV) - Cluster resources representing actual storage with capacity, access modes, and connection details
- Persistent Volume Claims (PVC) - User requests for storage that get bound to matching PVs
- Storage Classes - Templates for dynamic storage provisioning with cloud provider integration
- Static Provisioning - Admin pre-creates PVs, users claim them with PVCs
- Dynamic Provisioning - Storage automatically created when PVC is created using StorageClass
- StatefulSets - Workloads requiring stable storage identity with VolumeClaimTemplates
- Access Modes - RWO (single node), ROX (multiple read-only), RWX (multiple read-write), RWOP (single pod)
- Volume Types - emptyDir (temporary), hostPath (node access), ConfigMap/Secret (configuration), network storage (NFS, iSCSI)
- Volume Snapshots - Point-in-time copies for backup and restore operations
Best Practices / Tips
- Choose appropriate access modes - Use RWO for most applications, RWX only when truly needed
- Set reclaim policies carefully - Use Retain for important data, Delete for temporary storage
- Use StorageClasses for dynamic provisioning - More flexible than static PVs
- Plan storage capacity - Include growth projections and snapshot space
- Implement backup strategy - Use volume snapshots or external backup tools
- Monitor storage performance - Watch IOPS, throughput, and latency metrics
- Use appropriate storage types - Match storage performance to application needs
- Configure volume binding mode - Use WaitForFirstConsumer for zone-aware scheduling
- Set resource limits - Prevent storage exhaustion with quotas and limits (see the ResourceQuota sketch after this list)
- Document storage architecture - Maintain clear documentation of storage design and dependencies
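As referenced in the quotas item above, a namespace ResourceQuota can cap both the total requested capacity and the number of claims; the numbers and class name are illustrative:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: production
spec:
  hard:
    requests.storage: 500Gi           # total capacity all PVCs in this namespace may request
    persistentvolumeclaims: "20"      # maximum number of PVCs in this namespace
    fast-ssd.storageclass.storage.k8s.io/requests.storage: 200Gi # cap for one StorageClass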
Common Issues / Troubleshooting
Problem 1: PVC Stuck in Pending
- Symptom: PVC remains in "Pending" status indefinitely
- Cause: No PV matches requirements, or storage provisioning failed
- Solution: Check PV availability, StorageClass configuration, and node affinity
# Check PVC status and events
kubectl describe pvc my-pvc
# Check available PVs
kubectl get pv
# Check StorageClass
kubectl describe storageclass my-storage-class
# Check CSI driver pods
kubectl get pods -n kube-system | grep csi
Problem 2: Pod Can't Mount Volume
- Symptom: Pod stuck in "ContainerCreating" with volume mount errors
- Cause: Storage not accessible from node, permission issues, or driver problems
- Solution: Check node affinity, CSI driver status, and storage permissions
# Check pod events
kubectl describe pod my-pod
# Check node where pod is scheduled
kubectl get pod my-pod -o wide
# Check if storage is in same zone as node
kubectl get node NODE_NAME --show-labels
Problem 3: Volume Snapshot Failing
- Symptom: VolumeSnapshot stuck in "Pending" or "Error" state
- Cause: CSI driver doesn't support snapshots, or cloud provider issues
- Solution: Verify snapshot support and check CSI driver logs
# Check snapshot status
kubectl describe volumesnapshot my-snapshot
# Check if CSI driver supports snapshots
kubectl get csidriver -o yaml
# Check snapshot controller
kubectl get pods -n kube-system | grep snapshot
Problem 4: StatefulSet Volume Not Persistent
- Symptom: StatefulSet pod data lost when pod restarts
- Cause: Using emptyDir instead of PVC, or PVC not properly configured
- Solution: Verify VolumeClaimTemplates and PVC binding
# Check StatefulSet volume configuration
kubectl describe statefulset my-statefulset
# Check PVCs created by StatefulSet
kubectl get pvc -l app=my-statefulset
# Verify PVC is bound
kubectl describe pvc my-pvc
Problem 5: Performance Issues
- Symptom: Slow storage I/O affecting application performance
- Cause: Wrong storage type, insufficient IOPS, or network bottlenecks
- Solution: Use appropriate storage class and monitor storage metrics
# Check storage class parameters
kubectl describe storageclass my-storage-class
# Monitor pod resource usage
kubectl top pod my-pod
# Check node disk usage
kubectl exec -it my-pod -- df -h