alexsusanu@docs:Kubernetes Services and Networking: Deep Dive $
alexsusanu@docs
:~$ cat Kubernetes Services and Networking: Deep Dive.md

HomeNOTES → Kubernetes Services and Networking: Deep Dive

Kubernetes Services and Networking: Deep Dive

category: Kubernetes Certification
tags: cka, kubernetes, exam, kubectl, certification

Table of Contents

  1. The Fundamental Problem: Why Services Exist

  2. Service Types Deep Dive

  3. Ingress Controllers and Routing

  4. Network Policies for Security

  5. DNS Resolution in Kubernetes

  6. Real-World Architecture Examples


The Fundamental Problem: Why Services Exist {#the-fundamental-problem}

The Core Challenge

In Kubernetes, Pods are ephemeral and mortal. They can be:

  • Killed and recreated by deployments
  • Moved between nodes
  • Scaled up/down dynamically
  • Assigned different IP addresses each time

Without Services, your application architecture would be chaos:

# Imagine trying to connect to a database without Services
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: web
        image: nginx
        env:
        - name: DB_HOST
          value: "10.244.1.5"  # ❌ WRONG! This IP changes when Pod restarts

The Service Abstraction solves this by:

  1. Providing stable endpoints - A consistent IP/DNS name
  2. Load balancing - Distributing traffic across healthy Pods
  3. Service discovery - Automatic registration/deregistration of Pods
  4. Health checking - Only routing to ready Pods

Service Types Deep Dive {#service-types}

1. ClusterIP Service

What it is: Internal-only service accessible within the cluster.

Why use it:

  • Microservice communication - Services talking to each other
  • Database access - Applications connecting to databases
  • Internal APIs - Backend services not exposed to outside world

Deep Example - Microservice Architecture:

# Database Service (ClusterIP)
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: production
spec:
  type: ClusterIP  # Default type
  selector:
    app: postgres
    tier: database
  ports:
  - port: 5432        # Service port (what clients connect to)
    targetPort: 5432  # Pod port (where container listens)
    protocol: TCP
---
# Application using the database
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:v1.0
        env:
        - name: DATABASE_URL
          # ✅ CORRECT: Uses service name, not Pod IP
          value: "postgresql://postgres-service.production.svc.cluster.local:5432/mydb"

Why this works:

  • Service postgres-service gets a stable cluster IP (e.g., 10.96.45.123)
  • DNS name postgres-service.production.svc.cluster.local always resolves to this IP
  • When database Pods restart, Service automatically updates its endpoint list
  • Applications never need to know actual Pod IPs

Traffic Flow:

Web App Pod → ClusterIP Service → iptables rules → Healthy Database Pod

2. NodePort Service

What it is: Exposes service on each node's IP at a static port (30000-32767).

Why use it:

  • Development/Testing - Quick external access without load balancer
  • Legacy integration - Systems that need direct node access
  • Cost optimization - Avoid cloud load balancer costs in non-production

Deep Example - External API Access:

apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
spec:
  type: NodePort
  selector:
    app: api-server
  ports:
  - port: 80          # ClusterIP port
    targetPort: 8080  # Pod port  
    nodePort: 30080   # External port on each node (optional, auto-assigned if omitted)

Why this architecture:

External Client → Node IP:30080 → ClusterIP:80 → Pod:8080
     ↓
Any Node in Cluster (even if Pod isn't running on that node)

Real-world considerations:

  • Security risk: Opens ports on all nodes
  • No load balancing: Client must implement failover between nodes
  • Port management: Limited port range, potential conflicts

Better alternative for production:

# Use with external load balancer pointing to NodePort
apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
  annotations:
    # Cloud provider creates external LB pointing to NodePort
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: NodePort
  # External LB → NodePort → Pods

3. LoadBalancer Service

What it is: Cloud provider provisions external load balancer pointing to service.

Why use it:

  • Production web applications - Need external access with HA
  • API gateways - External clients accessing your services
  • Automatic cloud integration - Leverages cloud provider's load balancing

Deep Example - Production Web Service:

apiVersion: v1
kind: Service
metadata:
  name: web-loadbalancer
  annotations:
    # AWS-specific annotations
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-west-2:123456789:certificate/abc123"
    # Google Cloud annotations
    # cloud.google.com/load-balancer-type: "External"
spec:
  type: LoadBalancer
  selector:
    app: web-server
    tier: frontend
  ports:
  - name: https
    port: 443
    targetPort: 8443
    protocol: TCP
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP

Architecture flow:

Internet → Cloud Load Balancer → NodePort (auto-created) → ClusterIP → Pods
                ↓
        (External IP: 203.0.113.10)

Why LoadBalancer over NodePort:

  • High Availability: Cloud LB handles node failures
  • SSL Termination: Offload TLS processing from Pods
  • Health Checks: Cloud provider monitors node health
  • Geographic routing: Route to closest region
  • DDoS protection: Cloud provider's built-in protections

Cost consideration:

# Each LoadBalancer Service = One cloud load balancer = $$$ per month
# For multiple services, consider Ingress instead

Ingress Controllers and Routing {#ingress-controllers}

The Problem LoadBalancer Services Create

Scenario: You have 5 microservices that need external access

  • 5 LoadBalancer Services = 5 cloud load balancers = $$$$ per month
  • Each gets different external IP
  • No shared SSL certificates
  • No path-based routing

Why Ingress Exists

Ingress provides:

  • Single entry point - One load balancer for multiple services
  • Path-based routing - /api/users → user-service, /api/orders → order-service
  • Host-based routing - api.company.com → api-service, www.company.com → web-service
  • SSL termination - Centralized certificate management
  • Advanced routing - Headers, query parameters, weighted routing

Deep Example - Production E-commerce Platform

# Ingress Controller (nginx-ingress, istio, traefik, etc.)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  annotations:
    # Nginx-specific annotations
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/rate-limit: "100"
    # Certificate management
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - api.ecommerce.com
    - www.ecommerce.com
    secretName: ecommerce-tls
  rules:
  # API routing
  - host: api.ecommerce.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 80
      - path: /products  
        pathType: Prefix
        backend:
          service:
            name: product-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix  
        backend:
          service:
            name: order-service
            port:
              number: 80
  # Frontend routing
  - host: www.ecommerce.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
---
# Backend Services (ClusterIP - internal only)
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  type: ClusterIP  # No external access needed
  selector:
    app: user-api
  ports:
  - port: 80
    targetPort: 8080

Traffic Flow Architecture

Internet → Cloud Load Balancer → Ingress Controller Pod → Service → Backend Pods
            (Single External IP)      ↓
                              Route based on:
                              - Host header
                              - URL path  
                              - HTTP headers
                              - Query parameters

Advanced Ingress Features

1. Canary Deployments:

# Route 10% of traffic to new version
apiVersion: networking.k8s.io/v1
kind: Ingress  
metadata:
  name: canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  rules:
  - host: api.company.com
    http:
      paths:
      - path: /
        backend:
          service:
            name: api-service-v2  # New version
            port:
              number: 80

2. Authentication/Authorization:

metadata:
  annotations:
    # OAuth2 authentication
    nginx.ingress.kubernetes.io/auth-url: "https://auth.company.com/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.company.com/oauth2/start"
    # Rate limiting per user
    nginx.ingress.kubernetes.io/rate-limit-rps: "10"

Why Choose Different Ingress Controllers

NGINX Ingress:

  • Best for: General-purpose web applications
  • Pros: Mature, well-documented, high performance
  • Cons: Limited advanced routing features

Istio Gateway:

  • Best for: Microservices with service mesh
  • Pros: Advanced traffic management, security, observability
  • Cons: Complex setup, resource-intensive

Traefik:

  • Best for: Dynamic environments, container-native
  • Pros: Automatic service discovery, built-in Let's Encrypt
  • Cons: Less enterprise features than others

Network Policies for Security {#network-policies}

The Security Problem

Default Kubernetes networking: All Pods can communicate with all other Pods across all namespaces.

# Without Network Policies - SECURITY RISK
frontend-pod → database-pod  ✅ (should be allowed)
frontend-pod → payment-pod   ✅ (should be allowed)  
frontend-pod → admin-pod     ✅ (should be BLOCKED!)
random-pod  → database-pod   ✅ (should be BLOCKED!)

Why Network Policies

Zero-trust networking:

  • Principle of least privilege - Only allow necessary communication
  • Namespace isolation - Prevent cross-namespace access
  • Compliance requirements - PCI-DSS, HIPAA, SOC2
  • Blast radius reduction - Limit impact of compromised Pods

Deep Example - Multi-tier Application Security

# Database tier - most restrictive
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only allow API tier to connect to database
  - from:
    - namespaceSelector:
        matchLabels:
          name: production
    - podSelector:
        matchLabels:
          tier: api
    ports:
    - protocol: TCP
      port: 5432
  egress:
  # Database can only connect to DNS and nowhere else
  - to: []
    ports:
    - protocol: UDP
      port: 53  # DNS
---
# API tier - moderate restrictions  
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow frontend and ingress controller
  - from:
    - podSelector:
        matchLabels:
          tier: frontend
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
  egress:
  # API can connect to database and external services
  - to:
    - podSelector:
        matchLabels:
          tier: database
  - to: []  # External traffic (payment APIs, etc.)
    ports:
    - protocol: TCP
      port: 443
---
# Frontend tier - least restrictive
apiVersion: networking.k8s.io/v1  
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
  - Egress
  egress:
  # Frontend can connect to API tier and external CDNs
  - to:
    - podSelector:
        matchLabels:
          tier: api
  - to: []  # CDNs, external APIs
    ports:
    - protocol: TCP
      port: 443

Advanced Network Policy Patterns

1. Namespace Isolation:

# Block all cross-namespace traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-namespaces
  namespace: production
spec:
  podSelector: {}  # All pods in namespace
  policyTypes:
  - Ingress
  ingress:
  # Only allow traffic from same namespace
  - from:
    - namespaceSelector:
        matchLabels:
          name: production

2. Development Environment Isolation:

# Developers can't access production database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy  
metadata:
  name: block-dev-to-prod
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: production  # Only production namespace
    # Explicitly deny development namespace
    - namespaceSelector:
        matchExpressions:
        - key: name
          operator: NotIn
          values: ["development", "staging"]

Why Network Policy Implementation Matters

CNI Plugin Requirements: Not all Container Network Interface (CNI) plugins support Network Policies:

  • Calico: ✅ Full support, high performance
  • Cilium: ✅ Advanced features (L7 policies, eBPF)
  • Weave: ✅ Basic support
  • Flannel: ❌ No support (common mistake!)

Testing Network Policies:

# Test connectivity between pods
kubectl exec -it frontend-pod -- nc -zv database-service 5432

# Should fail if policy is working:
# nc: database-service (10.96.45.123:5432): Connection refused

DNS Resolution in Kubernetes {#dns-resolution}

The DNS Architecture

Kubernetes runs CoreDNS (or kube-dns in older versions) as a cluster add-on to provide service discovery.

DNS Naming Convention

Full DNS Format:

<service-name>.<namespace>.svc.<cluster-domain>

Default cluster domain: cluster.local

Deep Example - Service Discovery

# Services in different namespaces
apiVersion: v1
kind: Service
metadata:
  name: user-api
  namespace: backend
---
apiVersion: v1  
kind: Service
metadata:
  name: payment-api
  namespace: payment
---
apiVersion: v1
kind: Service  
metadata:
  name: frontend
  namespace: frontend

DNS Resolution from any Pod:

# From frontend Pod connecting to backend services:

# Same namespace - short name works
curl http://frontend/health

# Different namespace - need namespace qualifier  
curl http://user-api.backend/users/123
curl http://payment-api.payment/process

# Full FQDN (always works)
curl http://user-api.backend.svc.cluster.local/users/123

# Cross-cluster (if federated)
curl http://user-api.backend.svc.us-west-2.cluster.local/users/123

DNS Search Domains

Inside a Pod in namespace frontend:

# /etc/resolv.conf contains:
search frontend.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10  # CoreDNS service IP

Resolution order:

  1. user-apiuser-api.frontend.svc.cluster.local (same namespace)
  2. user-apiuser-api.svc.cluster.local (cluster-wide search)
  3. user-apiuser-api.cluster.local (external DNS)

Advanced DNS Features

1. Custom DNS Configuration:

apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  dnsPolicy: "None"  # Override default DNS
  dnsConfig:
    nameservers:
    - 8.8.8.8  # Google DNS
    - 1.1.1.1  # Cloudflare DNS
    searches:
    - company.internal
    options:
    - name: ndots
      value: "2"
  containers:
  - name: app
    image: nginx

2. Headless Services (StatefulSets):

# Headless service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: database-headless
spec:
  clusterIP: None  # Headless - no cluster IP
  selector:
    app: database
  ports:
  - port: 5432

DNS records for headless service:

# Each Pod gets individual DNS record:
database-0.database-headless.default.svc.cluster.local → 10.244.1.10
database-1.database-headless.default.svc.cluster.local → 10.244.1.11  
database-2.database-headless.default.svc.cluster.local → 10.244.1.12

# Service DNS returns all Pod IPs:
database-headless.default.svc.cluster.local → 10.244.1.10,10.244.1.11,10.244.1.12

Why headless services:

  • Database clustering - Connect to specific database replicas
  • Peer discovery - Pods finding each other for clustering
  • Custom load balancing - Application-level load balancing logic

DNS Performance Optimization

1. DNS Caching:

# NodeLocal DNSCache - cache DNS responses on each node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
spec:
  template:
    spec:
      containers:
      - name: node-cache
        image: k8s.gcr.io/dns/k8s-dns-node-cache:1.21.1
        # Caches DNS responses locally on each node
        # Reduces load on CoreDNS and improves response time

2. DNS Policy Tuning:

# For Pods that primarily connect to external services
apiVersion: v1
kind: Pod
spec:
  dnsPolicy: Default  # Use node's DNS, not cluster DNS
  # Faster for external DNS lookups

Real-World Architecture Examples {#architecture-examples}

Example 1: E-commerce Microservices Platform

# Production architecture combining all concepts
---
# 1. Internal Services (ClusterIP)
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: backend
spec:
  type: ClusterIP
  selector:
    app: user-api
  ports:
  - port: 80
    targetPort: 8080
---
# 2. Database with Network Policy
apiVersion: v1
kind: Service  
metadata:
  name: postgres
  namespace: data
spec:
  type: ClusterIP
  selector:
    app: postgres
  ports:
  - port: 5432
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-access
  namespace: data
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: backend
    ports:
    - protocol: TCP
      port: 5432
---
# 3. External Access via Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rate-limit: "100"
    cert-manager.io/cluster-issuer: "letsencrypt"
spec:
  tls:
  - hosts: [api.ecommerce.com]
    secretName: api-tls
  rules:
  - host: api.ecommerce.com
    http:
      paths:
      - path: /users
        backend:
          service:
            name: user-service
            port: 
              number: 80

Traffic Flow:

Customer → Ingress (nginx) → user-service (ClusterIP) → User Pod
                    ↓
User Pod → postgres.data.svc.cluster.local:5432 → Database Pod

Example 2: Multi-Environment Setup

# Production namespace with strict policies
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
---
# Staging namespace  
apiVersion: v1
kind: Namespace
metadata:
  name: staging
  labels:
    environment: staging
---
# Block production access from staging
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prod-isolation
  namespace: production
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production

DNS resolution across environments:

# From staging Pod:
curl http://api-service.staging/health        # ✅ Same environment
curl http://api-service.production/health     # ❌ Blocked by Network Policy

# From production Pod:  
curl http://api-service.production/health     # ✅ Same environment
curl http://api-service.staging/health        # ✅ Production can access staging (for monitoring)

Key Architecture Decisions

1. Service Type Selection:

  • ClusterIP: Internal microservice communication (95% of services)
  • LoadBalancer: Main application entry points (1-2 per cluster)
  • Ingress: Multiple services sharing single load balancer (recommended)

2. Security Layers:

  • Network Policies: Control traffic between Pods/namespaces
  • Service mesh (Istio/Linkerd): mTLS, advanced routing, observability
  • API Gateway: Authentication, rate limiting, API versioning

3. DNS Strategy:

  • Use short names within same namespace for performance
  • Use FQDN for cross-namespace communication for clarity
  • Implement DNS caching for high-traffic applications

This architecture provides scalable, secure, and maintainable networking for production Kubernetes deployments.

Last updated: 2025-08-26 20:00 UTC