Layer 3: Namespaces In Depth¶
What You're Building Toward¶
Layer 2 showed you how namespaces are created. This layer goes deep on each type — especially network namespaces and veth pairs, because that's the foundation of all Kubernetes networking.
3.1 The 7 Namespace Types¶
| Namespace | Flag | Isolates |
|---|---|---|
| Mount | CLONE_NEWNS | Filesystem mount points |
| UTS | CLONE_NEWUTS | Hostname, domainname |
| IPC | CLONE_NEWIPC | SysV IPC, POSIX message queues |
| PID | CLONE_NEWPID | Process IDs |
| Network | CLONE_NEWNET | Network interfaces, IPs, routes, iptables |
| User | CLONE_NEWUSER | UID/GID mappings |
| Cgroup | CLONE_NEWCGROUP | Cgroup root directory view |
(Since Linux 5.6 there is an eighth: the time namespace, CLONE_NEWTIME, which isolates the CLOCK_MONOTONIC and CLOCK_BOOTTIME offsets. It rarely matters for containers, so this layer sticks to the classic seven.)
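Every process's namespace memberships are visible under /proc/&lt;pid&gt;/ns/, which you can check right now without root:

```shell
# Each entry is a symlink like 'net:[4026531840]'; the bracketed number
# is the namespace's inode, which uniquely identifies it on this host
ls -l /proc/self/ns/
# Two processes share a namespace exactly when these inodes match
readlink /proc/self/ns/net   # e.g. net:[4026531840] (your number will differ)
readlink /proc/self/ns/pid
```

Section 3.7 uses this same mechanism to inspect running containers.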
3.2 Network Namespace — The Most Important One¶
This is the one that matters most for Kubernetes. Every Pod gets its own network namespace.
What a new network namespace contains:¶
ip netns add test-ns
ip netns exec test-ns bash
# Inside:
ip a
# 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
# link/loopback 00:00:00:00:00:00
ip route
# (empty)
iptables -L
# (empty chains)
A new network namespace has:
- Only loopback (lo), which is DOWN
- No routes
- Empty iptables
- No DNS
It is completely isolated. Nothing in, nothing out until you wire it up.
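You can see that emptiness without root by pairing the network namespace with a user namespace. A quick sketch, assuming your kernel allows unprivileged user namespaces:

```shell
# -r maps your UID to root inside a new user namespace; -n adds a fresh netns
unshare -r -n ip addr show
# 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN ...
unshare -r -n ip route    # prints nothing: the routing table is empty
```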
veth pairs — The Wire Between Namespaces¶
A veth pair is a virtual ethernet cable. What goes in one end comes out the other.
# Create a veth pair
ip link add veth-host type veth peer name veth-container
# veth-host lives in the host namespace
# veth-container gets moved into the container namespace
ip link set veth-container netns test-ns
# Configure the host end
ip addr add 10.10.0.1/24 dev veth-host
ip link set veth-host up
# Configure the container end
ip netns exec test-ns ip addr add 10.10.0.2/24 dev veth-container
ip netns exec test-ns ip link set veth-container up
ip netns exec test-ns ip link set lo up
# Test
ip netns exec test-ns ping 10.10.0.1 # container pings host end
ping 10.10.0.2 # host pings container
# Verify isolation
ip netns exec test-ns ip a
# Only sees veth-container and lo — cannot see host's eth0
This is exactly what Docker does for every container: each one gets a veth pair, with one end in the host namespace and the other in the container's namespace.
The Linux Bridge — Connecting Multiple Containers¶
When you have multiple containers that need to talk to each other, you need a bridge:
# Create a bridge (this is what docker0 is)
ip link add br0 type bridge
ip addr add 172.20.0.1/24 dev br0
ip link set br0 up
# Create namespace and veth for container 1
ip netns add c1
ip link add veth-c1 type veth peer name veth-c1-br
ip link set veth-c1 netns c1
ip link set veth-c1-br master br0 # plug into bridge
ip link set veth-c1-br up
ip netns exec c1 ip addr add 172.20.0.2/24 dev veth-c1
ip netns exec c1 ip link set veth-c1 up
ip netns exec c1 ip link set lo up
ip netns exec c1 ip route add default via 172.20.0.1
# Create namespace and veth for container 2
ip netns add c2
ip link add veth-c2 type veth peer name veth-c2-br
ip link set veth-c2 netns c2
ip link set veth-c2-br master br0
ip link set veth-c2-br up
ip netns exec c2 ip addr add 172.20.0.3/24 dev veth-c2
ip netns exec c2 ip link set veth-c2 up
ip netns exec c2 ip link set lo up
ip netns exec c2 ip route add default via 172.20.0.1
# c1 and c2 can now talk to each other
ip netns exec c1 ping 172.20.0.3 # c1 → c2
# Enable IP forwarding and NAT for internet access
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 172.20.0.0/24 ! -o br0 -j MASQUERADE
# Now containers can reach the internet
ip netns exec c1 ping 8.8.8.8
You just built what Docker's bridge network mode does. docker0 is this bridge.
docker network create makes more bridges.
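You can inspect those bridges with iproute2. On a Docker host the list includes docker0 plus one bridge per custom network; on a machine without Docker it may be empty:

```shell
# List all bridge devices on the host
ip link show type bridge
# Show which veth ends are plugged into which bridge
bridge link
```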
Port Mapping with iptables¶
This is how docker run -p 8080:80 works:
# Container is at 172.20.0.2:80
# We want host port 8080 to forward to container port 80
# PREROUTING catches traffic arriving from other hosts
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 172.20.0.2:80
# OUTPUT catches traffic generated locally on the host itself
iptables -t nat -A OUTPUT -p tcp --dport 8080 -j DNAT --to-destination 172.20.0.2:80
# Verify
iptables -t nat -L -n -v
This is also how Kubernetes NodePort services work. kube-proxy writes these iptables rules.
3.3 PID Namespace — In Depth¶
# Start a process tree in a new PID namespace
unshare --pid --fork --mount-proc bash
# Check your PID
echo $$ # 1
# Start some children
sleep 100 &
sleep 200 &
ps aux
# PID 1: bash
# PID 2: sleep 100
# PID 3: sleep 200
# Exit the namespace and check from the host
exit
ps aux | grep sleep
# You'll see the REAL PIDs, e.g. 5021, 5022
The mount-proc detail¶
When you create a new PID namespace, /proc still shows host processes unless you remount it. That's what --mount-proc does — it mounts a fresh /proc inside the new namespace.
# Without --mount-proc:
unshare --pid --fork bash
ps aux # shows all host processes — /proc reflects the host, not your namespace
# With --mount-proc:
unshare --pid --fork --mount-proc bash
ps aux # only shows your processes
This is a mount namespace interaction — you need both CLONE_NEWPID and CLONE_NEWNS to properly isolate the process view.
PID 1 in a Container¶
In a container, your entrypoint is PID 1. This matters because:
- PID 1 is responsible for reaping zombie processes
- SIGTERM goes to PID 1 — if your app doesn't handle it, the container won't shut down cleanly
- If PID 1 exits, the entire namespace is torn down — all processes die
# Bad: shell script as PID 1 (doesn't forward signals)
# entrypoint.sh:
#!/bin/bash
./myapp # myapp is a child of bash, not PID 1
# SIGTERM goes to bash, which does not forward it to myapp
# docker stop then waits its timeout (10s by default) and sends SIGKILL;
# Kubernetes waits terminationGracePeriodSeconds before doing the same
# Good: exec replaces the shell with your app
#!/bin/bash
exec ./myapp # now myapp IS PID 1, gets signals directly
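You can watch the difference without a container at all: with exec, the inner shell keeps the outer shell's PID instead of getting a new one.

```shell
# Without exec: the inner shell is a child, so it reports a different PID
# (the trailing "true" stops sh from implicitly exec'ing its last command)
sh -c 'echo "outer: $$"; sh -c "echo inner: \$\$"; true'
# With exec: the inner shell replaces the outer one and reports the same PID
sh -c 'echo "outer: $$"; exec sh -c "echo inner: \$\$"'
```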
3.4 Mount Namespace — In Depth¶
unshare --mount bash
# You can now mount things without affecting the host
mount -t tmpfs tmpfs /mnt
touch /mnt/test-file
# The host cannot see this mount
# Inside the namespace:
cat /proc/mounts | grep /mnt # shows the tmpfs mounted on /mnt
# In another terminal, on the host:
cat /proc/mounts | grep /mnt # shows nothing
The /proc/mounts vs /etc/mtab difference¶
# /proc/mounts = kernel's view of current mounts (real)
# /etc/mtab = traditionally maintained by the mount command (on modern systems, a symlink to /proc/self/mounts)
cat /proc/mounts
cat /proc/self/mounts # same thing, relative to current namespace
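On modern kernels /proc/mounts is itself just a symlink to /proc/self/mounts, which you can verify directly:

```shell
readlink /proc/mounts
# self/mounts
diff /proc/mounts /proc/self/mounts && echo "identical: same namespace, same view"
```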
Bind Mounts¶
These are how Docker volumes and Kubernetes ConfigMaps/Secrets work:
# Bind mount: make a directory appear at another path
mkdir /tmp/source /tmp/target
echo "hello" > /tmp/source/file.txt
mount --bind /tmp/source /tmp/target
cat /tmp/target/file.txt # hello
# The same inode, different path
# Changes in either location are visible in both
ls -i /tmp/source/file.txt /tmp/target/file.txt
# Same inode number
Kubernetes ConfigMaps are mounted into containers as bind mounts. The kubelet writes the ConfigMap data to a host path, then bind-mounts it into the container's mount namespace.
3.5 UTS Namespace¶
Simple but worth understanding completely:
unshare --uts bash
hostname # still shows host hostname
hostname my-container # change it
hostname # my-container
# Host is unaffected
# In another terminal:
hostname # original hostname
UTS = UNIX Time-sharing System. The namespace isolates two things:
- hostname
- domainname (NIS domain, not DNS)
# Both fields
hostname # node hostname
domainname # NIS domainname (usually "(none)")
uname -n # same as hostname
In Kubernetes, each pod gets its own UTS namespace. The pod's hostname is the pod name by default, configurable via spec.hostname.
3.6 IPC Namespace¶
Isolates System V IPC objects and POSIX message queues. Processes in different IPC namespaces cannot use shared memory segments to communicate.
# In host namespace: create a shared memory segment
ipcmk -M 1024
# Shared memory id: 0
ipcs -m
# ------ Shared Memory Segments --------
# key shmid owner perms bytes
# 0x... 0 root 644 1024
# In a new IPC namespace:
unshare --ipc bash
ipcs -m
# ------ Shared Memory Segments --------
# (empty — cannot see host's shared memory)
This matters for pods: containers within the same pod share the IPC namespace (via the pause container). This means they can communicate via shared memory — useful for high-performance sidecars.
3.7 Inspecting Running Container Namespaces¶
Once you're running real containers (Docker, later Kubernetes), use these to inspect:
# Find the container's PID on the host
docker inspect <container_name> --format '{{.State.Pid}}'
CPID=<that_pid>
# View all its namespaces
ls -la /proc/$CPID/ns/
# Enter specific namespaces
nsenter -t $CPID --net ip a # see its network
nsenter -t $CPID --mount ls / # see its filesystem
nsenter -t $CPID --pid ps aux # see its processes
nsenter -t $CPID --net --pid bash # full shell in its context
# Compare two containers' namespaces
# Containers in the SAME pod will share net and ipc inodes
# Containers in DIFFERENT pods will have different net inodes
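A sketch of that comparison, with PID_A and PID_B standing in for two real container PIDs (here both point at the current shell, so the namespaces trivially match):

```shell
PID_A=$$; PID_B=$$   # substitute two container PIDs from docker inspect
# stat -L follows the symlink and prints the namespace inode number
stat -Lc '%i' /proc/$PID_A/ns/net /proc/$PID_B/ns/net
if [ "$(readlink /proc/$PID_A/ns/net)" = "$(readlink /proc/$PID_B/ns/net)" ]; then
    echo "same network namespace"
fi
```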
3.8 Namespace Cleanup¶
Named network namespaces persist until deleted. Clean up:
# List named network namespaces
ip netns list
# Delete
ip netns del test-ns
ip netns del c1
ip netns del c2
# Note: unnamed namespaces (created with unshare) disappear when
# the last process using them exits
3.9 Practical Exercises¶
Exercise 1 — Build a complete 2-container network from scratch:
- Two network namespaces
- One bridge
- Two veth pairs
- IPs in the same subnet
- Both can ping each other
- Both can ping the internet via NAT
Exercise 2 — Port forward into a namespace:
- Start a Python HTTP server inside a network namespace: ip netns exec ns1 python3 -m http.server 80
- Write an iptables rule to forward host port 8080 to it
- Curl it from the host
Exercise 3 — Observe namespace isolation failing:
# Without mount namespace, /proc leaks PID info
unshare --pid --fork bash # no --mount-proc
ps aux # still shows host processes — why?
Exercise 4 — Find a Kubernetes pod's veth pair:
# On a node with running pods:
# Find the pod's PID
crictl pods
crictl inspectp <pod_id> | grep pid # inspectp inspects the pod sandbox
# Find its veth interface
nsenter -t <pid> --net ip a # see veth name inside namespace
# Find the other end on the host
ip a # look for interface with matching index number
# or use ethtool:
nsenter -t <pid> --net ethtool -S eth0 | grep peer_ifindex
ip link | grep <peer_index>
Key Takeaways¶
- Network namespaces are completely isolated — no interfaces, no routes, no iptables
- veth pairs are the wire between namespaces — one end in the container, one in the host/bridge
- A Linux bridge connects multiple veth ends — this is docker0
- iptables DNAT rules are how port mapping works — and how NodePort services work
- PID 1 in a container must handle signals correctly — use exec in shell scripts
- IPC namespace sharing is why containers in the same pod can use shared memory
- nsenter + /proc/<pid>/ns/ is how you inspect running container namespaces
Next: Layer 4 covers cgroups — the walls that prevent a process from consuming unlimited resources.