Layer 1: The Linux Process

What You're Building Toward

Before you can understand namespaces, you need to understand what you're isolating. A container is a process. To understand containers, understand processes first — not at a surface level, but at the kernel level.

1.1 How a Process Actually Starts

When you type nginx in a shell, this is what actually happens:

shell calls fork()
  → kernel creates a copy of the shell process
  → child process calls execve("/usr/sbin/nginx", args, env)
  → kernel loads the ELF binary into memory
  → kernel sets up the stack, heap, and text segments
  → instruction pointer is set to the binary's entry point
  → process is now running nginx, not a copy of the shell

The key syscall is execve(). It replaces the current process image with a new one. The PID stays the same. The namespaces stay the same. Only the code being run changes.

# Watch execve in action
strace -e trace=execve ls
# You'll see: execve("/usr/bin/ls", ["ls"], [/* env vars */]) = 0

This matters for containers because: runc forks a process and then calls execve() to replace it with your container's entrypoint. The PID, namespace setup, and cgroup assignment all happen before execve() is called.
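
The claim that execve() keeps the PID can be checked from the shell itself. The sketch below is a minimal illustration, assuming a POSIX sh and a Linux /proc; the function names pid_across_exec and pid_across_fork are made up for this example. The shell builtin exec calls execve() without forking first, so the PID printed before and after should match, while running the same command without exec forks a child that gets a new PID.

```shell
# Print the PID, then replace the shell with a new sh via exec (execve without fork).
# Both lines of output should show the same PID.
pid_across_exec() {
    sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

# Same thing without exec: the inner sh runs in a forked child, so the PIDs differ.
# The trailing ":" keeps the shell from optimizing the last command into an exec.
pid_across_fork() {
    sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

echo "with exec:";    pid_across_exec
echo "without exec:"; pid_across_fork
```

This is the same ordering the runc description above relies on: anything set up on the process before the exec is still in place after it.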

1.2 The /proc Filesystem

/proc is a virtual filesystem. Nothing on disk. The kernel exposes process state as files.

# Start a long-running process
sleep 1000 &
PID=$!

# Everything about that process is here
ls /proc/$PID/

# Key files to understand:
cat /proc/$PID/status          # human-readable: PID, PPID, memory, capabilities
cat /proc/$PID/cmdline         # the command that started it (null-delimited)
cat /proc/$PID/environ         # environment variables
cat /proc/$PID/maps            # memory map — every region of memory it can see
ls -la /proc/$PID/fd/          # open file descriptors (a directory of symlinks, so use ls)
cat /proc/$PID/net/dev         # network interfaces THIS process can see
ls -la /proc/$PID/ns/          # namespace memberships — THIS is critical for later

Run that last one and look at it carefully:

ls -la /proc/$PID/ns/
# lrwxrwxrwx cgroup -> cgroup:[4026531835]
# lrwxrwxrwx ipc    -> ipc:[4026531839]
# lrwxrwxrwx mnt    -> mnt:[4026531840]
# lrwxrwxrwx net    -> net:[4026531993]
# lrwxrwxrwx pid    -> pid:[4026531836]
# lrwxrwxrwx uts    -> uts:[4026531838]

The number in brackets is the namespace inode. Two processes with the same inode number for net are in the same network namespace — they share the same network stack. This is how you verify whether containers are sharing namespaces.

# Compare two processes' namespaces
sudo ls -la /proc/1/ns/        # reading PID 1's namespace links requires root
ls -la /proc/$PID/ns/
# Same numbers = same namespace
# Different numbers = isolated
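
The comparison above can also be scripted instead of read by eye. This is a minimal sketch, assuming a Linux /proc and the readlink utility; same_ns is a hypothetical helper name, not a standard tool. It compares the symlink targets under /proc/<pid>/ns/ for two processes you own.

```shell
# same_ns TYPE PID_A PID_B: succeed if both processes are in the same TYPE namespace.
# Returns 2 if either link cannot be read (for example, a process owned by another user).
same_ns() {
    ns_a=$(readlink "/proc/$2/ns/$1") || return 2
    ns_b=$(readlink "/proc/$3/ns/$1") || return 2
    [ "$ns_a" = "$ns_b" ]
}

# Example: compare this shell against a child it just started.
sleep 5 >/dev/null 2>&1 &
child=$!
for ns in mnt net pid uts ipc; do
    if same_ns "$ns" "$$" "$child"; then
        echo "$ns: shared"
    else
        echo "$ns: different or unreadable"
    fi
done
kill "$child"
```

A plain child process inherits every namespace from its parent, so each line is expected to report "shared" here; the interesting case comes later, when one of the two PIDs belongs to a container.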

1.3 The Process Tree

Every process has a parent (PPID). The tree goes all the way up to PID 1 (init/systemd).

pstree -p          # full tree with PIDs
pstree -p | head -30

# Or with ps
ps auxf            # tree view with ps
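
The same parent chain can be followed by hand through /proc, which shows where pstree gets its data. A minimal sketch, assuming a Linux /proc with readable status files; ancestors is a hypothetical helper name used only here.

```shell
# ancestors PID: print "PID name" for the process, then its parent, and so on,
# stopping when the parent PID is 0 or the status file is not readable.
ancestors() {
    cur=$1
    while [ -n "$cur" ] && [ "$cur" -gt 0 ] && [ -r "/proc/$cur/status" ]; do
        name=$(sed -n 's/^Name:[[:space:]]*//p' "/proc/$cur/status")
        echo "$cur $name"
        cur=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$cur/status")
    done
}

# Walk up from the current shell.
ancestors "$$"
```

On a host, the last line is normally PID 1. Inside a container the walk may stop earlier, because a process whose parent lives outside the PID namespace reports a PPid of 0.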

Why this matters: Inside a PID namespace, the container's first process sees itself as PID 1, but the host kernel also tracks it under an ordinary host PID, so it stays visible from the host. That host PID is how tooling such as docker exec can find the container's namespaces (through /proc/<real_pid>/ns/) and join them, even though the container thinks its process is PID 1.

# Find the "real" PID of a container's PID 1
# (run this after you have a container running in later layers)
docker inspect <container> | grep Pid
cat /proc/<real_pid>/status | grep NSpid
# NSpid: <real_pid>  <namespace_pid>
# You'll see the host PID and the PID the container thinks it has

1.4 File Descriptors

Every process starts with 3 open file descriptors:

- 0 → stdin
- 1 → stdout
- 2 → stderr

sleep 1000 &
PID=$!
ls -la /proc/$PID/fd/
# 0 -> /dev/pts/0  (your terminal)
# 1 -> /dev/pts/0
# 2 -> /dev/pts/0

This matters because when runc starts a container, it rewires these file descriptors. The container's stdin/stdout/stderr get connected to pipes that containerd-shim holds open. That's how docker logs works: the container's output is read from those pipes and, with the default json-file logging driver, written to a log file that docker logs reads back.
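
The rewiring can be observed on a small scale without any container runtime. The sketch below is only an illustration, assuming a Linux /proc and a readlink that supports -f; the variable names are made up for this example. It starts a child whose stdout is a pipe and whose stderr is a file, then asks the child where those descriptors point.

```shell
# Canonical path of a scratch file that will receive the child's stderr.
log=$(readlink -f "$(mktemp)")

# Inside $(...), the child's stdout (fd 1) is a pipe held open by this shell.
stdout_target=$(readlink /proc/self/fd/1)

# Redirect the child's stderr (fd 2) to the file, then ask the child what fd 2 is.
stderr_target=$(readlink /proc/self/fd/2 2>"$log")

echo "fd 1 -> $stdout_target"   # typically pipe:[<inode>]
echo "fd 2 -> $stderr_target"   # the scratch file path
rm -f "$log"
```

The child never opened anything itself; it simply inherited descriptors that its parent had already pointed elsewhere, which is the same position a container's entrypoint is in.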

1.5 Signals

Signals are how the kernel and other processes notify a process of events, such as a request to terminate.

# The ones that matter for containers:
kill -SIGTERM $PID   # polite: "please shut down" — process can catch this
kill -SIGKILL $PID   # violent: kernel terminates immediately — process cannot catch this
kill -SIGSTOP $PID   # pause: cannot be caught (docker pause uses the cgroup freezer, not this signal)
kill -SIGCONT $PID   # resume

# See all signals
kill -l

When the kernel's OOM killer terminates a process that exceeded its cgroup memory limit, it sends SIGKILL: uncatchable, unblockable, immediate. When Kubernetes sends a SIGTERM on pod termination, the process has terminationGracePeriodSeconds (30 seconds by default) to handle it before Kubernetes sends SIGKILL.

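# Watch signal handling
# In terminal 1: start a shell that traps SIGTERM and prints its own PID
bash -c 'trap "echo Got SIGTERM, cleaning up...; exit 0" TERM; echo "PID: $$"; sleep 1000 & wait $!'

# In terminal 2: use the PID printed in terminal 1
kill -SIGTERM <pid>   # the handler runs, then the process exits
# Start it again in terminal 1, then:
kill -SIGKILL <pid>   # process is gone immediately, no handler runs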
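
The difference between the two signals also shows up in the exit status, which is what a supervisor such as a container runtime ends up seeing. A minimal single-terminal sketch, assuming a POSIX sh; run_and_signal is a hypothetical helper name, and the 1-second pause is just a crude way to let the child install its trap first.

```shell
# run_and_signal SIG: start a child shell that traps SIGTERM, send it SIG,
# and report the exit status the parent observes.
run_and_signal() {
    sig=$1
    sh -c 'trap "echo handler ran; exit 0" TERM; sleep 5 >/dev/null 2>&1 & wait $!' &
    child=$!
    sleep 1                     # give the child time to install its trap
    kill -s "$sig" "$child"
    wait "$child"
    status=$?
    echo "exit status after SIG$sig: $status"
}

run_and_signal TERM   # handler runs; the child exits with status 0
run_and_signal KILL   # no handler runs; status is 137 (128 + signal number 9)
```

An exit status of 137 is the same value container tooling reports for a container whose main process was killed with SIGKILL, whether by docker kill or by the OOM killer.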

1.6 How the Kernel Loads a Binary (ELF)

Nearly every compiled binary on Linux is in ELF format (Executable and Linkable Format).

file /bin/bash
# /bin/bash: ELF 64-bit LSB pie executable, x86-64

# Read the ELF header
readelf -h /bin/bash

# See what libraries it needs
ldd /bin/bash
# libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6

# Alpine images ship musl libc instead of glibc, so binaries there must be built against musl
# The binary and its libraries must both be in the container rootfs

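This explains why you can't just copy a binary into a container and expect it to run: you need its shared libraries too. Images built FROM scratch contain no libraries at all, so they only work with statically linked binaries that have zero external dependencies. Distroless images follow the same idea but ship a minimal set of runtime files rather than nothing.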

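# Check if a binary is statically linked (assumes your busybox is a static build)
ldd /bin/busybox
# not a dynamic executable  ← glibc's ldd prints this for a static binary: no dependencies
file /bin/busybox
# ... statically linked ...  ← file states it explicitly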
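
As a complement to file and ldd, the ELF check itself is simple enough to do by hand: an ELF file begins with the four bytes 0x7f 'E' 'L' 'F'. The sketch below assumes the head, od, and tr utilities are available; is_elf is a hypothetical helper name for this example.

```shell
# is_elf FILE: succeed if FILE starts with the 4-byte ELF magic (7f 45 4c 46).
is_elf() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Example: a compiled binary versus a plain text file.
for f in /bin/sh /etc/passwd; do
    if is_elf "$f"; then
        echo "$f: ELF"
    else
        echo "$f: not ELF"
    fi
done
```

This only identifies the format. Whether the binary will actually run in a given rootfs still depends on the interpreter and libraries that ldd lists.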

1.7 Practical Exercises

Exercise 1: Start a process, find its PID, read every file in /proc/<pid>/ and understand what each one shows.

Exercise 2: Use strace -f bash in one terminal, run some commands, watch every syscall. Find the execve, clone, read, write calls.

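# The trailing echo keeps bash from exec-ing ls directly, so the fork (clone) stays visible
strace -f -e trace=execve,clone,fork bash -c "ls /tmp; echo done"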

Exercise 3: Find a running process, compare its namespace inodes against your shell. Confirm they're in the same namespaces.

ls -la /proc/$$/ns/
sudo ls -la /proc/1/ns/        # reading PID 1's namespace links requires root
# PID 1 and your shell should share all namespaces when you are on the host, not inside a container

Exercise 4: Watch a string reach stdout as a raw write() syscall, with no printf in between:

# Understand that echo is just a write() syscall to fd 1
strace echo "hello"
# write(1, "hello\n", 6) = 6

Key Takeaways

- A process is a running instance of a binary, started via fork() + execve()
- /proc/<pid>/ns/ shows namespace membership via inode numbers: same inode = same namespace
- File descriptors are how stdin/stdout/stderr work, and how container logging works
- SIGKILL cannot be caught; this is what OOMKill and docker kill use
- A binary needs its shared libraries in the same filesystem; this is why container images exist

Next: Layer 2 covers the clone() syscall — the actual mechanism that creates namespace isolation.