How containers work
Containers are not virtual machines. This is the single most important thing to understand before working with Docker. A container is a regular Linux process that has been given a restricted view of the system using kernel features that have existed for over a decade. No hypervisor. No guest OS. Just isolation primitives applied to ordinary processes.
VMs vs containers
A virtual machine runs a full guest operating system on emulated hardware. A container shares the host kernel and isolates only the userspace.
graph TD
subgraph VM["Virtual Machine Stack"]
H1[Host OS + Hypervisor] --> G1[Guest OS 1]
H1 --> G2[Guest OS 2]
G1 --> A1[App A]
G2 --> A2[App B]
end
subgraph CT["Container Stack"]
H2[Host OS + Container Runtime] --> C1[App A]
H2 --> C2[App B]
end
style H1 fill:#64b5f6,stroke:#1976d2,color:#000
style H2 fill:#81c784,stroke:#388e3c,color:#000
style G1 fill:#ffb74d,stroke:#f57c00,color:#000
style G2 fill:#ffb74d,stroke:#f57c00,color:#000
style C1 fill:#ce93d8,stroke:#7b1fa2,color:#000
style C2 fill:#ce93d8,stroke:#7b1fa2,color:#000
VM stack vs container stack. Containers skip the guest OS entirely.
VMs provide strong isolation because each guest has its own kernel. Containers are lighter because they share the host kernel, but they depend on kernel-level isolation being correctly configured. Containers trade a thinner security boundary for far lower startup time and resource overhead: seconds versus minutes to start, megabytes versus gigabytes of footprint.
Linux namespaces
Namespaces are the kernel feature that gives each container its own isolated view of system resources. Modern kernels define eight namespace types; the six below do the heavy lifting for containers.
PID namespace
Each container gets its own process ID tree. The first process inside the container sees itself as PID 1, even though the host sees it with a completely different PID.
# Create a new PID namespace and run a shell inside it
sudo unshare --pid --fork --mount-proc /bin/bash
# Inside the new namespace
ps aux
# You will only see the bash process and ps itself
The --fork flag is required because PID namespaces apply to children of the calling process, not the caller itself. The --mount-proc flag remounts /proc so that ps reads from the new namespace.
Network namespace
Each container gets its own network stack: its own interfaces, routing table, iptables rules, and port space.
# Create a network namespace
sudo ip netns add testns
# Run a command inside it
sudo ip netns exec testns ip addr
# Only the loopback interface exists, and it is DOWN
# Clean up
sudo ip netns del testns
This is why two containers can both bind to port 80 without conflict. They each have their own port space.
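An empty network namespace is not useful on its own. A veth pair works like a virtual patch cable: one end stays on the host, the other moves into the namespace. A sketch of wiring one up (the interface names and the 10.0.0.0/24 addresses are arbitrary choices for this demo):

```shell
# Create a namespace and a veth pair (one cable, two ends)
sudo ip netns add demo
sudo ip link add veth-host type veth peer name veth-demo
# Move one end into the namespace
sudo ip link set veth-demo netns demo
# Assign addresses and bring both ends up
sudo ip addr add 10.0.0.1/24 dev veth-host
sudo ip link set veth-host up
sudo ip netns exec demo ip addr add 10.0.0.2/24 dev veth-demo
sudo ip netns exec demo ip link set veth-demo up
# The host and the namespace can now talk
ping -c 1 10.0.0.2
# Clean up; deleting the namespace removes the veth pair with it
sudo ip netns del demo
```

Docker performs the same wiring for every container, attaching the host end of each veth pair to the docker0 bridge.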
Mount namespace
Each container gets its own filesystem mount tree. Changes to mounts inside the container do not affect the host.
# Create a new mount namespace
sudo unshare --mount /bin/bash
# Mounts made here are invisible to the host
mount -t tmpfs none /mnt
ls /mnt # empty tmpfs, only visible in this namespace
UTS namespace
The UTS namespace isolates the hostname and domain name. Each container can have its own hostname without affecting others.
sudo unshare --uts /bin/bash
hostname container-01
hostname # shows container-01, host is unchanged
IPC namespace
Isolates System V IPC objects and POSIX message queues. Processes in different IPC namespaces cannot see each other’s shared memory segments or semaphores.
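The isolation is easy to observe with the System V shared memory tools from util-linux (ipcmk, ipcs); the 1024-byte segment size below is an arbitrary choice:

```shell
# Create a shared memory segment in the current namespace
ipcmk -M 1024
# prints the id of the new segment
# The current namespace lists it
ipcs -m
# A fresh IPC namespace starts with an empty table
sudo unshare --ipc ipcs -m
```

Clean up with ipcrm -m followed by the segment id; System V segments outlive the process that created them.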
User namespace
Maps UIDs inside the container to different UIDs on the host. A process can be root (UID 0) inside the container but map to an unprivileged user (e.g., UID 100000) on the host. This is the foundation of rootless containers.
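On most modern distributions you can try this without root, which is rather the point; the --map-root-user flag writes the UID mapping for you. A sketch (the host UID 1000 in the comment is a typical example, yours may differ):

```shell
# Enter a new user namespace, becoming "root" inside it
unshare --user --map-root-user /bin/bash
id -u
# 0
cat /proc/self/uid_map
# 0  1000  1  -> inside-UID 0 maps to host UID 1000, for a range of one UID
```

Rootless Docker and Podman extend this single mapping with UID ranges from /etc/subuid so a container can use many distinct users.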
Control groups (cgroups)
Namespaces control what a process can see. Cgroups control what a process can use. They enforce resource limits on CPU, memory, I/O, and more.
Memory limits
# Create a cgroup (cgroups v2)
sudo mkdir -p /sys/fs/cgroup/demo
# Set a 100MB memory limit
echo $((100 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/demo/memory.max
# Move the current shell into this cgroup
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
# If the processes in this cgroup collectively exceed 100MB, the kernel OOM-kills one of them
CPU limits
# Limit to 50% of one CPU core (50ms out of every 100ms)
echo "50000 100000" | sudo tee /sys/fs/cgroup/demo/cpu.max
The format is two values, quota and period, both in microseconds. A quota of 50000 within a 100000-microsecond period means the cgroup gets at most 50% of a single core. Write max in place of the quota to remove the limit.
What happens without cgroups
Without cgroups, a single container could consume all available memory and starve every other process on the host. Cgroups are what make multi-tenant container hosts viable.
Union filesystems
Containers need a filesystem. Copying an entire OS image for every container would waste enormous amounts of disk space. Union filesystems solve this by layering read-only image layers with a thin writable layer on top.
How OverlayFS works
OverlayFS merges multiple directories into a single unified view. It uses four directories:
- lowerdir: One or more read-only layers (the image layers)
- upperdir: A writable layer where all changes are stored
- workdir: A scratch directory used internally by OverlayFS
- merged: The unified view presented to the container
# Set up an OverlayFS mount
sudo mkdir -p /data/lower /data/upper /data/work /data/merged
# Populate the lower (read-only) layer
echo "from image" | sudo tee /data/lower/config.txt
# Mount the overlay
sudo mount -t overlay overlay \
  -o lowerdir=/data/lower,upperdir=/data/upper,workdir=/data/work \
  /data/merged
# The container sees the merged view
cat /data/merged/config.txt # "from image"
# Writing goes to the upper layer (copy-on-write)
echo "modified" | sudo tee /data/merged/config.txt
cat /data/upper/config.txt # "modified"
cat /data/lower/config.txt # "from image" (unchanged)
When a container modifies a file from a lower layer, OverlayFS copies it to the upper layer first, then applies the modification. The lower layer is never touched. This is copy-on-write.
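Deletions follow the same principle: marking instead of modifying. Removing a file that originated in a lower layer creates a whiteout, a character device with device number 0/0, in the upper layer. Continuing the mount from above:

```shell
# Delete a file that originated in the read-only lower layer
sudo rm /data/merged/config.txt
# The upper layer now holds a whiteout entry that masks it
ls -l /data/upper/config.txt
# c--------- ... 0, 0 ... config.txt
# The lower layer still has the original
cat /data/lower/config.txt # "from image"
```

This is also why deleting files in a later Dockerfile RUN step never shrinks an image: the data stays in the lower layer, merely hidden behind a whiteout.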
Layers in practice
A Docker image is a stack of OverlayFS layers. Each instruction in a Dockerfile creates a new layer. When you pull an image that shares base layers with an image you already have, Docker only downloads the new layers. When you run five containers from the same image, they all share the same read-only layers and each get their own thin writable upper layer.
What Docker adds
Everything above exists in the Linux kernel without Docker. You could build a container by hand with unshare, ip netns, cgroups, and mount -t overlay. Docker wraps these primitives into a usable system.
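A rough sketch of such a hand-built container, skipping the image filesystem and cgroup steps (a real runtime would also pivot_root into an unpacked image and join a cgroup):

```shell
# Five namespaces in one command; --mount-proc also implies a new mount namespace
sudo unshare --pid --fork --uts --ipc --net --mount-proc /bin/bash
# Inside the "container":
hostname handmade    # UTS: set a private hostname, host unaffected
ps aux               # PID: only bash and ps are visible
ip addr              # net: a single loopback interface, DOWN
```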
The Docker daemon
dockerd is a long-running daemon that manages container lifecycles. It handles creating namespaces, setting up cgroups, mounting filesystems, configuring networking, and cleaning up when containers exit.
The CLI
docker run, docker build, docker ps. The CLI talks to the daemon over a Unix socket (/var/run/docker.sock). Every docker command is an API call to the daemon.
Image format (OCI)
Docker standardized a format for packaging filesystem layers and metadata into a portable image. The Open Container Initiative (OCI) now governs this spec. An OCI image is a manifest pointing to a config blob and an ordered list of layer tarballs.
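A trimmed manifest, with digests elided, looks roughly like this (the size values are illustrative; the media type strings are the ones the OCI spec defines):

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:…",
    "size": 7023
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:…",
      "size": 32654
    }
  ]
}
```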
Registry
Docker Hub and other registries are HTTP APIs for storing and distributing images. docker pull downloads layers from a registry. docker push uploads them. The protocol supports content-addressable storage, so layers are deduplicated by their SHA256 digest.
Putting it together
When you run docker run -it ubuntu bash:
- The daemon pulls the ubuntu image layers from the registry (if not cached)
- It stacks the layers using OverlayFS and adds a writable upper layer
- It creates new namespaces (pid, net, mnt, uts, ipc, user)
- It sets up cgroups with the configured resource limits
- It creates a virtual network interface and attaches it to a bridge
- It starts bash as PID 1 inside the new namespaces
The result looks like an isolated machine. It is still just a process on the host.
What comes next
Now that you understand the kernel primitives behind containers, the next article covers Docker fundamentals: installing Docker, building images with Dockerfiles, managing containers, and working with volumes and networks.