Content Delivery Networks

In this series (20 parts)
  1. What is system design and why it matters
  2. Estimations and back-of-envelope calculations
  3. Scalability: vertical vs horizontal scaling
  4. CAP theorem and distributed system tradeoffs
  5. Consistency models
  6. Load balancing
  7. Caching: strategies and patterns
  8. Content Delivery Networks
  9. Databases: SQL vs NoSQL and when to use each
  10. Database replication
  11. Database sharding and partitioning
  12. Consistent hashing
  13. Message queues and event streaming
  14. API design: REST, GraphQL, gRPC
  15. Rate limiting and throttling
  16. Proxies: forward and reverse
  17. Networking concepts for system design
  18. Reliability patterns: timeouts, retries, circuit breakers
  19. Observability: logging, metrics, tracing
  20. Security in system design

A user in Tokyo requests your homepage. Your origin server sits in Virginia. That round trip crosses the Pacific Ocean, hits undersea cables, traverses multiple autonomous systems, and takes 180 ms before a single byte arrives. Put a cached copy in Tokyo and the same request completes in 12 ms. That is the entire value proposition of a content delivery network.

The problem CDNs solve

Physics constrains networks. Light in fiber travels at roughly two thirds the speed of light in vacuum, which means about 200,000 km/s. A round trip from London to Sydney covers roughly 34,000 km of cable. At fiber speed that is 170 ms of pure propagation delay before you account for routing hops, TCP handshake overhead, and TLS negotiation. Multiply by the number of round trips a typical page load requires and you get a miserable user experience.
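That propagation arithmetic is easy to verify. A quick sketch using the same approximations as above (fiber speed, one-way cable distance):

```python
# Back-of-envelope propagation delay, using the approximations from the text.

FIBER_SPEED_KM_S = 200_000  # ~2/3 of the speed of light in vacuum

def round_trip_ms(one_way_cable_km: float) -> float:
    """Pure propagation delay for one round trip, in milliseconds."""
    return 2 * one_way_cable_km / FIBER_SPEED_KM_S * 1000

# London-Sydney: ~17,000 km of cable one way, ~34,000 km round trip.
print(round_trip_ms(17_000))  # 170.0 ms before routing hops, TCP, and TLS
```

Every additional round trip a page load requires pays that 170 ms again, which is why cutting the distance to a nearby PoP dominates most other optimizations.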

CDNs attack this problem by placing servers close to users. Instead of every request traveling to your origin, most requests terminate at a nearby edge server that already holds a cached copy of the response. The origin only gets involved when the edge has nothing to serve or the content has expired.

Edge servers and Points of Presence

A Point of Presence (PoP) is a physical location where a CDN operates servers. Major providers like Cloudflare run over 300 PoPs across six continents. Each PoP contains edge servers: commodity machines with large SSDs and plenty of RAM, optimized for serving cached content at high throughput.

A single PoP typically handles multiple roles. It terminates TLS connections so the handshake happens locally rather than across the ocean. It serves cached static assets like images, CSS, and JavaScript bundles. It can compress responses on the fly with Brotli or gzip. In modern CDNs it also runs application logic at the edge, but more on that later.

The geographic distribution of PoPs matters enormously. If your user base is concentrated in Southeast Asia and your CDN has two PoPs in that region, you are not getting much benefit. When evaluating providers, look at PoP maps and match them against your traffic patterns.

How requests reach the right edge

When a user types your URL and hits enter, their browser needs to resolve the domain to an IP address. CDNs intercept this resolution step to route users to the closest PoP. Two dominant techniques handle this: DNS-based routing and Anycast.

DNS-based routing

Your domain’s CNAME record points to the CDN’s domain, say d1234.cdn.example.net. When a resolver queries for that domain, the CDN’s authoritative DNS server inspects the resolver’s IP address, estimates the user’s geographic location, and returns the IP of the nearest PoP. This is sometimes called GeoDNS.

The technique works well but has a notable weakness. The CDN sees the resolver’s IP, not the user’s IP. If a user in Mumbai uses a DNS resolver hosted in Frankfurt, the CDN might route them to a European PoP instead of an Indian one. The EDNS Client Subnet (ECS) extension mitigates this by including a truncated version of the client’s IP in the DNS query, giving the CDN better location data.

DNS-based routing also depends on TTL values. If the DNS response is cached for 300 seconds and a PoP goes down during that window, some users will hit a dead endpoint until the TTL expires. Providers keep TTLs low, often 30 to 60 seconds, to reduce this risk.
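The GeoDNS idea can be sketched in a few lines: the authoritative server maps the resolver's (or, with ECS, the client's) address to a region and answers with that region's PoP. The prefix table and IP addresses below are invented for illustration.

```python
import ipaddress

# Hypothetical prefix-to-region table an authoritative GeoDNS server might hold.
REGION_BY_PREFIX = {
    ipaddress.ip_network("203.0.113.0/24"): "apac",
    ipaddress.ip_network("198.51.100.0/24"): "europe",
}
POP_BY_REGION = {"apac": "192.0.2.10", "europe": "192.0.2.20", "default": "192.0.2.30"}

def resolve(client_ip: str) -> str:
    """Return the PoP IP for a client; ECS lets this see the real client subnet."""
    addr = ipaddress.ip_address(client_ip)
    for prefix, region in REGION_BY_PREFIX.items():
        if addr in prefix:
            return POP_BY_REGION[region]
    return POP_BY_REGION["default"]

print(resolve("203.0.113.7"))  # routed to the APAC PoP
```

Without ECS, `client_ip` here would be the resolver's address, which is exactly how the Mumbai-user-with-Frankfurt-resolver misrouting happens.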

Anycast routing

Anycast takes a different approach. Multiple PoPs advertise the same IP address via BGP. When a user’s packet enters the internet, standard BGP routing delivers it to the topologically closest PoP that announces that address. No DNS tricks required.

Anycast is elegant for several reasons. Failover is automatic: if a PoP goes offline it stops announcing the route and traffic shifts to the next closest PoP within seconds. There is no TTL window where users hit a dead server. It also handles DDoS attacks well because attack traffic gets distributed across all PoPs rather than concentrated on one.

Cloudflare and Google Cloud CDN use Anycast heavily. AWS CloudFront relies more on DNS-based routing. In practice most large CDNs combine both techniques.

Request routing in action

Here is what happens when a user requests a cached asset through a CDN:

graph TD
  U[User in Tokyo] -->|DNS lookup| DNS[CDN DNS / Anycast]
  DNS -->|Nearest PoP IP| U
  U -->|HTTPS request| E[Edge Server - Tokyo PoP]
  E -->|Cache HIT| U
  E -->|Cache MISS| S[Shield / Mid-tier Cache]
  S -->|Cache HIT| E
  S -->|Cache MISS| O[Origin Server - Virginia]
  O -->|Response + headers| S
  S -->|Cache + forward| E
  E -->|Cache + respond| U

Request flow through a CDN. On a cache hit the edge responds directly. On a miss the request may pass through a shield layer before reaching the origin.

The shield or mid-tier cache is an intermediate layer that sits between edge PoPs and the origin. Without it, a cache miss at 50 different PoPs would generate 50 separate requests to your origin. The shield consolidates these: all PoPs in a region funnel misses through a single shield node, which either serves from its own cache or makes one request to the origin. This dramatically reduces origin load during cache expiration events.
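The consolidation effect is easy to see in miniature. A single-threaded sketch (real shields do this under concurrency, with locks or request futures coalescing in-flight fetches):

```python
# Sketch of origin-request consolidation at a shield node.

class Shield:
    def __init__(self, fetch_origin):
        self.cache = {}
        self.fetch_origin = fetch_origin
        self.origin_requests = 0  # how many requests actually reached the origin

    def get(self, key):
        if key not in self.cache:         # first miss populates the shield...
            self.origin_requests += 1
            self.cache[key] = self.fetch_origin(key)
        return self.cache[key]            # ...every later edge miss is served here

shield = Shield(lambda key: f"body-of-{key}")
for _ in range(50):                       # 50 PoPs all miss on the same asset
    shield.get("/logo.png")
print(shield.origin_requests)             # 1, not 50
```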

Push vs pull CDN

CDNs fall into two broad models based on how content reaches the edge.

Pull CDN

A pull CDN fetches content from the origin on demand. The first user to request a resource after it expires triggers a cache miss, the edge pulls the resource from the origin, caches it, and serves it. Subsequent requests within the TTL window get the cached copy.

Pull is the dominant model today. You do not need to upload anything to the CDN. You configure your origin, set cache headers, and the CDN handles the rest. The tradeoff is that the first request after expiration is slow because it must travel to the origin. For high-traffic sites this barely matters because content gets re-cached almost instantly. For low-traffic pages it can mean occasional slow loads.
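The pull model reduces to a small state machine: serve from cache while the entry is fresh, otherwise fetch from the origin and re-cache. A minimal sketch:

```python
import time

class PullCache:
    """Minimal pull-CDN cache: fetch on miss or expiry, serve cached otherwise."""

    def __init__(self, fetch_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_origin = fetch_origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (body, cached_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry and self.clock() - entry[1] < self.ttl:
            return entry[0], "HIT"
        body = self.fetch_origin(key)      # slow path: travels to the origin
        self.store[key] = (body, self.clock())
        return body, "MISS"

cache = PullCache(lambda k: f"origin:{k}", ttl_seconds=60)
print(cache.get("/index.html"))  # ('origin:/index.html', 'MISS') - first request
print(cache.get("/index.html"))  # ('origin:/index.html', 'HIT')  - everyone after
```

The first caller after expiry eats the origin round trip; on a busy site that cost is amortized across thousands of subsequent hits.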

Push CDN

A push CDN requires you to upload content to the CDN’s storage before users request it. You are responsible for putting files on the CDN and removing them when they change. Amazon S3 paired with CloudFront in an origin access identity configuration is a common push-like setup, though CloudFront itself still operates as a pull cache in front of S3.

Push works well for large static assets that change infrequently: software binaries, video files, firmware images. You control exactly what is on the edge and when it gets updated. The cost is operational complexity. You need deployment pipelines that push to the CDN, and you must handle invalidation yourself.

Most production setups are hybrids. Static assets might be pushed to an object store that the CDN pulls from, while dynamic HTML is pulled directly from application servers.

Cache-Control headers

The origin server controls caching behavior through HTTP headers. Getting these right is the difference between a CDN that works and one that serves stale content or bypasses the cache entirely.

The Cache-Control header is the primary mechanism. A response with Cache-Control: public, max-age=31536000, immutable tells the CDN (and the browser) to cache this resource for one year and never revalidate it. This is appropriate for versioned assets like app.a3b8f2.js where the filename changes when the content changes.

For content that changes but can tolerate brief staleness, Cache-Control: public, max-age=60, stale-while-revalidate=300 says cache for 60 seconds, and after that serve the stale version for up to 300 more seconds while fetching a fresh copy in the background. The user never waits for the origin.
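The edge's freshness decision under those two directives can be written as a small function:

```python
# Freshness decision for max-age + stale-while-revalidate, as described above.

def freshness(age_s: float, max_age: float, swr: float) -> str:
    if age_s < max_age:
        return "fresh"                    # serve straight from cache
    if age_s < max_age + swr:
        return "stale-while-revalidate"   # serve stale now, refresh in background
    return "expired"                      # must block on an origin fetch

# Cache-Control: public, max-age=60, stale-while-revalidate=300
print(freshness(30, 60, 300))    # fresh
print(freshness(200, 60, 300))   # stale-while-revalidate
print(freshness(400, 60, 300))   # expired
```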

The s-maxage directive specifically targets shared caches like CDNs. Cache-Control: public, max-age=0, s-maxage=3600 tells browsers not to cache but tells the CDN to cache for an hour. This is useful when you want the CDN to absorb traffic but want browsers to always check the CDN for fresh content.

The Vary header also matters. Vary: Accept-Encoding tells the CDN to maintain separate cached copies for different encodings (gzip vs Brotli vs uncompressed). Misusing Vary can fragment your cache and tank hit ratios.
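The fragmentation risk follows from how cache keys are built: the key is the URL plus the value of each request header named in Vary. A sketch (header handling simplified; real CDNs also normalize values):

```python
# The edge cache key includes the URL plus each header listed in Vary.
# An over-broad Vary (e.g. on User-Agent) multiplies keys and fragments the cache.

def cache_key(url: str, vary: list, request_headers: dict) -> tuple:
    varied = tuple(request_headers.get(h.lower(), "") for h in vary)
    return (url, *varied)

req_gzip = {"accept-encoding": "gzip"}
req_br = {"accept-encoding": "br"}

# Vary: Accept-Encoding -> two cache entries for the same URL, one per encoding.
print(cache_key("/app.css", ["Accept-Encoding"], req_gzip))  # ('/app.css', 'gzip')
print(cache_key("/app.css", ["Accept-Encoding"], req_br))    # ('/app.css', 'br')
```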

Cache invalidation and purging

Phil Karlton’s famous quip about the two hard things in computer science applies directly here. When you deploy new content, you need the CDN to stop serving the old version. Several strategies exist.

TTL-based expiration is the simplest. Set reasonable TTLs and wait. For assets with cache-busting filenames this is all you need. The old filename stops being referenced and the new one gets cached on first request.
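Cache-busting filenames are typically generated at build time by embedding a content hash, so a changed file gets a new URL and no purge is ever needed. A minimal version:

```python
import hashlib

def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Embed a short content hash in the filename, e.g. app.js -> app.a3b8f2c1.js."""
    stem, _, ext = path.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return f"{stem}.{digest}.{ext}"

name_v1 = versioned_name("app.js", b"console.log('v1')")
name_v2 = versioned_name("app.js", b"console.log('v2')")
print(name_v1 != name_v2)  # True: new content, new filename, old copy just ages out
```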

Purge by URL lets you invalidate a specific cached resource. Most CDNs expose an API for this. Cloudflare processes purge requests in under 150 ms globally. AWS CloudFront invalidations can take up to 15 minutes to propagate, though typically complete faster.

Purge by tag or surrogate key is more powerful. You tag cached responses with logical identifiers like product-123 or homepage. When the product data changes, you purge everything tagged product-123 across the entire CDN. Fastly and Cloudflare both support this. It scales far better than purging individual URLs when a single data change affects many pages.
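The data structure behind surrogate keys is a reverse index from tag to cached URLs. A sketch (tag names are illustrative, not any vendor's API):

```python
# Surrogate-key purging: each cached URL carries tags; one purge call removes
# every URL sharing a tag.

class TaggedCache:
    def __init__(self):
        self.store = {}        # url -> body
        self.urls_by_tag = {}  # tag -> set of urls

    def put(self, url, body, tags):
        self.store[url] = body
        for tag in tags:
            self.urls_by_tag.setdefault(tag, set()).add(url)

    def purge_tag(self, tag):
        for url in self.urls_by_tag.pop(tag, set()):
            self.store.pop(url, None)

cache = TaggedCache()
cache.put("/products/123", "<html>product</html>", tags=["product-123", "catalog"])
cache.put("/", "<html>home</html>", tags=["homepage", "product-123"])

cache.purge_tag("product-123")  # one call invalidates both affected pages
print(len(cache.store))         # 0
```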

Purge everything is the nuclear option. It clears the entire cache. Origin load spikes immediately after because every request is a miss. Use this only when absolutely necessary, and make sure your origin can handle the thundering herd.

CDN for dynamic content and edge compute

Traditional CDNs cache static content. Modern CDNs go further. Edge compute platforms like Cloudflare Workers, AWS Lambda@Edge, and Deno Deploy let you run application code at PoPs worldwide. This changes what a CDN can do.

Consider personalization. A product page has 90% shared content and 10% personalized recommendations. Without edge compute you either cache nothing (because the page is personalized) or cache everything (and show wrong recommendations). With edge compute the PoP fetches the cached page shell, runs a lightweight function that queries a nearby recommendations service, stitches the result together, and responds. The user gets a personalized page in 25 ms instead of 200 ms.
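The stitching step itself is simple. A toy version of the idea, with the placeholder token and the recommendations call being illustrative assumptions:

```python
# Edge-side composition: a cached page shell plus a per-user fragment fetched
# from a nearby service, stitched together at the PoP.

CACHED_SHELL = "<main>Product page</main><!--RECS-->"

def nearby_recommendations(user_id: str) -> str:
    # Stand-in for a low-latency call to a regional recommendations service.
    return f"<aside>recs for {user_id}</aside>"

def render_at_edge(user_id: str) -> str:
    return CACHED_SHELL.replace("<!--RECS-->", nearby_recommendations(user_id))

print(render_at_edge("u42"))
```

The shell is cached once and shared by everyone; only the small fragment varies per user, which is what makes the page cacheable at all.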

Authentication checks are another use case. Instead of forwarding every request to the origin to validate a JWT, the edge server verifies the token locally. Invalid tokens get rejected at the edge, never consuming origin resources. This also provides a natural layer of load balancing since traffic distributes across PoPs.


Edge compute also enables A/B testing without client-side JavaScript. The edge function reads a cookie or assigns a cohort, selects the appropriate variant, and serves it. No layout shift. No flash of wrong content.

These capabilities blur the line between CDN and application platform. The PoP is no longer just a cache. It is a lightweight compute node that can make decisions, transform responses, and interact with backend services.

Measuring CDN performance

The metrics that matter are cache hit ratio, time to first byte (TTFB), and origin offload percentage.

Cache hit ratio is the fraction of requests served from edge cache. A well-tuned static site should hit 95% or higher. An e-commerce site with personalized pages might sit around 70 to 80%. If your ratio is below 50%, something is wrong: check your Cache-Control headers, look for excessive Vary usage, and verify that query string parameters are not fragmenting the cache.

TTFB measures how quickly the first byte of the response reaches the user. For cached content served from a nearby PoP, expect 5 to 30 ms. If TTFB for cached content exceeds 100 ms consistently, your users may not be hitting the nearest PoP. Check DNS routing and consider whether your CDN has adequate PoP coverage in your users’ regions.

Origin offload measures how much traffic the CDN absorbs versus how much reaches your origin. High offload means your origin can be smaller and cheaper. If your CDN handles 95% of requests, your origin only needs to serve 5%. That is the difference between needing 20 application servers and needing 1.
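These metrics fall straight out of the raw request counters. Using the 95% figure from above:

```python
# Hit ratio and origin offload from raw counters, per the definitions above.

def hit_ratio(edge_hits: int, total_requests: int) -> float:
    return edge_hits / total_requests

total = 1_000_000
edge_hits = 950_000

ratio = hit_ratio(edge_hits, total)
origin_requests = total - edge_hits
print(f"hit ratio {ratio:.0%}, origin sees {origin_requests} requests")
# hit ratio 95%, origin sees 50000 requests
```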

Pitfalls and operational considerations

Caching the wrong thing is worse than not caching at all. If a CDN caches a response containing user-specific data and serves it to other users, you have a privacy incident. Always verify that responses with Set-Cookie or user data include Cache-Control: private or no-store.
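That check can be automated as a guard in tests or at the edge. A simplified version (real rules in RFC 9111 are more involved; this covers only the cases named above):

```python
# Guard against caching personalized responses: refuse to cache anything that
# sets a cookie or is marked private / no-store.

def safe_to_cache(headers: dict) -> bool:
    lowered = {k.lower(): v for k, v in headers.items()}
    if "set-cookie" in lowered:
        return False
    cc = lowered.get("cache-control", "").lower()
    if "private" in cc or "no-store" in cc:
        return False
    return True

print(safe_to_cache({"Cache-Control": "public, max-age=3600"}))          # True
print(safe_to_cache({"Cache-Control": "private", "Set-Cookie": "sid=x"}))  # False
```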

HTTPS and CDNs interact in ways that matter. The CDN terminates TLS at the edge, which means the connection between the CDN and your origin is a separate hop. If that hop is over plain HTTP, you have a security gap. Always configure full TLS between CDN and origin, often called “Full (Strict)” mode in Cloudflare’s terminology.

CDN outages happen. Cloudflare had a 19-minute outage in June 2022 that took down thousands of sites. Design your architecture so that your origin can serve traffic directly if the CDN fails. Keep your origin’s DNS records available and have a plan to switch traffic.

Cost is proportional to bandwidth. CDNs charge per GB transferred, typically $0.01 to $0.08 per GB depending on region and volume. Serving 100 TB per month at $0.02/GB costs $2,000. For video-heavy services, bandwidth costs dominate infrastructure budgets. Compare egress pricing carefully and consider CDN providers that offer unmetered bandwidth on specific plans.

Tying it together

A CDN is, at its core, a geographically distributed caching layer. It relies on the same principles of cache expiration, invalidation, and hit ratios you already know. It uses networking fundamentals like DNS, BGP, and TLS to route traffic efficiently. And it works alongside load balancers to distribute requests across origin servers when the cache cannot help.

The modern CDN does more than cache files. It terminates connections, compresses responses, blocks attacks, and runs application logic. For any system serving users across geographic regions, a CDN is not an optimization. It is a requirement.

What comes next

With caching and content delivery covered, the next layer to understand is where your data actually lives. The next article surveys the database landscape: relational vs non-relational, ACID vs BASE, and how to choose the right storage engine for your workload.
