Design a URL shortener
In this series (18 parts)
- Design a URL shortener
- Design a key-value store
- Design a rate limiter
- Design a web crawler
- Design a notification system
- Design a news feed
- Design a chat application
- Design a video streaming platform
- Design a music streaming service
- Design a ride-sharing service
- Design a food delivery platform
- Design a hotel booking platform
- Design a search engine
- Design a distributed message queue
- Design a code deployment system
- Design a payments platform
- Design an ad click aggregation system
- Design a distributed cache
Prerequisites: What is system design? and Back-of-the-envelope estimations.
A URL shortener does two things: it turns a long URL into a short code, and it redirects anyone who visits that code back to the original. Simple in concept, tricky at scale. The interesting problems show up when you need to generate billions of unique short codes without collisions, redirect users in under 10ms, and track every click for analytics.
This case study walks through the full design: requirements, capacity math, architecture, and the deep dives that separate a toy project from a production system.
1. Requirements
Functional requirements
- Shorten: Given a long URL, return a short URL like https://short.ly/abc123.
- Redirect: When a user visits a short URL, redirect them (HTTP 301 or 302) to the original.
- Custom slugs: Users can optionally pick their own short code.
- Expiration: Short URLs expire after a configurable TTL (default: 5 years).
- Analytics: Track click count, referrer, timestamp, and geo for each redirect.
Non-functional requirements
- Low latency: Redirects complete in under 10ms at p99.
- High availability: 99.99% uptime. A broken shortener breaks every link ever shared.
- Scale: 10 million daily active users, 100:1 read-to-write ratio.
- Durability: Once created, a short URL must never lose its mapping.
2. Capacity estimation
Start with the users and work outward.
| Metric | Value |
|---|---|
| DAU | 10 million |
| New URLs per day | 1 million (not every user creates a link) |
| Writes per second | ~12 (1M / 86,400) |
| Reads per second | ~1,200 (100:1 ratio) |
| Peak reads per second | ~6,000 (5x burst) |
| URL record size | ~500 bytes (short code + long URL + metadata) |
| Storage per year | ~180 GB (1M/day * 365 * 500 bytes) |
| Storage over 5 years | ~900 GB |
A single database server can handle 1,200 reads/s comfortably. The bottleneck is not raw throughput but redirect latency. With a caching layer absorbing 90%+ of reads, the database sees roughly 120 reads/s, which is trivial.
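The table above can be reproduced as a few lines of arithmetic. A quick sketch under the stated assumptions (1M new URLs/day, 100:1 reads, 5x peak bursts, 500-byte records):

```python
# Reproduce the capacity table as arithmetic; tweak the inputs to re-run the math.
SECONDS_PER_DAY = 86_400
new_urls_per_day = 1_000_000
read_write_ratio = 100
peak_multiplier = 5
record_size_bytes = 500

writes_per_sec = new_urls_per_day / SECONDS_PER_DAY             # ~12
reads_per_sec = writes_per_sec * read_write_ratio               # ~1,200
peak_reads_per_sec = reads_per_sec * peak_multiplier            # ~6,000
storage_per_year_gb = new_urls_per_day * 365 * record_size_bytes / 10**9  # ~180 GB
storage_5_years_gb = storage_per_year_gb * 5                    # ~900 GB
```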
3. High-level architecture
graph LR
Client["Client"] --> LB["Load Balancer"]
LB --> API["API Service"]
API --> Cache["Redis Cache"]
API --> DB["Database"]
API --> IDGen["ID Generator"]
API --> Analytics["Analytics Service"]
Analytics --> Kafka["Message Queue"]
Kafka --> ClickStore["Click Store"]
High-level architecture of a URL shortener. The API service handles both shorten and redirect flows, backed by a cache, database, and async analytics pipeline.
The load balancing layer distributes traffic across stateless API servers. Redirect requests hit the cache first; on a miss they fall through to the database. Analytics events are pushed to a message queue and processed asynchronously so they never slow down the redirect path.
4. API design
Two core endpoints:
POST /api/shorten
{
"long_url": "https://example.com/very/long/path?query=1",
"custom_slug": "my-link", // optional
"ttl_days": 365 // optional
}
Response: { "short_url": "https://short.ly/abc123" }
GET /:short_code
Response: HTTP 302 redirect to the original URL
Why 302 and not 301? A 301 tells the browser to cache the redirect permanently. That is great for performance but it means you lose analytics: the browser will never hit your server again for that link. Use 302 if analytics matter. Use 301 if you want to minimize server load and do not need per-click tracking.
5. Deep dive: short code generation
This is the core problem. You need a function that maps a long URL to a short, unique string. There are three main approaches.
Approach A: Hash and truncate
Hash the long URL with MD5 or SHA-256, base62-encode the digest, and keep the first 7 characters. With base62, 7 characters give you 62^7 = ~3.5 trillion combinations.
The problem: collisions. Two different URLs can produce the same 7-character prefix. You check the database; if the code exists and points to a different URL, append a counter and rehash. This works but adds latency on collision.
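A minimal sketch of Approach A (the database collision check is omitted; `hash_to_code` is an illustrative name, not from any particular library):

```python
import hashlib

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def hash_to_code(long_url: str, length: int = 7) -> str:
    """MD5 the URL, base62-encode the digest, keep the first `length` characters."""
    n = int.from_bytes(hashlib.md5(long_url.encode()).digest(), "big")
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))[:length]
```

Note that this is deterministic: the same URL always yields the same code, but two different URLs can still share a 7-character prefix, which is why the collision check against the database remains necessary.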
Approach B: Counter-based ID
Use a globally unique, monotonically increasing counter. Convert the integer to base62. No collisions by definition.
The problem: predictability. Users can guess the next short code. Also, a single counter is a bottleneck. You solve the first issue by shuffling bits (a simple XOR cipher). You solve the second with a distributed ID generator, or by pre-allocating ranges to each API server.
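A sketch of Approach B: convert the counter to base62, with a toy XOR step to make consecutive IDs non-obvious (the key below is a made-up constant for illustration, not a real secret):

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Convert a non-negative integer to a base62 string."""
    if n == 0:
        return BASE62[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))

OBFUSCATION_KEY = 0x5F3759DF  # illustrative constant

def code_for(counter: int) -> str:
    """XOR the counter before encoding so sequential IDs don't look sequential."""
    return encode_base62(counter ^ OBFUSCATION_KEY)
```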
Approach C: Pre-generated keys
A background service generates random short codes in bulk and stores them in a key pool. When the API needs a new code, it grabs one from the pool. No collision check needed at write time because codes are generated and deduplicated ahead of time.
This is the approach most production systems lean toward. It decouples code generation from the write path and eliminates collision handling entirely.
sequenceDiagram
participant Client
participant API as API Service
participant KP as Key Pool
participant DB as Database
Client->>API: POST /api/shorten
API->>KP: Get next available key
KP-->>API: "abc123"
API->>DB: INSERT (abc123, long_url, metadata)
DB-->>API: OK
API-->>Client: short.ly/abc123
Shorten flow using a pre-generated key pool. The API grabs a key, writes the mapping, and returns immediately.
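A toy in-memory version of the key pool. Production systems back this with a database table of used and unused keys, but the shape is the same: generation and deduplication happen up front, so handing out a key is O(1):

```python
import secrets

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

class KeyPool:
    """Pre-generate deduplicated random codes in bulk."""
    def __init__(self, size: int, length: int = 7):
        self.available: set[str] = set()
        while len(self.available) < size:  # set membership dedupes up front
            code = "".join(secrets.choice(BASE62) for _ in range(length))
            self.available.add(code)

    def take(self) -> str:
        """Pop any unused key; no collision check needed at write time."""
        return self.available.pop()
```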
For custom slugs, skip the key pool. Check the database for conflicts, insert if available, and return an error if the slug is taken.
6. Deep dive: redirect flow
Speed is everything here. Every millisecond of redirect latency is felt by every user who clicks a link.
sequenceDiagram
participant Client
participant LB as Load Balancer
participant API as API Service
participant Cache as Redis
participant DB as Database
Client->>LB: GET /abc123
LB->>API: forward request
API->>Cache: GET abc123
alt Cache hit
Cache-->>API: long_url
else Cache miss
API->>DB: SELECT long_url WHERE short_code = abc123
DB-->>API: long_url
API->>Cache: SET abc123 = long_url
end
API-->>Client: 302 Redirect to long_url
API--)Analytics: async click event
Redirect flow. The cache absorbs the majority of reads. Misses fall through to the database and backfill the cache. Analytics are fire-and-forget.
With a warm cache, the p99 redirect latency sits around 2-5ms. The cache-miss path adds a database round trip, pushing it to 10-20ms, but this only happens for cold or rarely accessed links.
Use a TTL on cache entries that matches the link’s expiration. For popular links, the cache entry stays warm naturally. For long-tail links that get one click a year, letting the cache miss is fine.
7. Data model
erDiagram
URL {
string short_code PK
string long_url
string custom_slug
datetime created_at
datetime expires_at
string user_id FK
}
CLICK {
bigint id PK
string short_code FK
datetime clicked_at
string referrer
string user_agent
string country
string ip_hash
}
USER {
string user_id PK
string email
datetime created_at
}
USER ||--o{ URL : creates
URL ||--o{ CLICK : receives
Entity-relationship diagram. The URL table is the core mapping. Clicks are stored separately for analytics.
The URL table is small and read-heavy, which makes it an excellent candidate for a relational database like PostgreSQL with read replicas. The CLICK table grows fast (potentially billions of rows) and is write-heavy. A time-series store or a columnar database like ClickHouse fits better here.
Sharding the URL table
At 900 GB over five years, a single database node can hold the data. But if you need horizontal scaling, shard by short_code. A hash of the short code determines the shard. This is where consistent hashing helps: it lets you add or remove shards without reshuffling the entire keyspace.
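A hash-modulo sketch of shard selection. In practice a consistent-hashing ring would replace the modulo so that adding a shard only moves a fraction of the keys:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(short_code: str) -> int:
    """Stable hash of the short code maps it to a shard."""
    digest = hashlib.sha256(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```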
8. Handling expiration
Expired links should return HTTP 404, not redirect. Two strategies:
- Lazy deletion: On every read, check expires_at. If expired, return 404 and optionally delete the record. Simple, but expired keys linger in storage.
- Active cleanup: A background job scans for expired records and deletes them in batches. This reclaims storage and keeps the database lean.
Use both. The lazy check catches expired links immediately. The background job cleans up the rest.
9. Analytics pipeline
Every redirect fires an async event containing the short code, timestamp, referrer, user agent, and IP. These events flow into a message queue (Kafka) and are consumed by a processing service that enriches the data (geo lookup from IP, device parsing from user agent) and writes it to the click store.
This pipeline is fully decoupled from the redirect path. If the analytics service goes down, redirects keep working. Events queue up in Kafka and get processed when the service recovers.
For real-time dashboards, maintain a counter in Redis that increments on every redirect. For historical analytics, query the click store directly.
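The real-time counter is just one increment per redirect. Sketched here with an in-memory dict standing in for Redis:

```python
from collections import defaultdict

click_counts: defaultdict[str, int] = defaultdict(int)  # stand-in for Redis

def record_click(short_code: str) -> int:
    """Equivalent of Redis `INCR clicks:<short_code>`; returns the new count."""
    click_counts[short_code] += 1
    return click_counts[short_code]
```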
10. Trade-offs and alternatives
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Short code generation | Hash + collision check | Pre-generated key pool | Key pool. Simpler write path, no collision handling. |
| Redirect status | 301 (permanent) | 302 (temporary) | 302 if analytics matter; 301 for pure speed. |
| Database | SQL (PostgreSQL) | NoSQL (DynamoDB) | SQL for URL mappings (small, relational). NoSQL for clicks (high write volume). |
| Cache eviction | TTL-based | LRU | TTL matching link expiration, with LRU as a fallback for memory pressure. |
| Analytics | Synchronous writes | Async via queue | Always async. Never let analytics slow down redirects. |
What about base62 vs base58?
Base62 uses [a-zA-Z0-9]. Base58 removes ambiguous characters (0, O, I, l). If users will type short codes manually, base58 is friendlier. If codes are always copy-pasted or clicked, base62 gives you a slightly larger keyspace.
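Concretely, base58 is base62 minus the four look-alike characters, which costs a little keyspace at a fixed code length:

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
AMBIGUOUS = set("0OIl")  # the characters base58 drops
BASE58 = "".join(c for c in BASE62 if c not in AMBIGUOUS)

keyspace_62 = 62 ** 7  # ~3.5 trillion
keyspace_58 = 58 ** 7  # ~2.2 trillion
```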
11. What real systems actually do
Bitly uses a counter-based approach with distributed ID generation. Their system handles tens of billions of links and redirects hundreds of billions of clicks per month. They cache aggressively and use Kafka for analytics.
TinyURL is simpler. It hashes the input URL, checks for collisions, and stores everything in a relational database. At their scale, this works fine.
YouTube uses a similar short-code pattern for video IDs (11-character base64 strings). They pre-generate IDs and assign them at upload time.
The common thread: every production system avoids hash-and-check at write time. They either pre-generate keys or use counters. The write path must be fast and collision-free.
12. What comes next
A URL shortener is a gateway to several deeper system design topics:
- Rate limiting: Prevent abuse by capping URL creation per user per minute. See rate limiting for patterns.
- Spam detection: Run new URLs through a blocklist or ML classifier before accepting them.
- Global distribution: Deploy API servers and caches in multiple regions. Route users to the nearest edge with GeoDNS.
- Link previews: Crawl the destination URL at creation time to extract a title, description, and thumbnail for social sharing.
The core shortener is a weekend project. Making it reliable, fast, and observable at scale is where the real engineering lives.