Design a chat application

In this series (18 parts)
  1. Design a URL shortener
  2. Design a key-value store
  3. Design a rate limiter
  4. Design a web crawler
  5. Design a notification system
  6. Design a news feed
  7. Design a chat application
  8. Design a video streaming platform
  9. Design a music streaming service
  10. Design a ride-sharing service
  11. Design a food delivery platform
  12. Design a hotel booking platform
  13. Design a search engine
  14. Design a distributed message queue
  15. Design a code deployment system
  16. Design a payments platform
  17. Design an ad click aggregation system
  18. Design a distributed cache

WhatsApp delivers roughly 100 billion messages per day. Slack handles millions of concurrent connections across thousands of organizations. The core problem is deceptively simple: get a message from one person to another, fast. The engineering challenge is doing it reliably at scale while preserving message order, confirming delivery, and storing history so users can scroll back through years of conversation.

This case study designs a chat system supporting 500 million daily active users. We will cover 1-1 messaging, group chat, delivery receipts, and message history. The design builds on networking concepts and real-time systems patterns.

Requirements

Functional requirements

  1. 1-1 messaging. Users send text messages to other users. Messages are delivered in real time when the recipient is online.
  2. Group messaging. Users create groups of up to 500 members. Every member receives every message.
  3. Delivery receipts. Messages transition through sent, delivered, and read states. Senders see these transitions in real time.
  4. Message history. Users retrieve past messages on demand. History is retained for 5 years.
  5. Online presence. Users see whether their contacts are currently online.

Non-functional requirements

  • Scale. 500M DAU, 20B messages per day.
  • Latency. Messages delivered within 300ms for online recipients.
  • Ordering. Messages within a conversation appear in the order they were sent.
  • Durability. Zero message loss. Once the server acknowledges a message, it is persisted.
  • Availability. 99.99% uptime. Users tolerate slight delays but not missing messages.

Capacity estimation

Message volume. 500M DAU sending an average of 40 messages per day gives 20B messages/day, or roughly 230K messages per second. Peak traffic is 3x average, so we design for 700K messages/second.

Message size. Average message is 200 bytes of text plus 100 bytes of metadata (sender, timestamp, conversation ID, message ID). That is 300 bytes per message.

Storage. 20B messages x 300 bytes = 6 TB per day. Over 5 years, that is roughly 11 PB of message data. We need a storage layer built for append-heavy writes with efficient range scans.

Bandwidth. Inbound: 700K messages/sec x 300 bytes = 210 MB/s. Outbound is higher because group messages fan out. If the average fan-out is 5 (mix of 1-1 and group), outbound is about 1 GB/s.

Connections. At peak, assume 100M concurrent WebSocket connections. At 20 KB memory per connection, that is 2 TB of memory across the WebSocket fleet. With 100K connections per server, we need at least 1,000 WebSocket servers.

QPS breakdown. Send message: 700K/s. Delivery receipts (sent + delivered + read): roughly 2M/s. Presence heartbeats: 100M users pinging every 30 seconds gives 3.3M/s. Total QPS hitting the backend services is around 6M/s at peak.
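
The back-of-the-envelope numbers above are easy to sanity-check in a few lines (a sketch; the 3x peak factor, 300-byte message size, and 5-year retention are the assumptions stated in this section):

```python
# Capacity estimation for the chat system, reproducing the figures above.
DAU = 500_000_000
MSGS_PER_USER_PER_DAY = 40
MSG_BYTES = 300            # 200 B text + 100 B metadata
PEAK_FACTOR = 3
SECONDS_PER_DAY = 86_400

msgs_per_day = DAU * MSGS_PER_USER_PER_DAY            # 20B messages/day
avg_mps = msgs_per_day / SECONDS_PER_DAY              # ~230K messages/s
peak_mps = avg_mps * PEAK_FACTOR                      # ~700K messages/s

storage_per_day_tb = msgs_per_day * MSG_BYTES / 1e12  # 6 TB/day
storage_5y_pb = storage_per_day_tb * 365 * 5 / 1000   # ~11 PB over 5 years

inbound_mb_s = peak_mps * MSG_BYTES / 1e6             # ~210 MB/s inbound at peak

print(f"{avg_mps:,.0f} msg/s avg, {peak_mps:,.0f} msg/s peak")
print(f"{storage_per_day_tb:.1f} TB/day, {storage_5y_pb:.2f} PB over 5 years")
```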

High-level architecture

graph TD
  C1["Mobile/Web Clients"] -->|WebSocket| LB["Load Balancer"]
  C1 -->|HTTPS| APIGW["API Gateway"]
  LB --> WSF["WebSocket Server Fleet"]
  APIGW --> CS["Chat Service"]
  CS --> MQ["Message Queue (Kafka)"]
  MQ --> WSF
  MQ --> MS["Message Storage Service"]
  MS --> DB["Message DB (Cassandra)"]
  WSF <-->|Pub/Sub| REDIS["Redis Pub/Sub"]
  CS --> CR["Connection Registry (Redis)"]
  WSF --> CR
  CS --> PS["Presence Service"]
  PS --> PCACHE["Presence Cache (Redis)"]
  CS --> GS["Group Service"]
  GS --> GDB["Group Metadata DB"]

High-level architecture of the chat system. Clients maintain WebSocket connections for real-time delivery, with Kafka decoupling message ingestion from storage and fan-out.

The architecture separates real-time delivery from durable storage. Clients establish WebSocket connections through load balancers to receive messages in real time. Sending a message goes through the API gateway to the Chat Service, which validates the request, writes to message queues (Kafka), and returns an acknowledgment. Downstream consumers handle persistence and delivery independently.

The Connection Registry in Redis maps each online user to the WebSocket server holding their connection. When the Chat Service needs to deliver a message, it looks up the recipient’s server and publishes through Redis Pub/Sub. If the recipient is offline, the message is stored and delivered when they reconnect.

Deep dive: 1-1 message delivery

The message delivery path is the most latency-sensitive flow in the system. Let us trace a message from sender to recipient.

sequenceDiagram
  participant A as Alice (Sender)
  participant WS1 as WS Server 1
  participant CS as Chat Service
  participant K as Kafka
  participant CR as Connection Registry
  participant R as Redis Pub/Sub
  participant WS2 as WS Server 2
  participant B as Bob (Recipient)
  participant DB as Message DB

  A->>WS1: Send message (via WebSocket)
  WS1->>CS: Forward message
  CS->>CS: Validate, assign msg_id + timestamp
  CS->>K: Publish to message topic
  CS-->>WS1: ACK (msg_id, sent)
  WS1-->>A: Sent receipt
  K->>DB: Consumer persists message
  CS->>CR: Lookup Bob's WS server
  CR-->>CS: WS Server 2
  CS->>R: Publish to Bob's channel
  R->>WS2: Deliver message
  WS2->>B: Push message (via WebSocket)
  B-->>WS2: Delivered ACK
  WS2-->>CS: Delivered status
  CS-->>WS1: Delivered receipt
  WS1-->>A: Delivered indicator

Sequence diagram for 1-1 message delivery. The sender gets a sent receipt immediately after Kafka publish, while persistence and delivery happen asynchronously.

Several design decisions stand out here:

Message ID generation. Each message gets a unique, time-ordered ID. We use a Snowflake-style ID that encodes a timestamp in the high bits. This gives us globally unique IDs that are sortable by time, which simplifies ordering and pagination of message history.
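
A minimal sketch of such a generator, assuming the bit layout of Twitter's original Snowflake design (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) and an illustrative custom epoch:

```python
import threading
import time

class SnowflakeIdGenerator:
    """Time-ordered 64-bit IDs: 41-bit ms timestamp | 10-bit worker | 12-bit sequence.
    Bit widths follow Twitter's original Snowflake layout (an assumption here)."""

    EPOCH_MS = 1_577_836_800_000  # custom epoch: 2020-01-01 (illustrative)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024          # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 4096 IDs/ms/worker
                if self.sequence == 0:
                    # Sequence exhausted in this millisecond: spin to the next one.
                    while now_ms <= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return (((now_ms - self.EPOCH_MS) << 22)
                    | (self.worker_id << 12)
                    | self.sequence)

gen = SnowflakeIdGenerator(worker_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b  # IDs from one generator are strictly increasing, hence time-sortable
```

Because the timestamp sits in the high bits, sorting message IDs numerically sorts them by creation time, which is what makes the clustering-key pagination later in this design work.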

Sent receipt timing. The sender receives a “sent” acknowledgment as soon as the message is written to Kafka, not after it hits the database. Kafka’s replication provides durability. This keeps the perceived latency low (under 50ms) while guaranteeing the message will not be lost.

Offline delivery. If the Connection Registry shows Bob is offline, the Chat Service skips the pub/sub step. When Bob reconnects, the WebSocket server queries the Message Storage Service for all undelivered messages since Bob’s last seen timestamp and pushes them down the connection.

Connection Registry consistency. The registry uses Redis with a TTL of 30 seconds on each entry. WebSocket servers send heartbeats every 10 seconds to refresh the TTL. If a server crashes, its entries expire within 30 seconds. During this window, messages may be published to a dead server. The delivery retry logic catches these failures.
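
The registry contract can be sketched as follows. To make the expiry behavior visible and testable, this uses an in-process stand-in for Redis `SET key value EX 30`, with the TTL and heartbeat interval from above:

```python
import time

REGISTRY_TTL_S = 30        # entry lifetime (Redis: SET conn:<user> <server> EX 30)
HEARTBEAT_INTERVAL_S = 10  # WS servers refresh well inside the TTL

class ConnectionRegistry:
    """In-process stand-in for the Redis Connection Registry.
    Maps user_id -> (ws_server_id, expiry); expired entries read as offline."""

    def __init__(self, clock=time.monotonic):
        self._entries: dict[str, tuple[str, float]] = {}
        self._clock = clock  # injectable clock so expiry is testable

    def heartbeat(self, user_id: str, ws_server_id: str) -> None:
        # Called on connect and every HEARTBEAT_INTERVAL_S to refresh the TTL.
        self._entries[user_id] = (ws_server_id, self._clock() + REGISTRY_TTL_S)

    def lookup(self, user_id: str):
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server, expiry = entry
        if self._clock() >= expiry:
            # Server stopped heartbeating (crash): entry lapsed, treat as offline.
            del self._entries[user_id]
            return None
        return server
```

A crashed server never deletes its own entries; they simply stop being refreshed and lapse within one TTL window, which is exactly the 30-second stale-routing window described above.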

Deep dive: group message fan-out

Group messaging introduces a fan-out problem. When a user sends a message to a 200-person group, the system must deliver that message to 199 other users. At scale, large groups create bursty delivery amplification.

sequenceDiagram
  participant S as Sender
  participant CS as Chat Service
  participant GS as Group Service
  participant K as Kafka
  participant FO as Fan-out Workers
  participant CR as Connection Registry
  participant R as Redis Pub/Sub
  participant WS as WS Servers
  participant DB as Message DB

  S->>CS: Send group message
  CS->>GS: Get group members
  GS-->>CS: Member list (200 users)
  CS->>K: Publish message + member list
  K->>DB: Persist message (single copy)
  K->>FO: Fan-out job
  FO->>CR: Batch lookup online members
  CR-->>FO: Server mappings
  FO->>R: Publish to each online member's channel
  R->>WS: Deliver to connected users
  WS->>WS: Push to online recipients
  Note over FO: Offline members get message on reconnect via pull

Group message fan-out. The message is stored once. Fan-out workers handle delivery to each member’s WebSocket server independently.

We use a write-fan-out-on-delivery strategy rather than writing a copy per recipient. The message is stored exactly once in the Message DB, keyed by (group_id, message_id). The fan-out workers handle real-time delivery to online members. Offline members pull unread messages when they reconnect by querying the Message DB for messages in their groups since their last sync timestamp.

Why not write-fan-out-on-send? Writing a copy per recipient (like early Twitter’s home timeline) would mean 200 writes per group message. At 20B messages/day with an average fan-out of 5, that would be 100B writes/day. Storing once and reading on demand cuts storage writes by 80%.

Large group optimization. For groups over 100 members, the fan-out workers partition the member list and process batches in parallel. This prevents a single 500-member group from creating a hot partition in Redis Pub/Sub. The workers also rate-limit fan-out to avoid overwhelming individual WebSocket servers.
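
The batched fan-out loop can be sketched as follows; the batch size is an assumption, and the registry lookup and publish function are passed in as illustrative dependencies (in production the batches would run on parallel workers rather than sequentially):

```python
from itertools import islice

FANOUT_BATCH_SIZE = 100  # assumption: members handled per batch

def batches(members, size):
    """Partition an iterable of member IDs into fixed-size chunks."""
    it = iter(members)
    while chunk := list(islice(it, size)):
        yield chunk

def fan_out(message, members, registry_lookup, publish):
    """Deliver one group message to all online members, batch by batch.
    registry_lookup: batch lookup of user -> WS server (None if offline),
    one round trip per batch (MGET-style).
    publish: push (server, user, message) over pub/sub. Offline members
    are skipped; they pull the message on reconnect."""
    for chunk in batches(members, FANOUT_BATCH_SIZE):
        mappings = registry_lookup(chunk)
        for user_id, server in mappings.items():
            if server is not None:
                publish(server, user_id, message)
```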

Deep dive: message delivery status

Delivery receipts sound simple but add significant complexity. Each message moves through a state machine.

stateDiagram-v2
  [*] --> Sending: Client sends message
  Sending --> Sent: Server ACKs (Kafka write)
  Sent --> Delivered: Recipient device receives
  Delivered --> Read: Recipient opens conversation
  Sent --> Failed: Delivery timeout (72h)
  Sending --> Failed: Server rejects
  Failed --> [*]
  Read --> [*]

Message delivery state machine. Messages progress from sending through sent, delivered, and read states.

Sent means the server accepted the message and it is durably stored. The client can retry until it gets this acknowledgment. Retries use the client-generated message ID for idempotency, so duplicate sends do not create duplicate messages.
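
The idempotent-retry behavior can be sketched as follows (an in-memory dedupe map here; a production version would keep it in Redis with a TTL covering the client retry window):

```python
class MessageIngest:
    """Idempotent ingestion sketch: the client-generated ID deduplicates retries."""

    def __init__(self):
        self._seen: dict[str, int] = {}  # client_msg_id -> assigned server msg_id
        self._next_server_id = 1

    def accept(self, client_msg_id: str, payload: str) -> int:
        # A retry of the same client_msg_id returns the original server ID
        # instead of creating a duplicate message.
        if client_msg_id in self._seen:
            return self._seen[client_msg_id]
        server_id = self._next_server_id
        self._next_server_id += 1
        # ... publish (server_id, payload) to Kafka here, then ACK "sent" ...
        self._seen[client_msg_id] = server_id
        return server_id
```

The client keeps retrying `accept` with the same ID until it sees the "sent" acknowledgment; however many retries race in, exactly one message exists server-side.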

Delivered means the recipient’s device received the message. The recipient’s WebSocket server sends a delivery acknowledgment back to the Chat Service, which updates the message status and notifies the sender through their WebSocket connection.

Read receipts are trickier. The client sends a read receipt when the user actually views the conversation, not just when the app receives data. To avoid sending a read receipt per message, we batch them. The client sends “I have read all messages in conversation X up to message_id Y.” This single update covers potentially hundreds of messages.

Failure handling. If a message cannot be delivered within 72 hours (recipient never comes online), it stays in “sent” state. The message is still stored and will be delivered whenever the recipient reconnects, but the sender does not get a “delivered” indicator. The client can show a clock icon for messages stuck in “sent” for more than a few minutes.

Group read receipts. In group chats, tracking read status per member per message is expensive. Most systems simplify this. WhatsApp shows individual read receipts for groups, but only surfaces them on demand (tap to see who read it). We store last_read_id per member per group and compute read-by counts lazily when the sender requests them.

Status storage. We do not store per-message delivery status in the Message DB. Instead, we store a last_delivered_id and last_read_id per user per conversation. This gives us O(1) storage for delivery status regardless of message count. To determine if a specific message has been read, we compare its ID against last_read_id.
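
A sketch of the two-watermark scheme, assuming time-ordered Snowflake message IDs as described earlier:

```python
from dataclasses import dataclass

@dataclass
class ConversationCursors:
    """Per-(user, conversation) delivery state: two message-ID watermarks
    instead of a status row per message (O(1) storage, as described above)."""
    last_delivered_id: int = 0
    last_read_id: int = 0

    def mark_delivered(self, message_id: int) -> None:
        self.last_delivered_id = max(self.last_delivered_id, message_id)

    def mark_read_up_to(self, message_id: int) -> None:
        # One batched receipt ("read everything up to Y") advances both cursors
        # and covers every message at or below Y.
        self.last_read_id = max(self.last_read_id, message_id)
        self.mark_delivered(message_id)

    def status(self, message_id: int) -> str:
        # Because IDs are time-ordered, a comparison answers "has this
        # message been read?" with no per-message state.
        if message_id <= self.last_read_id:
            return "read"
        if message_id <= self.last_delivered_id:
            return "delivered"
        return "sent"
```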

Message storage and history

Messages are stored in Cassandra, sharded by conversation_id. The partition key is conversation_id and the clustering key is message_id (which is time-ordered). This gives us efficient range queries: “get the 50 most recent messages in conversation X” is a single partition scan with a limit.

Schema design:

messages (
    conversation_id  UUID,      -- partition key
    message_id       BIGINT,    -- clustering key (Snowflake ID, time-ordered)
    sender_id        UUID,
    content          TEXT,
    content_type     TINYINT,   -- text, image, file, etc.
    created_at       TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

Hot conversations. A celebrity’s fan group with 500 members and thousands of messages per hour creates a hot partition. We handle this by splitting large conversations across multiple partitions using a bucket suffix: conversation_id:bucket_N. The client-facing API abstracts this away.
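
One way to derive the bucket suffix is from the timestamp embedded in the Snowflake message ID, so the client-facing API never needs a separate counter. The time-window size and the 22-bit shift below are assumptions matching the Snowflake layout used earlier:

```python
BUCKET_WINDOW_MS = 10 * 24 * 3600 * 1000  # assumption: ~10-day buckets

def bucket_for(message_id: int) -> int:
    """Derive the bucket from the Snowflake ID's embedded timestamp
    (the timestamp occupies the bits above the 22 worker/sequence bits)."""
    timestamp_ms = message_id >> 22
    return timestamp_ms // BUCKET_WINDOW_MS

def partition_key(conversation_id: str, message_id: int) -> str:
    # Hot conversations split across conversation_id:bucket_N partitions,
    # bounding the size of any single Cassandra partition.
    return f"{conversation_id}:bucket_{bucket_for(message_id)}"
```

The read path walks buckets from newest to oldest until it has filled a page, so pagination still looks like a single ordered scan to the client.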

Data tiering. Messages older than 6 months are migrated to cold storage (S3 with a columnar format). The Message Storage Service checks hot storage first and falls back to cold storage for older history requests. This keeps the Cassandra cluster size manageable. At 6 TB/day, keeping 6 months hot means roughly 1.1 PB in Cassandra, which is large but feasible across a few hundred nodes.

Sync protocol. When a client reconnects after being offline, it sends its last known message_id per conversation. The server returns all newer messages. For users offline for a long time, this could be thousands of messages across hundreds of conversations. We cap the initial sync at 100 messages per conversation and let the client paginate backward on demand.
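
The reconnect sync can be sketched as follows; the `store` shape, cursor argument, and `has_gap` flag are illustrative:

```python
SYNC_CAP = 100  # initial sync returns at most this many messages per conversation

def sync_conversation(store, conversation_id: str, last_seen_id: int) -> dict:
    """Reconnect sync sketch: return messages newer than the client's cursor,
    capped at SYNC_CAP; the client pages backward on demand for the rest.
    `store` maps conversation_id -> ascending list of (message_id, payload)."""
    newer = [m for m in store[conversation_id] if m[0] > last_seen_id]
    page = newer[-SYNC_CAP:]  # keep only the most recent SYNC_CAP messages
    return {
        "messages": page,
        # True means older undelivered messages exist; the client must
        # back-fill them via history pagination before closing the gap.
        "has_gap": len(newer) > SYNC_CAP,
    }
```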

Trade-offs and alternatives

Push vs. pull for delivery. We use push (server sends to client via WebSocket) for online users and pull (client requests on reconnect) for offline users. A pure push model would require queuing messages per offline user, which consumes memory. A pure pull model (long polling) adds latency and wastes bandwidth. The hybrid approach balances both.

Single message store vs. inbox per user. Our design stores each message once and fans out on delivery. An alternative is writing to each recipient’s inbox. The inbox model makes reads faster (no fan-out on read) but writes slower (fan-out on write). For a chat app where most messages are read soon after sending, the single-store model wins because it avoids redundant writes. For an app like email where messages are read repeatedly over time, the inbox model is better.

Kafka vs. direct delivery. We route messages through Kafka instead of delivering directly from the Chat Service. This adds 5-10ms of latency but decouples ingestion from delivery. If the WebSocket fleet is overloaded or a server crashes, messages are buffered in Kafka. Without this buffer, a WebSocket server failure would mean lost messages.

Redis Pub/Sub vs. dedicated message broker for inter-server routing. Redis Pub/Sub is fast but fire-and-forget. If a WebSocket server misses a published message (brief disconnection from Redis), that message is gone. We accept this because Kafka provides the durable backup. When a WebSocket server reconnects to Redis, it pulls any missed messages from Kafka for its connected users.

Ordering guarantees. We guarantee ordering within a conversation but not across conversations. Within a conversation, the Snowflake message ID (assigned server-side) determines order. If two users send messages to the same group at nearly the same time, the Chat Service serializes them. Cross-conversation ordering (like “show all chats sorted by latest message”) uses the message timestamp, which is good enough since sub-second differences between conversations do not matter to users.

What real systems actually do

WhatsApp uses Erlang for its connection servers, handling over 2 million connections per server. Messages are stored in Mnesia (Erlang’s built-in distributed database) for transient storage and custom storage for long-term persistence. End-to-end encryption means servers never see plaintext content.

Slack originally used a MySQL-backed architecture but moved to Vitess for sharding. Their WebSocket layer (called “flannel”) routes messages through a custom pub/sub system. They cache recent messages aggressively and use lazy loading for history.

Discord stores messages in Cassandra (later migrated to ScyllaDB for better tail latency). Their fan-out system handles servers with millions of members by treating large servers differently: instead of pushing to each member, they let clients pull from a shared timeline.

Telegram uses a custom binary protocol (MTProto) instead of WebSockets. Their distributed storage layer replicates messages across three data centers for durability.

The common pattern across all these systems: separate the connection layer from the message processing layer, use some form of pub/sub for inter-server routing, and design storage for write-heavy append-only workloads.

What comes next

This design covers the core messaging flow. A production chat system needs several additional features:

  • Media messages. Images, videos, and files require a separate upload service with CDN distribution. The message body contains a reference (URL + encryption key) rather than the actual media.
  • End-to-end encryption. The Signal Protocol provides forward secrecy and deniability. Key exchange happens through the server, but the server never holds decryption keys.
  • Search. Full-text search across message history requires an Elasticsearch cluster with messages indexed asynchronously from Kafka.
  • Notifications. Offline users need push notifications (APNs, FCM). A notification service consumes from Kafka and sends pushes with rate limiting to avoid spamming users.
  • Multi-device sync. Users expect the same conversation state across phone, tablet, and desktop. Each device maintains a sync cursor and pulls messages it has not yet seen.

The notification system case study in this series handles several of the cross-cutting concerns we deferred here.
