Message queues and event streaming
In this series (20 parts)
- What is system design and why it matters
- Estimations and back-of-envelope calculations
- Scalability: vertical vs horizontal scaling
- CAP theorem and distributed system tradeoffs
- Consistency models
- Load balancing
- Caching: strategies and patterns
- Content Delivery Networks
- Databases: SQL vs NoSQL and when to use each
- Database replication
- Database sharding and partitioning
- Consistent hashing
- Message queues and event streaming
- API design: REST, GraphQL, gRPC
- Rate limiting and throttling
- Proxies: forward and reverse
- Networking concepts for system design
- Reliability patterns: timeouts, retries, circuit breakers
- Observability: logging, metrics, tracing
- Security in system design
A user clicks “Place Order.” The request hits your API gateway, which calls the inventory service, the payment service, the notification service, and the shipping service. Each call blocks until the downstream service responds. If payment takes 3 seconds, the user waits 3 seconds. If the notification service is down, the entire order fails.
This is synchronous communication at its worst. Every service in the chain must be alive, fast, and correct at the exact same moment. That coupling looks harmless at low traffic. At 10,000 orders per minute, it becomes the single largest source of cascading failures in your system.
Message queues break this coupling. Instead of calling services directly, producers drop messages onto a queue, and consumers pick them up whenever they are ready. The producer does not wait. The consumer does not rush. The queue absorbs spikes, buffers failures, and lets each component scale independently.
Synchronous vs asynchronous communication
In a synchronous system, service A sends a request to service B and blocks until B responds. The latency of the entire operation equals the sum of every hop in the chain. If you have five services chained together at 50ms each, the user sees 250ms at minimum. Add one slow service at 800ms and you are suddenly at 1050ms.
Asynchronous communication flips this model. Service A publishes a message and immediately returns. It does not know or care when the message gets processed. The response, if one exists, arrives later through a separate channel.
This is not always better. If a user submits a credit card and you need to confirm the charge before showing a receipt, you need synchronous processing for that specific step. The key insight is that most work in a distributed system does not require an immediate answer. Sending a confirmation email, updating analytics, generating a PDF invoice: none of these need to block the user.
The pattern is straightforward: keep the critical synchronous path as short as possible, then push everything else onto a queue.
```mermaid
sequenceDiagram
    participant U as User
    participant API as API Gateway
    participant Q as Message Queue
    participant Pay as Payment Service
    participant Notify as Notification Service
    participant Ship as Shipping Service
    U->>API: Place Order
    API->>Pay: Charge card (sync)
    Pay-->>API: Success
    API->>Q: Publish OrderPlaced event
    API-->>U: Order confirmed
    Q->>Notify: Send confirmation email
    Q->>Ship: Schedule shipment
```
The API gateway handles payment synchronously because the user needs confirmation. Everything else flows through the queue asynchronously. The user sees a response in under 200ms instead of waiting for email and shipping services.
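This split can be sketched in a few lines. The `charge_card` function and the in-memory queue below are stand-ins for a real payment SDK and broker client; the point is only the shape of the handler: one synchronous call on the critical path, one publish, then return.

```python
import queue

# In-memory stand-in for the message broker; a real system would use
# a broker client (e.g. a Kafka producer or an AMQP channel) instead.
event_queue = queue.Queue()

def charge_card(order):
    # Hypothetical synchronous call: the user must see this result,
    # so it stays on the request path.
    return {"order_id": order["id"], "status": "charged"}

def place_order(order):
    # 1. Critical path: charge synchronously, fail fast if it fails.
    receipt = charge_card(order)
    # 2. Everything else: publish one event and return immediately.
    #    Email and shipping consumers pick it up at their own pace.
    event_queue.put({"type": "OrderPlaced", "order_id": order["id"]})
    return receipt

result = place_order({"id": 1234})
```

Notice that the handler's latency is now just the payment call plus one cheap enqueue, regardless of how slow the email or shipping services are.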
Queues vs streams
Message queues and event streams both move data between services, but they serve different purposes and make different guarantees.
A message queue works like a task list. A producer adds a message, exactly one consumer picks it up, processes it, and acknowledges completion. Once acknowledged, the message disappears. RabbitMQ is the canonical example. It routes messages through exchanges to queues, supports priority ordering, and handles complex routing patterns well. If you need “send this task to exactly one worker,” a queue fits.
An event stream works like an append-only log. Producers write events to the end of the log. Consumers read from whatever position they choose. Events are not deleted after consumption. They stay in the log for a configurable retention period (often 7 days, sometimes indefinitely). Apache Kafka is the canonical example here.
The distinction matters for two reasons. First, streams allow multiple independent consumers to read the same data without duplicating it. Your analytics service, your search indexer, and your recommendation engine can all consume the same “OrderPlaced” events from the same log, each tracking its own position. With a traditional queue, you would need to publish the message to three separate queues. Second, streams let you replay history. If you deploy a new service next month and it needs to process all orders from the last 30 days, it reads the log from the beginning. A queue cannot do this because messages are deleted after processing.
Choose a queue when you have discrete tasks that need exactly-one processing. Choose a stream when you have events that multiple consumers need to read, or when you need the ability to reprocess historical data.
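The log model above is simple enough to sketch directly. This toy `EventLog` (a simplification, not Kafka's actual protocol) captures the two properties that distinguish streams from queues: events persist after being read, and each consumer group tracks its own offset.

```python
class EventLog:
    """Append-only log: events persist; each consumer group tracks
    its own read offset, so groups never interfere with each other."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer group -> next position to read

    def append(self, event):
        self.events.append(event)

    def poll(self, group):
        # Return everything this group has not yet seen, then advance
        # its offset. Nothing is deleted from the log.
        pos = self.offsets.get(group, 0)
        batch = self.events[pos:]
        self.offsets[group] = len(self.events)
        return batch

log = EventLog()
log.append("OrderPlaced:1")
log.append("OrderPlaced:2")

analytics = log.poll("analytics")  # both events
search = log.poll("search")        # the same two events, independently
log.append("OrderPlaced:3")
late = log.poll("analytics")       # only the event it hasn't seen yet
replay = log.events[:]             # a brand-new service can reread everything
```

A queue implementation of `poll` would pop the events instead, which is exactly why the second group and the replay would come back empty.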
Delivery semantics: how many times does a message arrive?
Every messaging system must choose a delivery guarantee, and each choice has real consequences.
At-most-once delivery means the system sends the message and never retries. If the consumer crashes mid-processing, the message is lost. This is fast and simple. Use it for metrics where losing 0.1% of data points is acceptable.
At-least-once delivery means the system retries until it gets an acknowledgment. The message will definitely arrive, but it might arrive more than once. If a consumer processes a message but crashes before sending the ACK, the broker resends the message. Now you have processed it twice. This is the most common default because losing data is usually worse than processing it twice.
Exactly-once delivery means each message is processed precisely one time, no more, no less. This is what everyone wants and almost nobody truly gets. Kafka supports exactly-once semantics within its own ecosystem through idempotent producers and transactional writes, but the guarantee breaks the moment you interact with an external system. You can process a Kafka message exactly once and write the result to a Kafka topic exactly once. But if that processing includes an HTTP call to a payment gateway, the gateway might receive the request twice. True end-to-end exactly-once requires idempotency at every boundary.
In practice, design for at-least-once delivery and make your consumers idempotent. Store a message ID or an idempotency key alongside your processed results. Before processing, check whether you have already handled that ID. If yes, skip it. This pattern handles duplicates gracefully without requiring the complexity of distributed transactions.
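A minimal sketch of that idempotency check, using an in-memory set where production code would use a database table keyed by message ID:

```python
processed = set()  # in production: a durable store keyed by message ID
results = []

def handle(message):
    """At-least-once delivery means duplicates will arrive;
    deduplicate on the message ID before doing any work."""
    if message["id"] in processed:
        return "skipped"
    results.append(message["payload"])  # the actual side effect
    processed.add(message["id"])
    return "processed"

# The broker redelivers message 42 after a missed ACK:
first = handle({"id": 42, "payload": "charge $10"})
second = handle({"id": 42, "payload": "charge $10"})
```

The side effect runs once even though the message arrived twice. In a real consumer, recording the ID and performing the side effect should happen in one transaction, or a crash between the two reopens the duplicate window.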
Kafka vs RabbitMQ: choosing the right tool
RabbitMQ and Kafka solve overlapping but distinct problems. Understanding when to use each one saves you from architectural regret.
RabbitMQ is a message broker. It excels at routing individual messages to specific consumers based on flexible rules. It supports multiple exchange types (direct, topic, fanout, headers), message priorities, and per-message TTLs. It removes messages after acknowledgment. A single RabbitMQ node handles roughly 20,000 to 50,000 messages per second depending on message size and persistence settings. RabbitMQ works best for task distribution, RPC-style communication, and scenarios where message routing logic is complex.
Kafka is a distributed commit log. It excels at high-throughput, ordered event streaming. Messages are written to partitioned topics and retained for a configurable period. A well-tuned Kafka cluster handles 500,000 to 2,000,000 messages per second. Kafka works best for event sourcing, change data capture, log aggregation, and any pattern where multiple consumers need the same data or where you need replayability.
If your primary need is “distribute tasks to workers and forget about them,” pick RabbitMQ. If your primary need is “stream a high volume of events that multiple services consume independently,” pick Kafka. Many production systems run both: Kafka for the event backbone and RabbitMQ for specific task routing at the edges.
Consumer groups and parallel processing
A single consumer reading from a topic with 100,000 events per second will fall behind. Consumer groups solve this by letting multiple consumers share the workload from a single topic.
In Kafka, a consumer group is a set of consumers that collectively read from a topic. Each partition within the topic is assigned to exactly one consumer in the group. If a topic has 12 partitions and your consumer group has 4 consumers, each consumer reads from 3 partitions. If you scale to 12 consumers, each reads from exactly 1 partition. If you add a 13th consumer, it sits idle because there are no unassigned partitions.
This is why partition count matters at design time. If you create a topic with 6 partitions, you can never have more than 6 consumers in a single group processing in parallel. Choose a partition count that reflects your expected peak parallelism. 12 to 24 partitions is a reasonable default for most workloads. You can add partitions later, but Kafka does not move existing data to the new partitions, and the key-to-partition mapping changes, which breaks per-key ordering for data already in flight.
```mermaid
graph LR
    P1[Producer 1] --> T[Topic: Orders]
    P2[Producer 2] --> T
    T --> PA[Partition 0]
    T --> PB[Partition 1]
    T --> PC[Partition 2]
    PA --> C1[Consumer A<br/>Group: analytics]
    PB --> C1
    PC --> C2[Consumer B<br/>Group: analytics]
    PA --> C3[Consumer C<br/>Group: search]
    PB --> C3
    PC --> C3
```
Two consumer groups reading from the same topic independently. The analytics group splits the load across two consumers. The search group uses a single consumer reading all partitions. Neither group interferes with the other.
Multiple consumer groups can read from the same topic without affecting each other. Group “analytics” and group “search-indexer” each maintain their own offset (position in the log). This is the fan-out pattern that makes Kafka so powerful for event-driven architectures. You publish once and consume many times, with each group tracking its own progress. Compare this to a traditional queue where you would need to duplicate every message into separate queues for each downstream system.
Back-pressure: when consumers cannot keep up
Traffic spikes happen. A flash sale pushes your order rate from 1,000 to 50,000 per minute. Your producers are fine because writing to a queue is fast. But your consumers, which might call a database and an external API per message, process at a fixed rate. The queue starts growing.
Back-pressure is any mechanism that prevents a slow consumer from being overwhelmed. Without it, you get unbounded queue growth, memory exhaustion, and eventually dropped messages or crashed brokers.
The simplest form of back-pressure is a bounded queue. When the queue reaches its maximum size (say 100,000 messages), producers block or receive errors until consumers drain some messages. This pushes the problem upstream, which is usually the right place for it. If your system cannot process messages fast enough, the correct response is to slow down intake, not to silently drop work.
Kafka handles back-pressure implicitly. Consumers pull messages at their own pace. If a consumer is slow, it falls behind, and its offset lags further from the log head. The messages are safe on disk. As long as the consumer catches up before the retention period expires (commonly 7 days), no data is lost. You monitor the consumer lag metric: the gap between the latest message and the consumer’s current position. A lag of a few thousand messages during a spike is normal. A lag that grows continuously means you need more consumers or faster processing.
For RabbitMQ, set a prefetch count. This limits how many unacknowledged messages a consumer holds at once. A prefetch of 10 means the broker sends at most 10 messages to a consumer before waiting for ACKs. This prevents a fast broker from flooding a slow consumer. Tune the prefetch based on your consumer’s throughput: too low wastes broker round-trips, too high risks consumer memory pressure.
Other back-pressure strategies include rate limiting at the producer (accept only N messages per second), shedding load by dropping low-priority messages during spikes, and scaling consumers horizontally using autoscaling rules tied to queue depth.
Dead letter queues and poison messages
Some messages will always fail. Maybe the payload is malformed. Maybe a required downstream service has been decommissioned. A consumer retries the message 3 times, fails every time, and the message goes back to the front of the queue. Now it blocks every other message behind it. This is a poison message.
Dead letter queues (DLQs) solve this. After N failed processing attempts (typically 3 to 5), the broker moves the message to a separate dead letter queue instead of retrying it again. Processing continues for all other messages. An engineer inspects the DLQ later, fixes the issue, and replays the failed messages.
Always set up a DLQ. Always set a max retry count. Monitor your DLQ depth as an alerting metric. A DLQ that suddenly fills up is an early warning that something broke in your processing pipeline. These patterns work hand in hand with reliability patterns like circuit breakers and retries with exponential backoff.
Ordering guarantees
Message ordering is one of the trickiest aspects of asynchronous systems. Global ordering across all messages in a topic is expensive and usually unnecessary. What matters is ordering within a logical entity.
Kafka guarantees ordering within a partition. If you send messages for Order #1234 to partition 7, they arrive at consumers in exactly the order you produced them. Kafka assigns partitions using a hash of the message key. If you set the order ID as the key, all events for the same order land in the same partition. This gives you per-entity ordering without requiring global ordering.
RabbitMQ provides ordering within a single queue from a single producer, but concurrent consumers break that ordering because messages are processed in parallel and complete at different times.
If you need strict ordering, use Kafka with a carefully chosen partition key. If you only need “approximate” ordering (process events in roughly the order they arrived), most queue systems work fine out of the box.
Connecting the pieces
Message queues and event streams are infrastructure. They do not do useful work on their own. Their value comes from enabling architectural patterns that would be fragile or impossible with direct service-to-service calls.
Consistent hashing determines which partition or broker node handles a given message key. Reliability patterns like circuit breakers protect consumers from downstream failures. Your API design determines the shape of the messages flowing through the system. And event-driven architecture uses streaming as its backbone, building entire systems around the concept of immutable event logs.
The combination of asynchronous messaging, idempotent consumers, dead letter queues, and consumer groups gives you a system that handles traffic spikes without dropping requests, retries failures without blocking other work, and scales each component independently. Most production systems at meaningful scale rely on some form of message passing. Once you understand the guarantees and trade-offs, the design decisions become clear.
What comes next
Messages define how services communicate, but the contracts themselves need careful design. Next, we look at API design: how to shape endpoints, version contracts, and build interfaces that age well as your system grows.