High Level Design · Part 4

Event-driven architecture

In this series (12 parts)
  1. Monolith vs microservices
  2. Microservice communication patterns
  3. Service discovery and registration
  4. Event-driven architecture
  5. Distributed data patterns
  6. Caching architecture patterns
  7. Search architecture
  8. Storage systems at scale
  9. Notification systems
  10. Real-time systems architecture
  11. Batch and stream processing
  12. Multi-region and global systems

Traditional request-driven architectures organize around “do this thing and tell me the result.” Event-driven architectures organize around “this thing happened.” That shift in perspective, from imperative commands to declarative facts, changes how services couple to each other, how data flows through the system, and how you reason about consistency.

Events vs commands vs queries

These three concepts serve fundamentally different purposes, and mixing them up creates brittle systems.

A command is an intent to change state: “Place this order,” “Cancel that subscription.” Commands are directed at a specific service and can be rejected. A query reads state without changing it: “What is the order status?” “List all active subscriptions.” Queries are idempotent by definition. An event is a fact about something that already happened: “Order was placed,” “Payment was processed.” Events are immutable: they describe the past and cannot be undone.

graph LR
  subgraph Commands
      C1["PlaceOrder"]
      C2["CancelSubscription"]
  end
  subgraph Queries
      Q1["GetOrderStatus"]
      Q2["ListSubscriptions"]
  end
  subgraph Events
      E1["OrderPlaced"]
      E2["PaymentProcessed"]
  end
  C1 -->|"may fail"| SVC["Service"]
  Q1 -->|"read only"| SVC
  SVC -->|"publishes"| E1
  style Commands fill:#EF553B,color:#fff
  style Queries fill:#636EFA,color:#fff
  style Events fill:#00CC96,color:#fff

Commands request change, queries read state, events report facts. Services receive commands and queries but publish events.

The critical design principle: services expose commands and queries as their API, but they publish events as a side effect of processing commands. Consumers subscribe to events, not to another service’s internal state. This means the producer does not need to know who is listening, and new consumers can be added without changing the producer.
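A minimal sketch of this principle, assuming an in-memory bus (the `EventBus`, `OrderService`, and `OrderPlaced` names are illustrative, not from any framework): the service exposes a command and a query as its API, and publishing the event is a side effect that requires no knowledge of who subscribes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

@dataclass
class OrderPlaced:          # an event: an immutable fact about the past
    order_id: str
    items: list

class EventBus:             # hypothetical in-memory stand-in for a broker
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event) -> None:
        for handler in self._subs[type(event)]:
            handler(event)

class OrderService:
    def __init__(self, bus: EventBus):
        self.bus = bus
        self.orders = {}

    def place_order(self, order_id: str, items: list) -> None:
        # Command: directed at this service, and it may be rejected.
        if not items:
            raise ValueError("cannot place an empty order")
        self.orders[order_id] = {"items": items, "status": "PLACED"}
        # Event published as a side effect; the service has no subscriber list.
        self.bus.publish(OrderPlaced(order_id, items))

    def get_order(self, order_id: str):
        # Query: reads state without changing it.
        return self.orders.get(order_id)

# A new consumer attaches without any change to OrderService:
bus = EventBus()
svc = OrderService(bus)
bus.subscribe(OrderPlaced, lambda e: print(f"reserve stock for {e.order_id}"))
svc.place_order("ord-1", ["WIDGET"])
```

Adding a second consumer is one more `subscribe` call; the producer's code is untouched.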

Event sourcing

Most systems store the current state of an entity: the order is CONFIRMED, the balance is $142.50. Event sourcing flips this around. Instead of storing current state, you store the sequence of events that led to the current state. The current state is derived by replaying those events.

graph LR
  subgraph EventStore["Event Store (append-only)"]
      E1["OrderCreated<br/>items: [A, B]"]
      E2["ItemRemoved<br/>item: B"]
      E3["PaymentReceived<br/>amount: $50"]
      E4["OrderShipped<br/>tracking: XY123"]
      E1 --> E2 --> E3 --> E4
  end
  subgraph CurrentState["Derived State"]
      S["Order #42<br/>items: [A]<br/>paid: $50<br/>status: SHIPPED"]
  end
  EventStore -->|"replay"| CurrentState

Event sourcing: the event store is the source of truth. Current state is a projection derived by replaying events.

This gives you a complete audit trail for free. You can answer “what did this order look like at 3pm yesterday?” by replaying events up to that timestamp. You can fix bugs in business logic and reprocess events to correct derived data. You can build new features that need historical data that you did not think to capture at design time.

The costs are significant. Replaying thousands of events to reconstruct state is slow, so you need snapshots (periodic checkpoints of the current state). The event schema becomes your most important contract; evolving it without breaking consumers requires careful versioning. Querying across entities is awkward because the event store is optimized for loading a single entity’s event stream, not for “find all orders over $100.”

CQRS: separating reads from writes

Command Query Responsibility Segregation (CQRS) uses different models for reading and writing data. The write model (command side) handles commands and enforces business rules. The read model (query side) is optimized for the specific queries your UI needs.

graph TD
  Client["Client"] -->|"commands"| CmdAPI["Command API"]
  Client -->|"queries"| QueryAPI["Query API"]
  CmdAPI --> WM["Write Model"]
  WM --> ES[("Event Store / Write DB")]
  ES -->|"events"| Proj["Projection Engine"]
  Proj --> RM[("Read DB (denormalized)")]
  QueryAPI --> RM

CQRS: writes go through the command model to the event store. A projection engine builds read-optimized views from the event stream.

CQRS pairs naturally with event sourcing but does not require it. You can use CQRS with a traditional database where the write model uses a normalized schema and the read model uses denormalized views or a search index.

The read model can be wildly different from the write model. Your write model might be a normalized relational schema enforcing referential integrity. Your read model might be a denormalized document in Elasticsearch optimized for full-text search, or a pre-computed aggregate in Redis for dashboard queries. You can have multiple read models for different use cases, each built from the same event stream.
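A projection is just a fold over the event stream into a query-shaped structure. A minimal sketch, with a plain dict standing in for the read database and illustrative event and field names:

```python
def project(read_db: dict, event: dict) -> None:
    """Maintain a denormalized, query-optimized view from the event stream."""
    kind = event["type"]
    if kind == "OrderCreated":
        # One flat record per order: exactly what a dashboard query wants,
        # no joins required at read time.
        read_db[event["order_id"]] = {
            "order_id": event["order_id"],
            "item_count": len(event["items"]),
            "total_paid": 0,
            "status": "CREATED",
        }
    elif kind == "PaymentReceived":
        read_db[event["order_id"]]["total_paid"] += event["amount"]
    elif kind == "OrderShipped":
        read_db[event["order_id"]]["status"] = "SHIPPED"

read_db = {}
for event in [
    {"type": "OrderCreated", "order_id": "42", "items": ["A"]},
    {"type": "PaymentReceived", "order_id": "42", "amount": 50},
]:
    project(read_db, event)
# The query side now answers directly from read_db.
```

A second read model for a different use case is simply a second `project` function consuming the same stream.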

The price is eventual consistency between the write and read models. After a command is processed, there is a propagation delay before the read model reflects the change. For many use cases, this delay (typically milliseconds to low seconds) is acceptable. For others, you need strategies like read-your-own-writes consistency where the client reads from the write model immediately after a command.

Eventual consistency as a feature

Many engineers treat eventual consistency as a problem to solve. In practice, it is a feature that enables scale and resilience. Consider a social media feed: if your new post appears in your followers’ feeds 500ms after you post it, nobody notices or cares. That 500ms of eventual consistency lets you build the feed service as an independent, highly scalable component that is not coupled to the posting service’s transaction.

The key insight is that most real-world business processes are already eventually consistent. When you deposit a check, the bank does not instantly move funds; it takes days. When you place an Amazon order, the charge to your card, the inventory reservation, and the warehouse pick all happen asynchronously over minutes or hours. The software should model the reality of the business, not impose artificial synchronization.

Design for eventual consistency by making the state transitions visible to users. Show “Order placed, confirmation pending” instead of blocking the UI until all downstream services complete. Provide status endpoints so clients can poll for updates. Use message queues to guarantee that events are eventually processed even if a consumer is temporarily down.

The outbox pattern

There is a subtle but critical problem with publishing events. After processing a command, you need to both update the database and publish an event. If you update the database and then publish, a crash between the two leaves you with updated data but no event. If you publish first, a crash leaves you with an event but no data update.

The outbox pattern solves this with a single database transaction. Instead of publishing the event directly, you write it to an “outbox” table in the same transaction that updates the business data. A separate process reads the outbox table and publishes events to the message broker.

sequenceDiagram
  participant App as Order Service
  participant DB as Database
  participant Relay as Outbox Relay
  participant MQ as Message Broker
  participant Consumer as Inventory Service
  App->>DB: BEGIN TRANSACTION
  App->>DB: INSERT INTO orders (...)
  App->>DB: INSERT INTO outbox (event_type, payload)
  App->>DB: COMMIT
  Note over DB,Relay: Relay polls outbox or uses CDC
  Relay->>DB: SELECT * FROM outbox WHERE published = false
  DB-->>Relay: [OrderCreated event]
  Relay->>MQ: Publish OrderCreated
  MQ-->>Relay: ACK
  Relay->>DB: UPDATE outbox SET published = true
  MQ->>Consumer: OrderCreated

Outbox pattern: the event is written to the database in the same transaction as the business data. A relay process publishes it to the broker.

The outbox relay can poll the table or use change data capture (CDC) to react to new rows. Debezium is the most popular CDC tool for this pattern, tailing the database’s transaction log and publishing changes directly to Kafka. This eliminates polling overhead and provides lower latency.

The outbox guarantees at-least-once delivery: the relay might publish an event and crash before marking it as sent, causing a re-publish on restart. This means consumers must be idempotent.
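The transactional write and a polling relay can be sketched end to end, using SQLite as a stand-in database and a list as a stand-in broker (table and column names are illustrative):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
  CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
  CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       event_type TEXT, payload TEXT,
                       published INTEGER DEFAULT 0);
""")

def place_order(order_id: str) -> None:
    # The business row and the outbox row commit in ONE transaction:
    # either both are durable or neither is.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        db.execute("INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
                   ("OrderCreated", json.dumps({"order_id": order_id})))

broker = []  # stand-in for Kafka/RabbitMQ

def relay_once() -> None:
    # Polling relay: publish unpublished rows, then mark them. A crash between
    # publish and mark causes a re-publish on restart -- at-least-once delivery.
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0"
    ).fetchall()
    for row_id, event_type, payload in rows:
        broker.append((event_type, json.loads(payload)))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))

place_order("ord-1")
relay_once()
```

A CDC-based relay replaces the polling loop with a tail of the transaction log, but the transactional write in `place_order` stays exactly the same.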

Idempotent consumers

An idempotent consumer produces the same result whether it processes a message once or ten times. This is non-negotiable in event-driven systems because at-least-once delivery means duplicates will happen.

The standard approach is a deduplication table. Each event carries a unique ID. Before processing, the consumer checks whether it has already seen that ID. If yes, it skips the event. If no, it processes the event and records the ID, both in the same transaction.

BEGIN TRANSACTION;
  -- Check for duplicate (event_id carries a unique constraint, so two
  -- concurrent deliveries of the same event cannot both commit)
  SELECT 1 FROM processed_events WHERE event_id = 'evt-123';
  -- If not found, record the event and apply the change atomically
  INSERT INTO processed_events (event_id) VALUES ('evt-123');
  UPDATE inventory SET quantity = quantity - 1 WHERE sku = 'WIDGET';
COMMIT;

For operations that are naturally idempotent, no deduplication is needed. “Set balance to $100” is idempotent. “Add $10 to balance” is not. When possible, design your events to carry absolute state rather than deltas.
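The difference shows up immediately under duplicate delivery. A small sketch (handler names are illustrative):

```python
balance = {"value": 0}

def on_balance_set(event: dict) -> None:
    # Absolute state: applying the same event twice gives the same result.
    balance["value"] = event["balance"]

def on_balance_delta(event: dict) -> None:
    # Delta: every duplicate delivery corrupts the state further.
    balance["value"] += event["delta"]

# Simulate at-least-once delivery: the same event arrives twice.
for _ in range(2):
    on_balance_set({"balance": 100})
assert balance["value"] == 100   # still correct

for _ in range(2):
    on_balance_delta({"delta": 10})
assert balance["value"] == 120   # should be 110: the duplicate was applied
```

The delta handler needs the deduplication table above to be safe; the absolute-state handler does not.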

Putting it together

A production event-driven system typically combines several of these patterns. Commands enter through an API, get validated, and produce events stored in the event store (event sourcing). The outbox pattern ensures events reach the message broker reliably. Projection engines build read models (CQRS). Consumers process events idempotently, publishing their own events to drive further processing.

graph TD
  API["API Gateway"] -->|"command"| CS["Command Service"]
  CS --> DB[("Write DB + Outbox")]
  DB -->|"CDC"| Broker["Message Broker"]
  Broker --> Proj["Projection Service"]
  Broker --> Analytics["Analytics Consumer"]
  Broker --> Notif["Notification Consumer"]
  Proj --> ReadDB[("Read DB")]
  API -->|"query"| QS["Query Service"]
  QS --> ReadDB

Full event-driven architecture: commands write to the database with outbox, CDC publishes to the broker, consumers build read models and trigger side effects.

What comes next

Events flow between services, but each service needs its own data. The next challenge is managing data ownership across service boundaries without falling into the shared database trap. Continue with distributed data patterns to learn about data ownership, change data capture, and denormalization strategies.
