Dec 1, 2025 · 22 min read · System Design

Design a hotel booking platform


A hotel booking platform connects travelers with properties. The hard part is not listing hotels. It is guaranteeing that when two people try to book the last room of a type for the same night, exactly one of them gets it, and the other sees an updated result within seconds. This article builds on the LLD hotel booking design and scales it to a global platform handling around a million properties and tens of millions of searches per day.

Requirements

Functional:

Hotel partners list properties with room types, photos, amenities, and nightly rates.
Travelers search by city, date range, guest count, and filters (price, star rating, amenities).
Travelers view detailed property pages with availability and pricing for their dates.
Travelers book rooms with immediate confirmation. No double bookings.
Travelers cancel bookings subject to the property’s cancellation policy.
The platform processes payments from travelers and schedules payouts to hotel partners.
Partners manage a pricing calendar with seasonal rates and promotions.

Non-functional:

Metric	Target
Properties	1M active listings
Search QPS	~600 (50M searches/day)
Bookings/day	2M
Booking confirmation latency	< 2s p99
Search latency	< 500ms p95
Availability	99.99% for booking path
Data durability	Zero lost confirmed bookings

Capacity estimation

Storage:

1M properties, average 5 room types each: 5M room type records.
Each room type tracks inventory per night. 365 days of future inventory: 5M x 365 = ~1.8B inventory rows. At ~50 bytes per row, that is ~90 GB.
2M bookings/day, ~500 bytes each: ~1 GB/day, ~365 GB/year for booking records.
Property metadata (descriptions, photos, amenities): ~2 KB per property, ~2 GB total. Photo storage in object storage is separate, roughly 50 photos per property at 500 KB each = ~25 TB.

Bandwidth:

Search results return 20 properties per page with thumbnails. Average response ~200 KB. At 600 QPS: ~120 MB/s outbound for search alone.
Booking API calls are small (~2 KB request, ~5 KB response). At ~23 bookings/s: negligible.

QPS breakdown:

Endpoint	QPS
Search	600
Property detail	1,200
Check availability	400
Create booking	23
Cancel booking	3

Reads total about 2,200 QPS against about 26 QPS of writes, a read-to-write ratio of roughly 85:1. A skew this heavy makes the search and detail paths the natural place to invest in caching.
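The estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch: the inputs are the figures quoted in this section, and the variable names are made up for illustration.

```python
# Back-of-envelope capacity math using the figures quoted in this section.
SECONDS_PER_DAY = 86_400

properties = 1_000_000
room_types = properties * 5                     # 5 room types per property
inventory_rows = room_types * 365               # one row per room type per night
inventory_bytes = inventory_rows * 50           # ~50 bytes per row

bookings_per_day = 2_000_000
booking_bytes_per_day = bookings_per_day * 500  # ~500 bytes per booking

photo_bytes = properties * 50 * 500_000         # 50 photos x 500 KB each

search_qps = 50_000_000 / SECONDS_PER_DAY       # 50M searches/day
booking_qps = bookings_per_day / SECONDS_PER_DAY
search_egress_bytes_per_s = 600 * 200_000       # 600 QPS x ~200 KB per response

read_qps = 600 + 1_200 + 400                    # search + detail + availability
write_qps = 23 + 3                              # create + cancel
read_write_ratio = read_qps / write_qps

if __name__ == "__main__":
    print(f"inventory rows:   {inventory_rows / 1e9:.2f} B")
    print(f"inventory size:   {inventory_bytes / 1e9:.1f} GB")
    print(f"bookings per day: {booking_bytes_per_day / 1e9:.1f} GB")
    print(f"photo storage:    {photo_bytes / 1e12:.0f} TB")
    print(f"search QPS:       {search_qps:.0f}")
    print(f"booking QPS:      {booking_qps:.1f}")
    print(f"search egress:    {search_egress_bytes_per_s / 1e6:.0f} MB/s")
    print(f"read:write ratio: {read_write_ratio:.0f}:1")
```

These are daily averages. Peak traffic on a booking platform is typically a multiple of the average, so provisioning would start from these numbers rather than end at them.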

High level architecture

graph TD
  Client[Client - Web/Mobile]
  CDN[CDN - Static Assets + Images]
  LB[Load Balancer]
  API[API Gateway]
  SearchSvc[Search Service]
  BookingSvc[Booking Service]
  PricingSvc[Pricing Service]
  PaymentSvc[Payment Service]
  NotifSvc[Notification Service]
  PartnerSvc[Partner Portal Service]

  ES[(Elasticsearch - Search Index)]
  PG[(PostgreSQL - Bookings)]
  Redis[(Redis - Inventory Locks + Cache)]
  Kafka[Kafka - Event Bus]
  S3[(Object Storage - Photos)]

  Client --> CDN
  Client --> LB
  LB --> API
  API --> SearchSvc
  API --> BookingSvc
  API --> PricingSvc
  API --> PaymentSvc
  API --> PartnerSvc

  SearchSvc --> ES
  SearchSvc --> Redis
  BookingSvc --> PG
  BookingSvc --> Redis
  PricingSvc --> PG
  PaymentSvc --> Kafka
  PartnerSvc --> PG
  PartnerSvc --> S3
  NotifSvc --> Kafka

  BookingSvc --> Kafka
  Kafka --> NotifSvc

High level architecture of the hotel booking platform. Read-heavy paths (search, detail) are served through Elasticsearch and Redis. The booking write path goes through PostgreSQL, guarded by Redis distributed locks.

The system splits into six services. Search handles the bulk of traffic and runs against Elasticsearch, completely decoupled from the transactional booking path. The booking service owns the critical write path and talks to PostgreSQL for strong consistency. Redis serves two roles: a cache layer for hot data and a distributed lock provider for inventory holds.

Deep dive: booking data model

The databases overview article covers the relational patterns we use here. The booking data model centers on an inventory table that tracks how many rooms of each type are available per night.

erDiagram
  HOTEL ||--o{ ROOM_TYPE : has
  ROOM_TYPE ||--o{ INVENTORY : tracks
  INVENTORY ||--o{ BOOKING_LINE : reserves
  BOOKING ||--|{ BOOKING_LINE : contains
  BOOKING }o--|| USER : "made by"
  BOOKING ||--|| PAYMENT : has
  HOTEL {
      string hotel_id PK
      string name
      string city
      float latitude
      float longitude
      int star_rating
  }
  ROOM_TYPE {
      string room_type_id PK
      string hotel_id FK
      string name
      int max_guests
      int total_count
  }
  INVENTORY {
      string inventory_id PK
      string room_type_id FK
      date date
      int total_rooms
      int booked_rooms
      int price_cents
  }
  BOOKING {
      string booking_id PK
      string user_id FK
      date check_in
      date check_out
      string status
      int total_cents
      timestamp created_at
  }
  BOOKING_LINE {
      string line_id PK
      string booking_id FK
      string inventory_id FK
      int quantity
  }
  PAYMENT {
      string payment_id PK
      string booking_id FK
      string status
      int amount_cents
      string provider_ref
  }
  USER {
      string user_id PK
      string email
      string name
  }

Entity relationship diagram for the booking data model. The INVENTORY table is the heart of availability tracking, with one row per room type per night.

The key design choice is the INVENTORY table. Instead of tracking individual physical rooms, we track counts: total_rooms and booked_rooms per room type per night. A room is available when booked_rooms < total_rooms. This is simpler than assigning specific room numbers at booking time. The hotel assigns physical rooms closer to check-in.
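As a rough illustration of the count-based model, the sketch below computes how many rooms can be sold for a whole stay. The `InventoryRow` class and function names are hypothetical. It assumes a stay occupies the nights from check-in up to, but not including, the check-out date.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InventoryRow:
    """One INVENTORY row: a room type on a single night."""
    room_type_id: str
    night: date
    total_rooms: int
    booked_rooms: int


def stay_nights(check_in: date, check_out: date) -> list[date]:
    """Nights occupied by a stay: check_in inclusive, check_out exclusive."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


def rooms_available(rows: dict[date, InventoryRow], check_in: date, check_out: date) -> int:
    """Rooms bookable for the whole stay: the minimum free count over its nights."""
    nights = stay_nights(check_in, check_out)
    if not nights:
        return 0
    free = []
    for night in nights:
        row = rows.get(night)
        if row is None:  # no inventory published for this night
            return 0
        free.append(row.total_rooms - row.booked_rooms)
    return max(0, min(free))
```

The tightest night decides the answer: a room type with 7 free rooms on the first night and none on the second cannot be sold for a two-night stay.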

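For database sharding, the inventory table is sharded by hotel_id. Because the INVENTORY entity above carries only room_type_id, this requires either denormalizing hotel_id onto each inventory row or resolving it from room_type_id in the routing layer. All inventory rows for a single hotel live on the same shard. This ensures that a booking transaction, which updates multiple inventory rows for consecutive nights at the same hotel, stays within a single shard and avoids distributed transactions.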
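One simple way to route by hotel_id is a stable hash modulo the shard count. The sketch below is illustrative only: the shard count and function name are assumptions, and the article does not say which routing scheme is used.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_hotel(hotel_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a hotel to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to get the same answer on every node.
    """
    digest = hashlib.sha256(hotel_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo routing moves most hotels to a different shard when the shard count changes. A directory lookup or consistent hashing is a common alternative when resharding is expected.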

Deep dive: booking flow with double-booking prevention

The hardest problem in this system is preventing two concurrent bookings from overselling the same room type on the same night. Here is the sequence:

sequenceDiagram
  participant C as Client
  participant API as API Gateway
  participant BS as Booking Service
  participant R as Redis
  participant DB as PostgreSQL

  C->>API: POST /bookings
  API->>BS: createBooking(hotelId, roomTypeId, checkIn, checkOut, qty)
  BS->>R: SET lock:hotel_id:room_type_id:date token NX EX 10
  R-->>BS: OK (lock acquired)

  BS->>DB: BEGIN TRANSACTION
  BS->>DB: SELECT total_rooms, booked_rooms FROM inventory WHERE room_type_id = X AND date BETWEEN checkIn AND checkOut - 1 FOR UPDATE
  DB-->>BS: current counts

  alt rooms available
      BS->>DB: UPDATE inventory SET booked_rooms = booked_rooms + qty WHERE room_type_id = X AND date BETWEEN checkIn AND checkOut - 1
      BS->>DB: INSERT booking + booking_lines
      BS->>DB: COMMIT
      DB-->>BS: success
      BS->>R: DEL lock
      BS-->>API: 201 Created (booking_id)
      API-->>C: Booking confirmed
  else sold out
      BS->>DB: ROLLBACK
      BS->>R: DEL lock
      BS-->>API: 409 Conflict
      API-->>C: Room unavailable
  end

Booking flow with distributed lock and database-level pessimistic locking. The Redis lock prevents thundering herd on popular properties. The FOR UPDATE clause prevents races at the row level.

The system uses two layers of protection:

Redis distributed lock. Before touching the database, the booking service acquires a lock keyed on hotel_id:room_type_id:date, set atomically together with a 10-second TTL. A multi-night stay would need one such lock per night. The lock keeps a thundering herd of concurrent requests from hitting the database with SELECT FOR UPDATE at the same time. The TTL ensures the lock is released even if the service crashes.
PostgreSQL FOR UPDATE. Inside the transaction, the SELECT FOR UPDATE on the inventory rows provides row-level pessimistic locking. Even if two requests slip past the Redis lock (possible when the TTL expires while the holder is still working, or when a Redis failover loses the key), the database guarantees serialized access to those rows.
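The locking semantics of the first layer can be sketched without a Redis server. The class below is an in-memory stand-in, not a Redis client: `InMemoryLockStore`, `lock_key`, and the injected clock are all hypothetical names. It mimics set-if-absent with expiry, and it releases a lock only when the caller presents the token it was given, so a slow request cannot delete a lock that has since expired and been taken by someone else.

```python
import time
import uuid
from typing import Callable, Optional


class InMemoryLockStore:
    """Stand-in for Redis: set-if-absent with expiry, plus compare-and-delete."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (token, expires_at)

    def acquire(self, key: str, ttl_seconds: float) -> Optional[str]:
        """Like SET key token NX EX ttl: a token on success, None if the lock is held."""
        now = self._clock()
        held = self._entries.get(key)
        if held is not None and held[1] > now:
            return None
        token = uuid.uuid4().hex
        self._entries[key] = (token, now + ttl_seconds)
        return token

    def release(self, key: str, token: str) -> bool:
        """Delete the lock only if the caller still owns it."""
        held = self._entries.get(key)
        if held is None or held[0] != token:
            return False
        del self._entries[key]
        return True


def lock_key(hotel_id: str, room_type_id: str, night: str) -> str:
    """Build the lock key in the hotel_id:room_type_id:date format from the text."""
    return f"lock:{hotel_id}:{room_type_id}:{night}"
```

This stand-in is not thread-safe and exists only to show the semantics. Against a real Redis, the check-and-delete in `release` has to run atomically on the server, which is usually done with a small Lua script.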

The combination means we get both performance (most contention resolved in Redis without database round trips) and correctness (the database is the final authority).
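The database layer can be sketched as a single transaction that checks and updates every night of the stay. To stay runnable without a server, this example uses SQLite in place of PostgreSQL. SQLite has no FOR UPDATE, so `BEGIN IMMEDIATE` (which takes the write lock up front) plays that role here. The table layout and the `book` function are simplified assumptions, not the article's schema.

```python
import sqlite3
from datetime import date, timedelta


def make_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE inventory ("
        " room_type_id TEXT, night TEXT, total_rooms INTEGER, booked_rooms INTEGER,"
        " PRIMARY KEY (room_type_id, night))"
    )
    return conn


def book(conn: sqlite3.Connection, room_type_id: str,
         check_in: date, check_out: date, qty: int) -> bool:
    """Reserve qty rooms for every night of the stay, or change nothing."""
    nights = [(check_in + timedelta(days=i)).isoformat()
              for i in range((check_out - check_in).days)]
    if not nights or qty <= 0:
        return False
    placeholders = ",".join("?" * len(nights))
    conn.execute("BEGIN IMMEDIATE")  # stand-in for SELECT ... FOR UPDATE
    try:
        rows = conn.execute(
            "SELECT total_rooms - booked_rooms FROM inventory"
            f" WHERE room_type_id = ? AND night IN ({placeholders})",
            (room_type_id, *nights),
        ).fetchall()
        # Every night must exist and have enough free rooms.
        if len(rows) != len(nights) or any(free < qty for (free,) in rows):
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE inventory SET booked_rooms = booked_rooms + ?"
            f" WHERE room_type_id = ? AND night IN ({placeholders})",
            (qty, room_type_id, *nights),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The important property is that the availability check and the increment happen inside one transaction while the rows are locked. A request that arrives in between waits, then sees the updated counts.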

Handling lock failures

If the Redis lock is already held, the service fails fast with a 409 rather than queuing the request, and the client retries after a short backoff. For popular properties during flash sales, we also implement a “hold” system: the first step reserves inventory for 10 minutes while the user completes payment, and a background job releases holds that expire without a completed payment.
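A hold is inventory that is reserved but not yet booked. The in-memory sketch below shows one way the bookkeeping could work for a single room type on a single night. `HeldInventory`, `Hold`, and the injected clock are hypothetical names, and a real implementation would keep this state in the database rather than in process memory.

```python
import time
from dataclasses import dataclass
from typing import Callable

HOLD_SECONDS = 600  # the 10-minute hold described in the text


@dataclass
class Hold:
    hold_id: str
    qty: int
    expires_at: float


class HeldInventory:
    """One room type on one night: confirmed bookings plus temporary holds."""

    def __init__(self, total_rooms: int, clock: Callable[[], float] = time.time) -> None:
        self.total_rooms = total_rooms
        self.booked_rooms = 0
        self.holds: dict[str, Hold] = {}
        self._clock = clock

    def free_rooms(self) -> int:
        held = sum(h.qty for h in self.holds.values())
        return self.total_rooms - self.booked_rooms - held

    def place_hold(self, hold_id: str, qty: int) -> bool:
        """Reserve rooms while the traveler pays; fails if not enough are free."""
        self.release_expired()
        if qty <= 0 or qty > self.free_rooms():
            return False
        self.holds[hold_id] = Hold(hold_id, qty, self._clock() + HOLD_SECONDS)
        return True

    def confirm(self, hold_id: str) -> bool:
        """Payment succeeded: convert a live hold into a booking."""
        self.release_expired()
        hold = self.holds.pop(hold_id, None)
        if hold is None:
            return False
        self.booked_rooms += hold.qty
        return True

    def release_expired(self) -> int:
        """What the background job would do: drop holds past their deadline."""
        now = self._clock()
        expired = [hid for hid, h in self.holds.items() if h.expires_at <= now]
        for hid in expired:
            del self.holds[hid]
        return len(expired)
```

Expired holds are also swept lazily inside `place_hold` and `confirm`, so correctness does not depend on how often the background job runs.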

Deep dive: search and availability

Search is the highest traffic path. A traveler types “hotels in Tokyo, March 15-18, 2 guests” and expects results in under 500ms with accurate pricing and availability.

Indexing pipeline. When a partner updates property data or pricing, the change flows through Kafka to an indexing consumer that updates Elasticsearch. The search index contains property metadata, geo coordinates, amenities, star rating, and a precomputed min_price for common date ranges. Full availability is not in the search index because it changes too frequently.

Search flow:

Elasticsearch handles the geo query (hotels within 20 km of city center), filters (star rating, amenities, price range), and text matching (property name).
Results return 20 candidate hotels ranked by relevance and price.
For these 20 hotels, the search service makes a batch call to the availability service, which checks Redis cache first, then falls back to the inventory table in PostgreSQL.
Results are returned with live pricing and an “available” or “limited” badge.

The two-phase approach (broad search in Elasticsearch, then targeted availability check in the database) keeps search fast. Elasticsearch handles the complex filtering at scale. The database handles the critical correctness of “is this room actually available right now.”
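The first phase might translate into an Elasticsearch request body along these lines. The function only builds the query as a Python dictionary and does not call a cluster. The field names (`location`, `star_rating`, `amenities`, `name`) are assumptions about the index mapping; `min_price` is the precomputed field mentioned above.

```python
from typing import Any, Optional, Sequence


def build_search_query(
    lat: float,
    lon: float,
    min_stars: Optional[int] = None,
    amenities: Sequence[str] = (),
    max_price_cents: Optional[int] = None,
    text: Optional[str] = None,
    radius_km: int = 20,
    page_size: int = 20,
) -> dict[str, Any]:
    """Build a bool query: scored text match plus non-scoring filters."""
    filters: list[dict[str, Any]] = [
        # Assumes `location` is mapped as a geo_point field.
        {"geo_distance": {"distance": f"{radius_km}km",
                          "location": {"lat": lat, "lon": lon}}}
    ]
    if min_stars is not None:
        filters.append({"range": {"star_rating": {"gte": min_stars}}})
    if max_price_cents is not None:
        filters.append({"range": {"min_price": {"lte": max_price_cents}}})
    for amenity in amenities:
        # One term filter per amenity means the hotel must have all of them.
        filters.append({"term": {"amenities": amenity}})
    must: list[dict[str, Any]] = (
        [{"match": {"name": text}}] if text else [{"match_all": {}}]
    )
    return {"size": page_size, "query": {"bool": {"must": must, "filter": filters}}}
```

Putting the geo, rating, price, and amenity conditions in `filter` rather than `must` keeps them out of relevance scoring and lets Elasticsearch cache them.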

Cache strategy. Availability data for the next 7 days is cached in Redis with a 60-second TTL. This means a search result might show a room as available when it was booked 30 seconds ago. That is acceptable because the booking flow does the real check and returns 409 if the room is gone. The user sees “sorry, this room was just booked” and tries another option.
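The staleness window follows directly from the cache-aside pattern with a fixed TTL. The sketch below uses an in-process dictionary in place of Redis and a caller-supplied loader in place of the PostgreSQL query. The class name and key format are hypothetical.

```python
import time
from typing import Callable

CACHE_TTL_SECONDS = 60  # the TTL quoted in the text


class AvailabilityCache:
    """Cache-aside lookup: serve from cache while fresh, otherwise reload."""

    def __init__(
        self,
        load_from_db: Callable[[str], int],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._clock = clock
        self._entries: dict[str, tuple[int, float]] = {}  # key -> (free_rooms, expires_at)

    def get(self, key: str) -> int:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]       # cache hit, possibly up to 60 s stale
        value = self._load(key)   # miss or expired: fall back to the database
        self._entries[key] = (value, now + CACHE_TTL_SECONDS)
        return value
```

A room sold after the entry was cached keeps appearing as available until the entry expires, which is exactly the case the booking flow catches with its own check.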

Pricing and revenue

Hotel pricing is not a single number. It varies by date, day of week, season, demand, and promotions. The pricing service manages a calendar where partners set:

Base rate per room type per night.
Seasonal multipliers (1.5x during holidays, 0.8x in low season).
Last-minute discounts that activate when occupancy is below a threshold.
Length-of-stay discounts (10% off for 5+ nights).

The total price for a booking is computed by summing the nightly rate for each date in the range after applying all applicable rules. This computation happens at search time for display and again at booking time for the actual charge. The booking-time price is authoritative. If the price changed between search and booking, the user sees the updated price and confirms.
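A minimal sketch of that computation follows, covering the base rate, seasonal multipliers, and the length-of-stay discount. Last-minute discounts are left out because they depend on live occupancy. The function name, the `multiplier_for` callback, and the order in which rules are applied are assumptions. Amounts are integer cents, with exact fractions used for the multipliers to avoid floating-point drift.

```python
from datetime import date, timedelta
from fractions import Fraction
from typing import Callable


def total_price_cents(
    check_in: date,
    check_out: date,
    base_rate_cents: int,
    multiplier_for: Callable[[date], Fraction],
    long_stay_nights: int = 5,
    long_stay_discount: Fraction = Fraction(10, 100),
) -> int:
    """Sum the nightly rates, then apply the length-of-stay discount if it qualifies."""
    nights = (check_out - check_in).days
    if nights <= 0:
        raise ValueError("check_out must be after check_in")
    total = Fraction(0)
    for i in range(nights):
        night = check_in + timedelta(days=i)
        total += base_rate_cents * multiplier_for(night)  # seasonal rate for this night
    if nights >= long_stay_nights:
        total *= 1 - long_stay_discount
    return round(total)  # Python rounds exact halves to the nearest even cent
```

For example, with a base rate of 10,000 cents and a 1.5x multiplier on December 24 and 25, a five-night stay from December 23 to December 28 sums to 60,000 cents before the discount and 54,000 after it.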

Payment and payout flow

The platform collects payment from the traveler at booking time and pays the hotel partner after check-out. This creates a float period where the platform holds funds.

Charge. When the booking is confirmed, the payment service charges the traveler’s card through a payment gateway (Stripe, Adyen). The charge is captured immediately for non-refundable bookings and authorized-only for flexible bookings.
Cancellation. If the traveler cancels within the free cancellation window, the authorized amount is voided. After the window, the cancellation fee is captured.
Payout. After check-out, a daily batch job calculates amounts owed to each hotel partner (booking total minus platform commission, typically 15-20%) and initiates bank transfers.
Reconciliation. A separate reconciliation service matches charges, refunds, and payouts to ensure the books balance. Discrepancies trigger alerts.
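The payout step reduces to grouping completed bookings by partner and subtracting commission. This sketch is illustrative: the `CompletedBooking` type and the rounding rule are assumptions, and the commission is expressed in basis points (1500 = 15%) so that all arithmetic stays in integers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CompletedBooking:
    booking_id: str
    partner_id: str
    total_cents: int


def compute_payouts(
    bookings: Iterable[CompletedBooking], commission_bps: int = 1500
) -> dict[str, int]:
    """Net amount owed per partner: booking total minus platform commission."""
    payouts: dict[str, int] = defaultdict(int)
    for b in bookings:
        # Integer math with round-half-up on the commission, per booking.
        commission = (b.total_cents * commission_bps + 5_000) // 10_000
        payouts[b.partner_id] += b.total_cents - commission
    return dict(payouts)
```

Rounding per booking rather than on the batch total keeps each line item reproducible, which makes the later reconciliation step easier to audit.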

Trade-offs and alternatives

Pessimistic vs. optimistic locking. We chose pessimistic locking (SELECT FOR UPDATE + Redis lock) because hotel inventory is highly contended for popular properties. Optimistic locking (version numbers with retry) works when contention is low but causes retry storms during flash sales. The trade-off is lower throughput per lock, but higher first-attempt success rate.

Count-based vs. room-based inventory. We track counts (booked_rooms) rather than assigning specific rooms at booking time. This simplifies the booking flow and avoids the combinatorial complexity of assigning rooms across overlapping date ranges. The downside is that specific room preferences (ocean view, high floor) require a secondary assignment step closer to check-in.

Elasticsearch vs. dedicated geo database. PostgreSQL with PostGIS could handle geo queries, but Elasticsearch gives us full-text search, faceted filtering, and geo queries in a single system. The trade-off is eventual consistency in the search index (30-60 second lag). For a booking platform where stale search results are acceptable (availability is re-checked at booking time), this is a good trade.

Single database vs. CQRS. We could separate the read model (search, browsing) from the write model (bookings) more aggressively using CQRS. The current design already does this partially: Elasticsearch is the read model, PostgreSQL is the write model. Full CQRS with event sourcing would add complexity (projections, eventual consistency guarantees) without proportional benefit at our scale.

What real systems actually do

Booking.com runs one of the largest hotel platforms globally with over 28M listings. They use a microservices architecture with hundreds of services. Their availability system handles extreme write contention during peak booking hours and uses a combination of in-memory caching and database-level locking similar to what we described.

Airbnb uses a slightly different model because most listings have exactly one unit of inventory (the entire home). Their double-booking prevention is simpler per listing but scales to millions of unique hosts. They use a combination of optimistic locking and a calendar-based availability model.

Expedia aggregates inventory from multiple sources (direct hotel connections, GDS systems, other platforms). Their challenge is not just preventing double bookings within their platform but also handling inventory that can be booked through other channels simultaneously.

All three systems invest heavily in search infrastructure. Elasticsearch or custom search engines handle the heavy lifting of filtering millions of properties by location, price, date, and amenities.

What comes next

This design handles the core booking platform. Production systems layer on additional complexity:

Dynamic pricing. Machine learning models that adjust prices in real-time based on demand, competitor pricing, and historical patterns. This replaces the static pricing calendar with a pricing engine.
Overbooking management. Airlines do it, and hotels do too. A controlled overbooking strategy (booking 102% of capacity) with a rebooking workflow for the rare case when all guests show up.
Multi-region deployment. Hotels are regional, so search traffic clusters by geography. Deploying search clusters in multiple regions reduces latency. The booking path needs a single source of truth, so it stays centralized or uses a leader-follower setup per region.
Fraud detection. Detecting fraudulent bookings (stolen credit cards, fake cancellations for refund abuse) requires a real-time scoring system on the booking path.
Review and ranking system. Guest reviews feed into search ranking. A property with 4.8 stars and 500 reviews ranks higher than a property with no reviews even if the price is lower. The ranking algorithm balances relevance, price, reviews, and platform revenue (promoted listings).

The booking platform is a good example of a system where the read path and write path have fundamentally different requirements. Optimizing for both simultaneously, fast search with eventual consistency and ironclad booking with strong consistency, is the central design tension.
