
Multi-tenancy patterns

In this series (15 parts)
  1. Backend system design scope
  2. Designing RESTful APIs
  3. Authentication and session management
  4. Database design for backend systems
  5. Caching in backend systems
  6. Background jobs and task queues
  7. File upload and storage
  8. Search integration
  9. Email and notification delivery
  10. Webhooks: design and security
  11. Payments integration
  12. Multi-tenancy patterns
  13. Backend for Frontend (BFF) pattern
  14. GraphQL server design
  15. gRPC and internal service APIs

Every SaaS product is multi-tenant. Whether you are building a project management tool, an analytics platform, or a billing system, multiple customers share your infrastructure. The question is not whether to support multi-tenancy but how to isolate tenants from each other.

Get isolation wrong and one tenant’s data leaks into another’s queries. Get it too right and you are running a separate deployment per customer, which defeats the economics that make SaaS viable.

This article covers the three canonical models, their trade-offs at the database level, row-level security as a safety net, and tenant-aware caching strategies. For scaling data storage across tenants, see database sharding.

The three models

Multi-tenancy patterns fall on a spectrum from full isolation to full sharing:

flowchart TB
  subgraph Silo["Silo Model"]
      direction TB
      T1A["Tenant A"] --> DB1A["Database A"]
      T1B["Tenant B"] --> DB1B["Database B"]
      T1C["Tenant C"] --> DB1C["Database C"]
  end

  subgraph Bridge["Bridge Model"]
      direction TB
      T2A["Tenant A"] --> Schema2A["Schema A"]
      T2B["Tenant B"] --> Schema2B["Schema B"]
      T2C["Tenant C"] --> Schema2C["Schema C"]
      Schema2A --> DB2["Shared Database"]
      Schema2B --> DB2
      Schema2C --> DB2
  end

  subgraph Pool["Pool Model"]
      direction TB
      T3A["Tenant A"] --> DB3["Shared Database<br/>Shared Schema"]
      T3B["Tenant B"] --> DB3
      T3C["Tenant C"] --> DB3
  end

Three multi-tenancy models. Silo gives maximum isolation, pool gives maximum efficiency, bridge sits between them.

Silo: one database per tenant

Each tenant gets their own database instance (or cluster). Complete isolation. One tenant’s heavy query cannot affect another’s performance. A data breach in one tenant’s infrastructure does not expose others.

Pros: strongest isolation, simplest mental model for security audits, per-tenant backup and restore, independent scaling.

Cons: expensive at scale (100 tenants means 100 database instances), operational overhead for schema migrations (roll out to each database), connection management becomes complex.

Best for: enterprise SaaS where customers demand data residency guarantees, healthcare or financial applications with strict compliance requirements, or when you have a small number of high-value tenants.

Pool: shared database, shared schema

All tenants share a single database and the same tables. Every table has a tenant_id column. Every query includes a WHERE tenant_id = ? filter.

Pros: cheapest to operate, simplest to deploy, schema migrations apply once, connection pooling is straightforward.

Cons: a missing WHERE tenant_id clause leaks data across tenants, noisy neighbor problems (one tenant’s expensive query affects everyone), harder to offer per-tenant compliance guarantees.

Best for: B2C SaaS with many small tenants, early-stage products where operational simplicity matters more than isolation guarantees, and products where tenant data is not highly sensitive.
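The pool model's core discipline, never querying a tenant-scoped table without the tenant filter, can be enforced in a small helper. This is an illustrative sketch, not a library API; the names `scoped_query` and `TENANT_TABLES` are assumptions:

```python
# Sketch: a query builder that makes the tenant filter impossible to forget
# in a pool-model schema. Table names are validated against a whitelist, so
# the f-string interpolation below cannot inject arbitrary SQL.

TENANT_TABLES = {"orders", "users", "invoices"}  # tables carrying a tenant_id column

def scoped_query(table: str, where: str, tenant_id: str) -> tuple:
    """Build a parameterized SELECT that always includes the tenant filter."""
    if table not in TENANT_TABLES:
        raise ValueError(f"{table!r} is not a known tenant-scoped table")
    if not tenant_id:
        raise ValueError("refusing to query a tenant-scoped table without a tenant")
    sql = f"SELECT * FROM {table} WHERE tenant_id = %s AND ({where})"
    return sql, (tenant_id,)
```

Routing every read through a helper like this turns "forgot the WHERE clause" from a silent data leak into a loud exception.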

Bridge: shared database, separate schemas

Each tenant gets their own schema (or namespace) within a shared database. Tables have the same structure but live in separate schemas. Queries are routed to the correct schema at connection time.

Pros: better isolation than pool without the cost of silo, per-tenant schema migrations possible (useful for gradual rollouts), easier to reason about data ownership.

Cons: schema count can hit database limits (PostgreSQL handles thousands of schemas, but performance degrades), migrations must be applied per schema, connection routing adds complexity.

Best for: mid-market SaaS with moderate isolation requirements, products transitioning from pool to silo, or products where some tenants need stronger isolation than others.

Comparing the models

No single model wins on every axis. The right choice depends on your tenant profile and compliance requirements.

| Model | Isolation | Cost per tenant | Schema migrations | Typical fit |
| --- | --- | --- | --- | --- |
| Silo | Strongest (separate databases) | Highest | Once per database | Enterprise, regulated industries |
| Bridge | Moderate (separate schemas) | Medium | Once per schema | Mid-market, mixed requirements |
| Pool | Application-enforced (shared tables) | Lowest | Once | B2C, early stage |

Tenant isolation at the database level

Regardless of which model you choose, the database layer is where isolation matters most. A bug in your application code should not be able to expose one tenant’s data to another.

Row-level security (RLS)

PostgreSQL’s row-level security is the strongest defense for pool-model multi-tenancy. RLS policies are enforced by the database engine, not by application code. Even if your ORM generates a query without a tenant filter, the database filters out rows that do not match the policy on reads and rejects writes that violate it.

-- Enable RLS on a table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Apply the policies to the table owner too (owners bypass RLS by default)
ALTER TABLE orders FORCE ROW LEVEL SECURITY;

-- Create a policy that restricts access based on a session variable
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- At the start of every request, set the tenant context. SET LOCAL scopes
-- the value to the current transaction, so it cannot leak across pooled
-- connections.
BEGIN;
SET LOCAL app.current_tenant = 'tenant_abc123';
-- ... run the request's queries ...
COMMIT;

This is a defense-in-depth strategy. Your application code should still include tenant_id filters. RLS catches the cases where it does not.

Important caveats:

  • Superuser bypasses RLS. Your application database user should never be a superuser.
  • Performance. RLS adds a predicate to every query. On well-indexed tables this cost is negligible. On complex queries it can be meaningful. Benchmark.
  • ORM compatibility. Some ORMs do not expect the database to silently filter rows. Test that your ORM handles empty results correctly when RLS filters everything out.
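In application code, the per-request context setting can be wired up as a small wrapper. This is a sketch assuming a DB-API connection such as psycopg; `set_config(..., true)` scopes the value to one transaction, which is safe under transaction-mode pooling. The function name is illustrative:

```python
# Sketch: run a request's queries inside one transaction with the RLS
# tenant context set. Assumes a DB-API connection (e.g. psycopg), where
# `with conn:` commits on success and rolls back on error.

SET_TENANT_SQL = "SELECT set_config('app.current_tenant', %s, true)"

def run_in_tenant_context(conn, tenant_id, fn):
    """Set the tenant context, then run fn(cursor) in the same transaction."""
    with conn:
        with conn.cursor() as cur:
            # is_local=true: the setting disappears at COMMIT/ROLLBACK,
            # so it can never leak to the next request on this connection.
            cur.execute(SET_TENANT_SQL, (tenant_id,))
            return fn(cur)
```

The important property is that the `set_config` call and the request's queries share a transaction; setting the variable on a pooled connection outside a transaction would be a cross-tenant leak waiting to happen.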

Connection-level isolation

For the bridge model, set the search path at connection time:

SET search_path TO tenant_abc123, public;

This routes all unqualified table references to the tenant’s schema. Be careful when combining this with connection pooling: under PgBouncer in transaction mode, a plain SET leaks the search path to whichever client reuses the connection next, so issue SET LOCAL search_path inside each request’s transaction instead.

For the silo model, maintain a mapping of tenant IDs to database connection strings. A tenant router middleware resolves the tenant from the request (subdomain, header, or JWT claim) and selects the correct connection pool.
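A minimal sketch of that silo-model router, assuming a plain mapping of tenant IDs to pools or DSNs (the class name and the DSN strings are illustrative):

```python
# Sketch: silo-model tenant router. Middleware resolves the tenant from
# the request, then asks the router for that tenant's connection target.

class TenantRouter:
    def __init__(self, pools: dict):
        self._pools = pools  # tenant_id -> connection pool (or DSN string)

    def pool_for(self, tenant_id: str):
        try:
            return self._pools[tenant_id]
        except KeyError:
            # Fail loudly: an unknown tenant must never fall through to a
            # default database.
            raise LookupError(f"unknown tenant {tenant_id!r}") from None

# Illustrative configuration, not real hosts:
router = TenantRouter({
    "acme": "postgresql://app@db-acme.internal/app",
    "globex": "postgresql://app@db-globex.internal/app",
})
```

Because the mapping is configuration, moving a tenant to a new database is a one-line change rather than a code change, which matters later for model migrations.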

Tenant resolution

Every request must be associated with a tenant before it touches the database. Common resolution strategies:

| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| Subdomain | acme.app.com | Clear, no header manipulation | DNS and TLS wildcard setup |
| Path prefix | app.com/acme/... | Simple routing | Harder to separate in reverse proxy |
| JWT claim | tenant_id in token | No URL changes | Requires auth on every request |
| Custom header | X-Tenant-ID | Flexible | Easy to forge if not validated |

The safest approach combines two: resolve from the subdomain (or JWT) and validate against the authenticated user’s tenant membership. Never trust a client-supplied tenant ID without server-side validation.
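That combined approach can be sketched as a single resolution function. This is illustrative; the base domain and the membership check are assumptions about your auth model:

```python
# Sketch: resolve the tenant from the subdomain, then validate it against
# the authenticated user's tenant memberships. The client-supplied part
# (the Host header) is never trusted on its own.

def resolve_tenant(host: str, user_memberships: set, base_domain: str = "app.com") -> str:
    if not host.endswith("." + base_domain):
        raise PermissionError(f"unexpected host {host!r}")
    subdomain = host[: -(len(base_domain) + 1)]
    # Reject nested subdomains like evil.acme.app.com and empty labels.
    if not subdomain or "." in subdomain:
        raise PermissionError(f"invalid tenant subdomain {subdomain!r}")
    # Server-side validation: the user must actually belong to this tenant.
    if subdomain not in user_memberships:
        raise PermissionError(f"user is not a member of tenant {subdomain!r}")
    return subdomain
```

A request for acme.app.com by a user whose memberships do not include acme fails here, before any query runs.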

Shared schema trade-offs

The pool model with a shared schema is the most common starting point. Here are the trade-offs you will encounter as you scale:

Index bloat. Every index now covers all tenants. A B-tree index on (tenant_id, created_at) serves every tenant’s “recent items” query, but the index is proportional to total rows across all tenants, not per-tenant rows.

Migration risk. A schema migration affects every tenant simultaneously. A bad migration that locks a table blocks all tenants. Use online migration tools (like pg_repack or gh-ost for MySQL) and test migrations against production-sized datasets.

Noisy neighbors. One tenant running a massive analytical query can saturate the connection pool or fill the buffer cache, degrading performance for everyone. Mitigations include query timeouts, per-tenant connection limits, and read replicas for heavy queries.

Data deletion. When a tenant churns, deleting their data from shared tables is a large, potentially locking operation. Soft-delete with background cleanup is safer than DELETE FROM orders WHERE tenant_id = ? on a 100-million-row table.
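The background cleanup for a churned tenant can be sketched as a batched delete loop. This assumes PostgreSQL (the ctid subquery works around DELETE having no LIMIT clause); the table name and batch size are illustrative:

```python
# Sketch: purge a churned tenant's rows in small batches so no single
# statement holds locks on the shared table for long. Each batch should
# ideally run in its own transaction with a pause between batches.

BATCH_SQL = (
    "DELETE FROM orders WHERE ctid IN ("
    "SELECT ctid FROM orders WHERE tenant_id = %s LIMIT %s)"
)

def purge_tenant(cur, tenant_id: str, batch_size: int = 5000) -> int:
    """Delete the tenant's rows batch by batch; returns total rows deleted."""
    total = 0
    while True:
        cur.execute(BATCH_SQL, (tenant_id, batch_size))
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

With soft deletes, the same loop runs against rows already flagged as deleted, so the tenant disappears from queries instantly and the physical cleanup happens at leisure.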

Tenant-aware caching

Caching in a multi-tenant system requires tenant isolation in the cache layer, not just the database layer. A cache key that does not include the tenant ID will serve one tenant’s data to another.

The simplest approach is key prefixing:

cache_key = f"{tenant_id}:users:{user_id}"

For Redis, you can use separate logical databases (0-15) per tenant, but this does not scale beyond 16 tenants, and Redis Cluster supports only database 0. A better approach is key prefixing within a shared Redis deployment.
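Key prefixing is easy to get wrong at individual call sites, so it is worth wrapping once. A minimal sketch, with a plain dict standing in for the Redis client (the class and method names are illustrative):

```python
# Sketch: a tenant-scoped cache wrapper. Every key is prefixed with the
# tenant ID at one choke point, so an entry can never be served to the
# wrong tenant. The dict backend stands in for a Redis client.

class TenantCache:
    def __init__(self, backend, tenant_id: str):
        self._backend = backend
        self._prefix = f"{tenant_id}:"

    def get(self, key: str):
        return self._backend.get(self._prefix + key)

    def set(self, key: str, value) -> None:
        self._backend[self._prefix + key] = value
```

Handing request code a `TenantCache` instead of the raw client means the tenant prefix cannot be forgotten, the same choke-point idea as the scoped query helper at the database layer.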

Eviction policies need tenant awareness too. If one tenant has 10x the data, their keys should not crowd out smaller tenants. Options:

  1. Per-tenant memory accounting: Redis has no built-in per-prefix quota, but you can sample key sizes with MEMORY USAGE (available since Redis 4.0) per tenant prefix and evict over-quota tenants from the application side.
  2. TTL-based tiering: high-value tenants get longer TTLs, smaller tenants get shorter ones.
  3. Separate cache clusters for tenants on premium tiers (a hybrid of silo and pool at the cache layer).
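Option 2 above, TTL tiering, amounts to a small policy table. A sketch with illustrative tier names and TTL values:

```python
# Sketch: map each tenant's plan tier to a cache TTL so premium tenants'
# entries live longer and free-tier entries churn out first. Values are
# illustrative, not recommendations.

TTL_BY_TIER = {"enterprise": 3600, "pro": 600, "free": 60}  # seconds

def ttl_for(tenant_tier: str) -> int:
    """Return the cache TTL for a tier; unknown tiers get the shortest TTL."""
    return TTL_BY_TIER.get(tenant_tier, TTL_BY_TIER["free"])
```

The TTL is then passed to the cache write (e.g. Redis SET with EX), so the eviction bias is enforced per entry rather than per cluster.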

Cross-tenant operations

Some operations legitimately need to span tenants: platform-wide analytics, admin dashboards, global search. These should run through a separate code path with explicit authorization, never by removing the tenant filter from the standard query path.

Create a dedicated “platform” role with cross-tenant read access. Audit every cross-tenant query. Rate limit and log all platform-role access.

Migration between models

Products often start with pool (cheapest, fastest to build) and migrate upward as they land enterprise customers who demand isolation.

A typical migration path:

  1. Pool with RLS: start here. RLS gives you a safety net while you build with shared tables.
  2. Bridge for premium tenants: when your first enterprise customer needs isolation, create a separate schema for them. Route their requests to the new schema.
  3. Silo for regulated tenants: healthcare or financial customers may require a dedicated database. Spin up a silo instance and migrate their data.

The key enabler is a clean tenant resolution layer. If your tenant routing is a single middleware function, migrating a tenant from pool to bridge or silo is a configuration change, not a code rewrite.

flowchart LR
  Pool["Pool Model<br/>(shared schema)"] -->|"Enterprise customer<br/>needs isolation"| Bridge["Bridge Model<br/>(separate schemas)"]
  Bridge -->|"Regulated customer<br/>needs dedicated DB"| Silo["Silo Model<br/>(separate databases)"]
  Pool -->|"All tenants need<br/>strong isolation"| Silo

Typical migration path from pool to silo. Most products run a hybrid, with the majority of tenants in pool and premium tenants in bridge or silo.

Testing multi-tenancy

Multi-tenancy bugs are data leaks. They are the worst category of bug in a SaaS product. Your testing strategy must explicitly verify isolation:

  1. Cross-tenant query tests: create two tenants, insert data for both, query as tenant A, and assert that tenant B’s data is not returned.
  2. Missing tenant filter tests: use a test framework that hooks into your ORM and fails if any query against a tenant-scoped table omits the tenant filter.
  3. RLS bypass tests: connect as the application database user (not superuser) and verify that RLS blocks cross-tenant access.
  4. Load tests per tenant: verify that one tenant’s load does not degrade another tenant’s response times beyond your SLA.
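Test 1 in the list above is the cheapest and highest-value of the four. A sketch of its shape, with an in-memory store standing in for the real repository layer (in the real test, the same assertions run against your application's query path):

```python
# Sketch: cross-tenant query test. Seed data for two tenants, query as
# tenant A, and assert tenant B's rows never appear in the result.

ROWS = [
    {"tenant_id": "a", "id": 1, "name": "a-doc"},
    {"tenant_id": "b", "id": 2, "name": "b-doc"},
]

def list_docs(tenant_id: str):
    """Stand-in for the tenant-scoped repository query."""
    return [r for r in ROWS if r["tenant_id"] == tenant_id]

def test_tenant_a_cannot_see_tenant_b():
    docs = list_docs("a")
    assert docs, "tenant A should see its own data"
    assert all(r["tenant_id"] == "a" for r in docs)
    assert not any(r["name"] == "b-doc" for r in docs)
```

Run this shape of test against every tenant-scoped endpoint, not just one representative table; isolation bugs tend to hide in the queries nobody thought to check.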

What comes next

With multi-tenancy patterns established, the next article covers the Backend for Frontend (BFF) pattern: why serving mobile, web, and API clients from a single backend creates problems, and how dedicated BFFs solve them.
