Caching in backend systems
In this series (15 parts)
- Backend system design scope
- Designing RESTful APIs
- Authentication and session management
- Database design for backend systems
- Caching in backend systems
- Background jobs and task queues
- File upload and storage
- Search integration
- Email and notification delivery
- Webhooks: design and security
- Payments integration
- Multi-tenancy patterns
- Backend for Frontend (BFF) pattern
- GraphQL server design
- gRPC and internal service APIs
Caching is storing a computed result so you can serve it again without recomputing it. In backend systems, this usually means putting frequently read data in Redis or Memcached instead of hitting the database on every request. Done well, caching cuts response times from hundreds of milliseconds to single digits. Done poorly, it introduces stale data bugs that are hard to reproduce and harder to fix.
What to cache and what not to
Cache data that is:
- Read frequently, written rarely: user profiles, product catalogs, feature flags.
- Expensive to compute: aggregation results, leaderboard rankings, permission checks that join multiple tables.
- Tolerant of staleness: a product description that is 30 seconds stale is fine. An account balance that is 30 seconds stale is not.
Do not cache:
- Write-heavy data: if the cache is invalidated on every write, caching adds overhead without benefit.
- Data that must be strongly consistent: financial balances, inventory counts during checkout, anything where serving stale data has real consequences.
- Large, rarely accessed data: caching a 50 MB report that one person reads once a day wastes memory.
Redis data structures and their uses
Redis is not just a key-value store. Its data structures solve specific backend problems.
| Structure | Use Case | Example |
|---|---|---|
| String | Simple cache entries, counters | Session data, rate limit counters |
| Hash | Objects with multiple fields | User profile cache |
| List | Queues, recent activity feeds | Latest 100 notifications |
| Set | Unique membership, tagging | Online users, feature flag audiences |
| Sorted Set | Ranked data, time-series | Leaderboards, scheduled jobs |
| Stream | Event log, consumer groups | Activity feeds, audit logs |
Sorted sets for leaderboards
A sorted set stores members with scores. Redis keeps them sorted by score, so rank queries are O(log N):
ZADD leaderboard 1500 "user:42"
ZADD leaderboard 2300 "user:17"
ZADD leaderboard 1800 "user:89"
ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10
ZREVRANK leaderboard "user:42" # User's rank
Answering the same rank query in SQL typically means sorting across the whole table (or maintaining a dedicated index) on every request. Redis answers it in microseconds.
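To make the ranking semantics concrete, here is a small in-memory model of the three sorted-set operations above. This is an illustration only: a real backend would issue ZADD/ZREVRANGE/ZREVRANK through a Redis client, and Redis keeps the set sorted incrementally rather than re-sorting on every read as this sketch does.

```python
class Leaderboard:
    """In-memory stand-in for a Redis sorted set (illustration only)."""

    def __init__(self):
        self.scores = {}  # member -> score, like ZADD

    def zadd(self, member, score):
        self.scores[member] = score

    def _ranked(self):
        # Highest score first, matching ZREVRANGE ordering.
        return sorted(self.scores.items(), key=lambda kv: -kv[1])

    def zrevrange(self, start, stop):
        # Redis ranges are inclusive of both ends.
        return self._ranked()[start:stop + 1]

    def zrevrank(self, member):
        for rank, (m, _) in enumerate(self._ranked()):
            if m == member:
                return rank
        return None

board = Leaderboard()
board.zadd("user:42", 1500)
board.zadd("user:17", 2300)
board.zadd("user:89", 1800)
print(board.zrevrange(0, 9))      # [('user:17', 2300), ('user:89', 1800), ('user:42', 1500)]
print(board.zrevrank("user:42"))  # 2 (zero-based rank)
```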
Hashes for object caching
Instead of serializing entire objects to JSON strings, use hashes to cache individual fields:
HSET user:42 name "Alice" email "alice@example.com" role "admin"
HGET user:42 name # Get one field
HMGET user:42 name role # Get multiple fields
This lets you read and update individual fields without deserializing and reserializing the entire object.
Cache-aside implementation
Cache-aside (also called lazy loading) is the most common caching architecture pattern. The application checks the cache first. On a miss, it reads from the database, writes to the cache, and returns the result.
sequenceDiagram
    participant C as Client
    participant App as Application
    participant Cache as Redis
    participant DB as Database
    C->>App: GET /users/42
    App->>Cache: GET user:42
    Cache-->>App: Cache miss
    App->>DB: SELECT * FROM users WHERE id = 42
    DB-->>App: User data
    App->>Cache: SET user:42 (TTL 300s)
    App-->>C: 200 OK + user data
    Note over C, DB: Next request hits cache
    C->>App: GET /users/42
    App->>Cache: GET user:42
    Cache-->>App: Cache hit
    App-->>C: 200 OK + user data (from cache)
Cache-aside pattern. The application manages both the cache and the database.
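The read path can be sketched in application code. This is a minimal illustration: a plain dict with expiry timestamps stands in for Redis, and `db_fetch_user` is a hypothetical stand-in for the database query; a real implementation would call a Redis client and let Redis enforce the TTL.

```python
import time

cache = {}       # key -> (value, expires_at); stand-in for Redis
CACHE_TTL = 300  # seconds, matching the SET ... TTL 300s above

def db_fetch_user(user_id):
    # Stand-in for SELECT * FROM users WHERE id = <user_id>.
    return {"id": user_id, "name": "Alice"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                            # cache hit
    user = db_fetch_user(user_id)                  # cache miss: read the database
    cache[key] = (user, time.time() + CACHE_TTL)   # populate for the next reader
    return user
```

Note that the cache is populated lazily, on the first read after a miss; nothing is cached until someone asks for it.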
Invalidation strategies
The hard part of caching is invalidation. When the underlying data changes, the cache must be updated or removed.
TTL-based expiry: set a time-to-live on every cache entry. After the TTL expires, the next read triggers a cache miss and a fresh database lookup. Simple, but data can be stale for up to the TTL duration.
Write-through invalidation: when the application writes to the database, it also deletes or updates the corresponding cache entry. Consistent, but adds write latency and complexity.
Event-driven invalidation: database changes emit events (via CDC or application events), and a consumer invalidates the cache. Decoupled, but adds infrastructure.
Most systems use TTL as a safety net combined with write-through invalidation for critical paths.
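A sketch of the write-through variant, again with a dict standing in for Redis and for the database. Deleting the cache entry (rather than updating it in place) is the safer default, because an in-place update computed from a stale read can write stale data back into the cache.

```python
cache = {}                                          # stand-in for Redis
db = {42: {"id": 42, "email": "old@example.com"}}   # stand-in for the database

def update_user(user_id, fields):
    db[user_id].update(fields)            # 1. write to the database
    cache.pop(f"user:{user_id}", None)    # 2. invalidate (DEL user:<id>)

def read_user(user_id):
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = dict(db[user_id])    # cache-aside fill on miss
    return cache[key]
```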
Distributed lock patterns
When multiple application instances share a cache, you need distributed locks to prevent problems like cache stampedes (many instances simultaneously fetching the same data after a cache miss).
The cache stampede problem
When a popular cache entry expires, hundreds of concurrent requests all see a cache miss and all hit the database simultaneously. This can overwhelm the database.
Lock-based solution
Only one instance fetches the data; others wait for the cache to be populated.
sequenceDiagram
    participant A as Instance A
    participant B as Instance B
    participant R as Redis
    participant DB as Database
    A->>R: GET user:42
    R-->>A: Cache miss
    A->>R: SET lock:user:42 NX EX 10
    R-->>A: OK (lock acquired)
    B->>R: GET user:42
    R-->>B: Cache miss
    B->>R: SET lock:user:42 NX EX 10
    R-->>B: nil (lock not acquired)
    B->>B: Wait and retry
    A->>DB: SELECT * FROM users WHERE id = 42
    DB-->>A: User data
    A->>R: SET user:42 (TTL 300s)
    A->>R: DEL lock:user:42
    B->>R: GET user:42
    R-->>B: Cache hit
Distributed lock prevents cache stampede. Only Instance A queries the database; Instance B waits for the cache to be populated.
The SET lock:user:42 NX EX 10 command atomically sets the lock only if it does not exist (NX) with a 10-second expiration (EX 10). The expiration prevents deadlocks if the lock holder crashes.
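A toy model of the NX-plus-expiry semantics, using an in-memory dict in place of Redis (with redis-py the real call would be `client.set("lock:user:42", token, nx=True, ex=10)`):

```python
import time

store = {}  # key -> (value, expires_at); stand-in for Redis

def set_nx_ex(key, value, ttl):
    """Set key only if absent (NX) with an expiry (EX). True on success."""
    entry = store.get(key)
    if entry is not None and entry[1] > time.time():
        return False                         # lock already held, not expired
    store[key] = (value, time.time() + ttl)
    return True

def delete(key):
    store.pop(key, None)

# Only one caller wins; the expiry bounds how long a crashed holder blocks others.
assert set_nx_ex("lock:user:42", "instance-a", 10) is True
assert set_nx_ex("lock:user:42", "instance-b", 10) is False
delete("lock:user:42")  # holder releases after repopulating the cache
```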
Redlock for stronger guarantees
For single-node Redis, the simple SET NX EX lock is sufficient. For Redis clusters, the Redlock algorithm acquires locks on a majority of Redis nodes to handle node failures. Use Redlock when correctness depends on the lock (e.g., preventing double-charging). For cache stampede prevention, the simple lock is fine because the worst case is a few extra database queries.
Cache warming strategies
A cold cache causes a burst of database queries when you deploy new code, scale up instances, or restart Redis. Cache warming pre-populates the cache before traffic hits it.
Strategies
On-deploy warming: a deployment step queries the most-accessed keys and populates the cache before the new instances start receiving traffic.
Background refresh: a periodic job refreshes cache entries before they expire. Instead of a TTL of 300 seconds, set the TTL to 600 seconds and refresh every 250 seconds. The cache is never cold.
Probabilistic early expiration: each cache read has a small probability of triggering a refresh before the TTL expires. The probability increases as the TTL approaches. This spreads refresh load over time instead of creating a thundering herd at expiry.
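One common formulation of probabilistic early expiration is the "XFetch" check from Vattani et al.'s paper on cache stampede prevention: refresh when `now - delta * beta * log(rand())` passes the expiry time, where `delta` is roughly how long a recompute takes. A minimal sketch (the parameter names are ours, not from any particular library):

```python
import math
import random

def should_refresh_early(now, expires_at, recompute_cost, beta=1.0):
    """XFetch-style check. The refresh probability rises as the entry nears
    expiry; recompute_cost is roughly the refresh duration in seconds, and
    beta > 1 refreshes more eagerly."""
    # log(random()) is negative, so this adds a positive random head start.
    return now - recompute_cost * beta * math.log(random.random()) >= expires_at
```

Because the jitter term is non-negative, an entry at or past its expiry is always refreshed, and with `recompute_cost = 0` the check degenerates to a plain TTL comparison.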
Cache warming dramatically reduces the database load spike after deployments. Without warming, the first few minutes can see 50x the normal database load.
Cache sizing and eviction
Redis needs enough memory to hold your working set. If it runs out, it evicts entries based on the configured policy:
- allkeys-lru: evict the least recently used key. Good default for caches.
- volatile-lru: evict the least recently used key among keys that have a TTL set. Keeps permanent keys safe.
- allkeys-lfu: evict the least frequently used key. Better for workloads with stable hot keys.
Monitor your cache hit rate. A hit rate below 80% suggests the cache is too small or the TTL is too short. A hit rate above 99% suggests you might be caching too aggressively.
Monitoring cache health
Track these metrics:
- Hit rate: percentage of reads served from cache. Target 85% or higher.
- Eviction rate: entries evicted per second. High eviction means the cache is undersized.
- Memory usage: percentage of allocated memory in use.
- Latency: p50 and p99 cache read/write times. Redis should be sub-millisecond at p99.
- Connection count: too many connections can exhaust Redis’s file descriptors.
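Redis exposes the raw counters for hit rate in `INFO stats` as `keyspace_hits` and `keyspace_misses`. A sketch of computing it, with a hypothetical stats dict in place of a live `client.info("stats")` call:

```python
def cache_hit_rate(stats):
    """Hit rate from Redis INFO stats counters; None before any reads."""
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else None

stats = {"keyspace_hits": 9200, "keyspace_misses": 800}  # example numbers
print(f"hit rate: {cache_hit_rate(stats):.1%}")  # hit rate: 92.0%
```

Note that these counters are cumulative since the last restart (or `CONFIG RESETSTAT`), so for alerting you would compute the rate over a sliding window of deltas rather than over the lifetime totals.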
What comes next
The next article covers background jobs and task queues: why background jobs exist, job queue architecture, retry strategies, dead letter queues, and observability. Many of the patterns in this article (cache warming, event-driven invalidation) rely on background jobs, making them the natural next topic.