Caching in backend systems
In this series (15 parts)
- Backend system design scope
- Designing RESTful APIs
- Authentication and session management
- Database design for backend systems
- Caching in backend systems
- Background jobs and task queues
- File upload and storage
- Search integration
- Email and notification delivery
- Webhooks: design and security
- Payments integration
- Multi-tenancy patterns
- Backend for Frontend (BFF) pattern
- GraphQL server design
- gRPC and internal service APIs
Caching is storing a computed result so you can serve it again without recomputing it. In backend systems, this usually means putting frequently read data in Redis or Memcached instead of hitting the database on every request. Done well, caching cuts response times from hundreds of milliseconds to single digits. Done poorly, it introduces stale data bugs that are hard to reproduce and harder to fix.
What to cache and what not to
Cache data that is:
- Read frequently, written rarely: user profiles, product catalogs, feature flags.
- Expensive to compute: aggregation results, leaderboard rankings, permission checks that join multiple tables.
- Tolerant of staleness: a product description that is 30 seconds stale is fine. An account balance that is 30 seconds stale is not.
Do not cache:
- Write-heavy data: if the cache is invalidated on every write, caching adds overhead without benefit.
- Data that must be strongly consistent: financial balances, inventory counts during checkout, anything where serving stale data has real consequences.
- Large, rarely accessed data: caching a 50 MB report that one person reads once a day wastes memory.
Redis data structures and their uses
Redis is not just a key-value store. Its data structures solve specific backend problems.
| Structure | Use Case | Example |
|---|---|---|
| String | Simple cache entries, counters | Session data, rate limit counters |
| Hash | Objects with multiple fields | User profile cache |
| List | Queues, recent activity feeds | Latest 100 notifications |
| Set | Unique membership, tagging | Online users, feature flag audiences |
| Sorted Set | Ranked data, time-series | Leaderboards, scheduled jobs |
| Stream | Event log, consumer groups | Activity feeds, audit logs |
Sorted sets for leaderboards
A sorted set stores members with scores. Redis keeps them sorted by score, so rank queries are O(log N):
ZADD leaderboard 1500 "user:42"
ZADD leaderboard 2300 "user:17"
ZADD leaderboard 1800 "user:89"
ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10
ZREVRANK leaderboard "user:42" # User's rank
Answering the same rank query in SQL typically means sorting across the whole table (or maintaining a dedicated index) on every request. Redis answers it in microseconds.
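To make the ranking semantics concrete, here is a small in-memory model of the three sorted-set operations above. This is an illustration only: a real backend would issue ZADD/ZREVRANGE/ZREVRANK through a Redis client, and Redis keeps the set sorted incrementally rather than re-sorting on every read as this sketch does.

```python
class Leaderboard:
    """In-memory stand-in for a Redis sorted set (illustration only)."""

    def __init__(self):
        self.scores = {}  # member -> score, like ZADD

    def zadd(self, member, score):
        self.scores[member] = score

    def _ranked(self):
        # Highest score first, matching ZREVRANGE ordering.
        return sorted(self.scores.items(), key=lambda kv: -kv[1])

    def zrevrange(self, start, stop):
        # Redis ranges are inclusive of both ends.
        return self._ranked()[start:stop + 1]

    def zrevrank(self, member):
        for rank, (m, _) in enumerate(self._ranked()):
            if m == member:
                return rank
        return None

board = Leaderboard()
board.zadd("user:42", 1500)
board.zadd("user:17", 2300)
board.zadd("user:89", 1800)
print(board.zrevrange(0, 9))      # [('user:17', 2300), ('user:89', 1800), ('user:42', 1500)]
print(board.zrevrank("user:42"))  # 2 (zero-based rank)
```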
Hashes for object caching
Instead of serializing entire objects to JSON strings, use hashes to cache individual fields:
HSET user:42 name "Alice" email "alice@example.com" role "admin"
HGET user:42 name # Get one field
HMGET user:42 name role # Get multiple fields
This lets you read and update individual fields without deserializing and reserializing the entire object.
Cache-aside implementation
Cache-aside (also called lazy loading) is the most common caching architecture pattern. The application checks the cache first. On a miss, it reads from the database, writes to the cache, and returns the result.
sequenceDiagram
    participant C as Client
    participant App as Application
    participant Cache as Redis
    participant DB as Database
    C->>App: GET /users/42
    App->>Cache: GET user:42
    Cache-->>App: Cache miss
    App->>DB: SELECT * FROM users WHERE id = 42
    DB-->>App: User data
    App->>Cache: SET user:42 (TTL 300s)
    App-->>C: 200 OK + user data
    Note over C, DB: Next request hits cache
    C->>App: GET /users/42
    App->>Cache: GET user:42
    Cache-->>App: Cache hit
    App-->>C: 200 OK + user data (from cache)
Cache-aside pattern. The application manages both the cache and the database.
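The read path can be sketched in application code. This is a minimal illustration: a plain dict with expiry timestamps stands in for Redis, and `db_fetch_user` is a hypothetical stand-in for the database query; a real implementation would call a Redis client and let Redis enforce the TTL.

```python
import time

cache = {}       # key -> (value, expires_at); stand-in for Redis
CACHE_TTL = 300  # seconds, matching the SET ... TTL 300s above

def db_fetch_user(user_id):
    # Stand-in for SELECT * FROM users WHERE id = <user_id>.
    return {"id": user_id, "name": "Alice"}

def get_user(user_id):
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                            # cache hit
    user = db_fetch_user(user_id)                  # cache miss: read the database
    cache[key] = (user, time.time() + CACHE_TTL)   # populate for the next reader
    return user
```

Note that the cache is populated lazily, on the first read after a miss; nothing is cached until someone asks for it.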
Invalidation strategies
The hard part of caching is invalidation. When the underlying data changes, the cache must be updated or removed.
TTL-based expiry: set a time-to-live on every cache entry. After the TTL expires, the next read triggers a cache miss and a fresh database lookup. Simple, but data can be stale for up to the TTL duration.
Write-through invalidation: when the application writes to the database, it also deletes or updates the corresponding cache entry. Consistent, but adds write latency and complexity.
Event-driven invalidation: database changes emit events (via CDC or application events), and a consumer invalidates the cache. Decoupled, but adds infrastructure.
Most systems use TTL as a safety net combined with write-through invalidation for critical paths.
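A sketch of the write-through variant, again with a dict standing in for Redis and for the database. Deleting the cache entry (rather than updating it in place) is the safer default, because an in-place update computed from a stale read can write stale data back into the cache.

```python
cache = {}                                          # stand-in for Redis
db = {42: {"id": 42, "email": "old@example.com"}}   # stand-in for the database

def update_user(user_id, fields):
    db[user_id].update(fields)            # 1. write to the database
    cache.pop(f"user:{user_id}", None)    # 2. invalidate (DEL user:<id>)

def read_user(user_id):
    key = f"user:{user_id}"
    if key not in cache:
        cache[key] = dict(db[user_id])    # cache-aside fill on miss
    return cache[key]
```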
Distributed lock patterns
When multiple application instances share a cache, you need distributed locks to prevent problems like cache stampedes (many instances simultaneously fetching the same data after a cache miss).
The cache stampede problem
When a popular cache entry expires, hundreds of concurrent requests all see a cache miss and all hit the database simultaneously. This can overwhelm the database.
Lock-based solution
Only one instance fetches the data; others wait for the cache to be populated.
sequenceDiagram
    participant A as Instance A
    participant B as Instance B
    participant R as Redis
    participant DB as Database
    A->>R: GET user:42
    R-->>A: Cache miss
    A->>R: SET lock:user:42 NX EX 10
    R-->>A: OK (lock acquired)
    B->>R: GET user:42
    R-->>B: Cache miss
    B->>R: SET lock:user:42 NX EX 10
    R-->>B: nil (lock not acquired)
    B->>B: Wait and retry
    A->>DB: SELECT * FROM users WHERE id = 42
    DB-->>A: User data
    A->>R: SET user:42 (TTL 300s)
    A->>R: DEL lock:user:42
    B->>R: GET user:42
    R-->>B: Cache hit
Distributed lock prevents cache stampede. Only Instance A queries the database; Instance B waits for the cache to be populated.
The SET lock:user:42 NX EX 10 command atomically sets the lock only if it does not exist (NX) with a 10-second expiration (EX 10). The expiration prevents deadlocks if the lock holder crashes.
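A toy model of the NX-plus-expiry semantics, using an in-memory dict in place of Redis (with redis-py the real call would be `client.set("lock:user:42", token, nx=True, ex=10)`):

```python
import time

store = {}  # key -> (value, expires_at); stand-in for Redis

def set_nx_ex(key, value, ttl):
    """Set key only if absent (NX) with an expiry (EX). True on success."""
    entry = store.get(key)
    if entry is not None and entry[1] > time.time():
        return False                         # lock already held, not expired
    store[key] = (value, time.time() + ttl)
    return True

def delete(key):
    store.pop(key, None)

# Only one caller wins; the expiry bounds how long a crashed holder blocks others.
assert set_nx_ex("lock:user:42", "instance-a", 10) is True
assert set_nx_ex("lock:user:42", "instance-b", 10) is False
delete("lock:user:42")  # holder releases after repopulating the cache
```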
Redlock for stronger guarantees
For single-node Redis, the simple SET NX EX lock is sufficient. For Redis clusters, the Redlock algorithm acquires locks on a majority of Redis nodes to handle node failures. Use Redlock when correctness depends on the lock (e.g., preventing double-charging). For cache stampede prevention, the simple lock is fine because the worst case is a few extra database queries.
Cache warming strategies
A cold cache causes a burst of database queries when you deploy new code, scale up instances, or restart Redis. Cache warming pre-populates the cache before traffic hits it.
Strategies
On-deploy warming: a deployment step queries the most-accessed keys and populates the cache before the new instances start receiving traffic.
Background refresh: a periodic job refreshes cache entries before they expire. Instead of a TTL of 300 seconds, set the TTL to 600 seconds and refresh every 250 seconds. The cache is never cold.
Probabilistic early expiration: each cache read has a small probability of triggering a refresh before the TTL expires. The probability increases as the TTL approaches. This spreads refresh load over time instead of creating a thundering herd at expiry.
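One common formulation of probabilistic early expiration is the "XFetch" check from Vattani et al.'s paper on cache stampede prevention: refresh when `now - delta * beta * log(rand())` passes the expiry time, where `delta` is roughly how long a recompute takes. A minimal sketch (the parameter names are ours, not from any particular library):

```python
import math
import random

def should_refresh_early(now, expires_at, recompute_cost, beta=1.0):
    """XFetch-style check. The refresh probability rises as the entry nears
    expiry; recompute_cost is roughly the refresh duration in seconds, and
    beta > 1 refreshes more eagerly."""
    # log(random()) is negative, so this adds a positive random head start.
    return now - recompute_cost * beta * math.log(random.random()) >= expires_at
```

Because the jitter term is non-negative, an entry at or past its expiry is always refreshed, and with `recompute_cost = 0` the check degenerates to a plain TTL comparison.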
Cache warming dramatically reduces the database load spike after deployments. Without warming, the first few minutes can see 50x the normal database load.
Cache sizing and eviction
Redis needs enough memory to hold your working set. If it runs out, it evicts entries based on the configured policy:
- allkeys-lru: evict the least recently used key. Good default for caches.
- volatile-lru: evict the least recently used key among keys that have a TTL set. Keeps permanent keys safe.
- allkeys-lfu: evict the least frequently used key. Better for workloads with stable hot keys.
Monitor your cache hit rate. A hit rate below 80% suggests the cache is too small or the TTL is too short. A hit rate above 99% suggests you might be caching too aggressively.
Monitoring cache health
Track these metrics:
- Hit rate: percentage of reads served from cache. Target 85% or higher.
- Eviction rate: entries evicted per second. High eviction means the cache is undersized.
- Memory usage: percentage of allocated memory in use.
- Latency: p50 and p99 cache read/write times. Redis should be sub-millisecond at p99.
- Connection count: too many connections can exhaust Redis’s file descriptors.
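Redis exposes the raw counters for hit rate in `INFO stats` as `keyspace_hits` and `keyspace_misses`. A sketch of computing it, with a hypothetical stats dict in place of a live `client.info("stats")` call:

```python
def cache_hit_rate(stats):
    """Hit rate from Redis INFO stats counters; None before any reads."""
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else None

stats = {"keyspace_hits": 9200, "keyspace_misses": 800}  # example numbers
print(f"hit rate: {cache_hit_rate(stats):.1%}")  # hit rate: 92.0%
```

Note that these counters are cumulative since the last restart (or `CONFIG RESETSTAT`), so for alerting you would compute the rate over a sliding window of deltas rather than over the lifetime totals.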
What comes next
The next article covers background jobs and task queues: why background jobs exist, job queue architecture, retry strategies, dead letter queues, and observability. Many of the patterns in this article (cache warming, event-driven invalidation) rely on background jobs, making them the natural next topic.