High Level Design · Part 9

Notification systems

In this series (12 parts)
  1. Monolith vs microservices
  2. Microservice communication patterns
  3. Service discovery and registration
  4. Event-driven architecture
  5. Distributed data patterns
  6. Caching architecture patterns
  7. Search architecture
  8. Storage systems at scale
  9. Notification systems
  10. Real-time systems architecture
  11. Batch and stream processing
  12. Multi-region and global systems

A user gets a new follower. Your system needs to display a badge in the app, send a push notification to their phone, and maybe fire off an email. That sounds simple until you consider that the user has three devices, an email preference that says “daily digest only,” and a timezone that makes it 3am where they live. Multiply by millions of users and dozens of notification types, and you have one of the most deceptively complex services in any platform.

Notification systems sit at the intersection of real-time delivery, user preferences, rate limiting, and multi-channel routing. Getting them wrong means either spamming users into disabling notifications entirely, or losing critical alerts in a queue that nobody monitors.

Push vs pull

There are two fundamental models for delivering information to a client. In a pull model, the client periodically asks the server “do I have any new notifications?” This is polling. It is simple to implement but wasteful: most polls return empty responses, yet each one costs a network round trip and server CPU.

In a push model, the server initiates the connection and sends data when it is available. The client does not need to ask. Push is more efficient for notifications because events are sporadic and unpredictable. You do not want a user’s phone polling your API every 10 seconds just to check for new likes.
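The waste of polling is easy to quantify. The sketch below simulates a client that polls on a fixed interval across a time window in which only a handful of events arrive; the 10-second interval and event counts are illustrative numbers, not tied to any real API.

```python
def wasted_polls(poll_interval_s: int, window_s: int, events: int) -> tuple[int, int]:
    """Return (total polls, empty polls) for a fixed-interval polling client.

    At best, each event is picked up by one poll; every other poll is a
    round trip that returns nothing.
    """
    total = window_s // poll_interval_s
    empty = max(total - events, 0)
    return total, empty

# One hour of 10-second polling during which only 2 notifications arrive:
total, empty = wasted_polls(poll_interval_s=10, window_s=3600, events=2)
# 360 round trips, 358 of them empty. A push model would have cost exactly
# 2 deliveries and zero empty requests.
```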

Mobile operating systems enforce the push model through platform notification services. An app cannot maintain a persistent connection in the background (the OS kills it to save battery), so Apple and Google provide centralized push infrastructure that your server talks to instead.

APNs and FCM at a conceptual level

Apple Push Notification service (APNs) and Firebase Cloud Messaging (FCM) are the gateways for reaching iOS and Android devices. Your server never communicates directly with a user’s phone for push notifications. Instead, it sends a payload to Apple or Google, which handle the last-mile delivery.

The flow works like this. When a user installs your app and grants notification permission, the device registers with APNs or FCM and receives a device token. Your app sends this token to your backend, which stores it alongside the user ID. When you need to notify that user, your backend sends the notification payload plus the device token to APNs or FCM. The platform service delivers it to the device, which displays the notification.

Device tokens are not permanent. They can change when the user reinstalls the app, restores from backup, or gets a new phone. Your system needs to handle token refresh events and prune stale tokens. APNs returns feedback when a token is no longer valid; FCM provides similar signals. Ignoring these leads to wasted API calls and potential rate limiting from the platform.
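The token lifecycle above can be sketched with a hypothetical in-memory registry; a production system would back this with a database table keyed by user ID, with an index on the token itself, but the operations are the same.

```python
from collections import defaultdict

class TokenRegistry:
    """Illustrative device-token store: register, refresh, and prune."""

    def __init__(self):
        self._tokens: dict[str, set[str]] = defaultdict(set)  # user_id -> tokens

    def register(self, user_id: str, token: str) -> None:
        # Called when the app reports a token after the device registers
        # with APNs or FCM.
        self._tokens[user_id].add(token)

    def refresh(self, user_id: str, old: str, new: str) -> None:
        # Called when the platform rotates the token (reinstall, backup
        # restore, new phone).
        self._tokens[user_id].discard(old)
        self._tokens[user_id].add(new)

    def prune(self, user_id: str, invalid: str) -> None:
        # Called on APNs/FCM feedback that a token is no longer valid.
        # Skipping this step means wasted sends and potential throttling.
        self._tokens[user_id].discard(invalid)

    def tokens_for(self, user_id: str) -> set[str]:
        return set(self._tokens[user_id])
```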

graph LR
  AS["App Server"] -->|Payload + Token| APNS["APNs / FCM"]
  APNS -->|Push| D1["iPhone"]
  APNS -->|Push| D2["Android"]
  D1 -->|Token refresh| AS
  D2 -->|Token refresh| AS

Push notification delivery through platform services to user devices.

Both platforms impose payload size limits (4KB for APNs, 4000 bytes for FCM) and rate limits. If you blast a million notifications in a burst, the platform may throttle you. Sending notifications in controlled batches and handling retry logic for transient failures is essential.
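A hedged sketch of controlled batching with retries: `send_fn` stands in for an APNs/FCM client call, and the batch size, retry count, and backoff schedule are assumptions to tune against the platform's actual limits.

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable platform failure (e.g. a 5xx or throttle)."""

def send_in_batches(tokens, payload, send_fn, batch_size=500,
                    max_retries=3, base_delay_s=1.0):
    """Send payload to all tokens in batches, retrying transient failures
    with exponential backoff. Returns tokens whose batch never succeeded."""
    failed = []
    for i in range(0, len(tokens), batch_size):
        batch = tokens[i:i + batch_size]
        for attempt in range(max_retries):
            try:
                send_fn(batch, payload)
                break  # batch accepted; move on
            except TransientError:
                time.sleep(base_delay_s * 2 ** attempt)  # back off, then retry
        else:
            failed.extend(batch)  # retries exhausted; surface for later handling
    return failed
```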

Email delivery pipelines

Email looks simple on the surface: compose a message, call an SMTP server, done. In practice, reliably delivering email at scale is a battle against spam filters, bounce handling, reputation management, and compliance.

A production email pipeline starts with a message queue. The notification service drops an email job onto the queue with the recipient, template ID, and personalization data. An email worker picks up the job, renders the template (merging in the user’s name, the specific event details, and unsubscribe links), and hands the rendered email to an outbound mail transfer agent (MTA) or a third-party service like SendGrid or Amazon SES.

The MTA attempts delivery to the recipient’s mail server. If the recipient’s server is temporarily unavailable (a soft bounce), the MTA retries with exponential backoff. If the address does not exist (a hard bounce), the MTA records the failure and your system should suppress future sends to that address. Continuing to send to hard-bounced addresses destroys your sender reputation, which causes legitimate emails to land in spam folders.
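Hard-bounce suppression can be sketched as follows; the bounce-type strings are illustrative, not any particular provider's webhook format.

```python
class SuppressionList:
    """Tracks addresses that must never be emailed again."""

    def __init__(self):
        self._suppressed: set[str] = set()

    def handle_bounce(self, address: str, bounce_type: str) -> None:
        # Soft bounces are left to the MTA's retry logic; only a hard
        # bounce (address does not exist) permanently suppresses sends.
        if bounce_type == "hard":
            self._suppressed.add(address)

    def can_send(self, address: str) -> bool:
        # Check before every enqueue: sending to hard-bounced addresses
        # destroys sender reputation.
        return address not in self._suppressed
```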

Sender reputation depends on several factors: bounce rate, spam complaint rate, authentication (SPF, DKIM, DMARC records), and sending volume consistency. A sudden spike from 1,000 emails per day to 500,000 triggers spam filters. Warming up a new IP or domain means gradually increasing volume over weeks.

SMS gateways

SMS notifications go through aggregators like Twilio, Vonage, or AWS SNS. Your backend sends a message to the aggregator’s API, which routes it through carrier networks to the recipient’s phone. The delivery path crosses multiple carriers and sometimes international boundaries, which means latency is unpredictable and delivery is not guaranteed.

SMS is expensive compared to push and email. Costs vary by country, ranging from fractions of a cent in the US to several cents per message internationally. Use SMS sparingly: authentication codes, critical alerts, and account security events. Never use SMS for marketing-style notifications unless the user explicitly opted in, and even then, regulations like TCPA (US) and GDPR (EU) impose strict consent requirements.

Delivery receipts from SMS are unreliable. A “delivered” status from the aggregator means the carrier accepted the message, not that the user’s phone received it. There is no equivalent of a read receipt. Design your system to treat SMS as best-effort delivery and provide fallback channels.

Notification fan-out

Fan-out is the core scaling challenge. A popular user posts a photo and 10 million followers need to be notified. Generating 10 million individual notification records and delivering them across push, email, and in-app channels turns a single event into tens of millions of writes and sends.

There are two strategies. Fan-out on write means generating notification records for every recipient at the time of the event. When user A posts, you iterate through all 10 million followers and create a notification for each one. This approach makes reads fast (each user just queries their own notification inbox) but writes are extremely expensive for high-follower accounts.

Fan-out on read means storing the event once and computing who should see it at query time. When user B opens their notification feed, the system checks which accounts they follow and assembles notifications from those accounts’ recent activity. This approach makes writes cheap but reads are slow because you are computing the feed on every request.

Most production systems use a hybrid approach. For users with fewer than a threshold of followers (say 10,000), fan-out on write works fine. For celebrities and high-follower accounts, fan-out on read avoids the write amplification problem. The celebrity’s posts are stored once and merged into feeds at read time for their followers.
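The hybrid strategy can be sketched with plain dictionaries standing in for the inbox and celebrity-post stores; the 10,000 threshold is the illustrative figure from above, and the function names are invented for this sketch.

```python
FANOUT_WRITE_THRESHOLD = 10_000  # tune per workload

def fan_out(post: str, author: str, followers: list[str],
            inboxes: dict, celebrity_feed: dict) -> None:
    """Hybrid fan-out: write to each follower's inbox for normal accounts,
    store once per author for high-follower accounts."""
    if len(followers) < FANOUT_WRITE_THRESHOLD:
        for follower in followers:
            inboxes.setdefault(follower, []).append(post)   # fan-out on write
    else:
        celebrity_feed.setdefault(author, []).append(post)  # fan-out on read

def read_feed(user: str, follows: list[str],
              inboxes: dict, celebrity_feed: dict) -> list[str]:
    """Cheap read for pre-materialized posts, plus a merge of any
    celebrity accounts the user follows, computed at read time."""
    feed = list(inboxes.get(user, []))
    for account in follows:
        feed.extend(celebrity_feed.get(account, []))
    return feed
```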

graph TD
  E["Event: User A posts"] --> NS["Notification Service"]
  NS -->|Fan-out| Q["Priority Queue"]
  Q --> W1["Worker 1"]
  Q --> W2["Worker 2"]
  Q --> W3["Worker N"]
  W1 --> DB["Notification Store"]
  W2 --> DB
  W3 --> DB
  W1 --> Push["Push Service"]
  W2 --> Email["Email Pipeline"]
  W3 --> SMS["SMS Gateway"]
  DB --> Feed["User Notification Feed"]

Fan-out architecture distributing notifications through workers to storage and multiple delivery channels.

The notification store is typically a wide-column database like Cassandra or a sorted set in Redis, keyed by user ID and sorted by timestamp. This gives each user a chronological inbox with efficient range queries for pagination.
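An in-memory stand-in for that pattern, mimicking the behavior of a Redis sorted set (ZADD to insert, a reverse range query to page) without claiming any real client's API:

```python
import bisect

class Inbox:
    """One user's notification inbox: items sorted by timestamp, paged
    newest-first with a cursor for pagination."""

    def __init__(self):
        self._items: list[tuple[float, str]] = []  # ascending by timestamp

    def add(self, timestamp: float, notification_id: str) -> None:
        # Equivalent of ZADD with the timestamp as the score.
        bisect.insort(self._items, (timestamp, notification_id))

    def page(self, limit: int, before: float = float("inf")) -> list[str]:
        # Newest-first page of IDs strictly older than `before`; pass the
        # last timestamp of the previous page as the next cursor.
        newest_first = [nid for ts, nid in reversed(self._items) if ts < before]
        return newest_first[:limit]
```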

Delivery guarantees and deduplication

Notifications must be delivered at least once, but should never be delivered twice. A user receiving the same “new follower” push notification three times is annoying. A user receiving a duplicate “your account was charged $500” email causes panic and support tickets.

At-least-once delivery is straightforward with message queues and reliability patterns. A worker picks up a notification job, attempts delivery, and acknowledges the job only after success. If the worker crashes before acknowledging, the queue redelivers the job to another worker.

The problem is that “success” is ambiguous. You sent a push notification to FCM and got a 200 response. Did the user’s phone receive it? Maybe, maybe not. You sent an email and the MTA accepted it. Did it land in the inbox? Probably, but you will not know for minutes or hours. Exactly-once delivery across external systems is impossible, so the practical goal is idempotent processing on your side combined with deduplication.

Deduplication works by assigning each notification a unique ID derived from the event (for example, a hash of the event type, subject user ID, and target user ID). Before processing, the worker checks a deduplication store (Redis with TTL or a database table) for this ID. If present, the notification was already processed and the worker skips it. If absent, the worker processes it and records the ID.

User preferences and rate limiting

Users should control what they receive and through which channels. A preference service stores per-user, per-notification-type settings: “send me push notifications for new messages, email me a weekly digest for follows, never SMS me about anything.”
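A hypothetical preference lookup, with the schema, type names, and channel names invented for illustration:

```python
# Per-user, per-notification-type channel settings. An empty set means
# the user opted out of that type entirely.
PREFS: dict[tuple[str, str], set[str]] = {
    ("alice", "new_message"):  {"push"},
    ("alice", "new_follower"): {"email_digest"},
    ("alice", "marketing"):    set(),
}
DEFAULT_CHANNELS = {"push", "in_app"}  # fallback when no preference is stored

def channels_for(user_id: str, notif_type: str) -> set[str]:
    """Return the channels this notification type may use for this user."""
    return PREFS.get((user_id, notif_type), DEFAULT_CHANNELS)
```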

Rate limiting prevents notification fatigue. Even if a user has 50 new followers in an hour, sending 50 individual push notifications is hostile. Batching (“You have 50 new followers”) or throttling (max 5 push notifications per hour per user) preserves the user’s attention and your sender reputation.

Quiet hours add another dimension. A notification that arrives at 3am is either ignored or irritating. Store the user’s timezone and suppress non-critical notifications during sleeping hours, queuing them for morning delivery.

Priority levels help too. A security alert (“Someone logged into your account from a new device”) bypasses rate limits and quiet hours. A marketing notification (“Check out our new feature”) respects all throttling rules. Define three or four priority tiers and enforce rules accordingly.
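The three rules above, rate limiting, quiet hours, and priority tiers, can be combined into one decision function. The priority names, the 5-per-hour cap, and the quiet-hours window in this sketch are illustrative assumptions.

```python
# Priority tiers, from most to least urgent.
CRITICAL, TRANSACTIONAL, SOCIAL, MARKETING = 0, 1, 2, 3

MAX_PER_HOUR = 5                 # per-user push throttle
QUIET_START, QUIET_END = 22, 8   # user-local hours: 10pm to 8am

def should_push_now(priority: int, sent_last_hour: int, local_hour: int) -> bool:
    """Decide whether to deliver a push immediately or hold it."""
    if priority == CRITICAL:
        return True              # security alerts bypass all throttling
    in_quiet_hours = local_hour >= QUIET_START or local_hour < QUIET_END
    if in_quiet_hours:
        return False             # queue for morning delivery instead
    return sent_last_hour < MAX_PER_HOUR  # otherwise respect the hourly cap
```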

Putting it together

A complete notification system connects event ingestion, preference evaluation, fan-out, channel routing, template rendering, delivery, and tracking. Each component is independently scalable and each channel has its own retry and failure logic.

graph TD
  Event["Incoming Event"] --> PS["Preference Service"]
  PS -->|Filter + Route| FO["Fan-out Workers"]
  FO --> PQ["Push Queue"]
  FO --> EQ["Email Queue"]
  FO --> SQ["SMS Queue"]
  FO --> IQ["In-App Queue"]
  PQ --> PW["Push Workers"] --> APNS["APNs / FCM"]
  EQ --> EW["Email Workers"] --> MTA["MTA / SES"]
  SQ --> SW["SMS Workers"] --> AGG["Twilio / SNS"]
  IQ --> IW["In-App Workers"] --> WS["WebSocket Gateway"]
  PW --> DL["Delivery Log"]
  EW --> DL
  SW --> DL
  IW --> DL

End-to-end notification system from event ingestion through preference filtering, fan-out, and multi-channel delivery.

The delivery log records every notification sent, its delivery status, and whether the user interacted with it. This data feeds analytics (what is the open rate for push vs email?) and debugging (why did this user not receive their password reset email?).

What comes next

Notifications are one-way messages. Many systems need bidirectional, low-latency communication: chat, live dashboards, collaborative editing. The next article covers real-time systems architecture, from WebSocket management to presence services and pub/sub for live updates.
