Security in system design
In this series (20 parts)
- What is system design and why it matters
- Estimations and back-of-envelope calculations
- Scalability: vertical vs horizontal scaling
- CAP theorem and distributed system tradeoffs
- Consistency models
- Load balancing
- Caching: strategies and patterns
- Content Delivery Networks
- Databases: SQL vs NoSQL and when to use each
- Database replication
- Database sharding and partitioning
- Consistent hashing
- Message queues and event streaming
- API design: REST, GraphQL, gRPC
- Rate limiting and throttling
- Proxies: forward and reverse
- Networking concepts for system design
- Reliability patterns: timeouts, retries, circuit breakers
- Observability: logging, metrics, tracing
- Security in system design
The average data breach now costs $4.88 million (IBM Cost of a Data Breach Report, 2024). For a distributed system handling millions of requests per second, a single misconfigured endpoint or leaked credential can cascade into a full compromise within minutes. Security is not a layer you add after the architecture is stable. It is the architecture.
This article covers how authentication and authorization work at scale, how tokens flow through distributed services, how secrets are managed without being scattered across config files, how encryption protects data both in motion and at rest, how infrastructure defends against volumetric attacks, and why zero trust networking has replaced perimeter security as the default model. If you have been following this series, you already understand observability and rate limiting. Security ties those ideas together and adds new constraints.
Prerequisites
You should be familiar with observability patterns, including structured logging and distributed tracing. Understanding rate limiting and API design will help you see where security controls integrate into request flows. For deeper dives on individual topics, the dedicated articles on authentication and authorization and on cryptography fundamentals go further than we will here.
Authentication vs authorization at scale
Authentication answers “who are you?” Authorization answers “what are you allowed to do?” They sound similar. They are not. Conflating them is one of the most common mistakes in system design.
Authentication validates identity. A user presents credentials (a username and password, a client certificate, a biometric signature) and the system verifies them against a trusted store. In a monolith, this happens once. In a distributed system with 50 microservices, you need a strategy that does not require every service to independently verify credentials against a central database on every request. That path leads to a single point of failure, high latency, and a database that falls over under load.
The standard approach is token-based authentication. The user authenticates once against an identity provider. The identity provider issues a signed token, typically a JSON Web Token (JWT). Every subsequent request carries that token. Each downstream service verifies the signature locally using a public key. No database call. No network hop. Verification takes microseconds.
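To make the local-verification idea concrete, here is a minimal sketch of JWT issuance and verification using only the standard library. It uses HS256 (a shared symmetric key) for brevity; production identity providers typically sign with RS256 or ES256 so that downstream services verify with a public key and never hold the signing secret. The function names are illustrative.

```python
import base64, hashlib, hmac, json, time

def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def issue_token(claims: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(claims).encode())
    sig = _b64url(hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_token(token: str, key: bytes) -> dict:
    # Verification is pure computation: no database call, no network hop.
    header, body, sig = token.split(".")
    expected = _b64url(hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(body))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

Note that the expensive part (authenticating the user) happened once at the identity provider; every service after that only pays for a hash computation.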
Authorization is harder. Knowing that a request comes from user 42 does not tell you whether user 42 can delete records in the billing service. Authorization models range from simple role-based access control (RBAC), where users get roles like admin or viewer, to attribute-based access control (ABAC), where policies evaluate combinations of user attributes, resource attributes, and environmental conditions. Google’s Zanzibar system, which powers authorization for Google Drive, YouTube, and Cloud, processes over 10 million authorization checks per second using a relationship-based model. At that scale, authorization is its own distributed system with its own caching, replication, and consistency concerns.
The key design principle: authenticate at the edge, authorize close to the resource. Your API gateway or load balancer handles authentication. Individual services handle authorization, because only the billing service knows its own permission model.
OAuth 2.0 and token storage
OAuth 2.0 is not an authentication protocol. It is an authorization framework. This distinction trips up engineers constantly. OAuth defines how a third-party application can obtain limited access to a service on behalf of a user, without the user sharing their password with the third party. OpenID Connect (OIDC) builds authentication on top of OAuth 2.0 by adding an ID token that contains identity claims.
The OAuth 2.0 authorization code flow is the most secure variant for server-side applications. Here is how the pieces connect:
```mermaid
sequenceDiagram
    participant U as User Browser
    participant A as App Server
    participant AS as Auth Server
    participant RS as Resource Server
    U->>A: Click "Login"
    A->>U: Redirect to Auth Server
    U->>AS: Authenticate + consent
    AS->>U: Redirect with auth code
    U->>A: Forward auth code
    A->>AS: Exchange code for tokens
    AS->>A: Access token + refresh token
    A->>RS: API call with access token
    RS->>A: Protected resource
```
OAuth 2.0 authorization code flow. The app server never exposes tokens to the browser. The auth code is a one-time-use intermediary that limits the attack surface.
Token storage decisions have direct security implications. Access tokens should be short-lived, typically 15 minutes or less. If an access token leaks, the blast radius is bounded by its expiration. Refresh tokens are long-lived and must be stored securely on the server side, never in browser local storage, never in cookies without the HttpOnly and Secure flags. For single-page applications, the Backend for Frontend (BFF) pattern keeps tokens on the server entirely, with the browser holding only an opaque session cookie.
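The cookie flags mentioned above can be sketched with the standard library's http.cookies module. This is what the BFF would emit after exchanging the auth code: the browser holds only an opaque session ID, and the flags keep it out of reach of scripts and plaintext transport.

```python
from http.cookies import SimpleCookie

def session_cookie(session_id: str) -> str:
    # BFF pattern: the browser holds only this opaque ID; tokens stay server-side.
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # JavaScript cannot read it (limits XSS impact)
    cookie["session"]["secure"] = True     # sent only over HTTPS
    cookie["session"]["samesite"] = "Lax"  # not sent on most cross-site requests (CSRF mitigation)
    return cookie["session"].OutputString()
```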
JWTs carry a subtle risk: they cannot be revoked before expiration without additional infrastructure. If you need immediate revocation (user account compromise, permission change), you need either a token denylist checked on each request or short token lifetimes paired with aggressive refresh rotation. Both have tradeoffs. The denylist reintroduces the centralized check you were trying to avoid. Short lifetimes increase traffic to the token endpoint. Pick the tradeoff that matches your threat model.
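The denylist option can be sketched as below. Entries are keyed by the token's jti claim and only need to live until the token would have expired anyway, which bounds the list's size. The class and method names are mine; a production version would back this with Redis and set per-key TTLs instead of sweeping in process memory.

```python
import time

class TokenDenylist:
    """In-memory revocation list keyed by JWT ID (jti); a sketch, not production code."""

    def __init__(self):
        self._revoked = {}  # jti -> the token's original expiry timestamp

    def revoke(self, jti: str, exp: float) -> None:
        # An entry is only useful until the token expires on its own.
        self._revoked[jti] = exp

    def is_revoked(self, jti: str, now: float = None) -> bool:
        now = now if now is not None else time.time()
        # Garbage-collect entries for tokens that have already expired.
        self._revoked = {j: e for j, e in self._revoked.items() if e > now}
        return jti in self._revoked
```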
Secrets management
Hardcoded credentials in source code are the security equivalent of leaving your house keys under the doormat. Yet a 2023 GitGuardian report found over 10 million new secrets exposed in public GitHub repositories in a single year. API keys, database passwords, TLS private keys, and encryption keys all require a dedicated management strategy.
A secrets manager like HashiCorp Vault, AWS Secrets Manager, or Google Secret Manager provides centralized storage with access control, audit logging, and automatic rotation. The architecture is straightforward: applications request secrets at runtime through an authenticated API call. They never store secrets on disk or in environment variables that persist across deployments.
For Kubernetes environments, secrets are injected into pods as mounted volumes or environment variables sourced from the secrets manager. The critical point is that the secret never appears in a container image, a Dockerfile, a CI/CD pipeline log, or a version-controlled file. Rotation happens without redeployment. When a database password rotates, the secrets manager updates the stored value and applications fetch the new credential on their next retrieval cycle.
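The fetch-and-refresh cycle can be sketched as a small cache around whatever client call your secrets manager exposes. Here the fetch callable is a stand-in for, say, a Vault or AWS Secrets Manager API call; the class name and TTL default are illustrative.

```python
import time

class SecretCache:
    """Fetches a secret at runtime and re-fetches after a TTL, picking up rotations."""

    def __init__(self, fetch, ttl_seconds: float = 300):
        self._fetch = fetch        # stand-in for a secrets-manager API call
        self._ttl = ttl_seconds
        self._value = None
        self._fetched_at = 0.0

    def get(self) -> str:
        # The secret lives only in process memory, never on disk or in env vars.
        if self._value is None or time.time() - self._fetched_at > self._ttl:
            self._value = self._fetch()
            self._fetched_at = time.time()
        return self._value
```

When the password rotates upstream, the stale copy ages out within one TTL and the next get() call retrieves the new credential, with no redeployment.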
Envelope encryption adds another layer. Instead of encrypting all data with a single master key, you generate a data encryption key (DEK) for each resource, encrypt the data with the DEK, then encrypt the DEK itself with a key encryption key (KEK) stored in a hardware security module (HSM). Compromising a single DEK exposes only the data it protects. Compromising the KEK requires physical access to tamper-resistant hardware.
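The DEK/KEK structure can be sketched as follows. To keep the example dependency-free, the cipher here is a deliberately insecure XOR keystream standing in for AES-256-GCM; real systems use a KMS or the cryptography library, and the KEK never leaves the HSM. What matters is the shape: a fresh DEK per resource, and only the wrapped DEK stored alongside the ciphertext.

```python
import os, hashlib

def _toy_cipher(key: bytes, data: bytes) -> bytes:
    # INSECURE stand-in for AES: XOR against a SHA-256-derived keystream.
    # XOR is its own inverse, so the same function encrypts and decrypts.
    stream, counter = bytearray(), 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def envelope_encrypt(kek: bytes, plaintext: bytes):
    dek = os.urandom(32)                   # fresh data encryption key per resource
    ciphertext = _toy_cipher(dek, plaintext)
    wrapped_dek = _toy_cipher(kek, dek)    # in practice, wrapping happens inside the HSM
    return wrapped_dek, ciphertext         # store these together; the raw DEK is discarded

def envelope_decrypt(kek: bytes, wrapped_dek: bytes, ciphertext: bytes) -> bytes:
    dek = _toy_cipher(kek, wrapped_dek)    # unwrap the DEK with the KEK
    return _toy_cipher(dek, ciphertext)
```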
Encryption in transit and at rest
Encryption in transit means TLS. Specifically, TLS 1.3, which reduced the handshake from two round trips to one and eliminated insecure cipher suites that TLS 1.2 still allowed. Every connection between clients and servers, between services, and between services and databases should use TLS. “But it is internal traffic” is not a defense. Attackers who gain a foothold inside your network (and they will) can sniff unencrypted internal traffic trivially.
Mutual TLS (mTLS) goes further by requiring both sides of a connection to present certificates. In a microservices mesh, mTLS ensures that service A can only communicate with service B if both present valid certificates issued by a trusted certificate authority. Service meshes like Istio and Linkerd automate mTLS certificate provisioning, rotation, and enforcement. Without a service mesh, managing certificates across hundreds of services becomes an operational burden that teams inevitably deprioritize until something breaks.
Encryption at rest protects data stored on disk. Cloud providers offer server-side encryption by default on most storage services. S3 encrypts objects with AES-256. RDS supports encryption for database volumes. But default encryption uses provider-managed keys. For regulated industries (healthcare, finance), you often need customer-managed keys (CMK) with strict key policies, separation of duties between key administrators and data users, and audit trails showing every key usage. The cryptography fundamentals article covers the underlying algorithms in detail.
A practical rule: encrypt everything. The performance overhead of AES-256 on modern hardware with AES-NI instruction support is under 1% for most workloads. The cost of not encrypting, measured in breach penalties and lost trust, is orders of magnitude higher.
DDoS mitigation at the infrastructure level
A distributed denial-of-service attack floods your system with traffic to make it unavailable. Volumetric attacks have exceeded 3.4 Tbps (Cloudflare, 2024). No single server absorbs that. Mitigation happens at the infrastructure level, well before traffic reaches your application.
The first line of defense is an anycast network operated by a CDN or DDoS mitigation provider. Anycast routes traffic to the nearest point of presence (PoP), spreading load across dozens of data centers globally. A 3 Tbps attack aimed at a single IP gets distributed across 200+ edge locations, each absorbing a manageable fraction.
Behind the edge, rate limiting filters abusive request patterns. But rate limiting alone cannot stop application-layer (L7) attacks where each individual request looks legitimate. For those, you need behavioral analysis: is this IP sending 10,000 requests per second to your login endpoint? Is this user agent rotating through known bot fingerprints? Web application firewalls (WAFs) apply rule-based and ML-based detection to identify and block these patterns. The OWASP security guide covers the specific attack vectors WAFs defend against.
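The simplest form of the behavioral check above is a per-source sliding window over one sensitive endpoint. This sketch (class and threshold are illustrative) flags any source whose request rate exceeds a limit; real WAFs combine many such signals with fingerprinting and ML scoring.

```python
import time
from collections import defaultdict, deque

class EndpointAbuseDetector:
    """Flags sources whose request rate to one endpoint exceeds a threshold."""

    def __init__(self, limit: int, window_seconds: float):
        self._limit = limit
        self._window = window_seconds
        self._hits = defaultdict(deque)  # source -> timestamps of recent requests

    def record(self, source_ip: str, now: float = None) -> bool:
        """Record one request; returns True if this source should be blocked."""
        now = now if now is not None else time.time()
        hits = self._hits[source_ip]
        hits.append(now)
        while hits and hits[0] <= now - self._window:  # evict hits outside the window
            hits.popleft()
        return len(hits) > self._limit
```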
Connection-level defenses matter too. SYN flood protection uses SYN cookies to validate TCP handshakes without allocating server-side state for half-open connections. Geo-blocking restricts traffic from regions where you have no users. IP reputation lists preemptively block known malicious sources. These are blunt tools, but when you are absorbing a terabit of traffic, precision is less important than survival.
Zero trust networking
The traditional network security model is a castle with a moat. Everything inside the perimeter is trusted. Everything outside is not. This model fails completely when your engineers work from coffee shops, your services run across three cloud providers, and an attacker who compromises a single internal machine can move laterally through your entire network unchallenged.
Zero trust assumes no implicit trust based on network location. Every request is authenticated and authorized, regardless of whether it originates from inside or outside the network boundary. The principles are simple. Verify explicitly. Use least-privilege access. Assume breach.
In practice, zero trust requires several interlocking components. An identity provider authenticates every user and every service. A policy engine evaluates authorization decisions based on identity, device health, location, and risk score. A policy enforcement point (a proxy, sidecar, or gateway) intercepts every request and consults the policy engine before allowing it through.
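The policy engine's decision can be sketched as a pure function over the request context. The fields and the specific rules here are illustrative, not a real product's policy language; the point is that every decision considers identity, device posture, and resource sensitivity together, never network location.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_groups: set            # groups asserted by the identity provider
    device_encrypted: bool      # device posture signals
    device_patched: bool
    resource_sensitivity: str   # "low" or "high" (illustrative)

def policy_decision(req: AccessRequest) -> bool:
    """Illustrative zero trust evaluation: no field mentions network location."""
    if not (req.device_encrypted and req.device_patched):
        return False  # assume breach: unhealthy devices are denied outright
    if req.resource_sensitivity == "high":
        # least privilege: sensitive resources need explicit group membership
        return "billing-admins" in req.user_groups
    return True
```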
Google’s BeyondCorp is the canonical implementation. Published in 2014, it moved Google’s entire workforce from VPN-based access to a model where every application is accessible from any network, but every access decision considers the user’s identity, the device’s security posture (patch level, disk encryption, screen lock), and the sensitivity of the resource being accessed. The result: no VPN, no privileged internal network, and granular access control that a perimeter model could never achieve.
Microsegmentation is the network-level counterpart. Instead of a flat internal network where any machine can reach any other machine, microsegmentation creates fine-grained network policies. The billing service can talk to the payments database on port 5432. Nothing else can. If an attacker compromises the notification service, they cannot pivot to the payments database because the network policy blocks it. In Kubernetes, network policies and service mesh authorization policies enforce microsegmentation at the pod level.
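The billing-to-database rule above maps directly onto a Kubernetes NetworkPolicy. This is a sketch with illustrative names (namespace, labels, policy name); the semantics are that once a policy selects the database pods, all ingress not explicitly allowed is dropped.

```yaml
# Only pods labeled app=billing may reach the payments database on 5432;
# a compromised notification service gets no route to it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-db-allow-billing
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-db
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: billing
      ports:
        - protocol: TCP
          port: 5432
```

Note that enforcement requires a CNI plugin that implements NetworkPolicy; on clusters without one, the object is accepted but has no effect.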
Zero trust is not a product you buy. It is an architectural philosophy that affects every layer of your system: network topology, service communication, identity management, device management, and monitoring. You build it incrementally. Start with mTLS between services. Add identity-aware proxies. Implement device trust checks. Layer in continuous monitoring with the observability patterns you already have in place.
Putting it together
Security in system design is not one decision. It is hundreds of decisions made consistently across every layer. Authenticate at the edge, authorize at the resource. Use short-lived tokens and manage secrets through dedicated infrastructure. Encrypt everything in transit and at rest. Absorb volumetric attacks at the edge before they reach your application. Trust nothing implicitly, not even traffic from your own network.
The thread connecting all of these practices is defense in depth. No single control prevents all attacks. Each layer reduces the blast radius of a compromise and buys you time to detect and respond. When your observability pipeline catches an anomalous spike in authentication failures, when your rate limiter throttles a sudden burst to the login endpoint, when your WAF blocks a SQL injection attempt, these are layers working together.
Build security into your architecture from the beginning. Retrofitting it is always more expensive, more error-prone, and more disruptive than designing it in from day one.
What comes next
With security principles in place, you have the foundational toolkit for system design: scalability, reliability, observability, and now security. The next step is applying these fundamentals to real architectures. The High-Level Design series begins with the monolith vs microservices decision and walks through designing complete systems where all of these principles intersect.