What is system design and why it matters
In this series (20 parts)
- What is system design and why it matters
- Estimations and back-of-envelope calculations
- Scalability: vertical vs horizontal scaling
- CAP theorem and distributed system tradeoffs
- Consistency models
- Load balancing
- Caching: strategies and patterns
- Content Delivery Networks
- Databases: SQL vs NoSQL and when to use each
- Database replication
- Database sharding and partitioning
- Consistent hashing
- Message queues and event streaming
- API design: REST, GraphQL, gRPC
- Rate limiting and throttling
- Proxies: forward and reverse
- Networking concepts for system design
- Reliability patterns: timeouts, retries, circuit breakers
- Observability: logging, metrics, tracing
- Security in system design
System design is the process of defining a system’s architecture, components, modules, interfaces, and data flow so that the system satisfies a given set of requirements. You do it every time you decide how a feature should work across multiple services, databases, and network boundaries. Whether you are building a URL shortener for a side project or designing the messaging backbone at a company serving 500 million users, the thinking process is the same. The scale changes; the discipline does not.
Why system design matters
Most software bugs that wake engineers up at 3 AM are not logic errors in a single function. They are design problems: a database that cannot handle the write volume, a service that creates a circular dependency, a caching layer that serves stale data for hours. These problems are expensive to fix after launch because they live in the connections between components, not inside them.
Good system design prevents an entire category of failures by forcing you to think about load, failure modes, and data consistency before you write the first line of code. It also gives your team a shared vocabulary. When everyone agrees on where the boundaries are, engineers can work in parallel without stepping on each other.
In interviews, system design rounds exist because companies need to know whether you can reason about tradeoffs at the level where decisions cost real money. A wrong algorithm choice might slow down one endpoint. A wrong architectural choice might require rewriting three services and migrating terabytes of data.
Functional vs non-functional requirements
Every system design starts by splitting requirements into two buckets.
Functional requirements describe what the system does. These are the features a user can see and interact with. For a chat application: users can send messages, create group chats, see online status, search message history. Functional requirements answer the question “what should the system do?”
Non-functional requirements describe how the system behaves under real conditions. They include performance, availability, scalability, consistency, durability, and security. For that same chat application: messages should be delivered within 200 milliseconds, the system should be available 99.99% of the time, it should handle 10 million daily active users, and messages should never be lost once the server acknowledges them.
Here is a concrete comparison:
| Requirement type | Example | Measurable target |
|---|---|---|
| Functional | Users can send text messages | Messages appear in recipient’s chat |
| Functional | Users can search message history | Results returned for keyword queries |
| Non-functional | Low latency delivery | p99 latency under 200 ms |
| Non-functional | High availability | 99.99% uptime (52 minutes downtime/year) |
| Non-functional | Durability | Zero message loss after server ACK |
| Non-functional | Scalability | Support 10M DAU, 50K concurrent connections |
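Availability targets like the one in the table convert mechanically into a downtime budget. A minimal sketch of that conversion (the list of targets is illustrative):

```python
# Convert an availability target into a yearly downtime budget,
# as in the 99.99% uptime row of the table above.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}% uptime -> {downtime_budget_minutes(nines):.1f} min/year")
```

Each extra nine cuts the budget by a factor of ten, which is why the jump from 99.9% to 99.99% usually demands redundancy and automated failover rather than just better hardware.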
The non-functional requirements are what make system design hard. Almost anyone can design a chat app for 10 users. Designing one for 10 million daily active users that stays responsive during traffic spikes, recovers gracefully from server failures, and never loses a message requires fundamentally different architectural choices.
A common mistake is treating non-functional requirements as an afterthought. You bolt on caching after the database falls over. You add a message queue after requests start timing out. Retrofitting these components is always more painful than designing for them upfront, because each one affects your data model, API contracts, and failure handling.
The design process
System design is not about memorizing architectures. It is a repeatable thinking process that you apply to any problem. The process has four phases, and you cycle through them as you learn more about the problem.
```mermaid
graph LR
    A[Clarify requirements] --> B[Estimate scale]
    B --> C[Design architecture]
    C --> D[Justify decisions]
    D -->|Revisit assumptions| A
    style A fill:#4a90d9,color:#fff
    style B fill:#50b848,color:#fff
    style C fill:#f5a623,color:#fff
    style D fill:#d94a4a,color:#fff
```
The four phases of the system design process. Each phase feeds into the next, and you revisit earlier phases as you uncover new constraints.
Phase 1: Clarify requirements
Before drawing a single box on a whiteboard, you need to know what you are building. This sounds obvious, but skipping this step is the most common failure mode in system design interviews and in real engineering work.
Ask questions that expose scope and constraints. If someone says “design a notification system,” you need to know: what kinds of notifications (push, email, SMS, in-app)? How many users? What is the acceptable delay? Can notifications be lost, or must every one be delivered? Is there a priority system? Do users configure their preferences?
The goal is to turn a vague prompt into a concrete problem with clear boundaries. You are not trying to build everything. You are trying to build the right thing.
In a real engineering setting, this phase involves talking to product managers, reading PRDs, and studying existing system behavior. In an interview, it involves asking the interviewer targeted questions. Either way, the output is a short list of functional requirements and a short list of non-functional requirements with specific numbers attached.
Phase 2: Estimate scale
Once you know what you are building, you need to know how big it is. Scale estimation, sometimes called back-of-the-envelope estimation, turns vague requirements into concrete numbers that drive your design choices.
For example, if you are designing a photo sharing service with 50 million daily active users, and each user uploads an average of 0.5 photos per day, you are looking at 25 million photo uploads per day. That is roughly 290 uploads per second on average, with peak traffic likely 3 to 5 times higher, so around 1,000 to 1,500 uploads per second at peak. If each photo averages 2 MB, you need about 50 TB of new storage per day, or roughly 18 PB per year.
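The arithmetic above is simple enough to script. A sketch of the same estimate, with the peak multiplier (4x here) as an explicit assumption:

```python
# Back-of-the-envelope numbers for the photo sharing example above.
# The peak multiplier is an assumed value within the 3-5x range.

DAU = 50_000_000            # daily active users
UPLOADS_PER_USER = 0.5      # average photos uploaded per user per day
PHOTO_SIZE_MB = 2           # average photo size
PEAK_MULTIPLIER = 4         # assumed peak-to-average traffic ratio
SECONDS_PER_DAY = 86_400

uploads_per_day = DAU * UPLOADS_PER_USER                   # 25M/day
avg_uploads_per_sec = uploads_per_day / SECONDS_PER_DAY    # ~290/s
peak_uploads_per_sec = avg_uploads_per_sec * PEAK_MULTIPLIER
storage_per_day_tb = uploads_per_day * PHOTO_SIZE_MB / 1_000_000
storage_per_year_pb = storage_per_day_tb * 365 / 1_000

print(f"avg uploads/s:  {avg_uploads_per_sec:,.0f}")
print(f"peak uploads/s: {peak_uploads_per_sec:,.0f}")
print(f"storage/day:    {storage_per_day_tb:,.0f} TB")
print(f"storage/year:   {storage_per_year_pb:,.1f} PB")
```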
These numbers immediately tell you something important: you cannot store this on a single server. You need distributed storage, a CDN for serving images, and probably an object store like S3 rather than a traditional filesystem.
We cover estimation techniques in depth in the next article on estimations. For now, the key point is that estimation is not about getting exact numbers. It is about getting the right order of magnitude so you choose the right class of solution.
Phase 3: Design the architecture
This is where you draw boxes and arrows. You define the major components of the system, how they communicate, and where data lives. The goal is a design that satisfies your functional requirements while meeting the non-functional constraints you identified.
Start with the most straightforward design that could work. For most web applications, that means:
- Clients (web, mobile) talking to an API gateway or load balancer
- Application servers handling business logic
- A database for persistent storage
- A cache for frequently accessed data
Then stress-test this design against your requirements. Can it handle the write throughput you estimated? What happens when a server goes down? Where are the single points of failure? Which operations need strong consistency, and which can tolerate eventual consistency?
As the design evolves, you will introduce additional components: message queues for asynchronous processing, CDNs for static content, read replicas for scaling read-heavy workloads, sharding for scaling write-heavy workloads. Each component you add should be justified by a specific requirement or bottleneck.
The distinction between high-level design (HLD) and low-level design (LLD) matters here. HLD focuses on the overall architecture: which services exist, how they communicate, and how data flows between them. You can explore this further in the article on monoliths vs microservices. LLD zooms into a single component and designs its internal structure: class hierarchies, API schemas, database table definitions. The LLD introduction covers this in detail. For now, understand that system design encompasses both, and you typically start with HLD before diving into LLD for the most critical components.
Phase 4: Justify decisions
Every architectural choice is a tradeoff. Choosing a NoSQL database over a relational one gives you flexible schemas and horizontal scalability, but you lose ACID transactions and complex query support. Adding a cache improves read latency dramatically, but introduces cache invalidation complexity and the risk of serving stale data.
Strong system designers do not just make choices. They articulate why a choice is the right one given the specific constraints. “We chose Cassandra because our access pattern is write-heavy at 50,000 writes per second, we need high availability across regions, and our data model fits a wide-column store naturally” is a justification. “We chose Cassandra because it scales well” is not.
This phase also involves identifying and documenting the risks of your design. What assumptions are you making? What happens if those assumptions are wrong? If you assumed 1,000 writes per second and actual traffic is 10,000, which parts of your design break first?
What interviewers actually look for
System design interviews are not about producing a perfect architecture. Interviewers evaluate your thinking process, and the signal comes from several specific behaviors.
Structured approach over scattered brainstorming. Starting with requirements, moving to estimation, then to design, and finally to tradeoff analysis shows that you have a repeatable method. Jumping straight to drawing databases and caches suggests you are pattern-matching from memorized solutions rather than reasoning from first principles.
Asking good questions. The best candidates spend the first 5 to 10 minutes of a 45-minute interview asking clarifying questions. They identify ambiguities, confirm assumptions, and narrow the scope to something achievable. Candidates who start designing immediately often build the wrong thing.
Quantitative reasoning. When you say “we need a cache here,” an interviewer wants to know why. How much read traffic are you expecting? What is the cache hit ratio you are assuming? How much memory does the cache need? Specific numbers, even rough ones, show that you understand the relationship between requirements and solutions.
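To make this concrete, here is the kind of quick math an interviewer hopes to hear when you propose a cache. All numbers are made-up illustrative assumptions, not figures from the article:

```python
# How an assumed cache hit ratio translates into backend load,
# plus a rough memory sizing. All inputs are illustrative.

read_qps = 20_000        # assumed total read traffic
hit_ratio = 0.9          # assumed fraction of reads the cache absorbs

db_qps = read_qps * (1 - hit_ratio)   # reads that fall through to the DB
print(f"Database sees {db_qps:,.0f} of {read_qps:,} reads/s")

# Rough cache sizing: hot object count * average object size.
hot_objects = 5_000_000
avg_object_kb = 2
cache_gb = hot_objects * avg_object_kb / 1_000_000
print(f"Cache needs roughly {cache_gb:.0f} GB for the hot set")
```

Even rough inputs like these turn "we need a cache" into a defensible claim: a 90% hit ratio cuts database read load by 10x, at the cost of about 10 GB of memory.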
Tradeoff awareness. There is no single correct answer in system design. Every choice has a cost. Interviewers want to hear you acknowledge those costs and explain why the benefits outweigh them for this particular problem. Saying “we could also use X here, which would give us Y but cost us Z” demonstrates depth.
Handling failure scenarios. What happens when a server crashes? When the database becomes unavailable? When a network partition separates your data centers? Thinking about failure modes unprompted shows maturity. Production systems fail constantly, and the design should account for that.
A mental model for tradeoffs
Most system design tradeoffs reduce to a few fundamental tensions. Understanding these helps you navigate unfamiliar problems.
Consistency vs availability. The CAP theorem tells us that during a network partition, a distributed system must choose between consistency (every read returns the most recent write) and availability (every request gets a response). In practice, this means choosing where on the spectrum your system sits. A banking system leans toward consistency. A social media feed leans toward availability.
Latency vs throughput. Processing requests one at a time gives you the lowest latency for each individual request but limits your throughput. Batching requests increases throughput but adds latency. You see this tradeoff in database writes (single inserts vs batch inserts), message processing (one-at-a-time vs micro-batching), and network calls (many small requests vs fewer large ones).
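A toy cost model makes the batching tradeoff visible: every round trip pays a fixed overhead, so larger batches amortize it at the price of longer per-batch latency. The overhead and per-item costs below are invented for illustration:

```python
# Batching tradeoff sketch: fixed overhead per round trip means
# bigger batches raise throughput but each batch takes longer.
# Cost constants are made-up illustrative numbers.

OVERHEAD_MS = 5.0     # fixed cost per round trip (network, parsing)
PER_ITEM_MS = 0.1     # marginal cost per item in the batch

def batch_stats(batch_size: int) -> tuple[float, float]:
    """Return (batch latency in ms, items processed per second)."""
    latency_ms = OVERHEAD_MS + batch_size * PER_ITEM_MS
    throughput = batch_size / (latency_ms / 1000)
    return latency_ms, throughput

for b in (1, 10, 100, 1000):
    latency, tput = batch_stats(b)
    print(f"batch={b:>4}: latency {latency:6.1f} ms, "
          f"throughput {tput:9.0f} items/s")
```

Under this model, single-item requests complete fastest individually, but throughput climbs by more than an order of magnitude as batches grow, which is exactly the tension in single vs batch inserts.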
Simplicity vs flexibility. A monolithic application is simpler to deploy, debug, and reason about. A microservices architecture gives you independent scaling and deployment but adds operational complexity: service discovery, distributed tracing, network failures between services. The right choice depends on your team size, the rate of change in different parts of the system, and your operational maturity.
Storage vs computation. You can precompute results and store them (trading storage for faster reads) or compute them on the fly (trading CPU time for less storage). Caching, materialized views, and denormalized data all sit on the “store more, compute less” side. The choice depends on your read-to-write ratio and how often the underlying data changes.
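A minimal sketch of the two sides of this tradeoff, using a follower count as a hypothetical example (the in-memory structures stand in for real tables):

```python
# Precompute-vs-recompute sketch: a follower count can be scanned
# from the follows relation on every read, or maintained as a
# stored counter updated on write. Illustrative stand-ins only.

follows = {("alice", "bob"), ("carol", "bob"), ("dave", "bob")}

# Compute on the fly: cheap writes, O(n) reads.
def follower_count_scan(user: str) -> int:
    return sum(1 for _, followee in follows if followee == user)

# Precomputed: extra storage and write work, O(1) reads.
follower_counter = {"bob": 3}

def follow(follower: str, followee: str) -> None:
    follows.add((follower, followee))
    follower_counter[followee] = follower_counter.get(followee, 0) + 1

follow("erin", "bob")
print(follower_count_scan("bob"))   # both paths agree
print(follower_counter["bob"])
```

A read-heavy workload favors the counter; a write-heavy one with rare reads favors the scan. Materialized views and denormalization are the same decision at database scale.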
How this series is structured
This series builds your system design knowledge in layers, starting with fundamentals and progressing through high-level design, low-level design, and full case studies.
```mermaid
graph TD
    F[Fundamentals] --> H[High-Level Design]
    F --> L[Low-Level Design]
    H --> C[Case Studies]
    L --> C
    F:::fundamentals
    H:::hld
    L:::lld
    C:::cases
    classDef fundamentals fill:#4a90d9,color:#fff
    classDef hld fill:#50b848,color:#fff
    classDef lld fill:#f5a623,color:#fff
    classDef cases fill:#d94a4a,color:#fff
```
The learning path through this series. Fundamentals feed into both HLD and LLD, which converge in full case studies.
Fundamentals cover the thinking tools you need regardless of the specific problem: requirements gathering, estimation techniques, consistency models, and common building blocks like load balancers, caches, and message queues. You are reading the first article in this section right now.
High-level design articles explore architectural patterns and how to choose between them. Topics include monoliths vs microservices, database selection, replication strategies, and API design. These are the decisions that shape the overall structure of your system.
Low-level design articles focus on the internal design of individual components. The LLD introduction covers how to design clean interfaces, choose data structures, and structure code for a single service or module. These skills matter just as much as high-level architecture, especially in interviews that ask you to design a specific component in detail.
Case studies bring everything together. Each case study walks through a complete design (a URL shortener, a chat system, a notification platform), applying the fundamentals, HLD, and LLD concepts from earlier articles to a realistic problem.
Common pitfalls to avoid
As you learn system design, watch out for a few traps that catch even experienced engineers.
Over-engineering from the start. You do not need Kubernetes, a service mesh, and event sourcing for a system serving 1,000 users. Design for your current scale with a clear path to handle 10x growth. Beyond that, you will have enough data about real usage patterns to make informed decisions rather than speculative ones.
Ignoring the data model. Many designs focus on services and APIs but treat the database as a black box. Your data model is the foundation of your system. How you structure your data determines which queries are fast, which are slow, and which are impossible. Spend time on it.
Copying architectures without understanding context. Netflix uses microservices because they have thousands of engineers, deploy hundreds of times per day, and need independent scaling for wildly different workloads. That does not mean your 5-person team building an internal tool should use microservices. Every architecture exists in a context, and the context matters more than the pattern.
Treating performance as binary. “Is this fast enough?” is the wrong question. “What is the p50 latency? The p99? What happens at 2x peak load? At 5x?” Understanding the distribution of performance, not just the average, is what separates production-ready designs from whiteboard sketches.
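The gap between the mean and the tail is easy to demonstrate with synthetic latency samples; the distribution below is invented purely to show the effect:

```python
# Why averages mislead: mostly-fast samples with a 2% slow tail.
# The sample distribution is synthetic.

import random

random.seed(42)
samples = ([random.uniform(10, 30) for _ in range(980)]
           + [random.uniform(500, 2000) for _ in range(20)])

def percentile(data: list[float], p: float) -> float:
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(data)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

mean = sum(samples) / len(samples)
print(f"mean: {mean:.1f} ms, p50: {percentile(samples, 50):.1f} ms, "
      f"p99: {percentile(samples, 99):.1f} ms")
```

The mean and p50 both look healthy, yet one request in fifty takes half a second or more. That tail is what users notice and what an SLO has to bound.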
What comes next
The next article in this series covers back-of-the-envelope estimation: how to calculate storage needs, throughput requirements, and latency budgets using quick math. Estimation is the bridge between vague requirements and concrete design decisions, and it is a skill that improves dramatically with practice.
From there, you can explore high-level design patterns to understand how systems are structured at the service level, or dive into low-level design to learn how individual components are built internally. Both paths feed into the case studies that tie everything together.