Networking concepts for system design
In this series (20 parts)
- What is system design and why it matters
- Estimations and back-of-envelope calculations
- Scalability: vertical vs horizontal scaling
- CAP theorem and distributed system tradeoffs
- Consistency models
- Load balancing
- Caching: strategies and patterns
- Content Delivery Networks
- Databases: SQL vs NoSQL and when to use each
- Database replication
- Database sharding and partitioning
- Consistent hashing
- Message queues and event streaming
- API design: REST, GraphQL, gRPC
- Rate limiting and throttling
- Proxies: forward and reverse
- Networking concepts for system design
- Reliability patterns: timeouts, retries, circuit breakers
- Observability: logging, metrics, tracing
- Security in system design
Every request your service handles travels over a network. The protocol you choose, the way you manage connections, and the communication pattern you pick between client and server all shape latency, throughput, and reliability in ways that compound as you scale. A misconfigured keep-alive setting or a naive polling loop can waste more compute than your actual business logic. This article covers the networking concepts that show up repeatedly in system design: transport protocols, real-time communication patterns, HTTP evolution, and connection management.
You should already be comfortable with proxies and reverse proxies before reading this. Familiarity with load balancing helps too, since many of these concepts interact directly with how traffic gets distributed.
TCP vs UDP: the fundamental trade-off
TCP and UDP sit at the transport layer. Every higher-level protocol you use in system design builds on one of them, and the choice has consequences.
TCP provides reliable, ordered delivery. When you send bytes over a TCP connection, the protocol guarantees they arrive in order, without duplicates, and without corruption. It does this through a three-way handshake to establish the connection, sequence numbers to track ordering, acknowledgments to confirm receipt, and retransmission to recover from loss. That reliability has a cost. The handshake alone adds one round trip before any data flows. Head-of-line blocking means a single lost packet stalls the entire stream until it is retransmitted. On a connection with 100ms RTT and 1% packet loss, loss recovery caps achievable throughput at a small fraction of what the underlying link could otherwise carry, regardless of its raw bandwidth.
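A common back-of-envelope for how loss caps TCP throughput is the Mathis formula, which bounds steady-state throughput by MSS / (RTT · √p). The sketch below applies it to the numbers above (a 1460-byte MSS is an assumption, typical for Ethernet):

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate loss-limited TCP throughput (Mathis et al.):
    throughput <= MSS / (RTT * sqrt(p)), converted to bits/second."""
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))

# 1460-byte MSS, 100 ms RTT, 1% loss: loss recovery caps the
# connection at roughly 1.2 Mbit/s no matter how fast the link is.
estimate = mathis_throughput_bps(1460, 0.100, 0.01)
print(f"{estimate / 1e6:.2f} Mbit/s")
```

This is a model, not a measurement, but it explains why even modest loss rates devastate long-RTT TCP connections.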
UDP provides none of those guarantees. Packets are sent independently. They can arrive out of order, arrive duplicated, or not arrive at all. There is no handshake, no acknowledgment, no flow control. A UDP packet goes out and the sender moves on. This makes UDP faster for use cases where occasional loss is acceptable and freshness matters more than completeness. Video conferencing drops a frame and moves on. Online gaming interpolates between updates. DNS lookups retry on the application layer rather than waiting for TCP retransmission.
In system design interviews, the choice matters when you are designing real-time systems. A live sports score feed might use UDP under the hood (via WebRTC or a custom protocol) because a score update from 2 seconds ago is worthless if a newer one is available. A financial transaction system uses TCP because losing even one message is unacceptable. Most web services default to TCP because HTTP sits on top of it, and HTTP is how the vast majority of APIs communicate.
Polling: the simplest real-time pattern
When a client needs updates from a server, the most straightforward approach is polling. The client sends an HTTP request at a fixed interval, say every 5 seconds, asking “anything new?” The server responds with data or an empty body.
Polling is dead simple to implement and works with any HTTP infrastructure. It is also wasteful. If you have 100,000 connected clients polling every 5 seconds, that is 20,000 requests per second hitting your server even when nothing has changed. Each request carries HTTP headers (typically 200 to 800 bytes), requires a TCP connection or at least a slot from a connection pool, and demands server-side processing to check for updates. For a chat application, over 90% of those requests will return empty responses.
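The arithmetic above is worth internalizing, since it comes up whenever you size a polling system. A quick sketch (the 500-byte header figure is an assumption in the middle of the range quoted above):

```python
def polling_load(clients: int, interval_s: float, header_bytes: int = 500):
    """Back-of-envelope cost of fixed-interval polling: request rate
    and the bandwidth consumed by HTTP headers alone."""
    rps = clients / interval_s
    header_bandwidth_bps = rps * header_bytes * 8
    return rps, header_bandwidth_bps

rps, bps = polling_load(clients=100_000, interval_s=5)
print(f"{rps:,.0f} req/s, {bps / 1e6:.0f} Mbit/s of headers alone")
```

With 100,000 clients on a 5-second interval, that is 20,000 requests per second and tens of megabits per second spent on headers, before any payload or server-side work.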
Long polling: a smarter variation
Long polling improves on basic polling by having the server hold the request open until new data is available or a timeout expires (typically 30 to 60 seconds). The client sends a request, the server parks it, and when something changes, the server responds immediately. The client then sends a new request right away.
This cuts idle traffic dramatically. Instead of 20,000 empty requests per second, you get one request per client per timeout interval when nothing is happening, and near-instant delivery when something does happen. The latency for updates approaches that of a persistent connection. The downside is that each parked request holds a server-side connection open, which ties up memory and file descriptors. A server with a 64K file descriptor limit can hold at most 64K long-polling clients, and in practice the limit is lower because you need descriptors for upstream connections, logging, and other I/O.
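The server-side "park the request" mechanic can be sketched with a condition variable or event. This is a minimal illustration under assumed names (`UpdateChannel`, `publish`), not a production handler — a real server would track per-client cursors and handle many channels:

```python
import threading

class UpdateChannel:
    """Minimal long-polling sketch: park a request until data arrives
    or the timeout expires."""

    def __init__(self):
        self._event = threading.Event()
        self._payload = None

    def publish(self, payload):
        self._payload = payload
        self._event.set()

    def wait_for_update(self, timeout_s: float):
        # Blocks the "request" thread until publish() fires or we time out.
        if self._event.wait(timeout=timeout_s):
            self._event.clear()
            return self._payload
        return None  # timeout expired: client should re-poll immediately

channel = UpdateChannel()
threading.Timer(0.05, channel.publish, args=("score: 2-1",)).start()
print(channel.wait_for_update(timeout_s=30))  # returns almost at once
```

Note that each parked request occupies a blocked thread (or, in an async server, a suspended coroutine plus a file descriptor), which is exactly the resource cost described above.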
Long polling was the backbone of early real-time web applications. Facebook chat used it. Gmail used it. It works surprisingly well up to moderate scale, but it has a fundamental problem: every update cycle requires a full HTTP request/response round trip, including headers, connection negotiation, and parsing overhead.
WebSockets: persistent bidirectional channels
WebSockets solve the connection overhead problem. A WebSocket connection starts as an HTTP request with an Upgrade header. The server agrees, and the connection switches from HTTP to a persistent, full-duplex TCP channel. Both sides can send messages at any time without the overhead of HTTP headers or request/response framing.
The difference in overhead is significant. An HTTP request with typical headers costs 200 to 800 bytes of overhead per message. A WebSocket frame has 2 to 14 bytes of overhead. For a chat application sending 50-byte messages, HTTP overhead can be 4x to 16x the payload size. WebSocket overhead is under 30% of the payload.
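The 2-to-14-byte range comes from the WebSocket framing rules in RFC 6455: a 2-byte base header, an extended length field for larger payloads, and a 4-byte masking key on client-to-server frames. A small calculator makes the comparison concrete:

```python
def ws_frame_overhead(payload_len: int, masked: bool) -> int:
    """Header bytes for one WebSocket frame (per RFC 6455 framing)."""
    overhead = 2                # base header: flags, opcode, length
    if payload_len > 65535:
        overhead += 8           # 64-bit extended payload length
    elif payload_len > 125:
        overhead += 2           # 16-bit extended payload length
    if masked:
        overhead += 4           # client-to-server frames must be masked
    return overhead

# A 50-byte chat message from the client: 6 bytes of framing (~12%),
# versus 200 to 800 bytes of HTTP headers for the same payload.
print(ws_frame_overhead(50, masked=True))
```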
sequenceDiagram
participant C as Client
participant S as Server
Note over C,S: HTTP Polling
loop Every 5 seconds
C->>S: GET /updates
S-->>C: 200 (empty or data)
end
Note over C,S: WebSocket
C->>S: GET /ws (Upgrade)
S-->>C: 101 Switching Protocols
S->>C: message: new data
C->>S: message: ack
S->>C: message: new data
HTTP polling sends repeated requests with full headers. WebSockets upgrade once and then exchange lightweight frames in both directions.
WebSockets excel when both the client and server need to push data. Chat, collaborative editing, multiplayer games, and live dashboards are canonical use cases. The persistent connection also eliminates the reconnection latency that long polling suffers on every update cycle.
The trade-offs are real though. WebSocket connections are stateful. A client connects to a specific server, and that server holds the connection state in memory. If you have 10 servers behind a load balancer, a client’s WebSocket is pinned to one of them. This complicates horizontal scaling. You need sticky sessions or a pub/sub backbone (like Redis) so that a message intended for a client on server 3 can be routed there even if the message originated on server 7. WebSocket connections also do not go through HTTP caches or CDNs, which means you lose the caching infrastructure that HTTP gives you for free.
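The pub/sub backbone idea can be sketched in-process. In production this role is played by Redis pub/sub, Kafka, or similar; the class and method names below are illustrative assumptions, and the `deliver` callback stands in for pushing a frame down a client's WebSocket:

```python
from collections import defaultdict

class PubSubBackbone:
    """In-process stand-in for a pub/sub backbone (e.g. Redis).
    Each server subscribes on behalf of the clients it holds, so a
    message published anywhere reaches the server that owns the
    client's connection."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # client_id -> callbacks

    def subscribe(self, client_id, deliver):
        # `deliver` would write a frame to that client's WebSocket.
        self._subscribers[client_id].append(deliver)

    def publish(self, client_id, message):
        for deliver in self._subscribers[client_id]:
            deliver(message)

backbone = PubSubBackbone()
inbox = []
backbone.subscribe("alice", inbox.append)  # alice is pinned to this server
backbone.publish("alice", "hello from server 7")
print(inbox)
```

The point of the indirection is that the publisher does not need to know which of the 10 servers holds alice's connection; the backbone routes it.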
Server-Sent Events: one-way streaming
Server-Sent Events (SSE) occupy a middle ground. The client opens a standard HTTP connection, and the server streams events down that connection as they occur. Unlike WebSockets, SSE is unidirectional: only the server sends data. The client communicates via regular HTTP requests.
SSE uses a simple text-based protocol over HTTP. Each event is a chunk of UTF-8 text with optional event types, IDs, and retry intervals. Browsers handle reconnection automatically. If the connection drops, the browser reconnects and sends a Last-Event-ID header so the server can resume from where it left off. This built-in resume capability is something you have to implement yourself with WebSockets.
For use cases where the server pushes data and the client mostly listens, SSE is simpler than WebSockets. Stock tickers, notification feeds, build logs, and live score updates all fit this pattern. SSE works through HTTP proxies and CDNs without special configuration. It uses standard HTTP, so existing monitoring, logging, and security infrastructure works unchanged.
The limitation is the unidirectional nature. If you need the client to send frequent messages to the server, SSE plus HTTP requests creates more overhead than a single WebSocket connection. SSE is also limited to text data (no binary), though base64 encoding works as a workaround at the cost of 33% size inflation.
HTTP/2 and HTTP/3: protocol evolution
HTTP/1.1 served the web for over 15 years, but its design creates bottlenecks at scale. Each request/response pair occupies a connection for its duration. Browsers open 6 to 8 connections per origin to get parallelism, which wastes memory on both client and server. Head-of-line blocking means a slow response on one connection stalls everything queued behind it.
HTTP/2 fixes this with multiplexing. A single TCP connection carries multiple concurrent streams. Request and response headers are compressed with HPACK, reducing their size by 85% to 90% on typical workloads. Server push lets the server send resources the client has not requested yet but will need, though it saw little adoption in practice and major browsers have since removed support for it. In practice, a page that required 50 HTTP/1.1 connections can use a single HTTP/2 connection with lower total latency.
HTTP/3 goes further by replacing TCP with QUIC, a transport protocol built on UDP. QUIC eliminates TCP’s head-of-line blocking at the transport layer. If stream 4 loses a packet, streams 1, 2, and 3 continue unaffected. QUIC also integrates TLS 1.3 into the handshake, reducing connection setup from TCP’s 2 to 3 round trips (TCP handshake plus TLS handshake) to 1 round trip, or 0 round trips for resumed connections. Google reported that QUIC reduced search latency by 3.6% and video rebuffering by 15.3% compared to TCP.
For system design, the HTTP version affects how you think about connection management and multiplexing. An HTTP/1.1 service behind a reverse proxy might need connection pooling with dozens of connections per upstream server. The same service on HTTP/2 might need just 1 to 3 connections per upstream because multiplexing handles the parallelism. HTTP/3 matters most for mobile clients on unreliable networks where packet loss is common and connection migration (switching from Wi-Fi to cellular without re-handshaking) is valuable.
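The connection-setup differences can be captured in a simplified round-trip model. This assumes TLS 1.3 (one round trip) on top of TCP for HTTP/1.1 and HTTP/2; older TLS versions add another round trip, which is where the "2 to 3" range above comes from:

```python
def setup_rtts(protocol: str, resumed: bool = False) -> int:
    """Round trips before application data flows, by protocol.
    Rough model: TCP handshake + TLS 1.3 for h1/h2; QUIC folds the
    transport and crypto handshakes together, with 0-RTT resumption."""
    if protocol in ("http/1.1", "h2"):
        return 2                      # 1 RTT TCP + 1 RTT TLS 1.3
    if protocol == "h3":
        return 0 if resumed else 1    # QUIC combined handshake
    raise ValueError(f"unknown protocol: {protocol}")

rtt_ms = 50
for proto in ("h2", "h3"):
    print(proto, setup_rtts(proto) * rtt_ms, "ms before first byte")
```

On a 50 ms RTT path, that is 100 ms of setup for HTTP/2 versus 50 ms for a fresh HTTP/3 connection and effectively zero for a resumed one.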
Connection pooling and keep-alive
Opening a TCP connection is expensive. The three-way handshake takes one RTT. TLS negotiation adds another 1 to 2 RTTs. On a connection with 50ms RTT, that is 100 to 150ms before the first byte of application data flows. If your service makes 10 upstream HTTP calls to process a single user request, creating fresh connections for each one adds 1 to 1.5 seconds of pure connection overhead.
Connection pooling solves this by maintaining a set of pre-established connections that are reused across requests. When your service needs to call an upstream, it borrows a connection from the pool, sends the request, receives the response, and returns the connection to the pool. No handshake delay. Connection pools typically configure a maximum pool size (say 100 connections per upstream host), an idle timeout (how long an unused connection stays in the pool), and a connection lifetime (the maximum age of a connection before it is recycled).
HTTP keep-alive is the mechanism that makes pooling possible. With Connection: keep-alive (the default in HTTP/1.1), the TCP connection remains open after a response completes. The client can send another request on the same connection. Without keep-alive, every HTTP request requires a new TCP connection, which is catastrophically wasteful.
Tuning pool parameters matters. A pool that is too small creates contention: requests queue up waiting for an available connection, adding latency. A pool that is too large wastes memory and file descriptors, and can overwhelm upstream services with too many concurrent connections. A common starting point is to set the pool size equal to the expected concurrent request rate to each upstream, then adjust based on observed p99 latency and connection wait times.
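The borrow/return mechanics described above can be sketched with a bounded queue. This is a single-threaded illustration under assumed names (`dial`, `borrow`, `give_back`); a production pool would also need thread-safe creation accounting, health checks, idle timeouts, and connection lifetimes:

```python
import queue

class ConnectionPool:
    """Minimal bounded pool sketch: reuse an idle connection when one
    exists, dial a new one while under the cap, otherwise block until
    a connection is returned."""

    def __init__(self, dial, max_size: int):
        self._dial = dial                    # factory: () -> connection
        self._idle = queue.Queue(max_size)
        self._created = 0
        self._max = max_size

    def borrow(self, timeout_s: float = 5.0):
        try:
            return self._idle.get_nowait()   # reuse: no handshake cost
        except queue.Empty:
            if self._created < self._max:
                self._created += 1
                return self._dial()          # pay TCP + TLS cost once
            return self._idle.get(timeout=timeout_s)  # wait for a return

    def give_back(self, conn):
        self._idle.put(conn)

pool = ConnectionPool(dial=lambda: object(), max_size=2)
c1 = pool.borrow()
pool.give_back(c1)
c2 = pool.borrow()
print(c1 is c2)  # True: the second borrow reused the connection
```

The `timeout_s` on `borrow` is the contention signal mentioned above: if requests routinely hit it, the pool is too small (or the upstream is too slow).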
sequenceDiagram
participant App as Application
participant Pool as Connection Pool
participant Up as Upstream Service
App->>Pool: borrow connection
alt Pool has idle connection
Pool-->>App: reuse existing conn
else Pool empty
Pool->>Up: TCP + TLS handshake
Up-->>Pool: connection established
Pool-->>App: new connection
end
App->>Up: HTTP request
Up-->>App: HTTP response
App->>Pool: return connection
Connection pooling amortizes the cost of TCP and TLS handshakes across many requests. Borrowed connections skip the handshake entirely.
In microservice architectures, connection management compounds. If service A calls services B, C, and D, and each of those calls three more services, poor connection management at any layer cascades. A service that creates a new TCP connection for every outbound request will eventually exhaust its ephemeral port range (roughly 28,000 ports on Linux with default settings) under high load. This manifests as EADDRNOTAVAIL errors that are painful to debug in production.
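The port-exhaustion ceiling is easy to estimate: closed connections linger in TIME_WAIT (60 seconds on Linux by default), so the sustainable rate of new outbound connections to a given upstream is bounded by ports divided by that lingering time. A back-of-envelope using the figures above:

```python
def max_new_conns_per_sec(ephemeral_ports: int, time_wait_s: float) -> float:
    """Sustainable rate of *new* outbound connections to one upstream
    before the ephemeral port range is exhausted by TIME_WAIT sockets."""
    return ephemeral_ports / time_wait_s

# ~28,000 ports and a 60 s TIME_WAIT: under ~470 new connections per
# second per upstream before EADDRNOTAVAIL starts appearing.
print(f"{max_new_conns_per_sec(28_000, 60):.0f} conns/s")
```

Connection pooling sidesteps this ceiling entirely, which is another argument for it beyond handshake latency.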
Choosing the right pattern
The communication pattern you choose depends on your requirements. For standard request/response APIs, HTTP with connection pooling is the default. For server-to-client streaming where updates flow in one direction, SSE gives you simplicity and automatic reconnection. For bidirectional real-time communication, WebSockets provide the lowest overhead per message. For fire-and-forget telemetry or media streaming where occasional loss is acceptable, UDP-based protocols reduce latency.
Most real systems use multiple patterns. A web application might serve its API over HTTP/2, push notifications via SSE, handle chat via WebSockets, and stream video over QUIC. The key is matching the protocol to the communication pattern, not picking one protocol for everything.
Consider the infrastructure implications too. WebSockets require sticky sessions or a message routing layer. SSE connections count against the server’s concurrent connection limit. HTTP/3 requires UDP support through your network path, which some corporate firewalls block. Long polling works everywhere but wastes resources at scale. These constraints are as important as the theoretical performance characteristics.
What comes next
Networking gives you the transport layer. But networks fail, services crash, and deployments go wrong. The next article covers reliability patterns: retries with backoff, circuit breakers, bulkheads, and the techniques that keep distributed systems running when individual components do not.