Networking in the cloud
Cloud networking defines how your resources communicate with each other and the outside world. Get it right and your architecture is secure, performant, and easy to reason about. Get it wrong and you face mysterious connectivity failures, security breaches, or surprise egress bills. This article covers VPC design from the ground up.
Virtual Private Cloud
A Virtual Private Cloud (VPC) is a logically isolated section of the cloud provider’s network. You define the IP address range, create subnets, configure route tables, and control traffic flow. Every resource you launch lives inside a VPC.
AWS calls it VPC. GCP calls it VPC Network. Azure calls it Virtual Network (VNet). The concepts map closely across providers, although details such as subnet scope differ.
CIDR blocks
You assign a VPC a CIDR block like 10.0.0.0/16, which gives you 65,536 IP addresses. Plan your CIDR ranges carefully. Overlapping ranges between VPCs prevent peering. Too small a range limits future growth.
A common approach:
- Production: 10.0.0.0/16
- Staging: 10.1.0.0/16
- Development: 10.2.0.0/16
This keeps environments isolated and peerable.
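The address count and the no-overlap requirement can be checked before anything is provisioned. The following is a minimal sketch using Python's standard `ipaddress` module; the `ENVIRONMENTS` mapping and the `find_overlaps` helper are illustrative names, not part of any provider SDK.

```python
import ipaddress

# Environment ranges taken from the list above.
ENVIRONMENTS = {
    "production": ipaddress.ip_network("10.0.0.0/16"),
    "staging": ipaddress.ip_network("10.1.0.0/16"),
    "development": ipaddress.ip_network("10.2.0.0/16"),
}


def find_overlaps(networks):
    """Return (name_a, name_b) pairs whose CIDR ranges overlap."""
    names = sorted(networks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if networks[a].overlaps(networks[b])
    ]


# A /16 leaves 32 - 16 = 16 host bits, so 2 ** 16 addresses.
production_size = ENVIRONMENTS["production"].num_addresses
overlapping_pairs = find_overlaps(ENVIRONMENTS)
```

An empty `overlapping_pairs` list means every pair of environments could be peered later. Running a check like this in CI is one way to catch a clashing range before it reaches production.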
Subnet design
Subnets divide a VPC into smaller network segments. On AWS, each subnet exists in a single availability zone; GCP subnets are regional, and Azure subnets span the zones in a region. The three-tier pattern is the most common design.
graph TD
    IGW["Internet Gateway"] --> PubRT["Public Route Table"]
    PubRT --> PubA["Public Subnet AZ-a<br/>10.0.1.0/24"]
    PubRT --> PubB["Public Subnet AZ-b<br/>10.0.2.0/24"]
    PubA --> NAT["NAT Gateway"]
    NAT --> PrivRT["Private Route Table"]
    PrivRT --> PrivA["Private Subnet AZ-a<br/>10.0.10.0/24"]
    PrivRT --> PrivB["Private Subnet AZ-b<br/>10.0.20.0/24"]
    PrivA --> DbRT["Database Route Table"]
    DbRT --> DbA["Database Subnet AZ-a<br/>10.0.100.0/24"]
    DbRT --> DbB["Database Subnet AZ-b<br/>10.0.200.0/24"]
    style PubA fill:#2ecc71,color:#fff
    style PubB fill:#2ecc71,color:#fff
    style PrivA fill:#3498db,color:#fff
    style PrivB fill:#3498db,color:#fff
    style DbA fill:#e74c3c,color:#fff
    style DbB fill:#e74c3c,color:#fff
    style IGW fill:#f39c12,color:#fff
    style NAT fill:#9b59b6,color:#fff
A well-architected VPC with public, private, and database subnets across two availability zones.
Public subnets
Public subnets have a route to the internet gateway (IGW). Resources here can be assigned public IP addresses, which makes them reachable from the internet. Load balancers and bastion hosts live in public subnets. Application servers should not.
Private subnets
Private subnets have no direct route to the internet. Outbound traffic goes through a NAT gateway, which allows instances to pull updates and call external APIs without being directly addressable from the internet. Your application servers, workers, and internal services belong here.
Database subnets
Database subnets are private subnets with even tighter restrictions. They have no route to the internet at all. No NAT gateway. Databases only communicate with the private subnets via security group rules. This layered approach limits the blast radius of a compromise.
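A subnet plan like the one in the diagram can be sanity-checked in a few lines. This sketch assumes the six /24 ranges shown in the figure and a 10.0.0.0/16 VPC; the `SUBNET_PLAN` names and the `validate_plan` helper are hypothetical and exist only for illustration.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = "10.0.0.0/16"

# Subnet ranges copied from the three-tier diagram.
SUBNET_PLAN = {
    "public-az-a": "10.0.1.0/24",
    "public-az-b": "10.0.2.0/24",
    "private-az-a": "10.0.10.0/24",
    "private-az-b": "10.0.20.0/24",
    "database-az-a": "10.0.100.0/24",
    "database-az-b": "10.0.200.0/24",
}


def validate_plan(vpc_cidr, plan):
    """Return a list of human-readable problems; empty means the plan is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    problems = []

    # Every subnet must sit inside the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")

    # No two subnets may share addresses.
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")

    return problems
```

Calling `validate_plan(VPC_CIDR, SUBNET_PLAN)` returns an empty list for the plan above. The gaps between tiers (1 and 2, 10 and 20, 100 and 200) leave room to add a third availability zone later without renumbering.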
Routing
Every subnet is associated with a route table. A route table contains rules that determine where network traffic is directed.
| Destination | Target | Purpose |
|---|---|---|
| 10.0.0.0/16 | local | Traffic within the VPC |
| 0.0.0.0/0 | igw-xxx | Internet access (public subnets) |
| 0.0.0.0/0 | nat-xxx | Outbound internet (private subnets) |
The most specific route wins. Traffic destined for 10.0.10.5 matches the 10.0.0.0/16 local route, not the 0.0.0.0/0 default route.
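The "most specific route wins" rule is longest-prefix matching. The sketch below models it with the standard `ipaddress` module, using the public-subnet entries from the table above. The `lookup` function is an illustrative helper, not how any provider implements routing internally.

```python
import ipaddress

# Route table for a public subnet, as (destination CIDR, target) pairs.
PUBLIC_ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "igw-xxx"),
]


def lookup(routes, destination):
    """Return the target of the matching route with the longest prefix, or None."""
    address = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if address in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    # The longest prefix is the most specific match.
    return max(matches)[1]
```

With these routes, `lookup(PUBLIC_ROUTES, "10.0.10.5")` selects the `local` target because /16 is more specific than /0, while an external address such as `8.8.8.8` only matches the default route.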
Internet Gateway and NAT Gateway
The Internet Gateway is a horizontally scaled, redundant component that allows communication between your VPC and the internet. It is attached to the VPC, and public subnets route to it.
The NAT Gateway sits in a public subnet and enables outbound internet access for resources in private subnets. It translates private IP addresses to its own public IP. NAT gateways are not free: you pay per hour and per GB of data processed. For cost-sensitive environments, a NAT instance (a small VM acting as a NAT) is cheaper, but you must manage its availability, patching, and throughput yourself.
Security groups vs NACLs
Cloud networking provides two layers of traffic filtering.
Security groups
Security groups are stateful firewalls attached to individual resources (VMs, load balancers, databases). “Stateful” means if you allow inbound traffic on port 443, the response is automatically allowed out.
Inbound rules:
- Port 443, Source: 0.0.0.0/0 (HTTPS from anywhere)
- Port 80, Source: 0.0.0.0/0 (HTTP from anywhere)
Outbound rules:
- All traffic, Destination: 0.0.0.0/0 (allow all outbound)
Security groups default to denying all inbound traffic. You add rules to allow specific traffic. A powerful pattern: reference other security groups instead of IP ranges. Allow the application security group to reach the database security group on port 5432. If instances move or scale, the rules still work.
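The allow-only, default-deny behavior and the group-reference pattern can be modeled in a few lines. This is a simplified sketch, not a provider API: the `Rule` dataclass, the `inbound_allowed` function, and the group names `sg-app` and `sg-web` are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One inbound allow rule: a port plus either a CIDR or a source group."""
    port: int
    source_cidr: Optional[str] = None
    source_group: Optional[str] = None


def inbound_allowed(rules, port, source_ip, source_groups):
    """Allow if any rule matches; with no match, the default is deny."""
    address = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.port != port:
            continue
        if rule.source_cidr and address in ipaddress.ip_network(rule.source_cidr):
            return True
        if rule.source_group and rule.source_group in source_groups:
            return True
    return False


# Web tier: HTTPS and HTTP from anywhere, as in the rules listed above.
WEB_RULES = [Rule(port=443, source_cidr="0.0.0.0/0"), Rule(port=80, source_cidr="0.0.0.0/0")]

# Database tier: PostgreSQL only from members of the application group.
DB_RULES = [Rule(port=5432, source_group="sg-app")]
```

Because `DB_RULES` refers to a group rather than an address range, an application instance that gets a new private IP after scaling still passes the check, as long as it carries `sg-app`. Return traffic is not modeled here because a stateful firewall permits it automatically.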
Network ACLs
Network ACLs (NACLs) are stateless firewalls applied at the subnet level. “Stateless” means you must explicitly allow both inbound and outbound traffic, including ephemeral port ranges for response traffic.
| Feature | Security Groups | NACLs |
|---|---|---|
| Scope | Resource level | Subnet level |
| Statefulness | Stateful | Stateless |
| Rule type | Allow only | Allow and deny |
| Rule evaluation | All rules evaluated | Rules evaluated in order |
| Default behavior | Deny all inbound | Allow all traffic (AWS default NACL; custom NACLs deny all until you add rules) |
Most teams rely primarily on security groups and use NACLs as a secondary defense layer. NACLs are useful for blocking known malicious IP ranges at the subnet boundary.
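Ordered, stateless evaluation is easier to see in a small model. The sketch below assumes AWS-style numbered rules where the lowest matching rule number decides and anything unmatched is denied. The `NaclRule` type and `evaluate` function are illustrative, and the blocked range is a reserved documentation prefix standing in for a "known malicious" range.

```python
import ipaddress
from typing import NamedTuple


class NaclRule(NamedTuple):
    number: int      # lower numbers are evaluated first
    action: str      # "allow" or "deny"
    cidr: str
    port_from: int
    port_to: int


def evaluate(rules, remote_ip, port):
    """Return the action of the first matching rule in ascending rule order."""
    address = ipaddress.ip_address(remote_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        in_range = address in ipaddress.ip_network(rule.cidr)
        if in_range and rule.port_from <= port <= rule.port_to:
            return rule.action
    return "deny"  # implicit final rule


INBOUND = [
    NaclRule(90, "deny", "203.0.113.0/24", 0, 65535),   # blocked range, checked first
    NaclRule(100, "allow", "0.0.0.0/0", 443, 443),      # HTTPS from everyone else
]

# Stateless: responses need their own outbound rule covering ephemeral ports.
OUTBOUND = [
    NaclRule(100, "allow", "0.0.0.0/0", 1024, 65535),
]
```

Rule 90 wins over rule 100 for addresses in the blocked range purely because of ordering. If `OUTBOUND` were empty, inbound HTTPS requests would be accepted but their responses would be dropped, which is the usual symptom of a missing ephemeral-port rule.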
VPC peering and Transit Gateway
As your cloud footprint grows, you need to connect multiple VPCs.
VPC peering
VPC peering creates a direct network link between two VPCs. Traffic stays on the provider’s backbone. It does not traverse the public internet.
Limitations:
- No transitive routing. If VPC-A peers with VPC-B and VPC-B peers with VPC-C, VPC-A cannot reach VPC-C through VPC-B.
- CIDR blocks must not overlap.
- Both sides must participate. On AWS, one VPC requests the peering connection, the other accepts it, and each side adds routes pointing at it.
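The no-transitive-routing limitation means reachability is decided by direct peerings only, never by paths through an intermediate VPC. A minimal sketch of that rule follows; the VPC names and both helper functions are hypothetical.

```python
from itertools import combinations

# Each peering is an unordered pair of VPC names.
PEERINGS = {
    frozenset({"vpc-a", "vpc-b"}),
    frozenset({"vpc-b", "vpc-c"}),
}


def can_reach(peerings, source, destination):
    """True only when the two VPCs share a direct peering connection."""
    return frozenset({source, destination}) in peerings


def missing_peerings(peerings, vpcs):
    """List the VPC pairs that would still need their own peering connection."""
    return [
        pair
        for pair in combinations(sorted(vpcs), 2)
        if frozenset(pair) not in peerings
    ]
```

Here `can_reach(PEERINGS, "vpc-a", "vpc-c")` is false even though both VPCs peer with `vpc-b`, and `missing_peerings` reports that pair as the one still to be connected. The number of required pairs grows quickly as VPCs are added, which is where a central hub becomes attractive.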
Transit Gateway
Transit Gateway (AWS) or equivalent services solve the transitive routing problem. It acts as a central hub that connects multiple VPCs, on-premises networks, and VPN connections.
graph TD
    TGW["Transit Gateway"] --> VPC1["VPC: Production"]
    TGW --> VPC2["VPC: Staging"]
    TGW --> VPC3["VPC: Shared Services"]
    TGW --> VPN["VPN to On-Premises"]
    style TGW fill:#9b59b6,color:#fff
    style VPC1 fill:#3498db,color:#fff
    style VPC2 fill:#2ecc71,color:#fff
    style VPC3 fill:#f39c12,color:#fff
    style VPN fill:#e74c3c,color:#fff
Transit Gateway provides a hub-and-spoke model for connecting multiple VPCs and on-premises networks.
GCP uses Shared VPC and VPC Network Peering. Azure uses Virtual Network Peering and Virtual WAN. The hub-and-spoke topology is common across all providers.
DNS
Domain Name System (DNS) translates human-readable names to IP addresses. Cloud providers offer managed DNS services.
| Feature | AWS Route 53 | GCP Cloud DNS | Azure DNS |
|---|---|---|---|
| Public zones | Yes | Yes | Yes |
| Private zones | Yes | Yes | Yes |
| Health checks | Yes | No (use Cloud Monitoring) | Yes (Traffic Manager) |
| Routing policies | Weighted, latency, failover, geo | Weighted round robin | Traffic Manager policies |
Private DNS zones
Private DNS zones resolve names only within your VPC. Your database might be reachable at db.internal.myapp.com instead of a raw IP address. Using names instead of addresses simplifies configuration and keeps it portable across environments.
Service discovery
In dynamic environments where instances come and go, DNS-based service discovery helps services find each other. AWS Cloud Map, GCP Service Directory, and Azure Private DNS with auto-registration all support this pattern.
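From the client's side, DNS-based discovery amounts to resolving a name at connection time instead of hard-coding an address. The sketch below uses only the standard `socket` module; `resolve_service` is an illustrative helper, and the hostname in the usage note is the example name from this article, which only resolves inside a VPC that hosts such a private zone.

```python
import socket


def resolve_service(hostname, port):
    """Return the sorted, de-duplicated IP addresses currently registered for hostname."""
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})
```

Inside the VPC, a call such as `resolve_service("db.internal.myapp.com", 5432)` would return whatever addresses the private zone holds at that moment. Resolving shortly before each new connection, rather than caching the result indefinitely, lets clients follow instances as they come and go.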
Private connectivity
Sometimes traffic must never touch the public internet.
VPC endpoints / Private Link
VPC endpoints let you access cloud services (S3, DynamoDB, SQS) from your VPC without going through the internet gateway or NAT. Traffic stays within the provider’s network.
Two types on AWS:
- Gateway endpoints: Free. Support S3 and DynamoDB.
- Interface endpoints: Use Elastic Network Interfaces with private IPs. Support most other services. Billed per hour and per GB of data processed.
GCP calls this Private Google Access and Private Service Connect. Azure calls it Private Endpoints.
VPN and Direct Connect
For hybrid architectures connecting cloud to on-premises:
- Site-to-site VPN: Encrypted tunnel over the public internet. Easy to set up, but bandwidth is limited and latency varies.
- Direct Connect / Cloud Interconnect / ExpressRoute: Dedicated physical connection to the provider’s network. Consistent latency, high bandwidth, but expensive and takes weeks to provision.
Cost traps in networking
Networking costs are a common source of surprises on cloud bills.
- Cross-AZ data transfer: Traffic between AZs in the same region costs money. Place communicating services in the same AZ when possible, weighing the savings against the resilience of spreading across AZs.
- NAT Gateway data processing: Charges per GB. A chatty application calling external APIs can run up significant NAT costs.
- Egress charges: Data leaving the cloud is expensive. Data entering is usually free.
- VPC endpoints save money: If your application makes heavy calls to S3, a gateway endpoint eliminates NAT data processing charges.
Data transfer within the same AZ over private IP addresses is generally free. Every hop after that adds cost.
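The NAT and gateway-endpoint trade-off is simple arithmetic once you plug in your provider's rates. The sketch below is a rough model only: the function names are hypothetical, and the rates and traffic volumes are placeholder values chosen for easy arithmetic, not real prices. Check the current pricing page for your region before drawing conclusions.

```python
def nat_gateway_cost(hours, gb_processed, hourly_rate, per_gb_rate):
    """Monthly NAT cost: a fixed hourly charge plus a per-GB processing charge."""
    return hours * hourly_rate + gb_processed * per_gb_rate


def gateway_endpoint_savings(gb_moved_to_endpoint, per_gb_rate):
    """NAT processing charges avoided by sending that traffic through a gateway endpoint."""
    return gb_moved_to_endpoint * per_gb_rate


# PLACEHOLDER inputs for illustration only; these are NOT published prices.
HOURLY_RATE = 0.05      # currency units per NAT gateway hour (assumed)
PER_GB_RATE = 0.05      # currency units per GB processed (assumed)
HOURS_PER_MONTH = 730

# Assume 2,000 GB/month flows through NAT, of which 1,500 GB is S3 traffic.
before = nat_gateway_cost(HOURS_PER_MONTH, 2000, HOURLY_RATE, PER_GB_RATE)
after = nat_gateway_cost(HOURS_PER_MONTH, 500, HOURLY_RATE, PER_GB_RATE)
savings = gateway_endpoint_savings(1500, PER_GB_RATE)
```

The hourly charge is unchanged in both cases; only the per-GB component shrinks when S3 traffic moves to a gateway endpoint. This is why the saving scales with how chatty the workload is rather than with how long the NAT gateway runs.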
What comes next
With networking in place, the next article covers cloud storage services: object storage, block storage, file storage, lifecycle policies, and replication strategies.