Capacity planning
In this series (11 parts)
- What SRE is
- Reliability fundamentals
- SLIs, SLOs, and error budgets in practice
- Toil reduction and automation
- Capacity planning
- Performance testing and load testing
- Chaos engineering
- Incident response in practice
- Postmortems and learning from failure
- Production readiness reviews
- Reliability patterns for services
Your service handles 1,200 requests per second on an average Tuesday. Friday morning, marketing launches a campaign. Traffic doubles. The service tips over. Users see 503 errors. Leadership asks hard questions.
This is a capacity failure. Not a code bug, not a network partition. You simply ran out of room. Capacity planning exists to make sure this never happens.
Why capacity planning matters
Capacity problems come in two flavors. Running out of resources causes outages. Over-provisioning wastes money. Both are preventable with a structured approach.
Under-provisioning hits you during traffic spikes. Black Friday sales, product launches, viral social media posts. The system saturates CPU, memory, or database connections and starts rejecting requests. Recovery takes time because auto-scaling needs minutes to spin up new instances. Those minutes cost revenue and trust.
Over-provisioning is the quieter problem. You allocate 10x your peak traffic “just in case.” Finance notices the cloud bill. Engineering credibility takes a hit. Teams that cannot explain their resource usage lose budget negotiations.
The goal is a middle path. Enough capacity to absorb spikes. Not so much that you are paying for idle servers.
Demand forecasting
Capacity planning starts with predicting how much traffic you will serve. Three inputs drive a good forecast.
Historical trends
Look at your traffic data for the past 12 months. Plot requests per second over time. Identify the growth rate. If traffic grew 8% per quarter for the last year, project that forward.
Simple linear regression works surprisingly well for steady-growth services. For services with more complex patterns, use exponential smoothing or ARIMA models. The key is having clean, granular data. Per-minute or per-second metrics are far more useful than daily averages.
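As a minimal sketch, projecting a steady growth rate forward takes only a few lines. The quarterly figures here are invented for illustration; use your own measured peaks:

```python
# Growth-rate projection over the last four quarters.
quarterly_peak_rps = [1000, 1080, 1166, 1260]  # illustrative data

# Average quarter-over-quarter growth rate.
growth_rates = [
    later / earlier - 1
    for earlier, later in zip(quarterly_peak_rps, quarterly_peak_rps[1:])
]
avg_growth = sum(growth_rates) / len(growth_rates)

# Project the next quarter from the most recent one.
next_quarter = quarterly_peak_rps[-1] * (1 + avg_growth)
print(f"Average quarterly growth: {avg_growth:.1%}")   # ~8.0%
print(f"Next quarter forecast: {next_quarter:.0f} RPS")
```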
Seasonal patterns
Many services have predictable cycles. E-commerce spikes during holidays. B2B SaaS peaks on weekday mornings. Tax software surges in April. Identify your seasonal multipliers and apply them to the baseline forecast.
A service that averages 1,000 RPS but hits 2,500 RPS every December has a seasonal multiplier of 2.5x for that month. Your capacity plan must account for this peak, not just the average.
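The arithmetic is simple enough to encode directly. This sketch reproduces the numbers from the example above:

```python
# Seasonal multiplier: the month's recurring peak divided by the
# yearly average. Figures match the December example above.
average_rps = 1000        # typical month
december_peak_rps = 2500  # recurring December peak

seasonal_multiplier = december_peak_rps / average_rps  # 2.5

# Apply the multiplier to a baseline forecast to get the seasonal peak.
baseline_forecast_rps = 1200
december_forecast_rps = baseline_forecast_rps * seasonal_multiplier
print(f"Multiplier: {seasonal_multiplier:.1f}x, "
      f"December forecast: {december_forecast_rps:.0f} RPS")
```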
Planned launches
Product launches, marketing campaigns, and partnerships create step-function increases in traffic. These are not visible in historical data. You need a process to collect launch plans from product and marketing teams.
Build a simple intake form: expected launch date, estimated traffic increase, duration. Even rough estimates are better than being surprised.
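If it helps to keep launch estimates machine-readable, a minimal record might look like this. The fields and the Premium tier figures are illustrative, borrowed from the worked example later in this post:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical intake record mirroring the form fields above.
@dataclass
class LaunchPlan:
    name: str
    launch_date: date
    estimated_rps_increase: float  # rough estimates are fine
    duration_days: int

# Planned launches stack on top of the organic forecast.
plans = [LaunchPlan("Premium tier", date(2026, 8, 15), 200, 90)]
organic_forecast_rps = 1965
forecast_with_launches = organic_forecast_rps + sum(
    p.estimated_rps_increase for p in plans
)
print(f"Forecast with launches: {forecast_with_launches:.0f} RPS")
```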
Load testing as input to capacity models
Forecasting tells you how much traffic to expect. Load testing tells you how much traffic your infrastructure can handle. The gap between the two drives provisioning decisions.
Run a load test against a staging environment that mirrors production. Gradually increase traffic until the system degrades. Record the breaking point. That number is your empirical capacity ceiling.
Compare it to your forecasted peak:
Capacity gap = Forecasted peak - Empirical ceiling
If the gap is positive, you need more resources before that peak arrives. If it is negative, you have headroom. Simple arithmetic, but teams skip it constantly.
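In code, the check really is a one-liner. The numbers here come from the worked checkout-api example later in this post:

```python
# Capacity gap: forecasted peak minus the ceiling measured by load
# testing. A positive gap means provision before the peak arrives.
def capacity_gap(forecasted_peak_rps: float, empirical_ceiling_rps: float) -> float:
    return forecasted_peak_rps - empirical_ceiling_rps

gap = capacity_gap(forecasted_peak_rps=2165, empirical_ceiling_rps=2800)
if gap > 0:
    print(f"Short by {gap:.0f} RPS: provision more")
else:
    print(f"Headroom of {-gap:.0f} RPS above the forecasted peak")
```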
Load test results also reveal which resource saturates first. CPU-bound services need faster or more compute instances. Memory-bound services need larger instance types. Database-bound services need connection pooling, read replicas, or query optimization.
Headroom targets
Running at exactly your forecasted peak leaves zero margin for error. Forecasts are wrong. Traffic is bursty. Deploys temporarily reduce capacity.
Headroom is the buffer between your forecasted peak and your provisioned capacity. Industry practice targets 30% to 50% above peak.
| Risk tolerance | Headroom target | Use case |
|---|---|---|
| Low tolerance (payments, auth) | 50% | Services where failure has outsized impact |
| Medium tolerance (API, web) | 30 to 40% | Standard production services |
| High tolerance (batch jobs) | 15 to 20% | Non-user-facing workloads |
A service forecasted to peak at 2,000 RPS with a 40% headroom target needs capacity for 2,800 RPS.
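As a small sketch, the calculation maps directly to code. The targets mirror the table above, expressed as ratios, with the medium tier pinned at 40% to match the example:

```python
# Required capacity from a forecasted peak and a headroom target.
HEADROOM_TARGETS = {"low": 0.50, "medium": 0.40, "high": 0.20}

def required_capacity(forecasted_peak_rps: float, risk_tolerance: str) -> float:
    return forecasted_peak_rps * (1 + HEADROOM_TARGETS[risk_tolerance])

print(required_capacity(2000, "medium"))  # 2800.0, matching the example above
```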
Figure: traffic forecast with provisioned capacity stepped up at quarterly reviews. The dotted line shows the 30% headroom target that provisioned capacity must stay above.
Notice the stepped provisioned-capacity line. It does not track the forecast smoothly; it jumps at quarterly review points when teams adjust reservations. The gap between provisioned capacity and forecasted traffic is your safety margin. When the forecast approaches provisioned capacity, it is time to provision more.
Provisioned vs on-demand capacity
Cloud providers offer two pricing models that map directly to capacity planning strategy.
Reserved capacity
Reserved instances (AWS), committed use discounts (GCP), or reservations (Azure) give you a fixed amount of compute at a discounted rate. You pay whether you use it or not. Use these for your baseline: the minimum traffic your service always handles.
Reserve enough capacity to cover your average traffic plus a small buffer. This is your floor.
Auto-scaling for peaks
Auto-scaling groups add instances when traffic exceeds your reserved capacity. They handle spikes and seasonal peaks. You pay on-demand rates, which are higher, but only when you need the extra capacity.
Configure scaling policies with appropriate cooldown periods. Scaling too aggressively wastes money. Scaling too slowly drops requests. Most teams find that scaling at 60 to 70% CPU utilization with a 3-minute cooldown works as a starting point.
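To make the cooldown concrete, here is an illustrative sketch of the decision logic an autoscaler applies, not a real cloud API. The `get_avg_cpu` and `add_instance` callbacks are placeholders for your own metrics source and scaling backend:

```python
import time

SCALE_OUT_THRESHOLD = 0.65  # within the 60 to 70% range above
COOLDOWN_SECONDS = 180      # 3-minute cooldown

last_scale_action = float("-inf")

def maybe_scale_out(get_avg_cpu, add_instance) -> None:
    """Add an instance when CPU is high, unless still cooling down."""
    global last_scale_action
    now = time.monotonic()
    if now - last_scale_action < COOLDOWN_SECONDS:
        return  # still cooling down; avoid thrashing
    if get_avg_cpu() > SCALE_OUT_THRESHOLD:
        add_instance()
        last_scale_action = now
```

The cooldown is what prevents a single traffic burst from triggering several scale-outs back to back before the first new instance has absorbed any load.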
The hybrid approach
The optimal strategy combines both. Reserve capacity for your P50 traffic (the level you exceed 50% of the time). Auto-scale for everything above that. This minimizes cost while maintaining reliability.
Total capacity = Reserved baseline + Auto-scaled headroom
Cost = (Reserved rate * baseline) + (On-demand rate * peak overage)
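A quick sketch of the cost side, following the formulas above. The per-RPS-hour rates are invented for illustration, not real cloud prices:

```python
# Hybrid cost: reserved rate on the baseline, on-demand rate on the
# burst above it.
def hourly_cost(baseline_rps: float, peak_overage_rps: float,
                reserved_rate: float, on_demand_rate: float) -> float:
    return reserved_rate * baseline_rps + on_demand_rate * peak_overage_rps

# Reserve for P50 traffic; pay on-demand only during the peak.
print(hourly_cost(baseline_rps=1400, peak_overage_rps=600,
                  reserved_rate=0.02, on_demand_rate=0.05))  # 58.0
```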
Capacity review cadence
Capacity planning is not a one-time exercise. Traffic patterns change. New features shift load profiles. Infrastructure costs fluctuate.
Quarterly capacity reviews
Every quarter, gather the team and review:
- Actual vs forecasted traffic. Was the forecast accurate? Adjust the model if it was off by more than 15% (a quick check is sketched after this list).
- Resource utilization. Are reserved instances well-utilized? Are you paying for idle capacity?
- Upcoming launches. What product changes will affect traffic in the next quarter?
- Cost efficiency. Can you right-size instances or shift workloads to cheaper regions?
- Load test results. Has the empirical ceiling changed after recent deploys?
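The first check lends itself to a quick script. The traffic numbers below are illustrative:

```python
# Forecast-accuracy check for the quarterly review: flag the model
# for adjustment when the error exceeds the 15% threshold.
def forecast_error(actual_rps: float, forecasted_rps: float) -> float:
    return abs(actual_rps - forecasted_rps) / actual_rps

if forecast_error(actual_rps=2400, forecasted_rps=1965) > 0.15:
    print("Forecast off by more than 15%: revisit the model assumptions")
```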
Document decisions and publish them to the team. Capacity planning loses value if it lives in one person’s head.
Annual planning
Once a year, do a deeper review. Evaluate your forecasting model accuracy over 12 months. Renegotiate reserved instance commitments. Reassess headroom targets based on actual incident history.
Documenting traffic model assumptions
Every forecast rests on assumptions. Write them down. When the forecast is wrong, you can trace back to which assumption failed and fix the model.
A capacity planning document should include:
Service: checkout-api
Forecast period: Q3 2026
Baseline traffic: 1,400 RPS (measured P50 from Q2)
Growth assumption: 8% quarterly (based on 4-quarter trend)
Seasonal multiplier: 1.3x for September (back-to-school)
Planned launches: Premium tier launch Aug 15 (+200 RPS estimated)
Headroom target: 40%
Forecasted peak: 1,400 * 1.08 * 1.3 + 200 = 2,165 RPS
Required capacity: 2,165 * 1.4 = 3,031 RPS
Current capacity: 2,800 RPS (load test result from June)
Action needed: Provision additional capacity before August
This format makes capacity decisions auditable. When someone asks why you provisioned 3,000 RPS of capacity, you can point to the math. When the forecast is wrong, you can identify whether the growth rate, seasonal multiplier, or launch estimate was the problem.
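The same template translates directly into executable math, which makes each input even easier to audit. This sketch reproduces the checkout-api numbers, truncating the way the plan above does:

```python
# checkout-api capacity plan, as executable math.
baseline_rps = 1400        # measured P50 from Q2
quarterly_growth = 0.08    # 4-quarter trend
seasonal_multiplier = 1.3  # September back-to-school
launch_rps = 200           # Premium tier estimate
headroom = 0.40

forecasted_peak = int(
    baseline_rps * (1 + quarterly_growth) * seasonal_multiplier + launch_rps
)
required = int(forecasted_peak * (1 + headroom))
print(f"Forecasted peak: {forecasted_peak} RPS")  # 2165
print(f"Required capacity: {required} RPS")       # 3031
```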
Common pitfalls
Averaging instead of using percentiles. A service averaging 500 RPS might spike to 2,000 RPS during peak seconds. Plan for P99 traffic, not averages.
Ignoring dependencies. Your service might handle 3,000 RPS, but if the database caps out at 1,500 connections, that is your real ceiling. Map the full dependency chain and identify the narrowest bottleneck.
Planning for compute but not storage. Disk fills up slowly and then all at once. Include storage growth projections in your capacity plan.
Forgetting about deploy capacity. Rolling deploys temporarily reduce your fleet size. If you run 10 pods and deploy 2 at a time, you have 80% capacity during deploys. Factor this into your headroom.
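A quick sanity check on that last pitfall, using the 10-pod example; the full-capacity figure is illustrative:

```python
# Effective capacity during a rolling deploy: pods being replaced
# cannot serve traffic.
total_pods = 10
pods_in_flight = 2  # replaced per deploy batch
full_capacity_rps = 3000

deploy_fraction = (total_pods - pods_in_flight) / total_pods  # 0.8
print(f"Capacity during deploys: {deploy_fraction * full_capacity_rps:.0f} RPS")
```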
What comes next
A capacity plan tells you how much traffic to expect. The next step is verifying your system actually handles that traffic. In Performance testing and load testing, you will learn how to run load tests, stress tests, and soak tests that validate your capacity assumptions with real data.