Progressive delivery
In this series (10 parts)
Prerequisite: Pipeline security and supply chain.
Traditional deployment is binary: the new version is either running or it is not. Progressive delivery replaces that with a gradient. You expose the new version to 1% of traffic, watch the metrics, promote to 10%, watch again, and eventually reach 100%. If something breaks, you roll back before most users notice.
The CI/CD pipeline orchestrates the rollout, monitors health signals, and triggers rollback automatically. See also: deployment strategies for the infrastructure perspective.
Progressive delivery pipeline
```mermaid
graph LR
    BUILD["Build + Test"] --> CANARY["Canary<br/>1-5% traffic"]
    CANARY --> ANALYZE["Analyze Metrics<br/>Error rate, latency"]
    ANALYZE -->|healthy| PROMOTE["Promote<br/>25% then 50% then 100%"]
    ANALYZE -->|unhealthy| ROLLBACK["Automatic Rollback"]
    PROMOTE --> FULL["Full Rollout"]
    style BUILD fill:#3b82f6,color:#fff
    style CANARY fill:#f59e0b,color:#000
    style ANALYZE fill:#8b5cf6,color:#fff
    style PROMOTE fill:#22c55e,color:#000
    style ROLLBACK fill:#ef4444,color:#fff
    style FULL fill:#10b981,color:#000
```
Progressive delivery pipeline with automated promotion gates. Each stage checks health metrics before proceeding.
Feature flags
Feature flags decouple deployment from release. You deploy code containing a new feature, but it is hidden behind a flag. Turning it on is a configuration change, not a deployment.
Why this matters for CI/CD
Without flags, long-lived feature branches accumulate merge conflicts. With flags, developers merge to main daily. The code ships to production in a dormant state and a flag flip activates it when ready.
Unleash (open-source)
```yaml
# docker-compose.yml for local Unleash
services:
  unleash:
    image: unleashorg/unleash-server:latest
    ports:
      - "4242:4242"
    environment:
      DATABASE_URL: postgres://postgres:unleash@db:5432/unleash
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: unleash
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: unleash
    healthcheck: # required: condition: service_healthy needs a healthcheck on db
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
```
SDK integration
```js
// flags.js — shared Unleash client
import { initialize } from "unleash-client";

const unleash = initialize({
  url: process.env.UNLEASH_URL,
  appName: "api-service",
  customHeaders: { Authorization: process.env.UNLEASH_API_TOKEN },
});

export function isEnabled(flagName, context = {}) {
  return unleash.isEnabled(flagName, context);
}
```
```js
import { isEnabled } from "../flags.js";

app.get("/api/search", async (req, res) => {
  if (isEnabled("new-search-algorithm", { userId: req.user.id })) {
    return newSearchHandler(req, res);
  }
  return legacySearchHandler(req, res);
});
```
The userId context enables percentage-based rollouts. Unleash can route 5% of users to the new code path. If errors spike, disable the flag with no redeployment.
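Percentage rollouts stay consistent because the flag system hashes the user ID into a stable bucket, so the same user always lands on the same side of the flag. Unleash normalizes a murmur3 hash internally; the sketch below uses a toy string hash purely to illustrate the mechanism, not Unleash's actual implementation:

```javascript
// Simplified sketch of stable percentage bucketing (illustrative hash,
// not the murmur3 normalization Unleash actually uses).
function bucket(userId, flagName) {
  let hash = 0;
  const input = `${flagName}:${userId}`;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0; // unsigned 32-bit
  }
  return (hash % 100) + 1; // stable value in 1..100
}

// A user is inside a 5% rollout when their bucket is at or below 5.
function inRollout(userId, flagName, percentage) {
  return bucket(userId, flagName) <= percentage;
}
```

Because the bucket depends only on the user ID and flag name, raising the percentage from 5 to 25 keeps the original 5% in the new code path and adds users rather than reshuffling them.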
LaunchDarkly (managed)
LaunchDarkly provides the same capability as a managed service with richer targeting rules and audit logs. The SDK pattern is nearly identical:
```js
import LaunchDarkly from "@launchdarkly/node-server-sdk";

const client = LaunchDarkly.init(process.env.LD_SDK_KEY);

async function isEnabled(flagKey, user) {
  await client.waitForInitialization();
  return client.variation(flagKey, user, false);
}
```
Canary deployments via pipeline
A canary deployment runs the new version alongside the old one, serving a small percentage of traffic to the new version. The pipeline monitors metrics and decides whether to promote or roll back.
Argo Rollouts
Argo Rollouts extends Kubernetes with progressive delivery primitives.
```yaml
# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: api-service-canary
      stableService: api-service-stable
      trafficRouting:
        istio:
          virtualService:
            name: api-service-vsvc
            routes:
              - primary
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service-canary
        - setWeight: 25
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
            args: # the template declares service-name with no default, so every run must supply it
              - name: service-name
                value: api-service-canary
        - setWeight: 50
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service-canary
        - setWeight: 100
```
Analysis template
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      count: 5
      successCondition: result[0] >= 0.99
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"2.."
            }[2m])) /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))
```
The analysis runs every 60 seconds for five measurements. With failureLimit: 2, up to two measurements below 99% are tolerated; once failures exceed that limit, the analysis fails and Argo Rollouts automatically rolls back to the stable version.
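The PromQL query is just a ratio of request rates: 2xx responses over all responses. The same arithmetic in plain code, with illustrative counter values:

```javascript
// Success rate = 2xx responses / all responses, mirroring the PromQL
// ratio in the analysis template. Counter values are illustrative.
function successRate(requestsByStatus) {
  const total = Object.values(requestsByStatus).reduce((a, b) => a + b, 0);
  const ok = Object.entries(requestsByStatus)
    .filter(([status]) => /^2\d\d$/.test(status))
    .reduce((sum, [, count]) => sum + count, 0);
  return ok / total;
}

const sample = { "200": 985, "201": 5, "404": 6, "500": 4 };
console.log(successRate(sample)); // 0.99 — exactly at the successCondition boundary
```

At 990 successful requests out of 1000, the measurement sits right on the `>= 0.99` threshold and passes; one more 5xx in the window would fail it.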
A/B testing as a delivery concern
A/B testing is progressive delivery with a business question attached. Instead of asking “is this version healthy?” you ask “does this version improve conversion?”
The pipeline deploys both variants. A feature flag routes users to variant A or variant B. An analytics pipeline collects conversion data. Statistical significance determines the winner.
```js
// Middleware: assign variant
app.use((req, res, next) => {
  const variant = isEnabled("checkout-redesign", {
    userId: req.user.id,
  });
  req.variant = variant ? "B" : "A";
  res.setHeader("X-Variant", req.variant);
  next();
});
```
The key difference from canary: canary cares about error rates and latency. A/B testing cares about business metrics like click-through rate, conversion, or revenue per session.
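"Statistical significance" here typically means something like a two-proportion z-test on conversion counts per variant. A minimal sketch (the counts are illustrative, and real experimentation platforms handle this analysis for you):

```javascript
// Two-proportion z-test: is variant B's conversion rate significantly
// different from variant A's? Counts below are made up for illustration.
function zScore(convA, totalA, convB, totalB) {
  const pA = convA / totalA;
  const pB = convB / totalB;
  const pooled = (convA + convB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pB - pA) / se;
}

// |z| > 1.96 corresponds to p < 0.05, two-tailed.
const z = zScore(120, 2400, 156, 2400); // A: 5.0% conversion, B: 6.5%
console.log(Math.abs(z) > 1.96 ? "significant" : "keep collecting");
```

With these numbers z is about 2.23, so the difference clears the 0.05 significance bar; with smaller samples the same rate gap would not.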
Automated rollback triggers
Manual rollback requires someone to notice the problem, diagnose it, and act. At 3 AM, that takes too long. Automated rollback based on metrics is faster and more reliable.
Metrics to watch
| Metric | Threshold | Action |
|---|---|---|
| Error rate (5xx) | > 1% for 2 minutes | Rollback |
| P99 latency | > 2x baseline for 5 minutes | Rollback |
| CPU usage | > 90% for 3 minutes | Pause promotion |
| Crash loop restarts | > 3 in 5 minutes | Rollback |
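The table is effectively a decision function. A sketch of how a monitoring loop might encode it (thresholds match the table; the metric field names are assumptions):

```javascript
// Map current canary metrics to a rollout action, per the thresholds above.
// The "for N minutes" durations are assumed to be enforced by the caller,
// e.g. by only passing values that have held for the full window.
function rolloutAction(metrics) {
  if (metrics.errorRate5xx > 0.01) return "rollback";
  if (metrics.p99LatencyMs > 2 * metrics.baselineP99Ms) return "rollback";
  if (metrics.crashRestarts > 3) return "rollback";
  if (metrics.cpuUsage > 0.9) return "pause"; // pause promotion, don't roll back
  return "promote";
}

console.log(rolloutAction({
  errorRate5xx: 0.002,
  p99LatencyMs: 180,
  baselineP99Ms: 150,
  crashRestarts: 0,
  cpuUsage: 0.95,
})); // "pause" — error rate and latency are healthy, but CPU is hot
```

Note the ordering: the rollback conditions are checked before the pause condition, so a canary that is both erroring and CPU-bound still rolls back.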
GitHub Actions rollback
```yaml
monitor-canary:
  runs-on: ubuntu-latest
  needs: deploy-canary
  steps:
    - name: Wait for metrics
      run: sleep 300
    - name: Check error rate
      id: check
      run: |
        ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
          --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m]))/sum(rate(http_requests_total[5m]))' \
          | jq -r '.data.result[0].value[1]')
        echo "error_rate=$ERROR_RATE" >> "$GITHUB_OUTPUT"
        if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
          echo "healthy=false" >> "$GITHUB_OUTPUT"
        else
          echo "healthy=true" >> "$GITHUB_OUTPUT"
        fi
    - name: Rollback if unhealthy
      if: steps.check.outputs.healthy == 'false'
      run: |
        kubectl argo rollouts abort api-service
        echo "Canary aborted due to error rate: ${{ steps.check.outputs.error_rate }}"
        exit 1
    - name: Promote if healthy
      if: steps.check.outputs.healthy == 'true'
      run: kubectl argo rollouts promote api-service
```
Combining the pieces
A mature progressive delivery pipeline uses all four techniques:
- Feature flags gate new functionality at the code level. Developers merge to main daily.
- Canary deployments route a small percentage of traffic to the new version at the infrastructure level.
- Automated analysis compares canary metrics against the stable baseline.
- Automated rollback acts on degraded metrics without human intervention.
The result: you ship faster because the blast radius of any single deployment is small. A bad release affects 5% of users for 5 minutes, not 100% of users for an hour.
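That comparison is easy to quantify in user-minutes of impact, using the figures above:

```javascript
// Impact in "user-minutes": fraction of users affected x minutes of exposure.
const canaryImpact = 0.05 * 5; // 5% of users for 5 minutes
const fullImpact = 1.0 * 60;   // 100% of users for 60 minutes
console.log(canaryImpact, fullImpact, fullImpact / canaryImpact); // 0.25 60 240
```

A 240x reduction in exposure for the same bad release is the core economic argument for progressive delivery.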
What comes next
Progressive delivery relies on pipeline infrastructure. But what happens when the hosted runners that power your pipeline cannot keep up? Self-hosted runners and pipeline scaling covers Kubernetes-based runners, caching strategies, and cost optimization.