Progressive delivery
In this series (10 parts)
Prerequisite: Pipeline security and supply chain.
Traditional deployment is binary: the new version is either running or it is not. Progressive delivery replaces that with a gradient. You expose the new version to 1% of traffic, watch the metrics, promote to 10%, watch again, and eventually reach 100%. If something breaks, you roll back before most users notice.
The CI/CD pipeline orchestrates the rollout, monitors health signals, and triggers rollback automatically. See also: deployment strategies for the infrastructure perspective.
Progressive delivery pipeline
```mermaid
graph LR
    BUILD["Build + Test"] --> CANARY["Canary<br/>1-5% traffic"]
    CANARY --> ANALYZE["Analyze Metrics<br/>Error rate, latency"]
    ANALYZE -->|healthy| PROMOTE["Promote<br/>25% then 50% then 100%"]
    ANALYZE -->|unhealthy| ROLLBACK["Automatic Rollback"]
    PROMOTE --> FULL["Full Rollout"]
    style BUILD fill:#3b82f6,color:#fff
    style CANARY fill:#f59e0b,color:#000
    style ANALYZE fill:#8b5cf6,color:#fff
    style PROMOTE fill:#22c55e,color:#000
    style ROLLBACK fill:#ef4444,color:#fff
    style FULL fill:#10b981,color:#000
```
Progressive delivery pipeline with automated promotion gates. Each stage checks health metrics before proceeding.
Feature flags
Feature flags decouple deployment from release. You deploy code containing a new feature, but it is hidden behind a flag. Turning it on is a configuration change, not a deployment.
Why this matters for CI/CD
Without flags, long-lived feature branches accumulate merge conflicts. With flags, developers merge to main daily. The code ships to production in a dormant state and a flag flip activates it when ready.
Unleash (open-source)
```yaml
# docker-compose.yml for local Unleash
services:
  unleash:
    image: unleashorg/unleash-server:latest
    ports:
      - "4242:4242"
    environment:
      DATABASE_URL: postgres://postgres:unleash@db:5432/unleash
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: unleash
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: unleash
    healthcheck: # required: condition: service_healthy needs a healthcheck on db
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
```
SDK integration
```js
// flags.js — shared Unleash client
import { initialize } from "unleash-client";

const unleash = initialize({
  url: process.env.UNLEASH_URL,
  appName: "api-service",
  customHeaders: { Authorization: process.env.UNLEASH_API_TOKEN },
});

export function isEnabled(flagName, context = {}) {
  return unleash.isEnabled(flagName, context);
}
```
```js
import { isEnabled } from "../flags.js";

app.get("/api/search", async (req, res) => {
  if (isEnabled("new-search-algorithm", { userId: req.user.id })) {
    return newSearchHandler(req, res);
  }
  return legacySearchHandler(req, res);
});
```
The userId context enables percentage-based rollouts. Unleash can route 5% of users to the new code path. If errors spike, disable the flag with no redeployment.
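Percentage rollouts stay consistent because the flag system hashes the user ID into a stable bucket, so the same user always lands on the same side of the flag. Unleash normalizes a murmur3 hash internally; the sketch below uses a toy string hash purely to illustrate the mechanism, not Unleash's actual implementation:

```javascript
// Simplified sketch of stable percentage bucketing (illustrative hash,
// not the murmur3 normalization Unleash actually uses).
function bucket(userId, flagName) {
  let hash = 0;
  const input = `${flagName}:${userId}`;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0; // unsigned 32-bit
  }
  return (hash % 100) + 1; // stable value in 1..100
}

// A user is inside a 5% rollout when their bucket is at or below 5.
function inRollout(userId, flagName, percentage) {
  return bucket(userId, flagName) <= percentage;
}
```

Because the bucket depends only on the user ID and flag name, raising the percentage from 5 to 25 keeps the original 5% in the new code path and adds users rather than reshuffling them.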
LaunchDarkly (managed)
LaunchDarkly provides the same capability as a managed service with richer targeting rules and audit logs. The SDK pattern is nearly identical:
```js
import LaunchDarkly from "@launchdarkly/node-server-sdk";

const client = LaunchDarkly.init(process.env.LD_SDK_KEY);

async function isEnabled(flagKey, user) {
  await client.waitForInitialization();
  return client.variation(flagKey, user, false);
}
```
Canary deployments via pipeline
A canary deployment runs the new version alongside the old one, serving a small percentage of traffic to the new version. The pipeline monitors metrics and decides whether to promote or roll back.
Argo Rollouts
Argo Rollouts extends Kubernetes with progressive delivery primitives.
```yaml
# rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: api-service
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: api-service-canary
      stableService: api-service-stable
      trafficRouting:
        istio:
          virtualService:
            name: api-service-vsvc
            routes:
              - primary
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service-canary
        - setWeight: 25
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
            args: # the template declares service-name with no default, so every run must supply it
              - name: service-name
                value: api-service-canary
        - setWeight: 50
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: api-service-canary
        - setWeight: 100
```
Analysis template
```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 60s
      count: 5
      successCondition: result[0] >= 0.99
      failureLimit: 2
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{
              service="{{args.service-name}}",
              status=~"2.."
            }[2m])) /
            sum(rate(http_requests_total{
              service="{{args.service-name}}"
            }[2m]))
```
The analysis runs every 60 seconds for five measurements. With failureLimit: 2, up to two measurements below 99% are tolerated; once failures exceed that limit, the analysis fails and Argo Rollouts automatically rolls back to the stable version.
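The PromQL query is just a ratio of request rates: 2xx responses over all responses. The same arithmetic in plain code, with illustrative counter values:

```javascript
// Success rate = 2xx responses / all responses, mirroring the PromQL
// ratio in the analysis template. Counter values are illustrative.
function successRate(requestsByStatus) {
  const total = Object.values(requestsByStatus).reduce((a, b) => a + b, 0);
  const ok = Object.entries(requestsByStatus)
    .filter(([status]) => /^2\d\d$/.test(status))
    .reduce((sum, [, count]) => sum + count, 0);
  return ok / total;
}

const sample = { "200": 985, "201": 5, "404": 6, "500": 4 };
console.log(successRate(sample)); // 0.99 — exactly at the successCondition boundary
```

At 990 successful requests out of 1000, the measurement sits right on the `>= 0.99` threshold and passes; one more 5xx in the window would fail it.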
A/B testing as a delivery concern
A/B testing is progressive delivery with a business question attached. Instead of asking “is this version healthy?” you ask “does this version improve conversion?”
The pipeline deploys both variants. A feature flag routes users to variant A or variant B. An analytics pipeline collects conversion data. Statistical significance determines the winner.
```js
// Middleware: assign variant
app.use((req, res, next) => {
  const variant = isEnabled("checkout-redesign", {
    userId: req.user.id,
  });
  req.variant = variant ? "B" : "A";
  res.setHeader("X-Variant", req.variant);
  next();
});
```
The key difference from canary: canary cares about error rates and latency. A/B testing cares about business metrics like click-through rate, conversion, or revenue per session.
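"Statistical significance" here typically means something like a two-proportion z-test on conversion counts per variant. A minimal sketch (the counts are illustrative, and real experimentation platforms handle this analysis for you):

```javascript
// Two-proportion z-test: is variant B's conversion rate significantly
// different from variant A's? Counts below are made up for illustration.
function zScore(convA, totalA, convB, totalB) {
  const pA = convA / totalA;
  const pB = convB / totalB;
  const pooled = (convA + convB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pB - pA) / se;
}

// |z| > 1.96 corresponds to p < 0.05, two-tailed.
const z = zScore(120, 2400, 156, 2400); // A: 5.0% conversion, B: 6.5%
console.log(Math.abs(z) > 1.96 ? "significant" : "keep collecting");
```

With these numbers z is about 2.23, so the difference clears the 0.05 significance bar; with smaller samples the same rate gap would not.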
Automated rollback triggers
Manual rollback requires someone to notice the problem, diagnose it, and act. At 3 AM, that takes too long. Automated rollback based on metrics is faster and more reliable.
Metrics to watch
| Metric | Threshold | Action |
|---|---|---|
| Error rate (5xx) | > 1% for 2 minutes | Rollback |
| P99 latency | > 2x baseline for 5 minutes | Rollback |
| CPU usage | > 90% for 3 minutes | Pause promotion |
| Crash loop restarts | > 3 in 5 minutes | Rollback |
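The table is effectively a decision function. A sketch of how a monitoring loop might encode it (thresholds match the table; the metric field names are assumptions):

```javascript
// Map current canary metrics to a rollout action, per the thresholds above.
// The "for N minutes" durations are assumed to be enforced by the caller,
// e.g. by only passing values that have held for the full window.
function rolloutAction(metrics) {
  if (metrics.errorRate5xx > 0.01) return "rollback";
  if (metrics.p99LatencyMs > 2 * metrics.baselineP99Ms) return "rollback";
  if (metrics.crashRestarts > 3) return "rollback";
  if (metrics.cpuUsage > 0.9) return "pause"; // pause promotion, don't roll back
  return "promote";
}

console.log(rolloutAction({
  errorRate5xx: 0.002,
  p99LatencyMs: 180,
  baselineP99Ms: 150,
  crashRestarts: 0,
  cpuUsage: 0.95,
})); // "pause" — error rate and latency are healthy, but CPU is hot
```

Note the ordering: the rollback conditions are checked before the pause condition, so a canary that is both erroring and CPU-bound still rolls back.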
GitHub Actions rollback
```yaml
monitor-canary:
  runs-on: ubuntu-latest
  needs: deploy-canary
  steps:
    - name: Wait for metrics
      run: sleep 300
    - name: Check error rate
      id: check
      run: |
        ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
          --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m]))/sum(rate(http_requests_total[5m]))' \
          | jq -r '.data.result[0].value[1]')
        echo "error_rate=$ERROR_RATE" >> "$GITHUB_OUTPUT"
        if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
          echo "healthy=false" >> "$GITHUB_OUTPUT"
        else
          echo "healthy=true" >> "$GITHUB_OUTPUT"
        fi
    - name: Rollback if unhealthy
      if: steps.check.outputs.healthy == 'false'
      run: |
        kubectl argo rollouts abort api-service
        echo "Canary aborted due to error rate: ${{ steps.check.outputs.error_rate }}"
        exit 1
    - name: Promote if healthy
      if: steps.check.outputs.healthy == 'true'
      run: kubectl argo rollouts promote api-service
```
Combining the pieces
A mature progressive delivery pipeline uses all four techniques:
- Feature flags gate new functionality at the code level. Developers merge to main daily.
- Canary deployments route a small percentage of traffic to the new version at the infrastructure level.
- Automated analysis compares canary metrics against the stable baseline.
- Automated rollback acts on degraded metrics without human intervention.
The result: you ship faster because the blast radius of any single deployment is small. A bad release affects 5% of users for 5 minutes, not 100% of users for an hour.
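That comparison is easy to quantify in user-minutes of impact, using the figures above:

```javascript
// Impact in "user-minutes": fraction of users affected x minutes of exposure.
const canaryImpact = 0.05 * 5; // 5% of users for 5 minutes
const fullImpact = 1.0 * 60;   // 100% of users for 60 minutes
console.log(canaryImpact, fullImpact, fullImpact / canaryImpact); // 0.25 60 240
```

A 240x reduction in exposure for the same bad release is the core economic argument for progressive delivery.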
What comes next
Progressive delivery relies on pipeline infrastructure. But what happens when the hosted runners that power your pipeline cannot keep up? Self-hosted runners and pipeline scaling covers Kubernetes-based runners, caching strategies, and cost optimization.