Self-hosted runners and pipeline scaling
Prerequisite: Progressive delivery.
GitHub Actions gives you 2,000 free minutes per month on hosted runners. GitLab gives you 400. For a team running 50 pipelines a day with 20-minute builds, those minutes vanish in a week. Self-hosted runners solve three problems: cost, speed, and access to private networks or specialized hardware.
When hosted runners are not enough
Hosted runners hit limits in predictable ways:
- Queue times: hosted runners can take 30-60 seconds to start during peak hours. Self-hosted warm pools start in under 5 seconds.
- Build duration: shared hardware is fine for linting but slow for compiling large projects.
- Network access: your pipeline needs to reach resources inside a private VPC.
- Specialized hardware: ML pipelines need GPUs, iOS builds need macOS.
- Cost at scale: GitHub charges $0.008 per minute for hosted Linux runners, so 100K minutes a month runs $800. Self-hosted runners on reserved instances cost a fraction of that.
Self-hosted runners have a higher base cost but scale better. The crossover is around 20K-30K minutes/month.
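A rough way to locate that crossover for your own numbers. The hosted rate is GitHub's published $0.008/minute for Linux; the self-hosted figure of $0.002/minute is an assumption that depends entirely on your instance mix and utilization:

```shell
# Back-of-envelope CI cost comparison. Rates in tenths of a cent:
# hosted Linux minutes bill at $0.008/min (8); the self-hosted
# rate of $0.002/min (2) is an assumed effective reserved-instance cost.
minutes=25000
hosted_cents=$((minutes * 8 / 10))
self_cents=$((minutes * 2 / 10))
echo "hosted: \$$((hosted_cents / 100)) self-hosted: \$$((self_cents / 100))"
```

At 25K minutes this prints hosted: $200 against self-hosted: $50, before counting the operational cost of running the cluster, which is why the practical crossover sits higher than the raw per-minute math suggests.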
GitHub Actions Runner Controller (ARC)
ARC runs GitHub Actions runners as Kubernetes pods. When a workflow triggers, ARC spins up a pod, runs the job, and tears it down. No persistent VMs to manage.
Installation
helm repo add actions-runner-controller \
https://actions-runner-controller.github.io/actions-runner-controller
helm install arc actions-runner-controller/actions-runner-controller \
--namespace arc-system \
--create-namespace \
--set authSecret.create=true \
--set authSecret.github_token="${GITHUB_PAT}"
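Before creating any runners, confirm the controller came up. The label selector below is an assumption based on standard Helm chart conventions; adjust it to whatever labels your release applied:

```shell
# Verify the ARC controller pod is running
kubectl get pods -n arc-system

# Tail the controller logs while the first runners register
# (label selector assumed from Helm chart conventions)
kubectl logs -n arc-system \
  -l app.kubernetes.io/name=actions-runner-controller -f
```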
Runner deployment
# runner-deployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: ci-runners
namespace: arc-runners
spec:
replicas: 3
template:
spec:
repository: myorg/myrepo
labels:
- self-hosted
- linux
- x64
resources:
limits:
cpu: "4"
memory: 8Gi
requests:
cpu: "2"
memory: 4Gi
dockerdWithinRunnerContainer: true
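Applying the manifest is the usual kubectl flow. ARC registers each pod as a Runner custom resource, so you can watch registration from the cluster side as well as from the repo's Settings > Actions > Runners page:

```shell
# Create the runner namespace and apply the deployment
kubectl create namespace arc-runners
kubectl apply -f runner-deployment.yaml

# All three Runner resources should eventually report Running
kubectl get runners -n arc-runners
```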
Autoscaling
Static replicas waste resources during off-hours. ARC supports horizontal runner autoscaling based on webhook events.
# runner-autoscaler.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: ci-runners-autoscaler
namespace: arc-runners
spec:
scaleTargetRef:
kind: RunnerDeployment
name: ci-runners
minReplicas: 1
maxReplicas: 20
scaleUpTriggers:
- githubEvent:
workflowJob: {}
amount: 1
duration: "10m"
scaleDownDelaySecondsAfterScaleOut: 300
When a workflow job is queued, ARC scales up by one runner. After 5 minutes of inactivity, it scales back down. Minimum one runner stays warm to avoid cold-start delays.
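Webhook-driven scaling requires ARC's webhook server to be reachable from GitHub. Where that is not an option, ARC also supports pull-based scaling on runner utilization; a sketch using the PercentageRunnersBusy metric, with thresholds you would tune to your own traffic:

```yaml
# Pull-based alternative: poll runner busy percentage instead of webhooks
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: ci-runners-autoscaler
  namespace: arc-runners
spec:
  scaleTargetRef:
    kind: RunnerDeployment
    name: ci-runners
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: "0.75"    # add runners when 75% are busy
    scaleDownThreshold: "0.25"  # remove runners when under 25% are busy
    scaleUpFactor: "2"
    scaleDownFactor: "0.5"
```

Pull-based scaling reacts more slowly than webhooks because it polls, so keep minReplicas high enough to absorb bursts.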
GitLab Runner on Kubernetes
GitLab uses a similar pattern with its runner Helm chart.
helm repo add gitlab https://charts.gitlab.io
helm install gitlab-runner gitlab/gitlab-runner \
--namespace gitlab-runners \
--create-namespace \
--set gitlabUrl=https://gitlab.com \
--set runnerRegistrationToken="${GITLAB_TOKEN}" \
--set runners.executor=kubernetes \
--set runners.kubernetes.namespace=gitlab-runners \
--set runners.kubernetes.cpu_limit="4" \
--set runners.kubernetes.memory_limit="8Gi"
GitLab runners use the Kubernetes executor to create a new pod for each job. The pod runs the job and is destroyed.
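Jobs opt into these runners with tags. A sketch assuming the runner was registered with a k8s tag (for example via an extra --set runners.tags="k8s" on the install above); the tag name is arbitrary:

```yaml
# .gitlab-ci.yml
build:
  stage: build
  tags:
    - k8s          # routes the job to the Kubernetes-executor runners
  script:
    - npm ci
    - npm run build
```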
Ephemeral runners
Ephemeral runners are created for a single job and destroyed immediately after. They eliminate state leakage between jobs. No leftover files, no cached credentials, no compromised environment persisting across builds.
GitHub Actions ephemeral mode
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: ephemeral-runners
spec:
template:
spec:
ephemeral: true
repository: myorg/myrepo
resources:
limits:
cpu: "4"
memory: 8Gi
The ephemeral: true flag tells the runner to deregister itself after completing one job. ARC automatically replaces it.
Security benefits
Ephemeral runners prevent three attack vectors:
- Credential persistence: a job that leaks a secret into the filesystem cannot affect the next job because the filesystem is destroyed.
- Build poisoning: an attacker who modifies build tools on the runner cannot affect subsequent builds.
- Lateral movement: the runner pod has a short lifespan, reducing the window for an attacker to pivot.
Caching strategies
Ephemeral runners create a problem: every job starts from scratch. Without caching, you download dependencies on every single run. That is slow and wasteful.
Layer 1: GitHub Actions cache
- uses: actions/cache@v4
with:
path: |
~/.npm
node_modules
key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
npm-${{ runner.os }}-
The cache is stored externally, in GitHub's cache service by default, or in an S3-compatible bucket if you run a self-hosted cache backend. The runner downloads it at the start of the job. This is fast but adds network transfer time.
Layer 2: Docker layer caching
For container builds, cache Docker layers to avoid rebuilding unchanged layers.
- uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ghcr.io/myorg/app:${{ github.sha }}
cache-from: type=registry,ref=ghcr.io/myorg/app:buildcache
cache-to: type=registry,ref=ghcr.io/myorg/app:buildcache,mode=max
Registry-based caching stores layers in the container registry. Any runner can pull them. No shared filesystem needed.
Layer 3: Shared PersistentVolume
For large dependencies that take too long to download every time, mount a shared PersistentVolume.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: cached-runners
spec:
template:
spec:
repository: myorg/myrepo
volumeMounts:
- name: shared-cache
mountPath: /home/runner/.cache
volumes:
- name: shared-cache
persistentVolumeClaim:
claimName: runner-cache-pvc
This is faster than downloading from S3 on every run but requires a ReadWriteMany PVC (like EFS on AWS or Filestore on GCP).
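The claim referenced above has to be ReadWriteMany so several runner pods can mount it at once. A sketch of the PVC, assuming an EFS-backed storage class named efs-sc; the class name varies per cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-cache-pvc
  namespace: arc-runners
spec:
  accessModes:
    - ReadWriteMany           # required: many runner pods mount it concurrently
  storageClassName: efs-sc    # assumption: an EFS-backed class on AWS
  resources:
    requests:
      storage: 50Gi
```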
Pipeline cost optimization
Self-hosted runners are cheaper at scale but still cost money. Optimize with these strategies.
Right-size your runners
Not every job needs 4 CPUs and 8 GB RAM. Linting needs 1 CPU. Unit tests need 2. Only container builds and E2E tests need the full allocation.
# Use runner labels to match jobs to appropriately sized runners
jobs:
lint:
runs-on: [self-hosted, small]
steps:
- run: npm run lint
build:
runs-on: [self-hosted, large]
steps:
- run: docker build .
Create multiple RunnerDeployments with different resource limits and labels.
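A sketch of the small tier; the large tier is the ci-runners deployment shown earlier with its 4-CPU limit:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: small-runners
  namespace: arc-runners
spec:
  replicas: 2
  template:
    spec:
      repository: myorg/myrepo
      labels:
        - self-hosted
        - small        # matched by runs-on: [self-hosted, small]
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "500m"
          memory: 1Gi
```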
Use spot instances
Spot instances cost 60-90% less than on-demand. CI jobs are perfect for spot because they are short-lived, stateless, and can tolerate interruption (the job just restarts).
# Karpenter provisioner for spot runners
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: ci-runners
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["m5.xlarge", "m5a.xlarge", "m6i.xlarge"]
limits:
resources:
cpu: "64"
memory: 128Gi
ttlSecondsAfterEmpty: 60
Karpenter provisions spot nodes when runners need capacity and terminates them 60 seconds after they are empty.
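The provisioner alone does not route runner pods onto those nodes. One way to connect the two, assuming a workload: ci-runner label of your choosing, is to have the provisioner label the nodes it creates and pin the runner pods to that label:

```yaml
# Excerpts only. First, have the Provisioner label the spot nodes it creates:
spec:
  labels:
    workload: ci-runner
---
# Then pin the runner pods to those nodes in the RunnerDeployment template:
spec:
  template:
    spec:
      nodeSelector:
        workload: ci-runner
```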
Schedule scale-down
If your team works 9-to-5, scale down runners outside business hours.
apiVersion: batch/v1
kind: CronJob
metadata:
name: scale-down-runners
spec:
schedule: "0 18 * * 1-5"
jobTemplate:
spec:
template:
spec:
containers:
- name: scaler
image: bitnami/kubectl:latest
command:
- kubectl
- patch
- runnerdeployment
- ci-runners
- -n
- arc-runners
- --type=merge
- -p
- '{"spec":{"replicas":1}}'
restartPolicy: OnFailure
A matching CronJob at 8 AM scales back up.
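The scale-up half differs only in its schedule and the replica count it patches in:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-runners
spec:
  schedule: "0 8 * * 1-5"   # weekday mornings
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - patch
            - runnerdeployment
            - ci-runners
            - -n
            - arc-runners
            - --type=merge
            - -p
            - '{"spec":{"replicas":3}}'
          restartPolicy: OnFailure
```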
Monitoring runner health
Runners that are running but unhealthy waste pipeline time. Monitor these metrics:
- Job queue time: how long jobs wait for a runner. If this exceeds 30 seconds consistently, scale up.
- Job duration by runner type: compare self-hosted vs hosted to confirm you are getting the speed benefit.
- Runner utilization: runners sitting idle cost money. Aim for 60-80% utilization during business hours.
- Failed jobs due to runner issues: out-of-memory kills, disk space exhaustion, Docker daemon crashes.
Export these metrics from ARC to Prometheus and build a Grafana dashboard. The dashboard answers one question: are we spending the right amount on CI infrastructure?
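As a sketch, a queue-time alert might look like the following. The metric name ci_job_queue_seconds is hypothetical; the actual names depend on which exporter you deploy, so substitute your own:

```yaml
# Prometheus alerting rule (metric name is hypothetical)
groups:
- name: ci-runners
  rules:
  - alert: RunnerQueueTimeHigh
    expr: histogram_quantile(0.95, rate(ci_job_queue_seconds_bucket[10m])) > 30
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "p95 job queue time above 30s; consider raising maxReplicas"
```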
Key takeaways
- Switch to self-hosted runners when you exceed 20K-30K minutes per month or need private network access.
- Use Kubernetes-based runners (ARC or GitLab Runner) for automatic scaling.
- Make runners ephemeral for security. Cache dependencies externally to recover speed.
- Right-size runners, use spot instances, and schedule scale-down to control costs.
- Monitor queue times and utilization to avoid both over-provisioning and under-provisioning.
What comes next
This article concludes the CI/CD Pipelines series. You now have a complete picture: pipeline anatomy, testing strategy, artifact management, security, progressive delivery, and infrastructure scaling. The next step is applying these patterns to your own projects and iterating as your team and codebase grow.