Self-hosted runners and pipeline scaling
Prerequisite: Progressive delivery.
GitHub Actions gives you 2,000 free minutes per month on hosted runners. GitLab gives you 400. For a team running 50 pipelines a day with 20-minute builds, those minutes vanish in a week. Self-hosted runners solve three problems: cost, speed, and access to private networks or specialized hardware.
When hosted runners are not enough
Hosted runners hit limits in predictable ways:
- Queue times: hosted runners can take 30-60 seconds to start during peak hours. Self-hosted warm pools start in under 5 seconds.
- Build duration: shared hardware is fine for linting but slow for compiling large projects.
- Network access: your pipeline needs to reach resources inside a private VPC.
- Specialized hardware: ML pipelines need GPUs, iOS builds need macOS.
- Cost at scale: GitHub charges $0.008 per minute for hosted Linux runners, so 100K minutes a month runs $800. Self-hosted runners on reserved instances cost a fraction of that.
Self-hosted runners have a higher base cost but scale better. The crossover is around 20K-30K minutes/month.
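A rough way to locate that crossover for your own numbers. The hosted rate is GitHub's published $0.008/minute for Linux; the self-hosted figure of $0.002/minute is an assumption that depends entirely on your instance mix and utilization:

```shell
# Back-of-envelope CI cost comparison. Rates in tenths of a cent:
# hosted Linux minutes bill at $0.008/min (8); the self-hosted
# rate of $0.002/min (2) is an assumed effective reserved-instance cost.
minutes=25000
hosted_cents=$((minutes * 8 / 10))
self_cents=$((minutes * 2 / 10))
echo "hosted: \$$((hosted_cents / 100)) self-hosted: \$$((self_cents / 100))"
```

At 25K minutes this prints hosted: $200 against self-hosted: $50, before counting the operational cost of running the cluster, which is why the practical crossover sits higher than the raw per-minute math suggests.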
GitHub Actions Runner Controller (ARC)
ARC runs GitHub Actions runners as Kubernetes pods. When a workflow triggers, ARC spins up a pod, runs the job, and tears it down. No persistent VMs to manage.
Installation
helm repo add actions-runner-controller \
https://actions-runner-controller.github.io/actions-runner-controller
helm install arc actions-runner-controller/actions-runner-controller \
--namespace arc-system \
--create-namespace \
--set authSecret.create=true \
--set authSecret.github_token="${GITHUB_PAT}"
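Before creating any runners, confirm the controller came up. The label selector below is an assumption based on standard Helm chart conventions; adjust it to whatever labels your release applied:

```shell
# Verify the ARC controller pod is running
kubectl get pods -n arc-system

# Tail the controller logs while the first runners register
# (label selector assumed from Helm chart conventions)
kubectl logs -n arc-system \
  -l app.kubernetes.io/name=actions-runner-controller -f
```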
Runner deployment
# runner-deployment.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: ci-runners
namespace: arc-runners
spec:
replicas: 3
template:
spec:
repository: myorg/myrepo
labels:
- self-hosted
- linux
- x64
resources:
limits:
cpu: "4"
memory: 8Gi
requests:
cpu: "2"
memory: 4Gi
dockerdWithinRunnerContainer: true
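Applying the manifest is the usual kubectl flow. ARC registers each pod as a Runner custom resource, so you can watch registration from the cluster side as well as from the repo's Settings > Actions > Runners page:

```shell
# Create the runner namespace and apply the deployment
kubectl create namespace arc-runners
kubectl apply -f runner-deployment.yaml

# All three Runner resources should eventually report Running
kubectl get runners -n arc-runners
```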
Autoscaling
Static replicas waste resources during off-hours. ARC supports horizontal runner autoscaling based on webhook events.
# runner-autoscaler.yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
name: ci-runners-autoscaler
namespace: arc-runners
spec:
scaleTargetRef:
kind: RunnerDeployment
name: ci-runners
minReplicas: 1
maxReplicas: 20
scaleUpTriggers:
- githubEvent:
workflowJob: {}
amount: 1
duration: "10m"
scaleDownDelaySecondsAfterScaleOut: 300
When a workflow job is queued, ARC scales up by one runner. After 5 minutes of inactivity, it scales back down. Minimum one runner stays warm to avoid cold-start delays.
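Webhook-driven scaling requires ARC's webhook server to be reachable from GitHub. Where that is not an option, ARC also supports pull-based scaling on runner utilization; a sketch using the PercentageRunnersBusy metric, with thresholds you would tune to your own traffic:

```yaml
# Pull-based alternative: poll runner busy percentage instead of webhooks
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: ci-runners-autoscaler
  namespace: arc-runners
spec:
  scaleTargetRef:
    kind: RunnerDeployment
    name: ci-runners
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: "0.75"    # add runners when 75% are busy
    scaleDownThreshold: "0.25"  # remove runners when under 25% are busy
    scaleUpFactor: "2"
    scaleDownFactor: "0.5"
```

Pull-based scaling reacts more slowly than webhooks because it polls, so keep minReplicas high enough to absorb bursts.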
GitLab Runner on Kubernetes
GitLab uses a similar pattern with its runner Helm chart.
helm repo add gitlab https://charts.gitlab.io
helm install gitlab-runner gitlab/gitlab-runner \
--namespace gitlab-runners \
--create-namespace \
--set gitlabUrl=https://gitlab.com \
--set runnerRegistrationToken="${GITLAB_TOKEN}" \
--set runners.executor=kubernetes \
--set runners.kubernetes.namespace=gitlab-runners \
--set runners.kubernetes.cpu_limit="4" \
--set runners.kubernetes.memory_limit="8Gi"
GitLab runners use the Kubernetes executor to create a new pod for each job. The pod runs the job and is destroyed.
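Jobs opt into these runners with tags. A sketch assuming the runner was registered with a k8s tag (for example via an extra --set runners.tags="k8s" on the install above); the tag name is arbitrary:

```yaml
# .gitlab-ci.yml
build:
  stage: build
  tags:
    - k8s          # routes the job to the Kubernetes-executor runners
  script:
    - npm ci
    - npm run build
```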
Ephemeral runners
Ephemeral runners are created for a single job and destroyed immediately after. They eliminate state leakage between jobs. No leftover files, no cached credentials, no compromised environment persisting across builds.
GitHub Actions ephemeral mode
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: ephemeral-runners
spec:
template:
spec:
ephemeral: true
repository: myorg/myrepo
resources:
limits:
cpu: "4"
memory: 8Gi
The ephemeral: true flag tells the runner to deregister itself after completing one job. ARC automatically replaces it.
Security benefits
Ephemeral runners prevent three attack vectors:
- Credential persistence: a job that leaks a secret into the filesystem cannot affect the next job because the filesystem is destroyed.
- Build poisoning: an attacker who modifies build tools on the runner cannot affect subsequent builds.
- Lateral movement: the runner pod has a short lifespan, reducing the window for an attacker to pivot.
Caching strategies
Ephemeral runners create a problem: every job starts from scratch. Without caching, you download dependencies on every single run. That is slow and wasteful.
Layer 1: GitHub Actions cache
- uses: actions/cache@v4
with:
path: |
~/.npm
node_modules
key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
npm-${{ runner.os }}-
The cache is stored externally, in GitHub's cache service by default, or in an S3-compatible bucket if you run a self-hosted cache backend. The runner downloads it at the start of the job. This is fast but adds network transfer time.
Layer 2: Docker layer caching
For container builds, cache Docker layers to avoid rebuilding unchanged layers.
- uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ghcr.io/myorg/app:${{ github.sha }}
cache-from: type=registry,ref=ghcr.io/myorg/app:buildcache
cache-to: type=registry,ref=ghcr.io/myorg/app:buildcache,mode=max
Registry-based caching stores layers in the container registry. Any runner can pull them. No shared filesystem needed.
Layer 3: Shared PersistentVolume
For large dependencies that take too long to download every time, mount a shared PersistentVolume.
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
name: cached-runners
spec:
template:
spec:
repository: myorg/myrepo
volumeMounts:
- name: shared-cache
mountPath: /home/runner/.cache
volumes:
- name: shared-cache
persistentVolumeClaim:
claimName: runner-cache-pvc
This is faster than downloading from S3 on every run but requires a ReadWriteMany PVC (like EFS on AWS or Filestore on GCP).
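The claim referenced above has to be ReadWriteMany so several runner pods can mount it at once. A sketch of the PVC, assuming an EFS-backed storage class named efs-sc; the class name varies per cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: runner-cache-pvc
  namespace: arc-runners
spec:
  accessModes:
    - ReadWriteMany           # required: many runner pods mount it concurrently
  storageClassName: efs-sc    # assumption: an EFS-backed class on AWS
  resources:
    requests:
      storage: 50Gi
```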
Pipeline cost optimization
Self-hosted runners are cheaper at scale but still cost money. Optimize with these strategies.
Right-size your runners
Not every job needs 4 CPUs and 8 GB RAM. Linting needs 1 CPU. Unit tests need 2. Only container builds and E2E tests need the full allocation.
# Use runner labels to match jobs to appropriately sized runners
jobs:
lint:
runs-on: [self-hosted, small]
steps:
- run: npm run lint
build:
runs-on: [self-hosted, large]
steps:
- run: docker build .
Create multiple RunnerDeployments with different resource limits and labels.
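A sketch of the small tier; the large tier is the ci-runners deployment shown earlier with its 4-CPU limit:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: small-runners
  namespace: arc-runners
spec:
  replicas: 2
  template:
    spec:
      repository: myorg/myrepo
      labels:
        - self-hosted
        - small        # matched by runs-on: [self-hosted, small]
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: "500m"
          memory: 1Gi
```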
Use spot instances
Spot instances cost 60-90% less than on-demand. CI jobs are perfect for spot because they are short-lived, stateless, and can tolerate interruption (the job just restarts).
# Karpenter provisioner for spot runners
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: ci-runners
spec:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["m5.xlarge", "m5a.xlarge", "m6i.xlarge"]
limits:
resources:
cpu: "64"
memory: 128Gi
ttlSecondsAfterEmpty: 60
Karpenter provisions spot nodes when runners need capacity and terminates them 60 seconds after they are empty.
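The provisioner alone does not route runner pods onto those nodes. One way to connect the two, assuming a workload: ci-runner label of your choosing, is to have the provisioner label the nodes it creates and pin the runner pods to that label:

```yaml
# Excerpts only. First, have the Provisioner label the spot nodes it creates:
spec:
  labels:
    workload: ci-runner
---
# Then pin the runner pods to those nodes in the RunnerDeployment template:
spec:
  template:
    spec:
      nodeSelector:
        workload: ci-runner
```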
Schedule scale-down
If your team works 9-to-5, scale down runners outside business hours.
apiVersion: batch/v1
kind: CronJob
metadata:
name: scale-down-runners
spec:
schedule: "0 18 * * 1-5"
jobTemplate:
spec:
template:
spec:
containers:
- name: scaler
image: bitnami/kubectl:latest
command:
- kubectl
- patch
- runnerdeployment
- ci-runners
- -n
- arc-runners
- --type=merge
- -p
- '{"spec":{"replicas":1}}'
restartPolicy: OnFailure
A matching CronJob at 8 AM scales back up.
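The scale-up half differs only in its schedule and the replica count it patches in:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-runners
spec:
  schedule: "0 8 * * 1-5"   # weekday mornings
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: scaler
            image: bitnami/kubectl:latest
            command:
            - kubectl
            - patch
            - runnerdeployment
            - ci-runners
            - -n
            - arc-runners
            - --type=merge
            - -p
            - '{"spec":{"replicas":3}}'
          restartPolicy: OnFailure
```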
Monitoring runner health
Runners that are running but unhealthy waste pipeline time. Monitor these metrics:
- Job queue time: how long jobs wait for a runner. If this exceeds 30 seconds consistently, scale up.
- Job duration by runner type: compare self-hosted vs hosted to confirm you are getting the speed benefit.
- Runner utilization: runners sitting idle cost money. Aim for 60-80% utilization during business hours.
- Failed jobs due to runner issues: out-of-memory kills, disk space exhaustion, Docker daemon crashes.
Export these metrics from ARC to Prometheus and build a Grafana dashboard. The dashboard answers one question: are we spending the right amount on CI infrastructure?
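As a sketch, a queue-time alert might look like the following. The metric name ci_job_queue_seconds is hypothetical; the actual names depend on which exporter you deploy, so substitute your own:

```yaml
# Prometheus alerting rule (metric name is hypothetical)
groups:
- name: ci-runners
  rules:
  - alert: RunnerQueueTimeHigh
    expr: histogram_quantile(0.95, rate(ci_job_queue_seconds_bucket[10m])) > 30
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "p95 job queue time above 30s; consider raising maxReplicas"
```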
Key takeaways
- Switch to self-hosted runners when you exceed 20K-30K minutes per month or need private network access.
- Use Kubernetes-based runners (ARC or GitLab Runner) for automatic scaling.
- Make runners ephemeral for security. Cache dependencies externally to recover speed.
- Right-size runners, use spot instances, and schedule scale-down to control costs.
- Monitor queue times and utilization to avoid both over-provisioning and under-provisioning.
What comes next
This article concludes the CI/CD Pipelines series. You now have a complete picture: pipeline anatomy, testing strategy, artifact management, security, progressive delivery, and infrastructure scaling. The next step is applying these patterns to your own projects and iterating as your team and codebase grow.