Pipeline anatomy and design
Every CI/CD system uses different syntax, but they all share the same building blocks. Learn the concepts once and you can read any pipeline configuration file, whether it is a GitHub Actions workflow, a GitLab CI YAML, or a Jenkinsfile. This article covers those building blocks and the design decisions behind them.
Pipeline as code
Modern CI/CD systems define pipelines in version-controlled files that live alongside the application code. This is “pipeline as code.” The alternative, configuring pipelines through a web UI, creates problems:
- Changes to the pipeline are not tracked in version control.
- You cannot review pipeline changes in a pull request.
- Reproducing a pipeline from a previous point in time is impossible.
- Different branches cannot have different pipeline configurations.
Pipeline as code solves all of these. Your pipeline file is a first-class part of the repository. It gets reviewed, versioned, and branched just like application code.
Common filenames:
| System | File |
|---|---|
| GitHub Actions | .github/workflows/*.yml |
| GitLab CI | .gitlab-ci.yml |
| Jenkins | Jenkinsfile |
| CircleCI | .circleci/config.yml |
| Azure Pipelines | azure-pipelines.yml |
Triggers
A trigger defines when a pipeline runs. Most pipelines trigger on code changes, but that is not the only option.
Push triggers run the pipeline when code is pushed to a specific branch. This is the most common trigger and the backbone of CI.
Pull request triggers run the pipeline when a PR is opened or updated. These let you validate changes before they reach main.
Schedule triggers run the pipeline on a cron schedule. Useful for nightly builds, dependency update checks, or long-running test suites that would slow down every push.
Manual triggers let a human start the pipeline on demand. Common for production deploys in continuous delivery setups.
Tag triggers run when a Git tag is pushed. Often used for release pipelines that build and publish versioned artifacts.
Event triggers respond to external events: a new container image, a webhook from another service, or a completed pipeline in a different repository.
Good pipeline design uses multiple triggers. A typical setup:
- On push to main: run the full pipeline, deploy to staging.
- On pull request: run lint, tests, and build. Skip deploy.
- On tag: run the full pipeline, deploy to production.
- On schedule (nightly): run the full pipeline plus long-running integration tests.
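The setup above might look like this as a GitHub Actions trigger block (the branch name, tag pattern, and cron schedule are illustrative):

```yaml
on:
  push:
    branches: [main]   # full pipeline, deploy to staging
    tags: ['v*']       # release pipeline, deploy to production
  pull_request:
    branches: [main]   # lint, tests, build; no deploy
  schedule:
    - cron: '0 2 * * *'  # nightly: full pipeline plus long-running tests
  workflow_dispatch: {}  # manual trigger for on-demand runs
```

Individual jobs then use conditions (for example, checking whether the run was triggered by a tag) to decide whether to deploy.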
Stages, jobs, and steps
These three levels of hierarchy organize the work in a pipeline.
Stages group jobs that serve the same purpose. Common stages: build, test, deploy. Jobs in the same stage may run in parallel. A stage starts only after the previous stage completes successfully.
Jobs are the unit of execution. Each job runs on a separate machine (or container). Jobs in the same stage are independent by default. Whether one job's failure stops the other jobs in its stage depends on the configuration.
Steps are the individual commands within a job. They run sequentially inside the job’s environment. A step might check out code, install dependencies, run a test suite, or upload an artifact.
```mermaid
graph TD
    subgraph Build Stage
        A[build-app<br/>Compile and package]
    end
    subgraph Test Stage
        B[unit-tests<br/>Run unit tests]
        C[integration-tests<br/>Run integration tests]
        D[lint<br/>Run linters]
    end
    subgraph Security Stage
        E[dependency-scan<br/>Check for CVEs]
        F[sast<br/>Static analysis]
    end
    subgraph Deploy Stage
        G[deploy-staging<br/>Deploy to staging]
    end
    subgraph Verify Stage
        H[smoke-tests<br/>Run smoke tests]
    end
    subgraph Release Stage
        I[deploy-production<br/>Deploy to prod]
    end
    A --> B
    A --> C
    A --> D
    A --> F
    B --> E
    C --> E
    D --> E
    E --> G
    F --> G
    G --> H
    H --> I
```
A realistic multi-stage pipeline. The test stage runs three jobs in parallel. The deploy stage waits for all tests and security checks to pass.
Artifacts
An artifact is a file or set of files produced by one job and consumed by another. The most common artifact is a build output: a compiled binary, a Docker image, a JavaScript bundle, or a JAR file.
Artifacts serve two purposes:
- Passing data between jobs. The build job produces a binary. The deploy job needs that binary. Without artifacts, the deploy job would need to rebuild from source.
- Preserving evidence. Test reports, coverage data, and security scan results are artifacts. They let you inspect what happened after the pipeline finishes.
Artifact design rules:
- Build once, deploy everywhere. The same artifact goes to staging and production. Never rebuild for a different environment.
- Keep artifacts small. Upload only what downstream jobs need. A 2 GB artifact that includes `node_modules` wastes storage and slows transfers.
- Set expiration policies. Most CI systems let you configure how long artifacts are retained. Test reports from six months ago are rarely useful.
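In GitHub Actions, passing a build output between jobs is an explicit upload/download pair. A sketch, assuming the build writes to `dist/` and a `deploy.sh` script exists:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: app-bundle
          path: dist/          # only the build output, not node_modules
          retention-days: 7    # expiration policy
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: app-bundle
          path: dist/
      - run: ./deploy.sh dist/  # deploys the downloaded artifact, no rebuild
```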
Caching
Caching stores files between pipeline runs to avoid repeating expensive work. The most common use case: dependency installation.
Without caching, every pipeline run downloads and installs all dependencies from scratch. For a Node.js project with 800 packages, that might take two minutes. For a Java project with Maven dependencies, it could be five minutes. Multiply that by dozens of pipeline runs per day and the wasted time adds up.
A cache entry has a key (usually based on the lock file hash) and a path (the directory to cache). When the lock file changes, the cache key changes and a fresh install happens.
```yaml
# Example: caching node_modules based on package-lock.json (GitLab CI syntax)
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
```
Caching pitfalls:
- Stale caches. If your cache key does not change when it should, you get stale dependencies. Always base the key on the lock file, not a static string.
- Cache poisoning. A bad build can write corrupt data to the cache. Subsequent runs inherit the corruption. If builds start failing mysteriously, clear the cache.
- Cache vs artifact confusion. Caches speed up repeated work within the same job across runs. Artifacts pass data between different jobs within the same run. They are not interchangeable.
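The same idea in GitHub Actions syntax, as a sketch. This variant caches the npm download cache (`~/.npm`) rather than `node_modules`, a common choice because the download cache survives Node version changes; the key is still derived from the lock file hash:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-   # fall back to a partial match, then npm ci refreshes it
```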
Parallelism
Parallelism is the most effective way to make a pipeline faster. There are three levels:
Job-level parallelism. Independent jobs run at the same time. Lint, unit tests, and integration tests can all start in parallel after the build finishes.
Step-level parallelism. Some CI systems let you run steps within a job concurrently. This is less common and harder to manage because steps share the same filesystem.
Test splitting. A single test suite runs across multiple machines. Each machine runs a subset of the tests. If you have 600 tests that take 10 minutes on one machine, splitting them across 4 machines brings the time down to about 3 minutes (accounting for overhead).
Parallelism gives diminishing returns due to overhead from splitting, artifact transfer, and uneven test distribution.
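Test splitting is often expressed as a matrix of shards. A sketch in GitHub Actions, assuming a test runner that accepts a shard argument (Jest and Playwright both support `--shard=N/M`):

```yaml
test:
  runs-on: ubuntu-latest
  strategy:
    matrix:
      shard: [1, 2, 3, 4]   # four machines, each runs a quarter of the suite
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
    - run: npx jest --shard=${{ matrix.shard }}/4
```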
Designing for parallelism
To benefit from parallelism, your pipeline needs to be structured correctly:
- Minimize dependencies between jobs. Every dependency creates a bottleneck. If job B depends on job A, B cannot start until A finishes.
- Keep the critical path short. The critical path is the longest chain of dependent jobs. Identify it and optimize those jobs first.
- Balance parallel jobs. If you split tests across 4 runners but one runner gets 60% of the slow tests, the overall time is limited by that runner.
Dependencies between jobs
Jobs often need to run in a specific order. A deploy job should not start until tests pass. A security scan might need the build artifact.
Dependencies create a directed acyclic graph (DAG). Most CI systems let you define these dependencies explicitly:
```yaml
# GitHub Actions example
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run build
  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test
  deploy:
    needs: [test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
```
```yaml
# GitLab CI example
stages:
  - build
  - test
  - deploy

build-app:
  stage: build
  script:
    - npm run build

run-tests:
  stage: test
  script:
    - npm test

deploy-staging:
  stage: deploy
  script:
    - ./deploy.sh
```
The DAG approach (explicit needs) is more flexible than the stage approach (implicit ordering). With stages, all jobs in stage 2 wait for all jobs in stage 1. With a DAG, each job waits only for the specific jobs it depends on. This allows more parallelism.
Pipeline design patterns
Fan-out, fan-in
One job triggers multiple parallel jobs. A later job waits for all of them to complete. The build job fans out to unit tests, integration tests, and linting. The deploy job fans in, waiting for all three.
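Expressed with `needs` in GitHub Actions (job names illustrative, step bodies elided to placeholders):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build"
  unit-tests:            # fan-out: three jobs depend only on build
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "unit tests"
  integration-tests:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "integration tests"
  lint:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - run: echo "lint"
  deploy:
    needs: [unit-tests, integration-tests, lint]  # fan-in: waits for all three
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploy"
```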
Matrix builds
Run the same job with different configurations. Test against Node 18, 20, and 22. Test on Ubuntu and macOS. A matrix of 3 versions and 2 operating systems produces 6 parallel jobs from one definition.
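The 3 × 2 matrix described above, sketched in GitHub Actions:

```yaml
test:
  runs-on: ${{ matrix.os }}
  strategy:
    matrix:
      node: [18, 20, 22]
      os: [ubuntu-latest, macos-latest]   # 3 versions × 2 OSes = 6 jobs
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node }}
    - run: npm test
```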
Conditional stages
Skip stages based on conditions. If the change only affects documentation, skip the build and test stages. If the branch is not main, skip the production deploy. Conditional stages keep pipelines fast for changes that do not need the full treatment.
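Two common conditional forms in GitHub Actions, as a sketch: a path filter that skips the whole workflow for docs-only changes, and an `if:` condition that gates the production deploy on the branch:

```yaml
on:
  push:
    paths-ignore:
      - 'docs/**'
      - '**.md'      # docs-only pushes do not trigger the pipeline

jobs:
  deploy-production:
    if: github.ref == 'refs/heads/main'   # skipped on every other branch
    runs-on: ubuntu-latest
    steps:
      - run: ./deploy.sh   # illustrative deploy script
```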
Environment promotion
Deploy to staging first. Run smoke tests. If they pass, promote the same artifact to production. This pattern ensures you never deploy an untested artifact to production. The key word is “same.” Do not rebuild for production. Promote the exact artifact that passed staging tests.
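A sketch of the promotion chain in GitHub Actions. Both deploy jobs download the same `app-bundle` artifact produced by an earlier build job; the job names, scripts, and environment names are illustrative, and the `production` environment can be configured to require manual approval:

```yaml
deploy-staging:
  needs: build
  environment: staging
  runs-on: ubuntu-latest
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: app-bundle
    - run: ./deploy.sh staging

smoke-tests:
  needs: deploy-staging
  runs-on: ubuntu-latest
  steps:
    - run: ./smoke-tests.sh staging

deploy-production:
  needs: smoke-tests
  environment: production        # same artifact, no rebuild
  runs-on: ubuntu-latest
  steps:
    - uses: actions/download-artifact@v4
      with:
        name: app-bundle
    - run: ./deploy.sh production
```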
Pipeline anti-patterns
The monolith pipeline. One pipeline that does everything for every change. Documentation changes trigger a full build, test, and deploy cycle. Use conditional logic to skip irrelevant stages.
The flaky test treadmill. Tests that fail randomly. The team adds retry logic instead of fixing the root cause. Over time, the pipeline becomes unreliable and everyone stops trusting it.
The snowflake pipeline. Each microservice has a completely different pipeline structure. Maintenance becomes a nightmare. Create shared templates or reusable workflows to keep pipelines consistent.
The slow monolith. A pipeline that takes 30+ minutes because everything runs sequentially. Parallelize tests, cache dependencies, and consider splitting the pipeline into stages that can run concurrently.
What comes next
The next three articles go deep on specific CI/CD platforms. GitHub Actions in depth covers workflow files, matrix builds, reusable workflows, and a complete Node.js pipeline. GitLab CI/CD in depth covers .gitlab-ci.yml, runners, and review apps. Jenkins fundamentals covers Jenkinsfile syntax, shared libraries, and the controller/agent architecture. Pick the one your team uses, or read all three to compare.