Testing in CI pipelines
Prerequisite: CI/CD pipeline anatomy.
A CI pipeline without a deliberate testing strategy is just a slow build that occasionally turns red. This article maps the test pyramid to pipeline stages, adds parallelism, tackles flaky tests, and introduces contract testing with Pact.
The test pyramid
Fast, cheap tests form the base. Slow, expensive tests sit at the top. You want many more of the former.
graph TB
  E2E["E2E Tests<br/>Few, slow, brittle<br/>Deploy stage"]
  INT["Integration Tests<br/>Moderate count<br/>Build stage"]
  UNIT["Unit Tests<br/>Many, fast, isolated<br/>Commit stage"]
  E2E --- INT
  INT --- UNIT
  style UNIT fill:#22c55e,color:#000
  style INT fill:#facc15,color:#000
  style E2E fill:#ef4444,color:#fff
Unit tests run first; E2E tests run last.
Unit tests in the commit stage
Unit tests cover pure functions and business logic in isolation. No database, no network. They belong in the first pipeline stage.
# .github/workflows/ci.yml (excerpt)
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npm run test:unit -- --coverage
If unit tests fail, the pipeline stops immediately: every later job depends on this one through needs, so none of them start.
Integration tests in the build stage
Integration tests verify that components work together: a service calling a real database, an API handler hitting a real cache.
  integration-tests:
    needs: unit-tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env: { POSTGRES_PASSWORD: testpass, POSTGRES_DB: testdb }
        ports: ["5432:5432"]
        # Hold the steps until Postgres accepts connections
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npm run test:integration
        env:
          DATABASE_URL: postgres://postgres:testpass@localhost:5432/testdb
E2E tests in the deploy stage
E2E tests exercise the full stack via Playwright or Cypress against a staging environment. Keep the count low and focused on critical journeys: login, checkout, search.
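  e2e-tests:
    needs: integration-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pin Node to the same version as the other jobs
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npm run test:e2e
        env:
          BASE_URL: https://staging.example.com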
Test parallelism
Most test frameworks can split a suite into shards, and a GitHub Actions matrix strategy runs one job per shard.
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      # "run" makes the single non-watch run explicit; each job takes one of the 4 shards
      - run: npx vitest run --shard=${{ matrix.shard }}/4
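Four shards run simultaneously, so 12 minutes of test time drops to roughly 3 minutes per shard. Each shard still repeats checkout and dependency installation, so the wall-clock gain is somewhat smaller than a clean factor of four.
Diminishing returns start around 8 shards for most projects, because that fixed setup cost begins to dominate the run.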
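To make the diminishing returns concrete, here is a rough model. It assumes every shard pays the same fixed setup cost (checkout, npm ci) and that test time splits evenly across shards. The function name and the 1.5-minute setup figure are illustrative assumptions, not measurements.

```typescript
// Rough model: wall-clock time = fixed setup per shard + an even share of the suite.
// Real suites rarely split perfectly evenly, so treat the result as a lower bound.
function estimateWallClockMinutes(
  suiteMinutes: number,
  shards: number,
  setupMinutes: number,
): number {
  if (!Number.isInteger(shards) || shards < 1) {
    throw new RangeError("shards must be a positive integer");
  }
  return setupMinutes + suiteMinutes / shards;
}

// A 12-minute suite with an assumed 1.5 minutes of setup per shard.
for (const shards of [1, 2, 4, 8, 16]) {
  const minutes = estimateWallClockMinutes(12, shards, 1.5);
  console.log(`${shards} shards: ${minutes.toFixed(2)} min`);
}
```

Under these assumptions, going from 4 to 8 shards saves 1.5 minutes, while going from 8 to 16 saves only 0.75 minutes and doubles the runner time spent on setup.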
Flaky tests
A flaky test both passes and fails on the same code, with no changes in between. Flaky tests erode trust: once developers learn that red does not always mean broken, they start ignoring red builds.
Common causes:
- Shared state: tests modify a database row and do not clean up.
- Timing dependencies: a test expects a fast response but the CI runner is slower.
- Order dependence: test B passes only when test A runs first.
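The definition of a flaky test can be checked mechanically against recorded results. The sketch below assumes you can export run history as records of test name, commit SHA, and outcome. The TestRun shape and the function name are hypothetical, not the output format of any particular runner.

```typescript
// One recorded execution of a test (hypothetical shape).
interface TestRun {
  testName: string;
  commitSha: string;
  passed: boolean;
}

// Returns the names of tests that both passed and failed on the same commit.
function findFlakyTests(runs: TestRun[]): string[] {
  // Key: commit SHA + test name. Value: the outcomes seen for that pair.
  const outcomes = new Map<string, Set<boolean>>();
  for (const run of runs) {
    const key = `${run.commitSha} ${run.testName}`;
    const seen = outcomes.get(key) ?? new Set<boolean>();
    seen.add(run.passed);
    outcomes.set(key, seen);
  }

  const flaky = new Set<string>();
  for (const run of runs) {
    const seen = outcomes.get(`${run.commitSha} ${run.testName}`);
    // Both true and false were recorded for identical code.
    if (seen !== undefined && seen.size === 2) flaky.add(run.testName);
  }
  return Array.from(flaky).sort();
}

const runHistory: TestRun[] = [
  { testName: "creates order", commitSha: "a1b2c3", passed: true },
  { testName: "creates order", commitSha: "a1b2c3", passed: false },
  { testName: "lists stock", commitSha: "a1b2c3", passed: false },
  { testName: "lists stock", commitSha: "d4e5f6", passed: true },
];
console.log(findFlakyTests(runHistory));
```

Only "creates order" is reported. "lists stock" failed on one commit and passed on another, which is what a real fix looks like, so it is not counted as flaky.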
Strategies to fight flakiness:
- Quarantine: move flaky tests to a non-blocking job.
- Retry with limits: allow one retry. A test that needs a second retry goes to quarantine.
- Deterministic fixtures: use transactions that roll back after each test.
  flaky-quarantine:
    runs-on: ubuntu-latest
    # A failure in this job does not fail the workflow
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx vitest run --config vitest.quarantine.config.ts
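The retry rule can also be written down as a small policy function. This is a minimal sketch rather than a feature of any test runner: the test callback, the verdict names, and the function name are all hypothetical.

```typescript
type Verdict = "pass" | "pass-after-retry" | "retries-exhausted";

// Runs `test` once and, if it fails, retries at most `maxRetries` times.
// `test` is a hypothetical callback that returns true when the test passes.
function runWithRetryLimit(test: () => boolean, maxRetries = 1): Verdict {
  if (test()) return "pass";
  for (let retry = 1; retry <= maxRetries; retry++) {
    // A pass on retry still signals flakiness, so it gets its own verdict.
    if (test()) return "pass-after-retry";
  }
  // Retry budget spent: either the failure is real or the test belongs in
  // quarantine. It is not retried again.
  return "retries-exhausted";
}

// Simulated flaky test: fails on the first call, passes afterwards.
let calls = 0;
const failsOnce = (): boolean => ++calls > 1;
console.log(runWithRetryLimit(failsOnce));
```

Reporting "pass-after-retry" separately from "pass" keeps the flakiness visible in the build log instead of letting the retry hide it.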
Coverage gates
Raw coverage percentages are noisy. A better gate: coverage on changed lines only.
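      - run: npm run test:unit -- --coverage
      - name: Check diff coverage
        uses: romeovs/lcov-reporter-action@v0.4.0
        with:
          lcov-file: coverage/lcov.info
          github-token: ${{ secrets.GITHUB_TOKEN }}
          # Verify the threshold input against the README of the version you pin
          min-coverage: 80
          # Limits the report to files changed in the pull request
          filter-changed-files: true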
80% on changed lines is a good starting point. 100% leads to useless tests written only to satisfy the gate.
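For readers who want to see what a changed-lines gate computes, here is a minimal sketch. It assumes the changed line numbers have already been extracted from the pull request diff and the per-line hit counts from the lcov report. Both input shapes and the function name are hypothetical.

```typescript
type ChangedLines = Map<string, Set<number>>; // file -> changed line numbers
type LineHits = Map<string, Map<number, number>>; // file -> line -> hit count

// Percentage of changed, executable lines that were hit at least once.
function diffCoveragePercent(changed: ChangedLines, hits: LineHits): number {
  let executable = 0;
  let covered = 0;
  changed.forEach((lines, file) => {
    const fileHits = hits.get(file);
    // Caveat: a file missing from the report is skipped here, which can hide
    // a new file that no test ever loads.
    if (fileHits === undefined) return;
    lines.forEach((line) => {
      const count = fileHits.get(line);
      if (count === undefined) return; // not executable (comment, blank line)
      executable += 1;
      if (count > 0) covered += 1;
    });
  });
  // With no executable changes there is nothing to gate on.
  return executable === 0 ? 100 : (covered / executable) * 100;
}

const changed: ChangedLines = new Map([
  ["src/cart.ts", new Set([10, 11, 12, 13, 14])],
]);
const hits: LineHits = new Map([
  ["src/cart.ts", new Map([[10, 3], [11, 0], [12, 1], [14, 2]])],
]);
const percent = diffCoveragePercent(changed, hits);
console.log(`${percent}% of changed lines covered, gate passes: ${percent >= 80}`);
```

Line 13 has no entry in the report, so it is treated as non-executable. Three of the four remaining lines are covered, which gives 75% and fails an 80% gate.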
Contract testing with Pact
In a microservice architecture, integration tests between services get expensive fast: Service A needs Service B, which needs Service C. Contract testing breaks this chain. With Pact, the consumer records what it expects from a provider in a contract, and the provider verifies that it meets that contract without the consumer running.
graph LR
  Consumer["Consumer Service<br/>Generates contract"]
  Broker["Pact Broker<br/>Stores contracts"]
  Provider["Provider Service<br/>Verifies contract"]
  Consumer -->|publish| Broker
  Broker -->|retrieve| Provider
  Provider -->|verify result| Broker
  Broker -->|can-i-deploy check| Consumer
  style Consumer fill:#3b82f6,color:#fff
  style Broker fill:#8b5cf6,color:#fff
  style Provider fill:#22c55e,color:#000
The consumer publishes a contract to the broker. The provider verifies it independently. No shared environment is needed.
Consumer side
import { PactV4 } from "@pact-foundation/pact";
// Assumes Vitest as the runner, as in the CI jobs above; omit if globals are enabled.
import { describe, expect, it } from "vitest";

// Pact mock that stands in for InventoryService during the test
const provider = new PactV4({
  consumer: "OrderService",
  provider: "InventoryService",
});

describe("Inventory API", () => {
  it("returns stock level", async () => {
    await provider
      .addInteraction()
      .given("product ABC exists with stock 42")
      .uponReceiving("a request for stock level")
      .withRequest("GET", "/api/stock/ABC")
      .willRespondWith(200, (builder) => {
        builder.jsonBody({ sku: "ABC", quantity: 42 });
      })
      .executeTest(async (mockServer) => {
        // Call the Pact mock server instead of the real provider
        const res = await fetch(`${mockServer.url}/api/stock/ABC`);
        const body = await res.json();
        expect(body.quantity).toBe(42);
      });
  });
});
Provider side
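  verify-contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Start the provider in the background
      - run: npm run start:test &
      # Assumption: wait-on (an npm package) serves as the readiness check; any equivalent works
      - run: npx wait-on http://localhost:3000
      # Continuation lines share one indent so the folded scalar joins them into one command
      - run: >
          npx pact-provider-verifier
          --provider-base-url=http://localhost:3000
          --pact-broker-base-url=https://pact.example.com
          --provider=InventoryService
          --provider-app-version=${{ github.sha }}
          --publish-verification-results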
The can-i-deploy check asks the broker whether it is safe to deploy this version based on verified contracts. If not, the pipeline fails before reaching production.
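The idea behind the check can be illustrated with a deliberately simplified model. This is not the Pact Broker's actual algorithm or API: the record shape, the function name, and the rule itself are assumptions made for illustration.

```typescript
// One stored verification result (hypothetical shape, not the broker's schema).
interface VerificationResult {
  consumer: string;
  consumerVersion: string;
  provider: string;
  providerVersion: string;
  success: boolean;
}

// Simplified rule: a consumer version may deploy when, for every provider it
// depends on, a successful verification exists between that consumer version
// and the provider version currently deployed in the target environment.
function canDeployConsumer(
  consumer: string,
  consumerVersion: string,
  deployedProviders: Map<string, string>, // provider name -> deployed version
  results: VerificationResult[],
): boolean {
  return Array.from(deployedProviders).every(([provider, providerVersion]) =>
    results.some(
      (r) =>
        r.consumer === consumer &&
        r.consumerVersion === consumerVersion &&
        r.provider === provider &&
        r.providerVersion === providerVersion &&
        r.success,
    ),
  );
}

const deployedProviders = new Map([["InventoryService", "v41"]]);
const verificationResults: VerificationResult[] = [
  { consumer: "OrderService", consumerVersion: "abc123", provider: "InventoryService", providerVersion: "v41", success: true },
  { consumer: "OrderService", consumerVersion: "def456", provider: "InventoryService", providerVersion: "v42", success: true },
];
console.log(canDeployConsumer("OrderService", "abc123", deployedProviders, verificationResults));
console.log(canDeployConsumer("OrderService", "def456", deployedProviders, verificationResults));
```

The first call returns true. The second returns false, because def456 has only been verified against provider version v42, which is not the version deployed in this example.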
Putting it all together
A solid CI test strategy has four properties:
- Speed: unit tests under a minute, integration under five, E2E under ten.
- Isolation: each tier runs independently with no shared side effects.
- Reliability: flaky tests are quarantined, not tolerated.
- Contracts: services verify compatibility without running the full stack.
Encode these into your pipeline config, not a wiki page nobody reads.
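As one way to encode the speed property, a pipeline step could compare measured tier durations against the budgets in the list above. This is a minimal sketch: the names are hypothetical, and how the durations are collected is left open.

```typescript
type Tier = "unit" | "integration" | "e2e";

// Budgets in minutes, taken from the speed targets above.
const budgetMinutes: Record<Tier, number> = { unit: 1, integration: 5, e2e: 10 };

// Returns the tiers whose measured duration is not under budget.
function tiersOverBudget(measuredMinutes: Record<Tier, number>): Tier[] {
  const tiers: Tier[] = ["unit", "integration", "e2e"];
  return tiers.filter((tier) => measuredMinutes[tier] >= budgetMinutes[tier]);
}

// Example durations in minutes (made-up values).
const offenders = tiersOverBudget({ unit: 0.7, integration: 6.2, e2e: 8 });
console.log(offenders);
```

With these example values only the integration tier is flagged. A CI step could exit with a non-zero status when the returned list is not empty.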
What comes next
Testing tells you the code works. The next question is what happens to the artifacts those tests produce. Artifact management covers versioning, registries, signing, and retention policies for the binaries your pipeline builds.