Monorepos and large repo management

In this series (8 parts)
  1. Git internals: how Git actually works
  2. Everyday Git: the commands that matter
  3. Branching and merging
  4. Branching strategies for teams
  5. Git rebase and history rewriting
  6. Git hooks and automation
  7. Monorepos and large repo management
  8. GitOps

Google stores billions of lines of code in a single repository. Most teams do not operate at that scale, but the monorepo pattern is increasingly popular even for mid-sized projects. The trade-offs between monorepos and polyrepos shape how you organize code, manage dependencies, and configure CI.

Monorepo vs polyrepo

A monorepo stores multiple projects, services, or packages in one repository. A polyrepo gives each project its own repository.

Aspect                | Monorepo                                 | Polyrepo
----------------------|------------------------------------------|----------------------------------
Code sharing          | Import directly, always up to date       | Publish packages, manage versions
Atomic changes        | One commit can update API + client       | Coordinated PRs across repos
CI complexity         | Must detect what changed                 | Each repo has focused CI
Clone size            | Grows with every project                 | Small per repo
Access control        | Coarse (repo-level) or needs CODEOWNERS  | Fine-grained per repo
Dependency management | Single lockfile, shared versions         | Independent versions per repo

Neither is universally better. The choice depends on your team’s workflow.

When monorepos shine

  • Tightly coupled services that change together frequently.
  • Shared libraries consumed by multiple services.
  • Small to medium teams that want simplified dependency management.
  • Organizations that value atomic cross-project changes.

When polyrepos make sense

  • Independent teams with separate release cycles.
  • Open-source projects where each package has its own contributors.
  • Projects with fundamentally different toolchains.
  • Strict access control requirements.

Monorepo structure

A typical monorepo layout:

my-monorepo/
  packages/
    shared-utils/
      package.json
      src/
    web-app/
      package.json
      src/
    api-server/
      package.json
      src/
  package.json          # root workspace config
  nx.json               # or turbo.json
  .github/
    workflows/
      ci.yml

Workspaces (npm, Yarn, pnpm) handle the package relationships. A build orchestrator handles task execution.
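
Assuming npm workspaces and the layout above, the root package.json might look like this minimal sketch:

```json
{
  "name": "my-monorepo",
  "private": true,
  "workspaces": [
    "packages/*"
  ]
}
```

With this in place, `npm install` at the root links the packages together, and a package can depend on `shared-utils` by name without publishing it.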

Scaling Git for large repos

As a monorepo grows, standard Git operations slow down. Several features address this.

Sparse checkout

Sparse checkout lets you check out only the directories you need. The rest of the repository exists in the object store but does not appear in your working directory.

# Enable sparse checkout
git sparse-checkout init --cone

# Check out only specific directories
git sparse-checkout set packages/web-app packages/shared-utils

# List current sparse checkout paths
git sparse-checkout list

# Disable sparse checkout
git sparse-checkout disable

The --cone mode restricts patterns to directory-based matching, which is significantly faster than arbitrary patterns.
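
The cone-mode behavior is easy to see in a throwaway repository (the directory names below are hypothetical, mirroring the layout earlier):

```shell
# Build a tiny two-package repo, then sparse-checkout only one package
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
mkdir -p packages/web-app packages/api-server
echo web > packages/web-app/index.js
echo api > packages/api-server/index.js
git add .
git -c user.email=ci@example.com -c user.name=ci commit -qm "initial layout"

# Restrict the working tree to one directory cone
git sparse-checkout init --cone
git sparse-checkout set packages/web-app

ls packages/   # only web-app remains in the working tree
```

The api-server files still exist in the object store; only the working tree shrinks.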

Shallow clones

A shallow clone fetches only recent history, dramatically reducing clone time and disk usage.

# Clone with only the last commit
git clone --depth 1 https://github.com/org/monorepo.git

# Clone with the last 10 commits
git clone --depth 10 https://github.com/org/monorepo.git

# Deepen later if needed
git fetch --deepen=50

# Convert to a full clone
git fetch --unshallow

Shallow clones are ideal for CI environments where you only need the current code, not the full history.
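
A sketch of the shallow-then-unshallow round trip, using a local `file://` URL as a stand-in for a real remote:

```shell
# Create a source repo with three commits
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q src
cd src
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git -c user.email=ci@example.com -c user.name=ci commit -qm "commit $i"
done
cd "$tmp"

# Shallow clone: only the tip commit comes down
git clone -q --depth 1 "file://$tmp/src" shallow
cd shallow
shallow_count=$(git rev-list --count HEAD)   # 1

# Convert to a full clone: history is restored
git fetch -q --unshallow
full_count=$(git rev-list --count HEAD)      # 3
echo "$shallow_count -> $full_count"
```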

Partial clones

Partial clones go further. They skip downloading blob objects (and, for treeless clones, tree objects) until they are needed, fetching them on demand.

# Blobless clone: skip blobs, fetch on demand
git clone --filter=blob:none https://github.com/org/monorepo.git

# Treeless clone: skip blobs and trees
git clone --filter=tree:0 https://github.com/org/monorepo.git

Blobless clones are the sweet spot for most developers. You get full history (for log and blame) without downloading every file version upfront.
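
A minimal sketch of a blobless clone against a local stand-in remote. Hosted servers enable `uploadpack.allowFilter` for you; the explicit config line below is only needed for this local demo:

```shell
# Create a source repo and allow partial-clone filtering on it
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q src
cd src
echo "big binary stand-in" > asset.bin
git add asset.bin
git -c user.email=ci@example.com -c user.name=ci commit -qm "add asset"
git config uploadpack.allowFilter true
cd "$tmp"

# Blobless clone: blobs are fetched lazily as the checkout needs them
git clone -q --filter=blob:none "file://$tmp/src" partial
cd partial
git config remote.origin.promisor   # "true" marks the remote as on-demand
```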

Git LFS

Git Large File Storage replaces large files with text pointers inside Git, storing the actual content on a separate server.

Why LFS exists

Git stores every version of every file. Binary files (images, videos, compiled assets, datasets) do not diff well and bloat the repository. A 50 MB model file with 20 versions means 1 GB of history.

Setup

# Install Git LFS
git lfs install

# Track file patterns
git lfs track "*.psd"
git lfs track "*.zip"
git lfs track "datasets/**"

# This updates .gitattributes
cat .gitattributes
# *.psd filter=lfs diff=lfs merge=lfs -text
# *.zip filter=lfs diff=lfs merge=lfs -text

How it works

graph LR
DEV["Developer"] -->|git add large.psd| IDX["Staging area"]
IDX -->|git commit| REPO["Local repo<br/>(stores pointer)"]
REPO -->|git push| REMOTE["Git remote<br/>(stores pointer)"]
REPO -->|git lfs push| LFS["LFS server<br/>(stores actual file)"]
LFS -->|git lfs pull| DEV

Git LFS flow. The Git repository stores small pointer files. The actual binary content lives on the LFS server.

When you commit, a clean filter replaces the binary with a small pointer file; when you push, Git LFS uploads the binary content to the LFS server. When you clone or pull, LFS downloads the binaries your checkout needs.
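
A pointer file is just a few lines of text. For a hypothetical 50 MB (52,428,800-byte) asset it looks like this (the oid shown is illustrative):

```
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 52428800
```

This is what actually lives in Git history, which is why the repository stays small no matter how many versions of the binary exist.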

LFS commands

# See tracked patterns
git lfs track

# See LFS files in the repo
git lfs ls-files

# Pull LFS files (if not auto-fetched)
git lfs pull

# Migrate existing files to LFS (rewrites history; coordinate with your team)
git lfs migrate import --include="*.psd" --everything

LFS considerations

  • Hosting support: GitHub, GitLab, and Bitbucket all support LFS. Self-hosted needs a separate LFS server.
  • Bandwidth: LFS downloads count against hosting quotas.
  • CI: CI runners need LFS installed. Use GIT_LFS_SKIP_SMUDGE=1 to skip LFS downloads when only source code matters.

Build orchestration

A monorepo without build intelligence rebuilds everything on every change. That does not scale.

Nx

Nx analyzes the dependency graph between projects and only builds/tests what is affected by a change.

# Run tests only for affected projects
npx nx affected --target=test

# Build only what changed
npx nx affected --target=build

# Visualize the dependency graph
npx nx graph

Nx caches results locally and remotely. If a project has not changed, its cached test results are reused.

Turborepo

Turborepo takes a similar approach with a focus on simplicity.

{
  "pipeline": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**"]
    },
    "test": {
      "dependsOn": ["build"]
    }
  }
}

Tasks declare their dependencies; Turborepo figures out the execution order and parallelism. (Turborepo 2.x renames the top-level pipeline key to tasks.)

Bazel

Bazel is Google’s build system. It handles multi-language monorepos at massive scale. The learning curve is steep, but it provides hermetic builds, remote caching, and remote execution.

The choice between these tools depends on your ecosystem. Nx and Turborepo excel in JavaScript/TypeScript. Bazel is language-agnostic but complex.

CI strategies for monorepos

Path-based filtering

Run CI jobs only when relevant files change.

# GitHub Actions example
on:
  push:
    paths:
      - 'packages/web-app/**'
      - 'packages/shared-utils/**'

jobs:
  test-web-app:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test --workspace=packages/web-app
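
Path filtering boils down to diffing file names against a base commit. A throwaway-repo sketch of that detection (package names are hypothetical):

```shell
# Two packages, then a commit that touches only one of them
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
mkdir -p packages/web-app packages/api-server
echo a > packages/web-app/app.js
echo b > packages/api-server/server.js
git add .
git -c user.email=ci@example.com -c user.name=ci commit -qm "base"
echo c > packages/web-app/app.js
git add .
git -c user.email=ci@example.com -c user.name=ci commit -qm "touch web-app only"

# Which top-level packages changed since the previous commit?
changed=$(git diff --name-only HEAD~1 HEAD | cut -d/ -f1-2 | sort -u)
echo "$changed"   # packages/web-app
```

CI systems and build orchestrators apply the same idea against the merge base of the pull request rather than HEAD~1.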

Affected-based CI

Use the build orchestrator’s affected detection in CI.

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - run: npx nx affected --target=test --base=origin/main

fetch-depth: 0 is important. Nx needs full history to determine what changed since the base branch.

Caching in CI

Cache dependencies and build outputs between runs.

- uses: actions/cache@v4
  with:
    path: |
      node_modules/
      .nx/cache/
    key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}

Remote caching (Nx Cloud, Turborepo remote cache) shares cache across CI runs and developer machines.

CODEOWNERS

In a monorepo, different teams own different directories. CODEOWNERS enforces review requirements.

# .github/CODEOWNERS
packages/web-app/       @frontend-team
packages/api-server/    @backend-team
packages/shared-utils/  @platform-team
infrastructure/         @devops-team

Pull requests that touch a directory automatically request reviews from the owning team.

What comes next

With monorepo management covered, the final article in this series explores GitOps. GitOps takes the concept of Git as a source of truth and applies it to infrastructure and deployments, using the same branching and merging workflows you have learned throughout this series.
