
Git internals: how Git actually works

In this series (8 parts)
  1. Git internals: how Git actually works
  2. Everyday Git: the commands that matter
  3. Branching and merging
  4. Branching strategies for teams
  5. Git rebase and history rewriting
  6. Git hooks and automation
  7. Monorepos and large repo management
  8. GitOps

Most developers learn Git through its commands. That works until it doesn’t. When a rebase goes sideways or a detached HEAD appears, knowing what Git actually stores makes recovery straightforward instead of terrifying.

Git is not a diff-based system. It is a content-addressable object store with a thin layer of porcelain commands on top.

Content-addressable storage

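Every object Git stores is named by a SHA-1 hash computed from its contents, prefixed with a short header that records the object’s type and size. (Repositories can optionally be created with SHA-256 instead.) That hash becomes the object’s name. Two files with identical content produce the same hash and share the same object. Change a single byte and the hash changes completely.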

This means Git never needs to ask “did this file change?” It compares hashes. If the hashes match, the content is identical.

# Hash some content without adding it to Git
echo "hello" | git hash-object --stdin
# => ce013625030ba8dba906f756967f9e9ca394464a

Run that command on any machine and you get the same hash. The address is the content.
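The hash is easy to reproduce outside Git. Here is a minimal sketch in Python, assuming the default SHA-1 object format: Git hashes a short header (the object type, the size in bytes, and a NUL byte) followed by the content. The function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with these bytes."""
    # Git hashes "blob <size>\0" followed by the raw content,
    # not the content alone.
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# `echo` appends a newline, so the content being hashed is b"hello\n".
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a

# Changing one byte gives an unrelated ID.
print(git_blob_id(b"hellp\n"))
```

The first ID is the blob ID used throughout this article. No repository is involved, which is the point: the name depends only on the bytes.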

The four object types

Git’s object database contains exactly four types of objects.

Blobs

A blob stores file contents. Nothing else. No filename, no permissions, no metadata. Just raw bytes with a small header.

# Inspect a blob
git cat-file -p ce0136

Two files with different names but identical content share one blob. Git deduplicates automatically.
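Deduplication falls out of the storage layout. The sketch below imitates how Git writes a loose (unpacked) object: the header plus content is zlib-compressed and saved under a path derived from the hash. The helper names write_loose_blob and read_loose_object are hypothetical, and the code writes to a temporary directory rather than a real .git folder.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_blob(objects_dir: Path, data: bytes) -> str:
    """Store data the way Git stores a loose blob and return its ID."""
    payload = b"blob " + str(len(data)).encode("ascii") + b"\x00" + data
    object_id = hashlib.sha1(payload).hexdigest()
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
    path = objects_dir / object_id[:2] / object_id[2:]
    if not path.exists():  # same content, same path: nothing new to write
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(payload))
    return object_id


def read_loose_object(objects_dir: Path, object_id: str) -> tuple[str, bytes]:
    """Return (type, content) for a loose object."""
    path = objects_dir / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    header, _, content = raw.partition(b"\x00")
    kind, _size = header.decode("ascii").split(" ")
    return kind, content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    # Two "files" with different names but the same bytes.
    readme_id = write_loose_blob(objects, b"hello\n")
    copy_id = write_loose_blob(objects, b"hello\n")
    stored = [p for p in objects.rglob("*") if p.is_file()]
    print(readme_id == copy_id, len(stored))  # True 1
    print(read_loose_object(objects, readme_id))  # ('blob', b'hello\n')
```

The filename never reaches the blob, so the second write finds the object already present.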

Trees

A tree maps filenames and permissions to blobs (or other trees for subdirectories). Think of a tree as a directory listing.

100644 blob ce0136...  README.md
040000 tree a1b2c3...  src/

Trees give structure. Blobs give content. Neither knows about history.
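A tree is hashed the same way as a blob, just with a different body. The following sketch serializes entries the way Git lays them out internally: mode, space, name, a NUL byte, then the 20 raw bytes of the child’s ID. Directories use mode 40000 in the stored form (cat-file prints it as 040000). The function names and the index.js content are invented for illustration.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object body with Git's "<type> <size>\\0" header."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries into a tree object's body."""

    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if
        # their name ended in "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not 40 hex characters
    return body


readme = git_object_id("blob", b"hello\n")
index_js = git_object_id("blob", b"console.log('hi');\n")  # made-up content
src = git_object_id("tree", tree_body([("100644", "index.js", index_js)]))
root = git_object_id("tree", tree_body([
    ("40000", "src", src),
    ("100644", "README.md", readme),
]))
print(root)

# A tree with no entries always has the same well-known ID.
print(git_object_id("tree", tree_body([])))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the root tree embeds the ID of src, changing any file below src changes every tree above it, while untouched siblings keep their IDs.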

Commits

A commit points to exactly one tree (the project snapshot), zero or more parent commits, an author, a committer, and a message.

git cat-file -p HEAD
# tree 9e1a...
# parent a3f1d7...
# author Alice <alice@example.com> 1700000000 +0000
# committer Alice <alice@example.com> 1700000000 +0000
#
# Add auth module

The first commit has no parent. A normal commit has one parent. A merge commit has two or more.
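A commit object is plain text, so its layout can be sketched directly. The example below builds a commit body with zero, one, or two parent lines and hashes it with the usual header. It assumes an unsigned commit with no extra headers; commit_body and commit_id are hypothetical names, and the author line reuses the illustrative identity from above.

```python
import hashlib


def commit_body(tree: str, parents: list[str], author: str,
                committer: str, message: str) -> bytes:
    """Serialize a commit: headers, a blank line, then the message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


def commit_id(body: bytes) -> str:
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


who = "Alice <alice@example.com> 1700000000 +0000"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries

root = commit_id(commit_body(empty_tree, [], who, who, "Add initial project structure"))
child = commit_id(commit_body(empty_tree, [root], who, who, "Add auth module"))
other = commit_id(commit_body(empty_tree, [root], who, who, "Add billing module"))
merge = commit_id(commit_body(empty_tree, [child, other], who, who, "Merge billing"))

print(commit_body(empty_tree, [child, other], who, who, "Merge billing").decode())
print(root, child, merge, sep="\n")
```

The parent IDs are part of the hashed text. That is why a commit’s ID pins down its entire ancestry, and why the same change applied on top of a different parent gets a different ID.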

Tags

An annotated tag is an object that points to a commit (usually) and adds a tagger, date, and message. Lightweight tags skip the object and just point directly at a commit.

The object graph

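These four types form a directed acyclic graph (DAG). Commits point to trees and to their parent commits. Trees point to blobs and other trees. Annotated tags point to the object they tag. Every object is immutable once created, because changing its content would change its hash.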

graph LR
C1["commit: a3f1"] --> T1["tree: 4b82"]
C2["commit: b7d9"] --> T2["tree: 9e1a"]
C2 --> C1
T1 --> B1["blob: ce01<br/>README.md"]
T1 --> T3["tree: d4f2<br/>src/"]
T3 --> B2["blob: 8a3c<br/>index.js"]
T2 --> B1
T2 --> T4["tree: f1e5<br/>src/"]
T4 --> B3["blob: 2b7d<br/>index.js"]

The Git object graph. Commit b7d9 points to a new tree, but the README blob is shared because its content did not change.

Notice how the README blob appears once even though both commits reference it. This is deduplication at work.
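The same sharing can be reproduced with a toy in-memory store. This is a simplified model, not Git’s real encoding: trees are stored as text and commits have no author lines. All names (store, put, put_tree, put_commit) and file contents are invented for the example.

```python
import hashlib

store: dict[str, bytes] = {}


def put(kind: str, body: bytes) -> str:
    """Add an object to the toy store and return its ID."""
    data = kind.encode() + b" " + str(len(body)).encode() + b"\x00" + body
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data  # writing the same object twice changes nothing
    return oid


def put_tree(entries: dict[str, str]) -> str:
    # Simplified text encoding of "name -> object ID"; real trees are binary.
    body = "".join(f"{name} {oid}\n" for name, oid in sorted(entries.items()))
    return put("tree", body.encode())


def put_commit(tree: str, parents: list[str], message: str) -> str:
    # Simplified: no author or committer lines.
    head = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    return put("commit", ("\n".join(head) + "\n\n" + message + "\n").encode())


# First snapshot: README.md and src/index.js.
readme = put("blob", b"hello\n")
src_v1 = put_tree({"index.js": put("blob", b"console.log(1);\n")})
c1 = put_commit(put_tree({"README.md": readme, "src": src_v1}), [], "first")

# Second snapshot: only index.js changed; README.md is stored again unchanged.
readme_again = put("blob", b"hello\n")
src_v2 = put_tree({"index.js": put("blob", b"console.log(2);\n")})
c2 = put_commit(put_tree({"README.md": readme_again, "src": src_v2}), [c1], "second")

print(readme == readme_again)  # True
print(len(store))  # 9
```

Nine objects cover two full snapshots: two commits, four trees, and three blobs, matching the diagram. The README blob is stored once.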

Refs: human-friendly names

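SHA-1 hashes are terrible for humans. Refs solve this. In its simplest form, a ref is a file containing a 40-character hash. (Git may also collect refs into .git/packed-refs, in which case the individual file is absent; git rev-parse main prints the hash either way.)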

cat .git/refs/heads/main
# b7d9e4f2a1c3b5d7e9f1a3c5b7d9e4f2a1c3b5d7

That is all a branch is. A file with a hash. When you make a new commit on main, Git writes the new commit’s hash into that file.

Branches as pointers

A branch is a movable pointer to a commit. Creating a branch costs almost nothing because Git only creates a 41-byte file (40 hex chars plus a newline).

# Create a branch
git branch feature
# This creates .git/refs/heads/feature pointing to the current commit

Deleting a branch removes the pointer. The commits remain in the object store until garbage collection cleans up unreachable objects.
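To make the "branch is a file" claim concrete, here is a sketch that creates and deletes loose ref files by hand in a temporary directory. It is an illustration only: real Git also takes a lock, records the change in the reflog, and may store refs in packed form. write_ref and read_ref are hypothetical helpers, and the commit ID is the illustrative one used in this article.

```python
import tempfile
from pathlib import Path

MAIN_TIP = "b7d9e4f2a1c3b5d7e9f1a3c5b7d9e4f2a1c3b5d7"  # illustrative ID


def write_ref(git_dir: Path, name: str, commit_id: str) -> Path:
    """Point refs/heads/<name> at commit_id by writing a loose ref file."""
    path = git_dir / "refs" / "heads" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes((commit_id + "\n").encode("ascii"))
    return path


def read_ref(git_dir: Path, name: str) -> str:
    return (git_dir / "refs" / "heads" / name).read_text(encoding="ascii").strip()


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    write_ref(git_dir, "main", MAIN_TIP)

    # "git branch feature": copy the current commit ID into a new ref file.
    feature = write_ref(git_dir, "feature", read_ref(git_dir, "main"))
    feature_size = feature.stat().st_size
    print(feature_size)  # 41

    # "git branch -d feature": remove the pointer; no object is touched.
    feature.unlink()
    remaining = sorted(p.name for p in (git_dir / "refs" / "heads").iterdir())
    print(remaining)  # ['main']
```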

HEAD tells Git which branch you are on. It is usually a symbolic reference.

cat .git/HEAD
# ref: refs/heads/main

When you switch branches, Git updates HEAD to point at the new branch ref. When HEAD points directly at a commit hash instead of a branch, you are in detached HEAD state.

git checkout a3f1d7
cat .git/HEAD
# a3f1d7e9b2c4d6f8a0b2c4d6f8a0b2c4d6f8a0b2

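Detached HEAD is not an error. It means “you are not on any branch.” Commits made here are valid, but no branch points at them. If you switch away without creating a branch, they become unreachable: still recoverable through the reflog for a while, but eventually removed by garbage collection.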
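Telling the two states apart only requires looking at the text in .git/HEAD. A small sketch, with describe_head as a made-up helper name:

```python
def describe_head(head_text: str) -> tuple[str, str]:
    """Classify the contents of .git/HEAD.

    Returns ("branch", name) for a symbolic ref, or ("detached", commit_id)
    when HEAD holds a raw commit ID.
    """
    value = head_text.strip()
    if value.startswith("ref: "):
        ref = value[len("ref: "):]
        prefix = "refs/heads/"
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        return ("branch", name)
    return ("detached", value)


print(describe_head("ref: refs/heads/main\n"))
# ('branch', 'main')
print(describe_head("a3f1d7e9b2c4d6f8a0b2c4d6f8a0b2c4d6f8a0b2\n"))
# ('detached', 'a3f1d7e9b2c4d6f8a0b2c4d6f8a0b2c4d6f8a0b2')
```

In the first case a new commit moves the branch that HEAD names. In the second there is no branch to move, so only HEAD itself advances.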

The reflog: your safety net

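Git records every change to a ref, including HEAD, in the reflog. Even after a bad rebase or reset, the old commit hashes stay in the reflog for at least 30 days under the default settings. The reflog is local to your repository; it is never pushed or fetched.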

git reflog
# b7d9e4f HEAD@{0}: commit: Add auth module
# a3f1d7e HEAD@{1}: checkout: moving from feature to main

If you ever lose commits, git reflog is where you find them.
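The reflog is itself a plain text file (.git/logs/HEAD for HEAD). Each line holds the old ID, the new ID, who made the change and when, then a tab and a message. The sketch below parses a hand-written sample in that layout and prints it in the order git reflog uses. The sample text is fabricated for illustration: the IDs are the illustrative ones from this article, plus a placeholder of forty 1s for the old feature tip.

```python
from typing import NamedTuple

OLD = "1" * 40  # placeholder ID, purely hypothetical
A3F1 = "a3f1d7e9b2c4d6f8a0b2c4d6f8a0b2c4d6f8a0b2"
B7D9 = "b7d9e4f2a1c3b5d7e9f1a3c5b7d9e4f2a1c3b5d7"
WHO = "Alice <alice@example.com> 1700000000 +0000"

SAMPLE_LOG = "\n".join([
    f"{OLD} {A3F1} {WHO}\tcheckout: moving from feature to main",
    f"{A3F1} {B7D9} {WHO}\tcommit: Add auth module",
]) + "\n"


class ReflogEntry(NamedTuple):
    old_id: str
    new_id: str
    message: str


def parse_reflog(text: str) -> list[ReflogEntry]:
    """Parse reflog file text and return entries newest first."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        meta, _, message = line.partition("\t")
        old_id, new_id = meta.split(" ", 2)[:2]
        entries.append(ReflogEntry(old_id, new_id, message))
    # The file is appended to, oldest first; `git reflog` lists newest first.
    return entries[::-1]


for i, entry in enumerate(parse_reflog(SAMPLE_LOG)):
    print(f"{entry.new_id[:7]} HEAD@{{{i}}}: {entry.message}")
# b7d9e4f HEAD@{0}: commit: Add auth module
# a3f1d7e HEAD@{1}: checkout: moving from feature to main
```

Every old_id in that file is a commit HEAD once pointed at, which is why a "lost" commit can usually be found here.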

Packfiles and efficiency

Storing complete snapshots sounds wasteful. Git addresses this with packfiles. During garbage collection or push/fetch, Git packs objects into compressed files using delta compression: similar objects are stored as a base plus a delta against it. Its heuristics tend to keep the most recent version in full and express older versions as deltas, which keeps recent history fast to read.

# Trigger packing manually
git gc

# See pack statistics
git count-objects -v

This is an implementation detail, not a conceptual model. The object graph stays the same. Packfiles just compress it.
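A toy example shows why a delta can stand in for a full copy without changing the model. The sketch below describes a target as "copy this range from the base" and "insert these literal bytes" instructions, then rebuilds it. Git’s pack deltas are also built from copy and insert instructions, but this is not Git’s binary format or its matching algorithm: it leans on Python’s difflib, and make_delta and apply_delta are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Describe target as copy/insert instructions relative to base."""
    ops: list[tuple] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))  # offset and length in base
        elif j2 > j1:
            # "replace" or "insert": carry the new bytes literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are never copied.
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    """Rebuild the target from the base and a list of instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


newer = b"".join(b"line %d\n" % i for i in range(50))
older = newer.replace(b"line 25\n", b"line twenty-five\n")

delta = make_delta(newer, older)  # newer version in full, older as a delta
literal = sum(len(op[1]) for op in delta if op[0] == "insert")

print(apply_delta(newer, delta) == older)  # True
print(len(older), literal)  # full size vs. literal bytes kept in the delta
```

Whatever the pack does internally, reading the object back yields the same bytes and therefore the same hash.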

Walking the graph yourself

You can verify everything described here using plumbing commands.

# Show the tree for the current commit
git cat-file -p HEAD^{tree}

# Show a specific blob
git cat-file -p <blob-hash>

# List every object reachable from any ref
git rev-list --all --objects

# Verify object database integrity
git fsck

These commands peel back the porcelain and expose the plumbing. They are invaluable for debugging.

Why this matters

Understanding the object model changes how you think about Git operations:

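| Operation | What actually happens |
| --- | --- |
| git commit | Write tree object(s) from the index and a commit object, then update the branch ref. Blobs are normally written earlier, by git add. |
| git branch feature | Write a 41-byte ref file. |
| git merge | Create a commit with two or more parents, or just move the branch ref if the merge is a fast-forward. |
| git checkout | Update HEAD, then update the index and working tree to match the target tree. |
| git reset --hard | Move the current branch pointer, then reset the index and working tree to match. |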

Nothing is magic. Every command manipulates the object graph and refs.

Garbage collection

Objects that cannot be reached from any ref or reflog entry (directly, or by following commit, tree, and tag links) are unreachable. Git’s garbage collector removes unreachable objects after a grace period.

# See unreachable objects
git fsck --unreachable

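# Run garbage collection and prune unreachable objects immediately,
# skipping the grace period
git gc --prune=now

By default the grace period is two weeks (gc.pruneExpire). Reflog entries count as references too, so commits you reset away from stay reachable until those entries expire: by default 30 days for entries no longer reachable from the branch tip (gc.reflogExpireUnreachable) and 90 days for the rest (gc.reflogExpire). Deleting a branch also deletes that branch’s own reflog, but HEAD’s reflog still records any of its commits you had checked out.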
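Reachability is an ordinary graph walk. The sketch below models the object graph from the diagram earlier in this article as a dictionary of "object points to objects" and computes what a collector could delete after main is reset to the older commit. The short IDs come from the diagram; the function name reachable and the refs and reflog structures are simplifications invented for the example.

```python
def reachable(objects: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return every object ID reachable from roots by following links."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen


OBJECTS = {
    "b7d9": ["9e1a", "a3f1"],  # commit -> its tree and its parent
    "a3f1": ["4b82"],          # root commit -> its tree
    "9e1a": ["ce01", "f1e5"],  # tree -> README blob and src tree
    "4b82": ["ce01", "d4f2"],
    "f1e5": ["2b7d"],
    "d4f2": ["8a3c"],
    "ce01": [], "2b7d": [], "8a3c": [],  # blobs point at nothing
}

# After `git reset --hard a3f1`, main points at the older commit.
refs = {"main": "a3f1"}
unreachable = set(OBJECTS) - reachable(OBJECTS, list(refs.values()))
print(sorted(unreachable))  # ['2b7d', '9e1a', 'b7d9', 'f1e5']

# The reflog still remembers b7d9, so nothing is collectable yet.
reflog = ["b7d9"]
protected = set(OBJECTS) - reachable(OBJECTS, list(refs.values()) + reflog)
print(protected)  # set()
```

The shared README blob (ce01) survives in both cases because the older commit’s tree still points to it.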

Common misconceptions

“Git stores diffs.” It stores snapshots. Delta compression in packfiles is a storage optimization, not the data model.

“Deleting a branch deletes commits.” It deletes a pointer. Commits persist until GC removes unreachable ones.

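“Rebase rewrites history.” Nothing is modified in place. Rebase creates new commit objects with new hashes and moves the branch ref to them. The old commits still exist, and stay recoverable through the reflog, until GC removes them.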

“Merge and rebase do the same thing.” They produce different graph topologies. A merge that is not a fast-forward creates a commit with multiple parents. Rebase replays commits onto a new base, creating new single-parent commits.
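The difference in topology can be seen in a toy model. Here a commit’s ID is derived from its parent IDs and a text label standing in for its change, which is a deliberate simplification: a real commit hashes a full snapshot and metadata, and a real rebase reapplies diffs. make_commit and rebase are hypothetical helpers.

```python
import hashlib


def make_commit(store: dict, parents: list[str], change: str) -> str:
    """Add a toy commit whose ID depends on its parents and its change."""
    text = "|".join(parents) + "\n" + change
    oid = hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
    store[oid] = {"parents": list(parents), "change": change}
    return oid


def rebase(store: dict, commits: list[str], new_base: str) -> str:
    """Replay commits (oldest first) onto new_base and return the new tip."""
    tip = new_base
    for oid in commits:
        # Same change, different parent: a brand-new commit object.
        tip = make_commit(store, [tip], store[oid]["change"])
    return tip


store: dict[str, dict] = {}
base = make_commit(store, [], "initial")
main_tip = make_commit(store, [base], "main work")
f1 = make_commit(store, [base], "feature step 1")
f2 = make_commit(store, [f1], "feature step 2")

# Merge: one new commit that has both tips as parents.
merge = make_commit(store, [main_tip, f2], "merge feature")

# Rebase: new single-parent copies of f1 and f2 on top of main_tip.
rebased_tip = rebase(store, [f1, f2], main_tip)
rebased_first = store[rebased_tip]["parents"][0]

print(len(store[merge]["parents"]))                  # 2
print(store[rebased_first]["parents"] == [main_tip])  # True
print(rebased_first != f1 and rebased_tip != f2)      # True
print(f1 in store and f2 in store)                    # True
```

The last line is the reassuring part: the original commits are still in the store after the rebase, only no longer on the branch.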

What comes next

Now that you understand how Git stores data internally, the next article covers the everyday commands you will use constantly. Knowing that git add creates blobs and updates the index makes the staging area intuitive rather than mysterious.
