How computers store data
Every data structure you will ever learn — arrays, linked lists, trees, hash tables — is built on top of the same foundation: computer memory. Before learning any data structure, you need to understand how data is physically stored and accessed. This is what separates programmers who memorize algorithms from those who truly understand why things work the way they do.
What is memory?
Think of your computer’s RAM as a massive row of numbered mailboxes. Each mailbox:
- Has a unique address (a number, like 0, 1, 2, 3…)
- Can hold exactly 1 byte (8 bits) of data
Your 8GB RAM has roughly 8 billion of these mailboxes, numbered from 0 to about 8,589,934,591.
When your program says x = 42, the CPU:
- Finds a free mailbox (say, address 1000)
- Writes the value 42 into that location
- Remembers that
xlives at address 1000
Binary: the language of memory
Every value in memory is stored as binary — a sequence of 0s and 1s. Each digit is called a bit (binary digit). A group of 8 bits is a byte.
| Decimal | Binary (8-bit) | Bytes used |
|---|---|---|
| 0 | 00000000 | 1 |
| 1 | 00000001 | 1 |
| 42 | 00101010 | 1 |
| 127 | 01111111 | 1 |
| 255 | 11111111 | 1 (max for unsigned byte) |
| 256 | 00000001 00000000 | 2 |
| 1000 | 00000011 11101000 | 2 |
How integers are stored in binary
Why this matters for data structures: the number of bytes a value needs determines how much space it occupies in memory. An int (32-bit) takes 4 bytes. A char takes 1 byte. This directly affects how arrays calculate element positions.
Common data sizes
| Type | Size (bytes) | Range |
|---|---|---|
| char / byte | 1 | 0 to 255 (or -128 to 127) |
| short | 2 | -32,768 to 32,767 |
| int | 4 | −2.1B to 2.1B |
| long / int64 | 8 | ±9.2 × 10¹⁸ |
| float | 4 | ±3.4 × 10³⁸ (7 digits precision) |
| double | 8 | ±1.8 × 10³⁰⁸ (15 digits precision) |
| pointer | 8 | Any memory address (64-bit system) |
Common data type sizes on a 64-bit system
Why array indices start at 0
This is one of the most common beginner questions, and the answer is beautiful in its simplicity. It comes directly from how memory works.
An array is a contiguous block of memory — elements are stored back-to-back with no gaps. If you have an array of integers starting at address 1000:
arr[0]is at address1000 + 0 × 4 = 1000arr[1]is at address1000 + 1 × 4 = 1004arr[2]is at address1000 + 2 × 4 = 1008
The formula is:
When , you get the base address itself — zero offset. The index IS the offset multiplier. If arrays started at 1, the CPU would need to compute base + (i-1) × size every time — an extra subtraction on every access. Starting at 0 eliminates this waste.
This is why array access is O(1)
No matter which element you access — the first, the last, or any in between — the CPU just computes one multiplication and one addition. There’s no searching, no traversal. This is the fundamental superpower of arrays: constant-time random access.
Stack vs Heap: two types of memory allocation
When your program runs, it uses two main regions of memory:
The Stack
- Fast: allocation is just moving a pointer
- Automatic: memory is freed when a function returns
- Limited size: typically 1–8 MB
- Grows downward (from high addresses to low)
Used for: local variables, function parameters, return addresses.
The Heap
- Flexible: can allocate any amount at runtime
- Manual (in C/C++) or garbage-collected (in Python/Java/JS)
- Slower: must find a free block, can fragment
- Grows upward (from low addresses to high)
Used for: dynamic arrays, objects, anything whose size isn’t known at compile time.
When you use each
def example():
# Stack: local variables (Python manages this internally)
x = 10 # int on the stack frame
name = "hello" # reference on stack, string object on heap
# Heap: dynamic allocations
data = [1, 2, 3, 4, 5] # list object lives on the heap
matrix = [[0]*100 for _ in range(100)] # large allocation → heap
return data # data survives because heap is not tied to function lifetime #include <stdlib.h>
void example() {
// Stack: fixed-size local variables
int x = 10; // 4 bytes on the stack
int arr[5] = {1,2,3,4,5}; // 20 bytes on the stack
// Heap: dynamic allocation with malloc
int* data = malloc(100 * sizeof(int)); // 400 bytes on the heap
// pointer 'data' is on stack (8 bytes), but points to heap memory
// MUST free heap memory manually (or it leaks!)
free(data);
} // Stack variables (x, arr, data pointer) automatically freed here How the CPU actually reads memory
When your program accesses arr[3], here’s what happens at the hardware level:
- Address calculation: CPU computes
base + 3 × 4 = base + 12 - Cache check: CPU checks its L1 cache (fastest, ~1ns). If the data is there (cache hit), done.
- L2/L3 cache: If not in L1, check L2 (~5ns) then L3 (~20ns)
- RAM access: If not in any cache (cache miss), fetch from RAM (~100ns)
This has a huge implication for data structures:
Data that is stored contiguously in memory (arrays) is much faster to access in practice than data scattered around memory (linked lists), because CPUs load entire cache lines (64 bytes) at once.
When you access arr[0], the CPU loads bytes 0–63 into cache. If arr[1] through arr[15] are integers in that same 64-byte range, they’re all in cache already — free to access. This is called spatial locality and it’s why arrays often outperform linked lists in practice, even when linked lists have better theoretical complexity for some operations.
| Memory level | Size | Latency | Analogy |
|---|---|---|---|
| L1 Cache | 64 KB | ~1 ns | Book on your desk |
| L2 Cache | 256 KB | ~5 ns | Bookshelf in your room |
| L3 Cache | 8–32 MB | ~20 ns | Library downstairs |
| RAM | 8–64 GB | ~100 ns | Bookstore across town |
| SSD | 256 GB–2 TB | ~100,000 ns | Warehouse in another city |
Memory hierarchy — each level is larger but slower
Pointers: addresses as data
A pointer is a variable that stores a memory address. Instead of holding a value like 42, it holds an address like 1000 — which tells the CPU where to find some other data.
# Python hides pointers, but they're everywhere
a = [1, 2, 3] # 'a' is actually a pointer to the list object on the heap
b = a # 'b' is another pointer to the SAME object
b.append(4)
print(a) # [1, 2, 3, 4] — both point to the same list!
In C, pointers are explicit:
int x = 42;
int* p = &x; // p holds the ADDRESS of x
printf("%d\n", *p); // *p "dereferences" — follows the pointer to get 42
Why this matters: every data structure beyond arrays (linked lists, trees, graphs, hash tables) uses pointers to connect pieces of scattered memory. Understanding pointers is understanding how data structures are wired together.
Memory alignment
CPUs work most efficiently when data starts at addresses that are multiples of the data’s size:
- A 4-byte
intshould start at an address divisible by 4 - An 8-byte
doubleshould start at an address divisible by 8
If data is misaligned, the CPU might need two memory accesses instead of one. Compilers automatically insert padding bytes between struct fields to maintain alignment:
struct Example {
char a; // 1 byte (address 0)
// 3 bytes padding (addresses 1-3)
int b; // 4 bytes (address 4, aligned to 4)
char c; // 1 byte (address 8)
// 7 bytes padding (addresses 9-15)
double d; // 8 bytes (address 16, aligned to 8)
};
// Total size: 24 bytes (not 14 as you might expect!)
This is why the order of fields in a struct can affect memory usage — and why understanding memory layout matters for performance-critical code.
Key takeaways
- Memory is a numbered array of bytes. Every piece of data lives at a specific address.
- Arrays start at index 0 because the index is the offset from the base address. Index 0 = zero offset = the base itself.
- Array access is O(1) because of the formula:
address = base + index × element_size. One multiplication, one addition — done. - Stack is fast but limited (local variables, function calls). Heap is flexible but slower (dynamic allocations).
- Contiguous memory (arrays) is cache-friendly. Scattered memory (pointers/linked lists) causes cache misses. This is why arrays are often faster in practice than their theoretical complexity suggests.
- Pointers store addresses. Every non-trivial data structure uses pointers to link pieces of memory together.
What’s next
Now that you understand how memory works, the next posts will build on this:
- Big-O and algorithm analysis — how to measure whether a data structure or algorithm is fast (coming soon)
- Arrays and dynamic arrays — the simplest data structure, and how languages like Python resize lists behind the scenes (coming soon)