Search…
DSA Foundations · Part 1

How computers store data

Every data structure you will ever learn — arrays, linked lists, trees, hash tables — is built on top of the same foundation: computer memory. Before learning any data structure, you need to understand how data is physically stored and accessed. This is what separates programmers who memorize algorithms from those who truly understand why things work the way they do.


What is memory?

Think of your computer’s RAM as a massive row of numbered mailboxes. Each mailbox:

  • Has a unique address (a number, like 0, 1, 2, 3…)
  • Can hold exactly 1 byte (8 bits) of data

Your 8GB RAM has roughly 8 billion of these mailboxes, numbered from 0 to about 8,589,934,591.

When your program says x = 42, the CPU:

  1. Finds a free mailbox (say, address 1000)
  2. Writes the value 42 into that location
  3. Remembers that x lives at address 1000

Binary: the language of memory

Every value in memory is stored as binary — a sequence of 0s and 1s. Each digit is called a bit (binary digit). A group of 8 bits is a byte.

Decimal Binary (8-bit) Bytes used
0 00000000 1
1 00000001 1
42 00101010 1
127 01111111 1
255 11111111 1 (max for unsigned byte)
256 00000001 00000000 2
1000 00000011 11101000 2

How integers are stored in binary

Why this matters for data structures: the number of bytes a value needs determines how much space it occupies in memory. An int (32-bit) takes 4 bytes. A char takes 1 byte. This directly affects how arrays calculate element positions.

Common data sizes

Type Size (bytes) Range
char / byte 1 0 to 255 (or -128 to 127)
short 2 -32,768 to 32,767
int 4 −2.1B to 2.1B
long / int64 8 ±9.2 × 10¹⁸
float 4 ±3.4 × 10³⁸ (7 digits precision)
double 8 ±1.8 × 10³⁰⁸ (15 digits precision)
pointer 8 Any memory address (64-bit system)

Common data type sizes on a 64-bit system


Why array indices start at 0

This is one of the most common beginner questions, and the answer is beautiful in its simplicity. It comes directly from how memory works.

An array is a contiguous block of memory — elements are stored back-to-back with no gaps. If you have an array of integers starting at address 1000:

  • arr[0] is at address 1000 + 0 × 4 = 1000
  • arr[1] is at address 1000 + 1 × 4 = 1004
  • arr[2] is at address 1000 + 2 × 4 = 1008

The formula is:

address(arr[i])=base_address+i×element_size\text{address}(arr[i]) = \text{base\_address} + i \times \text{element\_size}

When i=0i = 0, you get the base address itself — zero offset. The index IS the offset multiplier. If arrays started at 1, the CPU would need to compute base + (i-1) × size every time — an extra subtraction on every access. Starting at 0 eliminates this waste.

Ready
Click Step to see how the computer accesses array elements using memory addresses.

This is why array access is O(1)

No matter which element you access — the first, the last, or any in between — the CPU just computes one multiplication and one addition. There’s no searching, no traversal. This is the fundamental superpower of arrays: constant-time random access.


Stack vs Heap: two types of memory allocation

When your program runs, it uses two main regions of memory:

The Stack

  • Fast: allocation is just moving a pointer
  • Automatic: memory is freed when a function returns
  • Limited size: typically 1–8 MB
  • Grows downward (from high addresses to low)

Used for: local variables, function parameters, return addresses.

The Heap

  • Flexible: can allocate any amount at runtime
  • Manual (in C/C++) or garbage-collected (in Python/Java/JS)
  • Slower: must find a free block, can fragment
  • Grows upward (from low addresses to high)

Used for: dynamic arrays, objects, anything whose size isn’t known at compile time.

Ready
Click Step to see how the computer accesses array elements using memory addresses.

When you use each

def example():
    # Stack: local variables (Python manages this internally)
    x = 10          # int on the stack frame
    name = "hello"  # reference on stack, string object on heap

    # Heap: dynamic allocations
    data = [1, 2, 3, 4, 5]  # list object lives on the heap
    matrix = [[0]*100 for _ in range(100)]  # large allocation → heap

    return data  # data survives because heap is not tied to function lifetime
#include <stdlib.h>

void example() {
    // Stack: fixed-size local variables
    int x = 10;             // 4 bytes on the stack
    int arr[5] = {1,2,3,4,5};  // 20 bytes on the stack

    // Heap: dynamic allocation with malloc
    int* data = malloc(100 * sizeof(int));  // 400 bytes on the heap
    // pointer 'data' is on stack (8 bytes), but points to heap memory

    // MUST free heap memory manually (or it leaks!)
    free(data);
}  // Stack variables (x, arr, data pointer) automatically freed here

How the CPU actually reads memory

When your program accesses arr[3], here’s what happens at the hardware level:

  1. Address calculation: CPU computes base + 3 × 4 = base + 12
  2. Cache check: CPU checks its L1 cache (fastest, ~1ns). If the data is there (cache hit), done.
  3. L2/L3 cache: If not in L1, check L2 (~5ns) then L3 (~20ns)
  4. RAM access: If not in any cache (cache miss), fetch from RAM (~100ns)

This has a huge implication for data structures:

Data that is stored contiguously in memory (arrays) is much faster to access in practice than data scattered around memory (linked lists), because CPUs load entire cache lines (64 bytes) at once.

When you access arr[0], the CPU loads bytes 0–63 into cache. If arr[1] through arr[15] are integers in that same 64-byte range, they’re all in cache already — free to access. This is called spatial locality and it’s why arrays often outperform linked lists in practice, even when linked lists have better theoretical complexity for some operations.

Memory level Size Latency Analogy
L1 Cache 64 KB ~1 ns Book on your desk
L2 Cache 256 KB ~5 ns Bookshelf in your room
L3 Cache 8–32 MB ~20 ns Library downstairs
RAM 8–64 GB ~100 ns Bookstore across town
SSD 256 GB–2 TB ~100,000 ns Warehouse in another city

Memory hierarchy — each level is larger but slower


Pointers: addresses as data

A pointer is a variable that stores a memory address. Instead of holding a value like 42, it holds an address like 1000 — which tells the CPU where to find some other data.

# Python hides pointers, but they're everywhere
a = [1, 2, 3]  # 'a' is actually a pointer to the list object on the heap
b = a           # 'b' is another pointer to the SAME object
b.append(4)
print(a)        # [1, 2, 3, 4] — both point to the same list!

In C, pointers are explicit:

int x = 42;
int* p = &x;    // p holds the ADDRESS of x
printf("%d\n", *p);  // *p "dereferences" — follows the pointer to get 42

Why this matters: every data structure beyond arrays (linked lists, trees, graphs, hash tables) uses pointers to connect pieces of scattered memory. Understanding pointers is understanding how data structures are wired together.


Memory alignment

CPUs work most efficiently when data starts at addresses that are multiples of the data’s size:

  • A 4-byte int should start at an address divisible by 4
  • An 8-byte double should start at an address divisible by 8

If data is misaligned, the CPU might need two memory accesses instead of one. Compilers automatically insert padding bytes between struct fields to maintain alignment:

struct Example {
    char a;     // 1 byte  (address 0)
    // 3 bytes padding (addresses 1-3)
    int b;      // 4 bytes (address 4, aligned to 4)
    char c;     // 1 byte  (address 8)
    // 7 bytes padding (addresses 9-15)
    double d;   // 8 bytes (address 16, aligned to 8)
};
// Total size: 24 bytes (not 14 as you might expect!)

This is why the order of fields in a struct can affect memory usage — and why understanding memory layout matters for performance-critical code.


Key takeaways

  1. Memory is a numbered array of bytes. Every piece of data lives at a specific address.
  2. Arrays start at index 0 because the index is the offset from the base address. Index 0 = zero offset = the base itself.
  3. Array access is O(1) because of the formula: address = base + index × element_size. One multiplication, one addition — done.
  4. Stack is fast but limited (local variables, function calls). Heap is flexible but slower (dynamic allocations).
  5. Contiguous memory (arrays) is cache-friendly. Scattered memory (pointers/linked lists) causes cache misses. This is why arrays are often faster in practice than their theoretical complexity suggests.
  6. Pointers store addresses. Every non-trivial data structure uses pointers to link pieces of memory together.

What’s next

Now that you understand how memory works, the next posts will build on this:

  • Big-O and algorithm analysis — how to measure whether a data structure or algorithm is fast (coming soon)
  • Arrays and dynamic arrays — the simplest data structure, and how languages like Python resize lists behind the scenes (coming soon)
Start typing to search across all content
navigate Enter open Esc close