Maths for ML · Part 2

Scalars, Vectors, and Vector Spaces

In this series (15 parts)
  1. Why Maths Matters for ML: A Practical Overview
  2. Scalars, Vectors, and Vector Spaces
  3. Matrices and Matrix Operations
  4. Matrix Inverses and Systems of Linear Equations
  5. Eigenvalues and Eigenvectors
  6. Matrix Decompositions: LU, QR, SVD
  7. Norms, Distances, and Similarity
  8. Calculus Review: Derivatives and the Chain Rule
  9. Partial Derivatives and Gradients
  10. The Jacobian and Hessian Matrices
  11. Taylor series and local approximations
  12. Probability fundamentals
  13. Random variables and distributions
  14. Bayes theorem and its role in ML
  15. Information theory: entropy, KL divergence, cross-entropy

Every data point in machine learning is a vector. An image is a vector of pixel values. A sentence is a vector of word embeddings. A user profile is a vector of features. Before you can do anything useful in ML, you need to be comfortable with vectors and the operations you can perform on them.

Prerequisites

This article assumes you have read Why Maths Matters for ML. No prior linear algebra knowledge is required.

Scalars

A scalar is a single number. That is it. Temperature, price, age, the number 7: all of these are scalars.

In notation, we write scalars as lowercase letters: a, b, x. When we say a \in \mathbb{R}, we mean a is a real number.

Scalars seem too simple to bother defining, but the distinction matters. When you move to vectors and matrices, you need to be precise about whether something is a single number, a list of numbers, or a grid of numbers.

Vectors

A vector is an ordered list of numbers. A vector with n entries is called an n-dimensional vector, and we write it as:

\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}

Bold lowercase letters (\mathbf{v}, \mathbf{w}, \mathbf{x}) denote vectors. The individual entries v_1, v_2, \ldots, v_n are scalars called the components of the vector.

A 3D vector like \mathbf{v} = [2, 5, -1] could represent a point in space, a feature set for a data point, or the RGB values of a pixel. The meaning depends on context, but the math is the same.
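In NumPy, which the code examples in this series use, a vector is simply a 1-D array; the same array can play any of these roles:

```python
import numpy as np

# The same 3-dimensional vector, whatever it happens to represent
v = np.array([2, 5, -1])

print(v.shape)  # (3,) - a 1-D array with three components
print(v[0])     # 2   - components are indexed from 0 in NumPy
```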

Vectors in ML

In ML, you encounter vectors constantly:

  • A single training example with n features is a vector in \mathbb{R}^n.
  • The weights of a linear model form a vector.
  • Gradients are vectors that point in the direction of steepest increase of a function.
  • Word embeddings map words to vectors in a high-dimensional space.

Vector addition

You add two vectors by adding their corresponding components. Both vectors must have the same dimension.

\mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}

For example:

\begin{bmatrix} 1 \\ 4 \\ -2 \end{bmatrix} + \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} 1+3 \\ 4+(-1) \\ -2+5 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 3 \end{bmatrix}

Geometrically, adding two vectors places one at the tip of the other. The result is the diagonal of the parallelogram they form.

Vector addition as the parallelogram rule:

graph LR
  O["Origin"] -->|"u"| U["Tip of u"]
  O -->|"v"| V["Tip of v"]
  U -->|"v"| S["u + v"]
  V -->|"u"| S

Vector addition in 2D: u + v follows the parallelogram rule

Vector addition is both commutative (\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}) and associative ((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})).
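A quick NumPy check of the worked example above, plus a sanity check of commutativity:

```python
import numpy as np

u = np.array([1, 4, -2])
v = np.array([3, -1, 5])

# Addition is component-wise
s = u + v
print(s)  # [4 3 3]

# Commutativity: u + v gives the same vector as v + u
print(np.array_equal(u + v, v + u))  # True
```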

Scalar multiplication

Multiplying a vector by a scalar scales every component:

c \cdot \mathbf{v} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \\ \vdots \\ c \cdot v_n \end{bmatrix}

For example:

3 \cdot \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ -3 \\ 12 \end{bmatrix}

Geometrically, scalar multiplication stretches or shrinks a vector. A positive scalar keeps the direction. A negative scalar flips it. Multiplying by zero gives the zero vector.

This is where the term “scaling” comes from, and it is why scalars are called scalars.
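The three geometric cases in code, using the example vector from above:

```python
import numpy as np

v = np.array([2, -1, 4])

print(3 * v)   # [ 6 -3 12] - stretched by 3, same direction
print(-1 * v)  # [-2  1 -4] - same length, direction flipped
print(0 * v)   # [0 0 0]    - the zero vector
```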

Vector magnitude (norm)

The magnitude (or length) of a vector tells you how far it is from the origin. The most common measure is the Euclidean norm, also called the L^2 norm:

\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}

This is the Pythagorean theorem generalized to nn dimensions.

Other norms you will encounter:

  • L^1 norm (Manhattan distance): \|\mathbf{v}\|_1 = |v_1| + |v_2| + \cdots + |v_n|
  • L^\infty norm (max norm): \|\mathbf{v}\|_\infty = \max(|v_1|, |v_2|, \ldots, |v_n|)

A unit vector has magnitude 1. You can turn any nonzero vector into a unit vector by dividing it by its norm:

\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

This process is called normalization. It is common in ML when you want direction without magnitude, for example, in cosine similarity.
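All three norms are available through np.linalg.norm by varying its ord parameter, and normalization is a single division:

```python
import numpy as np

v = np.array([3, -4, 12])

print(np.linalg.norm(v))              # L2 (Euclidean): 13.0
print(np.linalg.norm(v, ord=1))       # L1 (Manhattan): 19.0
print(np.linalg.norm(v, ord=np.inf))  # L-inf (max):    12.0

# Normalization: divide by the L2 norm to get a unit vector
v_hat = v / np.linalg.norm(v)
print(np.linalg.norm(v_hat))          # 1.0 (up to floating-point rounding)
```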

Norm ball shapes in 2D describe different ways to measure size:

graph TD
  subgraph "L1 Norm Ball"
      L1["Diamond shape<br/>Sum of absolute values = 1<br/>Corners on axes<br/>Encourages sparsity"]
  end
  subgraph "L2 Norm Ball"
      L2["Circle shape<br/>Euclidean distance = 1<br/>Smooth boundary<br/>Shrinks weights evenly"]
  end
  subgraph "L-inf Norm Ball"
      LI["Square shape<br/>Max component = 1<br/>All components up to 1<br/>Worst-case measure"]
  end

The dot product

The dot product of two vectors \mathbf{u} and \mathbf{v} (both in \mathbb{R}^n) is:

\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i

The result is a scalar, not a vector. This is why it is sometimes called the scalar product.

Geometric interpretation

The dot product has a beautiful geometric meaning:

\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \; \|\mathbf{v}\| \; \cos\theta

where \theta is the angle between the two vectors. This tells you:

  • If \mathbf{u} \cdot \mathbf{v} > 0: the vectors point in roughly the same direction (\theta < 90°).
  • If \mathbf{u} \cdot \mathbf{v} = 0: the vectors are perpendicular, also called orthogonal (\theta = 90°).
  • If \mathbf{u} \cdot \mathbf{v} < 0: the vectors point in roughly opposite directions (\theta > 90°).

Dot product sign tells you the angle between vectors:

graph TD
  subgraph "Positive dot product"
      A1["u"] --- B1["v"]
      C1["Angle < 90 degrees<br/>Same general direction"]
  end
  subgraph "Zero dot product"
      A2["u"] --- B2["v"]
      C2["Angle = 90 degrees<br/>Perpendicular / orthogonal"]
  end
  subgraph "Negative dot product"
      A3["u"] --- B3["v"]
      C3["Angle > 90 degrees<br/>Opposite general direction"]
  end
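Rearranging the geometric formula gives \cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}, so you can recover the angle itself in code:

```python
import numpy as np

u = np.array([1.0, 0.0])  # along the x-axis
v = np.array([1.0, 1.0])  # along the diagonal

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.degrees(np.arccos(cos_theta))
print(theta)  # approximately 45 degrees
```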

Why the dot product matters in ML

The dot product is everywhere in ML:

  • Linear models: prediction is \hat{y} = \mathbf{w} \cdot \mathbf{x} + b.
  • Cosine similarity: \text{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \; \|\mathbf{v}\|} measures how similar two vectors are, regardless of their magnitude.
  • Attention mechanisms: transformers compute attention scores using dot products of query and key vectors.
  • Matrix multiplication: each entry of a matrix product is a dot product of a row and a column.
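Cosine similarity, for example, is just the dot product and two norms put together (a minimal sketch, assuming nonzero inputs):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v, in [-1, 1]."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))  # approximately 1.0: same direction, different magnitude
print(cosine_similarity(a, -a))     # approximately -1.0: opposite direction
```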

Worked example 1: computing a dot product

Compute the dot product of \mathbf{u} = [4, -2, 5] and \mathbf{v} = [3, 1, -2].

Solution:

\mathbf{u} \cdot \mathbf{v} = (4)(3) + (-2)(1) + (5)(-2) = 12 + (-2) + (-10) = 0

The dot product is zero. This tells us these two vectors are orthogonal (perpendicular in 3D space).

import numpy as np

u = np.array([4, -2, 5])
v = np.array([3, 1, -2])
print(np.dot(u, v))  # Output: 0

Worked example 2: checking orthogonality

Are the vectors \mathbf{a} = [1, 2, 3] and \mathbf{b} = [4, -1, 2] orthogonal?

Solution: Two vectors are orthogonal if and only if their dot product is zero.

\mathbf{a} \cdot \mathbf{b} = (1)(4) + (2)(-1) + (3)(2) = 4 + (-2) + 6 = 8

Since 8 \neq 0, these vectors are not orthogonal. The positive value tells us the angle between them is less than 90 degrees.

Now let’s try \mathbf{a} = [1, 2, 3] and \mathbf{c} = [1, 1, -1]:

\mathbf{a} \cdot \mathbf{c} = (1)(1) + (2)(1) + (3)(-1) = 1 + 2 + (-3) = 0

✓ These two vectors are orthogonal.

Worked example 3: finding vector magnitude

Find the magnitude of \mathbf{v} = [3, -4, 12] and then normalize it.

Solution:

\|\mathbf{v}\| = \sqrt{3^2 + (-4)^2 + 12^2} = \sqrt{9 + 16 + 144} = \sqrt{169} = 13

Now normalize by dividing each component by 13:

\hat{\mathbf{v}} = \frac{1}{13} \begin{bmatrix} 3 \\ -4 \\ 12 \end{bmatrix} = \begin{bmatrix} 3/13 \\ -4/13 \\ 12/13 \end{bmatrix} \approx \begin{bmatrix} 0.231 \\ -0.308 \\ 0.923 \end{bmatrix}

Let’s verify the unit vector has magnitude 1:

\|\hat{\mathbf{v}}\| = \sqrt{\left(\frac{3}{13}\right)^2 + \left(\frac{-4}{13}\right)^2 + \left(\frac{12}{13}\right)^2} = \sqrt{\frac{9 + 16 + 144}{169}} = \sqrt{\frac{169}{169}} = 1 \quad \checkmark

import numpy as np

v = np.array([3, -4, 12])
magnitude = np.linalg.norm(v)
print(f"Magnitude: {magnitude}")  # Output: 13.0

unit_v = v / magnitude
print(f"Unit vector: {unit_v}")
print(f"Magnitude of unit vector: {np.linalg.norm(unit_v):.4f}")  # Output: 1.0000

Vector spaces

A vector space is the formal setting where vectors live. It is a set of vectors that is closed under addition and scalar multiplication. “Closed under” means that if you add two vectors from the set or scale a vector by any scalar, the result stays in the set.

Formally, a vector space V over the real numbers must satisfy these axioms:

  1. Closure under addition: if \mathbf{u}, \mathbf{v} \in V, then \mathbf{u} + \mathbf{v} \in V.
  2. Closure under scalar multiplication: if \mathbf{v} \in V and c \in \mathbb{R}, then c\mathbf{v} \in V.
  3. Contains the zero vector: \mathbf{0} \in V.
  4. Additive inverses: for every \mathbf{v} \in V, there exists -\mathbf{v} \in V.

Plus the usual commutativity, associativity, and distributivity rules.

The most common vector space in ML is \mathbb{R}^n, the set of all n-dimensional vectors of real numbers. But vector spaces can also contain functions, matrices, or polynomials. The concept is more general than you might expect.

Linear combinations and span

A linear combination of vectors \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k is any expression of the form:

c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k

where c_1, c_2, \ldots, c_k are scalars.

The span of a set of vectors is the collection of all possible linear combinations you can make from them. If the span of your vectors covers all of \mathbb{R}^n, then those vectors can represent any point in n-dimensional space.
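A linear combination in code. The vectors v1, v2 and the weights c1, c2 below are arbitrary values chosen for illustration:

```python
import numpy as np

v1 = np.array([1, 0, 2])
v2 = np.array([0, 1, -1])

# The linear combination 2*v1 + 3*v2
c1, c2 = 2, 3
combo = c1 * v1 + c2 * v2
print(combo)  # [2 3 1]
```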

Linear independence

Vectors v1,v2,,vk\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k are linearly independent if none of them can be written as a linear combination of the others. Equivalently, the only solution to:

c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0}

is c_1 = c_2 = \cdots = c_k = 0.

If a set of vectors is linearly dependent, at least one vector is redundant. It carries no new information. In ML terms, linearly dependent features are redundant. Removing them does not reduce your model’s expressive power.
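One practical way to test independence (a sketch): stack the vectors as rows of a matrix and check whether the matrix rank, via np.linalg.matrix_rank, equals the number of vectors:

```python
import numpy as np

def linearly_independent(*vectors):
    """True if the rank of the stacked matrix equals the vector count."""
    M = np.stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

a = np.array([1, 0, 0])
b = np.array([0, 1, 0])
print(linearly_independent(a, b))         # True
print(linearly_independent(a, b, a + b))  # False - a + b carries no new information
```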

Basis and dimension

A basis for a vector space is a set of linearly independent vectors whose span covers the entire space. The number of vectors in a basis is the dimension of the space.

The standard basis for \mathbb{R}^3 is:

\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}

Any vector in \mathbb{R}^3 can be written as a linear combination of these three basis vectors.
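In the standard basis, the coefficients of that combination are exactly the vector's components, which is easy to verify:

```python
import numpy as np

e1, e2, e3 = np.eye(3)  # the rows of the identity matrix are the standard basis
v = np.array([2.0, 5.0, -1.0])

# v is the linear combination v1*e1 + v2*e2 + v3*e3
reconstructed = v[0] * e1 + v[1] * e2 + v[2] * e3
print(np.array_equal(v, reconstructed))  # True
```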

Why this matters for ML

Vector spaces are not just abstract theory. When you do PCA, you are finding a new basis that captures the most variance in your data. When you say a neural network “learns representations,” you mean it learns to map inputs to a vector space where similar things are close together.

The concept of dimension is also practical. If your data lies on a lower-dimensional subspace within \mathbb{R}^n, dimensionality reduction techniques exploit that structure. This is why understanding vector spaces gives you real insight into how ML algorithms work.

Summary

| Concept | What it is | Why it matters in ML |
| --- | --- | --- |
| Scalar | A single number | Parameters, learning rates, loss values |
| Vector | An ordered list of numbers | Data points, weights, gradients |
| Vector addition | Add corresponding components | Combining features, residual connections |
| Scalar multiplication | Scale every component | Learning rate times gradient |
| Magnitude | Length of a vector | Normalization, regularization |
| Dot product | Sum of component-wise products | Predictions, similarity, attention |
| Vector space | A set closed under addition and scaling | The mathematical setting for all of ML |

What comes next

Now that you understand vectors, the next step is to organize them into rectangular grids. The next article covers matrices and matrix operations, where you will learn how to represent linear transformations and perform the computations that power neural networks.
