Maths for ML · Part 2

Scalars, Vectors, and Vector Spaces

In this series (15 parts)
  1. Why Maths Matters for ML: A Practical Overview
  2. Scalars, Vectors, and Vector Spaces
  3. Matrices and Matrix Operations
  4. Matrix Inverses and Systems of Linear Equations
  5. Eigenvalues and Eigenvectors
  6. Matrix Decompositions: LU, QR, SVD
  7. Norms, Distances, and Similarity
  8. Calculus Review: Derivatives and the Chain Rule
  9. Partial Derivatives and Gradients
  10. The Jacobian and Hessian Matrices
  11. Taylor series and local approximations
  12. Probability fundamentals
  13. Random variables and distributions
  14. Bayes theorem and its role in ML
  15. Information theory: entropy, KL divergence, cross-entropy

Every data point in machine learning is a vector. An image is a vector of pixel values. A sentence is a vector of word embeddings. A user profile is a vector of features. Before you can do anything useful in ML, you need to be comfortable with vectors and the operations you can perform on them.

Prerequisites

This article assumes you have read Why Maths Matters for ML. No prior linear algebra knowledge is required.

Scalars

A scalar is a single number. That is it. Temperature, price, age, the number 7: all of these are scalars.

In notation, we write scalars as lowercase letters: a, b, x. When we say a \in \mathbb{R}, we mean a is a real number.

Scalars seem too simple to bother defining, but the distinction matters. When you move to vectors and matrices, you need to be precise about whether something is a single number, a list of numbers, or a grid of numbers.

Vectors

A vector is an ordered list of numbers. A vector with n entries is called an n-dimensional vector, and we write it as:

\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}

Bold lowercase letters (\mathbf{v}, \mathbf{w}, \mathbf{x}) denote vectors. The individual entries v_1, v_2, \ldots, v_n are scalars called the components of the vector.

A 3D vector like \mathbf{v} = [2, 5, -1] could represent a point in space, a feature set for a data point, or the RGB values of a pixel. The meaning depends on context, but the math is the same.
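In NumPy, which the code examples in this series use, a vector is simply a 1-D array; the same array can play any of these roles:

```python
import numpy as np

# The same 3-dimensional vector, whatever it happens to represent
v = np.array([2, 5, -1])

print(v.shape)  # (3,) - a 1-D array with three components
print(v[0])     # 2   - components are indexed from 0 in NumPy
```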

Vectors in ML

In ML, you encounter vectors constantly:

  • A single training example with n features is a vector in \mathbb{R}^n.
  • The weights of a linear model form a vector.
  • Gradients are vectors that point in the direction of steepest increase of a function.
  • Word embeddings map words to vectors in a high-dimensional space.

Vector addition

You add two vectors by adding their corresponding components. Both vectors must have the same dimension.

\mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}

For example:

\begin{bmatrix} 1 \\ 4 \\ -2 \end{bmatrix} + \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} 1+3 \\ 4+(-1) \\ -2+5 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 3 \end{bmatrix}

Geometrically, adding two vectors places one at the tip of the other. The result is the diagonal of the parallelogram they form.

Vector addition as the parallelogram rule:

graph LR
  O["Origin"] -->|"u"| U["Tip of u"]
  O -->|"v"| V["Tip of v"]
  U -->|"v"| S["u + v"]
  V -->|"u"| S

Vector addition in 2D: u + v follows the parallelogram rule

Vector addition is both commutative (\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}) and associative ((\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})).
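A quick NumPy check of the worked example above, plus a sanity check of commutativity:

```python
import numpy as np

u = np.array([1, 4, -2])
v = np.array([3, -1, 5])

# Addition is component-wise
s = u + v
print(s)  # [4 3 3]

# Commutativity: u + v gives the same vector as v + u
print(np.array_equal(u + v, v + u))  # True
```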

Scalar multiplication

Multiplying a vector by a scalar scales every component:

c \cdot \mathbf{v} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \\ \vdots \\ c \cdot v_n \end{bmatrix}

For example:

3 \cdot \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ -3 \\ 12 \end{bmatrix}

Geometrically, scalar multiplication stretches or shrinks a vector. A positive scalar keeps the direction. A negative scalar flips it. Multiplying by zero gives the zero vector.

This is where the term “scaling” comes from, and it is why scalars are called scalars.
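The three geometric cases in code, using the example vector from above:

```python
import numpy as np

v = np.array([2, -1, 4])

print(3 * v)   # [ 6 -3 12] - stretched by 3, same direction
print(-1 * v)  # [-2  1 -4] - same length, direction flipped
print(0 * v)   # [0 0 0]    - the zero vector
```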

Vector magnitude (norm)

The magnitude (or length) of a vector tells you how far it is from the origin. The most common measure is the Euclidean norm, also called the L^2 norm:

\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}

This is the Pythagorean theorem generalized to nn dimensions.

Other norms you will encounter:

  • L^1 norm (Manhattan distance): \|\mathbf{v}\|_1 = |v_1| + |v_2| + \cdots + |v_n|
  • L^\infty norm (max norm): \|\mathbf{v}\|_\infty = \max(|v_1|, |v_2|, \ldots, |v_n|)

A unit vector has magnitude 1. You can turn any nonzero vector into a unit vector by dividing it by its norm:

\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

This process is called normalization. It is common in ML when you want direction without magnitude, for example, in cosine similarity.
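All three norms are available through np.linalg.norm by varying its ord parameter, and normalization is a single division:

```python
import numpy as np

v = np.array([3, -4, 12])

print(np.linalg.norm(v))              # L2 (Euclidean): 13.0
print(np.linalg.norm(v, ord=1))       # L1 (Manhattan): 19.0
print(np.linalg.norm(v, ord=np.inf))  # L-inf (max):    12.0

# Normalization: divide by the L2 norm to get a unit vector
v_hat = v / np.linalg.norm(v)
print(np.linalg.norm(v_hat))          # 1.0 (up to floating-point rounding)
```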

Norm ball shapes in 2D describe different ways to measure size:

graph TD
  subgraph "L1 Norm Ball"
      L1["Diamond shape<br/>Sum of absolute values = 1<br/>Corners on axes<br/>Encourages sparsity"]
  end
  subgraph "L2 Norm Ball"
      L2["Circle shape<br/>Euclidean distance = 1<br/>Smooth boundary<br/>Shrinks weights evenly"]
  end
  subgraph "L-inf Norm Ball"
      LI["Square shape<br/>Max component = 1<br/>All components up to 1<br/>Worst-case measure"]
  end

The dot product

The dot product of two vectors \mathbf{u} and \mathbf{v} (both in \mathbb{R}^n) is:

\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i

The result is a scalar, not a vector. This is why it is sometimes called the scalar product.

Geometric interpretation

The dot product has a beautiful geometric meaning:

\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \; \|\mathbf{v}\| \; \cos\theta

where \theta is the angle between the two vectors. This tells you:

  • If \mathbf{u} \cdot \mathbf{v} > 0: the vectors point in roughly the same direction (\theta < 90°).
  • If \mathbf{u} \cdot \mathbf{v} = 0: the vectors are perpendicular, also called orthogonal (\theta = 90°).
  • If \mathbf{u} \cdot \mathbf{v} < 0: the vectors point in roughly opposite directions (\theta > 90°).

Dot product sign tells you the angle between vectors:

graph TD
  subgraph "Positive dot product"
      A1["u"] --- B1["v"]
      C1["Angle < 90 degrees<br/>Same general direction"]
  end
  subgraph "Zero dot product"
      A2["u"] --- B2["v"]
      C2["Angle = 90 degrees<br/>Perpendicular / orthogonal"]
  end
  subgraph "Negative dot product"
      A3["u"] --- B3["v"]
      C3["Angle > 90 degrees<br/>Opposite general direction"]
  end
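Rearranging the geometric formula gives \cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}, so you can recover the angle itself in code:

```python
import numpy as np

u = np.array([1.0, 0.0])  # along the x-axis
v = np.array([1.0, 1.0])  # along the diagonal

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.degrees(np.arccos(cos_theta))
print(theta)  # approximately 45 degrees
```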

Why the dot product matters in ML

The dot product is everywhere in ML:

  • Linear models: prediction is \hat{y} = \mathbf{w} \cdot \mathbf{x} + b.
  • Cosine similarity: \text{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \; \|\mathbf{v}\|} measures how similar two vectors are, regardless of their magnitude.
  • Attention mechanisms: transformers compute attention scores using dot products of query and key vectors.
  • Matrix multiplication: each entry of a matrix product is a dot product of a row and a column.
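Cosine similarity, for example, is just the dot product and two norms put together (a minimal sketch, assuming nonzero inputs):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v, in [-1, 1]."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))  # approximately 1.0: same direction, different magnitude
print(cosine_similarity(a, -a))     # approximately -1.0: opposite direction
```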

Worked example 1: computing a dot product

Compute the dot product of \mathbf{u} = [4, -2, 5] and \mathbf{v} = [3, 1, -2].

Solution:

\mathbf{u} \cdot \mathbf{v} = (4)(3) + (-2)(1) + (5)(-2) = 12 + (-2) + (-10) = 0

The dot product is zero. This tells us these two vectors are orthogonal (perpendicular in 3D space).

import numpy as np

u = np.array([4, -2, 5])
v = np.array([3, 1, -2])
print(np.dot(u, v))  # Output: 0

Worked example 2: checking orthogonality

Are the vectors \mathbf{a} = [1, 2, 3] and \mathbf{b} = [4, -1, 2] orthogonal?

Solution: Two vectors are orthogonal if and only if their dot product is zero.

\mathbf{a} \cdot \mathbf{b} = (1)(4) + (2)(-1) + (3)(2) = 4 + (-2) + 6 = 8

Since 8 \neq 0, these vectors are not orthogonal. The positive value tells us the angle between them is less than 90 degrees.

Now let’s try \mathbf{a} = [1, 2, 3] and \mathbf{c} = [1, 1, -1]:

\mathbf{a} \cdot \mathbf{c} = (1)(1) + (2)(1) + (3)(-1) = 1 + 2 + (-3) = 0

✓ These two vectors are orthogonal.

Worked example 3: finding vector magnitude

Find the magnitude of \mathbf{v} = [3, -4, 12] and then normalize it.

Solution:

\|\mathbf{v}\| = \sqrt{3^2 + (-4)^2 + 12^2} = \sqrt{9 + 16 + 144} = \sqrt{169} = 13

Now normalize by dividing each component by 13:

\hat{\mathbf{v}} = \frac{1}{13} \begin{bmatrix} 3 \\ -4 \\ 12 \end{bmatrix} = \begin{bmatrix} 3/13 \\ -4/13 \\ 12/13 \end{bmatrix} \approx \begin{bmatrix} 0.231 \\ -0.308 \\ 0.923 \end{bmatrix}

Let’s verify the unit vector has magnitude 1:

\|\hat{\mathbf{v}}\| = \sqrt{\left(\frac{3}{13}\right)^2 + \left(\frac{-4}{13}\right)^2 + \left(\frac{12}{13}\right)^2} = \sqrt{\frac{9 + 16 + 144}{169}} = \sqrt{\frac{169}{169}} = 1 \quad \checkmark

import numpy as np

v = np.array([3, -4, 12])
magnitude = np.linalg.norm(v)
print(f"Magnitude: {magnitude}")  # Output: 13.0

unit_v = v / magnitude
print(f"Unit vector: {unit_v}")
print(f"Magnitude of unit vector: {np.linalg.norm(unit_v):.4f}")  # Output: 1.0000

Vector spaces

A vector space is the formal setting where vectors live. It is a set of vectors that is closed under addition and scalar multiplication. “Closed under” means that if you add two vectors from the set or scale a vector by any scalar, the result stays in the set.

Formally, a vector space V over the real numbers must satisfy these axioms:

  1. Closure under addition: if \mathbf{u}, \mathbf{v} \in V, then \mathbf{u} + \mathbf{v} \in V.
  2. Closure under scalar multiplication: if \mathbf{v} \in V and c \in \mathbb{R}, then c\mathbf{v} \in V.
  3. Contains the zero vector: \mathbf{0} \in V.
  4. Additive inverses: for every \mathbf{v} \in V, there exists -\mathbf{v} \in V.

Plus the usual commutativity, associativity, and distributivity rules.

The most common vector space in ML is \mathbb{R}^n, the set of all n-dimensional vectors of real numbers. But vector spaces can also contain functions, matrices, or polynomials. The concept is more general than you might expect.

Linear combinations and span

A linear combination of vectors \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k is any expression of the form:

c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k

where c_1, c_2, \ldots, c_k are scalars.

The span of a set of vectors is the collection of all possible linear combinations you can make from them. If the span of your vectors covers all of \mathbb{R}^n, then those vectors can represent any point in n-dimensional space.
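A linear combination in code. The vectors v1, v2 and the weights c1, c2 below are arbitrary values chosen for illustration:

```python
import numpy as np

v1 = np.array([1, 0, 2])
v2 = np.array([0, 1, -1])

# The linear combination 2*v1 + 3*v2
c1, c2 = 2, 3
combo = c1 * v1 + c2 * v2
print(combo)  # [2 3 1]
```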

Linear independence

Vectors v1,v2,,vk\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k are linearly independent if none of them can be written as a linear combination of the others. Equivalently, the only solution to:

c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0}

is c_1 = c_2 = \cdots = c_k = 0.

If a set of vectors is linearly dependent, at least one vector is redundant. It carries no new information. In ML terms, linearly dependent features are redundant. Removing them does not reduce your model’s expressive power.
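One practical way to test independence (a sketch): stack the vectors as rows of a matrix and check whether the matrix rank, via np.linalg.matrix_rank, equals the number of vectors:

```python
import numpy as np

def linearly_independent(*vectors):
    """True if the rank of the stacked matrix equals the vector count."""
    M = np.stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

a = np.array([1, 0, 0])
b = np.array([0, 1, 0])
print(linearly_independent(a, b))         # True
print(linearly_independent(a, b, a + b))  # False - a + b carries no new information
```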

Basis and dimension

A basis for a vector space is a set of linearly independent vectors whose span covers the entire space. The number of vectors in a basis is the dimension of the space.

The standard basis for \mathbb{R}^3 is:

\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}

Any vector in \mathbb{R}^3 can be written as a linear combination of these three basis vectors.
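In the standard basis, the coefficients of that combination are exactly the vector's components, which is easy to verify:

```python
import numpy as np

e1, e2, e3 = np.eye(3)  # the rows of the identity matrix are the standard basis
v = np.array([2.0, 5.0, -1.0])

# v is the linear combination v1*e1 + v2*e2 + v3*e3
reconstructed = v[0] * e1 + v[1] * e2 + v[2] * e3
print(np.array_equal(v, reconstructed))  # True
```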

Why this matters for ML

Vector spaces are not just abstract theory. When you do PCA, you are finding a new basis that captures the most variance in your data. When you say a neural network “learns representations,” you mean it learns to map inputs to a vector space where similar things are close together.

The concept of dimension is also practical. If your data lies on a lower-dimensional subspace within \mathbb{R}^n, dimensionality reduction techniques exploit that structure. This is why understanding vector spaces gives you real insight into how ML algorithms work.

Summary

| Concept | What it is | Why it matters in ML |
| --- | --- | --- |
| Scalar | A single number | Parameters, learning rates, loss values |
| Vector | An ordered list of numbers | Data points, weights, gradients |
| Vector addition | Add corresponding components | Combining features, residual connections |
| Scalar multiplication | Scale every component | Learning rate times gradient |
| Magnitude | Length of a vector | Normalization, regularization |
| Dot product | Sum of component-wise products | Predictions, similarity, attention |
| Vector space | A set closed under addition and scaling | The mathematical setting for all of ML |

What comes next

Now that you understand vectors, the next step is to organize them into rectangular grids. The next article covers matrices and matrix operations, where you will learn how to represent linear transformations and perform the computations that power neural networks.
