Scalars, Vectors, and Vector Spaces
In this series (15 parts)
- Why Maths Matters for ML: A Practical Overview
- Scalars, Vectors, and Vector Spaces
- Matrices and Matrix Operations
- Matrix Inverses and Systems of Linear Equations
- Eigenvalues and Eigenvectors
- Matrix Decompositions: LU, QR, SVD
- Norms, Distances, and Similarity
- Calculus Review: Derivatives and the Chain Rule
- Partial Derivatives and Gradients
- The Jacobian and Hessian Matrices
- Taylor series and local approximations
- Probability fundamentals
- Random variables and distributions
- Bayes theorem and its role in ML
- Information theory: entropy, KL divergence, cross-entropy
Every data point in machine learning is a vector. An image is a vector of pixel values. A sentence is a vector of word embeddings. A user profile is a vector of features. Before you can do anything useful in ML, you need to be comfortable with vectors and the operations you can perform on them.
Prerequisites
This article assumes you have read Why Maths Matters for ML. No prior linear algebra knowledge is required.
Scalars
A scalar is a single number. That is it. Temperature, price, age, the number 7, all scalars.
In notation, we write scalars as lowercase letters: $a$, $b$, $c$. When we say $x \in \mathbb{R}$, we mean $x$ is a real number.
Scalars seem too simple to bother defining, but the distinction matters. When you move to vectors and matrices, you need to be precise about whether something is a single number, a list of numbers, or a grid of numbers.
Vectors
A vector is an ordered list of numbers. A vector with $n$ entries is called an $n$-dimensional vector, and we write it as:

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

Bold lowercase letters ($\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$) denote vectors. The individual entries $v_1, v_2, \ldots, v_n$ are scalars called the components of the vector.

A 3D vector like $\mathbf{v} = (2, -1, 3)$ could represent a point in space, a feature set for a data point, or the RGB values of a pixel. The meaning depends on context, but the math is the same.
Vectors in ML
In ML, you encounter vectors constantly:
- A single training example with $n$ features is a vector in $\mathbb{R}^n$.
- The weights of a linear model form a vector.
- Gradients are vectors that point in the direction of steepest increase of a function.
- Word embeddings map words to vectors in a high-dimensional space.
Vector addition
You add two vectors by adding their corresponding components. Both vectors must have the same dimension:

$$\mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}$$

For example:

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$$
Geometrically, adding two vectors places one at the tip of the other. The result is the diagonal of the parallelogram they form.
Vector addition as the parallelogram rule:
```mermaid
graph LR
    O["Origin"] -->|"u"| U["Tip of u"]
    O -->|"v"| V["Tip of v"]
    U -->|"v"| S["u + v"]
    V -->|"u"| S
```
Vector addition in 2D: u + v follows the parallelogram rule
Vector addition is both commutative ($\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$) and associative ($(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$).
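A quick NumPy sketch (array values are illustrative) confirms that addition is component-wise and commutative:

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, -1])

# Component-wise addition
print(u + v)  # [4 1]

# Commutativity: u + v == v + u
print(np.array_equal(u + v, v + u))  # True
```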
Scalar multiplication
Multiplying a vector by a scalar $c$ scales every component:

$$c\mathbf{v} = \begin{bmatrix} cv_1 \\ cv_2 \\ \vdots \\ cv_n \end{bmatrix}$$

For example:

$$2 \begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 6 \\ -2 \end{bmatrix}$$
Geometrically, scalar multiplication stretches or shrinks a vector. A positive scalar keeps the direction. A negative scalar flips it. Multiplying by zero gives the zero vector.
This is where the term “scaling” comes from, and it is why scalars are called scalars.
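The stretch, flip, and collapse behaviors are easy to see in a small NumPy sketch (values are illustrative):

```python
import numpy as np

v = np.array([3, -1])

print(2 * v)   # [ 6 -2]  -- stretched, same direction
print(-1 * v)  # [-3  1]  -- flipped direction
print(0 * v)   # [0 0]    -- collapsed to the zero vector
```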
Vector magnitude (norm)
The magnitude (or length) of a vector tells you how far it is from the origin. The most common measure is the Euclidean norm, also called the $L_2$ norm:

$$\|\mathbf{v}\|_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$

This is the Pythagorean theorem generalized to $n$ dimensions.
Other norms you will encounter:
- $L_1$ norm (Manhattan distance): $\|\mathbf{v}\|_1 = |v_1| + |v_2| + \cdots + |v_n|$
- $L_\infty$ norm (max norm): $\|\mathbf{v}\|_\infty = \max_i |v_i|$

A unit vector has magnitude 1. You can turn any nonzero vector into a unit vector by dividing it by its norm:

$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
This process is called normalization. It is common in ML when you want direction without magnitude, for example, in cosine similarity.
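NumPy's `np.linalg.norm` computes all three norms via its `ord` parameter; here is a sketch with an illustrative vector:

```python
import numpy as np

v = np.array([3, -4])

l1 = np.linalg.norm(v, ord=1)         # |3| + |-4| = 7
l2 = np.linalg.norm(v)                # sqrt(9 + 16) = 5 (default is L2)
linf = np.linalg.norm(v, ord=np.inf)  # max(|3|, |-4|) = 4

print(l1, l2, linf)  # 7.0 5.0 4.0

# Normalization: divide by the L2 norm to get a unit vector
unit_v = v / l2
print(np.linalg.norm(unit_v))  # 1.0
```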
Norm ball shapes in 2D describe different ways to measure size:
```mermaid
graph TD
    subgraph "L1 Norm Ball"
        L1["Diamond shape<br/>Sum of absolute values = 1<br/>Corners on axes<br/>Encourages sparsity"]
    end
    subgraph "L2 Norm Ball"
        L2["Circle shape<br/>Euclidean distance = 1<br/>Smooth boundary<br/>Shrinks weights evenly"]
    end
    subgraph "L-inf Norm Ball"
        LI["Square shape<br/>Max component = 1<br/>All components up to 1<br/>Worst-case measure"]
    end
```
The dot product
The dot product of two vectors $\mathbf{u}$ and $\mathbf{v}$ (both in $\mathbb{R}^n$) is:

$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i$$

The result is a scalar, not a vector. This is why it is sometimes called the scalar product.
Geometric interpretation
The dot product has a beautiful geometric meaning:

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \, \|\mathbf{v}\| \cos\theta$$

where $\theta$ is the angle between the two vectors. This tells you:
Dot product sign tells you the angle between vectors:
```mermaid
graph TD
    subgraph "Positive dot product"
        A1["u"] --- B1["v"]
        C1["Angle < 90 degrees<br/>Same general direction"]
    end
    subgraph "Zero dot product"
        A2["u"] --- B2["v"]
        C2["Angle = 90 degrees<br/>Perpendicular / orthogonal"]
    end
    subgraph "Negative dot product"
        A3["u"] --- B3["v"]
        C3["Angle > 90 degrees<br/>Opposite general direction"]
    end
```
- If $\mathbf{u} \cdot \mathbf{v} > 0$: the vectors point in roughly the same direction ($\theta < 90°$).
- If $\mathbf{u} \cdot \mathbf{v} = 0$: the vectors are perpendicular, also called orthogonal ($\theta = 90°$).
- If $\mathbf{u} \cdot \mathbf{v} < 0$: the vectors point in roughly opposite directions ($\theta > 90°$).
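Rearranging the geometric formula gives $\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \|\mathbf{v}\|}$, so you can recover the angle itself. A short sketch with illustrative vectors:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

# cos(theta) = (u . v) / (|u| |v|)
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta_deg = np.degrees(np.arccos(cos_theta))
print(f"{theta_deg:.1f} degrees")  # 45.0 degrees
```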
Why the dot product matters in ML
The dot product is everywhere in ML:
- Linear models: prediction is $\hat{y} = \mathbf{w} \cdot \mathbf{x} + b$.
- Cosine similarity: measures how similar two vectors are, regardless of their magnitude.
- Attention mechanisms: transformers compute attention scores using dot products of query and key vectors.
- Matrix multiplication: each entry of a matrix product is a dot product of a row and a column.
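As an example of the second bullet, here is a minimal cosine-similarity sketch (the function name is my own choice):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: keeps direction, ignores magnitude."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])     # same direction, twice the magnitude
c = np.array([-1.0, -2.0, -3.0])  # opposite direction

print(cosine_similarity(a, b))  # approximately  1.0
print(cosine_similarity(a, c))  # approximately -1.0
```

Because the norms cancel out the magnitudes, `a` and `b` score a perfect 1.0 despite having different lengths.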
Worked example 1: computing a dot product
Compute the dot product of $\mathbf{u} = (4, -2, 5)$ and $\mathbf{v} = (3, 1, -2)$.

Solution:

$$\mathbf{u} \cdot \mathbf{v} = (4)(3) + (-2)(1) + (5)(-2) = 12 - 2 - 10 = 0$$

The dot product is zero. This tells us these two vectors are orthogonal (perpendicular in 3D space).
```python
import numpy as np

u = np.array([4, -2, 5])
v = np.array([3, 1, -2])
print(np.dot(u, v))  # Output: 0
```
Worked example 2: checking orthogonality
Are the vectors $\mathbf{a} = (2, 3)$ and $\mathbf{b} = (1, 4)$ orthogonal?

Solution: Two vectors are orthogonal if and only if their dot product is zero.

$$\mathbf{a} \cdot \mathbf{b} = (2)(1) + (3)(4) = 2 + 12 = 14$$

Since $14 \neq 0$, these vectors are not orthogonal. The positive value tells us the angle between them is less than 90 degrees.

Now let's try $\mathbf{a} = (2, 3)$ and $\mathbf{b} = (-3, 2)$:

$$\mathbf{a} \cdot \mathbf{b} = (2)(-3) + (3)(2) = -6 + 6 = 0$$

✓ These two vectors are orthogonal.
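The same check in NumPy, using illustrative 2D vectors and a tolerance for floating-point arithmetic (the helper name is my own):

```python
import numpy as np

def is_orthogonal(u, v, tol=1e-9):
    """Two vectors are orthogonal iff their dot product is (numerically) zero."""
    return abs(np.dot(u, v)) < tol

print(is_orthogonal(np.array([2, 3]), np.array([1, 4])))   # False (dot = 14)
print(is_orthogonal(np.array([2, 3]), np.array([-3, 2])))  # True  (dot = 0)
```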
Worked example 3: finding vector magnitude
Find the magnitude of $\mathbf{v} = (3, -4, 12)$ and then normalize it.

Solution:

$$\|\mathbf{v}\| = \sqrt{3^2 + (-4)^2 + 12^2} = \sqrt{9 + 16 + 144} = \sqrt{169} = 13$$

Now normalize by dividing each component by 13:

$$\hat{\mathbf{v}} = \left( \frac{3}{13}, -\frac{4}{13}, \frac{12}{13} \right)$$

Let's verify the unit vector has magnitude 1:

$$\|\hat{\mathbf{v}}\| = \sqrt{\frac{9 + 16 + 144}{169}} = \sqrt{\frac{169}{169}} = 1$$
```python
import numpy as np

v = np.array([3, -4, 12])
magnitude = np.linalg.norm(v)
print(f"Magnitude: {magnitude}")  # Output: 13.0

unit_v = v / magnitude
print(f"Unit vector: {unit_v}")
print(f"Magnitude of unit vector: {np.linalg.norm(unit_v):.4f}")  # Output: 1.0000
```
Vector spaces
A vector space is the formal setting where vectors live. It is a set of vectors that is closed under addition and scalar multiplication. “Closed under” means that if you add two vectors from the set or scale a vector by any scalar, the result stays in the set.
Formally, a vector space $V$ over the real numbers must satisfy these axioms:
- Closure under addition: if $\mathbf{u}, \mathbf{v} \in V$, then $\mathbf{u} + \mathbf{v} \in V$.
- Closure under scalar multiplication: if $\mathbf{v} \in V$ and $c \in \mathbb{R}$, then $c\mathbf{v} \in V$.
- Contains the zero vector: $\mathbf{0} \in V$.
- Additive inverses: for every $\mathbf{v} \in V$, there exists $-\mathbf{v} \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
Plus the usual commutativity, associativity, and distributivity rules.
The most common vector space in ML is , the set of all -dimensional vectors of real numbers. But vector spaces can also contain functions, matrices, or polynomials. The concept is more general than you might expect.
Linear combinations and span
A linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ is any expression of the form:

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k$$

where $c_1, c_2, \ldots, c_k$ are scalars.

The span of a set of vectors is the collection of all possible linear combinations you can make from them. If the span of your vectors covers all of $\mathbb{R}^n$, then those vectors can represent any point in $n$-dimensional space.
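In code, a linear combination is just scaling and adding (the coefficients here are chosen arbitrarily for illustration):

```python
import numpy as np

v1 = np.array([1.0, 0.0])
v2 = np.array([0.0, 1.0])

# The linear combination 3*v1 + (-2)*v2
c1, c2 = 3.0, -2.0
combo = c1 * v1 + c2 * v2
print(combo)  # [ 3. -2.]

# Because v1 and v2 span R^2, ANY 2D point is reachable this way:
target = np.array([7.5, -1.25])
reached = target[0] * v1 + target[1] * v2
print(np.array_equal(reached, target))  # True
```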
Linear independence
Vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are linearly independent if none of them can be written as a linear combination of the others. Equivalently, the only solution to:

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0}$$

is $c_1 = c_2 = \cdots = c_k = 0$.
If a set of vectors is linearly dependent, at least one vector is redundant. It carries no new information. In ML terms, linearly dependent features are redundant. Removing them does not reduce your model’s expressive power.
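One practical way to test linear independence is to stack the vectors as rows of a matrix and check its rank: the vectors are independent exactly when the rank equals their count. A sketch (matrices and rank are covered properly in later articles):

```python
import numpy as np

# Independent: neither row is a multiple of the other
A = np.array([[1, 0],
              [0, 1]])
print(np.linalg.matrix_rank(A))  # 2 -> 2 vectors, rank 2: independent

# Dependent: the second row is 2x the first, so it adds no information
B = np.array([[1, 2],
              [2, 4]])
print(np.linalg.matrix_rank(B))  # 1 -> 2 vectors, rank 1: dependent
```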
Basis and dimension
A basis for a vector space is a set of linearly independent vectors whose span covers the entire space. The number of vectors in a basis is the dimension of the space.
The standard basis for $\mathbb{R}^3$ is:

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Any vector in $\mathbb{R}^3$ can be written as a linear combination of these three basis vectors: $(a, b, c) = a\mathbf{e}_1 + b\mathbf{e}_2 + c\mathbf{e}_3$.
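Writing a vector in the standard basis is exactly this decomposition, where the components double as the coefficients:

```python
import numpy as np

e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
e3 = np.array([0, 0, 1])

v = np.array([2, -1, 3])

# The components of v are its coefficients in the standard basis
reconstructed = v[0] * e1 + v[1] * e2 + v[2] * e3
print(np.array_equal(v, reconstructed))  # True
```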
Why this matters for ML
Vector spaces are not just abstract theory. When you do PCA, you are finding a new basis that captures the most variance in your data. When you say a neural network “learns representations,” you mean it learns to map inputs to a vector space where similar things are close together.
The concept of dimension is also practical. If your data lies on a lower-dimensional subspace within , dimensionality reduction techniques exploit that structure. This is why understanding vector spaces gives you real insight into how ML algorithms work.
Summary
| Concept | What it is | Why it matters in ML |
|---|---|---|
| Scalar | A single number | Parameters, learning rates, loss values |
| Vector | An ordered list of numbers | Data points, weights, gradients |
| Vector addition | Add corresponding components | Combining features, residual connections |
| Scalar multiplication | Scale every component | Learning rate times gradient |
| Magnitude | Length of a vector | Normalization, regularization |
| Dot product | Sum of component-wise products | Predictions, similarity, attention |
| Vector space | A set closed under addition and scaling | The mathematical setting for all of ML |
What comes next
Now that you understand vectors, the next step is to organize them into rectangular grids. The next article covers matrices and matrix operations, where you will learn how to represent linear transformations and perform the computations that power neural networks.