Matrices and Matrix Operations
In this series (15 parts)
- Why Maths Matters for ML: A Practical Overview
- Scalars, Vectors, and Vector Spaces
- Matrices and Matrix Operations
- Matrix Inverses and Systems of Linear Equations
- Eigenvalues and Eigenvectors
- Matrix Decompositions: LU, QR, SVD
- Norms, Distances, and Similarity
- Calculus Review: Derivatives and the Chain Rule
- Partial Derivatives and Gradients
- The Jacobian and Hessian Matrices
- Taylor Series and Local Approximations
- Probability Fundamentals
- Random Variables and Distributions
- Bayes' Theorem and Its Role in ML
- Information Theory: Entropy, KL Divergence, Cross-Entropy
A matrix is a rectangular grid of numbers. If vectors are the atoms of ML data, matrices are the molecules. Your dataset is a matrix. The weights of a neural network layer are a matrix. Every linear transformation can be written as a matrix multiplication. You will work with matrices constantly, so let’s get the operations down cold.
Prerequisites
This article builds on scalars, vectors, and vector spaces. You should be comfortable with vectors, the dot product, and basic vector operations.
What is a matrix?
A matrix is a 2D array of numbers arranged in rows and columns. A matrix with $m$ rows and $n$ columns is called an $m \times n$ matrix (read "$m$ by $n$").
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

This is an $m \times n$ matrix. The entry $a_{ij}$ sits in row $i$, column $j$. We typically use uppercase bold letters ($\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$) for matrices.
A vector is just a special case of a matrix: a column vector is an $n \times 1$ matrix, and a row vector is a $1 \times n$ matrix.
Matrices in ML
- Datasets: $n$ samples with $d$ features form an $n \times d$ matrix.
- Weight matrices: a neural network layer with 128 inputs and 64 outputs has a $64 \times 128$ weight matrix.
- Images: a grayscale image with 28 rows and 28 columns is a $28 \times 28$ matrix.
- Attention scores: in a transformer, the attention matrix is $n \times n$, where $n$ is the sequence length.
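These shapes can be sketched in NumPy; the specific sizes below (1000 samples, 20 features, sequence length 512) are illustrative assumptions, not values from the text:

```python
import numpy as np

# Illustrative shapes only: sizes are assumed for the example
X = np.zeros((1000, 20))     # dataset: 1000 samples, 20 features
W = np.zeros((64, 128))      # layer weights: 128 inputs -> 64 outputs
img = np.zeros((28, 28))     # one grayscale image
attn = np.zeros((512, 512))  # attention scores for a length-512 sequence

print(X.shape, W.shape, img.shape, attn.shape)
```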
Matrix addition
Add two matrices of the same size by adding corresponding entries:

$$(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$$

For example:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

You cannot add matrices of different sizes. A $2 \times 3$ matrix plus a $3 \times 2$ matrix is undefined.

Matrix addition is commutative ($\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$) and associative ($(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$).
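A quick NumPy check of entry-wise addition and commutativity (the matrices here are illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Same-shape matrices add entry by entry
print(A + B)                          # [[ 6  8]
                                      #  [10 12]]

# Addition is commutative: A + B == B + A
print(np.array_equal(A + B, B + A))   # True
```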
Scalar multiplication
Multiply every entry of the matrix by the scalar:

$$(c\mathbf{A})_{ij} = c\, a_{ij}$$

For example:

$$2 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix}$$
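In NumPy, scalar multiplication is just `*` between a number and an array:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# Every entry is multiplied by the scalar
print(2 * A)   # [[2 4]
               #  [6 8]]
```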
The transpose
The transpose of a matrix $\mathbf{A}$, written $\mathbf{A}^\top$, flips rows and columns. The entry at row $i$, column $j$ of $\mathbf{A}$ becomes the entry at row $j$, column $i$ of $\mathbf{A}^\top$.

If $\mathbf{A}$ is $m \times n$, then $\mathbf{A}^\top$ is $n \times m$.

Useful properties:

- $(\mathbf{A}^\top)^\top = \mathbf{A}$
- $(\mathbf{A} + \mathbf{B})^\top = \mathbf{A}^\top + \mathbf{B}^\top$
- $(\mathbf{A}\mathbf{B})^\top = \mathbf{B}^\top \mathbf{A}^\top$ (note the order reversal)

A matrix is called symmetric if $\mathbf{A} = \mathbf{A}^\top$. Symmetric matrices come up frequently in ML; for instance, covariance matrices are always symmetric.
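A short NumPy sketch of both facts: `.T` swaps the shape, and a covariance matrix (here computed from random illustrative data) is symmetric:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3
print(A.T.shape)                       # (3, 2)

# Covariance matrices are symmetric: S == S^T
X = np.random.randn(100, 4)            # 100 samples, 4 features (illustrative)
S = np.cov(X, rowvar=False)            # 4 x 4 covariance matrix
print(np.allclose(S, S.T))             # True
```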
Transpose operation: rows become columns:
```mermaid
graph LR
    A["Original A<br/>m x n<br/>Row i, Col j = a_ij"] -->|"Transpose"| AT["Transposed A^T<br/>n x m<br/>Row j, Col i = a_ij"]
```
The identity matrix
The identity matrix $\mathbf{I}$ is the square matrix with ones on the diagonal and zeros everywhere else:

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

It is the matrix equivalent of the number 1. For any matrix $\mathbf{A}$ of compatible size:

$$\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}$$
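You can verify the identity property directly with `np.eye`:

```python
import numpy as np

I = np.eye(3)  # 3 x 3 identity matrix
A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

# Multiplying by the identity leaves the matrix unchanged, on either side
print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```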
Matrix multiplication
This is the most important operation in this article. Matrix multiplication is not done element by element. Instead, each entry of the product is a dot product of a row from the left matrix with a column from the right matrix.
For $\mathbf{A}$ of size $m \times n$ and $\mathbf{B}$ of size $n \times p$, the product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is $m \times p$, and:

$$c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}$$

The inner dimensions must match: the number of columns in $\mathbf{A}$ must equal the number of rows in $\mathbf{B}$.
How matrix multiplication dimensions work:
```mermaid
graph LR
    A["Matrix A<br/>m x n"] -->|"inner dimension n must match"| B["Matrix B<br/>n x p"]
    B --> C["Result C<br/>m x p"]
    style A fill:#e1f5fe
    style B fill:#e1f5fe
    style C fill:#c8e6c9
```
Dimension matching in a matrix multiplication chain
How to compute it by hand
For each entry $c_{ij}$:

- Take row $i$ of matrix $\mathbf{A}$.
- Take column $j$ of matrix $\mathbf{B}$.
- Compute their dot product. That is $c_{ij} = \sum_{k} a_{ik}\, b_{kj}$.
Repeat for every combination of row and column.
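These steps translate directly into a loop: one dot product per (row, column) pair. This hand-rolled version (a teaching sketch, not how you would multiply matrices in practice) matches NumPy's `@`:

```python
import numpy as np

def matmul_by_hand(A, B):
    """Multiply two matrices using the row-dot-column rule."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):                 # each row of A
        for j in range(p):             # each column of B
            C[i, j] = A[i, :] @ B[:, j]  # dot product of row i and column j
    return C

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])
print(matmul_by_hand(A, B))   # same result as A @ B
```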
Worked example 1: multiplying two 2x2 matrices
Compute $\mathbf{AB}$ where:

$$\mathbf{A} = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 5 & -1 \\ 2 & 3 \end{bmatrix}$$

Solution:

Both are $2 \times 2$, so the product is $2 \times 2$.

Entry $c_{11}$: row 1 of $\mathbf{A}$ dotted with column 1 of $\mathbf{B}$: $(2)(5) + (3)(2) = 16$.

Entry $c_{12}$: row 1 of $\mathbf{A}$ dotted with column 2 of $\mathbf{B}$: $(2)(-1) + (3)(3) = 7$.

Entry $c_{21}$: row 2 of $\mathbf{A}$ dotted with column 1 of $\mathbf{B}$: $(1)(5) + (4)(2) = 13$.

Entry $c_{22}$: row 2 of $\mathbf{A}$ dotted with column 2 of $\mathbf{B}$: $(1)(-1) + (4)(3) = 11$.

$$\mathbf{AB} = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix}$$
```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])
print(A @ B)
# [[16  7]
#  [13 11]]
```
Worked example 2: multiplying non-square matrices
Compute $\mathbf{CD}$ where:

$$\mathbf{C} = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & -1 \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} 4 & 1 \\ -2 & 0 \\ 3 & 5 \end{bmatrix}$$

$\mathbf{C}$ is $2 \times 3$ and $\mathbf{D}$ is $3 \times 2$. The inner dimensions match (both 3), so the result is $2 \times 2$.

Solution:

Entry $(\mathbf{CD})_{11}$: row 1 of $\mathbf{C}$ dotted with column 1 of $\mathbf{D}$: $(1)(4) + (0)(-2) + (2)(3) = 10$.

Entry $(\mathbf{CD})_{12}$: row 1 of $\mathbf{C}$ dotted with column 2 of $\mathbf{D}$: $(1)(1) + (0)(0) + (2)(5) = 11$.

Entry $(\mathbf{CD})_{21}$: row 2 of $\mathbf{C}$ dotted with column 1 of $\mathbf{D}$: $(3)(4) + (1)(-2) + (-1)(3) = 7$.

Entry $(\mathbf{CD})_{22}$: row 2 of $\mathbf{C}$ dotted with column 2 of $\mathbf{D}$: $(3)(1) + (1)(0) + (-1)(5) = -2$.

$$\mathbf{CD} = \begin{bmatrix} 10 & 11 \\ 7 & -2 \end{bmatrix}$$
```python
import numpy as np

C = np.array([[1, 0, 2], [3, 1, -1]])
D = np.array([[4, 1], [-2, 0], [3, 5]])
print(C @ D)
# [[10 11]
#  [ 7 -2]]
```
Worked example 3: matrix multiplication is not commutative
A common mistake is to assume $\mathbf{AB} = \mathbf{BA}$. Let's show this is false with a concrete example.

Using the same $\mathbf{A}$ and $\mathbf{B}$ from example 1:

We already computed:

$$\mathbf{AB} = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix}$$

Now compute $\mathbf{BA}$:

Entry $(\mathbf{BA})_{11}$: $(5)(2) + (-1)(1) = 9$.

Entry $(\mathbf{BA})_{12}$: $(5)(3) + (-1)(4) = 11$.

Entry $(\mathbf{BA})_{21}$: $(2)(2) + (3)(1) = 7$.

Entry $(\mathbf{BA})_{22}$: $(2)(3) + (3)(4) = 18$.

Comparing:

$$\mathbf{AB} = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix} \neq \begin{bmatrix} 9 & 11 \\ 7 & 18 \end{bmatrix} = \mathbf{BA}$$

So $\mathbf{AB} \neq \mathbf{BA}$ in general. Matrix multiplication is not commutative.
This matters in ML. The order of operations in neural network computations is significant. In a composition like $\mathbf{W}_2(\mathbf{W}_1\mathbf{x})$, $\mathbf{W}_1$ applies first, then $\mathbf{W}_2$. Reversing them gives a different result.
```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])
print("AB =")
print(A @ B)
print("\nBA =")
print(B @ A)
print("\nAB == BA?", np.array_equal(A @ B, B @ A))  # False
```
Note: with non-square matrices, $\mathbf{AB}$ might be defined while $\mathbf{BA}$ is not. If $\mathbf{A}$ is $m \times n$ and $\mathbf{B}$ is $n \times m$, then $\mathbf{AB}$ is $m \times m$ but $\mathbf{BA}$ is $n \times n$. They do not even have the same shape.
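A quick shape check makes this concrete:

```python
import numpy as np

A = np.ones((2, 3))   # 2 x 3
B = np.ones((3, 2))   # 3 x 2

# AB is 2 x 2 while BA is 3 x 3: not even the same shape
print((A @ B).shape)  # (2, 2)
print((B @ A).shape)  # (3, 3)
```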
Element-wise vs matrix multiplication
This distinction trips up many beginners, especially in code.
Matrix multiplication (what we just covered) involves dot products of rows and columns. In NumPy, use `@` or `np.matmul()`.

Element-wise multiplication (also called the Hadamard product, written $\mathbf{A} \odot \mathbf{B}$) multiplies corresponding entries: $(\mathbf{A} \odot \mathbf{B})_{ij} = a_{ij}\, b_{ij}$. Both matrices must be the same size. In NumPy, use `*`.

Using $\mathbf{A}$ and $\mathbf{B}$ from example 1:

$$\mathbf{A} \odot \mathbf{B} = \begin{bmatrix} (2)(5) & (3)(-1) \\ (1)(2) & (4)(3) \end{bmatrix} = \begin{bmatrix} 10 & -3 \\ 2 & 12 \end{bmatrix}$$

Compare with our earlier result for matrix multiplication, $\mathbf{AB} = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix}$. Completely different results.
```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])
print("Matrix multiply (A @ B):")
print(A @ B)
print("\nElement-wise multiply (A * B):")
print(A * B)
```
Where each appears in ML
- Matrix multiplication: the forward pass of a neural network ($\mathbf{W}\mathbf{x}$), attention score computation, any linear transformation.
- Element-wise multiplication: applying masks (setting certain values to zero), gating mechanisms in LSTMs and GRUs, feature scaling.
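The masking use case can be sketched in a few lines; the score values here are made up for illustration:

```python
import numpy as np

scores = np.array([[0.9, 0.3], [0.2, 0.8]])  # illustrative values
mask = np.array([[1, 0], [1, 1]])            # 0 marks positions to suppress

# Element-wise multiply zeroes out the masked positions
masked = scores * mask
print(masked)   # entry (1, 2) becomes 0, everything else unchanged
```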
Properties of matrix multiplication
While not commutative, matrix multiplication does satisfy some useful properties:
| Property | Formula |
|---|---|
| Associative | $(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C})$ |
| Distributive | $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$ |
| Scalar factor | $c(\mathbf{A}\mathbf{B}) = (c\mathbf{A})\mathbf{B} = \mathbf{A}(c\mathbf{B})$ |
| Transpose | $(\mathbf{A}\mathbf{B})^\top = \mathbf{B}^\top \mathbf{A}^\top$ |
| Identity | $\mathbf{A}\mathbf{I} = \mathbf{I}\mathbf{A} = \mathbf{A}$ |
The transpose rule is worth memorizing: the transpose of a product reverses the order. This comes up constantly in deriving gradients of matrix expressions.
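The order reversal is easy to verify numerically, reusing the matrices from example 1:

```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])

# (AB)^T equals B^T A^T, with the order reversed...
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
# ...and does NOT equal A^T B^T in general
print(np.array_equal((A @ B).T, A.T @ B.T))  # False
```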
Matrices as linear transformations
Here is the deeper insight: every matrix represents a linear transformation. When you multiply a vector by a matrix, you are transforming that vector.
For a $2 \times 2$ matrix and a 2D vector:

$$\mathbf{A}\mathbf{x} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{bmatrix}$$
Different matrices encode different transformations:
- Rotation: rotates vectors by an angle.
- Scaling: stretches or compresses along axes.
- Reflection: flips across an axis.
- Projection: collapses onto a lower-dimensional space.
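As one concrete case from the list above, the standard 2D rotation matrix turns the x-axis unit vector onto the y-axis:

```python
import numpy as np

# Rotation by 90 degrees counterclockwise
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])      # unit vector along the x-axis
print(R @ x)                  # approximately [0, 1]: rotated onto the y-axis
```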
A neural network layer computes $\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}$. The weight matrix $\mathbf{W}$ is a linear transformation that maps the input space to the output space. The nonlinear activation function applied afterward is what lets neural networks learn complex patterns.
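A minimal sketch of such a layer, assuming random weights, a 128-dimensional input, and ReLU as the activation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # illustrative layer: 128 inputs -> 64 outputs
b = rng.standard_normal(64)
x = rng.standard_normal(128)

# Linear transformation, then an element-wise nonlinearity (ReLU)
y = np.maximum(0, W @ x + b)
print(y.shape)  # (64,)
```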
Understanding what eigenvalues tell you about a transformation, and how to decompose matrices using SVD, builds directly on the foundation we are laying here.
Matrix as a linear transformation:
```mermaid
graph LR
    X["Input vector x<br/>in R^n"] --> M["Multiply by matrix A<br/>Linear transformation"]
    M --> Y["Output vector y = Ax<br/>in R^m"]
    style X fill:#e1f5fe
    style M fill:#fff9c4
    style Y fill:#c8e6c9
```
Summary
| Operation | Notation | Result size | Key rule |
|---|---|---|---|
| Addition | $\mathbf{A} + \mathbf{B}$ | Same as inputs | Same dimensions required |
| Scalar multiply | $c\mathbf{A}$ | Same as $\mathbf{A}$ | Every entry times $c$ |
| Transpose | $\mathbf{A}^\top$ | $n \times m$ from $m \times n$ | Rows and columns flipped |
| Matrix multiply | $\mathbf{A}\mathbf{B}$ | $m \times p$ from $m \times n$ and $n \times p$ | Inner dimensions must match |
| Element-wise multiply | $\mathbf{A} \odot \mathbf{B}$ | Same as inputs | Same dimensions required |
What comes next
With vectors and matrices in hand, the next logical question is: can we undo a matrix operation? The next article covers matrix inverses and systems of linear equations, where you will learn to solve $\mathbf{A}\mathbf{x} = \mathbf{b}$ and understand when a solution exists.