Maths for ML · Part 3

Matrices and Matrix Operations

In this series (15 parts)
  1. Why Maths Matters for ML: A Practical Overview
  2. Scalars, Vectors, and Vector Spaces
  3. Matrices and Matrix Operations
  4. Matrix Inverses and Systems of Linear Equations
  5. Eigenvalues and Eigenvectors
  6. Matrix Decompositions: LU, QR, SVD
  7. Norms, Distances, and Similarity
  8. Calculus Review: Derivatives and the Chain Rule
  9. Partial Derivatives and Gradients
  10. The Jacobian and Hessian Matrices
  11. Taylor Series and Local Approximations
  12. Probability Fundamentals
  13. Random Variables and Distributions
  14. Bayes' Theorem and Its Role in ML
  15. Information Theory: Entropy, KL Divergence, Cross-Entropy

A matrix is a rectangular grid of numbers. If vectors are the atoms of ML data, matrices are the molecules. Your dataset is a matrix. The weights of a neural network layer are a matrix. Every linear transformation can be written as a matrix multiplication. You will work with matrices constantly, so let’s get the operations down cold.

Prerequisites

This article builds on scalars, vectors, and vector spaces. You should be comfortable with vectors, the dot product, and basic vector operations.

What is a matrix?

A matrix is a 2D array of numbers arranged in rows and columns. A matrix with $m$ rows and $n$ columns is called an $m \times n$ matrix (read "$m$ by $n$").

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$

This is a $2 \times 3$ matrix. The entry $a_{ij}$ sits in row $i$, column $j$. We typically use uppercase bold letters ($A$, $B$, $W$) for matrices.

A vector is just a special case of a matrix: a column vector is an $n \times 1$ matrix, and a row vector is a $1 \times n$ matrix.
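In NumPy, this distinction shows up in a matrix's shape; a minimal sketch:

```python
import numpy as np

col = np.array([[1], [2], [3]])   # column vector: a 3 x 1 matrix
row = np.array([[1, 2, 3]])       # row vector: a 1 x 3 matrix

print(col.shape)  # (3, 1)
print(row.shape)  # (1, 3)
```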

Matrices in ML

  • Datasets: $m$ samples with $n$ features form an $m \times n$ matrix.
  • Weight matrices: a neural network layer with 128 inputs and 64 outputs has a $64 \times 128$ weight matrix.
  • Images: a grayscale image with 28 rows and 28 columns is a $28 \times 28$ matrix.
  • Attention scores: in a transformer, the attention matrix is $n \times n$ where $n$ is the sequence length.

Matrix addition

Add two matrices of the same size by adding corresponding entries:

$$A + B = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}$$

For example:

$$\begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 5 & 3 \\ 0 & 4 \end{bmatrix}$$

You cannot add matrices of different sizes. A $2 \times 3$ matrix plus a $3 \times 2$ matrix is undefined.

Matrix addition is commutative ($A + B = B + A$) and associative ($(A + B) + C = A + (B + C)$).
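A quick NumPy check of the addition example and the commutativity property (a sketch):

```python
import numpy as np

A = np.array([[1, 3], [2, -1]])
B = np.array([[4, 0], [-2, 5]])

print(A + B)
# [[5 3]
#  [0 4]]

# Commutativity: A + B equals B + A entry by entry
print(np.array_equal(A + B, B + A))  # True
```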

Scalar multiplication

Multiply every entry of the matrix by the scalar:

$$cA = \begin{bmatrix} ca_{11} & ca_{12} \\ ca_{21} & ca_{22} \end{bmatrix}$$

For example:

$$2 \cdot \begin{bmatrix} 3 & -1 \\ 0 & 4 \end{bmatrix} = \begin{bmatrix} 6 & -2 \\ 0 & 8 \end{bmatrix}$$
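The same example in NumPy, where `*` with a scalar multiplies every entry (a sketch):

```python
import numpy as np

A = np.array([[3, -1], [0, 4]])
print(2 * A)
# [[ 6 -2]
#  [ 0  8]]
```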

The transpose

The transpose of a matrix $A$, written $A^T$, flips rows and columns. The entry at row $i$, column $j$ of $A$ becomes the entry at row $j$, column $i$ of $A^T$.

If $A$ is $m \times n$, then $A^T$ is $n \times m$.

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \Rightarrow \quad A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$

Useful properties:

  • $(A^T)^T = A$
  • $(A + B)^T = A^T + B^T$
  • $(AB)^T = B^T A^T$ (note the order reversal)

A matrix is called symmetric if $A = A^T$. Symmetric matrices come up frequently in ML; for instance, covariance matrices are always symmetric.
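In NumPy the transpose is the .T attribute. The sketch below also checks symmetry on the covariance-style product $AA^T$, which is always symmetric (values are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
print(A.T)
# [[1 4]
#  [2 5]
#  [3 6]]

# A @ A.T is square and symmetric, like a covariance matrix
S = A @ A.T
print(np.array_equal(S, S.T))  # True
```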

Transpose operation: rows become columns:

graph LR
  A["Original A<br/>m x n<br/>Row i, Col j = a_ij"] -->|"Transpose"| AT["Transposed A^T<br/>n x m<br/>Row j, Col i = a_ij"]

The identity matrix

The identity matrix $I_n$ is the $n \times n$ square matrix with ones on the diagonal and zeros everywhere else:

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

It is the matrix equivalent of the number 1. For any matrix $A$ of compatible size:

$$AI = A \quad \text{and} \quad IA = A$$
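A quick NumPy confirmation, using np.eye to build the identity (a sketch):

```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
I = np.eye(2, dtype=int)  # 2 x 2 identity matrix

print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```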

Matrix multiplication

This is the most important operation in this article. Matrix multiplication is not done element by element. Instead, each entry of the product is a dot product of a row from the left matrix with a column from the right matrix.

For $A$ of size $m \times p$ and $B$ of size $p \times n$, the product $C = AB$ is $m \times n$, and:

$$c_{ij} = \sum_{k=1}^{p} a_{ik} \, b_{kj}$$

The inner dimensions must match: the number of columns in $A$ must equal the number of rows in $B$.

How matrix multiplication dimensions work:

graph LR
  A["Matrix A<br/>m x p"] -->|"inner dimension p must match"| B["Matrix B<br/>p x n"]
  B --> C["Result C<br/>m x n"]
  style A fill:#e1f5fe
  style B fill:#e1f5fe
  style C fill:#c8e6c9

$$\underbrace{A}_{m \times p} \cdot \underbrace{B}_{p \times n} = \underbrace{C}_{m \times n}$$

Dimension matching in a matrix multiplication chain

How to compute it by hand

For each entry $c_{ij}$:

  1. Take row $i$ of matrix $A$.
  2. Take column $j$ of matrix $B$.
  3. Compute their dot product. That is $c_{ij}$.

Repeat for every combination of row and column.
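The recipe above translates directly into code. Below is a naive triple-loop sketch; NumPy's @ operator computes the same result far more efficiently:

```python
import numpy as np

def matmul_by_hand(A, B):
    m, p = A.shape
    p2, n = B.shape
    assert p == p2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(m):          # row i of A
        for j in range(n):      # column j of B
            for k in range(p):  # dot product over the inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])
print(matmul_by_hand(A, B))
# [[16.  7.]
#  [13. 11.]]
```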

Worked example 1: multiplying two 2x2 matrices

Compute $AB$ where:

$$A = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & -1 \\ 2 & 3 \end{bmatrix}$$

Solution:

Both are $2 \times 2$, so the product is $2 \times 2$.

Entry $c_{11}$: row 1 of $A$ dotted with column 1 of $B$:

$$c_{11} = (2)(5) + (3)(2) = 10 + 6 = 16$$

Entry $c_{12}$: row 1 of $A$ dotted with column 2 of $B$:

$$c_{12} = (2)(-1) + (3)(3) = -2 + 9 = 7$$

Entry $c_{21}$: row 2 of $A$ dotted with column 1 of $B$:

$$c_{21} = (1)(5) + (4)(2) = 5 + 8 = 13$$

Entry $c_{22}$: row 2 of $A$ dotted with column 2 of $B$:

$$c_{22} = (1)(-1) + (4)(3) = -1 + 12 = 11$$

$$AB = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix}$$
```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])
print(A @ B)
# [[16  7]
#  [13 11]]
```

Worked example 2: multiplying non-square matrices

Compute $CD$ where:

$$C = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 1 & -1 \end{bmatrix}, \quad D = \begin{bmatrix} 4 & 1 \\ -2 & 0 \\ 3 & 5 \end{bmatrix}$$

$C$ is $2 \times 3$ and $D$ is $3 \times 2$. The inner dimensions match (both 3), so the result is $2 \times 2$.

Solution:

Entry $c_{11}$: row 1 of $C$ dotted with column 1 of $D$:

$$(1)(4) + (0)(-2) + (2)(3) = 4 + 0 + 6 = 10$$

Entry $c_{12}$: row 1 of $C$ dotted with column 2 of $D$:

$$(1)(1) + (0)(0) + (2)(5) = 1 + 0 + 10 = 11$$

Entry $c_{21}$: row 2 of $C$ dotted with column 1 of $D$:

$$(3)(4) + (1)(-2) + (-1)(3) = 12 + (-2) + (-3) = 7$$

Entry $c_{22}$: row 2 of $C$ dotted with column 2 of $D$:

$$(3)(1) + (1)(0) + (-1)(5) = 3 + 0 + (-5) = -2$$

$$CD = \begin{bmatrix} 10 & 11 \\ 7 & -2 \end{bmatrix}$$
```python
import numpy as np

C = np.array([[1, 0, 2], [3, 1, -1]])
D = np.array([[4, 1], [-2, 0], [3, 5]])
print(C @ D)
# [[10 11]
#  [ 7 -2]]
```

Worked example 3: matrix multiplication is not commutative

A common mistake is to assume $AB = BA$. Let's show this is false with a concrete example.

Using the same $A$ and $B$ from example 1:

$$A = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 & -1 \\ 2 & 3 \end{bmatrix}$$

We already computed:

$$AB = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix}$$

Now compute $BA$:

Entry $(1,1)$: $(5)(2) + (-1)(1) = 10 - 1 = 9$

Entry $(1,2)$: $(5)(3) + (-1)(4) = 15 - 4 = 11$

Entry $(2,1)$: $(2)(2) + (3)(1) = 4 + 3 = 7$

Entry $(2,2)$: $(2)(3) + (3)(4) = 6 + 12 = 18$

$$BA = \begin{bmatrix} 9 & 11 \\ 7 & 18 \end{bmatrix}$$

Comparing:

$$AB = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix} \neq \begin{bmatrix} 9 & 11 \\ 7 & 18 \end{bmatrix} = BA$$

$AB \neq BA$ in general. Matrix multiplication is not commutative.

This matters in ML. The order of operations in neural network computations is significant. $W_2 W_1 \mathbf{x}$ applies $W_1$ first, then $W_2$. Reversing them gives a different result.

```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])

print("AB =")
print(A @ B)
print("\nBA =")
print(B @ A)
print("\nAB == BA?", np.array_equal(A @ B, B @ A))  # False
```

Note: with non-square matrices, $BA$ may not even be defined, and when it is defined, it can have a different shape from $AB$. If $A$ is $2 \times 3$ and $B$ is $3 \times 2$, then $AB$ is $2 \times 2$ but $BA$ is $3 \times 3$. They do not even have the same shape.
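A quick NumPy check of those shapes, using random matrices purely as placeholders:

```python
import numpy as np

A = np.random.rand(2, 3)  # 2 x 3
B = np.random.rand(3, 2)  # 3 x 2

print((A @ B).shape)  # (2, 2)
print((B @ A).shape)  # (3, 3)
```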

Element-wise vs matrix multiplication

This distinction trips up many beginners, especially in code.

Matrix multiplication (what we just covered) involves dot products of rows and columns. In NumPy, use @ or np.matmul().

Element-wise multiplication (also called the Hadamard product, written $A \odot B$) multiplies corresponding entries. Both matrices must be the same size. In NumPy, use *.

$$\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \odot \begin{bmatrix} 5 & -1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 10 & -3 \\ 2 & 12 \end{bmatrix}$$

Compare with our earlier result for matrix multiplication:

$$\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} 5 & -1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 16 & 7 \\ 13 & 11 \end{bmatrix}$$

Completely different results.

```python
import numpy as np

A = np.array([[2, 3], [1, 4]])
B = np.array([[5, -1], [2, 3]])

print("Matrix multiply (A @ B):")
print(A @ B)

print("\nElement-wise multiply (A * B):")
print(A * B)
```

Where each appears in ML

  • Matrix multiplication: forward pass of a neural network ($W\mathbf{x}$), attention score computation, any linear transformation.
  • Element-wise multiplication: applying masks (setting certain values to zero), gating mechanisms in LSTMs and GRUs, feature scaling.
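To illustrate the masking use case, a minimal sketch with made-up scores (the values here are purely illustrative):

```python
import numpy as np

scores = np.array([[0.9, 0.1], [0.4, 0.6]])
mask = np.array([[1, 0], [1, 1]])  # 0 marks positions to zero out

# Element-wise multiply: masked entries become 0, others are unchanged
print(scores * mask)
```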

Properties of matrix multiplication

While not commutative, matrix multiplication does satisfy some useful properties:

| Property | Formula |
| --- | --- |
| Associative | $(AB)C = A(BC)$ |
| Distributive | $A(B + C) = AB + AC$ |
| Scalar factor | $c(AB) = (cA)B = A(cB)$ |
| Transpose | $(AB)^T = B^T A^T$ |
| Identity | $AI = IA = A$ |

The transpose rule is worth memorizing: the transpose of a product reverses the order. This comes up constantly in deriving gradients of matrix expressions.
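A quick numerical sanity check of the order-reversal rule, using random matrices (values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))
B = rng.random((4, 2))

# (AB)^T equals B^T A^T, up to floating-point rounding
print(np.allclose((A @ B).T, B.T @ A.T))  # True
```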

Matrices as linear transformations

Here is the deeper insight: every matrix represents a linear transformation. When you multiply a vector by a matrix, you are transforming that vector.

For a $2 \times 2$ matrix and a 2D vector:

$$A\mathbf{x} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} ax_1 + bx_2 \\ cx_1 + dx_2 \end{bmatrix}$$

Different matrices encode different transformations:

  • Rotation: rotates vectors by an angle.
  • Scaling: stretches or compresses along axes.
  • Reflection: flips across an axis.
  • Projection: collapses onto a lower-dimensional space.
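For example, the standard 2D rotation matrix built from sine and cosine rotates any vector by a fixed angle; a sketch:

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])
# The x-axis unit vector lands on the y-axis
print(np.round(R @ x, 6))  # [0. 1.]
```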

A neural network layer computes $\mathbf{y} = W\mathbf{x} + \mathbf{b}$. The weight matrix $W$ is a linear transformation that maps the input space to the output space. The nonlinear activation function applied afterward is what lets neural networks learn complex patterns.
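Sketching that layer computation in NumPy, with the $64 \times 128$ weight matrix from earlier and random illustrative values:

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.standard_normal((64, 128))  # 128 inputs -> 64 outputs
b = rng.standard_normal(64)         # bias vector
x = rng.standard_normal(128)        # one input sample

y = W @ x + b   # linear transformation plus bias
print(y.shape)  # (64,)
```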

Understanding what eigenvalues tell you about a transformation, and how to decompose matrices using SVD, builds directly on the foundation we are laying here.

Matrix as a linear transformation:

graph LR
  X["Input vector x<br/>in R^n"] --> M["Multiply by matrix A<br/>Linear transformation"]
  M --> Y["Output vector y = Ax<br/>in R^m"]
  style X fill:#e1f5fe
  style M fill:#fff9c4
  style Y fill:#c8e6c9

Summary

| Operation | Notation | Result size | Key rule |
| --- | --- | --- | --- |
| Addition | $A + B$ | Same as inputs | Same dimensions required |
| Scalar multiply | $cA$ | Same as $A$ | Every entry times $c$ |
| Transpose | $A^T$ | Rows and columns flipped | $(AB)^T = B^T A^T$ |
| Matrix multiply | $AB$ | $m \times n$ from $m \times p$ and $p \times n$ | Inner dimensions must match |
| Element-wise multiply | $A \odot B$ | Same as inputs | Same dimensions required |

What comes next

With vectors and matrices in hand, the next logical question is: can we undo a matrix operation? The next article covers matrix inverses and systems of linear equations, where you will learn to solve $A\mathbf{x} = \mathbf{b}$ and understand when a solution exists.
