Matrix Decompositions: LU, QR, SVD
In this series (15 parts)
- Why Maths Matters for ML: A Practical Overview
- Scalars, Vectors, and Vector Spaces
- Matrices and Matrix Operations
- Matrix Inverses and Systems of Linear Equations
- Eigenvalues and Eigenvectors
- Matrix Decompositions: LU, QR, SVD
- Norms, Distances, and Similarity
- Calculus Review: Derivatives and the Chain Rule
- Partial Derivatives and Gradients
- The Jacobian and Hessian Matrices
- Taylor series and local approximations
- Probability fundamentals
- Random variables and distributions
- Bayes theorem and its role in ML
- Information theory: entropy, KL divergence, cross-entropy
Every matrix is hiding simpler structure inside it. Decompositions are how we reveal that structure. Instead of working with one complicated matrix, we split it into two or three factors that are easier to reason about, easier to compute with, and often more numerically stable.
This article covers three decompositions: LU, QR, and SVD. Each solves a different class of problem, and each shows up constantly in machine learning.
Prerequisites: You should be comfortable with eigenvalues and eigenvectors, matrix multiplication, and dot products.
Why decompose matrices?
Solving $Ax = b$ directly by computing $A^{-1}$ is expensive and numerically fragile. Decompositions give us a better alternative: factor $A$ into pieces with known structure (triangular, orthogonal, diagonal), then solve each piece cheaply.
Beyond solving linear systems, decompositions let us:
- Understand the geometry a matrix encodes (SVD)
- Solve least squares problems efficiently (QR)
- Speed up repeated solves with different right-hand sides (LU)
- Reduce dimensionality in data (PCA relies directly on SVD)
LU Decomposition
LU factors a square matrix $A$ into a lower triangular matrix $L$ and an upper triangular matrix $U$:

$$A = LU$$

$L$ has ones on its diagonal and the elimination multipliers below it; $U$ is what you get after Gaussian elimination. If you have ever row-reduced a matrix by hand, you already know how to compute LU: the elimination steps give you $U$, and the multipliers you used become the entries of $L$.
When to use LU
LU is the workhorse for solving $Ax = b$ when $A$ is square. Once you have $L$ and $U$, solving $Ax = b$ becomes two fast triangular solves:
- Solve $Ly = b$ for $y$ (forward substitution)
- Solve $Ux = y$ for $x$ (back substitution)
Both steps are $O(n^2)$, while the factorization itself is $O(n^3)$. If you need to solve $Ax = b$ for many different right-hand sides $b$, you factor once and reuse.
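The factor-once, solve-many pattern can be sketched with SciPy's `lu_factor` and `lu_solve` (the right-hand sides here are illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2, 1, 1], [4, 3, 3], [8, 7, 9]], dtype=float)
lu_piv = lu_factor(A)          # O(n^3) factorization, done once

b1 = np.array([1.0, 2.0, 3.0])
b2 = np.array([0.0, 1.0, 0.0])
x1 = lu_solve(lu_piv, b1)      # O(n^2) triangular solves, done per b
x2 = lu_solve(lu_piv, b2)
print(np.allclose(A @ x1, b1), np.allclose(A @ x2, b2))  # True True
```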
Example 1: LU decomposition of a 3x3 matrix
Decompose $A$ into $LU$:

$$A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}$$

Step 1: Eliminate below the first pivot (2).
Row 2 multiplier: $\ell_{21} = 4/2 = 2$, so $R_2 \leftarrow R_2 - 2R_1 = (0, 1, 1)$.
Row 3 multiplier: $\ell_{31} = 8/2 = 4$, so $R_3 \leftarrow R_3 - 4R_1 = (0, 3, 5)$.
Step 2: Eliminate below the second pivot (1).
Row 3 multiplier: $\ell_{32} = 3/1 = 3$, so $R_3 \leftarrow R_3 - 3R_2 = (0, 0, 2)$.
Step 3: Assemble $L$ and $U$.

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}$$

Verify: Multiply $LU$:

$$LU = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix} = A$$
In practice, most implementations use partial pivoting (computing $PA = LU$ for a permutation matrix $P$) to avoid numerical issues with small pivots. SciPy's scipy.linalg.lu does this automatically and returns the permutation matrix alongside the factors.
```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2, 1, 1], [4, 3, 3], [8, 7, 9]], dtype=float)
P, L, U = lu(A)
print("L =\n", L)
print("U =\n", U)
```
QR Decomposition
QR factors a matrix $A$ (not necessarily square) into an orthogonal matrix $Q$ and an upper triangular matrix $R$:

$$A = QR$$

$Q$ has orthonormal columns, meaning $Q^\top Q = I$. This orthogonality is what makes QR numerically stable.
When to use QR
QR is the standard approach for solving least squares problems. When you want to minimize $\|Ax - b\|^2$ and $A$ is tall (more rows than columns), QR gives you the answer without forming $A^\top A$ (which can lose precision). It is also the backbone of the QR algorithm for computing eigenvalues.
How it works: Gram-Schmidt
The classical way to build $Q$ is the Gram-Schmidt process. You take the columns of $A$ and orthogonalize them one by one: subtract the projection onto all previous basis vectors, then normalize.
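The process can be sketched in a few lines (a minimal classical Gram-Schmidt, assuming linearly independent columns; production code should prefer the Householder-based np.linalg.qr, which is more stable):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR for a matrix with independent columns."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # projection coefficient onto q_i
            v -= R[i, j] * Q[:, i]        # subtract that projection
        R[j, j] = np.linalg.norm(v)       # length of what remains
        Q[:, j] = v / R[j, j]             # normalize
    return Q, R

A = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))  # True
```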
Example 2: QR decomposition of a 3x2 matrix
Decompose:

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$

Step 1: Take the first column $a_1 = (1, 1, 0)^\top$. Normalize it: $\|a_1\| = \sqrt{2}$, so

$$q_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$$

Step 2: Take the second column $a_2 = (1, 0, 1)^\top$. Subtract its projection onto $q_1$:

$$a_2 - (q_1^\top a_2)\, q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}$$

Normalizing (the norm is $\sqrt{3/2} = \sqrt{6}/2$) gives

$$q_2 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}$$

Step 3: Assemble $Q$ and $R$.

$$Q = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{2} & -1/\sqrt{6} \\ 0 & 2/\sqrt{6} \end{pmatrix}$$

$R$ comes from the projections we computed:

$$R = \begin{pmatrix} \sqrt{2} & 1/\sqrt{2} \\ 0 & \sqrt{6}/2 \end{pmatrix}$$

Verify: $QR$ reproduces $A$.
```python
import numpy as np

A = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)
Q, R = np.linalg.qr(A)
print("Q =\n", Q)
print("R =\n", R)
print("QR =\n", Q @ R)  # Should match A
```
Singular Value Decomposition (SVD)
SVD is the most important decomposition in data science. Every matrix, regardless of shape, has an SVD. It factors $A$ (an $m \times n$ matrix) into:

$$A = U \Sigma V^\top$$

- $U$ is $m \times m$ and orthogonal. Its columns are the left singular vectors.
- $\Sigma$ is $m \times n$ and diagonal. The diagonal entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$ are the singular values.
- $V$ is $n \times n$ and orthogonal. Its columns are the right singular vectors.
SVD structure with matrix shapes:
```mermaid
graph LR
    A["A<br/>m x n"] === U["U<br/>m x m<br/>orthogonal"]
    A === S["Sigma<br/>m x n<br/>diagonal"]
    A === VT["V^T<br/>n x n<br/>orthogonal"]
    style A fill:#fff9c4
    style U fill:#e1f5fe
    style S fill:#c8e6c9
    style VT fill:#e1f5fe
```
What SVD tells you
Think of any linear transformation as three steps: rotate ($V^\top$), scale ($\Sigma$), rotate again ($U$). SVD separates these steps. The singular values tell you how much stretching happens along each direction. Large singular values correspond to directions where the data has high variance.
SVD as three geometric steps:
```mermaid
graph LR
    X["Input vector x"] --> VT["V^T: Rotate<br/>Align to singular<br/>vector basis"]
    VT --> SIG["Sigma: Scale<br/>Stretch or shrink<br/>along each axis"]
    SIG --> U["U: Rotate<br/>Map to output<br/>basis"]
    U --> Y["Output Ax"]
    style VT fill:#e1f5fe
    style SIG fill:#c8e6c9
    style U fill:#e1f5fe
```
Singular value scree plot: the rapid drop-off shows most information is in the first few components
When to use SVD
- Dimensionality reduction: Keep the top $k$ singular values and you get the best rank-$k$ approximation to $A$.
- PCA: Principal Component Analysis is just SVD applied to the centered data matrix. The right singular vectors are the principal components.
- Pseudoinverse: The Moore-Penrose pseudoinverse $A^+ = V \Sigma^+ U^\top$ is computed via SVD.
- Noise reduction: Small singular values often correspond to noise. Truncating them denoises the data.
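The pseudoinverse formula above is short enough to sketch directly (illustrative matrix; for rank-deficient inputs you would also zero out reciprocals of near-zero singular values, which np.linalg.pinv handles for you):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)  # tall, full column rank

# Pseudoinverse via SVD: A+ = V Sigma+ U^T, inverting each nonzero sigma
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```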
SVD and PCA: the connection
If $X$ is your data matrix ($n$ samples as rows, centered so each column has mean zero), then:

$$X = U \Sigma V^\top$$

The principal components are the columns of $V$. The variance explained by the $i$-th component is $\sigma_i^2 / (n - 1)$. You never need to form the covariance matrix $X^\top X / (n - 1)$ explicitly. SVD gives you the same information, more stably.
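The equivalence is easy to check numerically (synthetic data; the scaling matrix is just an illustrative way to give the three features different variances):

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 samples, 3 features with very different scales
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.1])
X = X - X.mean(axis=0)               # center each column

# PCA via SVD of the centered data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_svd = s**2 / (X.shape[0] - 1)    # variance explained per component

# Same numbers from the covariance matrix eigendecomposition
cov = X.T @ X / (X.shape[0] - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
print(np.allclose(var_svd, eigvals))  # True
```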
Example 3: SVD of a 2x2 matrix
Compute the SVD of:

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

Step 1: Compute $A^\top A$.

$$A^\top A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}$$

Step 2: Find the eigenvalues of $A^\top A$.

$$\det(A^\top A - \lambda I) = (5 - \lambda)^2 - 16 = 0 \implies \lambda_1 = 9, \quad \lambda_2 = 1$$

The singular values are $\sigma_1 = \sqrt{9} = 3$ and $\sigma_2 = \sqrt{1} = 1$.

Step 3: Find the right singular vectors (eigenvectors of $A^\top A$).

For $\lambda_1 = 9$: solve $(A^\top A - 9I)v = 0$:

$$\begin{pmatrix} -4 & 4 \\ 4 & -4 \end{pmatrix} v = 0 \implies v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

For $\lambda_2 = 1$: solve $(A^\top A - I)v = 0$:

$$\begin{pmatrix} 4 & 4 \\ 4 & 4 \end{pmatrix} v = 0 \implies v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

Step 4: Find the left singular vectors: $u_i = \frac{1}{\sigma_i} A v_i$.

$$u_1 = \frac{1}{3} A v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad u_2 = A v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

Step 5: Assemble the SVD.

$$A = U \Sigma V^\top = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

Interpretation: The matrix stretches by a factor of 3 along the direction $(1, 1)/\sqrt{2}$ and by a factor of 1 along $(1, -1)/\sqrt{2}$. If this were a data covariance matrix, the first direction captures 9 times more variance than the second ($\sigma_1^2 / \sigma_2^2 = 9$).
```python
import numpy as np

A = np.array([[2, 1], [1, 2]], dtype=float)
U, s, Vt = np.linalg.svd(A)
print("U =\n", U)
print("Singular values:", s)
print("V^T =\n", Vt)
print("Reconstructed A =\n", U @ np.diag(s) @ Vt)
```
Example 4: Low-rank approximation with SVD
SVD gives you the best rank-$k$ approximation to any matrix (in terms of the Frobenius norm): write $A = \sum_i \sigma_i u_i v_i^\top$ and keep only the first $k$ terms. This is the Eckart-Young theorem.
Take a matrix $A$ whose SVD (you can verify any concrete case with NumPy) has singular values $\sigma_1 > \sigma_2 > 0$ and $\sigma_3 = 0$.

The fact that $\sigma_3 = 0$ tells us $A$ is already rank 2. To get the best rank-1 approximation, keep only the first term of the sum:

$$A_1 = \sigma_1 u_1 v_1^\top$$

This rank-1 matrix captures the dominant pattern in $A$. The second singular value $\sigma_2$ captures the remaining structure.
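A concrete check (the matrix here is an illustrative one, built to be exactly rank 2 as a sum of two outer products):

```python
import numpy as np

# Rank 2 by construction: a sum of two outer products
A = np.outer([3, 1, 2], [1, 2, 1]) + np.outer([0, 1, -1], [2, 0, 1])
A = A.astype(float)

U, s, Vt = np.linalg.svd(A)
print("singular values:", s)  # third value is numerically zero: rank 2

# Best rank-1 approximation: keep only sigma_1 u_1 v_1^T
A1 = s[0] * np.outer(U[:, 0], Vt[0, :])

# Eckart-Young: the Frobenius error of the truncation is
# sqrt of the sum of squares of the discarded singular values
err = np.linalg.norm(A - A1, "fro")
print(np.isclose(err, np.sqrt(s[1]**2 + s[2]**2)))  # True
```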
Choosing the right decomposition
| Decomposition | Requirements | Best for | Cost |
|---|---|---|---|
| LU | Square matrix | Solving $Ax = b$ repeatedly | $O(n^3)$ to factor, $O(n^2)$ per solve |
| QR | Any shape | Least squares, eigenvalue algorithms | $O(mn^2)$ for an $m \times n$ matrix |
| SVD | Any shape | Dimensionality reduction, pseudoinverse, rank | $O(mn \cdot \min(m, n))$ |
Rules of thumb:
- Need to solve $Ax = b$ for a square $A$? Start with LU.
- Need least squares (overdetermined system)? Use QR. It is more stable than forming the normal equations $A^\top A x = A^\top b$.
- Need to understand the structure of your data, reduce dimensions, or handle rank-deficient matrices? Use SVD.
Choosing the right decomposition:
```mermaid
graph TD
    Q["What is your problem?"] --> SQ{"Square matrix<br/>solving Ax = b?"}
    SQ -->|"Yes"| LU["Use LU<br/>Fast repeated solves"]
    SQ -->|"No"| LS{"Least squares<br/>overdetermined system?"}
    LS -->|"Yes"| QR["Use QR<br/>Numerically stable"]
    LS -->|"No"| SVD["Use SVD<br/>Dimensionality reduction,<br/>rank analysis,<br/>pseudoinverse"]
    style LU fill:#e1f5fe
    style QR fill:#c8e6c9
    style SVD fill:#fff9c4
```
Key properties to remember
LU:
- $\det(A) = u_{11} u_{22} \cdots u_{nn}$ (the product of the diagonal entries of $U$, since $L$ has ones on the diagonal)
- Not every matrix has an LU factorization without pivoting
QR:
- $Q^\top Q = I$, so $Q^{-1} = Q^\top$ (cheap to invert)
- $\|Qx\| = \|x\|$ for any $x$ (orthogonal matrices preserve lengths)
- Numerically stable for least squares
SVD:
- Always exists for any matrix
- Singular values are non-negative and conventionally sorted in decreasing order
- The rank of $A$ equals the number of nonzero singular values
- $\|A\|_2 = \sigma_1$ (the largest singular value is the spectral norm)
- $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots}$ (the Frobenius norm collects all singular values)
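The SVD properties are all easy to confirm numerically (illustrative matrix chosen here):

```python
import numpy as np

A = np.array([[2, 1], [1, 2], [0, 3]], dtype=float)
s = np.linalg.svd(A, compute_uv=False)

# Spectral norm equals the largest singular value
print(np.isclose(np.linalg.norm(A, 2), s[0]))                        # True
# Frobenius norm equals sqrt of the sum of squared singular values
print(np.isclose(np.linalg.norm(A, "fro"), np.sqrt(np.sum(s**2))))   # True
# Rank equals the number of nonzero singular values
print(np.linalg.matrix_rank(A) == np.sum(s > 1e-12))                 # True
```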
What comes next
Now that you know how to decompose matrices, the next step is understanding how to measure vectors and matrices. Head to Norms, Distances, and Similarity to learn about $L^1$, $L^2$, and cosine similarity, all of which build on the ideas from SVD and orthogonality we covered here.