Matrix Inverses and Systems of Linear Equations
In this series (15 parts)
- Why Maths Matters for ML: A Practical Overview
- Scalars, Vectors, and Vector Spaces
- Matrices and Matrix Operations
- Matrix Inverses and Systems of Linear Equations
- Eigenvalues and Eigenvectors
- Matrix Decompositions: LU, QR, SVD
- Norms, Distances, and Similarity
- Calculus Review: Derivatives and the Chain Rule
- Partial Derivatives and Gradients
- The Jacobian and Hessian Matrices
- Taylor series and local approximations
- Probability fundamentals
- Random variables and distributions
- Bayes theorem and its role in ML
- Information theory: entropy, KL divergence, cross-entropy
Solving equations is the bread and butter of applied mathematics, and ML is no exception. Linear regression solves a system of equations. The normal equations for least squares are a linear system. Even training neural networks boils down to repeatedly solving linear approximations of a nonlinear problem. This article teaches you how to solve $A\mathbf{x} = \mathbf{b}$ and understand when a solution exists.
Prerequisites
You should be comfortable with vectors and matrix multiplication before reading this article.
Systems of linear equations
A system of linear equations looks like this:

$$\begin{aligned} x + y &= 7 \\ 2x - y &= -1 \end{aligned}$$

Two equations, two unknowns. You are looking for values of $x$ and $y$ that satisfy both equations simultaneously.

We can write any such system in matrix form:

$$A\mathbf{x} = \mathbf{b}$$

where:

$$A = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 7 \\ -1 \end{pmatrix}$$

This compact notation works for any size system. A system with 100 equations and 100 unknowns still looks like $A\mathbf{x} = \mathbf{b}$, just with bigger matrices.
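This matrix form maps directly onto NumPy arrays. A minimal sketch, using an arbitrary 2x2 system ($x + y = 7$, $2x - y = -1$):

```python
import numpy as np

# Coefficient matrix and right-hand side for the system
#   x + y =  7
#  2x - y = -1
A = np.array([[1.0, 1.0],
              [2.0, -1.0]])
b = np.array([7.0, -1.0])

x = np.linalg.solve(A, b)  # solves Ax = b
print(x)  # [2. 5.]
```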
Three possible outcomes
A linear system has either:
- Exactly one solution. The typical case when $A$ is square and invertible.
- No solution. The equations contradict each other (parallel lines in 2D).
- Infinitely many solutions. The equations are redundant (same line in 2D).
Three cases when solving a 2-equation, 2-variable system:
graph TD
subgraph "Unique Solution"
A1["Two lines cross<br/>at one point"] --> R1["Exactly one solution<br/>det A != 0"]
end
subgraph "No Solution"
A2["Two lines are parallel<br/>never intersect"] --> R2["No solution exists<br/>Contradictory equations"]
end
subgraph "Infinite Solutions"
A3["Two lines are the same<br/>overlap completely"] --> R3["Infinitely many solutions<br/>Redundant equations"]
end
The matrix inverse
If $A$ is a square matrix, its inverse $A^{-1}$ (if it exists) is the unique matrix such that:

$$A A^{-1} = A^{-1} A = I$$

where $I$ is the identity matrix.

If $A^{-1}$ exists, solving $A\mathbf{x} = \mathbf{b}$ is straightforward:

$$\mathbf{x} = A^{-1}\mathbf{b}$$

Multiply both sides of $A\mathbf{x} = \mathbf{b}$ by $A^{-1}$ on the left, and the $A$ cancels out.
When does the inverse exist?
A square matrix $A$ is invertible (also called nonsingular) if and only if:
- Its determinant is nonzero: $\det A \neq 0$.
- Its rows (or columns) are linearly independent.
- The homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$.
- It has a full set of pivots (one per row) during Gaussian elimination.
These conditions are all equivalent. If any one holds, all hold. If any one fails, $A$ is singular (not invertible).
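In code, the rank condition is a convenient one to test numerically. A small sketch (the function name and tolerance are my own choices, not a standard API):

```python
import numpy as np

def is_invertible(A, tol=1e-10):
    """Check invertibility via the rank criterion (equivalent to det != 0)."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    return n == m and np.linalg.matrix_rank(A, tol=tol) == n

print(is_invertible([[2, 3], [1, -1]]))  # True: rows are independent
print(is_invertible([[2, 4], [1, 2]]))   # False: row 1 = 2 * row 2
```

Rank is tested rather than the determinant itself because the determinant of a large matrix can overflow or underflow even when the matrix is comfortably invertible.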
Is the matrix invertible?
graph TD
Q["Is matrix A invertible?"] --> D{"det A != 0?"}
D -->|"Yes"| R{"Rows linearly<br/>independent?"}
R -->|"Yes"| F{"Full rank?"}
F -->|"Yes"| INV["A is invertible<br/>Unique solution to Ax = b"]
D -->|"No"| SING["A is singular<br/>No unique inverse"]
R -->|"No"| SING
F -->|"No"| SING
style INV fill:#c8e6c9
style SING fill:#ffcdd2
The 2x2 inverse formula
For a $2 \times 2$ matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

there is a clean formula:

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

The quantity $ad - bc$ is the determinant of $A$, written $\det A$. If $ad - bc = 0$, the matrix has no inverse.

This formula is quick and useful for hand calculations. For larger matrices, you need Gaussian elimination.
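The formula translates directly into a few lines of code. A sketch (the helper name is mine):

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix using the ad - bc formula."""
    (a, b), (c, d) = np.asarray(A, dtype=float)
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular (det = 0)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[2.0, 3.0], [1.0, -1.0]])
print(np.allclose(inverse_2x2(A) @ A, np.eye(2)))  # True
```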
Gaussian elimination
Gaussian elimination is the standard algorithm for solving linear systems. The idea is to use elementary row operations to transform the system into a simpler form that is easy to solve.
Elementary row operations
Three operations that do not change the solution:
- Swap two rows.
- Scale a row by a nonzero constant.
- Add a multiple of one row to another row.
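Each of the three operations is a one-liner on a NumPy array. A sketch on an arbitrary augmented matrix:

```python
import numpy as np

# An augmented matrix [A | b]; the numbers are arbitrary.
M = np.array([[2.0, 5.0, 2.0, 21.0],
              [1.0, 2.0, 1.0, 9.0]])

M[[0, 1]] = M[[1, 0]]     # 1. swap rows 0 and 1
M[1] = 2.0 * M[1]         # 2. scale row 1 by a nonzero constant
M[1] = M[1] - 4.0 * M[0]  # 3. add a multiple of row 0 to row 1
print(M)  # rows are now [1 2 1 9] and [0 2 0 6]; the solution set is unchanged
```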
The augmented matrix
Write the system as an augmented matrix by placing $\mathbf{b}$ next to $A$:

$$[A \mid \mathbf{b}]$$

Then apply row operations until you reach row echelon form (upper triangular), and solve by back substitution.
Gaussian elimination steps:
graph TD
S1["Start: Augmented matrix A|b"] --> S2["Apply row operations<br/>Swap, scale, add multiples"]
S2 --> S3["Reach row echelon form<br/>Upper triangular"]
S3 --> S4["Back substitution<br/>Solve from bottom row up"]
S4 --> S5["Solution vector x"]
style S1 fill:#e1f5fe
style S5 fill:#c8e6c9
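The whole pipeline fits in a short function. A teaching sketch (with partial pivoting for stability), not production code:

```python
import numpy as np

def gaussian_solve(A, b):
    """Solve Ax = b by Gaussian elimination and back substitution."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(b)
    M = np.column_stack([A, b])  # augmented matrix [A | b]

    # Forward elimination to row echelon form.
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
        if np.isclose(M[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        M[[col, pivot]] = M[[pivot, col]]              # row swap
        for row in range(col + 1, n):
            M[row] -= (M[row, col] / M[col, col]) * M[col]

    # Back substitution, from the bottom row up.
    x = np.zeros(n)
    for row in range(n - 1, -1, -1):
        x[row] = (M[row, -1] - M[row, row + 1:n] @ x[row + 1:]) / M[row, row]
    return x

print(gaussian_solve([[2, 3], [1, -1]], [8, 1]))  # [2.2 1.2]
```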
*Figure: two lines crossing at a single point, the unique solution (2, 5).*
Worked example 1: solving a 2x2 system
Solve the system:

$$\begin{aligned} 2x + 3y &= 8 \\ x - y &= 1 \end{aligned}$$

Method 1: using the inverse formula.

With $A = \begin{pmatrix} 2 & 3 \\ 1 & -1 \end{pmatrix}$ and $\mathbf{b} = \begin{pmatrix} 8 \\ 1 \end{pmatrix}$, compute the determinant:

$$\det A = (2)(-1) - (3)(1) = -5$$

Since $\det A \neq 0$, the inverse exists:

$$A^{-1} = \frac{1}{-5} \begin{pmatrix} -1 & -3 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 0.2 & 0.6 \\ 0.2 & -0.4 \end{pmatrix}$$

Now multiply:

$$\mathbf{x} = A^{-1}\mathbf{b} = \begin{pmatrix} 0.2 & 0.6 \\ 0.2 & -0.4 \end{pmatrix} \begin{pmatrix} 8 \\ 1 \end{pmatrix} = \begin{pmatrix} 2.2 \\ 1.2 \end{pmatrix}$$

Check: plug $x = 2.2$, $y = 1.2$ back into the original equations: $2(2.2) + 3(1.2) = 4.4 + 3.6 = 8$ ✓ and $2.2 - 1.2 = 1$ ✓.
import numpy as np
A = np.array([[2, 3], [1, -1]])
b = np.array([8, 1])
x = np.linalg.solve(A, b)
print(x) # [2.2 1.2]
Worked example 2: solving a 3x3 system with Gaussian elimination
Solve:

$$\begin{aligned} x + 2y + z &= 9 \\ 2x + 5y + 2z &= 21 \\ 3x + 7y + 4z &= 30 \end{aligned}$$

Step 1: write the augmented matrix.

$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 2 & 5 & 2 & 21 \\ 3 & 7 & 4 & 30 \end{array}\right]$$

Step 2: eliminate below the first pivot.

$R_2 \to R_2 - 2R_1$:

$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 0 & 3 \\ 3 & 7 & 4 & 30 \end{array}\right]$$

$R_3 \to R_3 - 3R_1$:

$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 0 & 3 \\ 0 & 1 & 1 & 3 \end{array}\right]$$

Step 3: eliminate below the second pivot.

$R_3 \to R_3 - R_2$:

$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \end{array}\right]$$

Step 4: back substitution.

From row 3: $z = 0$.
From row 2: $y = 3$.
From row 1: $x + 2(3) + 0 = 9$, so $x = 3$.

Solution: $x = 3$, $y = 3$, $z = 0$.

Check: substitute into all three original equations: $3 + 6 + 0 = 9$ ✓, $6 + 15 + 0 = 21$ ✓, $9 + 21 + 0 = 30$ ✓.
import numpy as np
A = np.array([[1, 2, 1],
[2, 5, 2],
[3, 7, 4]])
b = np.array([9, 21, 30])
x = np.linalg.solve(A, b)
print(x) # [3. 3. 0.]
Worked example 3: solving a 2x2 system and verifying the inverse
Solve:

$$\begin{aligned} 3x + y &= 5 \\ 2x + 2y &= 6 \end{aligned}$$

Then verify by computing $A A^{-1}$ explicitly.

Step 1: set up.

$$A = \begin{pmatrix} 3 & 1 \\ 2 & 2 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 5 \\ 6 \end{pmatrix}$$

Step 2: compute the determinant.

$$\det A = (3)(2) - (1)(2) = 4$$

Step 3: compute the inverse.

$$A^{-1} = \frac{1}{4} \begin{pmatrix} 2 & -1 \\ -2 & 3 \end{pmatrix}$$

Step 4: solve.

$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{4} \begin{pmatrix} 2 & -1 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} 5 \\ 6 \end{pmatrix} = \frac{1}{4} \begin{pmatrix} 4 \\ 8 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

Step 5: verify.

Check $A A^{-1} = I$:
- Entry $(1,1)$: $\frac{1}{4}\big((3)(2) + (1)(-2)\big) = 1$ ✓
- Entry $(1,2)$: $\frac{1}{4}\big((3)(-1) + (1)(3)\big) = 0$ ✓
- Entry $(2,1)$: $\frac{1}{4}\big((2)(2) + (2)(-2)\big) = 0$ ✓
- Entry $(2,2)$: $\frac{1}{4}\big((2)(-1) + (2)(3)\big) = 1$ ✓

Check the solution in the original equations: $3(1) + 2 = 5$ ✓ and $2(1) + 2(2) = 6$ ✓.
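The same verification is quick in NumPy. A sketch using the 2x2 system $3x + y = 5$, $2x + 2y = 6$ as an example:

```python
import numpy as np

A = np.array([[3.0, 1.0], [2.0, 2.0]])
b = np.array([5.0, 6.0])

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))              # True: A A^-1 = I
print(np.allclose(A_inv @ b, np.linalg.solve(A, b)))  # True: both give the solution
```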
When things go wrong: singular matrices
Not every matrix has an inverse. Consider:
The determinant is . This matrix is singular.
Look at the rows: row 1 is exactly 2 times row 2. The rows are linearly dependent. Geometrically, both equations describe the same line (or parallel lines, depending on ), so there is either no solution or infinitely many.
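NumPy makes the failure concrete. A sketch with a matrix whose first row is twice its second:

```python
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 2.0]])  # row 1 = 2 * row 2
print(np.linalg.det(A))  # 0.0 (up to rounding)

try:
    np.linalg.solve(A, np.array([6.0, 3.0]))
except np.linalg.LinAlgError as err:
    print("solve failed:", err)  # numpy refuses: the matrix is singular
```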
In ML, singular matrices cause problems. If $X^\top X$ in the normal equations is singular (or close to singular), linear regression has no unique solution. This is why regularization (adding $\lambda I$ to $X^\top X$) is so important: it pushes the matrix away from singularity.
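The regularization fix is easy to demonstrate. A sketch with a deliberately redundant feature column (the data is made up for illustration):

```python
import numpy as np

# Design matrix with a duplicated column, so X^T X is singular.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
y = np.array([1.0, 2.0, 3.0])

gram = X.T @ X
print(np.linalg.matrix_rank(gram))  # 2, not 3: no unique solution

lam = 1e-3  # regularization strength
w = np.linalg.solve(gram + lam * np.eye(3), X.T @ y)  # ridge normal equations
print(w)  # now a unique, well-defined solution
```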
Computational considerations
In practice, you almost never compute matrix inverses explicitly. Here is why:
- Numerical stability. Computing $A^{-1}$ and then multiplying by $\mathbf{b}$ amplifies rounding errors. Gaussian elimination applied directly to the system is more stable.
- Speed. Solving $A\mathbf{x} = \mathbf{b}$ directly takes roughly $n^3/3$ multiplications. Computing $A^{-1}$ first takes roughly $n^3$, and multiplying $A^{-1}\mathbf{b}$ takes $n^2$ more. Direct solving is about three times faster.
- Memory. You need to store the entire inverse matrix, which is $n^2$ entries.
Use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b. The result is the same, but solve is faster and more accurate.
import numpy as np
A = np.array([[1, 2, 1],
[2, 5, 2],
[3, 7, 4]])
b = np.array([9, 21, 30])
# Preferred: direct solve
x = np.linalg.solve(A, b)
# Avoid: explicit inverse
x_bad = np.linalg.inv(A) @ b
# Same result, but solve() is faster and more numerically stable
print(np.allclose(x, x_bad)) # True
The determinant
The determinant is a single number that encodes important information about a matrix.
For a $2 \times 2$ matrix:

$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$
For larger matrices, you can compute the determinant using cofactor expansion or by performing Gaussian elimination (the determinant is the product of the pivots).
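The product-of-pivots fact is easy to check numerically. A sketch using the 3x3 matrix from the earlier worked example (no row swaps are needed here; a swap would flip the sign):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [3.0, 7.0, 4.0]])

# Eliminate below each pivot; the pivots end up on the diagonal.
U = A.copy()
for col in range(3):
    for row in range(col + 1, 3):
        U[row] -= (U[row, col] / U[col, col]) * U[col]

print(np.prod(np.diag(U)))  # product of pivots
print(np.linalg.det(A))     # same value, up to rounding
```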
What the determinant tells you:
- $\det A \neq 0$: $A$ is invertible, $A\mathbf{x} = \mathbf{b}$ has a unique solution.
- $\det A = 0$: $A$ is singular, no unique solution.
- $|\det A|$ gives the factor by which $A$ scales areas (2D) or volumes (3D).
The determinant also connects directly to eigenvalues: the determinant of a matrix equals the product of its eigenvalues.
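This identity is easy to verify numerically. A quick sketch with an arbitrary symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
eigvals = np.linalg.eigvals(A)
print(np.isclose(np.prod(eigvals), np.linalg.det(A)))  # True: det = product of eigenvalues
```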
Summary
| Concept | Key idea |
|---|---|
| $A\mathbf{x} = \mathbf{b}$ | The standard form for a linear system |
| Inverse $A^{-1}$ | Satisfies $A A^{-1} = A^{-1} A = I$; gives $\mathbf{x} = A^{-1}\mathbf{b}$ |
| Determinant | Zero means singular (no inverse) |
| Gaussian elimination | Row operations to reach triangular form, then back-substitute |
| Singular matrix | Linearly dependent rows, $\det A = 0$, no unique solution |
What comes next
Inverses tell you how to undo a matrix transformation. But what directions does a matrix leave unchanged? The next article covers eigenvalues and eigenvectors, one of the most powerful ideas in linear algebra and a foundation for PCA, spectral methods, and stability analysis.