Search…
Maths for ML · Part 4

Matrix Inverses and Systems of Linear Equations

In this series (15 parts)
  1. Why Maths Matters for ML: A Practical Overview
  2. Scalars, Vectors, and Vector Spaces
  3. Matrices and Matrix Operations
  4. Matrix Inverses and Systems of Linear Equations
  5. Eigenvalues and Eigenvectors
  6. Matrix Decompositions: LU, QR, SVD
  7. Norms, Distances, and Similarity
  8. Calculus Review: Derivatives and the Chain Rule
  9. Partial Derivatives and Gradients
  10. The Jacobian and Hessian Matrices
  11. Taylor series and local approximations
  12. Probability fundamentals
  13. Random variables and distributions
  14. Bayes theorem and its role in ML
  15. Information theory: entropy, KL divergence, cross-entropy

Solving equations is the bread and butter of applied mathematics, and ML is no exception. Linear regression solves a system of equations. The normal equations for least squares are a linear system. Even training neural networks boils down to repeatedly solving linear approximations of a nonlinear problem. This article teaches you how to solve Ax=bA\mathbf{x} = \mathbf{b} and understand when a solution exists.

Prerequisites

You should be comfortable with vectors and matrix multiplication before reading this article.

Systems of linear equations

A system of linear equations looks like this:

2x+3y=82x + 3y = 8 xy=1x - y = 1

Two equations, two unknowns. You are looking for values of xx and yy that satisfy both equations simultaneously.

We can write any such system in matrix form:

Ax=bA\mathbf{x} = \mathbf{b}

where:

A=[2311],x=[xy],b=[81]A = \begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 8 \\ 1 \end{bmatrix}

This compact notation works for any size system. A system with 100 equations and 100 unknowns still looks like Ax=bA\mathbf{x} = \mathbf{b}, just with bigger matrices.

Three possible outcomes

A linear system has either:

  1. Exactly one solution. The typical case when AA is square and invertible.
  2. No solution. The equations contradict each other (parallel lines in 2D).
  3. Infinitely many solutions. The equations are redundant (same line in 2D).

Three cases when solving a 2-equation, 2-variable system:

graph TD
  subgraph "Unique Solution"
      A1["Two lines cross<br/>at one point"] --> R1["Exactly one solution<br/>det A != 0"]
  end
  subgraph "No Solution"
      A2["Two lines are parallel<br/>never intersect"] --> R2["No solution exists<br/>Contradictory equations"]
  end
  subgraph "Infinite Solutions"
      A3["Two lines are the same<br/>overlap completely"] --> R3["Infinitely many solutions<br/>Redundant equations"]
  end

The matrix inverse

If AA is a square n×nn \times n matrix, its inverse A1A^{-1} (if it exists) is the unique matrix such that:

AA1=A1A=IA A^{-1} = A^{-1} A = I

where II is the identity matrix.

If A1A^{-1} exists, solving Ax=bA\mathbf{x} = \mathbf{b} is straightforward:

x=A1b\mathbf{x} = A^{-1}\mathbf{b}

Multiply both sides by A1A^{-1} on the left, and the AA cancels out.

When does the inverse exist?

A matrix is invertible (also called nonsingular) if and only if:

  • Its determinant is nonzero: det(A)0\det(A) \neq 0.
  • Its rows (or columns) are linearly independent.
  • The system Ax=0A\mathbf{x} = \mathbf{0} has only the trivial solution x=0\mathbf{x} = \mathbf{0}.
  • It has nn pivots during Gaussian elimination.

These conditions are all equivalent. If any one holds, all hold. If any one fails, AA is singular (not invertible).

Is the matrix invertible?

graph TD
  Q["Is matrix A invertible?"] --> D{"det A != 0?"}
  D -->|"Yes"| R{"Rows linearly<br/>independent?"}
  R -->|"Yes"| F{"Full rank?"}
  F -->|"Yes"| INV["A is invertible<br/>Unique solution to Ax = b"]
  D -->|"No"| SING["A is singular<br/>No unique inverse"]
  R -->|"No"| SING
  F -->|"No"| SING
  style INV fill:#c8e6c9
  style SING fill:#ffcdd2

The 2x2 inverse formula

For a 2×22 \times 2 matrix, there is a clean formula:

A=[abcd]A1=1adbc[dbca]A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \Rightarrow \quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

The quantity adbcad - bc is the determinant of AA, written det(A)\det(A). If det(A)=0\det(A) = 0, the matrix has no inverse.

This formula is quick and useful for hand calculations. For larger matrices, you need Gaussian elimination.

Gaussian elimination

Gaussian elimination is the standard algorithm for solving linear systems. The idea is to use elementary row operations to transform the system into a simpler form that is easy to solve.

Elementary row operations

Three operations that do not change the solution:

  1. Swap two rows.
  2. Scale a row by a nonzero constant.
  3. Add a multiple of one row to another row.

The augmented matrix

Write the system Ax=bA\mathbf{x} = \mathbf{b} as an augmented matrix [Ab][A \mid \mathbf{b}] by placing b\mathbf{b} next to AA:

[238111]\left[\begin{array}{cc|c} 2 & 3 & 8 \\ 1 & -1 & 1 \end{array}\right]

Then apply row operations until you reach row echelon form (upper triangular), and solve by back substitution.

Gaussian elimination steps:

graph TD
  S1["Start: Augmented matrix A|b"] --> S2["Apply row operations<br/>Swap, scale, add multiples"]
  S2 --> S3["Reach row echelon form<br/>Upper triangular"]
  S3 --> S4["Back substitution<br/>Solve from bottom row up"]
  S4 --> S5["Solution vector x"]
  style S1 fill:#e1f5fe
  style S5 fill:#c8e6c9

Two lines intersecting at a unique solution (2, 5)

Worked example 1: solving a 2x2 system

Solve the system:

2x+3y=82x + 3y = 8 xy=1x - y = 1

Method 1: using the inverse formula.

A=[2311],b=[81]A = \begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 8 \\ 1 \end{bmatrix}

Compute the determinant:

det(A)=(2)(1)(3)(1)=23=5\det(A) = (2)(-1) - (3)(1) = -2 - 3 = -5

Since det(A)=50\det(A) = -5 \neq 0, the inverse exists:

A1=15[1312]=[1/53/51/52/5]A^{-1} = \frac{1}{-5} \begin{bmatrix} -1 & -3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1/5 & 3/5 \\ 1/5 & -2/5 \end{bmatrix}

Now multiply:

x=A1b=[1/53/51/52/5][81]\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} 1/5 & 3/5 \\ 1/5 & -2/5 \end{bmatrix} \begin{bmatrix} 8 \\ 1 \end{bmatrix} x=(1/5)(8)+(3/5)(1)=8/5+3/5=11/5=2.2x = (1/5)(8) + (3/5)(1) = 8/5 + 3/5 = 11/5 = 2.2 y=(1/5)(8)+(2/5)(1)=8/52/5=6/5=1.2y = (1/5)(8) + (-2/5)(1) = 8/5 - 2/5 = 6/5 = 1.2

Check: plug back into the original equations:

2(2.2)+3(1.2)=4.4+3.6=82(2.2) + 3(1.2) = 4.4 + 3.6 = 8 \quad \checkmark 2.21.2=12.2 - 1.2 = 1 \quad \checkmark
import numpy as np

A = np.array([[2, 3], [1, -1]])
b = np.array([8, 1])
x = np.linalg.solve(A, b)
print(x)  # [2.2 1.2]

Worked example 2: solving a 3x3 system with Gaussian elimination

Solve:

x+2y+z=9x + 2y + z = 9 2x+5y+2z=212x + 5y + 2z = 21 3x+7y+4z=303x + 7y + 4z = 30

Step 1: write the augmented matrix.

[12192522137430]\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 2 & 5 & 2 & 21 \\ 3 & 7 & 4 & 30 \end{array}\right]

Step 2: eliminate below the first pivot.

R2R22R1R_2 \leftarrow R_2 - 2R_1:

[1219010337430]\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 0 & 3 \\ 3 & 7 & 4 & 30 \end{array}\right]

R3R33R1R_3 \leftarrow R_3 - 3R_1:

[121901030113]\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 0 & 3 \\ 0 & 1 & 1 & 3 \end{array}\right]

Step 3: eliminate below the second pivot.

R3R3R2R_3 \leftarrow R_3 - R_2:

[121901030010]\left[\begin{array}{ccc|c} 1 & 2 & 1 & 9 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & 0 \end{array}\right]

Step 4: back substitution.

From row 3: z=0z = 0.

From row 2: y=3y = 3.

From row 1: x+2(3)+0=9x + 2(3) + 0 = 9, so x=96=3x = 9 - 6 = 3.

Solution: x=3x = 3, y=3y = 3, z=0z = 0.

Check: substitute into all three original equations:

3+2(3)+0=3+6+0=93 + 2(3) + 0 = 3 + 6 + 0 = 9 \quad \checkmark 2(3)+5(3)+2(0)=6+15+0=212(3) + 5(3) + 2(0) = 6 + 15 + 0 = 21 \quad \checkmark 3(3)+7(3)+4(0)=9+21+0=303(3) + 7(3) + 4(0) = 9 + 21 + 0 = 30 \quad \checkmark
import numpy as np

A = np.array([[1, 2, 1],
              [2, 5, 2],
              [3, 7, 4]])
b = np.array([9, 21, 30])
x = np.linalg.solve(A, b)
print(x)  # [3. 3. 0.]

Worked example 3: a system with a unique twist

Solve:

3x+y=73x + y = 7 x+2y=6x + 2y = 6

And then verify by computing A1A^{-1} explicitly.

Step 1: set up.

A=[3112],b=[76]A = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}

Step 2: compute the determinant.

det(A)=(3)(2)(1)(1)=61=5\det(A) = (3)(2) - (1)(1) = 6 - 1 = 5

Step 3: compute the inverse.

A1=15[2113]=[2/51/51/53/5]A^{-1} = \frac{1}{5} \begin{bmatrix} 2 & -1 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 2/5 & -1/5 \\ -1/5 & 3/5 \end{bmatrix}

Step 4: solve.

x=A1b=[2/51/51/53/5][76]\mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} 2/5 & -1/5 \\ -1/5 & 3/5 \end{bmatrix} \begin{bmatrix} 7 \\ 6 \end{bmatrix} x=(2/5)(7)+(1/5)(6)=14/56/5=8/5=1.6x = (2/5)(7) + (-1/5)(6) = 14/5 - 6/5 = 8/5 = 1.6 y=(1/5)(7)+(3/5)(6)=7/5+18/5=11/5=2.2y = (-1/5)(7) + (3/5)(6) = -7/5 + 18/5 = 11/5 = 2.2

Step 5: verify.

Check AA1=IA A^{-1} = I:

[3112][2/51/51/53/5]\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2/5 & -1/5 \\ -1/5 & 3/5 \end{bmatrix}

Entry (1,1)(1,1): (3)(2/5)+(1)(1/5)=6/51/5=5/5=1(3)(2/5) + (1)(-1/5) = 6/5 - 1/5 = 5/5 = 1

Entry (1,2)(1,2): (3)(1/5)+(1)(3/5)=3/5+3/5=0(3)(-1/5) + (1)(3/5) = -3/5 + 3/5 = 0

Entry (2,1)(2,1): (1)(2/5)+(2)(1/5)=2/52/5=0(1)(2/5) + (2)(-1/5) = 2/5 - 2/5 = 0

Entry (2,2)(2,2): (1)(1/5)+(2)(3/5)=1/5+6/5=5/5=1(1)(-1/5) + (2)(3/5) = -1/5 + 6/5 = 5/5 = 1

AA1=[1001]=IA A^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \quad \checkmark

Check the solution in the original equations:

3(1.6)+2.2=4.8+2.2=73(1.6) + 2.2 = 4.8 + 2.2 = 7 \quad \checkmark 1.6+2(2.2)=1.6+4.4=61.6 + 2(2.2) = 1.6 + 4.4 = 6 \quad \checkmark

When things go wrong: singular matrices

Not every matrix has an inverse. Consider:

A=[2412]A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}

The determinant is (2)(2)(4)(1)=44=0(2)(2) - (4)(1) = 4 - 4 = 0. This matrix is singular.

Look at the rows: row 1 is exactly 2 times row 2. The rows are linearly dependent. Geometrically, both equations describe the same line (or parallel lines, depending on b\mathbf{b}), so there is either no solution or infinitely many.

In ML, singular matrices cause problems. If your feature matrix XTXX^T X is singular (or close to singular), the normal equations for linear regression have no unique solution. This is why regularization (adding λI\lambda I to XTXX^T X) is so important: it pushes the matrix away from singularity.

Computational considerations

In practice, you almost never compute matrix inverses explicitly. Here is why:

  1. Numerical stability. Computing A1A^{-1} and then multiplying by b\mathbf{b} amplifies rounding errors. Gaussian elimination applied directly to [Ab][A \mid \mathbf{b}] is more stable.
  2. Speed. Solving Ax=bA\mathbf{x} = \mathbf{b} directly takes roughly 2n33\frac{2n^3}{3} operations. Computing A1A^{-1} first takes 2n32n^3 operations, then multiplying takes 2n22n^2 more. Direct solving is faster.
  3. Memory. You need to store the entire inverse matrix, which is n2n^2 entries.

Use np.linalg.solve(A, b) instead of np.linalg.inv(A) @ b. The result is the same, but solve is faster and more accurate.

import numpy as np

A = np.array([[1, 2, 1],
              [2, 5, 2],
              [3, 7, 4]])
b = np.array([9, 21, 30])

# Preferred: direct solve
x = np.linalg.solve(A, b)

# Avoid: explicit inverse
x_bad = np.linalg.inv(A) @ b

# Same result, but solve() is faster and more numerically stable
print(np.allclose(x, x_bad))  # True

The determinant

The determinant is a single number that encodes important information about a matrix.

For a 2×22 \times 2 matrix:

det[abcd]=adbc\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc

For larger matrices, you can compute the determinant using cofactor expansion or by performing Gaussian elimination (the determinant is the product of the pivots).

What the determinant tells you:

  • det(A)0\det(A) \neq 0: AA is invertible, Ax=bA\mathbf{x} = \mathbf{b} has a unique solution.
  • det(A)=0\det(A) = 0: AA is singular, no unique solution.
  • det(A)|\det(A)| gives the factor by which AA scales areas (2D) or volumes (3D).

The determinant also connects directly to eigenvalues: the determinant of a matrix equals the product of its eigenvalues.

Summary

ConceptKey idea
Ax=bA\mathbf{x} = \mathbf{b}The standard form for a linear system
Inverse A1A^{-1}Satisfies AA1=IAA^{-1} = I; gives x=A1b\mathbf{x} = A^{-1}\mathbf{b}
DeterminantZero means singular (no inverse)
Gaussian eliminationRow operations to reach triangular form, then back-substitute
Singular matrixLinearly dependent rows, det=0\det = 0, no unique solution

What comes next

Inverses tell you how to undo a matrix transformation. But what directions does a matrix leave unchanged? The next article covers eigenvalues and eigenvectors, one of the most powerful ideas in linear algebra and a foundation for PCA, spectral methods, and stability analysis.

Start typing to search across all content
navigate Enter open Esc close