Maths for ML · Part 5

Eigenvalues and Eigenvectors

In this series (15 parts)
  1. Why Maths Matters for ML: A Practical Overview
  2. Scalars, Vectors, and Vector Spaces
  3. Matrices and Matrix Operations
  4. Matrix Inverses and Systems of Linear Equations
  5. Eigenvalues and Eigenvectors
  6. Matrix Decompositions: LU, QR, SVD
  7. Norms, Distances, and Similarity
  8. Calculus Review: Derivatives and the Chain Rule
  9. Partial Derivatives and Gradients
  10. The Jacobian and Hessian Matrices
  11. Taylor Series and Local Approximations
  12. Probability Fundamentals
  13. Random Variables and Distributions
  14. Bayes' Theorem and Its Role in ML
  15. Information Theory: Entropy, KL Divergence, Cross-Entropy

Most vectors change direction when you multiply them by a matrix. Eigenvalues and eigenvectors are the special cases where a matrix simply stretches or compresses a vector without rotating it. This turns out to be one of the most important ideas in linear algebra, with direct applications to PCA, spectral clustering, stability analysis, and matrix decompositions.

Prerequisites

This article builds on matrix inverses and linear systems. You should be comfortable with matrix multiplication, determinants, and solving systems of equations.

The core definition

A nonzero vector $\mathbf{v}$ is an eigenvector of a square matrix $A$ if multiplying $A$ by $\mathbf{v}$ gives back a scaled version of $\mathbf{v}$:

$$A\mathbf{v} = \lambda \mathbf{v}$$

The scalar $\lambda$ is the corresponding eigenvalue.

Eigenvector equation: the matrix only scales, not rotates:

graph LR
  V["Input vector v"] --> A["Multiply by matrix A"]
  A --> OUT["Output = lambda * v<br/>Same direction as v<br/>Scaled by eigenvalue lambda"]
  style V fill:#e1f5fe
  style OUT fill:#c8e6c9

Read this equation carefully. On the left, a matrix transforms a vector. On the right, that same vector is simply multiplied by a number. The matrix’s effect on this particular vector is nothing more than scaling.

Why this matters

Most vectors get both scaled and rotated by a matrix. Eigenvectors are special because they only get scaled. This makes them natural “axes” for understanding what a matrix does.

  • In PCA, the eigenvectors of the covariance matrix are the principal components, the directions of maximum variance in your data.
  • In gradient descent, the eigenvalues of the Hessian determine how fast you converge along each direction.
  • In graph algorithms, the eigenvectors of the adjacency matrix reveal community structure.
  • In dynamical systems, eigenvalues tell you whether the system is stable (for discrete-time updates, all $|\lambda| < 1$) or unstable.

Finding eigenvalues: the characteristic polynomial

Start from the defining equation and rearrange:

$$A\mathbf{v} = \lambda \mathbf{v}$$

$$A\mathbf{v} - \lambda \mathbf{v} = \mathbf{0}$$

$$(A - \lambda I)\mathbf{v} = \mathbf{0}$$

We need a nonzero $\mathbf{v}$, which means the matrix $(A - \lambda I)$ must be singular. A matrix is singular when its determinant is zero:

$$\det(A - \lambda I) = 0$$

This is the characteristic equation. The left side, $\det(A - \lambda I)$, is a polynomial in $\lambda$ called the characteristic polynomial. For an $n \times n$ matrix, this polynomial has degree $n$, so there are at most $n$ distinct eigenvalues, and exactly $n$ counted with multiplicity over the complex numbers.

For a 2x2 matrix

Given:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

The characteristic polynomial is:

$$\det\begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix} = (a - \lambda)(d - \lambda) - bc = 0$$

$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

Notice: the coefficient of $\lambda$ is $-(a + d)$, which is the negative of the trace (sum of diagonal entries). The constant term is $ad - bc$, which is the determinant. So:

$$\lambda^2 - \text{tr}(A) \cdot \lambda + \det(A) = 0$$

This is a quadratic you can solve with the quadratic formula.

Characteristic polynomial: roots are the eigenvalues
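The trace/determinant form of the quadratic is short enough to implement directly. A minimal sketch (the helper name `eig_2x2` is our own; in practice you would just call `np.linalg.eig`):

```python
import numpy as np

def eig_2x2(A):
    """Eigenvalues of a 2x2 matrix from lambda^2 - tr(A)*lambda + det(A) = 0."""
    tr = A[0, 0] + A[1, 1]                        # trace
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # determinant
    disc = tr**2 - 4 * det                        # discriminant of the quadratic
    root = np.sqrt(complex(disc))                 # complex sqrt covers rotation-like matrices
    return (tr + root) / 2, (tr - root) / 2

A = np.array([[4, 1], [2, 3]])
print(eig_2x2(A))  # eigenvalues 5 and 2
```

Taking the complex square root means the same function also reports complex eigenvalue pairs for matrices with no real eigenvectors.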

Finding eigenvectors

Once you know an eigenvalue $\lambda$, find its eigenvector by solving:

$$(A - \lambda I)\mathbf{v} = \mathbf{0}$$

This is a homogeneous system. Since $\det(A - \lambda I) = 0$, there are infinitely many solutions (a whole line or subspace of eigenvectors). You typically pick the simplest nonzero solution.

The set of all eigenvectors for a given eigenvalue, plus the zero vector, is called the eigenspace for that eigenvalue.
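Numerically, one common way to get a basis for an eigenspace is to compute the null space of $A - \lambda I$ via the SVD: the right-singular vector belonging to the smallest singular value spans it. A sketch, using the matrix and eigenvalue from the worked examples that follow:

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
lam = 5.0  # a known eigenvalue of A

M = A - lam * np.eye(2)
# The eigenspace for lam is the null space of M. The last row of Vt is the
# right-singular vector for the smallest singular value, which spans it here.
_, _, Vt = np.linalg.svd(M)
v = Vt[-1]

print(np.allclose(A @ v, lam * v))  # True
```

The SVD route is numerically robust, but it is one approach among several; textbook row reduction gives the same line.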

Worked example 1: eigenvalues of a 2x2 matrix

Find the eigenvalues of:

$$A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$$

Step 1: set up the characteristic equation.

$$\det(A - \lambda I) = \det\begin{bmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{bmatrix} = 0$$

Step 2: expand the determinant.

$$(4 - \lambda)(3 - \lambda) - (1)(2) = 0$$

$$12 - 4\lambda - 3\lambda + \lambda^2 - 2 = 0$$

$$\lambda^2 - 7\lambda + 10 = 0$$

Step 3: solve the quadratic.

$$(\lambda - 5)(\lambda - 2) = 0$$

$$\lambda_1 = 5, \quad \lambda_2 = 2$$

Quick sanity check: $\lambda_1 + \lambda_2 = 5 + 2 = 7 = \text{tr}(A)$ ✓ and $\lambda_1 \cdot \lambda_2 = 5 \times 2 = 10 = \det(A)$ ✓.

import numpy as np

A = np.array([[4, 1], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)  # [5. 2.]

Worked example 2: finding eigenvectors

Using the same matrix $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$ with eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 2$.

Eigenvector for $\lambda_1 = 5$:

Solve $(A - 5I)\mathbf{v} = \mathbf{0}$:

$$A - 5I = \begin{bmatrix} 4 - 5 & 1 \\ 2 & 3 - 5 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}$$

The system:

$$-v_1 + v_2 = 0$$

$$2v_1 - 2v_2 = 0$$

Both equations say the same thing: $v_2 = v_1$. Choose $v_1 = 1$:

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Eigenvector for $\lambda_2 = 2$:

Solve $(A - 2I)\mathbf{v} = \mathbf{0}$:

$$A - 2I = \begin{bmatrix} 4 - 2 & 1 \\ 2 & 3 - 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}$$

The system:

$$2v_1 + v_2 = 0$$

So $v_2 = -2v_1$. Choose $v_1 = 1$:

$$\mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$$

import numpy as np

A = np.array([[4, 1], [2, 3]])
vals, vecs = np.linalg.eig(A)
print("Eigenvectors (as columns):")
print(vecs)
# Note: NumPy returns normalized eigenvectors, so they may look
# different in scale but point in the same direction.

Worked example 3: verifying $A\mathbf{v} = \lambda\mathbf{v}$

Let’s verify our results. We found that for $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$:

  • $\lambda_1 = 5$ with $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$
  • $\lambda_2 = 2$ with $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$

Verify $\lambda_1 = 5$, $\mathbf{v}_1 = [1, 1]^T$:

$$A\mathbf{v}_1 = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4(1) + 1(1) \\ 2(1) + 3(1) \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \end{bmatrix}$$

$$\lambda_1 \mathbf{v}_1 = 5 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \end{bmatrix}$$

$$A\mathbf{v}_1 = \lambda_1 \mathbf{v}_1 \quad \checkmark$$

Verify $\lambda_2 = 2$, $\mathbf{v}_2 = [1, -2]^T$:

$$A\mathbf{v}_2 = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 4(1) + 1(-2) \\ 2(1) + 3(-2) \end{bmatrix} = \begin{bmatrix} 2 \\ -4 \end{bmatrix}$$

$$\lambda_2 \mathbf{v}_2 = 2 \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 \\ -4 \end{bmatrix}$$

$$A\mathbf{v}_2 = \lambda_2 \mathbf{v}_2 \quad \checkmark$$

Both check out. The matrix AA scales v1\mathbf{v}_1 by a factor of 5 and v2\mathbf{v}_2 by a factor of 2, without changing their directions.

import numpy as np

A = np.array([[4, 1], [2, 3]])
v1 = np.array([1, 1])
v2 = np.array([1, -2])

print("Av1 =", A @ v1)        # [5 5]
print("5*v1 =", 5 * v1)       # [5 5]
print("Av2 =", A @ v2)        # [ 2 -4]
print("2*v2 =", 2 * v2)       # [ 2 -4]

Geometric meaning

Think of a $2 \times 2$ matrix as a transformation of the plane. It stretches, rotates, and/or reflects every point.

graph LR
  A["Input vector v"] --> B["Multiply by A"]
  B --> C{"Is v an eigenvector?"}
  C -->|Yes| D["Output = λv<br/>(same direction, scaled)"]
  C -->|No| E["Output = Av<br/>(new direction and scale)"]

Eigenvectors are the directions that survive the transformation unchanged (up to scaling). The eigenvalue tells you the scaling factor along that direction.

If both eigenvalues are positive, the matrix stretches in both eigenvector directions. If one is negative, it flips the direction along that eigenvector. If an eigenvalue is zero, the matrix collapses that direction entirely.

A concrete picture

For our matrix $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$:

  • Along $\mathbf{v}_1 = [1, 1]^T$: stretch by factor 5.
  • Along $\mathbf{v}_2 = [1, -2]^T$: stretch by factor 2.

Any vector in $\mathbb{R}^2$ can be decomposed into a combination of $\mathbf{v}_1$ and $\mathbf{v}_2$. So the matrix’s action on any vector is determined by these two stretching factors.
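This decomposition is easy to check numerically. A sketch (the test vector `x` is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
v1 = np.array([1.0, 1.0])    # eigenvector for lambda = 5
v2 = np.array([1.0, -2.0])   # eigenvector for lambda = 2

x = np.array([3.0, 0.0])     # an arbitrary vector

# Solve x = c1*v1 + c2*v2 for the coordinates of x in the eigenbasis
P = np.column_stack([v1, v2])
c1, c2 = np.linalg.solve(P, x)

# A acts on x by scaling each eigen-coordinate by its eigenvalue
reconstructed = 5 * c1 * v1 + 2 * c2 * v2
print(np.allclose(A @ x, reconstructed))  # True
```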

Diagonalization

If an $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors, we can diagonalize it. Arrange the eigenvectors as columns of a matrix $P$ and the eigenvalues on the diagonal of a matrix $D$:

$$P = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}, \quad D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

Then:

$$A = PDP^{-1}$$

This is called the eigendecomposition of $A$.

Diagonalization process:

graph TD
  A["Start with matrix A"] --> EV["Find eigenvalues<br/>Solve det of A - lambda I = 0"]
  EV --> EC["Find eigenvectors<br/>Solve A - lambda I times v = 0"]
  EC --> P["Form P<br/>Eigenvectors as columns"]
  EC --> D["Form D<br/>Eigenvalues on diagonal"]
  P --> RES["A = P D P-inverse"]
  D --> RES
  style A fill:#e1f5fe
  style RES fill:#c8e6c9

Why diagonalization is useful

  1. Fast matrix powers. $A^k = PD^kP^{-1}$. Since $D$ is diagonal, $D^k$ just raises each diagonal entry to the $k$-th power, so one decomposition replaces $k$ matrix multiplications: roughly $O(n^3)$ instead of $O(n^3 k)$.

  2. Understanding transformations. The decomposition says: “change to the eigenvector basis ($P^{-1}$), scale along each axis ($D$), change back ($P$).”

  3. PCA. The covariance matrix is symmetric, so it is always diagonalizable. Its eigenvectors form an orthogonal basis, and the eigenvalues tell you the variance along each principal component.
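The fast-powers claim from point 1 can be checked directly against NumPy's built-in matrix power (the helper name `matrix_power_eig` is our own):

```python
import numpy as np

A = np.array([[4, 1], [2, 3]])
vals, P = np.linalg.eig(A)

def matrix_power_eig(vals, P, k):
    """A^k = P D^k P^{-1}; D^k is just elementwise powers of the eigenvalues."""
    Dk = np.diag(vals**k)
    return P @ Dk @ np.linalg.inv(P)

# Compare against repeated multiplication
print(np.allclose(matrix_power_eig(vals, P, 10), np.linalg.matrix_power(A, 10)))  # True
```

After the one-time $O(n^3)$ decomposition, each additional power only costs elementwise exponentiation plus two matrix multiplications.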

Diagonalizing our example

For $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$ with $\lambda_1 = 5$, $\mathbf{v}_1 = [1, 1]^T$, $\lambda_2 = 2$, $\mathbf{v}_2 = [1, -2]^T$:

$$P = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}, \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}$$

First, compute $P^{-1}$. Using the $2 \times 2$ inverse formula:

$$\det(P) = (1)(-2) - (1)(1) = -3$$

$$P^{-1} = \frac{1}{-3}\begin{bmatrix} -2 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{bmatrix}$$

Verify $PDP^{-1} = A$:

$$PD = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 5 & -4 \end{bmatrix}$$

$$PDP^{-1} = \begin{bmatrix} 5 & 2 \\ 5 & -4 \end{bmatrix}\begin{bmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{bmatrix}$$

Entry $(1,1)$: $5(2/3) + 2(1/3) = 10/3 + 2/3 = 12/3 = 4$

Entry $(1,2)$: $5(1/3) + 2(-1/3) = 5/3 - 2/3 = 3/3 = 1$

Entry $(2,1)$: $5(2/3) + (-4)(1/3) = 10/3 - 4/3 = 6/3 = 2$

Entry $(2,2)$: $5(1/3) + (-4)(-1/3) = 5/3 + 4/3 = 9/3 = 3$

$$PDP^{-1} = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix} = A \quad \checkmark$$

import numpy as np

A = np.array([[4, 1], [2, 3]])
vals, P = np.linalg.eig(A)
D = np.diag(vals)
P_inv = np.linalg.inv(P)

# Reconstruct A
A_reconstructed = P @ D @ P_inv
print(np.allclose(A, A_reconstructed))  # True

Special cases and pitfalls

Repeated eigenvalues

A matrix can have repeated eigenvalues. The $2 \times 2$ identity matrix has $\lambda = 1$ with multiplicity 2. It is still diagonalizable (it is already diagonal).

But some matrices with repeated eigenvalues are not diagonalizable:

$$A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$$

This has $\lambda = 3$ with multiplicity 2, but only one linearly independent eigenvector. Such matrices require the Jordan normal form instead of a simple diagonalization.
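You can see the failure numerically: `np.linalg.eig` still returns two eigenvector columns, but they are (up to floating-point noise) parallel, so the matrix built from them is singular and cannot be inverted for $PDP^{-1}$. A sketch:

```python
import numpy as np

J = np.array([[3.0, 1.0], [0.0, 3.0]])
vals, vecs = np.linalg.eig(J)
print(vals)  # repeated eigenvalue 3

# The two eigenvector columns are numerically parallel, so the determinant of
# the would-be P is (essentially) zero: J is not diagonalizable.
print(abs(np.linalg.det(vecs)))  # ~0
```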

Complex eigenvalues

Some real matrices have complex eigenvalues. For example:

$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

The characteristic equation is $\lambda^2 + 1 = 0$, giving $\lambda = \pm i$. This matrix is a 90-degree rotation, so no real vector stays on its line after transformation. The complex eigenvalues encode the rotation angle.
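NumPy handles this case transparently by returning a complex conjugate pair. A sketch:

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation
vals, _ = np.linalg.eig(R)
print(vals)  # a conjugate pair, +1j and -1j
```

Real matrices always produce complex eigenvalues in conjugate pairs, which is why the imaginary parts here are exactly opposite.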

Symmetric matrices are special

If $A$ is symmetric ($A = A^T$), you get two guarantees:

  1. All eigenvalues are real.
  2. Eigenvectors for distinct eigenvalues are orthogonal, and you can always choose a full orthonormal set of eigenvectors.

This makes symmetric matrices much easier to work with. Covariance matrices are symmetric, which is why PCA works so cleanly.
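NumPy provides `np.linalg.eigh` specifically for symmetric (and Hermitian) matrices; it returns real eigenvalues in ascending order and orthonormal eigenvectors. A sketch with an example symmetric matrix of our own choosing:

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])  # a symmetric matrix
vals, Q = np.linalg.eigh(S)             # eigh: specialized for symmetric input

print(vals)                              # [1. 3.] -- all real, ascending
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: columns are orthonormal
```

Prefer `eigh` over the general `eig` whenever you know the matrix is symmetric, e.g. for covariance matrices in PCA.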

Eigenvalues in ML: a quick reference

| Application | What you compute | What eigenvalues tell you |
| --- | --- | --- |
| PCA | Eigenvectors of covariance matrix | Variance along each principal component |
| Spectral clustering | Eigenvectors of graph Laplacian | Cluster structure in the data |
| Gradient descent convergence | Eigenvalues of Hessian | Condition number, convergence speed |
| Stability of recurrent networks | Eigenvalues of weight matrix | Whether gradients explode or vanish |
| PageRank | Dominant eigenvector of link matrix | Importance of each page |

Summary

| Concept | Definition |
| --- | --- |
| Eigenvalue $\lambda$ | Scalar satisfying $A\mathbf{v} = \lambda\mathbf{v}$ |
| Eigenvector $\mathbf{v}$ | Nonzero vector scaled but not rotated by $A$ |
| Characteristic polynomial | $\det(A - \lambda I)$; its roots are the eigenvalues |
| Eigenspace | All eigenvectors for a given $\lambda$, plus $\mathbf{0}$ |
| Diagonalization | $A = PDP^{-1}$ where $D$ is diagonal |
| Trace | Sum of eigenvalues $= a_{11} + a_{22} + \cdots$ |
| Determinant | Product of eigenvalues |

What comes next

Eigenvalues give you one way to decompose a matrix. The next article covers matrix decompositions more broadly, including SVD (singular value decomposition), which generalizes eigendecomposition to non-square matrices and is behind dimensionality reduction, recommendation systems, and low-rank approximations.
