Eigenvalues and Eigenvectors
In this series (15 parts)
- Why Maths Matters for ML: A Practical Overview
- Scalars, Vectors, and Vector Spaces
- Matrices and Matrix Operations
- Matrix Inverses and Systems of Linear Equations
- Eigenvalues and Eigenvectors
- Matrix Decompositions: LU, QR, SVD
- Norms, Distances, and Similarity
- Calculus Review: Derivatives and the Chain Rule
- Partial Derivatives and Gradients
- The Jacobian and Hessian Matrices
- Taylor series and local approximations
- Probability fundamentals
- Random variables and distributions
- Bayes theorem and its role in ML
- Information theory: entropy, KL divergence, cross-entropy
Most vectors change direction when you multiply them by a matrix. Eigenvalues and eigenvectors are the special cases where a matrix simply stretches or compresses a vector without rotating it. This turns out to be one of the most important ideas in linear algebra, with direct applications to PCA, spectral clustering, stability analysis, and matrix decompositions.
Prerequisites
This article builds on matrix inverses and linear systems. You should be comfortable with matrix multiplication, determinants, and solving systems of equations.
The core definition
A nonzero vector v is an eigenvector of a square matrix A if multiplying by A gives back a scaled version of v:

Av = λv

The scalar λ is the corresponding eigenvalue.
Eigenvector equation: the matrix only scales, not rotates:
graph LR
V["Input vector v"] --> A["Multiply by matrix A"]
A --> OUT["Output = lambda * v<br/>Same direction as v<br/>Scaled by eigenvalue lambda"]
style V fill:#e1f5fe
style OUT fill:#c8e6c9
Read this equation carefully. On the left, a matrix transforms a vector. On the right, that same vector is simply multiplied by a number. The matrix’s effect on this particular vector is nothing more than scaling.
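To make this concrete, here is a quick numerical check, using a hypothetical matrix for which v = (1, 1) happens to be an eigenvector:

```python
import numpy as np

# Hypothetical example: v = (1, 1) is an eigenvector of this matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])

Av = A @ v                     # [3. 3.] -- same direction as v
print(Av)                      # [3. 3.]
print(np.allclose(Av, 3 * v))  # True: the eigenvalue is 3
```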
Why this matters
Most vectors get both scaled and rotated by a matrix. Eigenvectors are special because they only get scaled. This makes them natural “axes” for understanding what a matrix does.
- In PCA, the eigenvectors of the covariance matrix are the principal components, the directions of maximum variance in your data.
- In gradient descent, the eigenvalues of the Hessian determine how fast you converge along each direction.
- In graph algorithms, the eigenvectors of the adjacency matrix reveal community structure.
- In dynamical systems, eigenvalues tell you whether the system is stable (all |λ| < 1 for discrete-time updates) or unstable.
Finding eigenvalues: the characteristic polynomial
Start from the defining equation and rearrange:

Av = λv  ⟹  (A − λI)v = 0

We need a nonzero v, which means the matrix A − λI must be singular. A matrix is singular when its determinant is zero:

det(A − λI) = 0

This is the characteristic equation. The left side, det(A − λI), is a polynomial in λ called the characteristic polynomial. For an n × n matrix, this polynomial has degree n, which means there are at most n eigenvalues (counted with multiplicity).
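You rarely expand the determinant by hand for large matrices. As a sketch, NumPy's np.poly returns the characteristic polynomial coefficients of a square matrix, and np.roots recovers the eigenvalues (using a hypothetical example matrix):

```python
import numpy as np

# Hypothetical example matrix.
A = np.array([[2.0, 1.0], [1.0, 2.0]])

# np.poly(A) returns the characteristic polynomial coefficients,
# highest degree first: lambda^2 - 4*lambda + 3 here.
coeffs = np.poly(A)
print(coeffs)                     # [ 1. -4.  3.]

# The roots of the characteristic polynomial are the eigenvalues.
print(np.sort(np.roots(coeffs)))  # [1. 3.]
```

In practice you would call np.linalg.eig directly; forming the characteristic polynomial explicitly is numerically fragile for large matrices.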
For a 2x2 matrix
Given:

A = [ a  b ]
    [ c  d ]

The characteristic polynomial is:

det(A − λI) = (a − λ)(d − λ) − bc = λ² − (a + d)λ + (ad − bc)

Notice: the coefficient of λ is −(a + d), which is the negative of the trace (sum of diagonal entries). The constant term is ad − bc, which is the determinant. So:

λ² − tr(A)λ + det(A) = 0

This is a quadratic you can solve with the quadratic formula.
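As a quick check of that formula, here is the trace-and-determinant quadratic solved directly for a hypothetical 2x2 matrix:

```python
import numpy as np

# lambda = (tr(A) +/- sqrt(tr(A)^2 - 4*det(A))) / 2
A = np.array([[2.0, 1.0], [1.0, 2.0]])
tr = np.trace(A)                 # 4.0
det = np.linalg.det(A)           # ~3.0
disc = np.sqrt(tr**2 - 4 * det)  # ~2.0

lam1 = (tr + disc) / 2
lam2 = (tr - disc) / 2
print(np.isclose(lam1, 3.0), np.isclose(lam2, 1.0))  # True True
```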
Characteristic polynomial: roots are the eigenvalues
Finding eigenvectors
Once you know an eigenvalue λ, find its eigenvector v by solving:

(A − λI)v = 0

This is a homogeneous system. Since det(A − λI) = 0, there are infinitely many solutions (a whole line or subspace of eigenvectors). You typically pick the simplest nonzero solution.

The set of all eigenvectors for a given eigenvalue, plus the zero vector, is called the eigenspace for that eigenvalue.
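One way to solve the homogeneous system numerically is to compute the null space of A − λI. A minimal sketch using NumPy's SVD (the right-singular vector for the zero singular value spans the eigenspace), with a hypothetical matrix and a known eigenvalue:

```python
import numpy as np

# Hypothetical example: lambda = 3 is an eigenvalue of this matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam = 3.0

# (A - lam*I) is singular, so its null space is the eigenspace.
M = A - lam * np.eye(2)
_, s, Vt = np.linalg.svd(M)
v = Vt[-1]                          # basis vector for the eigenspace

print(np.allclose(A @ v, lam * v))  # True
```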
Worked example 1: eigenvalues of a 2x2 matrix
Find the eigenvalues of:

A = [ 4  1 ]
    [ 2  3 ]

Step 1: set up the characteristic equation.

det [ 4 − λ      1   ]
    [   2      3 − λ ]  =  0

Step 2: expand the determinant.

(4 − λ)(3 − λ) − (1)(2) = 12 − 7λ + λ² − 2 = λ² − 7λ + 10 = 0

Step 3: solve the quadratic.

λ² − 7λ + 10 = (λ − 5)(λ − 2) = 0, so λ₁ = 5 and λ₂ = 2.

Quick sanity check: λ₁ + λ₂ = 7 = tr(A) ✓ and λ₁λ₂ = 10 = det(A) ✓.
import numpy as np
A = np.array([[4, 1], [2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues) # [5. 2.]
Worked example 2: finding eigenvectors
Using the same matrix A = [[4, 1], [2, 3]] with eigenvalues λ₁ = 5 and λ₂ = 2.

Eigenvector for λ₁ = 5:

Solve (A − 5I)v = 0, writing v = [x, y]ᵀ:

A − 5I = [ −1   1 ]
         [  2  −2 ]

The system:

−x + y = 0
2x − 2y = 0

Both equations say the same thing: y = x. Choose x = 1:

v₁ = [1, 1]ᵀ

Eigenvector for λ₂ = 2:

Solve (A − 2I)v = 0:

A − 2I = [ 2  1 ]
         [ 2  1 ]

The system:

2x + y = 0
2x + y = 0

So y = −2x. Choose x = 1:

v₂ = [1, −2]ᵀ
import numpy as np
A = np.array([[4, 1], [2, 3]])
vals, vecs = np.linalg.eig(A)
print("Eigenvectors (as columns):")
print(vecs)
# Note: NumPy returns normalized eigenvectors, so they may look
# different in scale but point in the same direction.
Worked example 3: verifying Av = λv
Let’s verify our results. We found that for A = [[4, 1], [2, 3]]:

- λ₁ = 5 with v₁ = [1, 1]ᵀ
- λ₂ = 2 with v₂ = [1, −2]ᵀ

Verify Av₁ = λ₁v₁:

Av₁ = [4·1 + 1·1, 2·1 + 3·1]ᵀ = [5, 5]ᵀ = 5v₁ ✓

Verify Av₂ = λ₂v₂:

Av₂ = [4·1 + 1·(−2), 2·1 + 3·(−2)]ᵀ = [2, −4]ᵀ = 2v₂ ✓

Both check out. The matrix scales v₁ by a factor of 5 and v₂ by a factor of 2, without changing their directions.
import numpy as np
A = np.array([[4, 1], [2, 3]])
v1 = np.array([1, 1])
v2 = np.array([1, -2])
print("Av1 =", A @ v1) # [5 5]
print("5*v1 =", 5 * v1) # [5 5]
print("Av2 =", A @ v2) # [ 2 -4]
print("2*v2 =", 2 * v2) # [ 2 -4]
Geometric meaning
Think of a matrix as a transformation of the plane. It stretches, rotates, and/or reflects every point.
graph LR
A["Input vector v"] --> B["Multiply by A"]
B --> C{"Is v an eigenvector?"}
C -->|Yes| D["Output = λv<br/>(same direction, scaled)"]
C -->|No| E["Output = Av<br/>(new direction and scale)"]
Eigenvectors are the directions that survive the transformation unchanged (up to scaling). The eigenvalue tells you the scaling factor along that direction.
If both eigenvalues are positive, the matrix stretches in both eigenvector directions. If one is negative, it flips the direction along that eigenvector. If an eigenvalue is zero, the matrix collapses that direction entirely.
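These cases are easy to see numerically. A small sketch with two hypothetical matrices, a reflection (negative eigenvalue) and a rank-1 collapse (zero eigenvalue):

```python
import numpy as np

flip = np.array([[1.0, 0.0], [0.0, -1.0]])      # eigenvalues 1 and -1
collapse = np.array([[1.0, 0.0], [0.0, 0.0]])   # eigenvalues 1 and 0

print(np.linalg.eigvals(flip))      # [ 1. -1.]
print(np.linalg.eigvals(collapse))  # [1. 0.]

# The -1 eigenvalue flips vectors along (0, 1):
print(flip @ np.array([0.0, 1.0]))      # [ 0. -1.]
# The 0 eigenvalue collapses that direction entirely:
print(collapse @ np.array([0.0, 1.0]))  # [0. 0.]
```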
A concrete picture
For our matrix A = [[4, 1], [2, 3]]:

- Along v₁ = [1, 1]ᵀ: stretch by factor 5.
- Along v₂ = [1, −2]ᵀ: stretch by factor 2.

Any vector in ℝ² can be decomposed into a combination of v₁ and v₂. So the matrix’s action on any vector is determined by these two stretching factors.
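This decomposition can be checked directly: split a vector into its v₁ and v₂ components, scale each component by its eigenvalue, and compare with multiplying by A. The test vector x below is an arbitrary choice:

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
v1 = np.array([1.0, 1.0])    # eigenvalue 5
v2 = np.array([1.0, -2.0])   # eigenvalue 2

# Decompose an arbitrary vector x as a*v1 + b*v2 ...
x = np.array([3.0, 0.0])
a, b = np.linalg.solve(np.column_stack([v1, v2]), x)  # a = 2, b = 1

# ... then A acts by scaling each eigencomponent independently:
Ax_via_eigen = 5 * a * v1 + 2 * b * v2
print(Ax_via_eigen)                       # [12.  6.]
print(np.allclose(Ax_via_eigen, A @ x))   # True
```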
Diagonalization
If an n × n matrix A has n linearly independent eigenvectors, we can diagonalize it. Arrange the eigenvectors as columns of a matrix P and the eigenvalues on the diagonal of a matrix D:

P = [v₁ v₂ ⋯ vₙ],   D = diag(λ₁, λ₂, …, λₙ)

Then:

A = PDP⁻¹

This is called the eigendecomposition of A.
Diagonalization process:
graph TD
A["Start with matrix A"] --> EV["Find eigenvalues<br/>Solve det of A - lambda I = 0"]
EV --> EC["Find eigenvectors<br/>Solve A - lambda I times v = 0"]
EC --> P["Form P<br/>Eigenvectors as columns"]
EC --> D["Form D<br/>Eigenvalues on diagonal"]
P --> RES["A = P D P-inverse"]
D --> RES
style A fill:#e1f5fe
style RES fill:#c8e6c9
Why diagonalization is useful
- Fast matrix powers. Aᵏ = P Dᵏ P⁻¹. Since D is diagonal, Dᵏ just raises each diagonal entry to the k-th power. This turns the k − 1 matrix multiplications in Aᵏ into just two (plus one inverse).
- Understanding transformations. The decomposition says: “change to the eigenvector basis (P⁻¹), scale along each axis (D), change back (P).”
- PCA. The covariance matrix is symmetric, so it is always diagonalizable. Its eigenvectors form an orthogonal basis, and the eigenvalues tell you the variance along each principal component.
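The fast-powers claim is easy to verify. A sketch comparing P Dᵏ P⁻¹ against repeated multiplication, using the running example matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
vals, P = np.linalg.eig(A)

# A^k = P D^k P^{-1}: only the diagonal entries get raised to k.
k = 10
Dk = np.diag(vals ** k)
Ak = P @ Dk @ np.linalg.inv(P)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```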
Diagonalizing our example
For A = [[4, 1], [2, 3]] with λ₁ = 5, λ₂ = 2, v₁ = [1, 1]ᵀ, v₂ = [1, −2]ᵀ:

P = [ 1   1 ]      D = [ 5  0 ]
    [ 1  −2 ]          [ 0  2 ]

First, compute P⁻¹. Using the 2×2 inverse formula with det(P) = (1)(−2) − (1)(1) = −3:

P⁻¹ = (1/−3) [ −2  −1 ]  =  [ 2/3   1/3 ]
             [ −1   1 ]     [ 1/3  −1/3 ]

Verify A = PDP⁻¹. First, PD = [[5, 2], [5, −4]]. Then each entry of (PD)P⁻¹:

Entry (1,1): 5 · (2/3) + 2 · (1/3) = 12/3 = 4 ✓

Entry (1,2): 5 · (1/3) + 2 · (−1/3) = 3/3 = 1 ✓

Entry (2,1): 5 · (2/3) + (−4) · (1/3) = 6/3 = 2 ✓

Entry (2,2): 5 · (1/3) + (−4) · (−1/3) = 9/3 = 3 ✓
import numpy as np
A = np.array([[4, 1], [2, 3]])
vals, P = np.linalg.eig(A)
D = np.diag(vals)
P_inv = np.linalg.inv(P)
# Reconstruct A
A_reconstructed = P @ D @ P_inv
print(np.allclose(A, A_reconstructed)) # True
Special cases and pitfalls
Repeated eigenvalues
A matrix can have repeated eigenvalues. The identity matrix has λ = 1 with multiplicity 2. It is still diagonalizable (it is already diagonal).

But some matrices with repeated eigenvalues are not diagonalizable:

A = [ 1  1 ]
    [ 0  1 ]

This shear matrix has λ = 1 with multiplicity 2, but only one linearly independent eigenvector, [1, 0]ᵀ. Such matrices require the Jordan normal form instead of a simple diagonalization.
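NumPy still returns something from np.linalg.eig for a defective matrix, but the eigenvector columns are linearly dependent, which is the numerical symptom of non-diagonalizability:

```python
import numpy as np

# Shear matrix: repeated eigenvalue 1, but only one independent
# eigenvector, so it is not diagonalizable.
J = np.array([[1.0, 1.0], [0.0, 1.0]])
vals, vecs = np.linalg.eig(J)
print(vals)  # [1. 1.]

# Both eigenvector columns point the same way, so the eigenvector
# matrix is (numerically) singular:
print(np.linalg.matrix_rank(vecs))  # 1
```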
Complex eigenvalues
Some real matrices have complex eigenvalues. For example:

R = [ 0  −1 ]
    [ 1   0 ]

The characteristic equation is λ² + 1 = 0, giving λ = ±i. This matrix is a 90-degree rotation, so no real vector stays on its line after transformation. The complex eigenvalues encode the rotation angle.
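NumPy handles this transparently by returning complex eigenvalues:

```python
import numpy as np

# 90-degree rotation matrix: no real eigenvectors.
R = np.array([[0.0, -1.0], [1.0, 0.0]])
vals = np.linalg.eigvals(R)
print(vals)  # a complex-conjugate pair, +1j and -1j

# |lambda| = 1 (rotations preserve length) and the angle of the
# complex eigenvalues is the rotation angle: 90 degrees.
print(np.abs(vals))                        # [1. 1.]
print(np.degrees(np.abs(np.angle(vals))))  # [90. 90.]
```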
Symmetric matrices are special
If A is symmetric (A = Aᵀ), you get two guarantees:
- All eigenvalues are real.
- The eigenvectors are orthogonal to each other.
This makes symmetric matrices much easier to work with. Covariance matrices are symmetric, which is why PCA works so cleanly.
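For symmetric matrices, NumPy provides np.linalg.eigh, which exploits these guarantees: it returns real eigenvalues in ascending order and orthonormal eigenvectors.

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])  # symmetric
vals, vecs = np.linalg.eigh(S)
print(vals)  # [1. 3.] -- real, sorted ascending

# The eigenvector matrix Q is orthogonal: Q^T Q = I.
print(np.allclose(vecs.T @ vecs, np.eye(2)))  # True
```

Prefer eigh over eig whenever you know the matrix is symmetric; it is faster and avoids spurious tiny imaginary parts.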
Eigenvalues in ML: a quick reference
| Application | What you compute | What eigenvalues tell you |
|---|---|---|
| PCA | Eigenvectors of covariance matrix | Variance along each principal component |
| Spectral clustering | Eigenvectors of graph Laplacian | Cluster structure in the data |
| Gradient descent convergence | Eigenvalues of Hessian | Condition number, convergence speed |
| Stability of recurrent networks | Eigenvalues of weight matrix | Whether gradients explode or vanish |
| PageRank | Dominant eigenvector of link matrix | Importance of each page |
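As a minimal illustration of the PCA row, here is a sketch on synthetic data (the data-generating matrix and seed are arbitrary choices): the variance of the projection onto a principal component equals the corresponding eigenvalue of the covariance matrix.

```python
import numpy as np

# Synthetic correlated 2-D data, mean-centered.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix are the principal components.
cov = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(cov)   # ascending eigenvalues
order = np.argsort(vals)[::-1]     # sort by variance, descending
components = vecs[:, order]

# Projecting onto the top component captures the max-variance direction:
projected = X @ components[:, 0]
print(np.isclose(projected.var(ddof=1), vals[order][0]))  # True
```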
Summary
| Concept | Definition |
|---|---|
| Eigenvalue | Scalar λ satisfying Av = λv for some nonzero v |
| Eigenvector | Nonzero vector v scaled but not rotated by A: Av = λv |
| Characteristic polynomial | det(A − λI); its roots are the eigenvalues |
| Eigenspace | All eigenvectors for a given λ, plus the zero vector |
| Diagonalization | A = PDP⁻¹ where D is diagonal |
| Trace | Sum of eigenvalues: tr(A) = Σᵢ λᵢ |
| Determinant | Product of eigenvalues: det(A) = Πᵢ λᵢ |
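The last two rows of the table are easy to confirm on the running example:

```python
import numpy as np

# trace = sum of eigenvalues, determinant = product of eigenvalues.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
vals = np.linalg.eigvals(A)

print(np.isclose(np.trace(A), vals.sum()))        # True  (7 = 5 + 2)
print(np.isclose(np.linalg.det(A), vals.prod()))  # True  (10 = 5 * 2)
```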
What comes next
Eigenvalues give you one way to decompose a matrix. The next article covers matrix decompositions more broadly, including SVD (singular value decomposition), which generalizes eigendecomposition to non-square matrices and is behind dimensionality reduction, recommendation systems, and low-rank approximations.