Mnemosyne

Eigenvalues and Eigenvectors

An eigenvector is a direction a matrix only stretches, never rotates. The eigenvalue is the stretch factor. This intuition is behind PCA, the curvature of optimization landscapes, and why certain training behaviors emerge in neural networks.

Intuition First

When you multiply most vectors by a matrix, two things change: the direction and the length. The vector gets rotated and scaled.

But for certain special vectors — the eigenvectors — something remarkable happens: the matrix only stretches or squishes them, without rotating them at all. They point in the same direction before and after.

The eigenvalue is the stretch factor. If the eigenvalue is 3, the eigenvector gets 3× longer. If it's 0.5, it gets halved. If it's negative, it flips direction.

These "undisturbed" directions are the natural axes of the transformation — and they reveal a matrix's true structure.


What's Actually Happening

The defining equation is:

A · v = λ · v

A applied to v gives back v scaled by λ. Nothing rotated — just scaled.

Think of a matrix that squashes space along one axis and stretches along another (squeezing a ball into an ellipse). The axis being squashed and the axis being stretched are the eigenvectors. The amount of squash/stretch is the eigenvalue.
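
A minimal sketch of this picture with NumPy (the matrix is a made-up diagonal example, chosen so the eigenvectors are obviously the coordinate axes):

import numpy as np

# Stretches along x by 2, squashes along y by 0.5
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])

v1 = np.array([1.0, 0.0])   # eigenvector along x, eigenvalue 2
v2 = np.array([0.0, 1.0])   # eigenvector along y, eigenvalue 0.5
w  = np.array([1.0, 1.0])   # not an eigenvector

print(A @ v1)   # [2.  0. ]  same direction, twice as long
print(A @ v2)   # [0.  0.5]  same direction, half as long
print(A @ w)    # [2.  0.5]  no longer parallel to w: direction changed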

For symmetric matrices (like covariance matrices), the eigenvectors are always perpendicular to each other — they form clean orthogonal axes. This is why PCA produces principal components that are at right angles.
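
A quick numerical check of the orthogonality claim (the matrix below is just a random symmetric example):

import numpy as np

B = np.random.randn(4, 4)
S = (B + B.T) / 2                      # symmetrize

vals, vecs = np.linalg.eigh(S)

# Columns of vecs are unit-length and mutually perpendicular,
# so vecs.T @ vecs is the identity matrix
print(np.allclose(vecs.T @ vecs, np.eye(4)))   # True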


Build the Idea Step-by-Step

  1. Matrix A = some transformation
  2. Find the directions A only scales (not rotates)
  3. These are the eigenvectors
  4. The scale factors are the eigenvalues
  5. Symmetric A → eigenvectors always perpendicular

Formal Explanation

Definition: a non-zero vector v is an eigenvector of A, with eigenvalue λ, if Av = λv.

Eigenvalue interpretation:

  • λ > 1 → stretches v
  • 0 < λ < 1 → compresses v
  • λ < 0 → flips and scales v
  • λ = 0 → v collapses to zero; matrix is not invertible
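
The negative and zero cases above are easy to see on small made-up matrices:

import numpy as np

# λ < 0: this matrix flips and stretches the direction [1, 0]
R = np.array([[-2.0, 0.0],
              [ 0.0, 1.0]])
print(R @ np.array([1.0, 0.0]))   # [-2.  0.]  flipped and doubled (λ = -2)

# λ = 0: this matrix sends the direction [0, 1] to zero, so it is singular
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])
print(P @ np.array([0.0, 1.0]))   # [0.  0.]  collapsed (λ = 0)
print(np.linalg.det(P))           # 0.0, not invertible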

Finding eigenvalues: solve det(A − λI) = 0. In practice you never do this by hand — you call np.linalg.eigh() (for symmetric) or np.linalg.eig() (general).
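
It can still help to see the two routes agree once. A small sketch, reusing the 2×2 matrix from the Examples section below:

import numpy as np

A = np.array([[3., 1.],
              [1., 3.]])

# For a 2×2 matrix, det(A − λI) = λ² − trace(A)·λ + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
print(np.roots(coeffs))          # roots of the characteristic polynomial: 4 and 2
print(np.linalg.eigvals(A))      # the library routine returns the same values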

Key facts about eigenvalues:

trace(A) = sum of eigenvalues
det(A)   = product of eigenvalues
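
Both identities are easy to verify numerically (any square matrix works; a random symmetric one is used here so the eigenvalues are real):

import numpy as np

M = np.random.randn(4, 4)
M = (M + M.T) / 2                 # symmetrize so eigenvalues are real

vals = np.linalg.eigvalsh(M)
print(np.isclose(np.trace(M), vals.sum()))          # True
print(np.isclose(np.linalg.det(M), vals.prod()))    # True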

Key Properties / Rules

Property                Meaning
Av = λv                 Defining equation
Symmetric matrix        Real eigenvalues, orthogonal eigenvectors (always)
Positive definite       All λ > 0 (covariance matrices have λ ≥ 0, strictly > 0 when full rank)
λ = 0                   Matrix is singular: it maps some direction to zero
trace(A) = Σλᵢ          Sum of diagonal entries = sum of eigenvalues
det(A) = Πλᵢ            Determinant = product of eigenvalues

Why It Matters

PCA finds eigenvectors of the data covariance matrix. The eigenvector with the largest eigenvalue points where data varies most. The eigenvalue tells you how much variance lives in that direction. Projecting onto the top-k eigenvectors gives the best k-dimensional summary of the data.
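
The Examples section at the end builds exactly this decomposition; the sketch below only adds the projection step onto the top-k eigenvectors (the random data and k = 2 are arbitrary choices for illustration):

import numpy as np

X = np.random.randn(200, 5)
X = X - X.mean(axis=0)                  # center the data first

cov = np.cov(X.T)                       # 5×5 covariance matrix
vals, vecs = np.linalg.eigh(cov)        # ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]  # re-sort descending

k = 2
X_reduced = X @ vecs[:, :k]             # project onto the top-k eigenvectors
print(X_reduced.shape)                  # (200, 2)
print(vals[:k].sum() / vals.sum())      # fraction of total variance kept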

Optimization: the eigenvalues of the Hessian (second-derivative matrix) of the loss at a minimum describe how curved the loss is. Large eigenvalues = sharp curvature = need a small learning rate. Small eigenvalues = flat = slow convergence. A big spread between the largest and smallest eigenvalue (high condition number) is why training can be slow and unstable.
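
A minimal sketch of that learning-rate limit, using a hand-picked diagonal "Hessian" with condition number 100 (gradient descent on the quadratic f(x) = ½·xᵀHx is stable only if the learning rate stays below 2/λ_max):

import numpy as np

H = np.diag([100.0, 1.0])           # eigenvalues 100 and 1

def run_gd(lr, steps=200):
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = x - lr * (H @ x)        # gradient of f(x) = 0.5 * x.T @ H @ x is H @ x
    return x

print(run_gd(lr=0.019))   # stable (lr < 2/100), but the flat λ=1 direction is still ~0.02 away
print(run_gd(lr=0.021))   # lr > 2/100: the sharp λ=100 direction blows up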

Why batch normalization helps: it reshapes the loss surface to have more similar curvature in all directions (brings eigenvalues closer together), making gradient descent behave more uniformly.


Common Pitfalls

  • An eigenvector is a direction, not a specific vector. Any scalar multiple of an eigenvector is also an eigenvector. Libraries return unit-length eigenvectors, but the sign (+ or −) is arbitrary — don't rely on sign.
  • Not all matrices have real eigenvectors. A rotation matrix has no real fixed direction — its eigenvalues are complex. In ML you mostly deal with symmetric matrices (covariance, Hessians) which always have real eigenvalues.
  • Eigenvalues ≠ singular values. SVD singular values are always ≥ 0 and are NOT eigenvalues in general. For symmetric positive-semidefinite matrices they happen to coincide, but don't assume this elsewhere.
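
The last pitfall above is quick to check. A small sketch, with a shear matrix as the non-symmetric example:

import numpy as np

A = np.array([[1., 1.],
              [0., 1.]])                      # shear: both eigenvalues are 1
print(np.linalg.eigvals(A))                   # [1. 1.]
print(np.linalg.svd(A, compute_uv=False))     # ≈ [1.618 0.618], not the eigenvalues

S = np.array([[3., 1.],
              [1., 3.]])                      # symmetric positive definite
print(np.linalg.eigvalsh(S))                  # [2. 4.]
print(np.linalg.svd(S, compute_uv=False))     # [4. 2.]  same values, just sorted descending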

Examples

import numpy as np

# Symmetric 2×2 matrix
A = np.array([[3., 1.],
              [1., 3.]])

# Use eigh for symmetric matrices (guaranteed real, stable)
eigenvalues, eigenvectors = np.linalg.eigh(A)
# eigenvalues: [2., 4.]  — sorted ascending
# eigenvectors: columns are the eigenvectors

print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify: Av = λv for each pair
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    assert np.allclose(A @ v, lam * v), "Should be equal"
    print(f"λ={lam}: A@v={A @ v},  λ*v={lam * v}")

# PCA via eigendecomposition of covariance matrix
data = np.random.randn(200, 5)     # 200 samples, 5 features
cov = np.cov(data.T)               # 5×5 symmetric covariance matrix

vals, vecs = np.linalg.eigh(cov)
# Sort descending (eigh returns ascending)
idx = np.argsort(vals)[::-1]
vals, vecs = vals[idx], vecs[:, idx]

variance_explained = vals / vals.sum()
print(f"\nVariance explained per component: {variance_explained.round(3)}")
# Each should be ~0.2 since data is isotropic

Review Questions