Mnemosyne

Linear Combinations and Span

A linear combination scales and adds vectors together. The span is all the points reachable by those combinations — the entire space those vectors can fill. In machine learning, the span determines what a model can and cannot represent.

Intuition First

Say you have two paint colors: red and blue. By mixing different amounts of each — more red, less blue, or vice versa — you can produce a whole range of purples. That's a linear combination.

Now ask: what's every color you could ever produce using just these two? That set — the entire reachable palette — is the span of red and blue.

In math, your "colors" are vectors, and you mix them by multiplying each by a scalar and summing:

result = c₁·v₁ + c₂·v₂ + ...
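
A quick numerical check of this mixing, assuming NumPy (the vectors and scalars below are arbitrary choices for illustration):

import numpy as np

v1 = np.array([1., 2.])
v2 = np.array([3., 0.])
c1, c2 = 2.0, -0.5               # any real scalars are allowed

result = c1 * v1 + c2 * v2       # a linear combination of v1 and v2
print(result)                    # [0.5 4. ]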

What's Actually Happening

A linear combination of vectors v₁, v₂, ..., vₖ with scalars c₁, ..., cₖ:

c₁v₁ + c₂v₂ + ... + cₖvₖ

The span is the set of all possible linear combinations — every point you can reach by choosing any scalars c₁, ..., cₖ.

Two things shape the span:

  • How many vectors you start with
  • Whether they're independent — i.e., whether each one points somewhere truly new

If you add a vector that's already reachable from the others (a dependent vector), the span doesn't grow. It's redundant.
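
Redundancy can be tested directly in code: solve least squares for the candidate vector against the others, and a (near-)zero residual means it is already in the span. A minimal sketch, with made-up vectors:

import numpy as np

v1 = np.array([1., 0., 2.])
v2 = np.array([0., 1., 1.])
w = np.array([2., 3., 7.])       # candidate: is w reachable from v1, v2?

A = np.column_stack([v1, v2])
c, residual, _, _ = np.linalg.lstsq(A, w, rcond=None)
print(c)          # [2. 3.]: w = 2·v1 + 3·v2
print(residual)   # ≈ [0.], so w is in the span; adding it wouldn't grow the span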


Build the Idea Step-by-Step

Vectors v₁, v₂, ...
Form all linear combinations
Span = full reachable region
Independent vectors → each expands span
Dependent vector → span stays the same
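
This progression can be watched numerically by appending vectors one at a time and tracking the rank, which is the dimension of the span. A sketch with hand-picked vectors in ℝ³:

import numpy as np

vectors = [
    np.array([1., 0., 0.]),    # independent → span becomes a line
    np.array([0., 1., 0.]),    # independent → span becomes a plane
    np.array([3., -2., 0.]),   # dependent on the first two → span unchanged
    np.array([0., 0., 1.]),    # independent → span becomes all of ℝ³
]

for k in range(1, len(vectors) + 1):
    A = np.column_stack(vectors[:k])
    print(f"{k} vectors → span dimension {np.linalg.matrix_rank(A)}")
# prints 1, 2, 2, 3: the dependent vector adds nothing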

Formal Explanation

Linear combination:

w = c₁v₁ + c₂v₂ + ... + cₖvₖ     where c₁,...,cₖ ∈ ℝ

Span:

span(v₁,...,vₖ) = {c₁v₁ + ... + cₖvₖ | c₁,...,cₖ ∈ ℝ}

The span is always a flat subspace through the origin:

  • One vector in 3D → span is a line through origin
  • Two non-parallel vectors in 3D → span is a plane through origin
  • Three non-coplanar vectors in 3D → span is all of ℝ³

Linear independence: vectors v₁, ..., vₖ are linearly independent if the only solution to c₁v₁ + ... + cₖvₖ = 0 is c₁ = ... = cₖ = 0. No vector can be expressed as a combination of the others.
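
That definition translates directly into a numerical test: stack the vectors as columns of A and ask whether c = 0 is the only solution to Ac = 0, which holds exactly when no singular value of A is zero. A sketch:

import numpy as np

A = np.column_stack([
    np.array([1., 2., 0.]),
    np.array([0., 1., 1.]),
    np.array([1., 3., 1.]),   # = column 0 + column 1 → dependent
])

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma)                      # smallest singular value ≈ 0 → dependent columns
print(np.linalg.matrix_rank(A))   # 2, not 3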


Key Properties / Rules

Concept                          Key fact
Span always contains origin      Setting all cᵢ = 0 gives the zero vector
Adding a dependent vector        Doesn't expand the span
Adding an independent vector     Expands the span by one dimension
n independent vectors in ℝⁿ      Span = all of ℝⁿ
Rank of a matrix                 Number of independent columns = span dimension

Why It Matters

What a neural network layer can output is exactly the span of its weight matrix's columns (the column space). A 512×256 weight matrix maps ℝ²⁵⁶ into ℝ⁵¹², but its outputs are confined to a subspace of dimension at most 256 — and potentially much smaller if many columns are nearly parallel (low rank). The model literally cannot learn to produce outputs outside this span.
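
To see this concretely: any output y = Wx is by definition a linear combination of W's columns, with the entries of x as the scalars. A small check, reusing the 512×256 shape from above:

import numpy as np

W = np.random.randn(512, 256)
x = np.random.randn(256)

y = W @ x     # y mixes W's 256 columns, so it must lie in their span
# Appending y as an extra column cannot raise the rank:
print(np.linalg.matrix_rank(W))                        # 256
print(np.linalg.matrix_rank(np.column_stack([W, y])))  # still 256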

LoRA fine-tuning exploits this directly: instead of updating the full weight matrix, it parameterizes the update as a product of two thin matrices whose columns span a small subspace. The assumption: the meaningful update has low rank — only a few independent directions need to change.
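
A minimal sketch of that parameterization (plain linear algebra, not any particular library's API), with r = 8 as an arbitrary choice of rank:

import numpy as np

d_out, d_in, r = 512, 256, 8       # r = assumed rank of the update

W = np.random.randn(d_out, d_in)   # frozen pretrained weights
B = np.random.randn(d_out, r)      # trainable thin matrices: B is (512×8),
A = np.random.randn(r, d_in)       # A is (8×256)

delta_W = B @ A                    # the update's columns span ≤ 8 directions
print(np.linalg.matrank if False else np.linalg.matrix_rank(delta_W))  # 8
W_adapted = W + delta_W            # effective weights after fine-tuning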

Null space is the flip side: it's the set of inputs mapped to zero. Large null space = many different inputs produce the same output = information loss.
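
The null space can be read off the SVD: right singular vectors whose singular value is zero are exactly the inputs sent to zero. A sketch with a deliberately rank-deficient matrix:

import numpy as np

# A rank-1 matrix mapping ℝ³ → ℝ³: its null space has dimension 2
M = np.outer(np.array([1., 2., 3.]), np.array([1., 1., 0.]))

_, sigma, Vt = np.linalg.svd(M)
null_vectors = Vt[sigma < 1e-10]   # rows of Vt with ~zero singular value
for v in null_vectors:
    print(M @ v)                   # ≈ [0. 0. 0.]: these inputs are erased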


Common Pitfalls

  • More vectors ≠ bigger span. Adding redundant (dependent) vectors gives you no new reach. What matters is the number of independent directions, i.e., the rank.
  • Span always passes through origin. You can never reach a point "away from origin" with a pure linear combination — that's why neural networks add bias terms.
  • Near-dependence in high dimensions. In 1000D, vectors can be nearly parallel without being exactly parallel. Use SVD or np.linalg.matrix_rank() to check effective independence — your eyes can't tell. See the sketch below.
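
A sketch of that last pitfall: two 1000-dimensional vectors that differ only by a tiny perturbation. The exact rank is 2, but numerically there is barely one direction:

import numpy as np

rng = np.random.default_rng(0)
v1 = rng.standard_normal(1000)
v2 = v1 + 1e-8 * rng.standard_normal(1000)   # nearly parallel to v1

A = np.column_stack([v1, v2])
print(np.linalg.matrix_rank(A))              # 2 with the default tolerance...
print(np.linalg.svd(A, compute_uv=False))    # ...but the 2nd singular value is tiny
print(np.linalg.matrix_rank(A, tol=1e-6))    # 1: effectively a single direction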

Examples

import numpy as np

v1 = np.array([1., 0.])
v2 = np.array([0., 1.])
v3 = np.array([2., 3.])   # = 2·v1 + 3·v2 → linearly dependent

# Check rank — how many independent directions?
A = np.column_stack([v1, v2, v3])   # (2, 3) matrix
print(f"Rank: {np.linalg.matrix_rank(A)}")   # 2 — v3 adds nothing

# Span of v1, v2 already covers all of ℝ²
# Any target y = [a, b] can be reached:
y = np.array([5., -3.])
c, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(f"Reconstructed y: {A @ c}")   # matches [5., -3.]

# Matrix rank in neural network context
W = np.random.randn(512, 256)       # full rank: 256 independent output directions
W_low = W[:, :10] @ W[:10, :]      # rank 10: only 10 useful directions
print(f"Full rank: {np.linalg.matrix_rank(W)}")       # 256
print(f"Low rank: {np.linalg.matrix_rank(W_low)}")    # ≤ 10

Review Questions