Linear Combinations and Span
A linear combination scales and adds vectors together. The span is the set of all points reachable by those combinations: the entire space those vectors can fill. In machine learning terms, the span of a layer's weight columns determines what the model can and cannot represent.
Intuition First
Say you have two paint colors: red and blue. By mixing different amounts of each — more red, less blue, or vice versa — you can produce a whole range of purples. That's a linear combination.
Now ask: what's every color you could ever produce using just these two? That set — the entire reachable palette — is the span of red and blue.
In math, your "colors" are vectors, and you mix them by multiplying each by a scalar and summing:
result = c₁·v₁ + c₂·v₂ + ...
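To make the mixing concrete, here is a minimal NumPy sketch; the two direction vectors stand in for "red" and "blue" and their values are made up for illustration:

import numpy as np

red = np.array([1., 0.])    # stand-in direction for "red"
blue = np.array([0., 1.])   # stand-in direction for "blue"

mix = 2 * red + 3 * blue    # one linear combination: c₁ = 2, c₂ = 3
print(mix)                  # [2. 3.]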
What's Actually Happening
A linear combination of vectors v₁, v₂, ..., vₖ with scalars c₁, ..., cₖ:
c₁v₁ + c₂v₂ + ... + cₖvₖ
The span is the set of all possible linear combinations — every point you can reach by choosing any scalars c₁, ..., cₖ.
Two things shape the span:
- How many vectors you start with
- Whether they're independent — i.e., whether each one points somewhere truly new
If you add a vector that's already reachable from the others (a dependent vector), the span doesn't grow. It's redundant.
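A numerical way to ask "is this point in the span?" is to solve a least-squares problem and check whether the fit is exact. A minimal sketch with made-up vectors:

import numpy as np

v1 = np.array([1., 2., 0.])
v2 = np.array([0., 1., 1.])
A = np.column_stack([v1, v2])          # columns span a plane in ℝ³

inside = 3 * v1 - 2 * v2               # reachable by construction
c, *_ = np.linalg.lstsq(A, inside, rcond=None)
print(np.allclose(A @ c, inside))      # True: it's in the span

outside = np.array([1., 0., 5.])       # a generic point off that plane
c, *_ = np.linalg.lstsq(A, outside, rcond=None)
print(np.allclose(A @ c, outside))     # False: no scalars reach it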
Build the Idea Step-by-Step
Formal Explanation
Linear combination:
w = c₁v₁ + c₂v₂ + ... + cₖvₖ where c₁,...,cₖ ∈ ℝ
Span:
span(v₁,...,vₖ) = {c₁v₁ + ... + cₖvₖ | c₁,...,cₖ ∈ ℝ}
The span is always a flat subspace through the origin:
- One vector in 3D → span is a line through origin
- Two non-parallel vectors in 3D → span is a plane through origin
- Three non-coplanar vectors in 3D → span is all of ℝ³
Linear independence: vectors v₁, ..., vₖ are linearly independent if the only solution to c₁v₁ + ... + cₖvₖ = 0 is c₁ = ... = cₖ = 0. No vector can be expressed as a combination of the others.
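The dimension counts above can be checked directly with the rank; a quick sketch using arbitrary vectors:

import numpy as np

u = np.array([1., 0., 0.])
v = np.array([0., 1., 0.])
w = np.array([1., 1., 0.])    # = u + v, so coplanar with u and v

print(np.linalg.matrix_rank(np.column_stack([u])))         # 1: a line
print(np.linalg.matrix_rank(np.column_stack([u, v])))      # 2: a plane
print(np.linalg.matrix_rank(np.column_stack([u, v, w])))   # still 2: w is redundant

w2 = np.array([0., 0., 1.])   # points out of that plane
print(np.linalg.matrix_rank(np.column_stack([u, v, w2])))  # 3: all of ℝ³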
Key Properties / Rules
| Concept | Key fact |
|---|---|
| Span always contains origin | Setting all cᵢ = 0 gives the zero vector |
| Adding a dependent vector | Doesn't expand the span |
| Adding an independent vector | Expands span by one dimension |
| n independent vectors in ℝⁿ | Span = all of ℝⁿ |
| Rank of a matrix | Number of independent columns = span dimension |
Why It Matters
What a neural network layer can output is exactly the span of its weight matrix's columns (the column space). A 512×256 weight matrix maps 256-dimensional inputs into ℝ⁵¹², but its outputs live in a subspace of dimension at most 256, and the effective dimension can be far smaller if many columns are nearly parallel (low rank). The model literally cannot learn to produce outputs outside this span.
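A quick numerical check of that claim; the sizes match the example above, and the random inputs are purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))   # layer weights

X = rng.standard_normal((256, 1000))  # 1000 random inputs
Y = W @ X                             # every output is a combination of W's columns
print(np.linalg.matrix_rank(Y))       # 256: outputs never escape the column space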
LoRA fine-tuning exploits this directly: instead of updating the full weight matrix, it parameterizes the update as a product of two thin matrices whose columns span a small subspace. The assumption: the meaningful update has low rank — only a few independent directions need to change.
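A minimal sketch of that low-rank parameterization; the rank r = 8 and the factor names B and A are illustrative choices here, not tied to any specific model:

import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 512, 256, 8

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weights
B = rng.standard_normal((d_out, r))     # thin trainable factor
A = rng.standard_normal((r, d_in))      # thin trainable factor

delta_W = B @ A                         # the update: rank ≤ r by construction
print(np.linalg.matrix_rank(delta_W))   # 8

W_adapted = W + delta_W                 # trains r·(d_out+d_in) = 6,144 params vs 131,072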
Null space is the flip side: it's the set of inputs mapped to zero. Large null space = many different inputs produce the same output = information loss.
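A basis for the null space can be read off the SVD: the right singular vectors whose singular values are (numerically) zero. A sketch with a deliberately rank-deficient matrix:

import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))  # 4×6, rank 2

U, s, Vt = np.linalg.svd(W)
rank = np.sum(s > 1e-10)
null_basis = Vt[rank:]                   # rows of Vt past the rank span the null space
print(null_basis.shape)                  # (4, 6): four independent lost directions
print(np.allclose(W @ null_basis.T, 0))  # True: these inputs map to zero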
Common Pitfalls
- More vectors ≠ bigger span. Adding redundant (dependent) vectors gives you no new reach. What matters is the number of independent directions, i.e., the rank.
- Span always passes through origin. You can never reach a point "away from origin" with a pure linear combination — that's why neural networks add bias terms.
- Near-dependence in high dimensions. In 1000D, vectors can be nearly parallel without being exactly parallel; your eyes can't tell. Use SVD or np.linalg.matrix_rank() to check effective independence (see the sketch after this list).
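A sketch of that last pitfall; the perturbation size 1e-6 is made up:

import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(1000)
almost_v = v + 1e-6 * rng.standard_normal(1000)  # nearly, but not exactly, parallel

A = np.column_stack([v, almost_v])
print(np.linalg.matrix_rank(A))    # 2: technically independent
s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[1])                 # huge ratio (~1e6): effectively one direction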
Examples
import numpy as np
v1 = np.array([1., 0.])
v2 = np.array([0., 1.])
v3 = np.array([2., 3.]) # = 2·v1 + 3·v2 → linearly dependent
# Check rank — how many independent directions?
A = np.column_stack([v1, v2, v3]) # (2, 3) matrix
print(f"Rank: {np.linalg.matrix_rank(A)}") # 2 — v3 adds nothing
# Span of v1, v2 already covers all of ℝ²
# Any target y = [a, b] can be reached:
y = np.array([5., -3.])
c, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(f"Reconstructed y: {A @ c}") # matches [5., -3.]
# Matrix rank in neural network context
W = np.random.randn(512, 256) # full rank: 256 independent output directions
W_low = W[:, :10] @ W[:10, :] # product of two thin matrices → rank ≤ 10 useful directions
print(f"Full rank: {np.linalg.matrix_rank(W)}") # 256
print(f"Low rank: {np.linalg.matrix_rank(W_low)}") # ≤ 10