Linear Combinations and Span
A linear combination scales and adds vectors together. The span is the set of all points reachable by those combinations: the entire space those vectors can fill. In machine learning terms, the span of a layer's weight columns determines what the model can and cannot represent.
Intuition First
Say you have two paint colors: red and blue. By mixing different amounts of each — more red, less blue, or vice versa — you can produce a whole range of purples. That's a linear combination.
Now ask: what's every color you could ever produce using just these two? That set — the entire reachable palette — is the span of red and blue.
In math, your "colors" are vectors, and you mix them by multiplying each by a scalar and summing:
result = c₁·v₁ + c₂·v₂ + ...
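To make the mixing concrete, here is a minimal NumPy sketch; the two direction vectors stand in for "red" and "blue" and their values are made up for illustration:

import numpy as np

red = np.array([1., 0.])    # stand-in direction for "red"
blue = np.array([0., 1.])   # stand-in direction for "blue"

mix = 2 * red + 3 * blue    # one linear combination: c₁ = 2, c₂ = 3
print(mix)                  # [2. 3.]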
What's Actually Happening
A linear combination of vectors v₁, v₂, ..., vₖ with scalars c₁, ..., cₖ:
c₁v₁ + c₂v₂ + ... + cₖvₖ
The span is the set of all possible linear combinations — every point you can reach by choosing any scalars c₁, ..., cₖ.
Two things shape the span:
- How many vectors you start with
- Whether they're independent — i.e., whether each one points somewhere truly new
If you add a vector that's already reachable from the others (a dependent vector), the span doesn't grow. It's redundant.
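A numerical way to ask "is this point in the span?" is to solve a least-squares problem and check whether the fit is exact. A minimal sketch with made-up vectors:

import numpy as np

v1 = np.array([1., 2., 0.])
v2 = np.array([0., 1., 1.])
A = np.column_stack([v1, v2])          # columns span a plane in ℝ³

inside = 3 * v1 - 2 * v2               # reachable by construction
c, *_ = np.linalg.lstsq(A, inside, rcond=None)
print(np.allclose(A @ c, inside))      # True: it's in the span

outside = np.array([1., 0., 5.])       # a generic point off that plane
c, *_ = np.linalg.lstsq(A, outside, rcond=None)
print(np.allclose(A @ c, outside))     # False: no scalars reach it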
Build the Idea Step-by-Step
Formal Explanation
Linear combination:
w = c₁v₁ + c₂v₂ + ... + cₖvₖ where c₁,...,cₖ ∈ ℝ
Span:
span(v₁,...,vₖ) = {c₁v₁ + ... + cₖvₖ | c₁,...,cₖ ∈ ℝ}
The span is always a flat subspace through the origin:
- One vector in 3D → span is a line through origin
- Two non-parallel vectors in 3D → span is a plane through origin
- Three non-coplanar vectors in 3D → span is all of ℝ³
Linear independence: vectors v₁, ..., vₖ are linearly independent if the only solution to c₁v₁ + ... + cₖvₖ = 0 is c₁ = ... = cₖ = 0. No vector can be expressed as a combination of the others.
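The dimension counts above can be checked directly with the rank; a quick sketch using arbitrary vectors:

import numpy as np

u = np.array([1., 0., 0.])
v = np.array([0., 1., 0.])
w = np.array([1., 1., 0.])    # = u + v, so coplanar with u and v

print(np.linalg.matrix_rank(np.column_stack([u])))         # 1: a line
print(np.linalg.matrix_rank(np.column_stack([u, v])))      # 2: a plane
print(np.linalg.matrix_rank(np.column_stack([u, v, w])))   # still 2: w is redundant

w2 = np.array([0., 0., 1.])   # points out of that plane
print(np.linalg.matrix_rank(np.column_stack([u, v, w2])))  # 3: all of ℝ³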
Key Properties / Rules
| Concept | Key fact |
|---|---|
| Span always contains origin | Setting all cᵢ = 0 gives the zero vector |
| Adding a dependent vector | Doesn't expand the span |
| Adding an independent vector | Expands span by one dimension |
| n independent vectors in ℝⁿ | Span = all of ℝⁿ |
| Rank of a matrix | Number of independent columns = span dimension |
Why It Matters
What a neural network layer can output is exactly the span of its weight matrix's columns (the column space). A 512×256 weight matrix maps 256-dimensional inputs into ℝ⁵¹², but its outputs live in a subspace of dimension at most 256, and the effective dimension can be far smaller if many columns are nearly parallel (low rank). The model literally cannot learn to produce outputs outside this span.
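A quick numerical check of that claim; the sizes match the example above, and the random inputs are purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))   # layer weights

X = rng.standard_normal((256, 1000))  # 1000 random inputs
Y = W @ X                             # every output is a combination of W's columns
print(np.linalg.matrix_rank(Y))       # 256: outputs never escape the column space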
LoRA fine-tuning exploits this directly: instead of updating the full weight matrix, it parameterizes the update as a product of two thin matrices whose columns span a small subspace. The assumption: the meaningful update has low rank — only a few independent directions need to change.
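A minimal sketch of that low-rank parameterization; the rank r = 8 and the factor names B and A are illustrative choices here, not tied to any specific model:

import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 512, 256, 8

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weights
B = rng.standard_normal((d_out, r))     # thin trainable factor
A = rng.standard_normal((r, d_in))      # thin trainable factor

delta_W = B @ A                         # the update: rank ≤ r by construction
print(np.linalg.matrix_rank(delta_W))   # 8

W_adapted = W + delta_W                 # trains r·(d_out+d_in) = 6,144 params vs 131,072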
Null space is the flip side: it's the set of inputs mapped to zero. Large null space = many different inputs produce the same output = information loss.
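A basis for the null space can be read off the SVD: the right singular vectors whose singular values are (numerically) zero. A sketch with a deliberately rank-deficient matrix:

import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))  # 4×6, rank 2

U, s, Vt = np.linalg.svd(W)
rank = np.sum(s > 1e-10)
null_basis = Vt[rank:]                   # rows of Vt past the rank span the null space
print(null_basis.shape)                  # (4, 6): four independent lost directions
print(np.allclose(W @ null_basis.T, 0))  # True: these inputs map to zero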
Common Pitfalls
- More vectors ≠ bigger span. Adding redundant (dependent) vectors gives you no new reach. What matters is the number of independent directions, i.e., the rank.
- Span always passes through origin. You can never reach a point "away from origin" with a pure linear combination — that's why neural networks add bias terms.
- Near-dependence in high dimensions. In 1000D, vectors can be nearly parallel without being exactly parallel; your eyes can't tell. Use SVD or np.linalg.matrix_rank() to check effective independence (see the sketch after this list).
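A sketch of that last pitfall; the perturbation size 1e-6 is made up:

import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(1000)
almost_v = v + 1e-6 * rng.standard_normal(1000)  # nearly, but not exactly, parallel

A = np.column_stack([v, almost_v])
print(np.linalg.matrix_rank(A))    # 2: technically independent
s = np.linalg.svd(A, compute_uv=False)
print(s[0] / s[1])                 # huge ratio (~1e6): effectively one direction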
Examples
import numpy as np
v1 = np.array([1., 0.])
v2 = np.array([0., 1.])
v3 = np.array([2., 3.]) # = 2·v1 + 3·v2 → linearly dependent
# Check rank — how many independent directions?
A = np.column_stack([v1, v2, v3]) # (2, 3) matrix
print(f"Rank: {np.linalg.matrix_rank(A)}") # 2 — v3 adds nothing
# Span of v1, v2 already covers all of ℝ²
# Any target y = [a, b] can be reached:
y = np.array([5., -3.])
c, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(f"Reconstructed y: {A @ c}") # matches [5., -3.]
# Matrix rank in neural network context
W = np.random.randn(512, 256) # full rank: 256 independent output directions
W_low = W[:, :10] @ W[:10, :] # product of two thin matrices → rank ≤ 10 useful directions
print(f"Full rank: {np.linalg.matrix_rank(W)}") # 256
print(f"Low rank: {np.linalg.matrix_rank(W_low)}") # ≤ 10