Basis and Dimensionality
A basis is a minimal set of independent vectors that spans a space — a coordinate system. Dimensionality is how many basis vectors are needed. These concepts determine how much information a representation can hold.
Intuition First
When you say a point is at (3, -1, 2) in 3D space, what does that actually mean? It means: 3 steps along the x-axis, -1 along y, 2 along z. The three axes are your basis — a set of reference directions. The numbers 3, -1, 2 are your coordinates in that basis.
Any space needs a basis to have coordinates. The number of vectors in a basis is the dimension of the space — the number of independent directions that exist.
Change the basis (use different reference directions), and the coordinates change. But the dimension stays the same.
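A quick numpy sketch of that invariance — the point and the rotated basis here are illustrative choices, not from the text:

import numpy as np

v = np.array([2., 1.])                       # a point in the plane

standard = np.eye(2)                         # columns are e1, e2
theta = 0.5                                  # rotate the axes by 0.5 rad
rotated = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])

print(np.linalg.solve(standard, v))          # [2. 1.] — standard coordinates
print(np.linalg.solve(rotated, v))           # different numbers, same point
# Both descriptions need exactly 2 numbers: dimension belongs to the space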
What's Actually Happening
A basis for a space is a set of vectors that is:
- Linearly independent — no redundancy; none can be built from the others
- Spanning — they can reach every point in the space
Together, these two properties mean: every vector has exactly one way to be expressed as a combination of basis vectors. That unique combination is its coordinates.
Dimensionality is the number of vectors in any basis. All bases for the same space have the same number of vectors — that count is intrinsic to the space itself.
Changing basis is just re-expressing the same point using different reference directions. Same physical point, different coordinate representation — like describing the same location in GPS coordinates vs. street addresses.
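A minimal sketch of why independence plus spanning buys unique coordinates — the vectors below are our own toy examples:

import numpy as np

# Independent + spanning: the basis matrix is square and invertible,
# so Bc = v has exactly one solution — coordinates are unique.
B = np.array([[1., 1.],
              [0., 1.]])                     # columns are independent
v = np.array([3., 2.])
print(np.linalg.solve(B, v))                 # the one and only coordinate vector

# Drop independence and uniqueness breaks: parallel columns make the
# matrix singular, and solve() refuses.
D = np.array([[1., 2.],
              [1., 2.]])
try:
    np.linalg.solve(D, v)
except np.linalg.LinAlgError:
    print("dependent columns: no unique coordinates")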
Build the Idea Step-by-Step
Formal Explanation
A set B = {b₁, b₂, ..., bₖ} is a basis for vector space V if:
- Linearly independent: c₁b₁ + ... + cₖbₖ = 0 only when all cᵢ = 0
- Spanning: every v ∈ V can be written as v = c₁b₁ + ... + cₖbₖ
Dimension = number of vectors in any basis for V.
Standard basis of ℝⁿ: the unit vectors e₁, e₂, ..., eₙ along each axis. When you write [3, -1, 2], those are the coordinates in the standard basis.
Changing basis: if your new basis vectors are columns of matrix B, find coordinates c via:
Bc = v → c = B⁻¹v
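In code, a sketch of that change of basis; note that np.linalg.solve is generally preferred over forming B⁻¹ explicitly for numerical stability:

import numpy as np

B = np.column_stack([[1., 1., 0.],
                     [1., -1., 0.],
                     [0., 0., 1.]])          # new basis vectors as columns
v = np.array([3., -1., 2.])

c_inv = np.linalg.inv(B) @ v                 # the formula, literally
c_solve = np.linalg.solve(B, v)              # same result, better numerics
print(np.allclose(c_inv, c_solve))           # True
print(np.allclose(B @ c_solve, v))           # True — Bc reconstructs v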
Rank-nullity theorem: for a matrix A with n columns:
rank(A) + nullity(A) = n
Rank = dimension of column space. Nullity = dimension of null space (inputs mapped to zero).
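A small numpy check of rank-nullity on a deliberately redundant matrix (our own example):

import numpy as np

# Columns 3 and 4 are combinations of columns 1 and 2
A = np.array([[1., 0., 2., 1.],
              [0., 1., 1., 1.],
              [1., 1., 3., 2.]])
n = A.shape[1]
rank = np.linalg.matrix_rank(A)
print(rank, n - rank)                        # 2 + 2 = 4 columns: rank + nullity = n

x = np.array([2., 1., -1., 0.])              # 2*c1 + c2 - c3 = 0
print(A @ x)                                 # [0. 0. 0.] — x lies in the null space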
Key Properties / Rules
| Concept | Key fact |
|---|---|
| Basis size | Always equals the dimension of the space |
| Coordinates | Unique for each vector in a given basis |
| Rank of matrix | Dimension of its column space |
| Null space dim | n − rank (rank-nullity theorem) |
| Orthonormal basis | Orthogonal + unit length; coordinates = dot products |
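A sketch of the orthonormal-basis shortcut from the table, using a QR factorization to manufacture an orthonormal basis (one convenient construction among many):

import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # columns are orthonormal
v = rng.standard_normal(3)

c_dot = Q.T @ v                              # cᵢ = bᵢ · v — just dot products
c_solve = np.linalg.solve(Q, v)              # general method, same answer
print(np.allclose(c_dot, c_solve))           # True — no linear solve required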
Why It Matters
Embedding dimensions in NLP (e.g., 768-dim BERT embeddings) are the dimension of the learned representation space. More dimensions = more capacity to encode distinct concepts. But each dimension needs parameters to fill it — too many with too little data causes overfitting.
PCA finds a new basis aligned with the directions of maximum variance in your data. It re-expresses your data in a new coordinate system where the first axis explains the most variance, the second explains the next most, and so on. Dropping the low-variance axes = dimensionality reduction.
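A minimal PCA-as-basis-change sketch on synthetic correlated data, using the covariance eigendecomposition (one of several equivalent routes; the SVD in the Examples section below is another):

import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((500, 2)) @ np.array([[3., 1.],
                                                 [0., 0.3]])  # correlated 2D data
data -= data.mean(axis=0)

cov = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh sorts eigenvalues ascending
pca_basis = eigvecs[:, ::-1]                 # variance-aligned basis, largest first
scores = data @ pca_basis                    # same data, new coordinates
print(scores.var(axis=0))                    # per-axis variance ≈ sorted eigenvalues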
Rank deficiency means a matrix's column space has lower dimension than expected — some output directions are unreachable. This is why zero initialization is bad: a zero weight matrix has rank 0, and because every row then computes the same output, every row receives the same gradient update — the rows stay identical and the symmetry never breaks.
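A toy illustration with hand-picked matrices: a rank-deficient map leaves some targets unreachable, and a zero matrix spans nothing at all:

import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])                 # third column = first + second
print(np.linalg.matrix_rank(A))              # 2 — column space is a plane in R^3

target = np.array([0., 0., 1.])              # a point off that plane
x, *_ = np.linalg.lstsq(A, target, rcond=None)
print(A @ x)                                 # best effort still misses the target

print(np.linalg.matrix_rank(np.zeros((4, 4))))  # 0 — a zero matrix spans nothing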
Common Pitfalls
- More dimensions isn't always better. Each extra dimension needs training signal to fill it meaningfully. Low-data settings benefit from lower-dimensional embeddings.
- Basis vectors don't need to be perpendicular. Any linearly independent set is a valid basis. Orthonormal bases (perpendicular + unit length) are convenient because coordinate extraction simplifies to a dot product: cᵢ = bᵢ · v.
- Dimensionality ≠ number of features. If 1000 features are correlated, the effective rank of the data matrix might be 10. The data lives on a low-dimensional manifold even though you measured 1000 things.
Examples
import numpy as np
# Standard basis in R^3
e1 = np.array([1., 0., 0.])
e2 = np.array([0., 1., 0.])
e3 = np.array([0., 0., 1.])
v = np.array([3., -1., 2.])
# v = 3*e1 + (-1)*e2 + 2*e3 — coordinates are just the entries
# Non-standard basis (also spans R^3)
b1 = np.array([1., 1., 0.])
b2 = np.array([1., -1., 0.])
b3 = np.array([0., 0., 1.])
B = np.column_stack([b1, b2, b3])
print(f"Rank: {np.linalg.matrix_rank(B)}") # 3 — full rank, valid basis
c = np.linalg.solve(B, v) # coordinates of v in new basis
print(f"Coordinates in {[b1, b2, b3]}: {c}")
print(f"Reconstructed: {B @ c}") # should match v exactly
# PCA: find the effective dimension of a dataset
data = np.random.randn(100, 10)
data[:, 5:] = data[:, :5] @ np.random.randn(5, 5) # last 5 depend on first 5
_, s, _ = np.linalg.svd(data, full_matrices=False)
effective_dim = np.sum(s > 0.1 * s[0]) # singular values above 10% of max
print(f"Singular values: {s.round(2)}")
print(f"Effective dimension: {effective_dim}") # ≈ 5, not 10