Mnemosyne

Vectors

A vector is an ordered list of numbers that can be read as either a position in space or a direction. Dot products, norms, and projections are the three operations that power similarity search, attention, and regression in AI.

Intuition First

Imagine you're describing a friend's apartment location to someone: "go 3 blocks east and 1 block north." That description — [3, 1] — is a vector. Two numbers, pinning down a specific point (or direction) in 2D space.

In machine learning, vectors are how everything gets represented: a word, an image, a user's preferences, a layer's activations. All of them become ordered lists of numbers. Once you have vectors, three operations unlock almost everything:

  • Dot product — how much do two vectors point the same way? (measuring similarity)
  • Norm — how long is a vector? (measuring magnitude)
  • Projection — what part of vector A lies along vector B? (extracting components)

What's Actually Happening

Think of two vectors as arrows starting from the origin.

The dot product captures how much they "agree" in direction. If they point the same way, the result is big and positive. If they're perpendicular (90°), the result is zero. If they point opposite ways, it's negative.
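
A quick check of all three cases, a minimal NumPy sketch with made-up 2D vectors:

import numpy as np

east      = np.array([1.0, 0.0])
northeast = np.array([1.0, 1.0])
north     = np.array([0.0, 1.0])
west      = np.array([-1.0, 0.0])

print(np.dot(east, northeast))   #  1.0: roughly the same way, positive
print(np.dot(east, north))       #  0.0: perpendicular, zero
print(np.dot(east, west))        # -1.0: opposite, negative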

The norm is just the length of the arrow — calculated by Pythagoras extended to N dimensions.
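
The same Pythagorean sum works in any number of dimensions. A small sketch (arbitrary 5D vector) comparing the manual formula to NumPy's built-in:

import numpy as np

v = np.array([1.0, 2.0, 2.0, 4.0, 2.0])   # a 5-dimensional vector
manual  = np.sqrt(np.sum(v ** 2))          # √(1 + 4 + 4 + 16 + 4) = √29
builtin = np.linalg.norm(v)                # same Pythagorean sum
print(manual, builtin)                     # both ≈ 5.385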

The projection of u onto v is the shadow u casts when light shines perpendicular to v. It extracts the part of u that lies along v's direction.


Build the Idea Step-by-Step

1. Start with two vectors, u and v.
2. Dot product: u · v measures how aligned they are.
3. Norm: ‖v‖ measures how long each one is.
4. Cosine similarity: divide out the norms to compare direction only.
5. Projection: the shadow that u casts onto v.

Formal Explanation

Given vectors u and v with n components:

Dot product:

u · v = u₁v₁ + u₂v₂ + ... + uₙvₙ
      = ‖u‖ ‖v‖ cos(θ)
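
The two forms agree: summing componentwise products gives the same number as multiplying the lengths by the cosine of the angle between the vectors. A quick sanity check with unit vectors at known angles (30° and 90°, so θ = 60°):

import numpy as np

u = np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])   # unit vector at 30°
v = np.array([0.0, 1.0])                               # unit vector at 90°

componentwise = np.dot(u, v)                # u₁v₁ + u₂v₂ = sin(30°) = 0.5
geometric = 1.0 * 1.0 * np.cos(np.pi / 3)   # ‖u‖ ‖v‖ cos(60°) = 0.5
print(componentwise, geometric)             # both 0.5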

L2 Norm:

‖v‖ = √(v₁² + v₂² + ... + vₙ²)

Unit vector (length 1, direction preserved):

v̂ = v / ‖v‖

Cosine similarity (removes magnitude — pure direction comparison):

cos(θ) = (u · v) / (‖u‖ · ‖v‖)    ∈ [-1, 1]
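
Because both norms are divided out, rescaling a vector never changes the score. A minimal sketch (the helper cosine is ours, not a NumPy function):

import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 2.0, 3.0])
print(cosine(u, u))        #  1.0: identical direction
print(cosine(u, 10 * u))   #  1.0 still: magnitude divided out
print(cosine(u, -u))       # -1.0: opposite direction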

Projection of u onto v:

proj_v(u) = (u · v / ‖v‖²) · v
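
The formula splits u into a piece along v plus a leftover orthogonal to v. A sketch verifying that decomposition, using the same arbitrary vectors as the Examples section below:

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 2.0])

proj = (np.dot(u, v) / np.dot(v, v)) * v   # component of u along v
residual = u - proj                        # the part of u with no v in it
print(np.dot(residual, v))                 # ≈ 0: residual is orthogonal to v
print(proj + residual)                     # recovers u exactly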

Key Properties / Rules

Operation           Formula               Meaning
Dot product         Σ uᵢvᵢ                Alignment / raw similarity
L2 norm             √(Σ vᵢ²)              Length of vector
Unit vector         v / ‖v‖               Direction without magnitude
Orthogonal          u · v = 0             Perpendicular, no alignment
Projection          (u·v / ‖v‖²) · v      Component of u along v
Cosine similarity   (u·v) / (‖u‖‖v‖)      Direction only, −1 to 1

Why It Matters

Attention in transformers is a dot product. The query vector Q and key vector K for each token are compared with Q · Kᵀ. A high dot product = high relevance = high attention weight. You're literally asking "how much does this query align with this key?"
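
A minimal sketch of that core computation, with toy shapes and random numbers standing in for learned queries and keys (real attention also multiplies the resulting weights into value vectors V; the √d scaling is included here):

import numpy as np

d = 4                                     # toy embedding dimension
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, d))               # 3 query vectors
K = rng.normal(size=(5, d))               # 5 key vectors

scores = Q @ K.T / np.sqrt(d)             # every query dotted with every key
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
print(weights.shape)                      # (3, 5): one attention row per query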

Semantic search in vector databases computes cosine similarity between the query embedding and every stored embedding. "Find documents similar to this question" = find vectors with small angle to the query vector.
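
A sketch of that lookup against a toy "database", with random unit vectors standing in for real embeddings:

import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 64))                  # 1000 stored embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)   # pre-normalize to unit length

query = rng.normal(size=64)
query /= np.linalg.norm(query)

sims = db @ query                    # dot product of unit vectors = cosine similarity
top3 = np.argsort(sims)[-3:][::-1]   # indices of the 3 best matches
print(top3, sims[top3])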

Regression is a projection problem under the hood — finding the closest point in a feature subspace to the target values.
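
Concretely, the least-squares fit ŷ = Xβ is the projection of the target vector y onto the column space of X, the many-column version of proj_v(u). A sketch with made-up data:

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))    # 10 samples, 2 features
y = rng.normal(size=10)         # targets

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
y_hat = X @ beta                               # fitted values: projection of y
residual = y - y_hat
print(X.T @ residual)                          # ≈ [0, 0]: residual ⟂ every column of X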


Common Pitfalls

  • Raw dot product inflates with magnitude. A large vector scores high against everything. Use cosine similarity when you care about direction, not size. Always normalize embeddings before similarity search.
  • L2 norm ≠ L1 norm. L1 is Σ|vᵢ| (sum of absolutes, used in sparse regularization). L2 is √(Σvᵢ²) (Euclidean length). They punish differently — L1 pushes values to zero, L2 shrinks them.
  • High-dimensional intuition breaks. In 1000 dimensions, almost all random vectors are nearly orthogonal — cosine similarity ≈ 0 for most pairs. The 2D intuition of "similar vectors point the same way" doesn't hold at scale. (Both this and the first pitfall are demonstrated in the sketch after this list.)
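
A sketch of the first and third pitfalls, using made-up vectors:

import numpy as np

# Pitfall 1: a big vector wins on raw dot product despite worse alignment
q = np.array([1.0, 0.0])
small_aligned  = np.array([1.0, 0.1])
big_misaligned = np.array([10.0, 10.0])
print(np.dot(q, small_aligned), np.dot(q, big_misaligned))   # 1.0 vs 10.0

# Pitfall 3: random high-dimensional vectors are nearly orthogonal
rng = np.random.default_rng(3)
u = rng.normal(size=1000)
v = rng.normal(size=1000)
print(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))   # ≈ 0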

Examples

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 2.0])

dot    = np.dot(a, b)                        # 1*4 + 2*(-1) + 3*2 = 8
norm_a = np.linalg.norm(a)                   # √(1+4+9) = √14 ≈ 3.74
norm_b = np.linalg.norm(b)                   # √(16+1+4) = √21 ≈ 4.58
cosine = dot / (norm_a * norm_b)             # ≈ 0.467

# Projection of a onto b
proj = (np.dot(a, b) / np.dot(b, b)) * b    # (8/21) * b

print(f"dot={dot}, cosine similarity={cosine:.3f}")
print(f"projection of a onto b: {proj}")

# Unit vector
a_unit = a / norm_a
print(f"unit vector length: {np.linalg.norm(a_unit):.6f}")  # exactly 1.0

Review Questions