Mnemosyne

Matrix-Vector Multiplication

Multiplying a matrix by a vector produces a new vector. The matrix is a transformation: it rotates, scales, or projects the input into a new space. Every fully connected neural network layer is built on this operation.

Intuition First

Imagine you have a machine that takes a 3D point and outputs a 2D point. You feed it [x, y, z], it applies some rule, and out comes [a, b]. That machine is a matrix.

More concretely: every linear neural network layer is exactly this. You have a weight matrix W. You feed it an input vector x. Out comes an output vector y. The layer is transforming the input — reshaping, rotating, and scaling it into a new representation.

This is matrix-vector multiplication: y = W @ x.
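
As a rough sketch (the helper name linear_layer is just illustrative, not a standard API), such a machine fits in a few lines of NumPy:

import numpy as np

def linear_layer(W, x, b):
    # The whole layer is one matrix-vector product plus a shift
    return W @ x + b

W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 1.0]])   # (2×3) weights: 3D input → 2D output
b = np.zeros(2)
x = np.array([1.0, 2.0, 3.0])

print(linear_layer(W, x, b))       # [4.5 4. ]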


What's Actually Happening

You can think about y = Ax in two complementary ways — they give the same result, but each reveals different structure.

Row view: each output entry y[i] is a dot product between row i of A and the full input x. It asks: "how much does x align with this row direction?"

Column view: Ax is a weighted sum of A's columns, where the weights are the entries of x:

Ax = x[0]·(col 0) + x[1]·(col 1) + ... + x[n-1]·(col n-1)

The column view is key: the output always lives in the column space of A — the set of all linear combinations of A's columns. A can never produce a result outside this space, no matter what you put in.
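
One way to see this concretely (a small sketch using a deliberately rank-deficient matrix chosen for illustration): if every column of A lies on the same line, every output does too, no matter what x is.

import numpy as np

# Every column of A is a multiple of [1, 2],
# so the column space is just the line through [1, 2]
A = np.array([[1., 2., 3.],
              [2., 4., 6.]])

for x in (np.array([1., 0., 0.]),
          np.array([-3., 5., 2.]),
          np.random.randn(3)):
    y = A @ x
    print(y)    # the second entry is always twice the first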


Build the Idea Step-by-Step

Input vector x (n-dim)
Matrix A (m×n) = transformation
y = Ax (m-dim output)
y lives in column space of A
Add bias b to shift output off origin

Formal Explanation

For A (m×n) and x (n-dim):

y = Ax

y[i] = Σⱼ A[i,j] · x[j]   (row view: dot product with row i)

Ax = Σⱼ x[j] · A[:, j]    (column view: linear combination of columns)
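
Spelling both formulas out in code (a minimal check with arbitrary example numbers) shows that the row view, the column view, and A @ x all agree:

import numpy as np

A = np.array([[2., 1.],
              [0., 3.],
              [1., -1.]])          # (3×2): maps 2D inputs to 3D outputs
x = np.array([1., 2.])

# Row view: y[i] = Σⱼ A[i,j] · x[j]
y_rows = np.array([sum(A[i, j] * x[j] for j in range(A.shape[1]))
                   for i in range(A.shape[0])])

# Column view: Ax = Σⱼ x[j] · A[:, j]
y_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(y_rows, y_cols, A @ x)       # all three agree: [ 4.  6. -1.]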

Common geometric effects of different matrices:

Matrix type        | Effect on vectors
Identity I         | No change
Diagonal (s₁, s₂)  | Scale axis 1 by s₁, axis 2 by s₂
Rotation by θ      | Rotate all vectors by θ
Projection matrix  | Squash onto a line or plane
Singular matrix    | Collapse some directions to zero
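
A small sketch of three of these effects on the same 2D vector (the angle, scale factors, and projection axis are arbitrary choices for illustration):

import numpy as np

v = np.array([1., 1.])

theta = np.pi / 2                                  # rotate by 90°
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

S = np.diag([2., 0.5])                             # scale axis 1 by 2, axis 2 by 0.5

P = np.array([[1., 0.],
              [0., 0.]])                           # project onto the first axis

print(R @ v)    # ≈ [-1.  1.]
print(S @ v)    # [2.  0.5]
print(P @ v)    # [1. 0.]  (the second component is squashed to zero)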

Key Properties / Rules

Property             | Description
Shape                | (m×n) @ (n,) → (m,)
Linearity (addition) | A(u + v) = Au + Av
Linearity (scaling)  | A(cu) = c·Au
Origin always fixed  | A·0 = 0; linear maps can't translate
Composition          | B(Ax) = (BA)x; order reverses in the matrix product
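
Each of these rules can be spot-checked numerically; the sketch below uses random matrices and np.allclose to confirm them up to floating-point error.

import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(2, 4)
u, v = np.random.randn(3), np.random.randn(3)
c = 2.5

print(np.allclose(A @ (u + v), A @ u + A @ v))    # linearity under addition
print(np.allclose(A @ (c * u), c * (A @ u)))      # linearity under scaling
print(np.allclose(A @ np.zeros(3), np.zeros(4)))  # the origin is always fixed
print(np.allclose(B @ (A @ u), (B @ A) @ u))      # "apply A, then B" equals (BA)u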

Why It Matters

The forward pass of a neural network is a sequence of matrix-vector multiplications:

h₁ = σ(W₁x)
h₂ = σ(W₂h₁)
output = W₃h₂

Each Wᵢ reshapes the representation, and σ (an activation such as ReLU or GELU) adds non-linearity between them.

Without non-linearity: composing two linear maps W₂(W₁x) = (W₂W₁)x is still just one linear map. Any chain of linear layers can be collapsed to a single one. Depth only helps because of the activation function.
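
A minimal sketch of the collapse, with hand-picked small matrices so the effect of the ReLU is visible:

import numpy as np

W1 = np.array([[1.,  0., 0.],
               [0., -1., 0.]])
W2 = np.array([[1., 1.]])
x  = np.array([1., 1., 0.])

# Two stacked linear maps collapse into a single matrix
combined = W2 @ W1                       # one (1×3) matrix
print(W2 @ (W1 @ x), combined @ x)       # both give [0.]

# A ReLU between the layers breaks the collapse
def relu(z):
    return np.maximum(z, 0.)

print(W2 @ relu(W1 @ x))                 # [1.], no longer the single linear map W2 @ W1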


Common Pitfalls

  • A linear transformation always fixes the origin. A·0 = 0 always. This is why layers add a bias: y = Ax + b. Without b, the model can only represent hyperplanes through the origin — severely limiting expressivity.
  • The output dimension is the number of rows. A (5×3) matrix maps 3D → 5D. Input size = columns, output size = rows. This is the most common shape confusion (see the quick check after this list).
  • Composition reverses order. "Apply A first, then B" = B(Ax) = (BA)x. The combined matrix is BA, not AB. The order flips when you write it as a product.
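
For the shape rule in particular, a quick check (the sizes here are chosen only for illustration):

import numpy as np

A = np.random.randn(5, 3)     # 5 rows, 3 columns
x = np.random.randn(3)        # input length must equal the number of columns

print((A @ x).shape)          # (5,): output length equals the number of rows
# A @ np.random.randn(5) would raise a ValueError (shape mismatch)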

Examples

import numpy as np

# A (2×3) matrix maps 3D → 2D
A = np.array([[1., 0., -1.],
              [0., 1.,  2.]])

x = np.array([3., 1., 2.])

y = A @ x    # [3-2, 1+4] = [1., 5.]
print(f"Input: {x}  →  Output: {y}")

# Column view gives the same result
col_view = x[0]*A[:,0] + x[1]*A[:,1] + x[2]*A[:,2]
print(f"Column view: {col_view}")    # identical to y

# Batch: 32 inputs, each 3-dim → 2-dim output
W = np.random.randn(2, 3)           # weight matrix
X = np.random.randn(32, 3)          # 32 input vectors
Y = X @ W.T                         # (32, 3) @ (3, 2) → (32, 2)
print(f"Batch output shape: {Y.shape}")  # (32, 2)

# Why bias matters
b = np.array([1., -3.])
y_with_bias = A @ x + b    # can represent any affine output, not just 0-centered

Review Questions