Matrices
A matrix is a rectangular grid of numbers. Matrix multiplication composes transformations, and the transpose flips rows into columns. These two operations are the foundation of every neural network layer.
Intuition First
A matrix is just a 2D grid of numbers — rows and columns. The moment you have a spreadsheet of data (samples × features), you have a matrix. The moment you write down the weights of a neural network layer, you have a matrix.
Two operations matter most:
- Multiplication — combining two transformations into one
- Transpose — flipping rows into columns (and vice versa)
If you can reason about matrix shapes and understand why AB ≠ BA, you have the core skill for debugging deep learning code.
What's Actually Happening
Multiplication
When you multiply matrix A by matrix B, each entry of the result is a dot product: row i of A dotted with column j of B.
C[i, j] = (row i of A) · (column j of B)
Think of it as composition: when C = AB acts on a vector x, B applies its transformation first, then A applies its transformation on top, since ABx = A(Bx). The single matrix C applies both transformations at once.
The critical constraint: inner dimensions must match. (m×n) @ (n×p) → (m×p). The n must be the same.
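A quick NumPy sketch of the shape rule (the shapes here are chosen only for illustration):

```python
import numpy as np

A = np.ones((2, 3))  # (m×n) = (2×3)
B = np.ones((3, 4))  # (n×p) = (3×4)

C = A @ B            # inner dims match: (2×3) @ (3×4) → (2×4)
print(C.shape)       # (2, 4)

# Swapping the order breaks the inner-dimension match:
try:
    B @ A            # (3×4) @ (2×3): 4 ≠ 2
except ValueError as e:
    print("shape mismatch:", e)
```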
Transpose
The transpose swaps rows and columns. A[i, j] becomes Aᵀ[j, i].
If A is (m×n), then Aᵀ is (n×m).
The key rule for transposing products: (AB)ᵀ = BᵀAᵀ — order reverses.
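A minimal sketch verifying the reversal rule on random matrices (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# (AB)ᵀ has shape (4, 2); Bᵀ Aᵀ is (4×3) @ (3×2) → (4, 2). They agree.
assert np.allclose((A @ B).T, B.T @ A.T)

# Aᵀ Bᵀ isn't even a valid product here: (3×2) @ (4×3) has mismatched inner dims.
```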
Build the Idea Step-by-Step
Formal Explanation
Matrix product C = AB where A is m×n, B is n×p:
C[i, j] = Σₖ A[i,k] · B[k,j]
Example:
A = [[1, 2],    B = [[5, 6],
     [3, 4]]         [7, 8]]
C[0,0] = 1·5 + 2·7 = 19
C[0,1] = 1·6 + 2·8 = 22
C[1,0] = 3·5 + 4·7 = 43
C[1,1] = 3·6 + 4·8 = 50
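The summation formula can be written out as a naive triple loop (illustrative only; in practice use `@`):

```python
import numpy as np

def matmul_loops(A, B):
    """Naive matrix product: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(matmul_loops(A, B))  # [[19. 22.] [43. 50.]]
```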
Transpose:
(Aᵀ)[i, j] = A[j, i]
(AB)ᵀ = BᵀAᵀ ← order reverses!
Key Properties / Rules
| Property | Formula | Notes |
|---|---|---|
| Shape rule | (m×n) @ (n×p) → (m×p) | Inner dims must match |
| Non-commutative | AB ≠ BA in general | Order always matters |
| Associative | (AB)C = A(BC) | Can group freely |
| Distributive | A(B+C) = AB + AC | Works like algebra |
| Transpose of product | (AB)ᵀ = BᵀAᵀ | Order reverses |
| Symmetric matrix | A = Aᵀ | Covariance matrices, Hessians |
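The properties in the table can be spot-checked numerically on random matrices (the shapes below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))
D = rng.standard_normal((3, 4))

assert np.allclose((A @ B) @ C, A @ (B @ C))      # associative
assert np.allclose(A @ (B + D), A @ B + A @ D)    # distributive
S = B @ B.T                                        # B Bᵀ is always symmetric
assert np.allclose(S, S.T)
```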
Why It Matters
A linear layer in a neural network is: output = W @ input + b. When processing a batch of inputs at once, you multiply W by an entire input matrix. The forward pass of a neural network is a chain of matrix multiplications.
Shape reasoning is the most practically useful linear algebra skill. Shape errors are the most common bug in deep learning. Track (batch, features) through every operation.
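A minimal sketch of tracking (batch, features) through two layers (all sizes here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X  = rng.standard_normal((32, 10))  # (batch, in_features)
W1 = rng.standard_normal((10, 64))  # layer 1: 10 features → 64
W2 = rng.standard_normal((64, 5))   # layer 2: 64 features → 5

H = X @ W1      # (32, 10) @ (10, 64) → (32, 64)
Y = H @ W2      # (32, 64) @ (64, 5)  → (32, 5)
print(Y.shape)  # (32, 5)
```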
Common Pitfalls
- `*` vs `@` in NumPy/PyTorch. `A * B` is element-wise multiplication (both arrays must have the same shape); `A @ B` is matrix multiplication. Both are valid syntax, so you'll get no error, just silently wrong results.
- Assuming `AB` exists just because `BA` does. (3×4) @ (4×2) is valid, but (4×2) @ (3×4) is not. Always verify shapes before multiplying.
- Transpose rule order. (AB)ᵀ = BᵀAᵀ, not AᵀBᵀ. Forgetting the reversal causes shape bugs in backpropagation derivations.
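The `*` vs `@` pitfall is easy to demonstrate (a minimal sketch using the same 2×2 matrices as above):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

elementwise = A * B  # [[ 5. 12.] [21. 32.]], entry by entry
matmul      = A @ B  # [[19. 22.] [43. 50.]], row-dot-column products

# Both run without error, so a misplaced * can go unnoticed:
assert not np.allclose(elementwise, matmul)
```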
Examples
```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

C = A @ B   # [[19, 22], [43, 50]]
At = A.T    # [[1, 3], [2, 4]], rows and columns swapped

print(f"A shape: {A.shape}")
print(f"C = A@B:\n{C}")
print(f"Aᵀ:\n{At}")
print(f"AB ≠ BA: {not np.allclose(A @ B, B @ A)}")  # True

# Neural network layer: W maps 3-dim input to 4-dim output
W = np.random.randn(4, 3)  # (out_features × in_features)
x = np.random.randn(3)     # single input
y = W @ x                  # shape (4,)

# Batch: process 8 inputs simultaneously
X = np.random.randn(8, 3)  # (batch × in_features)
Y = X @ W.T                # (8, 3) @ (3, 4) → shape (8, 4)
print(f"Batch output shape: {Y.shape}")  # (8, 4)
```