Calculus
Derivatives
A derivative measures how much a function's output changes when you nudge its input. It is the foundation of gradient-based learning: without it, there is no principled way to know which direction to adjust the weights.
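A minimal sketch of this idea in Python, using a central difference to estimate the derivative numerically; the function f and the step size h below are illustrative choices, not anything from the text above.

```python
def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x: nudge the input both ways."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: f(x) = x**2 has derivative 2x, so at x = 3 we expect roughly 6.
f = lambda x: x ** 2
print(numerical_derivative(f, 3.0))  # ~6.0
```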
Partial Derivatives
When a function takes multiple inputs, partial derivatives measure the effect of changing one input while holding all others fixed. Every weight in a neural network gets its own partial derivative of the loss.
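As a rough illustration (the two-variable function below is made up for the example), a partial derivative is the same one-dimensional nudge applied to a single input while the others stay fixed.

```python
def partial_derivative(f, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f with
    respect to the input at position `index`, holding the others fixed."""
    up, down = list(point), list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)

# f(x, y) = x**2 * y, so df/dx = 2xy and df/dy = x**2.
f = lambda x, y: x ** 2 * y
print(partial_derivative(f, (3.0, 2.0), index=0))  # ~12.0
print(partial_derivative(f, (3.0, 2.0), index=1))  # ~9.0
```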
Gradients
The gradient collects all partial derivatives into a single vector pointing in the direction of steepest increase. Negating it gives you the direction to move weights to reduce loss — the core operation in all neural network training.
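A sketch of collecting those partials into a single gradient vector, reusing the same finite-difference idea; the two-weight function here is an invented example.

```python
def numerical_gradient(f, point, h=1e-6):
    """Estimate the gradient of f at `point`: one partial derivative per input."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# f(w) = w0**2 + 3*w1 has gradient [2*w0, 3].
f = lambda w: w[0] ** 2 + 3 * w[1]
print(numerical_gradient(f, [2.0, 5.0]))  # ~[4.0, 3.0]
```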
Chain Rule
When functions are composed — one feeding into another — the chain rule tells you how to differentiate through the chain. It's not just a calculus rule; it is backpropagation.
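A hand-worked sketch of the chain rule on a tiny composition; the forward/backward structure below is the pattern backpropagation repeats layer by layer, though the particular functions are illustrative.

```python
# Composition: y = f(g(x)) with g(x) = 3*x + 1 and f(u) = u**2.
# Chain rule: dy/dx = f'(g(x)) * g'(x) = 2*(3*x + 1) * 3.

def forward_and_backward(x):
    # Forward pass: compute the intermediate value and the output.
    u = 3 * x + 1          # g(x)
    y = u ** 2             # f(u)
    # Backward pass: multiply the local derivatives along the chain.
    dy_du = 2 * u          # f'(u)
    du_dx = 3              # g'(x)
    dy_dx = dy_du * du_dx  # chain rule
    return y, dy_dx

print(forward_and_backward(2.0))  # (49.0, 42.0)
```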
Optimization Basics
Optimization means finding the input that minimizes (or maximizes) a function. In ML, the function is the loss and the inputs are the weights — and the gradient is the compass that guides the search.
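To make this concrete, here is a toy example (made up for illustration) of a loss as a function of a single weight: the mean squared error of a one-parameter model w * x, with its gradient derived by hand.

```python
# Toy data where y is roughly 2 * x, so the best weight is near 2.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

def loss(w):
    """Mean squared error of the prediction w * x over the toy data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loss_gradient(w):
    """d(loss)/dw, obtained by differentiating each squared-error term."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

print(loss(0.0), loss_gradient(0.0))  # large loss, strongly negative gradient: increase w
print(loss(2.0), loss_gradient(2.0))  # small loss, gradient near zero: close to the minimum
```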
Gradient Descent Intuition
Gradient descent is the workhorse algorithm for training neural networks. It repeatedly takes small steps in the direction of the negative gradient, moving downhill on the loss surface, until the loss is acceptably low.
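A minimal gradient descent loop under assumed settings for the learning rate and step count; the quadratic loss below is illustrative, and in a real network the gradient would come from backpropagation rather than a hand-written formula.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad_fn(w)
    return w

# Loss(w) = (w - 3)**2 has its minimum at w = 3 and gradient 2 * (w - 3).
grad = lambda w: 2 * (w - 3)
print(gradient_descent(grad, w0=0.0))  # converges toward 3.0
```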