Overfitting and Underfitting
Overfitting happens when a model memorizes training data instead of learning the pattern. Underfitting happens when a model is too simple to capture the pattern at all. Both produce bad models — for opposite reasons.
Intuition First
Imagine you're studying for an exam:
- Overfitting: You memorize every past exam question word-for-word. You score 100% on old exams but completely fail when the professor changes the wording slightly. You learned the specific questions, not the underlying subject.
- Underfitting: You barely studied. You can't answer old questions OR new ones. The material never made it in.
- Good generalization: You understood the concepts. You can handle familiar questions and novel variations of them.
What's Actually Happening
Underfitting happens when the model is too simple:
- A linear model for data that has a quadratic relationship
- A tiny neural network for a complex image task
- Stopping training too early
The model lacks the capacity to learn the pattern.
Overfitting happens when the model is too complex relative to the data:
- A 1000-parameter model trained on 50 examples
- Training for too many epochs
- No regularization
The model has more than enough capacity, and it uses the excess to memorize noise (a sketch follows below).
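A 1-nearest-neighbor regressor makes this concrete: its "parameters" are literally the training set, so it can memorize anything, including pure noise. A minimal sketch (illustrative data, separate from the examples later in this section):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = rng.normal(size=30)  # labels are pure noise: there is no pattern to learn

# 1-NN predicts each training point's own label back at it
model = KNeighborsRegressor(n_neighbors=1).fit(X, y)
print(model.score(X, y))  # train R² = 1.0, despite learning nothing

X_new = rng.uniform(-3, 3, size=(30, 1))
y_new = rng.normal(size=30)
print(model.score(X_new, y_new))  # near or below 0: nothing generalizes
```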
Build the Idea Step-by-Step
Formal Explanation
The training/test error gap is the primary diagnostic:
Generalization gap = Test Error - Train Error
| Scenario | Train Error | Test Error | Gap | Problem |
|---|---|---|---|---|
| Underfitting | High | High | Small | Model too simple |
| Good fit | Low | Low | Small | Ideal |
| Overfitting | Very low | High | Large | Model too complex |
A large generalization gap means the model learned something about the training set specifically, rather than the underlying distribution.
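As a sketch, the table above collapses into a few comparisons. `diagnose` is a hypothetical helper, and the thresholds are placeholders you would set from your task's error scale, not universal constants:

```python
def diagnose(train_err: float, test_err: float,
             high: float = 0.10, gap_tol: float = 0.05) -> str:
    """Rough three-way diagnosis; thresholds are illustrative only."""
    gap = test_err - train_err
    if train_err > high:
        return "underfitting: high train error (model too simple)"
    if gap > gap_tol:
        return "overfitting: large generalization gap (model too complex)"
    return "good fit: low errors, small gap"

print(diagnose(0.42, 0.45))  # underfitting
print(diagnose(0.02, 0.31))  # overfitting
print(diagnose(0.04, 0.06))  # good fit
```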
Learning Curves
Plot train and test error against the number of training examples (a scikit-learn sketch follows the list):
- Underfitting: Both errors are high and close together. More data won't help — the model can't use it.
- Overfitting: Train error is low; test error is high. They converge only when you have a lot of data.
- Good fit: Both errors are low. Test error is slightly above train error (normal).
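scikit-learn can generate these curves directly with `learning_curve`. A minimal sketch on synthetic quadratic data, using a deliberately overcomplex degree-10 pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel()**2 + rng.normal(scale=0.5, size=200)

model = make_pipeline(PolynomialFeatures(10), LinearRegression())
sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error",
)

# Convert back to positive MSE and average over the CV folds
for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n={n:3d}: train MSE={tr:.2f}, val MSE={va:.2f}")
# For this overfit model, the two columns converge only as n grows
```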
Key Properties / Rules
| Symptom | Most Likely Cause | Fix |
|---|---|---|
| High train AND test error | Underfitting | Bigger model, more features, longer training |
| Low train, high test error | Overfitting | More data, regularization, dropout, simpler model |
| Low train error but weird test behavior | Overfitting to noise | Early stopping, cross-validation |
| Adding data doesn't improve test error | Underfitting (bias issue) | Increase model capacity |
| Adding data helps test error | Overfitting (variance issue) | Keep collecting data, or regularize |
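To make the "regularize" fix from the last two rows concrete: swapping LinearRegression for Ridge (L2 regularization) in an overcomplex degree-10 pipeline typically shrinks the gap. A hedged sketch; `alpha=1.0` is a starting guess you would normally tune on a validation set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = X.ravel()**2 + rng.normal(scale=0.5, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, reg in [("plain", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(10), StandardScaler(), reg)
    model.fit(X_tr, y_tr)
    gap = (np.mean((model.predict(X_te) - y_te) ** 2)
           - np.mean((model.predict(X_tr) - y_tr) ** 2))
    print(f"{name}: generalization gap = {gap:.2f}")
# Ridge's penalty on large weights usually narrows the gap here
```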
Why It Matters
Every ML debugging workflow starts here. Before you change architecture, hyperparameters, or data — diagnose whether you have an underfitting or overfitting problem.
- Underfitting solutions: bigger model, more layers, more features, better architecture, train longer
- Overfitting solutions: more data, data augmentation, dropout, L1/L2 regularization, early stopping, simpler model
Getting this diagnosis right focuses your effort. Applying overfitting fixes to an underfitting model (or vice versa) will make things worse.
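Most of these fixes show up in the code examples below; data augmentation is the one that doesn't, so here is a hedged sketch using torchvision (an assumed dependency, not imported elsewhere on this page) for an image task:

```python
import torchvision.transforms as T

# Each epoch sees a slightly different version of every image, which
# effectively enlarges the training set and discourages memorization.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),  # common choice for 32×32 images such as CIFAR-10
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])
# Typical use: torchvision.datasets.CIFAR10(root, train=True, transform=augment)
```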
Common Pitfalls
- Not checking train error. Without it you can't tell if you're underfitting or overfitting — test error alone is ambiguous.
- Fixing the wrong problem. Adding regularization to an underfitting model increases bias. Adding capacity to an overfitting model with small data increases variance.
- Confusing training epochs with model capacity. Running more epochs doesn't make a simple model complex — it just overfits the limited capacity. Capacity is model size; epochs control how hard you optimize.
- Evaluating on the test set during development. If you tune hyperparameters using test error, you'll overfit to the test set. Use a validation set for tuning (see the split sketch below).
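A minimal sketch of that three-way split with scikit-learn; the 60/20/20 proportions and placeholder data are just common choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)  # placeholder data

# Carve off the test set first, then split the remainder into train/val.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% = 20% of the total, giving a 60/20/20 split.
# Tune hyperparameters against (X_val, y_val); touch (X_test, y_test) once, at the end.
```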
Examples
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# True pattern: quadratic
np.random.seed(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = X.ravel()**2 + np.random.randn(60) * 0.5  # y = x² + noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # fixed split for reproducibility
)

for degree in [1, 2, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = np.mean((model.predict(X_train) - y_train)**2)
    test_err = np.mean((model.predict(X_test) - y_test)**2)
    print(f"degree={degree}: train={train_err:.2f}, test={test_err:.2f}")

# Representative output (your exact numbers may differ):
# degree=1: train=4.50, test=4.70   ← underfitting (linear can't learn x²)
# degree=2: train=0.25, test=0.28   ← good fit (matches true function)
# degree=10: train=0.18, test=1.80  ← overfitting (memorized noise)
```
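Rather than eyeballing test error to choose the degree (which would itself overfit the test set), you can pick it with cross-validation on the training data alone. A sketch that assumes the variables from the block above are still in scope:

```python
from sklearn.model_selection import cross_val_score

# Uses X_train / y_train (and the pipeline imports) from the block above.
for degree in [1, 2, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X_train, y_train, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}: cross-validated MSE={mse:.2f}")
# Pick the degree with the lowest CV error; only then evaluate on the test set.
```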
Neural network example:
```python
import torch
import torch.nn as nn

# Underfitting risk: too few neurons to represent a complex function
tiny_model = nn.Sequential(nn.Linear(10, 2), nn.ReLU(), nn.Linear(2, 1))

# Overfitting risk: too many parameters relative to a tiny dataset
huge_model = nn.Sequential(
    nn.Linear(10, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),
)
# huge_model trained on 50 examples will overfit

# Balanced: reasonable capacity + dropout
balanced_model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 1),
)
```
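And a hedged sketch of early stopping for `balanced_model`, continuing from the block above: train while validation loss improves, stop once it hasn't improved for `patience` epochs. The data tensors, learning rate, and patience value are placeholders:

```python
# Placeholder data: 200 train / 50 validation examples with 10 features each
torch.manual_seed(0)
X_train, y_train = torch.randn(200, 10), torch.randn(200, 1)
X_val, y_val = torch.randn(50, 10), torch.randn(50, 1)

# weight_decay adds L2 regularization on top of the model's dropout
opt = torch.optim.Adam(balanced_model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()
best_val, patience, bad_epochs = float("inf"), 10, 0

for epoch in range(500):
    balanced_model.train()
    opt.zero_grad()
    loss = loss_fn(balanced_model(X_train), y_train)
    loss.backward()
    opt.step()

    balanced_model.eval()
    with torch.no_grad():
        val_loss = loss_fn(balanced_model(X_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}: val loss stopped improving")
            break
```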