Overfitting and Underfitting
Overfitting happens when a model memorizes training data instead of learning the pattern. Underfitting happens when a model is too simple to capture the pattern at all. Both produce bad models — for opposite reasons.
Intuition First
Imagine you're studying for an exam:
- Overfitting: You memorize every past exam question word-for-word. You score 100% on old exams but completely fail when the professor changes the wording slightly. You learned the specific questions, not the underlying subject.
- Underfitting: You barely studied. You can't answer old questions OR new ones. The material never made it in.
- Good generalization: You understood the concepts. You can handle familiar questions and novel variations of them.
What's Actually Happening
Underfitting happens when the model is too simple:
- A linear model for data that has a quadratic relationship
- A tiny neural network for a complex image task
- Stopping training too early
The model lacks the capacity to learn the pattern.
Overfitting happens when the model is too complex relative to the data:
- A 1000-parameter model trained on 50 examples
- Training for too many epochs
- No regularization
The model has more than enough capacity, and it uses the excess to memorize noise (a sketch follows below).
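A 1-nearest-neighbor regressor makes this concrete: its "parameters" are literally the training set, so it can memorize anything, including pure noise. A minimal sketch (illustrative data, separate from the examples later in this section):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = rng.normal(size=30)  # labels are pure noise: there is no pattern to learn

# 1-NN predicts each training point's own label back at it
model = KNeighborsRegressor(n_neighbors=1).fit(X, y)
print(model.score(X, y))  # train R² = 1.0, despite learning nothing

X_new = rng.uniform(-3, 3, size=(30, 1))
y_new = rng.normal(size=30)
print(model.score(X_new, y_new))  # near or below 0: nothing generalizes
```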
Build the Idea Step-by-Step
Formal Explanation
The training/test error gap is the primary diagnostic:
Generalization gap = Test Error - Train Error
| Scenario | Train Error | Test Error | Gap | Problem |
|---|---|---|---|---|
| Underfitting | High | High | Small | Model too simple |
| Good fit | Low | Low | Small | Ideal |
| Overfitting | Very low | High | Large | Model too complex |
A large generalization gap means the model learned something about the training set specifically, rather than the underlying distribution.
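As a sketch, the table above collapses into a few comparisons. `diagnose` is a hypothetical helper, and the thresholds are placeholders you would set from your task's error scale, not universal constants:

```python
def diagnose(train_err: float, test_err: float,
             high: float = 0.10, gap_tol: float = 0.05) -> str:
    """Rough three-way diagnosis; thresholds are illustrative only."""
    gap = test_err - train_err
    if train_err > high:
        return "underfitting: high train error (model too simple)"
    if gap > gap_tol:
        return "overfitting: large generalization gap (model too complex)"
    return "good fit: low errors, small gap"

print(diagnose(0.42, 0.45))  # underfitting
print(diagnose(0.02, 0.31))  # overfitting
print(diagnose(0.04, 0.06))  # good fit
```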
Learning Curves
Plot train and test error against the number of training examples (a scikit-learn sketch follows the list):
- Underfitting: Both errors are high and close together. More data won't help — the model can't use it.
- Overfitting: Train error is low; test error is high. They converge only when you have a lot of data.
- Good fit: Both errors are low. Test error is slightly above train error (normal).
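scikit-learn can generate these curves directly with `learning_curve`. A minimal sketch on synthetic quadratic data, using a deliberately overcomplex degree-10 pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel()**2 + rng.normal(scale=0.5, size=200)

model = make_pipeline(PolynomialFeatures(10), LinearRegression())
sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5, scoring="neg_mean_squared_error",
)

# Convert back to positive MSE and average over the CV folds
for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"n={n:3d}: train MSE={tr:.2f}, val MSE={va:.2f}")
# For this overfit model, the two columns converge only as n grows
```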
Key Properties / Rules
| Symptom | Most Likely Cause | Fix |
|---|---|---|
| High train AND test error | Underfitting | Bigger model, more features, longer training |
| Low train, high test error | Overfitting | More data, regularization, dropout, simpler model |
| Low train error but weird test behavior | Overfitting to noise | Early stopping, cross-validation |
| Adding data doesn't improve test error | Underfitting (bias issue) | Increase model capacity |
| Adding data helps test error | Overfitting (variance issue) | Keep collecting data, or regularize |
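To make the "regularize" fix from the last two rows concrete: swapping LinearRegression for Ridge (L2 regularization) in an overcomplex degree-10 pipeline typically shrinks the gap. A hedged sketch; `alpha=1.0` is a starting guess you would normally tune on a validation set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = X.ravel()**2 + rng.normal(scale=0.5, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, reg in [("plain", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(10), StandardScaler(), reg)
    model.fit(X_tr, y_tr)
    gap = (np.mean((model.predict(X_te) - y_te) ** 2)
           - np.mean((model.predict(X_tr) - y_tr) ** 2))
    print(f"{name}: generalization gap = {gap:.2f}")
# Ridge's penalty on large weights usually narrows the gap here
```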
Why It Matters
Every ML debugging workflow starts here. Before you change architecture, hyperparameters, or data — diagnose whether you have an underfitting or overfitting problem.
- Underfitting solutions: bigger model, more layers, more features, better architecture, train longer
- Overfitting solutions: more data, data augmentation, dropout, L1/L2 regularization, early stopping, simpler model
Getting this diagnosis right focuses your effort. Applying overfitting fixes to an underfitting model (or vice versa) will make things worse.
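Most of these fixes show up in the code examples below; data augmentation is the one that doesn't, so here is a hedged sketch using torchvision (an assumed dependency, not imported elsewhere on this page) for an image task:

```python
import torchvision.transforms as T

# Each epoch sees a slightly different version of every image, which
# effectively enlarges the training set and discourages memorization.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),  # common choice for 32×32 images such as CIFAR-10
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToTensor(),
])
# Typical use: torchvision.datasets.CIFAR10(root, train=True, transform=augment)
```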
Common Pitfalls
- Not checking train error. Without it you can't tell if you're underfitting or overfitting — test error alone is ambiguous.
- Fixing the wrong problem. Adding regularization to an underfitting model increases bias. Adding capacity to an overfitting model with small data increases variance.
- Confusing training epochs with model capacity. Running more epochs doesn't make a simple model complex — it just overfits the limited capacity. Capacity is model size; epochs control how hard you optimize.
- Evaluating on the test set during development. If you tune hyperparameters using test error, you'll overfit to the test set. Use a validation set for tuning (see the split sketch below).
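A minimal sketch of that three-way split with scikit-learn; the 60/20/20 proportions and placeholder data are just common choices:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)  # placeholder data

# Carve off the test set first, then split the remainder into train/val.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)
# 0.25 of the remaining 80% = 20% of the total, giving a 60/20/20 split.
# Tune hyperparameters against (X_val, y_val); touch (X_test, y_test) once, at the end.
```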
Examples
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# True pattern: quadratic
np.random.seed(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = X.ravel()**2 + np.random.randn(60) * 0.5  # y = x² + noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # fixed split for reproducibility
)

for degree in [1, 2, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = np.mean((model.predict(X_train) - y_train)**2)
    test_err = np.mean((model.predict(X_test) - y_test)**2)
    print(f"degree={degree}: train={train_err:.2f}, test={test_err:.2f}")

# Representative output (your exact numbers may differ):
# degree=1: train=4.50, test=4.70   ← underfitting (linear can't learn x²)
# degree=2: train=0.25, test=0.28   ← good fit (matches true function)
# degree=10: train=0.18, test=1.80  ← overfitting (memorized noise)
```
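Rather than eyeballing test error to choose the degree (which would itself overfit the test set), you can pick it with cross-validation on the training data alone. A sketch that assumes the variables from the block above are still in scope:

```python
from sklearn.model_selection import cross_val_score

# Uses X_train / y_train (and the pipeline imports) from the block above.
for degree in [1, 2, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X_train, y_train, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree={degree}: cross-validated MSE={mse:.2f}")
# Pick the degree with the lowest CV error; only then evaluate on the test set.
```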
Neural network example:
```python
import torch
import torch.nn as nn

# Underfitting risk: too few neurons to represent a complex function
tiny_model = nn.Sequential(nn.Linear(10, 2), nn.ReLU(), nn.Linear(2, 1))

# Overfitting risk: too many parameters relative to a tiny dataset
huge_model = nn.Sequential(
    nn.Linear(10, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),
)
# huge_model trained on 50 examples will overfit

# Balanced: reasonable capacity + dropout
balanced_model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 1),
)
```
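And a hedged sketch of early stopping for `balanced_model`, continuing from the block above: train while validation loss improves, stop once it hasn't improved for `patience` epochs. The data tensors, learning rate, and patience value are placeholders:

```python
# Placeholder data: 200 train / 50 validation examples with 10 features each
torch.manual_seed(0)
X_train, y_train = torch.randn(200, 10), torch.randn(200, 1)
X_val, y_val = torch.randn(50, 10), torch.randn(50, 1)

# weight_decay adds L2 regularization on top of the model's dropout
opt = torch.optim.Adam(balanced_model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()
best_val, patience, bad_epochs = float("inf"), 10, 0

for epoch in range(500):
    balanced_model.train()
    opt.zero_grad()
    loss = loss_fn(balanced_model(X_train), y_train)
    loss.backward()
    opt.step()

    balanced_model.eval()
    with torch.no_grad():
        val_loss = loss_fn(balanced_model(X_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}: val loss stopped improving")
            break
```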