Mnemosyne

Overfitting and Underfitting

Overfitting happens when a model memorizes training data instead of learning the pattern. Underfitting happens when a model is too simple to capture the pattern at all. Both produce bad models — for opposite reasons.

Intuition First

Imagine you're studying for an exam:

  • Overfitting: You memorize every past exam question word-for-word. You score 100% on old exams but completely fail when the professor changes the wording slightly. You learned the specific questions, not the underlying subject.
  • Underfitting: You barely studied. You can't answer old questions OR new ones. The material never made it in.
  • Good generalization: You understood the concepts. You can handle familiar questions and novel variations of them.

What's Actually Happening

Underfitting happens when the model is too simple:

  • A linear model for data that has a quadratic relationship
  • A tiny neural network for a complex image task
  • Stopping training too early

The model lacks the capacity to learn the pattern.

Overfitting happens when the model is too complex relative to the data:

  • A 1000-parameter model trained on 50 examples
  • Training for too many epochs
  • No regularization

The model has more than enough capacity — it uses the excess to memorize noise.


Build the Idea Step-by-Step

Collect data (signal + noise)
Train model
Underfitting: model too simple, misses signal
Overfitting: model too complex, memorizes noise
Good fit: captures signal, ignores noise
Detect via: train error vs. test error gap

Formal Explanation

The training/test error gap is the primary diagnostic:

Generalization gap = Test Error - Train Error
Scenario       Train Error   Test Error   Gap     Problem
Underfitting   High          High         Small   Model too simple
Good fit       Low           Low          Small   Ideal
Overfitting    Very low      High         Large   Model too complex

A large generalization gap means the model learned something about the training set specifically, rather than the underlying distribution.

Learning curves

Plot train and test error vs. number of training examples:

  • Underfitting: Both errors are high and close together. More data won't help — the model can't use it.
  • Overfitting: Train error is low; test error is high. They converge only when you have a lot of data.
  • Good fit: Both errors are low. Test error is slightly above train error (normal).
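The learning-curve diagnostic above can be sketched with scikit-learn's learning_curve on synthetic quadratic data. This is a minimal sketch, assuming a degree-1 (deliberately underfit) model; the dataset size, noise level, and thresholds are illustrative, not canonical.

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Synthetic data: quadratic signal + noise, shuffled so CV folds are random
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = X.ravel()**2 + rng.normal(scale=0.5, size=200)
idx = rng.permutation(200)
X, y = X[idx], y[idx]

# Underfit model: linear features cannot represent x^2
model = make_pipeline(PolynomialFeatures(1), LinearRegression())

sizes, train_scores, val_scores = learning_curve(
    model, X, y, train_sizes=[0.2, 0.5, 1.0], cv=5,
    scoring="neg_mean_squared_error")
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

for n, tr, va in zip(sizes, train_mse, val_mse):
    print(f"n={n}: train MSE={tr:.2f}, val MSE={va:.2f}")
```

The underfitting signature shows up directly: at every training size, both errors stay high and close together, and adding data does not help.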

Key Properties / Rules

Symptom                                    Most Likely Cause             Fix
High train AND test error                  Underfitting                  Bigger model, more features, longer training
Low train, high test error                 Overfitting                   More data, regularization, dropout, simpler model
Low train error but weird test behavior    Overfitting to noise          Early stopping, cross-validation
Adding data doesn't improve test error     Underfitting (bias issue)     Increase model capacity
Adding data helps test error               Overfitting (variance issue)  Keep collecting data, or regularize

Why It Matters

Every ML debugging workflow starts here. Before you change architecture, hyperparameters, or data — diagnose whether you have an underfitting or overfitting problem.

  • Underfitting solutions: bigger model, more layers, more features, better architecture, train longer
  • Overfitting solutions: more data, data augmentation, dropout, L1/L2 regularization, early stopping, simpler model

Getting this diagnosis right focuses your effort. Applying overfitting fixes to an underfitting model (or vice versa) will make things worse.


Common Pitfalls

  • Not checking train error. Without it you can't tell if you're underfitting or overfitting — test error alone is ambiguous.
  • Fixing the wrong problem. Adding regularization to an underfitting model increases bias. Adding capacity to an overfitting model with small data increases variance.
  • Confusing training epochs with model capacity. Running more epochs doesn't make a simple model complex; it only optimizes harder within the same limited capacity. Capacity is model size; epochs control how hard you optimize.
  • Evaluating on test set during development. If you tune hyperparameters using test error, you'll overfit to the test set. Use a validation set for tuning.
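The last pitfall is avoided with a three-way split: tune on validation, touch the test set only once at the end. A minimal sketch using two chained train_test_split calls (the 60/20/20 proportions are an illustrative choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# First carve out a held-out test set, then split the rest into train/val.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # → 60 20 20
```

Hyperparameter choices are compared on (X_val, y_val); (X_test, y_test) is reserved for the final, one-time evaluation.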

Examples

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# True pattern: quadratic
np.random.seed(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = X.ravel()**2 + np.random.randn(60) * 0.5   # y = x² + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in [1, 2, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = np.mean((model.predict(X_train) - y_train)**2)
    test_err  = np.mean((model.predict(X_test)  - y_test)**2)
    print(f"degree={degree}: train={train_err:.2f}, test={test_err:.2f}")

# Typical output (values are approximate and depend on the split):
# degree=1:  train≈4.5,  test≈4.7    ← underfitting (linear can't learn x²)
# degree=2:  train≈0.25, test≈0.28   ← good fit (matches true function)
# degree=10: train≈0.18, test≈1.8    ← overfitting (memorized noise)

Neural network example:

import torch
import torch.nn as nn

# Underfitting: too few neurons
tiny_model = nn.Sequential(nn.Linear(10, 2), nn.ReLU(), nn.Linear(2, 1))

# Overfitting risk: too many neurons with tiny dataset
huge_model = nn.Sequential(
    nn.Linear(10, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1)
)
# huge_model trained on 50 examples will overfit

# Balanced: reasonable capacity + dropout
balanced_model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 1)
)
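Early stopping, listed above as an overfitting fix, reduces to simple bookkeeping regardless of framework: track the best validation loss and stop once it hasn't improved for a patience window. A framework-agnostic sketch; the validation losses here are illustrative numbers standing in for per-epoch evaluation of a model like balanced_model:

```python
# Illustrative per-epoch validation losses: improve, then start rising
val_losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.58, 0.61, 0.65]

best = float("inf")
best_epoch = 0
patience = 2  # stop after this many epochs without improvement

for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, best_epoch = loss, epoch   # checkpoint the model here in practice
    elif epoch - best_epoch >= patience:
        print(f"stop at epoch {epoch}; best was epoch {best_epoch} (loss {best:.2f})")
        break
```

In a real training loop you would also save the model weights at each new best epoch and restore them after stopping, so the final model is the one from the best epoch, not the last one.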

Review Questions