
Bias vs Variance

Bias and variance are two distinct ways a model can fail. Bias is systematic error: the model is consistently wrong in the same direction, no matter which training set it sees. Variance is sensitivity to the training data: predictions depend heavily on the particular sample the model was trained on, so it looks consistent on seen data but behaves inconsistently on new data.

Intuition First

Imagine you're throwing darts at a target.

  • High bias: All your darts land in the same spot — but that spot is far from the bullseye. You're consistently wrong. The systematic error is built into your throwing technique.
  • High variance: Your darts are scattered all over the board. You're unpredictable — sometimes close, sometimes way off. You're sensitive to tiny fluctuations in how you throw.
  • Low bias, low variance: Darts clustered near the bullseye. This is what you want.

In ML: bias is "consistently wrong"; variance is "inconsistently right."


What's Actually Happening

Bias comes from a model that's too simple to capture the true pattern. It's baked into the model's structure.

Variance comes from a model that's too sensitive — it memorizes the training data, including noise, and fails to generalize.

Both cause bad test-set performance, but for opposite reasons:

  • High bias: model can't learn the pattern (underfitting)
  • High variance: model learns the noise instead of the pattern (overfitting)
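
To make "sensitivity to training data" concrete, here is a minimal sketch (assuming scikit-learn and the same synthetic sin-plus-noise setup used in the Examples section below; the probe point x = 3, the degrees, and the seeds are arbitrary illustrative choices). It retrains the same model class on many freshly drawn training sets and watches how its prediction at one fixed input moves:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def fit_and_predict(degree, seed, x_query=3.0):
    # Draw a fresh noisy training set, fit, and predict at one fixed point
    rng = np.random.default_rng(seed)
    X = np.sort(rng.uniform(0, 6, (30, 1)), axis=0)
    y = np.sin(X.ravel()) + rng.normal(0, 0.3, 30)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    return model.predict([[x_query]])[0]

for degree in (1, 15):
    preds = np.array([fit_and_predict(degree, seed) for seed in range(20)])
    # degree-1:  predictions barely move across training sets (low variance),
    #            but sit far from the truth sin(3) ≈ 0.141 (high bias)
    # degree-15: predictions swing from one training set to the next (high variance)
    print(f"degree-{degree}: mean prediction {preds.mean():+.3f}, spread (std) {preds.std():.3f}")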

Build the Idea Step-by-Step

1. Collect training data (which contains noise).
2. Train the model on that data.
3. High bias: error comes from wrong assumptions built into the model.
4. High variance: error comes from sensitivity to the particular training sample.
5. Total error = Bias² + Variance + Irreducible noise.
6. Goal: find the sweet spot in model complexity.
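
The decomposition in step 5 can be checked numerically. Below is a rough sketch (assuming scikit-learn; the sin target, noise level σ = 0.3, test point x₀ = 2.5, and 500 resampled training sets are all illustrative assumptions) that estimates each term by retraining a deliberately-too-simple linear model many times:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x0, sigma = 2.5, 0.3          # test point and noise level (illustrative choices)
preds = []
for seed in range(500):       # each seed = a fresh training set
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 6, (30, 1))
    y = np.sin(X.ravel()) + rng.normal(0, sigma, 30)
    model = make_pipeline(PolynomialFeatures(1), LinearRegression())
    model.fit(X, y)
    preds.append(model.predict([[x0]])[0])

preds = np.array(preds)
bias_sq = (preds.mean() - np.sin(x0)) ** 2   # Bias²: average prediction vs truth
variance = preds.var()                        # Variance: spread across training sets
noise = sigma ** 2                            # Irreducible noise
# The three estimated terms should add up to (approximately) the expected
# squared error on a fresh noisy observation at x0.
print(f"Bias² {bias_sq:.4f} + Variance {variance:.4f} + Noise {noise:.4f} "
      f"= {bias_sq + variance + noise:.4f}")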

Formal Explanation

For a model trained on dataset D to predict target y from input x:

Expected Test Error = Bias² + Variance + Irreducible Noise

Where:

  • Bias² = the squared gap between the average prediction (across training sets) and the true value
  • Variance = how much predictions vary across different training sets
  • Irreducible noise = randomness in the data itself — can't be fixed by any model
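
Written out in full (notation added here for precision: y = f(x) + ε where f is the true function and ε is noise with variance σ²; E_D averages over training sets D; ŷ_D(x) is the prediction of a model trained on D):

Expected Test Error = E[(y − ŷ_D(x))²]
                    = (f(x) − E_D[ŷ_D(x)])² + E_D[(ŷ_D(x) − E_D[ŷ_D(x)])²] + σ²
                    =          Bias²        +           Variance           + Irreducible Noise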

This decomposition reveals that total error has two controllable components that trade off against each other.

As model complexity increases:

Complexity                          | Bias | Variance
Too low (linear for nonlinear data) | High | Low
Just right                          | Low  | Low
Too high (overfitting)              | Low  | High
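
A quick way to watch this table come to life is to sweep the polynomial degree on the synthetic problem from the Examples section (a sketch assuming scikit-learn; the set of degrees and the noiseless test target are illustrative choices) and look for the U-shaped test error:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 6, (30, 1)), axis=0)
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 30)
X_test = np.linspace(0, 6, 200).reshape(-1, 1)
y_test = np.sin(X_test.ravel())  # noiseless truth gives a clean test signal

for degree in (1, 2, 3, 5, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_mse = np.mean((model.predict(X) - y) ** 2)
    test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
    # Train MSE should fall monotonically with degree; test MSE should fall,
    # bottom out near the right complexity, then rise again (the U-curve).
    print(f"degree {degree:2d}: train {train_mse:.3f}, test {test_mse:.3f}")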

Key Properties / Rules

Signal                               | Likely Cause
Train error high AND test error high | High bias (underfitting)
Train error low AND test error high  | High variance (overfitting)
Train error low AND test error low   | Good fit
Adding more data doesn't help        | High bias (model can't use more data)
Adding more data helps               | High variance (model was overfitting)
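
The last two rows are exactly what a learning curve shows. A sketch (assuming scikit-learn's learning_curve utility; the dataset size, degrees, and split settings are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, (200, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 200)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
        scoring="neg_mean_squared_error")
    vals = np.round(-val_scores.mean(axis=1), 3)
    # High bias (degree-1): validation error flattens early at a poor level,
    # so more data barely helps.
    # High variance (degree-15): validation error keeps improving as the
    # training set grows, so more data helps.
    print(f"degree-{degree} val MSE by train size:",
          {int(s): float(v) for s, v in zip(sizes, vals)})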

Why It Matters

The bias-variance tradeoff is the core tension in choosing model complexity:

  • A linear model applied to data with a nonlinear pattern → high bias
  • A 100-layer network trained on 50 examples → high variance

In practice:

  • Regularization, dropout, and early stopping reduce variance
  • Deeper networks, more features, and better architectures reduce bias
  • More data helps most with high variance

Understanding this tradeoff tells you which direction to push when your model isn't performing well.
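
As one concrete instance of "regularization reduces variance", the sketch below (assuming scikit-learn; Ridge with alpha = 1.0 and the StandardScaler step are illustrative choices, not a tuned setup) keeps the flexible degree-15 features but penalizes large coefficients:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 6, (30, 1)), axis=0)
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 30)
X_test = np.linspace(0, 6, 200).reshape(-1, 1)
y_test = np.sin(X_test.ravel())

for name, reg in [("plain OLS", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    # Same degree-15 features either way; only the fitting rule changes.
    # Scaling keeps the L2 penalty comparable across feature magnitudes.
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), reg)
    model.fit(X, y)
    test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
    # The L2 penalty shrinks the wild high-order coefficients, so the ridge
    # version should generalize noticeably better on this tiny dataset.
    print(f"{name}: test MSE = {test_mse:.3f}")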


Common Pitfalls

  • Confusing high bias and high variance from test error alone. Always check training error too — that's the diagnostic. High train error + high test error = bias. Low train error + high test error = variance.
  • Thinking more data fixes everything. More data reduces variance but doesn't fix a biased model. If the model structure is wrong, no amount of data helps.
  • Assuming a complex model is always better. More complexity reduces bias but increases variance — especially when data is limited.
  • Tuning on the test set. This collapses the train/test gap artificially, making variance invisible until production.
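
For that last pitfall, the standard guard is a three-way split. A minimal sketch (assuming scikit-learn's train_test_split; the 60/20/20 ratios and synthetic data are illustrative):

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, (100, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 100)

# Hold out a test set that is touched exactly once, at the very end.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# Carve a validation set out of the remainder for all tuning decisions.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)  # 0.25 * 0.8 = 0.2
# Choose degree / regularization strength against (X_val, y_val) only;
# report final performance on (X_test, y_test) once, and never tune on it.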

Examples

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# True function: y = sin(x) + noise
np.random.seed(42)
X = np.sort(np.random.rand(30, 1) * 6, axis=0)
y = np.sin(X.ravel()) + np.random.randn(30) * 0.3

X_test = np.linspace(0, 6, 100).reshape(-1, 1)

# High bias: degree-1 (linear) — too simple for sin
model_bias = make_pipeline(PolynomialFeatures(1), LinearRegression())
model_bias.fit(X, y)
# Will have high train AND test error — can't capture the curve

# Good fit: degree-3
model_good = make_pipeline(PolynomialFeatures(3), LinearRegression())
model_good.fit(X, y)
# Low train and test error

# High variance: degree-15 — memorizes noise
model_var = make_pipeline(PolynomialFeatures(15), LinearRegression())
model_var.fit(X, y)
# Low train error, wild oscillations on test set

for name, model in [("degree-1", model_bias), ("degree-3", model_good), ("degree-15", model_var)]:
    train_err = np.mean((model.predict(X) - y)**2)
    # Test error against the noiseless truth sin(x) on the X_test grid
    test_err = np.mean((model.predict(X_test) - np.sin(X_test.ravel()))**2)
    print(f"{name}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
    # degree-1:  train ~0.25 (high bias: even train error is bad); test stays high too
    # degree-3:  train ~0.08 (good fit); test is low as well
    # degree-15: train ~0.03 (overfits: train looks great, test blows up)

Key diagnostic table:

Model     | Train Error | Test Error | Problem
degree-1  | High        | High       | High bias
degree-3  | Low         | Low        | Ideal
degree-15 | Very low    | Very high  | High variance

Review Questions