Bias vs Variance
Bias and variance are two distinct ways a model can fail. Bias is systematic error: the model is consistently wrong in the same direction, no matter which training set it sees. Variance is sensitivity to the training data: predictions swing depending on which sample the model was trained on, so it looks consistent on seen data but inconsistent on new data.
Intuition First
Imagine you're throwing darts at a target.
- High bias: All your darts land in the same spot — but that spot is far from the bullseye. You're consistently wrong. The systematic error is built into your throwing technique.
- High variance: Your darts are scattered all over the board. You're unpredictable — sometimes close, sometimes way off. You're sensitive to tiny fluctuations in how you throw.
- Low bias, low variance: Darts clustered near the bullseye. This is what you want.
In ML: bias is "consistently wrong"; variance is "inconsistently right."
What's Actually Happening
Bias comes from a model that's too simple to capture the true pattern. It's baked into the model's structure.
Variance comes from a model that's too sensitive — it memorizes the training data, including noise, and fails to generalize.
Both cause bad test-set performance, but for opposite reasons:
- High bias: model can't learn the pattern (underfitting)
- High variance: model learns the noise instead of the pattern (overfitting)
Build the Idea Step-by-Step
Formal Explanation
For a model trained on a dataset D to predict a target y from an input x, the expected squared-error test loss decomposes as:
Expected Test Error = Bias² + Variance + Irreducible Noise
Where:
- Bias² = how far off the average prediction is from the true value
- Variance = how much predictions vary across different training sets
- Irreducible noise = randomness in the data itself — can't be fixed by any model
This decomposition reveals that total error has two controllable components that trade off against each other.
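The decomposition can be estimated empirically: train the same model class on many resampled training sets and look at its predictions at one fixed test point. The spread of those predictions is the variance; the gap between their average and the true value gives the squared bias. A minimal sketch, assuming a sin(x) ground truth and illustrative polynomial degrees (none of this comes from a specific library recipe):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x0 = np.array([[3.0]])    # fixed test input
true_y = np.sin(3.0)      # noiseless target at x0

def estimate(degree, n_sets=200, n_points=30, noise=0.3):
    # Fit the same model class on many independently sampled training sets
    preds = []
    for _ in range(n_sets):
        X = rng.uniform(0, 6, (n_points, 1))
        y = np.sin(X.ravel()) + rng.normal(0, noise, n_points)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X, y)
        preds.append(model.predict(x0)[0])
    preds = np.array(preds)
    bias_sq = (preds.mean() - true_y) ** 2   # average prediction vs truth
    variance = preds.var()                   # spread across training sets
    return bias_sq, variance

for d in (1, 3, 15):
    b, v = estimate(d)
    print(f"degree-{d}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

The degree-1 model shows the largest bias² but small variance; degree-15 flips that pattern, with variance dominating.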
As model complexity increases:
| Complexity | Bias | Variance |
|---|---|---|
| Too low (linear for nonlinear data) | High | Low |
| Just right | Low | Low |
| Too high (overfitting) | Low | High |
Key Properties / Rules
| Signal | Likely Cause |
|---|---|
| Train error high AND test error high | High bias (underfitting) |
| Train error low AND test error high | High variance (overfitting) |
| Train error low AND test error low | Good fit |
| Adding more data doesn't help | High bias (model can't use more data) |
| Adding more data helps | High variance (model was overfitting) |
Why It Matters
The bias-variance tradeoff is the core tension in choosing model complexity:
- A linear model applied to data with a nonlinear pattern → high bias
- A 100-layer network trained on 50 examples → high variance
In practice:
- Regularization, dropout, and early stopping reduce variance
- Deeper networks, more features, and better architectures reduce bias
- More data helps most with high variance
Understanding this tradeoff tells you which direction to push when your model isn't performing well.
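One lever from the list above in action: adding an L2 penalty (Ridge) to an overfit degree-15 polynomial tames its variance without changing the feature set. A sketch, assuming a sin(x) ground truth; the alpha value and the StandardScaler step are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X_test = np.linspace(0, 6, 200).reshape(-1, 1)
y_true = np.sin(X_test.ravel())  # noiseless targets

def avg_test_mse(model, trials=20, n=30, noise=0.3):
    # Average test MSE over freshly sampled small training sets
    errs = []
    for _ in range(trials):
        X = rng.uniform(0, 6, (n, 1))
        y = np.sin(X.ravel()) + rng.normal(0, noise, n)
        model.fit(X, y)
        errs.append(np.mean((model.predict(X_test) - y_true) ** 2))
    return float(np.mean(errs))

# Same degree-15 features; only the penalty differs.
# Scaling the features keeps the L2 penalty comparable across terms.
overfit = make_pipeline(PolynomialFeatures(15), StandardScaler(), LinearRegression())
ridged = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=1.0))

print(f"degree-15, no penalty: avg test MSE = {avg_test_mse(overfit):.3f}")
print(f"degree-15, ridge:      avg test MSE = {avg_test_mse(ridged):.3f}")
```

The penalized model has noticeably lower average test error even though both share the same overparameterized feature space.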
Common Pitfalls
- Confusing high bias and high variance from test error alone. Always check training error too — that's the diagnostic. High train error + high test error = bias. Low train error + high test error = variance.
- Thinking more data fixes everything. More data reduces variance but doesn't fix a biased model. If the model structure is wrong, no amount of data helps.
- Assuming a complex model is always better. More complexity reduces bias but increases variance — especially when data is limited.
- Tuning on the test set. This collapses the train/test gap artificially, making variance invisible until production.
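The last pitfall has a simple structural fix: select complexity on a validation set and touch the held-out test set exactly once. A sketch, assuming a sin(x) ground truth and an illustrative range of polynomial degrees:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(0, 6, (120, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 120)

# Hold out the test set FIRST, then carve validation out of the remainder
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

def mse(m, X_, y_):
    return np.mean((m.predict(X_) - y_) ** 2)

# Choose the degree using the validation set only
best_deg, best_val = None, np.inf
for deg in range(1, 12):
    m = make_pipeline(PolynomialFeatures(deg), LinearRegression()).fit(X_train, y_train)
    v = mse(m, X_val, y_val)
    if v < best_val:
        best_deg, best_val = deg, v

# Touch the test set exactly once, with the chosen model
final = make_pipeline(PolynomialFeatures(best_deg), LinearRegression()).fit(X_train, y_train)
print(f"chosen degree = {best_deg}, test MSE = {mse(final, X_test, y_test):.3f}")
```

Because the test set played no role in choosing the degree, its error is an honest estimate of generalization, and the train/test gap stays visible.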
Examples
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# True function: y = sin(x) + noise
np.random.seed(42)
X = np.sort(np.random.rand(30, 1) * 6, axis=0)
y = np.sin(X.ravel()) + np.random.randn(30) * 0.3
X_test = np.linspace(0, 6, 100).reshape(-1, 1)
y_test = np.sin(X_test.ravel())  # noiseless targets for measuring test error

# High bias: degree-1 (linear) — too simple for sin
model_bias = make_pipeline(PolynomialFeatures(1), LinearRegression())
model_bias.fit(X, y)  # high train AND test error — can't capture the curve

# Good fit: degree-3
model_good = make_pipeline(PolynomialFeatures(3), LinearRegression())
model_good.fit(X, y)  # low train and test error

# High variance: degree-15 — memorizes noise
model_var = make_pipeline(PolynomialFeatures(15), LinearRegression())
model_var.fit(X, y)  # low train error, wild oscillations between training points

for name, model in [("degree-1", model_bias), ("degree-3", model_good), ("degree-15", model_var)]:
    train_err = np.mean((model.predict(X) - y) ** 2)
    test_err = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")

# degree-1:  train ~0.25 (high bias — even train error is bad)
# degree-3:  train ~0.08 (good fit)
# degree-15: train ~0.03 (overfits — train looks great, test error blows up)
```
Key diagnostic table:
| Model | Train Error | Test Error | Problem |
|---|---|---|---|
| degree-1 | High | High | High bias |
| degree-3 | Low | Low | Ideal |
| degree-15 | Very low | Very high | High variance |