Overfitting vs Underfitting
A central goal in supervised machine learning is a model that generalises to unseen data, which means finding the sweet spot between underfitting (high bias) and overfitting (high variance).
Underfitting (High Bias)
The model is too simple to capture the underlying pattern of the data.

* Symptom: Bad performance on both the training set and the test set.
* Cure: Use a more complex model (e.g., switch from Linear Regression to Random Forest), add more features, or reduce regularisation.
Overfitting (High Variance)
The model is too complex and has memorised the training data, including its noise and outliers.

* Symptom: Excellent performance on the training set, but terrible performance on the test set.
* Cure: Get more data, use a simpler model, use regularisation, or use early stopping.
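
The two symptom patterns can be reproduced on a toy problem. The sketch below is illustrative only and makes several assumptions that are not part of this page: the data are synthetic noisy samples of `sin(x)`, the model is a hand-written k-nearest-neighbours regressor (chosen because it needs only the standard library), and the helper names `make_data`, `knn_predict`, and `mse` are hypothetical. Here `k` acts as the complexity knob: a small `k` gives a very flexible model, a large `k` a very rigid one.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Sample n noisy points from y = sin(x) on [0, 2*pi] (synthetic data)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(60, rng)
test_x, test_y = make_data(60, rng)

results = {}
for k in (1, 7, 60):  # flexible, moderate, rigid
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With `k=1` each training point is its own nearest neighbour, so the training error is exactly zero while the test error is not: the overfitting symptom. With `k=60` every prediction is the mean of all training targets, so the error is high on both sets: the underfitting symptom. A moderate `k` sits between the two extremes, which is the kind of trade-off the cures listed above aim for.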
KSB Mapping
| KSB | Description | How This Addresses It |
|---|---|---|
| K4.4 | Resource constraints and trade-offs | Balancing model complexity, performance, and computational cost |
| S1 | Scientific methods and hypothesis testing | Rigorous cross-validation and statistical model comparison |
| S4 | Building models and validating | Systematic hyperparameter tuning and performance evaluation |
| B5 | Impartial, hypothesis-driven approach | Preventing overfitting; honest reporting of generalisation metrics |