Metrics Reference

A quick reference to common evaluation metrics for classification and regression.

Classification Metrics

Metric	Formula / Description	When to Use
Accuracy	Correct predictions / Total predictions	When classes are roughly balanced
Precision	TP / (TP + FP)	When false positives are costly (e.g., spam detection)
Recall	TP / (TP + FN)	When false negatives are costly (e.g., disease detection)
F1 Score	2 × (Precision × Recall) / (Precision + Recall)	When you need a balance between precision and recall
ROC AUC	Area under the ROC curve	Comparing classifiers across all thresholds
Log Loss	-mean(y·log(p) + (1-y)·log(1-p))	When you care about probability calibration
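
As a worked illustration of the formulas in the table, the sketch below counts TP, FP, FN, and TN by hand and derives accuracy, precision, recall, and F1 from them. The labels are made-up sample data, and the code assumes binary labels with 1 as the positive class. It does not guard against a zero denominator (for example, a model that never predicts the positive class).

```python
# Made-up binary labels; 1 is assumed to be the positive class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))

# Confusion-matrix counts.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

# Formulas from the table above.
accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} "
      f"Recall={recall:.3f} F1={f1:.3f}")
```

With these labels the counts are TP = 3, FP = 1, FN = 1, TN = 3, so all four metrics come out to 0.75.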

Regression Metrics

Metric	Formula / Description	When to Use
MAE	Mean of |actual - predicted|	Less sensitive to outliers than MSE; easy to interpret
MSE	Mean of (actual - predicted)²	Penalises large errors heavily
RMSE	√MSE	Same units as the target variable
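R²	1 - (SS_res / SS_tot)	Proportion of variance explained; 1 is a perfect fit, 0 matches always predicting the mean, and negative values mean the model does worse than that
MAPE	Mean of |error / actual| × 100	Percentage error, useful for comparing across scales; undefined when any actual value is 0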
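
To make the regression formulas concrete, here is a minimal sketch that computes each metric by hand in plain Python. The values are made-up sample data, and the MAPE line assumes no actual value is zero.

```python
import math

# Made-up sample data for illustration.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.2, 2.1, 6.5]
n = len(y_true)

errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # same units as the target

# R² compares the residual sum of squares with the total sum of
# squares around the mean of the actual values.
mean_true = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# MAPE divides by the actual values, so it breaks if any of them is 0.
mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n * 100

print(f"MAE={mae:.3f} MSE={mse:.4f} RMSE={rmse:.3f}")
print(f"R2={r2:.3f} MAPE={mape:.2f}%")
```

For this data the results are MAE = 0.325, MSE = 0.1225, RMSE = 0.350, R² ≈ 0.961, and MAPE ≈ 8.45%.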

Quick Implementation

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    mean_absolute_error, mean_squared_error, r2_score
)
import numpy as np

# Classification
y_true_cls = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_cls = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true_cls, y_pred_cls):.3f}")
print(f"Precision: {precision_score(y_true_cls, y_pred_cls):.3f}")
print(f"Recall:    {recall_score(y_true_cls, y_pred_cls):.3f}")
print(f"F1:        {f1_score(y_true_cls, y_pred_cls):.3f}")

# Regression
y_true_reg = np.array([3.0, 5.0, 2.5, 7.0])
y_pred_reg = np.array([2.8, 5.2, 2.1, 6.5])

print(f"\nMAE:  {mean_absolute_error(y_true_reg, y_pred_reg):.3f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_true_reg, y_pred_reg)):.3f}")
print(f"R²:   {r2_score(y_true_reg, y_pred_reg):.3f}")
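
ROC AUC and log loss from the classification table need predicted probabilities rather than hard labels. A minimal sketch, assuming scikit-learn's `roc_auc_score` and `log_loss`; the `y_prob` values are hypothetical positive-class probabilities invented for illustration, not the output of a real model.

```python
from sklearn.metrics import log_loss, roc_auc_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
# Hypothetical predicted probabilities of the positive class (label 1).
y_prob = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]

# ROC AUC: the fraction of (positive, negative) pairs in which the
# positive example receives the higher score.
auc = roc_auc_score(y_true, y_prob)

# Log loss: penalises confident but wrong probabilities most heavily.
ll = log_loss(y_true, y_prob)

print(f"ROC AUC:  {auc:.4f}")
print(f"Log loss: {ll:.4f}")
```

Here 15 of the 16 positive/negative pairs are ranked correctly, so ROC AUC is 0.9375; the log loss is about 0.400.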

Common Pitfall

Do not rely on accuracy alone for imbalanced data. A model that always predicts the majority class achieves high accuracy yet never identifies a single minority-class example. Report F1, precision and recall, or ROC AUC instead.
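
The effect is easy to reproduce. The sketch below uses a made-up dataset with 5 positives and 95 negatives and a hypothetical "model" that always predicts the majority class.

```python
# Made-up imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A hypothetical baseline that always predicts the majority class (0).
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
recall = tp / (tp + fn)

# Precision is left out: with no positive predictions, TP + FP is 0
# and the ratio is undefined.
print(f"Accuracy: {accuracy:.2f}")
print(f"Recall:   {recall:.2f}")
```

Accuracy is 0.95 even though recall is 0.00: the baseline misses every positive case.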

KSB Mapping

KSB	Description	How This Addresses It
K4.4	Resource constraints and trade-offs	Balancing model complexity, performance, and computational cost
S1	Scientific methods and hypothesis testing	Rigorous cross-validation and statistical model comparison
S4	Building models and validating	Systematic hyperparameter tuning and performance evaluation
B5	Impartial, hypothesis-driven approach	Preventing overfitting; honest reporting of generalisation metrics