Regression vs Classification
Supervised ML problems fall into two main categories, and the type of your target variable determines which one you have.
Classification
Your target variable is categorical — it belongs to a discrete set of classes.
| Example | Target | Classes |
|---|---|---|
| Spam detection | is_spam | 0 (No), 1 (Yes) |
| Disease diagnosis | condition | "Healthy", "Diabetic", "Pre-diabetic" |
| Customer segment | tier | "Gold", "Silver", "Bronze" |
Algorithms: LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, SVC
Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC
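To make the label-based metrics above concrete, here is a minimal sketch that computes accuracy, precision, recall and F1 by hand for a binary target. The function name `classification_metrics` and the `is_spam` labels are made up for illustration. In practice you would more likely call `accuracy_score`, `precision_score`, `recall_score` and `f1_score` from `sklearn.metrics`. ROC-AUC is left out because it needs predicted scores or probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary target.

    Hypothetical helper for illustration; not part of any library.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (one common convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up is_spam labels: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are three true positives, one false positive and one false negative, so all four metrics come out at 0.75.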
Regression
Your target variable is continuous — it can take any numeric value within a range.
| Example | Target | Range |
|---|---|---|
| House pricing | sale_price | £50,000 – £2,000,000 |
| Temperature forecasting | temp_celsius | -30.0 – 50.0 |
| Revenue prediction | monthly_revenue | £0 – £10,000,000 |
Algorithms: LinearRegression, Ridge, Lasso, RandomForestRegressor, GradientBoostingRegressor
Metrics: MAE, MSE, RMSE, R²
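The four regression metrics listed above can be sketched in a few lines of plain Python. The helper name `regression_metrics` and the `sale_price` values are invented for illustration. `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error` and `r2_score` for real work.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for a continuous target.

    Hypothetical helper for illustration. Assumes y_true is not constant,
    otherwise the R² denominator would be zero.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # same units as the target

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up sale_price values in pounds.
y_true = [200_000, 300_000, 250_000, 400_000]
y_pred = [210_000, 290_000, 260_000, 380_000]

scores = regression_metrics(y_true, y_pred)
print(scores)
```

MAE and RMSE are in the target's own units (pounds here), which usually makes them easier to explain than MSE. R² is unitless and compares the model against always predicting the mean.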
The Decision Rule
Ask yourself one question: "Can I meaningfully average two target values?"
- If averaging makes sense (e.g., the average of £200k and £300k is £250k), it is regression.
- If averaging is nonsensical (e.g., the average of "Cat" and "Dog" is meaningless), it is classification.
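The averaging question can be tried out directly. The sketch below uses a hypothetical helper, `try_average`, that returns `None` when two values cannot be averaged at all. Note that it only checks whether the arithmetic runs; whether the result is meaningful is still a judgment you have to make about the target.

```python
def try_average(a, b):
    """Return the average of two target values, or None if it cannot be computed.

    Hypothetical helper for illustration only.
    """
    try:
        return (a + b) / 2
    except TypeError:
        # Raised for string labels: "Cat" + "Dog" concatenates, but a
        # string cannot be divided by 2.
        return None


print(try_average(200_000, 300_000))  # 250000.0, a sensible house price
print(try_average("Cat", "Dog"))      # None, no average exists
print(try_average(0, 1))              # 0.5, computable but not a valid class
```

The last call is the case to watch: 0/1 labels average to 0.5, which is neither class, so the target is still categorical even though the arithmetic succeeds.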
Common Pitfall
Binary targets encoded as 0/1 are still classification, not regression — even though they look numeric. Use LogisticRegression, not LinearRegression.
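One way to see why a linear model is a poor fit for 0/1 labels is to fit a straight line to them and look at its predictions. The sketch below uses a hand-written one-feature least-squares fit rather than scikit-learn's `LinearRegression`, to stay dependency-free. The feature values and the names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: one numeric feature and a 0/1 class label.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Prediction of the fitted straight line at x."""
    return slope * x + intercept


print(predict(0))   # below 0
print(predict(12))  # above 1
```

The fitted line gives values below 0 and above 1 at the edges of the feature range, and these cannot be read as class probabilities. Logistic regression avoids this by passing the linear score through the sigmoid function, so its output always lies strictly between 0 and 1.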
KSB Mapping
| KSB | Description | How This Addresses It |
|---|---|---|
| K4.1 | Statistical models and methods | Understanding the statistical basis of regression and classification |
| K4.2 | ML and AI techniques | Implementing and comparing supervised learning algorithms |
| K4.4 | Resource constraints and trade-offs | Model complexity vs interpretability; computational cost |
| S1 | Scientific methods and hypothesis testing | Formulating hypotheses and testing with rigorous validation |
| S4 | Building models and validating | Cross-validation, train/test evaluation, performance metrics |
| B5 | Impartial, hypothesis-driven approach | Honest evaluation of model performance and limitations |