# Advanced Tree-Based Methods
Beyond standard Random Forests lie specialised architectures designed for speed, isolation detection, or extreme randomisation.
## ExtraTrees (Extremely Randomised Trees)
A standard Random Forest searches for the best split at each node. ExtraTrees instead picks split thresholds at random, which is faster and often reduces variance further.
```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
et = ExtraTreesClassifier(n_estimators=100, random_state=42)
score = cross_val_score(et, X, y, cv=5, scoring="accuracy").mean()
print(f"ExtraTrees CV Accuracy: {score:.3f}")
```
**When to use:** when you want faster training than a Random Forest with comparable (sometimes better) accuracy.
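To see the speed/accuracy trade-off concretely, you can run both ensembles on the same synthetic dataset. The sketch below is illustrative only; exact timings depend on your hardware and scikit-learn version:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Fit both ensembles with identical settings and compare CV accuracy and wall time
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    clf = Model(n_estimators=100, random_state=42)
    start = time.perf_counter()
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    elapsed = time.perf_counter() - start
    print(f"{Model.__name__}: accuracy={acc:.3f}, fit+CV time={elapsed:.2f}s")
```

ExtraTrees skips the exhaustive split search, so its per-tree fit is typically cheaper at similar accuracy.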
## Isolation Forest (Anomaly Detection)
Isolation Forest detects outliers by building random trees and measuring how quickly each observation is isolated. Anomalies are isolated in fewer splits because they sit in sparse regions of the feature space.
```python
from sklearn.ensemble import IsolationForest
import numpy as np

# Generate data with outliers
rng = np.random.default_rng(42)
X_normal = rng.normal(size=(200, 2))
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X_all = np.vstack([X_normal, X_outliers])

iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(X_all)  # -1 = outlier, 1 = inlier
print(f"Outliers detected: {(labels == -1).sum()}")
**When to use:** fraud detection, sensor anomaly detection, or any scenario where you need to flag unusual observations without labelled data.
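When a hard inlier/outlier label is too coarse, `IsolationForest.score_samples` returns a continuous anomaly score you can use to rank observations (lower, i.e. more negative, means more anomalous). A minimal sketch on the same synthetic data as above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Same setup as above: 200 inliers plus 20 uniform outliers
rng = np.random.default_rng(42)
X_all = np.vstack([
    rng.normal(size=(200, 2)),
    rng.uniform(low=-4, high=4, size=(20, 2)),
])

iso = IsolationForest(contamination=0.1, random_state=42)
iso.fit(X_all)

# score_samples: continuous anomaly score, lower = more anomalous
scores = iso.score_samples(X_all)
worst = np.argsort(scores)[:5]  # indices of the 5 most anomalous points
print("Most anomalous indices:", worst)
print("Their scores:", scores[worst].round(3))
```

Ranking by score lets you review the top-N suspicious cases first instead of committing to a single contamination threshold.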
## Comparison Table
| Algorithm | Split Strategy | Primary Use | Speed |
|---|---|---|---|
| `RandomForestClassifier` | Best split from a random feature subset | General classification/regression | Medium |
| `ExtraTreesClassifier` | Random split thresholds | Same as RF, faster training | Fast |
| `IsolationForest` | Fully random feature/threshold splits; path length measures isolation | Anomaly / outlier detection | Fast |
**Assessment Connection:** demonstrating awareness of specialised tree variants (not just Random Forest) shows breadth of algorithmic knowledge for your EPA.
## KSB Mapping
| KSB | Description | How This Addresses It |
|---|---|---|
| K4.2 | Advanced ML techniques | Tree-based models, ensemble methods, KNN, SVM |
| K4.4 | Trade-offs in selecting algorithms | Comparing parametric vs non-parametric approaches |
| S4 | ML and optimisation | Hyperparameter tuning, ensemble construction, model selection |
| B1 | Curiosity and creativity | Exploring when non-parametric methods outperform parametric ones |
| B5 | Integrity in presenting conclusions | Avoiding overfitting; honest reporting of generalisation performance |