Skip to content

Advanced Tree-Based Methods

Beyond standard Random Forests lie specialised architectures designed for speed, isolation detection, or extreme randomisation.

ExtraTrees (Extremely Randomised Trees)

A standard Random Forest searches for the best split at each node. ExtraTrees instead picks split thresholds at random, which is faster and often reduces variance further.

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

et = ExtraTreesClassifier(n_estimators=100, random_state=42)
score = cross_val_score(et, X, y, cv=5, scoring="accuracy").mean()
print(f"ExtraTrees CV Accuracy: {score:.3f}")

When to use: When you want faster training than Random Forest with comparable (sometimes better) accuracy.

Isolation Forest (Anomaly Detection)

Isolation Forest detects outliers by building random trees and measuring how quickly each observation is isolated. Anomalies are isolated in fewer splits because they sit in sparse regions of the feature space.

from sklearn.ensemble import IsolationForest
import numpy as np

# Generate data with outliers
rng = np.random.default_rng(42)
X_normal = rng.normal(size=(200, 2))
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X_all = np.vstack([X_normal, X_outliers])

iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(X_all)  # -1 = outlier, 1 = inlier
print(f"Outliers detected: {(labels == -1).sum()}")

When to use: Fraud detection, sensor anomaly detection, or any scenario where you need to flag unusual observations without labelled data.

Comparison Table

Algorithm Split Strategy Primary Use Speed
RandomForestClassifier Best split from random feature subset General classification/regression Medium
ExtraTreesClassifier Random split thresholds Same as RF, faster training Fast
IsolationForest Random isolation depth Anomaly / outlier detection Fast

Assessment Connection

Demonstrating awareness of specialised tree variants (not just Random Forest) shows breadth of algorithmic knowledge for your EPA.

KSB Mapping

KSB Description How This Addresses It
K4.2 Advanced ML techniques Tree-based models, ensemble methods, KNN, SVM
K4.4 Trade-offs in selecting algorithms Comparing parametric vs non-parametric approaches
S4 ML and optimisation Hyperparameter tuning, ensemble construction, model selection
B1 Curiosity and creativity Exploring when non-parametric methods outperform parametric ones
B5 Integrity in presenting conclusions Avoiding overfitting; honest reporting of generalisation performance