Optuna — Bayesian Hyperparameter Optimisation¶

Optuna is a modern, Bayesian optimisation framework that intelligently searches the hyperparameter space — far more efficient than Grid or Random Search.

Why Optuna?¶

Bayesian optimisation: Learns from previous trials to focus on promising regions.
Pruning: Automatically stops unpromising trials early.
Flexible: Works with any ML framework (scikit-learn, XGBoost, LightGBM, PyTorch).

Installation¶

pip install optuna

Implementation¶

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)


def objective(trial):
    """Optuna objective function — returns the metric to optimise."""
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 20),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 20),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    model = RandomForestClassifier(**params, random_state=42)
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    return score


# Run optimisation
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best Accuracy: {study.best_value:.4f}")
print(f"Best Params: {study.best_params}")

With XGBoost¶

from xgboost import XGBClassifier


def xgb_objective(trial):
    """Optimise XGBoost hyperparameters."""
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }
    model = XGBClassifier(**params, random_state=42, eval_metric="logloss")
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    return score


study = optuna.create_study(direction="maximize")
study.optimize(xgb_objective, n_trials=50)

Visualisation¶

# Plot optimisation history
optuna.visualization.plot_optimization_history(study)

# Plot parameter importance
optuna.visualization.plot_param_importances(study)

Workplace Tip

Optuna with 50–100 trials typically outperforms GridSearchCV with thousands of combinations. Use it as your default tuning strategy for production models.

KSB Mapping¶

KSB	Description	How This Addresses It
K4.4	Resource constraints and trade-offs	Balancing model complexity, performance, and computational cost
S1	Scientific methods and hypothesis testing	Rigorous cross-validation and statistical model comparison
S4	Building models and validating	Systematic hyperparameter tuning and performance evaluation
B5	Impartial, hypothesis-driven approach	Preventing overfitting; honest reporting of generalisation metrics