Grid Search & Random Search

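Hyperparameter tuning is the process of finding the model configuration (settings fixed before training, such as tree depth or number of trees) that gives the best cross-validated performance. Grid Search and Random Search are the two foundational approaches.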

Grid Search (Exhaustive)

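GridSearchCV tries every combination in the parameter grid. It is thorough but expensive: a grid with 4 parameters × 5 values each tests 5⁴ = 625 combinations, and with 5-fold cross-validation that is 3,125 model fits. The param_grid used in this section is much smaller: 3 × 3 × 3 = 27 combinations, or 135 fits with cv=5.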
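The arithmetic for a 4-parameter, 5-value grid can be checked without fitting any model. This is a minimal sketch that assumes scikit-learn's ParameterGrid helper; the parameter names and values in big_grid are hypothetical, chosen only to give 4 parameters with 5 values each.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 4 parameters x 5 values each (illustrative values only)
big_grid = {
    "n_estimators": [50, 100, 200, 300, 400],
    "max_depth": [3, 5, 10, 15, 20],
    "min_samples_split": [2, 5, 10, 15, 20],
    "min_samples_leaf": [1, 2, 4, 6, 8],
}

cv_folds = 5
n_combinations = len(ParameterGrid(big_grid))  # product of the list lengths
n_fits = n_combinations * cv_folds             # one fit per combination per fold

print(f"Combinations: {n_combinations}")
print(f"Fits with {cv_folds}-fold CV: {n_fits}")
```

Running the same count on a grid before calling fit is a cheap way to estimate how long a search will take.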

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 5, 10]
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1
)
grid.fit(X, y)

print(f"Best params: {grid.best_params_}")
print(f"Best CV Accuracy: {grid.best_score_:.4f}")
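Beyond best_params_, a fitted search keeps the score of every combination in cv_results_. The sketch below ranks all combinations by mean cross-validated accuracy. It is self-contained and deliberately uses a smaller dataset and grid than the example above (an assumption made only to keep it quick to run); the names X_small, y_small, small_grid and search are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small dataset and grid (assumed values, chosen only for speed)
X_small, y_small = make_classification(n_samples=300, n_features=20, random_state=42)
small_grid = {"n_estimators": [10, 30], "max_depth": [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    small_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_small, y_small)

results = search.cv_results_
# Pair each combination with its rank, mean score and fold-to-fold spread
rows = sorted(
    zip(
        results["rank_test_score"],
        results["mean_test_score"],
        results["std_test_score"],
        results["params"],
    ),
    key=lambda row: row[0],
)
for rank, mean, std, params in rows:
    print(f"#{rank}  {mean:.4f} ± {std:.4f}  {params}")
```

When the top few mean scores differ by less than their standard deviation across folds, the combinations are hard to tell apart, and the cheaper or simpler one may be the more sensible choice.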

Random Search (Efficient)

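RandomizedSearchCV samples a fixed number of random combinations from parameter distributions. Because n_iter sets the number of candidates, the cost stays the same however large the search space is, and in practice it often finds results as good as a grid search with far fewer fits.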

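# Reuses X, y and RandomForestClassifier from the Grid Search example above
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# randint(low, high) samples integers from low up to high - 1
param_distributions = {
    "n_estimators": randint(50, 500),
    "max_depth": randint(3, 20),
    "min_samples_split": randint(2, 20),
    "min_samples_leaf": randint(1, 10)
}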

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=50,          # Try 50 random combinations
    cv=5,
    scoring="accuracy",
    random_state=42,
    n_jobs=-1
)
random_search.fit(X, y)

print(f"Best params: {random_search.best_params_}")
print(f"Best CV Accuracy: {random_search.best_score_:.4f}")

Grid vs Random

Aspect	Grid Search	Random Search
Coverage	Exhaustive	Sampled
Speed	Slow for large grids	Tuneable via `n_iter`
Best for	Small, focused grids	Large parameter spaces
Discovery	Only tests specified values	Can discover unexpected optima

Workplace Tip

Start with RandomizedSearchCV(n_iter=50) to identify promising regions, then refine with a focused GridSearchCV around the best values.
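The coarse-to-fine workflow in this tip might look like the sketch below. It assumes a small synthetic dataset and a small n_iter so that it runs quickly; neighbourhood is a hypothetical helper, not part of scikit-learn, and the step sizes are arbitrary illustrative choices.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=42)


def neighbourhood(value, step, minimum):
    """Hypothetical helper: value plus one step either side, clipped at minimum."""
    value = int(value)
    return sorted({max(minimum, value - step), value, value + step})


# Stage 1: coarse random search over wide ranges (small n_iter for speed)
coarse = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": randint(20, 120), "max_depth": randint(3, 20)},
    n_iter=8,
    cv=3,
    scoring="accuracy",
    random_state=42,
)
coarse.fit(X_demo, y_demo)
best = coarse.best_params_

# Stage 2: focused grid centred on the best coarse values
fine_grid = {
    "n_estimators": neighbourhood(best["n_estimators"], step=10, minimum=10),
    "max_depth": neighbourhood(best["max_depth"], step=1, minimum=1),
}
fine = GridSearchCV(
    RandomForestClassifier(random_state=42),
    fine_grid,
    cv=3,
    scoring="accuracy",
)
fine.fit(X_demo, y_demo)

print(f"Coarse best: {best} ({coarse.best_score_:.4f})")
print(f"Fine best:   {fine.best_params_} ({fine.best_score_:.4f})")
```

Because the focused grid contains the coarse optimum and both stages use the same folds, the refined score should not be worse than the coarse one. Any gain is measured on the same data, though, so a small improvement may simply be noise.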

KSB Mapping

KSB	Description	How This Addresses It
K4.4	Resource constraints and trade-offs	Balancing model complexity, performance, and computational cost
S1	Scientific methods and hypothesis testing	Rigorous cross-validation and statistical model comparison
S4	Building models and validating	Systematic hyperparameter tuning and performance evaluation
B5	Impartial, hypothesis-driven approach	Preventing overfitting; honest reporting of generalisation metrics
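One way to make the "honest reporting of generalisation metrics" row concrete is to keep a test split that the search never sees, since best_score_ comes from the same data that was used to choose the hyperparameters. This is only a sketch: the 80/20 split, the small grid and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_all, y_all = make_classification(n_samples=500, n_features=20, random_state=42)

# Hold out 20% that the search never sees (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, stratify=y_all, random_state=42
)

tuned = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [20, 50], "max_depth": [3, 5]},
    cv=3,
    scoring="accuracy",
)
tuned.fit(X_train, y_train)  # tuning uses the training split only

cv_accuracy = tuned.best_score_              # score used to select hyperparameters
test_accuracy = tuned.score(X_test, y_test)  # refitted best model on untouched data

print(f"Best CV accuracy:  {cv_accuracy:.4f}")
print(f"Held-out accuracy: {test_accuracy:.4f}")
```

Reporting both numbers side by side shows how much of the cross-validated score carries over to data that played no part in tuning.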