Grid Search & Random Search

Hyperparameter tuning is the process of finding the best configuration for your model. Grid Search and Random Search are the two foundational approaches.

Grid Search (Exhaustive)

GridSearchCV tries every combination in the parameter grid. It is thorough but expensive — a grid with 4 parameters and 5 values each tests 5⁴ = 625 combinations, and each combination is refit once per cross-validation fold.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 5, 10]
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1
)
grid.fit(X, y)

print(f"Best params: {grid.best_params_}")
print(f"Best CV Accuracy: {grid.best_score_:.4f}")
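Beyond `best_params_`, the fitted search object exposes `cv_results_`, which records the mean and standard deviation of the CV score for every combination tried. A minimal sketch (using a smaller grid than above so it runs quickly — the grid values here are illustrative, not recommendations):

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [50, 100], "max_depth": [3, 5]},
    cv=3,
    n_jobs=-1,
)
grid.fit(X, y)

# cv_results_ holds one row per parameter combination;
# rank_test_score == 1 marks the winner
results = pd.DataFrame(grid.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
]
print(top.head())
```

Inspecting the full table, rather than just the single best score, shows how sensitive the model is to each parameter and whether the top combinations are within noise of each other.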

Random Search (Efficient)

RandomizedSearchCV samples a fixed number of random combinations from parameter distributions. It is faster and often finds equally good results.

from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

param_distributions = {
    "n_estimators": randint(50, 500),
    "max_depth": randint(3, 20),
    "min_samples_split": randint(2, 20),
    "min_samples_leaf": randint(1, 10)
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=50,          # Try 50 random combinations
    cv=5,
    scoring="accuracy",
    random_state=42,
    n_jobs=-1
)
random_search.fit(X, y)

print(f"Best params: {random_search.best_params_}")
print(f"Best CV Accuracy: {random_search.best_score_:.4f}")
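With the default `refit=True`, both search classes retrain a final model on the full training data using the winning parameters and expose it as `best_estimator_`, ready for prediction. A sketch with a held-out test set (small `n_iter` and ranges chosen only to keep the example fast):

```python
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import randint

X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": randint(50, 150), "max_depth": randint(3, 10)},
    n_iter=5,
    cv=3,
    random_state=42,
    n_jobs=-1,
)
search.fit(X_train, y_train)

# refit=True (the default) means best_estimator_ is already retrained
# on all of X_train with the best parameters found
best_model = search.best_estimator_
test_acc = best_model.score(X_test, y_test)
print(f"Held-out accuracy: {test_acc:.4f}")
```

Scoring on a held-out set matters because `best_score_` is a cross-validated estimate selected over many candidates, so it tends to be slightly optimistic.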

Grid vs Random

| Aspect    | Grid Search                 | Random Search                  |
|-----------|-----------------------------|--------------------------------|
| Coverage  | Exhaustive                  | Sampled                        |
| Speed     | Slow for large grids        | Tuneable via n_iter            |
| Best for  | Small, focused grids        | Large parameter spaces         |
| Discovery | Only tests specified values | Can discover unexpected optima |

Workplace Tip

Start with RandomizedSearchCV(n_iter=50) to identify promising regions, then refine with a focused GridSearchCV around the best values.
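The coarse-to-fine workflow above can be sketched end to end. The ranges, offsets, and iteration counts below are illustrative assumptions, not tuned values:

```python
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import randint

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
rf = RandomForestClassifier(random_state=42)

# Stage 1: coarse random search over wide ranges
coarse = RandomizedSearchCV(
    rf,
    {"n_estimators": randint(50, 300), "max_depth": randint(3, 15)},
    n_iter=10,
    cv=3,
    random_state=42,
    n_jobs=-1,
)
coarse.fit(X, y)
best = coarse.best_params_

# Stage 2: focused grid centred on the coarse winner
fine = GridSearchCV(
    rf,
    {
        "n_estimators": [max(10, best["n_estimators"] - 25),
                         best["n_estimators"],
                         best["n_estimators"] + 25],
        "max_depth": [max(1, best["max_depth"] - 1),
                      best["max_depth"],
                      best["max_depth"] + 1],
    },
    cv=3,
    n_jobs=-1,
)
fine.fit(X, y)
print(f"Refined params: {fine.best_params_}")
```

The random stage maps out which region of the space is promising; the grid stage then spends its budget densely inside that region instead of across the whole space.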

KSB Mapping

| KSB  | Description                               | How This Addresses It                                           |
|------|-------------------------------------------|-----------------------------------------------------------------|
| K4.4 | Resource constraints and trade-offs       | Balancing model complexity, performance, and computational cost |
| S1   | Scientific methods and hypothesis testing | Rigorous cross-validation and statistical model comparison      |
| S4   | Building models and validating            | Systematic hyperparameter tuning and performance evaluation     |
| B5   | Impartial, hypothesis-driven approach     | Preventing overfitting; honest reporting of generalisation metrics |