How to Choose the Right \(k\) Value

The \(k\) hyperparameter in k-Nearest Neighbours dictates whether your algorithm overfits (low \(k\)) or underfits (high \(k\)). Choosing the right value is critical.

The Tradeoff

| \(k\) Value | Behaviour | Risk |
| --- | --- | --- |
| Low (e.g., 1–3) | Highly sensitive to individual data points | Overfitting: noisy, jagged decision boundaries |
| High (e.g., 50+) | Over-smoothed, ignores local patterns | Underfitting: the model defaults to the majority class |
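This tradeoff shows up directly when you compare training accuracy with cross-validated accuracy at the two extremes. A quick sketch on synthetic data (the dataset and the extreme values \(k=1\) and \(k=75\) are illustrative, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

results = {}
for k in (1, 75):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    train_acc = knn.score(X, y)                       # accuracy on the training data itself
    cv_acc = cross_val_score(knn, X, y, cv=5).mean()  # accuracy on held-out folds
    results[k] = (train_acc, cv_acc)
    print(f"k={k}: train={train_acc:.3f}, cv={cv_acc:.3f}")
```

With \(k=1\), every training point's nearest neighbour is itself, so training accuracy is perfect while cross-validated accuracy lags well behind: the signature of overfitting. With a very large \(k\), both scores sag together as local structure is smoothed away.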

The Elbow Method

Sweep a range of \(k\) values using cross-validation and plot accuracy against \(k\). Choose the value where accuracy plateaus — the "elbow" of the curve.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

k_range = range(1, 31)
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in k_range]

plt.figure(figsize=(8, 4))
plt.plot(k_range, scores, marker="o")
plt.xlabel("k")
plt.ylabel("Cross-Validated Accuracy")
plt.title("Elbow Plot for k Selection")
plt.tight_layout()
plt.show()
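Reading the elbow off a plot is subjective. A simple programmatic fallback (a sketch, not part of the listing above; it repeats the same sweep so it runs on its own, and the tolerance value is illustrative) is to take the smallest \(k\) whose cross-validated score is within a small tolerance of the best score:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

k_range = range(1, 31)
scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in k_range]

best = max(scores)
tol = 0.005  # illustrative tolerance: "close enough" to the best score
# Smallest k within tolerance of the best — preferring smoother,
# higher-k models would instead scan k_range in reverse.
chosen_k = next(k for k, s in zip(k_range, scores) if s >= best - tol)
print(f"Chosen k: {chosen_k}")
```

Preferring the smallest qualifying \(k\) keeps the model as flexible as the data justifies; widening `tol` pushes the choice towards smoother, larger-\(k\) models.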

Using GridSearchCV

For a more automated approach, let scikit-learn select the best \(k\) for you:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9, 11, 15, 21]},
    cv=5,
    scoring="accuracy"
)
grid.fit(X, y)
print(f"Best k: {grid.best_params_['n_neighbors']}")
print(f"Best CV Accuracy: {grid.best_score_:.3f}")
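Because `GridSearchCV` refits the winning model on the full dataset by default (`refit=True`), the tuned classifier is immediately usable via `best_estimator_`. A self-contained sketch, reusing the same illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9, 11, 15, 21]},
    cv=5,
    scoring="accuracy"
)
grid.fit(X, y)

# With the default refit=True, best_estimator_ has already been
# refitted on all of X, y using the best n_neighbors found.
best_knn = grid.best_estimator_
print(best_knn.predict(X[:5]))
```

In practice you would call `predict` on genuinely unseen data, not on rows of `X`; the slice here only demonstrates the API.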

Workplace Tip

In binary classification, prefer an odd value for \(k\) so the neighbour vote cannot split evenly. Note that an odd \(k\) does not rule out ties in multi-class problems; distance weighting (`weights="distance"`) is a more general remedy.
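Why ties matter: with an even \(k\), the vote can split exactly 50/50, and the "winner" is then decided by an arbitrary tie-break rather than by the data. A minimal sketch with toy 1-D data chosen so the four nearest neighbours split evenly:

```python
from sklearn.neighbors import KNeighborsClassifier

# Two points of each class, placed symmetrically around the query point 1.5
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=4).fit(X, y)
proba = knn.predict_proba([[1.5]])[0]
print(proba)  # the four neighbours split the vote 0.5 / 0.5
```

With \(k=3\) (or any odd \(k\)) the same query would get a strict majority, so the prediction reflects the neighbourhood rather than a tie-break rule.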

KSB Mapping

| KSB | Description | How This Addresses It |
| --- | --- | --- |
| K4.2 | Advanced ML techniques | Tree-based models, ensemble methods, KNN, SVM |
| K4.4 | Trade-offs in selecting algorithms | Comparing parametric vs non-parametric approaches |
| S4 | ML and optimisation | Hyperparameter tuning, ensemble construction, model selection |
| B1 | Curiosity and creativity | Exploring when non-parametric methods outperform parametric ones |
| B5 | Integrity in presenting conclusions | Avoiding overfitting; honest reporting of generalisation performance |