
Find the Optimal Number of Clusters

Use the Elbow Method and Silhouette Score together to select the right number of clusters for k-Means.

The Elbow Method

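Plot the inertia (within-cluster sum of squares) for increasing values of \(k\). Inertia generally keeps falling as \(k\) grows, so the minimum is not informative on its own. Look instead for the "elbow", the point where the rate of decrease sharply levels off: it suggests a reasonable cluster count.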

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

inertias = []
K_range = range(1, 11)

for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init="auto")
    km.fit(X)
    inertias.append(km.inertia_)

plt.figure(figsize=(8, 4))
plt.plot(K_range, inertias, "bo-")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method")
plt.tight_layout()
plt.show()
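
Reading the elbow off a plot by eye can be backed up with a simple heuristic. The sketch below is one possible approach, not part of scikit-learn: it rescales the curve to the unit square and picks the \(k\) whose point lies farthest from the straight line joining the first and last points. The helper name `elbow_by_chord_distance` is hypothetical, and `n_init=10` is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

ks = np.arange(1, 11)
inertias = np.array([
    KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_
    for k in ks
])


def elbow_by_chord_distance(ks, values):
    """Hypothetical helper: return the k farthest from the first-to-last chord."""
    # Rescale both axes to [0, 1] so that neither dominates the distance.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (values - values.min()) / (values.max() - values.min())
    # Direction of the chord joining the first and last points of the curve.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    # Perpendicular distance of every point to that chord.
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[np.argmax(dist)])


elbow_k = elbow_by_chord_distance(ks, inertias)
print("Suggested elbow at k =", elbow_k)
```

Treat the result as a suggestion to compare against the plot rather than a definitive answer; on smooth curves with no clear bend the heuristic still returns some value.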

The Silhouette Score

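The Silhouette Score measures how similar each point is to its own cluster compared with the nearest neighbouring cluster. Values range from -1 (the point probably belongs to a different cluster) through 0 (the point sits on the boundary between clusters) to +1 (well-clustered). `silhouette_score` returns the mean over all points.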
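
For reference, the standard per-point definition can be written as follows. The symbols \(a(i)\) and \(b(i)\) are notation introduced here, not taken from the text above: \(a(i)\) is the mean distance from point \(i\) to the other points in its own cluster, and \(b(i)\) is the mean distance from point \(i\) to the points of the nearest other cluster.

\[
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
\]

When \(a(i)\) is much smaller than \(b(i)\), the score approaches +1; when the two are about equal, it is close to 0; when \(a(i)\) exceeds \(b(i)\), it turns negative.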

from sklearn.metrics import silhouette_score

scores = []
K_range = range(2, 11)  # Silhouette needs at least 2 clusters

for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init="auto")
    labels = km.fit_predict(X)
    scores.append(silhouette_score(X, labels))

plt.figure(figsize=(8, 4))
plt.plot(K_range, scores, "ro-")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Silhouette Score")
plt.title("Silhouette Analysis")
plt.tight_layout()
plt.show()
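
The plot can also be summarised numerically. As a minimal sketch, assuming the same synthetic data as above, the following picks the \(k\) with the highest mean silhouette score. The names `candidate_ks` and `best_k` are illustrative, and `n_init=10` is an arbitrary but common setting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

candidate_ks = list(range(2, 11))  # silhouette is undefined for a single cluster
scores = []

for k in candidate_ks:
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores.append(silhouette_score(X, labels))

# The candidate with the highest mean silhouette score.
best_k = candidate_ks[int(np.argmax(scores))]

for k, score in zip(candidate_ks, scores):
    marker = "  <- highest" if k == best_k else ""
    print(f"k={k:2d}  silhouette={score:.3f}{marker}")
```

If two neighbouring values of \(k\) score almost the same, the maximum alone is weak evidence, and the elbow plot and domain knowledge should break the tie.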

Common Pitfall

The Elbow Method is subjective — the "elbow" is not always obvious. Always validate with the Silhouette Score and domain knowledge.
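
One way to carry out that validation is to look at silhouette values per cluster instead of only the overall mean. The sketch below assumes a candidate of `k = 4` read off the elbow plot (an assumption made for illustration) and uses `silhouette_samples` from scikit-learn to report each cluster's size, mean score, and share of points with a negative score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

k = 4  # candidate value, assumed here for illustration
labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)

# One silhouette value per point; their mean equals silhouette_score.
sample_scores = silhouette_samples(X, labels)

for cluster in range(k):
    cluster_scores = sample_scores[labels == cluster]
    negative_share = np.mean(cluster_scores < 0)
    print(
        f"cluster {cluster}: size={cluster_scores.size}, "
        f"mean silhouette={cluster_scores.mean():.3f}, "
        f"negative={negative_share:.1%}"
    )

print(f"overall mean silhouette: {silhouette_score(X, labels):.3f}")
```

A cluster with a noticeably lower mean or many negative values is a hint that the chosen \(k\) splits or merges natural groups, even when the overall score looks acceptable.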

KSB Mapping

| KSB | Description | How This Addresses It |
| --- | --- | --- |
| K4.2 | Advanced analytics and ML techniques | Unsupervised learning algorithms for pattern discovery |
| K4.4 | Trade-offs in selecting algorithms | Choosing between clustering approaches based on data characteristics |
| S1 | Scientific methods and hypothesis testing | Validating cluster quality without ground truth labels |
| S4 | Analysis and models to inform outcomes | Using clustering to derive actionable segments |
| B1 | Inquisitive approach | Exploring hidden structure in unlabelled data |