Build a Customer Segmentation

Apply k-Means to RFM (Recency, Frequency, Monetary) features to group customers into actionable segments.

What Is RFM?

RFM analysis scores each customer on three dimensions:

Metric	Meaning
Recency	How recently did they purchase? (lower = better)
Frequency	How often do they purchase? (higher = better)
Monetary	How much do they spend in total? (higher = better)
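
In practice the three metrics are usually derived from a transaction log rather than given directly. The sketch below shows one way to do that with pandas; the `transactions` frame, its column names (`customer_id`, `order_date`, `amount`) and the choice of snapshot date are assumptions made for illustration, not part of the original example.

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase (column names are assumptions).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-20",
        "2024-02-10", "2024-02-25", "2024-03-10",
    ]),
    "amount": [50.0, 20.0, 200.0, 15.0, 30.0, 25.0],
})

# Reference date for recency: the day after the last recorded purchase.
snapshot = transactions["order_date"].max() + pd.Timedelta(days=1)

rfm = transactions.groupby("customer_id").agg(
    # Days since the customer's most recent purchase (lower = better)
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    # Number of purchases (higher = better)
    frequency=("order_date", "count"),
    # Total spend (higher = better)
    monetary=("amount", "sum"),
)
print(rfm)
```

The resulting frame has one row per customer and the same three columns that the clustering code expects.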

Implementation

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Simulate RFM data (uniform random values, so no strong cluster structure is expected)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "recency": rng.integers(1, 365, size=200),
    "frequency": rng.integers(1, 50, size=200),
    "monetary": rng.integers(10, 5000, size=200)
})

# CRITICAL: Scale features before clustering
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)

# Compute inertia for k = 2..8 to look for an elbow
inertias = [KMeans(n_clusters=k, random_state=42, n_init="auto")
            .fit(X_scaled).inertia_ for k in range(2, 9)]

plt.plot(range(2, 9), inertias, "bo-")
plt.xlabel("k")
plt.ylabel("Inertia")
plt.title("Elbow Method — Customer Segments")
plt.show()

# Apply final clustering (k=4 is illustrative; choose k from the elbow plot)
km = KMeans(n_clusters=4, random_state=42, n_init="auto")
df["segment"] = km.fit_predict(X_scaled)

# Inspect segment profiles
print(df.groupby("segment")[["recency", "frequency", "monetary"]].mean().round(1))
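
The elbow in an inertia plot can be hard to read, especially on uniformly simulated data. As a complementary check that needs no ground-truth labels, the silhouette score can be compared across candidate values of k. This is a minimal sketch that regenerates the same simulated data; the names `rfm`, `scores` and `best_k` are introduced here for illustration, and `n_init=10` is used instead of `"auto"` only so that the snippet also runs on older scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same simulated RFM data as in the main example
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "recency": rng.integers(1, 365, size=200),
    "frequency": rng.integers(1, 50, size=200),
    "monetary": rng.integers(10, 5000, size=200),
})
X = StandardScaler().fit_transform(rfm)

# Silhouette score per candidate k (ranges from -1 to 1; higher is better)
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Highest silhouette at k =", best_k)
```

On random data like this the scores are likely to be low and close together, which is itself a useful signal that the segments are not well separated. On real customer data, treat the highest-scoring k as one input alongside the elbow plot and business needs.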

Interpreting Segments

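After clustering, label each segment based on its RFM profile. The cluster numbers that k-Means assigns are arbitrary, so read the labels off the segment means rather than the segment IDs. The table below is an illustrative example of such a mapping: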

Segment	Recency	Frequency	Monetary	Label
0	Low	High	High	Champions
1	High	Low	Low	At Risk
2	Medium	Medium	Medium	Loyal
3	Low	Low	Low	New Customers
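
One way to make this labelling repeatable is to bin each segment's mean values into Low / Medium / High and look the combination up in a mapping. The sketch below does this for a hypothetical set of segment means; the numbers in `profiles`, the cut points in `cuts` and the fallback label `"Unlabelled"` are all assumptions chosen to reproduce the table above, not values produced by the clustering code.

```python
import pandas as pd

# Hypothetical per-segment means, shaped like the groupby output of a fitted model.
profiles = pd.DataFrame(
    {
        "recency": [20.0, 300.0, 150.0, 25.0],
        "frequency": [40.0, 3.0, 20.0, 2.0],
        "monetary": [4000.0, 200.0, 1800.0, 150.0],
    },
    index=pd.Index([0, 1, 2, 3], name="segment"),
)

# Assumed cut points per metric; in practice these come from domain knowledge.
cuts = {
    "recency": [0, 60, 180, float("inf")],
    "frequency": [0, 5, 25, float("inf")],
    "monetary": [0, 500, 2500, float("inf")],
}

# Bin each metric into Low / Medium / High
levels = pd.DataFrame({
    col: pd.cut(profiles[col], bins=cuts[col], labels=["Low", "Medium", "High"])
    for col in cuts
}).astype(str)

# (recency, frequency, monetary) level combinations taken from the table above
label_map = {
    ("Low", "High", "High"): "Champions",
    ("High", "Low", "Low"): "At Risk",
    ("Medium", "Medium", "Medium"): "Loyal",
    ("Low", "Low", "Low"): "New Customers",
}

# Combinations not listed in the mapping fall back to "Unlabelled"
profiles["label"] = [
    label_map.get(tuple(row), "Unlabelled")
    for row in levels[["recency", "frequency", "monetary"]].itertuples(index=False)
]
print(profiles)
```

Real segments will often produce combinations that are not in the mapping, so the fallback label is a prompt to inspect those profiles by hand.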

Workplace Tip

Always standardise your RFM features before clustering. k-Means relies on Euclidean distance, so without scaling, monetary values (often in the hundreds or thousands) will dominate recency (in days) and frequency (a much smaller purchase count).
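
To see the size of this effect, compare each feature's share of the total variance before and after standardising. Variance share is used here only as a rough proxy for how much each feature influences squared Euclidean distances; the names `rfm`, `raw_share` and `scaled_share` are introduced for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same simulated RFM data as in the main example
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "recency": rng.integers(1, 365, size=200),
    "frequency": rng.integers(1, 50, size=200),
    "monetary": rng.integers(10, 5000, size=200),
})

# Share of total variance contributed by each raw feature
raw_share = rfm.var() / rfm.var().sum()

# Share of total variance after standardising to zero mean and unit variance
scaled = pd.DataFrame(StandardScaler().fit_transform(rfm), columns=rfm.columns)
scaled_share = scaled.var() / scaled.var().sum()

print(pd.DataFrame({"raw": raw_share, "scaled": scaled_share}).round(3))
```

On the raw data almost all of the variance sits in `monetary`, so the clusters would effectively be spending bands. After scaling, each feature contributes an equal third.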

KSB Mapping

KSB	Description	How This Addresses It
K4.2	Advanced analytics and ML techniques	Unsupervised learning algorithms for pattern discovery
K4.4	Trade-offs in selecting algorithms	Choosing between clustering approaches based on data characteristics
S1	Scientific methods and hypothesis testing	Validating cluster quality without ground truth labels
S4	Analysis and models to inform outcomes	Using clustering to derive actionable segments
B1	Inquisitive approach	Exploring hidden structure in unlabelled data