SVM and Kernel Methods¶

The true power of Support Vector Machines lies not in linear separation, but in the kernel trick — projecting data into a higher-dimensional space where a linear boundary becomes possible.

How SVMs Work¶

An SVM finds the hyperplane that maximises the margin — the distance between the decision boundary and the nearest data points from each class (the "support vectors").

Linear SVM: Draws a straight line (or hyperplane) between classes. Works only when classes are linearly separable.
Kernel SVM: Implicitly maps data into a higher-dimensional space where a linear separator exists, without ever computing the transformation explicitly.

The Kernel Trick¶

Instead of transforming your data into a high-dimensional space (which would be computationally expensive), the kernel trick computes the dot product in that space directly. This gives you the power of non-linear boundaries at a fraction of the cost.

Implementation¶

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Compare linear vs RBF kernel
for kernel in ["linear", "rbf"]:
    svc = SVC(kernel=kernel, random_state=42)
    svc.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, svc.predict(X_te))
    print(f"{kernel:>6s} kernel accuracy: {acc:.2f}")

Visualising the Decision Boundary¶

xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 200),
    np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 200)
)

svc_rbf = SVC(kernel="rbf", random_state=42).fit(X_tr, y_tr)
Z = svc_rbf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(8, 5))
plt.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolor="k", s=30)
plt.title("SVM with RBF Kernel — Non-Linear Boundary")
plt.tight_layout()
plt.show()

Key Hyperparameters¶

Parameter	Effect
`C`	Regularisation strength — higher values fit training data more tightly
`gamma`	RBF reach — higher values create tighter, more localised boundaries
`kernel`	`"linear"`, `"rbf"`, `"poly"`, `"sigmoid"`

Common Pitfall

SVMs are sensitive to feature scaling. Always standardise your features with StandardScaler before fitting.

KSB Mapping¶

KSB	Description	How This Addresses It
K4.2	Advanced ML techniques	Tree-based models, ensemble methods, KNN, SVM
K4.4	Trade-offs in selecting algorithms	Comparing parametric vs non-parametric approaches
S4	ML and optimisation	Hyperparameter tuning, ensemble construction, model selection
B1	Curiosity and creativity	Exploring when non-parametric methods outperform parametric ones
B5	Integrity in presenting conclusions	Avoiding overfitting; honest reporting of generalisation performance