Naive Bayes

A fast, scalable classification algorithm based on Bayes' Theorem with a "naive" assumption that all features are conditionally independent given the class.

The Concept

Naive Bayes calculates the probability of each class given the observed feature values using Bayes' Theorem:

\[P(\text{class} \mid \text{features}) = \frac{P(\text{features} \mid \text{class}) \cdot P(\text{class})}{P(\text{features})}\]

The "naive" assumption is that each feature contributes independently to the probability. This is almost never true in reality, yet Naive Bayes still performs surprisingly well — especially on text classification and high-dimensional sparse data.
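To make the decomposition concrete, here is a minimal sketch of the naive step by hand: the posterior for each class is proportional to the prior times the product of per-feature likelihoods. All the probabilities below are invented for illustration, not estimated from data.

```python
# Hypothetical spam example with two binary word features.
# Priors and likelihoods are illustrative numbers only.
priors = {"spam": 0.4, "ham": 0.6}
# P(feature present | class), assumed conditionally independent given the class
likelihoods = {
    "spam": {"free": 0.8, "click": 0.6},
    "ham":  {"free": 0.1, "click": 0.2},
}

def unnormalised_posterior(cls, features):
    """Prior times the product of per-feature likelihoods (the naive step)."""
    p = priors[cls]
    for f in features:
        p *= likelihoods[cls][f]
    return p

scores = {c: unnormalised_posterior(c, ["free", "click"]) for c in priors}
total = sum(scores.values())  # P(features), the normalising denominator
posteriors = {c: s / total for c, s in scores.items()}
print(posteriors)
```

Observing both "free" and "click" pushes the posterior strongly toward spam here, even though the prior favoured ham.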

Variants

| Variant | Use Case | Assumption |
| --- | --- | --- |
| GaussianNB | Continuous features | Features follow a normal distribution per class |
| MultinomialNB | Count data (e.g., word frequencies) | Features represent counts or frequencies |
| BernoulliNB | Binary features (0/1) | Features are binary indicators |
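For instance, MultinomialNB can be fit directly on a word-count matrix. A minimal sketch on a toy matrix — the three-word vocabulary, counts, and labels are all invented for illustration:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count matrix: rows = documents, columns = invented vocabulary
# ["free", "meeting", "prize"]; labels: 1 = spam, 0 = ham.
X = np.array([[3, 0, 2],
              [0, 2, 0],
              [2, 0, 1],
              [0, 3, 0]])
y = np.array([1, 0, 1, 0])

clf = MultinomialNB()  # default alpha=1.0 applies Laplace smoothing
clf.fit(X, y)
print(clf.predict([[4, 0, 1]]))  # a new document heavy on "free" and "prize"
```

The Laplace smoothing (`alpha`) matters in practice: without it, a word never seen in a class would zero out that class's entire likelihood product.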

Implementation

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic dataset: 1000 samples, 10 continuous features
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# GaussianNB fits a per-class normal distribution to each feature
nb = GaussianNB()
nb.fit(X_tr, y_tr)
print(classification_report(y_te, nb.predict(X_te)))
```

Strengths and Weaknesses

Strengths:

  • Extremely fast to train and predict — scales linearly with dataset size.
  • Works well with high-dimensional data (e.g., text with thousands of word features).
  • Handles missing features gracefully in principle — because each feature's likelihood is computed independently, a missing feature can simply be dropped from the product — though scikit-learn's implementations still require complete input data.

Weaknesses:

  • The independence assumption means it cannot capture feature interactions.
  • Probability estimates are often poorly calibrated (overconfident).
  • Outperformed by tree-based ensembles on most tabular datasets.
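The calibration weakness has a standard workaround: wrap the model in probability calibration. A minimal sketch using scikit-learn's CalibratedClassifierCV with sigmoid (Platt) scaling, reusing the synthetic dataset from the implementation above:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Fit sigmoid calibration on top of GaussianNB via 5-fold cross-validation
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_tr, y_tr)
proba = calibrated.predict_proba(X_te)  # calibrated class probabilities
```

Calibration does not change which class is most likely very often, but it pulls the overconfident probability estimates toward honest frequencies — useful whenever downstream decisions consume the probabilities themselves.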

Workplace Tip

Use Naive Bayes as a fast baseline. If it reaches 85% accuracy in seconds, any more complex model must clear that bar to justify its extra training cost. It is also a go-to choice for spam filtering and other text classification tasks.
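A text-classification baseline like this is a few lines with a bag-of-words pipeline. The mini-corpus below is invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented four-document corpus, for illustration only
texts = ["win a free prize now", "free cash click here",
         "meeting moved to friday", "lunch with the team today"]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer produces word counts; MultinomialNB models them directly
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["claim your free prize"]))
```

Words unseen at training time (here "claim" and "your") are simply absent from the vocabulary and ignored at prediction time, so the pipeline degrades gracefully on novel text.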

KSB Mapping

| KSB | Description | How This Addresses It |
| --- | --- | --- |
| K4.2 | Advanced ML techniques | Tree-based models, ensemble methods, KNN, SVM |
| K4.4 | Trade-offs in selecting algorithms | Comparing parametric vs non-parametric approaches |
| S4 | ML and optimisation | Hyperparameter tuning, ensemble construction, model selection |
| B1 | Curiosity and creativity | Exploring when non-parametric methods outperform parametric ones |
| B5 | Integrity in presenting conclusions | Avoiding overfitting; honest reporting of generalisation performance |