Naive Bayes
A fast, scalable classification algorithm based on Bayes' Theorem with a "naive" assumption that all features are conditionally independent given the class.
The Concept
Naive Bayes calculates the probability of each class given the observed feature values using Bayes' Theorem:
\[P(\text{class} \mid \text{features}) = \frac{P(\text{features} \mid \text{class}) \cdot P(\text{class})}{P(\text{features})}\]
The "naive" assumption is that, once the class is known, each feature contributes to the probability independently of the others. This rarely holds in real data, yet Naive Bayes often performs surprisingly well, especially on text classification and other high-dimensional sparse data.
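Written out, the independence assumption turns the likelihood into a product of per-feature terms. The symbols \(x_1, \dots, x_n\) (the feature values) and \(c\) (a class) are introduced here for illustration only:
\[P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)\]
Because \(P(\text{features})\) is the same for every class, it can be ignored when comparing classes, so the predicted class is the one that maximises the numerator of Bayes' Theorem:
\[\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)\]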
Variants
| Variant | Use Case | Assumption |
|---|---|---|
| GaussianNB | Continuous features | Features follow a normal distribution per class |
| MultinomialNB | Count data (e.g., word frequencies) | Features represent counts or frequencies |
| BernoulliNB | Binary features (0/1) | Features are binary indicators |
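To make the count-data row concrete, here is a minimal sketch of MultinomialNB on word counts. The four-message corpus and its labels are hypothetical, invented purely for illustration, and `CountVectorizer` is one common way (not the only one) to produce the counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus, invented purely for illustration
texts = [
    "win free prize now",
    "free cash win",
    "meeting at noon",
    "lunch meeting tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

# alpha=1.0 is Laplace smoothing, so a word never seen with a class
# still gets a small non-zero probability for that class
clf = MultinomialNB(alpha=1.0)
clf.fit(X_counts, labels)

new_messages = ["free prize", "lunch meeting"]
predictions = clf.predict(vectorizer.transform(new_messages))
print(list(predictions))
```

For binary presence/absence features, the analogous sketch would use `CountVectorizer(binary=True)` together with `BernoulliNB`.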
Implementation
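from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Synthetic binary classification problem with 10 continuous features
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
# Hold out 25% of the rows for testing (train_test_split's default)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
# Fit per-class feature means and variances
nb = GaussianNB()
nb.fit(X_tr, y_tr)
# Report precision, recall and F1 on the held-out rows
print(classification_report(y_te, nb.predict(X_te)))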
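To connect the library call back to Bayes' Theorem, the sketch below recomputes Gaussian Naive Bayes predictions by hand with NumPy and compares them with `GaussianNB`. It is an illustrative reconstruction, not scikit-learn's actual source. The dataset sizes and the variable names (`eps`, `log_posteriors`, `manual_pred`) are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

classes = np.unique(y)
# Small variance floor, mirroring GaussianNB's default var_smoothing=1e-9
eps = 1e-9 * X.var(axis=0).max()

log_posteriors = []
for c in classes:
    Xc = X[y == c]
    log_prior = np.log(len(Xc) / len(X))   # log P(class)
    mean = Xc.mean(axis=0)                 # per-feature mean for this class
    var = Xc.var(axis=0) + eps             # per-feature variance for this class
    # Summing per-feature Gaussian log-densities gives log P(features | class)
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1
    )
    log_posteriors.append(log_prior + log_likelihood)

# For each row, pick the class with the largest unnormalised log-posterior
manual_pred = classes[np.vstack(log_posteriors).argmax(axis=0)]

sk_pred = GaussianNB().fit(X, y).predict(X)
agreement = (manual_pred == sk_pred).mean()
print(f"Agreement with GaussianNB: {agreement:.3f}")
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.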
Strengths and Weaknesses
Strengths:
- Extremely fast to train and predict: training is a single pass over the data, so cost scales linearly with the number of samples and features.
- Works well with high-dimensional data (e.g., text with thousands of word features).
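- Can handle missing data in principle: each feature's probability is computed independently, so a missing feature can simply be left out of the product. Note that scikit-learn's Naive Bayes estimators do not accept NaN values, so in practice missing values must be imputed or otherwise handled first.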
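On the missing-data point, scikit-learn's `GaussianNB` raises an error when given NaN values, so some preprocessing is needed in practice. The sketch below shows one possible workaround, mean imputation in a pipeline. The 10% missingness rate and the choice of `SimpleImputer(strategy="mean")` are assumptions for illustration, not a recommendation from the original text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Knock out roughly 10% of the values at random (illustrative only)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

# GaussianNB on its own rejects NaN inputs
try:
    GaussianNB().fit(X_missing, y)
    raised = False
except ValueError:
    raised = True
print("GaussianNB raised on NaN:", raised)

# Mean imputation in front of the classifier lets the same data be used
model = make_pipeline(SimpleImputer(strategy="mean"), GaussianNB())
model.fit(X_missing, y)
preds = model.predict(X_missing)
```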
Weaknesses:
- The independence assumption means it cannot capture feature interactions.
- Probability estimates are often poorly calibrated (overconfident).
- Outperformed by tree-based ensembles on most tabular datasets.
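The calibration weakness can be checked, and often reduced, by wrapping the model in a calibrator. This is a minimal sketch assuming isotonic calibration with 5-fold cross-validation and the Brier score as the metric. The dataset parameters are chosen only to create correlated features, and the resulting numbers will vary with the data.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Redundant (correlated) features deliberately violate the independence assumption
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=3, n_redundant=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = GaussianNB().fit(X_tr, y_tr)
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_tr, y_tr)

# Brier score: mean squared error of the predicted probabilities (lower is better)
brier_raw = brier_score_loss(y_te, raw.predict_proba(X_te)[:, 1])
brier_cal = brier_score_loss(y_te, calibrated.predict_proba(X_te)[:, 1])
print(f"Brier score raw: {brier_raw:.3f}, calibrated: {brier_cal:.3f}")
```

A lower Brier score for the calibrated model would suggest the raw probabilities were miscalibrated. The class predictions themselves may change very little.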
Workplace Tip
Use Naive Bayes as a fast baseline. If it reaches 85% accuracy in seconds, a more complex model has to beat that figure by enough to justify its extra training cost and complexity. It is also a common first choice for spam filtering and other text classification tasks.
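One way to put the baseline idea into practice is to record accuracy and fit time for Naive Bayes next to a heavier model. The sketch below is an assumption-laden illustration: the synthetic dataset, the choice of `RandomForestClassifier` as the "complex" model, and the `results` dictionary are all hypothetical, and the numbers will differ on real data.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for name, model in [
    ("naive_bayes", GaussianNB()),
    ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    fit_seconds = time.perf_counter() - start
    # Test-set accuracy plus wall-clock training time for each model
    results[name] = {"accuracy": model.score(X_te, y_te), "fit_seconds": fit_seconds}

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}, fit time={r['fit_seconds']:.3f}s")
```

If the heavier model's accuracy gain over the baseline is small, the simpler model may be the better choice.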
KSB Mapping
| KSB | Description | How This Addresses It |
|---|---|---|
| K4.2 | Advanced ML techniques | Tree-based models, ensemble methods, KNN, SVM |
| K4.4 | Trade-offs in selecting algorithms | Comparing parametric vs non-parametric approaches |
| S4 | ML and optimisation | Hyperparameter tuning, ensemble construction, model selection |
| B1 | Curiosity and creativity | Exploring when non-parametric methods outperform parametric ones |
| B5 | Integrity in presenting conclusions | Avoiding overfitting; honest reporting of generalisation performance |