Data Preparation & Preprocessing
Garbage in, garbage out.
Introduction
Data preparation is the foundation of every successful ML project. In workplace settings, raw data is rarely clean or analysis-ready.
What You Will Learn
- Identify and handle missing values, duplicates, and outliers
- Apply appropriate encoding strategies for categorical variables
- Scale and normalise features for algorithm compatibility
- Build reproducible preprocessing pipelines with sklearn
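As a rough illustration of how these four skills can fit together, here is a minimal sketch using pandas and scikit-learn. The toy DataFrame, its column names (`age`, `salary`, `department`), the median and most-frequent imputation strategies, and the 1.5 × IQR outlier rule are all assumptions chosen for the example, not requirements of the course; the right choices depend on your data and should be justified in your own work.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one missing value per column type and one duplicate row.
raw = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 32.0],
    "salary": [30000.0, 45000.0, 52000.0, 61000.0, 45000.0],
    "department": ["sales", "it", "it", np.nan, "it"],
})

# 1. Remove exact duplicate rows (the last row repeats the second).
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Flag outliers with the 1.5 * IQR rule (an assumed rule of thumb).
q1, q3 = clean["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
outlier_mask = (clean["salary"] < q1 - 1.5 * iqr) | (clean["salary"] > q3 + 1.5 * iqr)

numeric_cols = ["age", "salary"]
categorical_cols = ["department"]

# 3. Numeric columns: fill gaps with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# 4. Categorical columns: fill gaps with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Combine both branches; sparse_threshold=0 forces a dense array as output.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,
)

X = preprocessor.fit_transform(clean)
print(X.shape)
```

Wrapping the steps in a `Pipeline` and `ColumnTransformer` means the same fitted transformations can be reapplied to new data with `preprocessor.transform(...)`, which is what makes the preprocessing reproducible. In a real project you would fit the preprocessor on the training split only, so that statistics from the test data do not leak into training.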
Assessment Connection
Section A (Methodology): the rubric distinguishes "no preprocessing" (0–39%) from "detailed data optimization with justification" (70%+).