Data Preparation & Preprocessing

Garbage in, garbage out.

Introduction

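Data preparation is the foundation of every successful ML project. In workplace settings, raw data is rarely clean or analysis-ready: expect missing values, duplicates, outliers, and categorical fields that must be encoded before a model can use them.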

What You Will Learn

  • Identify and handle missing values, duplicates, and outliers
  • Apply appropriate encoding strategies for categorical variables
  • Scale and normalise features for algorithm compatibility
  • Build reproducible preprocessing pipelines with sklearn
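
As a rough preview of how these outcomes fit together, here is a minimal sketch of a scikit-learn preprocessing pipeline. The toy DataFrame and its column names (`age`, `dept`) are made up for illustration, and the choices of median imputation, standard scaling, and one-hot encoding are assumptions rather than the course's prescribed method.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one missing age and one exact duplicate row.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 40.0, 31.0],
    "dept": ["sales", "ops", "ops", "ops", "sales"],
})

# Remove exact duplicate rows before fitting anything.
df = df.drop_duplicates().reset_index(drop=True)

# Numeric columns: fill gaps with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode; unseen categories become all zeros.
categorical_steps = OneHotEncoder(handle_unknown="ignore")

preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, ["age"]),
        ("cat", categorical_steps, ["dept"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# One scaled numeric column followed by one column per department.
X = preprocess.fit_transform(df)
```

Keeping every step inside one `ColumnTransformer` means the same fitted transformations can be reapplied to new data with `preprocess.transform(...)`, which is what makes the preprocessing reproducible.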

Assessment Connection

Section A (Methodology): the rubric places "no preprocessing" in the 0–39% band and "detailed data optimization with justification" in the 70%+ band.

Content