Scikit-Learn Model API Structure

The genius of scikit-learn is that every supervised estimator, from linear regression to neural networks, follows the same four-step API: instantiate, fit, predict, score.

The Core 4 Commands

Whether you use a LinearRegression or an MLPClassifier, the code structure remains identical.
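As a minimal sketch of that claim, the helper below runs the same three calls on a regressor and on a classifier. The helper name `run_four_steps` and the synthetic datasets are assumptions made for illustration only; they are not part of scikit-learn or of the original material.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def run_four_steps(model, X, y):
    """Hypothetical helper: fit on a training split, then predict and score on the rest."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model.fit(X_train, y_train)               # 2. Fit
    predictions = model.predict(X_test)       # 3. Predict
    score = model.score(X_test, y_test)       # 4. Score
    return predictions, score


# Synthetic data (illustrative only): one regression task, one classification task.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)

# 1. Instantiate: two very different estimators, one identical workflow.
reg_preds, reg_r2 = run_four_steps(LinearRegression(), X_reg, y_reg)
clf_preds, clf_acc = run_four_steps(
    MLPClassifier(max_iter=500, random_state=0), X_clf, y_clf
)

print(f"LinearRegression R^2: {reg_r2:.2f}")
print(f"MLPClassifier accuracy: {clf_acc:.2f}")
```

The helper never inspects which estimator it receives, which is the practical benefit of the shared interface.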

1. Instantiate

Create the model object in memory with your chosen hyperparameters.

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=42)
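Instantiation only stores the hyperparameters; no data is involved yet. The sketch below, which assumes nothing beyond the constructor call above, reads the stored values back with `get_params()` and changes one with `set_params()`.

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, random_state=42)

# Hyperparameters passed to the constructor are stored on the object.
params = model.get_params()
print(params["n_estimators"], params["random_state"])

# Nothing has been learned yet: the fitted trees do not exist before fit().
print(hasattr(model, "estimators_"))

# Hyperparameters can be changed after construction, before fitting.
model.set_params(n_estimators=200)
print(model.get_params()["n_estimators"])
```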

2. Fit

Pass your training data so the algorithm learns patterns.

model.fit(X_train, y_train)
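To make "learns patterns" concrete, the sketch below fits a small forest on synthetic data (an assumption for illustration) and inspects what `fit` leaves behind. By scikit-learn convention, attributes learned from data end with a trailing underscore.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data (illustrative only).
X_train, y_train = make_classification(n_samples=100, n_features=4, random_state=0)

model = RandomForestClassifier(n_estimators=10, random_state=42)
fitted = model.fit(X_train, y_train)

# fit() returns the estimator itself, which allows chaining such as model.fit(...).predict(...).
print(fitted is model)

# Learned state is stored in attributes with a trailing underscore.
print(model.classes_)          # class labels seen during fit
print(model.n_features_in_)    # number of input features seen during fit
print(len(model.estimators_))  # the fitted decision trees
```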

3. Predict

Generate predictions on new, unseen data.

predictions = model.predict(X_test)
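The following sketch, again on synthetic data chosen only for illustration, shows the shape of what `predict` returns for a classifier: one label per input row, each drawn from the classes seen during fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=120, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_train, y_train)
predictions = model.predict(X_test)

# One prediction per row of X_test.
print(predictions.shape)

# Every predicted label is one of the classes learned during fit.
print(np.isin(predictions, model.classes_).all())
```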

4. Score

Compute the default evaluation metric on held-out data (mean accuracy for classifiers, \(R^2\) for regressors).

accuracy = model.score(X_test, y_test)
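For a classifier, `score` is a shortcut for predicting and then computing accuracy. The sketch below checks that equivalence against `accuracy_score` on synthetic data (the dataset is an assumption for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X_train, y_train)

# Built-in default metric for classifiers: mean accuracy.
accuracy = model.score(X_test, y_test)

# The same number computed explicitly from the predictions.
manual_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"score(): {accuracy:.3f}, accuracy_score(): {manual_accuracy:.3f}")
```

When accuracy is not the right measure, for example with imbalanced classes, the functions in `sklearn.metrics` can be applied to the predictions in the same way.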

Special Extension Methods

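| Method | Available On | Returns |
|---|---|---|
| `predict_proba()` | Most classifiers (not all; `SVC`, for example, needs `probability=True`) | Probability per class for each sample, e.g., `[0.12, 0.88]` |
| `decision_function()` | SVMs, linear classifiers | Confidence score per sample; for linear models, proportional to the signed distance from the decision boundary |
| `transform()` | Preprocessors, PCA | Transformed feature matrix |
| `fit_transform()` | Preprocessors, PCA | Fit + transform in a single call |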
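The methods in the table can be sketched together as follows. The choice of `LogisticRegression`, `StandardScaler`, `PCA`, and the synthetic binary dataset are assumptions made for illustration; other estimators expose different subsets of these methods.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# fit_transform() gives the same result as fit() followed by transform().
X_scaled = StandardScaler().fit_transform(X)
X_scaled_two_step = StandardScaler().fit(X).transform(X)
print(np.allclose(X_scaled, X_scaled_two_step))

# transform-style methods return a feature matrix, here reduced to 2 columns.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
print(X_2d.shape)

clf = LogisticRegression().fit(X_scaled, y)

# predict_proba(): one row per sample, one column per class, rows sum to 1.
proba = clf.predict_proba(X_scaled[:3])
print(proba.shape, proba.sum(axis=1))

# decision_function(): for a binary problem, one signed score per sample.
margin = clf.decision_function(X_scaled[:3])
print(margin.shape)
```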

Pipeline Integration

You can chain preprocessing and modelling into a single Pipeline object that still follows the same fit / predict / score API:

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42))
])

pipe.fit(X_train, y_train)
print(f"Accuracy: {pipe.score(X_test, y_test):.2f}")

Workplace Tip

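Wrap preprocessing and modelling in a Pipeline wherever you can. It prevents data leakage, because the scaler is fitted only on the data passed to fit (including each training fold during cross-validation), and it makes your code easier to reproduce and deploy.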
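The leakage point matters most under cross-validation. In the sketch below, which uses synthetic data and a five-fold split as illustrative assumptions, `cross_val_score` clones the whole pipeline for each fold, so the scaler is refitted on that fold's training portion only and never sees the validation rows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=20, random_state=42)),
])

# Each fold fits scaler + model on the training rows, then scores on the held-out rows.
scores = cross_val_score(pipe, X, y, cv=5)

print(f"Fold accuracies: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

Scaling the full dataset before splitting would instead let statistics from the validation rows influence training, which is the leakage the tip warns about.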

KSB Mapping

| KSB | Description | How This Addresses It |
|---|---|---|
| K4.1 | Statistical models and methods | Understanding the statistical basis of regression and classification |
| K4.2 | ML and AI techniques | Implementing and comparing supervised learning algorithms |
| K4.4 | Resource constraints and trade-offs | Model complexity vs interpretability; computational cost |
| S1 | Scientific methods and hypothesis testing | Formulating hypotheses and testing with rigorous validation |
| S4 | Building models and validating | Cross-validation, train/test evaluation, performance metrics |
| B5 | Impartial, hypothesis-driven approach | Honest evaluation of model performance and limitations |