Skip to content

Embedded Methods

Filtering evaluates columns blindly. Wrapping evaluates models blindly. Embedded methods compute feature importance during the actual mathematical algorithmic training.

What You Will Learn

  • Define the Embedded Method architecture
  • Retrieve explicit .feature_importances_ coefficients natively from RandomForest
  • Prune decision trees algorithmically to eliminate low-value inputs

Prerequisites

  • Completed Filter Methods and Wrapper Methods
  • Core understanding of Random Forest or Decision Tree topologies

Step 1: The Beauty of the Embedded Workflow

Instead of wrapping a model inside a slow, repetitive loop (RFE), we can run a single model that natively tracks its own decisions.

Whenever a RandomForestClassifier mathematically splits a data node, it calculates how much "Impurity" (Gini Index/Entropy) decreased. Over the construction of 100 parallel trees, it sums those decreases. Features that generated massive, clean splits score highly. Features that failed to split the tree cleanly score ~0.

import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier

# Using the Penguins dataset with 4 native continuous features
df = sns.load_dataset('penguins').dropna()
X = df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']]
y = df['species']

# We train exactly 1 model normally
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

# We request the model to print its own internal analytical diary!
importance_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf.feature_importances_
}).sort_values(by='Importance', ascending=False)

print(importance_df.round(4))
Expected Output
             Feature  Importance
2  flipper_length_mm      0.4190
0     bill_length_mm      0.3957
1      bill_depth_mm      0.1264
3        body_mass_g      0.0588

In ~20 milliseconds, the Random Forest confidently deduced that flipper_length contributed 42% naturally to all global decision splits, while body_mass_g barely managed to contribute 6%. No manual filtering or wrapper loops required!

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
sns.barplot(data=importance_df, x='Importance', y='Feature', color='#D94D26')
plt.title('Random Forest Gini Importance')
plt.tight_layout()
plt.show()
Expected Plot

Random Forest Feature Importance

Step 2: Selecting from Model Output

Once the Forest computes the coefficients internally, we execute SelectFromModel to physically delete the physical columns failing our arbitrary thresholds!

from sklearn.feature_selection import SelectFromModel

# We instruct Scikit-Learn to only keep features that perform better than the "mean" performance
selector = SelectFromModel(estimator=rf, prefit=True, threshold='mean')

# Execute the pruning array reduction
X_pruned = selector.transform(X)

print(f"Original features: {X.shape[1]}")
print(f"Pruned features: {X_pruned.shape[1]}")
Expected Output
Original features: 4
Pruned features: 2

The model calculated the mean importance of all columns was 0.250. Consequently, it dynamically deleted body_mass (0.05) and bill_depth (0.12) from the matrix forever, passing only the top 2 arrays cleanly to production!

Workplace Tip

If your stakeholder demands to know why the ML model denied a customer's loan, RandomForest Feature Importances are explicitly the definitive reporting tool. Visualizing the importance_df as a bar chart in Jira satisfies the C-Suite immediately.

Summary

  • Embedded Methods observe the algorithms organically calculating coefficients mid-training.
  • Models like RandomForest, DecisionTree, and Lasso possess native .feature_importances_ dictionaries.
  • SelectFromModel integrates seamlessly with Pipelines to mechanically enforce deletion parameters mid-stream without crashing.

Next Steps

PCA Dimensionality Reduction — What if deleting columns isn't enough? What if we must mathematically crush 50 columns into 2 dimensions entirely?

Stretch & Challenge

For Advanced Learners

Lasso (L1) Regularisation Shrinkage

Tree-based algorithms use Gini Impurity splits. Linear algorithms (like LogisticRegression) can dynamically eliminate features during training by utilizing the L1 penalty mathematically (also known as Lasso Regularisation).

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Standardisation is MANDATORY for Lasso mathematics
X_scaled = StandardScaler().fit_transform(X)

# The 'l1' penalty explicitly forces low-signal variables to mathematically equal 0.00
lasso_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
lasso_model.fit(X_scaled, y)

print(f"Coefficients dynamically assigned 0.00: {(lasso_model.coef_ == 0).sum()}")

Unlike standard regressions, Lasso rigorously pushes bad variables identically to exactly zero, fundamentally acting identically to an Embedded Selection filter!

KSB Mapping

KSB Description How This Addresses It
K4.2 Advanced analytics and ML techniques Feature selection algorithms and dimensionality reduction
K5.2 Data formats and structures Encoding categorical variables, handling mixed feature types
S2 Data engineering Creating and transforming features from raw data
S4 Feature selection and ML Applying feature selection methods and PCA
B1 Inquisitive approach Exploring creative feature engineering strategies