Common Engineered Features Cheat Sheet¶
Need inspiration for feature engineering? Here are the most common high-signal mathematical transformations utilized universally by Data Scientists.
Temporal Features (Datetime)¶
Derived purely mechanically from strictly one Timestamp column.
| Feature Type | How to Create | Why it Matters |
|---|---|---|
| Cyclical Units | df['date'].dt.month |
Identifies human seasonal behavior geometrically |
| Duration/Elapsed | (today - df['date']).dt.days |
Absolute chronologies (e.g., "Days Since Last Login") |
| Time of Day | df['date'].dt.hour |
High predictive signal logically for user engagement bursts |
| Is Weekend | df['date'].dt.dayofweek >= 5 |
Binary switch tracking structural behavior breaks |
Mathematical Ratios¶
Constructed structurally by mechanically dividing two continuous variables algorithmically.
| Feature Type | How to Create | Why it Matters |
|---|---|---|
| Per Capita | df['GDP'] / df['population'] |
Scales massive absolute volume into comparative individual density |
| Percentage Change | df['q4_sales'] / df['q3_sales'] |
Identifies acceleration vectors physically independently of scale |
| Proportions | df['bedrooms'] / df['total_rooms'] |
Uncovers geometric property layouts irrespective of house dimension |
Domain Boundaries (Binning)¶
Compressing structurally dispersed variance algebraically into logical categorical intervals explicitly.
| Feature Type | How to Create | Why it Matters |
|---|---|---|
| Generational Cohorts | pd.cut(df['age'], bins=[0, 18, 35, 65]) |
Forces predictive algorithms mechanically to respect sociological reality natively |
| Pricing Tiers | pd.qcut(df['price'], q=4) |
Divides natively continuous sales equally mechanically into "Budget, Mid, Premium, Luxury" quartiles |
String Metadata (Natural Language)¶
Bypassing extreme computationally intense NLP vectorization by aggressively tracking structural text dimensions recursively.
| Feature Type | How to Create | Why it Matters |
|---|---|---|
| Text Length | df['text'].str.len() |
Longer reviews geometrically explicitly correlate natively with anger or passion structurally |
| Word Count | df['text'].apply(lambda x: len(str(x).split())) |
Measures density over length structurally |
| Title Presence | df['name'].str.contains('Dr.') |
Extracts binary categorical demographic data deeply embedded natively within raw input arrays |
Workplace Tip
Whenever a Dataset physically lacks robust initial continuous independent variables structurally, deriving "Elapsed Duration" natively and "Mathematical Proportions" geometrically generates exactly the high-signal predictive tensors that Machine Learning classifiers implicitly crave mechanically!