Snippet: Tabular ML / Traditional Machine Learning

Domain Context

Structured/tabular data modeling: classification, regression, ranking on feature-engineered datasets. Feature engineering and data quality drive most of the performance gains — not model complexity.

Feature Engineering

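Start with simple features; add complexity only after a baseline is established
Encode categorical variables to suit the model: label (ordinal) encoding or native categorical support for tree models; one-hot or target encoding for linear models, with any target encoding fit on training folds only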
Handle missing values explicitly: document the imputation strategy and why
Create interaction features only when domain knowledge supports them
Time-based features: enforce point-in-time correctness, so each feature uses only data available at prediction time (no future leakage)
Log all feature transformations for reproducibility (use sklearn Pipeline or equivalent)
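
As an illustration of the points above, here is a minimal sketch of wrapping imputation, scaling, and encoding in a single sklearn Pipeline so that every transformation is recorded in one fitted object. The toy DataFrame and its column names (`age`, `income`, `plan`, `churned`) are hypothetical, and median imputation is just one possible strategy, not a recommendation from this document.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, 60],
    "income": [40e3, 52e3, 61e3, np.nan, 83e3, 58e3, 45e3, 90e3],
    "plan": ["basic", "pro", "basic", "basic", "pro", "basic", "pro", "pro"],
    "churned": [0, 0, 1, 1, 0, 1, 0, 1],
})
X, y = df.drop(columns="churned"), df["churned"]

# Numeric columns: explicit median imputation, then scaling for the linear model.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encoding (a common choice for linear models).
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
preds = model.predict(X)

# The fitted pipeline documents what was done, e.g. the imputation values used.
imputer = model.named_steps["preprocess"].named_transformers_["num"].named_steps["impute"]
medians = imputer.statistics_
print("Imputation medians (age, income):", medians)
```

Because the preprocessing lives inside the pipeline, it is refit on each training fold during cross-validation, which helps keep validation data out of the imputation and scaling statistics.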

Model Selection

Always start with a strong baseline (logistic regression for classification, linear regression for regression), then move to gradient boosting
Tree-based models (XGBoost, LightGBM, CatBoost) are default for tabular — justify using anything else
Neural networks on tabular data: only when >100K rows AND non-linear interactions are proven
Ensemble only if marginal gain justifies the complexity — document the improvement

Cross-Validation

Use stratified K-fold (K=5) for classification; standard K-fold for regression
For time-dependent data: time-series split only — never random shuffle
For grouped data (e.g., per-user): group K-fold, so the same group never appears in both the training and validation folds
Report mean ± std across folds — a single fold result is not reliable
Nested CV for hyperparameter tuning: inner loop tunes, outer loop evaluates
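
The splitting rules above map onto sklearn's built-in splitters. The sketch below uses synthetic data and a hypothetical `groups` array (standing in for user IDs) to show stratified, grouped, and time-ordered splits, and checks the two properties that matter: no group overlap and no training on the future.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GroupKFold,
    StratifiedKFold,
    TimeSeriesSplit,
    cross_val_score,
)

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = LogisticRegression(max_iter=1000)

# Classification: stratified 5-fold, reported as mean ± std across folds.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=skf, scoring="roc_auc")
print(f"AUC-ROC: {scores.mean():.3f} ± {scores.std():.3f}")

# Grouped data: hypothetical user IDs, 10 rows per user.
groups = np.repeat(np.arange(30), 10)
group_overlap = [
    set(groups[train_idx]) & set(groups[val_idx])
    for train_idx, val_idx in GroupKFold(n_splits=5).split(X, y, groups)
]

# Time-dependent data (rows assumed sorted by time): every training index
# must precede every validation index.
time_ordered = [
    train_idx.max() < val_idx.min()
    for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X)
]

print("Groups shared between train and val:", [len(o) for o in group_overlap])
print("Train always precedes val:", all(time_ordered))
```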

Hyperparameter Tuning

Use Optuna or similar Bayesian optimization — avoid grid search on large spaces
Define the search space based on domain knowledge, not arbitrary ranges
Budget: 50-100 trials for tree models; fewer for expensive models
Always compare tuned model against default hyperparameters — report the delta

Explainability

SHAP values: mandatory for any model going to production or stakeholder review
Feature importance: compute and log for every trained model
Partial dependence plots for top-5 features — sanity check against domain knowledge
If top features don't make domain sense, investigate data issues before celebrating metrics

Evaluation

Classification: report precision, recall, F1, AUC-ROC; confusion matrix for multi-class
Regression: report RMSE, MAE, R²; plot predicted vs. actual
Always evaluate on holdout test set after all tuning is done (never peek during tuning)
Segment-level evaluation: check performance across key demographic/business segments
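
One way to produce the overall and segment-level classification report is sketched below. The data is synthetic, the `segment` labels are randomly assigned placeholders for a real demographic or business attribute, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
rng = np.random.default_rng(0)
segment = rng.choice(["A", "B", "C"], size=len(y))  # hypothetical segments

# Holdout test set, touched only once after all tuning is finished.
X_tr, X_te, y_tr, y_te, _, seg_te = train_test_split(
    X, y, segment, test_size=0.25, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)  # assumed threshold

overall = {
    "precision": precision_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "f1": f1_score(y_te, pred),
    "auc_roc": roc_auc_score(y_te, proba),
}
cm = confusion_matrix(y_te, pred)
print(overall)
print(cm)

# Segment-level view: the same metrics computed per segment.
report = pd.DataFrame({"y": y_te, "pred": pred, "segment": seg_te})
rows = []
for name, g in report.groupby("segment"):
    rows.append({
        "segment": name,
        "n": len(g),
        "precision": precision_score(g["y"], g["pred"], zero_division=0),
        "recall": recall_score(g["y"], g["pred"], zero_division=0),
    })
by_segment = pd.DataFrame(rows)
print(by_segment)
```

Small segments give noisy metrics, so the per-segment row count `n` is reported alongside the scores.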

Common Pitfalls

Target leakage from features computed using the target variable
Class imbalance: raw accuracy is misleading because a majority-class predictor already scores high; use metrics such as precision, recall, F1, or balanced accuracy
High cardinality categoricals: naive one-hot encoding causes memory explosion
Train/test distribution mismatch: validate that feature distributions are consistent across splits
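
Two of these pitfalls are easy to demonstrate on synthetic data. The sketch below first shows how a majority-class predictor reaches 95% accuracy on a 95/5 class split while finding no positives, then uses a two-sample Kolmogorov-Smirnov test (one possible check among several) to flag a feature whose test distribution has shifted. The shift size and sample counts are arbitrary.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# 1) Class imbalance: 95 negatives, 5 positives.
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant to a majority-class predictor
majority = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = majority.predict(X)

acc = accuracy_score(y, pred)                  # 0.95
bal_acc = balanced_accuracy_score(y, pred)     # 0.5
rec = recall_score(y, pred, zero_division=0)   # 0.0
print(f"accuracy={acc:.2f}  balanced_accuracy={bal_acc:.2f}  recall={rec:.2f}")

# 2) Train/test distribution mismatch on a single numeric feature.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 2000)
test_same = rng.normal(0.0, 1.0, 2000)      # drawn from the same distribution
test_shifted = rng.normal(0.5, 1.0, 2000)   # mean shifted by 0.5

p_same = ks_2samp(train_feature, test_same).pvalue
p_shifted = ks_2samp(train_feature, test_shifted).pvalue
print(f"KS p-value, same distribution:    {p_same:.3g}")
print(f"KS p-value, shifted distribution: {p_shifted:.3g}")
```

A very small p-value for the shifted feature signals a mismatch worth investigating. With many features, some low p-values are expected by chance, so the test is a screening tool rather than a verdict.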