Snippet: Time Series Forecasting & Analysis

Domain Context

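Forecasting, anomaly detection, or classification on temporal data. Time imposes an ordering constraint, so common ML practices (random splits, K-fold cross-validation, scaling on the full dataset) must be adapted to ensure that no information flows from the future into training or features.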

Temporal split is the only valid split strategy — never use random train/test split
Walk-forward validation: train on [0, t], validate on (t, t+h], then slide t forward and repeat
Features must use only past data relative to the prediction point — verify point-in-time correctness
Lag features: always compute relative to the prediction timestamp, not absolute time
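If using cross-validation: use scikit-learn's TimeSeriesSplit or another forward-chaining scheme, never KFold (even unshuffled, KFold trains on folds that come after the validation fold)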
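The walk-forward loop can be sketched in plain Python. This sketch assumes the observations are already sorted by time at a regular frequency. `walk_forward_splits` and its parameters are illustrative names, not a library API. scikit-learn's `TimeSeriesSplit` implements the same expanding-window idea for array inputs.

```python
from typing import Iterator, List, Optional, Tuple


def walk_forward_splits(
    n_samples: int,
    initial_train: int,
    horizon: int,
    step: Optional[int] = None,
    expanding: bool = True,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs over time-ordered positions 0..n_samples-1.

    Training covers [start, t) and validation covers [t, t + horizon), so every
    validation position is strictly later than every training position.
    """
    step = step or horizon  # default: non-overlapping validation windows
    t = initial_train
    while t + horizon <= n_samples:
        # Expanding window keeps all history; sliding window keeps a fixed length.
        start = 0 if expanding else t - initial_train
        yield list(range(start, t)), list(range(t, t + horizon))
        t += step


for train_idx, val_idx in walk_forward_splits(10, initial_train=4, horizon=2):
    print(f"train {train_idx[0]}..{train_idx[-1]} -> validate {val_idx}")
```

With `expanding=False`, the training window keeps a fixed length. This can help when older regimes are no longer representative.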

Always check for: missing timestamps, irregular intervals, duplicate timestamps
Imputation strategy must be documented: forward-fill, interpolation, or model-based
Normalize/scale using only training set statistics — never fit on full dataset
Handle timezone and DST transitions explicitly in your pipeline
Log data frequency (hourly, daily, etc.) and calendar effects (holidays, weekends)
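The data-quality checks and train-only scaling can be sketched with the standard library alone. The sketch assumes a single series whose timestamps are all in one timezone, ideally UTC, which sidesteps DST ambiguity. It also assumes a known nominal frequency. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev


def audit_timestamps(timestamps, expected_freq):
    """Report duplicate, missing, and off-grid timestamps for a single series.

    Assumes all timestamps share one timezone (e.g. UTC) and that
    expected_freq is the nominal spacing, e.g. timedelta(hours=1).
    """
    counts = Counter(timestamps)
    duplicates = sorted(ts for ts, n in counts.items() if n > 1)
    observed = sorted(counts)

    # Regular grid from the first observed timestamp to the last one.
    grid = []
    t = observed[0]
    while t <= observed[-1]:
        grid.append(t)
        t += expected_freq
    grid_set = set(grid)

    missing = [ts for ts in grid if ts not in counts]
    off_grid = [ts for ts in observed if ts not in grid_set]  # irregular stamps
    return {"duplicates": duplicates, "missing": missing, "off_grid": off_grid}


def fit_scaler(train_values):
    """Estimate standardization parameters from the training window only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant series
    return mu, sigma


def apply_scaler(values, mu, sigma):
    """Apply training-set statistics to any later window (validation, test, live)."""
    return [(v - mu) / sigma for v in values]


stamps = [datetime(2024, 3, 1, h, m) for h, m in [(0, 0), (1, 0), (1, 0), (3, 0), (3, 30), (4, 0)]]
print(audit_timestamps(stamps, timedelta(hours=1)))
```

For this input, the audit flags 01:00 as a duplicate, 02:00 as missing, and 03:30 as off-grid. Whatever imputation is chosen for the missing slot should be recorded alongside these results.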

Standard temporal features: hour, day-of-week, month, is_holiday, is_weekend
Lag features: for daily data, recent lags (1, 2, 7, 14, 28) plus rolling statistics (mean, std, min, max) computed over past windows only
Trend and seasonality decomposition (STL) as preprocessing or features
External regressors: weather, events, economic indicators — document each one
Fourier features for capturing complex seasonality without dummy variables
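Here is a plain-Python sketch of point-in-time lag and rolling features, calendar flags, and Fourier terms. Assumptions: `y` is a regularly spaced daily series indexed by integer position, and the caller supplies the holiday set. `point_in_time_features`, `calendar_features`, and `fourier_terms` are hypothetical helpers. The key property is that features for target `y[t]` read only `y[:t]`.

```python
import math
from datetime import date
from statistics import mean, pstdev


def point_in_time_features(y, t, lags=(1, 2, 7), window=7):
    """Build features for predicting y[t] using only y[:t].

    lag_k is the value k steps before the prediction point, so it is defined
    relative to t rather than to an absolute date.
    """
    history = y[:t]  # strictly past observations; y[t] is never read
    feats = {}
    for k in lags:
        feats[f"lag_{k}"] = history[-k] if k <= len(history) else None
    recent = history[-window:]
    if len(recent) == window:  # only emit rolling stats for full windows
        feats[f"roll_mean_{window}"] = mean(recent)
        feats[f"roll_std_{window}"] = pstdev(recent)
        feats[f"roll_min_{window}"] = min(recent)
        feats[f"roll_max_{window}"] = max(recent)
    return feats


def calendar_features(d, holidays):
    """Day-level calendar flags; sub-daily data would also add d.hour from a datetime."""
    return {
        "day_of_week": d.weekday(),  # Monday = 0
        "month": d.month,
        "is_weekend": int(d.weekday() >= 5),
        "is_holiday": int(d in holidays),
    }


def fourier_terms(t_index, period, order):
    """sin/cos pairs at k = 1..order cycles per period, instead of dummy variables."""
    out = {}
    for k in range(1, order + 1):
        angle = 2 * math.pi * k * t_index / period
        out[f"sin_{period}_{k}"] = math.sin(angle)
        out[f"cos_{period}_{k}"] = math.cos(angle)
    return out


y = list(range(10))
print(point_in_time_features(y, t=8))
print(calendar_features(date(2024, 1, 6), holidays=set()))
print(fourier_terms(3, period=7, order=2))
```

The `order` parameter controls how much seasonal shape the Fourier terms can express. Each extra order adds one sin/cos pair, which is why they scale better than one dummy per period position.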

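Baseline: seasonal naive (repeat the value from the same point in the previous season, e.g. the same weekday last week for daily data with weekly seasonality); every model must beat it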
Statistical models (Prophet, ARIMA): start here for single series with clear patterns
Tree-based (LightGBM + lag features): strong default for multi-series with shared patterns
Deep learning (TFT, PatchTST, TimesFM): justify with scale (many series, complex dependencies)
Always compare multiple approaches — no single model dominates all time series problems
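The seasonal naive baseline and a "must beat this" check fit in a few lines. This sketch assumes a single regular series with a known season length, such as 7 for daily data with weekly seasonality. It uses MAE on a shared holdout window as the comparison metric. The helper names are hypothetical.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast step h with the value one season earlier.

    For horizons longer than one season, the last observed season repeats.
    Requires len(history) >= season_length.
    """
    n = len(history)
    if n < season_length:
        raise ValueError("need at least one full season of history")
    return [history[n - season_length + (h % season_length)] for h in range(horizon)]


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def beats_baseline(actual, model_pred, baseline_pred):
    """True only if the candidate model has strictly lower MAE than the baseline."""
    return mae(actual, model_pred) < mae(actual, baseline_pred)


history = [10, 20, 30, 40, 11, 21, 31, 41]
baseline = seasonal_naive(history, season_length=4, horizon=4)
actual = [12, 22, 32, 42]
model = [12, 22, 32, 40]
print(baseline, beats_baseline(actual, model, baseline))
```

The same comparison can be repeated for each candidate (statistical, tree-based, deep learning) on identical walk-forward folds, so that the baseline and the models are scored on exactly the same data.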

Metrics: MAPE, SMAPE, RMSE, MAE; report at least two to avoid metric gaming (MAPE is undefined when actuals are zero and unstable near zero)
Evaluate across forecast horizons separately (1-step, 7-step, 30-step are different problems)
Break down metrics by segment: high-volume vs. low-volume series behave differently
Visualize actual vs. predicted with prediction intervals; numbers alone hide systematic errors
Measure and report calibration of prediction intervals (e.g., a 95% interval should contain about 95% of held-out actuals)
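Below is a minimal stdlib sketch of the metrics, per-horizon breakdown, and interval coverage described above. The sMAPE formula follows one common definition with a 0-200% range. Other variants exist, so state which one you report. `mae_by_horizon` assumes every forecast origin produces a path of the same length.

```python
import math


def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual, pred):
    """Mean absolute percentage error; undefined when any actual is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, pred)) / len(actual)


def smape(actual, pred):
    """Symmetric MAPE (one common definition, range 0-200%)."""
    terms = []
    for a, p in zip(actual, pred):
        denom = abs(a) + abs(p)
        terms.append(0.0 if denom == 0 else 2 * abs(a - p) / denom)
    return 100 * sum(terms) / len(terms)


def interval_coverage(actual, lower, upper):
    """Fraction of actuals inside [lower, upper]; compare with the nominal level."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return hits / len(actual)


def mae_by_horizon(actual_paths, forecast_paths):
    """MAE per step ahead, given one actual/forecast path per forecast origin."""
    horizon = len(forecast_paths[0])
    return [
        mae([path[h] for path in actual_paths], [path[h] for path in forecast_paths])
        for h in range(horizon)
    ]


actual = [100, 200, 300, 400]
pred = [110, 190, 300, 360]
print(mae(actual, pred), rmse(actual, pred), mape(actual, pred), smape(actual, pred))
print(interval_coverage(actual, [90, 180, 310, 350], [120, 210, 320, 410]))
```

In this toy example, a single 40-unit error pushes RMSE (about 21.2) well above MAE (15). This is why reporting both helps expose outlier-driven behaviour.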

Look-ahead bias: accidentally using future information in feature computation
Non-stationarity: model trained on one regime performs poorly when the pattern changes
Concept drift: production model degrades over time — implement drift detection and retraining triggers
Aggregation level mismatch: model trained on daily data applied to hourly predictions
Outlier sensitivity: a few extreme values dominate RMSE — consider robust metrics or outlier handling
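The following is one possible retraining trigger, sketched under explicit assumptions. It monitors realized forecast error, so it only reacts once actuals arrive. `reference_mae` is the error measured on the validation period before deployment. The 30-step window and the 1.25 tolerance are arbitrary illustrative defaults that need tuning per use case. `ErrorDriftMonitor` is a hypothetical class, not a library API.

```python
from collections import deque


class ErrorDriftMonitor:
    """Flag retraining when recent absolute error rises above a reference level.

    reference_mae: error on the validation period at deployment time (assumed known).
    window: number of most recent predictions to average (assumed default).
    tolerance: ratio over reference_mae that triggers retraining (assumed default).
    """

    def __init__(self, reference_mae, window=30, tolerance=1.25):
        self.reference_mae = reference_mae
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # oldest errors drop off automatically

    def update(self, actual, predicted):
        """Record one realized error and return whether retraining is due."""
        self.errors.append(abs(actual - predicted))
        return self.should_retrain()

    def should_retrain(self):
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough recent evidence yet
        recent_mae = sum(self.errors) / len(self.errors)
        return recent_mae > self.tolerance * self.reference_mae


monitor = ErrorDriftMonitor(reference_mae=1.0, window=3, tolerance=1.25)
for actual, predicted in [(10, 9), (10, 9), (10, 9), (10, 5)]:
    print(monitor.update(actual, predicted))
```

Error-based triggers lag behind the actual change in the data. Checking input feature distributions against the training window is a complementary signal that can fire before actuals are available.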