Skip to content

ML vs Statistical Forecasting

You can use XGBoost for time series, but it's fundamentally different from ARIMA.

Statistical Models (ARIMA, SARIMA)

  • How they work: They explicitly mode time, autocorrelation (lags), and seasonality.
  • Pros: Highly interpretable, very strong on small datasets, built-in confidence intervals.
  • Cons: Require strict assumptions (stationarity), struggle with many exogenous (external) variables, can't naturally train across multiple different time series at once.

Machine Learning Models (XGBoost, LSTMs)

  • How they work: You have to extract time features (e.g., "is_weekend", "month_number", "lag_1", "lag_7") and feed them in as standard tabular machine learning.
  • Pros: Can easily consume hundreds of external features (weather, price, holidays), often win mapping non-linear combinations of features.
  • Cons: No built-in understanding of time (they just see rows of data), they cannot extrapolate trends (a tree can never predict a value higher than it saw in training).

KSB Mapping

KSB Description How This Addresses It
K4.1 Statistical models and methods ARIMA, SARIMA, and exponential smoothing foundations
K4.2 Predictive analytics and ML techniques Time series forecasting and model comparison
K5.3 Common patterns in real-world data Identifying trends, seasonality, and stationarity
S1 Scientific methods and hypothesis testing Stationarity testing, model diagnostics, forecast validation
S4 Analysis and models to inform outcomes Building forecasts to support business planning
B5 Impartial, hypothesis-driven approach Honest evaluation of forecast accuracy and limitations