Stationarity & Differencing
Most statistical time series models, such as ARIMA and SARIMA, assume the series is stationary once any differencing built into the model has been applied. Stationary means the mean, variance, and autocorrelation structure do not change over time.
Why Stationarity Matters
Non-stationary data has trends, changing variance, or seasonal shifts that violate the assumptions of ARIMA-type models. Fitting these models on non-stationary data produces unreliable forecasts.
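As a rough illustration of what a "changing mean" looks like, the sketch below splits a white-noise series and a random walk into equal consecutive blocks and compares the block means. The series, the block count, the seed, and the helper name `block_means` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility
noise = rng.normal(0, 1, 1000)  # stationary: white noise
walk = noise.cumsum()           # non-stationary: random walk built from the same noise

def block_means(x, n_blocks=5):
    """Mean of each of n_blocks consecutive, equal-length blocks (hypothetical helper)."""
    return np.array([block.mean() for block in np.array_split(x, n_blocks)])

noise_means = block_means(noise)
walk_means = block_means(walk)

# For a stationary series the block means stay close together;
# for a random walk they wander from block to block.
print("White noise block means:", np.round(noise_means, 2))
print("Random walk block means:", np.round(walk_means, 2))
print("Spread (std) of block means:", noise_means.std(), "vs", walk_means.std())
```

This is only an informal check. The formal tests in the next section are the more reliable way to decide whether differencing is needed.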
Testing for Stationarity
Augmented Dickey-Fuller (ADF) Test
- Null hypothesis: The series has a unit root (non-stationary).
- If p-value < 0.05: Reject the null → the series can be treated as stationary.
- If p-value ≥ 0.05: Fail to reject → there is not enough evidence against a unit root, so treat the series as non-stationary and difference it.
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
# Non-stationary: random walk without drift (cumulative sum of zero-mean noise)
rng = np.random.default_rng(42)
ts = pd.Series(rng.normal(0, 1, 200).cumsum())
result = adfuller(ts)
print(f"ADF Statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")
# Expected: p-value > 0.05 → fail to reject the unit root (treat as non-stationary)
Differencing
Differencing subtracts the previous observation from each observation (\(y'_t = y_t - y_{t-1}\)), removing trends:
# First difference: removes a linear or stochastic (random-walk) trend
ts_diff1 = ts.diff().dropna()
# Second difference: removes a quadratic trend (rarely needed)
ts_diff2 = ts_diff1.diff().dropna()
ts_diff2 = ts_diff1.diff().dropna()
# Test again
result2 = adfuller(ts_diff1)
print(f"After differencing — p-value: {result2[1]:.4f}")
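To make the effect of differencing on deterministic trends concrete, the sketch below applies it to a small linear and a small quadratic sequence. The sequences are made-up illustrative values, and `np.diff` is used here in place of the pandas `.diff()` call.

```python
import numpy as np

t = np.arange(6)
linear = 2.0 + 3.0 * t            # linear trend: 2, 5, 8, ...
quadratic = t.astype(float) ** 2  # quadratic trend: 0, 1, 4, 9, ...

# A first difference is y[t] - y[t-1]; np.diff computes exactly that.
first_manual = linear[1:] - linear[:-1]
first_linear = np.diff(linear)
first_quadratic = np.diff(quadratic)
second_quadratic = np.diff(quadratic, n=2)

print(first_linear)      # [3. 3. 3. 3. 3.]  constant: linear trend removed
print(first_quadratic)   # [1. 3. 5. 7. 9.]  still trending after one difference
print(second_quadratic)  # [2. 2. 2. 2.]     constant: quadratic trend removed
```

Each round of differencing shortens the series by one observation, which is why the pandas version calls `.dropna()` after `.diff()`.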
Visual Check
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
ts.plot(ax=axes[0], title="Original (Non-Stationary)")
ts_diff1.plot(ax=axes[1], title="First Difference (Stationary)")
plt.tight_layout()
plt.show()
Common Pitfall
Do not over-difference. If one round of differencing makes the series stationary (\(d = 1\)), stop. Over-differencing introduces artificial patterns, typically strong negative lag-1 autocorrelation and inflated variance, and degrades model performance.
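The cost of an unnecessary difference can be seen on a series that is already stationary. Differencing white noise \(e_t\) gives \(e_t - e_{t-1}\), which in theory has twice the variance and a lag-1 autocorrelation of \(-0.5\). The sketch below checks this on simulated data; the seed, the sample size, and the helper `lag1_autocorr` are illustrative assumptions.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(0)       # arbitrary seed
stationary = rng.normal(0, 1, 5000)  # already stationary: no differencing needed
over_diff = np.diff(stationary)      # one unnecessary difference

rho_before = lag1_autocorr(stationary)
rho_after = lag1_autocorr(over_diff)
var_ratio = over_diff.var() / stationary.var()

# Expect roughly 0 before, roughly -0.5 after, and a variance ratio near 2.
print(f"lag-1 autocorrelation before: {rho_before:.3f}")
print(f"lag-1 autocorrelation after:  {rho_after:.3f}")
print(f"variance ratio (after / before): {var_ratio:.2f}")
```

A lag-1 autocorrelation near \(-0.5\) after differencing is a common warning sign that the last difference was not needed.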
KSB Mapping
| KSB | Description | How This Addresses It |
|---|---|---|
| K4.1 | Statistical models and methods | ARIMA, SARIMA, and exponential smoothing foundations |
| K4.2 | Predictive analytics and ML techniques | Time series forecasting and model comparison |
| K5.3 | Common patterns in real-world data | Identifying trends, seasonality, and stationarity |
| S1 | Scientific methods and hypothesis testing | Stationarity testing, model diagnostics, forecast validation |
| S4 | Analysis and models to inform outcomes | Building forecasts to support business planning |
| B5 | Impartial, hypothesis-driven approach | Honest evaluation of forecast accuracy and limitations |