by Chee Yee Lim

Posted on 2021-05-17

Collection of notes on time series analysis (statistical point of view).

- A time series is a set of values observed sequentially through time.
- The next value in a time series is dependent on the previous values.
- The series may be denoted by \( X_1, X_2, ..., X_t \) where \( t \) refers to the time period and \( X \) refers to the value.
- If future values of \( X \) are exactly determined by a mathematical formula, the series is said to be deterministic.
- If future values of \( X \) are described only by their probability distribution, the series is said to be a statistical or stochastic process.
- A special class of stochastic processes is a stationary stochastic process, which is required for Box-Jenkins ARIMA models.

- A time series can be made up of 3 key components.
- Trend - a long term increase or decrease.
- Seasonality - an effect of seasonal factors for a fixed or known period.
- Cycle - a periodic cycle that is not of fixed or known period.

- When analysing time series, it is important to check autocorrelation among the data points.
- This can be done with autocorrelation function (ACF) and partial autocorrelation function (PACF) plot analysis.
- An ACF plot shows correlation coefficients against lags; a PACF plot shows partial correlation coefficients against lags.
- ACF/PACF analysis is a primary method for choosing the lag orders of ARIMA-type models.
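
As a sketch of what an ACF plot computes (numpy only; the AR(1) series and lag choices below are made up for illustration):

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_k = sum((x_t - xbar)(x_{t+k} - xbar)) / sum((x_t - xbar)^2)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[: len(x) - k] * d[k:]) / denom for k in range(nlags + 1)])

# Simulate an AR(1) series with strong positive autocorrelation.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()

r = sample_acf(x, nlags=3)  # r[0] is always 1; subsequent lags decay for an AR(1) series
```

Libraries such as statsmodels provide `plot_acf`/`plot_pacf` to draw the corresponding bar charts directly.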

- When generating training and test sets, it is important to split the data chronologically into contiguous chunks, not completely at random as with non-time-series data.
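
For example, a chronological split (a minimal sketch; the 80/20 ratio is an arbitrary choice):

```python
import numpy as np

y = np.arange(100.0)                # toy series, ordered oldest to newest
split = int(len(y) * 0.8)           # hold out the most recent 20% as the test set
train, test = y[:split], y[split:]  # no shuffling: the test set stays in the future
```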

- Independence and stationarity are distinct concepts: an i.i.d. series is stationary, but a stationary series need not be independent.
- The key feature of time series data is that the data points are not independent, i.e. the value of the current data point depends on the values of previous data points.
- In a non-stationary series, the mean and/or variance change over time (i.e. depend on \( t \)).

- A statistical process is stationary if the probability distribution is the same for all starting values of \( t \).
- This implies that the mean and variance are constant for all values of \( t \).
- A series that exhibits a simple trend is not stationary because the values of the series depend on \( t \).

- A stationary stochastic process is completely defined by its mean, variance and autocorrelation function.
- Stationarity is a required assumption for ARIMA models, but not necessarily for other models.
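
An informal way to see non-stationarity is to compare summary statistics across parts of the series (a rough sketch, not a formal test such as the augmented Dickey-Fuller test; the simulated series are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=400)                 # stationary: constant mean and variance
trended = noise + 0.05 * np.arange(400)      # non-stationary: mean grows with t

def half_means(x):
    """Compare the means of the first and second halves of a series."""
    mid = len(x) // 2
    return x[:mid].mean(), x[mid:].mean()

s1, s2 = half_means(noise)     # roughly equal for a stationary series
m1, m2 = half_means(trended)   # clearly different when a trend is present
```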

- Refer to Data Preprocessing

- Average/mean method
- All future values are equal to the average of the historical data.
- \( \hat{y}_{T+h} = \bar{y} = \frac{( y_1 + ... + y_T )}{T} \)

- Prediction interval for \( h \)-step:
- \( \hat{\sigma}_h = \hat{\sigma} \sqrt{1 + \frac{1}{T}} \), where \( \hat{\sigma} \) is the residual standard deviation.
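
A minimal sketch of the mean method and its prediction interval (toy data; the 1.96 multiplier assumes approximately normal residuals):

```python
import numpy as np

y = np.array([12.0, 15.0, 14.0, 13.0, 16.0, 15.0])  # toy history
T = len(y)
y_hat = y.mean()                               # forecast for every horizon h
resid = y - y_hat
sigma = np.sqrt(np.sum(resid ** 2) / (T - 1))  # residual standard deviation
sigma_h = sigma * np.sqrt(1 + 1 / T)           # interval width is the same for all h
lo, hi = y_hat - 1.96 * sigma_h, y_hat + 1.96 * sigma_h  # ~95% prediction interval
```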

- Naive method
- All forecasts are set to be the value of the last observation.
- \( \hat{y}_{T+h} = y_T \)

- This method works remarkably well for economic and financial time series.
- Because a naive forecast is optimal when data follow a random walk, it is also called a random walk forecast.
- Prediction interval for \( h \)-step:
- \( \hat{\sigma}_h = \hat{\sigma} \sqrt{h} \), where \( \hat{\sigma} \) is the residual standard deviation.
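
A minimal sketch of the naive method (toy data):

```python
import numpy as np

y = np.array([10.0, 11.0, 9.0, 12.0])  # toy history
T = len(y)
forecast = y[-1]                        # every horizon gets the last observation
resid = np.diff(y)                      # one-step naive residuals y_t - y_{t-1}
sigma = np.sqrt(np.sum(resid ** 2) / (T - 1))
h = 3
sigma_h = sigma * np.sqrt(h)            # interval widens with the square root of h
```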

- Seasonal naive method (snaive)
- Each forecast is set to be the value of the last observed value from the same season.
- \( \hat{y}_{T+h} = y_{T+h-m(k+1)} \), where \( m \) is the seasonal period, \( k \) is the integer part of \( \frac{(h-1)}{m} \) (i.e. the number of complete years in the forecast period prior to time \( T + h \)).

- This method is useful for highly seasonal data.
- Prediction interval for \( h \)-step:
- \( \hat{\sigma}_h = \hat{\sigma} \sqrt{k + 1} \), where \( \hat{\sigma} \) is the residual standard deviation and \( k \) is the integer part of \( \frac{(h-1)}{m} \)
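
A minimal sketch of the seasonal naive index arithmetic (toy quarterly data, \( m = 4 \)):

```python
import numpy as np

def snaive(y, h, m):
    """Seasonal naive forecast y_{T+h-m(k+1)}, with k complete seasonal cycles before T+h."""
    T = len(y)
    k = (h - 1) // m
    return y[T + h - m * (k + 1) - 1]  # -1 converts to 0-based indexing

y = np.array([10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0])  # two years of quarters
f = snaive(y, h=1, m=4)  # next Q1 repeats the last observed Q1 value
```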

- Drift method
- Each forecast is obtained by extrapolating from a line fitted on the first and last observations.
- \( \hat{y}_{T+h} = y_T + \frac{h}{T-1} \sum_{t=2}^{T} (y_t - y_{t-1}) = y_T + h \left( \frac{y_T - y_1}{T - 1} \right) \)

- Prediction interval for \( h \)-step:
- \( \hat{\sigma}_h = \hat{\sigma} \sqrt{ h(1 + \frac{h}{T}) } \), where \( \hat{\sigma} \) is the residual standard deviation.
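
A minimal sketch of the drift method (toy data); note the summation form telescopes to the first-to-last slope:

```python
import numpy as np

y = np.array([5.0, 7.0, 6.0, 9.0, 11.0])  # toy history
T = len(y)
slope = (y[-1] - y[0]) / (T - 1)   # average one-step change over the sample
h = 2
forecast = y[-1] + h * slope       # extrapolate the line through first and last points
```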

- Regular linear model (no time lag between predictors and response)
- A regular linear model can be used with time series data, in which \( x_t \) observed at time point \( t \) is used to predict \( y_t \) observed at time point \( t \).
- \( y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + ... + \beta_k x_{k,t} + \epsilon_t \)
- i.e. predictors at time \( t \) are used to predict response at time \( t \).

- Using the no time lag approach, forecast values for \( x_{n,t} \) are needed to predict \( y_t \).
- This approach is very useful for scenario-based forecasting, where multiple sets of forecast values for \( x_{n,t} \) are derived from different assumptions.
- E.g. under a best-case scenario, we can assume a +1% income growth rate and a +0.5% savings growth rate when predicting the change in consumption; under a worst-case scenario, a -1% income decline rate and a -0.5% savings decline rate.
- However, the no time lag approach is not suitable for forecasting into the future without assumed values for \( x_{n,t} \).
- Lagged values are required in that case. For example, to predict \( y_{t+h} \) from \( x_{n,t} \): \( y_{t+h} = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + ... + \beta_k x_{k,t} + \epsilon_{t+h} \), where \( h = 1, 2, ... \)

- Assumptions on this approach:
- Linear relationship between target and predictor variables.
- Errors have zero mean; otherwise forecasts are systematically biased.
- Errors are not autocorrelated; otherwise forecasts are systematically biased.
- Errors are unrelated to predictor variables; otherwise there would be more information that should be included in the systematic part of the model.
- It is useful for the errors to be normally distributed with constant variance, in order to easily produce prediction intervals.

- The model can be fitted with least squares estimation, by minimising the following equation:
- \( \sum_{t=1}^T \epsilon_{t}^2 = \sum_{t=1}^T ( y_t - \beta_0 - \beta_1 x_{1,t} - \beta_2 x_{2,t} - ... - \beta_k x_{k,t} )^2 \)
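
A minimal sketch of least squares estimation via the design matrix (synthetic data with made-up coefficients, numpy only):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
x1 = rng.normal(size=T)   # e.g. income growth (made up)
x2 = rng.normal(size=T)   # e.g. savings growth (made up)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.1 * rng.normal(size=T)

X = np.column_stack([np.ones(T), x1, x2])      # design matrix with intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimises the sum of squared errors
# beta approximately recovers (1.0, 2.0, -0.5)
```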

- The goodness-of-fit for the model can be evaluated via:
- Coefficient of determination, \( R^2 \)
- Check Model Evaluation

- Residual standard error (also useful for prediction interval calculation)
- Check Model Evaluation

- ACF plot of residuals
- It is common to find autocorrelation in the residuals from a model fitted to time series data.
- This will violate the assumption of no autocorrelation in the errors of our model, and our forecasts may be inefficient - there is some information left over which should be accounted for in the model to obtain better forecasts.
- The forecasts from a model with autocorrelated errors are still unbiased, and so are not "wrong", but they will usually have larger prediction intervals than they need to.
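
A quick check for residual autocorrelation is to compare the lag-1 sample autocorrelation against the approximate \( \pm 1.96 / \sqrt{T} \) significance band (a sketch with simulated autocorrelated residuals):

```python
import numpy as np

# Simulate residuals that still contain AR(1) structure the model missed.
rng = np.random.default_rng(3)
T = 300
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.6 * e[t - 1] + rng.normal()

d = e - e.mean()
r1 = np.sum(d[:-1] * d[1:]) / np.sum(d ** 2)  # lag-1 sample autocorrelation
bound = 1.96 / np.sqrt(T)                     # approx 95% significance band
autocorrelated = abs(r1) > bound              # True here: information is left over
```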

