Time Series Analysis

by Chee Yee Lim

Posted on 2021-05-17

Collection of notes on time series analysis (statistical point of view).

Time Series Analysis


  • A time series is a set of values observed sequentially through time.
    • The next value in a time series usually depends on the previous values.
    • The series may be denoted by \( X_1, X_2, ..., X_t \) where \( t \) refers to the time period and \( X \) refers to the value.
    • If future values of \( X \) are exactly determined by a mathematical formula, the series is said to be deterministic.
    • If future values of \( X \) are described only by their probability distribution, the series is said to be a statistical or stochastic process.
    • A special class of stochastic processes is a stationary stochastic process, which is required for Box-Jenkins ARIMA models.
  • A time series can be made up of 3 key components.
    • Trend - a long term increase or decrease.
    • Seasonality - a pattern driven by seasonal factors (e.g. time of year or day of week) that repeats with a fixed and known period.
    • Cycle - rises and falls that do not have a fixed or known period.
    • [Figure: Time Series Component]
  • When analysing time series, it is important to check autocorrelation among the data points.
    • This can be done by inspecting the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot.
    • An ACF plot is a bar chart of autocorrelation coefficients against lags. A PACF plot is a bar chart of partial autocorrelation coefficients against lags.
    • These plots are a primary tool for choosing the lag orders of ARIMA-type models.
  • When generating training and test sets, it is important to split the data into contiguous chunks in time order, rather than sampling points completely at random as is done for non-time series data.
    • [Figure: Time Series Data Split]
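The chunked split described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the helper name `chronological_split` and the toy data are assumptions made for the example.

```python
def chronological_split(series, test_size):
    """Hold out the last `test_size` observations as the test set.

    The order of observations is preserved, so the model is never
    trained on data that comes after the points it is evaluated on.
    """
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    split_at = len(series) - test_size
    return series[:split_at], series[split_at:]


# Hypothetical toy series of ten observations in time order.
observations = list(range(1, 11))
train, test = chronological_split(observations, test_size=3)
print(train)
print(test)
```

The training set is the earliest contiguous chunk and the test set is the most recent one, which mirrors how the model would be used for forecasting.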

Independence and Stationarity

  • Independence and stationarity are distinct concepts: a series of independent and identically distributed values is stationary, but a stationary series need not be independent.
  • The key difference with time series data is that the data points are not independent, i.e. the value of the current data point depends on the values of previous data points.
    • In the case of non-stationarity, the mean and/or variance change with time.
  • A statistical process is stationary if the probability distribution is the same for all starting values of \( t \).
    • This implies that the mean and variance are constant for all values of \( t \).
    • A series that exhibits a simple trend is not stationary because the values of the series depend on \( t \).
  • A stationary stochastic process that is also normally distributed (Gaussian) is completely defined by its mean, variance and autocorrelation function.
  • Stationarity is a required assumption for ARIMA models, but not necessarily for other models.
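The autocorrelation function mentioned in these notes can be estimated from data with the usual sample formula. The sketch below uses only plain Python; the function name `sample_acf` and the toy series are assumptions made for illustration, and in practice a statistics library would normally be used for this.

```python
def sample_acf(x, max_lag):
    """Return the sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    # Sum of squared deviations, shared by every lag as the denominator.
    denom = sum((v - mean) ** 2 for v in x)
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k time steps apart.
        num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical toy series that repeats every 4 time steps.
series = [1.0, 2.0, 3.0, 2.0] * 6
r = sample_acf(series, max_lag=4)
for lag, value in enumerate(r):
    print(lag, round(value, 3))
```

For this toy series the coefficient at lag 4 is large and positive because the pattern repeats every 4 steps. A bar chart of these values against the lag is what an ACF plot displays.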

Stationarise Time Series

List of Forecast Algorithms

  • Average/mean method
    • All future values are equal to the average of the historical data.
      • \( \hat{y}_{T+h} = \bar{y} = \frac{( y_1 + ... + y_T )}{T} \)
    • Standard deviation of the \( h \)-step forecast, from which the prediction interval is built:
      • \( \hat{\sigma}_h = \hat{\sigma} \sqrt{1 + \frac{1}{T}} \), where \( \hat{\sigma} \) is the residual standard deviation.
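As a rough sketch of the average method, the snippet below computes the point forecast and \( \hat{\sigma}_h \) with the Python standard library. The function name, the toy data, and the 95% interval (which assumes normally distributed errors and the multiplier 1.96) are assumptions added for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def mean_forecast(y, h):
    """Average method: every forecast equals the historical mean."""
    T = len(y)
    point = mean(y)
    # Residuals are y_t - mean; one parameter (the mean) is estimated,
    # so the sample standard deviation with T - 1 degrees of freedom is used.
    sigma = stdev(y)
    sigma_h = sigma * sqrt(1 + 1 / T)
    return [point] * h, [sigma_h] * h


# Hypothetical history of five observations.
history = [3.0, 5.0, 4.0, 6.0, 7.0]
points, sigmas = mean_forecast(history, h=2)

# Assumed 95% prediction interval under normally distributed errors.
lower = [p - 1.96 * s for p, s in zip(points, sigmas)]
upper = [p + 1.96 * s for p, s in zip(points, sigmas)]
print(points, sigmas, lower, upper)
```

Note that \( \hat{\sigma}_h \) does not depend on \( h \) here, so the interval has the same width at every horizon.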
  • Naive method
    • All forecasts are set to be the value of the last observation.
      • \( \hat{y}_{T+h} = y_T \)
    • This method works remarkably well for many economic and financial time series.
    • Because a naive forecast is optimal when data follow a random walk, it is also called a random walk forecast.
    • Standard deviation of the \( h \)-step forecast, from which the prediction interval is built:
      • \( \hat{\sigma}_h = \hat{\sigma} \sqrt{h} \), where \( \hat{\sigma} \) is the residual standard deviation.
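A minimal sketch of the naive method follows. The function name and data are hypothetical, and the residual standard deviation is taken here as the root mean square of the one-step residuals \( y_t - y_{t-1} \), which is one common convention rather than the only one.

```python
from math import sqrt


def naive_forecast(y, h):
    """Naive method: every forecast equals the last observation."""
    # One-step residuals of the naive method: y_t - y_{t-1}.
    residuals = [y[t] - y[t - 1] for t in range(1, len(y))]
    # Assumption: residual sd as the root mean square of the residuals.
    sigma = sqrt(sum(e ** 2 for e in residuals) / len(residuals))
    points = [y[-1]] * h
    # The forecast standard deviation grows with the square root of the horizon.
    sigmas = [sigma * sqrt(step) for step in range(1, h + 1)]
    return points, sigmas


# Hypothetical history of four observations.
history = [10.0, 12.0, 11.0, 13.0]
points, sigmas = naive_forecast(history, h=4)
print(points)
print(sigmas)
```

The growing `sigmas` values reflect the random walk interpretation: uncertainty accumulates with each additional step ahead.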
  • Seasonal naive method (snaive)
    • Each forecast is set to be the value of the last observed value from the same season.
      • \( \hat{y}_{T+h} = y_{T+h-m(k+1)} \), where \( m \) is the seasonal period and \( k \) is the integer part of \( \frac{(h-1)}{m} \) (i.e. the number of complete years in the forecast period prior to time \( T + h \)).
    • This method is useful for highly seasonal data.
    • Standard deviation of the \( h \)-step forecast, from which the prediction interval is built:
      • \( \hat{\sigma}_h = \hat{\sigma} \sqrt{k + 1} \), where \( \hat{\sigma} \) is the residual standard deviation and \( k \) is the integer part of \( \frac{(h-1)}{m} \).
    • [Figure: Time Series Forecast Naive]
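The index arithmetic of the seasonal naive method is easy to get wrong, so here is a small sketch. The function name and the toy quarterly data are assumptions; the residual standard deviation is taken as the root mean square of the seasonal differences \( y_t - y_{t-m} \).

```python
from math import sqrt


def seasonal_naive_forecast(y, m, h):
    """Seasonal naive: reuse the last observed value from the same season."""
    T = len(y)
    if T < m:
        raise ValueError("need at least one full season of data")
    # Residuals of the method: y_t - y_{t-m}.
    residuals = [y[t] - y[t - m] for t in range(m, T)]
    # Assumption: residual sd as the root mean square of the residuals.
    sigma = sqrt(sum(e ** 2 for e in residuals) / len(residuals)) if residuals else 0.0
    points, sigmas = [], []
    for step in range(1, h + 1):
        k = (step - 1) // m  # complete seasonal cycles before T + step
        # y_{T+step-m(k+1)} in 1-based notation is index T+step-m(k+1)-1 here.
        points.append(y[T + step - m * (k + 1) - 1])
        sigmas.append(sigma * sqrt(k + 1))
    return points, sigmas


# Hypothetical quarterly data (m = 4) covering two years.
history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
points, sigmas = seasonal_naive_forecast(history, m=4, h=6)
print(points)
print(sigmas)
```

Forecasts beyond one full season simply repeat the last observed season, while the standard deviation steps up each time \( k \) increases.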
  • Drift method
    • Each forecast is obtained by extrapolating the straight line drawn through the first and last observations.
      • \( \hat{y}_{T+h} \)
      • \( = y_T + \frac{h}{T-1} \sum_{t=2}^{T} (y_t - y_{t-1}) \)
      • \( = y_T + h ( \frac{y_T - y_1}{T - 1} ) \)
    • Standard deviation of the \( h \)-step forecast, from which the prediction interval is built:
      • \( \hat{\sigma}_h = \hat{\sigma} \sqrt{ h(1 + \frac{h}{T}) } \), where \( \hat{\sigma} \) is the residual standard deviation.
    • [Figure: Time Series Forecast Drift]
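The two equivalent forms of the drift forecast can be checked numerically. In this sketch the function name and data are hypothetical, and the residual standard deviation divides by the number of residuals minus one because one parameter (the slope) is estimated; other conventions exist.

```python
from math import sqrt


def drift_forecast(y, h):
    """Drift method: extend the line through the first and last observations."""
    T = len(y)
    if T < 3:
        raise ValueError("need at least three observations")
    # Average change between consecutive observations ...
    avg_change = sum(y[t] - y[t - 1] for t in range(1, T)) / (T - 1)
    # ... equals the slope of the line through the first and last points.
    slope = (y[-1] - y[0]) / (T - 1)
    points = [y[-1] + step * slope for step in range(1, h + 1)]
    # Residuals of the method: y_t - y_{t-1} - slope.
    residuals = [y[t] - y[t - 1] - slope for t in range(1, T)]
    # Assumption: one estimated parameter, so divide by len(residuals) - 1.
    sigma = sqrt(sum(e ** 2 for e in residuals) / (len(residuals) - 1))
    sigmas = [sigma * sqrt(step * (1 + step / T)) for step in range(1, h + 1)]
    return points, sigmas, avg_change, slope


# Hypothetical history of four observations.
history = [2.0, 4.0, 5.0, 8.0]
points, sigmas, avg_change, slope = drift_forecast(history, h=2)
print(points, sigmas)
```

The intermediate differences cancel in the sum (it telescopes), which is why only \( y_1 \) and \( y_T \) matter for the point forecast.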
  • Regular linear model (no time lag between predictors and response)
    • A regular linear model can be used with time series data, in which \( x_t \) observed at time point \( t \) is used to predict \( y_t \) observed at time point \( t \).
      • \( y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + ... + \beta_k x_{k,t} + \epsilon_t \)
      • i.e. predictors at time \( t \) are used to predict response at time \( t \).
    • Using the no time lag approach, forecast values for \( x_{n,t} \) are needed to predict \( y_t \).
      • For scenario-based forecast, this approach is very useful where there are multiple sets of forecast values for \( x_{n,t} \) derived based on assumptions.
      • E.g. under the best case scenario, we can assume a +1% income growth rate and a +0.5% savings growth rate for predicting the change in consumption. Under the worst case scenario, we can assume a -1% income growth rate and a -0.5% savings growth rate.
      • However, the no time lag approach is not suitable for forecasting into the future without assumed values for \( x_{n,t} \).
      • Lagged values are required in that case. For example to predict \( y_{t+h} \) with \( x_{n,t} \), \( y_{t+h} = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + ... + \beta_k x_{k,t} + \epsilon_{t+h} \) where \( h = 1,2,... \)
    • Assumptions on this approach:
      • Linear relationship between target and predictor variables.
      • Errors have zero mean; otherwise forecasts are systematically biased.
      • Errors are not autocorrelated; otherwise forecasts are inefficient, as there is more information in the data that could be exploited.
      • Errors are unrelated to predictor variables; otherwise there would be more information that should be included in the systematic part of the model.
      • Useful to have the errors normally distributed with a constant variance, in order to easily produce prediction intervals.
    • The model can be fitted with least squares estimation, by minimising the following equation:
      • \( \sum_{t=1}^T \epsilon_{t}^2 = \sum_{t=1}^T ( y_t - \beta_0 - \beta_1 x_{1,t} - \beta_2 x_{2,t} - ... - \beta_k x_{k,t} )^2 \)
    • The goodness-of-fit for the model can be evaluated via:
      1. Coefficient of determination, \( R^2 \)
      2. Residual standard error (also useful for calculating prediction intervals)
      3. ACF plot of residuals
        • It is common to find autocorrelation in the residuals from a model fitted to time series data.
        • This will violate the assumption of no autocorrelation in the errors of our model, and our forecasts may be inefficient - there is some information left over which should be accounted for in the model to obtain better forecasts.
        • The forecasts from a model with autocorrelated errors are still unbiased, and so are not "wrong", but they will usually have larger prediction intervals than they need to.
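To make the least squares fit and the three diagnostics concrete, here is a sketch restricted to a single predictor (\( k = 1 \)) so that it needs no linear algebra library. The function names and the data are made up for illustration, and only the lag-1 residual autocorrelation is computed rather than a full ACF plot.

```python
from math import sqrt


def fit_simple_linear(x, y):
    """Least squares estimates of beta_0 and beta_1 for y_t = b0 + b1 * x_t + e_t."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def diagnostics(x, y, beta0, beta1):
    """Return residuals, R^2, residual standard error and lag-1 residual autocorrelation."""
    n = len(y)
    y_bar = sum(y) / n
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - sse / sst
    # T - k - 1 degrees of freedom, with k = 1 predictor.
    rse = sqrt(sse / (n - 2))
    e_bar = sum(residuals) / n
    lag1 = sum((residuals[t] - e_bar) * (residuals[t - 1] - e_bar)
               for t in range(1, n)) / sum((e - e_bar) ** 2 for e in residuals)
    return residuals, r_squared, rse, lag1


# Hypothetical predictor and response observed at the same time points.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
beta0, beta1 = fit_simple_linear(x, y)
residuals, r_squared, rse, lag1 = diagnostics(x, y, beta0, beta1)
print(beta0, beta1, r_squared, rse, lag1)
```

A lag-1 residual autocorrelation far from zero would suggest that information is left in the residuals, as discussed above.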