Hi guys! This article on ARIMA is part of our time series tutorial series. In the last few articles we learned a lot about time series data: reading, plotting, decomposing and forecasting it. If you have read those articles, you already have some background on time series, so in this article we will learn about ARIMA models.
Why do we need ARIMA models? Let me explain. In the last article we learned forecasting using the simple exponential smoothing (SES) method. SES is useful for making forecasts, and it makes no assumptions about the correlations between successive values of the time series. However, if you want prediction intervals for forecasts made using exponential smoothing, those intervals require that the forecast errors are uncorrelated and normally distributed with mean zero and constant variance.
While exponential smoothing methods do not make any assumptions about correlations between successive values
of the time series, in some cases you can make a better predictive model by taking correlations in the data into account.
Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical model for the
irregular component of a time series, that allows for non-zero autocorrelations in the irregular component.
Steps for working with ARIMA models:
- Differencing a time series.
- Selecting a candidate ARIMA model.
- Forecasting using the ARIMA model.
Step 1: Differencing a time series
ARIMA models are defined for stationary time series. If you start off with a non-stationary time series, you first need to difference it until you obtain a stationary one. If you have to difference the time series d times to obtain a stationary series, then you have an ARIMA(p,d,q) model, where d is the order of differencing. Now you must be curious, like me, about the other parameters: p is the order of the autoregressive (AR) part and q is the order of the moving average (MA) part of the ARIMA model.
You can difference a time series using the “diff()” function in R.
Let's take the example of the ages at death of the kings, which we used in how to do time series analysis using R part-1.
Let's see whether this data is stationary or not.
The series in the graph above is not stationary in mean. To calculate the time series of first differences, we type:
kingstimeseriesdiff1 <- diff(kingstimeseries, differences=1)
plot.ts(kingstimeseriesdiff1)
Here is the plot
The time series of first differences appears to be stationary in mean and variance, and so an ARIMA(p,1,q) model
is probably appropriate for the time series of the ages at death of the kings. By taking the time series of first differences, we have removed the trend component of the time series of the ages at death of the kings, and are left with an irregular component. We can now examine whether there are correlations between successive terms
of this irregular component, if so, this could help us to make a predictive model for the ages at death of the kings.
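To make the effect of differencing concrete, here is a minimal sketch on made-up data. The series `x` is a simulated random walk with drift, used only as a hypothetical stand-in for a non-stationary series (it is not the kings data), and `d1` is its first difference.

```r
set.seed(42)

# Hypothetical non-stationary series: a random walk with drift (not the kings data)
x <- ts(cumsum(rnorm(100, mean = 0.5)))

# First difference: each value minus the one before it,
# so the result is one observation shorter than the original
d1 <- diff(x, differences = 1)

# diff() gives the same numbers as subtracting neighbours by hand
manual <- as.numeric(x)[-1] - as.numeric(x)[-length(x)]

length(x)                           # 100
length(d1)                          # 99
all.equal(as.numeric(d1), manual)   # TRUE

# The level of x keeps drifting, while the differences
# fluctuate around a roughly constant mean
mean(d1)
```

If one round of differencing is not enough, `differences=2` applies the same operation twice, which corresponds to d=2 in ARIMA(p,d,q).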
That finishes the first part: making our time series data stationary. The next part is selecting an ARIMA model.
Step 2: Selecting an ARIMA model
Selecting an ARIMA model means finding appropriate values of p and q for ARIMA(p,d,q).
To do this, let's look at the autocorrelations and partial autocorrelations of the differenced time series at different lags.
To plot a correlogram and partial correlogram, we can use the “acf()” and “pacf()” functions in R, respectively. To
get the actual values of the autocorrelations and partial autocorrelations, we set “plot=FALSE” in the “acf()” and
“pacf()” functions.
To plot the correlogram for lags 1-20 of the once-differenced time series of the ages at death of the kings, and to get the values of the autocorrelations, we type:
acf(kingstimeseriesdiff1, lag.max=20)              # plot the correlogram
acf(kingstimeseriesdiff1, lag.max=20, plot=FALSE)  # get the autocorrelation values
Here is the R console output and correlogram plot
We see from the correlogram that the autocorrelation at lag 1 (-0.360) exceeds the significance bounds, but all
other autocorrelations between lags 1-20 do not exceed the significance bounds.
To plot the partial correlogram for lags 1-20 for the once differenced time series of the ages at death of the English
kings, and get the values of the partial autocorrelations, we use the “pacf()” function
pacf(kingstimeseriesdiff1, lag.max=20)              # plot the partial correlogram
pacf(kingstimeseriesdiff1, lag.max=20, plot=FALSE)  # get the partial autocorrelation values
Here is the R console output and plot
We can see from the plot that the partial autocorrelations at lags 1, 2 and 3 exceed the significance bounds, are negative, and are slowly decreasing in magnitude with increasing lag (lag 1: -0.360, lag 2: -0.335, lag 3: -0.321). The partial autocorrelations tail off to zero after lag 3.
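Reading significant lags off a plot can also be done numerically. The sketch below uses a simulated MA(1) series `y` as a hypothetical stand-in for the differenced kings data, and compares the autocorrelations with the approximate 95% bounds that “acf()” and “pacf()” draw by default. The names `bound`, `sig_acf` and `sig_pacf` are made up for this example.

```r
set.seed(1)

# Hypothetical stationary series with MA(1) structure (not the kings data)
y <- arima.sim(model = list(ma = -0.7), n = 200)

a <- acf(y, lag.max = 20, plot = FALSE)
p <- pacf(y, lag.max = 20, plot = FALSE)

acf_vals  <- as.numeric(a$acf)[-1]  # acf() also returns lag 0 (always 1), so drop it
pacf_vals <- as.numeric(p$acf)      # pacf() starts at lag 1

# Approximate 95% significance bounds, as drawn by default on the plots
bound <- qnorm(0.975) / sqrt(a$n.used)

# Lags whose (partial) autocorrelation falls outside the bounds
sig_acf  <- which(abs(acf_vals)  > bound)
sig_pacf <- which(abs(pacf_vals) > bound)

round(acf_vals[1:3], 3)
round(pacf_vals[1:3], 3)
sig_acf
sig_pacf
```

With 20 lags and 95% bounds, about one lag can fall outside the bounds by chance alone, so an isolated exceedance at a high lag is usually not a reason to change the model.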
Since the correlogram is zero after lag 1, and the partial correlogram tails off to zero after lag 3, this means that
the following ARMA (autoregressive moving average) models are possible for the time series of first differences:
• An ARMA(3,0) model, that is, an autoregressive model of order p=3, since the partial autocorrelogram is
zero after lag 3, and the autocorrelogram tails off to zero.
• An ARMA(0,1) model, that is, a moving average model of order q=1, since the autocorrelogram is zero after
lag 1 and the partial autocorrelogram tails off to zero.
• An ARMA(p,q) model, that is, a mixed model with p and q greater than 0, since the autocorrelogram and
partial correlogram tail off to zero.
We use the principle of parsimony to decide which model is best: that is, we assume that the model with the
fewest parameters is best. The ARMA(3,0) model has 3 parameters, the ARMA(0,1) model has 1 parameter, and
the ARMA(p,q) model has at least 2 parameters. Therefore, the ARMA(0,1) model is taken as the best model.
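The parameter counts used in the parsimony argument can be read straight from fitted models. This is a sketch on a simulated series `y` (a hypothetical stand-in, not the kings data). The AIC comparison at the end is an extra check that the article itself does not use.

```r
set.seed(7)

# Hypothetical series whose first differences follow an MA(1) process
y <- cumsum(arima.sim(model = list(ma = -0.7), n = 300))

fit_ar3 <- arima(y, order = c(3, 1, 0))  # ARMA(3,0) on the first differences
fit_ma1 <- arima(y, order = c(0, 1, 1))  # ARMA(0,1) on the first differences

n_ar3 <- length(coef(fit_ar3))  # ar1, ar2, ar3
n_ma1 <- length(coef(fit_ma1))  # ma1

c(AR3 = n_ar3, MA1 = n_ma1)

# Extra check, not part of the article's argument:
# a lower AIC suggests a better trade-off between fit and complexity
c(AR3 = AIC(fit_ar3), MA1 = AIC(fit_ma1))
```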
An ARMA(0,1) model is a moving average model of order 1, or MA(1) model. This model can be written as:
X(t) - μ = Z(t) - θ * Z(t-1)
- X(t) is the stationary time series.
- μ is the mean of time series X(t).
- Z(t) is white noise with mean zero and constant variance.
- θ is a parameter that can be estimated.
An MA (moving average) model is usually used to model a time series that shows short-term dependencies between successive observations.
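To see what "short-term dependencies" means here, the sketch below builds an MA(1) series directly from the equation above, using made-up values for θ and μ, and checks that only the lag-1 autocorrelation is clearly non-zero. For an MA(1) process the theoretical lag-1 autocorrelation is -θ / (1 + θ^2).

```r
set.seed(123)

theta <- 0.7   # hypothetical MA parameter
mu    <- 55    # hypothetical series mean
n     <- 5000

z <- rnorm(n + 1)                       # white noise Z(0), ..., Z(n)
x <- mu + z[-1] - theta * z[-(n + 1)]   # X(t) = mu + Z(t) - theta * Z(t-1)

# Sample autocorrelations at lags 1, 2 and 3 (lag 0 dropped)
r <- as.numeric(acf(x, lag.max = 3, plot = FALSE)$acf)[-1]

theory_lag1 <- -theta / (1 + theta^2)   # lag-1 autocorrelation of an MA(1) process

round(r, 3)             # lag 1 is clearly negative, lags 2 and 3 are near zero
round(theory_lag1, 3)   # -0.47
```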
We can also select an appropriate ARIMA model directly in R by using the auto.arima() function from the “forecast” package.
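A minimal sketch of that automatic route is shown below. It assumes the “forecast” package is installed, and skips the fit if it is not. The series `y` is simulated and only a hypothetical stand-in for the kings data, so the model chosen here says nothing about the kings series.

```r
set.seed(7)

# Hypothetical series whose first differences follow an MA(1) process
y <- ts(cumsum(arima.sim(model = list(ma = -0.7), n = 300)))

fit_auto <- NULL
if (requireNamespace("forecast", quietly = TRUE)) {
  # auto.arima() searches over p, d and q and returns the selected model
  fit_auto <- forecast::auto.arima(y)
  print(fit_auto)                        # chosen model and its coefficients
  print(forecast::arimaorder(fit_auto))  # the selected (p, d, q)
}
```

It is still worth comparing the automatic choice with what the correlogram and partial correlogram suggest, since the two do not always agree.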
Step 3: Forecasting using ARIMA model
Once you have selected the best candidate ARIMA(p,d,q) model for your time series data, you can estimate the
parameters of that ARIMA model, and use that as a predictive model for making forecasts for future values of
your time series.
Let's start forecasting. We discussed above that an ARIMA(0,1,1) model seems appropriate for the given time series. You can specify the values of p, d and q in the ARIMA model by using the “order” argument of the “arima()” function in R. To fit an ARIMA(0,1,1) model to this time series, we type:
kingstimeseriesarima <- arima(kingstimeseries, order=c(0,1,1))
kingstimeseriesarima
Here is the R console output
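An ARMA(0,1) model can be written X(t) - μ = Z(t) - θ * Z(t-1). From the output of the “arima()” R function, the estimated moving average coefficient (given as ‘ma1’ in the R output) is -0.7218 in the case of the ARIMA(0,1,1) model fitted to the time series. Note that “arima()” writes the moving average term with a plus sign, X(t) - μ = Z(t) + ma1 * Z(t-1), so in the form used here this corresponds to θ = 0.7218.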
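The relationship between θ and the ‘ma1’ coefficient can be checked on simulated data. In this sketch the first differences are generated from Z(t) - θ * Z(t-1) with a made-up θ of 0.7, and “arima()” is then asked to recover the coefficient. R puts a plus sign in front of its moving average coefficients, so the estimate should come out close to -θ.

```r
set.seed(2024)

theta <- 0.7    # hypothetical MA parameter
n     <- 2000

z <- rnorm(n + 1)
d <- z[-1] - theta * z[-(n + 1)]   # first differences: Z(t) - theta * Z(t-1)
y <- cumsum(d)                     # integrate once, so y is ARIMA(0,1,1)

fit <- arima(y, order = c(0, 1, 1))
ma1 <- unname(coef(fit)["ma1"])

ma1    # close to -0.7: R's coefficient carries the minus sign
-ma1   # close to +0.7: the value of theta in the minus-sign form
```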
We can then use the ARIMA model to make forecasts for future values of the time series, using the “forecast()” function in the “forecast” R package (older versions of the package also allowed calling “forecast.Arima()” directly). For example, to forecast the ages at death of the next five kings, we type:
> library("forecast") # load the "forecast" R library
> kingstimeseriesforecasts <- forecast(kingstimeseriesarima, h=5)
> kingstimeseriesforecasts
Here is the R console output
The original time series includes the ages at death of 42 kings. The forecast gives us the predicted age at death of the next five kings (kings 43-47), as well as 80% and 95% prediction intervals for those predictions. The age at death of the 42nd English king was 56 years, and the ARIMA model gives the forecast age at death of each of the next five kings as 67.75 years.
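The same kind of table can be built with base R alone, which also shows why all five point forecasts are identical: an ARIMA(0,1,1) model without drift gives a flat forecast, and only the prediction intervals widen with the horizon. The sketch below uses “predict()” on a simulated series (a hypothetical stand-in, not the kings data) and builds the intervals from normal quantiles. The column names are made up for this example.

```r
set.seed(99)

# Hypothetical ARIMA(0,1,1)-type series (not the kings data)
y   <- ts(50 + cumsum(arima.sim(model = list(ma = -0.7), n = 200)))
fit <- arima(y, order = c(0, 1, 1))

fc   <- predict(fit, n.ahead = 5)   # point forecasts and their standard errors
pred <- as.numeric(fc$pred)
se   <- as.numeric(fc$se)

# 80% and 95% prediction intervals from normal quantiles
intervals <- data.frame(
  forecast = pred,
  lo80 = pred - qnorm(0.90)  * se,
  hi80 = pred + qnorm(0.90)  * se,
  lo95 = pred - qnorm(0.975) * se,
  hi95 = pred + qnorm(0.975) * se
)

round(intervals, 2)   # same point forecast in every row, widening intervals
```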
We can plot the observed ages of death for the first 42 kings, as well as the ages that would be predicted for these
42 kings and for the next 5 kings using our ARIMA(0,1,1) model.
Here is the plot of the predicted ages at death of the next five kings
As in the case of exponential smoothing models, it is a good idea to investigate whether the forecast errors of an
ARIMA model are normally distributed with mean zero and constant variance, and whether there are correlations between successive forecast errors.
For example, we can make a correlogram of the forecast errors for our ARIMA(0,1,1) model for the ages at death
of kings, and perform the Ljung-Box test for lags 1-20, by typing:
> acf(kingstimeseriesforecasts$residuals, lag.max=20)
> Box.test(kingstimeseriesforecasts$residuals, lag=20, type="Ljung-Box")
Here is the R console output and plot
Since the correlogram shows that none of the sample autocorrelations for lags 1-20 exceed the significance bounds, and the p-value for the Ljung-Box test is 0.9, we can conclude that there is very little evidence for non-zero
autocorrelations in the forecast errors at lags 1-20.
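The same checks can be run on any fitted “arima()” model without the “forecast” package. Below is a small sketch on a simulated series (a hypothetical stand-in, not the kings data), so the numbers it prints are not the ones quoted above.

```r
set.seed(5)

# Hypothetical series whose first differences follow an MA(1) process
y   <- cumsum(arima.sim(model = list(ma = -0.7), n = 200))
fit <- arima(y, order = c(0, 1, 1))

res <- residuals(fit)   # in-sample one-step forecast errors

# Ljung-Box test for autocorrelation in the errors at lags 1-20
lb <- Box.test(res, lag = 20, type = "Ljung-Box")
lb$p.value   # a large p-value means little evidence of autocorrelation

# Rough checks on the other assumptions: mean near zero, stable spread
mean(res)
sd(res[1:100])
sd(res[101:200])
```

A small p-value here would suggest that the model has left some structure in the errors, and that a different choice of p or q might do better.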
That completes the time series tutorial series. If you have any doubts, please ask in the comments or shoot me an email @ firstname.lastname@example.org.