Hi guys!! This article on ARIMA is part of our time series tutorial series. In the last few tutorials we have learned a lot about time series data: reading, plotting, decomposing and forecasting it. If you have read the previous articles, you already have some background on time series. So in this tutorial we will learn about ARIMA models.

Why do we need ARIMA models? Let me explain. In the last article, we learned forecasting using the simple exponential smoothing (SES) method. SES is useful for making forecasts, but it does not make any assumptions about correlations between successive values of the time series. Moreover, if you want to make prediction intervals for forecasts made using exponential smoothing, the prediction intervals require that the forecast errors are uncorrelated and normally distributed with mean zero and constant variance.

While exponential smoothing methods do not make any assumptions about correlations between successive values of the time series, in some cases you can make a better predictive model by taking those correlations into account.

**Autoregressive Integrated Moving Average (ARIMA)** models include an explicit statistical model for the irregular component of a time series, one that allows for non-zero autocorrelations in the irregular component.

Steps for working with ARIMA models:

- Differencing a time series.
- Selecting a candidate ARIMA model.
- Forecasting using an ARIMA model.

**Step 1: Differencing a time series**

ARIMA models are defined for **stationary time series**. If you are starting off with a **non-stationary time series**, you will therefore need to difference it until you obtain a **stationary time series**. If you have to difference the time series d times to obtain a stationary series, then you have an **ARIMA(p,d,q)** model, where **d is the order of differencing**. Now you must be curious, like me, about the other parameters: p is the order of the autoregressive (AR) part of the model and q is the order of the moving average (MA) part.
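As a quick illustration (a self-contained sketch using simulated data, not the kings series from the article), differencing a random walk once is enough to make it stationary, so d = 1:

```r
# Simulated example: a random walk is non-stationary in mean,
# but its first differences are just white noise.
set.seed(42)
rw <- cumsum(rnorm(200))               # random walk: non-stationary
rw_diff1 <- diff(rw, differences = 1)  # first differences: stationary
length(rw_diff1)                       # one observation shorter than rw
```

Here one difference suffices, so you would fit an ARIMA(p,1,q); if the differenced series still looked non-stationary, you would difference again and use d = 2.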

You can difference a time series using the “diff()” function in R.

Let's take the example of the ages at death of the kings of England, which we used in "How to do time series analysis using R, part 1".

Let's see whether this data is stationary or not.

The graph above shows that the series is not stationary in mean. To calculate the time series of first differences:

kingstimeseriesdiff1 <- diff(kingstimeseries, differences = 1)
plot.ts(kingstimeseriesdiff1)

Here is the plot

The time series of first differences appears to be stationary in mean and variance, and so an **ARIMA(p,1,q)** model is probably appropriate for the time series of the ages at death of the kings. By taking first differences, we have removed the **trend component** of the time series; the other components remain the same. We can now examine whether there are correlations between successive terms of this irregular component; if so, this could help us make a predictive model for the ages at death of the kings.

With this we have finished the first part: checking whether our time series data is stationary or not. The next part is selecting an ARIMA model.

**Step 2: Selecting a candidate ARIMA model**

Selecting an ARIMA model means finding the appropriate values of **p** and **q** for ARIMA(p,d,q).

Let's find out the autocorrelations and partial autocorrelations between successive values of the time series data.

To plot the **autocorrelation** and **partial autocorrelation**, we can use the “**acf()**” and “**pacf()**” functions in R, respectively. To get the actual values of the autocorrelations and partial autocorrelations, we set “plot=FALSE” in the “acf()” and “pacf()” functions.

To plot the correlogram for lags 1-20 of the once-**differenced** time series of the ages at death of the kings, and to get the values of the autocorrelations, we type:

acf(kingstimeseriesdiff1, lag.max=20)              # plot correlogram
acf(kingstimeseriesdiff1, lag.max=20, plot=FALSE)  # get the autocorrelation values

Here is the R console output and correlogram plot.

We see from the correlogram that the autocorrelation at lag 1 (-0.360) exceeds the significance bounds, but all other autocorrelations between lags 1-20 do not exceed the significance bounds.

To plot the partial correlogram for lags 1-20 of the once-differenced time series of the ages at death of the English kings, and get the values of the partial autocorrelations, we use the “pacf()” function:

pacf(kingstimeseriesdiff1, lag.max=20)              # plot partial correlogram
pacf(kingstimeseriesdiff1, lag.max=20, plot=FALSE)  # get the partial autocorrelation values

Here is the R console output and plot

We can see from the plot that the partial autocorrelations at lags 1, 2 and 3 exceed the significance bounds, are negative, and slowly increase towards zero with increasing lag (lag 1: -0.360, lag 2: -0.335, lag 3: -0.321). The partial autocorrelations tail off to zero after lag 3.

Since the correlogram is zero after lag 1, and the partial correlogram tails off to zero after lag 3, the following ARMA (autoregressive moving average) models are possible for the time series of first differences:

- An ARMA(3,0) model, that is, an autoregressive model of order p=3, since the partial autocorrelogram is zero after lag 3, and the autocorrelogram tails off to zero.
- An ARMA(0,1) model, that is, a moving average model of order q=1, since the autocorrelogram is zero after lag 1 and the partial autocorrelogram tails off to zero.
- An ARMA(p,q) model, that is, a mixed model with p and q greater than 0, since the autocorrelogram and partial correlogram tail off to zero.

We use the principle of parsimony to decide which model is best: that is, we assume that the model with the fewest parameters is best. The ARMA(3,0) model has 3 parameters, the ARMA(0,1) model has 1 parameter, and the ARMA(p,q) model has at least 2 parameters. Therefore, the ARMA(0,1) model is taken as the best model.
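The parsimony choice can also be cross-checked numerically with an information criterion such as AIC. This is a hedged sketch: it fits both candidate models to a simulated ARIMA(0,1,1) series so that it runs on its own; in the article you would fit them to kingstimeseries instead.

```r
# Fit both candidate models and compare their AIC values
# (lower AIC is better, and AIC penalises extra parameters).
set.seed(7)
y <- arima.sim(model = list(order = c(0, 1, 1), ma = -0.7), n = 100)
fit_ar3 <- arima(y, order = c(3, 1, 0))  # ARMA(3,0) on the first differences
fit_ma1 <- arima(y, order = c(0, 1, 1))  # ARMA(0,1) on the first differences
AIC(fit_ar3)
AIC(fit_ma1)
```

If both criteria point the same way as parsimony, that strengthens the case for the simpler model.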

An ARMA(0,1) model is a moving average model of order 1, or MA(1) model. This model can be written as:

X(t) − μ = Z(t) − θ·Z(t−1)

where

- **X(t)** is the stationary time series.
- **μ** is the mean of the time series **X(t)**.
- **Z(t)** is white noise with mean zero and constant variance.
- **θ** is a parameter that can be estimated.
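To see the MA(1) signature described above in action, here is a small sketch with simulated data (note that R's arima.sim() writes the MA part with a plus sign, X(t) = Z(t) + θ·Z(t−1)):

```r
# Simulate a long MA(1) series and inspect its autocorrelations:
# the lag-1 autocorrelation is clearly non-zero, the rest are near zero.
set.seed(1)
ma1 <- arima.sim(model = list(ma = -0.7), n = 1000)
round(acf(ma1, lag.max = 5, plot = FALSE)$acf[2:6], 2)
```

For an MA(1) process the theoretical lag-1 autocorrelation is θ/(1+θ²), about -0.47 for θ = -0.7, while all higher-lag autocorrelations are exactly zero. This is why an ACF that cuts off after lag 1 suggests q = 1.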

An MA (moving average) model is usually used to model a time series that shows short-term dependencies between successive observations.

**We can directly find an appropriate ARIMA model in R by using the auto.arima() function from the "forecast" package.**
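For example (a sketch, assuming the "forecast" package is installed; a simulated series is used here so the snippet runs on its own, but on the kings data you would simply call auto.arima(kingstimeseries)):

```r
library(forecast)

# auto.arima() searches over values of p, d and q and returns
# the model with the best AICc.
set.seed(3)
y <- arima.sim(model = list(order = c(0, 1, 1), ma = -0.7), n = 200)
fit <- auto.arima(y)
fit  # prints the selected ARIMA(p,d,q) and its estimated coefficients
```

This automates the manual ACF/PACF inspection we did above, though it is still good practice to check the chosen model's residuals yourself.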

**Step 3: Forecasting using an ARIMA model**

Once you have selected the best candidate ARIMA(p,d,q) model for your time series data, you can estimate the parameters of that ARIMA model, and use it as a predictive model for making forecasts of future values of your time series.

Let's start forecasting. We discussed above that an ARIMA(0,1,1) model seems an appropriate model for the given time series. You can specify the values of p, d and q in the ARIMA model by using the “order” argument of the “arima()” function in R. To fit an ARIMA(p,d,q) model to this time series:

kingstimeseriesarima <- arima(kingstimeseries, order=c(0,1,1))
kingstimeseriesarima

Here is the R console output

An ARMA(0,1) model can be written X(t) − μ = Z(t) − θ·Z(t−1). From the output of the “arima()” R function, the estimated value of θ (given as ‘ma1’ in the R output) is -0.7218 for the ARIMA(0,1,1) model fitted to this time series.

We can then use the ARIMA model to make forecasts for future values of the time series, using the “forecast()” function in the “forecast” R package (called “forecast.Arima()” in older versions of the package). For example, to forecast the ages at death of the next five English kings:

library("forecast")  # load the "forecast" R package
kingstimeseriesforecasts <- forecast(kingstimeseriesarima, h=5)  # forecast.Arima() in older versions
kingstimeseriesforecasts

Here is the r console output

The original time series includes the ages at death of 42 kings. The forecast function gives us forecasts of the ages at death of the next five kings (kings 43-47), as well as 80% and 95% prediction intervals for those predictions. The age at death of the 42nd English king was 56 years, and the ARIMA model gives the forecasted age at death of each of the next five kings as 67.75 years.

We can plot the observed ages at death of the first 42 kings, as well as the ages that would be predicted for these 42 kings and for the next 5 kings using our ARIMA(0,1,1) model:

plot(kingstimeseriesforecasts)  # plot.forecast() in older versions of the "forecast" package

The plot shows the predicted ages at death of the next five kings.

As in the case of exponential smoothing models, it is a good idea to investigate whether the forecast errors of an ARIMA model are normally distributed with mean zero and constant variance, and whether there are correlations between successive forecast errors.

For example, we can make a correlogram of the forecast errors for our ARIMA(0,1,1) model for the ages at death of the kings, and perform the Ljung-Box test for lags 1-20, by typing:

acf(kingstimeseriesforecasts$residuals, lag.max=20)
Box.test(kingstimeseriesforecasts$residuals, lag=20, type="Ljung-Box")

Here is the R console output and plot.

Since the correlogram shows that none of the sample autocorrelations for lags 1-20 exceed the significance bounds, and the p-value for the Ljung-Box test is 0.9, we can conclude that there is very little evidence for non-zero autocorrelations in the forecast errors at lags 1-20.
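Besides the correlogram and the Ljung-Box test, you can check the other assumption: that the forecast errors are normally distributed with mean zero and constant variance. A sketch (a simulated fit is substituted for the kings model so it runs on its own; with the article's data you would use residuals(kingstimeseriesarima)):

```r
# Fit an ARIMA(0,1,1) to simulated data and examine its residuals.
set.seed(9)
y <- arima.sim(model = list(order = c(0, 1, 1), ma = -0.7), n = 100)
fit <- arima(y, order = c(0, 1, 1))
errors <- residuals(fit)
mean(errors)  # should be close to zero
# Histogram of the errors with an overlaid normal curve:
hist(errors, freq = FALSE, main = "Forecast errors")
curve(dnorm(x, mean = 0, sd = sd(errors)), add = TRUE)
```

If the histogram is roughly bell-shaped and centred on zero, the normality assumption behind the prediction intervals looks reasonable.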

This completes the time series tutorial series. If you have any doubts, please ask in the comments or shoot me an email at irrfankhann29@gmail.com.
