Model Building For ARIMA time series


Model building for ARIMA time series consists of three steps:
1. Identification
2. Estimation
3. Diagnostic checking

ARIMA model building. Identification: determination of p, d and q

To identify an ARIMA(p,d,q) model we make extensive use of the autocorrelation function $\{\rho_h : -\infty < h < \infty\}$ and the partial autocorrelation function $\{\Phi_{kk} : 0 \le k < \infty\}$.

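The sample autocovariance function $\{C_x(h) : -\infty < h < \infty\}$ and the sample autocorrelation function $\{r_h : -\infty < h < \infty\}$ are defined by:

$$C_x(h) = \frac{1}{T}\sum_{t=1}^{T-|h|}(x_t - \bar{x})(x_{t+|h|} - \bar{x}), \qquad r_h = \frac{C_x(h)}{C_x(0)}$$

The divisor is T; some statisticians use T - h. (If T is large, both give approximately the same results.)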
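As a minimal sketch of these definitions (divisor T rather than T - h), the sample autocovariance and autocorrelation can be computed in plain Python. The function names are illustrative choices, not part of the original slides.

```python
def sample_autocovariance(x, h):
    """C_x(h) for a list of observations x, using the divisor T (not T - h)."""
    T = len(x)
    h = abs(h)
    mean = sum(x) / T
    return sum((x[t] - mean) * (x[t + h] - mean) for t in range(T - h)) / T


def sample_autocorrelation(x, h):
    """r_h = C_x(h) / C_x(0)."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)


# Small illustrative series (made up for the example, not data from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r = [sample_autocorrelation(x, h) for h in range(3)]
print(r)
```

Using the divisor T keeps the estimated autocovariance sequence non-negative definite, which is one common reason for preferring it to T - h.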

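It can be shown that, assuming $\rho_k = 0$ for $k > q$ (Bartlett's approximation):

$$\operatorname{Var}(r_h) \approx \frac{1}{T}\left(1 + 2\sum_{k=1}^{q}\rho_k^2\right) \quad \text{for } h > q$$

Thus the standard error of $r_h$ is estimated by

$$s_{r_h} = \sqrt{\frac{1}{T}\left(1 + 2\sum_{k=1}^{q} r_k^2\right)}$$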
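The formula on this slide did not survive extraction. The sketch below assumes it is Bartlett's approximation, Var(r_h) ≈ (1/T)(1 + 2 Σ_{k≤q} ρ_k²) for h > q, with the sample values r_k substituted for ρ_k. The function name and the input numbers are illustrative.

```python
import math


def bartlett_standard_error(r, q, T):
    """Approximate standard error of r_h for h > q.

    r : list [r_1, r_2, ...] of sample autocorrelations
    q : lag beyond which the true autocorrelations are assumed to be zero
    T : series length
    """
    return math.sqrt((1.0 + 2.0 * sum(v * v for v in r[:q])) / T)


# Illustrative values only (not taken from the slides).
se0 = bartlett_standard_error([0.5, 0.1], 0, 100)  # white-noise hypothesis
se1 = bartlett_standard_error([0.5, 0.1], 1, 100)  # MA(1) hypothesis
print(se0, se1)
```

A sample autocorrelation beyond lag q that lies well outside about two of these standard errors is evidence against an MA(q) cut-off.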

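The sample partial autocorrelation function is defined by:

$$\hat{\Phi}_{kk} = \frac{\begin{vmatrix} 1 & r_1 & \cdots & r_{k-2} & r_1 \\ r_1 & 1 & \cdots & r_{k-3} & r_2 \\ \vdots & \vdots & & \vdots & \vdots \\ r_{k-1} & r_{k-2} & \cdots & r_1 & r_k \end{vmatrix}}{\begin{vmatrix} 1 & r_1 & \cdots & r_{k-1} \\ r_1 & 1 & \cdots & r_{k-2} \\ \vdots & \vdots & & \vdots \\ r_{k-1} & r_{k-2} & \cdots & 1 \end{vmatrix}}$$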

It can be shown that, for an AR(p) time series and $k > p$:

$$\operatorname{Var}(\hat{\Phi}_{kk}) \approx \frac{1}{T}$$
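In practice the partial autocorrelations are rarely computed from determinants. A minimal sketch, assuming the Durbin-Levinson recursion is used instead (it yields the same values as solving the order-k Yule-Walker equations for each k), is given below. The function name is illustrative.

```python
def pacf_from_acf(r, max_lag):
    """Partial autocorrelations phi_11, ..., phi_KK from autocorrelations.

    r : list with r[0] == 1 and r[h] the autocorrelation at lag h
    Uses the Durbin-Levinson recursion.
    """
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) series with coefficient 0.5: rho_h = 0.5 ** h.
acf_ar1 = [0.5 ** h for h in range(4)]
print(pacf_from_acf(acf_ar1, 3))
```

For an AR(1) autocorrelation function the partial autocorrelation is non-zero at lag 1 only, which is the cut-off pattern used for identification.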

Identification of an ARIMA process: determining the values of p, d and q

Recall that if a process is non-stationary, one of the roots of the autoregressive operator is equal to one. This causes the limiting value of the autocorrelation function to be non-zero. Thus a non-stationary process is identified by an autocorrelation function that neither tails away to zero quickly nor cuts off after a finite number of steps.

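To determine the value of d

Note: the autocorrelation function of a stationary ARMA time series satisfies the following difference equation (for h > q):

$$\rho_h = \beta_1\rho_{h-1} + \beta_2\rho_{h-2} + \cdots + \beta_p\rho_{h-p}$$

The solution to this equation has the general form (when the roots are distinct)

$$\rho_h = c_1\left(\frac{1}{r_1}\right)^h + c_2\left(\frac{1}{r_2}\right)^h + \cdots + c_p\left(\frac{1}{r_p}\right)^h$$

where $r_1, r_2, \ldots, r_p$ are the roots of the polynomial

$$\beta(x) = 1 - \beta_1 x - \beta_2 x^2 - \cdots - \beta_p x^p$$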

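For a stationary ARMA time series the roots $r_1, r_2, \ldots, r_p$ have absolute value greater than 1. Therefore

$$\rho_h \to 0 \quad \text{as } h \to \infty$$

If the ARMA time series is non-stationary, some of the roots $r_1, r_2, \ldots, r_p$ have absolute value equal to 1, and

$$\rho_h \not\to 0 \quad \text{as } h \to \infty$$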

[Figure: autocorrelation functions of a stationary and a non-stationary time series]

If the process is non-stationary then first differences of the series are computed to determine if that operation results in a stationary series. The process is continued until a stationary time series is found. This then determines the value of d.
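The differencing procedure can be sketched as follows. This is only an illustration on a simulated random walk (not one of the data sets from the slides): the series is differenced d = 0, 1, 2 times and the first few sample autocorrelations are printed so that their decay can be inspected. The helper names are illustrative.

```python
import random


def difference(x, d=1):
    """Apply the first-difference operator d times."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (divisor T)."""
    T = len(x)
    mean = sum(x) / T
    c0 = sum((v - mean) ** 2 for v in x) / T
    return [
        sum((x[t] - mean) * (x[t + h] - mean) for t in range(T - h)) / T / c0
        for h in range(max_lag + 1)
    ]


# Simulated random walk (an assumption for illustration): non-stationary, d = 1.
rng = random.Random(0)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

for d in range(3):
    r = acf(difference(walk, d), 5)
    print("d =", d, [round(v, 2) for v in r[1:]])
```

The smallest d for which the printed autocorrelations die out quickly is the value to use. Differencing more than necessary tends to introduce a spurious negative autocorrelation at lag 1.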

Identification: determination of the values of p and q.

To determine the value of p and q we use the graphical properties of the autocorrelation function and the partial autocorrelation function. Again recall the following:

More specifically, some typical patterns of the autocorrelation function and the partial autocorrelation function for some important ARMA series are as follows:

Patterns of the ACF and PACF of AR(2) Time Series. In the shaded region the roots of the AR operator are complex.

Patterns of the ACF and PACF of MA(2) Time Series. In the shaded region the roots of the MA operator are complex.

Patterns of the ACF and PACF of ARMA(1,1) Time Series. Note: The patterns exhibited by the ACF and the PACF give important and useful information about the values of the parameters of the time series.

Summary: To determine p and q, use the following table.

        MA(q)               AR(p)               ARMA(p,q)
ACF     Cuts off after q    Tails off           Tails off
PACF    Tails off           Cuts off after p    Tails off

Note: Usually p + q ≤ 4. There is no harm in over-identifying the time series (allowing more parameters in the model than necessary); we can always test whether the extra parameters are zero.

Examples

The data

Possible identifications:
d = 0, p = 1, q = 1
d = 1, p = 0, q = 1

ACF and PACF for $x_t$, $\Delta x_t$ and $\Delta^2 x_t$ (Sunspot Data)

Possible identification: d = 0, p = 2, q = 0

ACF and PACF for $x_t$, $\Delta x_t$ and $\Delta^2 x_t$ (IBM Stock Price Data)

Possible identification: d = 1, p = 0, q = 0

ARIMA model building. Identification: determination of p, d and q (summary)

Determining d

Look at the autocorrelation function $\rho_h$.
If $\{x_t\}$ is stationary then $\rho_h \to 0$ as $h \to \infty$.
If $\{x_t\}$ is non-stationary then $\rho_h$ does not tail off to zero.
If $\{x_t\}$ is non-stationary then look at $\{\Delta x_t\}$, $\{\Delta^2 x_t\}$, … until you find a stationary time series. This determines the value of d.

Determining the value of p and q

Use the following table:
MA(q) time series: $\rho_h$ cuts off to zero after lag q.
AR(p) time series: $\Phi_{kk}$ cuts off to zero after lag p.
ARMA(p,q) time series: $\rho_h$ and $\Phi_{kk}$ both tail off to zero.

Estimation of ARIMA parameters

Use both:
1. Method of Moments
2. Maximum Likelihood

Maximum Likelihood results in the most efficient estimators.

Preliminary estimation using the Method of Moments: equate sample statistics to population parameters.

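Estimation of parameters of an MA(q) series

The theoretical autocorrelation function in terms of the parameters of an MA(q) process is given by

$$\rho_h = \begin{cases} \dfrac{\alpha_h + \alpha_1\alpha_{h+1} + \cdots + \alpha_{q-h}\alpha_q}{1 + \alpha_1^2 + \alpha_2^2 + \cdots + \alpha_q^2} & 1 \le h \le q \\ 0 & h > q \end{cases}$$

To estimate $\alpha_1, \alpha_2, \ldots, \alpha_q$ we solve the system of equations:

$$r_h = \frac{\hat\alpha_h + \hat\alpha_1\hat\alpha_{h+1} + \cdots + \hat\alpha_{q-h}\hat\alpha_q}{1 + \hat\alpha_1^2 + \hat\alpha_2^2 + \cdots + \hat\alpha_q^2}, \qquad h = 1, 2, \ldots, q$$

This set of equations is non-linear and generally very difficult to solve. For q = 1 the equation becomes:

$$r_1 = \frac{\hat\alpha_1}{1 + \hat\alpha_1^2}$$

Thus

$$r_1\hat\alpha_1^2 - \hat\alpha_1 + r_1 = 0 \quad \text{or} \quad \hat\alpha_1^2 - \frac{1}{r_1}\hat\alpha_1 + 1 = 0$$

This equation has the two solutions

$$\hat\alpha_1 = \frac{1 \pm \sqrt{1 - 4r_1^2}}{2r_1}$$

The product of the two solutions is 1, so (when $|r_1| < 0.5$) exactly one solution results in the MA(1) time series being invertible.

For q = 2 the equations become:

$$r_1 = \frac{\hat\alpha_1 + \hat\alpha_1\hat\alpha_2}{1 + \hat\alpha_1^2 + \hat\alpha_2^2}, \qquad r_2 = \frac{\hat\alpha_2}{1 + \hat\alpha_1^2 + \hat\alpha_2^2}$$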

Estimation of parameters of an ARMA(p,q) series

We use a similar technique. Namely: obtain an expression for $\rho_h$ in terms of $\beta_1, \beta_2, \ldots, \beta_p$; $\alpha_1, \alpha_2, \ldots, \alpha_q$ and set up p + q equations for the estimates of $\beta_1, \beta_2, \ldots, \beta_p$; $\alpha_1, \alpha_2, \ldots, \alpha_q$ by replacing $\rho_h$ by $r_h$.

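Estimation of parameters of an ARMA(p,q) series

Example: The ARMA(1,1) process

The expressions for $\rho_1$ and $\rho_2$ in terms of $\beta_1$ and $\alpha_1$ are:

$$\rho_1 = \frac{(1 + \alpha_1\beta_1)(\alpha_1 + \beta_1)}{1 + \alpha_1^2 + 2\alpha_1\beta_1}, \qquad \rho_2 = \beta_1\rho_1$$

Further

$$\sigma_x^2 = \operatorname{Var}(x_t) = \sigma^2\,\frac{1 + \alpha_1^2 + 2\alpha_1\beta_1}{1 - \beta_1^2}$$

Thus the expressions for the estimates of $\beta_1$, $\alpha_1$ and $\sigma^2$ are:

$$\hat\beta_1 = \frac{r_2}{r_1}, \qquad r_1 = \frac{(1 + \hat\alpha_1\hat\beta_1)(\hat\alpha_1 + \hat\beta_1)}{1 + \hat\alpha_1^2 + 2\hat\alpha_1\hat\beta_1}, \qquad \hat\sigma^2 = C_x(0)\,\frac{1 - \hat\beta_1^2}{1 + \hat\alpha_1^2 + 2\hat\alpha_1\hat\beta_1}$$

Hence

$$r_1\left(1 + \hat\alpha_1^2 + 2\hat\alpha_1\hat\beta_1\right) = (1 + \hat\alpha_1\hat\beta_1)(\hat\alpha_1 + \hat\beta_1)$$

or

$$(\hat\beta_1 - r_1)\hat\alpha_1^2 + (1 + \hat\beta_1^2 - 2r_1\hat\beta_1)\hat\alpha_1 + (\hat\beta_1 - r_1) = 0$$

This is a quadratic equation in $\hat\alpha_1$ which can be solved.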

Example (Chemical Concentration Data): the time series was identified as either an ARIMA(1,0,1) time series or an ARIMA(0,1,1) series. If we use the first identification then the series $x_t$ is an ARMA(1,1) series.

Identifying the series $x_t$ as an ARMA(1,1) series: the autocorrelation at lag 1 is $r_1 = 0.570$ and the autocorrelation at lag 2 is $r_2 = 0.495$. Thus the estimate of $\beta_1$ is 0.495/0.570 = 0.87. Also the quadratic equation becomes, after dividing through by the leading coefficient, approximately $\hat\alpha_1^2 + 2.56\,\hat\alpha_1 + 1 = 0$, which has the two solutions -0.48 and -2.08. Again we select as our estimate of $\alpha_1$ the solution -0.48, resulting in an invertible estimated series.
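The arithmetic of this example can be checked with a short sketch. It assumes the model convention used in these slides (x_t = β1 x_{t-1} + δ + u_t + α1 u_{t-1}), under which the moment equations reduce to β1 = r2/r1 and a quadratic in α1 whose two roots multiply to 1. The function name is illustrative.

```python
import math


def arma11_moment_estimates(r1, r2):
    """Method-of-moments estimates for an ARMA(1,1) series.

    Returns (beta_hat, invertible_alpha, non_invertible_alpha).
    Solves alpha^2 + b * alpha + 1 = 0 with
    b = (1 + beta^2 - 2 * r1 * beta) / (beta - r1).
    """
    beta = r2 / r1
    b = (1.0 + beta ** 2 - 2.0 * r1 * beta) / (beta - r1)
    disc = b * b - 4.0
    if disc < 0:
        raise ValueError("no real solution for alpha")
    root = math.sqrt(disc)
    small, large = sorted([(-b + root) / 2.0, (-b - root) / 2.0], key=abs)
    return beta, small, large


# Sample autocorrelations quoted on the slide for the chemical concentration data.
beta_hat, alpha_inv, alpha_non = arma11_moment_estimates(0.570, 0.495)
print(round(beta_hat, 2), round(alpha_inv, 2), round(alpha_non, 2))
```

The root with absolute value below 1 is returned first because it is the one that gives an invertible series.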

Since $\delta = \mu(1 - \beta_1)$, the estimate of $\delta$ can be computed as follows:

$$\hat\delta = \bar{x}(1 - \hat\beta_1) = 2.25$$

Thus the identified model in this case is
$x_t = 0.87\,x_{t-1} + u_t - 0.48\,u_{t-1} + 2.25$

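If we use the second identification then the series $\Delta x_t = x_t - x_{t-1}$ is an MA(1) series. Thus the estimate of $\alpha_1$ is:

$$\hat\alpha_1 = \frac{1 \pm \sqrt{1 - 4r_1^2}}{2r_1}$$

The value of $r_1$ (for the differenced series) is -0.413. Thus the estimate of $\alpha_1$ is:

$$\hat\alpha_1 = \frac{1 \pm \sqrt{1 - 4(-0.413)^2}}{2(-0.413)} = -0.53 \text{ or } -1.89$$

The estimate $\hat\alpha_1 = -0.53$ corresponds to an invertible time series. This is the solution that we will choose.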
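A minimal sketch of the MA(1) moment estimate, assuming the convention x_t = u_t + α1 u_{t-1} so that ρ1 = α1 / (1 + α1²). The function name is illustrative; the value r1 = -0.413 is the one quoted for the differenced series.

```python
import math


def ma1_moment_estimates(r1):
    """Solve r1 = alpha / (1 + alpha^2) for alpha.

    Returns (invertible_root, non_invertible_root).
    Real, distinct roots exist only when 0 < |r1| < 0.5.
    """
    if not 0.0 < abs(r1) < 0.5:
        raise ValueError("need 0 < |r1| < 0.5 for two distinct real roots")
    root = math.sqrt(1.0 - 4.0 * r1 * r1)
    invertible = (1.0 - root) / (2.0 * r1)      # |alpha| < 1
    non_invertible = (1.0 + root) / (2.0 * r1)  # |alpha| > 1
    return invertible, non_invertible


alpha_inv, alpha_non = ma1_moment_estimates(-0.413)
print(round(alpha_inv, 2), round(alpha_non, 2))
```

The two roots are reciprocals of each other, so both reproduce the same lag-1 autocorrelation; only the invertible one is kept.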

The estimate of the parameter $\mu$ is the sample mean. Thus the identified model in this case is:
$\Delta x_t = u_t - 0.53\,u_{t-1} + 0.002$ or $x_t = x_{t-1} + u_t - 0.53\,u_{t-1} + 0.002$ (an ARIMA(0,1,1) model)
This compares with the other identification:
$x_t = 0.87\,x_{t-1} + u_t - 0.48\,u_{t-1} + 2.25$ (an ARIMA(1,0,1) model)

Preliminary Estimation of the Parameters of an AR(p) Process

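The regression coefficients $\beta_1, \beta_2, \ldots, \beta_p$ and the autocorrelation function $\rho_h$ satisfy the Yule-Walker equations:

$$\begin{aligned} \rho_1 &= \beta_1 + \beta_2\rho_1 + \cdots + \beta_p\rho_{p-1} \\ \rho_2 &= \beta_1\rho_1 + \beta_2 + \cdots + \beta_p\rho_{p-2} \\ &\;\;\vdots \\ \rho_p &= \beta_1\rho_{p-1} + \beta_2\rho_{p-2} + \cdots + \beta_p \end{aligned}$$

and

$$\sigma_x^2 = \frac{\sigma^2}{1 - \beta_1\rho_1 - \beta_2\rho_2 - \cdots - \beta_p\rho_p}$$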

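The Yule-Walker equations can be used to estimate the regression coefficients $\beta_1, \beta_2, \ldots, \beta_p$ from the sample autocorrelation function $r_h$ by replacing $\rho_h$ with $r_h$:

$$\begin{aligned} r_1 &= \hat\beta_1 + \hat\beta_2 r_1 + \cdots + \hat\beta_p r_{p-1} \\ r_2 &= \hat\beta_1 r_1 + \hat\beta_2 + \cdots + \hat\beta_p r_{p-2} \\ &\;\;\vdots \\ r_p &= \hat\beta_1 r_{p-1} + \hat\beta_2 r_{p-2} + \cdots + \hat\beta_p \end{aligned}$$

and

$$\hat\sigma^2 = C_x(0)\left(1 - \hat\beta_1 r_1 - \hat\beta_2 r_2 - \cdots - \hat\beta_p r_p\right)$$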

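Example

Considering the data in example 1 (Sunspot Data), the time series was identified as an AR(2) time series. The autocorrelation at lag 1 is $r_1 = 0.807$ and the autocorrelation at lag 2 is $r_2 = 0.429$. The equations for the estimators of the parameters of this series are

$$\hat\beta_1 + 0.807\,\hat\beta_2 = 0.807, \qquad 0.807\,\hat\beta_1 + \hat\beta_2 = 0.429$$

which has solution

$$\hat\beta_1 = 1.321, \qquad \hat\beta_2 = -0.637$$

Since $\delta = \mu(1 - \beta_1 - \beta_2)$, it can be estimated as follows:

$$\hat\delta = \bar{x}(1 - \hat\beta_1 - \hat\beta_2) = 14.9$$

Thus the identified model in this case is
$x_t = 1.321\,x_{t-1} - 0.637\,x_{t-2} + u_t + 14.9$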
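The Yule-Walker estimates for this example can be reproduced with a small sketch. It solves the sample Yule-Walker system by Gaussian elimination in plain Python; the function name and the choice of solver are illustrative assumptions, and only the two autocorrelations quoted on the slide are used.

```python
def yule_walker(r):
    """Solve the sample Yule-Walker equations for an AR(p) series.

    r : list [r_1, ..., r_p] of sample autocorrelations
    Returns [beta_1, ..., beta_p].
    """
    p = len(r)
    rho = [1.0] + list(r)
    # Augmented matrix [R | r] with R[i][j] = rho_|i-j|.
    a = [[rho[abs(i - j)] for j in range(p)] + [rho[i + 1]] for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, p):
            f = a[i][col] / a[col][col]
            for j in range(col, p + 1):
                a[i][j] -= f * a[col][j]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (a[i][p] - s) / a[i][i]
    return beta


beta_hat = yule_walker([0.807, 0.429])
print([round(b, 3) for b in beta_hat])
```

For larger p the Durbin-Levinson recursion is the usual choice because it exploits the Toeplitz structure of the system; plain elimination is used here only to keep the sketch short.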

Maximum Likelihood Estimation of the parameters of an ARMA(p,q) Series

The method of Maximum Likelihood Estimation selects as estimators of a set of parameters $\theta_1, \theta_2, \ldots, \theta_k$ the values that maximize

$$L(\theta_1, \theta_2, \ldots, \theta_k) = f(x_1, x_2, \ldots, x_N; \theta_1, \theta_2, \ldots, \theta_k)$$

where $f(x_1, x_2, \ldots, x_N; \theta_1, \theta_2, \ldots, \theta_k)$ is the joint density function of the observations $x_1, x_2, \ldots, x_N$. $L(\theta_1, \theta_2, \ldots, \theta_k)$ is called the likelihood function.

It is important to note that finding the values $\theta_1, \theta_2, \ldots, \theta_k$ that maximize $L(\theta_1, \theta_2, \ldots, \theta_k)$ is equivalent to finding the values that maximize $l(\theta_1, \theta_2, \ldots, \theta_k) = \ln L(\theta_1, \theta_2, \ldots, \theta_k)$. $l(\theta_1, \theta_2, \ldots, \theta_k)$ is called the log-likelihood function.

Again let $\{u_t : t \in T\}$ be identically distributed and uncorrelated with mean zero. In addition assume that each is normally distributed. Consider the time series $\{x_t : t \in T\}$ defined by the equation:

(*) $x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + \cdots + \beta_p x_{t-p} + \delta + u_t + \alpha_1 u_{t-1} + \alpha_2 u_{t-2} + \cdots + \alpha_q u_{t-q}$

Assume that $x_1, x_2, \ldots, x_N$ are observations on the time series up to time t = N. To estimate the p + q + 2 parameters $\beta_1, \beta_2, \ldots, \beta_p$; $\alpha_1, \alpha_2, \ldots, \alpha_q$; $\delta$, $\sigma^2$ by the method of Maximum Likelihood we need to find the joint density function of $x_1, x_2, \ldots, x_N$:

$$f(x_1, x_2, \ldots, x_N \mid \beta_1, \ldots, \beta_p; \alpha_1, \ldots, \alpha_q, \delta, \sigma^2) = f(\mathbf{x} \mid \boldsymbol\beta, \boldsymbol\alpha, \delta, \sigma^2)$$

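We know that $u_1, u_2, \ldots, u_N$ are independent normal with mean zero and variance $\sigma^2$. Thus the joint density function of $u_1, u_2, \ldots, u_N$, written $g(u_1, u_2, \ldots, u_N; \sigma^2) = g(\mathbf{u}; \sigma^2)$, is given by:

$$g(\mathbf{u}; \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N}\exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{N}u_t^2\right]$$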

It is difficult to determine the exact density function of $x_1, x_2, \ldots, x_N$ from this information. However, if we assume that p starting values on the x-process, $\mathbf{x}^* = (x_{1-p}, x_{2-p}, \ldots, x_0)$, and q starting values on the u-process, $\mathbf{u}^* = (u_{1-q}, u_{2-q}, \ldots, u_0)$, have been observed, then the conditional distribution of $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ given $\mathbf{x}^*$ and $\mathbf{u}^*$ can easily be determined.

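The system of equations:

$x_1 = \beta_1 x_0 + \beta_2 x_{-1} + \cdots + \beta_p x_{1-p} + \delta + u_1 + \alpha_1 u_0 + \alpha_2 u_{-1} + \cdots + \alpha_q u_{1-q}$
$x_2 = \beta_1 x_1 + \beta_2 x_0 + \cdots + \beta_p x_{2-p} + \delta + u_2 + \alpha_1 u_1 + \alpha_2 u_0 + \cdots + \alpha_q u_{2-q}$
...
$x_N = \beta_1 x_{N-1} + \beta_2 x_{N-2} + \cdots + \beta_p x_{N-p} + \delta + u_N + \alpha_1 u_{N-1} + \alpha_2 u_{N-2} + \cdots + \alpha_q u_{N-q}$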

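can be solved for:

$u_1 = u_1(\mathbf{x}, \mathbf{x}^*, \mathbf{u}^*; \boldsymbol\beta, \boldsymbol\alpha, \delta)$
$u_2 = u_2(\mathbf{x}, \mathbf{x}^*, \mathbf{u}^*; \boldsymbol\beta, \boldsymbol\alpha, \delta)$
...
$u_N = u_N(\mathbf{x}, \mathbf{x}^*, \mathbf{u}^*; \boldsymbol\beta, \boldsymbol\alpha, \delta)$

(The Jacobian of the transformation is 1.)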

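Then the joint density of $\mathbf{x}$ given $\mathbf{x}^*$ and $\mathbf{u}^*$ is given by:

$$f(\mathbf{x} \mid \mathbf{x}^*, \mathbf{u}^*; \boldsymbol\beta, \boldsymbol\alpha, \delta, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N}\exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{N}u_t^2(\mathbf{x}, \mathbf{x}^*, \mathbf{u}^*; \boldsymbol\beta, \boldsymbol\alpha, \delta)\right]$$

Let:

$$S^*(\boldsymbol\beta, \boldsymbol\alpha, \delta) = \sum_{t=1}^{N}u_t^2(\mathbf{x}, \mathbf{x}^*, \mathbf{u}^*; \boldsymbol\beta, \boldsymbol\alpha, \delta)$$

$$L^*(\boldsymbol\beta, \boldsymbol\alpha, \delta, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N}\exp\left[-\frac{1}{2\sigma^2}S^*(\boldsymbol\beta, \boldsymbol\alpha, \delta)\right]$$ = "conditional likelihood function"

"conditional log likelihood function" =

$$l^*(\boldsymbol\beta, \boldsymbol\alpha, \delta, \sigma^2) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}S^*(\boldsymbol\beta, \boldsymbol\alpha, \delta)$$

The values that maximize $l^*$ are the values $\hat{\boldsymbol\beta}, \hat{\boldsymbol\alpha}, \hat\delta$ that minimize $S^*(\boldsymbol\beta, \boldsymbol\alpha, \delta)$, with

$$\hat\sigma^2 = \frac{1}{N}S^*(\hat{\boldsymbol\beta}, \hat{\boldsymbol\alpha}, \hat\delta)$$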

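Comment: The minimization of the conditional sum of squares $S^*(\boldsymbol\beta, \boldsymbol\alpha, \delta) = \sum_{t=1}^{N}u_t^2$ requires an iterative numerical minimization procedure to find $\hat{\boldsymbol\beta}, \hat{\boldsymbol\alpha}, \hat\delta$:
Steepest descent
Simulated annealing
etc.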
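To illustrate numerical minimization of the conditional sum of squares, here is a deliberately simple sketch for an MA(1) series with δ = 0 and u_0 = 0. It uses a plain grid search over α1, which is a stand-in chosen for brevity and not one of the procedures named above. The simulated data, the grid and the function names are all assumptions for illustration.

```python
import random


def ma1_css(x, alpha, delta=0.0, u0=0.0):
    """Conditional sum of squares for x_t = delta + u_t + alpha * u_{t-1}."""
    u_prev, total = u0, 0.0
    for xt in x:
        u = xt - delta - alpha * u_prev  # residual implied by the model
        total += u * u
        u_prev = u
    return total


def grid_search_alpha(x, step=0.01):
    """Return the alpha in [-0.99, 0.99] (invertible region) minimizing the CSS."""
    n = int(round(1.98 / step))
    grid = [round(-0.99 + step * k, 10) for k in range(n + 1)]
    return min(grid, key=lambda a: ma1_css(x, a))


# Simulated MA(1) data with a known parameter (illustrative only).
rng = random.Random(1)
true_alpha = -0.5
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
x = [u[0]] + [u[t] + true_alpha * u[t - 1] for t in range(1, 300)]

alpha_hat = grid_search_alpha(x)
print(alpha_hat, ma1_css(x, alpha_hat) / len(x))  # estimate and sigma^2 estimate
```

A grid search only scales to one or two parameters; for general ARMA(p,q) models a gradient-based or other iterative optimizer is needed, as the slide indicates.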

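Comment: The computation of the conditional sum of squares $S^*(\boldsymbol\beta, \boldsymbol\alpha, \delta) = \sum_{t=1}^{N}u_t^2$ for specific values of $\boldsymbol\beta, \boldsymbol\alpha, \delta$ can be achieved by using the forecast equations: each $u_t$ is the one-step-ahead forecast error at time t.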
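The recursion behind this comment can be sketched as follows: given starting values x* and u*, each residual u_t is obtained by subtracting the one-step forecast from x_t, using model (*). The function names and the small numerical example are illustrative assumptions.

```python
def conditional_residuals(x, x_start, u_start, beta, alpha, delta):
    """Residuals u_1..u_N of an ARMA(p,q) model given starting values.

    x_start = (x_{1-p}, ..., x_0), u_start = (u_{1-q}, ..., u_0).
    Uses u_t = x_t - sum_i beta_i x_{t-i} - delta - sum_j alpha_j u_{t-j}.
    """
    p, q = len(beta), len(alpha)
    xs = list(x_start) + list(x)  # xs[p + t] holds x_{t+1}
    us = list(u_start)            # us[q + t] will hold u_{t+1}
    for t in range(len(x)):
        ar = sum(beta[i] * xs[p + t - 1 - i] for i in range(p))
        ma = sum(alpha[j] * us[q + t - 1 - j] for j in range(q))
        us.append(xs[p + t] - ar - delta - ma)
    return us[q:]


def conditional_sum_of_squares(x, x_start, u_start, beta, alpha, delta):
    """S* = sum of squared conditional residuals."""
    res = conditional_residuals(x, x_start, u_start, beta, alpha, delta)
    return sum(v * v for v in res)


# Tiny ARMA(1,1) example built by hand from known shocks (1.0, -1.0, 0.5)
# with beta = 0.5, alpha = 0.4, delta = 1.0, x_0 = 2.0, u_0 = 0.0.
x = [3.0, 1.9, 2.05]
S = conditional_sum_of_squares(x, [2.0], [0.0], [0.5], [0.4], 1.0)
print(S)
```

Evaluated at the parameters that generated the data, the recursion recovers the original shocks, so S* equals their sum of squares.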

Comment: The minimization of the conditional sum of squares assumes that we know the starting values of the time series $\{x_t \mid t \in T\}$ and $\{u_t \mid t \in T\}$, namely $\mathbf{x}^*$ and $\mathbf{u}^*$.

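Approaches:
1. Use estimated values: $u_t = 0$ (the mean of $u_t$) and $x_t = \bar{x}$ for $t \le 0$.
2. Use forecasting and backcasting equations to estimate the values of $\mathbf{x}^*$ and $\mathbf{u}^*$.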

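Backcasting: If the time series $\{x_t \mid t \in T\}$ satisfies the equation:

$$x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + \cdots + \beta_p x_{t-p} + \delta + u_t + \alpha_1 u_{t-1} + \cdots + \alpha_q u_{t-q}$$

it can also be shown to satisfy the equation:

$$x_t = \beta_1 x_{t+1} + \beta_2 x_{t+2} + \cdots + \beta_p x_{t+p} + \delta + u_t + \alpha_1 u_{t+1} + \cdots + \alpha_q u_{t+q}$$

where, in the second equation, $u_t$ denotes a different white-noise series. Both equations result in a time series with the same mean, variance and autocorrelation function. In the same way that the first equation can be used to forecast into the future, the second equation can be used to backcast into the past.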

Approaches to handling starting values of the series $\{x_t \mid t \in T\}$ and $\{u_t \mid t \in T\}$:
1. Initially start with the values $\mathbf{u}^* = \mathbf{0}$ and $\mathbf{x}^*$ set to estimated values (for example the sample mean $\bar{x}$).
2. Estimate the parameters of the model using Maximum Likelihood estimation and the conditional likelihood function.
3. Use the estimated parameters to backcast the components of $\mathbf{x}^*$. The backcasted components of $\mathbf{u}^*$ will still be zero.
4. Repeat steps 2 and 3 until the estimates stabilize.

This algorithm is an application of the E-M algorithm. This general algorithm is frequently used when there are missing values. The E stands for Expectation (using a model to estimate the missing values). The M stands for Maximization (here Maximum Likelihood Estimation, the process used to estimate the parameters of the model).

Some Examples using: R