Intervention models
Something’s happened around t = 200.

The first example. The series seems generally stationary, but shifts level around t = 200. Look separately at the parts before and after the level shift. There are 400 time points in total; select the first 190 and the last 190.
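
A minimal sketch of this split (assuming, as in the model-fitting code below, that the series is stored in a vector called strange):

library(TSA)    # provides arimax(), eacf() and prewhiten() used in these slides

first <- strange[1:190]      # part before the level shift
last  <- strange[211:400]    # part after the level shift
par(mfrow = c(2, 2))
acf(first);  pacf(first)     # identification for the early part
acf(last);   pacf(last)      # identification for the late part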

First 190 values: could be an AR(1), an MA(1) or an ARMA(1,1). Quite clearly stationary!

Last 190 values: points more towards an ARMA(1,1).

The change in level would most probably be modelled using a step function

S_t^(200) = 0 for t < 200, and 1 for t ≥ 200.

Since there seems to be a permanent, immediate, constant change in level at t = 200, a complete intervention model for the time series can therefore be

Y_t = ω₀·S_t^(200) + N_t, where N_t is an ARMA(1,1) process.

How can this model be fitted using R?

strange.model <- arimax(strange, order = c(1,0,1),
    xtransf = data.frame(step200 = 1*(seq(strange) >= 200)),
    transfer = list(c(0,0)))

The arimax command works like the arima command, but allows the inclusion of covariates. The argument xtransf is followed by a data frame in which each column corresponds to a covariate time series (with the same number of observations as Y_t). Here this data frame is constructed with the command

1*(seq(strange) >= 200)

The command seq(strange) returns the indices of the vector strange. The command seq(strange) >= 200 returns a vector (with the same length as strange) in which a term is FALSE if the corresponding index of strange is less than 200 and TRUE otherwise. Finally, the multiplication by 1 transforms FALSE into 0 and TRUE into 1, and the variable in the data frame is given the name step200 (for convenience). Hence, the resulting column is a step function of the kind we want.

The argument transfer is followed by a list comprising one two-dimensional vector for each covariate specified by xtransf. Here we have the argument list(c(0,0)), implying that the covariate shall be included as it stands (no lagging, no filtering). Note that the argument must always be followed by a list (even if there is only one covariate). Giving an argument c(r,s) where both r and s are > 0 will enter the term

(ω₀ + ω₁·B + … + ω_s·B^s) / (1 − δ₁·B − … − δ_r·B^r) · X_t

into the model. Since we have specified c(0,0), the term included will be ω₀·X_t.
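
For instance (a hypothetical variant, not fitted in these slides), a level shift that is approached gradually rather than instantaneously could be specified with one denominator lag:

gradual.model <- arimax(strange, order = c(1,0,1),
    xtransf = data.frame(step200 = 1*(seq(strange) >= 200)),
    transfer = list(c(1,0)))    # includes the term ω₀/(1 − δ₁·B)·S_t^(200)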

print(strange.model)

Series: strange
ARIMA(1,0,1) with non-zero mean

Coefficients:
         ar1   ma1   intercept   step200-MA0
         …     …     …           …
s.e.     …     …     …           …

sigma^2 estimated as …: log likelihood = …, AIC = …, AICc = …, BIC = …

Thus, the estimated model is Y_t = intercept + ω̂₀·S_t^(200) + N_t, with N_t the fitted ARMA(1,1) process and the coefficient values as printed above.

tsdiag(strange.model)

Seems to be some autocorrelation left in the residuals. Try an ARMA(1,2).

strange.model2 <- arimax(strange, order = c(1,0,2),
    xtransf = data.frame(step200 = 1*(seq(strange) >= 200)),
    transfer = list(c(0,0)))
print(strange.model2)

Series: strange
ARIMA(1,0,2) with non-zero mean

Coefficients:
         ar1   ma1   ma2   intercept   step200-MA0
         …     …     …     …           …
s.e.     …     …     …     …           …

sigma^2 estimated as …: log likelihood = …, AIC = …, AICc = …, BIC = …

The coefficients seem to be significantly different from zero (divide each estimate by its s.e. and compare with 2). The log-likelihood is slightly higher.

tsdiag(strange.model2)

Clear improvement!

plot(y = strange, x = seq(strange), type = "l", xlab = "Time")
lines(y = fitted(strange.model), x = seq(strange), col = "blue", lwd = 2)
lines(y = fitted(strange.model2), x = seq(strange), col = "red", lwd = 1)
legend("bottomright", legend = c("original", "model1", "model2"),
       col = c("black", "blue", "red"), lty = 1, lwd = c(1, 2, 1))

Model 2 (the ARMA(1,2)) is less smooth, but may follow the correlation structure better. However, this cannot be clearly seen from the plot.
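
Since the plot is inconclusive, a numerical comparison may help. A small sketch, assuming the arimax fit keeps the aic component of an ordinary arima fit:

strange.model$aic
strange.model2$aic    # a smaller value would favour the ARMA(1,2) specification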

The second example. The series seems stationary from the beginning, but picks up a linear drift (upward trend) around t = 200. Look at the part before the drift. There are 400 time points in total; select the first 200.

First 200 values: looks (again) like an ARMA(1,1).

eacf(strange[1:200])

AR/MA
  0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 x o o o o o o o o o o  o  o  o
1 o o o o o o o o o o o  o  o  o
2 x o o o o o o o o o o  o  o  o
3 x x x o o o o o o o o  o  o  o
4 o x x o o o o o o o o  o  o  o
5 o x x o o o o o o o o  o  o  o
6 x x o o o o o o o o o  o  o  o
7 x x o o o o o o o o o  o  o  o

The drift in level could be modelled using a linearly increasing (ramp) function

T_t^(200) = 0 for t ≤ 200, and t − 200 for t > 200.

A complete intervention model for the time series can therefore be

Y_t = ω₀·T_t^(200) + N_t, where N_t is an ARMA(1,1) process.

The term ω₀·T_t^(200) will be problematic to estimate through the transfer machinery, since the ramp corresponds to applying the cumulative-sum filter 1/(1 − B) to a step function, i.e. a unit root in the transfer denominator. However, the following holds: T_t^(200) = max(0, t − 200), a known deterministic sequence. Hence, create a covariate that is 0 until t = 200 and then 1, 2, …, 200, and use it with transfer = list(c(0,0)). Alternatively, and more efficiently, this variable can be included as an ordinary explanatory variable (a regression predictor), using the argument xreg.

strange_b.model <- arimax(strange_b, order = c(1,0,1),
    xreg = data.frame(x = c(rep(0, 200), 1:200)))
print(strange_b.model)

Call:
arimax(x = strange_b, order = c(1, 0, 1), xreg = data.frame(x = c(rep(0, 200), 1:200)))

Coefficients:
         ar1   ma1   intercept   x
         …     …     …           …
s.e.     …     …     …           …

sigma^2 estimated as …: log likelihood = …, aic = …

Note! This can also be seen as a simple linear regression model with ARMA(1,1) error terms.
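
For comparison (a sketch, not shown in the original slides), the same ramp can be entered via xtransf and transfer; the coefficient estimates should essentially agree with the xreg fit:

ramp200 <- c(rep(0, 200), 1:200)    # 0 until t = 200, then 1, 2, …, 200
strange_b.model.alt <- arimax(strange_b, order = c(1,0,1),
    xtransf = data.frame(ramp200 = ramp200),
    transfer = list(c(0,0)))        # includes the term ω₀·ramp200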

tsdiag(strange_b.model)

Satisfactory!

Transfer-function models. Consider the data set boardings referred to in the exercises.

data(boardings)
summary(boardings)

 log.boardings    log.price
 Min.   :12.40    Min.   :…
 1st Qu.:…        1st Qu.:4.973
 Median :12.53    Median :5.038
 Mean   :12.53    Mean   :…
 3rd Qu.:…        3rd Qu.:5.241
 Max.   :12.70    Max.   :5.684

Two time series, both with log-transformed values.

plot.ts(boardings)

Could the price affect the boardings?

The cross-correlation function

ρ_xy(k) = Corr(X_{t+k}, Y_t), k = 0, ±1, ±2, …

measures the degree of linear dependence between the two series at different lags.

Sample cross-correlation function. With R: the ccf command. For the boardings data set, we can try to calculate the cross-correlation function between the two series.
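
For reference (a standard formula, not spelled out in these slides), the sample cross-correlation at lag k, in the convention where ccf(x, y) at lag k estimates Corr(x_{t+k}, y_t), is

r_xy(k) = Σ_t (x_{t+k} − x̄)(y_t − ȳ) / √[ Σ_t (x_t − x̄)² · Σ_t (y_t − ȳ)² ]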

ccf(boardings[,1], boardings[,2], main = "boardings & price", ylab = "CCF")

A typical look when at least one of the time series is non-stationary.

Take first-order regular differences:

diff_boardings <- diff(boardings[,1])
diff_price <- diff(boardings[,2])
ccf(diff_boardings, diff_price, ylab = "CCF")

Still not satisfactory. Since we have monthly data, we should possibly try first-order seasonal differences as well.

diffs_boardings <- diff(diff_boardings, 12)
diffs_price <- diff(diff_price, 12)
ccf(diffs_boardings, diffs_price, ylab = "CCF")

Better, but how do we interpret this plot? The two significant spikes at negative lags say that the difference in price depends on the difference in boardings some months earlier. The significant spike at lag 6 says that the difference in boardings depends on the difference in price some months earlier. What explains what?

A problem: since both series show autocorrelation, that autocorrelation is inevitably part of the cross-correlations we are estimating (cf. the relationship between autocorrelation and partial autocorrelation). To solve this we need to “remove” the autocorrelation in the two series before we investigate the cross-correlation. Hence, we should estimate cross-correlations between residual series from modelling with ARMA models. This procedure is known as pre-whitening.

Normal procedure:
1. Find a suitable ARMA model for the (differenced) series that is assumed to constitute the covariate series.
2. Fit this model to both series.
3. Investigate the cross-correlations between the residual series.

Could be an ARMA(1,1)×(1,0)₁₂ or an ARMA(1,1)×(1,1)₁₂ model.

model1 <- arima(diffs_price, order = c(1,0,1),
    seasonal = list(order = c(1,0,0), period = 12))
tsdiag(model1)

Could do!

model2 <- arima(diffs_price, order = c(1,0,1),
    seasonal = list(order = c(1,0,1), period = 12))
tsdiag(model2)

Ljung-Box was not possible to do here! Better!

Applying the last model to the differenced boardings series:

model21 <- arima(diffs_boardings, order = c(1,0,1),
    seasonal = list(order = c(1,0,1), period = 12))
ccf(residuals(model2), residuals(model21), ylab = "CCF")

Well, not that much cross-correlation left…

The TSA package provides the command prewhiten, which performs the prewhitening and plots the resulting CCF. The default set-up is that an AR model is fitted to the covariate series (the first series specified); the AR model that minimizes AIC is chosen. The model can, however, be specified explicitly.

prewhiten(diffs_price, diffs_boardings, x.model = model2, ylab = "CCF")

This should be the same as the manually developed CCF earlier.

With the default settings:

pw <- prewhiten(diffs_price, diffs_boardings, ylab = "CCF")

The picture is clearer? No significant cross-correlations left. What AR model has been used?

print(pw)

$ccf
Autocorrelations of series 'X', by lag
…

$model
Call:
ar.ols(x = x)

Coefficients:
…

Intercept: … (…)

Order selected 10   sigma^2 estimated as …

Check with a scatter plot. It seems reasonable that there is no significant cross-correlation.
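
A minimal sketch of such a check (the two doubly differenced series have equal length, so they can be plotted against each other directly):

plot(as.numeric(diffs_price), as.numeric(diffs_boardings),
     xlab = "diffs_price", ylab = "diffs_boardings")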

Another example. Observations of the input gas rate to a gas furnace and the percentage of carbon dioxide (CO2) in the output from the same furnace. Stationary?

The gasrate series: not that far from stationary. In that case an AR(2) would be the first choice. However, we also try first-order regular differences:

gasrate_diff <- diff(gasrate)

More stationary than before?

CO2 series: stationary. AR(2)?

prewhiten(gasrate, CO2, ylab = "CCF")
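
If we prefer the AR(2) suggested above to the automatically chosen AR model, it can be supplied explicitly (a sketch, assuming gasrate and CO2 are available as time series objects):

gasrate.ar2 <- arima(gasrate, order = c(2,0,0))
prewhiten(gasrate, CO2, x.model = gasrate.ar2, ylab = "CCF")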