(c) Martin L. Puterman 1 BABS 502 Regression Based Forecasting March 4, 2014

(c) Martin L. Puterman 2 Simple and Multiple Regression
A widely used set of statistical tools, useful for:
–forecasting
–data summary
–adjustment for uncontrolled factors
The basic idea is to fit an equation of the following form, relating a dependent variable to one or more independent variables:
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + …
Its power is that by choosing y and the xᵢ in different ways, a wide range of effects can be taken into account.
The theoretical model assumes each observation is subject to an additive error that is normally distributed with mean zero and the same variance for every observation, so that one observes the signal and noise components in aggregate. In forecasting, the signal part provides the point forecast and the random part provides an accuracy measure.
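A minimal sketch of this model in R, on simulated data (all names and coefficient values here are hypothetical, chosen only for illustration):

set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 + 2*x1 - x2 + rnorm(n, sd = 0.5)   # signal plus additive N(0, sigma^2) noise
fit <- lm(y ~ x1 + x2)                     # least squares estimates of the betas
summary(fit)                               # coefficients and residual standard error
predict(fit, newdata = data.frame(x1 = 1, x2 = 0), interval = "prediction")

The point forecast comes from the fitted signal; the prediction interval reflects the noise, as the slide describes.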

(c) Martin L. Puterman 3 Regression in forecasting - trend extrapolation
Fit a trend to historical data:
–linear: Yₜ = a + bt
–quadratic: Yₜ = a + bt + ct²
–exponential: Yₜ = ae^(bt), or log(Yₜ) = a + bt
The assumption is that the same trend occurred throughout the past and will persist into the future.
Fit using lm or tslm in R:
–quadratic fit: tslm(y ~ poly(trend, 2, raw = TRUE))
Extensive regression theory is available to guide use.
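A sketch of all three fits (assuming y is a ts object and the forecast package is installed; lambda = 0 applies a log transformation, giving the exponential trend):

library(forecast)
fit.lin  <- tslm(y ~ trend)                        # linear trend
fit.quad <- tslm(y ~ poly(trend, 2, raw = TRUE))   # quadratic trend
fit.exp  <- tslm(y ~ trend, lambda = 0)            # exponential trend via log transform
summary(fit.quad)
plot(forecast(fit.quad, h = 10))                   # extrapolate the quadratic trend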

(c) Martin L. Puterman 4 Trend Regression – Births Data
[Figure: the births series with fitted linear, quadratic, and cubic trends shown in three panels.]

(c) Martin L. Puterman 5 Cubic regression forecast of barley yields in BC
[R summary output for the cubic trend fit, poly(trend, 3, raw = TRUE): the intercept is highly significant (p < 2e-16); the linear and quadratic polynomial terms are significant at the 0.01 level and the cubic term at the 0.05 level; the residual standard error has 100 degrees of freedom; the F-statistic on 3 and 100 DF has p-value 2.096e-06.]

(c) Martin L. Puterman 6 Some R commands for regression

library(forecast)   # tslm and forecast
library(lmtest)     # dwtest

birthsts <- ts(births[,2], start = 1946)
b  <- births[,2]
t  <- births[,1]
t2 <- t^2
plot(b)
plot(t, b, type = "l")
lines(lm(b ~ t)$fit, col = 2, lwd = 2)        # linear trend fit
lines(lm(b ~ t + t2)$fit, col = 3, lwd = 2)   # quadratic trend fit

# residuals
r <- residuals(lm(b ~ t))
plot(t, r)
acf(r)
print(acf(r))
dwtest(b ~ t)
summary(lm(b ~ poly(t, 3, raw = TRUE)))
dwtest(b ~ poly(t, 3, raw = TRUE))

plot(t, b, type = "l")
fit2 <- lm(b ~ poly(t, 2, raw = TRUE))
lines(t, predict(fit2), col = 2)

# Using ts regression commands to get fits and plots
tslm(birthsts ~ trend)
fitq <- tslm(birthsts ~ trend + I(trend^2))
fq <- forecast(fitq, h = 5, level = 95)
summary(fq)
plot(fq)
lines(fitted(fitq), col = 2)

# fitting cubic trend with forecast intervals
fitc <- tslm(birthsts ~ poly(trend, 3, raw = TRUE))
fc <- forecast(fitc, h = 8, level = 95)
plot(fc)
lines(fitted(fitc), col = 3)
dwtest(b ~ poly(t, 3, raw = TRUE))

7 Dummy Variables
Dummy variables are independent variables in regression that take only the values 0 or 1.
–A value of 1 means a condition is present; a value of 0 means it is not.
–When an observation corresponds to the condition being present, its fitted value is increased or decreased by a constant amount equal to the coefficient of the dummy variable in the regression.
If a condition has three possible values, say "high", "medium", or "low", we encode it with two dummy variables. The first, High, equals 1 if the condition is "high" and 0 otherwise; the second, Medium, equals 1 if the condition is "medium" and 0 otherwise. When the condition is "low", both are 0. The baseline condition "low" is reflected in the constant in the regression equation.
In time series regression we use dummy variables for seasons, with S−1 dummy variables for S seasons. We are free to choose the baseline season from which all others are measured; a sketch of the encoding follows.
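In R the encoding is automatic once the condition is stored as a factor; the first level becomes the baseline absorbed by the intercept. A small sketch (the data here are hypothetical):

cond <- factor(c("low", "medium", "high", "medium", "low"),
               levels = c("low", "medium", "high"))   # first level = baseline
model.matrix(~ cond)   # columns: intercept, condmedium, condhigh (0/1 dummies)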

(c) Martin L. Puterman 8 Trend Regression with Seasonality
My experience suggests that a quadratic trend regression plus (additive) seasonality is useful for forecasting.
Uses "dummy variables" for seasons; must be fit with regression software.
Equation with quadratic trend and additive monthly seasonality:
Yₜ = a + bt + dt² + c₂Febₜ + c₃Marₜ + … + c₁₂Decₜ
Also accommodates multiple levels of seasonality, such as weekly and monthly.

(c) Martin L. Puterman 9 Trend Regression with Seasonality
In the previous equation, Febₜ, Marₜ, … are dummy variables:
–they equal 1 if observation Yₜ is from the indicated month and 0 otherwise
–note there is no dummy variable for January; January is the baseline for comparison
Examples (with a linear trend):
Yₜ = a + bt (observation t in January)
Yₜ = a + bt + c₂ (observation t in February)
Yₜ = a + bt + c₃ (observation t in March)
In R, tslm automatically generates seasonal dummies for a ts object:
tslm(y ~ trend + season)
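A complete sketch on hypothetical quarterly data; tslm creates S − 1 = 3 seasonal dummies, with the first quarter as baseline:

library(forecast)
y   <- ts(100 + 0.5*(1:40) + rnorm(40), frequency = 4)   # hypothetical quarterly series
fit <- tslm(y ~ trend + season)     # trend plus season2, season3, season4 dummies
summary(fit)
plot(forecast(fit, h = 8, level = 95))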

(c) Martin L. Puterman 10 Trend Regression With Seasonality - Example
Some forecasts, obtained by substituting into the fitted equation from forecast origin t = 156 (January is the baseline season, so its forecast has no seasonal term):
Jan: F₁₅₆(1) = â + b̂(157)
Feb: F₁₅₆(2) = â + b̂(158) + ĉ₂
Mar: F₁₅₆(3) = â + b̂(159) + ĉ₃

(c) Martin L. Puterman 11 Regression Example: Forecast Updating During Season
Goal: improve total sales forecasts using interim sales data.
Data: early forecast, interim sales, and total sales for a wide range of products.
Fitted model: Total Sales = b₀ + b₁ × Interim Sales + 0.3 × Early Forecast
Example: early forecast of total sales = 3000, interim sales = 1400; substituting into the fitted equation gives a revised total sales forecast of 1860.
The forecast standard deviation is the regression RMSE.
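A sketch of how such an updating regression could be fit (the data frame and its values are hypothetical, for illustration only):

# one row per product: preseason forecast, sales to date, final total sales
sales <- data.frame(early   = c(3000, 2500, 4000, 1800, 2200),
                    interim = c(1400, 1100, 2100,  700,  950),
                    total   = c(2900, 2300, 4200, 1500, 2000))
fit <- lm(total ~ interim + early, data = sales)
summary(fit)   # the residual standard error is the forecast RMSE
predict(fit, newdata = data.frame(interim = 1400, early = 3000))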

(c) Martin L. Puterman 12 Regression Example: Impact of Advertising
Goal: take into account the effect of advertising expenditures on sales.
Data: sales, and advertising expenditures in the previous quarter.
Fitted model: Salesₜ = b₀ + b₁ × Quarterₜ + 0.8 × Salesₜ₋₁ + b₂ × (Advertisingₜ₋₁)^(1/2)
Example: sales in the last quarter = 2000 and advertising in the previous quarter = 10,000, so the square root term equals 100; substituting into the fitted equation gives a total sales forecast of 1655.
The forecast standard deviation is the regression RMSE.
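A sketch of fitting a model of this form; the lagged regressors are built by shifting each series one quarter (the data frame and numbers are hypothetical):

quarterly <- data.frame(sales   = c(1800, 1900, 2100, 2000, 2200, 2150, 2300, 2400),
                        quarter = 1:8,
                        adv     = c(9000, 9500, 11000, 10000, 12000, 11500, 12500, 13000))
quarterly$sales.lag <- c(NA, head(quarterly$sales, -1))   # sales in previous quarter
quarterly$adv.lag   <- c(NA, head(quarterly$adv,   -1))   # advertising in previous quarter
fit <- lm(sales ~ quarter + sales.lag + sqrt(adv.lag), data = quarterly)
summary(fit)   # the first row is dropped automatically because its lags are NA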

(c) Martin L. Puterman 13 Some special concerns when using regression with time series data
Often the usual regression assumption of uncorrelated errors is violated.
–This means that the residuals contain information.
Case A: this is usually due to model mis-specification, i.e., omission of important variables.
Case B: but sometimes we have what we think is a good model and there is nothing obvious to add.
Difficulty: standard errors are underestimated, so the model seems better than it really is.
–Concept: since the observations are not independent, there is less information in the data than you would think.
–We reject H₀: βⱼ = 0 when we shouldn't.
Detection (a sketch of all three checks follows):
–some systematic pattern in the plot of residuals vs. time
–the Durbin-Watson test (see next slide)
–(best approach) the ACF of the residuals
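The three checks in R, reusing b and t from the earlier slide of R commands (dwtest is in the lmtest package):

library(lmtest)
fit <- lm(b ~ t)         # the linear trend fit to the births data
r   <- residuals(fit)
plot(t, r, type = "b")   # look for systematic patterns over time
acf(r)                   # spikes outside the bands indicate autocorrelated errors
dwtest(fit)              # Durbin-Watson test: first-order autocorrelation only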

(c) Martin L. Puterman 14 Durbin-Watson test; comments
The Durbin-Watson test is a poorer alternative to using the ACF of the residuals, but it is widely used, probably for historical reasons. It is based on the Durbin-Watson test statistic D and tests only for first order autocorrelation in the errors.
–Formally it tests H₀: ρ = 0 vs. Hₐ: ρ ≠ 0.
–The test rejects H₀ and concludes that there is autocorrelation in the residuals if D is well below 2 or well above 2; I suggest being imprecise here. I would worry about values less than 1.4 or greater than 2.6.
In economic data, when ρ is not zero, it is usually positive.
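For reference, with residuals e₁, …, eₙ the statistic is
D = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ² ≈ 2(1 − r₁),
where r₁ is the lag one autocorrelation of the residuals. Hence D is near 2 when there is no first order autocorrelation, well below 2 with positive autocorrelation, and well above 2 with negative autocorrelation.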

(c) Martin L. Puterman 15 Regression with (auto)correlated residuals
Approaches for obtaining more reliable estimates:
–Add variables, such as trend squared, or use the lagged dependent variable as an explanatory variable. (See the sales and advertising example on the previous slide; Salesₜ₋₁ is a lagged variable.)
–Use time series regression models, which except for a special case (AR(1) errors) require advanced software such as SAS or R. The orcutt package in R obtains estimates for models with AR(1) errors; see section 9.1 of Hyndman's text for more on this.
–Difference the data if the lag one autocorrelation is large (and positive) and software such as the above is not available.

(c) Martin L. Puterman 16 Regression with auto-correlated errors
Model: yₜ = β₀ + β₁x₁ₜ + … + βₘxₘₜ + εₜ, where εₜ = ρεₜ₋₁ + νₜ, with the νₜ independent and N(0, σ²).
The quantity ρ is called the first order autocorrelation or serial correlation parameter and lies between −1 and +1.
The Cochrane-Orcutt procedure, implemented in the R package orcutt, estimates the regression coefficients and ρ for this model; the help manual describes the algorithm in some detail. A sketch of its use follows.
Note that the regression coefficients usually will not change much from ordinary regression, but their standard errors will be larger.
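A sketch of the procedure on simulated AR(1)-error data, assuming the orcutt package and its cochrane.orcutt() function (x and y are hypothetical):

library(orcutt)
set.seed(2)
x <- 1:50
e <- as.numeric(arima.sim(list(ar = 0.7), n = 50))   # AR(1) errors with rho = 0.7
y <- 10 + 0.5*x + e
ols <- lm(y ~ x)               # ordinary least squares ignores the autocorrelation
co  <- cochrane.orcutt(ols)    # iterates between estimating rho and refitting
summary(co)                    # coefficients close to OLS, standard errors larger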

(c) Martin L. Puterman 17 Example – BC Incorporations Trend Regression
[Software output, Regression Equation Section: both the intercept and the trend coefficient reject H₀: β(i) = 0 at the 5% level.]
[Serial Correlation of Residuals Section: the residual serial correlations at successive lags, flagged as significant when their absolute values exceed the reported threshold.]
[Durbin-Watson Test for Serial Correlation: the test rejects H₀: ρ(1) = 0 in favor of positive serial correlation (prob. level 0.0000); it does not indicate negative serial correlation (prob. level 1.0000).]

(c) Martin L. Puterman 18 Same data using serial correlation routine
[Software output, Run Summary Section: dependent variable BC; one independent variable; 17 rows processed and used in estimation; coefficient of variation 0.4879; average absolute percent error 17.034; the autocorrelation (rho) is estimated along with the regression coefficients.]
[Regression Equation Section: once the serial correlation is accounted for, neither the intercept nor the trend coefficient rejects H₀: β(i) = 0 at the 5% level.]

(c) Martin L. Puterman 19 Same data but adding extra variables
In the output below, dummy = 1 for year > 2001, and dummyXyear allows for a shift in trend. Note there is still some autocorrelation present: the lag 1 serial autocorrelation equals .32 (which is insignificant), and the Durbin-Watson test is significant, though much less so than without the extra variables. The purpose of this example is to show that autocorrelation can result from the omission of independent variables.
[Software output, Regression Equation Section: intercept, dummy, and dummyXyear all reject H₀: β(i) = 0 at the 5% level; year does not (power of the test at 5% is 0.0646).]

(c) Martin L. Puterman 20 What if seasonality is multiplicative and we want to use regression?
Problem: a model on the nominal scale assumes an additive effect of the seasonal dummy variables.
Solution: do the regression on the logarithmic scale. That is, transform the dependent variable by taking logarithms (base 10 or base e) and then run the regression. Why does this work? Multiplicative seasonality is additive on the log scale! We can therefore forecast with the model on the log scale and transform back to the original scale by exponentiating the forecast.
–Example: if the forecast on the log₁₀ scale is 3.4, the forecast on the nominal (original) scale is 10^3.4 ≈ 2512 units. But predictions based on these transformations are often biased.
Alternative ad hoc approach: deseasonalize the data, fit the model to the deseasonalized data, and then multiply back by the seasonal factors to get forecasts. This is how time series decomposition works.
Trends and dummies on the log scale also have nice interpretations. Consider the model for 4 seasons
log₁₀(yₜ) = a + .006t + .19·Season2ₜ − .13·Season3ₜ + .04·Season4ₜ
Then the value of the series is increasing about 1.4% per period, since 10^.006 ≈ 1.014, and the value in Season 2 is a factor of 10^.19 ≈ 1.55, about 55%, above what the trend alone predicts for that season.
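A sketch of the log-scale workflow on a hypothetical multiplicative quarterly series; the forecast is made on the log₁₀ scale and exponentiated back:

library(forecast)
y   <- ts(100 * 1.01^(1:40) * rep(c(1.0, 1.2, 0.9, 1.05), 10), frequency = 4)
ly  <- log10(y)                          # multiplicative seasonality becomes additive
fit <- tslm(ly ~ trend + season)
fc  <- forecast(fit, h = 4, level = 95)  # point forecasts on the log10 scale
10^fc$mean                               # exponentiate back to original units (may be biased)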