FORECASTING WITH REGRESSION MODELS TREND ANALYSIS BUSINESS FORECASTING Prof. Dr. Burç Ülengin ITU MANAGEMENT ENGINEERING FACULTY FALL 2011.

OVERVIEW
• The bivariate regression model
• Data inspection
• Regression forecast process
• Forecasting with simple linear trend
• Causal regression model
• Statistical evaluation of regression model
• Examples...

The Bivariate Regression Model
• The bivariate regression model is also known as the simple regression model.
• It is a statistical tool that estimates the relationship between a dependent variable (Y) and a single independent variable (X).
• The dependent variable is the variable we want to forecast.

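The Bivariate Regression Model
• General form: $Y = f(X)$, where Y is the dependent variable and X is the independent variable.
• Specific form (linear regression model): $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is the random disturbance.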

The Bivariate Regression Model
• The regression model is the equation of a line.
• $\beta_1$ is the slope coefficient; it tells us the rate of change in Y per unit change in X. If $\beta_1 = 5$, a one-unit increase in X leads to a 5-unit increase in Y.
• $\varepsilon$ is the random disturbance; because of it, Y can take different values for a given X.
• The objective is to estimate $\beta_0$ and $\beta_1$ in such a way that the fitted values are as close as possible to the observed values of Y.

The Bivariate Regression Model: Geometrical Representation
[Figure: scatter plot of Y against X with two fitted lines, one labelled "Poor fit" and one labelled "Good fit".]
The red line is closer to the data points than the blue one.

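Best Fit Estimates
• Population regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$
• Sample regression model: $Y = \hat{\beta}_0 + \hat{\beta}_1 X + e$, where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the sample estimates and $e$ is the residual.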

Best Fit Estimates-OLS
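As a sketch of what ordinary least squares does in the bivariate case, the standard textbook solution is given below. The hat notation for the sample estimates and the index $i = 1, \ldots, n$ are assumptions made here for illustration.

```latex
\min_{\hat{\beta}_0,\,\hat{\beta}_1} \; \sum_{i=1}^{n} e_i^2
  = \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
                     {\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```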

Misleading Best Fits
[Figure: four scatter plots of Y against X; annotation: $\sum e^2 = 100$.]

THE CLASSICAL ASSUMPTIONS
1. The regression model is linear in the coefficients, correctly specified, and has an additive error term.
2. $E(\varepsilon) = 0$.
3. All explanatory variables are uncorrelated with the error term.
4. Errors corresponding to different observations are uncorrelated with each other.
5. The error term has a constant variance.
6. No explanatory variable is an exact linear function of any other explanatory variable(s).
7. The error term is normally distributed such that $\varepsilon \sim N(0, \sigma^2)$.

Regression Forecasting Process
• Data consideration: plot each variable over time and draw a scatter plot. Look for:
  • Trend
  • Seasonal fluctuation
  • Outliers
• To forecast Y we need the forecasted value of X.
• Reserve a holdout period for evaluation and test the estimated equation in the holdout period.

An Example: Retail Car Sales
• The main explanatory variables:
  • Income
  • Price of a car
  • Interest rates (credit usage)
  • General price level
  • Population
  • Car park: number of cars sold up to the present (replacement purchases)
  • Expectations about the future
• For the simple bivariate regression, income is chosen as the explanatory variable.

Bi-variate Regression Model
• Population regression model: $RCS = \beta_0 + \beta_1 DPI + \varepsilon$
• Our expectation is $\beta_1 > 0$.
• However, we do not have all the data at hand; the data set covers only the 1990s.
• We have to estimate the model over the sample period.
• Sample regression model: $RCS_t = \hat{\beta}_0 + \hat{\beta}_1 DPI_t + e_t$

Retail Car Sales and Disposable Personal Income Figures
[Figure: quarterly car sales (000 cars) and disposable income ($).]

OLS Estimate
Dependent Variable: RCS
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           541010.9      746347.9     0.724878      0.4735
DPI         62.39428      40.00793     1.559548      0.1281

R-squared            0.066759     Mean dependent var      1704222.
Adjusted R-squared   0.039311     S.D. dependent var      164399.9
S.E. of regression   161136.1     Akaike info criterion   26.87184
Sum squared resid    8.83E+11     Schwarz criterion       26.95981
Log likelihood       -481.6931    F-statistic             2.432189
Durbin-Watson stat   1.596908     Prob(F-statistic)       0.128128

Basic Statistical Evaluation
• $\beta_1$ is the slope coefficient; it tells us the rate of change in Y per unit change in X.
• When DPI increases by one dollar, the number of cars sold increases by 62.
• Hypothesis test on $\beta_1$: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$.
• The t test is used to test the validity of $H_0$: $t = \hat{\beta}_1 / se(\hat{\beta}_1)$.
• If |t statistic| > t table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If |t statistic| < t table, or Pr > $\alpha$, do not reject $H_0$.
• Here t = 1.56 < t table, or Pr = 0.128 > 0.05: do not reject $H_0$. DPI has no statistically significant effect on RCS.
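The t test above can be sketched in a few lines of Python, using the DPI coefficient and standard error from the RCS regression output. The critical value 2.032 (two-sided, 5%, 36 - 2 = 34 degrees of freedom) is taken from a standard t table and is an assumption of this sketch, as are the function names.

```python
def t_statistic(coefficient, std_error):
    """t ratio for H0: beta = 0."""
    return coefficient / std_error


def reject_h0(t_stat, t_critical):
    """Two-sided decision rule: reject when |t| exceeds the table value."""
    return abs(t_stat) > t_critical


# DPI row of the RCS regression output
t_dpi = t_statistic(62.39428, 40.00793)

# Assumption: two-sided 5% critical value for 34 degrees of freedom
T_CRITICAL_5PCT_34DF = 2.032

decision = reject_h0(t_dpi, T_CRITICAL_5PCT_34DF)
print(round(t_dpi, 4), "reject H0" if decision else "do not reject H0")
```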

Basic Statistical Evaluation
• $R^2$ is the coefficient of determination; it tells us the fraction of the variation in Y explained by X.
• $0 \le R^2 \le 1$.
• $R^2 = 0$ indicates that X (the equation) has no explanatory power.
• $R^2 = 1$ indicates that X (the equation) explains Y perfectly.
• $R^2 = 0.066$ indicates very weak explanatory power.
• Hypothesis test on $R^2$: $H_0: R^2 = 0$ against $H_1: R^2 \neq 0$.
• The F test checks this hypothesis. If F statistic > F table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If F statistic < F table, or Pr > $\alpha$, do not reject $H_0$.
• Here F statistic = 2.43 < F table, or Pr = 0.128 > 0.05: do not reject $H_0$. The estimated equation has no power to explain the RCS figures.
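The link between $R^2$ and the F statistic can be checked numerically. The sketch below uses the usual relation $F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$ with k = 1 regressor and n = 36 observations from the RCS output; the function name is illustrative.

```python
def f_from_r_squared(r_squared, n_obs, n_regressors):
    """Overall F statistic implied by R-squared (intercept not counted in n_regressors)."""
    explained = r_squared / n_regressors
    unexplained = (1.0 - r_squared) / (n_obs - n_regressors - 1)
    return explained / unexplained


# RCS on DPI: R-squared 0.066759, 36 observations, one regressor
f_rcs = f_from_r_squared(0.066759, 36, 1)
print(round(f_rcs, 3))

# In a bivariate regression the F statistic equals the squared slope t statistic
t_dpi_squared = 1.559548 ** 2
print(round(t_dpi_squared, 3))
```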

Graphical Evaluation of Fit and Error Terms
Residuals show a clear seasonal pattern.

Model Improvement
• When we look at the graph of the series, RCS exhibits clear seasonal fluctuations, but DPI does not.
• Remove the seasonality using a seasonal adjustment method.
• Then use the seasonally adjusted RCS as the dependent variable.

Seasonal Adjustment
Sample: 1990:1 1998:4
Included observations: 36
Method: Ratio to Moving Average
Original Series: RCS
Adjusted Series: RCSSA

Scaling Factors:
Quarter   Factor
1         0.941503
2         1.119916
3         1.016419
4         0.933083
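Once the multiplicative scaling factors are known, adjusting a series is a division and restoring the seasonal pattern is a multiplication. The sketch below applies the RCS factors listed above; the four sales values are hypothetical and only illustrate the mechanics, and the function names are assumptions.

```python
# Multiplicative scaling factors for RCS, by quarter
RCS_FACTORS = {1: 0.941503, 2: 1.119916, 3: 1.016419, 4: 0.933083}


def quarter_of(index, first_quarter):
    """Quarter (1-4) of the observation at position `index`."""
    return (first_quarter - 1 + index) % 4 + 1


def deseasonalize(values, first_quarter, factors):
    """Divide each observation by its quarter's factor."""
    return [v / factors[quarter_of(i, first_quarter)] for i, v in enumerate(values)]


def reseasonalize(values, first_quarter, factors):
    """Multiply each (adjusted or forecast) value by its quarter's factor."""
    return [v * factors[quarter_of(i, first_quarter)] for i, v in enumerate(values)]


# Hypothetical quarterly sales starting in Q1 (not taken from the data set)
sales = [1600.0, 1900.0, 1730.0, 1590.0]
adjusted = deseasonalize(sales, 1, RCS_FACTORS)
restored = reseasonalize(adjusted, 1, RCS_FACTORS)
print([round(v, 1) for v in adjusted])
```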

Seasonally Adjusted RCS and RCS

OLS Estimate
Dependent Variable: RCSSA
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           481394.3      464812.8     1.035674      0.3077
DPI         65.36559      24.91626     2.623411      0.0129

R-squared            0.168344     Mean dependent var      1700000.
Adjusted R-squared   0.143883     S.D. dependent var      108458.4
S.E. of regression   100352.8     Akaike info criterion   25.92472
Sum squared resid    3.42E+11     Schwarz criterion       26.01270
Log likelihood       -464.6450    F-statistic             6.882286
Durbin-Watson stat   0.693102     Prob(F-statistic)       0.012939

Basic Statistical Evaluation
• $\beta_1$ is the slope coefficient; it tells us the rate of change in Y per unit change in X.
• When DPI increases by one dollar, the number of cars sold increases by 65.
• Hypothesis test on $\beta_1$: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$.
• The t test is used to test the validity of $H_0$: $t = \hat{\beta}_1 / se(\hat{\beta}_1)$.
• If |t statistic| > t table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If |t statistic| < t table, or Pr > $\alpha$, do not reject $H_0$.
• Here t = 2.62 > t table, or Pr = 0.0129 < 0.05: reject $H_0$. DPI has a statistically significant effect on seasonally adjusted RCS.

Basic Statistical Evaluation
• $R^2$ is the coefficient of determination; it tells us the fraction of the variation in Y explained by X.
• $0 \le R^2 \le 1$.
• $R^2 = 0$ indicates that X (the equation) has no explanatory power.
• $R^2 = 1$ indicates that X (the equation) explains Y perfectly.
• $R^2 = 0.1683$ indicates weak explanatory power.
• Hypothesis test on $R^2$: $H_0: R^2 = 0$ against $H_1: R^2 \neq 0$.
• The F test checks this hypothesis. If F statistic > F table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If F statistic < F table, or Pr > $\alpha$, do not reject $H_0$.
• Here F statistic = 6.88 > F table, or Pr = 0.0129 < 0.05: reject $H_0$. The estimated equation has some power to explain the RCS figures.

Graphical Evaluation of Fit and Error Terms
No seasonality remains, but the residuals still do not look like a random disturbance. Omitted variable? Business cycle?

Trend Models

Simple Regression Model Special Case: Trend Model
• $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$, where the independent variable is time, $t = 1, 2, 3, \ldots, T-1, T$.
• There is no need to forecast the independent variable.
• Using simple transformations, a variety of nonlinear trend equations can be estimated, so the estimated model can mimic the pattern of the data.

Suitable Data Pattern
[Figure: grid of data patterns. Columns: no seasonality, additive seasonality, multiplicative seasonality. Rows: no trend, additive trend, multiplicative trend.]

Chapter 3 Exercise 13: College Tuition Consumers' Price Index by Quarter
[Figure: quarterly series with the holdout period marked.]

OLS Estimates
Dependent Variable: FEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           115.7312      1.982166     58.38624      0.0000
@TREND      3.837580      0.097399     39.40080      0.0000

R-squared            0.978568     Mean dependent var      182.8889
Adjusted R-squared   0.977938     S.D. dependent var      40.87177
S.E. of regression   6.070829     Akaike info criterion   6.498820
Sum squared resid    1253.069     Schwarz criterion       6.586793
Log likelihood       -114.9788    F-statistic             1552.423
Durbin-Watson stat   0.284362     Prob(F-statistic)       0.000000

Basic Statistical Evaluation
• $\beta_1$ is the slope coefficient; it tells us the rate of change in Y per unit change in X.
• Each quarter the tuition index increases by 3.83 points.
• Hypothesis test on $\beta_1$: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$.
• The t test is used to test the validity of $H_0$: $t = \hat{\beta}_1 / se(\hat{\beta}_1)$.
• If |t statistic| > t table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If |t statistic| < t table, or Pr > $\alpha$, do not reject $H_0$.
• Here t = 39.4 > t table, or Pr = 0.0000 < 0.05: reject $H_0$.

Basic Statistical Evaluation
• $R^2$ is the coefficient of determination; it tells us the fraction of the variation in Y explained by X.
• $0 \le R^2 \le 1$.
• $R^2 = 0$ indicates that X (the equation) has no explanatory power.
• $R^2 = 1$ indicates that X (the equation) explains Y perfectly.
• $R^2 = 0.9785$ indicates very strong explanatory power.
• Hypothesis test on $R^2$: $H_0: R^2 = 0$ against $H_1: R^2 \neq 0$.
• The F test checks this hypothesis. If F statistic > F table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If F statistic < F table, or Pr > $\alpha$, do not reject $H_0$.
• Here F statistic = 1552 > F table, or Pr = 0.0000 < 0.05: reject $H_0$. The estimated equation has explanatory power.

Graphical Evaluation of Fit: Holdout Period

Period     ACTUAL    FORECAST
1995 Q1    260.00    253.88
1995 Q2    259.00    257.72
1995 Q3    266.00    261.55
1995 Q4    274.00    265.39
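The holdout forecasts follow directly from the estimated linear trend. The sketch below assumes that @TREND equals 0 in 1986:1, so that 1995 Q1 to Q4 correspond to t = 36 to 39; that indexing is inferred from the forecasts listed above, and the function name is illustrative.

```python
# Coefficients of the linear trend fitted to FEE over 1986:1-1994:4
INTERCEPT = 115.7312
SLOPE = 3.837580


def linear_trend_forecast(t):
    """Forecast of FEE at trend index t (assumed: t = 0 in 1986:1)."""
    return INTERCEPT + SLOPE * t


holdout_quarters = ["1995 Q1", "1995 Q2", "1995 Q3", "1995 Q4"]
holdout_forecasts = [linear_trend_forecast(t) for t in range(36, 40)]

for label, value in zip(holdout_quarters, holdout_forecasts):
    print(label, round(value, 2))
```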

Graphical Evaluation of Fit and Error Terms
• Residuals exhibit a clear pattern; they are not random.
• The seasonal fluctuations cannot be modelled either.
• The regression model is misspecified.
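Non-random residuals of this kind are what the Durbin-Watson statistic in the regression output measures: values near 2 suggest no first-order serial correlation, while values near 0 suggest strong positive correlation. A minimal sketch of the calculation follows; both residual series are hypothetical and chosen only to show the two extremes.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positive serial correlation)
smooth_residuals = [3, 2, 1, -1, -2, -3, -2, -1, 1, 2, 3]
# Hypothetical residuals that flip sign every period (negative serial correlation)
alternating_residuals = [1, -1, 1, -1, 1, -1]

dw_smooth = durbin_watson(smooth_residuals)
dw_alternating = durbin_watson(alternating_residuals)
print(round(dw_smooth, 3), round(dw_alternating, 3))
```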

Model Improvement
• The data may exhibit an exponential trend.
• In this case, take the logarithm of the dependent variable.
• Estimate the trend by OLS.
• After OLS estimation, forecast the holdout period.
• Take the exponential of the logarithmic forecasts to return to the original units.

Original and Logarithmic Transformed Data

LOG(FEE)    FEE
4.844187    127.000
4.867534    130.000
4.912655    136.000
4.919981    137.000
4.941642    140.000
4.976734    145.000
4.983607    146.000

OLS Estimate of the Logarithmic Trend Model
Dependent Variable: LFEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           4.816708      0.005806     829.5635      0.0000
@TREND      0.021034      0.000285     73.72277      0.0000

R-squared            0.993783     Mean dependent var      5.184797
Adjusted R-squared   0.993600     S.D. dependent var      0.222295
S.E. of regression   0.017783     Akaike info criterion   -5.167178
Sum squared resid    0.010752     Schwarz criterion       -5.079205
Log likelihood       95.00921     F-statistic             5435.047
Durbin-Watson stat   0.893477     Prob(F-statistic)       0.000000

Forecast Calculations

obs       FEE         LFEEF       FEELF = exp(LFEEF)
1993:1    228.0000    5.405651    222.6610
1993:2    228.0000    5.426684    227.3940
1993:3    235.0000    5.447718    232.2276
1993:4    243.0000    5.468751    237.1639
1994:1    244.0000    5.489785    242.2052
1994:2    245.0000    5.510819    247.3536
1994:3    251.0000    5.531852    252.6114
1994:4    259.0000    5.552886    257.9810
1995:1    260.0000    5.573920    263.4648
1995:2    259.0000    5.594953    269.0651
1995:3    266.0000    5.615987    274.7845
1995:4    274.0000    5.637021    280.6254
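The back-transformation and a holdout comparison can be sketched as follows. The log-trend coefficients and the actual 1995 values come from the tables above, and the linear-trend forecasts are the holdout forecasts listed earlier for the linear model. The trend index (t = 36 for 1995:1) and the use of RMSE as the accuracy measure are assumptions of this sketch.

```python
import math

# Log-linear trend fitted to LFEE = log(FEE) over 1986:1-1994:4
LOG_INTERCEPT = 4.816708
LOG_SLOPE = 0.021034


def log_trend_forecast(t):
    """Forecast in original units: exponentiate the fitted log trend."""
    return math.exp(LOG_INTERCEPT + LOG_SLOPE * t)


def rmse(actual, forecast):
    """Root mean squared error over paired observations."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


actual_1995 = [260.0, 259.0, 266.0, 274.0]
linear_forecasts = [253.88, 257.72, 261.55, 265.39]   # linear trend model
log_forecasts = [log_trend_forecast(t) for t in range(36, 40)]

rmse_linear = rmse(actual_1995, linear_forecasts)
rmse_log = rmse(actual_1995, log_forecasts)
print(round(rmse_linear, 2), round(rmse_log, 2))
```

On these four holdout quarters the linear trend has the smaller RMSE, although a holdout this short is a thin basis for choosing between the two models.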

Graphical Evaluation of Fit and Error Terms
• Residuals exhibit a clear pattern; they are not random.
• The seasonal fluctuations cannot be modelled either.
• The regression model is misspecified.

Model Improvement
• To deal with seasonal variation, remove the seasonal pattern from the data.
• Fit the regression model to the seasonally adjusted data.
• Generate forecasts.
• Add the seasonal movements back to the forecasted values.

Multiplicative Seasonal Adjustment
Included observations: 40
Method: Ratio to Moving Average
Original Series: FEE
Adjusted Series: FEESA

Scaling Factors:
Quarter   Factor
1         1.002372
2         0.985197
3         0.996746
4         1.015929

Original and Seasonally Adjusted Data

OLS Estimate of the Seasonally Adjusted Trend Model
Dependent Variable: FEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           115.0387      1.727632     66.58749      0.0000
@TREND      3.897488      0.076240     51.12152      0.0000

R-squared            0.985668     Mean dependent var      191.0397
Adjusted R-squared   0.985291     S.D. dependent var      45.89346
S.E. of regression   5.566018     Akaike info criterion   6.319943
Sum squared resid    1177.261     Schwarz criterion       6.404387
Log likelihood       -124.3989    F-statistic             2613.410
Durbin-Watson stat   0.055041     Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms
• Residuals exhibit a clear pattern; they are not random.
• There are no seasonal fluctuations.
• The regression model is misspecified.

Model Improvement
• Take the logarithm to remove the existing nonlinearity.
• Apply additive seasonal adjustment to the logarithmic data.
• Apply OLS to the seasonally adjusted logarithmic data.
• Forecast the holdout period.
• Add the seasonal movements back to obtain seasonal forecasts.
• Take the exponential to obtain the seasonal forecasts in the original units.

Logarithmic Transformation and Additive Seasonal Adjustment
Sample: 1986:1 1995:4
Included observations: 40
Method: Difference from Moving Average
Original Series: LFEE = log(FEE)
Adjusted Series: LFEESA

Scaling Factors:
Quarter   Factor
1          0.002216
2         -0.014944
3         -0.003099
4          0.015828

Original and Logarithmic Additive Seasonally Adjusted Series

OLS Estimate of the Logarithmic Additive Seasonally Adjusted Data
Dependent Variable: LFEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           4.822122      0.004761     1012.779      0.0000
@TREND      0.020618      0.000210     98.12760      0.0000

R-squared            0.996069     Mean dependent var      5.224171
Adjusted R-squared   0.995966     S.D. dependent var      0.241508
S.E. of regression   0.015340     Akaike info criterion   -5.468039
Sum squared resid    0.008942     Schwarz criterion       -5.383595
Log likelihood       111.3608     F-statistic             9629.026
Durbin-Watson stat   0.149558     Prob(F-statistic)       0.000000
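Putting the steps together (log trend, additive seasonal factor, exponential), a forecast in original units can be sketched as below. The coefficients and the additive seasonal factors are those reported for LFEESA and LFEE; the trend index (t = 0 in 1986:1, so 1995:4 is t = 39 and 1996:1 is t = 40) is an assumption, and the function name is illustrative.

```python
import math

# Trend fitted to the seasonally adjusted log series LFEESA
INTERCEPT = 4.822122
SLOPE = 0.020618

# Additive seasonal factors of LFEE = log(FEE), by quarter
LOG_SEASONAL = {1: 0.002216, 2: -0.014944, 3: -0.003099, 4: 0.015828}


def seasonal_log_trend_forecast(t, quarter):
    """Trend in logs, plus the additive seasonal factor, back-transformed to FEE units."""
    log_forecast = INTERCEPT + SLOPE * t + LOG_SEASONAL[quarter]
    return math.exp(log_forecast)


fitted_1995q4 = seasonal_log_trend_forecast(39, 4)    # last in-sample quarter
forecast_1996q1 = seasonal_log_trend_forecast(40, 1)  # first out-of-sample quarter
print(round(fitted_1995q4, 2), round(forecast_1996q1, 2))
```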

Graphical Evaluation of Fit and Error Terms
• Residuals exhibit a clear pattern; they are not random.
• There are no seasonal fluctuations.
• The regression model is misspecified.

Autoregressive Model
• In some cases a growth model may be more suitable for the data.
• If the data exhibit nonlinearity, the autoregressive model can be adjusted to model an exponential pattern.

OLS Estimate of Autoregressive Model
Dependent Variable: FEE
Method: Least Squares
Sample (adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           0.739490      2.305654     0.320729      0.7502
FEE(-1)     1.016035      0.011884     85.49718      0.0000

R-squared            0.994964     Mean dependent var      192.7179
Adjusted R-squared   0.994828     S.D. dependent var      45.45787
S.E. of regression   3.269285     Akaike info criterion   5.256940
Sum squared resid    395.4643     Schwarz criterion       5.342251
Log likelihood       -100.5103    F-statistic             7309.767
Durbin-Watson stat   1.888939     Prob(F-statistic)       0.000000
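Forecasting with the autoregressive model is recursive: each forecast becomes the lagged value for the next step. The sketch below starts from the last observed value, FEE = 274 in 1995:4, and uses the estimated coefficients; the four-step horizon and the function name are illustrative choices.

```python
# Estimated autoregressive model: FEE_t = C + PHI * FEE_(t-1)
C = 0.739490
PHI = 1.016035


def ar1_forecasts(last_value, steps):
    """Iterate the AR(1) equation forward, feeding each forecast back in."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = C + PHI * previous
        forecasts.append(previous)
    return forecasts


# Last observation in the sample: FEE = 274.0 in 1995:4
fee_forecasts = ar1_forecasts(274.0, 4)
print([round(value, 2) for value in fee_forecasts])
```

Because the lag coefficient is slightly above 1, the forecasts grow at an increasing rate, which is how this specification picks up an exponential-looking pattern.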

Graphical Evaluation of Fit and Error Terms
• Clear seasonal pattern in the residuals.
• The model is misspecified.

Model Improvement
To remove the seasonal fluctuations:
• Seasonally adjust the data.
• Apply OLS to the autoregressive trend model.
• Forecast the seasonally adjusted data.
• Add the seasonal movement back to the forecasted values.

OLS Estimate of Seasonally Adjusted Autoregressive Model
Dependent Variable: FEESA
Method: Least Squares
Sample (adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           1.125315      0.811481     1.386743      0.1738
FEESA(-1)   1.013445      0.004181     242.4027      0.0000

R-squared            0.999371     Mean dependent var      192.6894
Adjusted R-squared   0.999354     S.D. dependent var      45.27587
S.E. of regression   1.151024     Akaike info criterion   3.169101
Sum squared resid    49.01968     Schwarz criterion       3.254412
Log likelihood       -59.79748    F-statistic             58759.08
Durbin-Watson stat   1.335932     Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms
• No seasonal pattern in the residuals.
• The model specification seems more correct than the previous estimates.

Seasonal Autoregressive Model
• If the data exhibit seasonal fluctuations, the growth model should be respecified.
• If the data exhibit nonlinearity and seasonality together, the seasonal autoregressive model can be adjusted to model an exponential pattern.

New Product Forecasting: Growth Curve Fitting
• For new products, the main problem is typically a lack of historical data.
• Trend or seasonal patterns cannot be determined.
• Forecasters can use a number of models that generally fall in the category called diffusion models.
• These models are alternatively called S-curves, growth models, saturation models, or substitution curves.
• These models imitate the life cycle of products. Life cycles follow a common pattern:
  • A period of slow growth just after the introduction of the new product
  • A period of rapid growth
  • Slowing growth in a mature phase
  • Decline

New Product Forecasting: Growth Curve Fitting
• Growth models have their own lower and upper limits.
• A significant benefit of using diffusion models is that they identify and predict the timing of the four phases of the life cycle.
• The transition from very slow initial growth to rapid growth is often the result of solutions to technical difficulties and the market's acceptance of the new product or technology.
• There are upper limits, and a maturity phase occurs in which growth slows and finally ceases.

GOMPERTZ CURVE
• The Gompertz function is given as $Y_t = L e^{-a e^{-bt}}$, where
  • L = upper limit of Y
  • e = base of the natural logarithm = 2.71828...
  • a and b = coefficients describing the curve
• The Gompertz curve ranges in value from zero to L as t varies from $-\infty$ to $+\infty$.
• The Gompertz curve is a way to summarize growth with a few parameters.

GOMPERTZ CURVE: An Example
HDTV: LCD and Plasma TV sales figures

YEAR    HDTV
2000    1200
2001    1500
2002    1770
2003    3350
2004    5500
2005    9700
2006    15000

GOMPERTZ CURVE: An Example

LOGISTIC CURVE
• The logistic function is given as $Y_t = \dfrac{L}{1 + a e^{-bt}}$, where
  • L = upper limit of Y
  • e = base of the natural logarithm = 2.71828...
  • a and b = coefficients describing the curve
• The logistic curve ranges in value from zero to L as t varies from $-\infty$ to $+\infty$.
• The logistic curve is symmetric about its point of inflection. The Gompertz curve is not necessarily symmetric.
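The difference in symmetry between the two curves can be seen at their inflection points. The sketch below assumes the common parameterizations $Y_t = L e^{-a e^{-bt}}$ (Gompertz) and $Y_t = L / (1 + a e^{-bt})$ (logistic); under them both curves inflect at $t^* = \ln(a)/b$, where the logistic is at $L/2$ and the Gompertz at $L/e$. The parameter values are hypothetical and not fitted to any data.

```python
import math


def gompertz(t, upper_limit, a, b):
    """Gompertz growth curve (assumed form): L * exp(-a * exp(-b * t))."""
    return upper_limit * math.exp(-a * math.exp(-b * t))


def logistic(t, upper_limit, a, b):
    """Logistic growth curve (assumed form): L / (1 + a * exp(-b * t))."""
    return upper_limit / (1.0 + a * math.exp(-b * t))


# Hypothetical parameters, chosen only for illustration
L, A, B = 100.0, 5.0, 0.5

t_inflection = math.log(A) / B
gompertz_at_inflection = gompertz(t_inflection, L, A, B)   # L / e, below the midpoint
logistic_at_inflection = logistic(t_inflection, L, A, B)   # L / 2, the midpoint

print(round(gompertz_at_inflection, 2), round(logistic_at_inflection, 2))
```

The logistic curve reaches its inflection exactly halfway to the upper limit, while the Gompertz curve inflects earlier and then spends longer approaching L.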

LOGISTIC or GOMPERTZ CURVE?
• The answer lies in whether, in a particular situation, it becomes easier to achieve the maximum value the closer you get to it, or whether it becomes more difficult.
  • Are there factors assisting the attainment of the maximum value once you get close to it, or
  • are there factors preventing the attainment of the maximum value once it is nearly attained?
• If there is an offsetting factor such that growth is more difficult to maintain as the maximum is approached, then the Gompertz curve will be the best choice.
• If there are no such offsetting factors hindering the attainment of the maximum value, the logistic curve will be the best choice.

LOGISTIC CURVE: An Example
HDTV: LCD and Plasma TV sales figures

YEAR    HDTV
2000    1200
2001    1500
2002    1770
2003    3350
2004    5500
2005    9700
2006    15000

LOGISTIC versus GOMPERTZ CURVES

FORECASTING WITH MULTIPLE REGRESSION MODELS BUSINESS FORECASTING

CONTENT
• DEFINITION
• INDEPENDENT VARIABLE SELECTION
• FORECASTING WITH MULTIPLE REGRESSION MODEL
• STATISTICAL EVALUATION OF THE MODEL
• SERIAL CORRELATION
• SEASONALITY TREATMENT
• GENERAL AUTOREGRESSIVE MODEL
• ADVICE
• EXAMPLES...

MULTIPLE REGRESSION MODEL
• THE DEPENDENT VARIABLE, Y, IS A FUNCTION OF MORE THAN ONE INDEPENDENT VARIABLE, $X_1, X_2, \ldots, X_k$:
  $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$

SELECTING INDEPENDENT VARIABLES
• FIRST, DETERMINE THE DEPENDENT VARIABLE.
• SEARCH THE LITERATURE, USE COMMON SENSE, AND LIST THE MAIN POTENTIAL EXPLANATORY VARIABLES.
• IF TWO VARIABLES SHARE THE SAME INFORMATION, SUCH AS GDP AND GNP, SELECT THE MORE RELEVANT ONE.
• IF A VARIABLE VARIES VERY LITTLE, FIND ONE THAT VARIES MORE.
• SET THE EXPECTED SIGNS OF THE PARAMETERS TO BE ESTIMATED.

AN EXAMPLE: SELECTING INDEPENDENT VARIABLES
• LIQUID PETROLEUM GAS (LPG) MARKET SIZE FORECAST
• POTENTIAL EXPLANATORY VARIABLES:
  • POPULATION
  • PRICE
  • URBANIZATION RATIO
  • GNP or GDP
  • EXPECTATIONS

PARAMETER ESTIMATES-OLS ESTIMATION IT IS VERY COMPLEX TO CALCULATE b’s, MATRIX ALGEBRA IS USED TO ESTIMATE b’s.
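For reference, the standard matrix form of the OLS estimator is sketched below. The symbols are assumptions made here: $y$ is the $n \times 1$ vector of observations on the dependent variable, $X$ is the $n \times (k+1)$ matrix whose first column is ones and whose remaining columns hold the independent variables, and $b$ is the vector of estimated coefficients.

```latex
y = X b + e,
\qquad
b = (X^{\top} X)^{-1} X^{\top} y
```

The inverse exists only when no explanatory variable is an exact linear function of the others, which is classical assumption 6.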

FORECASTING WITH MULTIPLE REGRESSION MODEL
• $\ln(SALES_t) = 23 + 1.24 \ln(GDP_t) - 0.90 \ln(PRICE_t)$
• IF GDP INCREASES 1%, SALES INCREASE 1.24%.
• IF PRICE INCREASES 1%, SALES DECREASE 0.9%.

PERIOD   GDP     PRICE   SALES
100      1245    100     230
101      1300    103     ?

• $\ln(SALES_t) = 23 + 1.24 \ln(1300) - 0.90 \ln(103)$
• $\ln(SALES_t) = 3.63$
• $e^{3.63} = 235$
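In a log-log model the coefficients are elasticities, so a forecast can also be built from the previous period's sales and the percentage changes in GDP and price, in which case the intercept cancels out. The sketch below takes that route; it is an assumption about how to apply the model, not necessarily the slide's own calculation. Evaluating the printed equation directly with the intercept 23 gives $\ln(SALES)$ of about 27.72 rather than 3.63, so the printed constants appear to have been garbled.

```python
import math

# Elasticities from the log-log sales equation
GDP_ELASTICITY = 1.24
PRICE_ELASTICITY = -0.90


def forecast_sales(prev_sales, prev_gdp, next_gdp, prev_price, next_price):
    """Difference form of the log-log model: the intercept drops out."""
    log_change = (
        GDP_ELASTICITY * math.log(next_gdp / prev_gdp)
        + PRICE_ELASTICITY * math.log(next_price / prev_price)
    )
    return prev_sales * math.exp(log_change)


# Period 100 -> period 101, using the values in the table above
sales_101 = forecast_sales(230.0, 1245.0, 1300.0, 100.0, 103.0)
print(round(sales_101, 1))
```

The result is close to the 235 quoted above: GDP growth of about 4.4% raises sales by roughly 5.5%, and the 3% price rise takes back about 2.7%.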

EXAMPLE : LPG FORECAST

LOGARITHMIC TRANSFORMATION

SCATTER DIAGRAM UNEXPECTED RELATION

LSATA = f(LGNP)
Dependent Variable: LSATA
Method: Least Squares
Sample: 1968 1997
Included observations: 30

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -44.91150     3.097045     -14.50140     0.0000
LGNP        4.081938      0.220265     18.53195      0.0000

R-squared            0.924616     Mean dependent var      12.47858
Adjusted R-squared   0.921924     S.D. dependent var      0.736099
S.E. of regression   0.205681     Akaike info criterion   -0.260637
Sum squared resid    1.184535     Schwarz criterion       -0.167224
Log likelihood       5.909555     F-statistic             343.4333
Durbin-Watson stat   0.485414     Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms
Residuals are NOT RANDOM.

LSATA = f(LP)
Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           11.70726      0.081886     142.9694      0.0000
LP          0.190128      0.015096     12.59492      0.0000

R-squared            0.854551     Mean dependent var      12.53724
Adjusted R-squared   0.849164     S.D. dependent var      0.674006
S.E. of regression   0.261768     Akaike info criterion   0.223756
Sum squared resid    1.850107     Schwarz criterion       0.318052
Log likelihood       -1.244459    F-statistic             158.6319
Durbin-Watson stat   0.187322     Prob(F-statistic)       0.000000

LSATA = f(LGNP, LP)
Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -30.808410    7.715902     -3.992846     0.0005
LGNP        3.066655      0.556533     5.510284      0.0000
LP          0.045318      0.028281     1.602436      0.1211

R-squared            0.932905     Mean dependent var      12.53724
Adjusted R-squared   0.927744     S.D. dependent var      0.674006
S.E. of regression   0.181176     Akaike info criterion   -0.480999
Sum squared resid    0.853443     Schwarz criterion       -0.339555
Log likelihood       9.974488     F-statistic             180.7558
Durbin-Watson stat   0.364799     Prob(F-statistic)       0.000000

WHAT IS MISSING?
• GNP AND PRICE ARE THE MOST IMPORTANT VARIABLES, BUT THE COEFFICIENT OF PRICE IS NOT SIGNIFICANT AND HAS AN UNEXPECTED SIGN.
• THE RESIDUAL DISTRIBUTION IS NOT RANDOM.
• WHAT IS MISSING?
  • WRONG FUNCTION (NONLINEAR MODEL)?
  • LACK OF DYNAMIC MODELLING?
  • MISSING IMPORTANT VARIABLE? POPULATION?

LSATA = f(LGNP, LP, LPOP)
Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -50.913420    3.992134     -12.75343     0.0000
LGNP        0.755445      0.337894     2.235746      0.0345
LP          -0.131508     0.021528     -6.108568     0.0000
LPOP        4.955945      0.486887     10.17885      0.0000

R-squared            0.986958     Mean dependent var      12.53724
Adjusted R-squared   0.985393     S.D. dependent var      0.674006
S.E. of regression   0.081461     Akaike info criterion   -2.049934
Sum squared resid    0.165899     Schwarz criterion       -1.861342
Log likelihood       33.72405     F-statistic             630.6084
Durbin-Watson stat   0.398661     Prob(F-statistic)       0.000000

WHAT IS MISSING?
• GNP, POPULATION AND PRICE ARE THE MOST IMPORTANT VARIABLES. THEY ARE SIGNIFICANT AND HAVE THE EXPECTED SIGNS.
• THE RESIDUAL DISTRIBUTION IS NOT RANDOM.
• WHAT IS MISSING?
  • WRONG FUNCTION (NONLINEAR MODEL)?
  • LACK OF DYNAMIC MODELLING? YES.
  • MISSING IMPORTANT VARIABLE? YES, URBANIZATION.

LSATA = f(LGNP, LP, LPOP, LSATA(t-1))
Dependent Variable: LSATA
Method: Least Squares
Sample (adjusted): 1969 1997
Included observations: 29 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -16.185910    3.832897     -4.222893     0.0003
LGNP        0.523657      0.150971     3.468585      0.0020
LP          -0.033964     0.013483     -2.518934     0.0188
LPOP        1.279753      0.419566     3.050182      0.0055
LSATA(-1)   0.619986      0.060756     10.20446      0.0000

R-squared            0.997557     Mean dependent var      12.53724
Adjusted R-squared   0.997150     S.D. dependent var      0.674006
S.E. of regression   0.035983     Akaike info criterion   -3.655968
Sum squared resid    0.031074     Schwarz criterion       -3.420227
Log likelihood       58.01154     F-statistic             2450.048
Durbin-Watson stat   2.118752     Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms
Residuals are RANDOM.

Basic Statistical Evaluation
• $\beta_1$ is the slope coefficient; it tells us the rate of change in Y per unit change in X.
• When GNP increases by 1%, the volume of LPG sales increases by 0.52%.
• Hypothesis test on $\beta_1$: $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$.
• The t test is used to test the validity of $H_0$: $t = \hat{\beta}_1 / se(\hat{\beta}_1)$.
• If |t statistic| > t table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If |t statistic| < t table, or Pr > $\alpha$, do not reject $H_0$.
• Here t = 3.47 > t table, or Pr = 0.002 < 0.05: reject $H_0$.
• GNP has a statistically significant effect on LPG sales.

Basic Statistical Evaluation
• $R^2$ is the coefficient of determination; it tells us the fraction of the variation in Y explained by the independent variables.
• $0 \le R^2 \le 1$. $R^2 = 0$ indicates that the equation has no explanatory power. $R^2 = 1$ indicates that the equation explains Y perfectly. $R^2 = 0.9975$ indicates very strong explanatory power.
• Hypothesis test on $R^2$: $H_0: R^2 = 0$ against $H_1: R^2 \neq 0$. The F test checks this hypothesis.
• If F statistic > F table, or Pr < $\alpha$ (e.g. $\alpha = 0.05$), reject $H_0$.
• If F statistic < F table, or Pr > $\alpha$, do not reject $H_0$.
• Here F statistic = 2450 > F table, or Pr = 0.0000 < 0.05: reject $H_0$.
• The estimated equation has power to explain the LPG sales figures.

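SHORT AND LONG TERM IMPACTS
• If we specify a dynamic model, we can estimate the short-term and long-term impacts of the independent variables on the dependent variable simultaneously.
• Dynamic model: $y_t = \beta_0 + \beta_1 x_t + \gamma y_{t-1} + \varepsilon_t$
• Short-term effect of x: $\beta_1$
• Long-term effect of x: $\beta_1 / (1 - \gamma)$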

AN EXAMPLE: SHORT AND LONG TERM IMPACTS

Variable   Short Term Impact   Long Term Impact
LGNP       0.523657            1.3778
LP         -0.033964           -0.0892
LPOP       1.279753            3.3676

• IF GNP INCREASES 1% AT TIME t, LPG SALES INCREASE 0.52% AT TIME t.
• IN THE LONG RUN, WITHIN 3-5 YEARS, LPG SALES INCREASE 1.38%.
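The long-term impacts can be reproduced from the dynamic LPG regression by dividing each short-term coefficient by one minus the coefficient on the lagged dependent variable, LSATA(-1) = 0.619986. The sketch below assumes that this standard long-run multiplier is the calculation intended here; it matches the LGNP and LP figures to the precision shown.

```python
# Coefficient on the lagged dependent variable LSATA(-1)
LAG_COEFFICIENT = 0.619986

# Short-term (impact) elasticities from the dynamic LPG model
short_term = {"LGNP": 0.523657, "LP": -0.033964, "LPOP": 1.279753}


def long_term_effect(short_effect, lag_coefficient):
    """Long-run multiplier: short-run effect divided by (1 - lag coefficient)."""
    return short_effect / (1.0 - lag_coefficient)


long_term = {
    name: long_term_effect(value, LAG_COEFFICIENT)
    for name, value in short_term.items()
}

for name, value in long_term.items():
    print(name, round(value, 4))
```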

SEASONALITY AND MULTIPLE REGRESSION MODEL
• SEASONAL DUMMY VARIABLES CAN BE USED TO MODEL SEASONAL PATTERNS.
• A DUMMY VARIABLE IS A BINARY VARIABLE THAT TAKES ONLY THE VALUES 0 AND 1.
• DUMMY VARIABLES ARE INDICATOR VARIABLES: IF THE DUMMY VARIABLE TAKES THE VALUE 1 IN A GIVEN PERIOD, IT MEANS THAT SOMETHING HAPPENS IN THAT PERIOD.

SEASONAL DUMMY VARIABLES
• THE "SOMETHING" CAN BE A SPECIFIC SEASON.
• THE DUMMY VARIABLE INDICATES THE SPECIFIC SEASON.
• D1 IS A DUMMY VARIABLE WHICH INDICATES THE FIRST QUARTERS:

DATE      D1
1990Q1    1
1990Q2    0
1990Q3    0
1990Q4    0
1991Q1    1
1991Q2    0
1991Q3    0
1991Q4    0
1992Q1    1
1992Q2    0
1992Q3    0
1992Q4    0

FULL SEASONAL DUMMY VARIABLE REPRESENTATION

DATE       D1   D2   D3
1990 Q1    1    0    0
1990 Q2    0    1    0
1990 Q3    0    0    1
1990 Q4    0    0    0
1991 Q1    1    0    0
1991 Q2    0    1    0
1991 Q3    0    0    1
1991 Q4    0    0    0
1992 Q1    1    0    0
1992 Q2    0    1    0
1992 Q3    0    0    1
1992 Q4    0    0    0

BASE PERIOD: the fourth quarter, where all three dummies are 0.
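A dummy table like the one above can be generated mechanically. The sketch below builds three quarterly dummies with the fourth quarter as the base period; the function name and its arguments are illustrative, not part of any particular package.

```python
def seasonal_dummies(start_year, start_quarter, n_obs, base_quarter=4):
    """Return rows of (date label, dummies for every quarter except the base)."""
    rows = []
    year, quarter = start_year, start_quarter
    for _ in range(n_obs):
        # One 0/1 indicator per non-base quarter, in quarter order
        dummies = [
            1 if quarter == season else 0
            for season in range(1, 5)
            if season != base_quarter
        ]
        rows.append((f"{year} Q{quarter}", *dummies))
        quarter += 1
        if quarter > 4:
            quarter = 1
            year += 1
    return rows


dummy_rows = seasonal_dummies(1990, 1, 12)
for row in dummy_rows:
    print(*row)
```

Only three dummies are used for four quarters because a fourth dummy together with the intercept would be an exact linear combination, which violates classical assumption 6.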

COLLEGE TUITION CONSUMERS' PRICE INDEX BY QUARTER

• THE DATA ARE QUARTERLY; THEREFORE 3 DUMMY VARIABLES ARE SUFFICIENT TO CAPTURE THE SEASONAL PATTERN.

DATE       D1   D2   D3
1990 Q1    1    0    0
1990 Q2    0    1    0
1990 Q3    0    0    1
1990 Q4    0    0    0

COLLEGE TUITION PRICE INDEX TREND ESTIMATION (SEASONAL PATTERN MODELLED)
Dependent Variable: LOG(FEE)
Method: Least Squares
Sample (adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           4.832335      0.006948     695.4771      0.0000
@TREND      0.020780      0.000232     89.57105      0.0000
D1          -0.011259     0.007202     -1.563344     0.1275
D1(-1)      -0.029526     0.007198     -4.101948     0.0003
D1(-2)      -0.017082     0.007010     -2.436806     0.0204

R-squared            0.995921     Mean dependent var      5.244170
Adjusted R-squared   0.995427     S.D. dependent var      0.231661
S.E. of regression   0.015666     Akaike info criterion   -5.352558
Sum squared resid    0.008099     Schwarz criterion       -5.137087
Log likelihood       106.6986     F-statistic             2014.429
Durbin-Watson stat   0.161634     Prob(F-statistic)       0.000000

COLLEGE TUITION PRICE INDEX AUTOREGRESSIVE TREND ESTIMATION
Dependent Variable: LOG(FEE)
Method: Least Squares
Sample (adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C              0.050887      0.022969     2.215524      0.0337
LOG(FEE(-1))   0.997510      0.004375     227.9958      0.0000
D1             -0.031634     0.002833     -11.16704     0.0000
D1(-1)         -0.035335     0.002833     -12.47301     0.0000
D1(-2)         -0.006775     0.002761     -2.454199     0.0196

R-squared            0.999368     Mean dependent var      5.244170
Adjusted R-squared   0.999292     S.D. dependent var      0.231661
S.E. of regression   0.006165     Akaike info criterion   -7.217678
Sum squared resid    0.001254     Schwarz criterion       -7.002206
Log likelihood       142.1359     F-statistic             13051.60
Durbin-Watson stat   1.605178     Prob(F-statistic)       0.000000

COLLEGE TUITION PRICE INDEX GENERALIZED AUTOREGRESSIVE TREND ESTIMATION
Dependent Variable: LFEE
Method: Least Squares
Sample (adjusted): 1987:1 1995:4
Included observations: 36 after adjusting endpoints

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           0.048752      0.024114     2.021760      0.0529
LFEE(-1)    1.126366      0.182970     6.156010      0.0000
LFEE(-2)    0.292152      0.256488     1.139051      0.2643
LFEE(-3)    -0.344963     0.253185     -1.362491     0.1839
LFEE(-4)    -0.076855     0.181751     -0.422857     0.6756
D1          -0.043879     0.005597     -7.840118     0.0000
D1(-1)      -0.048562     0.010241     -4.742040     0.0001
D1(-2)      -0.005369     0.009855     -0.544814     0.5902

R-squared            0.999502     Mean dependent var      5.263841
Adjusted R-squared   0.999377     S.D. dependent var      0.221681
S.E. of regression   0.005532     Akaike info criterion   -7.363447
Sum squared resid    0.000857     Schwarz criterion       -7.011554
Log likelihood       140.5420     F-statistic             8025.362
Durbin-Watson stat   1.892211     Prob(F-statistic)       0.000000

DYNAMIC PART OF THE MODEL: LFEE(-1) to LFEE(-4).
SEASONAL PART OF THE MODEL: D1, D1(-1), D1(-2).

GAP SALES FORECAST

SIMPLE AUTOREGRESSIVE REGRESSION MODEL
Dependent Variable: LSALES
Method: Least Squares
Sample (adjusted): 1985:2 1999:4
Included observations: 59 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.613160      0.484163     1.266433      0.2105
LSALES(-1)   0.958714      0.036128     26.53623      0.0000

R-squared            0.925115     Mean dependent var      13.43549
Adjusted R-squared   0.923802     S.D. dependent var      0.848687
S.E. of regression   0.234272     Akaike info criterion   -0.031358
Sum squared resid    3.128350     Schwarz criterion       0.039067
Log likelihood       2.925062     F-statistic             704.1714
Durbin-Watson stat   2.159164     Prob(F-statistic)       0.000000

SEASONALITY IS NOT MODELLED.

AUTOREGRESSIVE REGRESSION MODEL WITH SEASONAL DUMMIES
Dependent Variable: LSALES
Method: Least Squares
Sample (adjusted): 1985:3 1999:4
Included observations: 58 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.299734      0.111564     2.686656      0.0096
LSALES(-1)   0.994473      0.008213     121.0873      0.0000
D1           -0.547251     0.018685     -29.28766     0.0000
D1(-1)       -0.175405     0.018732     -9.364126     0.0000
D1(-2)       0.033281      0.018458     1.803073      0.0771

R-squared            0.996547     Mean dependent var      13.46547
Adjusted R-squared   0.996287     S.D. dependent var      0.823972
S.E. of regression   0.050210     Akaike info criterion   -3.062940
Sum squared resid    0.133616     Schwarz criterion       -2.885316
Log likelihood       93.82526     F-statistic             3824.335
Durbin-Watson stat   1.828642     Prob(F-statistic)       0.000000

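ALTERNATIVE SEASONAL MODELLING
• FOR NONSEASONAL DATA, THE AUTOREGRESSIVE MODEL CAN BE WRITTEN AS
  $Y_t = \beta_0 + \beta_1 Y_{t-1} + \varepsilon_t$
• IF THE LENGTH OF THE SEASONALITY IS S, THE SEASONAL AUTOREGRESSIVE MODEL CAN BE WRITTEN AS
  $Y_t = \beta_0 + \beta_1 Y_{t-S} + \varepsilon_t$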

SEASONAL LAGGED AUTOREGRESSIVE REGRESSION MODEL
Dependent Variable: LSALES
Method: Least Squares
Sample (adjusted): 1986:1 1999:4
Included observations: 56 after adjusting endpoints

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            0.329980      0.169485     1.946953      0.0567
LSALES(-4)   0.990877      0.012720     77.89949      0.0000

R-squared            0.991180     Mean dependent var      13.50893
Adjusted R-squared   0.991016     S.D. dependent var      0.804465
S.E. of regression   0.076248     Akaike info criterion   -2.274583
Sum squared resid    0.313945     Schwarz criterion       -2.202249
Log likelihood       65.68834     F-statistic             6068.330
Durbin-Watson stat   0.434696     Prob(F-statistic)       0.000000

Graphical Evaluation of Fit and Error Terms
