Part 18: Regression Modeling 18-1/44 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics.

Part 18: Regression Modeling 18-2/44 Statistics and Data Analysis Part 18 – Regression Modeling

Part 18: Regression Modeling 18-3/44 Linear Regression Models
• Least squares results
    ◦ Regression model
    ◦ Sample statistics
    ◦ Estimates of population parameters
• How good is the model?
    ◦ In the abstract
    ◦ Statistical measures of model fit
• Assessing the validity of the relationship

Part 18: Regression Modeling 18-4/44 Regression Model
• Regression relationship: $y_i = \alpha + \beta x_i + \varepsilon_i$
• Random $\varepsilon_i$ implies random $y_i$.
• Observed random $y_i$ has two unobserved components:
    ◦ Explained: $\alpha + \beta x_i$
    ◦ Unexplained: $\varepsilon_i$
• Random component $\varepsilon_i$: zero mean, standard deviation $\sigma$, normal distribution.
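
A minimal Python simulation of the model above, offered as a sketch. The values of $\alpha$, $\beta$, $\sigma$ and the $x_i$ are hypothetical, chosen only for illustration. Each call draws fresh normal errors $\varepsilon_i$, so $y_i$ changes from sample to sample while the explained part $\alpha + \beta x_i$ stays fixed.

```python
import random

def simulate(xs, alpha, beta, sigma, seed=None):
    """Draw y_i = alpha + beta * x_i + eps_i, with eps_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameters: alpha = 10, beta = 2, sigma = 3
xs = [1, 2, 3, 4, 5]
print(simulate(xs, 10.0, 2.0, 3.0, seed=1))  # one random sample of y
print(simulate(xs, 10.0, 2.0, 3.0, seed=2))  # same x values, different y values
```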

Part 18: Regression Modeling 18-5/44 Linear Regression: Model Assumption

Part 18: Regression Modeling 18-6/44 Least Squares Results
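
The formulas on this slide did not survive extraction. As a sketch, the standard least squares estimates for one explanatory variable are $b = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. The Python below computes them directly; the data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals y - (a + b*x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = y_bar - b * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```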

Part 18: Regression Modeling 18-7/44 Using the Regression Model
• Prediction: use $x_i$ as information to predict $y_i$.
• Without $x_i$, the natural predictor is the mean, $\bar{y}$; $x_i$ provides more information. With $x_i$, the predictor is $\hat{y}_i = a + b x_i$.
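
A small illustration of the two predictors, borrowing the coefficients of the hypothetical fuel bill regression reported later in this part ($a = -251.9$, $b = 136.2$). The sample mean of rooms (6.0) is a made-up value for illustration; because the least squares line passes through $(\bar{x}, \bar{y})$, the mean fuel bill then equals $a + b\bar{x}$.

```python
# Coefficients of the hypothetical fuel bill regression reported in this part
A, B = -251.9, 136.2

def predict(rooms):
    """Regression predictor y_hat = a + b * x."""
    return A + B * rooms

# Hypothetical sample mean of rooms. The least squares line passes through
# (x_bar, y_bar), so the mean fuel bill equals a + b * x_bar.
x_bar = 6.0
y_bar = predict(x_bar)

for rooms in (4, 6, 8, 10):
    print(f"{rooms:2d} rooms: mean predicts {y_bar:6.1f}, "
          f"regression predicts {predict(rooms):6.1f}")
```

The mean gives the same prediction for every home; the regression predictor moves with the number of rooms.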

Part 18: Regression Modeling 18-8/44 Regression Fits
[Figure: two panels, "Regression of salary vs. years of experience" and "Regression of fuel bill vs. number of rooms for a sample of homes"]

Part 18: Regression Modeling 18-9/44 Regression Arithmetic

Part 18: Regression Modeling 18-10/44 Analysis of Variance
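
The ANOVA display on this slide did not survive extraction. As a sketch, the standard decomposition is SST = SSR + SSE: the total variation of $y$ around $\bar{y}$ splits into the variation of the fitted values around $\bar{y}$ (explained) and the residual variation (unexplained). The Internet Buzz output later in this part shows the same identity, 7913.6 + 10751.5 = 18665.1. The data below are hypothetical.

```python
def anova(xs, ys):
    """Split total variation into explained (SSR) and unexplained (SSE) parts."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)              # total
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression (explained)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual (unexplained)
    return sst, ssr, sse

# Hypothetical data
sst, ssr, sse = anova([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"SST = {sst:.3f}, SSR + SSE = {ssr + sse:.3f}")
```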

Part 18: Regression Modeling 18-11/44 Fit of the Model to the Data

Part 18: Regression Modeling 18-12/44 Explained Variation
• The proportion of variation “explained” by the regression is called R-squared (R²).
• It is also called the Coefficient of Determination.
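
As a worked check, the Python below computes R² from the sums of squares in the Internet Buzz ANOVA table reported later in this part. Both forms, SSR/SST and 1 - SSE/SST, give the 42.4% printed as R-Sq.

```python
# Sums of squares from the Internet Buzz regression output in this part
ss_regression = 7913.6
ss_residual = 10751.5
ss_total = 18665.1

r_squared = ss_regression / ss_total          # explained / total
r_squared_alt = 1 - ss_residual / ss_total    # 1 - unexplained / total
print(f"R-squared = {r_squared:.3f} ({r_squared_alt:.3f})")
```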

Part 18: Regression Modeling 18-13/44 Movie Madness Fit
[Figure; label: R²]

Part 18: Regression Modeling 18-14/44 Regression Fits
[Figure: four regression fits, with R² = 0.522, R² = 0.880, R² = 0.424, and R² = 0.924]

Part 18: Regression Modeling 18-15/44
[Figure: regression fit with R² = 0.338]
R² is still positive even if the correlation is negative.

Part 18: Regression Modeling 18-16/44 R Squared Benchmarks
• Aggregate time series: expect .9+
• Cross sections: .5 is good. Sometimes we do much better.
• Large survey data sets: .2 is not bad.
R² = 0.924 in this cross section.

Part 18: Regression Modeling 18-17/44 Correlation Coefficient

Part 18: Regression Modeling 18-18/44 Correlations
[Figure: three scatter plots with $r_{xy} = 0.723$, $r_{xy} = -0.402$, and $r_{xy} = +1.000$]

Part 18: Regression Modeling 18-19/44 R-Squared is $r_{xy}^2$
• R-squared is the square of the correlation between $y_i$ and the predicted $y_i$, which is $a + bx_i$.
• The correlation between $y_i$ and $(a + bx_i)$ has the same magnitude as the correlation between $y_i$ and $x_i$ (it equals $|r_{xy}|$, because $b$ has the same sign as $r_{xy}$).
• Therefore, $R^2 = r_{xy}^2$.
• A regression with a high R² predicts $y_i$ well.

Part 18: Regression Modeling 18-20/44 Adjusted R-Squared
• As we will discover when we study regression with more than one variable, a researcher can increase R² just by adding variables to a model, even if those variables do not really explain y or have any real relationship to it at all.
• To have a fit measure that accounts for this, “Adjusted R²” is a number that increases with the correlation but decreases with the number of variables.
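
A sketch of one standard form of the adjustment, for N observations and K explanatory variables: adjusted $R^2 = 1 - (1 - R^2)(N - 1)/(N - K - 1)$. With K = 1 and the Internet Buzz numbers reported in this part (R² = 7913.6/18665.1, and N = 62 because the total DF is 61), it reproduces the 41.4% in that output. The extra values of K are hypothetical, showing the penalty for each added variable at a fixed R².

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 7913.6 / 18665.1          # Internet Buzz regression, R-Sq = 42.4%
print(f"{adjusted_r_squared(r2, n=62, k=1):.3f}")

# Same R-squared, more variables: the adjustment penalizes each added variable
for k in (1, 5, 10):
    print(k, round(adjusted_r_squared(r2, n=62, k=k), 3))
```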

Part 18: Regression Modeling 18-21/44 Movie Madness Fit

Part 18: Regression Modeling 18-22/44 Notes About Adjusted R²

Part 18: Regression Modeling 18-23/44 Is R² Large?
• Is there really a relationship between x and y? We cannot be 100% certain. We can be “statistically certain” (within limits) by examining R². F is used for this purpose.

Part 18: Regression Modeling 18-24/44 The F Ratio

Part 18: Regression Modeling 18-25/44 Is R² Large?
• Since $F = (N-2)R^2/(1 - R^2)$, if R² is “large,” then F will be large.
• For a model with one explanatory variable in it, the standard benchmark value for a “large” F is 4.
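
The formula applied to the Internet Buzz regression reported in this part (N = 62, R² = SSR/SST). It reproduces the printed F of 44.16, which also equals the regression mean square divided by the residual mean square.

```python
def f_from_r_squared(r2, n):
    """F statistic for a regression with one explanatory variable."""
    return (n - 2) * r2 / (1 - r2)

# Internet Buzz regression: N = 62, R-squared = SSR / SST
r2 = 7913.6 / 18665.1
f = f_from_r_squared(r2, n=62)
print(f"F = {f:.2f}; large by the benchmark of 4: {f > 4}")
```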

Part 18: Regression Modeling 18-26/44 Movie Madness Fit
[Figure; labels: R², F]

Part 18: Regression Modeling 18-27/44 Why Use F and not R²?
• When is R² “large”? We have no benchmarks to decide.
• How large is “large”? We have a table for F statistics to determine when F is statistically large: yes or no.

Part 18: Regression Modeling 18-28/44 F Table
The “critical value” depends on the number of observations. If F is larger than the appropriate value in the table, conclude that there is a “statistically significant” relationship. There is a huge F table on pages 732-742 of your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model/data.
[Table note: $n_2$ is $N - 2$.]
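
A sketch of what such a program does, using only the Python standard library. It relies on the fact that, with one explanatory variable (numerator DF 1), F equals the square of a Student's t statistic with $n_2 = N - 2$ degrees of freedom, so P(F > f) = P(|T| > √f). The integration step count and the search bracket [0, 100] are arbitrary choices for this illustration; in practice one would call a statistics package (for example SciPy's `scipy.stats.f.ppf`).

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def f1_upper_tail(f, df2, steps=2000):
    """P(F > f) for F with (1, df2) degrees of freedom, using F = T**2."""
    t = math.sqrt(f)
    h = t / steps                       # steps must be even for Simpson's rule
    total = t_density(0.0, df2) + t_density(t, df2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df2)
    central = total * h / 3             # P(0 < T < t)
    return 1 - 2 * central              # P(|T| > t), which equals P(F > f)

def f1_critical(df2, alpha=0.05):
    """Critical value c with P(F > c) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0                 # assumes the critical value is below 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if f1_upper_tail(mid, df2) > alpha:
            lo = mid                    # tail still too big: c lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

for n2 in (10, 30, 60, 120):            # n2 = N - 2
    print(f"n2 = {n2:3d}: 5% critical F(1, n2) = {f1_critical(n2):.2f}")
```

For moderate and large samples the 5% critical value settles near 4, which is where the benchmark on the previous slide comes from.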

Part 18: Regression Modeling 18-29/44 Internet Buzz Regression
Regression Analysis: BoxOffice versus Buzz
The regression equation is
BoxOffice = -14.4 + 72.7 Buzz

Predictor       Coef    SE Coef       T      P
Constant     -14.360      5.546   -2.59  0.012
Buzz           72.72      10.94    6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance
Source           DF        SS        MS      F      P
Regression        1    7913.6    7913.6  44.16  0.000
Residual Error   60   10751.5     179.2
Total            61   18665.1

Note: $n_2$ is $N - 2$.
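
A short check of how the MS column and S follow from this table. Each mean square is SS/DF, and S is the square root of the residual mean square. Only the values printed in the output are used.

```python
import math

# Sums of squares and degrees of freedom from the Internet Buzz ANOVA table
ss_reg, df_reg = 7913.6, 1
ss_res, df_res = 10751.5, 60          # df_res = N - 2, so N = 62

ms_reg = ss_reg / df_reg              # mean square, regression
ms_res = ss_res / df_res              # mean square, residual
s = math.sqrt(ms_res)                 # standard error of the regression, S
print(f"MS = {ms_reg:.1f}, {ms_res:.1f}; S = {s:.3f}")
```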

Part 18: Regression Modeling 18-30/44 $135 Million: Klimt, to Ronald Lauder
http://www.nytimes.com/2006/06/19/arts/design/19klim.html?ex=1308369600&en=37eb32381038a749&ei=5088&partner=rssnyt&emc=rss

Part 18: Regression Modeling 18-31/44 $100 Million … sort of Stephen Wynn with a Prized Possession, 2007

Part 18: Regression Modeling 18-32/44 An Enduring Art Mystery
Why do larger paintings command higher prices?
[Images: The Persistence of Memory, Salvador Dali, 1931; The Persistence of Statistics, Hildebrand, Ott and Gray, 2005. Graphics show relative sizes of the two works.]

Part 18: Regression Modeling 18-33/44

Part 18: Regression Modeling 18-34/44

Part 18: Regression Modeling 18-35/44 Monet in Large and Small
$\log(\text{price}) = a + b\,\log(\text{surface area}) + e$ (price in dollars)
Sale prices of 328 signed Monet paintings.
The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.

Part 18: Regression Modeling 18-36/44 The Data
Note on using logs in this context: this is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)
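
To answer the slide's question in numbers, a sketch using the Monet slope b = 1.7246 reported on the elasticity slide later in this part. In the model log(price) = a + b log(area) + e, scaling area by 1.10 multiplies the predicted price by $1.10^b$; the elasticity shortcut, b times 10%, gives a close approximation.

```python
b = 1.7246                      # Monet elasticity estimate from this part
area_ratio = 1.10               # painting 10% larger

exact_premium = area_ratio ** b - 1          # implied by the log-log model
approx_premium = b * (area_ratio - 1)        # elasticity approximation
print(f"exact: {exact_premium:.1%}, approximate: {approx_premium:.1%}")
```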

Part 18: Regression Modeling 18-37/44 Monet Regression: There seems to be a regression. Is there a theory?

Part 18: Regression Modeling 18-38/44 Conclusions about F
• R² answers the question of how well the model fits the data.
• F answers the question of whether there is a statistically valid fit (as opposed to no fit).
• What remains is the question of whether there is a valid relationship, i.e., is β different from zero?

Part 18: Regression Modeling 18-39/44 The Regression Slope
• The model is $y_i = \alpha + \beta x_i + \varepsilon_i$. The “relationship” depends on β. If β equals zero, there is no relationship.
• The least squares slope, b, is the estimate of β based on the sample. It is a statistic based on a random sample. We cannot be sure it equals the true β.
• To accommodate this view, we form a range of uncertainty around b, i.e., a confidence interval.

Part 18: Regression Modeling 18-40/44 Uncertainty About the Regression Slope
Hypothetical Regression: Fuel Bill vs. Number of Rooms
The regression equation is
Fuel Bill = -252 + 136 Number of Rooms

Predictor      Coef    SE Coef       T      P
Constant     -251.9      44.88   -5.20  0.000
Rooms         136.2       7.09    19.9  0.000

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

• 136.2 is b, the estimate of β.
• The “Standard Error” (SE Coef) is the measure of uncertainty about the true value.
• The “range of uncertainty” is b ± 2 SE(b). (Actually 1.96, but people use 2.)
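
The range of uncertainty with both multipliers, using the Rooms coefficient and standard error from this hypothetical output. The helper name slope_interval is ours, for illustration only.

```python
def slope_interval(b, se, z=1.96):
    """Range of uncertainty b +/- z * SE(b)."""
    return b - z * se, b + z * se

b, se = 136.2, 7.09            # Rooms coefficient and its SE from the output
lo, hi = slope_interval(b, se)
lo2, hi2 = slope_interval(b, se, z=2)
print(f"1.96 SE: [{lo:.2f}, {hi:.2f}]   2 SE: [{lo2:.2f}, {hi2:.2f}]")
```

Applied to the Buzz slope on the next slide, slope_interval(72.72, 10.94) gives about (51.28, 94.16) from the rounded values shown; the slide's [51.27 to 94.17] differs only in the second decimal, presumably because it was computed from unrounded estimates.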

Part 18: Regression Modeling 18-41/44 Internet Buzz Regression
Regression Analysis: BoxOffice versus Buzz
The regression equation is
BoxOffice = -14.4 + 72.7 Buzz

Predictor       Coef    SE Coef       T      P
Constant     -14.360      5.546   -2.59  0.012
Buzz           72.72      10.94    6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance
Source           DF        SS        MS      F      P
Regression        1    7913.6    7913.6  44.16  0.000
Residual Error   60   10751.5     179.2
Total            61   18665.1

Range of uncertainty for b is 72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17]

Part 18: Regression Modeling 18-42/44 Elasticity in the Monet Regression
• b = 1.7246. This is the elasticity of price with respect to area.
• The confidence interval would be 1.7246 ± 1.96(0.1908) = [1.3506 to 2.0986].
• The fact that this interval does not include 1.0 is an important result: prices for Monet paintings are extremely elastic with respect to area (the whole interval lies above 1, so a 1% larger painting sells for more than 1% more).
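
The interval check on this slide, together with an equivalent view: the interval excludes 1.0 exactly when (b - 1)/SE lies beyond ±1.96. The inputs are the slide's b and SE.

```python
b, se = 1.7246, 0.1908          # Monet elasticity and its standard error

lower, upper = b - 1.96 * se, b + 1.96 * se
ratio = (b - 1.0) / se          # distance of b from 1.0 in standard errors
print(f"interval [{lower:.4f}, {upper:.4f}], ratio = {ratio:.2f}")
print("interval excludes 1.0:", not (lower <= 1.0 <= upper))
print("ratio beyond 1.96:    ", abs(ratio) > 1.96)
```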

Part 18: Regression Modeling 18-43/44 Conclusion about b
• So, should we conclude the slope is not zero? Does the range of uncertainty include zero?
    ◦ No: then you should conclude the slope is not zero.
    ◦ Yes: then you can’t be very sure that β is not zero.
• Tying it together. If the range of uncertainty does not include 0.0, then
    ◦ the ratio b/SE is larger than 2 in absolute value;
    ◦ the square of the ratio is larger than 4;
    ◦ the square of the ratio is F.
• F larger than 4 gave the same conclusion. They are looking at the same thing.
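
A quick check of this tie-in using the Internet Buzz output: b/SE reproduces the T column (6.65), and its square is close to the reported F of 44.16. With one explanatory variable the two are equal in exact arithmetic; the small gap here comes from the rounding of b and SE in the printout.

```python
b, se = 72.72, 10.94    # Buzz slope and SE from the Internet Buzz output
f_reported = 44.16      # F from the same output

t = b / se
print(f"b/SE = {t:.2f}, (b/SE)^2 = {t * t:.2f}, reported F = {f_reported}")
print("range of uncertainty excludes 0:", b - 1.96 * se > 0 or b + 1.96 * se < 0)
```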

Part 18: Regression Modeling 18-44/44 Summary
• The regression model: theory
• Least squares results: a, b, s, R²
• The fit of the regression model to the data
• ANOVA and R²
• The F statistic and R²
• Uncertainty about the regression slope
