Part 18: Regression Modeling 18-1/44. Statistics and Data Analysis. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics.

Presentation transcript:


18-2/44 Statistics and Data Analysis, Part 18 – Regression Modeling

18-3/44 Linear Regression Models
- Least squares results: regression model, sample statistics, estimates of population parameters
- How good is the model? In the abstract; statistical measures of model fit
- Assessing the validity of the relationship

18-4/44 Regression Model
- Regression relationship: y_i = α + β x_i + ε_i. Random ε_i implies random y_i.
- Observed random y_i has two unobserved components. Explained: α + β x_i. Unexplained: ε_i.
- Random component: ε_i has zero mean, standard deviation σ, and a normal distribution.
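The two-component structure above can be sketched numerically. Everything below (alpha, beta, sigma, the x grid) is a made-up illustration, not the lecture's data:

```python
import random

# Simulate the regression relationship y_i = alpha + beta*x_i + eps_i.
# alpha, beta, and sigma are hypothetical values chosen for illustration.
random.seed(1)
alpha, beta, sigma = 2.0, 0.5, 1.0
x = [float(i) for i in range(1, 51)]
# Each observed y_i is an explained part (alpha + beta*x_i) plus an
# unexplained part eps_i, drawn from a normal distribution with mean 0
# and standard deviation sigma.
y = [alpha + beta * xi + random.gauss(0.0, sigma) for xi in x]
```

Because ε has mean zero, the sample mean of y lands near α + β·x̄.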

18-5/44 Linear Regression: Model Assumption

18-6/44 Least Squares Results
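As a concrete sketch of the least squares results (toy numbers, not the lecture's data), the estimates follow the standard formulas b = Sxy/Sxx and a = ȳ − b·x̄:

```python
# Least squares estimates for a simple regression, from the textbook
# formulas b = Sxy / Sxx and a = ybar - b * xbar.
def least_squares(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope estimate
    a = ybar - b * xbar    # intercept estimate
    return a, b

# Perfectly linear toy data y = 1 + 2x is recovered exactly.
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```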

18-7/44 Using the Regression Model
- Prediction: use x_i as information to predict y_i.
- Without x_i, the natural predictor is the mean; x_i provides more information.
- With x_i, the predictor is a + b x_i.

18-8/44 Regression Fits: regression of salary vs. years of experience, and regression of fuel bill vs. number of rooms for a sample of homes.

18-9/44 Regression Arithmetic

18-10/44 Analysis of Variance

18-11/44 Fit of the Model to the Data

18-12/44 Explained Variation
- The proportion of variation "explained" by the regression is called R-squared (R²).
- It is also called the coefficient of determination.
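A small numerical sketch of this definition (toy data, not from the lecture): R² is explained variation over total variation, computed here as 1 − SSE/SST:

```python
# R-squared = 1 - (unexplained variation / total variation).
def r_squared(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    yhat = [a + b * xi for xi in x]                       # fitted values
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
    sst = sum((yi - ybar) ** 2 for yi in y)               # total
    return 1 - sse / sst

r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # comes out to 0.6
```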

18-13/44 Movie Madness Fit, with R² shown on the output.

18-14/44 Regression Fits: four scatter plots with fitted lines and their R² values, one of which has R² = 0.924.

18-15/44 R² is still positive even if the correlation is negative.

18-16/44 R-Squared Benchmarks
- Aggregate time series: expect .9+.
- Cross sections: .5 is good; sometimes we do much better.
- Large survey data sets: .2 is not bad.
- R² = … in this cross section.

18-17/44 Correlation Coefficient

18-18/44 Correlations: three scatter plots with their correlation coefficients r_xy; one of the values shown is r_xy = -.402.

18-19/44 R-Squared is r_xy²
- R-squared is the square of the correlation between y_i and the predicted y_i, which is a + b x_i.
- The correlation between y_i and (a + b x_i) is the same as the correlation between y_i and x_i.
- Therefore, R² = r_xy².
- A regression with a high R² predicts y_i well.
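The identity on this slide can be checked directly on toy numbers: the squared correlation between y and x equals the regression R²:

```python
import math

# Verify R2 = r_xy**2 on toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
r_xy = sxy / math.sqrt(sxx * syy)     # correlation of y with x
b = sxy / sxx
a = ybar - b * xbar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / syy                    # regression R-squared
```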

18-20/44 Adjusted R-Squared
- We will discover, when we study regression with more than one variable, that a researcher can increase R² just by adding variables to a model, even if those variables do not really explain y or have any real relationship to it at all.
- To have a fit measure that accounts for this, "adjusted R²" is a number that increases with the correlation but decreases with the number of variables.
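A sketch of the usual adjustment formula. The n = 62 below is an assumed sample size for illustration, not a figure stated on the slide:

```python
# Adjusted R-squared: rises with fit, falls with the number of predictors K.
#   R2_adj = 1 - (1 - R2) * (n - 1) / (n - K - 1)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With R2 = 0.424 (the fit reported on the Buzz output) and an assumed
# n = 62 with K = 1 predictor, the adjusted value is about 0.414 (41.4%).
adj = adjusted_r2(0.424, 62, 1)
```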

18-21/44 Movie Madness Fit

18-22/44 Notes About Adjusted R²

18-23/44 Is R² Large?
- Is there really a relationship between x and y? We cannot be 100% certain.
- We can be "statistically certain" (within limits) by examining R². F is used for this purpose.

18-24/44 The F Ratio

18-25/44 Is R² Large?
- Since F = (N - 2)R² / (1 - R²), if R² is "large," then F will be large.
- For a model with one explanatory variable, the standard benchmark value for a "large" F is 4.
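The formula on this slide is easy to compute directly. R² = 0.424 is the fit reported on the Buzz output elsewhere in the deck; n = 62 is an assumed sample size used only for illustration:

```python
# F from R2 for a regression with one explanatory variable:
#   F = (N - 2) * R2 / (1 - R2); the benchmark for "large" is about 4.
def f_from_r2(r2, n):
    return (n - 2) * r2 / (1 - r2)

f = f_from_r2(0.424, 62)   # well above the benchmark of 4
```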

18-26/44 Movie Madness Fit, with R² and F shown on the output.

18-27/44 Why Use F and not R²?
- When is R² "large"? We have no benchmarks to decide.
- How large is "large"? We have a table for the F statistic to determine whether F is statistically large: yes or no.

18-28/44 F Table. The "critical value" depends on the number of observations. If F is larger than the appropriate value in the table, conclude that there is a "statistically significant" relationship. There is a large F table in your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model and data. In the table, n₂ is N - 2.

18-29/44 Internet Buzz Regression. Regression Analysis: BoxOffice versus Buzz. The regression equation is BoxOffice = Constant + b · Buzz; the output lists Coef, SE Coef, T, and P for the Constant and Buzz predictors, with S = …, R-Sq = 42.4%, R-Sq(adj) = 41.4%, and an Analysis of Variance table (Regression, Residual Error, Total, each with DF, SS, MS, F, P). In the table, n₂ is N - 2.

18-30/44 $135 Million: Klimt, to Ronald Lauder.

18-31/44 $100 Million … sort of. Stephen Wynn with a Prized Possession, 2007.

18-32/44 An Enduring Art Mystery: Why do larger paintings command higher prices? The Persistence of Memory, Salvador Dali, 1931. The Persistence of Statistics, Hildebrand, Ott and Gray, 2005. The graphics show the relative sizes of the two works.


18-35/44 Monet in Large and Small. Log of $price = a + b · log(surface area) + e, estimated on sale prices of 328 signed Monet paintings. The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.

18-36/44 The Data. Note: logs are used in this context. This is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes (e.g., what is the % premium when the painting is 10% larger?).
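The log-log point can be illustrated with synthetic data: when log price is linear in log area, the slope is the elasticity, so a 10% larger area raises price by about b × 10%. The built-in elasticity of 1.3 below is an arbitrary illustration value, not the Monet estimate:

```python
import math

# Synthetic price data with a known elasticity of 1.3: price = 5 * area**1.3.
areas = [100.0 * 1.1 ** i for i in range(20)]
prices = [5.0 * area ** 1.3 for area in areas]

# Regress log(price) on log(area); the least squares slope recovers 1.3.
lx = [math.log(a) for a in areas]
ly = [math.log(p) for p in prices]
n = len(lx)
xbar, ybar = sum(lx) / n, sum(ly) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(lx, ly)) / \
    sum((xi - xbar) ** 2 for xi in lx)
```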

18-37/44 Monet Regression: there seems to be a regression. Is there a theory?

18-38/44 Conclusions about F
- R² answers the question of how well the model fits the data.
- F answers the question of whether there is a statistically valid fit (as opposed to no fit).
- What remains is the question of whether there is a valid relationship, i.e., whether β is different from zero.

18-39/44 The Regression Slope
- The model is y_i = α + β x_i + ε_i. The "relationship" depends on β; if β equals zero, there is no relationship.
- The least squares slope, b, is the estimate of β based on the sample. It is a statistic based on a random sample, so we cannot be sure it equals the true β.
- To accommodate this, we form a range of uncertainty around b, i.e., a confidence interval.

18-40/44 Uncertainty About the Regression Slope. Hypothetical regression of Fuel Bill vs. Number of Rooms: the regression equation is Fuel Bill = Constant + b · Number of Rooms, with S = …, R-Sq = 72.2%, R-Sq(adj) = 72.0%. The Coef column gives b, the estimate of β. The "standard error" (SE) is the measure of uncertainty about the true value. The "range of uncertainty" is b ± 2 SE(b). (Actually 1.96, but people use 2.)
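The rule of thumb on this slide is b ± 2 SE(b). As a sketch using the Buzz slide's SE of 10.94 and a slope of 72.72 (an assumed value implied by the reported interval's midpoint, not printed in the transcript):

```python
# Range of uncertainty for the slope: b +/- z * SE(b).
# z = 2 is the slide's rule of thumb; 1.96 gives the exact 95% interval.
def slope_interval(b, se, z=2.0):
    return (b - z * se, b + z * se)

# b = 72.72 is an assumed illustration value; SE = 10.94 is from the slides.
lo, hi = slope_interval(72.72, 10.94, z=1.96)   # roughly [51.28, 94.16]
```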

18-41/44 Internet Buzz Regression. Regression Analysis: BoxOffice versus Buzz. The regression equation is BoxOffice = Constant + b · Buzz, with S = …, R-Sq = 42.4%, R-Sq(adj) = 41.4%, and the Analysis of Variance table. The range of uncertainty for b is b - 1.96(10.94) to b + 1.96(10.94) = [51.27 to 94.17].

18-42/44 Elasticity in the Monet Regression. The slope b is the elasticity of price with respect to area. The confidence interval is b ± 1.96(.1908). The fact that this interval does not include 1.0 is an important result: prices for Monet paintings are extremely elastic with respect to area.

18-43/44 Conclusion about b
- Should we conclude the slope is not zero? Does the range of uncertainty include zero? If no, you should conclude the slope is not zero. If yes, you can't be very sure that β is not zero.
- Tying it together: if the range of uncertainty does not include 0.0, then the ratio b/SE is larger than 2, so the square of the ratio is larger than 4. The square of the ratio is F, and F larger than 4 gave the same conclusion. They are looking at the same thing.
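The "tying it together" point can be checked numerically: for one explanatory variable, the square of the t ratio b/SE(b) is the regression F statistic, so |b/SE| > 2 and F > 4 are the same test. The values below are assumed illustration numbers, not verified output:

```python
# t ratio and its square for a one-variable regression.
b, se = 72.72, 10.94      # assumed illustration values
t = b / se                # comfortably above 2
f = t ** 2                # equals the regression F, well above 4
```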

18-44/44 Summary
- The regression model: theory
- Least squares results: a, b, s, R²
- The fit of the regression model to the data
- ANOVA and R²
- The F statistic and R²
- Uncertainty about the regression slope