Part 7: Multiple Regression Analysis 7-1/54 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics.

Presentation transcript:

Part 7: Multiple Regression Analysis 7-1/54 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics

Part 7: Multiple Regression Analysis 7-2/54 Regression and Forecasting Models Part 7 – Multiple Regression Analysis

Part 7: Multiple Regression Analysis 7-3/54 Model Assumptions
y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + … + β_K x_iK + ε_i
- β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + … + β_K x_iK is the 'regression function'
  - Contains the 'information' about y_i in x_i1, …, x_iK
  - Unobserved because β_0, β_1, …, β_K are not known for certain
- ε_i is the 'disturbance.' It is the unobserved random component.
- The observed y_i is the sum of the two unobserved parts.
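To make the pieces concrete, here is a minimal simulation sketch (the coefficients, sample size, and error variance below are assumed values, not taken from the slides) showing an observed y built as the regression function plus a disturbance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 52, 5                                   # assumed sample size and number of regressors
beta0 = 1.0                                    # assumed constant term
beta = np.array([0.5, -0.3, 0.2, 0.8, -0.1])   # assumed slope coefficients beta_1 ... beta_K

X = rng.normal(size=(N, K))                    # regressors x_i1 ... x_iK
regression_function = beta0 + X @ beta         # the 'information' about y_i; unobserved in practice
eps = rng.normal(scale=0.5, size=N)            # the disturbance: mean 0, variance sigma^2
y = regression_function + eps                  # the observed y_i is the sum of the two parts
```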

Part 7: Multiple Regression Analysis 7-4/54 Regression Model Assumptions About ε_i
Random variable:
(1) The regression function is the mean of y_i for a particular x_i1, …, x_iK; ε_i is the deviation of y_i from the regression line.
(2) ε_i has mean zero.
(3) ε_i has variance σ².
'Random' noise:
(4) ε_i is unrelated to any of the values x_i1, …, x_iK (no covariance); it is "random noise."
(5) ε_i is unrelated to any other disturbance ε_j (not "autocorrelated").
(6) Normal distribution: ε_i is the sum of many small influences.

Part 7: Multiple Regression Analysis 7-5/54 Regression model for the U.S. gasoline market: data on y and x1, x2, x3, x4, x5.

Part 7: Multiple Regression Analysis 7-6/54 Least Squares
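The formulas on this slide are not preserved in the transcript, so here is a small self-contained sketch of the least squares idea: choose the coefficient vector that minimizes the sum of squared residuals (simulated data, assumed coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 52, 5
X = rng.normal(size=(N, K))
y = 1.0 + X @ np.array([0.5, -0.3, 0.2, 0.8, -0.1]) + rng.normal(scale=0.5, size=N)

# Prepend a column of ones so the constant term is estimated along with the K slopes.
Z = np.column_stack([np.ones(N), X])

# Least squares: b minimizes (y - Zb)'(y - Zb).
b, *_ = np.linalg.lstsq(Z, y, rcond=None)

residuals = y - Z @ b
sse = residuals @ residuals                    # the minimized sum of squared residuals
```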

Part 7: Multiple Regression Analysis 7-7/54 An Elaborate Multiple Loglinear Regression Model

Part 7: Multiple Regression Analysis 7-8/54 An Elaborate Multiple Loglinear Regression Model Specified Equation

Part 7: Multiple Regression Analysis 7-9/54 An Elaborate Multiple Loglinear Regression Model Minimized sum of squared residuals

Part 7: Multiple Regression Analysis 7-10/54 An Elaborate Multiple Loglinear Regression Model Least Squares Coefficients

Part 7: Multiple Regression Analysis 7-11/54 An Elaborate Multiple Loglinear Regression Model N=52 K=5

Part 7: Multiple Regression Analysis 7-12/54 An Elaborate Multiple Loglinear Regression Model Standard Errors

Part 7: Multiple Regression Analysis 7-13/54 An Elaborate Multiple Loglinear Regression Model Confidence Intervals: b_k ± t* × SE(b_k). For logIncome: b ± 2.013(.1457) = [ to ]
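A sketch of that interval arithmetic: the 2.013 multiplier is the t critical value for N - K - 1 = 46 degrees of freedom (N = 52, K = 5 from the earlier slide), and 0.1457 is the quoted standard error; the coefficient itself did not survive in the transcript, so it is a placeholder here:

```python
from scipy import stats

N, K = 52, 5
df = N - K - 1                                 # 46 residual degrees of freedom

b_logincome = 1.0                              # placeholder estimate for the logIncome coefficient
se_logincome = 0.1457                          # standard error quoted on the slide

t_star = stats.t.ppf(0.975, df)                # about 2.013, the multiplier used on the slide
lower = b_logincome - t_star * se_logincome
upper = b_logincome + t_star * se_logincome
print(lower, upper)                            # the 95% confidence interval [lower to upper]
```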

Part 7: Multiple Regression Analysis 7-14/54 An Elaborate Multiple Loglinear Regression Model t statistics for testing individual slopes = 0

Part 7: Multiple Regression Analysis 7-15/54 An Elaborate Multiple Loglinear Regression Model P values for individual tests

Part 7: Multiple Regression Analysis 7-16/54 An Elaborate Multiple Loglinear Regression Model Standard error of the regression, s_e

Part 7: Multiple Regression Analysis 7-17/54 An Elaborate Multiple Loglinear Regression Model R²

Part 7: Multiple Regression Analysis 7-18/54 We used McDonald’s Per Capita

Part 7: Multiple Regression Analysis 7-19/54 Movie Madness Data (n=2198)

Part 7: Multiple Regression Analysis 7-20/54 CRIME is the left-out (base) GENRE category. AUSTRIA is the left-out country. Australia and the UK were left out for other reasons (an algebraic problem with only 8 countries).

Part 7: Multiple Regression Analysis 7-21/54 Use individual "T" statistics. T > +2 or T < -2 suggests the variable is "significant." The T statistic for LogPCMacs is large.

Part 7: Multiple Regression Analysis 7-22/54 Partial Effect
- Hypothesis: If we include the signature effect, size does not explain the sale prices of Monet paintings.
- Test: Compute the multiple regression; then test H_0: β_1 = 0.
- α level for the test = 0.05, as usual.
- Rejection region: large value of b_1 (coefficient).
- Test based on t = b_1 / StandardError.
Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed
The regression equation is ln (US$) = … + … ln (SurfaceArea) + … Signed
Predictor          Coef   SE Coef   T   P
Constant
ln (SurfaceArea)
Signed
S =    R-Sq = 46.2%    R-Sq(adj) = 46.0%
Reject H_0. Degrees of freedom for the t statistic: N - 3 = N - (number of predictors) - 1.
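A sketch of this individual-slope test, t = b_1 / SE(b_1) with N - 3 degrees of freedom; the sample size, coefficient, and standard error are placeholders, since the numeric output was lost in the transcript:

```python
from scipy import stats

N = 430                                        # placeholder number of Monet paintings
n_predictors = 2                               # ln(SurfaceArea) and Signed
df = N - n_predictors - 1                      # N - 3, as on the slide

b1, se_b1 = 1.35, 0.10                         # placeholder estimate and standard error
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df)      # two-sided P value

reject_h0 = p_value < 0.05                     # alpha = 0.05, as usual
```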

Part 7: Multiple Regression Analysis 7-23/54 Model Fit
- How well does the model fit the data?
- R² measures fit; the larger the better.
  - Time series: expect 0.9 or better.
  - Cross sections: it depends.
    - Social science data: 0.1 is good.
    - Industry or market data: 0.5 is routine.

Part 7: Multiple Regression Analysis 7-24/54 Two Views of R²
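The two formulas on this slide are not preserved, but one standard pair of equivalent "views" is R² as the explained share of the variation in y and R² as the squared correlation between y and the fitted values. A self-contained sketch (simulated data) checking that they coincide when the model contains a constant term:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.normal(size=(N, 2))
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(size=N)

Z = np.column_stack([np.ones(N), x])
y_hat = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]

sse = np.sum((y - y_hat) ** 2)                 # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)              # total sum of squares

r2_variation = 1.0 - sse / sst                       # view 1: explained share of variation
r2_correlation = np.corrcoef(y, y_hat)[0, 1] ** 2    # view 2: squared correlation of y and fitted y

assert np.isclose(r2_variation, r2_correlation)      # equal when the model has a constant term
```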

Part 7: Multiple Regression Analysis 7-25/54 Pretty Good Fit: R² = 0.722. Regression of Fuel Bill on Number of Rooms.

Part 7: Multiple Regression Analysis 7-26/54 Testing “The Regression” Degrees of Freedom for the F statistic are K and N-K-1

Part 7: Multiple Regression Analysis 7-27/54 A Formal Test of the Regression Model
- Is there a significant "relationship"? Equivalently, is R² > 0? Statistically, not numerically.
- Testing: Compute F = (R²/K) / [(1 - R²)/(N - K - 1)]. Determine whether F is large using the appropriate "table."

Part 7: Multiple Regression Analysis 7-28/54 n1 = number of predictors; n2 = sample size - number of predictors - 1
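A sketch of the overall F statistic built from R² with these degrees of freedom, using the two-variable gasoline regression reported a few slides later (R-Sq = 93.6%); the sample size of 52 is carried over from the earlier loglinear-model slide and is an assumption here:

```python
N, K = 52, 2                                   # assumed N; K = 2 predictors (logIncome, logPG)
r_squared = 0.936                              # R-Sq from the gasoline regression slide

f_stat = (r_squared / K) / ((1.0 - r_squared) / (N - K - 1))
# Compare f_stat with the F distribution with n1 = K and n2 = N - K - 1 degrees of freedom.
print(f_stat)
```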

Part 7: Multiple Regression Analysis 7-29/54 An Elaborate Multiple Loglinear Regression Model R²

Part 7: Multiple Regression Analysis 7-30/54 An Elaborate Multiple Loglinear Regression Model Overall F test for the model

Part 7: Multiple Regression Analysis 7-31/54 An Elaborate Multiple Loglinear Regression Model P value for overall F test

Part 7: Multiple Regression Analysis 7-32/54 Cost “Function” Regression The regression is “significant.” F is huge. Which variables are significant? Which variables are not significant?

Part 7: Multiple Regression Analysis 7-33/54 The F Test for the Model
- Determine the appropriate "critical" value from the table.
- Is the F from the computed model larger than the theoretical F from the table?
  - Yes: Conclude the relationship is significant.
  - No: Conclude R² = 0.

Part 7: Multiple Regression Analysis 7-34/54 Compare Sample F to Critical F
- Compute the F statistic for More Movie Madness.
- Compare it with the critical value from the table.
- The sample F exceeds the critical value: reject the hypothesis of no relationship.

Part 7: Multiple Regression Analysis 7-35/54 An Equivalent Approach
- What is the "P value"?
- We observed some value of F (whatever it is).
- If there really were no relationship, how likely is it that we would have observed an F this large (or larger)? This depends on N and K.
- That probability is reported with the regression results as the P value.
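A sketch showing that the table comparison and the P value are two routes to the same decision; scipy's F distribution stands in for the printed table, and the sample F and degrees of freedom are placeholders because the numbers were lost in the transcript:

```python
from scipy import stats

dfn = 20                                       # placeholder numerator df (number of predictors)
dfd = 2198 - dfn - 1                           # denominator df for the Movie Madness sample size

f_sample = 3.5                                 # placeholder for the F printed with the regression

f_critical = stats.f.ppf(0.95, dfn, dfd)       # 95th percentile: the "table" value
p_value = stats.f.sf(f_sample, dfn, dfd)       # chance of an F this large if there is no relationship

reject = f_sample > f_critical                 # equivalently, p_value < 0.05
```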

Part 7: Multiple Regression Analysis 7-36/54 The F Test for More Movie Madness
S =    R-Sq = 57.0%    R-Sq(adj) = 56.6%
Analysis of Variance
Source           DF    SS    MS    F    P
Regression
Residual Error
Total

Part 7: Multiple Regression Analysis 7-37/54 What About a Group of Variables?
- Is Genre significant? There are 12 genre variables. Some are "significant" (fantasy, mystery, horror); some are not. Can we conclude the group as a whole is?
- Maybe. We need a test.

Part 7: Multiple Regression Analysis 7-38/54 Application: Part of a Regression Model
- The regression model includes variables x_1, x_2, … I am sure of these variables.
- Maybe variables z_1, z_2, … I am not sure of these.
- Model: y = β_0 + β_1 x_1 + β_2 x_2 + δ_1 z_1 + δ_2 z_2 + ε
- Hypothesis: δ_1 = 0 and δ_2 = 0.
- Strategy: Start with the model including x_1 and x_2 and compute R². Then compute a new model that also includes z_1 and z_2.
- Rejection region: R² increases a lot.

Part 7: Multiple Regression Analysis 7-39/54 Theory for the Test
- A larger model has a higher R² than a smaller one. (A larger model means it has all the variables in the smaller one, plus some additional ones.)
- The test statistic on the next slide can be computed with a calculator.

Part 7: Multiple Regression Analysis 7-40/54 Test Statistic: F = [(R²_large - R²_small) / J] / [(1 - R²_large) / (N - K_large - 1)], where J is the number of added variables and K_large is the number of predictors in the larger model. Compare F with the F table using J numerator and N - K_large - 1 denominator degrees of freedom.
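A sketch of this statistic applied to the genre question from the later slides (R² = 57.0% with the 12 genre indicators, 55.4% without, n = 2198); the total number of predictors in the larger model is not preserved in the transcript, so K_large below is an assumption:

```python
from scipy import stats

N = 2198
r2_large, r2_small = 0.570, 0.554              # with and without the 12 genre indicators
J = 12                                         # number of variables being tested
K_large = 20                                   # assumed predictor count in the larger model

f_stat = ((r2_large - r2_small) / J) / ((1.0 - r2_large) / (N - K_large - 1))
f_critical = stats.f.ppf(0.95, J, N - K_large - 1)   # close to the 1.76 Minitab value on the later slide

print(f_stat, f_critical, f_stat > f_critical)       # reject if the statistic exceeds the critical value
```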

Part 7: Multiple Regression Analysis 7-41/54 Gasoline Market

Part 7: Multiple Regression Analysis 7-42/54 Gasoline Market
Regression Analysis: logG versus logIncome, logPG
The regression equation is logG = … + … logIncome + … logPG
Predictor    Coef   SE Coef   T   P
Constant
logIncome
logPG
S =    R-Sq = 93.6%    R-Sq(adj) = 93.4%
Analysis of Variance
Source           DF    SS    MS    F    P
Regression
Residual Error
Total
R² = Regression SS / Total SS =

Part 7: Multiple Regression Analysis 7-43/54 Gasoline Market
Regression Analysis: logG versus logIncome, logPG, ...
The regression equation is logG = … + … logIncome + … logPG + … logPNC + … logPUC + … logPPT
Predictor    Coef   SE Coef   T   P
Constant
logIncome
logPG
logPNC
logPUC
logPPT
S =    R-Sq = 96.0%    R-Sq(adj) = 95.6%
Analysis of Variance
Source           DF    SS    MS    F    P
Regression
Residual Error
Total
Now, R² = Regression SS / Total SS =
Previously, R² = Regression SS / Total SS =

Part 7: Multiple Regression Analysis 7-44/54 Improvement in R²
Inverse Cumulative Distribution Function: F distribution with 3 DF in numerator and 46 DF in denominator; P(X <= x) = 0.95, x =
The null hypothesis is rejected. Notice that none of the three individual variables are "significant," but the three of them together are.

Part 7: Multiple Regression Analysis 7-45/54 Is Genre Significant?
Calc -> Probability Distributions -> F…  The critical value shown by Minitab is 1.76.
With the 12 Genre indicator variables: R-Squared = 57.0%.
Without the 12 Genre indicator variables: R-Squared = 55.4%.
The F statistic exceeds the critical value. Reject the hypothesis that all the genre coefficients are zero.

Part 7: Multiple Regression Analysis 7-46/54 Application
- Health satisfaction depends on many factors: Age, Income, Children, Education, Marital Status.
- Do these factors figure differently in a model for women compared to one for men?
- Investigation: multiple regression.
- Null hypothesis: the regressions are the same.
- Rejection region: estimated regressions that are very different.

Part 7: Multiple Regression Analysis 7-47/54 Equal Regressions
- Setting: two groups of observations (men/women, countries, two different periods, firms, etc.).
- Regression model: y = β_0 + β_1 x_1 + β_2 x_2 + … + ε
- Hypothesis: the same model applies to both groups.
- Rejection region: large values of F.

Part 7: Multiple Regression Analysis 7-48/54 Procedure: Equal Regressions
- There are N_1 observations in Group 1 and N_2 in Group 2.
- There are K variables and the constant term in the model.
- This test requires you to compute three regressions and retain the sum of squared residuals from each:
  - SS1 = sum of squares from the N_1 observations in group 1
  - SS2 = sum of squares from the N_2 observations in group 2
  - SSALL = sum of squares from the N_ALL = N_1 + N_2 observations when the two groups are pooled
- The hypothesis of equal regressions is rejected if F is larger than the critical value from the F table (K + 1 numerator and N_ALL - 2K - 2 denominator degrees of freedom, since the constant and the K slopes are all restricted to be equal).
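A sketch of the computation with the three sums of squares; the group sizes and K come from the health satisfaction slides that follow, while the sums of squared residuals are placeholders because the numbers were lost in the transcript:

```python
from scipy import stats

N1, N2 = 13083, 14243                          # women and men, from the slides
K = 5                                          # AGE, EDUC, HHNINC, HHKIDS, MARRIED
N_all = N1 + N2                                # 27326 pooled observations

SS1, SS2, SS_all = 50000.0, 53000.0, 104000.0  # placeholder sums of squared residuals

num_df = K + 1                                 # constant plus K slopes restricted to be equal
den_df = N_all - 2 * K - 2                     # N_all - 2(K + 1)

f_stat = ((SS_all - SS1 - SS2) / num_df) / ((SS1 + SS2) / den_df)
p_value = stats.f.sf(f_stat, num_df, den_df)
reject_equal_regressions = p_value < 0.05
```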

Part 7: Multiple Regression Analysis 7-49/54 Health Satisfaction Models: Men vs. Women
German survey data over 7 years, 1984 to 1991 (with a gap): 27,326 observations on Health Satisfaction and several covariates.
Variable | Coefficient | Standard Error | T | P value | Mean of X
Women (NW = 13083): Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED
Men (NM = 14243): Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED
Both (NALL = 27326): Constant, AGE, EDUC, HHNINC, HHKIDS, MARRIED

Part 7: Multiple Regression Analysis 7-50/54 Computing the F Statistic
                                Women    Men    All
HEALTH Mean
Standard deviation
Number of observs.
Model size: Parameters
Degrees of freedom
Residuals: Sum of squares
Standard error of e
Fit: R-squared
Model test: F (P value)         (.000)   (.000)  (.0000)

Part 7: Multiple Regression Analysis 7-51/54 A Huge Theorem: R² always goes up when you add variables to your model. Always.

Part 7: Multiple Regression Analysis 7-52/54 The Adjusted R Squared
- Adjusted R² penalizes your model for obtaining its fit with lots of variables: Adjusted R² = 1 - [(N - 1)/(N - K - 1)] × (1 - R²).
- Adjusted R² is denoted R̄² ("R-bar squared").
- Adjusted R² is not the mean of anything and it is not a square. This is just a name.
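A one-line sketch of the adjustment applied to the More Movie Madness figures on the final slide (R² = 57.0%, n = 2198); K is an assumption chosen only to be roughly consistent with the quoted R-Sq(adj) of 56.6%:

```python
N, K = 2198, 20                                # K is assumed, not taken from the slides
r_squared = 0.570

adj_r_squared = 1.0 - ((N - 1) / (N - K - 1)) * (1.0 - r_squared)
print(round(adj_r_squared, 3))                 # about 0.566, in line with R-Sq(adj) = 56.6%
```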

Part 7: Multiple Regression Analysis 7-53/54 An Elaborate Multiple Loglinear Regression Model Adjusted R²

Part 7: Multiple Regression Analysis 7-54/54 Adjusted R² for More Movie Madness
S =    R-Sq = 57.0%    R-Sq(adj) = 56.6%
Analysis of Variance
Source           DF    SS    MS    F    P
Regression
Residual Error
Total
If N is very large, R² and Adjusted R² will not differ by very much. N = 2198 is quite large for this purpose.