
Chapter 14 Multiple Regression Models

2  Multiple Regression Models

A general additive multiple regression model, which relates a dependent variable y to k predictor variables x1, x2, …, xk, is given by the model equation

y = α + β1x1 + β2x2 + … + βkxk + e

The random deviation e is assumed to be normally distributed with mean value 0 and variance σ² for any particular values of x1, x2, …, xk. This implies that for fixed x1, x2, …, xk values, y has a normal distribution with variance σ², and

(mean y value for fixed x1, x2, …, xk values) = α + β1x1 + β2x2 + … + βkxk
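As a concrete sketch, the population regression function for k = 2 predictors can be coded directly; the coefficient values below are hypothetical, chosen only for illustration:

```python
# Population regression function for k = 2 predictors:
# (mean y) = alpha + beta1*x1 + beta2*x2
# The coefficient values here are hypothetical, for illustration only.
alpha, beta1, beta2 = 1.0, 0.5, -2.0

def mean_y(x1, x2):
    """Mean value of y for fixed x1, x2 under the additive model."""
    return alpha + beta1 * x1 + beta2 * x2

# For fixed predictor values, an observed y equals this mean plus a
# random deviation e assumed normal with mean 0 and variance sigma^2.
m = mean_y(4.0, 1.0)
```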

3  Multiple Regression Models

The βi's are called population regression coefficients; each βi can be interpreted as the true average change in y when the predictor xi increases by 1 unit and the values of all the other predictors remain fixed. The deterministic portion α + β1x1 + β2x2 + … + βkxk is called the population regression function.

4  Polynomial Regression Models

The kth-degree polynomial regression model

y = α + β1x + β2x² + … + βkx^k + e

is a special case of the general multiple regression model with x1 = x, x2 = x², …, xk = x^k. The population regression function (mean value of y for fixed values of the predictors) is α + β1x + β2x² + … + βkx^k.

The most important special case other than simple linear regression (k = 1) is the quadratic regression model y = α + β1x + β2x². This model replaces the line y = α + βx with a parabolic curve of mean values α + β1x + β2x². If β2 > 0, the curve opens upward, whereas if β2 < 0, the curve opens downward.
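A quadratic fit can be obtained by treating x and x² as two predictors in an ordinary least squares fit; a minimal sketch with made-up data:

```python
import numpy as np

# Quadratic regression y = alpha + beta1*x + beta2*x^2 as a special case
# of multiple regression with x1 = x and x2 = x^2 (made-up data).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 7.2, 12.8, 21.1])  # roughly 1 + x + x^2

# Design matrix with columns 1, x, x^2; solve for a, b1, b2 by least squares.
X = np.column_stack([np.ones_like(x), x, x ** 2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Here b2 > 0, so the fitted parabola of mean values opens upward.
```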

5  Interaction

If the change in the mean y value associated with a 1-unit increase in one independent variable depends on the value of a second independent variable, there is interaction between these two variables. When the variables are denoted by x1 and x2, such interaction can be modeled by including x1x2, the product of the variables that interact, as a predictor variable.
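Including the product column in the design matrix is all that is needed to model interaction; a sketch with made-up, noise-free data:

```python
import numpy as np

# Interaction: the effect of a 1-unit increase in x1 depends on x2.
# Model it by adding the product x1*x2 as a predictor (made-up data).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 3.0 * x1 * x2  # true interaction coef = 3

X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coefs = np.linalg.lstsq(X, y, rcond=None)[0]
# coefs[3] estimates the coefficient on the x1*x2 product term.
```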

6  Qualitative Predictor Variables

Up to now, we have considered only quantitative (numerical) predictor variables in a multiple regression model. Qualitative (categorical) predictors can also be included, and two types are very common:

- Dichotomous variables: those with just two possible categories, coded 0 and 1. Examples: gender {male, female}; marital status {married, not married}.
- Ordinal variables: categorical variables that have a natural ordering. Examples: activity level {light, moderate, heavy} coded respectively 1, 2, and 3; education level {none, elementary, secondary, college, graduate} coded respectively 1, 2, 3, 4, 5 (or, for that matter, any 5 consecutive integers).
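A minimal sketch of this coding step; the category labels and code mappings below are illustrative:

```python
# Coding qualitative predictors numerically (illustrative mappings).
# Dichotomous variables: two categories coded 0 and 1.
gender_code = {"female": 0, "male": 1}
smoking_code = {"no": 0, "yes": 1}

# Ordinal variable: categories with a natural ordering get ordered codes.
activity_code = {"light": 1, "moderate": 2, "heavy": 3}

# A record of raw categories becomes a numeric row for the design matrix.
record = {"gender": "male", "smoking": "no", "activity": "heavy"}
row = [gender_code[record["gender"]],
       smoking_code[record["smoking"]],
       activity_code[record["activity"]]]
```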

7  Least Squares Estimates

According to the principle of least squares, the fit of a particular estimated regression function a + b1x1 + b2x2 + … + bkxk to the observed data is measured by the sum of squared deviations between the observed y values and the y values predicted by the estimated function:

Σ[y − (a + b1x1 + b2x2 + … + bkxk)]²

The least squares estimates of α, β1, β2, …, βk are those values of a, b1, b2, …, bk that make this sum of squared deviations as small as possible.
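The least squares criterion can be checked numerically: the fitted coefficients yield a sum of squared deviations no larger than that of any perturbed coefficients. A sketch with made-up data and k = 2:

```python
import numpy as np

# Least squares: choose a, b1, b2 minimizing
# sum of [y - (a + b1*x1 + b2*x2)]^2 over the data (made-up values, k = 2).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([4.1, 4.9, 9.2, 9.8, 14.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

sse = float(np.sum((y - (a + b1 * x1 + b2 * x2)) ** 2))
# Perturbing any estimate can only increase the criterion:
sse_perturbed = float(np.sum((y - (a + (b1 + 0.1) * x1 + b2 * x2)) ** 2))
```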

8  Predicted Values & Residuals

The predicted (fitted) values ŷ1, ŷ2, …, ŷn result from substituting each observation's predictor values into the estimated regression function; the corresponding residuals are the differences y1 − ŷ1, y2 − ŷ2, …, yn − ŷn.

9  Sums of Squares

The residual sum of squares is SSResid = Σ(y − ŷ)², and the total sum of squares is SSTo = Σ(y − ȳ)², where ȳ is the mean of the observed y values.

10  Estimate for σ²

The statistic se² = SSResid/(n − (k + 1)) is used to estimate σ², where n is the number of observations and k is the number of predictors; its square root se is the estimated standard deviation.

11  Coefficient of Multiple Determination, R²

R² = 1 − SSResid/SSTo is the proportion of the observed y variation that is explained by the model.

12  Adjusted R²

Adjusted R² = 1 − [(n − 1)/(n − (k + 1))](SSResid/SSTo). Generally, a model with large R² and small se is desirable. If a large number of variables (relative to the number of data points) is used, those conditions may be satisfied, but the model will be unrealistic and difficult to interpret.
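These fit summaries can be computed together from the sums of squares; the observed and predicted values below are made up for illustration, with k = 2 predictors assumed:

```python
import numpy as np

# Fit summaries from the sums of squares; y and y_hat are made up,
# with k = 2 predictors assumed.
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_hat = np.array([2.2, 3.6, 5.1, 4.1, 5.0])  # predicted values
n, k = len(y), 2

ss_resid = float(np.sum((y - y_hat) ** 2))   # residual sum of squares
ss_to = float(np.sum((y - y.mean()) ** 2))   # total sum of squares

r_sq = 1 - ss_resid / ss_to                  # coefficient of determination
s_e = (ss_resid / (n - (k + 1))) ** 0.5      # estimated standard deviation
adj_r_sq = 1 - (ss_resid / (n - (k + 1))) / (ss_to / (n - 1))
# Adjusted R^2 penalizes extra predictors, so adj_r_sq <= r_sq.
```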

13  F Distributions

F distributions are similar to chi-square distributions but have two parameters, dfnum and dfden (the numerator and denominator degrees of freedom).

14  The F Test for Model Utility

The regression sum of squares, denoted by SSReg, is defined by SSReg = SSTo − SSResid.
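Given the sums of squares, the model utility F statistic is a one-line computation; the n, k, and sums-of-squares values below are made up for illustration:

```python
# F statistic for model utility from the sums of squares:
# SSReg = SSTo - SSResid,  F = (SSReg / k) / (SSResid / (n - (k + 1))).
# The n, k, and sums of squares below are made up for illustration.
n, k = 40, 4
ss_to, ss_resid = 60.0, 10.0

ss_reg = ss_to - ss_resid
f_stat = (ss_reg / k) / (ss_resid / (n - (k + 1)))
# A large F value favors Ha: at least one beta_i is nonzero.
```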

15  The F Test for Model Utility

The value of the test statistic is F = (SSReg/k)/(SSResid/(n − (k + 1))), where n is the number of observations and k is the number of predictors.

16  The F Test for Utility of the Model y = α + β1x1 + β2x2 + … + βkxk + e

Null hypothesis: H0: β1 = β2 = … = βk = 0 (There is no useful linear relationship between y and any of the predictors.)

Alternate hypothesis: Ha: At least one among β1, β2, …, βk is not zero (There is a useful linear relationship between y and at least one of the predictors.)

17  The F Test for Utility of the Model y = α + β1x1 + β2x2 + … + βkxk + e

18  The F Test for Utility of the Model y = α + β1x1 + β2x2 + … + βkxk + e

The test is upper-tailed, and the table of values that captures specified upper-tail F curve areas is used to obtain a bound or bounds on the P-value, using numerator df = k and denominator df = n − (k + 1).

Assumptions: For any particular combination of predictor variable values, the distribution of e, the random deviation, is normal with mean 0 and constant variance.

19  An Example

During a summer NSF program for teachers of statistics, the participants were asked to break into groups and develop a project similar in scope to what we would like our students to develop. One of these groups decided to study the lung capacity of adult humans, measured in liters. The sample was not particularly easy to obtain on campus during the summer, so we "shanghaied" everyone who was willing to stand still, be measured, and be interviewed, and collected the data using borrowed equipment (an antique liquid-displacement apparatus).

20  An Example

This group recorded a number of variables, including gender (m or f), age (yrs), height (in), weight (lbs), waist (in), chest girth (in), smoking (Y or N), and activity level (1 - light, 2 - medium, 3 - heavy), along with the lung capacity (liters).

The code for gender is 0 = Female, 1 = Male. The code for smoking is 0 = No, 1 = Yes. The data follow on the next slides.

21 An Example - The Data

22 An Example - The Data

23 An Example - The Data

24  Analysis - 1st with Minitab

Regression Analysis: Capacity versus Age, Height, ...

The regression equation relates Capacity to Age, Height, Weight, Chest, Waist, Activity, Smoke, and Gender. 40 cases used; 1 case contains missing values.

[Predictor table with Coef, SE Coef, T, and P for Constant, Age, Height, Weight, Chest, Waist, Activity, Smoke, and Gender, and the value of S, not preserved in this transcript]

R-Sq = 84.3%  R-Sq(adj) = 80.2%

25  Analysis - 2nd with Minitab

Notice that the P-values suggest that only the predictors height (P-value = 0.000) and activity level (P-value = 0.012) are significant at the 0.05 level of significance. The only other variables that seem possibly significant are age and gender (P-value = 0.148). When stepwise regression techniques are applied using Minitab, the variables that remain significant are height, activity level, age, and gender. The output is on the next two slides.

26  Analysis - 2nd with Minitab

Stepwise Regression: Capacity versus Age, Height, ...
Alpha-to-Enter: 0.1  Alpha-to-Remove: 0.1
Response is Capacity on 8 predictors, with N = 40. N (cases with missing observations) = 1; N (all cases) = 41.

[Step-by-step table not preserved: Height entered first, followed by Activity, with the constant, T-values, and P-values at each step]

27  Analysis - 2nd with Minitab

[Continuation of the stepwise table: Activity, Age, and Gender entered with their T-values and P-values, followed by S, R-Sq, R-Sq(adj), and C-p at each step; the numerical values were not preserved]

28  Analysis - 2nd with Minitab

The resulting Minitab output from the regression analysis using those 4 predictors follows.

Regression Analysis: Capacity versus Height, Activity, Gender, Age

The regression equation relates Capacity to Height, Activity, Gender, and Age. 40 cases used; 1 case contains missing values.

[Predictor table with Coef, SE Coef, T, and P, and the value of S, not preserved in this transcript]

R-Sq = 83.2%  R-Sq(adj) = 81.3%

29  Analysis - 2nd with Minitab

Consider the following graphs: the plot of residuals versus fitted values and the normal probability plot of the residuals.

30 Analysis - 2 nd with Minitab

31  Analysis - 2nd with Minitab

Notice that both of these graphs appear to indicate that the assumptions made were justifiable. This multiple linear regression model appears to provide a reasonably acceptable means of estimating lung capacity.

32  Analysis - 3rd with Minitab

A number of the members of the project team felt that other variables, specifically the height/weight and chest/waist ratios, as well as the square of the chest girth multiplied by the height, might be better predictor variables. When these three combination variables were calculated and added to height, activity level, age, and gender, the following Minitab output was obtained.

33  Analysis - 3rd with Minitab

Regression Analysis: Capacity versus Height, Activity, ...

The regression equation relates Capacity to Height, Activity, Gender, Age, HT/WT, CH/Waist, and c2h. 40 cases used; 1 case contains missing values.

[Predictor table with Coef, SE Coef, T, and P, and the value of S, not preserved in this transcript]

R-Sq = 83.6%  R-Sq(adj) = 80.0%

34  Analysis - 3rd with Minitab

None of these three variables appeared to be significant. The fact that girth² × height, which would be approximately proportional to the volume of the body, was not significant came as a surprise to the members of the team. As a side note, the literature on spirography suggests that height is the most significant factor in lung capacity, and this is what this particular study indicated after it was completely analyzed.