CHAPTER 29: Multiple Regression*


Basic Practice of Statistics, 7th Edition, Lecture PowerPoint Slides

In Chapter 29, We Cover …
- Parallel regression lines
- Estimating parameters
- Using technology
- Inference for multiple regression
- Interaction
- The general multiple linear regression model
- The woes of regression coefficients
- Inference for regression parameters
- Checking the conditions for inference

Introduction When a scatterplot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we fit a regression line to the data to describe the relationship. Previously, we did regression with just one explanatory variable—we will now call this simple linear regression to remind us that this is a special case. In some cases, other explanatory variables might improve our understanding of the response y and help us to better predict y. We now explore the more general case of multiple regression, which allows for several explanatory variables to combine in explaining a response variable.

Parallel Regression Lines
Consider a scatterplot that shows two parallel straight lines linking y to x1 (one line, μy = β0 + β1x1, for each of two groups). An indicator variable (x2) can be added to the regression equation to denote the two categories:
μy = β0 + β1x1 + β2x2
Indicator variable: an indicator variable places individuals into one of two categories, usually coded by the values 0 and 1.

Parallel Regression Lines
Model with indicator variable (x2): μy = β0 + β1x1 + β2x2
- when x2 = 0, μy = β0 + β1x1
- when x2 = 1, μy = β0 + β1x1 + β2 = (β0 + β2) + β1x1
Note that the slopes (β1) are the same, but the intercepts may differ (the difference is determined by β2).

Example Percent of possible jurors reporting for jury duty based on the reporting date (reporting date is coded as “1” for the first two weeks of the year up to “26” for the last two weeks of the year), for 1998 and 2000:

Example
Lines fitted separately, without using the indicator variable:
ŷ = 95.571 − 0.765x1
ŷ = 76.426 − 0.668x1

Example
Lines using the indicator variable to force equal slopes:
overall model: ŷ = 77.082 − 0.717x1 + 17.833x2
ŷ = 94.915 − 0.717x1 (when x2 = 1)
ŷ = 77.082 − 0.717x1 (when x2 = 0)
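As a quick arithmetic check of how the indicator variable works, the two group lines can be recovered from the fitted coefficients of the overall model above by plugging in x2 = 1 and x2 = 0:

```python
# Fitted coefficients from the equal-slopes (indicator-variable) model above.
b0, b1, b2 = 77.082, -0.717, 17.833

def predict(x1, x2):
    """Predicted percent of possible jurors reporting, given reporting date x1."""
    return b0 + b1 * x1 + b2 * x2

# The x2 = 1 group has intercept b0 + b2; both groups share the slope b1.
intercept_group1 = predict(0, 1)              # 77.082 + 17.833 = 94.915
intercept_group0 = predict(0, 0)              # 77.082
shared_slope = predict(1, 0) - predict(0, 0)  # -0.717 for either group
```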

Estimating Parameters
How shall we estimate the β's in the model μy = β0 + β1x1 + β2x2? The method of least squares obtains estimates of the βi's (denoted bi's) by choosing the values that minimize the sum of squared deviations in the y-direction:
Σ(observed y − predicted y)² = Σ(y − ŷ)²
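The least-squares criterion can be illustrated with a small sketch. The data below are made up to lie exactly on the plane y = 1 + 2x1 + 3x2, so the sum of squared deviations is zero at those coefficients and larger for any other choice:

```python
# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
data = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]

def ssd(b0, b1, b2):
    """Sum of squared deviations between observed and predicted y."""
    return sum((y - (b0 + b1 * x1 + b2 * x2)) ** 2 for x1, x2, y in data)

best = ssd(1, 2, 3)     # 0.0 -- exact data, so the true coefficients minimize SSD
worse = ssd(1.5, 2, 3)  # any other coefficients give a larger SSD (here 1.25)
```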

Estimating Parameters
The differences between the actual y-values and the predicted y-values are called residuals; they estimate the "left-over" variation about the regression model. The remaining parameter to estimate is σ, the standard deviation of the response variable y about the mean (assumed the same for all combinations of the x's). The standard deviation s of the residuals is used to estimate σ; s is also called the regression standard error.

Estimating Parameters
Regression standard error: for the multiple regression model ŷ = b0 + b1x1 + b2x2,
s = √[ (1 / (n − 3)) Σ residual² ] = √[ (1 / (n − 3)) Σ(y − ŷ)² ]
Use s to estimate the standard deviation σ of the responses about the mean given by the population regression model. Here, note that we are estimating three β parameters (by b0, b1, and b2); this makes our denominator (and the degrees of freedom for the regression standard error) n − 3. In general, this will be n − (the number of β parameters estimated).
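A minimal sketch of this computation, using hypothetical residuals from a two-explanatory-variable model (so the degrees of freedom are n − 3):

```python
import math

# Hypothetical residuals y - yhat from a model with two explanatory variables.
residuals = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0]
n = len(residuals)  # n = 6 observations
df = n - 3          # three beta parameters estimated (b0, b1, b2)

# Regression standard error: sqrt of sum of squared residuals over df.
s = math.sqrt(sum(r ** 2 for r in residuals) / df)
```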

Using Technology
Example of output from using technology (Potential Jurors data):
- Parameter estimates
- Standard errors
- ANOVA table
  - Sum of squares (SS) due to the model
  - Sum of squares due to error
  - Total SS = model SS + error SS
- Squared multiple correlation coefficient (R²)

R²
R² tells us what proportion of the variation in the response variable y is explained by using the set of explanatory variables in the multiple regression model. The squared multiple correlation coefficient (R²) is the square of the correlation coefficient between the observed responses y and the predicted responses ŷ; it is also equal to:
R² = (variability explained by model) / (total variability in y) = (model sum of squares) / (total sum of squares)
R² is almost always given with a regression model to describe the fit of the model to the data.
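With sums of squares read off a (hypothetical) ANOVA table, R² is simply the model's share of the total:

```python
# Hypothetical sums of squares from a regression ANOVA table.
model_ss = 80.0
error_ss = 20.0
total_ss = model_ss + error_ss   # total SS = model SS + error SS

r_squared = model_ss / total_ss  # proportion of variation in y explained
```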

Inference for Multiple Regression
Conditions:
- Linear trend (the model is correct): scatterplots of y vs. each xi show linear patterns.
- Normality: residuals are symmetric about 0 and approximately Normal.
- Constant variance (σ is the same for all values of the x's): a plot of residuals vs. ŷ shows an unstructured pattern with approximately equal spread in the y-direction.
- Independence: observations are not dependent on previous observations; a residual plot shows no pattern based on the order of the observations.

Potential Jurors – Checking Conditions Example (a) Linear trend Potential Jurors – Checking Conditions (c) Equal variance (b) Normality (d) Independence

Inference for Multiple Regression
The analysis of variance F statistic tests the null hypothesis that all of the regression coefficients (the β's), except β0, are equal to zero. It has the form
F = (variation due to model) / (variation due to error) = (model mean square) / (error mean square)

Inference for Multiple Regression
If the overall F test is significant, then we may want to know which individual parameters are different from zero. To test the null hypothesis that one of the β's in a specific regression model is zero, compute the t statistic:
t = (parameter estimate) / (standard error of estimate) = b / SE_b
If the conditions for inference are met, then the t distribution with n − 3 degrees of freedom can be used to compute confidence intervals and conduct hypothesis tests for β0, β1, and β2.
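The t statistic itself is a one-line computation; the estimate and standard error below are made up for illustration (in practice they are read off the regression output):

```python
# Hypothetical coefficient estimate and its standard error from software output.
b = 2.5
se_b = 0.5

# Compare t to a t distribution with n - 3 degrees of freedom
# (for a model with two explanatory variables).
t = b / se_b
```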

Interaction
Parallel linear patterns for two categories are somewhat rare; it is more common to see two linear patterns that are not parallel. An interaction term (x1x2) can be added to the regression equation to allow for unequal slopes:
μy = β0 + β1x1 + β2x2 + β3x1x2
Interaction means that the relationship between the mean response and one explanatory variable x1 changes when we change the value of the other explanatory variable x2.

Interaction
Model with interaction term (x1x2): μy = β0 + β1x1 + β2x2 + β3x1x2
- when x2 = 0, μy = β0 + β1x1
- when x2 = 1, μy = β0 + β1x1 + β2 + β3x1 = (β0 + β2) + (β1 + β3)x1
Note that, in addition to the intercepts being different, the slopes are now different as well (the difference in slopes is determined by β3).
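A sketch with made-up coefficients showing how the interaction term shifts both the intercept and the slope for the x2 = 1 group:

```python
# Hypothetical coefficients for the interaction model.
b0, b1, b2, b3 = 2.0, 3.0, 5.0, 1.5

def mean_response(x1, x2):
    """mu_y = b0 + b1*x1 + b2*x2 + b3*x1*x2"""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# Slope of each group's line = change in mean response per unit increase in x1.
slope_group0 = mean_response(1, 0) - mean_response(0, 0)  # b1 = 3.0
slope_group1 = mean_response(1, 1) - mean_response(0, 1)  # b1 + b3 = 4.5
intercept_group1 = mean_response(0, 1)                    # b0 + b2 = 7.0
```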

The General Multiple Linear Regression Model
THE MULTIPLE LINEAR REGRESSION MODEL
- We have observations on n individuals. Each observation consists of values of p explanatory variables x1, x2, …, xp and a response variable y. Our goal is to study or predict the behavior of y given the values of the explanatory variables.
- For any set of fixed values of the explanatory variables, the response y varies according to a Normal distribution. Repeated responses y are independent of each other.
- The mean response μy has a linear relationship given by the population regression model
  μy = β0 + β1x1 + β2x2 + ⋯ + βpxp
  The βi's are unknown parameters.
- The standard deviation of y (call it σ) is the same for all values of the explanatory variables. The value of σ is unknown.
This model has p + 2 parameters that we must estimate from data: the p + 1 coefficients β0, β1, …, βp and the standard deviation σ.

The Woes of Regression Coefficients When we start to explore models with several explanatory variables, we quickly meet the big new idea of multiple regression in practice: The relationship between the response y and any one explanatory variable can change greatly depending on what other explanatory variables are present in the model.

Inference for Regression Parameters ANOVA Table:

Inference for Regression Parameters
The first formal test in most multiple regression studies is the ANOVA F test, which checks whether the complete set of explanatory variables is helpful in predicting the response variable. The analysis of variance F statistic for testing the null hypothesis that all of the regression coefficients (the β's), except β0, are equal to zero has the form
F = (variation due to model) / (variation due to error)
P-values come from the F distribution with p and n − p − 1 degrees of freedom.
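Computed from a hypothetical ANOVA table for a model with p = 2 explanatory variables and n = 28 observations, the F statistic is the ratio of mean squares:

```python
# Hypothetical ANOVA quantities for a model with p = 2 predictors, n = 28.
p, n = 2, 28
model_ss = 1500.0
error_ss = 500.0

model_ms = model_ss / p            # model mean square (df = p)
error_ms = error_ss / (n - p - 1)  # error mean square (df = n - p - 1 = 25)
f_stat = model_ms / error_ms       # compare to F(p, n - p - 1)
```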

Inference for Regression Parameters
Remember that an individual t test assesses the contribution of its variable in the presence of the other variables in this specific model.
Confidence intervals for coefficients: a level C confidence interval for the regression coefficient β is b ± t* SE_b. The critical value t* is obtained from the t(n − p − 1) distribution.

Inference for Regression Parameters
Individual t tests for coefficients: the t statistic for testing the null hypothesis that a regression coefficient β is equal to zero has the form
t = (parameter estimate) / (standard error of estimate) = b / SE_b
In terms of a random variable T having the t(n − p − 1) distribution, the P-value for a test of H0 against
- Ha: β > 0 is P(T ≥ t)
- Ha: β < 0 is P(T ≤ t)
- Ha: β ≠ 0 is 2P(T ≥ |t|)

Inference for Regression Parameters
CONFIDENCE AND PREDICTION INTERVALS FOR MULTIPLE REGRESSION RESPONSE
- A level C confidence interval for the mean response, μy, is ŷ ± t* SE_μ̂.
- A level C prediction interval for a single response, y, is ŷ ± t* SE_ŷ.
In both intervals, t* is the critical value for the t(n − p − 1) density curve with area C between −t* and t*.
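Both intervals have the same form; the prediction interval is wider because the standard error for a single response exceeds that for the mean response. The values below (predicted response, standard errors, and t critical value) are all hypothetical stand-ins for software output:

```python
# Hypothetical regression output for a new set of x-values.
y_hat = 50.0    # predicted response
se_mu = 1.2     # standard error for estimating the mean response
se_y = 3.5      # standard error for predicting a single response (larger)
t_star = 2.052  # t critical value for level C with n - p - 1 df

conf_int = (y_hat - t_star * se_mu, y_hat + t_star * se_mu)  # for mu_y
pred_int = (y_hat - t_star * se_y, y_hat + t_star * se_y)    # for a single y
```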

Checking the Conditions for Inference
- Plot the response variable against each of the explanatory variables.
- Plot the residuals against the predicted values and against all of the explanatory variables in the model. Look for outliers and influential observations in all residual plots.
- Ideally, we would like the explanatory variables to be independent of one another and the observations on the response variable to be independent.
- To check the condition that the response varies Normally about the multiple regression model, make a histogram or stemplot of the residuals.