Copyright © 2010, 2007, 2004 Pearson Education, Inc. *Chapter 29 Multiple Regression.

*Chapter 29 Multiple Regression

Just Do It
The method of least squares can be expanded to include more than one predictor. The method is known as multiple regression. For simple regression we found the least squares solution, the one whose coefficients made the sum of the squared residuals as small as possible. For multiple regression, we'll do the same thing, but this time with more coefficients.
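As a sketch of what that computation does with two predictors, numpy's least squares routine finds the coefficients that minimize the sum of squared residuals. The numbers below are made up for illustration, not the chapter's real body-fat data set:

```python
import numpy as np

# Hypothetical data: % body fat, waist size (in), height (in) for six men.
body_fat = np.array([12.0, 22.5, 18.0, 27.1, 15.3, 30.2])
waist    = np.array([32.0, 36.0, 34.0, 38.0, 33.0, 40.0])
height   = np.array([70.0, 68.0, 72.0, 69.0, 74.0, 67.0])

# Design matrix: a column of 1s for the intercept plus one column per predictor.
X = np.column_stack([np.ones_like(waist), waist, height])

# Least squares picks the coefficients minimizing the sum of squared residuals.
coefs, _, _, _ = np.linalg.lstsq(X, body_fat, rcond=None)
b0, b_waist, b_height = coefs

fitted = X @ coefs
residuals = body_fat - fitted
print(f"intercept={b0:.2f}, waist coef={b_waist:.2f}, height coef={b_height:.2f}")
print(f"sum of squared residuals: {np.sum(residuals**2):.3f}")
```

With more predictors the design matrix simply gains more columns; the fitting criterion is unchanged.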

Just Do It (cont.)
You should recognize most of the numbers in the following example (%body fat) of a multiple regression table. Most of them mean what you expect them to.

So What's New?
The meaning of the coefficients in the regression model has changed in a subtle but important way. Multiple regression is an extraordinarily versatile calculation, underlying many widely used Statistics methods. Multiple regression offers our first glimpse into statistical methods that use more than two quantitative variables.

What Multiple Regression Coefficients Mean
We said that height might be important in predicting body fat in men. What's the relationship between %body fat and height in men? Here's the scatterplot:

What Multiple Regression Coefficients Mean (cont.)
It doesn't look like height tells us much about %body fat. Or does it? The coefficient of height in the multiple regression model was statistically significant, so it did contribute to the multiple regression model. How can this be? The multiple regression coefficient of height takes account of the other predictor (waist size) in the regression model.

What Multiple Regression Coefficients Mean (cont.)
For example, when we restrict our attention to men with waist sizes between 36 and 38 inches (points in blue), we can see a relationship between %body fat and height:

What Multiple Regression Coefficients Mean (cont.)
So, overall there's little relationship between %body fat and height, but when we focus on particular waist sizes there is a relationship. This relationship is conditional because we've restricted our set to only those men with a certain range of waist sizes. For men with that waist size, an extra inch of height is associated with a decrease of about 0.60% in body fat. If that relationship is consistent for each waist size, then the multiple regression coefficient will estimate it.
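To see how a predictor can matter conditionally while looking useless marginally, here is a small simulation (the numbers are invented, not the chapter's data). Taller men are given slightly larger waists on average, and body fat rises with waist but falls with height at a partial slope of −0.6. Regressed on height alone the slope is near zero, yet the multiple regression recovers the −0.6:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical population: waist size drifts up with height,
# and body fat = -20 + 1.7*waist - 0.6*height + noise.
height = rng.normal(70, 3, n)
waist = 0.35 * height + rng.normal(0, 1.5, n)
fat = -20 + 1.7 * waist - 0.6 * height + rng.normal(0, 2, n)

# Simple regression of fat on height alone: slope near zero,
# because the waist effect masks the direct height effect.
marginal_slope = np.polyfit(height, fat, 1)[0]

# Multiple regression with both predictors recovers the -0.6 partial slope.
X = np.column_stack([np.ones(n), waist, height])
coefs, *_ = np.linalg.lstsq(X, fat, rcond=None)
partial_slope = coefs[2]

print(f"height slope alone: {marginal_slope:.2f}")
print(f"height slope given waist: {partial_slope:.2f}")
```

The marginal slope is flat not because height is irrelevant but because its direct effect is cancelled by its association with waist size, which is exactly the distinction the slide is drawing.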

What Multiple Regression Coefficients Mean (cont.)
The following partial regression plot for height displays this conditional relationship; its slope is exactly the height coefficient from the multiple regression model:

The Multiple Regression Model
For a multiple regression with k predictors, the model is:
y = β0 + β1x1 + β2x2 + … + βkxk + ε
The assumptions and conditions for the multiple regression model sound nearly the same as for simple regression, but with more variables in the model, we'll have to make a few changes.

Assumptions and Conditions
Linearity Assumption: Straight Enough Condition: Check the scatterplot for each candidate predictor variable—the shape must not be obviously curved or we can't consider that predictor in our multiple regression model.
Independence Assumption: Randomization Condition: The data should arise from a random sample or randomized experiment. Also, check the residuals plot—the residuals should appear to be randomly scattered.

Assumptions and Conditions (cont.)
Equal Variance Assumption: Does the Plot Thicken? Condition: Check the residuals plot—the spread of the residuals should be uniform.

Assumptions and Conditions (cont.)
Normality Assumption: Nearly Normal Condition: Check a histogram of the residuals—the distribution of the residuals should be unimodal and symmetric, and the Normal probability plot should be straight.

Assumptions and Conditions (cont.)
Summary of the checks of conditions in order:
1. Check the Straight Enough Condition with scatterplots of the y-variable against each x-variable.
2. If the scatterplots are straight enough, fit a multiple regression model to the data.
3. Find the residuals and predicted values.
4. Make and check a scatterplot of the residuals against the predicted values. This plot should look patternless.

Assumptions and Conditions (cont.)
Summary of the checks of conditions in order:
5. Think about how the data were collected. Randomization? Representative? Plot residuals against time—patterns?
6. If the conditions check out this far, feel free to interpret the regression model and use it for prediction.
7. If you wish to test hypotheses about the coefficients or about the overall regression, then make a histogram and Normal probability plot of the residuals to check the Nearly Normal Condition.
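Steps 3 and 4 of the checklist can be sketched numerically. One fact worth knowing: least-squares residuals are automatically uncorrelated with the predicted values, so any pattern you do see in the residuals-vs-predicted plot signals a violated condition, not an artifact of the fit. A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical straight-enough data for a two-predictor model.
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 3 + 2 * x1 - 1 * x2 + rng.normal(0, 1, n)

# Fit the model, then find residuals and predicted values (steps 2-3).
X = np.column_stack([np.ones(n), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coefs
residuals = y - predicted

# Step 4: the residuals-vs-predicted plot should look patternless.
# By construction, the linear correlation here is zero; curvature or
# thickening would still be visible in the plot despite that.
print(f"mean residual: {residuals.mean():.2e}")
print(f"corr(residuals, predicted): {np.corrcoef(residuals, predicted)[0, 1]:.2e}")
```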

Multiple Regression Inference I: I Thought I Saw an ANOVA Table…
Now that we have more than one predictor, there's an overall test we should consider before we do more inference on the coefficients. We ask the global question "Is this multiple regression model any good at all?" We test
H0: β1 = β2 = … = βk = 0 against HA: at least one βj ≠ 0.
The F-statistic and associated P-value from the ANOVA table are used to answer our question.
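The F-statistic compares the regression mean square to the residual mean square, with k and n − k − 1 degrees of freedom. A sketch with simulated data (hypothetical numbers; scipy supplies the F tail probability):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 100, 2  # n cases, k predictors

# Hypothetical data in which the predictors really do explain y.
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1 + 0.8 * x1 + 0.5 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coefs

sse = np.sum((y - fitted) ** 2)          # residual sum of squares
ssr = np.sum((fitted - y.mean()) ** 2)   # regression sum of squares

# ANOVA F: explained mean square over residual mean square.
F = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)
print(f"F = {F:.1f}, P-value = {p_value:.2g}")
```

A tiny P-value here rejects H0, licensing a look at the individual coefficients next.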

Multiple Regression Inference II: Testing the Coefficients
Once we check the F-test and reject the null hypothesis, we can move on to checking the test statistics for the individual coefficients. For each coefficient, we test
H0: βj = 0 against HA: βj ≠ 0
using the ratio t = bj / SE(bj). If the assumptions and conditions are met (including the Nearly Normal Condition), these ratios follow a Student's t-distribution with n − k − 1 degrees of freedom.

Multiple Regression Inference II: Testing the Coefficients (cont.)
We can also find a confidence interval for each coefficient:
bj ± t* × SE(bj), where t* comes from the Student's t-model with n − k − 1 degrees of freedom.
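The t-ratios and intervals can be computed directly from the design matrix: SE(bj) is the square root of the j-th diagonal entry of s²(XᵀX)⁻¹, where s² is the estimated error variance. A sketch with simulated (made-up) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 100, 2

x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1 + 0.8 * x1 + 0.0 * x2 + rng.normal(0, 1, n)  # x2 adds nothing given x1

X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
coefs = XtX_inv @ X.T @ y            # normal-equations solution
residuals = y - X @ coefs

df = n - k - 1
s2 = residuals @ residuals / df      # estimated error variance
se = np.sqrt(s2 * np.diag(XtX_inv))  # SE(b_j) for each coefficient

t = coefs / se                        # t-ratios for H0: beta_j = 0
p = 2 * stats.t.sf(np.abs(t), df)
tstar = stats.t.ppf(0.975, df)        # multiplier for 95% intervals
for name, b, s, pv in zip(["intercept", "x1", "x2"], coefs, se, p):
    print(f"{name}: b={b:.3f}, SE={s:.3f}, P={pv:.3g}, "
          f"CI=({b - tstar * s:.3f}, {b + tstar * s:.3f})")
```

This is the same table any statistics package prints: estimate, standard error, t-ratio, and P-value, one row per coefficient.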

Multiple Regression Inference II: Testing the Coefficients (cont.)
Keep in mind that the meaning of a multiple regression coefficient depends on all the other predictors in the multiple regression model. If we fail to reject the null hypothesis for a multiple regression coefficient, it does not mean that the corresponding predictor variable has no linear relationship to y. It means that the corresponding predictor contributes nothing to modeling y after allowing for all the other predictors.

How's That, Again?
It looks like each coefficient in the multiple regression tells us the effect of its associated predictor on the response variable. But, that is not so. The coefficient of a predictor in a multiple regression depends as much on the other predictors as it does on the predictor itself.

Comparing Multiple Regression Models
How do we know that some other choice of predictors might not provide a better model? What exactly would make an alternative model better? These questions are not easy—there's no simple measure of the success of a multiple regression model.

Comparing Multiple Regression Models (cont.)
Regression models should make sense. Predictors that are easy to understand are usually better choices than obscure variables. Similarly, if there is a known mechanism by which a predictor has an effect on the response variable, that predictor is usually a good choice for the regression model. The simple answer is that we can't know whether we have the best possible model.

Adjusted R²
There is another statistic in the full regression table called the adjusted R². This statistic is a rough attempt to adjust for the simple fact that when we add another predictor to a multiple regression, the R² can't go down and will most likely get larger. This fact makes it difficult to compare alternative regression models that have different numbers of predictors.

Adjusted R² (cont.)
The formula for R² is
R² = SSR/SST = 1 − SSE/SST,
while the formula for adjusted R² is
adjusted R² = 1 − (SSE/(n − k − 1)) / (SST/(n − 1)),
where SSE is the residual sum of squares, SSR the regression sum of squares, SST the total sum of squares, and k the number of predictors.

Adjusted R² (cont.)
Because the mean squares are sums of squares divided by their degrees of freedom, they are adjusted for the number of predictors in the model. As a result, the adjusted R² value won't necessarily increase when a new predictor is added to the multiple regression model. That's fine, but adjusted R² no longer tells the fraction of variability accounted for by the model—it isn't even bounded by 0 and 100%.
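The effect is easy to demonstrate: adding a predictor that is pure noise cannot lower R², but it does inflate the degrees-of-freedom penalty in adjusted R². A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60

x1 = rng.normal(0, 1, n)
junk = rng.normal(0, 1, n)   # a predictor unrelated to y
y = 2 + 1.5 * x1 + rng.normal(0, 1, n)

def r2_stats(X, y):
    """Return (R^2, adjusted R^2) for a design matrix with an intercept column."""
    n, p = X.shape
    k = p - 1                # number of predictors
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ coefs) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - sse / sst
    adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj

X1 = np.column_stack([np.ones(n), x1])        # useful predictor only
X2 = np.column_stack([np.ones(n), x1, junk])  # plus a junk predictor

r2_a, adj_a = r2_stats(X1, y)
r2_b, adj_b = r2_stats(X2, y)
print(f"one predictor: R2={r2_a:.4f}, adj R2={adj_a:.4f}")
print(f"plus junk:     R2={r2_b:.4f}, adj R2={adj_b:.4f}")
```

R² for the second model is guaranteed to be at least as large as for the first; adjusted R² usually (though not always) moves the other way when the new predictor is worthless, which is why it helps when comparing models of different sizes.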

Adjusted R² (cont.)
Comparing alternative regression models is a challenge, especially when they have different numbers of predictors. Adjusted R² is one way to help you choose your model. But, don't use it as the sole decision criterion when you compare different regression models.

What Can Go Wrong? Interpreting Coefficients
Don't claim to "hold everything else constant" for a single individual.
Don't interpret regression causally.
Be cautious about interpreting a regression model as predictive.
Don't think that the sign of a coefficient is special.
If a coefficient's t-statistic is not significant, don't interpret it at all.

What Else Can Go Wrong?
Don't fit a linear regression to data that aren't straight.
Watch out for the plot thickening.
Make sure the errors are nearly Normal.
Watch out for high-influence points and outliers.

What have we learned?
There are many similarities between simple and multiple regression:
The assumptions and conditions are essentially the same.
R² still gives us the fraction of the total variation in y accounted for by the model.
s_e is still the standard deviation of the residuals.
The degrees of freedom follow the same rule: n minus the number of parameters estimated.
The regression table produced by any statistics package shows a row for each coefficient, giving its estimate, a standard error, a t-statistic, and a P-value. If all conditions are met, we can test each coefficient against the null hypothesis.

What have we learned? (cont.)
We have also learned some new things:
We can perform an overall test of whether the multiple regression model provides a better summary for y than its mean by using the F-distribution.
We learned that R² may not be appropriate for comparing multiple regression models with different numbers of predictors. The adjusted R² is one approach to this problem.

What have we learned? (cont.)
There are some profound differences in interpretation when adding more predictors:
The coefficient of each x indicates the average change in y we'd expect to see for a unit change in that x for particular values of all the other x-variables.
The coefficient of a predictor variable can change sign when another variable is entered or dropped from the model.
Finding a suitable model from among the possibly hundreds of potential models is not straightforward.