Multiple Regression continued… STAT E-150 Statistical Methods.


2 Confidence Intervals and Prediction Intervals. When we discussed simple linear regression, we briefly introduced prediction intervals and confidence intervals. Let $x^*$ be a specific value of x. The predicted value of y is $\hat{y} = b_0 + b_1 x^*$. We can create two different intervals: a prediction interval for an individual value of y at $x^*$, and a confidence interval for the mean value of y at $x^*$.

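3 The basic format for an interval is $\hat{y} \pm t^* \cdot SE$, where $t^*$ is a critical value from the t-distribution and SE is a standard error that depends on what is being estimated. When we want to find a mean predicted value, SE is the standard error of the estimated mean response at the chosen x-value. When we want to find an individual predicted value, SE is the larger standard error of an individual prediction, which also accounts for the variability of individual values about the mean.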
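The two intervals can be sketched in a few lines of Python. This is a minimal illustration for simple linear regression, not the SPSS procedure used in the course. It assumes the usual textbook standard errors, $SE_{mean} = s\sqrt{1/n + (x^* - \bar{x})^2 / \sum (x_i - \bar{x})^2}$ and $SE_{indiv} = s\sqrt{1 + 1/n + (x^* - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. The function name, the data, and the critical value are all hypothetical; the data are not the birthweight data from the lecture.

```python
import math

def regression_intervals(x, y, x_star, t_star):
    """Return (y_hat, confidence_interval, prediction_interval) at x_star."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares slope and intercept
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Residual standard error, with n - 2 degrees of freedom
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_star
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)        # for the mean response
    se_indiv = s * math.sqrt(1 + leverage)   # for one individual value

    ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pi = (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv)
    return y_hat, ci, pi

# Hypothetical ages (years) and birthweights (g), for illustration only
ages = [14, 15, 15, 16, 17, 17, 18, 19]
weights = [2500, 2650, 2600, 2800, 2850, 2950, 3000, 3200]

# Two-sided 95% critical value for t with n - 2 = 6 df, taken from a t-table
T_STAR = 2.447

y_hat, ci, pi = regression_intervals(ages, weights, 16, T_STAR)
print("predicted:", round(y_hat, 1))
print("95% CI for the mean:", tuple(round(v, 1) for v in ci))
print("95% PI for an individual:", tuple(round(v, 1) for v in pi))
```

Both intervals are centered on the same predicted value; only the standard error differs, which is why the prediction interval always comes out wider.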

4 Let us return to our earlier discussion of the age of adolescent mothers and the weight of their babies. We found that there was a linear relationship between these variables: weight = age – How can we use this model to make predictions?

5 Suppose we want to predict the weight of a baby born to a mother who is 16 years old. When we analyze the data, we can choose to save the predicted values, the confidence interval and the prediction interval for each predictor value. The results will appear in the datasheet in these columns: x-value | predicted y-value | 95% confidence interval | 95% prediction interval

6 What weight is expected for a baby of a 16 year old mother?

7 What weight is expected for a baby of a 16 year old mother? 2759 g

8 What is the prediction interval estimate for the weight of a baby of a 16 year old mother?

9 What is the prediction interval estimate for the weight of a baby of a 16 year old mother? to g What does it tell you? We are 95% confident that the birthweight of a baby born to a 16 year old mother is between and g.


11 What is the confidence interval estimate for the mean weight of babies of 16 year old mothers?

12 What is the confidence interval estimate for the mean weight of babies of 16 year old mothers? to g What does it tell you? We are 95% confident

13 What is the confidence interval estimate for the mean weight of babies of 16-year-old mothers? to g What does it tell you? We are 95% confident that the mean birthweight of babies born to 16-year-old mothers is between and g.

14 The 95% confidence interval is ( , ) The 95% prediction interval is ( , ) Which interval is wider? Why?

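15 The 95% confidence interval is ( , ) The 95% prediction interval is ( , ) Which interval is wider? Why? The prediction interval is wider, because individual values vary more than means: it must allow for the scatter of individual birthweights about the mean as well as the uncertainty in estimating that mean.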

16 In the data concerning body fat percentages in men, the predictor variables were waist and height, and we found a regression equation which we can now use to make predictions: %BodyFat = waist height – We can find prediction intervals and confidence intervals as we did when we used a single predictor.

17 Suppose we want to predict the body fat percentage associated with a waist size of 34 inches and a height of 6 feet. We can proceed as we did with a single predictor, by entering these values in the data window, and then saving the results of the linear regression analysis.

18 When you scroll to the right, you will see these results: What is the predicted body fat %?

19 When you scroll to the right, you will see these results: What is the predicted body fat %? %

20 When you scroll to the right, you will see these results: What is the prediction interval? What does it tell you?

21 When you scroll to the right, you will see these results: What is the prediction interval? What does it tell you? The 95% prediction interval is (5.05, 22.69)

22 When you scroll to the right, you will see these results: What is the prediction interval? What does it tell you? We are 95% confident that a man who is 6 feet tall and has a 34-inch waist will have a body fat percentage between 5.05 and 22.69.

23 When you scroll to the right, you will see these results: What is the confidence interval? What does it tell you?

24 When you scroll to the right, you will see these results: What is the confidence interval? What does it tell you? The 95% confidence interval is (13.10, 14.65)

25 When you scroll to the right, you will see these results: What is the confidence interval? What does it tell you? We are 95% confident that the mean body fat percentage for men who are 6 feet tall and have a 34-inch waist is between 13.10 and 14.65.
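As a quick check on the two intervals reported for this man, (13.10, 14.65) and (5.05, 22.69), the sketch below computes their midpoints and half-widths. Both intervals are built around the same predicted value, so the midpoints should agree up to rounding, and the prediction interval should have the larger margin. The helper names are hypothetical.

```python
# Interval endpoints as reported on the slides
confidence_interval = (13.10, 14.65)
prediction_interval = (5.05, 22.69)

def midpoint(interval):
    """Center of the interval, which is the predicted value."""
    return (interval[0] + interval[1]) / 2

def half_width(interval):
    """Margin of error: critical value times standard error."""
    return (interval[1] - interval[0]) / 2

print("CI midpoint:", round(midpoint(confidence_interval), 3))
print("PI midpoint:", round(midpoint(prediction_interval), 3))
print("CI margin:", round(half_width(confidence_interval), 3))
print("PI margin:", round(half_width(prediction_interval), 3))
```

The midpoints land within rounding error of each other, close to 13.87, while the prediction interval's margin is more than ten times the confidence interval's.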

26 Models with Categorical Predictors. Categorical (or qualitative) variables can also be included in multiple regression models. These variables are coded as numbers so that we can employ the methods we have discussed. These coded values are called indicator variables or dummy variables. They are often coded using 0 and 1, where 0 = absence (or "no") and 1 = presence (or "yes").
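A minimal sketch of 0/1 coding, assuming the categories arrive as text labels. The function name and the sample labels are hypothetical.

```python
def to_indicator(values, present):
    """Code each value as 1 if it equals `present`, otherwise 0."""
    return [1 if value == present else 0 for value in values]

# Hypothetical category labels for four colleges
college_types = ["single-sex", "coed", "coed", "single-sex"]

indicator = to_indicator(college_types, present="single-sex")
print(indicator)
```

Which category receives the 1 is a free choice; it only changes the sign of the indicator's coefficient and which group the intercept describes.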

27 Example: One way colleges measure success is by graduation rates. The Education Trust publishes 6-year graduation rates along with other college characteristics on its website,

28 Here is part of the data, a random sample of 22 colleges selected from the 1037 colleges in the United States with enrollments under 5000 students:

29 We define these variables: y = 6-year graduation rate; $x_1$ = median SAT score of students accepted to the college; $x_2$ = student-related expense per full-time student (in dollars); $x_3$ = indicator for the type of college (1 = single-sex, 0 = coeducational)

30 The regression model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$. For single-sex colleges ($x_3 = 1$): $Rate = \beta_0 + \beta_1 SAT + \beta_2 Expense + \beta_3 (1) + \varepsilon = \beta_0 + \beta_1 SAT + \beta_2 Expense + \beta_3 + \varepsilon$. For coeducational colleges ($x_3 = 0$): $Rate = \beta_0 + \beta_1 SAT + \beta_2 Expense + \beta_3 (0) + \varepsilon = \beta_0 + \beta_1 SAT + \beta_2 Expense + \varepsilon$. In either case, the slopes are determined using data from both types of colleges.

31 For single-sex colleges, the intercept is $\beta_0 + \beta_3$: $Rate = \beta_0 + \beta_1 SAT + \beta_2 Expense + \beta_3 (1) + \varepsilon = (\beta_0 + \beta_3) + \beta_1 SAT + \beta_2 Expense + \varepsilon$. For coeducational colleges, the intercept is $\beta_0$: $Rate = \beta_0 + \beta_1 SAT + \beta_2 Expense + \beta_3 (0) + \varepsilon = \beta_0 + \beta_1 SAT + \beta_2 Expense + \varepsilon$. In other words, the coefficient of the indicator variable represents the difference in intercepts between the regression equations for the two types of colleges.

32 What are the hypotheses? $H_0: \beta_1 = \beta_2 = \beta_3 = 0$; $H_a$: the coefficients are not all zero


34 Here is part of the SPSS analysis: What is your conclusion?

35 What is your conclusion? Since F is large and p is close to 0, the null hypothesis is rejected. We can conclude that the 6-year graduation rate is linearly related to at least one of the predictors: the median SAT score, the student-related expense per full-time student, and whether the college is single-sex or coeducational.
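The overall F statistic behind this conclusion can be sketched from the coefficient of determination using the standard identity $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$. The sample size n = 22 and the k = 3 predictors come from the example; the $R^2$ value below is hypothetical, because the SPSS output is not reproduced in this transcript.

```python
def overall_f(r_squared, n, k):
    """Overall F statistic and its degrees of freedom for a model with k predictors."""
    df_model = k
    df_error = n - k - 1
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_error)
    return f_stat, df_model, df_error

# n = 22 colleges and k = 3 predictors are from the example.
# R^2 = 0.5 is a made-up value used only to show the arithmetic.
f_stat, df_model, df_error = overall_f(0.5, n=22, k=3)
print("F =", round(f_stat, 2), "on", df_model, "and", df_error, "df")
```

For this example the test uses 3 and 18 degrees of freedom, whatever the actual $R^2$ turns out to be.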

36 What is the regression equation?

37 What is the regression equation? y =.001x x x

38 For single-sex colleges: y =.001x x (1) y =.001x x

39 For coed colleges: y =.001x x

40 What is the meaning of the coefficient $\beta_3$? We can interpret the value .125 as the “correction” we would make to the predicted graduation rate to incorporate the difference associated with having only male or only female students.

41 What is the meaning of the coefficient $\beta_3$? We can interpret the value .125 as the difference in intercepts for the two different types of colleges.
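The intercept shift can be illustrated with a small sketch. Only the indicator coefficient .125 is taken from the slides; the intercept and the SAT and expense coefficients below are hypothetical placeholders, since the fitted equation did not survive in this transcript.

```python
# Hypothetical placeholder coefficients (NOT the fitted values from the lecture)
B0 = 0.10        # intercept
B_SAT = 0.0004   # coefficient of median SAT score
B_EXP = 0.00001  # coefficient of expense per student

# Indicator coefficient reported on the slides
B_SINGLE_SEX = 0.125

def predicted_rate(sat, expense, single_sex):
    """Predicted graduation rate; single_sex is 1 for single-sex, 0 for coed."""
    return B0 + B_SAT * sat + B_EXP * expense + B_SINGLE_SEX * single_sex

# Same SAT and expense, different type of college
coed = predicted_rate(1100, 15000, single_sex=0)
single = predicted_rate(1100, 15000, single_sex=1)
print("difference:", round(single - coed, 3))
```

Whatever SAT score and expense are plugged in, the two predictions differ by exactly the indicator coefficient, because the two equations share the same slopes.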

42 Interaction and Collinearity. If the change in the mean y-value associated with a 1-unit increase in one predictor variable depends on the value of a second predictor variable, there is interaction between the two predictor variables. If we represent the variables as $x_1$ and $x_2$, the interaction can be modeled by including their product, $x_1 x_2$, as a predictor variable.

43 Interaction and Collinearity. The regression model for two predictor variables would now include a cross-product term: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$, where $\beta_1 + \beta_3 x_2$ represents the change in the mean of Y for every one-unit increase in $x_1$, keeping $x_2$ fixed, and $\beta_2 + \beta_3 x_1$ represents the change in the mean of Y for every one-unit increase in $x_2$, keeping $x_1$ fixed. If you find that there is a linear association, be sure to check the coefficient of the interaction term.
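The claim that the effect of $x_1$ depends on $x_2$ can be verified numerically. The sketch below uses hypothetical coefficients and compares the change in the mean response for a one-unit increase in $x_1$ against $\beta_1 + \beta_3 x_2$ at two different values of $x_2$.

```python
# Hypothetical coefficients, chosen only to illustrate the algebra
B0, B1, B2, B3 = 2.0, 1.5, -0.5, 0.25

def mean_y(x1, x2):
    """Mean response under the model with a cross-product term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def slope_in_x1(x2):
    """Change in the mean response per one-unit increase in x1, holding x2 fixed."""
    return B1 + B3 * x2

for x2 in (0, 4):
    change = mean_y(3, x2) - mean_y(2, x2)
    print("x2 =", x2, "change =", change, "formula =", slope_in_x1(x2))
```

With these values the slope in $x_1$ is 1.5 when $x_2 = 0$ and 2.5 when $x_2 = 4$; without the cross-product term it would be 1.5 in both cases.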

44 We determine collinearity by examining a correlation matrix. What is the correlation between Pct BF and Height? -.029. Is this value significant? No; p = .322. Pct BF and Waist? Is this value significant? Height and Waist? Is this value significant? [Table: SPSS Correlations output for Pct BF, Height, and Waist, showing Pearson correlations, 1-tailed significance, and N = 250]

45 We determine collinearity by examining a correlation matrix. What is the correlation between:
Pct BF and Height? -.029. Is this value significant? No; p = .322
Pct BF and Waist? .824. Is this value significant? Yes; p = .000
Height and Waist? .187. Is this value significant? Yes; p = .002
It is important to note that this information only refers to the pair of variables in question, without regard to the influences of other variables.

Correlations (N = 250 for each variable)

Pearson Correlation   Pct BF   Height   Waist
Pct BF                1.000    -.029    .824
Height                -.029    1.000    .187
Waist                  .824     .187    1.000

Sig. (1-tailed)       Pct BF   Height   Waist
Pct BF                .        .322     .000
Height                .322     .        .002
Waist                 .000     .002     .
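The significance of each correlation can be approximated from r and N alone, using the test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n - 2 degrees of freedom. The sketch below is an approximation, not the SPSS calculation: with 248 degrees of freedom it treats t as standard normal, so its p-values can differ slightly from the reported ones. The function names are hypothetical.

```python
import math

N = 250  # sample size reported in the Correlations output

def correlation_t(r, n):
    """t statistic for testing whether a correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def approx_one_tailed_p(t):
    """One-tailed p-value from the standard normal (large-sample approximation)."""
    return 0.5 * math.erfc(abs(t) / math.sqrt(2))

# Correlations as reported on the slide
reported = {
    "Pct BF and Height": -0.029,
    "Pct BF and Waist": 0.824,
    "Height and Waist": 0.187,
}

for pair, r in reported.items():
    t = correlation_t(r, N)
    p = approx_one_tailed_p(t)
    print(pair, "t =", round(t, 2), "approx p =", round(p, 4))
```

The pattern matches the slide: the Pct BF and Height correlation is far from significant, while the other two are significant at the usual levels.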

46 Another way to assess collinearity: VIF is the Variance Inflation Factor, which indicates whether a predictor has a strong linear relationship with the other predictors. There is reason for concern if the largest VIF is greater than 5. The Tolerance statistic is the reciprocal of the VIF. There is a serious problem if this value is less than .2. [Table: SPSS Coefficients output for the dependent variable Pct BF, with rows (Constant), Waist, and Height and columns B, Std. Error, Beta, t, Sig., Tolerance, and VIF]
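For a model with exactly two predictors, the VIF can be worked out by hand. In general $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor j on the other predictors; with only two predictors, $R_j^2$ is simply the square of their correlation. The sketch below applies this to the Height and Waist correlation of .187 reported in the correlation matrix. The function name is hypothetical, and the result is a hand calculation rather than the SPSS output.

```python
def vif_two_predictors(r):
    """VIF and tolerance when the model has exactly two predictors with correlation r."""
    tolerance = 1 - r ** 2   # share of a predictor's variance not explained by the other
    vif = 1 / tolerance
    return vif, tolerance

# Correlation between Height and Waist, as reported in the correlation matrix
vif, tolerance = vif_two_predictors(0.187)
print("VIF =", round(vif, 3), "Tolerance =", round(tolerance, 3))
```

A VIF this close to 1 (and a tolerance this close to 1) is well inside both rules of thumb, so collinearity between Waist and Height is not a concern here. Both predictors share the same VIF because each is regressed on only the other.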