Class 21: Tues., Nov. 23

Today: Multicollinearity, One-way analysis of variance

Schedule:
– Tues., Nov. 30th – Review, Homework 8 due
– Thurs., Dec. 2nd – Midterm II
– Tues., Dec. 7th, Thurs., Dec. 9th – Analysis of variance continued
– Mon., Dec. 13th – Rough draft of project due
– Tues., Dec. 21st – Final draft of project, Homework 9 due

Multicollinearity

Multicollinearity in multiple regression refers to a situation in which two or more explanatory variables are highly correlated. Multicollinearity between X1 and X2 makes it difficult to distinguish the effect of X1 on Y holding X2 fixed (the coefficient on X1) from the effect of X2 on Y holding X1 fixed (the coefficient on X2). Multicollinearity leads to high standard errors for the estimated coefficients.
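
To see the symptom concretely, here is a minimal simulation sketch in Python with statsmodels (an illustration added here, not part of the lecture, which uses JMP): two nearly identical predictors yield a highly significant overall F-test, yet inflated standard errors and weak t-tests on the individual coefficients.

```python
# Minimal sketch: multicollinearity inflates coefficient standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # x2 is nearly a copy of x1
y = 2 + x1 + x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.f_pvalue)   # overall F-test: tiny p-value, model clearly useful
print(fit.bse)        # standard errors on x1 and x2: inflated
print(fit.pvalues)    # individual t-tests: often insignificant
```

In runs of this sketch the two individual t-tests usually come out insignificant even though the regression as a whole is overwhelmingly significant, mirroring the house-price output discussed below.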

House Prices Example

A real estate agent wanted to develop a model to predict the selling price of a home. The agent believed that the most important variables in determining the price of a house are its size, number of bedrooms, and lot size. Accordingly, the agent took a random sample of 100 recently sold homes and recorded the selling price (Y), the number of bedrooms (X1), the house size in square feet (X2), and the lot size in square feet (X3). The data are in housesellingprice.JMP.

The p-value for the overall F-test is < .0001: strong evidence that the model is useful, i.e., it provides better predictions than the sample mean of the selling price. But none of the explanatory variables appears useful for predicting Y once the other explanatory variables are taken into account: the p-values for the individual t-tests are all > 0.05. What is happening?

Multicollinearity: House size and lot size are highly correlated. House size and bedrooms (correlation = 0.8465) and lot size and bedrooms (correlation = 0.8374) are also fairly highly correlated.

Variance Inflation Factors: A Method for Recognizing Multicollinearity

Variance inflation factor (VIF): an index of the effect of multicollinearity on the standard error of a coefficient estimate in regression. A VIF of 9 for explanatory variable X1 implies that the standard error of the coefficient estimate of X1 is 3 times (the square root of 9) larger than it would be if X1 were uncorrelated with the other explanatory variables. A VIF greater than 10 indicates that multicollinearity is having a substantial impact on that coefficient. VIFs in JMP: after Fit Model, go to the Parameter Estimates area, right-click, click Columns, and then click VIF.
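
For predictor X_j, VIF_j = 1/(1 − R_j²), where R_j² is the R-squared from regressing X_j on the other explanatory variables; the standard error is inflated by the factor sqrt(VIF_j). Here is a hedged Python sketch on simulated data shaped like the house-price example (the variable values are invented for illustration; the course output comes from JMP):

```python
# Sketch: VIFs for three correlated predictors (simulated stand-ins
# for bedrooms, house size, and lot size -- not the real data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 100
house_size = rng.normal(2000, 500, size=n)
lot_size = 3 * house_size + rng.normal(0, 400, size=n)    # tracks house size closely
bedrooms = house_size / 600 + rng.normal(0, 0.5, size=n)  # tracks it more loosely

# Include the intercept column, then compute VIF_j = 1 / (1 - R_j^2)
# for each predictor (columns 1..3; column 0 is the constant).
X = sm.add_constant(np.column_stack([bedrooms, house_size, lot_size]))
for j, name in enumerate(["bedrooms", "house size", "lot size"], start=1):
    print(f"{name}: VIF = {variance_inflation_factor(X, j):.1f}")
```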

The coefficients on house size and lot size are strongly affected by multicollinearity.

Methods for Dealing with Multicollinearity

1. Suffer: If prediction within the range of the data (interpolation) is the only goal, then leave the model alone. Make sure, however, that the observations to be predicted are comparable to those used to construct the model; otherwise you will get very wide prediction intervals.
2. Transform or combine: In some cases, we can transform or combine two variables that are highly correlated.
3. Omit one: If X1 and X2 are highly correlated, we can omit X1. But note that the coefficient on X2 after X1 is omitted has a different interpretation (see the sketch below). There is no good solution to multicollinearity if we want to understand the effect of X2 on Y holding X1 fixed.
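
Continuing the earlier simulation (assumed data, not the course's JMP file), here is option 3 in action: dropping one of the two collinear predictors sharply shrinks the standard error on the survivor, but the surviving coefficient now absorbs both effects and therefore means something different.

```python
# Sketch: omitting one collinear predictor changes the other's
# coefficient interpretation (it roughly doubles here, since the
# dropped variable moved in lockstep and had the same true effect).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
y = 2 + x1 + x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
reduced = sm.OLS(y, sm.add_constant(x2)).fit()

print(full.params[2], full.bse[2])        # x2 in full model: coef near 1, big SE
print(reduced.params[1], reduced.bse[1])  # x2 alone: coef near 2, small SE
```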

Omitting lot size gives us a good estimate of the effect of house size: the 95% CI is (39.28, 82.72). But lot size is not held fixed in this regression. If lot size and house size essentially always increase in proportion together, we can view the coefficient on house size as the increase in mean house price for a one-square-foot increase in house size and the corresponding increase in lot size, with bedrooms held fixed.

Analysis of Variance

The goal of analysis of variance is to compare the means of several (possibly many) groups. Analysis of variance is regression with only categorical explanatory variables. One-way analysis of variance: the groups are defined by a single categorical variable.
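
A sketch of the same idea in Python (the data frame layout and the voltage numbers are hypothetical; the lecture's analysis is done in JMP): regress the response on a single categorical variable and read off the F-test for equal group means.

```python
# Sketch: one-way ANOVA fit as a regression on one categorical variable.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical layout: one row per subject, maximum voltage reached
# and the experimental condition. The numbers are made up.
df = pd.DataFrame({
    "voltage":   [450, 435, 330, 450, 310, 285, 300, 255, 150, 210, 150, 300],
    "condition": ["Remote"] * 3 + ["Voice"] * 3
                 + ["Proximity"] * 3 + ["Touch"] * 3,
})

fit = smf.ols("voltage ~ C(condition)", data=df).fit()
print(anova_lm(fit))   # F-test of H0: all four condition means are equal
```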

Milgram's Obedience Experiments

Subjects were recruited to take part in an experiment on memory and learning. The subject played the role of the teacher and conducted a paired-associate learning task with the student. The subject was instructed by the experimenter to administer a shock to the student each time the student gave a wrong response. Moreover, the subject was instructed to move one level higher on the shock generator each time the student gave a wrong answer. The subject was also instructed to announce the voltage level before administering a shock.

Four Experimental Conditions

1. Remote-Feedback condition: The student is placed in a room where he cannot be seen by the subject, nor can his voice be heard; his answers flash silently on a signal box. However, at 300 volts the laboratory walls resound as he pounds in protest. After 315 volts, no further answers appear, and the pounding ceases.
2. Voice-Feedback condition: Same as the Remote-Feedback condition except that vocal protests were introduced that could be heard clearly through the walls of the laboratory.

3. Proximity condition: Same as the Voice-Feedback condition except that the student was placed in the same room as the subject, a few feet away. Thus, he was visible as well as audible.
4. Touch-Proximity condition: Same as the Proximity condition except that the student received a shock only when his hand rested on a shock plate. At the 150-volt level, the student demanded to be let free and refused to place his hand on the shock plate. The experimenter ordered the subject to force the victim's hand onto the plate.

Two Key Questions

1. Is there any difference among the mean voltage levels of the four conditions?
2. If there are differences, which conditions specifically are different?

Multiple Regression Model for Analysis of Variance

To answer these questions, we can fit a multiple regression model with voltage level as the response and one categorical explanatory variable (condition). We obtain a sample from each level of the categorical variable (group) and are interested in estimating the population means of the groups based on these samples.

Assumptions of the multiple regression model for one-way analysis of variance:
– Linearity: automatically satisfied.
– Constant variance: the spread within each group is the same.
– Normality: the distribution within each group is normal.
– Independence: the sample consists of independent observations.

Comparing the Groups

The coefficient on Condition[Proximity] is negative, meaning that the Proximity condition is estimated to have a mean below the average of the means of all four conditions. Adding this coefficient to that average of the condition means recovers the sample mean of the Proximity group.
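
Why "the mean of the means"? A note added here, hedged because it depends on JMP's default handling of nominal variables: JMP uses sum-to-zero (effect) coding, so each coefficient is that group's deviation from the unweighted average of the group means.

```latex
% Effect (sum-to-zero) coding with I groups:
\mu_i = \mu + \alpha_i, \qquad \sum_{i=1}^{I} \alpha_i = 0,
\qquad \text{so} \qquad
\hat{\mu}_{\text{Proximity}} = \hat{\mu} + \hat{\alpha}_{\text{Proximity}}.
```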

The Effect Test tests the null hypothesis that the mean in all four conditions is the same against the alternative hypothesis that at least two of the conditions have different means. The p-value of the Effect Test is < .0001: strong evidence that the population means are not the same for all four conditions.
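
For reference (standard one-way ANOVA formulas, not copied from the slides): with I groups, n_i observations in group i, and n observations in total, the Effect Test statistic compares between-group to within-group variability and has an F distribution with I − 1 and n − I degrees of freedom under the null hypothesis.

```latex
F = \frac{\displaystyle\sum_{i=1}^{I} n_i\,(\bar{y}_i - \bar{y})^2 \,\big/\, (I-1)}
         {\displaystyle\sum_{i=1}^{I}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \,\big/\, (n-I)}
```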

1. Is there any difference among the mean voltage levels of the four conditions? Yes, there is strong evidence of a difference: the p-value of the Effect Test is < .0001.
2. If there are differences, which conditions specifically are different? This involves the problem of multiple comparisons, which we will study on Tuesday, December 7th, after the midterm.