Topic 7 – Other Regression Issues Reading: Some parts of Chapters 11 and 15.

Overview
- Confounding (Chapter 11)
- Interaction (Chapter 11)
- Using Polynomial Terms (Chapter 15)

Regression: Primary Goals We are usually focused on one of the following goals:
- Predicting the response variable based on a set of predictors (reliability)
- Quantifying the relationship between the predictors and the response (interpretability)
In both situations, confounding and interaction can be concerns.

What is "Confounding"? We saw this with the Smoking and Age predictors in our SBP example. We consider the relationship of SBP to:
- Smoking status alone
- Smoking status along with age
Our interest is in determining whether smoking raises blood pressure.

SBP Example Continued

Smoking is confounded with Age By itself, smoking is not significant: without age in the model, we are unable to see a difference between the smoking groups. (The groups actually are different, but we cannot see it until we add age as a covariate.)

Smoking is confounded with Age (2) With age in the model, the smoking variable tests significant: after adjusting for age, the two smoking groups are clearly different!
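
The two fits just described can be sketched in Python with statsmodels (the original course materials used SAS; Python is used here only for illustration). The data frame `df` below is a simulated stand-in for the SBP data, not the real dataset: smokers are made about seven years younger on average, so the smoking effect is roughly cancelled in the crude fit and only appears once age is adjusted for.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the SBP data (the real dataset is not shown here).
rng = np.random.default_rng(1)
n = 40
smk = rng.integers(0, 2, n)              # 0 = non-smoker, 1 = smoker
age = rng.normal(58 - 7 * smk, 8)        # smokers younger on average
sbp = 60 + 1.5 * age + 10 * smk + rng.normal(0, 8, n)
df = pd.DataFrame({"sbp": sbp, "age": age, "smk": smk})

crude = smf.ols("sbp ~ smk", data=df).fit()           # smoking alone
adjusted = smf.ols("sbp ~ smk + age", data=df).fit()  # smoking plus age

# The smoking coefficient is masked in the crude fit and recovered
# (near its true value of 10) once age is adjusted for.
print(crude.params["smk"], crude.pvalues["smk"])
print(adjusted.params["smk"], adjusted.pvalues["smk"])
```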

Estimates The effect of smoking is confounded with age: if we don't first adjust for age, we will not accurately see the effect of smoking.

Confounding Confounding exists if meaningfully different interpretations of a relationship of interest can be made depending on whether or not a nuisance variable (or covariate) is included in the model. How do we find confounding?
- Get lucky and stumble upon it (as we did)
- Look for it intentionally by running many different models and watching for variables that aren't significant at first but become significant when other variables (covariates) are added

Confounding (2) If confounding is present, it may lead to inaccurate results if we are not careful: important covariates MUST be included (even if they aren't themselves significant!)
- Making the variable of interest significant is enough to warrant including the covariate.
If we had failed to adjust for age, we would not have gotten a good estimate of the difference due to smoking, and we would also have wrongly concluded that smoking status doesn't matter.

Confounding vs. Multicollinearity Parameter estimates will also change wildly when (multi)collinearity is involved! The two phenomena are almost opposites:
- SEs increase and X1 becomes insignificant (added last) when X2 is in the model: (MULTI)COLLINEARITY. This (usually) works both ways; both variables "fight" to explain the same variation.
- SEs decrease and X1 becomes significant (added last) only when X2 is in the model: CONFOUNDING. Confounding is usually only one way; the covariate (Z) helps the confounded variable (X). Here, age is helping smoking.

Confounding vs. Multicollinearity (2) We can catch (multi)collinearity in the correlation matrix:
- Any single correlation > 0.9 indicates collinearity between just those two predictors.
- Any predictor with several correlations between 0.5 and 0.9 with other predictors indicates multicollinearity.
For confounding, there will usually be some correlation between X and Z, but it will not be very large (as in our example).
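
That screen is easy to automate. Below is a minimal sketch; `collinearity_screen` is a hypothetical helper (not from the course materials) that assumes the predictors sit in a pandas DataFrame `X`, and it adds variance inflation factors (VIFs), which make a multicollinearity diagnosis explicit.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def collinearity_screen(X: pd.DataFrame) -> None:
    """Print the predictor correlation matrix and a VIF for each predictor."""
    print(X.corr().round(2))   # look for |r| > 0.9 pairs, or one predictor
                               # with several values between 0.5 and 0.9
    Xc = sm.add_constant(X)
    for i, name in enumerate(Xc.columns):
        if name != "const":
            print(name, round(variance_inflation_factor(Xc.values, i), 2))
```

For the SBP stand-in above, one would call `collinearity_screen(df[["age", "smk"]])` and expect only a modest age–smk correlation, consistent with confounding rather than collinearity.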

Interaction Interaction is (sort of) one step beyond confounding: not only does it make a difference to adjust for Z, but the relationship between Y and X is fundamentally different at different levels of Z. You can think of this as having a different regression line for each fixed level of Z. With no interaction, these lines would be parallel.
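
One way to see this is to plot the fitted line for each level of Z. A sketch, reusing the simulated `df` from the confounding snippet above (matplotlib assumed available):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# age * smk expands to age + smk + age:smk, allowing a separate
# line (intercept and slope) for each smoking group.
fit = smf.ols("sbp ~ age * smk", data=df).fit()

ages = np.linspace(df["age"].min(), df["age"].max(), 50)
for z in (0, 1):
    line = fit.predict(pd.DataFrame({"age": ages, "smk": z}))
    plt.plot(ages, line, label=f"smk = {z}")
plt.scatter(df["age"], df["sbp"], c=df["smk"])
plt.xlabel("age"); plt.ylabel("sbp"); plt.legend()
plt.show()   # roughly parallel lines suggest little interaction
```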

SBP Example We found Age and Smk to both be important. Is it possible that they are interacting?
- X = age
- Z = 0 for non-smokers, 1 for smokers

Interaction Looking at plots can give us some idea of interaction (are the lines parallel?). However, it is very easy to simply test whether the XZ interaction term is important: treat it just as you would any other variable and do a partial F-test. Note that if a model includes an XZ interaction term, it should also include the X and Z main effects; we would never look at the XZ term by itself.

Age/Smk Interaction Model Interaction is described mathematically using a product term:
Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + ε
or just:
Y = β0 + β1 X1 + β2 X2 + β3 X3 + ε, where X3 = X1 X2
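
A sketch of this partial F-test in statsmodels, again on the simulated stand-in `df` (for a single added term, the partial F-test is equivalent to the t-test on the product term):

```python
import statsmodels.formula.api as smf

# Full model with the product term vs. reduced model without it.
full = smf.ols("sbp ~ age + smk + age:smk", data=df).fit()
reduced = smf.ols("sbp ~ age + smk", data=df).fit()

# Partial F-test comparing the two nested models.
f_stat, p_value, df_diff = full.compare_f_test(reduced)
print(f_stat, p_value)   # a large p-value means no evidence of interaction
```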

SBP Example The interaction tests insignificant: there is no significant interaction between age and smk. Suppose it were significant:
- We would then have to keep the age_smk interaction term AS WELL AS both the age and smk variables (even if age and smk themselves were insignificant).

Confounding vs. Interaction Y = response, X = predictor, Z = covariate / 2nd predictor.
- Is the estimated relationship between Y and X dramatically different if one adjusts or does not adjust for Z? → Confounding
- Is the estimated relationship between Y and X meaningfully different at different values of Z? → Interaction

Correlations One problem with using interaction terms is that they tend to be highly correlated with one or both of the original variables. In our example, the correlation between SMK and AGE_SMK turned out to be 0.98. This is NOT REAL! It is a form of "fake" collinearity; the variables aren't really "fighting" to explain the sums of squares.
- To remove this "fake" collinearity, just center the variables: subtract the mean from all predictors.
- This doesn't change any significance tests or p-values; it only removes what we are calling fake collinearity.

How to center? SBP Example The mean age was 53.25; subtract it from all the ages in the dataset and use these new values in the analysis. Do the same thing with the mean of smk. After centering:
- The correlation between SMK and AGE_SMK is now small (so the variables weren't really fighting; it just looked like it because we didn't center).
Maybe we should always center???
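
A sketch of centering on the simulated stand-in `df`, showing the "fake" correlation dropping and the interaction term's p-value staying put (the printed numbers come from the simulation, not the real SBP data):

```python
import statsmodels.formula.api as smf

# "Fake" collinearity before centering: smk vs. the product term.
age_smk = df["age"] * df["smk"]
print(df["smk"].corr(age_smk))            # very high, like the 0.98 above

# Center both predictors and rebuild the product term.
centered = df.assign(age=df["age"] - df["age"].mean(),
                     smk=df["smk"] - df["smk"].mean())
print(centered["smk"].corr(centered["age"] * centered["smk"]))  # much smaller

# The interaction term's test is unchanged by centering.
m_raw = smf.ols("sbp ~ age * smk", data=df).fit()
m_cen = smf.ols("sbp ~ age * smk", data=centered).fit()
print(m_raw.pvalues["age:smk"], m_cen.pvalues["age:smk"])   # identical
```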

Polynomial Regression (Chapter 15)

General Uses Polynomial models are used in situations where the relationship between Y and X is non-linear.
- You can usually see it in scatterplots.
- You should definitely catch it in residual plots!
They are somewhat dangerous, since a polynomial model of order n − 1 will always fit n data points exactly. Example?
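
As a quick illustration of that danger (a made-up example, not from the text): a degree-5 polynomial run through 6 points of pure noise "fits" them perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(6.0)                 # n = 6 x-values
y = rng.normal(0, 1, 6)            # pure noise, no real relationship
coef = np.polyfit(x, y, deg=5)     # polynomial of order n - 1 = 5
print(np.allclose(np.polyval(coef, x), y))   # True: an exact "fit"
```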

Strategy for fitting
- CENTER your variables to avoid the "fake" (multi)collinearity.
- Use a special type of backward elimination procedure: test the highest-order term first!
- If a higher-order term is significant, you MUST include all lower-order terms for that variable.

Example Problem 15.7 (SAS code and data available online). X = amount of vaccine, Y = measure of skin response in rats; 12 data points. If we run just a simple linear regression, the R-square is only 45%, so we will consider a polynomial model and try to do better!

Scatter Plot

Residual plot

Cubic Model In the output, x is X, x2 is X^2 = X*X, x3 is X^3 = X*X*X, etc. X^3 is important, so we must keep X^2 and X (why?). The cubic model, with X, X^2, and X^3, now explains 82% of the variation (it was only 45% for the linear model).
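
A sketch of this fit in Python (the course's SAS data for Problem 15.7 are not reproduced here, so `x` and `y` below are simulated stand-ins with a roughly cubic shape; the workflow, not the printed numbers, is what matches the slides):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the 12 (dose, response) points of Problem 15.7.
rng = np.random.default_rng(7)
x = np.linspace(0.5, 6.0, 12)
y = 1 + 2.5 * x - 0.9 * x**2 + 0.1 * x**3 + rng.normal(0, 0.3, 12)

d = pd.DataFrame({"x": x, "y": y})
d["xc"] = d["x"] - d["x"].mean()     # center first, per the strategy slide
d["xc2"] = d["xc"] ** 2
d["xc3"] = d["xc"] ** 3

linear = smf.ols("y ~ xc", data=d).fit()
cubic = smf.ols("y ~ xc + xc2 + xc3", data=d).fit()

print(linear.rsquared, cubic.rsquared)  # the slides report 45% vs. 82%
print(cubic.pvalues["xc3"])             # test the highest-order term first
```

If `xc3` tests significant, keep `xc2` and `xc` regardless of their own p-values, exactly as the strategy slide requires.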