Interactions
Interaction: Does the relationship between two variables depend on a third variable?
Does the relationship of age to BP depend on gender?
Does a certain BP-lowering drug work as well in blacks as in non-blacks?
Does the relationship between education and income differ by region of the country?
Sometimes called “effect modification”.

Model for FEV Example
Y = b0 + b1X1 + b2X2
X1 = smoking status (1 = smoker, 0 = nonsmoker)
X2 = age
Smokers: FEV = (b0 + b1) + b2·age
Non-smokers: FEV = b0 + b2·age
FEV (smokers) – FEV (non-smokers) = b1
This model assumes the slope on age is the same for smokers and non-smokers.
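A minimal SAS sketch of fitting this no-interaction model (not shown in the original slides); it assumes the fev data set, with variables fev, age, and smk, created in the DATA step later in these slides:

PROC REG DATA=fev;
  /* additive model: one common age slope, smokers shifted by the smk coefficient */
  MODEL fev = age smk;
RUN;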

[Figure: FEV vs. AGE with parallel lines for non-smokers and smokers; both lines have slope b2, and the smoker line is shifted vertically by b1]

Modeling Interaction for FEV Example
Y = b0 + b1X1 + b2X2 + b3X3
X1 = smoking status (1 = smoker, 0 = nonsmoker)
X2 = age
X3 = age × smoking status
Smokers: FEV = (b0 + b1) + (b2 + b3)·age
Non-smokers: FEV = b0 + b2·age
FEV (smokers) – FEV (non-smokers) = b1 + b3·age
Test for interaction: H0: b3 = 0

[Figure: FEV vs. AGE with non-parallel lines; non-smoker slope b2, smoker slope b2 + b3, and vertical gap b1 + b3·age between the lines]
Note: a difference in slopes implies that the smoker/non-smoker difference depends on age (and vice versa).

DATA fev;
  INFILE DATALINES;
  INPUT age smk fev;   /* smk: 1 = smoker, 0 = nonsmoker */
  agesmk = age*smk;    /* interaction term: age x smoking status */
  /* only the first few data lines are shown in the transcript */
  DATALINES;
28 1 4.0
30 1 3.9
30 1 3.7
31 1 3.6

PROC REG;
  MODEL fev = age;
  PLOT fev*age;
  WHERE smk=0;
  TITLE 'Non-smokers';
RUN;

PROC REG;
  MODEL fev = age;
  PLOT fev*age;
  WHERE smk=1;
  TITLE 'Smokers';
RUN;

SMOKERS
                      Parameter    Standard
Variable    DF         Estimate       Error    t Value    Pr > |t|
Intercept    1          5.50002     0.36163      15.21      <.0001
age          1         -0.05508     0.00885      -6.22      <.0001

NON-SMOKERS
Intercept    1          5.24764     0.38050      13.79      <.0001
age          1         -0.03911     0.00887      -4.41      0.0007

b1 (slope on age) for smokers = -0.05508; b1 for non-smokers = -0.03911.
Is the difference between these slopes statistically significant?

PROC REG;
  MODEL fev = age smk agesmk;
RUN;

                      Parameter    Standard
Variable    DF         Estimate       Error    t Value    Pr > |t|
Intercept    1          5.24764     0.37846      13.87      <.0001
age          1         -0.03911     0.00882      -4.43      0.0002
smk          1          0.25238     0.52482       0.48      0.6346
agesmk       1         -0.01597     0.01253      -1.27      0.2138

Interpretation:
b(agesmk) = -0.01597 is the difference in slopes between smokers and non-smokers
b(age) = -0.03911 is the slope for non-smokers (smk=0)
The test of H0: b(agesmk) = 0 gives p = 0.2138, so the slopes do not differ significantly.

For comparison, the separate fits:
SMOKERS
Intercept    1          5.50002     0.36163      15.21      <.0001
age          1         -0.05508     0.00885      -6.22      <.0001
NON-SMOKERS
Intercept    1          5.24764     0.38050      13.79      <.0001
age          1         -0.03911     0.00887      -4.41      0.0007
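As an aside not taken from the original slides, the same interaction model can be fit without constructing agesmk by hand by letting the procedure build the cross-product term; a sketch using PROC GLM:

PROC GLM DATA=fev;
  CLASS smk;                                /* treat smoking status as a classification variable */
  MODEL fev = age smk age*smk / SOLUTION;   /* SOLUTION prints the parameter estimates */
RUN;

Because CLASS uses its own reference coding, the labeling and signs of the estimates may differ from the 0/1 coding used above, but the test of the age*smk term is the same interaction test.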

Polynomial Regression: Adding a Quadratic Term
Y = b0 + b1X + b2X²
Can be used when a straight-line relationship does not hold
Example: alcohol intake and mortality
Example: cholesterol and mortality
Add a quadratic (squared) term
Can test the hypothesis that the quadratic term is needed: H0: b2 = 0 vs. Ha: b2 ≠ 0

Linear Regression Does not Fit Well

Adding Quadratic Term
PLOT mvo2kg*ffbw predicted.*ffbw / OVERLAY;
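A sketch of where this PLOT statement would sit, assuming it accompanies the quadratic-model run shown on the next slide:

PROC REG DATA = physfit;
  MODEL mvo2kg = ffbw ffbw2;
  /* overlay observed values and model predictions against ffbw */
  PLOT mvo2kg*ffbw predicted.*ffbw / OVERLAY;
RUN;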

PROC REG DATA = physfit;
  MODEL mvo2kg = ffbw;

Analysis of Variance
                              Sum of        Mean
Source             DF        Squares      Square    F Value    Pr > F
Model               1          22211       22211       3.33    0.0724
Error              69         460225  6669.93228
Corrected Total    70         482436

Root MSE          81.66965    R-Square    0.0460
Dependent Mean   455.26761    Adj R-Sq    0.0322
Coeff Var         17.93882

Variable    DF       Estimate         SE    t Value    Pr > |t|
Intercept    1      382.51711   41.02856       9.32      <.0001
ffbw         1        0.17710    0.09705       1.82      0.0724

PROC REG DATA = physfit;
  MODEL mvo2kg = ffbw;
  MODEL mvo2kg = ffbw ffbw2;

Analysis of Variance (quadratic model)
                              Sum of        Mean
Source             DF        Squares      Square    F Value    Pr > F
Model               2         113179       56589      10.42    0.0001
Error              68         369257  5430.25411
Corrected Total    70         482436

Root MSE          73.69026    R-Square    0.2346
Dependent Mean   455.26761    Adj R-Sq    0.2121
Coeff Var         16.18614

Parameter Estimates
                      Parameter     Standard
Variable    DF         Estimate        Error    t Value    Pr > |t|
Intercept    1        980.95393    150.82611       6.50      <.0001
ffbw         1         -2.68220      0.70406      -3.81      0.0003
ffbw2        1          0.00322   0.00078761       4.09      0.0001

ffbw2 = ffbw * ffbw, computed in a DATA step
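The slide notes that ffbw2 is computed in a DATA step but does not show it; a minimal sketch (the data set name physfit is taken from the PROC REG calls above):

DATA physfit;
  SET physfit;          /* read the existing observations */
  ffbw2 = ffbw*ffbw;    /* squared term for the quadratic model */
RUN;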

Model Selection
When many predictors are measured, how do you decide which to include in your model?
It depends on the reason for fitting the model: prediction? examining specific effects?
Statistical criteria exist, but they should not be used in place of scientific criteria.
They are best used in an exploratory context.

Statistical principles to use
Forward, backward, and stepwise selection: compare p-values of terms; add/remove based on α = 0.05 or 0.10 (see the sketch below)
R² methods: look for models with the highest R²
Other methods exist
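Not shown in the original slides: in PROC REG the entry and stay thresholds for stepwise selection are set with the SLENTRY and SLSTAY options; a sketch using the physfit variables from the later examples:

PROC REG DATA=physfit;
  /* stepwise selection: variables enter at alpha = 0.05 and leave if p > 0.10 */
  MODEL mvo2kg = male age hgt wgt ffbw rhr / SELECTION=STEPWISE SLENTRY=0.05 SLSTAY=0.10;
RUN;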

Possible Uses for Statistical Criteria
Outcome: measure of teenage drinking
Many possible predictors: questionnaire items on relationships, friends, family, church support, etc.
Outcome: echocardiographically determined hypertrophy of the heart
Many possible ECG predictors: computer measurements from the ECG

Backward selection procedure
Removes the worst variable, then the second worst, etc.

PROC REG DATA = physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr / SELECTION=BACKWARD;
RUN;

Final model:
                 Parameter     Standard
Variable          Estimate        Error    Type II SS    F Value    Pr > F
Intercept        574.86126     56.50900        167151     103.49    <.0001
male              88.90825     12.02381         88312      54.68    <.0001
age               -6.85862      3.80692    5242.56660       3.25    0.0762
wgt               -6.00865      1.02203         55827      34.56    <.0001
ffbw               0.75073      0.12729         56184      34.79    <.0001
rhr               -0.79442      0.41916    5801.82822       3.59    0.0625

Forward selection procedure
Starts with the best single variable, adds the next best, etc.

PROC REG DATA = physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr / SELECTION=FORWARD;
RUN;

In this example the procedure ends up including all terms except height, which is exactly the same model as the one picked by backward selection.

“MAXR” method
Selects several models based on maximal R²

PROC REG DATA = physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr / SELECTION=MAXR;
RUN;

Gives the “best” model with 1, 2, 3, ... terms; you choose the best overall among these.

Final models by MAXR method

Two general principles to use
Parsimony: less is more
Common sense: don't use social security number to predict height!
Cautionary note: models built from many candidate variables are usually not as good at predicting new data as their fit statistics suggest.