Published by Jonah Malone. Modified over 6 years ago.
1
Interactions
Interaction: does the relationship between two variables depend on a third variable?
  Does the relationship of age to BP depend on gender?
  Does a certain BP-lowering drug work as well in Black patients as in non-Black patients?
  Does the relationship between education and income differ by region of the country?
Sometimes called “effect modification”.
2
Model for FEV Example
Y = b0 + b1X1 + b2X2
X1 = smoking status (1 = smoker, 0 = non-smoker)
X2 = age
Smokers: FEV = b0 + b1 + b2 age
Non-smokers: FEV = b0 + b2 age
FEV (smokers) – FEV (non-smokers) = b1
Assumes the slope of age is the same for smokers and non-smokers.
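A minimal SAS sketch of fitting this additive model, assuming a dataset named fev with variables fev, age, and smk (coded 0/1), the names used in the DATA step later in these slides:

```sas
/* Additive (no-interaction) model: FEV = b0 + b1*smk + b2*age */
PROC REG DATA=fev;
  MODEL fev = smk age;   /* coefficient on smk estimates b1; coefficient on age estimates b2 */
RUN;
QUIT;
```

Because the model forces a common age slope, the smk coefficient is the estimated smoker minus non-smoker difference in FEV at every age.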
3
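[Figure: FEV plotted against AGE with two parallel lines, one for non-smokers and one for smokers. Both lines have slope b2 and are separated vertically by b1.]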
4
Modeling Interaction for FEV Example
Y = b0 + b1X1 + b2X2 + b3X3
X1 = smoking status (1 = smoker, 0 = non-smoker)
X2 = age
X3 = age x smoking status
Smokers: FEV = b0 + b1 + (b2 + b3) age
Non-smokers: FEV = b0 + b2 age
FEV (smokers) – FEV (non-smokers) = b1 + b3 age
Ho: b3 = 0
5
[Figure: FEV plotted against AGE with non-parallel lines for non-smokers (slope b2) and smokers (slope b2 + b3). The vertical gap between the lines at a given age is b1 + b3 age.]
Note: A difference in slopes implies that the smoker/non-smoker difference depends on age (and vice versa).
8
DATA fev;
  INFILE DATALINES;
  INPUT age smk fev;
  agesmk = age*smk;
  DATALINES;
9
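/* Fit FEV on age separately within each smoking group */
PROC REG;
  WHERE smk=0;
  MODEL fev = age;
  PLOT fev*age;
  TITLE 'Non-smokers';
RUN;

PROC REG;
  WHERE smk=1;
  MODEL fev = age;
  PLOT fev*age;
  TITLE 'Smokers';
RUN;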
10
SMOKERS
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                                         <.0001
age                                                               <.0001

NON-SMOKERS
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                                         <.0001
age

B1 for smokers =
B1 for non-smokers =
Are these statistically significant?
11
PROC REG;
  MODEL fev = age smk agesmk;
RUN;

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                                         <.0001
age
smk
agesmk

Interpretation:
B(agesmk) is the difference in slopes between smokers and non-smokers.
B(age) is the slope for non-smokers (smk=0).
The slope for smokers is B(age) + B(agesmk).

Separate fits, for comparison:
SMOKERS
Intercept   <.0001
age         <.0001
NON-SMOKERS
Intercept   <.0001
age
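As an alternative sketch, PROC GLM can form the product term directly in the MODEL statement, so the agesmk variable does not have to be created in a DATA step. The ESTIMATE statement evaluates the smoker minus non-smoker difference, b1 + b3 age, at age 15; that age is an arbitrary value chosen for illustration, and the dataset name fev is assumed to match the DATA step shown earlier.

```sas
PROC GLM DATA=fev;
  /* smk is numeric 0/1 and not listed in a CLASS statement,
     so age*smk is simply the product of the two variables */
  MODEL fev = age smk age*smk / SOLUTION;
  /* smoker minus non-smoker difference at age 15:
     1*B(smk) + 15*B(age*smk) */
  ESTIMATE 'Smoker - non-smoker, age 15' smk 1 age*smk 15;
RUN;
QUIT;
```

The t test reported for age*smk in this output is the test of Ho: b3 = 0.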
12
Polynomial Regression: Adding Quadratic Term
Y = b0 + b1X + b2X^2
Can be used if a linear relationship does not hold
  Example: alcohol intake and mortality
  Example: cholesterol and mortality
Add a quadratic (squared) term
Can test the hypothesis that the quadratic term is needed
  Ho: b2 = 0
  Ha: b2 ≠ 0
14
Linear Regression Does not Fit Well
15
Adding Quadratic Term
PLOT mvo2kg*ffbw predicted.*ffbw / OVERLAY;
16
PROC REG DATA = physfit;
  MODEL mvo2kg = ffbw;

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Variable    DF   Estimate   SE   t Value   Pr > |t|
Intercept                                   <.0001
ffbw
17
PROC REG DATA = physfit;
  MODEL mvo2kg = ffbw ffbw2;

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept                                                         <.0001
ffbw
ffbw2

ffbw2 = ffbw * ffbw (computed in a DATA step)
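The squared term has to exist as a variable before PROC REG can use it. A sketch of that DATA step and the quadratic fit, assuming the source dataset is physfit; the names physfit2, fitted, and pred are hypothetical and chosen only for this example:

```sas
DATA physfit2;
  SET physfit;
  ffbw2 = ffbw * ffbw;          /* quadratic term */
RUN;

PROC REG DATA=physfit2;
  MODEL mvo2kg = ffbw ffbw2;    /* t test on ffbw2 tests Ho: b2 = 0 */
  OUTPUT OUT=fitted P=pred;     /* save predicted values, e.g. for plotting the curve */
RUN;
QUIT;
```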
18
Model Selection
You measure many predictors; how do you decide which to include in your model?
Depends on the reason for fitting the model
  Prediction?
  Examining specific effects?
Statistical criteria do exist, but they should not be used in place of scientific criteria
  Best used in an exploratory context
19
Statistical principles to use
Forward, backward, and stepwise selection
  Compare p-values of terms; add/remove based on α = 0.05 or 0.10
R2 methods
  Look for models with the highest R2
Other methods exist
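A sketch of stepwise selection with explicit entry and stay thresholds, assuming the physfit dataset and predictor names used in the selection examples in these slides. SLENTRY= and SLSTAY= are the PROC REG options that set the significance levels for adding and removing terms; 0.10 is just one of the two cutoffs named above.

```sas
PROC REG DATA=physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr
        / SELECTION=STEPWISE
          SLENTRY=0.10    /* p-value a term needs to enter the model */
          SLSTAY=0.10;    /* p-value a term needs to stay in the model */
RUN;
QUIT;
```

PROC REG's default thresholds differ by selection method, so stating them explicitly makes the analysis easier to reproduce.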
20
Possible Uses for Statistical Criteria
Outcome: measure of teenage drinking
  Many possible predictors
  Questionnaire on relationships, friends, family, church support, etc.
Outcome: echocardiographically determined hypertrophy of the heart
  Many possible ECG predictors
  Computer measurements from ECG
21
Backward selection procedure
Removes the worst variable, then the second worst, etc.

PROC REG DATA = physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr / selection=backward;
RUN;

Final model:
Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept                                                                 <.0001
male                                                                      <.0001
age
wgt                                                                       <.0001
ffbw                                                                      <.0001
rhr
22
Forward selection procedure
Starts with the best single variable, adds the next best, etc.

PROC REG DATA = physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr / selection=forward;
RUN;

In this example, forward selection ends up including all terms except height.
This is exactly the same model as the one picked by backward selection.
23
“MAXR” method
Selects several models based on maximal R2

PROC REG DATA = physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr / selection=maxr;
RUN;

Will give the “best” models with 1, 2, 3, ... terms
You choose the best overall among the “best”
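A related sketch, not part of the original slides: SELECTION=RSQUARE lists the top subsets of each size rather than following a single path. BEST=3 is an arbitrary choice here that limits the listing to three subsets per size, and ADJRSQ and CP add adjusted R2 and Mallows' Cp to the listing.

```sas
PROC REG DATA=physfit;
  MODEL mvo2kg = male age hgt wgt ffbw rhr
        / SELECTION=RSQUARE
          BEST=3        /* show at most 3 subsets for each model size */
          ADJRSQ CP;    /* also report adjusted R-square and Mallows' Cp */
RUN;
QUIT;
```

Because R2 never decreases when a term is added, adjusted R2 or Cp is the more useful guide when comparing models of different sizes.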
24
Final models by MAXR method
25
Two general principles to use
Parsimony: less is more
Common sense
  Don’t use social security number to predict height!
Cautionary note
  A model chosen from several candidate variables usually predicts new data less well than its fit statistics suggest.
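One way to see the cautionary note in practice is a split-sample check: fit the model on part of the data and measure prediction error on the rest. The sketch below assumes the physfit dataset and the five-variable model chosen by the selection procedures; the 70/30 split, the seed, and the names split, train, y_fit, scored, holdout, pred, and sq_err are all hypothetical.

```sas
DATA split;
  SET physfit;
  train = (RANUNI(12345) < 0.7);     /* 1 for roughly 70% of rows, else 0 */
  IF train THEN y_fit = mvo2kg;      /* y_fit stays missing for hold-out rows */
RUN;

PROC REG DATA=split;
  MODEL y_fit = male age wgt ffbw rhr;   /* fitted on training rows only */
  OUTPUT OUT=scored P=pred;              /* predictions are written for all rows */
RUN;
QUIT;

DATA holdout;
  SET scored;
  WHERE train = 0;
  sq_err = (mvo2kg - pred)**2;       /* squared prediction error on unseen rows */
RUN;

PROC MEANS DATA=holdout N MEAN;
  VAR sq_err;                        /* mean is the hold-out mean squared error */
RUN;
```

Comparing the hold-out mean squared error with the squared Root MSE from the PROC REG output shows how optimistic the in-sample fit is. For an honest check, the variable selection step itself should also be run on the training rows only.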