Creating Graphs on Saturn

GOPTIONS DEVICE=png HTITLE=2 HTEXT=1.5 GSFMODE=replace;
PROC REG DATA=agebp;
  MODEL sbp = age;
  PLOT sbp*age;
RUN;

This will create the file sasgraph.png.
1. Transfer the file to your PC (binary mode)
2. Open Word
3. Choose Insert picture from file

Alternative: with the LP (line printer) option, the plot is drawn as text in the output listing.

PROC REG DATA=agebp LP;
  MODEL sbp = age;
  PLOT sbp*age;
RUN;

Multiple Linear Regression

More than 1 independent variable
– See how combinations of several variables are associated with and can predict the dependent variable. How much of the total variability can be explained?
– Control for confounding (interested in the effect of one variable but want to “adjust” for another variable)
– Explore interactions

PROC REG DATA=datasetname;
  MODEL depvar = x1;
  MODEL depvar = x1 x2;
  MODEL depvar = x1 x2 x3;
RUN;

Questions Explored Using Multiple Regression

- How much of the variation in test scores among school districts can be explained by several district characteristics?
- Is calcium intake related to BP independent of age?
- Is the relationship between age and BP the same for men and women?

Reminder

- The Y variable is continuous and, for each combination of X’s, is normally distributed with the same variability
- The X variables can be continuous or indicator variables and do not need to be normally distributed

2 Factors

1. $Y = \beta_0 + \beta_1 X_1$
2. $Y = \beta_0 + \beta_2 X_2$
3. $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

Do you get the same slope for $X_1$ in models 1 and 3?

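One way to see when the slope changes between models 1 and 3 is the following identity for least squares estimates. It is a standard result rather than something stated on the slide, and the superscripts and the symbol $d_{21}$ are notation introduced here only for illustration.

```latex
b_1^{(1)} = b_1^{(3)} + b_2^{(3)} \, d_{21}
```

Here $b_1^{(1)}$ is the estimated slope of $X_1$ in model 1, $b_1^{(3)}$ and $b_2^{(3)}$ are the estimated slopes in model 3, and $d_{21}$ is the slope from regressing $X_2$ on $X_1$. The two slopes for $X_1$ agree only when $X_1$ and $X_2$ are uncorrelated in the sample ($d_{21} = 0$) or when $X_2$ has no estimated effect ($b_2^{(3)} = 0$).
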
Control for confounding

- The SLR models fitted within each cohort are both significant
- The overall (pooled) model is not significant (negative confounding)

Multiple Regression Equation

- The equation that describes how the mean value of y is related to $x_1, x_2, \dots, x_p$:

  $\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

- $\beta_0$ = mean of y when all x variables are equal to 0
- $\beta_i$ = change in mean y corresponding to a 1-unit change in $x_i$, holding all other predictors fixed
- Implied: the impact of $x_1$ is the same for each of the other values of $x_2, x_3, \dots, x_p$

Multiple Regression Model

- The equation that describes how the dependent variable y is related to the independent variables $x_1, x_2, \dots, x_p$ and an error term is called the multiple regression model.

  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

- $\varepsilon$ reflects how individuals deviate from others with the same values of the x’s

Estimated Multiple Regression Equation

- The estimated multiple regression equation is:

  $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

- $b_i$ estimates $\beta_i$
- $\hat{y}$ is the estimated (or predicted) value of y for a set of x’s

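As a minimal sketch of how the predicted values $\hat{y}$ could be obtained in SAS, the OUTPUT statement of PROC REG can save them to a data set. The placeholder names datasetname, depvar, x1, and x2 follow the generic PROC REG example earlier in these slides; the names pred, yhat, and resid are hypothetical and chosen only for this illustration.

```sas
/* Fit the model and save predicted values and residuals.          */
/* pred, yhat, and resid are hypothetical names for illustration.  */
PROC REG DATA=datasetname;
  MODEL depvar = x1 x2;
  OUTPUT OUT=pred P=yhat R=resid;   /* P= predicted value, R= residual */
RUN;

/* Look at the first few observations with their predictions */
PROC PRINT DATA=pred (OBS=10);
RUN;
```

Each row of pred then holds the original variables plus yhat (the value of $b_0 + b_1 x_1 + b_2 x_2$ for that row) and resid (observed minus predicted).
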
Estimation

- Least Squares Criterion: choose $b_0, b_1, \dots, b_p$ to minimize $\sum (y_i - \hat{y}_i)^2$
- Computation of Coefficient Values: the formulas for the regression coefficients $b_0, b_1, b_2, \dots, b_p$ involve the use of matrix algebra. We will use SAS to perform the calculations.

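For reference, the matrix formula alluded to above can be written compactly. This is the standard least squares solution, not something shown on the slide; the symbols $\mathbf{X}$, $\mathbf{y}$, and $\mathbf{b}$ are notation assumed here.

```latex
\mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
```

Here $\mathbf{y}$ is the $n \times 1$ vector of observed responses, $\mathbf{X}$ is the $n \times (p+1)$ matrix whose first column is all ones and whose remaining columns hold $x_1, \dots, x_p$, and $\mathbf{b} = (b_0, b_1, \dots, b_p)^{\top}$. The formula assumes $\mathbf{X}^{\top}\mathbf{X}$ is invertible, which fails when one predictor is an exact linear combination of the others.
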
Find the best multidimensional plane
Testing for Significance: Global Test

- Hypotheses
  $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
  $H_a$: One or more of the parameters is not equal to zero.
- Test Statistic
  $F = \mathrm{MSR}/\mathrm{MSE}$
- Rejection Rule
  Reject $H_0$ if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.

Testing for Significance: Individual $\beta$’s

- Hypotheses
  $H_0: \beta_i = 0$
  $H_a: \beta_i \neq 0$
- Test Statistic
  $t = b_i / s_{b_i}$, where $s_{b_i}$ is the standard error of $b_i$
- Rejection Rule
  Reject $H_0$ for small or large values of t
- Meaning: Is $X_i$ related to Y after taking into account all other variables in the model?

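A hedged sketch of how these tests could be requested in SAS: the individual t tests are printed by PROC REG by default, and the TEST statement can be used for a joint F test on a subset of coefficients. The names datasetname, depvar, x1, x2, and x3 are the generic placeholders used earlier in these slides, and the label x2x3 is hypothetical.

```sas
PROC REG DATA=datasetname;
  MODEL depvar = x1 x2 x3;
  /* The t test of H0: beta_i = 0 for each variable appears in the   */
  /* Parameter Estimates table without any extra statement.          */

  /* Joint F test of H0: beta_2 = beta_3 = 0, i.e. whether x2 and x3 */
  /* add anything once x1 is in the model. x2x3 is just a label.     */
  x2x3: TEST x2 = 0, x3 = 0;
RUN;
```

Listing every predictor in the TEST statement would reproduce the global F test from the Analysis of Variance table.
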
Possibilities

- X1 is related to Y alone, but after adjusting for X2, X1 is no longer related to Y
- X1 is not related to Y alone, but after adjusting for X2, X1 is related to Y
- The relation of X1 with Y gets stronger after adjusting for X2
- The relation of X1 with Y gets weaker after adjusting for X2

Pulmonary Function Example

Dependent Variable: Forced Expired Volume (FEV$_{1.0}$)

Independent Variables:
– Age of person
– Smoking status of person

Questions:
– Is age related to FEV independent of smoking status?
– Is smoking status related to FEV independent of age?
– How much of the variability in FEV is explained by age and smoking combined?

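Model for FEV Example

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$

$X_1$ = smoking status (1 = smoker, 0 = nonsmoker)
$X_2$ = age

Smokers: $\mathrm{FEV} = \beta_0 + \beta_1 + \beta_2\,\mathrm{age}$
Non-smokers: $\mathrm{FEV} = \beta_0 + \beta_2\,\mathrm{age}$
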
Interpretation of Parameters

Smokers: $\mathrm{FEV} = \beta_0 + \beta_1 + \beta_2\,\mathrm{age}$
Non-smokers: $\mathrm{FEV} = \beta_0 + \beta_2\,\mathrm{age}$

- $\beta_1$ is the effect of smoking for fixed levels of age
- $\beta_2$ is the effect of age pooled over smokers and non-smokers
- This model assumes the relation of age to FEV is the same for smokers and non-smokers

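The equal-slopes assumption can be examined by adding an age-by-smoking product term, which is one way to "explore interactions" as mentioned at the start of these slides. The following is a minimal sketch, assuming the fev data set with variables age, smk, and fev defined in the next slide; the names fev2 and age_smk are hypothetical and introduced only for illustration.

```sas
/* fev2 and age_smk are hypothetical names used for illustration */
DATA fev2;
  SET fev;
  age_smk = age * smk;   /* product term: 0 for non-smokers, age for smokers */
RUN;

PROC REG DATA=fev2;
  MODEL fev = age smk age_smk;
RUN;
```

In this model the coefficient of age_smk is the difference between the age slope for smokers and the age slope for non-smokers, so its t test addresses whether the relation of age to FEV is the same in the two groups.
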
DATA fev;
  INFILE DATALINES;
  INPUT age smk fev;
  DATALINES;
More data

PROC MEANS;
  VAR fev;
  CLASS smk;
RUN;

The MEANS Procedure
Analysis Variable : fev

smk   N Obs   N   Mean   Std Dev   Minimum   Maximum
(output values not preserved)

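PROC CORR DATA=fev;
RUN;

Pearson Correlation Coefficients, N = 30
Prob > |r| under H0: Rho=0

(Correlation matrix of age, smk, and fev. The coefficients are not preserved; the only surviving p-values are <.0001 entries in the age and fev rows.)
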
PROC REG;
  MODEL fev = age smk;
RUN;

Dependent Variable: fev

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2   SSR                                      <.0001
Error             27   SSE
Corrected Total   29   SST

Root MSE, R-Square, Dependent Mean, Coeff Var (values not preserved)

- The F test in the Model row tests $H_0: \beta_1 = 0;\ \beta_2 = 0$
- R-Square is the proportion of variance explained by both variables

PROC REG;
  MODEL fev = age smk;
  MODEL fev = age;
  MODEL fev = smk;
RUN;

Parameter Estimates (DF, Estimate, Standard Error, and t Value columns not preserved)

Model            R-Square   Variable    Pr > |t|
fev = age smk    .7038      Intercept   <.0001
                            age         <.0001
                            smk         (not preserved)
fev = age        .5333      Intercept   <.0001
                            age         <.0001
fev = smk        .1000      Intercept   <.0001
                            smk         (not preserved)

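The R-Square values above are enough to recover the F statistics by hand, using the identity $F = \mathrm{MSR}/\mathrm{MSE} = \dfrac{R^2/p}{(1-R^2)/(n-p-1)}$. The worked figures below are a sketch that assumes $n = 30$ (the N reported by PROC CORR) and that the three R-Square values belong to the three models in the order listed; the subscript notation is introduced here for illustration.

```latex
% Global test for the model fev = age smk (p = 2, n - p - 1 = 27)
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
  = \frac{0.7038 / 2}{0.2962 / 27} \approx 32.1

% Partial F test for adding smk to the model that already contains age
F_{\mathrm{smk} \mid \mathrm{age}}
  = \frac{(0.7038 - 0.5333)/1}{0.2962 / 27} \approx 15.5
```

The first value is compared with an F distribution on 2 and 27 d.f., the second with an F distribution on 1 and 27 d.f. The partial F statistic equals the square of the t statistic for smk in the two-variable model.
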
PROC REG;
  MODEL fev = age smk;
RUN;

PROC REG;
  MODEL fev = age;
  WHERE smk = 0;
RUN;

PROC REG;
  MODEL fev = age;
  WHERE smk = 1;
RUN;

Parameter Estimates (DF, Estimate, Standard Error, and t Value columns not preserved)

Group          Variable    Pr > |t|
All subjects   Intercept   <.0001
               age         <.0001
               smk         (not preserved)
Non-smokers    Intercept   <.0001
               age         (not preserved)
Smokers        Intercept   <.0001
               age         <.0001