SPH 247 Statistical Analysis of Laboratory Data
April 23, 2010
Cystic Fibrosis Data

Lung function data for cystic fibrosis patients (7-23 years old). All variables are numeric vectors.

  age      Age in years.
  sex      Sex code. 0: male, 1: female.
  height   Height (cm).
  weight   Weight (kg).
  bmp      Body mass (% of normal).
  fev1     Forced expiratory volume.
  rv       Residual volume.
  frc      Functional residual capacity.
  tlc      Total lung capacity.
  pemax    Maximum expiratory pressure.
  cf <- read.csv("cystfibr.csv")
  pairs(cf)
  attach(cf)
  cf.lm <- lm(pemax ~ age+sex+height+weight+bmp+fev1+rv+frc+tlc)
  print(summary(cf.lm))
  print(anova(cf.lm))
  print(drop1(cf.lm,test="F"))
  plot(cf.lm)
  step(cf.lm)
  detach(cf)
  > source("cystfibr.r")
  > cf.lm <- lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc)
  > print(summary(cf.lm))
  …
  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)
  age
  sex
  height
  weight
  bmp
  fev1
  rv
  frc
  tlc

  Residual standard error: on 15 degrees of freedom
  Multiple R-Squared: , Adjusted R-squared:
  F-statistic: on 9 and 15 DF, p-value:
  > print(anova(cf.lm))
  Analysis of Variance Table

  Response: pemax
            Df Sum Sq Mean Sq F value Pr(>F)
  age                                        **
  sex
  height
  weight
  bmp
  fev1
  rv
  frc
  tlc
  Residuals
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova() performs a sequential ANOVA: each term is tested after adjusting only for the terms listed before it, so the results depend on the order of the terms in the model formula.
  > print(drop1(cf.lm, test = "F"))
  Single term deletions

  Model:
  pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc
         Df Sum of Sq RSS AIC F value Pr(F)
  age
  sex
  height
  weight
  bmp
  fev1
  rv
  frc
  tlc

drop1() performs a Type III ANOVA: each term is tested after adjusting for all the other terms in the model, so the order of the terms does not matter.
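The difference between the sequential tests from anova() and the single-term deletions from drop1() is easiest to see with two correlated predictors. The following is a minimal sketch on simulated data; the variables x1, x2, and y are made up for illustration and are not the cystic fibrosis data.

```r
# Simulated data (hypothetical; NOT the cystic fibrosis data)
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # x2 is strongly correlated with x1
y  <- 1 + 2 * x2 + rnorm(n)     # only x2 has a direct effect on y

# Same model, terms entered in a different order
fit12 <- lm(y ~ x1 + x2)
fit21 <- lm(y ~ x2 + x1)

# Sequential tests: the row for x1 depends on whether x1 enters first or last
seq12 <- anova(fit12)
seq21 <- anova(fit21)
print(seq12)
print(seq21)

# Single-term deletions: each term is tested given all the others,
# so the order in the formula makes no difference
d12 <- drop1(fit12, test = "F")
d21 <- drop1(fit21, test = "F")
print(d12)
print(d21)
```

In the sequential table the last term is adjusted for everything before it, so its F value agrees with the corresponding drop1() row; earlier terms generally do not agree.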
  > step(cf.lm)
  Start: AIC=
  pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc

           Df Sum of Sq RSS AIC
  - sex
  - tlc
  - height
  - age
  - frc
  - fev1
  - rv
  - weight
  - bmp

  Step: AIC=167.2
  pemax ~ age + height + weight + bmp + fev1 + rv + frc + tlc
  ……………
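The AIC column that step() prints for a linear model comes from extractAIC(), which computes n log(RSS/n) + 2p, where p is the number of estimated coefficients. The sketch below checks this by hand on simulated data; the data frame d and its columns are assumptions made only for this example.

```r
# Simulated data (hypothetical; NOT the cystic fibrosis data)
set.seed(2)
n <- 25
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)

rss <- sum(resid(fit)^2)      # residual sum of squares
p   <- length(coef(fit))      # estimated coefficients, including the intercept

aic_by_hand <- n * log(rss / n) + 2 * p
aic_step    <- extractAIC(fit)[2]   # the value step() reports for this model

print(c(by_hand = aic_by_hand, extractAIC = aic_step))

# AIC() uses the full Gaussian log-likelihood, so it differs from the
# step() value by a constant that depends only on n
print(AIC(fit) - aic_step)
```

Because the difference from AIC() is the same for every model fitted to the same data, both versions rank candidate models identically.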
  Step: AIC=
  pemax ~ weight + bmp + fev1 + rv

           Df Sum of Sq RSS AIC
  - rv
  - bmp
  - fev1
  - weight

  Call:
  lm(formula = pemax ~ weight + bmp + fev1 + rv)

  Coefficients:
  (Intercept)  weight  bmp  fev1  rv
  > cf.lm2 <- lm(pemax ~ rv+bmp+fev1+weight)
  > summary(cf.lm2)

  Call:
  lm(formula = pemax ~ rv + bmp + fev1 + weight)

  Residuals:
  Min 1Q Median 3Q Max

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)
  rv
  bmp                                              *
  fev1                                             *
  weight                                           ***
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Residual standard error: on 20 degrees of freedom
  Multiple R-Squared: , Adjusted R-squared:
  F-statistic: on 4 and 20 DF, p-value:
Cautionary Notes

The significance levels are not necessarily believable after variable selection.
The original full-model F-statistic is significant, indicating that there is some significant relationship: F(9,15) = 2.93, p = 
After variable selection, F(3,21) = 9.28, p = , which is biased, because the same data were used both to choose the variables and to test them.
  set obs 25
  generate x1 = invnormal(uniform())
  generate x2 = invnormal(uniform())
  generate x3 = invnormal(uniform())
  generate x4 = invnormal(uniform())
  generate x5 = invnormal(uniform())
  generate x6 = invnormal(uniform())
  generate x7 = invnormal(uniform())
  generate x8 = invnormal(uniform())
  generate x9 = invnormal(uniform())
  generate y = invnormal(uniform())
  regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
  stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
  . regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

    Source |  SS  df  MS        Number of obs =
           |                    F( 9, 15)     = 0.91
     Model |                    Prob > F      =
  Residual |                    R-squared     =
           |                    Adj R-squared =
     Total |                    Root MSE      =

         y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
        x1 |
        x2 |
        x3 |
        x4 |
        x5 |
        x6 |
        x7 |
        x8 |
        x9 |
     _cons |
  . stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                        begin with full model
  p = >= removing x4
  p = >= removing x6
  p = >= removing x1
  p = >= removing x7
  p = >= removing x8
  p = >= removing x3
  p = >= removing x5
  p = >= removing x9

    Source |  SS  df  MS        Number of obs =
           |                    F( 1, 23)     = 7.23
     Model |                    Prob > F      =
  Residual |                    R-squared     =
           |                    Adj R-squared =
     Total |                    Root MSE      =

         y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
        x2 |
     _cons |
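The same experiment can be repeated many times to see how often pure noise looks "significant" after selection. The following R sketch is an illustration under stated assumptions, not a translation of the Stata run above: it uses step(), which selects by AIC rather than by the pr(.1) p-value rule, and the helper names overall_p and one_run, the seed, and the number of replications are all made up for this example.

```r
set.seed(247)
n_obs  <- 25    # observations per data set, as in the example above
n_pred <- 9     # pure-noise predictors
n_sim  <- 200   # replications (hypothetical choice; increase for precision)

# p-value of the overall F-test for a fitted lm
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic
  if (is.null(f)) return(1)   # intercept-only model: nothing to test
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# One replication: y and x1..x9 are independent standard normals
one_run <- function() {
  d <- as.data.frame(matrix(rnorm(n_obs * (n_pred + 1)), nrow = n_obs))
  names(d) <- c("y", paste0("x", seq_len(n_pred)))
  full <- lm(y ~ ., data = d)
  sel  <- step(full, trace = 0)   # backward selection by AIC, no printing
  c(full = overall_p(full), selected = overall_p(sel))
}

p_vals <- replicate(n_sim, one_run())   # 2 x n_sim matrix of p-values

# Fraction of replications with overall F-test p < 0.05
reject_rate <- rowMeans(p_vals < 0.05)
print(reject_rate)
```

Since no predictor is related to y, the full-model rate should be near the nominal 0.05, while the rate for the selected model is typically much higher, which is the bias the cautionary notes describe.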