Cystic Fibrosis Data

Cystic fibrosis lung function data: lung function data for cystic fibrosis patients (7-23 years old).

age      a numeric vector. Age in years.
sex      a numeric vector code. 0: male, 1: female.
height   a numeric vector. Height (cm).
weight   a numeric vector. Weight (kg).
bmp      a numeric vector. Body mass (% of normal).
fev1     a numeric vector. Forced expiratory volume.
rv       a numeric vector. Residual volume.
frc      a numeric vector. Functional residual capacity.
tlc      a numeric vector. Total lung capacity.
pemax    a numeric vector. Maximum expiratory pressure.
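cf <- read.csv("cystfibr.csv")
pairs(cf)                          # scatterplot matrix of all variables
attach(cf)
cf.lm <- lm(pemax ~ age+sex+height+weight+bmp+fev1+rv+frc+tlc)
print(summary(cf.lm))              # coefficient table and overall F test
print(anova(cf.lm))                # sequential ANOVA
print(drop1(cf.lm,test="F"))       # F test for deleting each term in turn
plot(cf.lm)                        # diagnostic plots
step(cf.lm)                        # backward selection by AIC
detach(cf)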
> source("cystfibr.r")
> cf.lm <- lm(pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc)
> print(summary(cf.lm))
…
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
age
sex
height
weight
bmp
fev1
rv
frc
tlc

Residual standard error: on 15 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 9 and 15 DF, p-value:
> print(anova(cf.lm))
Analysis of Variance Table

Response: pemax
           Df  Sum Sq  Mean Sq  F value  Pr(>F)
age                                              **
sex
height
weight
bmp
fev1
rv
frc
tlc
Residuals
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova() performs a sequential ANOVA: each term is tested after the terms listed above it.
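Because the ANOVA is sequential, the sum of squares credited to a term depends on where it appears in the formula. The sketch below illustrates this on simulated data, not on the cystic fibrosis data; the variables `a`, `b`, `y` and the seed are assumptions made only for the illustration.

```r
# Illustrative simulation (hypothetical variables, not the cystfibr data).
set.seed(1)
n <- 50
a <- rnorm(n)
b <- 0.8 * a + rnorm(n, sd = 0.6)   # b is correlated with a
y <- 1 + a + b + rnorm(n)

fit_ab <- lm(y ~ a + b)             # a entered first
fit_ba <- lm(y ~ b + a)             # a entered last

# Sequential sum of squares credited to a under each ordering
ss_a_first <- anova(fit_ab)["a", "Sum Sq"]
ss_a_last  <- anova(fit_ba)["a", "Sum Sq"]
print(c(a_entered_first = ss_a_first, a_entered_last = ss_a_last))

# The fitted model itself is the same, so the residual SS does not change
print(c(rss_ab = deviance(fit_ab), rss_ba = deviance(fit_ba)))
```

When `a` is entered first it is credited with the variation it shares with `b`; when entered last it gets only what `b` cannot explain. With correlated predictors such as height, weight, and age, the order in the formula therefore matters for the sequential table.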
> print(drop1(cf.lm, test = "F"))
Single term deletions

Model:
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc
        Df  Sum of Sq  RSS  AIC  F value  Pr(F)
age
sex
height
weight
bmp
fev1
rv
frc
tlc

drop1() performs a Type III ANOVA: each term is tested after all the other terms in the model.
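For a model with only single-degree-of-freedom terms and no interactions, the F statistics from drop1() should equal the squares of the t statistics in the summary() coefficient table, since both test a term after all the others. A minimal check on simulated data follows; the data frame `dat` and its coefficients are assumptions for illustration only.

```r
# Illustrative simulation (hypothetical data, not the cystfibr data).
set.seed(2)
n <- 40
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# F statistic for deleting each term from the full model
terms_to_check <- c("x1", "x2", "x3")
f_drop1 <- drop1(fit, test = "F")[terms_to_check, "F value"]

# t statistic for each coefficient in the full model
t_vals <- coef(summary(fit))[terms_to_check, "t value"]

# The two columns should agree up to rounding error
print(cbind(F_drop1 = f_drop1, t_squared = t_vals^2))
```

This is why the drop1() p-values match the Pr(>|t|) column of the full-model summary, while the sequential anova() table agrees with them only for the last term in the formula.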
> step(cf.lm)
Start: AIC=
pemax ~ age + sex + height + weight + bmp + fev1 + rv + frc + tlc

          Df  Sum of Sq  RSS  AIC
- sex
- tlc
- height
- age
- frc
- fev1
- rv
- weight
- bmp

Step: AIC=167.2
pemax ~ age + height + weight + bmp + fev1 + rv + frc + tlc
……………
Step: AIC=
pemax ~ weight + bmp + fev1 + rv

          Df  Sum of Sq  RSS  AIC
- rv
- bmp
- fev1
- weight

Call:
lm(formula = pemax ~ weight + bmp + fev1 + rv)

Coefficients:
(Intercept)  weight  bmp  fev1  rv
> cf.lm2 <- lm(pemax ~ rv+bmp+fev1+weight)
> summary(cf.lm2)

Call:
lm(formula = pemax ~ rv + bmp + fev1 + weight)

Residuals:
 Min  1Q  Median  3Q  Max

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)
rv
bmp                                                   *
fev1                                                  *
weight                                                ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 20 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 4 and 20 DF, p-value:
Cautionary Notes

The significance levels are not necessarily believable after variable selection.
The original full-model F-statistic is significant, indicating that there is some significant relationship: F(9,15) = 2.93, p =
After variable selection, F(3,21) = 9.28, p = , which is biased because the same data were used both to choose the variables and to test them.

April 23, 2010, SPH 247 Statistical Analysis of Laboratory Data
* Nine predictors and a response, all independent standard normal noise
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
* Full model, then backward elimination removing terms with p >= 0.1
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

  Source |  SS  df  MS          Number of obs =
         |                      F( 9, 15)     = 0.91
   Model |                      Prob > F      =
Residual |                      R-squared     =
         |                      Adj R-squared =
   Total |                      Root MSE      =

       y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
      x1 |
      x2 |
      x3 |
      x4 |
      x5 |
      x6 |
      x7 |
      x8 |
      x9 |
   _cons |
. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
begin with full model
p =  >=  removing x4
p =  >=  removing x6
p =  >=  removing x1
p =  >=  removing x7
p =  >=  removing x8
p =  >=  removing x3
p =  >=  removing x5
p =  >=  removing x9

  Source |  SS  df  MS          Number of obs =
         |                      F( 1, 23)     = 7.23
   Model |                      Prob > F      =
Residual |                      R-squared     =
         |                      Adj R-squared =
   Total |                      Root MSE      =

       y |  Coef.  Std. Err.  t  P>|t|  [95% Conf. Interval]
      x2 |
   _cons |
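The Stata run above is a single draw. A rough way to see how often selection manufactures a "significant" model from pure noise is to repeat the experiment many times. The R sketch below does this under stated assumptions: it uses step() (AIC-based backward selection) rather than Stata's p-value rule, and the number of replications, the seed, and the helper names `overall_p` and `sim_once` are hypothetical choices made for the illustration.

```r
# Monte Carlo sketch: response and predictors are independent noise.
set.seed(247)
n <- 25        # observations, as in the example above
p <- 9         # candidate predictors
n_sim <- 200   # replications (assumed; increase for smoother estimates)

# p-value of the overall F test; NA when no predictor remains in the model
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic
  if (is.null(f)) return(NA_real_)
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
}

sim_once <- function() {
  dat <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
  names(dat) <- c("y", paste0("x", 1:p))
  full <- lm(y ~ ., data = dat)
  selected <- step(full, trace = 0)   # backward selection by AIC
  c(full = overall_p(full), selected = overall_p(selected))
}

p_vals <- t(replicate(n_sim, sim_once()))

# Proportion of runs declared significant at the 5% level.
# An intercept-only selected model (NA) is counted as not significant.
reject_full <- mean(p_vals[, "full"] < 0.05)
reject_selected <- mean(!is.na(p_vals[, "selected"]) &
                        p_vals[, "selected"] < 0.05)
print(c(full_model = reject_full, after_selection = reject_selected))
```

The full-model rejection rate should sit near the nominal 5%, while the rate after selection should be noticeably higher, even though no predictor is related to the response. The exact figures depend on the seed and on the selection rule.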