Multiple Regression

EPP 245/298 Statistical Analysis of Laboratory Data
October 26, 2006
EPP 245 Statistical Analysis of Laboratory Data

Cystic Fibrosis Data

Cystic fibrosis lung function data: lung function data for cystic fibrosis patients (7-23 years old).

age      numeric vector. Age in years.
sex      numeric vector code. 0: male, 1: female.
height   numeric vector. Height (cm).
weight   numeric vector. Weight (kg).
bmp      numeric vector. Body mass (% of normal).
fev1     numeric vector. Forced expiratory volume.
rv       numeric vector. Residual volume.
frc      numeric vector. Functional residual capacity.
tlc      numeric vector. Total lung capacity.
pemax    numeric vector. Maximum expiratory pressure.
Some Stata Commands

. insheet using "C:\TD\CLASS\K30Bench2005\cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf
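      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    2.93
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
         sex |
      height |
      weight |
         bmp |
        fev1 |
          rv |
         frc |
         tlc |
       _cons |
------------------------------------------------------------------------------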
T-test of additional value of variable: in the regression output above, the t and P>|t| entries in each coefficient row test whether that variable adds value given all the other variables already in the model.
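The t-test for a single coefficient can be reproduced by hand from the stored regression results. The following is a minimal sketch, assuming the cystic fibrosis data are already in memory and using sex purely as an illustration; the scalar names t_sex and p_sex are hypothetical and do not appear in the original slides.

```stata
* Assumes the cystic fibrosis data are already loaded (see the insheet command).
regress pemax age sex height weight bmp fev1 rv frc tlc

* t statistic for sex: estimated coefficient divided by its standard error
scalar t_sex = _b[sex] / _se[sex]

* Two-sided p-value on the residual degrees of freedom stored in e(df_r)
scalar p_sex = 2 * ttail(e(df_r), abs(t_sex))
display "t = " t_sex "   p = " p_sex

* The Wald test of the same hypothesis reports F = t^2 with the same p-value
test sex
```

The values displayed should agree with the t and P>|t| entries in the sex row of the regression table.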
Test of whole model: in the same regression output, F(9, 15) = 2.93 and its Prob > F test the null hypothesis that all nine slope coefficients are zero.
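The whole-model F-statistic can also be rebuilt from R-squared and the degrees of freedom that regress stores. This is a sketch under the assumption that the cystic fibrosis data are in memory; the scalar name F_model is hypothetical.

```stata
* Assumes the cystic fibrosis data are already loaded (see the insheet command).
regress pemax age sex height weight bmp fev1 rv frc tlc

* Overall F from R-squared: (R2 / df_model) / ((1 - R2) / df_residual)
scalar F_model = (e(r2) / e(df_m)) / ((1 - e(r2)) / e(df_r))
display "F recomputed = " F_model "   F stored = " e(F)

* Upper-tail probability of the F distribution gives Prob > F
display "Prob > F = " Ftail(e(df_m), e(df_r), e(F))

* Equivalent joint Wald test that all nine slopes are zero
test age sex height weight bmp fev1 rv frc tlc
```

The recomputed and stored F values should match, and the joint test should report the same F(9, 15) as the regression header.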
Least significant variable in the full model: sex. It has the largest p-value among the nine predictors, so it is removed first.
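. regress pemax age height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  8,    16) =    3.49
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      height |
      weight |
         bmp |
        fev1 |
          rv |
         frc |
         tlc |
       _cons |
------------------------------------------------------------------------------

Least significant variable: tlc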
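. regress pemax age height weight bmp fev1 rv frc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  7,    17) =    4.16
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      height |
      weight |
         bmp |
        fev1 |
          rv |
         frc |
       _cons |
------------------------------------------------------------------------------

Least significant variable: frc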
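. regress pemax age height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  6,    18) =    5.04
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      height |
      weight |
         bmp |
        fev1 |
          rv |
       _cons |
------------------------------------------------------------------------------

Least significant variable: age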
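. regress pemax height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  5,    19) =    6.23
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |
      weight |
         bmp |
        fev1 |
          rv |
       _cons |
------------------------------------------------------------------------------

Least significant variable: height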
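. regress pemax weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  4,    20) =    7.96
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |
         bmp |
        fev1 |
          rv |
       _cons |
------------------------------------------------------------------------------

Least significant variable: rv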
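. regress pemax weight bmp fev1

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |
         bmp |
        fev1 |
       _cons |
------------------------------------------------------------------------------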
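. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p =        >=         removing sex
p =        >=         removing tlc
p =        >=         removing frc
p =        >=         removing age
p =        >=         removing height
p =        >=         removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |
      weight |
         bmp |
       _cons |
------------------------------------------------------------------------------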
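. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p =        >=         removing sex
p =        >=         removing tlc
p =        >=         removing frc
p =        >=         removing age
p =        >=         removing height
p =        >=         removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |
      weight |
         bmp |
       _cons |
------------------------------------------------------------------------------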
Cautionary Notes

The significance levels are not necessarily believable after variable selection.

The original full-model F-statistic, F(9,15) = 2.93, is significant, indicating that there is some relationship between pemax and the predictors.

After variable selection, F(3,21) = 9.28, but its p-value is biased: it is computed as if the three remaining variables had been chosen in advance rather than selected from nine candidates using the same data.
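The nominal p-values that go with the two F-statistics quoted on this slide can be recomputed from the statistics and their degrees of freedom alone, using Stata's Ftail() function. This is only a convenience sketch; the second value is the nominal p-value, which takes no account of the selection step.

```stata
* Upper-tail F probabilities for the two models discussed above
display "Full model,     F(9,15) = 2.93:  p = " Ftail(9, 15, 2.93)
display "Selected model, F(3,21) = 9.28:  p = " Ftail(3, 21, 9.28)
```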
* Simulate 25 observations of pure noise: nine predictors and a response,
* all independent standard normal draws
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())

* Fit the full model, then run backward elimination with removal threshold 0.1
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
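. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |
          x2 |
          x3 |
          x4 |
          x5 |
          x6 |
          x7 |
          x8 |
          x9 |
       _cons |
------------------------------------------------------------------------------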
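. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p =        >=         removing x4
p =        >=         removing x6
p =        >=         removing x1
p =        >=         removing x7
p =        >=         removing x8
p =        >=         removing x3
p =        >=         removing x5
p =        >=         removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |                                         Prob > F      =
    Residual |                                         R-squared     =
-------------+------------------------------           Adj R-squared =
       Total |                                         Root MSE      =

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |
       _cons |
------------------------------------------------------------------------------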
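A single simulated data set shows that stepwise selection can produce a "significant" model from pure noise; repeating the experiment many times gives a rough idea of how often that happens. The following is a minimal sketch, not part of the original slides: the program name nullstep, the seed, and the choice of 1000 replications are all assumptions made for illustration.

```stata
* Hypothetical helper: one pure-noise data set, backward elimination,
* then return the size and nominal p-value of the selected model
capture program drop nullstep
program define nullstep, rclass
    drop _all
    set obs 25
    forvalues j = 1/9 {
        generate x`j' = invnormal(uniform())
    }
    generate y = invnormal(uniform())
    stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
    * e(df_m) is the number of predictors kept in the final model
    return scalar k = e(df_m)
    * Nominal whole-model p-value; missing when no predictor is kept
    return scalar p = Ftail(e(df_m), e(df_r), e(F))
end

set seed 12345                      // arbitrary seed, for reproducibility only
simulate k = r(k) p = r(p), reps(1000) nodots: nullstep

* Flag replications whose selected model looks significant at the 5% level;
* a missing p compares as larger than .05, so empty models count as 0
generate byte sig = p < .05
summarize k sig
```

The mean of sig estimates the proportion of pure-noise data sets in which the post-selection F-test is nominally significant at 5%. If that proportion is well above 0.05, it illustrates the bias described in the cautionary notes.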