EPP 245 Statistical Analysis of Laboratory Data 5/25/2019
Multiple Regression
October 25, 2007
Cystic Fibrosis Data
Lung function data for cystic fibrosis patients (7-23 years old). All variables are numeric vectors.
age      Age in years
sex      Sex code (0: male, 1: female)
height   Height (cm)
weight   Weight (kg)
bmp      Body mass (% of normal)
fev1     Forced expiratory volume
rv       Residual volume
frc      Functional residual capacity
tlc      Total lung capacity
pemax    Maximum expiratory pressure
Some Stata Commands
. insheet using "cystfibr.csv"
(11 vars, 25 obs)
. graph matrix age sex height weight bmp fev1 rv frc tlc pemax
. graph export cystfibr-scm.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc
. rvfplot
. graph export cystfibr-rvf.wmf
. regress pemax age sex height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    2.93
       Model |  17101.3907     9  1900.15452           Prob > F      =  0.0320
    Residual |  9731.24928    15  648.749952           R-squared     =  0.6373
-------------+------------------------------           Adj R-squared =  0.4197
       Total |    26832.64    24  1118.02667           Root MSE      =  25.471

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -2.54196   4.801699    -0.53   0.604    -12.77654    7.692618
         sex |  -3.736782   15.45982    -0.24   0.812    -36.68861    29.21505
      height |  -.4462549   .9033548    -0.49   0.628     -2.37171      1.4792
      weight |   2.992816   2.007957     1.49   0.157    -1.287044    7.272675
         bmp |  -1.744944   1.155237    -1.51   0.152    -4.207274    .7173865
        fev1 |   1.080697   1.080947     1.00   0.333    -1.223288    3.384682
          rv |    .196972   .1962136     1.00   0.331    -.2212474    .6151915
         frc |  -.3084314   .4923899    -0.63   0.540    -1.357936    .7410729
         tlc |   .1886017   .4997351     0.38   0.711    -.8765585    1.253762
       _cons |   176.0582   225.8911     0.78   0.448    -305.4174    657.5338
------------------------------------------------------------------------------
T-test of additional value of variable: in the full-model output above, each row's t and P>|t| columns test whether that variable adds predictive value once all the other variables are already in the model.
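As a worked check of one row of the full-model table (weight is picked here only as an example), the t statistic is the coefficient divided by its standard error, and the confidence interval uses the t quantile on the 15 residual degrees of freedom. The quantile 2.131 is a standard table value and does not appear in the Stata output.

```latex
\[
t_{\text{weight}} = \frac{\hat\beta}{\operatorname{SE}(\hat\beta)}
  = \frac{2.992816}{2.007957} \approx 1.49,
\qquad
\hat\beta \pm t_{0.975,\,15}\,\operatorname{SE}(\hat\beta)
  = 2.992816 \pm 2.131 \times 2.007957 \approx (-1.29,\ 7.27).
\]
```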
Test of whole model: the F statistic in the header of the same output, F(9, 15) = 2.93 with Prob > F = 0.0320, tests whether all nine coefficients are zero.
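The header statistics of the full model can be reproduced from its ANOVA table. The sketch below uses n = 25 observations and p = 9 predictors, both read off the output; the symbols n and p are notation introduced here.

```latex
\[
F = \frac{MS_{\text{Model}}}{MS_{\text{Residual}}}
  = \frac{1900.15452}{648.749952} \approx 2.93,
\qquad
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
  = \frac{17101.3907}{26832.64} \approx 0.6373,
\]
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
  = 1 - 0.3627 \times \frac{24}{15} \approx 0.4197 .
\]
```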
Least significant variable: sex has the largest p-value in the full model (P>|t| = 0.812), so it is the first variable to remove.
. regress pemax age height weight bmp fev1 rv frc tlc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  8,    16) =    3.49
       Model |  17063.4886     8  2132.93607           Prob > F      =  0.0159
    Residual |  9769.15144    16  610.571965           R-squared     =  0.6359
-------------+------------------------------           Adj R-squared =  0.4539
       Total |    26832.64    24  1118.02667           Root MSE      =   24.71

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.114515   4.330841    -0.49   0.632    -11.29549    7.066459
      height |   -.394836    .851725    -0.46   0.649    -2.200412     1.41074
      weight |   2.834909   1.841995     1.54   0.143    -1.069947    6.739765
         bmp |  -1.741637   1.120651    -1.55   0.140    -4.117312     .634038
        fev1 |    1.26509   .7429407     1.70   0.108    -.3098737    2.840054
          rv |   .1779046   .1742911     1.02   0.323    -.1915759    .5473852
         frc |  -.2483218   .4122804    -0.60   0.555    -1.122317    .6256736
         tlc |   .2084044   .4782484     0.44   0.669    -.8054369    1.222246
       _cons |   153.0385   198.7149     0.77   0.452    -268.2183    574.2953
------------------------------------------------------------------------------

Least significant variable: tlc (P>|t| = 0.669)
. regress pemax age height weight bmp fev1 rv frc

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  7,    17) =    4.16
       Model |  16947.5458     7  2421.07798           Prob > F      =  0.0077
    Residual |  9885.09416    17  581.476127           R-squared     =  0.6316
-------------+------------------------------           Adj R-squared =  0.4799
       Total |    26832.64    24  1118.02667           Root MSE      =  24.114

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.663193   4.043832    -0.66   0.519    -11.19493    5.868546
      height |  -.4895733   .8036502    -0.61   0.550    -2.185127    1.205981
      weight |   3.155659   1.647815     1.92   0.072    -.3209274    6.632245
         bmp |  -1.962543   .9753332    -2.01   0.060    -4.020316    .0952305
        fev1 |   1.247861   .7239953     1.72   0.103    -.2796361    2.775357
          rv |   .1595988   .1650733     0.97   0.347    -.1886753    .5078729
         frc |  -.1764595    .368749    -0.48   0.638    -.9544518    .6015328
       _cons |   198.2942   165.3311     1.20   0.247    -150.5238    547.1123
------------------------------------------------------------------------------

Least significant variable: frc (P>|t| = 0.638)
. regress pemax age height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  6,    18) =    5.04
       Model |  16814.3899     6  2802.39832           Prob > F      =  0.0034
    Residual |  10018.2501    18  556.569447           R-squared     =  0.6266
-------------+------------------------------           Adj R-squared =  0.5022
       Total |    26832.64    24  1118.02667           Root MSE      =  23.592

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -1.819342   3.560301    -0.51   0.616    -9.299258    5.660573
      height |  -.4101508   .7693006    -0.53   0.600    -2.026391     1.20609
      weight |   2.874434   1.506126     1.91   0.072    -.2898203    6.038688
         bmp |  -1.949083   .9538193    -2.04   0.056    -3.952983    .0548169
        fev1 |   1.411959   .6238279     2.26   0.036     .1013452    2.722573
          rv |   .0955779   .0946057     1.01   0.326    -.1031813    .2943371
       _cons |   166.9049   148.4762     1.12   0.276    -145.0321    478.8418
------------------------------------------------------------------------------

Least significant variable: age (P>|t| = 0.616)
. regress pemax height weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  5,    19) =    6.23
       Model |  16669.0534     5  3333.81068           Prob > F      =  0.0014
    Residual |  10163.5866    19   534.92561           R-squared     =  0.6212
-------------+------------------------------           Adj R-squared =  0.5215
       Total |    26832.64    24  1118.02667           Root MSE      =  23.128

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |  -.4485274   .7505918    -0.60   0.557    -2.019534    1.122479
      weight |   2.338692   1.060094     2.21   0.040     .1198889    4.557495
         bmp |  -1.641001   .7246036    -2.26   0.035    -3.157614   -.1243885
        fev1 |   1.471767   .6007182     2.45   0.024     .2144491    2.729084
          rv |    .110117   .0884543     1.24   0.228      -.07502     .295254
       _cons |   137.0958   133.8559     1.02   0.319    -143.0677    417.2594
------------------------------------------------------------------------------

Least significant variable: height (P>|t| = 0.557)
. regress pemax weight bmp fev1 rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  4,    20) =    7.96
       Model |  16478.0401     4  4119.51002           Prob > F      =  0.0005
    Residual |  10354.5999    20  517.729996           R-squared     =  0.6141
-------------+------------------------------           Adj R-squared =  0.5369
       Total |    26832.64    24  1118.02667           Root MSE      =  22.754

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.748914   .3806332     4.59   0.000     .9549274    2.542901
         bmp |  -1.377243   .5653421    -2.44   0.024    -2.556526   -.1979604
        fev1 |   1.547698   .5776112     2.68   0.014     .3428223    2.752574
          rv |   .1257152   .0831456     1.51   0.146    -.0477234    .2991538
       _cons |    63.9467   53.27673     1.20   0.244    -47.18661      175.08
------------------------------------------------------------------------------

Least significant variable: rv (P>|t| = 0.146)
. regress pemax weight bmp fev1

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
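Across the sequence of fits, the adjusted R-squared rises at each removal (0.4197, 0.4539, 0.4799, 0.5022, 0.5215, 0.5369) and then falls to 0.5086 when rv is dropped. This matches a general property of least squares: removing a variable lowers the adjusted R-squared exactly when its |t| exceeds 1, and rv had t = 1.51. The last two values can be checked from the MS column, since the adjusted R-squared is one minus the ratio of the residual to the total mean square:

```latex
\[
R^2_{\text{adj}} = 1 - \frac{MS_{\text{Residual}}}{MS_{\text{Total}}}:
\qquad
\text{with rv: } 1 - \frac{517.729996}{1118.02667} \approx 0.5369,
\qquad
\text{without rv: } 1 - \frac{549.437528}{1118.02667} \approx 0.5086 .
\]
```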
. stepwise, pr(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.0500  removing sex
p = 0.6688 >= 0.0500  removing tlc
p = 0.6384 >= 0.0500  removing frc
p = 0.6156 >= 0.0500  removing age
p = 0.5572 >= 0.0500  removing height
p = 0.1462 >= 0.0500  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
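Backward elimination removes the variables one at a time. A complementary check, not part of the original slides, is to test all six removed variables jointly against the full model. The following is a minimal sketch that assumes cystfibr.csv is already loaded; the scalar names rss_full, df_full and F_drop are hypothetical.

```stata
* Sketch: joint F-test of the six variables that backward elimination removed.
* Assumes the cystic fibrosis data are in memory (see the insheet command).
quietly regress pemax age sex height weight bmp fev1 rv frc tlc
test sex tlc frc age height rv

* The same F computed by hand from the two residual sums of squares
scalar rss_full = e(rss)      // residual SS of the full model
scalar df_full  = e(df_r)     // residual df of the full model
quietly regress pemax weight bmp fev1
scalar F_drop = ((e(rss) - rss_full) / 6) / (rss_full / df_full)
display "F(6, " df_full ") = " F_drop "   p = " Ftail(6, df_full, F_drop)
```

From the residual sums of squares printed in the outputs, this F works out to ((11538.1881 - 9731.24928) / 6) / 648.749952, or about 0.46 on (6, 15) degrees of freedom. That is well below 1, so the six variables jointly add little once weight, bmp and fev1 are in the model.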
. stepwise, pr(.1) pe(.05): regress pemax age sex height weight bmp fev1 rv frc tlc
                      begin with full model
p = 0.8123 >= 0.1000  removing sex
p = 0.6688 >= 0.1000  removing tlc
p = 0.6384 >= 0.1000  removing frc
p = 0.6156 >= 0.1000  removing age
p = 0.5572 >= 0.1000  removing height
p = 0.1462 >= 0.1000  removing rv

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  3,    21) =    9.28
       Model |  15294.4519     3  5098.15064           Prob > F      =  0.0004
    Residual |  11538.1881    21  549.437528           R-squared     =  0.5700
-------------+------------------------------           Adj R-squared =  0.5086
       Total |    26832.64    24  1118.02667           Root MSE      =   23.44

------------------------------------------------------------------------------
       pemax |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        fev1 |   1.108629   .5143694     2.16   0.043     .0389396    2.178319
      weight |   1.536475   .3644235     4.22   0.000     .7786149    2.294335
         bmp |  -1.465406   .5792906    -2.53   0.019    -2.670106    -.260705
       _cons |   126.3336   34.71986     3.64   0.002     54.12965    198.5375
------------------------------------------------------------------------------
Cautionary Notes
The significance levels are not necessarily believable after variable selection.
The original full-model F-statistic is significant, indicating that there is some relationship between pemax and the predictors: F(9, 15) = 2.93, p = 0.0320.
After variable selection, F(3, 21) = 9.28, p = 0.0004. This p-value is biased (too small) because the same data were used both to choose the variables and to test them.
set obs 25
generate x1 = invnormal(uniform())
generate x2 = invnormal(uniform())
generate x3 = invnormal(uniform())
generate x4 = invnormal(uniform())
generate x5 = invnormal(uniform())
generate x6 = invnormal(uniform())
generate x7 = invnormal(uniform())
generate x8 = invnormal(uniform())
generate x9 = invnormal(uniform())
generate y = invnormal(uniform())
regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
. regress y x1 x2 x3 x4 x5 x6 x7 x8 x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  9,    15) =    0.91
       Model |  12.3235639     9  1.36928488           Prob > F      =  0.5397
    Residual |  22.5105993    15  1.50070662           R-squared     =  0.3538
-------------+------------------------------           Adj R-squared = -0.0340
       Total |  34.8341632    24  1.45142347           Root MSE      =   1.225

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.0441858   .2998066    -0.15   0.885    -.6832085     .594837
          x2 |  -.9078136   .4347798    -2.09   0.054    -1.834525    .0188976
          x3 |   .2076754   .3789522     0.55   0.592    -.6000421    1.015393
          x4 |  -.0056383   .3319125    -0.02   0.987    -.7130931    .7018166
          x5 |   -.330546   .3854497    -0.86   0.405    -1.152113    .4910207
          x6 |   .0202964   .3470704     0.06   0.954    -.7194666    .7600594
          x7 |   -.073401   .3135234    -0.23   0.818    -.7416603    .5948583
          x8 |  -.0552909   .3026913    -0.18   0.858    -.7004621    .5898803
          x9 |  -.3190092   .3137931    -1.02   0.325    -.9878434     .349825
       _cons |  -.2490392   .3078424    -0.81   0.431    -.9051898    .4071113
------------------------------------------------------------------------------
. stepwise, pr(.1): regress y x1 x2 x3 x4 x5 x6 x7 x8 x9
                      begin with full model
p = 0.9867 >= 0.1000  removing x4
p = 0.9545 >= 0.1000  removing x6
p = 0.8456 >= 0.1000  removing x1
p = 0.8165 >= 0.1000  removing x7
p = 0.7506 >= 0.1000  removing x8
p = 0.5023 >= 0.1000  removing x3
p = 0.2866 >= 0.1000  removing x5
p = 0.2081 >= 0.1000  removing x9

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) =    7.23
       Model |  8.33379862     1  8.33379862           Prob > F      =  0.0131
    Residual |  26.5003646    23  1.15218977           R-squared     =  0.2392
-------------+------------------------------           Adj R-squared =  0.2062
       Total |  34.8341632    24  1.45142347           Root MSE      =  1.0734

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.6644002   .2470417    -2.69   0.013    -1.175445   -.1533555
       _cons |  -.1523124    .214703    -0.71   0.485    -.5964594    .2918346
------------------------------------------------------------------------------
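A single simulated data set could be a fluke, so one way to see how often this happens is to repeat the pure-noise experiment many times. The following is a minimal sketch, not the course's own code. The program name noisestep, the returned names p_final and k, the seed and the 1000 replications are all arbitrary choices made here. It uses rnormal(), which draws the same kind of standard normal values as invnormal(uniform()), and it assumes stepwise falls back to a constant-only fit when every predictor is removed.

```stata
* Sketch: how often does backward elimination on pure noise end with a
* "significant" model? All names below are hypothetical.
capture program drop noisestep
program define noisestep, rclass
    drop _all
    set obs 25
    forvalues j = 1/9 {
        generate x`j' = rnormal()      // predictors unrelated to y
    }
    generate y = rnormal()
    * Backward elimination at pr(.1); quietly hides the step-by-step log
    quietly stepwise, pr(.1): regress y x1-x9
    * Overall F-test p-value and size of whatever model survives
    return scalar p_final = Ftail(e(df_m), e(df_r), e(F))
    return scalar k = e(df_m)
end

set seed 12345
simulate p_final = r(p_final) k = r(k), reps(1000) nodots: noisestep

* Missing p_final (no variables kept) compares as larger than .05, so sig = 0
generate byte sig = (p_final < .05)
summarize sig k
```

The mean of sig estimates how often the final model's overall F-test rejects at the 5% level even though no predictor is related to y. If selection did not bias the test, that proportion would be close to 0.05.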