Published by Jocelin Maxwell. Modified over 9 years ago.
1
A Casual Tutorial on Sample Size Planning for Multiple Regression Models D. Keith Williams M.P.H. Ph.D. Department of Biostatistics
3
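[Figure: shaded area = 0.16 at a noncentrality value of 1.00]
4
[Figure: shaded area = 0.47 at a noncentrality value of 2.00]
5
[Figure: shaded area = 0.81 at a noncentrality value of 3.00]
6
[Figure: shaded area = 0.955 at a noncentrality value of 3.87]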
8
Buzzwords
Beta (β) = P(Type II error) = P(Conclude the experimental groups are the same when they really are different)
Power = 1 - β = P(Conclude the experimental groups are different when they really are!)
9
The Noncentrality Parameter: Two-Group t-test
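The formula on this slide did not survive in the transcript. A standard form for a pooled two-group t-test with common standard deviation, offered here as an assumption rather than as the slide's exact notation, is shown below. With sigma = 2, a mean difference of 2, and n1 = n2 = 10 it gives 2.236, the value used in the example scenario.

```latex
\delta = \frac{|\mu_1 - \mu_2|}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
```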
10
An Example Scenario
Alpha = 0.05, sigma = 2
|mu1 - mu2| = 2, that is, a two-unit difference in means for a population
Propose n1 = 10 and n2 = 10
11
Rejection region for a two-tailed t-test, alpha = 0.05, df = 18
12
Noncentrality value = 2.236, critical value |t| = 2.101
Table B.5: values between 2.0 and 3.0, alpha = 0.05, df = 18
Power is between 0.47 and 0.81; the SAS calculation gives 0.56195
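As a rough cross-check outside SAS, the sketch below recomputes the noncentrality value and then approximates the two-sided power by numerically integrating over the chi-square distribution of the pooled variance. The function names, the integration limit, and the step count are illustrative assumptions, and the critical value 2.101 is taken from the slide rather than computed.

```python
import math


def noncentrality_two_group(mean_diff, sigma, n1, n2):
    """Noncentrality parameter for a pooled two-group t-test."""
    return abs(mean_diff) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_pdf(w, df):
    """Central chi-square density, computed on the log scale."""
    if w <= 0.0:
        return 0.0
    log_pdf = ((df / 2.0 - 1.0) * math.log(w) - w / 2.0
               - (df / 2.0) * math.log(2.0) - math.lgamma(df / 2.0))
    return math.exp(log_pdf)


def two_sided_t_power(delta, df, t_crit, upper=200.0, steps=4000):
    """P(|T| > t_crit) for a noncentral t, using Simpson's rule.

    T = (Z + delta) / sqrt(W / df) with Z standard normal and W chi-square(df).
    Conditioning on W leaves a normal probability to average over W.
    The default upper limit is an assumption that suits small df only.
    """
    def integrand(w):
        scale = math.sqrt(w / df)
        reject = (normal_cdf(delta - t_crit * scale)
                  + normal_cdf(-delta - t_crit * scale))
        return reject * chi2_pdf(w, df)

    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


delta = noncentrality_two_group(mean_diff=2.0, sigma=2.0, n1=10, n2=10)
df = 10 + 10 - 2
power = two_sided_t_power(delta, df, t_crit=2.101)
print(round(delta, 3), df, round(power, 3))
```

The printed power should land close to the 0.56195 that the slide reports from SAS, and inside the 0.47 to 0.81 bracket read from Table B.5.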
14
The Key Point of the Review
One conjectures the difference in means to estimate power in studies that compare means.
In regression models, one conjectures the difference in R-square between a model that includes the predictors of interest and a model without these predictors.
15
Regression Power and Sample Size
Power for specific predictors in the presence of other covariates in a model.
More complex to conceptualize than testing differences among means.
16
Example Data Set
17
The Hypothetical Scenario
A model with 4 terms
Predictors of interest for PSA that we choose to power:
1. svi
2. c_volume
Two covariates to be included: cpen, gleason
18
Approaches in Estimating the Parameters to Calculate Power Plan A Complete specification of the parts for the expression:
19
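Details
The full model: psa = b0 + b1*c_volume + b2*svi + b3*cpen + b4*gleason + error
We want to power the test that a model with these 2 predictors is statistically better than a model excluding them.
The reduced model: psa = b0 + b3*cpen + b4*gleason + error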
20
Full Model
Root MSE          30.98987    R-Square    0.4467
Dependent Mean    23.73013    Adj R-Sq    0.4226
Coeff Var        130.59291

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             -40.76878          33.24420      -1.23      0.2232
c_volume      1               2.02821           0.58404       3.47      0.0008
svi           1              17.85690          10.75049       1.66      0.1001
cpen          1               1.10381           1.32538       0.83      0.4071
gleason       1               6.39294           5.02522       1.27      0.2065

Note: R-Square = 0.4467. Predictors of interest: c_volume and svi.
21
Reduced Model
Root MSE          33.42074    R-Square    0.3424
Dependent Mean    23.73013    Adj R-Sq    0.3285
Coeff Var        140.83671

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             -71.59827          34.91893      -2.05      0.0431
cpen          1               4.82868           1.01632       4.75      <.0001
gleason       1              12.28661           5.19873       2.36      0.0202

Note: R-Square difference 0.45 - 0.34 = 0.11
22
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100 key=oncurves
       yopts=(ref=0.8 .977 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements
Method                                   Exact
Model                                    Fixed X
Number of Predictors in Full Model       4
Number of Test Predictors                2
Alpha                                    0.05
R-square of Full Model                   0.45
Difference in R-square                   0.11

Computed Power
Index    N Total    Power
1        97         0.979
2        80         0.949
3        70         0.916
4        60         0.864
5        50         0.787
6        40         0.677
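For readers without SAS, the fixed-X power values above can be approximated with the standard library alone. The sketch below is not PROC POWER's algorithm. It assumes the noncentrality parameter is N times the R-square difference divided by one minus the full-model R-square, and it relies on a closed form that holds only when exactly two predictors are tested (numerator df = 2). The function name and its arguments are hypothetical.

```python
import math


def multreg_power_two_tested(n_total, n_full, rsq_full, rsq_diff, alpha=0.05):
    """Power of the F test for two tested predictors in a fixed-X regression.

    Valid only for numerator df = 2. The noncentral chi-square(2) is a Poisson
    mixture of central chi-squares with even df, which turns the power into a
    finite-looking sum of negative binomial terms times Poisson tail weights.
    """
    df2 = n_total - n_full - 1                       # denominator df
    lam = n_total * rsq_diff / (1.0 - rsq_full)      # assumed noncentrality
    # Closed-form critical value of F(2, df2) at level alpha.
    f_crit = (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)
    x = 2.0 * f_crit / (df2 + 2.0 * f_crit)
    b = df2 / 2.0

    power = 0.0
    poisson_below = 0.0                              # P(J < i), J ~ Poisson(lam / 2)
    for i in range(400):
        log_term = (math.lgamma(b + i) - math.lgamma(b) - math.lgamma(i + 1)
                    + i * math.log(x) + b * math.log(1.0 - x))
        power += math.exp(log_term) * max(0.0, 1.0 - poisson_below)
        poisson_below += math.exp(-lam / 2.0 + i * math.log(lam / 2.0)
                                  - math.lgamma(i + 1))
    return power


powers = {n: multreg_power_two_tested(n, n_full=4, rsq_full=0.45, rsq_diff=0.11)
          for n in (97, 80, 70, 60, 50, 40)}
for n, p in powers.items():
    print(n, round(p, 3))
```

The values printed for N = 97 and N = 40 should agree with the SAS table to about three decimals. For more than two tested predictors a general noncentral F routine would be needed instead.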
24
Great, but I don’t have a dataset
25
Use the Correlation Matrix
Pearson Correlation Coefficients, N = 97
Prob > |r| under H0: Rho=0 (every off-diagonal correlation has p < .0001)

            psa        c_volume   svi        cpen       gleason
psa         1.00000    0.62415    0.52862    0.55079    0.42958
c_volume    0.62415    1.00000    0.58174    0.69290    0.48144
svi         0.52862    0.58174    1.00000    0.68028    0.42857
cpen        0.55079    0.69290    0.68028    1.00000    0.46157
gleason     0.42958    0.48144    0.42857    0.46157    1.00000
26
Piece 1: Correlation of Y with All Predictors

            c_volume   svi        cpen       gleason
psa         0.62415    0.52862    0.55079    0.42958
27
Piece 2: Correlation of All Predictors with Each Other

            c_volume   svi        cpen       gleason
c_volume    1.00000    0.58174    0.69290    0.48144
svi         0.58174    1.00000    0.68028    0.42857
cpen        0.69290    0.68028    1.00000    0.46157
gleason     0.48144    0.42857    0.46157    1.00000
28
Piece 3: Correlation of Y with Reduced Model Predictors

            cpen       gleason
psa         0.55079    0.42958
29
Piece 4: Correlation of All Reduced Predictors with Each Other

            cpen       gleason
cpen        1.00000    0.46157
gleason     0.46157    1.00000
30
Matrix Arithmetic with Correlation Matrix
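The slide's own formulas are missing from the transcript. The standard identities below are a reasonable reconstruction, consistent with the PROC IML code shown later in the deck. The symbols are assumptions: r_yx is the vector of correlations of Y with the predictors (Pieces 1 and 3) and R_xx is the correlation matrix among the predictors (Pieces 2 and 4).

```latex
R^2_{\text{full}} = r_{yx,\text{full}}^{\top} \, R_{xx,\text{full}}^{-1} \, r_{yx,\text{full}},
\qquad
R^2_{\text{red}} = r_{yx,\text{red}}^{\top} \, R_{xx,\text{red}}^{-1} \, r_{yx,\text{red}},
\qquad
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{red}}
```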
31
Hold on, we will find out how to do this arithmetic later
32
Different R-square Reductions
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.45
    rsqdiff = 0.11 .10 .09 .08
    ntotal = 97 80 70 60 50 40
    power = .;
  plot x=n min=40 max=100 key=oncurves
       yopts=(ref=0.8 .977 crossref=yes);
run;
34
Matrix Arithmetic with Compound Correlation Matrix
35
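proc iml;
  %let phi = 0.35;  /* conjectured correlation of Y with each predictor of interest */
  %let rx  = 0.2;   /* conjectured correlation among the predictors; also used for Y with each covariate */

  /* Full model: correlations of Y with the 4 predictors, and among the predictors */
  phi_yx_full = {&phi, &phi, .2, .2};
  rxx_full = {1   &rx &rx &rx,
              &rx 1   &rx &rx,
              &rx &rx 1   &rx,
              &rx &rx &rx 1};

  /* Reduced model: the 2 covariates only */
  phi_yx_red = {&rx, &rx};
  rxx_red = {1   &rx,
             &rx 1};

  r2_full = (phi_yx_full)` * (rxx_full**(-1)) * (phi_yx_full);
  r2_red  = phi_yx_red` * rxx_red**(-1) * phi_yx_red;
  r2diff  = r2_full - r2_red;
  partial = (r2diff / (1 - r2_red))**.5;
  print r2_full r2_red r2diff partial;
run; quit;

R2_FULL      R2_RED       R2DIFF       PARTIAL
0.2171875    0.0666667    0.1505208    0.4015873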
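The same numbers can be checked without matrix software, because an exchangeable (compound symmetric) correlation matrix has a closed-form inverse. The sketch below uses that closed form instead of the general matrix inverse in the IML code, so it applies only when every pair of predictors shares the same correlation. The function name is hypothetical.

```python
import math


def r2_exchangeable(r_yx, rho):
    """R-square = r' Rxx^{-1} r when Rxx has 1 on the diagonal and rho elsewhere.

    Uses the closed-form inverse of (1 - rho) * I + rho * J, where J is the
    all-ones matrix, so no linear solve is required.
    """
    k = len(r_yx)
    sum_of_squares = sum(r * r for r in r_yx)
    square_of_sum = sum(r_yx) ** 2
    return (sum_of_squares - rho * square_of_sum / (1.0 + (k - 1) * rho)) / (1.0 - rho)


# Same conjectured values as the IML code: 0.35 for the two predictors of
# interest, 0.2 for the two covariates, 0.2 among all predictors.
r2_full = r2_exchangeable([0.35, 0.35, 0.2, 0.2], rho=0.2)
r2_red = r2_exchangeable([0.2, 0.2], rho=0.2)
r2_diff = r2_full - r2_red
partial = math.sqrt(r2_diff / (1.0 - r2_red))

print(round(r2_full, 7), round(r2_red, 7), round(r2_diff, 7), round(partial, 7))
# prints 0.2171875 0.0666667 0.1505208 0.4015873
```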
36
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsqfull = 0.22
    rsqdiff = 0.15 .16
    ntotal = 40 50 60 70
    power = .;
  plot x=n min=40 max=100 key=oncurves
       yopts=(ref=0.8 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements
Method                                   Exact
Model                                    Fixed X
Number of Predictors in Full Model       4
Number of Test Predictors                2
Alpha                                    0.05
R-square of Full Model                   0.22

Computed Power
Index    R-square Diff    N Total    Power
1        0.15             40         0.659
2        0.15             50         0.770
3        0.15             60         0.850
4        0.15             70         0.905
5        0.16             40         0.689
6        0.16             50         0.798
7        0.16             60         0.873
8        0.16             70         0.923
38
Plan B
Specify the typical value of the multiple partial correlation coefficient between Y and the X’s of interest.
The multiple partial correlation coefficient describes the overall relationship between Y and 2 or more predictors, controlling for still other variables.
39
Using Our Example
Say that we conjecture a value for the partial correlation between our Y and the X’s of interest.
For our example this value was 0.408.
Recall the R-square difference between the full and reduced models.
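The slide's expression is missing from the transcript. The usual definition of the multiple partial correlation, which is also the formula used for PARTIAL in the PROC IML code earlier in the deck, reproduces the 0.408 from the rounded R-square values 0.45 and 0.34:

```latex
\rho_{\text{partial}}
  = \sqrt{\frac{R^2_{\text{full}} - R^2_{\text{red}}}{1 - R^2_{\text{red}}}}
  = \sqrt{\frac{0.45 - 0.34}{1 - 0.34}}
  = \sqrt{\frac{0.11}{0.66}}
  \approx 0.408
```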
40
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = .408 .35
    ntotal = 97 80 60 50 40
    power = .;
  plot x=n min=40 max=100 key=oncurves
       yopts=(ref=.8 .85 .977 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements
Method                                   Exact
Model                                    Fixed X
Number of Predictors in Full Model       4
Number of Test Predictors                2
Alpha                                    0.05

Computed Power
Index    Partial Corr    N Total    Power
1        0.408           97         0.979
2        0.408           80         0.949
3        0.408           60         0.864
4        0.408           50         0.787
5        0.408           40         0.677
6        0.350           97         0.910
7        0.350           80         0.843
8        0.350           60         0.713
9        0.350           50         0.623
10       0.350           40         0.514

Note: n = 4*10 = 40 underpowers the study.
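A short derivation suggests why the partial correlation alone is enough for this fixed-X calculation, and why the rows for 0.408 repeat the powers obtained from rsqfull = 0.45 and rsqdiff = 0.11. It assumes the noncentrality parameter has the form N times the R-square difference over one minus the full-model R-square, which matches the 19.4 quoted elsewhere in the deck for N = 97.

```latex
\rho^2 = \frac{\Delta R^2}{1 - R^2_{\text{red}}}
\;\Longrightarrow\;
1 - R^2_{\text{full}} = (1 - R^2_{\text{red}}) - \Delta R^2 = (1 - R^2_{\text{red}})(1 - \rho^2)

\lambda = \frac{N \, \Delta R^2}{1 - R^2_{\text{full}}}
        = \frac{N \, \rho^2}{1 - \rho^2},
\qquad
\text{for } N = 97,\ \rho = 0.408:\quad
\lambda \approx \frac{97 \times 0.1665}{0.8335} \approx 19.4
```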
42
Plan C Use the Table from Gatsonis and Sampson (1989)
43
U: the number of predictors of interest = 2
p: the total number of predictors in the model = 4
N = table value + p + 1
For 80% power, N = 72 + 4 + 1 = 77
44
Proc Power and the Table
proc power;
  multreg
    model = random
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    partialcorr = .35 .40
    ntotal = 77
    power = .;
  plot x=n min=60 max=120 key=oncurves
       yopts=(ref=.8 .90 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements
Method                                   Exact
Model                                    Random X
Number of Predictors in Full Model       4
Number of Test Predictors                2
Alpha                                    0.05
Total Sample Size                        77

Computed Power
Index    Partial Corr    Power
1        0.35            0.802
2        0.40            0.908
46
Comments
Power and sample size planning is ‘tricky.’
The rule of thumb of n = 10 for each predictor will almost always underpower a study.
Plan A or Plan B, using the matrix multiplication, is likely the best approach.
One can specify regular correlations instead of partial correlations.
This talk was developed with fixed effects; arguably one should plan for random effects unless the study is an experiment. SAS can easily calculate this.
The Gatsonis tables provide power for random-effect settings (usually the n’s are close).
47
Further Work for Somebody A corresponding multiple logistic regression approach, that is, powering more than one predictor of interest with additional covariates in the model.
48
An Algorithm for Estimating Power and Sample Size for Logistic Models with One or More Independent Variables of Interest
Jay Northern; D. Keith Williams, PhD; Zoran Bursac, PhD
Joint Statistical Meetings, Denver, CO, August 3 – August 7, 2008
49
Background
Existing tools are based on the Hsieh, Bloch, and Larsen (1998) paper and the Agresti (1996) text.
–PASS
–%powerlog macro
50
Macro Details
Fit the full and the reduced model
–In the reduced model one can exclude one or more covariates of interest in order to test them simultaneously in the presence of other covariates
Perform the likelihood ratio test with the appropriate chi-square critical value, based on the correct number of degrees of freedom
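The test described above has a standard form. The notation is an assumption, since the slide gives no symbols: the log-likelihoods of the two fitted logistic models are compared, and q is the number of covariates dropped from the full model to form the reduced model.

```latex
G^2 = 2\left[\ell(\hat{\beta}_{\text{full}}) - \ell(\hat{\beta}_{\text{red}})\right],
\qquad
\text{reject } H_0 \text{ if } G^2 > \chi^2_{1-\alpha,\; q}
```

For example, testing two covariates of interest at alpha = 0.05 uses the chi-square critical value with 2 degrees of freedom, 5.99.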
51
Results
52
End
53
Plan C Exchangeable Matrix in Plan A
55
Pearson Correlation Coefficients, N = 97

            psa        c_volume   svi        cpen       gleason    c_wt       age        bph
psa         1.00000    0.62415    0.52862    0.55079    0.42958    0.02621    0.01720   -0.01649
c_volume    0.62415    1.00000    0.58174    0.69290    0.48144    0.00511    0.03909   -0.13321
svi         0.52862    0.58174    1.00000    0.68028    0.42857   -0.00241    0.11766   -0.11955
cpen        0.55079    0.69290    0.68028    1.00000    0.46157    0.00158    0.09956   -0.08301
gleason     0.42958    0.48144    0.42857    0.46157    1.00000   -0.02421    0.22585    0.02683
c_wt        0.02621    0.00511   -0.00241    0.00158   -0.02421    1.00000    0.16432    0.32185
age         0.01720    0.03909    0.11766    0.09956    0.22585    0.16432    1.00000    0.36634
bph        -0.01649   -0.13321   -0.11955   -0.08301    0.02683    0.32185    0.36634    1.00000
56
Full Correlation Matrix

            psa         c_volume    svi         cpen        gleason     c_wt        age         bph
psa         1           0.624151    0.528619    0.550793    0.42958     0.026213    0.017199   -0.01649
c_volume    0.624151    1           0.581742    0.692897    0.481438    0.005107    0.039094   -0.13321
svi         0.528619    0.581742    1           0.680284    0.428573   -0.00241     0.117658   -0.11955
cpen        0.550793    0.692897    0.680284    1           0.461566    0.001579    0.099555   -0.08301
gleason     0.42958     0.481438    0.428573    0.461566    1          -0.02421     0.225852    0.026826
c_wt        0.026213    0.005107   -0.00241     0.001579   -0.02421     1           0.164324    0.321849
age         0.017199    0.039094    0.117658    0.099555    0.225852    0.164324    1           0.366341
bph        -0.01649    -0.13321    -0.11955    -0.08301     0.026826    0.321849    0.366341    1
57
The Correlation of Y with All X’s, Full Model

            c_volume    svi         cpen        gleason     c_wt        age         bph
psa         0.624151    0.528619    0.550793    0.42958     0.026213    0.017199   -0.01649
58
Correlation Matrix of X’s, Full Model

            c_volume    svi         cpen        gleason     c_wt        age         bph
c_volume    1           0.581742    0.692897    0.481438    0.005107    0.039094   -0.13321
svi         0.581742    1           0.680284    0.428573   -0.00241     0.117658   -0.11955
cpen        0.692897    0.680284    1           0.461566    0.001579    0.099555   -0.08301
gleason     0.481438    0.428573    0.461566    1          -0.02421     0.225852    0.026826
c_wt        0.005107   -0.00241     0.001579   -0.02421     1           0.164324    0.321849
age         0.039094    0.117658    0.099555    0.225852    0.164324    1           0.366341
bph        -0.13321    -0.11955    -0.08301     0.026826    0.321849    0.366341    1
59
The Correlation of Y with All X’s, Reduced Model

            cpen        gleason     c_wt        age         bph
psa         0.550793    0.42958     0.026213    0.017199   -0.01649
60
Correlation Matrix of X’s, Reduced Model

            cpen        gleason     c_wt        age         bph
cpen        1           0.461566    0.001579    0.099555   -0.08301
gleason     0.461566    1          -0.02421     0.225852    0.026826
c_wt        0.001579   -0.02421     1           0.164324    0.321849
age         0.099555    0.225852    0.164324    1           0.366341
bph        -0.08301     0.026826    0.321849    0.366341    1
61
Regular Correlations Versus Partial Correlations
62
Correlation Matrix
[Diagram: the correlation matrix partitioned into the full Rxy, the reduced Rxy, and Rxx, with the X’s of interest and the covariates in the reduced model marked]
65
The Gold Standard Approach Some Matrix Algebra
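The matrix algebra on this slide is not in the transcript. As a hedged illustration of the arithmetic, the sketch below applies the quadratic form r' Rxx^{-1} r to the four-predictor correlations printed earlier in the deck, using a small Gaussian elimination routine so that no matrix library is needed. The helper names are hypothetical. The results should be close to the R-Square values in the full and reduced regression output (0.4467 and 0.3424).

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(r_yx, r_xx):
    """R-square = r_yx' * inverse(r_xx) * r_yx, computed via a linear solve."""
    beta = solve(r_xx, r_yx)                          # standardized coefficients
    return sum(b * r for b, r in zip(beta, r_yx))


# Order of predictors: c_volume, svi, cpen, gleason (values from the slides).
r_yx_full = [0.62415, 0.52862, 0.55079, 0.42958]
r_xx_full = [
    [1.00000, 0.58174, 0.69290, 0.48144],
    [0.58174, 1.00000, 0.68028, 0.42857],
    [0.69290, 0.68028, 1.00000, 0.46157],
    [0.48144, 0.42857, 0.46157, 1.00000],
]
keep = [2, 3]                                         # covariates: cpen, gleason
r_yx_red = [r_yx_full[i] for i in keep]
r_xx_red = [[r_xx_full[i][j] for j in keep] for i in keep]

r2_full = r_squared(r_yx_full, r_xx_full)
r2_red = r_squared(r_yx_red, r_xx_red)
r2_diff = r2_full - r2_red
partial = math.sqrt(r2_diff / (1.0 - r2_red))
print(round(r2_full, 4), round(r2_red, 4), round(r2_diff, 4), round(partial, 3))
```

Because the unrounded R-square values are used here, the difference and the partial correlation come out slightly below the 0.11 and 0.408 obtained from the rounded values 0.45 and 0.34.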
66
=0.35
78
SAS Code and Output
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 7
    ntestpredictors = 2
    rsqfull = 0.46
    rsqdiff = 0.106
    ntotal = 97
    power = .;
  plot x=n min=60 max=100 key=oncurves
       yopts=(ref=.977 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements
Method                                   Exact
Model                                    Fixed X
Number of Predictors in Full Model       7
Number of Test Predictors                2
Alpha                                    0.05
R-square of Full Model                   0.46
Difference in R-square                   0.106
Total Sample Size                        97

Computed Power
Power
0.977
79
The PROC Power Graph
80
The Calculations Power = 0.97
81
proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 7
    ntestpredictors = 2
    rsqfull = 0.2505682
    rsquarereduced = 0.1111111   /* the output below reports this value as the reduced-model R-square */
    ntotal = 50 60 70 80 97
    power = .;
  plot x=n min=60 max=100 key=oncurves
       yopts=(ref=.8 .85 .9 .95 crossref=yes);
run;

The POWER Procedure
Type III F Test in Multiple Regression

Fixed Scenario Elements
Method                                   Exact
Model                                    Fixed X
Number of Predictors in Full Model       7
Number of Test Predictors                2
Alpha                                    0.05
R-square of Full Model                   0.250568
R-square of Reduced Model                0.111111

Computed Power
Index    N Total    Power
1        50         0.753
2        60         0.836
3        70         0.894
4        80         0.933
5        97         0.970
83
Conjectured correlation matrix (0.35 between psa and each predictor of interest, 0.2 everywhere else)

            psa        c_volume   svi        cpen       gleason
psa         1.00000    .35        .35        .2         .2
c_volume    .35        1.00000    .2         .2         .2
svi         .35        .2         1.00000    .2         .2
cpen        .2         .2         .2         1.00000    .2
gleason     .2         .2         .2         .2         1.00000
86
Calculations
The number of predictors of interest: 2
The total number of predictors in the model: 4
87
Approaches in Estimating the Parameters to Calculate Power
Plan A
Complete specification of the parts for the expression:
R-square of the reduced model = 0.34
R-square of the full model = 0.45
89
[Figure: central F(2, 92) density and noncentral F(2, 92, 19.4) density]
Critical value for alpha = .05: about 3.10
Noncentrality parameter: 19.4
Total area in blue: Power = 0.97
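The quantities in this figure can be traced back to the four-predictor example with N = 97. The worked values below assume the noncentrality form N times the R-square difference over one minus the full-model R-square, and they use the closed-form quantile that is available when the numerator degrees of freedom equal 2.

```latex
\mathrm{df}_1 = 2, \qquad \mathrm{df}_2 = N - p - 1 = 97 - 4 - 1 = 92

\lambda = \frac{N \, \Delta R^2}{1 - R^2_{\text{full}}} = \frac{97 \times 0.11}{1 - 0.45} = 19.4

F_{0.95}(2, 92) = \frac{92}{2}\left(0.05^{-2/92} - 1\right) \approx 3.10

\text{Power} = P\!\left(F'_{2,\,92,\,\lambda = 19.4} > F_{0.95}(2, 92)\right)
```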