
A Casual Tutorial on Sample Size Planning for Multiple Regression Models D. Keith Williams M.P.H. Ph.D. Department of Biostatistics.


1 A Casual Tutorial on Sample Size Planning for Multiple Regression Models D. Keith Williams M.P.H. Ph.D. Department of Biostatistics

2

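3 [Figure: power (shaded area) = 0.16 when the noncentrality value is 1.00]

4 [Figure: power (shaded area) = 0.47 when the noncentrality value is 2.00]

5 [Figure: power (shaded area) = 0.81 when the noncentrality value is 3.00]

6 [Figure: power (shaded area) = 0.955 when the noncentrality value is 3.87]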

7

8 Buzzwords: Beta (β) = P(Type II error) = P(conclude the experimental groups are the same when they really are different). Power = 1 − β = P(conclude the experimental groups are different when they really are!)

9 The Noncentrality Parameter, Two-Group t-test: $\delta = \dfrac{|\mu_1 - \mu_2|}{\sigma\sqrt{1/n_1 + 1/n_2}}$

10 An Example Scenario: α = 0.05, σ = 2, |μ1 − μ2| = 2 (that is, a two-unit difference in population means). Proposed sample sizes: n1 = 10 and n2 = 10.

11 Rejection region for a two-tailed t-test, α = 0.05, df = 18

12 Noncentrality value = 2.236; critical value = ±2.101. From Table B.5 (α = 0.05, df = 18), the noncentrality value lies between 2.0 and 3.0, so power lies between 0.47 and 0.81. The SAS calculation gives 0.56195.
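A minimal Python sketch of the slide 9 to 12 arithmetic. It assumes the two-group noncentrality formula δ = |μ1 − μ2| / (σ√(1/n1 + 1/n2)) and, as a swapped-in shortcut, treats the test statistic as normal with mean δ and variance 1 instead of using the exact noncentral t distribution that SAS uses. The function names are hypothetical, and the result is only expected to land near the exact SAS value.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def t_noncentrality(diff: float, sigma: float, n1: int, n2: int) -> float:
    """Noncentrality for a two-group t-test with common standard deviation sigma."""
    return abs(diff) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


def approx_power(delta: float, crit: float) -> float:
    """Normal approximation to two-sided power: treat the test statistic as
    N(delta, 1) and add the probability beyond +crit and below -crit."""
    return (1.0 - normal_cdf(crit - delta)) + normal_cdf(-crit - delta)


delta = t_noncentrality(diff=2.0, sigma=2.0, n1=10, n2=10)
power = approx_power(delta, crit=2.101)  # 2.101 = t critical value, alpha = 0.05, df = 18
print(f"delta = {delta:.3f}, approximate power = {power:.3f}")
```

With the slide 10 inputs this gives δ ≈ 2.236 and an approximate power of about 0.55; the gap to the exact 0.56195 is the price of the normal approximation.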

13

14 The Key Point of the Review: In studies that compare means, one conjectures the difference in means to estimate power. In regression models, one conjectures the difference in R-square between a model that includes the predictors of interest and a model without these predictors.

15 Regression Power and Sample Size: power for specific predictors in the presence of other covariates in a model. This is more complex to conceptualize than testing differences among means.

16 Example Data Set

17 The Hypothetical Scenario: a model for PSA with 4 terms. Predictors of interest that we choose to power: (1) svi, (2) c_volume. Two covariates to be included: cpen, gleason.

18 Approaches in Estimating the Parameters to Calculate Power. Plan A: complete specification of the parts of the expression:

19 Details: We want to power the test that a model with these 2 predictors is statistically better than a model excluding them.
The full model: psa = β0 + β1·c_volume + β2·svi + β3·cpen + β4·gleason + ε
The reduced model: psa = γ0 + γ1·cpen + γ2·gleason + ε

20 Full Model
Root MSE          30.98987    R-Square    0.4467
Dependent Mean    23.73013    Adj R-Sq    0.4226
Coeff Var        130.59291
Parameter Estimates (predictors of interest: c_volume, svi)
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -40.76878         33.24420     -1.23     0.2232
c_volume     1              2.02821          0.58404      3.47     0.0008
svi          1             17.85690         10.75049      1.66     0.1001
cpen         1              1.10381          1.32538      0.83     0.4071
gleason      1              6.39294          5.02522      1.27     0.2065

21 Reduced Model
Root MSE          33.42074    R-Square    0.3424
Dependent Mean    23.73013    Adj R-Sq    0.3285
Coeff Var        140.83671
Note: R-square difference = 0.45 − 0.34 = 0.11
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            -71.59827         34.91893     -2.05     0.0431
cpen         1              4.82868          1.01632      4.75     <.0001
gleason      1             12.28661          5.19873      2.36     0.0202
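The test being powered is the usual partial F test comparing the two models. A minimal sketch, assuming the textbook nested-model statistic F = [(R²full − R²reduced)/u] / [(1 − R²full)/(N − p − 1)] with u test predictors and p predictors in the full model. The function names are hypothetical, and the closed-form tail probability below is valid only when the numerator df is 2, as it is here.

```python
def partial_f_test(r2_full: float, r2_reduced: float, n: int, p: int, u: int):
    """Nested-model F statistic for dropping u of the p predictors (n observations)."""
    df1, df2 = u, n - p - 1
    f_stat = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f_stat, df1, df2


def f_upper_tail_df1_2(x: float, df2: int) -> float:
    """P(F(2, df2) > x). This closed form holds ONLY for numerator df = 2."""
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)


# R-square values from the full (slide 20) and reduced (slide 21) SAS output
f_stat, df1, df2 = partial_f_test(0.4467, 0.3424, n=97, p=4, u=2)
p_value = f_upper_tail_df1_2(f_stat, df2)
print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p_value:.5f}")
```

With the observed R-square values this gives F(2, 92) ≈ 8.67 and p < 0.001; the power calculation asks how often such a test rejects for a given N.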

22 proc power;
   multreg model=fixed alpha=.05
      nfullpredictors=4 ntestpredictors=2
      rsqfull=0.45 rsqdiff=0.11
      ntotal=97 80 70 60 50 40
      power=.;
   plot x=n min=40 max=100 key=oncurves yopts=(ref=0.8 0.977 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.45
Difference in R-square                0.11
Computed Power
Index   N Total   Power
1       97        0.979
2       80        0.949
3       70        0.916
4       60        0.864
5       50        0.787
6       40        0.677
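PROC POWER does the exact calculation for us, but it can help to see what is underneath. A minimal, dependency-free Python sketch, assuming that for fixed X the power is P(F′ > F crit), where F′ is noncentral F with (u, N − p − 1) degrees of freedom and noncentrality λ = N(R²full − R²reduced)/(1 − R²full); that λ gives the value 19.4 used for N = 97 at the end of the talk. The incomplete beta routine is the standard continued-fraction method, and all function names are hypothetical; this is a sketch, not SAS's implementation.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # converged on the odd step
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(x: float, d1: float, d2: float) -> float:
    """CDF of the central F(d1, d2) distribution."""
    if x <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def ncf_cdf(x: float, d1: float, d2: float, lam: float, max_terms: int = 2000) -> float:
    """CDF of noncentral F(d1, d2, lam) as a Poisson(lam/2) mixture of beta CDFs."""
    if x <= 0.0:
        return 0.0
    if lam <= 0.0:
        return f_cdf(x, d1, d2)
    y = d1 * x / (d1 * x + d2)
    half = lam / 2.0
    total = 0.0
    for j in range(max_terms):
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))  # Poisson weight
        total += w * reg_inc_beta(d1 / 2.0 + j, d2 / 2.0, y)
        if j > half and w < 1e-16:
            break
    return total


def f_critical(alpha: float, d1: float, d2: float) -> float:
    """Upper-alpha critical value of central F, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_multreg_fixed(n, p, u, r2_full, r2_diff, alpha=0.05):
    """Power of the partial F test for u of p predictors (fixed X, assumed lambda)."""
    d1, d2 = u, n - p - 1
    lam = n * r2_diff / (1.0 - r2_full)
    return 1.0 - ncf_cdf(f_critical(alpha, d1, d2), d1, d2, lam)


for n in (97, 80, 70, 60, 50, 40):
    print(n, round(power_multreg_fixed(n, p=4, u=2, r2_full=0.45, r2_diff=0.11), 3))
```

If the assumed λ matches what PROC POWER uses, the printed powers should agree with the SAS table to about three decimals.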

23

24 Great, but I don’t have a dataset

25 Use the Correlation Matrix
Pearson Correlation Coefficients, N = 97 (Prob > |r| under H0: Rho=0 is <.0001 for every off-diagonal entry)
           psa       c_volume   svi       cpen      gleason
psa        1.00000   0.62415    0.52862   0.55079   0.42958
c_volume   0.62415   1.00000    0.58174   0.69290   0.48144
svi        0.52862   0.58174    1.00000   0.68028   0.42857
cpen       0.55079   0.69290    0.68028   1.00000   0.46157
gleason    0.42958   0.48144    0.42857   0.46157   1.00000

26 Piece 1: Correlation of Y with All Predictors
Pearson Correlation Coefficients, N = 97 (all Prob > |r| < .0001)
       c_volume   svi       cpen      gleason
psa    0.62415    0.52862   0.55079   0.42958

27 Piece 2: Correlation of All Predictors with Each Other
Pearson Correlation Coefficients, N = 97 (all Prob > |r| < .0001)
           c_volume   svi       cpen      gleason
c_volume   1.00000    0.58174   0.69290   0.48144
svi        0.58174    1.00000   0.68028   0.42857
cpen       0.69290    0.68028   1.00000   0.46157
gleason    0.48144    0.42857   0.46157   1.00000

28 Piece 3: Correlation of Y with Reduced-Model Predictors
Pearson Correlation Coefficients, N = 97 (all Prob > |r| < .0001)
       cpen      gleason
psa    0.55079   0.42958

29 Piece 4: Correlation of All Reduced-Model Predictors with Each Other
Pearson Correlation Coefficients, N = 97 (Prob > |r| < .0001)
          cpen      gleason
cpen      1.00000   0.46157
gleason   0.46157   1.00000

30 Matrix Arithmetic with Correlation Matrix

31 Hold on; we will find out how to do this arithmetic later.

32 Different R-square Reductions
proc power;
   multreg model=fixed alpha=.05
      nfullpredictors=4 ntestpredictors=2
      rsqfull=0.45 rsqdiff=0.11 0.10 0.09 0.08
      ntotal=97 80 70 60 50 40
      power=.;
   plot x=n min=40 max=100 key=oncurves yopts=(ref=0.8 0.977 crossref=yes);
run;

33

34 Matrix Arithmetic with Compound Correlation Matrix

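35 proc iml;
   %let phi=0.35;   /* conjectured correlation of psa with each predictor of interest */
   %let rx=0.2;     /* conjectured common correlation among all predictors */
   /* correlations of psa with c_volume, svi (interest) and cpen, gleason (covariates) */
   phi_yx_full = {&phi, &phi, 0.2, 0.2};
   rxx_full = {1 &rx &rx &rx, &rx 1 &rx &rx, &rx &rx 1 &rx, &rx &rx &rx 1};
   phi_yx_red = {0.2, 0.2};   /* correlations of psa with cpen, gleason only */
   rxx_red = {1 &rx, &rx 1};
   r2_full = phi_yx_full` * inv(rxx_full) * phi_yx_full;
   r2_red = phi_yx_red` * inv(rxx_red) * phi_yx_red;
   r2diff = r2_full - r2_red;
   partial = sqrt(r2diff / (1 - r2_red));
   print r2_full r2_red r2diff partial;
quit;
R2_FULL     R2_RED      R2DIFF      PARTIAL
0.2171875   0.0666667   0.1505208   0.4015873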
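For readers without SAS/IML, the same arithmetic can be sketched in plain Python. This assumes the identity R² = r_yx′ R_xx⁻¹ r_yx (correlations of Y with the predictors, and the predictors' correlation matrix), uses a small hand-written Gaussian elimination instead of a matrix library, and all helper names are hypothetical. It should reproduce the four numbers printed by PROC IML.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_square(r_yx, r_xx):
    """R-square implied by correlations: r_yx' * inverse(R_xx) * r_yx."""
    weights = solve(r_xx, r_yx)  # standardized regression coefficients
    return sum(r * w for r, w in zip(r_yx, weights))


def compound(k, rho):
    """k x k exchangeable (compound-symmetry) correlation matrix."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]


phi, rx = 0.35, 0.2
r2_full = r_square([phi, phi, 0.2, 0.2], compound(4, rx))
r2_red = r_square([0.2, 0.2], compound(2, rx))
r2_diff = r2_full - r2_red
partial = math.sqrt(r2_diff / (1.0 - r2_red))
print(f"{r2_full:.7f} {r2_red:.7f} {r2_diff:.7f} {partial:.7f}")
```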

36 proc power;
   multreg model=fixed alpha=.05
      nfullpredictors=4 ntestpredictors=2
      rsqfull=0.22 rsqdiff=0.15 0.16
      ntotal=40 50 60 70
      power=.;
   plot x=n min=40 max=100 key=oncurves yopts=(ref=0.8 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.22
Computed Power
Index   R-square Diff   N Total   Power
1       0.15            40        0.659
2       0.15            50        0.770
3       0.15            60        0.850
4       0.15            70        0.905
5       0.16            40        0.689
6       0.16            50        0.798
7       0.16            60        0.873
8       0.16            70        0.923

37

38 Plan B: Specify a typical value of the multiple partial correlation coefficient between Y and the X's of interest. The multiple partial correlation coefficient describes the overall relationship between Y and two or more predictors while controlling for still other variables.

39 Using Our Example: Say that we conjecture a value for the partial correlation between Y and the X's of interest. Recalling the R-square difference between the full and reduced models, for our example this value was $\sqrt{(0.45 - 0.34)/(1 - 0.34)} \approx 0.408$.
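Plan A and Plan B describe the same effect size. A short sketch, assuming the fixed-X noncentrality λ = N(R²full − R²reduced)/(1 − R²full): since ρ² = ΔR²/(1 − R²reduced), it follows that ρ²/(1 − ρ²) = ΔR²/(1 − R²full), so a partial correlation of about 0.408 and the pair R²full = 0.45, ΔR² = 0.11 give the same λ. That would explain why partialcorr=.408 and rsqfull=0.45 with rsqdiff=0.11 produce identical power tables. The function names are hypothetical.

```python
import math


def partial_corr(r2_full: float, r2_reduced: float) -> float:
    """Multiple partial correlation of Y with the test predictors, given the covariates."""
    return math.sqrt((r2_full - r2_reduced) / (1.0 - r2_reduced))


def lambda_from_r2(n: int, r2_full: float, r2_reduced: float) -> float:
    """Assumed fixed-X noncentrality N * (R2_full - R2_reduced) / (1 - R2_full)."""
    return n * (r2_full - r2_reduced) / (1.0 - r2_full)


def lambda_from_partial(n: int, rho: float) -> float:
    """The same noncentrality written in terms of the partial correlation."""
    return n * rho ** 2 / (1.0 - rho ** 2)


rho = partial_corr(0.45, 0.34)
print(f"partial correlation = {rho:.3f}")
print(f"lambda via R-square: {lambda_from_r2(97, 0.45, 0.34):.2f}")
print(f"lambda via partial:  {lambda_from_partial(97, rho):.2f}")
```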

40 proc power;
   multreg model=fixed alpha=.05
      nfullpredictors=4 ntestpredictors=2
      partialcorr=.408 .35
      ntotal=97 80 60 50 40
      power=.;
   plot x=n min=40 max=100 key=oncurves yopts=(ref=.8 .85 .977 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
Computed Power
Index   Partial Corr   N Total   Power
1       0.408          97        0.979
2       0.408          80        0.949
3       0.408          60        0.864
4       0.408          50        0.787
5       0.408          40        0.677
6       0.350          97        0.910
7       0.350          80        0.843
8       0.350          60        0.713
9       0.350          50        0.623
10      0.350          40        0.514
Note: the rule of 10 subjects per predictor gives n = 4 × 10 = 40, which underpowers the study (power 0.677 or 0.514).

41

42 Plan C Use the Table from Gatsonis and Sampson (1989)

43 u = the number of predictors of interest = 2; p = the total number of predictors in the model = 4. N = table value + p + 1, so for 80% power, N = 72 + 4 + 1 = 77.

44 Proc Power and the Table
proc power;
   multreg model=random alpha=.05
      nfullpredictors=4 ntestpredictors=2
      partialcorr=.35 .40
      ntotal=77
      power=.;
   plot x=n min=60 max=120 key=oncurves yopts=(ref=.8 .90 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                Exact
Model                                 Random X
Number of Predictors in Full Model    4
Number of Test Predictors             2
Alpha                                 0.05
Total Sample Size                     77
Computed Power
Index   Partial Corr   Power
1       0.35           0.802
2       0.40           0.908

45

46 Comments: Power and sample size calculations are ‘tricky.’ The rule of n = 10 for each predictor will almost always underpower a study. Plan A or Plan B, using the matrix multiplication, is likely the best approach. One can specify ordinary correlations instead of partial correlations. This talk was developed with fixed effects; arguably one should plan for random effects unless the study is an experiment. SAS can easily calculate this, and the Gatsonis tables provide power for random-effect settings (the resulting n's are usually close).

47 Further Work for Somebody A corresponding multiple logistic regression approach, that is, powering more than one predictor of interest with additional covariates in the model.

48 An Algorithm for Estimating Power and Sample Size for Logistic Models with One or More Independent Variables of Interest. Jay Northern; D. Keith Williams, PhD; Zoran Bursac, PhD. Joint Statistical Meetings, Denver, CO, August 3 – August 7, 2008

49 Background: Existing tools are based on the Hsieh, Bloch, and Larsen (1998) paper and the Agresti (1996) text.
– PASS
– %powerlog macro

50 Macro Details
– Fit the full and the reduced model. In the reduced model one can exclude one or more covariates of interest in order to test them simultaneously in the presence of other covariates.
– Perform the likelihood ratio test with the appropriate chi-square critical value, based on the correct number of degrees of freedom.
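The likelihood ratio step can be sketched as below. This is a minimal illustration, not the macro itself: it assumes the full- and reduced-model log-likelihoods have already been obtained from some logistic regression fit, the log-likelihood values in the example are made up, and the chi-square tail probability uses the standard regularized incomplete gamma series and continued fraction. The degrees of freedom equal the number of covariates of interest dropped from the full model; all names are hypothetical.

```python
import math


def _gamma_p_series(a: float, x: float, eps: float = 1e-15, max_iter: int = 1000) -> float:
    """Regularized lower incomplete gamma P(a, x) by series (use when x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(max_iter):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a: float, x: float, eps: float = 1e-15, max_iter: int = 1000) -> float:
    """Regularized upper incomplete gamma Q(a, x) by continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _gamma_p_series(a, half_x)
    return _gamma_q_cf(a, half_x)


def lr_test(loglik_full: float, loglik_reduced: float, n_tested: int, alpha: float = 0.05):
    """LR statistic 2*(llF - llR) referred to chi-square with n_tested df."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2_sf(stat, n_tested)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods; two covariates of interest are tested together
stat, p_value, reject = lr_test(-120.4, -125.9, n_tested=2)
print(f"LR = {stat:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

In a simulation-based power calculation, this test would be repeated over many simulated data sets and the rejection rate reported as the estimated power.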

51 Results

52 End

53 Plan C Exchangeable Matrix in Plan A

54

55 Pearson Correlation Coefficients, N = 97
           psa        c_volume   svi        cpen       gleason    c_wt       age        bph
psa        1.00000    0.62415    0.52862    0.55079    0.42958    0.02621    0.01720   -0.01649
c_volume   0.62415    1.00000    0.58174    0.69290    0.48144    0.00511    0.03909   -0.13321
svi        0.52862    0.58174    1.00000    0.68028    0.42857   -0.00241    0.11766   -0.11955
cpen       0.55079    0.69290    0.68028    1.00000    0.46157    0.00158    0.09956   -0.08301
gleason    0.42958    0.48144    0.42857    0.46157    1.00000   -0.02421    0.22585    0.02683
c_wt       0.02621    0.00511   -0.00241    0.00158   -0.02421    1.00000    0.16432    0.32185
age        0.01720    0.03909    0.11766    0.09956    0.22585    0.16432    1.00000    0.36634
bph       -0.01649   -0.13321   -0.11955   -0.08301    0.02683    0.32185    0.36634    1.00000

56 Full Correlation Matrix
           psa         c_volume    svi         cpen        gleason     c_wt        age         bph
psa        1           0.624151    0.528619    0.550793    0.42958     0.026213    0.017199   -0.01649
c_volume   0.624151    1           0.581742    0.692897    0.481438    0.005107    0.039094   -0.13321
svi        0.528619    0.581742    1           0.680284    0.428573   -0.00241     0.117658   -0.11955
cpen       0.550793    0.692897    0.680284    1           0.461566    0.001579    0.099555   -0.08301
gleason    0.42958     0.481438    0.428573    0.461566    1          -0.02421     0.225852    0.026826
c_wt       0.026213    0.005107   -0.00241     0.001579   -0.02421     1           0.164324    0.321849
age        0.017199    0.039094    0.117658    0.099555    0.225852    0.164324    1           0.366341
bph       -0.01649    -0.13321    -0.11955    -0.08301     0.026826    0.321849    0.366341    1

57 The Correlation of Y with All X's, Full Model
       psa   c_volume    svi         cpen        gleason     c_wt        age         bph
psa    1     0.624151    0.528619    0.550793    0.42958     0.026213    0.017199   -0.01649

58 Correlation Matrix of X's, Full Model
           c_volume    svi         cpen        gleason     c_wt        age         bph
c_volume   1           0.581742    0.692897    0.481438    0.005107    0.039094   -0.13321
svi        0.581742    1           0.680284    0.428573   -0.00241     0.117658   -0.11955
cpen       0.692897    0.680284    1           0.461566    0.001579    0.099555   -0.08301
gleason    0.481438    0.428573    0.461566    1          -0.02421     0.225852    0.026826
c_wt       0.005107   -0.00241     0.001579   -0.02421     1           0.164324    0.321849
age        0.039094    0.117658    0.099555    0.225852    0.164324    1           0.366341
bph       -0.13321    -0.11955    -0.08301     0.026826    0.321849    0.366341    1

59 The Correlation of Y with All X's, Reduced Model
       cpen        gleason     c_wt        age         bph
psa    0.550793    0.42958     0.026213    0.017199   -0.01649

60 Correlation Matrix of X's
           cpen        gleason     c_wt        age         bph
cpen       1           0.461566    0.001579    0.099555   -0.08301
gleason    0.461566    1          -0.02421     0.225852    0.026826
c_wt       0.001579   -0.02421     1           0.164324    0.321849
age        0.099555    0.225852    0.164324    1           0.366341
bph       -0.08301     0.026826    0.321849    0.366341    1

61 Regular Correlations Versus Partial Correlations

62 Correlation Matrix [Figure: partitioned correlation matrix with blocks labeled Full Rxy, Reduced Rxy, X's of interest, covariates in reduced model, and Rxx]


65 The Gold Standard Approach Some Matrix Algebra
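The gold-standard calculation applied to the observed correlations from slides 25 to 29 can be sketched in plain Python. It assumes R² = r_yx′ R_xx⁻¹ r_yx (correlations of Y with the predictors, and the predictors' correlation matrix), uses a small hand-written Gaussian elimination, and the helper names are hypothetical. Because the correlations and both regressions come from the same 97 observations, the results should match the SAS R-square values of 0.4467 and 0.3424 up to rounding of the correlations.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def r_square(r_yx, r_xx):
    """R-square implied by correlations: r_yx' * inverse(R_xx) * r_yx."""
    weights = solve(r_xx, r_yx)  # standardized regression coefficients
    return sum(r * w for r, w in zip(r_yx, weights))


# Correlations of psa with c_volume, svi (interest) and cpen, gleason (covariates)
r_yx_full = [0.62415, 0.52862, 0.55079, 0.42958]
r_xx_full = [
    [1.00000, 0.58174, 0.69290, 0.48144],
    [0.58174, 1.00000, 0.68028, 0.42857],
    [0.69290, 0.68028, 1.00000, 0.46157],
    [0.48144, 0.42857, 0.46157, 1.00000],
]
# The reduced model keeps only the covariates cpen and gleason
r_yx_red = [0.55079, 0.42958]
r_xx_red = [[1.00000, 0.46157], [0.46157, 1.00000]]

r2_full = r_square(r_yx_full, r_xx_full)
r2_red = r_square(r_yx_red, r_xx_red)
partial = math.sqrt((r2_full - r2_red) / (1.0 - r2_red))
print(f"R2 full = {r2_full:.4f}, R2 reduced = {r2_red:.4f}, partial = {partial:.3f}")
```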

66 =0.35




78 SAS Code and Output
proc power;
   multreg model=fixed alpha=.05
      nfullpredictors=7 ntestpredictors=2
      rsqfull=0.46 rsqdiff=0.106
      ntotal=97
      power=.;
   plot x=n min=60 max=100 key=oncurves yopts=(ref=.977 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    7
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.46
Difference in R-square                0.106
Total Sample Size                     97
Computed Power
Power                                 0.977

79 The PROC Power Graph

80 The Calculations Power = 0.97

81 proc power;
   multreg model=fixed alpha=.05
      nfullpredictors=7 ntestpredictors=2
      rsqfull=0.2505682 rsqreduced=0.1111111
      ntotal=50 60 70 80 97
      power=.;
   plot x=n min=60 max=100 key=oncurves yopts=(ref=.8 .85 .9 .95 crossref=yes);
run;
The POWER Procedure
Type III F Test in Multiple Regression
Fixed Scenario Elements
Method                                Exact
Model                                 Fixed X
Number of Predictors in Full Model    7
Number of Test Predictors             2
Alpha                                 0.05
R-square of Full Model                0.250568
R-square of Reduced Model             0.111111
Computed Power
Index   N Total   Power
1       50        0.753
2       60        0.836
3       70        0.894
4       80        0.933
5       97        0.970

82

83 Pearson Correlation Coefficients (conjectured compound-symmetry values)
           psa       c_volume   svi       cpen      gleason
psa        1.00000   0.35       0.35      0.2       0.2
c_volume   0.35      1.00000    0.2       0.2       0.2
svi        0.35      0.2        1.00000   0.2       0.2
cpen       0.2       0.2        0.2       1.00000   0.2
gleason    0.2       0.2        0.2       0.2       1.00000

84 Matrix Arithmetic with Compound Correlation Matrix


86 Calculations: the number of predictors of interest = 2; the total number of predictors in the model = 4.

87 Approaches in Estimating the Parameters to Calculate Power. Plan A: complete specification of the parts of the expression: R-square of the reduced model = 0.34; R-square of the full model = 0.45.


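89 [Figure: central F(2, 92) and noncentral F(2, 92, 19.4) densities, with 92 = 97 − 4 − 1 denominator df] Critical value for α = 0.05: about 3.10. Noncentrality parameter: 19.4 = 97 × 0.11 / (1 − 0.45). The total area in blue (the noncentral density beyond the critical value) is the power: about 0.979.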
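The two numbers on this slide can be checked by hand. A small sketch, assuming λ = N·ΔR²/(1 − R²full) (which gives the 19.4 shown) and using the fact that for a numerator df of 2 the central F tail has the closed form P(F > x) = (1 + 2x/d2)^(−d2/2), so the critical value is (d2/2)(α^(−2/d2) − 1). The closed form does not apply to other numerator df, and the helper name is hypothetical.

```python
def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Upper-alpha critical value of F(2, df2); valid only for numerator df = 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


n, p, u = 97, 4, 2            # total N, predictors in full model, test predictors
r2_full, r2_diff = 0.45, 0.11
df2 = n - p - 1               # denominator df
lam = n * r2_diff / (1.0 - r2_full)
crit = f_crit_df1_2(0.05, df2)
print(f"F({u}, {df2}) critical value = {crit:.3f}, noncentrality = {lam:.1f}")
```

For F(2, 92) this gives a critical value of about 3.095 and λ = 19.4.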

