

The %LRpowerCorr10 SAS Macro: Power Estimation for Logistic Regression Models with Several Predictors of Interest in the Presence of Covariates
D. Keith Williams M.P.H. Ph.D., Zoran Bursac M.P.H. Ph.D.
Department of Biostatistics, University of Arkansas for Medical Sciences

The Premise for Linear and Logistic Regression Power and Sample Size
Power to detect significance among specific predictors in the presence of other covariates in a model.
For linear regression, Proc Power works great!
Logistic regression power estimation is 'quirky'.

Common Approaches to Estimate Logistic Regression Power
Power for one predictor, possibly in the presence of other covariates; the %powerlog macro allows for correlation among these predictors.
A weakness: commonly we are interested in the power to detect the significance of more than one predictor.

A quick look at %LRpowerCorr10

%LRpowerCorr10
1. Up to 10 predictors
2. 2 binary, 4 uniform(-3,3), and 4 normal
3. Specify a correlation among predictors
4. Specify an odds ratio value for each predictor
5. Specify the set of factors of interest and the set of covariates

A Power Scenario
logit = ln(1.5)x1 + ln(1.5)x2 + ln(1.1)x3 + ln(1.05)x4 + ln(1.02)x5 + ln(1.05)x6 + ln(1.01)x7 + ln(1.05)x8 + ln(1.02)x9 + ln(1.03)x10
Risk factors of interest: x1-x3. Covariates of interest: x4-x10.
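The odds-ratio scale and the logit scale are connected by the natural log. A small Python sketch (illustration only, not part of the macro) converts the slide's odds ratios to logit coefficients and a logit to a probability:

```python
import math

# Odds ratios from the scenario slide; the logit coefficients are their natural logs.
odds_ratios = [1.5, 1.5, 1.1, 1.05, 1.02, 1.05, 1.01, 1.05, 1.02, 1.03]
betas = [math.log(orr) for orr in odds_ratios]

def logit_to_prob(logit):
    """Invert the logit link: P(Y=1) = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))

# Example: a subject with x1 = x2 = 1 and all other predictors at 0
# (intercept omitted, as on the slide).
logit = betas[0] * 1 + betas[1] * 1
print(round(logit_to_prob(logit), 3))  # prints 0.692
```

The two OR = 1.5 risk factors together roughly move P(Y=1) from 0.5 to 0.69 for a subject at the reference level of every other predictor.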

%LRpowerCorr Example
%LRpowerCorr10(2000, 1000, .2, .1,   /* n, number of simulations, correlation among predictors, average proportion of '1's */
  1.5, 1.5, 1.1, 1.05, 1.02, 1.05, 1.01, 1.05, 1.02, 1.03,   /* odds ratios OR1-OR10 */
  cx1 cx2 cx3 cx4 cx5 cx6 cx7 cx8 cx9 cx10,   /* full model, including the 3 risk factors of interest */
  cx4 cx5 cx6 cx7 cx8 cx9 cx10,               /* reduced model */
  .05, 3,                                     /* level of significance, number of terms of interest */
  0.1, 0.5);                                  /* prob of '1' for the binary cx1 and cx2 */

%LRpowerCorr10 Output
Sample size = 2000; Simulations = 1000; Rho = .2; P(Y=1) = .1
OR1=1.5, OR2=1.5, OR3=1.1, OR4=1.05, OR5=1.02, OR6=1.05, OR7=1.01, OR8=1.05, OR9=1.02, OR10=1.03
Full Model: cx1 cx2 cx3 cx4 cx5 cx6 cx7 cx8 cx9 cx10
Reduced Model: cx4 cx5 cx6 cx7 cx8 cx9 cx10
Power = 88% (LCL = 86%, UCL = 90%)
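The LCL/UCL columns are consistent with a simple Wald binomial interval on the simulated power estimate (an assumption about the macro's method, but the numbers match). In Python:

```python
import math

def power_ci(phat, nsims, z=1.96):
    """Wald 95% interval for a power estimate based on nsims simulations."""
    se = math.sqrt(phat * (1 - phat) / nsims)
    return phat - z * se, phat + z * se

lcl, ucl = power_ci(0.88, 1000)
print(round(lcl * 100), round(ucl * 100))  # prints 86 90
```

This reproduces the 86%-90% interval around the 88% estimate from 1000 simulations.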

A look at regular linear regression. The basic structure is the same.

A Key Point about Linear Regression
We rarely have conjectured values for the individual betas in a regular linear regression. Therefore, for linear regression models, one conjectures the difference in R-square between a model that includes the predictors of interest and a model without them.

Example Data Set

The Hypothetical Scenario
A model with 4 terms.
Predictors of interest for PSA that we choose to power:
1. SVI
2. c_volume
Two covariates to be included: cpen, gleason

Details
The full model contains all 4 terms (c_volume, svi, cpen, gleason). We want to power the test that a model with these 2 predictors (svi, c_volume) is statistically better than a model excluding them. The reduced model contains cpen and gleason only.

The Corresponding Hypothesis
H(o): beta_svi = beta_c_volume = 0
H(a): At least one of the above is non-zero in the full model, when the difference in R-square = ?

Let's go back through those last 3 slides again.

Hypothetical Full Model
[SAS regression output: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var, and parameter estimates (estimate, standard error, t value, Pr > |t|) for Intercept, c_volume, svi, cpen, gleason. Note: c_volume and svi are the predictors of interest. Full-model R-Square = 0.45.]

Hypothetical Reduced Model
[SAS regression output: fit statistics and parameter estimates for Intercept, cpen, gleason. Reduced-model R-Square = 0.34.]
Note the R-Square difference: 0.45 - 0.34 = 0.11

proc power;
  multreg
    model = fixed
    alpha = .05
    nfullpredictors = 4
    ntestpredictors = 2
    rsquarefull = 0.45
    rsquarereduced = 0.34
    ntotal =          /* value lost in transcript */
    power = .;
  plot x=n min=40 max=100
    key = oncurves
    yopts = (ref= crossref=yes);
run;

The POWER Procedure: Type III F Test in Multiple Regression
Fixed Scenario Elements:
  Method: Exact
  Model: Random X
  Number of Predictors in Full Model: 4
  Number of Test Predictors: 2
  Alpha: 0.05
  R-square of Full Model: 0.45
  Difference in R-square: 0.11
Computed Power: [table of N Total vs. Power; values not preserved in the transcript]
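The R-square difference drives this test through the implied effect size, Cohen's f-squared; the noncentrality grows roughly as f-squared times N, which is why the test on a 0.11 difference is well powered at modest samples. A quick arithmetic check with the slide's values (PROC POWER's exact computation differs slightly in its degrees-of-freedom handling):

```python
# Effect size implied by the conjectured R-square difference:
# f^2 = (R^2_full - R^2_reduced) / (1 - R^2_full)
rsq_full, rsq_reduced = 0.45, 0.34
f2 = (rsq_full - rsq_reduced) / (1 - rsq_full)
print(round(f2, 2))  # prints 0.2
```

An f-squared of 0.2 sits between Cohen's conventional "medium" (0.15) and "large" (0.35) benchmarks.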

Now the logistic regression case

Logistic Regression LR test review

[Full-model SAS logistic output: Model Fit Statistics (AIC, SC, -2 Log L for intercept-only and full models) and maximum likelihood estimates (estimate, standard error, Wald chi-square, Pr > ChiSq) for Intercept, age, sesdum2, sesdum3, sector, age_ses2, age_ses3, age_sect, ses2_sect, ses3_sect; numeric values not preserved in the transcript.]

[Reduced-model SAS logistic output: Model Fit Statistics (AIC, SC, -2 Log L) and maximum likelihood estimates for Intercept, age, sesdum2, sesdum3, sector; numeric values not preserved in the transcript.]

The Corresponding Hypothesis
H(o): the coefficients of the tested terms (age_ses2, age_ses3, age_sect, ses2_sect, ses3_sect) are all zero
H(a): At least one of the above is non-zero in the full model
LR chi-square = (reduced -2 Log L) - (full -2 Log L); p-value = 0.22 (implies none are helpful)
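The LR test is just the difference of the two -2 Log Likelihoods referred to a chi-square distribution. A sketch with hypothetical fit statistics (the deck's actual numbers were lost in transcription), using the closed-form chi-square tail probability available for even degrees of freedom:

```python
import math

def lr_test(neg2ll_reduced, neg2ll_full, df):
    """LR chi-square statistic and its p-value, via the closed-form
    chi-square survival function (valid for even df only):
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    stat = neg2ll_reduced - neg2ll_full
    assert df % 2 == 0, "closed form implemented for even df only"
    half = stat / 2.0
    term, series = 1.0, 0.0
    for i in range(df // 2):
        series += term
        term *= half / (i + 1)
    return stat, math.exp(-half) * series

# Hypothetical -2 Log L values for a 6-term test:
stat, p = lr_test(204.6, 197.6, 6)
print(round(stat, 1), round(p, 3))  # prints 7.0 0.321
```

As on the slide, a non-small p-value means the block of tested terms adds nothing detectable.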

Power for Logistic Models: Background
Most existing tools are based on the Hsieh, Bloch, and Larsen (1998) paper and the Agresti (1996) text: the %powerlog macro and other software. Recent publication by Demidenko (2008).

SAS 9.2 Proc Power for Logistic
The LOGISTIC statement performs power and sample size analyses for the likelihood ratio chi-square test of a single predictor in binary logistic regression, possibly in the presence of one or more covariates. All predictor variables are assumed to be independent of each other. So, this analysis is not applicable to studies with correlated predictors, for example, most observational studies (as opposed to randomized studies).

Common Approaches to Estimate Logistic Regression Power
Calculate the power to detect significance of one predictor, possibly in the presence of other predictors; the %powerlog macro allows for correlation among these predictors.
A weakness: in many instances we are interested in the power to detect the significance of more than one predictor.

A demonstration of the %Powerlog macro

The %PowerLog Macro
Logistic regression power for a one-s.d. increase from the mean of X1.
Any number of other covariates in the model is accounted for by supplying the R-square from regressing X1 on those covariates:

%Powerlog Function Example
%powerlog(p1=.5,       /* prob of '1' at the mean of X1 */
          p2=.6667,    /* prob of '1' at the mean + 1 SD of X1 */
          power=.8,
          rsq=%str(0,.0565,.1141),  /* three hypothetical values of the R-square of X1 regressed on any number of other covariates */
          alpha=.05);
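%powerlog is based on the Hsieh, Bloch, and Larsen (1998) approximation for a continuous predictor, with the covariate adjustment entering as a variance-inflation factor 1/(1 - R-square). A Python sketch of that formula (my reconstruction; the macro's internals may differ in detail):

```python
import math
from statistics import NormalDist

def hsieh_n(p1, p2, rsq=0.0, alpha=0.05, power=0.80):
    """Approximate n to detect a 1-SD increase in a continuous predictor
    (Hsieh, Bloch & Larsen 1998), inflated by 1/(1 - rsq) to account for
    other covariates with multiple correlation rsq on the predictor."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Log odds ratio for a 1-SD increase, from the two event probabilities:
    beta_star = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    n = (z_a + z_b) ** 2 / (p1 * (1 - p1) * beta_star ** 2)
    return math.ceil(n / (1 - rsq))

# The slide's inputs; note rsq = .0565 yields the n = 70 compared later.
print([hsieh_n(0.5, 0.6667, r) for r in (0.0, 0.0565, 0.1141)])  # [66, 70, 74]
```

The three rsq values show how even modest correlation with other covariates inflates the required n.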

%Powerlog Output Alpha=.05, p1=.5 p2=.6667

%LRpowerCorr10 versus %powerlog, n=70
Sample size = 70; Simulations = 1000; Rho = 0; P(Y=1) = .5
OR1=1, OR2=1, OR3=1, OR4=1, OR5=1, OR6=1, OR7=2, OR8=1, OR9=1, OR10=1
Full Model: cx7 cx8 cx9 cx10
Reduced Model: cx8 cx9 cx10
Power = 79% (LCL = 76%, UCL = 81%)

%LRpowerCorr10 versus %powerlog, n=75
Sample size = 75; Simulations = 1000; Rho = .1; P(Y=1) = .5
OR1=1, OR2=1, OR3=1, OR4=1, OR5=1, OR6=1, OR7=2, OR8=1, OR9=1, OR10=1
Full Model: cx7 cx8 cx9 cx10
Reduced Model: cx8 cx9 cx10
Power = 81% (LCL = 78%, UCL = 83%)

%LRpowerCorr10 versus %powerlog, n=80
Sample size = 80; Simulations = 1000; Rho = .2; P(Y=1) = .5
OR1=1, OR2=1, OR3=1, OR4=1, OR5=1, OR6=1, OR7=2, OR8=1, OR9=1, OR10=1
Full Model: cx7 cx8 cx9 cx10
Reduced Model: cx8 cx9 cx10
Power = 80% (LCL = 77%, UCL = 82%)

Again…only one predictor of interest using %powerlog

The %LRpowerCorr10 Macro Power Estimation
– One or more predictors of interest
– Different distributions of predictors
– Other covariates in the model
– Correlation among predictors
– Specify OR values associated with predictors
– Average proportion of '1's

%LRpowerCorr (N, Simulations, Correlation)
1. Define the logit: specify the association between each covariate x and the outcome y through its parameter estimate.
2. Draw a sample of size N from the specified logit; convert the logits to binary outcomes.
3. PROC LOGISTIC: fit the full multivariate model; save the -2 Log Likelihood.
4. PROC LOGISTIC: fit the reduced multivariate model; save the -2 Log Likelihood.
5. Perform the likelihood ratio test (the difference of the reduced and full -2 Log Likelihoods). Is the resulting chi-square test statistic greater than the chi-square critical value (with the correct number of d.f.)? If so, reject the null; if not, fail to reject. Save the result.
6. Loop over the simulations, then calculate the proportion of rejections (i.e., the power to detect the specified associations).
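Steps 1-2 of the loop can be sketched in Python. This is a hypothetical re-implementation of the data-generation step only; the equicorrelated-normal construction, the Gaussian copula for the binary and uniform predictors, and the crude intercept choice are my assumptions, and the PROC LOGISTIC fitting and LR test are not reproduced:

```python
import math
import random
from statistics import NormalDist

def simulate_dataset(n, rho, avep, odds_ratios, p_bin=(0.25, 0.5), seed=1):
    """One simulated dataset in the spirit of %LRpowerCorr10:
    10 equicorrelated predictors (2 binary, 4 uniform(-3,3), 4 normal),
    a logit built from ln(OR) coefficients, and a binary outcome."""
    rng = random.Random(seed)
    nd = NormalDist()
    intercept = math.log(avep / (1 - avep))  # crude: targets P(Y=1) near avep at x = 0
    betas = [math.log(orr) for orr in odds_ratios]
    data = []
    for _ in range(n):
        # Equicorrelated normals via one shared factor z0.
        z0 = rng.gauss(0, 1)
        z = [math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0, 1)
             for _ in range(10)]
        u = [nd.cdf(zj) for zj in z]  # correlated uniforms (Gaussian copula)
        x = ([1 if u[j] < p_bin[j] else 0 for j in range(2)]   # 2 binary
             + [6 * u[j] - 3 for j in range(2, 6)]             # 4 uniform(-3,3)
             + [z[j] for j in range(6, 10)])                   # 4 standard normal
        logit = intercept + sum(b * xj for b, xj in zip(betas, x))
        y = 1 if rng.random() < 1 / (1 + math.exp(-logit)) else 0
        data.append((x, y))
    return data

data = simulate_dataset(2000, 0.2, 0.1,
                        [1.5, 1.5, 1.1, 1.05, 1.02, 1.05, 1.01, 1.05, 1.02, 1.03])
print(len(data), sum(y for _, y in data) / len(data))
```

Each simulated dataset is then fit twice (full and reduced models) and the LR rejection is recorded; the rejection rate over all simulations is the estimated power.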

%LRpowerCorr10 Variables
SAMPLESIZE    The sample size to be evaluated
NSIMS         The number of simulation runs
P             The correlation among the predictors
AVEP          The average proportion of '1' responses in the samples with only the intercept in the model
OR1-OR2       The odds ratios associated with binary CX1-CX2
OR3-OR6       The odds ratios associated with uniform(-3,3) CX3-CX6
OR7-OR10      The odds ratios associated with N(0,1) CX7-CX10
FULLMODEL     The predictor terms in the full model, among CX1 CX2 CX3 CX4 CX5 CX6 CX7 CX8 CX9 CX10
REDUCEDMODEL  The predictor terms in the reduced model, among CX1 CX2 CX3 CX4 CX5 CX6 CX7 CX8 CX9 CX10
ALPHA         The significance level of the test
DFTEST        The degrees of freedom of the test
PCX1          Probability of '1' for binary CX1
PCX2          Probability of '1' for binary CX2

Example from Hosmer's Applied Logistic Regression: 'The low birth weight study'.
[Slide lists the primary risk factors of interest and the confounders.]

We wish to find the power to detect significance for at least one of the risk factors in the full model.
Full model: the risk factors of interest (LWT, RACE, FTV, AGE) plus the confounders (UI, SMOKE, PTL, HT).
Reduced model: the confounders only.

The Corresponding Hypothesis
H(o): the coefficients of LWT, RACE, FTV, and AGE are all zero
H(a): At least one of the above is non-zero in the full model

Hypothesized Odds Ratios
AGE: OR = 1.1 (CX7, Normal)
LWT: OR = 1.5 (CX1, Binary)
RACE: OR = 1.5 (CX2, Binary)
FTV: OR = 1.1 (CX3, Uniform)
SMOKE: OR = 1.02 (CX8, Normal)
PTL: OR = 1.02 (CX9, Normal)
HT: OR = 1.02 (CX10, Normal)
UI: OR = 1.02 (CX4, Uniform)
P(Y=1) = 0.1; investigate N = 900; Rho = 0.2

Macro Commands
%LRpowerCorr10(900, 1000, .2, .1,
  1.5, 1.5, 1.1, 1.02, 1.02, 1.02,
  cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10,  /* full model */
  cx4 cx8 cx9 cx10,                  /* reduced model */
  .05, 4, 0.25, 0.5);

Output
Sample size = 900; Simulations = 1000; Rho = .2; P(Y=1) = .1
OR1=1.5, OR2=1.5, OR3=1.1, OR4=1.02, OR5=1.02, OR6=1.02, OR7=1.1, OR8=1.02, OR9=1.02, OR10=1.02
Full Model: cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10
Reduced Model: cx4 cx8 cx9 cx10
Power = 73% (LCL = 70%, UCL = 75%)

Recent Development: the %Quickpower Macro
%quickpower2(100, .2, .1,
  1.5, 1.5, 1.1, 1.02, 1.02, 1.02,
  cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10,
  cx4 cx8 cx9 cx10,
  8, .05, 4, 0.25, 0.5);

A trick to get a good guess for N
The POWER Procedure: Type III F Test in Multiple Regression
Fixed Scenario Elements:
  Method: Exact
  Model: Random X
  Number of Predictors in Full Model: 8
  Number of Test Predictors: 4
  Alpha: 0.05
  R-square of Full Model: (value not preserved in the transcript)
  R-square of Reduced Model: (value not preserved in the transcript)
  Nominal Power: 0.8
Computed N Total: 962 (with its actual power; used as the sample size in the next run)

Resulting in…
Sample size = 962; Simulations = 1000; Rho = .2; P(Y=1) = .1
OR1=1.5, OR2=1.5, OR3=1.1, OR4=1.02, OR5=1.02, OR6=1.02, OR7=1.1, OR8=1.02, OR9=1.02, OR10=1.02
Full Model: cx1 cx2 cx3 cx7 cx4 cx8 cx9 cx10
Reduced Model: cx4 cx8 cx9 cx10
Power = 76% (LCL = 74%, UCL = 79%)

%LRpowerCorr10C Macro: Approximate Power Curve
%LRpowerCorr10C(50, 150, 500, .1, .5,
  1, 1, 1, 1, 1, 1, 2.0, 1, 1, 1,
  cx7 cx8 cx9 cx10,
  cx8 cx9 cx10,
  .05, 1, .25, .25);
ods graphics on;
proc logistic data=base desc plots(only)=(roc(id=obs) effect);
  model reject = n1;
run;
ods graphics off;

The SAS Macros
Text-file versions of the %LRpowerCorr and %quickpower SAS macros, with an example. Copy and paste into SAS to run.