Biostatistics Case Studies 2005 Peter D. Christenson Biostatistician Session 4: Taking Risks and Playing the Odds: OR vs. RR

Case Study
What's the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes
Jun Zhang, MB, PhD; Kai F. Yu, PhD
JAMA. 1998;280:
ABSTRACT
Logistic regression is used frequently in cohort studies and clinical trials. When the incidence of an outcome of interest is common in the study population (>10%), the adjusted odds ratio derived from the logistic regression can no longer approximate the risk ratio. The more frequent the outcome, the more the odds ratio overestimates the risk ratio when it is more than 1 or underestimates it when it is less than 1. We propose a simple method to approximate a risk ratio from the adjusted odds ratio and derive an estimate of an association or treatment effect that better represents the true relative risk.

Further OR to RR References
1. McNutt LA, Wu C, Xue X, et al. Estimating the relative risk in cohort studies and clinical trials of common outcomes. American Journal of Epidemiology 2003;157:
2. Greenland S. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. American Journal of Epidemiology 2004;160:

Goals
Why is RR vs. OR an issue?
Examine the solution given in our case study.
Point out difficulties with this solution.
Suggest other solutions.

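Why Use Odds Ratio at All?
In case-control studies, RR cannot be measured. This is due to the selection of controls: the investigator decides how many controls to sample, so the proportion of cases among the exposed depends on that choice. Example:
Risk Factor    Cases    Controls 1    Controls 2
+              90       60            600
-              10       40            400
Ratio of percents:
  With Controls 1: (90/150)/(10/50) = 3.0
  With Controls 2: (90/690)/(10/410) = 5.3
Odds ratio:
  With Controls 1: [(90/150)/(60/150)] / [(10/50)/(40/50)] = 6.0
  With Controls 2: [(90/690)/(600/690)] / [(10/410)/(400/410)] = 6.0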
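The arithmetic in this example can be checked with a short Python sketch. The function and variable names are illustrative, not from the slides; the counts are the ones appearing in the ratios above (90 exposed and 10 unexposed cases, with either 60/40 or 600/400 controls).

```python
def ratio_of_percents(cases_exp, controls_exp, cases_unexp, controls_unexp):
    """Proportion of cases among the exposed divided by that among the unexposed."""
    p_exp = cases_exp / (cases_exp + controls_exp)
    p_unexp = cases_unexp / (cases_unexp + controls_unexp)
    return p_exp / p_unexp


def odds_ratio(cases_exp, controls_exp, cases_unexp, controls_unexp):
    """Odds of being a case among the exposed divided by that among the unexposed."""
    return (cases_exp / controls_exp) / (cases_unexp / controls_unexp)


# Same cases, two different control samples (exposed controls, unexposed controls).
control_sets = {"Controls 1": (60, 40), "Controls 2": (600, 400)}

results = {}
for name, (c_exp, c_unexp) in control_sets.items():
    results[name] = (
        ratio_of_percents(90, c_exp, 10, c_unexp),
        odds_ratio(90, c_exp, 10, c_unexp),
    )
    print(f"{name}: ratio of percents = {results[name][0]:.2f}, OR = {results[name][1]:.2f}")
```

The ratio of percents changes with the number of controls sampled, while the odds ratio does not, which is why the OR is the measure reported from case-control data.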

Cohort Study Example
Risk Factor    Diseased
+              50/100 = 50%
-              30/100 = 30%
RR = 50%/30% = 1.67
OR = (50%/50%)/(30%/70%) = 2.33
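The abstract's point, that the OR drifts further from the RR as the outcome becomes more common, can be seen numerically. The following is a minimal Python sketch that holds the RR at the 50%/30% value from this example and varies the risk in the unexposed group; the baseline risks other than 30% are illustrative assumptions.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def odds_ratio_from_risks(p1, p0):
    """OR implied by exposed risk p1 and unexposed risk p0."""
    return odds(p1) / odds(p0)


rr = 50 / 30  # risk ratio from the cohort example

rows = []
for p0 in (0.01, 0.05, 0.10, 0.30):  # 0.30 is the slide's value; the rest are assumed
    p1 = rr * p0
    rows.append((p0, odds_ratio_from_risks(p1, p0)))
    print(f"unexposed risk {p0:.2f}: RR = {rr:.2f}, OR = {rows[-1][1]:.2f}")
```

At a 1% baseline risk the OR is nearly equal to the RR; at the example's 30% baseline it reaches 2.33.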

Cohort Study Example – Adjust for Gender
Strata    Risk Factor    Diseased
Male      +              50/100 = 50%
Male      -              30/100 = 30%
Female    +              80/100 = 80%
Female    -              40/120 = 33%
RR = (Male Weight)(50%/30%) + (Female Weight)(80%/33%) = (200/420)(1.67) + (220/420)(2.40) = 2.05
Other weights may be used.
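A minimal Python sketch of this size-weighted stratified RR follows; the dictionary layout and names are illustrative. With unrounded stratum risks the weighted RR comes to about 2.05; rounding the female unexposed risk to 33% before dividing gives 2.06 instead.

```python
# (diseased, total) for each exposure group within each stratum, from the slide.
strata = {
    "Male": {"exposed": (50, 100), "unexposed": (30, 100)},
    "Female": {"exposed": (80, 100), "unexposed": (40, 120)},
}


def stratum_rr(stratum):
    """Risk in the exposed divided by risk in the unexposed, within one stratum."""
    d1, n1 = stratum["exposed"]
    d0, n0 = stratum["unexposed"]
    return (d1 / n1) / (d0 / n0)


def stratum_size(stratum):
    """Total number of subjects in the stratum."""
    return stratum["exposed"][1] + stratum["unexposed"][1]


total = sum(stratum_size(s) for s in strata.values())

# Weight each stratum-specific RR by the stratum's share of all subjects.
adjusted_rr = sum(stratum_size(s) / total * stratum_rr(s) for s in strata.values())

# Same calculation with the female unexposed risk rounded to 33% first.
rounded_variant = (200 / 420) * (0.50 / 0.30) + (220 / 420) * (0.80 / 0.33)

print(f"adjusted RR (exact risks)   = {adjusted_rr:.2f}")
print(f"adjusted RR (33% rounding) = {rounded_variant:.2f}")
```

Stratum size is only one possible weighting; as the slide notes, other weights may be used.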

Need for Additional Adjustment
Limitations of stratification:
  Only a few covariates can be adjusted.
  Covariates need to be categorical, or made so.
Regression adjustment:
  Allows more covariates (rule of thumb: about 10 subjects per parameter).
  Allows continuous covariates.
  Logistic, log-linear, Poisson regression.

Logistic Regression
Model: Log[Odds of disease] = Log[ Prob(disease) / Prob(non-disease) ] = function(exposure, covariates, interactions)
Thus, only functions of the odds can be estimated, e.g., the antilog of (Log[Odds of disease] for exposed minus Log[Odds of disease] for unexposed), i.e., the odds ratio (OR).
E.g., if log(odds) = b0 + 0.8(exposed) + 0.2(covariate), where b0 is the intercept and exposed = 1 or 0 for Yes or No, then OR = exp(0.8) = 2.23.
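A small Python sketch illustrates why the OR in this example is exp(0.8) whatever the covariate value: the intercept and covariate terms cancel in the difference of log odds. The exposure coefficient 0.8 and covariate coefficient 0.2 come from the example; the intercept values used here are assumptions chosen only for illustration.

```python
import math


def log_odds(exposed, covariate, intercept):
    """Linear predictor of the example model; the intercept is an assumed value."""
    return intercept + 0.8 * exposed + 0.2 * covariate


def model_odds_ratio(covariate, intercept):
    """Antilog of the difference in log odds, exposed minus unexposed."""
    diff = log_odds(1, covariate, intercept) - log_odds(0, covariate, intercept)
    return math.exp(diff)


# The OR is the same for every covariate value and every intercept tried.
checks = [
    model_odds_ratio(covariate, intercept)
    for covariate in (0, 1, 5)
    for intercept in (-2.0, 0.0, 1.5)
]
print([round(value, 2) for value in checks])
```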

RR is Preferred
To obtain RR rather than OR, we can:
1. Convert OR from logistic regression to RR, or
2. Use a model other than logistic that also fits the data.
We consider (2) later (down 6 slides). The solution proposed in our case study is to apply (1). Their solution is displayed on the next slide.

Case Study, Page 1691, 1st column
In a cohort study, $P_0$ indicates the incidence of the outcome of interest in the nonexposed group and $P_1$ in the exposed group; OR, odds ratio; and RR, risk ratio:
$$\mathrm{OR} = \frac{P_1/(1-P_1)}{P_0/(1-P_0)};$$
thus,
$$\frac{P_1}{P_0} = \frac{\mathrm{OR}}{(1-P_0) + (P_0 \times \mathrm{OR})}.$$
Since $\mathrm{RR} = P_1/P_0$, the corrected
$$\mathrm{RR} = \frac{\mathrm{OR}}{(1-P_0) + (P_0 \times \mathrm{OR})}.$$
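The conversion can be written as a one-line function. The Python sketch below uses an illustrative function name, `zhang_yu_rr`, and checks it against two sets of numbers that appear in this presentation: the unadjusted cohort example (risks of 50% and 30%) and the breast cancer values OR = 2.51 and p0 = 0.215.

```python
def zhang_yu_rr(odds_ratio, p0):
    """Convert an OR to an RR, given the outcome incidence p0 in the unexposed group."""
    return odds_ratio / ((1 - p0) + p0 * odds_ratio)


# Crude cohort example: the formula recovers the RR exactly.
p1, p0 = 0.50, 0.30
crude_or = (p1 / (1 - p1)) / (p0 / (1 - p0))
print(f"crude OR = {crude_or:.2f}, converted RR = {zhang_yu_rr(crude_or, p0):.2f}")

# Breast cancer example quoted in the presentation.
print(f"converted RR = {zhang_yu_rr(2.51, 0.215):.2f}")
```

For a crude OR and a known P0 the conversion is an algebraic identity. The difficulties discussed in the presentation arise when the OR is covariate-adjusted and a single P0 is plugged in.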

Table from Case Study The authors perform a simulation, setting the RR to be constant over all strata of the covariates, and show that their conversion does correct the logistic OR to a RR close to the stratified (M-H) RR which is correct here, with categorical strata:

Difficulties with the OR to RR Conversion
The P0 is assumed to be fixed and known, so confidence intervals for the RR are too narrow.
The formula is only valid if RR is constant over all covariate patterns, or is used for one particular set of covariates. Thus, it does not do the intended job of adjustment, i.e., account for confounding.

Breast Cancer Example
See table on next slide.
Outcome: 5-year mortality.
Predictor: receptor level, low vs. high; low is suspected mortality risk factor.
Covariate: CA staging I, II, III.
Want: RR of death (low over high receptor), adjusted for stage.
For this study, do not need to model; can use observed death rates. Thus, we can compare the logistic OR to RR results with actual RRs.
Since the RRs increase with stage, the OR to RR conversion does not perform well.

Greenland (2004) Comparison of RRs
[Table not recovered: adjusted RRs by method and RRs by strata.]
* Using Zhang & Yu: OR = 2.51 and p0 = (5+17+9)/(total number of high-receptor women) = 0.215:
Converted RR = 2.51/[(1-0.215) + (0.215*2.51)] = 1.89

Alternative #1 to Zhang & Yu Conversion
The OR to RR conversion over-estimates RR. An alternative is to find Prob[death] for a low receptor and for a high receptor population, using the distribution of staging (“Standardized RRs”), and take the ratio of these probabilities.
These probabilities can be found from the logistic equation. E.g., if log(p/(1-p)) = b0 + 0.8(exposed) + 0.2(covariate) = u, where b0 is the intercept, then, from algebra, $p = e^{u}/(1+e^{u})$, where p = prob(death).
In fact, these probabilities are given in the row for logistic in the previous table. We now find the standardized risk ratio using these probabilities:

Alternative #1 continued
If all women were at the low receptor level, the standardized (to the staging distribution) risk of death is:
0.190(0.349) + …(0.500) + …(0.151) = 0.401
Similarly, at the high receptor level:
0.086(0.349) + …(0.500) + …(0.151) = 0.239
Thus, the standardized RR is 0.401/0.239 = 1.68, which is close to the adjusted RR from the observed death rates.
Note that the weights (0.349, 0.500, 0.151) are the relative proportions of women at each stage, e.g., 0.349 = (12+55)/(total number of women).
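The standardization step can be sketched in Python as follows. The stage weights are the ones quoted on this slide; the logistic coefficients are HYPOTHETICAL placeholders, not the fitted values from the breast cancer data, so the printed numbers illustrate the method only and will not reproduce 0.401, 0.239 or 1.68.

```python
import math


def expit(u):
    """Inverse logit: p = e^u / (1 + e^u)."""
    return 1.0 / (1.0 + math.exp(-u))


# Proportion of women at stages I, II, III (from the slide).
weights = [0.349, 0.500, 0.151]

# HYPOTHETICAL logistic coefficients, assumed for illustration only.
intercept = -2.4
b_low_receptor = 0.9
stage_effects = [0.0, 1.1, 3.0]  # stage I is the reference level


def standardized_risk(low_receptor):
    """Average the model's predicted risk over the staging distribution."""
    return sum(
        w * expit(intercept + b_low_receptor * low_receptor + s)
        for w, s in zip(weights, stage_effects)
    )


risk_low = standardized_risk(1)   # everyone set to low receptor level
risk_high = standardized_risk(0)  # everyone set to high receptor level
standardized_rr = risk_low / risk_high
model_or = math.exp(b_low_receptor)

print(f"standardized risks: low = {risk_low:.3f}, high = {risk_high:.3f}")
print(f"standardized RR = {standardized_rr:.2f} (model OR = {model_or:.2f})")
```

Because the standardized RR is a weighted average of the stage-specific RRs, it lies between 1 and the model OR whenever the OR exceeds 1.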

Alternative #2 to Zhang & Yu Conversion
Use a model other than the logistic, provided that it fits as well. The Greenland (2004) table used two other models, log-linear (binomial) and Poisson regression. The log-linear binomial fits log(prob(death)), compared to the logistic that fits log(odds):
Logistic: log(p/(1-p)) = b0 + 0.8(exposed) + 0.2(covariate)
Log-linear: log(p) = c0 + 0.6(exposed) + 0.1(covariate)
(made-up numbers; b0 and c0 are the intercepts)
From the log-linear model, the adjusted RR is found directly as antilog(0.6) = 1.82. This avoids the unwanted OR entirely.
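The contrast between the two links can be illustrated with the slide's made-up coefficients. In the Python sketch below, the log-linear exposure coefficient 0.6 and the covariate coefficients 0.1 and 0.2 are from this slide; the logistic exposure coefficient 0.8 is borrowed from the earlier logistic regression example, and both intercepts (-2.0) are assumptions.

```python
import math


def risk_loglinear(exposed, covariate, intercept=-2.0):
    """Risk under a log link: log(p) is linear in the predictors."""
    return math.exp(intercept + 0.6 * exposed + 0.1 * covariate)


def risk_logistic(exposed, covariate, intercept=-2.0):
    """Risk under a logit link: log(p/(1-p)) is linear in the predictors."""
    u = intercept + 0.8 * exposed + 0.2 * covariate
    return 1.0 / (1.0 + math.exp(-u))


rr_loglinear = []
rr_logistic = []
for covariate in (0, 2, 4):
    rr_loglinear.append(risk_loglinear(1, covariate) / risk_loglinear(0, covariate))
    rr_logistic.append(risk_logistic(1, covariate) / risk_logistic(0, covariate))
    print(
        f"covariate = {covariate}: "
        f"log-linear RR = {rr_loglinear[-1]:.2f}, logistic RR = {rr_logistic[-1]:.2f}"
    )
```

Under the log link the RR equals antilog(0.6) at every covariate value, so one coefficient summarizes it. Under the logit link the OR is constant but the implied RR changes with the covariate, which is why a single converted RR cannot be adjusted in the intended sense.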

SAS Code for Log-Linear Model of Breast Cancer
data ca;
  input receptor stage death survive total;  /* data rows are missing from this transcript */
  datalines;
;
proc genmod;
  class receptor stage;
  model death/total = receptor stage / dist=binomial link=log;
  estimate 'r1' intercept 1 receptor 1 0 stage 1 0 0 / exp;
  estimate 'r2' intercept 1 receptor 0 1 stage 1 0 0 / exp;
  estimate 'r3' intercept 1 receptor 1 0 stage 0 1 0 / exp;
  estimate 'r4' intercept 1 receptor 0 1 stage 0 1 0 / exp;
  estimate 'r5' intercept 1 receptor 1 0 stage 0 0 1 / exp;
  estimate 'r6' intercept 1 receptor 0 1 stage 0 0 1 / exp;
  estimate 'rr' receptor 1 -1 / exp;
run;
*r1-r6 are in the Greenland(2004) table;

SAS Code for Logistic Model of Breast Cancer
data ca;
  input receptor stage death survive total;  /* data rows are missing from this transcript */
  datalines;
;
proc logistic data=ca;
  class receptor stage;
  model death/total = receptor stage / lackfit;
  output out=out1 pred=predicted l=lower u=upper;
run;
proc print data=out1;
run;
*values for predicted are in the Greenland(2004) table;

Conclusions
The JAMA OR to RR conversion is faulty.
Can use logistic regression, and find standardized risks, then take ratio to get RR.
Can use log-linear models that model risk directly, rather than odds.
Should check whether any model adequately fits the data. Often, several do.
The major advantage of the log-linear model is that confidence intervals for the adjusted RR are much easier to obtain. I know of no software that gives CIs for the standardized RRs.