Lecture 17: Regression for Case-control Studies BMTRY 701 Biostatistical Methods II.



Old business: Comparing AUCs
• Good reference: Hanley and McNeil, “Comparing AUCs for ROC curves based on the same data”
• See class website for pdf.

Additional Reading in Logistic Regression
• Hosmer and Lemeshow, Applied Logistic Regression
• n/Logistic.html
• regression.html
• Regression.pdf
• Etc: Google “logistic regression”

Case Control Studies in Logistic Regression
• /online/ma_chap11.pdf
• How is a case-control study performed?
• What is the outcome and what is the predictor in the regression setting?

Recall the simple 2x2 example
• The odds ratio for a 2x2 table can be used in case-control studies.
• Similarly, the logistic regression model can be used, treating ‘case’ status as the outcome.
• It has been shown that the odds ratio estimates do not depend on the sampling scheme (i.e., cohort vs. case-control study).
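The sampling claim can be checked with a small numerical sketch. The counts below are hypothetical (not from the lecture), and the function names are illustrative. All cases are kept and 1 in 50 non-cases is sampled, in both exposure groups: the odds ratio is unchanged, while the risk ratio is not.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Risk ratio computed as if the table came from a cohort."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical full cohort (assumed numbers, for illustration only).
cohort = (40, 1000, 10, 9000)

# Case-control sample: keep every case, sample 1 in 50 non-cases
# regardless of exposure.
a, b, c, d = cohort
case_control = (a, b // 50, c, d // 50)

or_cohort = odds_ratio(*cohort)
or_cc = odds_ratio(*case_control)
rr_cohort = risk_ratio(*cohort)
rr_cc = risk_ratio(*case_control)

print("OR cohort:", or_cohort, " OR case-control:", or_cc)
print("RR cohort:", round(rr_cohort, 2), " 'RR' case-control:", round(rr_cc, 2))
```

The key assumption is that sampling depends only on case status, not on exposure.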

Example: Case control study of HPV and Oropharyngeal Cancer
• Gillison et al. ( 944)
• 100 cases with oropharyngeal cancer and 200 controls without
• How was the sampling done?

Data on Case vs. HPV
> table(data$hpv16ser, data$control)
> epitab(data$hpv16ser, data$control)
$tab
         Outcome
Predictor 0 p0 1 p1 oddsratio lower upper p.value
[numeric values lost in the transcript; surviving fragments: NA NA NA, e-21]

Multiple Logistic Regression
• This is not a ‘randomized’ study
• There are lots of other predictors that may be associated with the cancer
• Examples:
  - smoking
  - alcohol
  - age
  - gender

Fit the model:
• Write down the model, assuming main effects of tobacco and alcohol and their interaction
• What is the likelihood function?
• What are the MLEs?
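One way to write the model and likelihood the slide asks for, as a sketch. The notation is assumed here rather than taken from the slides: T_i and A_i are 0/1 indicators of tobacco and alcohol use for subject i, y_i is case status, and p_i = P(y_i = 1).

```latex
\[
\operatorname{logit}(p_i) = \log\frac{p_i}{1-p_i}
  = \beta_0 + \beta_1 T_i + \beta_2 A_i + \beta_3 T_i A_i
\]
\[
L(\beta) = \prod_{i=1}^{N} p_i^{\,y_i}\,(1-p_i)^{1-y_i}
\]
```

The MLEs have no closed form in general; they are the values of β that maximize L(β) and are found numerically by the fitting software.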

How do we interpret the results?
• Is there an effect of tobacco?
• Is there an effect of alcohol?
• Is there an interaction?

Interpreting the interaction
• What is the OR for a smoker/non-drinker versus a non-smoker/non-drinker?
• What is the OR for a smoker/drinker versus a non-smoker/drinker?
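A minimal sketch of how these two odds ratios come out of a model with main effects and an interaction. The coefficient values are made up for illustration (the fitted values are not in the transcript), and the names are hypothetical. With the interaction in the model, the smoking OR among non-drinkers is exp(β_tobacco), while among drinkers it is exp(β_tobacco + β_interaction).

```python
import math

# Hypothetical log-odds coefficients (assumed, not from the lecture).
beta = {"tobacco": 0.9, "alcohol": 0.7, "tobacco:alcohol": 0.5}


def linear_predictor(smoker, drinker):
    """Log odds relative to the intercept for a 0/1 smoker and 0/1 drinker."""
    return (beta["tobacco"] * smoker
            + beta["alcohol"] * drinker
            + beta["tobacco:alcohol"] * smoker * drinker)


def odds_ratio(group, reference):
    """OR comparing two (smoker, drinker) profiles; the intercept cancels."""
    return math.exp(linear_predictor(*group) - linear_predictor(*reference))


# Smoker/non-drinker vs. non-smoker/non-drinker: exp(b_tobacco)
or_smoking_nondrinkers = odds_ratio((1, 0), (0, 0))

# Smoker/drinker vs. non-smoker/drinker: exp(b_tobacco + b_interaction)
or_smoking_drinkers = odds_ratio((1, 1), (0, 1))

print("OR of smoking among non-drinkers:", round(or_smoking_nondrinkers, 3))
print("OR of smoking among drinkers:    ", round(or_smoking_drinkers, 3))
```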

How can we assess if the effect of smoking differs by HPV status?

How likely is it that someone who smokes and drinks will get oropharyngeal cancer?
• How can we estimate the chance?
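A standard result bears on this question. As a sketch, assume cases enter the study with probability π_1 and non-cases with probability π_0, independent of the covariates x (π_1 and π_0 are notation introduced here). If the population model is logit P(y = 1 | x) = β_0 + βx, then among sampled subjects:

```latex
\[
P(y=1 \mid x, \text{sampled})
  = \frac{\pi_1\, p(x)}{\pi_1\, p(x) + \pi_0\,\bigl(1-p(x)\bigr)}
\quad\Longrightarrow\quad
\operatorname{logit} P(y=1 \mid x, \text{sampled})
  = \beta_0 + \log\frac{\pi_1}{\pi_0} + \beta x
\]
```

The slope β (the log odds ratio) is unchanged, but the intercept is shifted by log(π_1/π_0). So the absolute chance of disease cannot be read off a case-control fit unless the sampling fractions, or the population disease prevalence, are known from outside the study.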

Matched case control studies
• References:
  - Hosmer and Lemeshow, Applied Logistic Regression
  - http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mc c.pdf
  - http://staff.pubhealth.ku.dk/~bxc/Talks/Nested- Matched-CC.pdf
  - http:// ect35.htm (beginning page 5)

Matched design
• Matching on important factors is common
• OP cancer:
  - age
  - gender
• Why?
  - forces the distribution to be the same on those variables
  - removes any effects of those variables on the outcome
  - eliminates confounding

1-to-M matching
• For each ‘case’, there is a matched ‘control’
• Process usually dictates that the case is enrolled, then a control is identified
• For particularly rare diseases or when large N is required, often use more than one control per case

Logistic regression for matched case control studies
• Recall independence
• But, if cases and controls are matched, are they still independent?

Solution: treat each matched set as a stratum
• one-to-one matching: 1 case and 1 control per stratum
• one-to-M matching: 1 case and M controls per stratum
• Logistic model per stratum: within stratum, independence holds.
• We assume that the OR for x and y is constant across strata
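Written out, the stratified model might look like the following sketch. The notation is assumed here: i indexes individuals within stratum k, α_k is the stratum-specific intercept, and β is the vector of p covariate effects shared by all strata.

```latex
\[
\operatorname{logit}(p_{ik}) = \alpha_k + \beta^{\top} x_{ik},
\qquad k = 1, \dots, n
\]
```

The common β is what encodes the assumption that the odds ratio is constant across strata; only the baseline log odds α_k varies from stratum to stratum.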

How many parameters is that?
• Assume sample size is 2n and we have 1-to-1 matching:
• n strata + p covariates = n+p parameters
• This is problematic:
  - as n gets large, so does the number of parameters
  - too many parameters to estimate and a problem of precision
• But, do we really care about the strata-specific intercepts?
• “NUISANCE PARAMETERS”

Conditional logistic regression
• To avoid estimation of the intercepts, we can condition on the study design.
• Huh?
• Think about each stratum:
  - how many cases and controls?
  - what is the probability that the case is the case and the control is the control?
  - what is the probability that the control is the case and the case the control?
• For each stratum, the likelihood contribution is based on this conditional probability

Conditioning
• For 1 to 1 matching: with two individuals in stratum k, where y indicates case status (1 = case, 0 = control)
• Write as a likelihood contribution for stratum k:
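The slide's equation did not survive in the transcript. A reconstruction under the usual setup is sketched below, with assumed notation: individuals 1 and 2 in stratum k have case indicators y_{1k}, y_{2k} and disease probabilities p_{1k}, p_{2k}, and individual 1 is the observed case. Conditioning on the design means conditioning on exactly one case per stratum.

```latex
\[
L_k = P\bigl(y_{1k}=1,\; y_{2k}=0 \,\bigm|\, y_{1k}+y_{2k}=1\bigr)
    = \frac{p_{1k}\,(1-p_{2k})}
           {p_{1k}\,(1-p_{2k}) + (1-p_{1k})\,p_{2k}}
\]
```

The numerator is the probability that "the case is the case and the control is the control"; the second term in the denominator is the probability of the reverse assignment.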

Likelihood function for CLR
• Substitute in our logistic representation of p and simplify:
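The simplified expression is missing from the transcript. A sketch of the algebra, assuming a stratum-k contribution of the form p_{1k}(1 - p_{2k}) / [p_{1k}(1 - p_{2k}) + (1 - p_{1k})p_{2k}] with subscript 1 for the case and 2 for the control, and the logistic form p_{ik} = exp(α_k + βx_{ik}) / (1 + exp(α_k + βx_{ik})):

```latex
\[
L_k
 = \frac{e^{\alpha_k + \beta x_{1k}}}
        {e^{\alpha_k + \beta x_{1k}} + e^{\alpha_k + \beta x_{2k}}}
 = \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}}
 = \frac{1}{1 + e^{-\beta\,(x_{1k} - x_{2k})}}
\]
```

The common factor 1 / [(1 + e^{α_k + βx_{1k}})(1 + e^{α_k + βx_{2k}})] cancels between numerator and denominator, and then e^{α_k} cancels as well, so the stratum intercept drops out entirely.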

Likelihood function for CLR
• Now, take the product over all the strata for the full likelihood
• This is the likelihood for the matched case-control design
• Notice:
  - there are no strata-specific parameters
  - cases are defined by subscript ‘1’ and controls by subscript ‘2’
• Theory for 1-to-M follows similarly (but not shown here)
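To make the full conditional likelihood concrete, here is a minimal sketch for 1-to-1 matching with a single covariate. The pair counts are hypothetical and the function names are illustrative; the maximizer is a plain ternary search, chosen only because the log-likelihood is concave in β, and is not how clogit or any other package fits the model. For a binary exposure the estimate should agree with the log of the ratio of discordant pairs.

```python
import math


def conditional_loglik(beta, pairs):
    """Conditional log-likelihood for 1-to-1 matched pairs, one covariate.

    pairs: list of (x_case, x_control) tuples, one per matched stratum.
    Each stratum contributes log[e^{b*x1} / (e^{b*x1} + e^{b*x2})],
    which equals -log(1 + exp(-b * (x1 - x2))).
    """
    return sum(-math.log1p(math.exp(-beta * (x_case - x_control)))
               for x_case, x_control in pairs)


def fit_beta(pairs, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximize the concave log-likelihood over [lo, hi] by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if conditional_loglik(m1, pairs) < conditional_loglik(m2, pairs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Hypothetical matched pairs with a binary exposure (assumed counts):
# 30 pairs: case exposed, control unexposed
#  6 pairs: case unexposed, control exposed
# 10 pairs: both exposed; 54 pairs: neither exposed
pairs = [(1, 0)] * 30 + [(0, 1)] * 6 + [(1, 1)] * 10 + [(0, 0)] * 54

beta_hat = fit_beta(pairs)
print("beta_hat:", round(beta_hat, 4))
print("OR_hat:  ", round(math.exp(beta_hat), 3))
print("discordant-pair ratio:", 30 / 6)
```

Concordant pairs (both exposed or both unexposed) contribute the constant log(1/2) regardless of β, so they carry no information about the odds ratio.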

Interpretation of β
• Same as in ‘standard’ logistic regression
• β is the log odds ratio of disease associated with a one-unit difference in x

When to use matched vs. unmatched?
• Some papers use both for a matched design
• Tradeoffs:
  - bias
  - precision
• Sometimes a matched design is used to ensure balance, but then an unmatched analysis is performed
• They WILL give you different answers
• Gillison paper

Another approach to matched data
• Use random effects models
• CLR is elegant and simple
• Can identify the estimates using a ‘transformation’ of logistic regression results
• But, with the new age of computing, we have other approaches
• Random effects models:
  - allow strata-specific intercepts
  - estimation process is not problematic
  - additional assumption: intercepts follow a normal distribution
  - will NOT give identical results
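A sketch of the random-intercept formulation, in notation assumed here rather than taken from the slides: α_k is the intercept for stratum k, α and σ² are its mean and variance, and φ is the normal density.

```latex
\[
\operatorname{logit}(p_{ik}) = \alpha_k + \beta x_{ik},
\qquad \alpha_k \sim N(\alpha, \sigma^2)
\]
\[
L(\beta, \alpha, \sigma^2)
 = \prod_{k} \int \Bigl[\prod_{i} p_{ik}^{\,y_{ik}} (1-p_{ik})^{1-y_{ik}}\Bigr]\,
   \phi(\alpha_k;\, \alpha, \sigma^2)\; d\alpha_k
\]
```

Instead of n separate intercepts, only α and σ² are estimated, at the price of the normality assumption. The integral has no closed form and is evaluated numerically.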

. xi: clogit control hpv16ser, group(strata) or
Iteration 0: log likelihood =
Iteration 1: log likelihood =
Iteration 2: log likelihood =
Iteration 3: log likelihood =
Conditional (fixed-effects) logistic regression
Number of obs = 300
LR chi2(1) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =
control | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
hpv16ser |

. xi: logistic control hpv16ser
Logistic regression
Number of obs = 300
LR chi2(1) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =
control | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
hpv16ser |

. xi: gllamm control hpv16ser, i(strata) family(binomial)
number of level 1 units = 300
number of level 2 units = 100
Condition Number =
gllamm model log likelihood =
control | Coef. Std. Err. z P>|z| [95% Conf. Interval]
hpv16ser |
_cons |
Variances and covariances of random effects
***level 2 (strata)
var(1): 4.210e-21 (2.231e-11)
OR = 17.63