Lecture 18 Matched Case Control Studies


Lecture 18 Matched Case Control Studies BMTRY 701 Biostatistical Methods II

Matched case control studies
References:
  Hosmer and Lemeshow, Applied Logistic Regression
  http://staff.pubhealth.ku.dk/~bxc/SPE.2002/Slides/mcc.pdf
  http://staff.pubhealth.ku.dk/~bxc/Talks/Nested-Matched-CC.pdf
  http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap49/sect35.htm
  http://www.ats.ucla.edu/stat/sas/library/logistic.pdf (beginning page 5)

Matched design
Matching on important factors is common.
OP cancer example: matching on age and gender.
Why match?
  It forces the distribution of the matching variables to be the same in cases and controls.
  It removes any effects of those variables on the outcome.
  It eliminates confounding by those variables.

1-to-M matching
For each ‘case’, there is a matched ‘control’.
The process usually dictates that the case is enrolled first, then a control is identified.
For particularly rare diseases, or when a large N is required, more than one control per case is often used.

Logistic regression for matched case control studies
Recall that standard logistic regression assumes independent observations.
But if cases and controls are matched, are they still independent?

Solution: treat each matched set as a stratum
  one-to-one matching: 1 case and 1 control per stratum
  one-to-M matching: 1 case and M controls per stratum
Logistic model per stratum: within a stratum, independence holds.
We assume that the OR for x and y is constant across strata.
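The stratum-specific model can be written in symbols as follows. The notation is assumed here rather than taken from the slides: \alpha_k is the intercept of stratum k, \beta is the common log odds ratio, and p_k(x) is the probability of disease for an individual in stratum k with covariate value x.

```latex
\operatorname{logit} p_k(x)
  = \log\frac{p_k(x)}{1 - p_k(x)}
  = \alpha_k + \beta x,
  \qquad k = 1, \dots, n
```

Each stratum has its own intercept \alpha_k, while the single slope \beta encodes the assumption of a constant OR across strata.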

How many parameters is that?
Assume the sample size is 2n and we have 1-to-1 matching:
  n strata + p covariates = n + p parameters
This is problematic:
  as n gets large, so does the number of parameters
  there are too many parameters to estimate, which is a problem of precision
But do we really care about the stratum-specific intercepts? They are “NUISANCE PARAMETERS”.

Conditional logistic regression
To avoid estimating the intercepts, we can condition on the study design. Huh?
Think about each stratum:
  how many cases and controls?
  what is the probability that the case is the case and the control is the control?
  what is the probability that the control is the case and the case is the control?
For each stratum, the likelihood contribution is based on this conditional probability.

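Conditioning
For 1-to-1 matching, stratum k contains two individuals, and y indicates case status (1 = case, 0 = control).
The likelihood contribution for stratum k is the probability of the observed assignment of case and control, given that the stratum contains exactly one case.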
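The formula on this slide is not present in the transcript. A standard way to write the conditional probability is sketched below, assuming the notation p_{ik} = P(y_{ik} = 1 \mid x_{ik}) for individual i in stratum k, with individual 1 the observed case and individual 2 the observed control.

```latex
L_k
  = P\left(y_{1k} = 1,\ y_{2k} = 0 \mid y_{1k} + y_{2k} = 1\right)
  = \frac{p_{1k}\,(1 - p_{2k})}
         {p_{1k}\,(1 - p_{2k}) + (1 - p_{1k})\,p_{2k}}
```

The numerator is the probability that the case is the case and the control is the control. The denominator adds the probability of the reverse assignment, so it covers both ways the stratum can contain exactly one case.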

Likelihood function for CLR
Substituting the logistic representation of p and simplifying cancels the stratum-specific intercept, so the contribution depends only on β and the covariates.
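The simplification can be sketched as follows, under assumed notation: the stratum-specific model is logit(p_{ik}) = \alpha_k + \beta x_{ik}, individual 1 is the case, and individual 2 is the control. Because p/(1-p) = e^{\alpha_k + \beta x}, dividing the numerator and denominator of the conditional probability by (1 - p_{1k})(1 - p_{2k}) gives:

```latex
L_k
  = \frac{p_{1k}(1 - p_{2k})}{p_{1k}(1 - p_{2k}) + (1 - p_{1k})\,p_{2k}}
  = \frac{e^{\alpha_k + \beta x_{1k}}}
         {e^{\alpha_k + \beta x_{1k}} + e^{\alpha_k + \beta x_{2k}}}
  = \frac{e^{\beta x_{1k}}}{e^{\beta x_{1k}} + e^{\beta x_{2k}}}
```

The factor e^{\alpha_k} is common to every term and cancels, which is why the nuisance intercepts never need to be estimated.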

Likelihood function for CLR
Now, take the product over all the strata for the full likelihood.
This is the likelihood for the matched case-control design.
Notice:
  there are no stratum-specific parameters
  cases are defined by subscript ‘1’ and controls by subscript ‘2’
Theory for 1-to-M follows similarly (but is not shown here).
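A minimal sketch of this likelihood for 1-to-1 matching with a single covariate is given below. It assumes each stratum contributes exp(βx1) / (exp(βx1) + exp(βx2)), the standard conditional form. The function names and the pair counts are hypothetical illustrations, not data from the lecture.

```python
import math

def conditional_loglik(beta, pairs):
    """Conditional log-likelihood for 1-to-1 matched pairs.

    pairs is a list of (x_case, x_control) tuples, one per stratum.
    Each stratum contributes log[exp(b*x1) / (exp(b*x1) + exp(b*x2))],
    which equals -log(1 + exp(-b * (x1 - x2))).
    """
    return sum(-math.log1p(math.exp(-beta * (x1 - x2))) for x1, x2 in pairs)

def fit_beta(pairs, lo=-10.0, hi=10.0, tol=1e-10):
    """Find the maximizer by bisection on the score (derivative in beta).

    The log-likelihood is concave, so the score is decreasing in beta.
    """
    def score(beta):
        return sum((x1 - x2) / (1.0 + math.exp(beta * (x1 - x2)))
                   for x1, x2 in pairs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical binary-exposure data: (case exposure, control exposure).
pairs = ([(1, 0)] * 12    # case exposed, control unexposed
         + [(0, 1)] * 4   # case unexposed, control exposed
         + [(1, 1)] * 20  # concordant pairs
         + [(0, 0)] * 30) # concordant pairs

beta_hat = fit_beta(pairs)
odds_ratio = math.exp(beta_hat)
```

With a binary exposure, only the discordant pairs carry information about β: concordant pairs have x1 - x2 = 0 and add the same constant for every β. In this hypothetical data the estimate is the ratio of the two discordant counts, 12 / 4. Shifting both covariates in a stratum by the same amount leaves the likelihood unchanged, which mirrors the cancellation of the stratum-specific intercept.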

Interpretation of β
Same as in ‘standard’ logistic regression.
β is the log odds ratio of disease associated with a one-unit difference in x.
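As a short worked step, assuming the stratum-specific model logit(p_k(x)) = \alpha_k + \beta x (notation not taken from the slides), the odds ratio for a one-unit difference within a stratum is:

```latex
\mathrm{OR}
  = \frac{e^{\alpha_k + \beta (x + 1)}}{e^{\alpha_k + \beta x}}
  = e^{\beta},
  \qquad
  \mathrm{OR}_{c\text{-unit}} = e^{c\beta}
```

The intercept \alpha_k cancels, so the odds ratio is the same in every stratum.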

When to use matched vs. unmatched?
Some papers use both for a matched design.
Tradeoffs:
  bias
  precision
Sometimes a matched design is used to ensure balance, but then an unmatched analysis is performed.
They WILL give you different answers.
Gillison paper

Another approach to matched data: use random effects models
CLR is elegant and simple:
  can identify the estimates using a ‘transformation’ of logistic regression results
But, with the new age of computing, we have other approaches.
Random effects models:
  allow stratum-specific intercepts
  estimation process is not problematic
  additional assumption: intercepts follow a normal distribution
Will NOT give identical results.

. xi: clogit control hpv16ser, group(strata) or

Iteration 0:   log likelihood = -72.072957
Iteration 1:   log likelihood = -71.803221
Iteration 2:   log likelihood = -71.798737
Iteration 3:   log likelihood = -71.798736

Conditional (fixed-effects) logistic regression   Number of obs   =        300
                                                  LR chi2(1)      =      76.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -71.798736                       Pseudo R2       =     0.3465

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   13.16616   4.988492     6.80   0.000      6.26541    27.66742
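The reported odds ratio, standard error, z statistic, and confidence interval are linked through the log scale. The sketch below assumes the usual Wald construction: the interval is formed for log(OR) and exponentiated, and the standard error of the OR comes from the delta method. The function name `wald_summary` is a hypothetical helper. The two input numbers are copied from the clogit output above.

```python
import math

def wald_summary(odds_ratio, se_or, z_crit=1.959964):
    """Return (log OR, SE of log OR, z, lower, upper) from a reported OR and its SE.

    Assumes the SE of the OR was obtained by the delta method,
    so SE(log OR) = SE(OR) / OR.
    """
    log_or = math.log(odds_ratio)
    se_log = se_or / odds_ratio
    z = log_or / se_log
    lower = math.exp(log_or - z_crit * se_log)
    upper = math.exp(log_or + z_crit * se_log)
    return log_or, se_log, z, lower, upper

# Values taken from the conditional logistic regression output.
log_or, se_log, z, lower, upper = wald_summary(13.16616, 4.988492)
print(f"beta = {log_or:.4f}, SE = {se_log:.4f}, z = {z:.2f}")
print(f"95% CI for OR: ({lower:.3f}, {upper:.3f})")
```

The interval is asymmetric around 13.17 on the odds ratio scale because it is symmetric on the log scale.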

. xi: logistic control hpv16ser

Logistic regression                               Number of obs   =        300
                                                  LR chi2(1)      =      90.21
                                                  Prob > chi2     =     0.0000
Log likelihood = -145.8514                        Pseudo R2       =     0.2362

------------------------------------------------------------------------------
     control | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |    17.6113   6.039532     8.36   0.000     8.992582     34.4904

. xi: gllamm control hpv16ser, i(strata) family(binomial)

number of level 1 units = 300
number of level 2 units = 100

Condition Number = 2.4968508

gllamm model

log likelihood = -145.8514

------------------------------------------------------------------------------
     control |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hpv16ser |   2.868541   .3429353     8.36   0.000       2.1964    3.540681
       _cons |  -1.464547   .1692104    -8.66   0.000    -1.796193     -1.1329

Variances and covariances of random effects

***level 2 (strata)
    var(1): 4.210e-21 (2.231e-11)

OR = exp(2.868541) ≈ 17.61
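The gllamm output reports coefficients on the log odds scale, so comparing it with the other fits requires exponentiating. The sketch below does that for the point estimate and the interval limits, using numbers copied from the output. The variable names are hypothetical.

```python
import math

# Values copied from the gllamm output (log odds scale).
coef = 2.868541
ci_log = (2.1964, 3.540681)
random_effect_variance = 4.210e-21

# Exponentiate to move to the odds ratio scale.
or_gllamm = math.exp(coef)
ci_or = tuple(math.exp(limit) for limit in ci_log)

print(f"OR = {or_gllamm:.4f}, 95% CI = ({ci_or[0]:.3f}, {ci_or[1]:.3f})")
print(f"estimated random-intercept variance = {random_effect_variance:.3e}")
```

The exponentiated coefficient and interval agree with the unmatched logistic fit (17.6113; 8.992582 to 34.4904), and the two log likelihoods are both -145.8514. With the random-intercept variance estimated at essentially zero, the random effects fit for these data coincides with ordinary logistic regression and differs from the conditional estimate of 13.17.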