Exact Logistic Regression


Exact Logistic Regression Epidemiology/Biostatistics VHM-812/802, Winter 2016, Atlantic Vet. College, PEI Raju Gautam

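Purpose
Use exact logistic regression with sparse data (small samples, or cells with few or no observations).
Why may ordinary logistic regression (OLR) not be appropriate?
  Testing and inference rely on large-sample (asymptotic) theory.
  Parameter estimates are assumed to be approximately normally distributed.
  The Wald test statistic is referred to the normal distribution.
  The likelihood ratio test (LRT) statistic is referred to the chi-square distribution.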

Fisher's exact test: overview
Similar to the chi-square test, but more accurate for small sample sizes.
Example data: "lbw.dta" (low birth weight data), used to study the effect of a history of premature labour and of smoking on low birth weight (LBW).

                 Smoking
    LBW         0      1    Total
    0          19      4       23
    1           2      2        4
    Total      21      6       27

Conditional probability P(LBW+ | smoking status): the probability that 2 of the 6 smokers (smoke = 1) are LBW+, given that 4 of the 27 women are LBW+.

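Exact probability
Given by the hypergeometric distribution.

General 2 x 2 table:

                 Smoking
    LBW         0      1      Row total
    0           a      b      a+b
    1           c      d      c+d
    C. total    a+c    b+d    a+b+c+d (= n)

Observed table:

                 Smoking
    LBW         0      1    Total
    0          19      4       23
    1           2      2        4
    Total      21      6       27

$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

$$p = \frac{(19+4)!\,(2+2)!\,(19+2)!\,(4+2)!}{19!\,4!\,2!\,2!\,27!} = 0.1794872$$

This is the probability of observing exactly this table, i.e. 2 LBW+ babies among the 6 women who smoked, when the row and column totals are held fixed.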

Example using Stata's hypergeometricp() function
hypergeometricp(N,K,n,k)
  N = total number of subjects
  K = number of subjects with the attribute of interest (e.g. SMOKE = 1)
  n = number of subjects with the outcome (event) of interest (e.g. LBW+)
  k = number of successes (events) among the K subjects with the attribute

    . di hypergeometricp(27,6,4,2)
    0.17948718
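The same probability can be checked outside Stata. The sketch below is a minimal Python equivalent written for illustration; the function name `hypergeometric_p` is hypothetical and simply mirrors the argument order of Stata's hypergeometricp(N,K,n,k).

```python
from math import comb

def hypergeometric_p(N, K, n, k):
    """P(k successes) when n subjects are drawn without replacement
    from N subjects, K of whom have the attribute of interest."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# 27 women, 6 with the attribute, 4 LBW+ babies, 2 of them among the 6
p = hypergeometric_p(27, 6, 4, 2)
print(round(p, 8))  # 0.17948718
```

The value is 3150 / 17550, the number of tables with 2 events among the 6 exposed women divided by the number of ways to place 4 events among 27 women.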

Computing the P value
Compute the sufficient statistic. The observed sufficient statistic is

$$\text{Obs}_{\text{suff}} = \sum_{i=1}^{27} Low_i \times PTL_i = 2$$

Possible values of the sufficient statistic: 0, 1, 2, 3, 4.
Create the distribution of the j possible sufficient statistics by counting, for each value, the possible allocations of 23 zeros and 4 ones to the 27 subjects ($\binom{27}{4} = 17550$ allocations in total).

P value (continued)

    Suff.    Count    Prob. (H0 true)    Event
    0         5985    0.341              Pr. obs. 0 PTL+ and 4 PTL- in LBW+
    1         7980    0.455              Pr. obs. 1 PTL+ and 3 PTL- in LBW+
    2         3150    0.179              Pr. obs. 2 PTL+ and 2 PTL- in LBW+
    3          420    0.024              Pr. obs. 3 PTL+ and 1 PTL- in LBW+
    4           15    0.001              Pr. obs. 4 PTL+ and 0 PTL- in LBW+
    Total    17550

Test the hypothesis β1 = 0.
Calculate the P value by summing the probabilities of all values of the sufficient statistic that are as likely as, or less likely than, the observed value (Obs suff = 2):
P = 0.179 + 0.024 + 0.001 = 0.204
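The table and the P value above can be reproduced by direct enumeration. This is a minimal Python sketch, not Stata's implementation; the variable names are illustrative, and the margins (27 subjects, 6 PTL+, 4 LBW+) are taken from the slides.

```python
from math import comb

N, K, n = 27, 6, 4   # subjects, PTL+ subjects, LBW+ subjects
t_obs = 2            # observed sufficient statistic

# Number of allocations giving each possible value of the sufficient statistic
counts = [comb(K, t) * comb(N - K, n - t) for t in range(min(K, n) + 1)]
total = sum(counts)                      # equals comb(27, 4)
probs = [c / total for c in counts]

# Two-sided exact P value: sum over values as likely or less likely than t_obs
p_value = sum(c for c in counts if c <= counts[t_obs]) / total

for t, (c, p) in enumerate(zip(counts, probs)):
    print(t, c, round(p, 3))
print("P =", round(p_value, 4))
```

Comparing integer counts rather than floating-point probabilities avoids rounding problems when two values of the statistic are exactly equally likely.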

P value using Stata

    . tab low ptl, exact

               | History of premature
     Low birth |        labor
        weight |      None        One |     Total
    -----------+----------------------+----------
             0 |        19          4 |        23
             1 |         2          2 |         4
         Total |        21          6 |        27

               Fisher's exact =                 0.204
       1-sided Fisher's exact =                 0.204

Conclusion: There is not enough evidence that a history of pre-term delivery increases the risk of low birth weight.

Exact logistic
Extends Fisher's idea.
Computes the estimate and confidence interval of each parameter separately.
Allows the addition of covariates.
CMLE: conditional maximum likelihood estimate.
Uses a computationally intensive algorithm.

    Exact logistic regression                  Number of obs =         27
                                               Model score   =   2.018634
                                               Pr >= score   =     0.2043
    ---------------------------------------------------------------------
     low | Odds Ratio     Suff.   2*Pr(Suff.)     [95% Conf. Interval]
    -----+---------------------------------------------------------------
     ptl |   4.402267         2        0.4085     .2507705     79.01123
    ---------------------------------------------------------------------

The P value reported as 2*Pr(Suff.) is in error (Hosmer et al., Applied Logistic Regression, 2013). The score-based value, Pr >= score = 0.2043, agrees with Fisher's exact test.

Compare with ordinary logistic regression:

    . logistic low ptl

    Logistic regression                        Number of obs =         27
                                               LR chi2(1)    =       1.81
                                               Prob > chi2   =     0.1791
    Log likelihood = -10.423421                Pseudo R2     =     0.0797
    ---------------------------------------------------------------------
       low | Odds Ratio   Std. Err.      z    P>|z|  [95% Conf. Interval]
    -------+-------------------------------------------------------------
       ptl |       4.75    5.421312   1.37    0.172   .5072157   44.48304
     _cons |   .1052632    .0782518  -3.03    0.002   .0245188   .4519108
    ---------------------------------------------------------------------
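Both confidence intervals can be recomputed from the 2 x 2 table. The sketch below is illustrative Python, not Stata's algorithm. It assumes that the Wald interval is built on the log odds ratio with the usual standard error sqrt(1/a + 1/b + 1/c + 1/d), and that the exact limits are the odds ratios at which each tail probability of the conditional distribution of the sufficient statistic equals 0.025. Under these assumptions the results match the output above to the digits shown.

```python
from math import comb, exp, log, sqrt

# Rows: low = 0, 1; columns: ptl = 0, 1 (table from the slides)
a, b, c, d = 19, 4, 2, 2

# Ordinary logistic regression with one binary predictor: cross-product OR
or_olr = (a * d) / (b * c)
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.959964
wald_ci = (exp(log(or_olr) - z * se_log_or), exp(log(or_olr) + z * se_log_or))

# Conditional (exact) distribution of the sufficient statistic T
counts = [comb(b + d, u) * comb(a + c, (c + d) - u) for u in range(c + d + 1)]
t_obs = d

def tail(beta, lo, hi):
    """P(lo <= T <= hi | beta) under the conditional distribution."""
    w = [cnt * exp(u * beta) for u, cnt in enumerate(counts)]
    return sum(w[lo:hi + 1]) / sum(w)

def solve_increasing(f, lo=-20.0, hi=20.0):
    """Bisection root of a function that increases with beta."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
beta_lo = solve_increasing(lambda bt: tail(bt, t_obs, len(counts) - 1) - alpha / 2)
beta_hi = solve_increasing(lambda bt: alpha / 2 - tail(bt, 0, t_obs))
exact_ci = (exp(beta_lo), exp(beta_hi))

print("OLR OR:", or_olr, "Wald CI:", wald_ci)
print("Exact CI:", exact_ci)
```

The exact interval is wider than the Wald interval because it does not rely on the normal approximation, which is poor with only 4 events.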

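Why is the exact logistic OR different from the OLR OR?
Exact inference uses the conditional maximum likelihood estimate (CMLE).
Eliminate α by conditioning on the observed value of its sufficient statistic, $m = \sum_{j=1}^{n} y_j$.
Conditional likelihood:

$$P(y \mid m) = \frac{\exp\left(\sum_{j=1}^{n} y_j x_j'\beta\right)}{\sum_{R} \exp\left(\sum_{j=1}^{n} y_j x_j'\beta\right)} \qquad (1)$$

where $R = \{(y_1, y_2, \ldots, y_n) : \sum_{j=1}^{n} y_j = m\}$.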

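Why is the exact OR different? (continued)
From equation (1), the p × 1 vector of sufficient statistics for β is

$$t = \sum_{j=1}^{n} y_j x_j \qquad (2)$$

with distribution

$$P(T_1 = t_1, \ldots, T_p = t_p) = \frac{c(t)\, e^{t'\beta}}{\sum_{u} c(u)\, e^{u'\beta}},$$

where

$$c(t) = \left|\left\{(y_1, y_2, \ldots, y_n) : \sum_{j=1}^{n} y_j = m,\ \sum_{j=1}^{n} y_j x_{ij} = t_i,\ i = 1, 2, \ldots, p\right\}\right|$$

The summation in the denominator is over all u for which c(u) ≥ 1.
In our case (a single predictor), the point estimate is obtained by maximizing

$$P(T_1 = t_1) = \frac{c(t_1)\, e^{t_1 \beta_1}}{\sum_{u} c(u)\, e^{u \beta_1}}$$

This conditional likelihood is not the unconditional likelihood that OLR maximizes, so the two odds ratio estimates differ (4.40 vs 4.75).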
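For the single-predictor example, the maximization can be carried out numerically. The following Python sketch is an illustration under stated assumptions, not the algorithm Stata uses: it takes c(u) from the enumeration of the 2 x 2 margins (6 PTL+, 21 PTL-, 4 LBW+) and uses the fact that the conditional log-likelihood is maximized where the expected value of T equals the observed value t1 = 2.

```python
from math import comb, exp

# c(u): number of allocations with sufficient statistic u
counts = [comb(6, u) * comb(21, 4 - u) for u in range(5)]
t_obs = 2

def expected_t(beta):
    """E[T | beta] under the conditional distribution of T."""
    w = [c * exp(u * beta) for u, c in enumerate(counts)]
    return sum(u * wi for u, wi in enumerate(w)) / sum(w)

# Score equation: t_obs - E[T | beta] = 0. E[T | beta] increases with beta,
# so plain bisection is enough.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if expected_t(mid) < t_obs:
        lo = mid
    else:
        hi = mid

beta_hat = (lo + hi) / 2
or_cmle = exp(beta_hat)
print(round(or_cmle, 6))
```

The result agrees with the exact logistic odds ratio for ptl reported in the Stata output (4.402267), whereas the unconditional estimate is the cross-product ratio 4.75.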

Robust standard errors

    . logistic low ptl, robust

    Logistic regression                        Number of obs =         27
                                               Wald chi2(1)  =       1.79
                                               Prob > chi2   =     0.1803
    Log pseudolikelihood = -10.423421          Pseudo R2     =     0.0797
    ---------------------------------------------------------------------
           |               Robust
       low | Odds Ratio   Std. Err.      z    P>|z|  [95% Conf. Interval]
    -------+-------------------------------------------------------------
       ptl |       4.75    5.524584   1.34    0.180    .486056   46.41955
     _cons |   .1052632    .0797424  -2.97    0.003   .0238477   .4646294
    ---------------------------------------------------------------------

The confidence interval is wider than with the model-based standard errors (.486 to 46.4 vs .507 to 44.5), reflecting the uncertainty due to the small sample size.

Zero count
The table contains a cell with zero frequency.
Cross-classify smoking status vs LBW:

    . tab low smoke, chi

               | Smoking status during
     Low birth |       pregnancy
        weight |        no        yes |     Total
    -----------+----------------------+----------
             0 |        17          6 |        23
             1 |         0          4 |         4
         Total |        17         10 |        27

              Pearson chi2(1) =   7.9826   Pr = 0.005

Suff obs = Suff min -> lower confidence limit = -Inf
Suff obs = Suff max -> upper confidence limit = +Inf
Here all 4 LBW+ women smoked, so Suff obs = 4 = Suff max and the upper limit is +Inf.
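Why the limit is infinite can be seen by evaluating the conditional likelihood at the observed sufficient statistic for increasing β. This Python sketch is illustrative only; the counts come from enumerating the margins of the smoking table (10 smokers, 17 non-smokers, 4 LBW+), and the grid of β values is arbitrary.

```python
from math import comb, exp

# c(u) for u = 0..4 LBW+ women among the 10 smokers
counts = [comb(10, u) * comb(17, 4 - u) for u in range(5)]
t_obs = 4  # equals the maximum possible value

def cond_lik(beta):
    """P(T = t_obs | beta) under the conditional distribution."""
    w = [c * exp(u * beta) for u, c in enumerate(counts)]
    return w[t_obs] / sum(w)

betas = [0, 1, 2, 4, 8]
liks = [cond_lik(b) for b in betas]
for b, l in zip(betas, liks):
    print(b, round(l, 4))
```

The likelihood keeps rising towards 1 as β grows, so it has no finite maximum: the conditional MLE does not exist and the upper confidence limit is +Inf.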

Median unbiased estimator

    Exact logistic regression                  Number of obs =         27
                                               Model score   =   7.686957
                                               Pr >= score   =     0.0120
    ---------------------------------------------------------------------
       low | Odds Ratio     Suff.   2*Pr(Suff.)    [95% Conf. Interval]
    -------+-------------------------------------------------------------
     smoke |  12.30305*         4        0.0239    1.361276        +Inf
    ---------------------------------------------------------------------
    (*) median unbiased estimates (MUE)

When Suff obs = Suff min or Suff obs = Suff max, the coefficient is estimated using the MUE (Hirji et al., 1989).
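The MUE and the finite confidence limit in this output can be reproduced numerically. The Python sketch below is a check under stated assumptions rather than Stata's implementation: with the sufficient statistic at its maximum, the MUE is taken as the β at which P(T = 4 | β) = 0.5, and the lower 95% limit as the β at which P(T = 4 | β) = 0.025. Both assumptions reproduce the reported values.

```python
from math import comb, exp

# c(u) for u = 0..4 LBW+ women among the 10 smokers (17 non-smokers, 4 LBW+)
counts = [comb(10, u) * comb(17, 4 - u) for u in range(5)]
t_max = 4

def p_top(beta):
    """P(T = t_max | beta); increases from 0 to 1 as beta increases."""
    w = [c * exp(u * beta) for u, c in enumerate(counts)]
    return w[t_max] / sum(w)

def solve(target, lo=-20.0, hi=20.0):
    """Bisection: beta with p_top(beta) = target."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if p_top(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mue_or = exp(solve(0.5))      # median unbiased estimate of the odds ratio
lower_or = exp(solve(0.025))  # lower 95% limit; the upper limit is +Inf
p_score = p_top(0.0)          # P(T = 4) under H0

print(round(mue_or, 4), round(lower_or, 4), round(p_score, 4))
```

Under H0, P(T = 4) = 210 / 17550, which rounds to the 0.0120 shown as Pr >= score; doubling it gives the 0.0239 shown as 2*Pr(Suff.).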

An example from the VER (Veterinary Epidemiologic Research) book
Data: Nocardia (demonstration)
Variables:
  casecont: case or control status of herd (outcome)
  dcpct: % of cows treated with dry-cow treatments
  dneo: use of neomycin
  dclox: use of cloxacillin
  dbarn: barn type (categorical variable)
The predictor "dcpct" was included in the model but conditioned out.