Problems with infinite solutions in logistic regression


Problems with infinite solutions in logistic regression
Ian White
MRC Biostatistics Unit, Cambridge, UK
Stata Users' Group, London, 12th September 2006
h:\stats\boundary

Introduction

- I teach logistic regression for the analysis of case-control studies to Epidemiology Master's students, using Stata
- I stress how to work out degrees of freedom, e.g. if E has 2 levels and M has 4 levels then you get 3 d.f. for testing the E*M interaction
- Our practical uses data on 244 cases of leprosy and 1027 controls
  - previous BCG vaccination is the exposure of interest
  - level of schooling is a possible effect modifier
  - in what follows I'm ignoring other confounders
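The degrees-of-freedom count quoted above follows from simple arithmetic: each factor enters through (levels - 1) dummy variables, and the interaction adds one parameter per pair of non-reference levels. A minimal sketch (the function name is illustrative, not from the talk):

```python
# d.f. for testing an interaction between two categorical factors:
# each factor contributes (levels - 1) dummies, and the interaction
# adds one parameter for every pair of non-reference levels.
def interaction_df(levels_e, levels_m):
    return (levels_e - 1) * (levels_m - 1)

# The slide's example: E (exposure) has 2 levels, M (modifier) has 4.
df = interaction_df(2, 4)
print(df)  # 3
```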

Leprosy data

-> tabulation of d: outcome (0=control, 1=case)

          d |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,027       80.80       80.80
          1 |        244       19.20      100.00
------------+-----------------------------------
      Total |      1,271      100.00

-> tabulation of bcg: exposure

   BCG scar |      Freq.     Percent        Cum.
------------+-----------------------------------
     Absent |        743       58.46       58.46
    Present |        528       41.54      100.00

-> tabulation of school: possible effect modifier

  Schooling |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        282       22.19       22.19
          1 |        606       47.68       69.87
          2 |        350       27.54       97.40
          3 |         33        2.60      100.00

lep-bdy.do

Main effects model

. xi: logistic d i.bcg i.school
i.bcg             _Ibcg_0-1           (naturally coded; _Ibcg_0 omitted)
i.school          _Ischool_0-3        (naturally coded; _Ischool_0 omitted)

Logistic regression                               Number of obs   =       1271
                                                  LR chi2(4)      =      97.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -572.86093                       Pseudo R2       =     0.0784

------------------------------------------------------------------------------
           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Ibcg_1 |   .2908624   .0523636    -6.86   0.000      .204384    .4139314
  _Ischool_1 |   .7035071   .1197049    -2.07   0.039     .5040026    .9819836
  _Ischool_2 |   .4029998   .0888644    -4.12   0.000     .2615825    .6208704
  _Ischool_3 |     .09077   .0933769    -2.33   0.020     .0120863    .6816944
------------------------------------------------------------------------------

. estimates store main

Interaction model

. xi: logistic d i.bcg*i.school
i.bcg             _Ibcg_0-1           (naturally coded; _Ibcg_0 omitted)
i.school          _Ischool_0-3        (naturally coded; _Ischool_0 omitted)
i.bcg*i.school    _IbcgXsch_#_#       (coded as above)

Logistic regression                               Number of obs   =       1271
                                                  LR chi2(7)      =     101.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -570.90012                       Pseudo R2       =     0.0816

------------------------------------------------------------------------------
           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Ibcg_1 |   .2248804   .0955358    -3.51   0.000     .0977993    .5170913
  _Ischool_1 |   .6626409   .1234771    -2.21   0.027     .4599012    .9547549
  _Ischool_2 |   .4116581   .1027612    -3.56   0.000     .2523791    .6714598
  _Ischool_3 |   1.28e-08   1.42e-08   -16.41   0.000     1.46e-09    1.12e-07
_IbcgXsch_~1 |   1.448862   .7046411     0.76   0.446     .5585377    3.758385
_IbcgXsch_~2 |   1.086848   .6226504     0.15   0.884     .3536056    3.340553
_IbcgXsch_~3 |   4.25e+07          .        .       .            .           .
------------------------------------------------------------------------------
Note: 17 failures and 0 successes completely determined.

. estimates store inter

The problem

. table bcg school, by(d)

----------------------------------
0=control |
, 1=case  |
and BCG   |       Schooling
scar      |    0     1     2     3
----------+-----------------------
0         |
  Absent  |  141   257   129    17
  Present |   57   229   182    15
          |
1         |
  Absent  |   77    93    29
  Present |    7    27    10     1
----------------------------------

LR test

. xi: logistic d i.bcg i.school
                                                  LR chi2(4)      =      97.50
Log likelihood = -572.86093
. estimates store main

. xi: logistic d i.bcg*i.school
                                                  LR chi2(7)      =     101.43
Log likelihood = -570.90012
. estimates store inter

. lrtest main inter
Likelihood-ratio test                             LR chi2(2)  =      3.92
(Assumption: main nested in inter)                Prob > chi2 =    0.1407
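The reported statistic and p-value can be reproduced directly from the two log likelihoods; a quick check in Python (using the fact that the chi-squared survival function with 2 d.f. is exactly exp(-x/2)):

```python
import math

# Log likelihoods reported by Stata for the two models
ll_main  = -572.86093   # main effects: bcg + school
ll_inter = -570.90012   # with the bcg*school interaction

lr = 2 * (ll_inter - ll_main)

# For 2 d.f. the chi-squared survival function has a closed form: exp(-x/2)
p = math.exp(-lr / 2)

print(round(lr, 2), round(p, 4))  # 3.92 0.1407
```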

What is Stata doing? (guess)

- Recognises the information matrix is singular
- Hence reduces model d.f. by 1
- In other situations Stata drops observations if a single variable perfectly predicts success/failure
  - this happens if the problematic cell doesn't occur in a reference category
  - then Stata refuses to perform lrtest, but we can force it to do so
  - Stata still gets df=2; can use df(3) option

. gen bcgrev=1-bcg
. xi: logistic d i.bcgrev*i.school
i.bcgrev          _Ibcgrev_0-1        (naturally coded; _Ibcgrev_0 omitted)
i.school          _Ischool_0-3        (naturally coded; _Ischool_0 omitted)
i.bcg~v*i.sch~l   _IbcgXsch_#_#       (coded as above)

note: _IbcgXsch_1_3 != 0 predicts failure perfectly
      _IbcgXsch_1_3 dropped and 17 obs not used

Logistic regression                               Number of obs   =       1254
                                                  LR chi2(6)      =      94.12
                                                  Prob > chi2     =     0.0000
Log likelihood = -570.90012                       Pseudo R2       =     0.0762

------------------------------------------------------------------------------
           d | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ibcgrev_1 |   4.446809   1.889136     3.51   0.000     1.933895    10.22502
  _Ischool_1 |   .9600749   .4312915    -0.09   0.928     .3980361    2.315729
  _Ischool_2 |   .4474097   .2307071    -1.56   0.119     .1628482    1.229215
  _Ischool_3 |   .5428571   .6013396    -0.55   0.581     .0619132     4.75979
_IbcgXsch_~1 |   .6901971   .3356713    -0.76   0.446     .2660717     1.79039
_IbcgXsch_~2 |    .920092   .5271167    -0.15   0.884     .2993516     2.82801
------------------------------------------------------------------------------

. est store interrev

. lrtest interrev main
observations differ: 1254 vs. 1271
r(498);

. lrtest interrev main, force
Likelihood-ratio test                             LR chi2(2)  =      3.92
(Assumption: main nested in interrev)             Prob > chi2 =    0.1407

What's right?

- The zero cell suggests a small sample, so the asymptotic χ² distribution may be inappropriate for the LRT
  - true in this case: we have a bcg*school category with only 1 observation
  - but I'm going to demonstrate the same problem in a hypothetical example with expected cell counts > 3 but a zero observed cell count
- Could combine or drop cells to get rid of zeroes
  - but the cell with zeroes may carry information
- Problems with testing boundary values are well known
  - e.g. the LRT for testing a zero variance component isn't χ²(1)
  - but here the point estimate, not the null value, is on the boundary
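The sense in which the point estimate is on the boundary can be made concrete with the 2x2 example on the following slides: holding the intercept at its limiting value log 2, the log likelihood increases in the slope b and approaches, but never attains, its supremum, so the MLE of b is infinite. A small numerical sketch (the fixed-intercept simplification is mine, for illustration):

```python
import math

# 2x2 example from the next slide: x=0 -> 10 failures, 20 successes;
# x=1 -> 0 failures, 10 successes (the zero cell).
def loglik(a, b):
    p0 = 1 / (1 + math.exp(-a))          # P(y=1 | x=0)
    p1 = 1 / (1 + math.exp(-(a + b)))    # P(y=1 | x=1)
    return 20 * math.log(p0) + 10 * math.log(1 - p0) + 10 * math.log(p1)

a_hat = math.log(2)   # limiting intercept: log(20/10) for the x=0 group
for b in (2, 5, 10, 20):
    print(b, round(loglik(a_hat, b), 4))

# The values climb towards the supremum 20*log(2/3) + 10*log(1/3) = -19.0954,
# which is never attained at any finite b.
```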

Example to explain why LRT makes some sense

. tab x y, chi2 exact

           |          y
         x |         0          1 |     Total
-----------+----------------------+----------
         0 |        10         20 |        30
         1 |         0         10 |        10
-----------+----------------------+----------
     Total |        10         30 |        40

          Pearson chi2(1) =   4.4444   Pr = 0.035
           Fisher's exact =                 0.043
   1-sided Fisher's exact =                 0.035

main2.log

Model: logit P(y=1|x) = a + bx

Difference in log lik = 3.4
LRT = 6.8 on 0 df?

Data:
  10  20
   0  10

See main2.do
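These numbers can be verified by hand: in a saturated logistic model the fitted probabilities equal the observed proportions, and the separated x=1 group (10/10 successes) contributes log(1) = 0 to the log likelihood. A check:

```python
import math

# Null model (intercept only): fitted p = 30/40
ll_null = 30 * math.log(30/40) + 10 * math.log(10/40)

# Model with x: each x-group fitted at its observed proportion;
# the x=1 group (10/10 successes) contributes 10*log(1) = 0.
ll_x = 20 * math.log(20/30) + 10 * math.log(10/30) + 10 * math.log(1.0)

lr = 2 * (ll_x - ll_null)
print(round(ll_x - ll_null, 1), round(lr, 1))  # 3.4 6.8
```

The two log likelihoods match Stata's -22.493406 and -19.095425 on the final slide.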

Example to explore correct df, using Pearson / Fisher as gold standard

. tab x y, chi2 exact

           |          y
         x |         0          1 |     Total
-----------+----------------------+----------
         1 |         6          0 |         6
         2 |         3          6 |         9
         3 |         3          6 |         9
-----------+----------------------+----------
     Total |        12         12 |        24

          Pearson chi2(2) =   8.0000   Pr = 0.018
           Fisher's exact =                 0.029

main3.do

All expected counts ≥ 3
Don't want to drop or merge category 1 - it contains the evidence for association!

. xi: logistic y i.x
i.x               _Ix_1-3             (naturally coded; _Ix_1 omitted)

Logistic regression                               Number of obs   =         24
                                                  LR chi2(2)      =      10.36
                                                  Prob > chi2     =     0.0056
Log likelihood = -11.457255                       Pseudo R2       =     0.3113

------------------------------------------------------------------------------
           y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _Ix_2 |   1.61e+08   1.61e+08    18.90   0.000     2.27e+07    1.14e+09
       _Ix_3 |   1.61e+08          .        .       .            .           .
------------------------------------------------------------------------------
Note: 6 failures and 0 successes completely determined.

. est store x

. xi: logistic y
                                                  LR chi2(0)      =       0.00
                                                  Prob > chi2     =          .
Log likelihood = -16.635532                       Pseudo R2       =     0.0000

. est store null

LRT

. xi: logistic y i.x
Log likelihood = -11.457255
. est store x
. est store null

. lrtest x null
Likelihood-ratio test                             LR chi2(1)  =     10.36
(Assumption: null nested in x)                    Prob > chi2 =    0.0013

Comparison of tests

           |          y
         x |         0          1 |     Total
-----------+----------------------+----------
         1 |         6          0 |         6
         2 |         3          6 |         9
         3 |         3          6 |         9
-----------+----------------------+----------
     Total |        12         12 |        24

Pearson chi2(2) =  8.0000   P = 0.018
Fisher's exact  =           P = 0.029
LR chi2(1)      = 10.36     P = 0.0013   (using 2 df: P = 0.0056)

Clearly the LRT isn't great. But 1 df is even worse than 2 df.
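All four numbers in the comparison can be reproduced without any special software, which makes the 1 df vs 2 df contrast easy to inspect. A sketch (closed-form chi-squared tails: exp(-x/2) for 2 d.f., erfc(sqrt(x/2)) for 1 d.f.):

```python
import math

table = [[6, 0], [3, 6], [3, 6]]   # the 3x2 table above
row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
n = sum(row)

# Pearson chi-squared against expected counts row_i * col_j / n
pearson = sum((table[i][j] - row[i]*col[j]/n) ** 2 / (row[i]*col[j]/n)
              for i in range(3) for j in range(2))

# LRT from the log likelihoods reported by Stata
lr = 2 * (-11.457255 - (-16.635532))

# chi-squared survival functions in closed form:
# 2 d.f.: exp(-x/2);  1 d.f.: erfc(sqrt(x/2))
p_pearson_2df = math.exp(-pearson / 2)
p_lr_1df = math.erfc(math.sqrt(lr / 2))
p_lr_2df = math.exp(-lr / 2)

print(round(pearson, 4), round(p_pearson_2df, 3))            # 8.0 0.018
print(round(lr, 2), round(p_lr_1df, 4), round(p_lr_2df, 4))  # 10.36 0.0013 0.0056
```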

Note

In this example we could use Pearson / Fisher as a gold standard. We can't do this in more complex examples (e.g. adjusting for several covariates).

My proposal for Stata

- lrtest appears to adjust df for infinite parameter estimates: it should not
- Model df should be incremented to allow for any variables dropped because they perfectly predict success/failure
- Don't need to increment the log likelihood, as it is 0 for the cases dropped
- Can the ad hoc handling of zeroes by xi: logistic be improved?
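For the leprosy example the proposal changes the reference distribution from χ²(2) to χ²(3), which matters in practice. A sketch of the difference, using closed-form chi-squared tails (the 3 d.f. formula combines the 1 d.f. term with a correction; the numbers, not the formula, come from the talk):

```python
import math

lr = 3.92   # LR statistic from the leprosy interaction test

# Chi-squared survival functions in closed form:
p_2df = math.exp(-lr / 2)   # what lrtest currently reports
# 3 d.f.: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_3df = (math.erfc(math.sqrt(lr / 2))
         + math.sqrt(2 * lr / math.pi) * math.exp(-lr / 2))

print(round(p_2df, 3), round(p_3df, 3))  # 0.141 0.27
```

Either way the interaction is non-significant here, but with 3 d.f. the p-value is nearly twice as large.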

Conclusions for statisticians

- Must remember the χ² approximation is still poor for these LRTs
  - typically anti-conservative? (Kuss, 2002)
- Performance of the LRT can be improved by using penalised likelihood (Firth, 1993; Bull, 2006) - like a mildly informative prior
  - worth using routinely?
- Gold standard: Bayes or exact logistic regression (LogXact)?
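Firth's penalisation is easy to sketch: Newton iterations on a modified score that shrinks fitted probabilities away from 0 and 1 via the hat diagonals. Below is an illustrative numpy implementation (mine, not Stata's or the talk's), applied to the separated 2x2 example; for a saturated 2x2 table the Firth estimate is known to coincide with adding 1/2 to each cell, so the slope is finite, log(21) - log(20.5/10.5) ≈ 2.375, despite perfect separation:

```python
import numpy as np

def firth_logit(X, y, max_iter=50, tol=1e-8):
    """Logistic regression with Firth's penalty (Jeffreys prior).

    Newton iterations on the modified score
    U*(b) = X'(y - p + h*(1/2 - p)),  h = hat-matrix diagonals.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        info = X.T @ (X * W[:, None])              # Fisher information X'WX
        Xw = X * np.sqrt(W)[:, None]
        h = np.diag(Xw @ np.linalg.solve(info, Xw.T))
        score = X.T @ (y - p + h * (0.5 - p))      # Firth-modified score
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Separated 2x2 data from the earlier slides:
# x=0: 10 failures, 20 successes;  x=1: 0 failures, 10 successes.
x = np.array([0] * 30 + [1] * 10)
y = np.array([0] * 10 + [1] * 30, dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta = firth_logit(X, y)
print(np.round(beta, 3))   # finite slope despite perfect separation
```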

The end

Output for example with 2-level x

. logit y x
Log likelihood = -19.095425

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .6931472   .3872983     1.79   0.074    -.0659436    1.452238
------------------------------------------------------------------------------

. estimates store x

. logit y
Log likelihood = -22.493406

       _cons |   1.098612   .3651484     3.01   0.003     .3829346     1.81429

. estimates store null

. lrtest x null
df(unrestricted) = df(restricted) = 1
r(498);

. lrtest x null, force df(1)
Likelihood-ratio test                             LR chi2(1)  =      6.80
(Assumption: null nested in x)                    Prob > chi2 =    0.0091

main2.log