Limited Dependent Variables, Ciaran S. Phibbs

Presentation transcript:

Limited Dependent Variables Ciaran S. Phibbs

Limited Dependent Variables
0-1, small number of options, small counts, etc.
Non-linear in this case really means that the dependent variable is not continuous, or even close to continuous.

Outline
Binary Choice
Multinomial Choice
Counts
Most models fit in the general framework of probability models
– Prob(event occurs)

Basic Problems
Heteroscedastic error terms
Predictions not constrained to match actual outcomes

$Y_i = \beta_0 + \beta X + \varepsilon_i$
$Y_i = 0$ if lived, $Y_i = 1$ if died
$\operatorname{Prob}(Y_i = 1) = F(X, \beta)$
$\operatorname{Prob}(Y_i = 0) = 1 - F(X, \beta)$
OLS, also called a linear probability model
$\varepsilon_i$ is heteroscedastic, depends on $\beta X$
Predictions not constrained to (0,1)
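The out-of-range prediction problem can be seen on a toy data set. The sketch below fits a one-regressor linear probability model by OLS in plain Python; the data and the name `lpm_predict` are made up for illustration and are not from the slides.

```python
# Fit a one-regressor linear probability model by ordinary least squares
# on a tiny made-up data set (x = a hypothetical risk score, y = 1 if died).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form OLS slope and intercept for a single regressor.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x


def lpm_predict(value):
    """Fitted 'probability' from the linear probability model."""
    return intercept + slope * value


# Predictions at the extremes of x fall outside the (0, 1) interval.
for value in (0, 9):
    print(value, round(lpm_predict(value), 3))
```

With these illustrative data the fitted value is negative at the lowest score and above 1 at the highest, which is the constraint problem noted on the slide.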

Binary Outcomes Common in Health Care
Mortality
Other outcome
– Infection
– Patient safety event
– Rehospitalization <30 days
Decision to seek medical care

Standard Approaches to Binary Choice-1
Logistic regression

Advantages of Logistic Regression
Designed for relatively rare events
Commonly used in health care; most readers can interpret an odds ratio

Standard Approaches to Binary Choice-2
Probit regression (classic example is decision to make a large purchase)
$y^* = \beta X + \varepsilon$
$y = 1$ if $y^* > 0$
$y = 0$ if $y^* \le 0$

Binary Choice
There are other methods, using other distributions.
In general, logistic and probit give about the same answer.
It used to be a lot easier to calculate marginal effects with probit; that is no longer the case.
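One way to see why logistic and probit tend to agree is to compare the two distribution functions directly. The sketch below rescales the logistic index by 1.702, a commonly quoted approximation constant that is not taken from the slides, and measures the largest gap over a grid; the function names are illustrative.

```python
import math


def logistic_cdf(z):
    """Logistic distribution function used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal distribution function used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Assumed rescaling constant between the logit and probit indexes.
SCALE = 1.702

# Evaluate both curves on a grid from -5 to 5 and record the largest gap.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid)
print(round(max_gap, 4))
```

The two curves differ mostly in the tails, so fitted probabilities usually differ little unless the outcome is very rare or very common.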

Odds Ratios vs. Relative Risks
The standard method of interpreting logistic regression is odds ratios.
Odds ratios are often converted to a % effect, which is really a relative risk.
This approximation starts to break down at 10% outcome incidence.

Can Convert OR to RR
Zhang J, Yu KF. What’s the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. JAMA 1998;280(19):
$RR = \dfrac{OR}{(1 - P_0) + (P_0 \times OR)}$
Where $P_0$ is the sample probability of the outcome
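The correction formula is simple enough to apply by hand, but a small helper makes the effect of outcome incidence easy to explore. This is a minimal sketch; the function name `odds_ratio_to_relative_risk` and the example inputs are illustrative.

```python
def odds_ratio_to_relative_risk(odds_ratio, p0):
    """Apply RR = OR / ((1 - P0) + P0 * OR).

    odds_ratio: odds ratio from the logistic regression (must be positive).
    p0: probability of the outcome, as defined on the slide (0 <= p0 < 1).
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must be in [0, 1)")
    return odds_ratio / ((1.0 - p0) + p0 * odds_ratio)


# Same odds ratio, rare versus common outcome (illustrative inputs).
rare = odds_ratio_to_relative_risk(2.0, 0.01)
common = odds_ratio_to_relative_risk(2.0, 0.20)
print(round(rare, 3), round(common, 3))
```

With a 1% outcome the odds ratio of 2.0 is nearly equal to the relative risk, while at 20% the corrected value is noticeably smaller.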

Effect of Correction for RR
From Phibbs et al., NEJM 5/24/2007, 20% mortality
Table columns: Odds Ratio, Calculated RR (values not preserved in this transcript)

Extensions
Panel data: can now estimate both random effects and fixed effects models. The Stata manual lists 34 related estimation commands.
All kinds of variations.
– Panel data
– Grouped data

Extensions
Goodness of fit tests. Several tests.
Probably the most commonly reported statistics are:
– Area under ROC curve, c-statistic in SAS output. Range 0.50 to 1.0.
– Hosmer-Lemeshow test
– NEJM paper, c=0.86, H-L p=0.34
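The c-statistic can be read as the share of (event, non-event) pairs in which the event case received the higher predicted probability, with ties counted as one half. The sketch below computes it by brute force over all pairs; the function name and the four-observation example are illustrative, and the pairwise loop is only practical for small samples.

```python
def c_statistic(outcomes, predicted):
    """Area under the ROC curve via pairwise concordance.

    outcomes: list of 0/1 observed outcomes.
    predicted: list of predicted probabilities, same order as outcomes.
    """
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # ties count as half
    return concordant / (len(events) * len(non_events))


example = c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)
```

A value of 0.50 means the model ranks cases no better than chance, which is why the slide gives the range as 0.50 to 1.0.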

More on Hosmer-Lemeshow Test
The H-L test breaks the sample up into n equal groups (usually 10; some programs, such as Stata, let you vary this) and compares the number of observed and expected events in each group.
If your model predicts well, the events will be concentrated in the highest risk groups; most can be in the highest risk group.
Alternate specification: divide the sample so that the events are split into equal groups.
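The grouping logic can be sketched in a few lines. The version below sorts by predicted risk, splits the sample into equal-sized groups, and sums the usual (observed − expected)² terms; it returns only the test statistic, which is conventionally compared with a chi-square distribution with (groups − 2) degrees of freedom. The function name and the handling of uneven group sizes are assumptions, not the behavior of any particular package.

```python
def hosmer_lemeshow_statistic(outcomes, predicted, groups=10):
    """Hosmer-Lemeshow statistic using equal-sized groups of predicted risk.

    outcomes: list of 0/1 observed outcomes.
    predicted: list of predicted probabilities, same order as outcomes.
    groups: number of risk groups (10 is the usual choice).
    """
    # Sort observations from lowest to highest predicted risk.
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        mean_p = expected / size
        denominator = size * mean_p * (1.0 - mean_p)
        if denominator > 0:
            statistic += (observed - expected) ** 2 / denominator
    return statistic


# Two risk groups of four patients each (illustrative data).
risk = [0.25] * 4 + [0.75] * 4
calibrated = hosmer_lemeshow_statistic([1, 0, 0, 0, 1, 1, 1, 0], risk, groups=2)
miscalibrated = hosmer_lemeshow_statistic([0, 0, 0, 0, 1, 1, 1, 1], risk, groups=2)
print(calibrated, round(miscalibrated, 3))
```

A small statistic (large p-value) indicates that observed and expected counts agree within groups, which is how the H-L p=0.34 reported for the NEJM paper should be read.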

Multinomial Choice
What if more than one choice or outcome?
Options are more limited
– Multivariate probit (multiple decisions, each with two alternatives)
– Several logit models (single decision, multiple alternatives)

Logit Models for Multiple Choices
Conditional Logit Model (McFadden)
– Unordered choices
Multinomial Logit Model
– Choices can be ordered.

Examples of Health Care Uses for Logit Models for Multiple Choices
Choice of what hospital to use, among those in market area
Choice of treatment among several options

Conditional Logit Model

Conditional logit model
Also known as the random utility model
Is derived from consumer theory
How consumers choose from a set of options
Model driven by the characteristics of the choices.
Individual characteristics “cancel out” but can be included. For example, in hospital choice, can interact with distance to hospital
Can express the results as odds ratios.
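In the conditional logit model each alternative gets a utility built from its own characteristics, and choice probabilities are the exponentiated utilities normalized over that person's choice set. The sketch below assumes two hypothetical attributes per hospital (log distance and a VA indicator) and made-up coefficients; none of the numbers are estimates from the slides.

```python
import math


def conditional_logit_probabilities(alternatives, coefficients):
    """Choice probabilities for one decision maker.

    alternatives: list of attribute tuples, one per alternative in the choice set.
    coefficients: one coefficient per attribute, shared across alternatives.
    """
    utilities = [
        sum(b * x for b, x in zip(coefficients, attributes))
        for attributes in alternatives
    ]
    peak = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - peak) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Attributes per hospital: (log distance, 1 if VA hospital else 0).
coefficients = [-1.0, 0.5]  # hypothetical values for illustration only

# Choice sets may differ in size from one observation to the next.
veteran_1 = [(0.0, 1), (1.0, 0), (2.0, 0)]
veteran_2 = [(0.5, 0), (0.5, 1)]

probs_1 = conditional_logit_probabilities(veteran_1, coefficients)
probs_2 = conditional_logit_probabilities(veteran_2, coefficients)
print([round(p, 3) for p in probs_1])
print([round(p, 3) for p in probs_2])
```

Because only attributes of the alternatives enter the utilities, a characteristic of the individual shifts every alternative equally and drops out unless it is interacted with a choice attribute such as distance.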

Estimation of McFadden’s Model
Some software packages (e.g. SAS) require that the number of choices be equal across all observations.
LIMDEP allows an “NCHOICES” option that lets you set the number of choices for each observation. This is a very useful feature. May be able to do this in Stata (clogit) with “group”

Example of Conditional Logit Estimates
Study I did looking at elderly service-connected veterans’ choice of VA or non-VA hospital
Log distance: 0.66, p<0.001
Population density: (estimate missing), p<0.001
VA: 2.80, p<0.001

Multinomial Logit Model

Must identify a reference choice, model yields set of parameter estimates for each of the other choices
Allows direct estimation of parameters for individual characteristics. Model can (should) also include parameters for choice characteristics
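The role of the reference choice can be sketched directly: its index is fixed at zero, and each other choice has its own coefficient vector applied to the same individual characteristics. The function name, the covariates, and the coefficient values below are hypothetical.

```python
import math


def multinomial_logit_probabilities(x, coefficient_sets):
    """Probabilities for the reference choice followed by each other choice.

    x: individual characteristics (include 1.0 for a constant term).
    coefficient_sets: one coefficient list per non-reference choice.
    """
    # The reference choice has all coefficients normalized to zero.
    scores = [0.0] + [
        sum(b * v for b, v in zip(beta, x)) for beta in coefficient_sets
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# One individual: constant term plus one hypothetical covariate.
x = [1.0, 0.3]
# Hypothetical coefficients for two non-reference choices.
coefficient_sets = [[-0.5, 1.0], [0.2, -0.4]]

probabilities = multinomial_logit_probabilities(x, coefficient_sets)
print([round(p, 3) for p in probabilities])
```

Each estimated coefficient is interpreted relative to the reference choice, so changing the reference changes the coefficients but not the fitted probabilities.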

Example of a Multinomial Logit Model
Effect on VLBW delivery at hospital if nearby hospital opens mid-level NICU.
Hosp w/ no NICU -0.65
Hosp w/ high-level NICU -0.70

Independence of Irrelevant Alternatives
Results should be robust to varying the number of alternative choices
– Can re-estimate model after deleting some of the choices.
– McFadden, regression based test. Regression-Based Specification Tests for the Multinomial Logit Model. J Econometrics 1987;34(1/2):
If fail IIA, may need to estimate a nested logit model

Independence of Irrelevant Alternatives - 2
McFadden test is fairly weak, likely to pass. Note, this test can also be used to test for omitted variables.
For many health applications, doesn’t matter, the models are very robust (e.g. hospital choice models driven by distance).
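The IIA property is built into the logit functional form: the ratio of probabilities for any two alternatives depends only on those two utilities. The sketch below uses hypothetical utilities for three hospitals and shows that dropping one leaves the ratio for the other two unchanged; this illustrates the property itself, not McFadden's regression-based test.

```python
import math


def choice_probabilities(utilities):
    """Logit choice probabilities from a dict of alternative -> utility."""
    peak = max(utilities.values())
    weights = {name: math.exp(u - peak) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical utilities for three hospitals (illustrative values only).
full_set = {"hospital_a": 1.0, "hospital_b": 0.5, "hospital_c": -0.2}
reduced_set = {name: u for name, u in full_set.items() if name != "hospital_c"}

p_full = choice_probabilities(full_set)
p_reduced = choice_probabilities(reduced_set)

# The A-to-B probability ratio is the same with or without hospital C.
ratio_full = p_full["hospital_a"] / p_full["hospital_b"]
ratio_reduced = p_reduced["hospital_a"] / p_reduced["hospital_b"]
print(ratio_full, ratio_reduced)
```

In real data the ratio need not stay fixed when an alternative is removed, which is why re-estimating the model on a reduced choice set is a practical check.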

Count Data (integers)
Continuation of the same problem
Problem diminishes as counts increase
Rule of Thumb. Need to use count data models for counts under 30

Count Data
Some examples of where count data models are needed in health care
– Dependent variable is number of outpatient visits
– Number of times a prescription of a chronic disease medication is refilled in a year
– Number of adverse events in a unit (or hospital) over a period of time

Count Data
Poisson distribution. A distribution for counts.
– Problem: very restrictive assumption that mean and variance are equal
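A quick informal check of the equal mean and variance assumption is to compare the two sample moments. The sketch below computes the variance-to-mean ratio for a list of counts; the function name and the example counts are illustrative, and this is a descriptive check on raw counts rather than a formal test conditional on covariates.

```python
def dispersion_ratio(counts):
    """Sample variance divided by sample mean for a list of counts.

    A ratio near 1 is consistent with the Poisson assumption;
    a ratio well above 1 suggests overdispersion.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean count is zero")
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical visit counts: most patients have none, one has many.
visits = [0, 0, 0, 8]
print(dispersion_ratio(visits))
```

Health care counts such as visits often have many zeros and a few large values, which pushes the ratio well above 1.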

Count Data
In general, negative binomial is a better choice. In Stata, a test for which distribution to use is part of the package. Other distributions can also be used.

Other Models
New models are being introduced all of the time. More and better ways to address the problems of limited dependent variables.
Includes semi-parametric and non-parametric methods.

Reference Texts
Greene. Econometric Analysis, Ch. 19 and 20.
Maddala. Limited-Dependent and Qualitative Variables in Econometrics

Journal References
McFadden D. Regression-Based Specification Tests for the Multinomial Logit Model. J Econometrics 1987;34(1/2):
Zhang J, Yu KF. What’s the Relative Risk? A Method of Correcting the Odds Ratio in Cohort Studies of Common Outcomes. JAMA 1998;280(19):