William Greene Department of Economics Stern School of Business New York University Some Applications of Latent Class Modeling In Health Economics.




Outline Theory: Finite Mixture and Latent Class Models Applications o Obesity o Self Assessed Health o Efficiency of Nursing Homes

Latent Classes A population contains a mixture of individuals of different types (classes) Common form of the data generating mechanism within the classes Observed outcome y is governed by the common process F(y|x, θ_j) Classes are distinguished by the parameters, θ_j.

A Latent Class Hurdle NB2 Model Analysis of ECHP panel data ( ) Two class Latent Class Model o Typical in health economics applications Hurdle model for physician visits o Poisson hurdle for participation and negative binomial intensity given participation o Contrast to a negative binomial model

How Finite Mixture Models Work

Find the ‘Best’ Fitting Mixture of Two Normal Densities

Mixing probabilities .715 and .285

[Figure: approximation vs. actual distribution]

A Practical Distinction Finite Mixture (Discrete Mixture): o Functional form strategy o Component densities have no meaning o Mixing probabilities have no meaning o There is no question of “class membership” o The number of classes is uninteresting – enough to get a good fit Latent Class: o Mixture of subpopulations o Component densities are believed to be definable “groups” (Low Users and High Users in Bago d’Uva and Jones application) o The classification problem is interesting – who is in which class? o Posterior probabilities, P(class|y,x) have meaning o Question of the number of classes has content in the context of the analysis
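The posterior probabilities P(class|y,x) mentioned above follow from Bayes' rule. A minimal sketch, assuming the prior class probabilities and the within-class densities evaluated at the observed outcome are already available; the function name and the density values are hypothetical, and only the mixing probabilities .715 and .285 come from the slides:

```python
def posterior_class_probs(priors, likelihoods):
    """Bayes' rule: P(class j | y, x) is proportional to prior_j * f(y | x, class j)."""
    joint = [p * f for p, f in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Priors from the mixing-probabilities slide; the two density values are made up.
post = posterior_class_probs([0.715, 0.285], [0.02, 0.10])
```

An observation can be assigned to the class with the largest posterior probability, which is the "who is in which class?" question of the latent class interpretation.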

Why Make the Distinction? Same estimation strategy Same estimation results Extending the latent class model o Allows a rich, flexible model specification for behavior o The classes may be governed by different processes

Antecedents Pearson’s 1894 study of crabs in Naples – finite mixture of two normals – seeking evidence of two subspecies. Some of the extensions I will note here have already been (implicitly) employed in earlier literature. o Different underlying processes o Heterogeneous class probabilities o Correlations of unobservables in class probabilities with unobservables in structural (within class) models One has not been employed and is not (yet) widespread o Cross class restrictions implied by the theory of the model

Switching Regressions Mixture of normals with heterogeneous mean o y ~ N(b_0 x_0, σ_0²) if d=0, y ~ N(b_1 x_1, σ_1²) if d=1, P(d=1) = Φ(cz). o d is unobserved (latent switching). Becomes a latent class model when regime 0 is a demand function and regime 1 is a supply function, d=0 if excess supply The two regression equations may involve different variables – a true latent class model

Endogenous Switching (ca.1980) Not identified. Regimes do not coexist.

Outcome Inflation Models Lambert 1992, Technometrics. Quality control problem. Counting defects per unit of time on the assembly line. How to explain the zeros; is the process under control or not? Two State Outcome: Prob(State=0)=R, Prob(State=1)=1-R o State=0, Y=0 with certainty o State=1, Y ~ some distribution whose support includes 0, e.g., Poisson. o Prob(State 0|y>0) = 0 o Prob(State 1|y=0) = (1-R)f(0)/[R + (1-R)f(0)] o R = Logistic probability “Nonstandard” latent class model Recent users have extended this to “Outcome Inflated Models,” e.g., twos inflation in models of fertility; inflated responses in health status.
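The two-state probabilities above can be sketched in code. This assumes a Poisson distribution in State 1, as the slide suggests by example; the function names and the rate parameter lam are illustrative choices, not taken from the slides:

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of count y with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def zip_pmf(y, R, lam):
    """P(Y=y): State 0 (probability R) gives Y=0 with certainty, State 1 is Poisson(lam)."""
    base = (1.0 - R) * poisson_pmf(y, lam)
    return R + base if y == 0 else base

def prob_state1_given_y(y, R, lam):
    """Posterior probability of State 1: equal to 1 when y>0, and
    (1-R)f(0)/[R + (1-R)f(0)] when y=0."""
    if y > 0:
        return 1.0
    f0 = poisson_pmf(0, lam)
    return (1.0 - R) * f0 / (R + (1.0 - R) * f0)
```

In an estimated model R would itself be a logistic function of covariates; here it is passed in as a number to keep the sketch short.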

Split Population Survival Models Schmidt and Witte 1989 study of recidivism F=1 for eventual failure, F=0 for never fail. F is unobserved. P(F=1)=δ, P(F=0)=1-δ C=1 for a recidivist, observed. Prob(F=1|C=1) = 1. Density for time until failure actually occurs is δ × g(t|F=1). Density for observed duration (possibly censored) o P(C=0) = (1-δ) + δ G(T|F=1) (Observation is censored) o Density given C=1 = δ g(t|F=1) o G=survival function, t=time of observation. Unobserved F implies a latent population split. They added covariates to δ: δ_i = logit(z_i). Different models apply to the two latent subpopulations.
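The two likelihood contributions above can be combined into a log likelihood. A minimal sketch, assuming an exponential duration model for the eventual failures purely for illustration (the slides do not state which duration distribution was used); delta stands for the probability of eventual failure and rate is a hypothetical hazard parameter:

```python
import math

def split_population_loglik(durations, failed, delta, rate):
    """Log likelihood for a split population duration model.

    failed[i] is True when failure was observed at durations[i] (C=1) and
    False when the observation is censored at durations[i] (C=0).
    Assumes an exponential density g(t) = rate*exp(-rate*t) with survival
    function G(t) = exp(-rate*t) for those who eventually fail.
    """
    ll = 0.0
    for t, c in zip(durations, failed):
        if c:
            # Observed failure: delta * g(t | F=1)
            ll += math.log(delta * rate * math.exp(-rate * t))
        else:
            # Censored: never fails, or fails later than t
            ll += math.log((1.0 - delta) + delta * math.exp(-rate * t))
    return ll
```

Setting delta to 1 removes the split and gives back the ordinary censored exponential log likelihood.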

Variations of Interest Heterogeneous priors for the class probabilities Correlation of unobservables in class probabilities with unobservables in regime specific models Variations of model structure across classes Behavioral basis for the mixed models with implied restrictions

Applications Obesity: Heterogeneous class probabilities, generalized ordered choice; Endogenous class membership Self Assessed Health: Heterogeneous subpopulations; endogenous class membership Cost Efficiency of Nursing Homes: theoretical restrictions on underlying models

Modeling Obesity with a Latent Class Model Mark Harris Department of Economics, Curtin University Bruce Hollingsworth Department of Economics, Lancaster University Pushkar Maitra Department of Economics, Monash University William Greene Stern School of Business, New York University

300 Million People Worldwide. International Obesity Task Force:

Costs of Obesity In the US more people are obese than smoke or use illegal drugs Obesity is a major risk factor for non-communicable diseases like heart problems and cancer Obesity is also associated with: o lower wages and productivity, and absenteeism o low self-esteem An economic problem. It is costly to society: o USA costs are around 4-8% of all annual health care expenditure - US $100 billion o Canada, 5%; France, %; and New Zealand 2.5%

Measuring Obesity An individual’s weight given their height should lie within a certain range o Body Mass Index (BMI) o Weight (kg) / height (m)² WHO guidelines: o Underweight: BMI < 18.5 o Normal: 18.5 < BMI < 25 o Overweight: 25 < BMI < 30 o Obese: BMI > 30 o Morbidly Obese: BMI > 40
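The BMI formula and the WHO bands translate directly into code. A minimal sketch; the function names are illustrative, and assigning a value that falls exactly on a boundary to the higher band is an assumption, since the slide states every band with strict inequalities:

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by squared height in meters."""
    return weight_kg / height_m ** 2

def who_category(bmi_value):
    """Map a BMI value to the WHO bands listed above.
    Boundary values (18.5, 25, 30, 40) go to the higher band by assumption."""
    if bmi_value < 18.5:
        return "Underweight"
    if bmi_value < 25:
        return "Normal"
    if bmi_value < 30:
        return "Overweight"
    if bmi_value < 40:
        return "Obese"
    return "Morbidly Obese"
```

For example, who_category(bmi(70, 1.75)) classifies a person weighing 70 kg at a height of 1.75 m.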

Two Latent Classes: Approximately Half of European Individuals

Modeling BMI Outcomes Grossman-type health production function Health Outcomes = f(inputs) Existing literature assumes BMI is an ordinal, not cardinal, representation of individuals. o Weight-related health status o Do not assume a one-to-one relationship between BMI levels and (weight-related) health status levels Translate BMI values into an ordinal scale using WHO guidelines Preserves underlying ordinal nature of the BMI index but recognizes that individuals within a so-defined weight range are of an (approximately) equivalent (weight-related) health status level

Conversion to a Discrete Measure Measurement issues: Tendency to under-report BMI o women tend to under-estimate/report weight; o men over-report height. Using bands should alleviate this Allows focus on discrete ‘at risk’ groups

A Censored Regression Model for BMI
Simple Regression Approach Based on Actual BMI: BMI* = β′x + ε, ε ~ N[0, σ²]
Interval Censored Regression Approach
WT = 0 if BMI* < 25 (Normal)
WT = 1 if 25 < BMI* < 30 (Overweight)
WT = 2 if BMI* > 30 (Obese)
o Inadequate accommodation of heterogeneity
o Inflexible reliance on WHO classification
o Rigid measurement by the guidelines

An Ordered Probit Approach
A Latent Regression Model for “True BMI”: BMI* = β′x + ε, ε ~ N[0, σ²], σ² = 1
“True BMI” (a proxy for weight) is unobserved
Observation Mechanism for Weight Type
WT = 0 if BMI* < 0 (Normal)
WT = 1 if 0 < BMI* < μ (Overweight)
WT = 2 if μ < BMI* (Obese)
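The observation mechanism above implies three outcome probabilities. A minimal sketch, assuming a standard normal disturbance, a lower threshold fixed at 0 and a single positive upper threshold (called mu in the code); xb stands for the value of the latent regression index, and the function names are illustrative:

```python
import math

def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, mu):
    """Return (P(Normal), P(Overweight), P(Obese)) for index xb and upper threshold mu > 0."""
    p_normal = norm_cdf(-xb)                        # latent value below 0
    p_overweight = norm_cdf(mu - xb) - norm_cdf(-xb)  # latent value between 0 and mu
    p_obese = 1.0 - norm_cdf(mu - xb)               # latent value above mu
    return p_normal, p_overweight, p_obese
```

In the generalized version the upper threshold would vary across individuals with observed characteristics; the same function applies with an individual-specific mu.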

Heterogeneity in the BMI Ranges Boundaries set by the WHO are narrowly defined for all individuals Strictly defined WHO definitions may consequently push individuals into inappropriate categories We allow flexibility at the margins of these intervals Following Pudney and Shields (2000), we therefore consider Generalised Ordered Choice models - boundary parameters are now functions of observed personal characteristics

Generalized Ordered Probit Approach
A Latent Regression Model for True BMI: BMI_i* = β′x_i + ε_i, ε_i ~ N[0, σ²], σ² = 1
Observation Mechanism for Weight Type
WT_i = 0 if BMI_i* < 0 (Normal)
WT_i = 1 if 0 < BMI_i* < μ_i(w_i) (Overweight)
WT_i = 2 if μ_i(w_i) < BMI_i* (Obese)

Latent Class Modeling Several ‘types’ or ‘classes’. Obesity may be due to genetic reasons (the FTO gene) or lifestyle factors Distinct sets of individuals may have differing reactions to various policy tools and/or characteristics The observer does not know from the data which class an individual is in. Suggests a latent class approach for health outcomes (Deb and Trivedi, 2002, and Bago d’Uva, 2005)

Latent Class Application Two class model (considering FTO gene): o More classes make class interpretations much more difficult o Parametric models proliferate parameters Endogenous class membership: Two classes allow us to correlate the equations driving class membership and observed weight outcomes via unobservables. Theory for more than two classes not yet developed.

Heterogeneous Class Probabilities π_j = Prob(class=j) = governor of a detached natural process. Homogeneous. π_ij = Prob(class=j|z_i, individual i) Now possibly a behavioral aspect of the process, no longer “detached” or “natural” Nagin and Land 1993, “Criminal Careers…

Endogeneity of Class Membership

Model Components x: determines observed weight levels within classes For observed weight levels we use lifestyle factors such as marital status and exercise levels z: determines latent classes For latent class determination we use genetic proxies such as age, gender and ethnicity: the things we can’t change w: determines position of boundary parameters within classes For the boundary parameters we have: weight-training intensity and age (BMI inappropriate for the aged?) pregnancy (small numbers and length of term unknown)

Data US National Health Interview Survey (2005); conducted by the National Center for Health Statistics Information on self-reported height and weight levels, BMI levels Demographic information Split sample (30,000+) by gender

Outcome Probabilities Class 0 dominated by normal and overweight probabilities ‘normal weight’ class Class 1 dominated by probabilities at top end of the scale ‘non-normal weight’ Unobservables for weight class membership, negatively correlated with those determining weight levels:

[Figure: outcome probabilities (Normal, Overweight, Obese) for Class 0 and Class 1]

Classification (Latent Probit) Model

BMI Ordered Choice Model Conditional on class membership, lifestyle factors Marriage comfort factor only for normal class women Both classes associated with income, education Exercise effects similar in magnitude Exercise intensity only important for ‘non-normal’ class: Home ownership only important for ‘non-normal’ class, and negative: result of differing socioeconomic status distributions across classes?

Effects of Aging on Weight Class

Effect of Education on Probabilities

Effect of Income on Probabilities

Inflated Responses in Self-Assessed Health Mark Harris Department of Economics, Curtin University Bruce Hollingsworth Department of Economics, Lancaster University William Greene Stern School of Business, New York University

Introduction Health sector an important part of developed countries’ economies: E.g., Australia 9% of GDP To see if these resources are being effectively utilized, we need to fully understand the determinants of individuals’ health levels To this end much policy, and even more academic research, is based on measures of self-assessed health (SAH) from survey data

SAH vs. Objective Health Measures Favorable SAH categories seem artificially high.
o 60% of Australians are either overweight or obese (Dunstan et al., 2001)
o 1 in 4 Australians has either diabetes or a condition of impaired glucose metabolism
o Over 50% of the population has elevated cholesterol
o Over 50% has at least 1 of the “deadly quartet” of health conditions (diabetes, obesity, high blood pressure, high cholesterol)
o Nearly 4 out of 5 Australians have 1 or more long term health conditions (National Health Survey, Australian Bureau of Statistics 2006)
o Australia ranked #1 in terms of obesity rates
Similar results appear for other countries

SAH vs. Objective Health Our objectives 1. Are these SAH outcomes “over-inflated”? 2. If so, why, and what kinds of people are doing the over-inflating/mis-reporting?

HILDA Data The Household, Income and Labour Dynamics in Australia (HILDA) dataset: 1. a longitudinal survey of households in Australia 2. well tried and tested dataset 3. contains a host of information on SAH and other health measures, as well as numerous demographic variables

Self Assessed Health “In general, would you say your health is: Excellent, Very good, Good, Fair or Poor?" Responses 1,2,3,4,5 (we will be using 0,1,2,3,4) Typically ¾ of responses are “good” or “very good” health; in our data (HILDA) we get 72% Similar numbers for most developed countries Does this truly represent the health of the nation?

Recent Literature - Heterogeneity Carro (2012) o Ordered SAH, “good,” “so so,” “bad” o Two effects: Random effects (Mundlak) in latent index function, fixed effects in threshold Schurer and Jones (2011) o Heterogeneity, panel data o “Generalized ordered probit:” different slope vectors for each outcome.

Kerkhofs and Lindeboom, Health Economics, 1995 Subjective Health Measures and State Dependent Reporting Errors Incentive to “misreport” depends on employment status: employed, unemployed, retired, disabled Ho = an objective, observed health indicator H* = latent health = f1(Ho,X1) Hs = reported health = f2(H*,X2,S) o S = employment status, 4 observed categories o Ordered choice, o Boundaries depend on S,X2; Heterogeneity is induced by incentives produced by employment status

A Two Class Latent Class Model: True Reporter, Misreporter

Reporter Type Model

Y=4 Y=3 Y=2 Y=1 Y=0

Pr(true,y) = Pr(true) * Pr(y | true)

Mis-reporters choose either good or very good (Y=2 or Y=3) The response is determined by a probit model

Observed Mixture of Two Classes
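The observed outcome probabilities combine the two reporter types. A minimal sketch, assuming an ordered probit over the five outcomes for true reporters, a binary probit between Y=2 and Y=3 for misreporters, and a given probability of being a true reporter. All names, thresholds and index values are hypothetical, and the correlation between the class equation and the outcome equations discussed elsewhere in the slides is ignored here:

```python
import math

def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probs(index, cuts):
    """Five-outcome ordered probit probabilities for Y=0..4, given four increasing cut points."""
    cdf = [norm_cdf(c - index) for c in cuts] + [1.0]
    probs, previous = [], 0.0
    for value in cdf:
        probs.append(value - previous)
        previous = value
    return probs

def observed_probs(p_true, index_true, cuts, index_mis):
    """Pr(y) = Pr(true) * Pr(y | true) + Pr(misreporter) * Pr(y | misreporter)."""
    true_part = ordered_probs(index_true, cuts)
    p_very_good = norm_cdf(index_mis)  # misreporter answers Y=3, otherwise Y=2
    mis_part = [0.0, 0.0, 1.0 - p_very_good, p_very_good, 0.0]
    return [p_true * t + (1.0 - p_true) * m for t, m in zip(true_part, mis_part)]
```

Because misreporters contribute only to Y=2 and Y=3, the mixture inflates exactly those two response categories relative to the true reporters' distribution.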

Who are the Misreporters?

Priors and Posteriors

General Results

Whither the EM Algorithm? An Algorithm, not a model o E step: Compute posterior probabilities, P(class j | y_i, x_i) o M step: In each class, estimate class specific parameters using a (class and individual) weighted log likelihood, using the posteriors as weights. Cannot impose cross class restrictions Cannot model endogeneity
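The E and M steps can be illustrated on the simplest case, a mixture of two normal densities with no covariates. This is a sketch under stated assumptions rather than the estimator used in the applications: the starting values, the fixed iteration count and the simulated sample (component means 0 and 5, unit standard deviations) are all hypothetical, and only the mixing probabilities .715 and .285 echo the slides:

```python
import math
import random

def normal_pdf(y, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def em_two_normals(y, iters=100):
    """EM for a two-component normal mixture; returns (probs, means, sds)."""
    ys = sorted(y)
    n, half = len(ys), len(ys) // 2
    # Crude starting values: split the sorted sample at the median.
    m = [sum(ys[:half]) / half, sum(ys[half:]) / (n - half)]
    grand_mean = sum(ys) / n
    sd0 = math.sqrt(sum((v - grand_mean) ** 2 for v in ys) / n)
    s, p = [sd0, sd0], [0.5, 0.5]
    for _ in range(iters):
        # E step: posterior class probabilities for every observation.
        w = []
        for v in y:
            f = [p[j] * normal_pdf(v, m[j], s[j]) for j in range(2)]
            total = f[0] + f[1]
            w.append([f[0] / total, f[1] / total])
        # M step: posterior-weighted estimates within each class.
        for j in range(2):
            wj = sum(row[j] for row in w)
            p[j] = wj / n
            m[j] = sum(row[j] * v for row, v in zip(w, y)) / wj
            s[j] = math.sqrt(sum(row[j] * (v - m[j]) ** 2 for row, v in zip(w, y)) / wj)
    return p, m, s

# Simulated sample: 715 draws from N(0,1) and 285 draws from N(5,1).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(715)] + [rng.gauss(5.0, 1.0) for _ in range(285)]
probs, means, sds = em_two_normals(sample)
```

Each class is updated separately in the M step, which is why cross class restrictions cannot be imposed within this scheme.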

Latent Class Efficiency Studies Battese and Coelli – growing in weather “regimes” for Indonesian rice farmers Kumbhakar and Orea – cost structures for U.S. Banks Greene (Health Economics, 2005) – revisits WHO Year 2000 World Health Report

Studying Economic Efficiency in Health Care Hospital and Nursing Home o Cost efficiency o Role of quality (not studied today) Agency for Healthcare Research and Quality (AHRQ)

Stochastic Frontier Analysis logC = f(output, input prices, environment) + v + u ε = v + u o v = noise – the usual “disturbance” o u = inefficiency Frontier efficiency analysis o Estimate parameters of model o Estimate u (to the extent we are able – we use E[u|ε]) o Evaluate and compare observed firms in the sample
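The estimator E[u|ε] has a closed form once distributions are chosen. A minimal sketch, assuming the common normal/half-normal specification for a cost frontier (v normal with standard deviation sigma_v, u half-normal with scale sigma_u), which the slides do not state explicitly; the function and argument names are illustrative:

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_inefficiency(eps, sigma_u, sigma_v):
    """E[u | eps] for a cost frontier eps = v + u, with v ~ N(0, sigma_v^2)
    and u half-normal with scale sigma_u (assumed specification)."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    sigma_star = sigma_u * sigma_v / sigma
    z = eps * (sigma_u / sigma_v) / sigma  # eps * lambda / sigma
    # Mean of a normal truncated at zero; may divide by zero for extremely negative z.
    return sigma_star * (norm_pdf(z) / norm_cdf(z) + z)
```

The estimate is always positive and rises with the residual, so firms with larger positive cost residuals are assigned more inefficiency.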

Nursing Home Costs 44 Swiss nursing homes, 13 years Cost, Pk, Pl, output, two environmental variables Estimate cost function Estimate inefficiency

Estimated Cost Efficiency

Inefficiency? Not all agree with the presence (or identifiability) of “inefficiency” in market outcomes data. Variation around the common production structure may all be nonsystematic and not controlled by management Implication, no inefficiency: u = 0.

A Two Class Model Class 1: With Inefficiency o logC = f(output, input prices, environment) + σ_v v + σ_u u Class 2: Without Inefficiency o logC = f(output, input prices, environment) + σ_v v o σ_u = 0 Implement with a single zero restriction in a constrained (same cost function) two class model Parameterization: λ = σ_u/σ_v = 0 in class 2.

LogL= 464 with a common frontier model, 527 with two classes

Conclusion Latent class modeling provides a rich, flexible platform for behavioral model building. Thank you.