Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State.

Slides:

Advertisements

Similar presentations

Continued Psy 524 Ainsworth

Advertisements

Lecture 11 (Chapter 9).

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32.

Brief introduction on Logistic Regression

Latent Growth Modeling Chongming Yang Research Support Center FHSS College.

Data: Crab mating patterns Data: Typists (Poisson with random effects) (Poisson Regression, ZIP model, Negative Binomial) Data: Challenger (Binomial with.

Logistic Regression Psy 524 Ainsworth.

Logistic Regression I Outline Introduction to maximum likelihood estimation (MLE) Introduction to Generalized Linear Models The simplest logistic regression.

Random effects as latent variables: SEM for repeated measures data Dr Patrick Sturgis University of Surrey.

Mixture modelling of continuous variables. Mixture modelling So far we have dealt with mixture modelling for a selection of binary or ordinal variables.

Growth Mixture Modeling Workgroup Heterogeneous trajectories of cognitive decline: Association with neuropathology Hayden K., Reed B., Chelune G., Manly.

HSRP 734: Advanced Statistical Methods July 24, 2008.

Logistic Regression Example: Horseshoe Crab Data

Latent Growth Curve Modeling In Mplus:

Logistic Regression Multivariate Analysis. What is a log and an exponent? Log is the power to which a base of 10 must be raised to produce a given number.

Chapter 13 Multiple Regression

QUALITATIVE AND LIMITED DEPENDENT VARIABLE MODELS.

Chapter 12 Multiple Regression

So far, we have considered regression models with dummy variables of independent variables. In this lecture, we will study regression models whose dependent.

Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 11 th Edition.

Notes on Logistic Regression STAT 4330/8330. Introduction Previously, you learned about odds ratios (OR’s). We now transition and begin discussion of.

An Introduction to Logistic Regression

Mixture Modeling Chongming Yang Research Support Center FHSS College.

Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 12: Multiple and Logistic Regression Marshall University.

Introduction to Multilevel Modeling Using SPSS

Inference for regression - Simple linear regression

© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.

Chapter 3: Generalized Linear Models 3.1 The Generalization 3.2 Logistic Regression Revisited 3.3 Poisson Regression 1.

Quantitative Skills 1: Graphing

2 nd Order CFA Byrne Chapter 5. 2 nd Order Models The idea of a 2 nd order model (sometimes called a bi-factor model) is: – You have some latent variables.

Applications The General Linear Model. Transformations.

Lecture 3: Inference in Simple Linear Regression BMTRY 701 Biostatistical Methods II.

Excepted from HSRP 734: Advanced Statistical Methods June 5, 2008.

Social patterning in bed-sharing behaviour A longitudinal latent class analysis (LLCA)

Growth Mixture Modeling of Longitudinal Data David Huang, Dr.P.H., M.P.H. UCLA, Integrated Substance Abuse Program.

Latent Growth Curve Modeling In Mplus: An Introduction and Practice Examples Part II Edward D. Barker, Ph.D. Social, Genetic, and Developmental Psychiatry.

University of Warwick, Department of Sociology, 2014/15 SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) Week 7 Logistic Regression I.

Chap 14-1 Copyright ©2012 Pearson Education, Inc. publishing as Prentice Hall Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics.

Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.

Confirmatory Factor Analysis Psych 818 DeShon. Construct Validity: MTMM ● Assessed via convergent and divergent evidence ● Convergent – Measures of the.

© Department of Statistics 2012 STATS 330 Lecture 20: Slide 1 Stats 330: Lecture 20.

1 Parallel Models. 2 Model two separate processes which run in tandem Bedwetting and daytime wetting 5 time points: 4½, 5½, 6½,7½ & 9½ yrs Binary measures.

1 Differential Item Functioning in Mplus Summer School Week 2.

Roghayeh parsaee  These approaches assume that the study sample arises from a homogeneous population  focus is on relationships among variables 

Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition.

University Rennes 2, CRPCC, EA 1285

Introduction to Multiple Regression Lecture 11. The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more.

ALISON BOWLING MAXIMUM LIKELIHOOD. GENERAL LINEAR MODEL.

1 Statistics 262: Intermediate Biostatistics Regression Models for longitudinal data: Mixed Models.

Linear Regression Chapter 7. Slide 2 What is Regression? A way of predicting the value of one variable from another. – It is a hypothetical model of the.

1 Family Variables: Latent Class & Latent Transtion Alan C. Acock Presented at the Conference on Research with Dyads and families Purdue University May,

Roger B. Hammer Assistant Professor Department of Sociology Oregon State University Conducting Social Research Logistic Regression Categorical Data Analysis.

Logistic Regression and Odds Ratios Psych DeShon.

Nonparametric Statistics

REGRESSION MODEL FITTING & IDENTIFICATION OF PROGNOSTIC FACTORS BISMA FAROOQI.

LOGISTIC REGRESSION. Purpose  Logistical regression is regularly used when there are only two categories of the dependent variable and there is a mixture.

Stats Methods at IC Lecture 3: Regression.

Nonparametric Statistics

Chapter 14 Introduction to Multiple Regression

BINARY LOGISTIC REGRESSION

EHS Lecture 14: Linear and logistic regression, task-based assessment

Logistic Regression APKC – STATS AFAC (2016).

Bivariate & Multivariate Regression Analysis

Advanced Quantitative Techniques

Notes on Logistic Regression

Correlation, Regression & Nested Models

CHAPTER 29: Multiple Regression*

Nonparametric Statistics

Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015

Presentation transcript:

Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR This was supported in part by 1R01DA13474, The Positive Action Program: Outcomes and Mediators, A Randomized Trial in Hawaii and R305L CFDA U.S. Department of Education; Positive Action for Social and Character Development. Randomized trial in Chicago, Brian Flay, PI. ; and R215S CFDA, Uintah Character Education Randomized Trial, U.S. Department of

Alan C. Acock2 Topics to Be Covered Predicting Rare Events Binary Growth Curves Count Growth curves Zero-Inflated Poisson Growth Curves Latent Class Zero-Inflated Poisson Models A detailed presentation of the ideas is available at

Alan C. Acock3 Predicting Rare Events Physical conflict in romantic relationships Frequency of depressive symptoms Frequency of Parent-Child Conflict Frequency of risky sex last month

Poisson with too many zeros

Alan C. Acock5 Binary: Does Behavior Occur Structural zeros—behavior is not in behavioral repertoire Do not smoke marijuana  Didn’t smoke last month Chance zeros—Behavior part of repertoire, just not last month No fight with spouse last week, but...

Alan C. Acock6 Count Component Two-part model Equation for zero vs. not zero Equation for those not zero. Zeros are missing values Zero-inflated model— includes both those who are structural zeros and chance zeros

Alan C. Acock7 Trajectory of the Probability of Behavior

Alan C. Acock8 Trajectory of the Count of behavior Occurring?

Alan C. Acock9 Why Aren’t Both Lines Straight? We use a linear model of the growth curve We predict the log of the expected count We predict log odds for the binary component

Alan C. Acock10 Why Aren’t Both Lines Straight? For the count we are predicting Expected = ln(  ) =  +  T i T i (0, 1, 2,...) is the time period  is the intercept or initial value  is the slope or rate of growth

Why Aren’t Both Lines Straight? Expected log odds or expected log count, are linear Expected probability or expected count, are not linear

Alan C. Acock11 Time Invariant Covariates Time invariant covariates are constants over the duration of study May influence growth in the binary and count components May influence initial level of binary and count components Different effects a major focus

Alan C. Acock12 Time Invariant Covariates Mother’s education might influence likelihood of being structurally zero Mother’s education might be negatively related to the rate of growth

Alan C. Acock13 Time Varying Covariates Time Varying Covariates—variables that can change across waves Peer pressure may increase each year between 12 and 18 The peer pressure each wave can directly influence drug usage that year

Estimating a Binary Growth Curve

Alan C. Acock15 Example of Binary Component Brian Flay has a study in Hawaii evaluating the Positive Action Program in Grades Key outcome—reducing negative responses to behaviors that Positive Action promotes Gender is a time invariant covariate—boys higher initially but to have just as strong a negative slope Level of implementation is a time varying covariate —the nearly 200 classrooms vary. A Latent Profile Analysis produced two classes on implementation

Binary Model Alan C. Acock16

Alan C. Acock17 Predicting a Threshold Thresholds

Alan C. Acock18 Title: workshop binary growth.inp Data: File is workshop_growth.dat ; Variables: Names are idnum s1flbadc s2flbadc s3flbadc s4flbadc male s1flbadd s2flbadd s3flbadd s4flbadd s1flbadm s2flbadm s3flbadm s4flbadm c3 c4 s3techer room ; Usevariables are male s1flbadd s2flbadd s3flbadd s4flbadd c3 c4 ; Categorical are s1flbadd s2flbadd s3flbadd s4flbadd ; Missing are all (-9999) ; Analysis: Estimator = ML ; Binary Growth Curve Program—Part 1

Alan C. Acock19 Model: alpha beta | ; alpha on male ; beta on male ; s3flbadd on c3 ; s4flbadd on c4 ; Output: Patterns sampstat standardized tech8; Binary Growth Curve Program— Part 2

Alan C. Acock20 S1FLBADD Category Category Loglikelihood H0 Value S2FLBADD Category Category Information Criteria Number of Free Parameters 9 Akaike (AIC) Bayesian (BIC) Sample-Size Adjusted BIC S3FLBADD Category Category S4FLBADD Category Category Sample Proportions and Model Fit Proportion of Negative Responses drops each year

Alan C. Acock21 Estimates S.E. Est./S.E. Std StdYX ALPHA ON MALE BETA ON MALE S3FLBADD ON C S4FLBADD ON C BETA WITH ALPHA Intercepts ALPHA BETA Model Estimates

Alan C. Acock22 Gender Effects Unstandardized effect of male on the intercept, α, is.548, z = 2.98, p <.01 Standardized Beta weight is.232 Partially standardized (standardized on latent variable only) is.464 Path to slope is not significant, B =.03, partially standardized path is.08 However effective the program is at reducing negative feelings, it is about as effective for boys as for girls

Alan C. Acock23 Implementation Effects Wave 3—Unstandardized effect of implementation for the Binary Component has a B = -.23, z = -2.71, p <.05-- Exponentiated odds ratio is e -.23 =.79 Wave 4—the unstandardized effect of implementation for the Binary Component has a B = -.64, z = , p < Exponentiated odds ratio is e -.64 =.53

Alan C. Acock24 Thresholds Estimates S.E. Est./S.E. Std StdYX S1FLBADD$ S2FLBADD$ S3FLBADD$ S4FLBADD$ Residual Variances ALPHA BETA LOGISTIC REGRESSION ODDS RATIO RESULTS S3FLBADD ON C S4FLBADD ON C MODEL RESULTS (cont.)

Alan C. Acock25 Thresholds & Graphs Mplus does not graph estimated probabilities when there are covariates because variances depend on the covariate level We cannot estimate initial probability using threshold value. If no covariates, we would exponentiate the threshold. In Stata display exp(-.714) yields.49.

Thresholds & Graphs If you want a series of graphs (e.g., boy/low intervention both wave 3 and wave 4), you need to treat each combination as a separate group Each group would have no covariates; just be a subset of children. Results might not be consistent with the model using all of the data

Estimating a Count Growth Curve

Alan C. Acock27 Count Component Mplus uses a Poisson Distribution for estimating counts The Poisson distribution is a single parameter distribution with λ = M =  2 Without adjusting for the excess of zeros, the  2 is often greater than the M

Alan C. Acock28 Count Component

Alan C. Acock29 Count Program—Part 1 Title: workshop count growth fixed effects.inp Data: File is workshop_growth.dat ; Variable: Names are idnum s1flbadc s2flbadc s3flbadc s4flbadc male s1flbadd s2flbadd s3flbadd s4flbadd s1flbadm s2flbadm s3flbadm s4flbadm c3 c4 s3techer room ; Usevariables are s1flbadc s2flbadc s3flbadc s4flbadc ; Missing are all (-9999) ; Count are s1flbadc s2flbadc s3flbadc s4flbadc ;

Alan C. Acock30 Count Program—Part 2 Model: alpha beta | ; ; !fixes var. of intercept at 0 ;!fixes var.of slope at 0 Output: residual tech1 tech4 tech8; Plot: Type = Plot3 ; Series = s1flbadc s2flbadc s3flbadc s4flbadc(*) ;

Fixing Variances Fixing the variance of the intercept and slope makes this a fixed effects model Fixing the variance of the slope only makes it a random intercept model Not fixing them makes it a random intercept & random slope model This takes a very long time to run

Alan C. Acock31 Count Model Output- MODEL RESULTS Estimates S.E. S.E./Est. Means ALPHA BETA Variances ALPHA BETA

Alan C. Acock32 Interpreting the Est. Intercept We fixed the residual variances at zero The mean intercept is.56, z = , p <.001 We can exponentiate this when there are no covariates to get the expected count at the intercept, e.56 = 1.75

Alan C. Acock33 Interpreting the Est. Slope The mean slope is -.64, z = , p <.001. With no covariates we use exponentiation to obtain the expected count for each wave Expected count (wave1) = e α  e β  0 = 1.75 Expected count (wave2) = e α  e β  1 =.92 Expected count (wave3) = e α  e β  2 =.48 Expected count (wave4) = e α  e β  3 =.25

Alan C. Acock34 Sample and Estimated Count

Putting the Binary and Count Growth Curves Together

Two-Part Model Alan C. Acock37

Alan C. Acock38 Putting the Binary and Count Models Together Two-Part Solution First part models binary outcome as we did here with binary data Second part deletes all people who have a count of zero at any wave. This leaves only children who have a count of at least 1 for every wave Second part estimated using a Poisson Model

Alan C. Acock39 Putting the Binary and Count Models Together Zero-Inflated Growth Curve Model estimates growth curve for structural zeros and for the count simultaneously Binary component includes all observations Count component includes all observations but is modeling only those zeros that are explainable by a random Poisson process

Alan C. Acock40 Zero-Inflated Poisson Regression

Here are 5 cases with counts Alan C. Acock | s1flbadc s2flbadc s3flbadc s4flbadc | | s1flbadc s2flbadc s3flbadc s4flbadc | | | | | 201. | | 203. | | 204. | | 207. | | 208. | | | | | |

Here are there Binary Scores Alan C. Acock | s1flbadd s2flbadd s3flbadd s4flbadd | | s1flbadd s2flbadd s3flbadd s4flbadd | | | | | 201. | | 203. | | 204. | | 207. | | 208. | |

ZIP Model With No Covarites Alan C. Acock43

ZIP Model With No Covarites Alan C. Acock44

Alan C. Acock45

Interpreting Inflation Model β_i under the Model Results, B = 2.353, z = 7.149, p <.001. The threshold for the zero-inflated part of the model is shown under the label of Intercepts. For each wave the threshold is , z = 3.45, p <.05. This large negative value will be confusing, unless we remember that the outcome for the inflated part of the model is predicting always zero. We are not predicting one.

Interpreting Inflation Model The more negative the threshold value the smaller the likelihood of being in the always zero class at the start. (display exp(-6.756) .001.) Logistic regression usually is predicting the presence of an outcome, but now we are predicting its absence.

ZIP Model With No Covariates Alan C. Acock46

ZIP Model With No Covariates Alan C. Acock47

Binary Part: Probability of Inflation Alan C. Acock48

Interpreting the Count Part Alan C. Acock49

Count Part: Expected Count Alan C. Acock50

ZIP Model with Covariates Alan C. Acock51

ZIP Model with Covariates— Covariates Effects Alan C. Acock52

ZIP Model with Covariates Alan C. Acock53

ZIP Model with Covariates Alan C. Acock54

ZIP Model with Covariates: Intercept and Slope Alan C. Acock55

Latent Class Growth Analysis Using Zero- Inflated Poisson Model (LCGA Poisson)

Alan C. Acock57 LCGA Poisson Models We use mixture models A single population may have two subpopulations, i.e., our Implementation variable is a class variable Usually assumes class membership explains differences in trajectory, thus a fixed effects model

Alan C. Acock58 Latent Profile Analysis for Implementation Variable Overall Item Means Two Classes First Class Second Class Stickers for PA Word of the week You put notes in icu box Teacher read ICU notes about you Teacher read your ICU notes Tokens for meeting goals PA Assembly activities Assembly Balloon for PA Whole school PA Days/wk taught PA N1,5501,021529

Alan C. Acock59 Applied To Count Growth Models We can use a Latent Class Analysis in combination with a count growth model to See if there are several classes Classes are distinct from each other Members of a class share a homogeneous growth trajectory

Alan C. Acock60 Applied To Zero-Inflated Growth Models Sometimes referred to as Case or Person Centered rather than Variable Centered Subgroups of children rather than of variables Has advantages in ease of interpretation of results No w/n group variance of intercepts or slope –assumes each subgroup is homogeneous

Alan C. Acock61 LCGA Using Count Model, No Covariates—One Class Solution Serves as a baseline for multi-class solutions Add Mixture to Analysis  section because we are doing a mixture model Add %Overall%  to Model: section Later, we will add commands so each class can have differences

Alan C. Acock62 LCGA Count Model Program— Part 1 Title: LCA zip poisson model NO covariates c1.inp Latent Class Growth Analysis for a count outcome using a ZIP Model with no covariates and just one class Data: File is workshop_growth.dat ; Variable: Names are idnum s1flbadc s2flbadc s3flbadc s4flbadc male s1flbadd s2flbadd s3flbadd s4flbadd s1flbadm s2flbadm s3flbadm s4flbadm c3 c4 s3techer room ; Usevariables are s1flbadc s2flbadc s3flbadc s4flbadc ; Missing are all (-9999) ; Classes = c(1) ; ! this says there is a single class

Alan C. Acock63 LCGA Count Model Program— Part 2 Analysis: Type = Mixture ; Starts 20 2 ; Model: %Overall% Alpha Beta | ; Output: residual tech1 tech11 ; !tech11 gives you the Lo, Mendell, Rubin test Plot: Type = Plot3 ; ! Series = s1flbadc s2flbadc s3flbadc s4flbadc(*) ;

Alan C. Acock64 LCGA Using ZIP Model, No Covariates—Two Classes The starts 20 2 ; generates 20 starting values, does an initial estimation on each of these, then does full iterations on 2 best initial solutions. Best two did not converge with 150 starts for 4 classes We change classes = c(1) to = c(2) The following table compares 1 to 3 classes

LCGA Using ZIP Model, No Covariates—Two Classes We will focus on the 2 class solution There is a normative class (902 children) and a deviant class with just 85 The biggest improvement in fit is from 1 to 2 classes

Alan C. Acock65 Comparison of 1 to 4 Classes 1 Class 2 Classes 3 Classes 4 Classes Free Parameters 2514 AIC BIC Sample Adjusted BIC Entropy Lo, Mendell, Rubin na 2 v 1 Value = p < v 2 Value = p <.01 4 v 3 Value = P <.05 N for each class C1 = 2927 C1=2651 C2=276 C1=235 C2=81 C3=2611 C1 = 19 C2 = 2559 C3 = 188 C4 = 161

Alan C. Acock66 Two Class Solution CLASSIFICATION QUALITY Entropy CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP Class Counts and Proportions Latent Classes

Alan C. Acock67 Two Class solution Parameter Estimate Class 1 N = 2651 Class 2 N=276 Mean α.379*** 1.395*** Mean β -.821*** -.395*** Count at time Count at time Count at time Count at time Display exp(alpha)*exp(beta*0,1,2,3)

Alan C. Acock68 Two Class solution—Count Part

Alan C. Acock69 Interpretation This solution has the deviant group start with a higher initial count ( α) and showing a significant improvement, decline in count, (β) The Normative group starts with a lower a lower count and drops less rapidly, possibly because there is a floor effect on the count.

Alan C. Acock70 Interpretation With a three class solution we got similar results with a normative group and a deviant group as the two main groups. However, the third class (only 85 kids) start with a low count and actually increase the count significantly over time. These 85 kids would be ideal for a qualitative sample because the program definitely fails them.

Alan C. Acock71 Adding Covariates We can add covariates These can be free across classes or constrained across classes The simple interpretation and graphs no longer work

Alan C. Acock72 Program for Freeing Constraint Model: %Overall% Alpha Beta | ; Alpha_i Beta_i | ; Alpha on male ; Beta on male ; S3flbadc on c3 ; S4flbadc on c4 ; %c#2% [s1flbadc#1 s2flbadc#1 s3flbadc#1 s4flbadc#1](1) ; [Beta_i] ;

Alan C. Acock73 Next Steps If you find distinct classes of participants who have different growth trajectories you can save the class of each participant. This is shown in the detailed document You can then compare the classes on whatever variable you think might be important in explaining the differentiation, e.g., parental support for program This will generate a new set of important covariates for subsequent research

Alan C. Acock74 Next Steps An introduction to growth curves and a detailed presentation of the ideas we’ve discussed is available at

Alan C. Acock75 Summary—Three Models Available from Mplus Traditional growth modeling where There is a common expectation for the trajectory for a sample Parameter estimates will have variances across individuals around the common expected trajectory (random effects) Covariates may explain some of this variance

Alan C. Acock76 Summary—Three Models Available from Mplus Latent Class Growth Models where We expect distinct classes that have different trajectories Class membership explains all of the variance in the parameters. Classes are homogeneous with respect to their growth curves (fixed effects)

Alan C. Acock77 Summary—Three Models Available from Mplus Mixture Models extending Latent Class Growth Models where We expect distinguishable classes that each have a different common trajectory Residual variance not explained by class membership are allowed (random effects) Covariates may explain some of this residual variance

Alan C. Acock78 Summary Mplus offers many features that are especially useful for longitudinal studies of individuals and families Many outcomes for family members are best studied using longitudinal data to identify growth trajectories Studies of growth trajectories can utilize time invariant, time variant, and distal outcomes

Alan C. Acock79 Summary Some outcomes for family members are successes or failures and the binary growth curves are useful for modeling these processes Some outcomes for family members are counts of how often some behavior or outcome occurs Many outcome involve both the binary and the count components