Mixture modelling of continuous variables

Mixture modelling
So far we have dealt with mixture modelling for a selection of binary or ordinal variables, collected either at a single time point (cross-sectionally) or longitudinally across time. The simplest example of a mixture model consists of a single continuous manifest variable. The multivariate extension of this simple model is known as Latent Profile Analysis.

Single continuous variable
An underlying latent grouping might present itself as a multi-modal distribution for the continuous variable.
[Figure: distribution of Height, with Females and Males highlighted in turn]

Single continuous variable
But the distance between modes may be small or even non-existent. This depends on the variation in the item being measured and also on the sample in which the measurement is taken (e.g. clinical or general population).
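To make the point about vanishing modes concrete, here is a minimal sketch that counts the modes of a two-class normal mixture on a grid. The height values (means of 160/180 and 165/175 cm, SD of 7) are hypothetical and chosen only for illustration. For two equally sized classes with a common SD, the mixture is bimodal only when the class means are more than two SDs apart.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_pdf(y, weights, means, sds):
    """Density of a finite mixture of normal components."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))

def count_modes(weights, means, sds, lo, hi, steps=2000):
    """Count strict local maxima of the mixture density on an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    dens = [mixture_pdf(y, weights, means, sds) for y in grid]
    return sum(1 for i in range(1, steps) if dens[i - 1] < dens[i] > dens[i + 1])

# Hypothetical heights in cm: two equally sized classes with the same SD of 7.
well_separated = count_modes([0.5, 0.5], [160.0, 180.0], [7.0, 7.0], 120.0, 220.0)
close_together = count_modes([0.5, 0.5], [165.0, 175.0], [7.0, 7.0], 120.0, 220.0)
print(well_separated, close_together)
```

With these values the first mixture (means 20 apart, more than two SDs) should show two modes and the second (means 10 apart) only one, even though both contain two latent classes.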

Single continuous variable
Figure taken from: Muthén, B. (2001). Latent variable mixture modeling. In G. A. Marcoulides & R. E. Schumacker (eds.), New Developments and Techniques in Structural Equation Modeling (pp. 1-33). Lawrence Erlbaum Associates.

Single continuous variable
We assume that the manifest variable is normally distributed within each latent class.
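Written out, the within-class normality assumption gives the usual finite normal mixture density. The notation here is an assumption rather than the slides' own: K classes, class proportions \pi_k, class means \mu_k and class variances \sigma_k^2.

```latex
f(y) \;=\; \sum_{k=1}^{K} \pi_k \,
  \frac{1}{\sqrt{2\pi\sigma_k^{2}}}
  \exp\!\left( -\frac{(y-\mu_k)^{2}}{2\sigma_k^{2}} \right),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0 .
```

Constraining the variances to be equal across classes corresponds to setting \sigma_k^2 = \sigma^2 for every k.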

GHQ Example
    Data:
        File is "ego_ghq12_id.dta.dat" ;
    Define:
        sumodd = ghq01 +ghq03 +ghq05 +ghq07 +ghq09 +ghq11;
        sumeven = ghq02 +ghq04 +ghq06 +ghq08 +ghq10 +ghq12;
        ghq_sum = sumodd + sumeven;
    Variable:
        Names are ghq01 ghq02 ghq03 ghq04 ghq05 ghq06 ghq07 ghq08 ghq09 ghq10 ghq11 ghq12 f1 id;
        Missing are all (-9999) ;
        usevariables = ghq_sum;
    Analysis:
        Type = basic ;
    plot:
        type is plot3;
Here we derive a single sum-score from the 12 ordinal GHQ items. The syntax also shows that variables created in the define statement (here sumodd and sumeven) need not be used in the final model.
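For readers without Mplus to hand, the sum-score derivation can be sketched in a few lines of Python. The respondent's item scores (0 to 3 per item) are hypothetical, and treating a sum as missing whenever any contributing item is missing is an assumption made for this sketch, not something stated on the slide.

```python
MISSING = -9999  # missing-value code declared in the Mplus input

def sum_score(items, missing=MISSING):
    """Return the sum of the item scores, or None if any item is missing (assumed rule)."""
    if any(v == missing for v in items):
        return None
    return sum(items)

# Hypothetical respondent: 12 GHQ items, each scored 0-3.
respondent = [1, 0, 2, 1, 0, 0, 3, 1, 1, 0, 2, 1]
sumodd = sum_score(respondent[0::2])   # ghq01, ghq03, ..., ghq11
sumeven = sum_score(respondent[1::2])  # ghq02, ghq04, ..., ghq12
ghq_sum = sumodd + sumeven
print(sumodd, sumeven, ghq_sum)
```

Only ghq_sum would be carried forward into the analysis; sumodd and sumeven are intermediate quantities, mirroring the role they play in the define statement.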

Examine the distribution of the scale
The scale appears unimodal, although there is a long upper tail.

Examine the distribution of the scale
By changing from the default number of bins, we see secondary modes appearing.
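The sensitivity of apparent modes to the number of bins can be illustrated with a small sketch. The scores below are made up for the purpose and are not the GHQ data; the helper names are likewise hypothetical.

```python
def histogram(values, n_bins, lo, hi):
    """Count values in n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # a value equal to hi goes in the last bin
        counts[idx] += 1
    return counts

def count_peaks(counts):
    """Number of bins that are strictly higher than both neighbouring bins."""
    padded = [0] + counts + [0]
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i - 1] < padded[i] > padded[i + 1])

# Made-up scores with a long upper tail.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9, 12, 13, 13, 14, 20]
coarse_peaks = count_peaks(histogram(scores, 3, 0, 24))
fine_peaks = count_peaks(histogram(scores, 12, 0, 24))
print(coarse_peaks, fine_peaks)
```

With three wide bins the toy data look unimodal, while twelve narrow bins show three peaks, one of which is a single observation. Peaks that appear only at fine bin widths may therefore reflect sampling noise as much as latent structure, which is one reason to fit the mixture model rather than rely on the plot alone.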

Fit a 2-class mixture
    Variable:
        classes = c(2);
    Analysis:
        type = mixture ;
        proc = 2 (starts);
        starts = ;
        stiterations = 20;
        stscale = 15;
    model:
        %overall%
        %c#1%
        [ghq_sum];
        ghq_sum (equal_var);
        %c#2%
        [ghq_sum];
        ghq_sum (equal_var);
The symbols %c#1% refer to the first class. Means are referred to using square brackets; variances are bracket-less. Here we have constrained the variances to be equal between classes by putting the same label in parentheses, (equal_var), at the end of the two variance lines. Means will be freely estimated.
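As a rough illustration of what such a model estimates, the sketch below fits a two-class normal mixture with a common variance by a plain EM algorithm. It is not Mplus's estimator: it uses a single set of crude starting values instead of random starts, and the simulated data (class means of 10 and 25, SD of 4, an 82/18 split) are hypothetical.

```python
import math
import random

def normal_pdf(y, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_two_class_equal_var(data, n_iter=200):
    """EM for a two-class normal mixture with one variance shared by both classes."""
    n = len(data)
    grand_mean = sum(data) / n
    var = sum((y - grand_mean) ** 2 for y in data) / n
    means = [min(data), max(data)]  # crude starting values, one per class
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every observation
        post = []
        for y in data:
            joint = [w * normal_pdf(y, m, var) for w, m in zip(weights, means)]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: class proportions, class means, then the shared variance
        class_n = [sum(p[k] for p in post) for k in range(2)]
        weights = [c / n for c in class_n]
        means = [sum(p[k] * y for p, y in zip(post, data)) / class_n[k]
                 for k in range(2)]
        var = sum(p[k] * (y - means[k]) ** 2
                  for p, y in zip(post, data) for k in range(2)) / n
    return weights, means, var

# Hypothetical data: a large low-scoring class and a smaller high-scoring class.
random.seed(1)
data = ([random.gauss(10.0, 4.0) for _ in range(820)]
        + [random.gauss(25.0, 4.0) for _ in range(180)])
weights, means, var = fit_two_class_equal_var(data)
print(weights, means, var)
```

Dividing the pooled weighted sum of squares by n, rather than by each class's own size, is what imposes the equality constraint on the variances.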

Model results
TESTS OF MODEL FIT
Loglikelihood
    H0 Value
    H0 Scaling Correction Factor for MLR
Information Criteria
    Number of Free Parameters    4
    Akaike (AIC)
    Bayesian (BIC)
    Sample-Size Adjusted BIC (n* = (n + 2) / 24)
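The three information criteria in this output follow from the loglikelihood, the number of free parameters and the sample size. A minimal sketch, using the standard definitions and a made-up loglikelihood and sample size (the values from the slide are not reproduced here):

```python
import math

def information_criteria(loglik, n_params, n):
    """Return (AIC, BIC, sample-size adjusted BIC) for a fitted model."""
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * math.log(n)
    # The adjusted BIC replaces n with n* = (n + 2) / 24
    abic = -2 * loglik + n_params * math.log((n + 2) / 24)
    return aic, bic, abic

# Hypothetical values for illustration only.
aic, bic, abic = information_criteria(loglik=-3000.0, n_params=4, n=1000)
print(aic, bic, abic)
```

Lower values indicate a better trade-off between fit and complexity. The adjusted BIC penalises each parameter less heavily than the BIC because n* is much smaller than n.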

Model results
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL
    Latent classes
CLASSIFICATION QUALITY
    Entropy
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
    Latent classes
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)
Entropy is high. A smaller class of 18% has emerged, consistent with the expected behaviour of the GHQ in this sample from primary care. Class-specific entropies are both good.
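Both classification summaries can be computed from the matrix of posterior class probabilities. The sketch below uses the usual relative entropy formula, which is 1 for perfect classification and 0 when every individual is equally likely to be in each class, and builds the table of average class probabilities by most likely class. What the slide calls class-specific entropies appear to be the diagonal entries of that table; the function names and the example probabilities are hypothetical.

```python
import math

def relative_entropy(post):
    """Relative entropy: 1 = perfect classification, 0 = no information."""
    n, k = len(post), len(post[0])
    total = sum(-p * math.log(p) for row in post for p in row if p > 0)
    return 1 - total / (n * math.log(k))

def average_class_probabilities(post):
    """Rows: most likely class. Columns: mean posterior probability of each class."""
    k = len(post[0])
    sums = [[0.0] * k for _ in range(k)]
    counts = [0] * k
    for row in post:
        modal = row.index(max(row))  # most likely class for this individual
        counts[modal] += 1
        for j, p in enumerate(row):
            sums[modal][j] += p
    return [[s / counts[r] for s in sums[r]] if counts[r] else None
            for r in range(k)]

# Hypothetical posterior probabilities for three individuals and two classes.
post = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
entropy = relative_entropy(post)
table = average_class_probabilities(post)
print(entropy, table)
```

Diagonal entries close to 1 mean that individuals assigned to a class are assigned with little ambiguity.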

Model results
                                Estimate    S.E.    Est./S.E.    Two-Tailed P-Value
Latent Class 1
    Means        GHQ_SUM
    Variances    GHQ_SUM
Latent Class 2
    Means        GHQ_SUM
    Variances    GHQ_SUM
Categorical Latent Variables
    Means        C#
Huge separation in means, given that the within-class SD = 4.3 (i.e. sqrt(18.88)).

Examine within-class distributions

What have we done?
We have effectively done a t-test backwards. Rather than obtaining a manifest binary variable, assuming equality of variances and testing for equality of means, we have derived a latent binary variable based on the assumption of a difference in means (still with equal variances).

What next?
The bulk of the sample now falls into a class with a GHQ distribution which is more symmetric than that of the sample as a whole. There appear to be additional modes within the smaller class. The ‘optimal’ number of classes can be assessed in the usual way using aBIC, entropy and the bootstrap LRT. In the univariate case residual correlations are not an issue, but when moving to a multivariate example these too will need to be assessed.

What next?
As before, posterior probabilities can be exported and modelled in a weighted regression analysis. A logistic regression analysis using a latent binary variable derived from the data is likely to be far more informative than a linear regression analysis using the manifest continuous variable.
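One common way to set up such a weighted analysis is to give each individual one row per latent class, weighted by the posterior probability of that class. The sketch below shows only this data-preparation step, under the assumption that this is the approach meant by "as before". The record layout, field names and values are hypothetical.

```python
def expand_by_posterior(records):
    """Turn each person into one row per class, weighted by posterior probability."""
    rows = []
    for person_id, covariate, probs in records:
        for k, p in enumerate(probs):
            rows.append({"id": person_id, "x": covariate,
                         "class": k + 1, "weight": p})
    return rows

# Hypothetical export: (id, covariate, [P(class 1), P(class 2)])
records = [(1, 0.2, [0.95, 0.05]), (2, 1.4, [0.30, 0.70])]
rows = expand_by_posterior(records)
for row in rows:
    print(row)
```

The expanded rows could then be passed to any logistic regression routine that accepts case weights, with class as the outcome. Because each person contributes several rows, standard errors would need to account for the repeated identifiers.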

What if we had not constrained the variances?
    Variable:
        classes = c(2);
    Analysis:
        type = mixture ;
        proc = 2 (starts);
        starts = ;
        stiterations = 20;
        stscale = 15;
    model:
        %overall%
        %c#1%
        [ghq_sum];
        ghq_sum ; ! (equal_var);
        %c#2%
        [ghq_sum];
        ghq_sum ; ! (equal_var);
The equality labels are commented out.
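Freeing the variances adds one parameter to the two-class model, which is why the reported number of free parameters moves from 4 to 5. A small sketch of the count for a single continuous variable (the function name is hypothetical):

```python
def n_free_parameters(n_classes, equal_variances):
    """Free parameters in a univariate normal mixture."""
    n_means = n_classes                          # one mean per class
    n_variances = 1 if equal_variances else n_classes
    n_class_proportions = n_classes - 1          # proportions sum to 1
    return n_means + n_variances + n_class_proportions

constrained = n_free_parameters(2, equal_variances=True)
unconstrained = n_free_parameters(2, equal_variances=False)
print(constrained, unconstrained)
```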

TESTS OF MODEL FIT
Loglikelihood
    H0 Value
    H0 Scaling Correction Factor for MLR
Information Criteria
    Number of Free Parameters    5
    Akaike (AIC)
    Bayesian (BIC)
    Sample-Size Adjusted BIC
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL
    Latent Classes
CLASSIFICATION QUALITY
    Entropy
Entropy is poor. Classes are more equal in size. BIC is lower!

Means are closer, variances differ greatly
MODEL RESULTS
                                Estimate    S.E.    Est./S.E.    Two-Tailed P-Value
Latent Class 1
    Means        GHQ_SUM
    Variances    GHQ_SUM
Latent Class 2
    Means        GHQ_SUM
    Variances    GHQ_SUM

Distribution far from symmetric