The Measurement and Analysis of Complex Traits Everything you didn’t want to know about measuring behavioral and psychological constructs Leuven Workshop.

Slides:



Advertisements
Similar presentations
The Simple Linear Regression Model Specification and Estimation Hill et al Chs 3 and 4.
Advertisements

Modeling of Data. Basic Bayes theorem Bayes theorem relates the conditional probabilities of two events A, and B: A might be a hypothesis and B might.
1 Regression as Moment Structure. 2 Regression Equation Y =  X + v Observable Variables Y z = X Moment matrix  YY  YX  =  YX  XX Moment structure.
StatisticalDesign&ModelsValidation. Introduction.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
11 Simple Linear Regression and Correlation CHAPTER OUTLINE
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 13 Nonlinear and Multiple Regression.
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
Structural Equation Modeling
The Simple Linear Regression Model: Specification and Estimation
Chapter 10 Simple Regression.
Statistical Methods Chichang Jou Tamkang University.
PSYC512: Research Methods PSYC512: Research Methods Lecture 19 Brian P. Dyre University of Idaho.
Intro to Statistics for the Behavioral Sciences PSYC 1900
Raw data analysis S. Purcell & M. C. Neale Twin Workshop, IBG Colorado, March 2002.
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Structural Equation Modeling Intro to SEM Psy 524 Ainsworth.
Classification and Prediction: Regression Analysis
Mixture Modeling Chongming Yang Research Support Center FHSS College.
Regression and Correlation Methods Judy Zhong Ph.D.
Xitao Fan, Ph.D. Chair Professor & Dean Faculty of Education University of Macau Designing Monte Carlo Simulation Studies.
Inference for regression - Simple linear regression
Introduction to Regression Analysis. Two Purposes Explanation –Explain (or account for) the variance in a variable (e.g., explain why children’s test.
Brain Mapping Unit The General Linear Model A Basic Introduction Roger Tait
Continuously moderated effects of A,C, and E in the twin design Conor V Dolan & Sanja Franić (BioPsy VU) Boulder Twin Workshop March 4, 2014 Based on PPTs.
Rasch trees: A new method for detecting differential item functioning in the Rasch model Carolin Strobl Julia Kopf Achim Zeileis.
The Examination of Residuals. Examination of Residuals The fitting of models to data is done using an iterative approach. The first step is to fit a simple.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
SUPA Advanced Data Analysis Course, Jan 6th – 7th 2009 Advanced Data Analysis for the Physical Sciences Dr Martin Hendry Dept of Physics and Astronomy.
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Linkage in selected samples Manuel Ferreira QIMR Boulder Advanced Course 2005.
Longitudinal Modeling Nathan Gillespie & Dorret Boomsma \\nathan\2008\Longitudinal neuro_f_chol.mx neuro_f_simplex.mx jepq6.dat.
Measurement Models: Exploratory and Confirmatory Factor Analysis James G. Anderson, Ph.D. Purdue University.
CJT 765: Structural Equation Modeling Class 12: Wrap Up: Latent Growth Models, Pitfalls, Critique and Future Directions for SEM.
Spatial Analysis & Geostatistics Methods of Interpolation Linear interpolation using an equation to compute z at any point on a triangle.
G Lecture 7 Confirmatory Factor Analysis
G Lecture 81 Comparing Measurement Models across Groups Reducing Bias with Hybrid Models Setting the Scale of Latent Variables Thinking about Hybrid.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
N318b Winter 2002 Nursing Statistics Specific statistical tests Chi-square (  2 ) Lecture 7.
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Over-fitting and Regularization Chapter 4 textbook Lectures 11 and 12 on amlbook.com.
Lecture 1: Basic Statistical Tools. A random variable (RV) = outcome (realization) not a set value, but rather drawn from some probability distribution.
Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.
4 basic analytical tasks in statistics: 1)Comparing scores across groups  look for differences in means 2)Cross-tabulating categoric variables  look.
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Review of statistical modeling and probability theory Alan Moses ML4bio.
David M. Evans Multivariate QTL Linkage Analysis Queensland Institute of Medical Research Brisbane Australia Twin Workshop Boulder 2003.
Assumptions of Multiple Regression 1. Form of Relationship: –linear vs nonlinear –Main effects vs interaction effects 2. All relevant variables present.
Categorical Data Frühling Rijsdijk 1 & Caroline van Baal 2 1 IoP, London 2 Vrije Universiteit, A’dam Twin Workshop, Boulder Tuesday March 2, 2004.
Nonparametric Statistics
CJT 765: Structural Equation Modeling Final Lecture: Multiple-Group Models, a Word about Latent Growth Models, Pitfalls, Critique and Future Directions.
Computacion Inteligente Least-Square Methods for System Identification.
Developmental Models/ Longitudinal Data Analysis Danielle Dick & Nathan Gillespie Boulder, March 2006.
QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
Statistical principles: the normal distribution and methods of testing Or, “Explaining the arrangement of things”
Classical Test Theory Psych DeShon. Big Picture To make good decisions, you must know how much error is in the data upon which the decisions are.
IRT Equating Kolen & Brennan, 2004 & 2014 EPSY
Nonparametric Statistics
Boulder Colorado Workshop March
HGEN Thanks to Fruhling Rijsdijk
Structural Equation Modeling using MPlus
Probability Theory and Parameter Estimation I
CH 5: Multivariate Methods
Introduction to Linkage and Association for Quantitative Traits
Statistical Methods For Engineers
Nonparametric Statistics
Linkage in Selected Samples
Structural Equation Modeling
Presentation transcript:

The Measurement and Analysis of Complex Traits Everything you didn’t want to know about measuring behavioral and psychological constructs Leuven Workshop August 2008

Overview SEM factor model basics Group differences: - practical Relative merits of factor scores & sum scores Test for normal distribution of factor Alternatives to the factor model Extensions for multivariate linkage & association

Structural Equation Model basics Two kinds of relationships –Linear regression X -> Y single-headed –Unspecified Covariance X Y double-headed Four kinds of variable –Squares – observed variables –Circles – latent, not observed variables –Triangles – constant (zero variance) for specifying means –Diamonds -- observed variables used as moderators (on paths)

Single Factor Model 1.00 F S1S2S3Sm l1l2l3 lm e1e2e3e4

Factor Model with Means 1.00 F1 S1S2S3Sm l1l2l3 lm e1e2e3e4 F2 MF B8 mF1mF2 mS1 mS2mS3 mSm 1.00

Factor model essentials Diagram translates directly to algebraic formulae Factor typically assumed to be normally distributed: SEM Error variance is typically assumed to be normal as well May be applied to binary or ordinal data –Threshold model

What is the best way to measure factors? Use a sum score Use a factor score Use neither - model-fit

Factor Score Estimation Formulae for continuous case –Thompson 1951 (Regression method) –C = LL’ + V –f = (I+J) -1 L’V -1 x –Where J = L’V -1 L

Factor Score Estimation Formulae for continuous case –Bartlett 1938 –C = LL’ + V –f b = J -1 L’V -1 x –where J = L’V -1 L Neither is suitable for ordinal data

Estimate factor score by ML M1M2M3 M4 M5 M6 F1 l6l6 l1l1 Want ML estimate of this

ML Factor Score Estimation Marginal approach L(f&x) = L(f)L(x|f) (1) L(f) = pdf(f) L(x|f) = pdf(x*) x* ~ N(V,Lf) Maximize (1) with respect to f Repeat for all subjects in sample –Works for ordinal data too!

Multifactorial Threshold Model Normal distribution of liability x. ‘Yes’ when liability x > t x  t

Item Response Theory - Factor model equivalence Normal Ogive IRT Model Normal Theory Threshold Factor Model Takane & DeLeeuw (1987 Psychometrika) –Same fit –Can transform parameters from one to the other

Item Response Probability Example item response probability shown in white  Response Probability 1.5.0

Do groups differ on a measure? Observed –Function of observed categorical variable (sex) –Function of observed continuous variable (age) Latent –Function of unobserved variable –Usually categorical –Estimate of class membership probability Has statistical issues with LRT

Practical: Find the Difference(s) Item 1Item 2 Item Factor l1l2l3 1 Mean r1r2r3 1 mean1mean2mean3 Item 1Item 2 Item 3 Variance Factor l1l2l3 1 Mean r1r2r3 1 mean1mean2mean3

Sequence of MNI testing 1. Model fx of covariates on factor mean & variance 2. Model fx of covariates on factor loadings & thresholds 2. Identify which loadings & thresholds are non-invariant 1 beats 2? Yes Measurement invariance: Sum* or ML scores No 3. Revise scale MNI: Compute ML factor scores using covariates * If factor loadings equal

Continuous Age as a Moderator in the Factor Model Stims TranqMJ Factor l1 l2 l r1r2r3 1 mean1 mean2 mean3 Z4 j Age 1.00 L O5 b Age d V 1.00 W5 k b - Factor Variance d - Factor Mean j - Factor Loadings k - Item means v - Item variances [dk and bj confound]

What is the best way to measure and model variation in my trait? Behavioral / Psychological characteristics usually Likert –Might use ipsative? What if Measurement Invariance does not hold? –How do we judge: Development GxE interaction Sex limitation Start simple: Finding group differences in mean

Simulation Study (MK) Generate True factor score f ~ N(0,1) Generate Item Errors e j ~ N(0,1) Obtain vector of j item scores s j = L*f j + e j Repeat N times to obtain sample Compute sum score Estimate factor score by ML

Two measures of performance Reliability

Two measures of performance Validity

Simulation parameters 10 binary item scale Thresholds –[ ] Factor Loadings – [ ]

Mess up measurement parameters Randomly reorder thresholds Randomly reorder factor loadings Blend reordered estimates with originals 0% - 100% ‘doses’

Measurement non-invariance Which works better: ML or Sum score? Three tests: –SEM - Likelihood ratio test difference in latent factor mean –ML Factor score t-test –Sum score t-test

MNI figures

More Factors: Common Pathway Model

More Factors: Independent Pathway Model = 1 Independent pathway model is submodel of 3 factor common pathway model

Example: Fat MZT MZA

Results of fitting twin model

ML A, C, E or P Factor Scores Compute joint likelihood of data and factor scores –p(FS,Items) = p(Items|FS)*p(FS) –works for non-normal FS distribution Step 1: Estimate parameters of (CP/IP) (Moderated) Factor Model Step 2: Maximize likelihood of factor scores for each (family’s) vector of observed scores –Plug in estimates from Step 1

Business end of FS script

The guts of it ! Residuals only

Shell script to FS everyone

Central Limit Theorem Additive effects of many small factors 1 Gene  3 Genotypes  3 Phenotypes 2 Genes  9 Genotypes  5 Phenotypes 3 Genes  27 Genotypes  7 Phenotypes 4 Genes  81 Genotypes  9 Phenotypes

Measurement artifacts Few binary items Most items rarely endorsed (floor effect) Most items usually endorsed (ceiling effect) Items more sensitive at some parts of distribution Non-linear models of item-trait relationship

Assessing the distribution of latent trait Schmitt et al 2006 MBR method N-variate binary item data have 2 N possible patterns Normal theory factor model predicts pattern frequencies –E.g., high factor loadings but different thresholds – – but would be uncommon – – – item threshold

Latent Trait (Factor) Model M1M2M3 M4 M5 M6 F1 l6l6 l1l1 Discrimination Difficulty Use Gaussian quadrature weights to integrate over factor; then relax constraints on weights

Latent Trait (Factor) Model M1M2M3 M4 M5 M6 F1 l6l6 l1l1 Discrimination Difficulty Difference in model fit: LRT~  2

Chi-squared test for non-normality performs well

Detecting latent heterogeneity Scatterplot of 2 classes S1 S2 Mean S1|c1 Mean S1|c2 Mean S2|c1Mean S2|c2

Scatterplot of 2 classes Closer means S1 S2 Mean S1|c1 Mean S1|c2 Mean S2|c1Mean S2|c2

Scatterplot of 2 classes Latent heterogeneity: Factors or classes? S1 S2

Latent Profile Model Class 1: p Class 2: (1-p) Class Membership probability

Factor Mixture Model 1.00 F S1S2S3Sm l1l2l3 lm e1e2e3e F S1S2S3Sm l1l2l3 lm e1e2e3e4 Class 1: p Class 2: (1- p) Class Membership probability NB means omitted

Classes or Traits? A Simulation Study Generate data under: –Latent class models –Latent trait models –Factor mixture models Fit above 3 models to find best-fitting model –Vary number of factors –Vary number of classes See Lubke & Neale Multiv Behav Res (2007 & In press)

What to do about conditional data Two things –Different base rates of “Stem” item –Different correlation between Stem and “Probe” items Use data collected from relatives

Data from Relatives: Likely failure of conditional independence STEM P2P3 P4 P5 P6 F1 l6l6 l1l1 STEM P2P3 P4 P5 P6 F2 l6l6 l1l1 R > 0

Series of bivariate integrals m/2 t1 i t2 i Π (    (x 1, x 2 ) dx 1 dx 2 ) j j=1 t1 t2 i-1 i-1 Can work with p-variate integration, best if p<m “Generalized MML” built into Mx

Dependence 1 Did your use of it cause you physical problems or make you depressed or very nervous?

Extensions to More Complex Applications Endophenotypes Linkage Analysis Association Analysis

Basic Linkage (QTL) Model Q: QTL Additive Genetic F: Family Environment E: Random Environment 3 estimated parameters: q, f and e Every sibship may have different model PP  = p(IBD=2) +.5 p(IBD=1) F1 P1 F2 P Q1 1 Q2 1 E2 1 E1 Pihat efqqfe 

Measurement Linkage (QTL) Model Q: QTL Additive Genetic F: Family Environment E: Random Environment 3 estimated parameters: q, f and e Every sibship may have different model PP  = p(IBD=2) +.5 p(IBD=1)  P1 F2 P Q2 1 E2 Pihat F1 11 Q1 1 E1 efq qfe M1M2M3 M4 M5 M6 l6l6 l1l1 M1M2M3 M4 M5 M6 l6l6 l1l1 F1 11 Q1Q1 1 E1E1 efq

Fulker Association Model Multilevel model for the means LDL1LDL2 M G1G2 SD BW w RR C b Geno1 Geno2 mm

Measurement Fulker Association Model (SM) M G1G2 SD BW w b Geno1 Geno2 mm F2 M1M2M3 M4 M5 M6 l6l6 l1l1 M1M2M3 M4 M5 M6 l6l6 l1l1 F1 M G1G1 G2G2 SD BW w b Gen o1 Gen o2 mm

Multivariate Linkage & Association Analyses Computationally burdensome Distribution of test statistics questionable Permutation testing possible –Even heavier burden Potential to refine both assessment and genetic models Lots of long & wide datasets on the way –Dense repeated measures EMA –fMRI –Need to improve software! Open source Mx