1 SEM for small samples Michel Tenenhaus ESSEC-HEC Research Workshop Series on “PLS (Partial Least Squares) Developments”

Slides:



Advertisements
Similar presentations
PLS path modeling and Regularized Generalized Canonical Correlation Analysis for multi-block data analysis Michel Tenenhaus.
Advertisements

1 Michel Tenenhaus & Carlo Lauro A SEM approach for composite indicators building Michel Tenenhaus & Carlo Lauro.
Writing up results from Structural Equation Models
1 Regression as Moment Structure. 2 Regression Equation Y =  X + v Observable Variables Y z = X Moment matrix  YY  YX  =  YX  XX Moment structure.
Structural Equation Modeling. What is SEM Swiss Army Knife of Statistics Can replicate virtually any model from “canned” stats packages (some limitations.
StatisticalDesign&ModelsValidation. Introduction.
Structural Equation Modeling Using Mplus Chongming Yang Research Support Center FHSS College.
Structural Equation Modeling
Benefits Key Features and Results. XLSTAT-ADA’s functions.
Regression analysis Relating two data matrices/tables to each other Purpose: prediction and interpretation Y-data X-data.
Mutidimensional Data Analysis Growth of big databases requires important data processing.  Need for having methods allowing to extract this information.
Confirmatory Factor Analysis
Sakesan Tongkhambanchong, Ph.D.(Applied Behavioral Science Research) Faculty of Education, Burapha University.
Component based SEM Comparison between various methods
FTP Biostatistics II Model parameter estimations: Confronting models with measurements.
« هو اللطیف » By : Atefe Malek. khatabi Spring 90.
UNITE DE SENSOMETRIE ET CHIMIOMETRIE How to use PLS path modeling for analyzing multiblock data sets Michel Tenenhaus Mohamed Hanafi Vincenzo Esposito.
Correlation & Regression Chapter 15. Correlation statistical technique that is used to measure and describe a relationship between two variables (X and.
1 A bridge between PLS path modeling and ULS-SEM Michel Tenenhaus.
Structural Equation Modeling
Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved. Chapter 14 Using Multivariate Design and Analysis.
1-1 Regression Models  Population Deterministic Regression Model Y i =  0 +  1 X i u Y i only depends on the value of X i and no other factor can affect.
Multivariate Data Analysis Chapter 11 - Structural Equation Modeling.
CHAPTER 30 Structural Equation Modeling From: McCune, B. & J. B. Grace Analysis of Ecological Communities. MjM Software Design, Gleneden Beach,
Topic 3: Regression.
TRICAP_06 Mohamed Hanafi PLS PATH MODELLING : Computation of latent variables with the estimation mode B UNITE DE SENSOMETRIE ET CHIMIOMETRIE Nantes-France.
Structural Equation Modeling Intro to SEM Psy 524 Ainsworth.
Simple Linear Regression Analysis
Structural Equation Modeling Continued: Lecture 2 Psy 524 Ainsworth.
Factor Analysis Psy 524 Ainsworth.
Confirmatory factor analysis
Factor Analysis and Inference for Structured Covariance Matrices
ANCOVA Lecture 9 Andrew Ainsworth. What is ANCOVA?
PLS-SEM: Introduction and Overview
Key Features and Results
Kayla Jordan D. Wayne Mitchell RStats Institute Missouri State University.
Structural Equation Modeling Made Easy A Tutorial Based on a Behavioral Study of Communication in Virtual Teams Using WarpPLS Ned Kock.
A criterion-based PLS approach to SEM Michel Tenenhaus (HEC Paris) Arthur Tenenhaus (SUPELEC)
CJT 765: Structural Equation Modeling Class 7: fitting a model, fit indices, comparingmodels, statistical power.
SEM: Basics Byrne Chapter 1 Tabachnick SEM
Estimation Kline Chapter 7 (skip , appendices)
CJT 765: Structural Equation Modeling Class 10: Non-recursive Models.
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Controlling for Common Method Variance in PLS Analysis: The Measured Latent Marker Variable Approach Wynne W. Chin Jason Bennett Thatcher Ryan T. Wright.
Advanced Statistics Factor Analysis, II. Last lecture 1. What causes what, ξ → Xs, Xs→ ξ ? 2. Do we explore the relation of Xs to ξs, or do we test (try.
Measurement Models: Identification and Estimation James G. Anderson, Ph.D. Purdue University.
G Lecture 81 Comparing Measurement Models across Groups Reducing Bias with Hybrid Models Setting the Scale of Latent Variables Thinking about Hybrid.
CJT 765: Structural Equation Modeling Class 8: Confirmatory Factory Analysis.
Estimation Kline Chapter 7 (skip , appendices)
September 28, 2000 Improved Simultaneous Data Reconciliation, Bias Detection and Identification Using Mixed Integer Optimization Methods Presented by:
Evaluation of structural equation models Hans Baumgartner Penn State University.
The Instrumental Variables Estimator The instrumental variables (IV) estimator is an alternative to Ordinary Least Squares (OLS) which generates consistent.
SEM with AMOS1 Christian A. Klöckner. 2 Agenda Expectations What is Structural Equation Modelling? What is AMOS? A simple regression A multiple regression.
Chapter 14 EXPLORATORY FACTOR ANALYSIS. Exploratory Factor Analysis  Statistical technique for dealing with multiple variables  Many variables are reduced.
Chapter 17 STRUCTURAL EQUATION MODELING. Structural Equation Modeling (SEM)  Relatively new statistical technique used to test theoretical or causal.
Chapter 12 REGRESSION DIAGNOSTICS AND CANONICAL CORRELATION.
Structural Equation Modeling using MPlus
Project 5 Data Mining & Structural Equation Modeling
CJT 765: Structural Equation Modeling
IS6000 – Class 10 Introduction to SmartPLS (&SPSS)
Probabilistic Models with Latent Variables
The regression model in matrix form
Linear Model Selection and regularization
Structural Equation Modeling
SOC 681 – Causal Models with Directly Observed Variables
3.2. SIMPLE LINEAR REGRESSION
Causal Relationships with measurement error in the data
Factor Analysis.
Structural Equation Modeling
Presentation transcript:

1 SEM for small samples Michel Tenenhaus ESSEC-HEC Research Workshop Series on “PLS (Partial Least Squares) Developments”

2 Orange juice example (J. Pagès) X 1 = Physico-chemical, X 2 = Sensorial, X = [X 1, X 2 ], Y = Hedonic

3 Structural Equation Modeling The PLS approach of Herman WOLD Study of a system of linear relationships between latent variables. Each latent variable is described by a set of manifest variables, or summarizes them. Variables can be numerical, ordinal or nominal (no need for normality assumptions). The number of observations can be small compare to the number of variables.

4 Orange juice example on a homogenous group of judges Glucose Fructose Saccharose Sweetening power pH before processing pH after centrifugation Titer Citric acid Vitamin C Smell intensity Odor typicity Pulp Taste intensity Acidity Bitterness Sweetness 11 11 22 Judge 2 Judge 3 Judge 96 Physico-chemical Sensorial Manifest variable Endogenous latent variable Hedonic Exogenous latent variable Measurement modelStructural model w 11 w 12 w 19  21 11  22 w 21 w 22 w 27 w 32 w 33 w 396

5 A SEM tree Chatelin-Esposito Vinzi Fahmy-Jäger-Tenenhaus XLSTAT-PLSPM (2007) W. Chin PLS-Graph Herman Wold NIPALS (1966) PLS approach (1975) J.-B. Lohmöller LVPLS 1.8 (1984) SEM Component-based SEM (Score computation) Covariance-based SEM (CSA) (Model estimation/validation) H. Hwang Y. Takane GSCA (2004) H. Hwang VisualGSCA 1.0 (2007) For good blocks (High Cronbach  ): - Score = 1st PC - Score =  MV’s For good blocks, all methods give almost the same results. AMOS 6.0, 2007 Score computed for each block using MV loadings Path analysis on the structural model defined on the scores

6 When all blocks are good, all the methods give practically the same results: M. Tenenhaus : Component-based SEM Total Quality Management, 2008 For all data, PLS and SEM yield to highly correlated LV scores: M. Tenenhaus : SEM for small samples HEC Working paper, Results Data structures are stronger than statistical methods.

7 PLS algorithm (Mode A, Centroid scheme) Glucose Fructose Saccharose Sweetening power pH before processing pH after centrifugation Titer Citric acid Vitamin C Smell intensity Odor typicity Pulp Taste intensity Acidity Bitterness Sweetness 11 11 22 Juge 2 Juge 3 Juge 96  21 11  22 w 11 w 12 w 19 w 21 w 22 w 27 w 32 w 33 w 396 Y 1 =X 1 w 1 (outer estimate) Y 2 =X 2 w 2 Y 3 =X 3 w 3 Z 1 =Y 2 +Y 3 (inner estimate) Z 2 =Y 1 +Y 3 Z 3 =Y 1 +Y 2 w 11 = Cor(glucose,Z 1 ) w 12 = Cor(fructose,Z 1 ) w 19 = Cor(vitamin C,Z 1 ) w 21 = Cor(smell int.,Z 2 ) w 22 = Cor(odor typ.,Z 2 ) w 27 = Cor(Sweetness,Z 2 ) w 32 = Cor(judge2,Z 3 ) w 33 = Cor(judge3,Z 3 ) w 3,96 = Cor(judge96,Z 3 ) Iterate until convergence.

8 SPECIAL CASES OF PLS PATH MODELLING Principal component analysis Multiple factor analysis Canonical correlation analysis Redundancy analysis PLS regression Generalized canonical correlation analysis (Horst) Generalized canonical correlation analysis(Carroll) Analyse de la co-inertie multiple (Chessel & Hanafi) etc.…

9 Use of XLSTAT-PLSPM

10 Outer weight w Non significant variables are in red

11 Outer weight w

12 Correlation MV-LV

13 Correlation MV-LV

14 Use of XLSTAT-PLSPM Latent variables =========================================================== Physico-chimical Sensorial Hedonic Fruivita refr Tropicana refr Tropicana r.t Pampryl refr Joker r.t Pampryl r.t ===========================================================

15 Use of XLSTAT-PLSPM

16 Model estimation by PLS : Inner model and correlations Glucose  1  2  3 Judge 2 Judge 3 Judge 96  Fructose Saccharose Sweetening power pH before processing pH aftercentrifugation Titer Vitamin C Citric acid Smell intensity Odor typicity Pulp Taste intensity Acidity Bitterness Sweetness (t = 1.522).713 (t = 3.546) >0 R 2 = (t = 2.864) Non significant variables are in red

17 Estimation of the inner model by PLS regression R 2 = The correlation between the physico-chemical and the sensorial variables can be taken into account by using PLS regression: Physico-chemical sensorial CoeffCS[1](hédonic) Validation of PLS regression by Jack-knife

18 Use of the PLS option of XLSTAT-PLSPM Physico-chemical has no direct effect on Hedonic, but a strong indirect effect.

19 Direct, indirect and total effects

20 Covariance-based Structural Equation Modeling Latent variables : Structural model (inner model) : Ici :

21 Structural Equation Modeling Measurement model (outer model) : VM VL VM Endogenous Exogenous

22 Structural Equation Modeling MV covariance matrix : Outer model Inner model Cov. for exo. LV Variance for structural residuals Variance for measurement residuals

23 Covariance-based SEM ULS algorithm (Unweighted Least Squares) : S = Observed covariance matrix for MV’s Goodness-of-fit Index (Jöreskog & Sorbum): Generalization of PCA

24 Use of AMOS 6.0 Method = ULS This is a computational trick: Residual variances are passed to errors and can always be computed afterwards. First Roderick McDonald’s idea (1996) Measurement residual variances are canceled:

25 Covariance-based SEM ULS algorithm with the McDonald’s constraints: S = Observed covariance matrix for MV Goodness-of-fit Index (Jöreskog & Sorbum):

26 Use of AMOS Method = ULS - Measurement residual variances = 0

27 Results Outer LV Estimates: 2 nd McDonald’s idea PLS estimate of LV: - Mode A - LV inner estimate = theoretical LV - LV inner estimate computation is useless. GFI =.903

28

29

30 Model estimation by SEM-ULS : Inner model and correlations Glucose  1  2  3 Judge 2, Judge 3, Judge 96  Fructose Saccharose Sweetening power pH before processing pH aftercentrifugation Titer Vitamin C Citric acid Smell intensity Odor typicity Pulp Taste intensity Acidity Bitterness Sweetness (P =.35).64 (P =.05) >0 R 2 = (P =.01) Non significant variables in red. Constraint weights in blue.

31 Use of SEM-ULS Latent variable estimates (Scores) Latent variables =========================================================== Physico-chemical Sensorial Hedonic Fruivita refr Tropicana refr Tropicana r.t Pampryl refr Joker r.t Pampryl r.t ===========================================================

32 Comparison between the PLS and SEM-ULS scores

33 Path analysis on scores with AMOS Bootstrap validation

34 Direct, indirect and total effects

35 When mode A is chosen, outer LV estimates using Covariance-based SEM (ULS or ML) or Component based SEM (PLS) are always very close. It is possible to mimic PLS with a covariance-based SEM software (McDonald,1996, Tenenhaus, 2001). Covariance-based SEM authorizes to implement constraints on the model parameters. This is impossible with PLS. Conclusion 1: SEM-ULS > PLS

36 When SEM-ULS does not converge or does not give an admissible solution, PLS is an attractive alternative. PLS offers many optimization criterions for the LV search (but rigorous proofs are still to be found). PLS still works when the number of MV is very high and the number of cases very small (for example 38 MV and 6 cases). PLS allows to use formative LV in a much easier way than SEM-ULS. Conclusion 2: PLS > SEM-ULS

37 Second particular case : Multi-block data analysis

Sensory analysis of 21 Loire Red Wines (J. Pagès) X 1 = Smell at rest, X 2 = View, X 3 = Smell after shaking, X 4 = Tasting X1X1 X2X2 X3X3 X4X4 3 Appellations4 Soils Illustrative variable 4 blocks of variables

PCA of each block: Correlation loadings

PCA of each block with AMOS: Correlation loadings GFI =.301

41 Multi-block data analysis = Confirmatory Factor Analysis VIEW SMELL AFTER SHAKING SMELL AT REST SMELL AT REST TASTING GFI =.849

42 First dimension Using MV with significant loadings

43 First global score GFI =.973 2nd order CFA

44 Validation of the first dimension Correlations Rest1 View Shaking1 Tasting1 Score1 Rest1ViewShaking1Tasting1

45 Second dimension

46 2 nd global score GFI =.905

47 Validation of the second dimension Correlations Rest2 Shaking2 Tasting2 Score2 Rest2Shaking2Tasting2

48 Mapping of the correlations with the global scores Score 1 related with quality Score 2 unrelated with quality

49 Correlation with global quality New result: Not obtained with other multi-block data analysis methods, nor with factor analysis of the whole data set.

50 Wine visualization in the global score space Wines marked by Appellation

51 Wine visualization in the global score space Wines marked by Soil

DAM = Dampierre-sur-Loire

A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish; Good length and a lot of potential. DECANTER (mai 1997) (DECANTER AWARD ***** : Outstanding quality, a virtually perfect example) Cuvée Lisagathe 1995

Final conclusion « All the proofs of a pudding are in the eating, not in the cooking ». William Camden (1623)