Published by Shon Carroll. Modified over 9 years ago.
1
1 SEM for small samples
Michel Tenenhaus
ESSEC-HEC Research Workshop Series on “PLS (Partial Least Squares) Developments”
2
2 Orange juice example (J. Pagès)
X1 = Physico-chemical, X2 = Sensorial, X = [X1, X2], Y = Hedonic
3
3 Structural Equation Modeling
The PLS approach of Herman Wold
- Study of a system of linear relationships between latent variables.
- Each latent variable is described by a set of manifest variables, or summarizes them.
- Variables can be numerical, ordinal or nominal (no need for normality assumptions).
- The number of observations can be small compared to the number of variables.
4
4 Orange juice example on a homogeneous group of judges
[Path diagram showing the measurement model and the structural model, with manifest variables, one exogenous latent variable and endogenous latent variables.]
- Physico-chemical block (outer weights w11, w12, ..., w19): Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C
- Sensorial block (outer weights w21, w22, ..., w27): Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness
- Hedonic block (outer weights w32, w33, ..., w3,96): Judge 2, Judge 3, ..., Judge 96
5
5 A SEM tree
SEM
- Component-based SEM (Score computation)
  - Herman Wold: NIPALS (1966), PLS approach (1975)
  - J.-B. Lohmöller: LVPLS 1.8 (1984)
  - W. Chin: PLS-Graph
  - Chatelin, Esposito Vinzi, Fahmy, Jäger, Tenenhaus: XLSTAT-PLSPM (2007)
  - H. Hwang, Y. Takane: GSCA (2004)
  - H. Hwang: VisualGSCA 1.0 (2007)
- Covariance-based SEM (CSA) (Model estimation/validation)
  - AMOS 6.0, 2007
  - Score computed for each block using MV loadings
  - Path analysis on the structural model defined on the scores
For good blocks (high Cronbach α):
- Score = 1st PC
- Score = MV's
For good blocks, all methods give almost the same results.
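The "good block" criterion on this slide rests on Cronbach's α. A minimal sketch of how α could be computed for one block, assuming the usual formula α = k/(k-1) · (1 - Σ var(x_j) / var(Σ x_j)). The function name `cronbach_alpha` and the simulated block are illustrative only and do not come from the slides.

```python
import numpy as np

def cronbach_alpha(block):
    """Cronbach's alpha for an (n_cases, k_variables) block of MVs.

    Assumes the MVs are already oriented in the same direction
    (negatively related variables should be sign-reversed first).
    """
    block = np.asarray(block, dtype=float)
    k = block.shape[1]
    sum_item_var = block.var(axis=0, ddof=1).sum()   # sum of the MV variances
    total_var = block.sum(axis=1).var(ddof=1)        # variance of the summed score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

# Hypothetical "good" block: four MVs driven by one common factor plus small noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(50, 1))
good_block = common + 0.3 * rng.normal(size=(50, 4))
alpha_good = cronbach_alpha(good_block)
print(f"alpha = {alpha_good:.3f}")
```

A block whose MVs are exact copies of each other gives α = 1, while a block of unrelated MVs gives α near 0.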
6
6 Results
- When all blocks are good, all the methods give practically the same results (M. Tenenhaus: Component-based SEM, Total Quality Management, 2008).
- For all data, PLS and SEM yield highly correlated LV scores (M. Tenenhaus: SEM for small samples, HEC Working paper, 2008).
Data structures are stronger than statistical methods.
7
7 PLS algorithm (Mode A, Centroid scheme)
[Path diagram as on slide 4: Physico-chemical block (w11, ..., w19), Sensorial block (w21, ..., w27), Hedonic block with Judge 2, Judge 3, ..., Judge 96 (w32, ..., w3,96).]
Outer estimates:
- Y1 = X1 w1
- Y2 = X2 w2
- Y3 = X3 w3
Inner estimates:
- Z1 = Y2 + Y3
- Z2 = Y1 + Y3
- Z3 = Y1 + Y2
Outer weights:
- w11 = Cor(glucose, Z1), w12 = Cor(fructose, Z1), ..., w19 = Cor(vitamin C, Z1)
- w21 = Cor(smell int., Z2), w22 = Cor(odor typ., Z2), ..., w27 = Cor(sweetness, Z2)
- w32 = Cor(judge 2, Z3), w33 = Cor(judge 3, Z3), ..., w3,96 = Cor(judge 96, Z3)
Iterate until convergence.
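The iteration on this slide can be sketched as follows. This is an illustrative implementation under stated assumptions, not the XLSTAT-PLSPM code: the function names, the `links` matrix (1 where two LVs are connected, 0 elsewhere), the starting weights and the simulated data are all hypothetical. The centroid scheme is written in its general form, where each connected LV enters with the sign of its correlation; the plain sums Z1 = Y2 + Y3 of the slide correspond to the case where all these correlations are positive.

```python
import numpy as np

def standardize(a):
    """Center and scale to unit variance (columnwise for 2-D input)."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pls_mode_a_centroid(blocks, links, max_iter=300, tol=1e-8):
    """PLS path modelling sketch, Mode A outer weights, centroid inner scheme.

    blocks: list of (n, p_j) arrays of manifest variables.
    links:  symmetric 0/1 matrix with zero diagonal; every LV needs a link.
    """
    X = [standardize(b) for b in blocks]
    n = X[0].shape[0]
    w = [np.ones(x.shape[1]) for x in X]          # arbitrary starting weights
    for _ in range(max_iter):
        # Outer estimates Y_j = X_j w_j, standardized.
        Y = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
        # Inner estimates Z_j = sum over connected LVs of sign(cor) * Y_i.
        R = np.corrcoef(Y, rowvar=False)
        Z = standardize(Y @ (np.sign(R) * links))
        # Mode A update: w_jh = Cor(x_jh, Z_j).
        w_new = [x.T @ Z[:, j] / n for j, x in enumerate(X)]
        delta = max(np.abs(a - b).max() for a, b in zip(w_new, w))
        w = w_new
        if delta < tol:
            break
    Y = np.column_stack([standardize(x @ wj) for x, wj in zip(X, w)])
    return w, Y

# Simulated three-block example (not the orange juice data).
rng = np.random.default_rng(1)
n = 100
f1 = rng.normal(size=n)
f2 = 0.8 * f1 + 0.6 * rng.normal(size=n)
f3 = 0.7 * f2 + 0.7 * rng.normal(size=n)
blocks = [f[:, None] + 0.5 * rng.normal(size=(n, p))
          for f, p in [(f1, 4), (f2, 3), (f3, 5)]]
links = np.ones((3, 3)) - np.eye(3)               # every LV linked to the others
weights, scores = pls_mode_a_centroid(blocks, links)
print(np.round(np.corrcoef(scores, rowvar=False), 3))
```

Because all variables are standardized with the same divisor, `x.T @ Z / n` is exactly the correlation between each MV and the inner estimate of its block.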
8
8 SPECIAL CASES OF PLS PATH MODELLING
- Principal component analysis
- Multiple factor analysis
- Canonical correlation analysis
- Redundancy analysis
- PLS regression
- Generalized canonical correlation analysis (Horst)
- Generalized canonical correlation analysis (Carroll)
- Multiple co-inertia analysis (Chessel & Hanafi)
- etc.
9
9 Use of XLSTAT-PLSPM
10
10 Outer weight w Non significant variables are in red
11
11 Outer weight w
12
12 Correlation MV-LV
13
13 Correlation MV-LV
14
14 Use of XLSTAT-PLSPM
Latent variables
===========================================================
                   Physico-chemical   Sensorial   Hedonic
-----------------------------------------------------------
Fruivita refr.           0.917          0.964      1.253
Tropicana refr.          0.630          1.378      0.946
Tropicana r.t.           1.120          0.462      0.742
-----------------------------------------------------------
Pampryl refr.           -0.176         -0.570     -0.747
Joker r.t.              -1.680         -0.852     -0.991
Pampryl r.t.            -0.810         -1.381     -1.203
===========================================================
15
15 Use of XLSTAT-PLSPM
16
16 Model estimation by PLS: Inner model and correlations
[Path diagram of the three blocks, as on slide 4. Values on the arrows, in extraction order: -.89, .93, .1, .95, .94, -.97, -.98, .98, .41, -.19, .71, -.64, -.93, -.95, .97]
Path coefficients: .306 (t = 1.522), .713 (t = 3.546), .820 (t = 2.864)
R2 = 0.96
Non significant variables are in red
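Once the LV scores are available, the inner model is an ordinary regression on those scores. The sketch below regresses the Hedonic score on the Physico-chemical and Sensorial scores listed on slide 14. Treating .306 and .713 as the coefficients of Physico-chemical and Sensorial is an assumption about how the diagram is labelled; on these rounded scores the fit should land close to those values and to R2 = 0.96.

```python
import numpy as np

# LV scores from slide 14 (XLSTAT-PLSPM), one row per orange juice:
# columns are Physico-chemical, Sensorial, Hedonic.
scores = np.array([
    [ 0.917,  0.964,  1.253],   # Fruivita refr.
    [ 0.630,  1.378,  0.946],   # Tropicana refr.
    [ 1.120,  0.462,  0.742],   # Tropicana r.t.
    [-0.176, -0.570, -0.747],   # Pampryl refr.
    [-1.680, -0.852, -0.991],   # Joker r.t.
    [-0.810, -1.381, -1.203],   # Pampryl r.t.
])
physico, sensorial, hedonic = scores.T

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(len(hedonic)), physico, sensorial])
coef, *_ = np.linalg.lstsq(design, hedonic, rcond=None)

fitted = design @ coef
r2 = 1.0 - np.sum((hedonic - fitted) ** 2) / np.sum((hedonic - hedonic.mean()) ** 2)

print("Physico-chemical -> Hedonic:", round(coef[1], 3))
print("Sensorial -> Hedonic:      ", round(coef[2], 3))
print("R2:", round(r2, 3))
print("Cor(Physico-chemical, Sensorial):",
      round(np.corrcoef(physico, sensorial)[0, 1], 3))
```

The last line gives the single-predictor path from Physico-chemical to Sensorial, since a standardized simple regression coefficient equals the correlation.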
17
17 Estimation of the inner model by PLS regression
The correlation between the physico-chemical and the sensorial variables can be taken into account by using PLS regression:
R2 = 0.946
[Bar chart: CoeffCS[1](hedonic) for Physico-chemical and Sensorial, vertical axis from 0.00 to 0.70.]
Validation of PLS regression by Jack-knife
18
18 Use of the PLS option of XLSTAT-PLSPM Physico-chemical has no direct effect on Hedonic, but a strong indirect effect.
19
19 Direct, indirect and total effects
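For a recursive path model, these effects follow directly from the path coefficients. The sketch below uses the three coefficients shown on slide 16 and assumes the assignment Physico-chemical -> Sensorial = .820, Physico-chemical -> Hedonic = .306, Sensorial -> Hedonic = .713, which is inferred from the slides rather than stated on them. Total effects are computed as (I - B)^-1 - I, and indirect effects as total minus direct.

```python
import numpy as np

names = ["Physico-chemical", "Sensorial", "Hedonic"]

# B[i, j] = direct effect of LV j on LV i (assumed assignment, see above).
B = np.array([
    [0.000, 0.000, 0.0],
    [0.820, 0.000, 0.0],
    [0.306, 0.713, 0.0],
])

identity = np.eye(len(names))
total = np.linalg.inv(identity - B) - identity   # direct + all indirect paths
indirect = total - B

for i, target in enumerate(names):
    for j, source in enumerate(names):
        if total[i, j] != 0.0:
            print(f"{source} -> {target}: direct {B[i, j]:.3f}, "
                  f"indirect {indirect[i, j]:.3f}, total {total[i, j]:.3f}")
```

With these inputs the only indirect effect is Physico-chemical -> Sensorial -> Hedonic, the product .820 × .713, which is larger than the direct coefficient .306.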
20
20 Covariance-based Structural Equation Modeling
Latent variables:
Structural model (inner model):
Here:
21
21 Structural Equation Modeling
Measurement model (outer model):
[Diagram labels: MV, LV, MV; Endogenous, Exogenous]
22
22 Structural Equation Modeling
MV covariance matrix, with its components labelled:
- Outer model
- Inner model
- Covariance for exogenous LVs
- Variance for structural residuals
- Variance for measurement residuals
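The equations of slides 20 to 22 are not present in this transcript. For orientation, the conventional LISREL-style formulation of a covariance-based SEM is sketched below. The symbols are the standard textbook ones and are an assumption: the slides may use different notation.

```latex
\begin{align}
\eta &= B\,\eta + \Gamma\,\xi + \zeta
  && \text{structural (inner) model} \\
x &= \Lambda_x\,\xi + \delta, \qquad y = \Lambda_y\,\eta + \varepsilon
  && \text{measurement (outer) model} \\
\Sigma_{xx} &= \Lambda_x \Phi \Lambda_x' + \Theta_\delta \\
\Sigma_{yx} &= \Lambda_y (I - B)^{-1} \Gamma \Phi \Lambda_x' \\
\Sigma_{yy} &= \Lambda_y (I - B)^{-1} \left( \Gamma \Phi \Gamma' + \Psi \right)
               (I - B)^{-1\prime} \Lambda_y' + \Theta_\varepsilon
\end{align}
```

Here ξ and η are the exogenous and endogenous LVs, Λx and Λy carry the outer model, B and Γ the inner model, Φ is the covariance matrix of the exogenous LVs, Ψ the covariance matrix of the structural residuals ζ, and Θδ, Θε the covariance matrices of the measurement residuals.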
23
23 Covariance-based SEM
ULS algorithm (Unweighted Least Squares):
S = Observed covariance matrix for MVs
Goodness-of-fit Index (Jöreskog & Sörbom):
Generalization of PCA
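The formulas of this slide are not present in the transcript. As a hedged sketch, the ULS criterion is usually written as the sum of squared differences between S and the model-implied matrix Σ, and the corresponding GFI as 1 - tr[(S - Σ)²] / tr(S²); both forms are assumed here. The example also illustrates the "generalization of PCA" remark: with zero residual variances, the best rank-one Σ in the least squares sense is the first principal component term of S. The function names and the 3 × 3 matrix are illustrative.

```python
import numpy as np

def uls_discrepancy(S, Sigma):
    """Sum of squared differences between S and Sigma.

    For symmetric matrices this equals trace((S - Sigma)^2). Some texts
    add a factor 1/2, which does not change the minimizer or the GFI.
    """
    D = np.asarray(S, dtype=float) - np.asarray(Sigma, dtype=float)
    return float(np.sum(D ** 2))

def gfi_uls(S, Sigma):
    """Goodness-of-fit index for ULS: 1 - ||S - Sigma||^2 / ||S||^2."""
    S = np.asarray(S, dtype=float)
    return 1.0 - uls_discrepancy(S, Sigma) / float(np.sum(S ** 2))

# Hypothetical correlation matrix of three MVs.
S = np.array([
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.7],
    [0.6, 0.7, 1.0],
])

# Rank-one approximation built from the first principal component of S.
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
lam1, v1 = eigvals[-1], eigvecs[:, -1]
Sigma_pc1 = lam1 * np.outer(v1, v1)

gfi_pc1 = gfi_uls(S, Sigma_pc1)
print("GFI of the one-component model:", round(gfi_pc1, 3))
```

For this rank-one model the GFI reduces to λ1² / Σ λi², the share of the squared eigenvalues carried by the first component.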
24
24 Use of AMOS 6.0
Method = ULS
Roderick McDonald's first idea (1996): measurement residual variances are canceled (set to zero).
This is a computational trick: residual variances are passed to errors and can always be computed afterwards.
25
25 Covariance-based SEM
ULS algorithm with McDonald's constraints:
S = Observed covariance matrix for MVs
Goodness-of-fit Index (Jöreskog & Sörbom):
26
26 Use of AMOS 6.0
- Method = ULS
- Measurement residual variances = 0
27
27 Results
Outer LV estimates: McDonald's second idea
PLS estimate of LV:
- Mode A
- LV inner estimate = theoretical LV
- LV inner estimate computation is useless.
GFI = .903
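One reading of this slide is that the Mode A weights, normally the correlations between the MVs and the inner estimate, are replaced by the loadings estimated by the covariance-based software, so that each LV score is a loading-weighted sum of its standardized MVs. The sketch below follows that reading. The function name, the data and the loadings are hypothetical and do not come from the AMOS output.

```python
import numpy as np

def outer_estimate_from_loadings(X, loadings):
    """LV score as a standardized, loading-weighted sum of standardized MVs."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each MV
    raw = Xs @ np.asarray(loadings, dtype=float)    # weights = estimated loadings
    return (raw - raw.mean()) / raw.std()           # standardize the score

# Hypothetical block: 6 cases, 3 MVs, with made-up loadings.
X_block = np.array([
    [5.1, 3.2, 7.0],
    [4.8, 3.0, 7.4],
    [5.5, 3.6, 6.1],
    [4.2, 2.7, 8.0],
    [3.9, 2.5, 8.3],
    [4.5, 2.9, 7.7],
])
block_loadings = [0.9, 0.8, -0.7]
lv_score = outer_estimate_from_loadings(X_block, block_loadings)
print(np.round(lv_score, 3))
```

No inner estimate is computed at any point, which matches the remark on the slide that this computation becomes unnecessary.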
28
28
29
29
30
30 Model estimation by SEM-ULS: Inner model and correlations
[Path diagram of the three blocks, as on slide 4. Values on the arrows, in extraction order: -.77, -.76, .89, .22, 1, 1.00, -.87, -.88, .94, .26, -.08, .66, -.56, -.94, -.97, 1.22 (P = .35), .64 (P = .05), .79 (P = .01)]
R2 = 0.96
Non significant variables in red. Constrained weights in blue.
31
31 Use of SEM-ULS
Latent variable estimates (Scores)
Latent variables
===========================================================
                   Physico-chemical   Sensorial   Hedonic
-----------------------------------------------------------
Fruivita refr.           0.915          0.866      1.141
Tropicana refr.          0.526          1.270      0.868
Tropicana r.t.           0.832          0.422      0.672
-----------------------------------------------------------
Pampryl refr.           -0.158         -0.526     -0.686
Joker r.t.              -1.740         -0.774     -0.867
Pampryl r.t.            -0.375         -1.258     -1.127
===========================================================
32
32 Comparison between the PLS and SEM-ULS scores
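The comparison can be reproduced from the two score tables given on slides 14 and 31. The sketch below computes, for each latent variable, the correlation between the PLS score and the SEM-ULS score across the six orange juices. The variable names are illustrative; the numbers are copied from the slides.

```python
import numpy as np

lv_names = ["Physico-chemical", "Sensorial", "Hedonic"]

# Rows: Fruivita refr., Tropicana refr., Tropicana r.t.,
#       Pampryl refr., Joker r.t., Pampryl r.t.
pls_scores = np.array([
    [ 0.917,  0.964,  1.253],
    [ 0.630,  1.378,  0.946],
    [ 1.120,  0.462,  0.742],
    [-0.176, -0.570, -0.747],
    [-1.680, -0.852, -0.991],
    [-0.810, -1.381, -1.203],
])
uls_scores = np.array([
    [ 0.915,  0.866,  1.141],
    [ 0.526,  1.270,  0.868],
    [ 0.832,  0.422,  0.672],
    [-0.158, -0.526, -0.686],
    [-1.740, -0.774, -0.867],
    [-0.375, -1.258, -1.127],
])

# Correlation between the two estimates of each latent variable.
score_cor = {
    name: float(np.corrcoef(pls_scores[:, j], uls_scores[:, j])[0, 1])
    for j, name in enumerate(lv_names)
}
for name, r in score_cor.items():
    print(f"{name}: r = {r:.3f}")
```

Correlation ignores the difference in scale between the two sets of scores, so it isolates how similarly the two methods order and space the products.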
33
33 Path analysis on scores with AMOS Bootstrap validation
34
34 Direct, indirect and total effects
35
35 Conclusion 1: SEM-ULS > PLS
- When mode A is chosen, outer LV estimates using covariance-based SEM (ULS or ML) or component-based SEM (PLS) are always very close.
- It is possible to mimic PLS with covariance-based SEM software (McDonald, 1996; Tenenhaus, 2001).
- Covariance-based SEM allows constraints to be imposed on the model parameters. This is impossible with PLS.
36
36 Conclusion 2: PLS > SEM-ULS
- When SEM-ULS does not converge or does not give an admissible solution, PLS is an attractive alternative.
- PLS offers many optimization criteria for the LV search (but rigorous proofs are still to be found).
- PLS still works when the number of MVs is very high and the number of cases very small (for example 38 MVs and 6 cases).
- PLS allows formative LVs to be used much more easily than SEM-ULS.
37
37 Second particular case : Multi-block data analysis
38
Sensory analysis of 21 Loire Red Wines (J. Pagès)
4 blocks of variables: X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting
Illustrative variables: 3 Appellations, 4 Soils
39
PCA of each block: Correlation loadings
40
PCA of each block with AMOS: Correlation loadings
GFI = .301
41
41 Multi-block data analysis = Confirmatory Factor Analysis
[Path diagram with four blocks: View, Smell after shaking, Smell at rest, Tasting.]
GFI = .849
42
42 First dimension Using MV with significant loadings
43
43 First global score
2nd order CFA
GFI = .973
44
44 Validation of the first dimension
Correlations
            Rest1   View   Shaking1   Tasting1
Rest1       1
View        .621    1
Shaking1    .865    .762   1
Tasting1    .682    .813   .895       1
Score1      .813    .920   .942       .944
45
45 Second dimension
46
46 2nd global score
GFI = .905
47
47 Validation of the second dimension
Correlations
            Rest2   Shaking2   Tasting2
Rest2       1
Shaking2    .789    1
Tasting2    .782    .803       1
Score2      .944    .904       .928
48
48 Mapping of the correlations with the global scores
- Score 1 is related to quality.
- Score 2 is unrelated to quality.
49
49 Correlation with global quality New result: Not obtained with other multi-block data analysis methods, nor with factor analysis of the whole data set.
50
50 Wine visualization in the global score space Wines marked by Appellation
51
51 Wine visualization in the global score space Wines marked by Soil
52
DAM = Dampierre-sur-Loire
53
Cuvée Lisagathe 1995
A soft, warm, blackberry nose. A good core of fruit on the palate with quite well worked tannin and acidity on the finish. Good length and a lot of potential.
DECANTER (May 1997) (DECANTER AWARD *****: Outstanding quality, a virtually perfect example)
54
Final conclusion « All the proofs of a pudding are in the eating, not in the cooking ». William Camden (1623)