Partial Least Squares Path Modeling: Past and Future CARMA Ed Rigdon, Georgia State University Oct. 9, 2015 1.

1 Partial Least Squares Path Modeling: Past and Future. CARMA. Ed Rigdon, Georgia State University. Oct. 9, 2015

2 Partial least squares (PLS), like ordinary least squares (OLS) or maximum likelihood (ML), is a general approach to parameter estimation; path modeling is just one application

3 Framework: modeling unobserved conceptual variables [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

4 The nominalist / naming fallacy: assuming that something labeled “X” is actually X. Cliff (1983), de Leeuw (1984)

5 Framework: modeling unobserved conceptual variables [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

6 Realism vs. operationalism. Realism: the variable exists independently of operations and data. Operationalism: the operations define the variable [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

7 How does PLS path modeling work?

8 Structural model linking conceptual variables [Diagram: conceptual variables A, B, C, D connected by paths g1, g2, g3]

9 Statistical model linking proxies. PLS path models are almost always recursive [Diagram: proxies A*, B*, C*, D* connected by paths g1, g2, g3]

10 Observed variables divided into exclusive “blocks” [Diagram: proxies A*, B*, C*, D*, each with its own indicator block a1, a2, a3, ...; b1, b2, b3, ...; c1, c2, c3, ...; d1, d2, d3, ...]

11 Alternating proxies stand in for conceptual variables

12 Outer proxy: weighted sum of the indicators for that block [Diagram: proxies A*, B*, C*, D* with their indicator blocks]

13 Inner proxy: weighted sum of directly connected outer proxies [Diagram: proxies A*, B*, C*, D* with their indicator blocks and paths g1, g2, g3]
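The two kinds of proxy are just weighted sums. A minimal NumPy sketch of the arithmetic; the data, weights, and block structure here are all hypothetical, invented only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(v):
    """Scale a vector to mean 0, standard deviation 1."""
    return (v - v.mean()) / v.std()

# Hypothetical block of three indicators (n = 100 cases) and outer weights.
X_a = rng.normal(size=(100, 3))
w_a = np.array([0.5, 0.3, 0.2])

# Outer proxy: standardized weighted sum of the block's own indicators.
A_outer = standardize(X_a @ w_a)

# Inner proxy for A*: weighted sum of the outer proxies of the structural
# variables directly connected to A* (here, hypothetically, B* and C*).
B_outer = standardize(rng.normal(size=100))
C_outer = standardize(rng.normal(size=100))
inner_w = np.array([1.0, -1.0])   # e.g., signs of correlations with A*
A_inner = standardize(inner_w[0] * B_outer + inner_w[1] * C_outer)
```

Both proxies end up standardized, in keeping with the routine standardization described on slide 20.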

14 Inner weights and outer weights: inner weights link proxies to other proxies; outer weights link proxies to their associated observed variables [Diagram: proxies A*, B*, C*, D* with indicator blocks and paths g1, g2, g3]

15 Inner weights are estimated in regressions using the outer proxies A*, B*, C*, D*

16 Outer weights are estimated using inner proxies, generally via either “Mode B” or “Mode A”; each block must use one mode exclusively

17 Mode B regresses each inner proxy on its associated indicators as a set [Diagram: A* regressed on a1, a2, a3, with residual eA]

18 Mode A regresses each indicator, one at a time, on its associated inner proxy
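Slides 16-18 can be condensed into a few lines. The sketch below uses simulated data (a hypothetical block X and inner proxy), not output from any PLS package: with standardized variables, Mode B weights are ordinary multiple-regression coefficients, while Mode A weights reduce to zero-order correlations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical standardized indicators and an inner proxy they partly determine.
X = rng.normal(size=(n, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)
proxy = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(size=n)
proxy = (proxy - proxy.mean()) / proxy.std()

# Mode B: regress the inner proxy on the block's indicators as a set
# (ordinary multiple-regression weights).
w_modeB, *_ = np.linalg.lstsq(X, proxy, rcond=None)

# Mode A: regress each indicator, one at a time, on the inner proxy.
# With standardized variables each slope is just the zero-order correlation.
w_modeA = X.T @ proxy / n
```

This is why slide 22 frames the choice as “regression weights vs. correlation weights” rather than “formative vs. reflective.”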

19 This looks a bit like factor analysis, but: (1) estimation is not simultaneous, and (2) the residuals are not formally part of the model and are essentially unconstrained [Diagram: indicators a1, a2, a3 each regressed on A*, with residuals e1, e2, e3]

20 PLS routinely standardizes both indicators and structural variables. Unstandardized: high-variance components dominate composites. Standardized: high-correlation components dominate composites

21 Standardization also means that single-predictor regressions can be reversed without changing the coefficients
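A quick numerical check of this reversibility, on simulated x and y (standardized here with the population formula):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1000)
y = 0.6 * x + rng.normal(size=1000)

# Standardize both variables (mean 0, standard deviation 1).
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# A single-predictor OLS slope on standardized data equals the correlation,
# so regressing y on x and x on y yield the same coefficient.
b_yx = (zx @ zy) / (zx @ zx)
b_xy = (zy @ zx) / (zy @ zy)
assert np.isclose(b_yx, b_xy)
```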

22 Standardized, hence reversible... one at a time, so no correlation among predictors... Mode B vs. Mode A is not “formative vs. reflective” but “regression weights vs. correlation weights.” Becker et al. (2013), Rigdon (2012)

23 [Figure: OLS regression weights vs. correlation weights. Dana & Dawes (2004), Waller & Jones (2010)]

24 OLS regression weights (Mode B) maximize in-sample R², but are best out-of-sample only when n and true predictability are high. Dana & Dawes (2004), Becker et al. (2013)
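This trade-off is easy to see in simulation. The sketch below is purely illustrative (a hypothetical population with collinear predictors and a small training sample); under such conditions, correlation weights frequently predict as well as or better than OLS weights on fresh data.

```python
import numpy as np

rng = np.random.default_rng(3)

def r2(y, yhat):
    """Squared correlation between outcome and prediction."""
    return np.corrcoef(y, yhat)[0, 1] ** 2

# Hypothetical population: three collinear predictors, modest true R².
n_train, n_test, p = 30, 10000, 3
beta = np.array([0.3, 0.2, 0.1])

def draw(n):
    # A shared component across columns induces collinearity.
    X = rng.normal(size=(n, p)) + 0.7 * rng.normal(size=(n, 1))
    y = X @ beta + rng.normal(size=n)
    return X, y

X_tr, y_tr = draw(n_train)
X_te, y_te = draw(n_test)

# OLS regression weights (Mode-B style) vs. correlation weights (Mode-A style).
w_ols, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
w_cor = np.array([np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(p)])

# Compare out-of-sample predictive R² on the held-out sample.
r2_ols = r2(y_te, X_te @ w_ols)
r2_cor = r2(y_te, X_te @ w_cor)
```

Because the simulation is stochastic, the winner varies by seed; the systematic pattern reported by Dana & Dawes (2004) emerges over many replications.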

25 [Figure: Dana & Dawes (2004)]

26 And correlation weights avoid the surprise of “incorrectly” signed weights that can emerge due to collinearity

27 There are multiple schemes for forming inner proxies, too, but differences in results are minor

29 “Loadings” (zero-order correlations between indicators and each structural variable) and structural path coefficients are by-products, not model parameters

30 Significance: standard errors estimated via bootstrapping, to minimize distributional assumptions
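A minimal sketch of the bootstrap idea, using simulated proxy scores for one path. Everything here is hypothetical, and real PLS software re-runs the entire estimation algorithm on each resample rather than just recomputing one coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical standardized proxy scores for a single structural path B* -> C*.
n = 200
b_star = rng.normal(size=n)
c_star = 0.4 * b_star + rng.normal(size=n)

def path_coef(x, y):
    """Standardized single-predictor slope = correlation."""
    return np.corrcoef(x, y)[0, 1]

# Nonparametric bootstrap: resample cases with replacement, re-estimate,
# and take the SD of the resampled estimates as the standard error,
# without assuming any particular distribution for the data.
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, size=n)
    boot[b] = path_coef(b_star[idx], c_star[idx])

se = boot.std(ddof=1)
```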

31 History and rationale: launched by econometrician Herman Wold in the 1970s at Wharton, inspired by his student Karl Jöreskog's factor-based innovations in the 1960s at Uppsala

32 THEN: why use PLS path modeling? To approximate ML factor analysis, without ML's sample size, distribution, and computing-capacity requirements, sacrificing ML's accuracy for a more interactive modeling experience, with easy real-world application

33 Guide & Ketokivi (2015): Journal of Operations Management desk-rejecting essentially all submissions that use PLS path modeling

34 Today: bad arguments for using PLS: “non-normal data,” “exploratory,” “low sample size”

35 “My data are non-normal”: multinormality is an assumption of ML estimation, not of factor-based SEM generally; ML is robust against modest deviations; and other estimators accommodate non-normality

36 “This is an exploratory study”: all studies are, to some degree. With a formal model, hypotheses, and an instrument, what is the contribution? The claim defies validation

37 “My sample size is low”: get more data; and at low n, equal weights outperform (Dana & Dawes 2004; Becker et al. 2013)

38 [Figure: Dana & Dawes (2004)]

39 Invalid arguments against using PLS path modeling: biased parameter estimates; no overall fit test; not a latent variable method; doesn't deal with measurement error

40 PLS yields biased estimates of factor-model parameters: (outer) loadings over-estimated, (inner) path coefficients under-estimated

41 PLS yields consistent estimates of composite-model parameters. Becker et al. (2013)

42 [Figure: Becker et al. (2013)]

43 PLS can't test / falsify models the same way factor methods can: PLS path modeling lacks an overall fit statistic like factor-based SEM's χ². That is true, but... what exactly does χ² tell us?

44 “If a value of χ² is obtained, which is large compared to the number of degrees of freedom, this is an indication that more information can be extracted from the data.” Jöreskog (1969, p. 201)

45 Does it matter if the observed variables contain additional information? [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

46 “PLS is not a latent variable method”: this depends on what “latent variable” means. If it means “common factor,” that's right, but so what? If it means “unobserved variable with causal influence,” that's not right

47 A framework [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

48 “PLS does not account for measurement error”: neither does factor analysis

49 Isn't this measurement error?

50 [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

51 Factor model residuals and “factor indeterminacy”

52 Rank of a covariance matrix: with p observed variables, the covariance matrix has rank at most p, so the data contain at most p distinct dimensions of information (Mulaik 2010)
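The rank bound is easy to verify numerically; this quick check on simulated data is an addition for illustration, not part of the original slides:

```python
import numpy as np

rng = np.random.default_rng(5)

# With p observed variables, the covariance matrix is p x p,
# so its rank can be at most p.
n, p = 100, 6
Y = rng.normal(size=(n, p))
S = np.cov(Y, rowvar=False)
assert np.linalg.matrix_rank(S) <= p

# With fewer cases than variables, the rank drops below p
# (at most n - 1 after mean-centering).
Y_small = rng.normal(size=(4, p))
S_small = np.cov(Y_small, rowvar=False)
assert np.linalg.matrix_rank(S_small) < p
```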

53 Rank of the covariance matrix of F and E: the factor model specifies Y = ΛF + E, with all factors F independent of all unique terms E. The joint covariance matrix of F and E will generally have rank p + k (k factors plus p unique terms)

54 Factor indeterminacy: solving for F in the factor model gives F = P′Y + S, where P is a weight matrix closely tied to R²(F,Y), the R² for the factors as predicted by all the observed variables in the model, and S is a set of arbitrary vectors, one per factor. Guttman (1955); Schönemann & Steiger (1976)

55 S is arbitrary, not random: same variance as its associated factor F; orthogonal to every other variable within the model; may be correlated (+ or −) with any variable outside the model. Guttman (1955); Schönemann & Steiger (1976)

56 The conceptual variable is outside the statistical model [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations]

57 Factor indeterminacy blurs the correlation between a common factor and any outside variable; clarity comes only from reducing factor indeterminacy. Steiger (1996)

58 Determinacy index: Guttman's ρmin, the minimum correlation between different, equally correct realizations of the same common factor in the same model
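Guttman's index follows from the determinacy ρ², the squared multiple correlation for predicting the factor from the observed variables: ρmin = 2ρ² − 1. The sketch below computes it for a simplified single-factor model with m parallel indicators of loading lam; note that the table on slide 59 assumes four correlated factors, which raises determinacy, so its values are higher than this single-factor case.

```python
# Guttman's minimum correlation between equally valid realizations of a
# common factor: rho_min = 2 * rho^2 - 1, where rho^2 is the squared
# multiple correlation for predicting the factor from the observed variables.
# Simplified single-factor illustration with m parallel indicators of
# loading lam (hypothetical; ignores the factor correlations in the slides).

def rho_min(lam: float, m: int) -> float:
    rho_sq = m * lam**2 / (1 + (m - 1) * lam**2)
    return 2 * rho_sq - 1

# More indicators per factor, or higher loadings, push rho_min toward 1.
assert rho_min(0.7, 8) > rho_min(0.7, 4) > rho_min(0.7, 2)
assert rho_min(0.9, 4) > rho_min(0.5, 4)
```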

59 Values of Guttman's ρmin: four common factors correlated 0.7, congeneric (actually, parallel) indicators

Loading | 2 indicators | 4 | 6 | 8
.5      | .12 | .38 | .50 | .58
.6      | .32 | .54 | .64 | .70
.7      | .49 | .67 | .75 | .80
.8      | .65 | .79 | .85 | .88
.9      | .82 | .90 | .93 | .95

60 Observed-variable residuals in factor-based SEM are repackaged as factor indeterminacy, and continue to threaten the validity of inferences about conceptual variables

61 Summing up...

62 Both approaches, factor-based and composite-based, can be used to model and learn about relations between conceptual variables... [Diagram: conceptual variables A, B, C, D with paths g1, g2, g3]

63 ... by building empirical proxies, formed out of data [Diagram: proxies A*, B*, C*, D*, each with its indicator block a1, a2, a3, ...; b1, b2, b3, ...; c1, c2, c3, ...; d1, d2, d3, ...]

64 A framework [Diagram: Conceptual Variable, Proxy, Observed Variables, Mathematical Operations, with (un)reliability and (in)validity marking the gaps]

65 Thank you

66 References
Becker, J.-M., Rai, A., Rigdon, E.E. (2013). Predictive validity and formative measurement in structural equation modeling: Embracing practical relevance. Thirty-Fourth International Conference on Information Systems.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research, 18, 115-126.
Dana, J., Dawes, R.M. (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics, 29(3), 317-331.
De Leeuw, J. (1985). Reviews. Psychometrika, 50(3), 371-375.
Guide, V.D.R., Ketokivi, M. (2015). Notes from the editors: Redefining some methodological criteria for the journal. Journal of Operations Management, 37, v-viii.
Guttman, L. (1955). The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. British Journal of Statistical Psychology, 8(2), 65-81.
Jöreskog, K.G. (1969). A general approach to maximum likelihood confirmatory factor analysis. Psychometrika, 34(2), 183-202.
Mulaik, S.A. (2010). Foundations of Factor Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall / CRC.
Rigdon, E.E. (2012). Rethinking partial least squares path modeling: In praise of simple methods. Long Range Planning, 45(5-6), 341-358.
Schönemann, P.H., Steiger, J.H. (1976). Regression component analysis. British Journal of Mathematical and Statistical Psychology, 29(2), 175-189.
Steiger, J.H. (1996). The relationship between external variables and common factors. Psychometrika, 44(1), 93-97.
Waller, N., Jones, J. (2010). Correlation weights in multiple regression. Psychometrika, 75(1), 58-69.

