Partial Least Squares
A very brief intro
Multivariate regression

The multiple regression approach creates a single linear combination of the predictors that best correlates with the outcome.

With principal components regression, we first create several linear combinations (equal to the number of predictors) and then use those composites, instead of the original predictors, to predict the outcome.
–Components are independent, which helps with collinearity
–We can use fewer components than predictors while still retaining most of the predictor variance
–A short R sketch of this idea follows the diagram below
[Diagram: under Multiple Regression, the predictors X1–X4 feed a single linear composite that predicts the Outcome; under Principal Components Regression, X1–X4 first form linear composites (the components), and those new composites predict the Outcome.]

Note the bold: we are dealing with vectors and matrices. In multiple regression the outcome is predicted directly from the predictors, y = XB; in PC regression the components are formed first, T = XW, and then used to predict the outcome, y = TQ. Here T refers to our components, and W and Q are coefficient vectors, just as B is above.
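As a rough illustration of the two approaches, here is a minimal R sketch of multiple regression versus principal components regression done "by hand" with prcomp() and lm(); the simulated data frame and variable names are placeholders, not the lecture's data.

## Sketch: MR vs. PCR by hand (simulated placeholder data)
set.seed(123)
dat <- data.frame(matrix(rnorm(400), ncol = 4))
names(dat) <- c("X1", "X2", "X3", "X4")
dat$y <- with(dat, X1 + 0.5 * X2 + rnorm(100))

# Multiple regression: one linear composite of the predictors
mr_hand <- lm(y ~ X1 + X2 + X3 + X4, data = dat)

# PCR: form orthogonal components first, then regress on a subset of them
pc       <- prcomp(dat[, c("X1", "X2", "X3", "X4")], scale. = TRUE)
scores   <- pc$x[, 1:2]            # keep only the first two components
pcr_hand <- lm(dat$y ~ scores)

summary(pc)         # variance retained by each component
summary(pcr_hand)   # regression on the composites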
Partial Least Squares

Partial Least Squares is just like PC regression except in how the component scores are computed:
–PC regression: weights are calculated from the covariance matrix of the predictors
–PLS: weights reflect the covariance structure between the predictors and the response
–While conceptually not too much of a stretch, this requires a more complicated iterative algorithm; the NIPALS and SIMPLS algorithms are probably the most common

As in regression, the goal is to maximize the correlation between the response(s) and the component scores (a conceptual sketch follows below).
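To make the difference concrete, here is a small conceptual sketch in R (not the NIPALS or SIMPLS algorithms themselves, and the data are arbitrary): the first PCR weight vector comes only from the covariance of the predictors, while the first PLS weight vector is built from the predictor–response covariance.

## Sketch: where the first component's weights come from
set.seed(42)
X <- scale(matrix(rnorm(200), ncol = 4))
y <- scale(X %*% c(1, 0, 0.5, 0) + rnorm(50))

# PCR: weights are the leading eigenvector of the predictor covariance matrix
w_pcr <- eigen(cov(X))$vectors[, 1]

# PLS: weights are proportional to the predictor-response covariance, X'y
w_pls <- crossprod(X, y)
w_pls <- w_pls / sqrt(sum(w_pls^2))   # scale to unit length

# Component scores are the corresponding linear composites of X
t_pcr <- X %*% w_pcr
t_pls <- X %*% w_pls
cor(t_pcr, y); cor(t_pls, y)   # the PLS score typically tracks y more closely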
Example

Download the PCA R code again; it requires the pls package.

Question: do consumers' ratings of various beer aspects associate with their SES?
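A minimal setup sketch, assuming the ratings live in a CSV file; the file name beer.csv is a guess, and the variable names follow the output on the slides that follow.

library(pls)

## Hypothetical file name; columns assumed from the output shown below
beer <- read.csv("beer.csv")   # COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR, TASTE, SES
beer <- na.omit(beer)          # listwise deletion, mirroring the lm() output below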
Multiple regression

All predictors are statistically significant correlates of SES, and almost all of the variance is accounted for (98.7%).

[lm() summary: the intercept is significant at p < .001; ALCOHOL and COST at p < .05; COLOR at p < .01; AROMA, REPUTAT, SIZE, and TASTE at p < 2e-16. Residual standard error on 212 degrees of freedom (11 observations deleted due to missingness). Multiple R-squared: 0.987; F-statistic: 2305 on 7 and 212 DF, p-value < 2.2e-16.]
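A call like the following would produce a summary of this form (a sketch; beer is the data frame assumed in the setup sketch above).

## Multiple regression of SES on the seven beer-aspect ratings
mr_fit <- lm(SES ~ COST + SIZE + ALCOHOL + REPUTAT + AROMA + COLOR + TASTE,
             data = beer)
summary(mr_fit)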
PC Regression

For the first 3 components: the first component accounts for 53.4% of the variance in the predictors, but only 33% of the variance in the outcome. With the second and third components, the vast majority of the variance in both the predictors and the outcome is accounted for. The loadings break down according to a PCA of the predictors.

[pcr() summary: fit method svdpc, 3 components considered; % variance explained in X and in SES for 1–3 components; loadings of COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR, and TASTE on Comps 1–3, with SS loadings and proportion/cumulative variance.]
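A corresponding pcr() call from the pls package might look like this (a sketch; svdpc is the package's default method for pcr(), and scaling the predictors is an assumption on my part).

## PC regression with 3 components
pcr_fit <- pcr(SES ~ COST + SIZE + ALCOHOL + REPUTAT + AROMA + COLOR + TASTE,
               ncomp = 3, data = beer, scale = TRUE)
summary(pcr_fit)    # % variance explained in X and in SES
loadings(pcr_fit)   # predictor loadings on the 3 components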
PLS Regression

For the first 3 components: the first component accounts for 44.8% of the variance in the predictors (almost 10% less than PCR) and 90% of the variance in the outcome (far more than PCR). The loadings are notably different from those of the PC regression.

[plsr() summary: fit method kernelpls, 3 components considered; % variance explained in X and in SES for 1–3 components; loadings of COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR, and TASTE on Comps 1–3, with SS loadings and proportion/cumulative variance.]
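The analogous plsr() call is below (again a sketch; kernelpls is the package's default algorithm, and method = "oscorespls" would give the classical NIPALS orthogonal-scores fit instead).

## PLS regression with 3 components
pls_fit <- plsr(SES ~ COST + SIZE + ALCOHOL + REPUTAT + AROMA + COLOR + TASTE,
                ncomp = 3, data = beer, scale = TRUE)
summary(pls_fit)    # % variance explained in X and in SES
loadings(pls_fit)   # predictor loadings on the 3 components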
Comparison of coefficients

[Table: regression coefficients for the intercept, COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR, and TASTE under multiple regression (MR), PC regression, and PLS.]
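One way to assemble such a comparison, assuming the mr_fit, pcr_fit, and pls_fit objects from the sketches above:

## Implied regression coefficients from each model, side by side
cbind(MR  = coef(mr_fit),
      PCR = drop(coef(pcr_fit, ncomp = 3, intercept = TRUE)),
      PLS = drop(coef(pls_fit, ncomp = 3, intercept = TRUE)))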
Why PLS?

PLS extends to multiple outcomes and allows for dimension reduction (see the sketch below). It is also less restrictive in terms of assumptions than MR:
–Distribution free
–No collinearity
–Independence of observations not required

Unlike PCR, it creates components with an eye to the predictor–DV relationship. Unlike canonical correlation, it maintains the predictive nature of the model. While similar interpretation is possible, depending on your research situation and goals, any of these may be a viable analysis.
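For the multiple-outcome case, the pls package accepts a matrix response; a sketch with a hypothetical second outcome (SES2) might look like this.

## PLS with two outcomes; SES2 is purely illustrative, not in the lecture data
pls_multi <- plsr(cbind(SES, SES2) ~ COST + SIZE + ALCOHOL + REPUTAT +
                    AROMA + COLOR + TASTE,
                  ncomp = 3, data = beer, scale = TRUE)
summary(pls_multi)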