
1. Comparison of Two Methods: CCR and PLS-Regression (PLS-R)

Copyright © 2011 by Statistical Innovations Inc. All rights reserved.

Scope: regression with a single dependent variable Y and many correlated predictors.

2. Some Differences Between PLS-R and CCR (K < P)

           Invariant to predictor scaling?   Components correlated?
  PLS-R    NO                                NO
  CCR      YES                               YES

As in traditional regression, predictions obtained from CCR are invariant to any linear transformation of the predictors. Predictions obtained from PLS-R, like those from penalized regression methods, are not invariant.
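The invariance property claimed for CCR is the same one ordinary least squares has, and that part is easy to verify. A minimal sketch using OLS on synthetic data (numpy only; the data and rescaling factor are illustrative assumptions, not from the slides): rescaling a predictor changes its coefficient but leaves the fitted values unchanged.

```python
import numpy as np

# Minimal check of the invariance claim, using OLS on synthetic data:
# rescaling a predictor (a linear transformation) leaves the fitted
# values unchanged.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=40)

def ols_fitted(X, y):
    """Fitted values from an OLS regression with intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return Xd @ beta

X2 = X.copy()
X2[:, 0] *= 100.0  # replace x_1 by 100 * x_1
assert np.allclose(ols_fitted(X, y), ols_fitted(X2, y))
```

A PLS-R fit with K < P components run on `X` and `X2` would not pass this check, which is the point of the slide.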

3. Partial Least Squares Regression (PLS-R)

Idea: replace the P predictors x_g, g = 1, 2, ..., P, by K ≤ P orthonormal* predictive components v_1, v_2, ..., v_K.
(*Orthogonal and standardized to have variance 1; Y and the Xs are assumed centered.)

Initialize the algorithm: set k = 1 and, for each g, set x_g^(1) = x_g.
Step 1: Compute v_k = (1/s_k) Σ_g cov(Y, x_g^(k)) x_g^(k): each x_g^(k) is weighted by its covariance with Y, and the sum is divided by a normalizing constant s_k.
Step 2 ("deflation" step): For each g, set x_g^(k+1) to the component of x_g^(k) orthogonal to v_1, ..., v_k.
Step 3: Increment k = k + 1 and return to Step 1.
When finished, express each component in terms of the original Xs ("restoration" step).
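The steps above can be sketched in numpy. This is a hedged reading of the slide's algorithm (the standard single-response PLS recursion: covariance weights, deflation, restoration), not the authors' code; variable names and the unit-norm convention are my choices.

```python
import numpy as np

def pls1(X, y, K):
    """PLS regression for a single response: a sketch of the slide's
    algorithm (weights from covariances with y, deflation, restoration).
    Returns coefficients on the original Xs plus an intercept."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    Xc = X - X.mean(axis=0)          # Y and Xs are assumed centered
    yc = y - y.mean()
    Xk = Xc.copy()
    P = X.shape[1]
    W = np.zeros((P, K)); L = np.zeros((P, K)); q = np.zeros(K)
    for k in range(K):
        w = Xk.T @ yc                # weight each x_g by its cov with y
        w /= np.linalg.norm(w)       # normalizing constant s_k
        t = Xk @ w                   # component v_k
        tt = t @ t
        L[:, k] = Xk.T @ t / tt      # loadings of the deflated Xs on v_k
        q[k] = yc @ t / tt
        Xk = Xk - np.outer(t, L[:, k])   # deflation step
        W[:, k] = w
    # restoration step: express the fit in terms of the original Xs
    B = W @ np.linalg.solve(L.T @ W, q)
    return B, y.mean() - X.mean(axis=0) @ B
```

With K = P and full-rank predictors the components span the whole predictor space, so `pls1(X, y, K=P)` reproduces the OLS coefficients.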

4. Correlated Component Regression*

Correlated Component Regression (CCR) utilizes K correlated components, each a linear combination of the predictors, to predict an outcome variable.

- The first component S_1 captures the effects of predictors that have direct effects on the outcome; it is a weighted average of all 1-predictor effects.
- The second component S_2, correlated with S_1, captures the effects of suppressor variables that improve prediction by removing extraneous variation from one or more of the predictors that have direct effects.
- Additional components are included if they improve prediction.

Prime predictors (those having direct effects) are identified as those having substantial loadings on S_1; proxy predictors (suppressor variables) as those having substantial loadings on S_2 and relatively small loadings on S_1.

Simultaneous variable reduction is achieved using a step-down algorithm: at each step the least important predictor is removed, importance being defined by the absolute value of the standardized coefficient. M-fold cross-validation (CV) is used to determine the number of components K and the number of predictors P.

*Implemented in the CORExpress® program; patent pending regarding this technology.
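The step-down idea can be illustrated with a short sketch. Caveat: to keep the example self-contained, the ranking below uses plain OLS standardized coefficients, whereas the actual CCR algorithm ranks predictors by their CCR standardized coefficients and uses M-fold CV to pick K and P.

```python
import numpy as np

def step_down_order(X, y):
    """Step-down elimination sketch: repeatedly drop the predictor with the
    smallest absolute standardized coefficient. Uses OLS in place of CCR
    coefficients purely for illustration."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    idx = list(range(X.shape[1]))
    order = []                       # elimination order, least important first
    while len(idx) > 1:
        Z = Xc[:, idx] / Xc[:, idx].std(axis=0, ddof=1)  # standardize
        coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        order.append(idx.pop(int(np.argmin(np.abs(coef)))))
    order.append(idx[0])
    return order
```

On data where one predictor is pure noise, that predictor is eliminated first.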

5. Example: Correlated Component Regression Estimation Algorithm as Applied to Predictors in Linear Regression: CCR-lm

Step 1: Form the first component S_1 as the average of the P 1-predictor models (ignoring the intercepts α_g): for each g = 1, 2, ..., P, estimate Y = α_g + β_g x_g, and set S_1 = (1/P) Σ_g β_g x_g. The 1-component model is then Y = a + b_1 S_1.
Step 2: Form the second component S_2 = (1/P) Σ_g γ_g x_g, where each γ_g is estimated from the following 2-predictor model: Y = α_g + λ_g S_1 + γ_g x_g, g = 1, 2, ..., P.
Step 3: Estimate the 2-component model using S_1 and S_2 as predictors: Y = a + b_1 S_1 + b_2 S_2.
Continue for K = 3, 4, ..., up to the K*-component model. For example, for K = 3, Step 2 becomes: estimate each γ_g from the 3-predictor model Y = α_g + λ_1g S_1 + λ_2g S_2 + γ_g x_g.
Final regression coefficients on the original predictors are obtained by OLS regression of Y on the components.
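One hedged reading of these steps in numpy (illustrative only, not the CORExpress implementation): component S_k averages the P per-predictor coefficients, each estimated alongside the components already formed, and the final coefficients come from OLS of Y on the components.

```python
import numpy as np

def ccr_lm(X, y, K):
    """Sketch of the CCR-lm steps above. S_k is the average of P
    per-predictor coefficients, each estimated from a model containing
    S_1, ..., S_{k-1} and one predictor x_g; final coefficients come from
    OLS of y on the K components."""
    n, P = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    S = np.empty((n, 0))             # components formed so far
    Wcols = []
    for k in range(K):
        w = np.zeros(P)
        for g in range(P):
            Z = np.column_stack([S, Xc[:, g]])
            coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
            w[g] = coef[-1]          # coefficient of x_g given S_1..S_{k-1}
        w /= P                       # S_k = average of the P effects
        S = np.column_stack([S, Xc @ w])
        Wcols.append(w)
    a, *_ = np.linalg.lstsq(S, yc, rcond=None)   # OLS on the components
    B = np.column_stack(Wcols) @ a   # coefficients on the original Xs
    return B, y.mean() - X.mean(axis=0) @ B
```

With K = 1 the weights reduce to the 1-predictor slopes cov(Y, x_g)/var(x_g); with K = P on generic full-rank data the components span the predictors, so the result matches OLS.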

6. PLS-R is Sensitive to Predictor Scale

Predictions for Y obtained from a PLS-R model with K < P components depend upon the relative scales of the predictors. If x_1 is replaced by x*_1 = c·x_1, where c > 0:
- for c > 1, the 1-component model (PLS1) will tend to give increased weight to x_1;
- for c < 1, the 1-component model (PLS1) will tend to give decreased weight to x_1.

Example: N = 24 car models*
- Y = PRICE (car price in francs)
- X_1 = CYLINDER (engine displacement in cubic centimeters)
- X_2 = POWER (horsepower)
- X_3 = SPEED (top speed in kilometers/hour)
- X_4 = WEIGHT (kilograms)
- X_5 = LENGTH (centimeters)
- X_6 = WIDTH (centimeters)

Predictor   Std. Dev.
CYLINDER    527.9
POWER       38.8
SPEED       25.2
WEIGHT      230.3
LENGTH      41.3
WIDTH       7.7

How do the results differ if we use standardized predictors (= predictor/StdDev)?

*Data source: Michel Tenenhaus
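The c > 1 / c < 1 behavior follows from the fact that first-component PLS weights are proportional to cov(Y, x_g), so multiplying x_1 by c multiplies its weight by c before normalization. A small synthetic demonstration (not the Tenenhaus car data):

```python
import numpy as np

# Synthetic demonstration of the scale sensitivity described above: the
# PLS1 weight of a predictor grows in proportion to its rescaling factor.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 1.0, 1.0]) + 0.5 * rng.normal(size=100)
yc = y - y.mean()

def pls1_weights(X, yc):
    """First-component PLS weights: cov(y, x_g), normalized to unit length."""
    Xc = X - X.mean(axis=0)
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

w_orig = pls1_weights(X, yc)
Xs = X.copy()
Xs[:, 0] *= 10.0                 # replace x_1 by c * x_1 with c = 10
w_scaled = pls1_weights(Xs, yc)
# the weight of x_1 relative to x_2 is multiplied by exactly c = 10
```

This is the mechanism behind the CYLINDER column on the next slide: its large standard deviation (527.9) makes it dominate the unstandardized PLS1 fit.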

7. For PLS-R, Scale Affects Relative Predictor Importance and the Optimal # of Components

Implied relative importance of predictors is based on standardized coefficients; the number of components K is determined by cross-validated R² (CV-R²).

1-component models:

                PLS1 (K=1)   PLS1, stdzd predictors (K=1)   CCR1 (K=1)
Training R²     0.74         0.79                           0.79
CV-R²           0.70         0.74                           0.75

Standardized coefficients (stdzd-predictor models use the Z-prefixed predictors):

Predictor   PLS1    PLS1 (stdzd)   CCR1
CYLINDER    0.73    0.18           0.18
POWER       0.00    0.19           0.19
SPEED       0.00    0.16           0.16
WEIGHT      0.13    0.18           0.18
LENGTH      0.00    0.16           0.16
WIDTH       0.00    0.13           0.13

Best models by CV-R²:

                PLS3 (K=3)   PLS2, stdzd predictors (K=2)   CCR2 (K=2)
Training R²     0.83         0.81                           0.82
CV-R²           0.69         0.76                           0.75

Predictor   PLS3     PLS2 (stdzd)   CCR2
CYLINDER    -0.02    0.19           0.19
POWER       0.43     0.31           0.37
SPEED       0.17     0.22           0.20
WEIGHT      0.48     0.18           0.17
LENGTH      -0.05    0.08           0.02
WIDTH       0.00     0.01           0.05

Relative importance obtained from PLS-R is sensitive to the scaling of the predictors (.73 vs. .18 for CYLINDER). An additional component is required due to scale: K* = 3 (original scale) vs. K* = 2 (standardized). Overall, the importance of CYLINDER goes from unimportant (-.02 with the original scale) to important (.19 standardized).

8. Relationships for 1-Component Models

Unstandardized predictors:
- With P = 1 predictor, the model is saturated (K = P), so CCR1 = PLS1 = OLS; the regression coefficient estimate is COV(Y,X)/VAR(X).
- With P > 1 predictors, CCR1 and PLS1 can differ considerably:
  - coefficient estimates for CCR1 are proportional to COV(Y,X_g)/VAR(X_g);
  - coefficient estimates for PLS1 are proportional to COV(Y,X_g), so predictors with larger variance have larger weight and may dominate.

Standardized predictors: since VAR(X_g) = 1 for all g = 1, 2, ..., P, COV(Y,X_g)/VAR(X_g) = COV(Y,X_g), and CCR1 = PLS1 (K = 1).
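These relationships can be checked numerically. A sketch using sample covariances and variances (numpy only; the mixed-scale data are an illustrative assumption): the PLS1 weights on standardized predictors, mapped back to the original scale, coincide with the CCR1 weights COV(Y,X_g)/VAR(X_g), while the unstandardized PLS1 and CCR1 directions differ.

```python
import numpy as np

# Numerical check of the 1-component relationships above.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4)) * np.array([1.0, 5.0, 0.2, 2.0])  # mixed scales
y = X.sum(axis=1) + rng.normal(size=50)

Xc = X - X.mean(axis=0)
yc = y - y.mean()
n = len(y)
cov = Xc.T @ yc / (n - 1)        # COV(Y, X_g)
var = Xc.var(axis=0, ddof=1)     # VAR(X_g)

w_pls1 = cov                     # PLS1: proportional to COV(Y, X_g)
w_ccr1 = cov / var               # CCR1: proportional to COV/VAR

sd = np.sqrt(var)
Z = Xc / sd                      # standardized predictors, VAR(Z_g) = 1
covz = Z.T @ yc / (n - 1)        # PLS1 weight on Z_g = COV(Y, X_g)/sd_g
# a coefficient b on z_g = x_g/sd_g is b/sd_g on x_g, i.e. COV/VAR = CCR1
```

So standardizing the predictors makes the two methods' first components agree, which is exactly the CCR1 = PLS1 equality on the slide.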

9. Example: PLS2 with Unstandardized Predictors

CV-R² as a function of # predictors: selection counts for each predictor across 10 cross-validation rounds ("All" is the row total):

Predictor      All    1   2   3   4   5   6   7   8   9  10
CYLINDER        49    4   6   6   4   6   5   4   5   4   5
WEIGHT          49    4   5   6   5   5   6   4   4   5   5
POWER           28    4   1   6   3   1   1   4   3   3   2
SPEED            6    0   0   6   0   0   0   0   0   0   0
LENGTH           6    0   0   6   0   0   0   0   0   0   0
Total          138   12  12  30  12  12  12  12  12  12  12
# Predictors          2   2   5   2   2   2   2   2   2   2

Training R² = 0.74; CV-R² = 0.69 (.05)

Standardized coefficients: CYLINDER 0.64, WEIGHT 0.23

10. Example: PLS2 with Standardized Predictors

CV-R² as a function of # predictors: selection counts for each predictor across 10 cross-validation rounds ("All" is the row total):

Predictor      All    1   2   3   4   5   6   7   8   9  10
ZPOWER          60    6   6   6   6   6   6   6   6   6   6
ZWEIGHT         60    6   6   6   6   6   6   6   6   6   6
ZCYLINDER       52    5   5   5   5   5   5   6   5   6   5
ZSPEED          25    1   1   5   1   0   1   5   1   5   5
ZLENGTH          7    0   0   2   0   1   0   1   0   1   2
Total          204   18  18  24  18  18  18  24  18  24  24
# Predictors          3   3   4   3   3   3   4   3   4   4

Training R² = 0.84; CV-R² = 0.78 (.02)

Standardized coefficients: ZPOWER 0.58, ZCYLINDER 0.20, ZWEIGHT 0.19

11. Example: 2-Component CCR Model (CCR2)

CV-R² as a function of # predictors: selection counts for each predictor across 10 cross-validation rounds ("All" is the row total):

Predictor      All    1   2   3   4   5   6   7   8   9  10
POWER           60    6   6   6   6   6   6   6   6   6   6
WEIGHT          59    6   6   6   6   6   5   6   6   6   6
SPEED           27    3   6   3   3   2   0   3   4   0   3
CYLINDER        23    2   6   3   2   3   1   3   1   0   2
LENGTH          10    1   6   0   0   1   0   0   1   0   1
WIDTH            7    0   6   0   1   0   0   0   0   0   0
Total          186   18  36  18  18  18  12  18  18  12  18
# Predictors          3   6   3   3   3   2   3   3   2   3

Training R² = 0.84; CV-R² = 0.77 (.03)

Standardized coefficients: POWER 0.45, WEIGHT 0.44, SPEED 0.10

