Published by Kerrie Baldwin. Modified over 9 years ago.
1
From linearity to nonlinear additive spline modeling in Partial Least-Squares regression Jean-François Durand Montpellier II University Scuola della Società Italiana di Statistica, Capua 2004/09/15
2
Main effects Linear Partial Least-Squares (PLSL)
Learning data matrices: X (n x p), r = rank(X), and Y (n x q)
p predictors (continuous or categorical)
q responses (continuous or categorical): continuous responses give a regression model; q indicator variables give a classification model
All variables are standardized with respect to
3
k latent variables: algorithm (1) (2)
Once a latent variable is obtained, « partial » regressions on it are made, and the next latent variable is computed on the remaining information.
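The slide's formulas (1) and (2) did not survive extraction, so the following is only a minimal sketch of one common PLS variant (weights from the leading singular vector of the cross-product matrix, then deflation by partial regressions). The function names `standardize` and `pls_fit`, and the use of uniform observation weights, are assumptions made for illustration, not the author's notation.

```python
import numpy as np

def standardize(A):
    """Center and scale each column (uniform observation weights assumed)."""
    A = np.asarray(A, dtype=float)
    return (A - A.mean(axis=0)) / A.std(axis=0)

def pls_fit(X, Y, k):
    """Return the (p x q) coefficient matrix of a k-component PLS regression."""
    E = np.asarray(X, dtype=float).copy()   # remaining information in X
    F = np.asarray(Y, dtype=float).copy()   # remaining information in Y
    W, P, Q = [], [], []
    for _ in range(k):
        # Weight vector: leading left singular vector of E'F,
        # i.e. the direction in X with maximal covariance with Y.
        u, _, _ = np.linalg.svd(E.T @ F, full_matrices=False)
        w = u[:, 0]
        t = E @ w                            # latent variable
        tt = t @ t
        p_load = E.T @ t / tt                # partial regression of X on t
        q_load = F.T @ t / tt                # partial regression of Y on t
        E = E - np.outer(t, p_load)          # deflate: keep what t did not explain
        F = F - np.outer(t, q_load)
        W.append(w); P.append(p_load); Q.append(q_load)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    # Express the model in terms of the original predictors.
    return W @ np.linalg.solve(P.T @ W, Q.T)

# Illustrative data only (random, not from the presentation).
rng = np.random.default_rng(0)
X_demo = standardize(rng.normal(size=(30, 3)))
Y_demo = standardize(rng.normal(size=(30, 2)))
coef_demo = pls_fit(X_demo, Y_demo, k=2)
```

Each pass extracts one latent variable, regresses both blocks on it, and continues on the residuals, which is the "remaining information" the slide refers to.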
4
OLS model on the k latent variables
« Coordinate » linear function: the main effect of each predictor on the response.
To summarize : PLSL (X,Y)
5
The dimension of the model: k, chosen by Cross-Validation (CV or GCV)
If k = r, PLSL(X, Y) = OLS(X, Y)
If Y = X, PLSL(X, Y=X) = PCA(X)
Pruning step: variable subset selection (CV or GCV)
6
Maps of the observations
7
Main effects Partial Least-Squares Splines (PLSS)
Additive model through k latent variables
« Coordinate » spline function: the main effect of each predictor on the response is a spline function.
To summarize: PLSS(X,Y) = PLSL(B,Y), where B is the spline coding matrix of X
8
Principal components maps
Pruning step: parsimonious models by selecting main effects according to the range of spline functions. Validation of the new models: CV or GCV.
9
Tuning parameters
The PLS dimension: k (CV or GCV)
The spline space for each predictor: the degree d, and the « knots » (the number K and the locations)
Dimension of the spline space: d+1+K
Advantages of PLSS: robust against collinearity of predictors; robust against a small ratio #observations / #predictors; easy to interpret the main-effect spline functions
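To make the dimension count d+1+K concrete, here is a small sketch that codes predictors with a truncated power basis. This basis is a substitution chosen for brevity: it spans the same spline space as B-splines with the same degree and knots, but the slides do not say which basis the author uses. The names `truncated_power_basis` and `coding_matrix` are hypothetical.

```python
def truncated_power_basis(x, degree, knots):
    """Evaluate the d+1+K truncated power basis functions at a scalar x."""
    # Polynomial part: 1, x, ..., x^d  (d+1 functions)
    row = [float(x) ** j for j in range(degree + 1)]
    # One truncated power per knot: (x - knot)_+^d  (K functions)
    row += [(x - kn) ** degree if x > kn else 0.0 for kn in knots]
    return row

def coding_matrix(X, degree, knots_per_predictor):
    """Code each column of X by its own spline basis and concatenate the blocks."""
    B = []
    for obs in X:
        row = []
        for x, knots in zip(obs, knots_per_predictor):
            row.extend(truncated_power_basis(x, degree, knots))
        B.append(row)
    return B

# Illustrative values only: two predictors, degree 2, with 2 knots and 1 knot.
X_demo = [[0.5, 0.1], [0.9, 0.7]]
B_demo = coding_matrix(X_demo, degree=2, knots_per_predictor=[[0.3, 0.6], [0.5]])
print(len(B_demo[0]))  # (2+1+2) + (2+1+1) columns
```

The constant column is repeated once per predictor in this sketch, so B is collinear by construction; a real implementation would typically center or drop it.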
10
Multivariate Additive PLS Splines: MAPLSS (bivariate interactions)
Model cast in the ANOVA decomposition: ANOVA spline functions
11
The curse of dimensionality
The price of nonlinearity: expansion of the dimension of B
MAPLSS(X,Y) = PLSL(B,Y), where B is the spline coding matrix of X with interactions
Example: p predictors give (p-1)p/2 possible interactions, with spline dimension = 10 for each predictor
Necessity of eliminating non-influential interactions
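The growth of B can be illustrated with a short count. The sketch assumes that each bivariate interaction is coded by the tensor product of the two univariate bases, so it contributes the product of their dimensions; the slide's own figure was lost in extraction, and `maplss_columns` is a hypothetical helper name.

```python
def maplss_columns(p, spline_dim, n_interactions=None):
    """Count the columns of B for p predictors with a common spline dimension.

    Assumption: an interaction block has spline_dim * spline_dim columns
    (tensor product of the two univariate bases).
    """
    n_pairs = p * (p - 1) // 2               # all possible bivariate interactions
    if n_interactions is None:
        n_interactions = n_pairs             # keep every interaction
    main = p * spline_dim                    # main-effect blocks
    inter = n_interactions * spline_dim ** 2 # interaction blocks
    return main + inter

# Illustrative values: 10 predictors, spline dimension 10.
print(maplss_columns(10, 10))                    # all 45 interactions kept
print(maplss_columns(10, 10, n_interactions=3))  # only 3 interactions kept
```

Under this assumption the interaction blocks dominate the column count, which is why discarding non-influential interactions matters.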
12
1) Automatic selection of candidate interactions: each interaction i is separately added to the main effects model m and evaluated. Rule: order the interactions decreasingly, and refuse one if CRIT(k)<0.
2) Add the ordered candidates step by step to the main effects model, and accept a model if it significantly improves CV.
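The two steps can be sketched as a screening pass followed by a forward pass. The definition of CRIT(k) is not recoverable from the slide, so this sketch substitutes a simple stand-in: the drop in cross-validation error when an interaction is added to the main effects model. The callback `cv_error`, the threshold `tol`, and the choice to skip a rejected candidate and continue are all assumptions.

```python
def select_interactions(main_effects, interactions, cv_error, tol=0.0):
    """Screen interactions one at a time, then add the survivors step by step.

    cv_error(terms) is a user-supplied (hypothetical) function returning the
    cross-validation error of the model built on `terms`; lower is better.
    """
    base = cv_error(list(main_effects))
    # Step 1: evaluate each interaction separately against the main effects model.
    gains = {i: base - cv_error(list(main_effects) + [i]) for i in interactions}
    # Refuse interactions with no gain, order the rest by decreasing gain.
    candidates = sorted((i for i in interactions if gains[i] > 0),
                        key=lambda i: gains[i], reverse=True)
    # Step 2: add candidates in order, keeping one only if CV improves by > tol.
    model, best = list(main_effects), base
    for i in candidates:
        err = cv_error(model + [i])
        if best - err > tol:
            model.append(i)
            best = err
    return model, best

# Toy error function for illustration only (not real cross-validation).
def toy_cv_error(terms):
    return (1.0 - 0.3 * (("x1", "x2") in terms)
                - 0.1 * (("x2", "x3") in terms)
                + 0.05 * (("x1", "x3") in terms))

model, err = select_interactions(
    ["x1", "x2", "x3"],
    [("x1", "x2"), ("x1", "x3"), ("x2", "x3")],
    toy_cv_error,
)
```

In a real setting `cv_error` would refit the PLS model on the coded matrix for each candidate set of terms, which makes the screening pass the expensive part.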
13
3) Pruning step: selection of main effects and interactions according to the range of the ANOVA functions (CV/GCV)
Advantages of MAPLSS: inherits the advantages of PLSL and PLSS; captures the most influential bivariate interactions; easily interpretable ANOVA function plots
Disadvantages of MAPLSS: no higher-order interactions; no automatic selection of spline parameters
14
Bibliography
J. F. Durand. Local Polynomial Additive Regression through PLS and Splines: PLSS. Chemometrics and Intelligent Laboratory Systems 58, 235-246, 2001.
J. F. Durand and R. Lombardo. Interactions terms in nonlinear PLS via additive spline transformations. In « Between Data Science and Applied Data Analysis », Studies in Classification, Data Analysis, and Knowledge Organization, Eds. M. Schader, W. Gaul and M. Vichi, Springer, 22-29, 2003.