PLS Regression
Hervé Abdi, The University of Texas at Dallas

An Example: What is Mouthfeel? From Folkenberg, D.M., Bredie, W.L.P., & Martens, M. (1999). What is mouthfeel? Sensory-rheological relationships in instant hot cocoa drinks. Journal of Sensory Studies, 14. (Data set courtesy of Martens, H., & Martens, M. (2001). Multivariate Analysis of Quality: An Introduction. London: Wiley.) Downloaded from: Data set: Cocoa-ii.mat. Goal: predict the sensory attributes (mouthfeel), the dependent variables (Y set), from the physical/chemical/rheological properties, the predictors / independent variables (X set).

An Example: What is Mouthfeel?
6 predictors / independent variables (X set), physical/chemical/rheological properties: %COCOA, %SUGAR, %MILK, SEDIMENT, COLOUR, VISCOSITY.
10 dependent variables (Y set): colour, cocoa-odour, milk-odour, thick-txtr, mouthfeel, smooth-txtr, creamy-txtr, cocoa-taste, milk-taste, sweet.
14 samples (n-: without stabilizer; n+: with stabilizer).

X

Y

Why use PLS, PCA, and MLR? A short tour.

I by J data sets: PCA, CA, biplots, etc. The beauty of Euclid …

I by J → I by 1 (with J << I) data sets: Multiple Regression. The beauty of Euclid.

I by J → I by K data sets: PLS, CANDIS, etc. The beauty of Euclid.

Why use PLS?
1. To explain the similarity between the observations (here, cocoa samples).
2. To detect the structure in the relationships between dependent and independent variables.
3. To get a graphical representation of the data.
4. To predict the value of new observations.

What is PLS Regression? PLS combines features of Principal Component Analysis (PCA) and Multiple Linear Regression (MLR). Like PCA, PLS extracts factors from X. Like MLR, PLS predicts Y from X. Combining PCA and MLR, PLS extracts factors from X in order to predict Y.
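Off-the-shelf implementations of this extract-then-predict idea exist; as a quick illustration (not the software used in these slides, and with random data standing in for the cocoa tables), scikit-learn's PLSRegression can be used like this:

```python
# A minimal illustrative sketch with scikit-learn; random data stands in
# for the cocoa tables (this is not the software used in the slides).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(14, 6))    # 14 samples, 6 physico-chemical predictors
Y = rng.normal(size=(14, 10))   # 10 sensory dependent variables

pls = PLSRegression(n_components=2)  # extract 2 latent variables from X ...
pls.fit(X, Y)
Y_hat = pls.predict(X)               # ... and use them to predict Y
```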

When to use PLS? To analyze two data tables describing the same I observations: an I × J table X of independent variables (predictors), with entries x_{i,j}, and an I × K table Y of dependent variables, with entries y_{i,k}.

General principle of PLS: from the I × J table X of predictors, extract L latent variables t_1, …, t_ℓ, …, t_L (an I × L table T with entries t_{i,ℓ}), where t_ℓ = X w_ℓ, and use them to predict the dependent variables: the NIPALS algorithm gives Ŷ_ℓ = t_ℓ c_ℓᵀ.
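For concreteness, here is a minimal numpy sketch of the NIPALS iteration just described (the function name, the starting choice of u, and the stopping rule are my assumptions; a production implementation would be more careful):

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """Sketch of NIPALS PLS regression. Assumes X and Y are column-centered.
    Returns X-weights W, X-loadings P, Y-weights C, and scores T."""
    X, Y = X.copy(), Y.copy()
    I, J = X.shape
    K = Y.shape[1]
    W, P = np.zeros((J, n_components)), np.zeros((J, n_components))
    C, T = np.zeros((K, n_components)), np.zeros((I, n_components))
    for l in range(n_components):
        u = Y[:, [0]]                      # start from a column of Y
        t_old = np.zeros((I, 1))
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)         # normalized X-weights
            t = X @ w                      # latent variable: t_l = X w_l
            c = Y.T @ t / (t.T @ t)        # Y-weights
            u = Y @ c / (c.T @ c)
            if np.linalg.norm(t - t_old) < tol:
                break
            t_old = t
        p = X.T @ t / (t.T @ t)            # X-loadings
        X -= t @ p.T                       # deflate X
        Y -= t @ c.T                       # deflate Y: Yhat_l = t_l c_l^T
        W[:, [l]], P[:, [l]], C[:, [l]], T[:, [l]] = w, p, c, t
    return W, P, C, T
```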

PLS: Maps of the observations. The latent variables t_ℓ = X w_ℓ serve as coordinates for the I observations: plotting lv 1 against lv 2 gives a map of the observations.

PLS: Maps of the variables. The variables are mapped on lv 1 vs lv 2 in two ways: a circle of correlations, and a common map of the X weights w_ℓ and the Y weights c_ℓ.

PLS: Predicting Y from X. Combining t_ℓ = X w_ℓ and Ŷ_ℓ = t_ℓ c_ℓᵀ gives Ŷ = X B_PLS (some magic here!).
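The "magic" has a standard closed form: collecting the X-weights W, the X-loadings P, and the Y-weights C from the NIPALS sketch above, B_PLS = W (PᵀW)⁻¹ Cᵀ, so that Ŷ = X B_PLS. A sketch, where Xc and Yc are assumed to be the column-centered tables:

```python
# Continuing the NIPALS sketch above (Xc, Yc: centered X and Y tables).
W, P, C, T = nipals_pls(Xc, Yc, n_components=2)
B_pls = W @ np.linalg.inv(P.T @ W) @ C.T   # J x K regression matrix
Y_hat = Xc @ B_pls                         # equals T @ C.T
```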

PLS: How do we explain Y from X? Compare the data (Y) with the prediction (Ŷ = X B_PLS) using the REsidual Sum of Squares: RESS = Σ (data − prediction)².
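Continuing the sketch above, RESS is one line of code:

```python
# RESS: sum of squared differences between the data and the fitted values.
ress = float(((Yc - Y_hat) ** 2).sum())
```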

PLS: How do we predict Y from X? That is, how well will we do with NEW data? Cross-validation, here the jackknife: predict y_1 from a model fitted on X_(-1) and Y_(-1) (giving Ŷ_(-1) = X_(-1) B_PLS), predict y_2 from X_(-2), … etc. …, predict y_I from X_(-I), where X_(-i) denotes X with observation i left out.

PLS: How do we predict Y from X? How well will we do with NEW data? Compare the data (Y) with the jackknifed prediction (Ŷ_jack, each observation predicted with itself left out of the fit) using the Predicted REsidual Sum of Squares: PRESS = Σ (data − jackknifed prediction)².
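A sketch of the jackknife loop, reusing the nipals_pls function from above (the helper name and the choice to re-center on each training set are my assumptions):

```python
# Jackknifed (leave-one-out) PRESS: predict each row of Y from a model
# fitted on the remaining I - 1 observations.
def press(X, Y, n_components):
    I = X.shape[0]
    total = 0.0
    for i in range(I):
        keep = np.delete(np.arange(I), i)          # all rows except row i
        Xk, Yk = X[keep], Y[keep]
        mx, my = Xk.mean(axis=0), Yk.mean(axis=0)  # center on the training set
        W, P, C, T = nipals_pls(Xk - mx, Yk - my, n_components)
        B = W @ np.linalg.inv(P.T @ W) @ C.T
        y_pred = (X[i] - mx) @ B + my              # jackknifed prediction of y_i
        total += float(((Y[i] - y_pred) ** 2).sum())
    return total
```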

PLS big question: how many latent variables? Compare RESS and PRESS, or use PRESS alone. Quick and dirty: min(PRESS) => optimum number of latent variables.
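In code, the quick-and-dirty rule amounts to scanning candidate numbers of latent variables with the press helper sketched above and keeping the minimizer:

```python
# Keep the number of latent variables that minimizes the jackknifed PRESS.
candidates = list(range(1, 7))          # up to J = 6 latent variables here
press_values = [press(X, Y, L) for L in candidates]
best_L = candidates[int(np.argmin(press_values))]
```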

Back to cocoa. Goals: explain and predict the sensory variables (Y) from the physico-chemical variables (X).

X

Y

Correlation within the X set

Correlation within the Y set

Correlation between X and Y

Show the t (latent) variables

Show w

Show c

B_PLS: from X to Y (in Z-scores)

B*_PLS: from X to Y (original units)

Show RESS & PRESS: min PRESS is reached for 4, so keep 4 latent variables.

Plot w & t (1 vs 2)

Plot w & c (1 vs 2)

Show the circle of correlations

Conclusion. Useful references (they contain bibliographies): Abdi (2007, 2003); see