Partial Least Squares: a very brief intro

Multivariate regression
The multiple regression approach creates a single linear combination of the predictors that best correlates with the outcome.
With principal components regression (PCR), we first create several linear combinations of the predictors (as many as there are predictors) and then use those composites, instead of the original predictors, to predict the outcome.
–The components are independent of one another, which helps with collinearity.
–We can use fewer components than predictors while still retaining most of the predictor variance.
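To make the idea concrete, here is a minimal sketch of principal components regression done "by hand" in R. The predictor matrix X and outcome vector y are hypothetical, not part of the slides.

# Minimal PCR sketch: build components first, then regress on them
# X: numeric predictor matrix, y: numeric outcome (both hypothetical)
pcs    <- prcomp(X, scale. = TRUE)   # all linear combinations (components) of the predictors
scores <- pcs$x[, 1:3]               # keep fewer components than predictors
pcr_by_hand <- lm(y ~ scores)        # regress the outcome on the component scores
summary(pcr_by_hand)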

Multiple Regression vs. Principal Components Regression
[Diagram: in multiple regression, the predictors X1, X2, X3, X4 feed into a single linear composite that predicts the outcome; in principal components regression, X1–X4 first form several linear composites (the components), and those composites in turn predict the outcome.]
In matrix terms, multiple regression is Y = XB; in the component approach the scores are T = XW and the prediction is Y = TQ. Note the bold in the slides: we are dealing with vectors and matrices. Here T refers to our components, and W and Q are coefficient matrices, just as B is above.

Partial Least Squares
Partial least squares is just like PC regression except in how the component scores are computed.
–PC regression: the weights are calculated from the covariance matrix of the predictors alone.
–PLS: the weights reflect the covariance structure between the predictors and the response.
While conceptually not too much of a stretch, this requires a more complicated iterative algorithm; the NIPALS and SIMPLS algorithms are probably the most common.
As in regression, the goal is to maximize the correlation between the response(s) and the component scores.
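To illustrate how the weights use the predictor–response covariance, here is a minimal sketch of a single NIPALS-style PLS component for one response. It assumes a centered predictor matrix X and a centered outcome y; the function name and return values are hypothetical, not from the slides.

# One PLS component, NIPALS style, for a single centered response y and centered X
pls1_component <- function(X, y) {
  w <- crossprod(X, y)                            # weights from the X-y covariance (the key difference from PCR)
  w <- w / sqrt(sum(w^2))                          # normalize the weight vector
  t_scores <- X %*% w                              # component scores
  p <- crossprod(X, t_scores) / sum(t_scores^2)    # X loadings
  q <- sum(y * t_scores) / sum(t_scores^2)         # y loading: regression of y on the scores
  list(scores = t_scores, weights = w, x_loadings = p, y_loading = q,
       X_deflated = X - t_scores %*% t(p),         # deflate X before extracting the next component
       y_deflated = y - as.vector(t_scores) * q)
}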

Example
Download the PCA R code again; it requires the pls package.
Question: do consumer ratings of various aspects of beer associate with the raters' SES?
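The slides do not show the data-loading step; the sketch below only illustrates the setup, assuming the ratings live in a data frame called beer with the variable names that appear in the output on the following slides.

library(pls)   # provides pcr() and plsr()

# 'beer' is an assumed data frame holding the rating variables and SES
str(beer[, c("SES", "COST", "SIZE", "ALCOHOL", "REPUTAT", "AROMA", "COLOR", "TASTE")])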

Multiple regression
All predictors are statistically significant correlates of SES, and almost all of the variance is accounted for (R-squared = 0.987).
[R summary() output: coefficient table with estimates, standard errors, t values and p-values for (Intercept), ALCOHOL, AROMA, COLOR, COST, REPUTAT, SIZE and TASTE; AROMA, REPUTAT, SIZE and TASTE significant at p < 2e-16, COLOR at p < .01, ALCOHOL and COST at p < .05; residual standard error on 212 degrees of freedom; 11 observations deleted due to missingness; Multiple R-squared 0.987; F-statistic 2305 on 7 and 212 DF, p-value < 2.2e-16.]
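The slides do not include the fitting call itself; a sketch of a model that would produce output of this form, using the assumed beer data frame, is:

# Ordinary multiple regression of SES on all seven rating variables
mr_fit <- lm(SES ~ ALCOHOL + AROMA + COLOR + COST + REPUTAT + SIZE + TASTE, data = beer)
summary(mr_fit)   # coefficient table, R-squared, F-statistic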

PC Regression
For the first 3 components:
–The first component accounts for 53.4% of the variance in the predictors, but only 33% of the variance in the outcome.
–With the second and third components, the vast majority of the variance in both the predictors and the outcome is accounted for.
–The loadings break down according to a PCA of the predictors.
[pls output: fit method svdpc, 3 components considered; TRAINING % variance explained in X and in SES for 1, 2 and 3 components; loadings of COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR and TASTE on Comp 1 to Comp 3, with SS loadings and proportion/cumulative variance per component.]
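A sketch of the corresponding call with the pls package (svdpc is the package's default PCR algorithm; the data frame and the scaling choice are assumptions):

# Principal components regression with three components
pcr_fit <- pcr(SES ~ COST + SIZE + ALCOHOL + REPUTAT + AROMA + COLOR + TASTE,
               ncomp = 3, data = beer, scale = TRUE)
summary(pcr_fit)    # % variance explained in X and in SES per component
loadings(pcr_fit)   # predictor loadings on each component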

PLS Regression
For the first 3 components:
–The first component accounts for 44.8% of the variance in the predictors (almost 10% less than PCR), and 90% of the variance in the outcome (a lot more than PCR).
–The loadings are notably different from those of the PC regression.
[pls output: fit method kernelpls, 3 components considered; TRAINING % variance explained in X and in SES for 1, 2 and 3 components; loadings of COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR and TASTE on Comp 1 to Comp 3, with SS loadings and proportion/cumulative variance per component.]
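The matching PLS call is sketched below (kernelpls is the plsr default; the data frame and scaling are again assumed):

# Partial least squares regression with three components
pls_fit <- plsr(SES ~ COST + SIZE + ALCOHOL + REPUTAT + AROMA + COLOR + TASTE,
                ncomp = 3, data = beer, scale = TRUE)
summary(pls_fit)    # % variance explained in X and in SES per component
loadings(pls_fit)   # predictor loadings on each component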

Comparison of coefficients
[Table comparing the regression coefficients for (Intercept), COST, SIZE, ALCOHOL, REPUTAT, AROMA, COLOR and TASTE under multiple regression (MR), principal components regression (PCA) and PLS.]
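A sketch of how such a table could be built from the three fitted objects above (three components assumed for the PCR and PLS fits):

# Side-by-side coefficients on the original predictor scale, aligned by variable name
vars <- c("COST", "SIZE", "ALCOHOL", "REPUTAT", "AROMA", "COLOR", "TASTE")
round(cbind(MR  = coef(mr_fit)[vars],
            PCR = drop(coef(pcr_fit, ncomp = 3))[vars],
            PLS = drop(coef(pls_fit, ncomp = 3))[vars]), 3)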

Why PLS?
PLS extends to multiple outcomes and allows for dimension reduction.
It is less restrictive in its assumptions than multiple regression:
–distribution free
–collinearity is not a problem
–independence of observations is not required
Unlike PCR, it creates components with an eye to the predictor-DV relationship.
Unlike canonical correlation, it maintains the predictive nature of the model.
While similar interpretations are possible, any of these may be a viable analysis depending on your research situation and goals.