Projection on Latent Variables


PLS: Partial Least Squares, but also Projection on Latent Variables

Step 1: First marginal (partial) regression through the origin, of x1 on y.

Second marginal regression through the origin, of x2 on y.

Step 2: The marginal slopes are proportional to the COVARIANCES of the predictors with the response; after normalization they become DIRECTION COSINES. The normalized slopes w describe the first LATENT VARIABLE.

Step 3: The scores t on the latent variable are computed.

Step 4: Regression of the response variable y on the latent variable t. This gives the model, and the residuals to be used to compute the next latent variables.
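
To make the four steps concrete, here is a minimal sketch of the computation of the first latent variable in Python/NumPy. It is an illustration under my own assumptions (the function name, and regressions through the origin in steps 1 and 4), not code from the original slides:

import numpy as np

def pls_first_latent_variable(X, y):
    # Step 1: marginal regressions through the origin of each x_j on y;
    # the slopes are proportional to the covariances of the x_j with y
    d = X.T @ y / (y @ y)
    # Step 2: normalize the slopes -> direction cosines, the weights w
    w = d / np.linalg.norm(d)
    # Step 3: scores t on the latent variable
    t = X @ w
    # Step 4: regression of y on t (through the origin here); the
    # residuals are the starting point for the next latent variable
    f = (t @ y) / (t @ t)
    residuals = y - f * t
    return w, t, f, residuals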

A very simple numerical example:

Object   Predictor 1   Predictor 2   Response
   1        5.363         5.360        26.810
   2        9.979         9.974        49.880
   3       35.447        35.440       177.210
   4       36.040        36.041       180.180
   5       52.107        52.109       260.549
   6       72.069        72.069       360.360

You can easily see that the two predictors are almost equal: in fact they are two repeated measurements of the same quantity. KNOWLEDGE OF DATA (EXPERIENCE)
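
For readers who want to reproduce the calculations, the table can be entered directly; this is a sketch in Python/NumPy, with variable names of my own choosing:

import numpy as np

# Two repeated measurements of the same quantity, and the response
x1 = np.array([5.363, 9.979, 35.447, 36.040, 52.107, 72.069])
x2 = np.array([5.360, 9.974, 35.440, 36.041, 52.109, 72.069])
y  = np.array([26.810, 49.880, 177.210, 180.180, 260.549, 360.360])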

Strategy A. We know that there is really only one predictor, measured twice, so we decide to use only the first. With the method of least squares we compute slope and intercept, giving the regression model:

y = a + b x1, with a = 0.01380 and b = 4.99966

With σ² the variance of the predictor (obviously the same for both predictors), the variance of the estimate of the response is, by the law of propagation of variances:

σ²(y) = b² σ² ≈ 25 σ²

Strategy A: we used only the chemical knowledge. σ²(y) ≈ 25 σ²
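
Strategy A can be checked in a few lines (a sketch assuming the x1 and y arrays defined with the example data; np.polyfit returns the slope before the intercept):

import numpy as np

b, a = np.polyfit(x1, y, 1)    # fit y = a + b*x1 by least squares
print(a, b)                    # should be close to a = 0.01380, b = 4.99966
print(b**2)                    # variance factor b**2, approximately 25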

Strategy B. We know that the mean of two repetitions has half the variance of a single repetition, so we decide to use as the single predictor the mean m of the two measurements. With the method of least squares we compute slope and intercept, giving the regression model:

y = a + b m, with a = -0.00685 and b = 5.00013

With 2 variance of the predictors the variance of the mean is: 2/2 So the variance of the response is computed as: 2y = b2 2/2  12.5 2

With Strategy B we used both the chemical knowledge and the knowledge of statistics: σ²(y) ≈ 12.5 σ²

Strategy C. We use least-squares multiple regression (MLR or OLS). With the method of least squares we compute two slopes and an intercept, giving the regression model:

y = a + b1 x1 + b2 x2, with a = 0.0128, b1 = 1.5019 and b2 = 3.49844

The variance of the estimate of the response is obtained from the law of propagation of variances as:

σ²(y) = b1² σ² + b2² σ² ≈ 14.5 σ²
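
Strategy C in NumPy (a sketch reusing the arrays above; because the two predictors are almost collinear, the individual slopes are ill-determined, and the values obtained may differ from those quoted while their sum stays close to 5):

import numpy as np

X = np.column_stack([np.ones_like(x1), x1, x2])   # intercept column plus both predictors
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b1, b2)               # b1 + b2 should be about 5
print(b1**2 + b2**2)           # variance factor; about 14.5 here, but it can be much larger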

We were very lucky! In its effort to minimize the sum of the squared residuals, OLS can do even worse. Notice that the sum of the two slopes, b1 = 1.5019 and b2 = 3.49844, is 5.00034, about the same as the single slope obtained with Strategies A and B. Apparently, with two almost equal predictors what matters is the sum of the slopes: it must be about 5. So a result such as b1 = 15 and b2 = -10 would seem just as acceptable, BUT...

b1² = 15² = 225 and b2² = (-10)² = 100, so σ²(y) = b1² σ² + b2² σ² = 325 σ²

Conclusion: OLS, although it uses all the experimental information, never gives a model better than that of Strategy B (knowledge of the data and of statistics), and its result can be worse than that of Strategy A, which uses only a fraction of the information.

Strategy PLS, Step 1: Regression of the two predictors on the response variable:

x1 = c1 + d1 y, with c1 = -0.00276 and d1 = 0.20001
x2 = c2 + d2 y, with c2 = 0.00550 and d2 = 0.19998

Strategy PLS, Step 2: Normalization of the slopes. Result: w1 = 0.70716, w2 = 0.70705

Strategy PLS, Step 3: Definition of a LATENT VARIABLE, a combination of the two predictors by means of the coefficients w: t = w1 x1 + w2 x2

Strategy PLS, Step 4: Regression of the response on the latent variable. We obtain the regression model as a function of the latent variable: y = e + f t, with e = -0.00685 and f = 3.53564

From y = e + f t, taking into account that t = w1 x1 + w2 x2 = 0.70716 x1 + 0.70705 x2 and that f = 3.53564, we obtain:

y = -0.00685 + 2.50026 x1 + 2.49987 x2 (PLS closed form)

Finally, from y = -0.00685 + 2.50026 x1 + 2.49987 x2 we can compute the variance of the response:

σ²(y) = b1² σ² + b2² σ² = (6.2513 + 6.2494) σ² = 12.5007 σ²
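
The whole PLS walkthrough on the example data, as a sketch reusing the x1, x2, y arrays above (digits may differ slightly from the slides because of rounding):

import numpy as np

# Step 1: marginal regressions of each predictor on the response
d1, c1 = np.polyfit(y, x1, 1)      # x1 = c1 + d1*y
d2, c2 = np.polyfit(y, x2, 1)      # x2 = c2 + d2*y

# Step 2: normalize the slopes to direction cosines
norm = np.hypot(d1, d2)
w1, w2 = d1 / norm, d2 / norm      # close to 0.70716 and 0.70705

# Step 3: scores on the latent variable
t = w1 * x1 + w2 * x2

# Step 4: regression of the response on the latent variable
f, e = np.polyfit(t, y, 1)         # y = e + f*t

# Closed form in the original predictors: y = e + (f*w1)*x1 + (f*w2)*x2
b1, b2 = f * w1, f * w2
print(e, b1, b2)                   # close to -0.00685, 2.50026, 2.49987
print(b1**2 + b2**2)               # variance factor, approximately 12.5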

The PLS model gives the same uncertainty on the response as Strategy B (knowledge of data and statistics, use of all the information). PLS "understands" that the two predictors have the same importance (slopes more or less equal). PLS is an intelligent technique.