The Principal Components Regression Method David C. Garen, Ph.D. Hydrologist USDA Natural Resources Conservation Service National Water and Climate Center Portland, Oregon

The General Linear Regression Model

Y = b0 + b1 X1 + b2 X2 + … + bn Xn

where:
Y = dependent variable
Xi = independent variables
bi = regression coefficients
n = number of independent variables
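As a point of reference, here is a minimal numpy sketch of estimating the b's by ordinary least squares. The data are synthetic and purely illustrative.

```python
import numpy as np

# Illustrative data: 20 observations of 3 predictors (columns of X) and a response Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=20)

# Append a column of ones so that b[0] plays the role of the intercept b0.
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize ||A b - Y||^2.
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
print("estimated coefficients:", b)   # close to [2.0, 1.5, -0.7, 0.3]
```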

The Problem
If the X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated. However, we don’t want to throw out most of the X’s; we prefer to retain them for robustness.
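A small synthetic illustration (not from the original slides) of why this happens: when two predictors are nearly identical, the individual coefficient estimates swing wildly from one sample of data to the next.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)       # x2 is almost a copy of x1
Y = 3.0 * x1 + rng.normal(scale=0.5, size=n)   # Y really depends only on x1

# Refit the same model on two resamples of the data and compare coefficients.
for trial in range(2):
    idx = rng.choice(n, size=n, replace=True)
    A = np.column_stack([np.ones(n), x1[idx], x2[idx]])
    b, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
    print(f"trial {trial}: b1 = {b[1]:+.2f}, b2 = {b[2]:+.2f}")
# b1 and b2 change drastically between resamples (only their sum is stable),
# so the individual coefficients are not meaningful.
```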

The Solution
Possibilities:
1) Pre-combine the X’s into one or more composite indexes, e.g., the Z-score method (sketched below)
2) Principal components regression
These are similar in concept but differ in the mathematics.
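A hedged sketch of possibility 1: standardize each X and combine the z-scores into a single composite index that replaces the individual X’s in the regression. The correlation-with-Y weighting used here is an assumption for illustration only, though it is consistent with the later remark that the Z-score method, unlike PCA, uses knowledge of Y.

```python
import numpy as np

def zscore_index(X, Y):
    """Combine intercorrelated predictors into a single composite index.

    Each column of X is standardized (z-scored); the standardized columns are
    then averaged with weights proportional to each variable's correlation
    with Y (an assumed weighting scheme, for illustration only).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    w = np.array([np.corrcoef(X[:, j], Y)[0, 1] for j in range(X.shape[1])])
    w = w / np.abs(w).sum()
    return Z @ w   # one composite value per observation

# The composite index then serves as the single predictor in a simple
# regression of Y, sidestepping the multicollinearity among the original X's.
```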

Principal Components Analysis Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables. Principal components are linear combinations of the X’s.

Principal Components Analysis
Each principal component is a weighted sum of all the X’s:

PC1 = e11 X1 + e12 X2 + … + e1n Xn
PC2 = e21 X1 + e22 X2 + … + e2n Xn
…
PCn = en1 X1 + en2 X2 + … + enn Xn

Principal Components Analysis The e’s are called eigenvectors, derived from a matrix equation whose input is the correlation matrix of all the X’s with each other. Principal components are new variables that are not correlated with each other. The principal components transformation is equivalent to a rotation of axes.

Principal Components Analysis (figure only; not reproduced in this transcript)

The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to Z-score, for which the opposite is true). Principal components can be used for purely descriptive purposes, but we want to use them as independent variables in a regression.
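A minimal numpy sketch, using synthetic data, of how the eigenvectors of the X correlation matrix define the principal components, and a check that the resulting components are indeed uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]                    # make the columns intercorrelated
X[:, 2] += 0.5 * X[:, 0]

# Standardize the X's, then eigendecompose their correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)            # correlation matrix of the X's
eigvals, eigvecs = np.linalg.eigh(R)        # columns of eigvecs are the e's
order = np.argsort(eigvals)[::-1]           # order components by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

PC = Z @ eigvecs                            # each PC is a weighted sum of the X's
print("percent of variance:", 100 * eigvals / eigvals.sum())
print("PC correlation matrix (should be ~identity):")
print(np.round(np.corrcoef(PC, rowvar=False), 3))
```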

Principal Components Analysis -- Example
Independent Variables:
X1 – X5    Snow water equivalent at 5 stations
X6 – X10   Water year to date precipitation at 5 stations
X11        Antecedent streamflow
X12        Climate teleconnection index

Correlation Matrix
Pairwise correlations among X1 – X12 and Y (the numerical values of the table did not survive transcription).

First Five Eigenvectors
Eigenvector weights of X1 – X12 for PC1 – PC5, together with the percent of variance explained by each component (the numerical values of the table did not survive transcription).

Principal Components Regression Procedure
- Try the PC’s in order
- Test for regression coefficient significance (t-test)
- Stop at first insignificant component
- Transform regression coefficients to be in terms of the original variables
- Sign test: coefficient signs must be the same as each variable’s correlation with Y
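A condensed numpy sketch of this procedure (illustrative only, not the actual NWCC implementation). Z is the standardized predictor matrix and eigvecs the eigenvector matrix from the earlier PCA sketch; the t_crit value of 1.2 matches the example on the next slide.

```python
import numpy as np

def pcr(Z, y, eigvecs, t_crit=1.2):
    """Principal components regression sketch: add PCs in order, keep each one
    only while its coefficient passes the t-test, then express the fitted model
    in terms of the original (standardized) X variables."""
    n = len(y)
    PC = Z @ eigvecs
    kept = 0
    for k in range(1, PC.shape[1] + 1):
        A = np.column_stack([np.ones(n), PC[:, :k]])
        c, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ c
        s2 = resid @ resid / (n - k - 1)          # residual variance
        cov = s2 * np.linalg.inv(A.T @ A)         # coefficient covariance matrix
        t_newest = c[k] / np.sqrt(cov[k, k])      # t-statistic of the newest PC
        if abs(t_newest) < t_crit:
            break                                 # stop at first insignificant PC
        kept = k
    # Refit with the accepted PCs and back-transform the coefficients.
    A = np.column_stack([np.ones(n), PC[:, :kept]])
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    b = eigvecs[:, :kept] @ c[1:]                 # coefficients on the original X's
    # Sign test: each b should match the sign of the simple correlation with y.
    r = np.array([np.corrcoef(Z[:, j], y)[0, 1] for j in range(Z.shape[1])])
    if not np.all(np.sign(b) == np.sign(r)):
        print("warning: sign test failed for one or more coefficients")
    return c[0], b                                # intercept and X coefficients
```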

Principal Components Regression Procedure
t-test iterations for the example data set (t_crit = 1.2; the individual t-statistics did not survive transcription): stop after the first PC, since the next component’s coefficient is not significant. Continuing anyway, the 3rd PC exceeds t_crit, but components are accepted only in order, so it is not used.

Principal Components Regression Procedure
Final model for the example data set (1 PC): Y as a weighted sum of all twelve original X variables (the coefficient values did not survive transcription). R = 0.906; the reported values of JR, SE, and JSE also did not survive transcription.

Summary
- Principal components analysis is a standard multivariate statistical procedure
- It can be used for descriptive purposes to reduce the dimensionality of correlated variables
- It can be taken a step further to provide new, uncorrelated independent variables for regression
- PC’s are taken in order, subject to the t-test and sign test
- The final model is expressed in terms of the original X variables