The Principal Components Regression Method David C. Garen, Ph.D. Hydrologist USDA Natural Resources Conservation Service National Water and Climate Center.

1 The Principal Components Regression Method
David C. Garen, Ph.D., Hydrologist
USDA Natural Resources Conservation Service
National Water and Climate Center
Portland, Oregon

2 The General Linear Regression Model
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$$
where:
Y = dependent variable
X_i = independent variables
b_i = regression coefficients (b_0 is the intercept)
n = number of independent variables
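For readers who want to experiment, here is a minimal NumPy sketch of fitting the model Y = b0 + b1 X1 + ... + bn Xn by ordinary least squares. The function name `fit_linear_regression` is hypothetical, and the small data set is made up purely to show that known coefficients are recovered.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares fit of y = b0 + b1*X1 + ... + bn*Xn.

    X: (n_obs, n) array of independent variables; y: (n_obs,) array.
    Returns (b0, b), where b holds b1..bn.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept b0 is estimated with the b's.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

# Illustrative, noise-free data generated from b0 = 1.5, b1 = 2.0, b2 = -0.5.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.0], [5.0, 2.5]])
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
b0, b = fit_linear_regression(X, y)
```

Because the made-up data contain no noise and the two X columns are not collinear, the fit returns the generating coefficients up to rounding error.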

3 The Problem
If the X’s are intercorrelated, they contain redundant information, and the b’s cannot be meaningfully estimated: the estimates become unstable, with inflated standard errors, and may even take the wrong sign. However, we don’t want to have to throw out most of the X’s; we prefer to retain them for robustness.
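One common way to quantify this problem, not used on the original slides, is the variance inflation factor (VIF): the i-th diagonal element of the inverse of the X correlation matrix, equal to 1 / (1 - R_i^2), where R_i^2 comes from regressing X_i on the other X's. The sketch below assumes NumPy; the function name is hypothetical. For two X's correlated at 0.93 (the value for X6 and X7 in the example correlation matrix), each VIF is about 7.4, so the variance of each coefficient estimate is roughly 7.4 times what it would be if the two X's were uncorrelated.

```python
import numpy as np

def variance_inflation_factors(R):
    """VIF_i = [R^-1]_ii for a correlation matrix R of the X's.

    Equivalent to 1 / (1 - R_i^2), where R_i^2 is from regressing X_i on
    the other X's; values well above 1 signal redundant X's.
    """
    return np.diag(np.linalg.inv(np.asarray(R, dtype=float)))

# Two predictors correlated at r = 0.93 (X6 and X7 in the example data set).
R = np.array([[1.0, 0.93], [0.93, 1.0]])
vif = variance_inflation_factors(R)
```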

4 The Solution
Possibilities:
1) Pre-combine the X’s into one or more composite indexes, e.g., the Z-score method
2) Principal components regression
These are similar in concept but differ in the mathematics.

5 Principal Components Analysis
Principal components regression is just like standard regression except the independent variables are principal components rather than the original X variables. Principal components are linear combinations of the X’s.

6 Principal Components Analysis
Each principal component is a weighted sum of all the X’s:
$$PC_j = e_{1j} X_1 + e_{2j} X_2 + \cdots + e_{nj} X_n, \qquad j = 1, \ldots, n$$

7 Principal Components Analysis
The e’s are called eigenvectors. They are the solutions e of the matrix equation $R e = \lambda e$, where R is the correlation matrix of all the X’s with each other; each eigenvalue $\lambda$ equals the variance of the corresponding principal component (for standardized X’s). Principal components are new variables that are not correlated with each other. The principal components transformation is equivalent to a rotation of axes.
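A minimal NumPy sketch of the transformation just described, assuming the X's are standardized (zero mean, unit variance), which is what makes eigenvectors of the correlation matrix apply to them. The function name `principal_components` and the synthetic data are hypothetical. The check at the end shows the property claimed on the slide: the resulting PC scores are mutually uncorrelated, and each score's variance equals its eigenvalue.

```python
import numpy as np

def principal_components(X):
    """Principal components of the columns of X via the correlation matrix.

    Returns (eigenvalues, eigenvectors, scores) sorted so PC 1 explains the
    most variance. Column j of `eigenvectors` holds the weights e_1j..e_nj.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each X so the correlation-matrix eigenvectors apply to it.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)   # solves R e = lambda e
    order = np.argsort(eigenvalues)[::-1]           # largest variance first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = Z @ eigenvectors                       # PC_j = sum_i e_ij * Z_i
    return eigenvalues, eigenvectors, scores

# Hypothetical data: three intercorrelated X's sharing a common signal.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
X = common[:, None] + 0.5 * rng.normal(size=(200, 3))
eigenvalues, E, pcs = principal_components(X)
C = np.corrcoef(pcs, rowvar=False)   # off-diagonal entries should be ~0
```

The eigenvector matrix is orthonormal, so the change from standardized X's to PC scores is a pure rotation of axes, as the slide states.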

8 Principal Components Analysis

9 The eigenvectors (weights) are based solely on the intercorrelations among the X’s and have no knowledge of Y (in contrast to the Z-score method, for which the opposite is true: its weights do depend on Y). Principal components can be used for purely descriptive purposes, but here we want to use them as independent variables in a regression.
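Because each PC is a linear combination of the X's, a regression on the first k PCs can always be rewritten in terms of the original variables. A short derivation, assuming the PCs are computed from standardized variables $Z_i = (X_i - \bar{X}_i)/s_i$ (as implied by using the correlation matrix), so that $PC_j = \sum_i e_{ij} Z_i$, and writing $a_j$ for the regression coefficient on $PC_j$ (the symbols $a_j$, $\bar{X}_i$, $s_i$ are introduced here for illustration):

$$
\begin{aligned}
Y &= a_0 + \sum_{j=1}^{k} a_j\,PC_j
   = a_0 + \sum_{j=1}^{k} a_j \sum_{i=1}^{n} e_{ij}\,\frac{X_i - \bar{X}_i}{s_i} \\
  &= \Big(a_0 - \sum_{i=1}^{n} b_i \bar{X}_i\Big) + \sum_{i=1}^{n} b_i X_i,
  \qquad b_i = \frac{1}{s_i}\sum_{j=1}^{k} a_j\, e_{ij},
\end{aligned}
$$

so the intercept is $b_0 = a_0 - \sum_i b_i \bar{X}_i$ and every original X receives a coefficient, even though only k components were fitted.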

10 Principal Components Analysis: Example
Independent variables:
X1-X5: snow water equivalent at 5 stations
X6-X10: water-year-to-date precipitation at 5 stations
X11: antecedent streamflow
X12: climate teleconnection index

11 Correlation Matrix
(Upper triangle shown; the matrix is symmetric.)
        X1   X2   X3   X4   X5   X6   X7   X8   X9  X10  X11  X12    Y
X1     1.0  .72  .67  .76  .81  .54  .31  .54  .38  .50  .18  .64  .65
X2          1.0  .67  .45  .80  .62  .45  .47  .31  .49  .14  .39  .60
X3               1.0  .49  .72  .84  .76  .86  .68  .85  .48  .56  .80
X4                    1.0  .62  .42  .26  .36  .56  .38  .28  .59  .68
X5                         1.0  .62  .49  .51  .44  .62  .32  .59  .73
X6                              1.0  .93  .87  .83  .90  .63  .43  .85
X7                                   1.0  .82  .85  .90  .67  .32  .76
X8                                        1.0  .74  .84  .64  .39  .70
X9                                             1.0  .80  .70  .49  .84
X10                                                 1.0  .64  .46  .79
X11                                                      1.0  .36  .51
X12                                                           1.0  .64

12 First Five Eigenvectors
         PC 1    PC 2    PC 3    PC 4    PC 5
X1      0.265   0.444   0.004   0.074  -0.104
X2      0.249   0.325  -0.483  -0.030   0.315
X3      0.335   0.016  -0.178   0.149  -0.314
X4      0.229   0.353   0.456  -0.595  -0.009
X5      0.287   0.332  -0.148   0.120   0.412
X6      0.339  -0.168  -0.162  -0.106  -0.040
X7      0.308  -0.329  -0.150  -0.058  -0.015
X8      0.317  -0.197  -0.114   0.027  -0.261
X9      0.304  -0.240   0.299  -0.313  -0.103
X10     0.330  -0.197       ?   0.072  -0.129
X11     0.235  -0.349   0.351   0.168   0.692
X12     0.232   0.262   0.473   0.675  -0.212
% var    62.7    15.8     7.8     3.8     3.2
(The X10 entry for PC 3 is missing from the transcript.)
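As a sketch of how the eigenvectors and the "% var" row relate to the correlation matrix, the code below enters the X1-X12 correlations from slide 11 (the Y column is left out, since PCA uses only the X's), rebuilds the symmetric matrix, and eigen-decomposes it with NumPy. The helper name `full_matrix` is hypothetical. For a correlation matrix the eigenvalues sum to the number of variables, so each PC's share of the total variance is its eigenvalue divided by 12.

```python
import numpy as np

# Upper triangle of the X1..X12 correlation matrix from slide 11 (Y omitted).
UPPER = [
    [1.0, .72, .67, .76, .81, .54, .31, .54, .38, .50, .18, .64],
    [1.0, .67, .45, .80, .62, .45, .47, .31, .49, .14, .39],
    [1.0, .49, .72, .84, .76, .86, .68, .85, .48, .56],
    [1.0, .62, .42, .26, .36, .56, .38, .28, .59],
    [1.0, .62, .49, .51, .44, .62, .32, .59],
    [1.0, .93, .87, .83, .90, .63, .43],
    [1.0, .82, .85, .90, .67, .32],
    [1.0, .74, .84, .64, .39],
    [1.0, .80, .70, .49],
    [1.0, .64, .46],
    [1.0, .36],
    [1.0],
]

def full_matrix(upper):
    """Rebuild a symmetric matrix from its row-wise upper triangle."""
    p = len(upper)
    R = np.zeros((p, p))
    for i, row in enumerate(upper):
        R[i, i:] = row
    return R + np.triu(R, 1).T

R = full_matrix(UPPER)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]            # PC 1 = largest eigenvalue
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
pct_var = 100 * eigenvalues / len(R)             # eigenvalues sum to 12
print(np.round(pct_var[:5], 1))
print(np.round(eigenvectors[:, 0], 3))           # compare with the PC 1 column
```

Because the correlations were rounded to two decimals, the printed values are expected to be close to, but not necessarily identical with, the slide's PC 1 column and "% var" row; the eigenvector may also come out with all signs flipped, which does not change the component. Since every correlation here is positive, the first eigenvector has all weights of the same sign, so PC 1 acts as an overall index of the X's.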

13 Principal Components Regression Procedure
1) Try the PC’s in order (PC 1 first, i.e., in order of decreasing variance explained).
2) Test each regression coefficient for significance (t-test).
3) Stop at the first insignificant component.
4) Transform the regression coefficients so that they are in terms of the original variables.
5) Sign test: each coefficient must have the same sign as that variable’s correlation with Y.
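The five steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the author's implementation: the function names `ols_t_stats` and `pc_regression` are hypothetical, the X's are assumed to be standardized before applying the correlation-matrix eigenvectors, each new PC's t-statistic is compared in absolute value with tcrit, and the sign test is only reported (the slides do not say what is done when it fails).

```python
import numpy as np

def ols_t_stats(A, y):
    """OLS coefficients and their t-statistics for design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = resid @ resid / (len(y) - A.shape[1])   # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return coef, coef / se

def pc_regression(X, y, tcrit=1.2):
    """Principal components regression following the slide's steps (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / sd                          # standardized X's (assumption)
    eigenvalues, E = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    E = E[:, np.argsort(eigenvalues)[::-1]]      # PCs in order of variance
    pcs = Z @ E

    k, a0, a = 0, y.mean(), np.zeros(0)
    for m in range(1, X.shape[1] + 1):           # step 1: try the PCs in order
        A = np.column_stack([np.ones(len(y)), pcs[:, :m]])
        coef, t = ols_t_stats(A, y)              # step 2: t-test
        if abs(t[-1]) < tcrit:                   # step 3: stop at first insignificant PC
            break
        k, a0, a = m, coef[0], coef[1:]

    # Step 4: back-transform, b_i = sum_j a_j e_ij / s_i and b0 = a0 - sum_i b_i mean_i.
    b = (E[:, :k] @ a) / sd
    b0 = a0 - mean @ b
    # Step 5: sign test against each X's correlation with Y.
    r_xy = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])
    sign_ok = bool(np.all(np.sign(b) == np.sign(r_xy)))
    return {"k": k, "b0": b0, "b": b, "sign_ok": sign_ok,
            "pc_fit": a0 + pcs[:, :k] @ a}

# Hypothetical data: four intercorrelated X's driving Y.
rng = np.random.default_rng(1)
signal = rng.normal(size=40)
X = 10 + signal[:, None] + 0.4 * rng.normal(size=(40, 4))
y = 50 + 3 * X.sum(axis=1) + rng.normal(size=40)
res = pc_regression(X, y)
```

Because the back-transformation is exact algebra, `X @ res["b"] + res["b0"]` reproduces the fitted values of the PC regression; the final model can therefore be reported in terms of the original X's.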

14 Principal Components Regression Procedure
t-test iterations for the example data set (tcrit = 1.2); each row adds the next PC:
PCs  t(PC 1)  t(PC 2)  t(PC 3)  t(PC 4)  t(PC 5)  t(PC 6)  t(PC 7)
1     10.243
2     10.105    0.622  : stop here, use only first PC
Continuing...
3     10.225    0.629    1.235  : 3rd PC exceeds tcrit
4     10.261    0.632    1.239   -1.073
5     10.092    0.621    1.219   -1.055   -0.588
6     11.723    0.722    1.416   -1.225   -0.683   -2.764
7     11.395    0.702    1.376   -1.191   -0.664   -2.686   -0.073
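The stopping rule can be replayed on the slide's numbers. The sketch below takes, from each iteration, the t-statistic of the newly added PC (10.243, 0.622, 1.235, -1.073, -0.588, -2.764, -0.073) and assumes the test compares |t| with tcrit = 1.2; the function name `components_to_keep` is hypothetical. It keeps one PC, matching the slide, and also lists which PCs would have exceeded tcrit when added, showing that PC 3 and PC 6 are discarded by the strict "stop at the first insignificant component" rule.

```python
def components_to_keep(t_new, tcrit=1.2):
    """Number of leading PCs kept when stopping at the first insignificant one.

    t_new[m] is the t-statistic of the PC added at iteration m + 1.
    """
    for m, t in enumerate(t_new):
        if abs(t) < tcrit:
            return m            # PCs 1..m were significant
    return len(t_new)

# t-statistic of the newly added PC at each iteration (slide 14).
t_new = [10.243, 0.622, 1.235, -1.073, -0.588, -2.764, -0.073]
n_keep = components_to_keep(t_new)
exceeding = [m + 1 for m, t in enumerate(t_new) if abs(t) > 1.2]
```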

15 Principal Components Regression Procedure
Final model for the example data set (1 PC):
$$Y = 2.91 X_1 + 3.34 X_2 + 2.44 X_3 + 2.27 X_4 + 2.50 X_5 + 3.34 X_6 + 2.69 X_7 + 2.45 X_8 + 2.97 X_9 + 2.78 X_{10} + 0.55 X_{11} + 2.47 X_{12} - 79.78$$
R = 0.906     JR = 0.890
SE = 62.558   JSE = 67.410
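The sign test from the procedure can be checked directly against the slides. The sketch below pairs the final model's coefficients (slide 15) with each X's correlation with Y (last column of the correlation matrix on slide 11); the function name `sign_test` is hypothetical, and a zero coefficient is treated as failing. Every coefficient and every correlation is positive, so the 1-PC model passes.

```python
# Final 1-PC model coefficients for X1..X12 (slide 15).
coefs = [2.91, 3.34, 2.44, 2.27, 2.50, 3.34, 2.69, 2.45, 2.97, 2.78, 0.55, 2.47]
# Correlation of each X with Y (slide 11, Y column).
r_xy = [.65, .60, .80, .68, .73, .85, .76, .70, .84, .79, .51, .64]

def sign_test(coefs, r_xy):
    """True if every coefficient has the same (nonzero) sign as corr(X_i, Y)."""
    return all(c * r > 0 for c, r in zip(coefs, r_xy))

passed = sign_test(coefs, r_xy)
```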

16 Summary
Principal components analysis is a standard multivariate statistical procedure.
It can be used for descriptive purposes, to reduce the dimensionality of a set of correlated variables.
It can be taken a step further to provide new, uncorrelated independent variables for regression.
PC’s are taken in order, subject to the t-test and the sign test.
The final model is expressed in terms of the original X variables.

