
PCA Example The data set “Lakes” consists of five-year averages of water quality parameter measurements at 48 lakes in Texas for the period 1975-2010.


1 PCA Example The data set “Lakes” consists of five-year averages of water quality parameter measurements at 48 lakes in Texas for the period 1975-2010. Several lakes have golden algae bloom records during this period. Are differences in water quality parameters driving the golden algae blooms in these lakes? Do the water quality parameters in the lakes differ from one period to another? R data “Lakes”

2 PCA Example
Variables:
  Name – name of the lake
  Bloom – presence or absence of golden algae blooms
  Year – the first year of the five-year period
  Temp – water temperature, degrees Celsius
  SpCond – specific conductance, microsiemens per centimeter
  DO – dissolved oxygen, mg/L
  pH – water pH
  Chloride – chloride concentration, mg/L
  Sulfate – sulfate concentration, mg/L

3 PCA Example
Read the data:
> Lakes=read.csv("E:/Multivariate_analysis/Data/Lakes.csv",header=TRUE)
Remove the first three columns of the data and keep only the water quality (WQ) parameters:
> Lk=Lakes[,-c(1:3)]
Calculate the variance for each WQ parameter:
> round(sapply(Lk,var),2)
      Temp     SpCond         DO         pH   Chloride    Sulfate
      9.98 2395416.94       0.83       0.14  162220.42   55044.98

4 PCA Example
Normalize the data:
> NLk=scale(Lk)
Calculate the correlation matrix of the normalized data:
> round(cor(NLk),2)
          Temp SpCond    DO    pH Chloride Sulfate
Temp      1.00  -0.02 -0.57 -0.20     0.00    0.01
SpCond   -0.02   1.00 -0.07  0.29     0.85    0.96
DO       -0.57  -0.07  1.00  0.35    -0.09   -0.10
pH       -0.20   0.29  0.35  1.00     0.21    0.26
Chloride  0.00   0.85 -0.09  0.21     1.00    0.81
Sulfate   0.01   0.96 -0.10  0.26     0.81    1.00
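The variances on slide 3 span several orders of magnitude, which is why the data are standardized before PCA. A minimal sketch of what scale() does, using synthetic data (the data frame X and its columns a, b, c are made up for illustration and are not the Lakes data):

```r
# Synthetic stand-in data with very different variances per column.
set.seed(1)
X <- data.frame(a = rnorm(50, sd = 1000),
                b = rnorm(50, sd = 0.5),
                c = runif(50))

# scale() centers each column and divides by its standard deviation.
Z <- scale(X)

# After scaling, every column has variance 1.
col_vars <- apply(Z, 2, var)

# The covariance of the scaled data equals the correlation of the raw data,
# and correlation itself is unchanged by scaling.
same_cov <- isTRUE(all.equal(cov(Z), cor(X), check.attributes = FALSE))
same_cor <- isTRUE(all.equal(cor(Z), cor(X), check.attributes = FALSE))

print(round(col_vars, 10))
print(c(same_cov, same_cor))
```

Because correlation is invariant to scaling, cor(NLk) and cor(Lk) give the same matrix.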

5 PCA Example
Calculate the eigenvectors and eigenvalues of the correlation matrix:
> eigen(cor(NLk))
$values
[1] 2.86256674 1.75834470 0.75470254 0.38378574 0.20896051 0.03163976

$vectors
            [,1]        [,2]        [,3]        [,4]         [,5]         [,6]
[1,] -0.01709341  0.60859261  0.53586341 -0.58462716  0.002062515  0.019495627
[2,]  0.57759456  0.03011448 -0.08663364 -0.03882989  0.304075712  0.751000965
[3,] -0.03239882 -0.66764588 -0.05186583 -0.74159388  0.023311888 -0.002075523
[4,]  0.22462025 -0.41945689  0.82114464  0.30853194 -0.060881157 -0.020607442
[5,]  0.54026408  0.06014976 -0.14527739 -0.09287685 -0.814058298 -0.109882586
[6,]  0.56806964  0.05826710 -0.08526964 -0.05407189  0.490502635 -0.650472377
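Each eigenvalue is the variance carried by one principal component, so dividing by their sum gives the proportion of variance explained. A short sketch using the eigenvalues printed above (the names values, prop and cumprop are chosen here for illustration):

```r
# Eigenvalues copied from the eigen(cor(NLk)) output above.
values <- c(2.86256674, 1.75834470, 0.75470254,
            0.38378574, 0.20896051, 0.03163976)

# Proportion of total variance per component, and the running total.
prop <- values / sum(values)
cumprop <- cumsum(prop)

print(round(rbind(Proportion = prop, Cumulative = cumprop), 4))
```

The eigenvalues of a correlation matrix sum to the number of variables (six here), so the first component alone carries a little under half of the total variance.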

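6 PCA Example
Extract the principal components from the correlation matrix:
> Lakes_PCA=princomp(NLk,corr=TRUE)
> summary(Lakes_PCA,loadings=TRUE)
Importance of components:
                          Comp.1    Comp.2    Comp.3
Standard deviation     1.6870433 1.3222100 0.8662362
Proportion of Variance 0.4770945 0.2930575 0.1257838
Cumulative Proportion  0.4770945 0.7701519 0.8959357

Loadings:
         Comp.1 Comp.2 Comp.3
Temp             0.609  0.536
SpCond    0.578
DO              -0.668
pH        0.225 -0.419  0.821
Chloride  0.540        -0.145
Sulfate   0.568

Note: princomp() names this argument cor, and corr=TRUE is not matched to it, so the call above works on the covariance matrix of NLk. Because NLk is already standardized, that matrix is the correlation matrix multiplied by (n-1)/n with n=174, which is why the first standard deviation is 1.687 rather than sqrt(2.8626)=1.692. The proportions of variance and the loadings are unaffected.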
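A minimal sketch, on synthetic data, of how princomp() standard deviations relate to the eigenvalues of the correlation matrix with and without cor=TRUE. The matrix X is made up for illustration; the only facts relied on are that princomp() takes a cor argument and uses divisor n for the covariance matrix.

```r
# Synthetic data with some correlation between columns 1 and 2.
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)
X[, 2] <- X[, 1] + 0.5 * X[, 2]

Z <- scale(X)              # standardized data, as NLk is on the slides
n <- nrow(Z)
ev <- eigen(cor(Z))$values # eigenvalues of the correlation matrix

# With cor = TRUE, the component variances are exactly the eigenvalues.
sd_cor <- unname(princomp(Z, cor = TRUE)$sdev)

# Without it, princomp uses the covariance of Z with divisor n,
# so each variance is shrunk by (n - 1) / n.
sd_cov <- unname(princomp(Z)$sdev)

match_cor <- isTRUE(all.equal(sd_cor, sqrt(ev)))
match_cov <- isTRUE(all.equal(sd_cov, sqrt(ev * (n - 1) / n)))
print(c(match_cor, match_cov))
```

Either way the eigenvectors, and therefore the loadings and the proportions of variance, are the same; only the overall scale of the standard deviations and scores changes.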

7 PCA Example Plot the variance of each principal component:
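The plotting command for this slide is not preserved in the transcript. One common way to draw such a plot is screeplot(); the sketch below uses the built-in USArrests data as a stand-in for the Lakes data, so the object name pca and the plot title are assumptions.

```r
# USArrests ships with R and stands in for the Lakes data here.
pca <- princomp(USArrests, cor = TRUE)

# Variance of each component is the squared standard deviation.
variances <- pca$sdev^2

# Scree plot of the component variances.
screeplot(pca, type = "lines",
          main = "Variance of each principal component")
```

With the slide objects, the equivalent call would presumably be screeplot(Lakes_PCA).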

8 PCA Example Write the equations of the first three principal components: SpCond, Chloride, and Sulfate have important loadings on the first principal axis, Temp, DO, and pH contribute significantly to the second principal axis, and Temp, pH, and Chloride are important loadings on the third principal axis.
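The equations themselves are not preserved in the transcript. Reconstructed from the first three eigenvectors on slide 5 (rounded to three decimals), and writing z for each standardized variable, they would read as follows; the z notation is an assumption introduced here.

```latex
\begin{aligned}
\mathrm{PC}_1 &= -0.017\,z_{\mathrm{Temp}} + 0.578\,z_{\mathrm{SpCond}} - 0.032\,z_{\mathrm{DO}} + 0.225\,z_{\mathrm{pH}} + 0.540\,z_{\mathrm{Chloride}} + 0.568\,z_{\mathrm{Sulfate}} \\
\mathrm{PC}_2 &= \phantom{-}0.609\,z_{\mathrm{Temp}} + 0.030\,z_{\mathrm{SpCond}} - 0.668\,z_{\mathrm{DO}} - 0.419\,z_{\mathrm{pH}} + 0.060\,z_{\mathrm{Chloride}} + 0.058\,z_{\mathrm{Sulfate}} \\
\mathrm{PC}_3 &= \phantom{-}0.536\,z_{\mathrm{Temp}} - 0.087\,z_{\mathrm{SpCond}} - 0.052\,z_{\mathrm{DO}} + 0.821\,z_{\mathrm{pH}} - 0.145\,z_{\mathrm{Chloride}} - 0.085\,z_{\mathrm{Sulfate}}
\end{aligned}
```

The coefficients below 0.1 in absolute value are the ones that summary() leaves blank in the loadings table on slide 6.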

9 PCA Example
Calculate the scores for each principal axis for the PCA diagram:
> Lakes_PCA$scores
             Comp.1       Comp.2        Comp.3       Comp.4        Comp.5        Comp.6
  [1,] -0.840940601 -0.129405032  0.3724851406  0.282009798 -0.0159568183 -0.0245294336
  [2,] -1.129223226 -0.998041022 -1.6138622935 -0.108292698  0.1506844836 -0.0994239039
  [3,] -0.803185195  2.824527084  1.7646947373  0.388294337 -0.0344049191  0.0068578657
  [4,] -0.800500726  0.178911098  0.2942787158  0.185687561 -0.0006769275  0.0366685692
  [5,] -0.984180111 -0.638164386 -0.2893359063 -0.977367586  0.0675853680  0.0448307344
  [6,] -0.726931303  1.823618015  1.8948715840 -0.672342526 -0.0255919528 -0.0070631567
  [7,] -0.768218704 -1.767306989 -0.0040369230 -0.037962664 -0.0485309836 -0.0275789371
  …
[174,]  1.629960036 -0.579185868  0.3245963801 -0.470741904 -0.0852283736  0.7779341323
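The scores are the centered and scaled observations projected onto the loadings. A sketch that reproduces the scores component of a princomp object by hand, using the built-in USArrests data as a stand-in for the Lakes data (the names pca, z, manual and max_diff are illustrative):

```r
# USArrests ships with R and stands in for the Lakes data here.
pca <- princomp(USArrests, cor = TRUE)

# Center and scale each variable with the values princomp stored.
z <- scale(USArrests, center = pca$center, scale = pca$scale)

# Project onto the loadings (eigenvectors) to get the scores.
manual <- z %*% unclass(pca$loadings)

# Largest absolute difference from the scores princomp computed.
max_diff <- max(abs(manual - pca$scores))
print(max_diff)
```

Each column of the result is one of the linear combinations described on slide 8, evaluated for every observation.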

10 PCA Example
Make a PC1 vs PC2 diagram showing each year with a different color:
> year1=which(Lakes[,3]==sort(unique(Lakes[,3]))[1])
> year2=which(Lakes[,3]==sort(unique(Lakes[,3]))[2])
> year3=which(Lakes[,3]==sort(unique(Lakes[,3]))[3])
> year4=which(Lakes[,3]==sort(unique(Lakes[,3]))[4])
> year5=which(Lakes[,3]==sort(unique(Lakes[,3]))[5])
> year6=which(Lakes[,3]==sort(unique(Lakes[,3]))[6])
> year7=which(Lakes[,3]==sort(unique(Lakes[,3]))[7])
> plot(Lakes_PCA$scores[year1,1],Lakes_PCA$scores[year1,2],xlab="PC1",ylab="PC2",pch=15,
+      xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,2]))
> points(Lakes_PCA$scores[year2,1],Lakes_PCA$scores[year2,2],pch=15,col="red")
> points(Lakes_PCA$scores[year3,1],Lakes_PCA$scores[year3,2],pch=15,col="blue")
> points(Lakes_PCA$scores[year4,1],Lakes_PCA$scores[year4,2],pch=15,col="green")
> points(Lakes_PCA$scores[year5,1],Lakes_PCA$scores[year5,2],pch=15,col="pink")
> points(Lakes_PCA$scores[year6,1],Lakes_PCA$scores[year6,2],pch=15,col="yellow")
> points(Lakes_PCA$scores[year7,1],Lakes_PCA$scores[year7,2],pch=15,col="brown")
> legend(11,2,legend=as.character(sort(unique(Lakes[,3]))),bty="n",pch=15,
+        col=c("black","red","blue","green","pink","yellow","brown"))
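The same kind of plot can be written more compactly by turning the year into a factor and indexing a color vector with it. This is a sketch on synthetic scores, not the slide author's code; the objects scores and year are made up, and the seven period start years 1975 to 2005 are an assumption based on the five-year periods described on slide 1.

```r
# Synthetic stand-ins for the first two score columns and the Year column.
set.seed(1)
year <- rep(seq(1975, 2005, by = 5), each = 3)
scores <- cbind(PC1 = rnorm(length(year)), PC2 = rnorm(length(year)))

# Factor levels are sorted, so level 1 is the earliest period.
year_f <- factor(year)
cols <- c("black", "red", "blue", "green", "pink", "yellow", "brown")

# One plot call: each point takes the color of its year level.
plot(scores[, 1], scores[, 2], xlab = "PC1", ylab = "PC2",
     pch = 15, col = cols[as.integer(year_f)])
legend("topright", legend = levels(year_f), bty = "n", pch = 15, col = cols)
```

This keeps the color-to-year mapping in one place, so the legend cannot drift out of step with the points.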

11 PCA Example PC1 vs PC2 diagram: Several lakes have different water quality in the periods starting in 1975, 1980, and 1985 (the isolated black, red, and blue points, respectively).

12 PCA Example
Make a PC1 vs PC3 diagram showing each year with a different color:
> plot(Lakes_PCA$scores[year1,1],Lakes_PCA$scores[year1,3],xlab="PC1",ylab="PC3",pch=15,
+      xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,3]))
> points(Lakes_PCA$scores[year2,1],Lakes_PCA$scores[year2,3],pch=15,col="red")
> points(Lakes_PCA$scores[year3,1],Lakes_PCA$scores[year3,3],pch=15,col="blue")
> points(Lakes_PCA$scores[year4,1],Lakes_PCA$scores[year4,3],pch=15,col="green")
> points(Lakes_PCA$scores[year5,1],Lakes_PCA$scores[year5,3],pch=15,col="pink")
> points(Lakes_PCA$scores[year6,1],Lakes_PCA$scores[year6,3],pch=15,col="yellow")
> points(Lakes_PCA$scores[year7,1],Lakes_PCA$scores[year7,3],pch=15,col="brown")
> legend("topright",legend=as.character(sort(unique(Lakes[,3]))),bty="n",pch=15,
+        col=c("black","red","blue","green","pink","yellow","brown"))

13 PCA Example PC1 vs PC3 diagram: The five-year period starting in 1985 shows different water quality in several lakes (blue dots). A few lakes show differences in 1975 and 1980 compared to the rest of the group.

14 PCA Example
Make a PC1 vs PC2 diagram showing lakes with algae bloom records in blue:
> algae=which(Lakes[,2]=="Algae")
> noalgae=which(Lakes[,2]=="NoAlgae")
> plot(Lakes_PCA$scores[noalgae,1],Lakes_PCA$scores[noalgae,2],
+      xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,2]),
+      xlab="PC1",ylab="PC2",pch=15)
> points(Lakes_PCA$scores[algae,1],Lakes_PCA$scores[algae,2],pch=15,col="blue")
> legend(10,6,legend=c("no-algae","algae"),bty="n",pch=15,col=c("black","blue"))

15 PCA Example Make a PC1 vs PC2 diagram showing algae and no-algae lakes: Clear separation between lakes with and without golden algae blooms on the PC1 axis.

16 PCA Example
Make a PC1 vs PC3 diagram showing lakes with algae bloom records in blue:
> algae=which(Lakes[,2]=="Algae")
> noalgae=which(Lakes[,2]=="NoAlgae")
> plot(Lakes_PCA$scores[noalgae,1],Lakes_PCA$scores[noalgae,3],
+      xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,3]),
+      xlab="PC1",ylab="PC3",pch=15)
> points(Lakes_PCA$scores[algae,1],Lakes_PCA$scores[algae,3],pch=15,col="blue")
> legend(10,2,legend=c("no-algae","algae"),bty="n",pch=15,col=c("black","blue"))

17 PCA Example PC1 vs PC3 diagram : The separation between algae lakes and no-algae lakes is given by PC1.

18 PCA Example
Biplot of the first two principal components:
> biplot(Lakes_PCA,xlabs=abbreviate(Lakes[,1]),xlim=c(-0.1,0.3),ylim=c(-0.2,0.3))
Separation of algae lakes from no-algae lakes is determined by the variables Chloride, Sulfate, and SpCond. The loadings of these three variables on the first two components are so close in value that their arrows overlap.

19 PCA Example
Biplot of the first two principal components:
> biplot(Lakes_PCA,xlabs=rep("",dim(Lakes)[1]),xlim=c(-0.1,0.3),ylim=c(-0.2,0.2))
> points(Lakes_PCA$scores[noalgae,1],Lakes_PCA$scores[noalgae,2],col="black",pch=16)
> points(Lakes_PCA$scores[algae,1],Lakes_PCA$scores[algae,2],col="blue",pch=16)

