PCA Example: The data set “Lakes” consists of five-year averages of water quality parameter measurements at 48 lakes in Texas for the period 1975-2010.


The data set “Lakes” consists of five-year averages of water quality parameter measurements at 48 lakes in Texas for the period 1975-2010. Several lakes have golden algae bloom records during this period. Are differences in water quality parameters driving the golden algae blooms in these lakes? Do the water quality parameters in these lakes differ from one period to another? R data: “Lakes”

Variables:
Name – name of the lake
Bloom – presence or absence of golden algae blooms
Year – the first year of the five-year period
Temp – water temperature, degrees Celsius
SpCond – specific conductance, microsiemens per centimeter
DO – dissolved oxygen, mg/L
pH – water pH
Chloride – chloride concentration, mg/L
Sulfate – sulfate concentration, mg/L

Read the data:
> Lakes=read.csv("E:/Multivariate_analysis/Data/Lakes.csv",header=TRUE)
Remove the first three columns of the data and keep only the water quality (WQ) parameters:
> Lk=Lakes[,-c(1:3)]
Calculate the variance for each WQ parameter:
> round(sapply(Lk,var),2)
[variances of Temp, SpCond, DO, pH, Chloride, and Sulfate; values not preserved in the transcript]
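Why the variances are inspected first can be sketched on made-up data. The block below is only an illustration, assuming a synthetic stand-in for the six WQ columns: `n`, `salt`, and every distribution parameter are hypothetical and are not taken from Lakes.csv. It shows that variables recorded on very different scales have variances that differ by orders of magnitude.

```r
# Synthetic stand-in for the six WQ columns (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)  # hypothetical common "salinity" driver
Lk <- data.frame(Temp     = rnorm(n, 20, 3),
                 SpCond   = 1500 + 800 * salt + rnorm(n, 0, 100),
                 DO       = rnorm(n, 8, 1),
                 pH       = rnorm(n, 8, 0.3),
                 Chloride = 300 + 150 * salt + rnorm(n, 0, 20),
                 Sulfate  = 200 + 100 * salt + rnorm(n, 0, 15))

# Variance of each column, as on the slide
v <- sapply(Lk, var)
round(v, 2)

# Ratio of the largest to the smallest variance: a covariance-based PCA
# on the raw data would be dominated by the highest-variance column
max(v) / min(v)
names(which.max(v))
```

When the ratio is this large, working from normalized data (or, equivalently, from the correlation matrix) keeps one variable from dominating the components.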

Normalize the data:
> NLk=scale(Lk)
Calculate the correlation matrix of the normalized data:
> round(cor(NLk),2)
[6 x 6 correlation matrix with rows and columns Temp, SpCond, DO, pH, Chloride, Sulfate; values not preserved in the transcript]
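What scale() does to the data can be checked with a short sketch. It assumes a small synthetic table (three hypothetical columns, not the Lakes data) and verifies two facts: each scaled column has mean 0 and standard deviation 1, and the covariance matrix of the scaled data equals the correlation matrix of the raw data.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)
Lk <- data.frame(Temp     = rnorm(n, 20, 3),
                 SpCond   = 1500 + 800 * salt + rnorm(n, 0, 100),
                 Chloride = 300 + 150 * salt + rnorm(n, 0, 20))

NLk <- scale(Lk)          # center each column, then divide by its standard deviation

round(colMeans(NLk), 10)  # column means after scaling
apply(NLk, 2, sd)         # column standard deviations after scaling

# Scaling does not change correlations, and the covariance of the
# scaled data is the correlation matrix of the raw data
all.equal(cor(NLk), cor(Lk))
all.equal(cov(NLk), cor(Lk))
```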

Calculate the eigenvectors and eigenvalues of the correlation matrix:
> eigen(cor(NLk))
$values
[six eigenvalues; values not preserved in the transcript]
$vectors
[6 x 6 matrix of eigenvectors, columns [,1] to [,6]; values not preserved in the transcript]
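How the eigenvalues are read can be sketched as follows, assuming a synthetic six-column matrix `X` (hypothetical values, with three columns deliberately built to be highly correlated). The eigenvalues of a correlation matrix sum to the number of variables, so each eigenvalue divided by that sum is the proportion of variance carried by its component.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)
X <- cbind(SpCond   = salt + rnorm(n, 0, 0.1),
           Chloride = salt + rnorm(n, 0, 0.1),
           Sulfate  = salt + rnorm(n, 0, 0.1),
           Temp = rnorm(n), DO = rnorm(n), pH = rnorm(n))

e <- eigen(cor(X))

sum(e$values)                      # equals the number of variables (trace of the matrix)
prop <- e$values / sum(e$values)   # proportion of variance per component
round(cumsum(prop), 3)             # cumulative proportion

# The eigenvectors are orthonormal: t(V) %*% V is the identity matrix
round(crossprod(e$vectors), 10)
```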

Extract the principal components from the correlation matrix:
> Lakes_PCA=princomp(NLk,cor=TRUE)
> summary(Lakes_PCA,loadings=TRUE)
Importance of components:
[Standard deviation, Proportion of Variance, and Cumulative Proportion for Comp.1 to Comp.3; values not preserved in the transcript]
Loadings:
[loadings of Temp, SpCond, DO, pH, Chloride, and Sulfate on Comp.1 to Comp.3; values not preserved in the transcript]
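The link between princomp() and the eigen decomposition can be sketched on synthetic data (the matrix `X` below is hypothetical, not the Lakes data). With `cor = TRUE`, the squared standard deviations reported by princomp() are the eigenvalues of the correlation matrix, and the loadings are its eigenvectors up to sign.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)
X <- cbind(SpCond   = salt + rnorm(n, 0, 0.1),
           Chloride = salt + rnorm(n, 0, 0.1),
           Sulfate  = salt + rnorm(n, 0, 0.1),
           Temp = rnorm(n), DO = rnorm(n), pH = rnorm(n))

pca <- princomp(X, cor = TRUE)   # PCA on the correlation matrix
e   <- eigen(cor(X))

# Component variances are the eigenvalues
all.equal(unname(pca$sdev^2), e$values)

# Loadings match the eigenvectors apart from the sign of each column
max(abs(abs(unclass(pca$loadings)) - abs(e$vectors)))

# Proportion of variance, as printed by summary()
round(pca$sdev^2 / sum(pca$sdev^2), 3)
```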

Plot the variance of each principal component (figure not preserved in the transcript).
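The slide's plotting command did not survive, so the following is only one possible way to draw such a plot, assuming a synthetic matrix `X` (hypothetical values) and using screeplot() from the stats package.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)
X <- cbind(SpCond   = salt + rnorm(n, 0, 0.1),
           Chloride = salt + rnorm(n, 0, 0.1),
           Sulfate  = salt + rnorm(n, 0, 0.1),
           Temp = rnorm(n), DO = rnorm(n), pH = rnorm(n))

pca <- princomp(X, cor = TRUE)

# Bar chart of the component variances (what plot(pca) also produces)
screeplot(pca, main = "Variance of each principal component")

# The same information as a line plot
screeplot(pca, type = "lines", main = "Scree plot")
```

Components are usually retained up to the point where the curve flattens out.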

Write the equations of the first three principal components (equations not preserved in the transcript). SpCond, Chloride, and Sulfate have important loadings on the first principal axis; Temp, DO, and pH contribute significantly to the second principal axis; and Temp, pH, and Chloride have important loadings on the third principal axis.
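The general form of such an equation is that each component is a weighted sum of the standardized variables, with the loadings as weights. A minimal sketch on synthetic data (hypothetical values, not the Lakes data; the names `L`, `Z`, and `scores_by_hand` are introduced here for illustration) rebuilds the scores from the loadings to make this concrete.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)
Lk <- data.frame(Temp     = rnorm(n, 20, 3),
                 SpCond   = 1500 + 800 * salt + rnorm(n, 0, 100),
                 DO       = rnorm(n, 8, 1),
                 pH       = rnorm(n, 8, 0.3),
                 Chloride = 300 + 150 * salt + rnorm(n, 0, 20),
                 Sulfate  = 200 + 100 * salt + rnorm(n, 0, 15))

pca <- princomp(Lk, cor = TRUE)
L <- unclass(pca$loadings)          # 6 x 6 matrix of loadings (weights)

# Coefficients of the first component: PC1 = sum over variables of
# loading * standardized variable
round(L[, 1], 2)

# Standardize with the center and scale stored by princomp(), then
# apply the weights; this reproduces the scores
Z <- scale(as.matrix(Lk), center = pca$center, scale = pca$scale)
scores_by_hand <- Z %*% L
all.equal(unname(scores_by_hand), unname(pca$scores))
```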

Calculate the scores for each principal axis for the PCA diagram:
> Lakes_PCA$scores
[174 x 6 matrix of scores, columns Comp.1 to Comp.6, rows [1,] to [174,]; values not preserved in the transcript]

Make a PC1 vs PC2 diagram showing each year with a different color:
> year1=which(Lakes[,3]==sort(unique(Lakes[,3]))[1])
> year2=which(Lakes[,3]==sort(unique(Lakes[,3]))[2])
> year3=which(Lakes[,3]==sort(unique(Lakes[,3]))[3])
> year4=which(Lakes[,3]==sort(unique(Lakes[,3]))[4])
> year5=which(Lakes[,3]==sort(unique(Lakes[,3]))[5])
> year6=which(Lakes[,3]==sort(unique(Lakes[,3]))[6])
> year7=which(Lakes[,3]==sort(unique(Lakes[,3]))[7])
> plot(Lakes_PCA$scores[year1,1],Lakes_PCA$scores[year1,2],xlab="PC1",ylab="PC2",pch=15,xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,2]))
> points(Lakes_PCA$scores[year2,1],Lakes_PCA$scores[year2,2],pch=15,col="red")
> points(Lakes_PCA$scores[year3,1],Lakes_PCA$scores[year3,2],pch=15,col="blue")
> points(Lakes_PCA$scores[year4,1],Lakes_PCA$scores[year4,2],pch=15,col="green")
> points(Lakes_PCA$scores[year5,1],Lakes_PCA$scores[year5,2],pch=15,col="pink")
> points(Lakes_PCA$scores[year6,1],Lakes_PCA$scores[year6,2],pch=15,col="yellow")
> points(Lakes_PCA$scores[year7,1],Lakes_PCA$scores[year7,2],pch=15,col="brown")
> legend(11,2,legend=as.character(sort(unique(Lakes[,3]))),bty="n",pch=15,col=c("black","red","blue","green","pink","yellow","brown"))
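The same coloring can be written more compactly by indexing a color vector with the year as a factor. This is an alternative sketch, not the slides' code: it assumes synthetic scores and a hypothetical `Year` vector with seven period start years, and `yr` and `cols` are names introduced here.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 63
Year <- rep(seq(1975, 2005, by = 5), length.out = n)  # seven hypothetical start years
scores <- princomp(matrix(rnorm(n * 6), ncol = 6), cor = TRUE)$scores

yr   <- factor(Year)   # levels are sorted, so level 1 is the earliest year
cols <- c("black", "red", "blue", "green", "pink", "yellow", "brown")

# One plot() call: each point takes the color of its year's level
plot(scores[, 1], scores[, 2], col = cols[as.integer(yr)], pch = 15,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(yr), col = cols, pch = 15, bty = "n")
```

Because the colors are looked up by factor level, the legend and the points cannot drift out of step when a year is added or removed.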

PC1 vs PC2 diagram: Several lakes have different water quality in the periods starting in 1975, 1980, and 1985 (isolated black, red, and blue points, respectively).

Make a PC1 vs PC3 diagram showing each year with a different color:
> plot(Lakes_PCA$scores[year1,1],Lakes_PCA$scores[year1,3],xlab="PC1",ylab="PC3",pch=15,xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,3]))
> points(Lakes_PCA$scores[year2,1],Lakes_PCA$scores[year2,3],pch=15,col="red")
> points(Lakes_PCA$scores[year3,1],Lakes_PCA$scores[year3,3],pch=15,col="blue")
> points(Lakes_PCA$scores[year4,1],Lakes_PCA$scores[year4,3],pch=15,col="green")
> points(Lakes_PCA$scores[year5,1],Lakes_PCA$scores[year5,3],pch=15,col="pink")
> points(Lakes_PCA$scores[year6,1],Lakes_PCA$scores[year6,3],pch=15,col="yellow")
> points(Lakes_PCA$scores[year7,1],Lakes_PCA$scores[year7,3],pch=15,col="brown")
> legend("topright",legend=as.character(sort(unique(Lakes[,3]))),bty="n",pch=15,col=c("black","red","blue","green","pink","yellow","brown"))

PC1 vs PC3 diagram: The five-year period starting in 1985 shows different water quality in several lakes (blue dots). A few lakes show differences in 1975 and 1980 compared to the rest of the group.

Make a PC1 vs PC2 diagram showing lakes with algae bloom records in blue:
> algae=which(Lakes[,2]=="Algae")
> noalgae=which(Lakes[,2]=="NoAlgae")
> plot(Lakes_PCA$scores[noalgae,1],Lakes_PCA$scores[noalgae,2],xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,2]),xlab="PC1",ylab="PC2",pch=15)
> points(Lakes_PCA$scores[algae,1],Lakes_PCA$scores[algae,2],pch=15,col="blue")
> legend(10,6,legend=c("no-algae","algae"),bty="n",pch=15,col=c("black","blue"))
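A two-group coloring like this can also be done with a single color vector. The sketch below is an illustration on synthetic data, not the slides' code: the `Bloom` labels are hypothetical and are generated from a made-up `salt` variable so that the two groups differ.

```r
# Synthetic stand-in (hypothetical values and labels, NOT the Lakes data)
set.seed(1)
n <- 60
salt  <- rnorm(n)
Bloom <- ifelse(salt > 0.5, "Algae", "NoAlgae")   # hypothetical bloom records
X <- cbind(SpCond   = salt + rnorm(n, 0, 0.1),
           Chloride = salt + rnorm(n, 0, 0.1),
           Sulfate  = salt + rnorm(n, 0, 0.1),
           Temp = rnorm(n), DO = rnorm(n), pH = rnorm(n))
scores <- princomp(X, cor = TRUE)$scores

# One color per row: blue for bloom records, black otherwise
cols <- ifelse(Bloom == "Algae", "blue", "black")

plot(scores[, 1], scores[, 2], col = cols, pch = 15, xlab = "PC1", ylab = "PC2")
legend("topright", legend = c("no-algae", "algae"), col = c("black", "blue"),
       pch = 15, bty = "n")
```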

PC1 vs PC2 diagram showing algae and no-algae lakes: There is a clear separation between lakes with and without golden algae blooms along the PC1 axis.

Make a PC1 vs PC3 diagram showing lakes with algae bloom records in blue:
> algae=which(Lakes[,2]=="Algae")
> noalgae=which(Lakes[,2]=="NoAlgae")
> plot(Lakes_PCA$scores[noalgae,1],Lakes_PCA$scores[noalgae,3],xlim=range(Lakes_PCA$scores[,1])*c(0.98,1.3),ylim=range(Lakes_PCA$scores[,3]),xlab="PC1",ylab="PC3",pch=15)
> points(Lakes_PCA$scores[algae,1],Lakes_PCA$scores[algae,3],pch=15,col="blue")
> legend(10,2,legend=c("no-algae","algae"),bty="n",pch=15,col=c("black","blue"))

PC1 vs PC3 diagram: The separation between algae lakes and no-algae lakes is given by PC1.
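A separation along PC1 can also be checked numerically by comparing the PC1 scores of the two groups. This is a sketch on synthetic data only: the `Bloom` labels and all values are hypothetical, and the result says nothing about the real lakes.

```r
# Synthetic stand-in (hypothetical values and labels, NOT the Lakes data)
set.seed(1)
n <- 60
salt  <- rnorm(n)
Bloom <- ifelse(salt > 0.5, "Algae", "NoAlgae")   # hypothetical bloom records
X <- cbind(SpCond   = salt + rnorm(n, 0, 0.1),
           Chloride = salt + rnorm(n, 0, 0.1),
           Sulfate  = salt + rnorm(n, 0, 0.1),
           Temp = rnorm(n), DO = rnorm(n), pH = rnorm(n))
pc1 <- princomp(X, cor = TRUE)$scores[, 1]

# Mean PC1 score per group, and the gap between the two means
m <- sapply(split(pc1, Bloom), mean)
m
gap <- abs(m[["Algae"]] - m[["NoAlgae"]])
gap

# Side-by-side distributions of PC1 for the two groups
boxplot(pc1 ~ factor(Bloom), xlab = "Bloom", ylab = "PC1 score")
```

The sign of PC1 is arbitrary, so only the size of the gap is meaningful, not which group sits on the positive side.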

Biplot of the first two principal components:
> biplot(Lakes_PCA,xlabs=abbreviate(Lakes[,1]),xlim=c(-0.1,0.3),ylim=c(-0.2,0.3))
Separation of algae lakes from no-algae lakes is determined by the variables Chloride, Sulfate, and SpCond. The loadings (eigenvector coefficients) of these three variables are so close in value that their arrows overlap.
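Why arrows overlap in a biplot can be seen by printing the loadings of the variables in question. The sketch below assumes synthetic data in which SpCond, Chloride, and Sulfate are deliberately built from one hypothetical `salt` driver; it is not the Lakes data, and `salts` and `spread` are names introduced here.

```r
# Synthetic stand-in (hypothetical values, NOT the Lakes data)
set.seed(1)
n <- 60
salt <- rnorm(n)
Lk <- data.frame(Temp     = rnorm(n, 20, 3),
                 SpCond   = 1500 + 800 * salt + rnorm(n, 0, 100),
                 DO       = rnorm(n, 8, 1),
                 pH       = rnorm(n, 8, 0.3),
                 Chloride = 300 + 150 * salt + rnorm(n, 0, 20),
                 Sulfate  = 200 + 100 * salt + rnorm(n, 0, 15))

pca <- princomp(Lk, cor = TRUE)
L <- unclass(pca$loadings)

# Loadings of the three highly correlated variables on PC1 and PC2:
# near-identical rows mean near-identical arrows in the biplot
salts <- c("SpCond", "Chloride", "Sulfate")
round(L[salts, 1:2], 2)
spread <- diff(range(L[salts, 1]))   # spread of their PC1 loadings
spread

biplot(pca)
```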

Biplot of the first two principal components:
> biplot(Lakes_PCA,xlabs=rep("",dim(Lakes)[1]),xlim=c(-0.1,0.3),ylim=c(-0.2,0.2))
> points(Lakes_PCA$scores[noalgae,1],Lakes_PCA$scores[noalgae,2],col="black",pch=16)
> points(Lakes_PCA$scores[algae,1],Lakes_PCA$scores[algae,2],col="blue",pch=16)