Multivariate Data Analysis: Principal Component Analysis


Multivariate Data Analysis: Principal Component Analysis

Principal Component Analysis (PCA) Singular Value Decomposition Eigenvector / eigenvalue calculation
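The equivalence of the two routes named above can be sketched in Python with NumPy (the data matrix here is illustrative random data, not from the slides): the squared singular values of a mean-centered X equal the eigenvalues of X'X, and the right singular vectors are the eigenvectors (loadings).

```python
import numpy as np

# Minimal sketch: PCA computed two ways on a small mean-centered matrix X (I x K).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
X = X - X.mean(axis=0)                    # mean-center each variable

# Route 1: singular value decomposition, X = U S V'
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Route 2: eigendecomposition of X'X
evals, evecs = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]           # eigh returns ascending order
evals, evecs = evals[order], evecs[:, order]

# Squared singular values are the eigenvalues of X'X ...
print(np.allclose(S**2, evals))           # True
# ... and the right singular vectors are the loadings, up to sign
print(np.allclose(np.abs(Vt.T), np.abs(evecs)))   # True
```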

Data matrix X (I×K). Goals: reduce variables, improve projections, remove noise, find outliers, find classes. [Diagram: matrix X with I rows and K columns]

PCA example with 2 variables, 6 objects: find the best (most informative) direction in space, describe that direction, make the projection.

[Scatter plots of the six objects in the x1–x2 plane]

1st PC: [plot showing the first principal component drawn through the data]

Score, residual: [plot showing an object's projection onto the 1st PC (the score) and its orthogonal distance from it (the residual)]

1st PC: the loadings p1 and p2 form a unit vector.

1st PC: loading p1 = cos(θ), loading p2 = sin(θ); the loading vector (cos θ, sin θ) is a unit vector.
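A minimal numeric check of the slide above (the angle θ is an arbitrary illustration): the 2-D loading vector (cos θ, sin θ) always has unit length, and projecting an object onto it gives that object's score.

```python
import numpy as np

# Loading vector of the 1st PC in 2-D, parameterized by the angle theta
theta = np.deg2rad(30.0)                       # illustrative angle
p = np.array([np.cos(theta), np.sin(theta)])   # [p1, p2]

print(np.linalg.norm(p))   # ~1.0: a unit vector by construction

# The score t of an object x is its projection onto p
x = np.array([2.0, 1.0])
t = float(x @ p)
print(t)
```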

[Diagrams: decomposition of X (I×K) into a score vector t and a loading vector p, highlighting a row i and a column k of X]

X = t1p1' + t2p2' + … + tApA' + E, i.e. X = TP' + E
X: properly preprocessed data matrix (I×K)
T: score matrix (I×A)
P: loading matrix (K×A)
E: residual matrix (I×K)
ta: score vector; pa: loading vector
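The decomposition above can be sketched with NumPy (sizes and data are illustrative): truncating an SVD after A components yields T, P and E with exactly the shapes listed, and the residual is orthogonal to the model part.

```python
import numpy as np

# Sketch of the model X = T P' + E with A components, built from a truncated SVD.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 5))
X = X - X.mean(axis=0)          # "properly preprocessed": mean-centered here

A = 2                           # number of components kept
U, S, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * S[:A]            # score matrix,   I x A
P = Vt[:A].T                    # loading matrix, K x A
E = X - T @ P.T                 # residual matrix, I x K

print(T.shape, P.shape, E.shape)
# The model part and the residual are orthogonal
print(abs((T @ P.T).ravel() @ E.ravel()) < 1e-8)   # True
```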

The Wine Example (People magazine; Wise & Gallagher)

[Data table: 10 countries (France, Italy, Switzerland, Australia, Britain, USA, Russia, Czech Republic, Japan, Mexico) × 5 variables (Wine, Beer, Spirit, LifeEx, HeartD)]

[Table of the mean and standard deviation of each variable: Beer, Wine, Spirit, LifeEx, HeartD]

[Singular value plot: components 1–5 explain 46%, 32%, 12%, 8% and 2% of the variance]

[Score plot: Score 1 (46%) vs Score 2 (32%), with the 10 countries labeled: France, Italy, Switzerland, Australia, Britain, USA, Russia, Czech Republic, Japan, Mexico]

[Loading plot: Loading 1 vs Loading 2 for Wine, Beer, Spirit, Life exp., Heart dis.]

Conclusions
- Scores = positions of the objects in multivariate space
- Loadings = importance of the original variables for the new directions
- Try to explain a large enough portion of X (46 + 32 = 78%)

The Apricot Example (Manley & Geladi)

[Spectra of apricot (Appelkoos) samples: pseudoabsorbance vs wavelength, nm]

[Scree plot: singular value vs component number]

What is rank?
Mathematical rank: at most min(I, K); using all components gives zero residual.
Effective rank = A: separates the model from the noise.
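The distinction can be demonstrated numerically (a sketch; the rank-2 signal and noise level are invented for illustration): data built from two factors plus noise has full mathematical rank, but its singular values reveal an effective rank of 2.

```python
import numpy as np

# Rank-2 signal plus a little noise: mathematical rank vs effective rank.
rng = np.random.default_rng(4)
signal = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 6))  # rank-2 signal
X = signal + 0.01 * rng.normal(size=(20, 6))                 # add noise

# Noise fills in the remaining directions: full mathematical rank min(I, K)
print(np.linalg.matrix_rank(X))    # 6

# The singular values separate model from noise: two large, the rest tiny
S = np.linalg.svd(X, compute_uv=False)
print(S / S[0])
```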

[ANOVA table: component number, SS, SS%, cumulative SS%, with a Total row]

[Score plot: Score 1 (98%) vs Score 2 (2%)]

ANOVA
SStot = SS1 + SS2 + SS3 + … + SS(I or K)
The components are ordered from largest SS to smallest!

ANOVA
X = TP' + E (data = model + residual)
SStot = SSmod + SSres
R² = SSmod / SStot = 1 − SSres / SStot
R² is the coefficient of determination (often given in %)
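The identities on this slide are easy to verify numerically (a sketch with invented data and A = 2 components):

```python
import numpy as np

# ANOVA-style decomposition SStot = SSmod + SSres and R^2 for a 2-component model.
rng = np.random.default_rng(2)
X = rng.normal(size=(8, 4))
X = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
A = 2
Xhat = (U[:, :A] * S[:A]) @ Vt[:A]     # model part, T P'
E = X - Xhat                           # residual

ss_tot = np.sum(X**2)
ss_mod = np.sum(Xhat**2)
ss_res = np.sum(E**2)

print(np.isclose(ss_tot, ss_mod + ss_res))     # True: data = model + residual
r2 = ss_mod / ss_tot
print(np.isclose(r2, 1 - ss_res / ss_tot))     # True: both forms of R^2 agree
# Each SS_a is a squared singular value (an eigenvalue)
print(np.isclose(ss_mod, np.sum(S[:A]**2)))    # True
```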

Examples
- Wines: R² = SSmod = 78%, SSres = 22% (2 components)
- Apricots 1: R² = SSmod = 99.93%, SSres = 0.07% (2 components)
- Apricots 2: R² = SSmod = 100%, SSres = ±0.0% (3 components)

[Spectra with outliers removed: absorbance vs wavelength, nm]

[Singular value plot, no outliers: components 1–3 explain 81%, 16% and 3%]

[Score plot: Score 2 (16%) vs Score 3 (3%), with groups labeled: whole fruit, no kernel, thin slice]

[Loadings 2 and 3 plotted against wavelength, nm]

[Loading plot: Loading 2 vs Loading 3]

More nomenclature
- Score = latent variable
- Loading vector = eigenvector
- Effective rank = pseudorank = model dimensionality = number of components
- SSa = eigenvalue
- Singular value = SSa^(1/2)

An analysis sequence
1. Scale, mean-center data
2. Calculate a few components
3. Check scores, loadings
4. Find outliers, groupings; explain them
5. Remove outliers

An analysis sequence
6. Scale, mean-center data
7. Calculate enough components
8. Try to determine the pseudorank
9. Check score plots
10. Check loading plots
11. Check residuals
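The sequence above can be sketched end-to-end in NumPy (the data, the scaling choice and the number of components are all illustrative):

```python
import numpy as np

# End-to-end sketch: scale and mean-center, compute components,
# look at explained variance, and check the residuals.
rng = np.random.default_rng(3)
X = rng.normal(size=(12, 5))

# Scale to unit variance and mean-center each variable
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Calculate components via SVD and keep A of them
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
A = 2
T = U[:, :A] * S[:A]        # scores: positions of the objects
P = Vt[:A].T                # loadings: importance of the variables

# Explained variance per component, in % (check scores/loadings against this)
explained = 100 * S**2 / np.sum(S**2)
print(explained[:A])

# Check residuals: fraction of the total SS left unexplained
E = Xs - T @ P.T
print(np.sum(E**2) / np.sum(Xs**2))
```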

[Residual standard deviation plots for the wine data]