Pattern Recognition for the Natural Sciences Explorative Data Analysis Principal Component Analysis (PCA) Lutgarde Buydens, IMM, Analytical Chemistry.

Presentation transcript:

Pattern Recognition for the Natural Sciences. Explorative Data Analysis: Principal Component Analysis (PCA). Lutgarde Buydens, IMM, Analytical Chemistry.

Why Explorative Data Analysis? Paradigm change in natural sciences. Classical science: hypothesis driven (a question is put to the system). Science with advanced technologies: data driven (explorative analysis of data from the system).

Explorative Data Analysis. Advanced technology: high-throughput (high-quality) analysis (NMR, HPLC, GC, MS/MS, immunoassays, hybrids); nano/sensor technology; genomics (gene expression profiling); proteomics, metabolomics; fingerprinting; profiling in drug design. Result: an overwhelming amount of data.

Explorative Data Analysis: visualization (principal component analysis, projections); unsupervised pattern recognition (clustering); supervised pattern recognition (classification); quantitative analysis (correlations, predictions).

Principal Component Analysis: an Example. 150 samples of Italian wines from the same region, from 3 different cultivars. Is it possible to characterise the cultivars? Which variables are relevant for which cultivars?

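Data Matrix: $X$ is an $n \times p$ matrix with the $n$ objects (150 wine samples) as rows $x_i$ and the $p$ variables (13 properties) as columns $x_j$. Element $x_{ij}$ is the value of variable $j$ for sample $i$, e.g. the flavanoid concentration of sample 75.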

Principal Component Analysis: bar plot and line plot of 1 wine sample (figures).

Data Matrix Representation: the matrix $X$ with elements $x_{ij}$ has $n$ rows (# samples) and $p$ columns (# properties) and can be read in two ways. Each of the 150 samples $x_i$ (a row, e.g. sample 75) is a point in the $p$ (13)-dimensional variable space $S^{p}$. Each of the 13 variables $x_j$ (a column, e.g. property 7, flavanoids) is a point in the $n$ (150)-dimensional object space $S^{n}$.
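The two readings of the data matrix (rows as samples, columns as variables) can be sketched in NumPy. This is only an illustration: the values are random placeholders, not the wine measurements, and the variable names are made up for this example.

```python
import numpy as np

# Placeholder data (assumption): random numbers standing in for the
# 150 wine samples x 13 properties described on the slides.
rng = np.random.default_rng(0)
n_samples, n_variables = 150, 13
X = rng.normal(size=(n_samples, n_variables))

# The slides count from 1, NumPy counts from 0.
sample_75 = X[74, :]    # row i = 75: one point in the 13-dimensional variable space
property_7 = X[:, 6]    # column j = 7: one point in the 150-dimensional object space
x_75_7 = X[74, 6]       # single element x_ij with i = 75, j = 7

print(sample_75.shape)   # (13,)
print(property_7.shape)  # (150,)
```

The same element is reached from either view: it is entry 7 of the sample vector and entry 75 of the variable vector.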

Explorative Data Analysis

Principal Component Analysis: visualization by projection in 2 dimensions. The 150 samples in the $p$ (13)-dimensional space of variables $S^{p}$ are projected onto an $r$ (2)-dimensional space $S^{2}$ spanned by lv 1 and lv 2. Likewise, the 13 variables in the $n$ (150)-dimensional space of objects $S^{n}$ are projected onto an $r$ (2)-dimensional space $S^{2}$ spanned by lv 1 and lv 2.

Principal Component Analysis: 12 samples in the space $S^{3}$ of 3 variables (axes x1, x2, x3).

Principal Component Analysis (12 samples in $S^{3}$): each principal component is a linear combination of the original variables,

$PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$
$PC_2 = l_{21} x_1 + l_{22} x_2 + l_{23} x_3$

Criterion: maximum variance of the projections (x) of the samples onto the component.

Principal Components Space: the 12 samples in $S^{2}$, spanned by PC 1 and PC 2.
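The maximum-variance criterion can be checked numerically. The sketch below is one common way to obtain the loadings (eigenvectors of the covariance matrix of the mean-centred data); the slides do not say which algorithm is used, and the 12 x 3 data set here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in (assumption) for the 12 samples in 3 variables x1, x2, x3.
mixing = np.array([[3.0, 1.0, 0.5],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(12, 3)) @ mixing
Xc = X - X.mean(axis=0)                  # mean-centre each variable

cov = np.cov(Xc, rowvar=False)           # 3 x 3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # re-sort to descending variance
eigvals = eigvals[order]
loadings = eigvecs[:, order]             # column k holds l_k1, l_k2, l_k3

scores = Xc @ loadings                   # PC_k = l_k1*x1 + l_k2*x2 + l_k3*x3
pc_space = scores[:, :2]                 # the 12 samples in the PC1/PC2 plane

# Compare PC1 with 1000 random unit directions: none should give a
# larger variance of the projections.
var_pc1 = scores[:, 0].var(ddof=1)
directions = rng.normal(size=(1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
var_random = (Xc @ directions.T).var(axis=0, ddof=1)
print(var_pc1 >= var_random.max())
```

The variance of the PC1 scores equals the largest eigenvalue of the covariance matrix, and PC2 is the direction of largest remaining variance orthogonal to PC1.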

Principal Component Analysis: Score plot. The 150 samples in the $p$ (13)-dimensional space of variables $S^{p}$ are projected onto the $r$ (2)-dimensional space $S^{2}$ spanned by pc 1 and pc 2. Wine data: score plot, PC1 (38%) against PC2 (20%).

Principal Component Analysis: Loading plot. The 13 variables in the $n$ (150)-dimensional space of objects $S^{n}$ are projected onto the 2-dimensional space $S^{2}$ spanned by pc 1 and pc 2. Wine data: loading plot, PC1 (38%) against PC2 (20%).

Singular Value Decomposition (SVD):

$X_{n \times p} = U_{n \times r} \, D_{r \times r} \, V^{T}_{r \times p}$, with $U^{T} U = V^{T} V = I$

Columns of $U$: left singular vectors (PC scores). Columns of $V$: right singular vectors (PC loadings).
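A minimal NumPy sketch of the decomposition above, on random placeholder data of the same shape as the wine matrix. Two assumptions are made here that the slide does not state: the data are mean-centred first, and the scores are taken as $U D$ (the left singular vectors scaled by the singular values), which is one common convention. Truncating to $r = 2$ gives an approximation of $X$, not an exact reconstruction.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 13))           # placeholder data (assumption)
Xc = X - X.mean(axis=0)                  # mean-centre before the SVD

# Thin SVD: U is 150 x 13, d holds the 13 singular values, Vt is 13 x 13.
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

r = 2                                    # number of components kept
U_r, D_r, Vt_r = U[:, :r], np.diag(d[:r]), Vt[:r, :]

scores = U_r @ D_r                       # 150 x 2, coordinates for a score plot
loadings = Vt_r.T                        # 13 x 2, coordinates for a loading plot
X_hat = U_r @ D_r @ Vt_r                 # best rank-2 approximation of Xc

explained = d**2 / np.sum(d**2)          # fraction of variance per component
print(np.round(100 * explained[:r], 1))
```

Plotting the two columns of `scores` against each other gives a score plot, and the two columns of `loadings` a loading plot; the percentages on the axes of such plots correspond to `explained`.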

Principal Component Analysis: Biplot. The score plot (150 samples from $S^{p}$) and the loading plot (13 variables from $S^{n}$) share the axes pc 1 and pc 2 in $S^{2}$, so they can be overlaid in a single plot: the biplot (150 samples + 13 variables).

Principal Component Analysis: an Example. Wine data plotted on PC1 (38%) and PC2 (20%) (figure).

Principal Component Analysis: Some Issues. How many PCs? Scaling. Outliers.

How many PCs? Two diagnostic plots: the cumulative % of variance (up to 100%) against the number of PCs, and the scree plot of log variance against the number of PCs. Shown for the wine data (figures).
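The quantities behind both diagnostic plots can be computed directly from the singular values. The sketch below uses synthetic data with three strong latent directions; the 90% cut-off is a hypothetical choice for illustration and is not taken from the slides.

```python
import numpy as np

def component_variances(X):
    """Variance captured by each PC of the mean-centred data, largest first."""
    Xc = X - X.mean(axis=0)
    d = np.linalg.svd(Xc, compute_uv=False)
    return d**2 / (X.shape[0] - 1)

rng = np.random.default_rng(3)
# Synthetic data (assumption): 3 latent directions plus unit-variance noise.
latent = 3.0 * (rng.normal(size=(150, 3)) @ rng.normal(size=(3, 13)))
X = latent + rng.normal(size=(150, 13))

variances = component_variances(X)
cumulative_pct = 100 * np.cumsum(variances) / variances.sum()  # first plot
log_variance = np.log10(variances)                             # scree plot

# Hypothetical rule: keep the smallest number of PCs reaching 90% of variance.
threshold = 90.0
n_pcs = int(np.searchsorted(cumulative_pct, threshold) + 1)
print(n_pcs, np.round(cumulative_pct, 1))
```

In a scree plot one typically looks for the point where `log_variance` levels off; the threshold rule is a cruder alternative, and the two need not agree.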

PCA: Scaling. Scaling is applied for better interpretation, but may obscure results. Options: raw data; mean-centering (column-wise, row-wise, double); auto-scaling (column-wise, row-wise); …..

PCA: Scaling. Wine data, mean-centered and autoscaled (figures).

PCA: Scaling. Wine data, raw and mean-centered (figures); both panels are labelled PC1 (99.79%), PC2 (0.20%).
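The effect of the scaling options on the variance percentages can be sketched as follows. The data are synthetic, with one variable deliberately given much larger units; the function names are illustrative and not from the slides. Column-wise versions are shown; the row-wise variants apply the same operations along the other axis.

```python
import numpy as np

def pc_variance_pct(X):
    """Percent of the total sum of squares per component of X as given."""
    d = np.linalg.svd(X, compute_uv=False)
    return 100 * d**2 / np.sum(d**2)

def mean_center(X):
    """Column-wise mean-centering: subtract each variable's mean."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Column-wise auto-scaling: mean-centre, then divide by the std. dev."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def double_center(X):
    """Double centering: remove column means and row means."""
    return X - X.mean(axis=0) - X.mean(axis=1, keepdims=True) + X.mean()

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 13))
X[:, 0] *= 1000.0        # synthetic: one variable in much larger units

pct_centered = pc_variance_pct(mean_center(X))
pct_autoscaled = pc_variance_pct(autoscale(X))
print(np.round(pct_centered[:2], 2))     # PC1 dominated by the large variable
print(np.round(pct_autoscaled[:2], 2))   # variance spread over the components
```

Without auto-scaling, PC1 mostly reflects the variable with the largest numerical range, which is one way a PC1 percentage close to 100% can arise.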

PCA: Outliers. In the example with 3 variables ($S^{3}$, axes x1, x2, x3) and 12 samples, PC1 follows the main direction of the data. When an outlier is present, PC1 is pulled towards it (leverage effect).
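The leverage effect can be reproduced with a small numerical experiment. The 12 samples and the position of the outlier below are synthetic choices for illustration, not the data behind the slides.

```python
import numpy as np

def first_pc(X):
    """Unit-length loading vector of PC1 for the mean-centred data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(5)
# 12 synthetic samples spread mainly along x1 (assumption).
X = rng.normal(size=(12, 3)) * np.array([3.0, 0.5, 0.5])
outlier = np.array([[0.0, 0.0, 50.0]])   # one point far out along x3
X_out = np.vstack([X, outlier])

pc1_clean = first_pc(X)
pc1_out = first_pc(X_out)

# The sign of a principal component is arbitrary, so compare via |cos|.
cos_angle = abs(pc1_clean @ pc1_out)
angle_deg = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))
print(np.round(pc1_clean, 2), np.round(pc1_out, 2), round(float(angle_deg), 1))
```

A single distant point contributes so much variance that PC1 rotates from the x1 direction towards the outlier, which is why score plots are usually inspected for outliers before interpretation.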

Principal Component Analysis: a Recent Research Example (Organon, Department of Cell Biology). Data matrix $X$ of gene expression values $x_{ij}$: genes as rows, treatments 1 to 4 as columns $x_j$.

PCA: interaction, gene, treatment (figure labels).