Published by Sierra Verrier. Modified over 9 years ago.
1
Pattern Recognition for the Natural Sciences. Explorative Data Analysis: Principal Component Analysis (PCA). Lutgarde Buydens, IMM, Analytical Chemistry.
2
Why Explorative Data Analysis? Paradigm change in natural sciences. Classical science: hypothesis driven. [Diagram: a question (?) put to the system]
3
Why Explorative Data Analysis? Paradigm change in natural sciences. Classical science: hypothesis driven. Science with advanced technologies: data driven, explorative analysis of data. [Diagrams relating the question (?) to the system for each approach]
4
Explorative Data Analysis. Advanced technology: high-throughput (high-quality) analysis: NMR, HPLC, GC, MS/MS, immune assays, hybrids; nano/sensor technology; genomics (gene expression profiling); proteomics, metabolomics; fingerprinting; profiling in drug design. Result: an overwhelming amount of data.
5
Explorative Data Analysis: visualization (principal component analysis, projections); unsupervised pattern recognition (clustering); supervised pattern recognition (classification); quantitative analysis (correlations, predictions).
6
Principal Component Analysis: an Example. 150 samples of Italian wines from the same region, from 3 different cultivars. Is it possible to characterise the cultivars? Which variables are relevant for which cultivars?
7
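Data Matrix. $X$ has $n$ = 150 rows (wine samples, the objects) and $p$ = 13 columns (properties, the variables). Element $x_{ij}$ is the value of property $j$ for sample $i$; for example, $x_{75,7}$ is the flavanoid concentration of sample 75. Row $x_i$ holds all properties of one sample; column $x_j$ holds one property (e.g. flavanoid concentration) for all samples.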
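The indexing of the data matrix can be illustrated with a small NumPy sketch. The values below are placeholders, not the wine measurements, and the variable names are hypothetical; the only point is how $x_{ij}$, a row $x_i$ and a column $x_j$ map onto array indices.

```python
import numpy as np

n_samples, n_properties = 150, 13  # n objects (rows), p variables (columns)

# Placeholder values only: the real wine measurements are not reproduced here.
X = np.arange(n_samples * n_properties, dtype=float).reshape(n_samples, n_properties)

# The slides count from 1, NumPy counts from 0.
i, j = 75, 7             # sample 75, property 7 (flavanoids in the slides)
x_ij = X[i - 1, j - 1]   # single element x_ij
x_i = X[i - 1, :]        # row vector: all 13 properties of sample 75
x_j = X[:, j - 1]        # column vector: property 7 for all 150 samples

print(x_ij, x_i.shape, x_j.shape)
```

Each row is one point in the 13-dimensional variable space and each column is one point in the 150-dimensional object space.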
8
Principal Component Analysis. [Figure: barplot of 1 wine sample]
9
Principal Component Analysis. [Figures: line plot of 1 wine sample; barplot of 1 wine sample]
10
Principal Component Analysis. [Figures: line plot of 1 wine sample; barplot of 1 wine sample]
11
Principal Component Analysis. [Figures: line plot of 1 wine sample; barplot of 1 wine sample]
12
Data Matrix Representation. [Figure: data matrix $X$ ($n$ samples by $p$ properties) with element $x_{ij}$, row $x_i$ and column $x_j$]
13
Data Matrix Representation. [Figure: each of the 150 samples (a row $x_i$ of $X$, e.g. sample 75) is a point in the $p$ (13)-dimensional variable space $S^p$]
14
Data Matrix Representation. [Figure: the 150 samples (rows $x_i$, e.g. sample 75) are points in the $p$ (13)-dimensional variable space $S^p$; the 13 variables (columns $x_j$, e.g. property 7, flavanoids) are points in the $n$ (150)-dimensional object space $S^n$]
15
Explorative Data Analysis
16
Principal Component Analysis. PCA as visualization: projection in 2 dimensions. [Figure: the 150 samples in the $p$ (13)-dimensional space of variables $S^p$ and the 13 variables in the $n$ (150)-dimensional space of objects $S^n$ are each projected onto an $r$ (2)-dimensional space $S^2$ with axes lv 1 and lv 2]
17
Principal Component Analysis. [Figure: 12 samples in the 3-variable space $S^3$ with axes $x_1$, $x_2$, $x_3$]
18
Principal Component Analysis. [Figure: 12 samples in the 3-variable space $S^3$ with axes $x_1$, $x_2$, $x_3$]
19
Principal Component Analysis. $PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$. [Figure: 12 samples in $S^3$ (axes $x_1$, $x_2$, $x_3$) with the PC 1 axis]
20
Principal Component Analysis. $PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$. Criterion: maximum variance of the projections (x). [Figure: 12 samples in $S^3$ (axes $x_1$, $x_2$, $x_3$) with their projections (x) onto the PC 1 axis]
21
Principal Component Analysis. $PC_1 = l_{11} x_1 + l_{12} x_2 + l_{13} x_3$; $PC_2 = l_{21} x_1 + l_{22} x_2 + l_{23} x_3$. Criterion: maximum variance of the projections (x). [Figure: 12 samples in $S^3$ (axes $x_1$, $x_2$, $x_3$) with their projections (x) onto the PC 1 and PC 2 axes]
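The maximum-variance criterion can be checked numerically. The sketch below is an illustration under stated assumptions, not the presenter's code: it uses synthetic data (12 samples, 3 variables, fixed seed) and takes the loadings $l_{11} \dots l_{13}$ and $l_{21} \dots l_{23}$ as the leading eigenvectors of the covariance matrix of the mean-centered data. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 12 samples, 3 correlated variables x1, x2, x3.
mixing = np.array([[3.0, 1.0, 0.5],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(12, 3)) @ mixing
Xc = X - X.mean(axis=0)                  # mean-center each variable

cov = np.cov(Xc, rowvar=False)           # 3 x 3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending variance
eigvals = eigvals[order]
loadings = eigvecs[:, order]             # column k holds l_k1, l_k2, l_k3

pc1 = Xc @ loadings[:, 0]                # PC1 = l11*x1 + l12*x2 + l13*x3
pc2 = Xc @ loadings[:, 1]                # PC2 = l21*x1 + l22*x2 + l23*x3

print("loadings of PC1:", loadings[:, 0])
print("variance along PC1, PC2:", pc1.var(ddof=1), pc2.var(ddof=1))
```

In this construction the variance of the projections onto PC1 equals the largest eigenvalue, no other unit direction gives a larger variance, and PC2 is the direction of largest remaining variance orthogonal to PC1.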
22
Principal Components Space. [Figure: the 12 samples plotted in the 2-dimensional space $S^2$ with axes PC 1 and PC 2]
23
Principal Component Analysis: Score plot. [Figure: the 150 samples (rows $x_i$) in the $p$ (13)-dimensional space of variables $S^p$ are projected onto the $r$ (2)-dimensional space $S^2$ with axes pc 1 and pc 2]
24
Principal Component Analysis: Score plot. [Figure: the 150 samples (rows $x_i$) in the $p$ (13)-dimensional space of variables $S^p$ are projected onto the $r$ (2)-dimensional space $S^2$ with axes pc 1 and pc 2] Wine data: score plot, PC1 (38%) vs. PC2 (20%).
25
Principal Component Analysis: Loading plot. [Figure: the 13 variables (columns $x_j$) in the $n$ (150)-dimensional space of objects $S^n$ are projected onto the 2-dimensional space $S^2$ with axes pc 1 and pc 2]
26
Principal Component Analysis: Loading plot. [Figure: the 13 variables (columns $x_j$) in the $n$ (150)-dimensional space of objects $S^n$ are projected onto the 2-dimensional space $S^2$ with axes pc 1 and pc 2] Wine data: loading plot, PC1 (38%) vs. PC2 (20%).
27
Singular Value Decomposition (SVD). $X_{n \times p} = U_{n \times r} D_{r \times r} V^T_{r \times p}$, with $U^T U = V^T V = I$. Left singular vectors (columns of $U$): PC scores. Right singular vectors (columns of $V$): PC loadings.
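The decomposition can be tried out with `numpy.linalg.svd`. The sketch below assumes a random 150 by 13 matrix as a stand-in for the wine data and keeps $r$ = 2 components. The slide labels the left singular vectors as the PC scores; the sketch scales them by the singular values ($UD$), which is one common convention and equals the projection of the centered data onto the loadings.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 13))   # stand-in for the 150 x 13 wine data matrix
Xc = X - X.mean(axis=0)          # mean-center the columns before PCA

# Thin SVD: U is 150 x 13, d holds 13 singular values, Vt is 13 x 13.
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

r = 2                            # number of components kept
U_r = U[:, :r]                   # left singular vectors (n x r)
D_r = np.diag(d[:r])             # singular values on the diagonal (r x r)
Vt_r = Vt[:r, :]                 # right singular vectors, transposed (r x p)

scores = U_r @ D_r               # coordinates of the 150 samples (score plot)
loadings = Vt_r.T                # coordinates of the 13 variables (loading plot)
X_approx = U_r @ D_r @ Vt_r      # best rank-2 approximation of Xc

print(scores.shape, loadings.shape)
```

Plotting the two columns of `scores` against each other gives a score plot, and the two columns of `loadings` give a loading plot.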
28
Principal Component Analysis: Biplot. [Figure: the score plot (150 samples from $S^p$, $p$ = 13, projected onto $S^2$) and the loading plot (13 variables from $S^n$, $n$ = 150, projected onto $S^2$), both with axes pc 1 and pc 2, are overlaid into one biplot showing 150 samples + 13 variables]
29
Principal Component Analysis: an Example. [Figure: plot with axes PC1 (38%) and PC2 (20%)]
30
Principal Component Analysis: Some Issues. How many PCs? Scaling. Outliers.
31
How many PCs? [Figures: cumulative % of variance (up to 100%) vs. number of PCs; scree plot of log variance vs. number of PCs]
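Both diagnostics can be computed from the singular values of the mean-centered data. The sketch below is a minimal illustration with hypothetical helper names (`explained_variance`, `n_components_for`) and synthetic data. The 90% threshold is an arbitrary assumption for the example, not a recommendation from the slides.

```python
import numpy as np

def explained_variance(X):
    """Return per-PC variance, fraction of variance, and cumulative fraction."""
    Xc = X - X.mean(axis=0)
    d = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = d ** 2 / (X.shape[0] - 1)           # variance along each PC
    ratio = var / var.sum()
    return var, ratio, np.cumsum(ratio)

def n_components_for(cumulative, threshold=0.90):
    """Smallest number of PCs whose cumulative fraction reaches the threshold."""
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

# Synthetic example: 2 underlying factors spread over 13 variables, plus noise.
rng = np.random.default_rng(2)
factors = rng.normal(size=(150, 2))
weights = rng.normal(size=(2, 13))
X = factors @ weights + 0.3 * rng.normal(size=(150, 13))

var, ratio, cumulative = explained_variance(X)
print("cumulative % of variance:", np.round(100 * cumulative, 1))
print("log10 variance (scree plot values):", np.round(np.log10(var), 2))
print("PCs needed for 90%:", n_components_for(cumulative, 0.90))
```

Plotting `cumulative` and `np.log10(var)` against the component number gives the two curves named on the slide.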
32
How many PCs? [Figure: wine data]
33
How many PCs?
34
PCA: Scaling. Scaling is done for better interpretation, but may obscure results. Options: raw data; mean-centering (column-wise, row-wise, double); auto-scaling (column-wise, row-wise); …
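The scaling options can be written as short NumPy helpers. This is a sketch under assumptions: the function names are hypothetical, auto-scaling is taken to mean mean-centering followed by division by the sample standard deviation, and a constant row or column (zero standard deviation) is not handled.

```python
import numpy as np

def mean_center(X, axis=0):
    """Subtract the mean: axis=0 is column-wise, axis=1 is row-wise."""
    return X - X.mean(axis=axis, keepdims=True)

def double_center(X):
    """Remove column means and row means, then add back the grand mean."""
    return (X
            - X.mean(axis=0, keepdims=True)
            - X.mean(axis=1, keepdims=True)
            + X.mean())

def autoscale(X, axis=0):
    """Mean-center, then divide by the sample standard deviation."""
    # Assumes no constant column/row; a zero std would divide by zero.
    return mean_center(X, axis) / X.std(axis=axis, ddof=1, keepdims=True)

# Synthetic data: 3 variables measured on very different scales.
rng = np.random.default_rng(5)
X = rng.normal(loc=[1.0, 50.0, 1000.0], scale=[0.1, 5.0, 200.0], size=(20, 3))

print("column means after centering:", mean_center(X).mean(axis=0))
print("column std after autoscaling:", autoscale(X).std(axis=0, ddof=1))
```

Column-wise operations treat each variable separately, while row-wise operations treat each sample separately.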
35
PCA: Scaling. [Figures: wine data mean-centered; wine data autoscaled]
36
PCA: Scaling. [Figures: wine data raw and wine data mean-centered, each with axes PC1 (99.79%) and PC2 (0.20%)]
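A first component that takes nearly all of the variance can arise when one variable is measured on a much larger scale than the others. The sketch below illustrates that effect on synthetic data (three independent variables, one with a spread 100 times larger). It does not use the wine measurements, and `variance_ratio` is a hypothetical helper.

```python
import numpy as np

def variance_ratio(X):
    """Fraction of the total sum of squares captured by each component."""
    d = np.linalg.svd(X, compute_uv=False)
    return d ** 2 / np.sum(d ** 2)

rng = np.random.default_rng(4)
# Three independent variables; the third has a far larger mean and spread.
X = rng.normal(loc=[2.0, 5.0, 750.0], scale=[1.0, 1.0, 100.0], size=(150, 3))

Xc = X - X.mean(axis=0)
raw = variance_ratio(X)                           # no preprocessing
centered = variance_ratio(Xc)                     # column mean-centering
autoscaled = variance_ratio(Xc / X.std(axis=0, ddof=1))  # column auto-scaling

for name, ratio in [("raw", raw), ("centered", centered), ("autoscaled", autoscaled)]:
    print(f"{name:>10}: PC1 {100 * ratio[0]:.2f}%  PC2 {100 * ratio[1]:.2f}%")
```

With raw or mean-centered data the large-scale variable dominates PC1; after auto-scaling each variable contributes on an equal footing.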
37
PCA: Outliers. [Figure: 12 samples in the 3-variable space $S^3$ (axes $x_1$, $x_2$, $x_3$) with PC1]
38
PCA: Outliers. [Figure: 12 samples + 1 outlier in the 3-variable space $S^3$ (axes $x_1$, $x_2$, $x_3$) with PC1]
39
PCA: Outliers. Leverage effect. [Figure: the 3-variable space $S^3$ (axes $x_1$, $x_2$, $x_3$) with PC1]
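The leverage effect can be reproduced on synthetic data: a single distant point is enough to pull PC1 away from the direction of the bulk of the samples. The sketch below assumes 12 samples lying close to the $x_1$ axis and one outlier far along $x_2$; the data and the helper `first_pc` are hypothetical.

```python
import numpy as np

def first_pc(X):
    """Loading vector of PC1 for column mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(3)

# 12 samples spread along x1, with a little noise in all three variables.
t = np.linspace(-1.0, 1.0, 12)
X = np.outer(t, [1.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(12, 3))

# One outlier far away along x2.
outlier = np.array([[0.0, 10.0, 0.0]])
X_out = np.vstack([X, outlier])

pc1_clean = first_pc(X)
pc1_out = first_pc(X_out)

# Angle between the two PC1 directions (the sign of a PC is arbitrary).
cos_angle = abs(pc1_clean @ pc1_out)
angle_deg = np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

print("PC1 without outlier:", np.round(pc1_clean, 2))
print("PC1 with outlier:   ", np.round(pc1_out, 2))
print("rotation of PC1 in degrees:", round(float(angle_deg), 1))
```

Without the outlier PC1 follows $x_1$; with it, PC1 swings towards $x_2$, so the component mostly describes one point rather than the 12 samples.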
40
Principal Component Analysis: a Recent Research Example. Gene expression values: data matrix $X$ with elements $x_{ij}$, 50,000 genes (rows) by 4 treatments (columns). Organon, Department of Cell Biology.
41
PCA: Gene-Treatment Interaction