PCA vs ICA vs LDA.

How to represent images?

Why are representation methods needed?
- Curse of dimensionality: width x height x channels
- Noise reduction
- Signal analysis & visualization

Representation methods
- Representation in the frequency domain: linear transforms (DFT, DCT, DST, DWT, ...), also used as compression methods
- Subspace derivation: PCA, ICA, LDA (linear transforms derived from training data)
- Feature extraction methods: edge (line) detection, feature maps obtained by filtering, Gabor transform, active contours (snakes), ...

What is subspace? (1/2)

Find a basis in a low-dimensional sub-space and approximate vectors by projecting them onto that sub-space:

(1) Original space representation: $x = \sum_{i=1}^{N} a_i v_i$, where $v_1, \dots, v_N$ is a basis of the original N-dimensional space.
(2) Lower-dimensional sub-space representation: $\hat{x} = \sum_{i=1}^{K} b_i u_i$, where $u_1, \dots, u_K$ is a basis of the K-dimensional sub-space (K < N).

Note: if K = N, then $\hat{x} = x$ (no approximation error).
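
A minimal numpy sketch of this projection idea (the sub-space basis here is random and purely illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 3                                  # original and sub-space dimensionality
x = rng.normal(size=N)                        # a vector in the original space

# Orthonormal basis of a K-dimensional sub-space (random, via QR).
U = np.linalg.qr(rng.normal(size=(N, K)))[0]  # N x K, columns orthonormal

b = U.T @ x                                   # coefficients in the sub-space
x_hat = U @ b                                 # projection onto the sub-space

print("reconstruction error:", np.linalg.norm(x - x_hat))
# With K = N the basis spans the whole space and the error is (numerically) zero.
```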

What is subspace? (2/2)

Example (K = N): when the number of basis vectors equals the dimensionality of the space, every vector is represented exactly; as noted above, $\hat{x} = x$ and there is no approximation error.

PRINCIPAL COMPONENT ANALYSIS (PCA)

Why Principal Component Analysis?

Motivation
- Find bases along which the data have high variance.
- Encode the data with a small number of bases at low MSE.

Derivation of PCs

Assume the data are centered, $E[x] = 0$, and let $\Sigma = E[x x^T]$ be the covariance matrix. The variance of the data projected onto a unit vector $q$ ($\|q\| = 1$) is
$$\sigma^2(q) = E[(q^T x)^2] = q^T \Sigma q.$$
Find the q's maximizing this! The principal components q can be obtained by eigenvector decomposition of $\Sigma$ (e.g., via SVD): the maximizing directions are the eigenvectors with the largest eigenvalues.
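
A small numpy sketch of this derivation, computing the principal directions as eigenvectors of the sample covariance matrix (data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # 200 correlated samples, 5 dims

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)         # 5 x 5 sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)         # eigh: the covariance is symmetric
order = np.argsort(eigvals)[::-1]              # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

q1 = eigvecs[:, 0]                             # first principal direction
print("variance along q1:", q1 @ cov @ q1)     # equals the largest eigenvalue
```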

Dimensionality Reduction (1/2)

Components of less significance can be ignored. Some information is lost, but if the corresponding eigenvalues are small, not much is lost:
- n dimensions in the original data
- calculate n eigenvectors and eigenvalues
- choose only the first p eigenvectors, based on their eigenvalues
- the final data set has only p dimensions
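
Continuing the sketch above (it reuses eigvals, eigvecs and X_centered), one common way to pick p is a cumulative explained-variance threshold; the 95% cutoff below is an assumption, not from the slides:

```python
# Keep the first p eigenvectors that explain 95% of the total variance.
explained = eigvals / eigvals.sum()
p = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1

V_p = eigvecs[:, :p]                           # n x p projection matrix
X_reduced = X_centered @ V_p                   # data represented in p dimensions
print(f"kept {p} of {X_centered.shape[1]} dimensions")
```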

Dimensionality Reduction (2/2)

[Figure omitted: scree plot of variance (eigenvalue) per component versus dimensionality.]

Reconstruction from PCs

[Image panels omitted: an image reconstructed from q = 1, 2, 4, 8, 16, 32, 64, 100, ... principal components, next to the original image; the reconstruction approaches the original as q grows.]
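
A short continuation of the same sketch (reusing eigvecs and X_centered), reconstructing one sample from its first q principal components:

```python
q = 2
V_q = eigvecs[:, :q]
x = X_centered[0]                     # one (centered) sample
x_hat = V_q @ (V_q.T @ x)             # project onto q PCs, then map back
print("reconstruction error with q =", q, ":", np.linalg.norm(x - x_hat))
# As q approaches the full dimensionality, x_hat converges to x.
```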

LINEAR DISCRIMINANT ANALYSIS (LDA)

Limitations of PCA Are the maximal variance dimensions the relevant dimensions for preservation?

Linear Discriminant Analysis (1/6)

What is the goal of LDA?
- Perform dimensionality reduction "while preserving as much of the class discriminatory information as possible".
- Seek directions along which the classes are best separated.
- Take into account not only the within-class scatter but also the between-class scatter.
- In face recognition, for example, it is better able to distinguish image variation due to identity from variation due to other sources such as illumination and expression.

Linear Discriminant Analysis (2/6)

Within-class scatter matrix: $S_W = \sum_{i=1}^{C} \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T$
Between-class scatter matrix: $S_B = \sum_{i=1}^{C} N_i (\mu_i - \mu)(\mu_i - \mu)^T$
(here $\mu_i$ is the mean and $N_i$ the number of samples of class $\omega_i$, and $\mu$ is the overall mean)
Projection: $y = W^T x$, with projection matrix $W$.

LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter of the projected data y:
$$J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|},$$
where $\tilde{S}_B$ and $\tilde{S}_W$ are the scatter matrices of the projected data y, and each determinant is the product of the eigenvalues of its matrix.
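
A minimal numpy sketch of these scatter matrices and of the resulting eigenproblem (synthetic data and variable names are my own; it assumes $S_W$ is invertible, which the next slides question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)                     # three classes, 50 samples each

mu = X.mean(axis=0)
S_W = np.zeros((4, 4))
S_B = np.zeros((4, 4))
for c in np.unique(y):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    S_W += (Xc - mu_c).T @ (Xc - mu_c)           # within-class scatter
    d = (mu_c - mu).reshape(-1, 1)
    S_B += len(Xc) * (d @ d.T)                   # between-class scatter

# Directions maximizing |W^T S_B W| / |W^T S_W W|: eigenvectors of S_W^{-1} S_B.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:2]].real                   # at most C-1 = 2 useful directions
Y = X @ W                                        # projected data
```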

Linear Discriminant Analysis (3/6)

Does $S_W^{-1}$ always exist?
- If $S_W$ is non-singular, we can obtain a conventional eigenvalue problem by writing $S_W^{-1} S_B w = \lambda w$.
- In practice, $S_W$ is often singular, since the data are image vectors with large dimensionality N while the size of the data set is much smaller (M << N).
- c.f. Since $S_B$ has at most rank C-1, the maximum number of eigenvectors with non-zero eigenvalues is C-1 (i.e., the maximum dimensionality of the sub-space is C-1).

Linear Discriminant Analysis (4/6)

Does $S_W^{-1}$ always exist? (cont.)

To alleviate this problem, we can use PCA first:
- PCA is first applied to the data set to reduce its dimensionality.
- LDA is then applied in the PCA sub-space to find the most discriminative directions, giving the combined projection $W^T = W_{lda}^T\, W_{pca}^T$.
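
A sketch of this two-stage recipe using scikit-learn (the library and the digits data set are assumptions chosen for illustration; the slides only describe the idea):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)            # 64-dimensional image vectors, 10 classes

pipeline = make_pipeline(
    PCA(n_components=40),                      # reduce dimensionality so S_W is non-singular
    LinearDiscriminantAnalysis(n_components=9) # at most C-1 = 9 discriminative directions
)
Y = pipeline.fit_transform(X, y)
print(Y.shape)                                 # (1797, 9)
```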

Linear Discriminant Analysis (5/6)

[Figure omitted: PCA vs. LDA comparison.]

D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

Linear Discriminant Analysis (6/6)

Factors unrelated to classification:
- MEF (Most Expressive Feature) vectors show the tendency of PCA to capture major variations in the training set, such as lighting direction.
- MDF (Most Discriminating Feature) vectors discount those factors unrelated to classification.

D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.

INDEPENDENT COMPONENT ANALYSIS (ICA)

PCA vs ICA

PCA
- Focus on uncorrelated and Gaussian components
- Second-order statistics
- Orthogonal transformation

ICA
- Focus on independent and non-Gaussian components
- Higher-order statistics
- Non-orthogonal transformation

Independent Component Analysis (1/5)

Concept of ICA
- A given signal x is generated by linear mixing A of independent components s: $x = A s$.
- ICA is a statistical analysis method to estimate those independent components (z) and the mixing/unmixing rule (W): $z = W x$.
- Both A and s are unknown (we only observe x), so some optimization function is required to recover them.

[Diagram omitted: sources $s_1, \dots, s_M$ mixed through weights $A_{ij}$ into observations $x_1, \dots, x_M$, which are unmixed through weights $W_{ij}$ into estimates $z_1, \dots, z_M$.]
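
A minimal sketch of the mixing model and its inversion, here using scikit-learn's FastICA as the estimator (the library choice and the toy sources are assumptions, not from the slides):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s = np.c_[np.sin(2 * t),                       # independent, non-Gaussian sources
          np.sign(np.sin(3 * t)),
          rng.uniform(-1, 1, size=t.size)]
A = rng.normal(size=(3, 3))                    # unknown mixing matrix
x = s @ A.T                                    # observed mixtures, x = A s

ica = FastICA(n_components=3, random_state=0)
z = ica.fit_transform(x)                       # estimated independent components
# z recovers the sources up to permutation, sign, and scale.
```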

Independent Component Analysis (2/5)

Independent Component Analysis (3/5)

What is an independent component?
- If one variable cannot be estimated from the other variables, it is independent.
- By the Central Limit Theorem, a sum of two independent random variables is more Gaussian than the original variables, so the distributions of the independent components are non-Gaussian.
- To estimate the ICs, z should have a non-Gaussian distribution, i.e., we should maximize non-Gaussianity.
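
A small numerical illustration of this Central Limit Theorem argument, using excess kurtosis as a rough Gaussianity measure (the uniform sources are an arbitrary choice):

```python
import numpy as np
from scipy.stats import kurtosis               # excess kurtosis; 0 for a Gaussian

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, size=(100_000, 8))      # 8 independent uniform sources

print(kurtosis(u[:, 0]))                       # about -1.2 (clearly non-Gaussian)
print(kurtosis(u.sum(axis=1)))                 # much closer to 0 (more Gaussian)
```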

Independent Component Analysis (4/5)

What is non-Gaussianity?

[Plots omitted: example densities of super-Gaussian (peaked, low entropy), Gaussian, and sub-Gaussian (flat) distributions.]

Independent Component Analysis (5/5)

Measuring non-Gaussianity by kurtosis
- Kurtosis: the 4th-order cumulant of a random variable; for a zero-mean variable, $\mathrm{kurt}(z) = E[z^4] - 3(E[z^2])^2$.
- If kurt(z) is zero, z is Gaussian; if kurt(z) is positive, super-Gaussian; if kurt(z) is negative, sub-Gaussian.
- Maximize |kurt(z)| by a gradient method.
- Fast fixed-point algorithm: it rests on the fact that if the gradient is a constant multiple of w, then adding or subtracting the gradient to/from w does not change its direction. Since w = k * gradient at such a point, repeatedly updating w with the gradient value (only the norm of w needs to be adjusted at each step) reaches a point where w equals the gradient, and the iteration stops there.
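
A sketch of a kurtosis-based fixed-point update for a single unit, assuming the data have already been whitened (the classical update is $w \leftarrow E[x\,(w^T x)^3] - 3w$ followed by renormalization; the code below is illustrative, not the full FastICA algorithm):

```python
import numpy as np

def fixed_point_one_unit(X_white, n_iter=100, seed=0):
    """X_white: (n_samples, n_dims) whitened data; returns one unit vector w."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X_white.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = X_white @ w                                   # projections w^T x
        w_new = (X_white * wx[:, None] ** 3).mean(axis=0) - 3 * w
        w_new /= np.linalg.norm(w_new)                     # simply change the norm of w
        converged = np.abs(np.abs(w_new @ w) - 1) < 1e-8   # direction no longer changes
        w = w_new
        if converged:
            break
    return w
```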

PCA vs LDA vs ICA
- PCA: suited to dimensionality reduction.
- LDA: suited to pattern classification when the number of training samples per class is large.
- ICA: suited to blind source separation, or to classification using ICs when the class labels of the training data are not available.

References
- Simon Haykin, "Neural Networks: A Comprehensive Foundation", 2nd Edition, Prentice Hall.
- Marian Stewart Bartlett, "Face Image Analysis by Unsupervised Learning", Kluwer Academic Publishers.
- A. Hyvärinen, J. Karhunen and E. Oja, "Independent Component Analysis", John Wiley & Sons, Inc.
- D. L. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, August 1996.