Introduction to Kernel Principal Component Analysis (PCA)
Mohammed Nasser, Dept. of Statistics, RU, Bangladesh

Contents
–Basics of PCA
–Application of PCA in Face Recognition
–Some Terms in PCA
–Motivation for KPCA
–Basics of KPCA
–Applications of KPCA

High-dimensional Data: gene expression, face images, handwritten digits.

Why Feature Reduction?
Most machine learning and data mining techniques may not be effective for high-dimensional data:
–Curse of dimensionality
–Query accuracy and efficiency degrade rapidly as the dimension increases.
The intrinsic dimension may be small:
–For example, the number of genes responsible for a certain type of disease may be small.

Why Reduce Dimensionality?
1. Reduces time complexity: less computation
2. Reduces space complexity: fewer parameters
3. Saves the cost of observing the features
4. Simpler models are more robust on small datasets
5. More interpretable; simpler explanation
6. Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions

Feature reduction algorithms
Unsupervised:
–Latent Semantic Indexing (LSI): truncated SVD
–Independent Component Analysis (ICA)
–Principal Component Analysis (PCA)
–Canonical Correlation Analysis (CCA)
Supervised:
–Linear Discriminant Analysis (LDA)
Semi-supervised:
–Still an active research topic

Algebraic derivation of PCs
Main steps for computing PCs:
–Form the covariance matrix S.
–Compute its eigenvectors a_1, a_2, …, ordered by decreasing eigenvalue.
–Use the first d eigenvectors to form the d PCs.
–The transformation G is given by the matrix whose columns are those d eigenvectors, G = [a_1, a_2, …, a_d].
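The slides' matrix formulas are not reproduced in this transcript; as a rough illustration of the steps above, here is a minimal NumPy sketch (function and variable names such as pca_components, X, and d are illustrative, not taken from the slides):

```python
import numpy as np

def pca_components(X, d):
    """Compute the first d principal directions of X (m observations x n variables).

    Returns G (n x d), whose columns are the top-d eigenvectors of the
    covariance matrix S, together with the sample mean used for centering.
    """
    mu = X.mean(axis=0)                          # sample mean, used for centering
    S = np.cov(X, rowvar=False)                  # n x n covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]        # indices of the d largest eigenvalues
    G = eigvecs[:, order]
    return G, mu

# Each observation x is then transformed to G.T @ (x - mu).
```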

Optimality property of PCA
(Figure: original data, dimension reduction, and reconstruction.)

Optimality property of PCA
Main theoretical result: the matrix G consisting of the first d eigenvectors of the covariance matrix S solves the following minimization problem (reconstruction error):
minimize Σ_i || x_i − G G^T x_i ||²  over all n × d matrices G with G^T G = I_d.
That is, the PCA projection minimizes the reconstruction error among all linear projections of size d.

Dimensionality Reduction
One approach to dealing with high-dimensional data is to reduce its dimensionality: project high-dimensional data onto a lower-dimensional sub-space using linear or non-linear transformations.

Dimensionality Reduction
Linear transformations are simple to compute and tractable. Classical (linear) approaches:
–Principal Component Analysis (PCA)
–Fisher Discriminant Analysis (FDA)
–Singular Value Decomposition (SVD)
–Factor Analysis (FA)
–Canonical Correlation Analysis (CCA)
A linear transformation has the form y = W x, where y is k × 1, W is k × d, x is d × 1, and k << d.

Principal Component Analysis (PCA) Each dimensionality reduction technique finds an appropriate transformation by satisfying certain criteria (e.g., information loss, data discrimination, etc.) The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the dataset.

Principal Component Analysis (PCA)
Find a basis in a low-dimensional sub-space:
–Approximate vectors by projecting them onto a low-dimensional sub-space.
(1) Original space representation: x = a_1 v_1 + a_2 v_2 + … + a_N v_N, where v_1, …, v_N is a basis of the original N-dimensional space.
(2) Lower-dimensional sub-space representation: x̂ = b_1 u_1 + b_2 u_2 + … + b_K u_K, where u_1, …, u_K is a basis of the K-dimensional sub-space.
Note: if K = N, then x̂ = x.

Principal Component Analysis (PCA) Example (K=N):

Principal Component Analysis (PCA)
Methodology
–Suppose x_1, x_2, …, x_M are N × 1 vectors.

Principal Component Analysis (PCA) Methodology – cont.

Principal Component Analysis (PCA)
Linear transformation implied by PCA
–The linear transformation R^N → R^K that performs the dimensionality reduction is y = U^T (x − x̄), where the columns of U are the first K eigenvectors and x̄ is the mean vector.
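A minimal sketch of this transformation, assuming U holds the first K eigenvectors as columns and x_bar is the sample mean (all names are illustrative, not from the slides):

```python
import numpy as np

def project(x, U, x_bar):
    """y = U^T (x - x_bar): map a face vector from R^N to coefficients in R^K."""
    return U.T @ (x - x_bar)

def back_project(y, U, x_bar):
    """Approximate reconstruction x_hat = U y + x_bar back in R^N."""
    return U @ y + x_bar
```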

Principal Component Analysis (PCA)
How many principal components to keep?
–To choose K, you can use the proportion-of-variance criterion: pick the smallest K such that (λ_1 + … + λ_K) / (λ_1 + … + λ_N) exceeds a chosen threshold (e.g., 0.9).
Unfortunately, for some data sets, to meet this requirement we need K almost equal to N; that is, no effective data reduction is possible.
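A sketch of this criterion in code; the 0.95 threshold and the function name choose_k are assumptions for illustration, not values taken from the slides:

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest K whose leading eigenvalues explain `threshold` of total variance.

    `eigvals` must be sorted in descending order; the 0.95 default is an
    illustrative assumption.
    """
    ratio = np.cumsum(eigvals) / np.sum(eigvals)   # cumulative proportion of variance
    return int(np.searchsorted(ratio, threshold)) + 1
```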

Principal Component Analysis (PCA)
(Scree plot: the eigenvalue spectrum λ_i plotted against index i, from λ_1 down to λ_N; K marks the chosen cutoff.)

Principal Component Analysis (PCA)
Standardization
–The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
–We should always standardize the data prior to using PCA.
–A common standardization method is to transform all the data to have zero mean and unit standard deviation: z = (x − μ) / σ.
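A one-function sketch of this standardization for a data matrix whose columns are the original variables (names are illustrative):

```python
import numpy as np

def standardize(X):
    """Give each column of X zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```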

CS 479/679 Pattern Recognition – Spring 2006
Dimensionality Reduction Using PCA/LDA
Chapter 3 (Duda et al.) – Section 3.8
Case Studies: Face Recognition Using Dimensionality Reduction
–M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3(1).
–D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8).
–A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 2001.

Principal Component Analysis (PCA)
Face Recognition
–The simplest approach is to think of it as a template matching problem.
–Problems arise when performing recognition in a high-dimensional space.
–Significant improvements can be achieved by first mapping the data into a lower-dimensional space.
–How do we find this lower-dimensional space?

Principal Component Analysis (PCA)
Main idea behind eigenfaces
(Figure: the average face.)

Principal Component Analysis (PCA) Computation of the eigenfaces

Principal Component Analysis (PCA) Computation of the eigenfaces – cont.

Principal Component Analysis (PCA)
Computation of the eigenfaces – cont.
(Note: each eigenvector u_i is normalized to unit length.)

Principal Component Analysis (PCA) Computation of the eigenfaces – cont.
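The equations on these computation slides are rendered as images and are not reproduced in the transcript; the sketch below follows the standard Turk–Pentland computation the slides describe, eigen-decomposing the small M × M matrix A^T A rather than the large N × N matrix A A^T (all names are illustrative):

```python
import numpy as np

def eigenfaces(faces, K):
    """Compute K eigenfaces from `faces`, an (N, M) array with one face per column.

    Trick: eigenvectors v_i of the small M x M matrix A^T A give eigenvectors
    u_i = A v_i of the large N x N matrix A A^T.
    """
    mean_face = faces.mean(axis=1, keepdims=True)
    A = faces - mean_face                      # N x M matrix of mean-subtracted faces
    L = A.T @ A                                # small M x M matrix
    eigvals, V = np.linalg.eigh(L)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:K]      # keep the K largest
    U = A @ V[:, order]                        # back-project: u_i = A v_i
    U /= np.linalg.norm(U, axis=0)             # normalize each eigenface (as noted above)
    return U, mean_face
```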

Principal Component Analysis (PCA)
Representing faces in this basis

Principal Component Analysis (PCA)
Representing faces in this basis – cont.

Principal Component Analysis (PCA) Face Recognition Using Eigenfaces

Principal Component Analysis (PCA)
Face Recognition Using Eigenfaces – cont.
–The distance e_r is called the distance within the face space (difs).
–Comment: we can use the common Euclidean distance to compute e_r; however, it has been reported that the Mahalanobis distance, which weights each coefficient by the inverse of the corresponding eigenvalue, performs better.
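A hedged sketch of the two distance computations on eigenface coefficient vectors; the slide's formula is not shown, so the 1/λ_k weighting below is the usual Mahalanobis-style assumption, and all names are illustrative:

```python
import numpy as np

def difs_euclidean(y, y_known):
    """Euclidean distance within face space between two coefficient vectors."""
    return float(np.linalg.norm(y - y_known))

def difs_mahalanobis(y, y_known, eigvals):
    """Mahalanobis-style distance: each squared difference weighted by 1/lambda_k."""
    d = y - y_known
    return float(np.sqrt(np.sum(d**2 / eigvals)))
```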

Principal Component Analysis (PCA) Face Detection Using Eigenfaces

Principal Component Analysis (PCA) Face Detection Using Eigenfaces – cont.

Principal Components Analysis
So, the principal components are given by:
b_1 = u_11 x_1 + u_12 x_2 + … + u_1N x_N
b_2 = u_21 x_1 + u_22 x_2 + … + u_2N x_N
...
b_N = u_N1 x_1 + u_N2 x_2 + … + u_NN x_N
The x_j's are standardized if the correlation matrix is used (mean 0.0, SD 1.0).
Score of the ith unit on the jth principal component:
b_i,j = u_j1 x_i1 + u_j2 x_i2 + … + u_jN x_iN

PCA Scores
(Figure: scatter plot of x_i1 vs. x_i2 with the principal component axes; the scores b_i,1 and b_i,2 are the coordinates of unit i along those axes.)

Principal Components Analysis
Amount of variance accounted for by:
–1st principal component: λ_1, the 1st eigenvalue
–2nd principal component: λ_2, the 2nd eigenvalue
...
λ_1 > λ_2 > λ_3 > λ_4 > ...
Average λ_j = 1 (when the correlation matrix is used)

Principal Components Analysis: Eigenvalues
(Figure: data cloud with eigenvalues λ_1 and λ_2 shown as the spread along the principal axes, and U_1 the first eigenvector.)

PCA: Terminology
–The jth principal component is the jth eigenvector of the correlation/covariance matrix.
–The coefficients u_jk are elements of the eigenvectors and relate the original variables (standardized if using the correlation matrix) to the components.
–Scores are the values of the units on the components (produced using the coefficients).
–The amount of variance accounted for by a component is given by its eigenvalue λ_j.
–The proportion of variance accounted for by a component is given by λ_j / Σ λ_j.
–The loading of the kth original variable on the jth component is given by u_jk √λ_j, the correlation between the variable and the component.

Principal Components Analysis
Covariance Matrix:
–Variables must be in the same units
–Emphasizes variables with the most variance
–Mean eigenvalue ≠ 1.0
–Useful in morphometrics and a few other cases
Correlation Matrix:
–Variables are standardized (mean 0.0, SD 1.0)
–Variables can be in different units
–All variables have the same impact on the analysis
–Mean eigenvalue = 1.0
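A small sketch contrasting the two choices (function name and usage are illustrative only):

```python
import numpy as np

def pca_eigenvalues(X, use_correlation=True):
    """Eigenvalues of the correlation (default) or covariance matrix of X (m x p)."""
    M = np.corrcoef(X, rowvar=False) if use_correlation else np.cov(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(M))[::-1]

# With the correlation matrix, trace(M) = p, so the eigenvalues average 1.0;
# with the covariance matrix they do not, and high-variance variables dominate.
```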

PCA: Potential Problems
–Lack of independence: no problem
–Lack of normality: normality is desirable but not essential
–Lack of precision: precision is desirable but not essential
–Many zeroes in the data matrix: a problem (use Correspondence Analysis)

Principal Component Analysis (PCA) PCA and classification (cont’d)

Motivation

Motivation

Linear projections will not detect the pattern.

Limitations of linear PCA
λ_1 = λ_2 = λ_3 = 1/3: for the example shown, all eigenvalues are equal, so linear PCA prefers no direction.

Nonlinear PCA
Three popular methods are available:
1) Neural-network-based PCA (E. Oja, 1982)
2) Method of principal curves (T.J. Hastie and W. Stuetzle, 1989)
3) Kernel-based PCA (B. Schölkopf, A. Smola, and K.-R. Müller, 1998)

(Figure: side-by-side comparison of PCA and NPCA on the same data.)

Kernel PCA: The main idea

A Useful Theorem for Hilbert Space
Let H be a Hilbert space and x_1, …, x_n in H. Let M = span{x_1, …, x_n}, and let u and v be in M. Then
⟨u, x_i⟩ = ⟨v, x_i⟩ for i = 1, …, n implies u = v.
Proof: try it yourself.

Kernel Methods in PCA
Linear PCA: C v = λ v, (1)
where C = (1/l) Σ_{i=1}^{l} x_i x_i^T is the covariance matrix for the centered data X. Since every eigenvector v with λ ≠ 0 lies in the span of the data, the theorem above shows that (1) and the system
λ ⟨x_i, v⟩ = ⟨x_i, C v⟩, i = 1, …, l, (2)
are equivalent conditions.

Kernel Methods in PCA
Now let us suppose the data are mapped into a feature space F by Φ: R^n → F, and assume for the moment that the mapped data are centered: Σ_{i=1}^{l} Φ(x_i) = 0. (*) (Remember about centering; we return to it later.)
In Kernel PCA, we do the PCA in feature space. Possibly F is a very high-dimensional space.
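As an aside not taken from the slides, here is a tiny example of why working in F never requires computing Φ explicitly: for a degree-2 polynomial kernel, k(x, y) = ⟨x, y⟩² equals the inner product of explicit feature maps (the map phi below is one possible choice):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for x in R^2 (illustration only)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def k(x, y):
    """Degree-2 polynomial kernel: k(x, y) = <x, y>^2 = <phi(x), phi(y)>."""
    return float(np.dot(x, y)) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose(k(x, y), phi(x) @ phi(y))   # same value, without forming phi in KPCA
```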

Kernel Methods in PCA
Again, all solutions v with λ ≠ 0 lie in the space generated by Φ(x_1), …, Φ(x_l). This has two useful consequences:
1) We may expand v as v = Σ_{i=1}^{l} α_i Φ(x_i) for some coefficients α_1, …, α_l.
2) We may instead solve the set of equations λ ⟨Φ(x_k), v⟩ = ⟨Φ(x_k), C̄ v⟩, k = 1, …, l, where C̄ = (1/l) Σ_i Φ(x_i) Φ(x_i)^T is the covariance operator in feature space.

Kernel Methods in PCA
Defining an l×l kernel matrix K by K_ij = ⟨Φ(x_i), Φ(x_j)⟩ and using the expansion (1) in (2), we get
l λ K α = K² α. (3)
But we need not solve (3): it can be shown easily that the simpler system
l λ α = K α
gives us all the solutions that are interesting to us.

Kernel Methods in PCA
Compute the eigenvalue problem for the kernel matrix. The solutions (λ_k, α^k) further need to be normalized by imposing l λ_k ⟨α^k, α^k⟩ = 1 (equivalently μ_k ⟨α^k, α^k⟩ = 1, where μ_k = l λ_k is the kth eigenvalue of K), so that the feature-space eigenvector v^k = Σ_i α_i^k Φ(x_i) has unit norm.
If x is our new observation, its feature value is Φ(x), and the kth principal score is
⟨v^k, Φ(x)⟩ = Σ_{i=1}^{l} α_i^k k(x_i, x).
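A minimal sketch of this step, assuming the kernel matrix has already been centered; the eigenvalues computed below are those of K itself (lλ in the slide's notation), and names such as kpca_fit are illustrative:

```python
import numpy as np

def kpca_fit(K_centered):
    """Eigen-decompose a centered l x l kernel matrix.

    Returns the eigenvalues of K (descending) and coefficient vectors alpha^k
    rescaled so that v^k = sum_i alpha_i^k Phi(x_i) has unit norm, i.e.
    (eigenvalue of K) * ||alpha^k||^2 = 1.
    """
    eigvals, alphas = np.linalg.eigh(K_centered)      # ascending order
    eigvals, alphas = eigvals[::-1], alphas[:, ::-1]  # reorder to descending
    keep = eigvals > 1e-12                            # discard null directions
    eigvals, alphas = eigvals[keep], alphas[:, keep]
    alphas = alphas / np.sqrt(eigvals)                # normalization step
    return eigvals, alphas

def kth_score(alphas, k_vector, k):
    """k-th principal score of a new point x, where k_vector[i] = k~(x_i, x)."""
    return float(alphas[:, k] @ k_vector)
```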

Kernel Methods in PCA
Data centering: in feature space we replace Φ(x_i) by Φ̃(x_i) = Φ(x_i) − (1/l) Σ_{j=1}^{l} Φ(x_j). Hence, the kernel for the transformed space is
K̃(x_i, x_j) = ⟨Φ̃(x_i), Φ̃(x_j)⟩ = K(x_i, x_j) − (1/l) Σ_m K(x_m, x_j) − (1/l) Σ_n K(x_i, x_n) + (1/l²) Σ_{m,n} K(x_m, x_n).

Kernel Methods in PCA
Expressed as an operation on the kernel matrix, this can be rewritten as
K̃ = K − (1/l) j j^T K − (1/l) K j j^T + (1/l²) (j^T K j) j j^T,
where j is the all-1s vector.
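A direct transcription of this centering formula in NumPy (a sketch; j is the all-ones column vector and the function name is illustrative):

```python
import numpy as np

def center_kernel_matrix(K):
    """K~ = K - (1/l) jj'K - (1/l) Kjj' + (1/l^2) (j'Kj) jj', with j the all-1s vector."""
    l = K.shape[0]
    j = np.ones((l, 1))
    jjT = j @ j.T
    return K - (jjT @ K) / l - (K @ jjT) / l + (j.T @ K @ j) * jjT / l**2
```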

(Figure: linear PCA vs. kernel PCA; kernel PCA captures the nonlinear structure of the data.)


Algorithm
Input: data X = {x_1, x_2, …, x_l} in n-dimensional space, a kernel function k, and the number k of components to keep.
Process:
–Form the kernel matrix K_ij = k(x_i, x_j), i, j = 1, …, l.
–Center K (as on the previous slide) so that it corresponds to centered data.
–Solve the eigenvalue problem for the centered kernel matrix and normalize the eigenvectors.
Output: the transformed data, i.e., the k-dimensional vector of projections of each (new) data point into this subspace.
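Putting the pieces together, a hedged end-to-end sketch of this algorithm with a Gaussian RBF kernel; the kernel choice, the width gamma, and all function names are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kernel_pca(X, X_new, n_components=2, gamma=1.0):
    """Project training data X and new data X_new onto the top kernel PCs."""
    l = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one_l = np.full((l, l), 1.0 / l)
    Kc = K - one_l @ K - K @ one_l + one_l @ K @ one_l        # center the kernel matrix
    eigvals, alphas = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]            # largest eigenvalues first
    eigvals, alphas = eigvals[idx], alphas[:, idx]
    alphas = alphas / np.sqrt(eigvals)                        # normalize alpha^k
    K_new = rbf_kernel(X_new, X, gamma)                       # test-vs-train kernel
    ones_new = np.full((X_new.shape[0], l), 1.0 / l)
    K_new_c = K_new - ones_new @ K - K_new @ one_l + ones_new @ K @ one_l
    return Kc @ alphas, K_new_c @ alphas                      # training and test scores
```

On data like the nonlinear examples pictured earlier, these kernel scores can capture structure that linear projections miss.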

References
–I.T. Jolliffe (2002). Principal Component Analysis, 2nd ed. Springer.
–B. Schölkopf, A. Smola, and K.-R. Müller (1998). Kernel Principal Component Analysis.
–B. Schölkopf and A.J. Smola (2002). Learning with Kernels. MIT Press.
–C.J.C. Burges (2005). Geometric Methods for Feature Extraction and Dimensional Reduction.