Feature Extraction. Lecturer: 虞台文

Content: Principal Component Analysis (PCA); Factor Analysis; Fisher’s Linear Discriminant Analysis; Multiple Discriminant Analysis.

Feature Extraction: Principal Component Analysis (PCA)

Principal Component Analysis: PCA is a linear procedure that finds the directions in input space along which most of the energy (variance) of the input lies. It is used for feature extraction and dimension reduction. It is also called the (discrete) Karhunen-Loève transform, or the Hotelling transform.

The Basic Concept: Assume the data x (a random vector) has zero mean. PCA finds a unit vector w such that the projection $w^T x$ reflects the largest amount of variance of the data. That is, $w^* = \arg\max_{\|w\| = 1} E[(w^T x)^2]$.

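The Method: The variance of the projection is $E[(w^T x)^2] = w^T E[x x^T] w = w^T C w$, where $C = E[x x^T]$ is the covariance matrix. Remark: C is symmetric and positive semidefinite.

The Method: Maximize $w^T C w$ subject to $w^T w = 1$. By the method of Lagrange multipliers, define $L(w, \lambda) = w^T C w - \lambda (w^T w - 1)$. The extreme point, say $w^*$, satisfies $\nabla_w L(w^*, \lambda) = 0$.

The Method: Setting $\nabla_w L = 2 C w - 2 \lambda w = 0$ gives $C w = \lambda w$.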

Discussion: At extreme points, $C w = \lambda w$, so w is an eigenvector of C and $\lambda$ is its corresponding eigenvalue; the variance captured along w is $w^T C w = \lambda$. Let $w_1, w_2, \ldots, w_d$ be the eigenvectors of C whose corresponding eigenvalues are $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$. They are called the principal components of C. Their significance can be ordered according to their eigenvalues. Because C is symmetric and positive semidefinite, its eigenvectors can be chosen to be mutually orthogonal; hence they form a basis of the feature space. For dimensionality reduction, keep only the few with the largest eigenvalues.
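As a rough illustration of the eigenvector view above, the following NumPy sketch computes principal components from a sample covariance matrix. The function name `pca`, the synthetic data, and the choice to divide by the sample count are assumptions made for this example; they do not come from the slides.

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions, their eigenvalues, and the projections.

    X is an (n_samples, d) data matrix; k is the number of components to keep.
    """
    Xc = X - X.mean(axis=0)                 # enforce the zero-mean assumption
    C = Xc.T @ Xc / Xc.shape[0]             # sample estimate of C = E[x x^T]
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh: for symmetric C, eigenvalues ascending
    order = np.argsort(eigvals)[::-1]       # reorder so lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                      # columns are w_1, ..., w_k
    return W, eigvals[:k], Xc @ W

# Hypothetical data: three features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.2])
W, lam, Z = pca(X, 2)
```

In this sketch the variance of each projected coordinate in `Z` equals the corresponding eigenvalue in `lam`, which is the sense in which eigenvalues order the significance of the components.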

Applications: image processing, signal processing, compression, feature extraction, pattern recognition.

Example: Projecting the data onto the most significant axis can facilitate classification. This also achieves dimensionality reduction.

Issues: PCA is effective for identifying the multivariate signal distribution; hence, it is good for signal reconstruction. But it may be inappropriate for pattern classification: the most significant component obtained using PCA need not be the most significant component for classification. [Figure: the most significant PCA component versus the most significant component for classification.]

Whitening: Whitening is a process that transforms a random vector, say $x = (x_1, x_2, \ldots, x_n)^T$ (assumed to be zero mean), into, say, $z = (z_1, z_2, \ldots, z_n)^T$ with zero mean and identity covariance, $E[z z^T] = I$. z is said to be white or sphered. This implies that its elements are uncorrelated and each has unit variance. However, it does not imply that its elements are independent.

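Whitening Transform: Let V be a whitening transform with $z = V x$; then $C_z = E[z z^T] = V C_x V^T = I$. Decompose $C_x$ as $C_x = E D E^T$, where D is the diagonal matrix of eigenvalues and E is the orthonormal matrix of eigenvectors. Set $V = D^{-1/2} E^T$ (assuming $C_x$ is nonsingular). Clearly, $V C_x V^T = D^{-1/2} E^T E D E^T E D^{-1/2} = I$.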

Whitening Transform: If V is a whitening transform and U is any orthonormal matrix, then UV (i.e., V followed by a rotation) is also a whitening transform. Proof: $(UV) C_x (UV)^T = U (V C_x V^T) U^T = U I U^T = I$.
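The whitening construction can be checked numerically. The sketch below is only an illustration under assumed inputs: the mixing matrix `A`, the sample size, and the rotation angle `theta` are arbitrary choices, and the covariance is estimated from samples rather than known exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                     # hypothetical mixing matrix
X = rng.normal(size=(2000, 2)) @ A.T           # correlated samples, one per row
X = X - X.mean(axis=0)                         # zero-mean assumption

Cx = X.T @ X / X.shape[0]                      # sample covariance of x
D, E = np.linalg.eigh(Cx)                      # Cx = E diag(D) E^T
V = np.diag(D ** -0.5) @ E.T                   # whitening transform V = D^{-1/2} E^T
Z = X @ V.T                                    # z = V x for every sample
Cz = Z.T @ Z / Z.shape[0]                      # should be the identity

# Any rotation U applied after V is still a whitening transform.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z_rot = X @ (U @ V).T
Cz_rot = Z_rot.T @ Z_rot / Z_rot.shape[0]      # also the identity
```

Both `Cz` and `Cz_rot` come out as the identity up to floating-point error, which is the rotational freedom that later methods can exploit.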

Why Whitening? With PCA, we usually choose several major eigenvectors as the basis for representation. This basis is efficient for reconstruction, but may be inappropriate for other applications, e.g., classification. By whitening, we can rotate the basis to get more interesting features.

Feature Extraction: Factor Analysis

What is a Factor? If several variables correlate highly, they might measure aspects of a common underlying dimension. These dimensions are called factors. Factors are classification axes along which the measures can be plotted. The greater the loading of variables on a factor, the more that factor can explain intercorrelations between those variables.

[Figure: Graph Representation. Variables plotted against two factor axes, Quantitative Skill (F1) and Verbal Skill (F2), each running from -1 to +1.]

What is Factor Analysis? A method for investigating whether a number of variables of interest $Y_1, Y_2, \ldots, Y_n$ are linearly related to a smaller number of unobservable factors $F_1, F_2, \ldots, F_m$. It is used for data reduction and summarization. It is a statistical approach for analyzing interrelationships among a large number of variables and explaining these variables in terms of their common underlying dimensions (factors).

Example: What factors influence students’ grades? The grades are the observable data; candidate factors such as quantitative skill and verbal skill are unobservable.

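The Model: $y = B f + \epsilon$, where y is the observation vector, B is the factor-loading matrix, f is the factor vector, and $\epsilon$ is the Gaussian noise vector.

The Model: Assuming $E[f f^T] = I$, that f and $\epsilon$ are uncorrelated, and that $Q = E[\epsilon \epsilon^T]$ is diagonal, the covariance of the observations is $C_y = E[y y^T] = B B^T + Q$. The left-hand side can be estimated from data; the right-hand side can be obtained from the model.

The Model: For each variable, $\mathrm{var}(y_i) = \sum_{k=1}^{m} b_{ik}^2 + q_{ii}$. The first term is the communality (the variance explained by the factors); the second is the specific variance (the unexplained part).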

Example: $C_y \approx B B^T + Q$ [numerical matrices not preserved in transcript].

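Goal: Our goal is to minimize the discrepancy between the estimated $C_y$ and the model covariance $B B^T + Q$ [exact objective and resulting expression not preserved in transcript].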

Uniqueness: Is the solution unique? No, there are infinitely many solutions: if $B^*$ is a solution and T is an orthonormal transformation (rotation), then $B^* T$ is also a solution, because $(B^* T)(B^* T)^T = B^* T T^T B^{*T} = B^* B^{*T}$.
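A small numerical check makes the non-uniqueness concrete. The loading matrix `B`, the unit-variance convention used to pick `Q`, and the rotation angle are hypothetical values chosen for this sketch, not figures from the slides.

```python
import numpy as np

# Hypothetical loadings of four variables on two factors.
B = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.7],
              [0.0, 0.6]])

communality = (B ** 2).sum(axis=1)     # variance explained by the factors
Q = np.diag(1.0 - communality)         # specific variances, chosen so var(y_i) = 1
Cy = B @ B.T + Q                       # model covariance

# Rotate the loadings by an arbitrary orthonormal T.
theta = np.pi / 6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B_rot = B @ T
Cy_rot = B_rot @ B_rot.T + Q           # identical to Cy
```

`B` and `B_rot` differ entry by entry, yet they reproduce the same covariance and the same communalities, so the data alone cannot distinguish them.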

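Example: The same $C_y$ can be reproduced by different loading matrices [numerical matrices not preserved in transcript]. Which one is better?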

Example: Left: each factor has nonzero loadings for all variables. Right: each factor controls different variables.

The Method: Determine the first set of loadings using the principal component method.

Example: [numerical approximation of $C_y$ not preserved in transcript].

Factor Rotation: Given a factor-loading matrix B and a rotation matrix T, factor rotation produces the rotated loading matrix $B T$.

Factor Rotation: Criteria for choosing T: Varimax, Quartimax, Equimax, Orthomax, Oblimin.

Varimax: Criterion: maximize the sum, over factors, of the variance of the squared rotated loadings, $\sum_{k=1}^{m} \left[ \frac{1}{n} \sum_{j=1}^{n} c_{jk}^4 - \left( \frac{1}{n} \sum_{j=1}^{n} c_{jk}^2 \right)^2 \right]$, subject to $T^T T = I$. Here $b_{jk}$ are the elements of the unrotated loading matrix B and $c_{jk}$ are the elements of the rotated matrix $B T$; the slide's symbol $d_k$ presumably denotes the column mean $\frac{1}{n} \sum_j c_{jk}^2$.

Varimax: Construct the Lagrangian for this constrained problem [derivation not preserved in transcript]. The criterion reaches its maximum once a stationarity condition on T holds [condition not preserved in transcript].

Varimax: Procedure: Initially, obtain $B_0$ by whatever method, e.g., PCA, and set $T_0$ as the initial approximation of the rotation matrix, e.g., $T_0 = I$. Then iteratively execute the following procedure: evaluate the rotated loadings and the quantities derived from them (this requires $B_1$); find the updated rotation that satisfies the stationarity condition (obtained on the slide by pre-multiplying each side by its transpose); stop if converged, otherwise repeat. [The expressions for each step are not preserved in transcript.]
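Because the slide equations for the varimax iteration are not available, the sketch below swaps in the widely used SVD-based varimax update rather than the slides' own derivation. It follows the same outline (start from $B_0$ and $T_0 = I$, update T, stop when the objective stalls). The function names, the tolerance, and the test loadings are assumptions made for this example.

```python
import numpy as np

def varimax(B, max_iter=200, tol=1e-8):
    """Rotate loadings B (n x m) with the common SVD-based varimax iteration."""
    n, m = B.shape
    T = np.eye(m)                                   # T_0 = I
    prev = 0.0
    for _ in range(max_iter):
        L = B @ T                                   # current rotated loadings
        # Gradient-like matrix of the varimax criterion with respect to T.
        G = B.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / n)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                                  # nearest orthonormal matrix to G
        obj = s.sum()
        if prev != 0.0 and obj < prev * (1.0 + tol):
            break                                   # objective stopped improving
        prev = obj
    return B @ T, T

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return (L ** 2).var(axis=0).sum()

# Hypothetical simple-structure loadings, deliberately rotated by 30 degrees.
B_simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
a = np.pi / 6
R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
B0 = B_simple @ R
B_rot, T = varimax(B0)
```

In this example the rotated loadings `B_rot` have a higher criterion value than `B0` while keeping every communality unchanged, which is the intended effect of an orthonormal rotation toward simple structure.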

Feature Extraction: Fisher’s Linear Discriminant Analysis

Main Concept: PCA seeks directions that are efficient for representation. Discriminant analysis seeks directions that are efficient for discrimination.

[Figure: Classification Efficiencies on Projections.]

Criterion (Two-Category): [Figure: two classes with means $m_1$ and $m_2$.]

Scatter: With $\|w\| = 1$, the between-class scatter of the projected data is $(w^T m_1 - w^T m_2)^2 = w^T S_B w$, where $S_B = (m_1 - m_2)(m_1 - m_2)^T$ is the between-class scatter matrix. The larger the better.

Scatter: With $\|w\| = 1$, the within-class scatter of the projected data is $w^T S_W w$, where $S_W = S_1 + S_2$ is the within-class scatter matrix and $S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T$ sums over the samples $D_i$ of class i. The smaller the better.

Goal: Define the generalized Rayleigh quotient $J(w) = \frac{w^T S_B w}{w^T S_W w}$ and choose w to maximize it. The length of w is immaterial.

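Generalized Eigenvector: To maximize the generalized Rayleigh quotient $J(w) = \frac{w^T S_B w}{w^T S_W w}$, w must be the generalized eigenvector associated with the largest generalized eigenvalue. That is, $S_B w = \lambda S_W w$, or $S_W^{-1} S_B w = \lambda w$. The length of w is immaterial.

Proof: Set $\nabla_w J(w) = \frac{2 S_B w (w^T S_W w) - 2 S_W w (w^T S_B w)}{(w^T S_W w)^2} = 0$. Then $S_B w = J(w) S_W w$; that is, $S_B w = \lambda S_W w$ with $\lambda = J(w)$, or $S_W^{-1} S_B w = \lambda w$. The maximum of J is therefore the largest generalized eigenvalue.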

Example: [Figure: the mean difference $m_1 - m_2$ and projection directions w.]
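The two-class criterion can be tried on synthetic data. The sketch below relies on a standard consequence of the eigen-equation that is not stated in the transcript: since $S_B w$ always points along $m_1 - m_2$, the maximizer is proportional to $S_W^{-1}(m_1 - m_2)$. The class means, the shared covariance, and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
cov = np.array([[4.0, 3.0],
                [3.0, 4.0]])                       # elongated along (1, 1)
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=300)
X2 = rng.multivariate_normal([1.5, -1.5], cov, size=300)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sb = np.outer(m1 - m2, m1 - m2)                    # between-class scatter matrix
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)  # within-class scatter matrix

def J(w):
    """Generalized Rayleigh quotient w^T Sb w / w^T Sw w."""
    return (w @ Sb @ w) / (w @ Sw @ w)

# Fisher direction: proportional to Sw^{-1} (m1 - m2), normalized to unit length.
w_fisher = np.linalg.solve(Sw, m1 - m2)
w_fisher /= np.linalg.norm(w_fisher)

# For comparison: leading PCA direction of the pooled data.
X = np.vstack([X1, X2])
Xc = X - X.mean(axis=0)
_, eigvecs = np.linalg.eigh(Xc.T @ Xc)
w_pca = eigvecs[:, -1]                             # eigenvector of the largest eigenvalue
```

With these assumed parameters the pooled data vary most along (1, 1) while the classes are separated along (1, -1), so `J(w_pca)` is much smaller than `J(w_fisher)`; this is the situation in which PCA is efficient for representation but not for discrimination.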

Feature Extraction: Multiple Discriminant Analysis

Generalization of Fisher’s Linear Discriminant: For the c-class problem, we seek a (c-1)-dimensional projection for efficient discrimination.

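Scatter Matrices (Feature Space): Let $m_i$ be the mean of the $n_i$ samples $D_i$ in class i and m the overall mean. The within-class scatter matrix is $S_W = \sum_{i=1}^{c} S_i$ with $S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T$; the between-class scatter matrix is $S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$; the total scatter matrix is $S_T = \sum_{x} (x - m)(x - m)^T = S_W + S_B$. [Figure: three classes with means $m_1$, $m_2$, $m_3$ and the overall mean marked +.]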

The (c-1)-Dim Projection: The projection space will be described using a $d \times (c-1)$ matrix W, so that each sample x is mapped to $W^T x$.

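Scatter Matrices (Projection Space): After projecting with W, the class means become $\tilde{m}_i = W^T m_i$ and the scatter matrices become $\tilde{S}_W = W^T S_W W$, $\tilde{S}_B = W^T S_B W$, and $\tilde{S}_T = W^T S_T W = \tilde{S}_W + \tilde{S}_B$.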

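Criterion: Choose W so that the projected between-class scatter is large relative to the projected within-class scatter, for example by maximizing $J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$, where $S_B$ and $S_W$ are the between-class and within-class scatter matrices and the total scatter matrix is their sum.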
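A multi-class sketch follows. It uses a standard result that the transcript does not preserve: the columns of a maximizing W can be taken as the generalized eigenvectors of $S_B w = \lambda S_W w$ with the largest eigenvalues. The function name `mda`, the three synthetic classes, and the use of `np.linalg.eig` on $S_W^{-1} S_B$ are assumptions made for this example.

```python
import numpy as np

def mda(X, y):
    """Return a d x (c-1) projection W, its eigenvalues, and the scatter matrices."""
    classes = np.unique(y)
    d = X.shape[1]
    m = X.mean(axis=0)                               # overall mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                         # class mean m_i
        Sw += (Xc - mc).T @ (Xc - mc)                # within-class scatter
        Sb += len(Xc) * np.outer(mc - m, mc - m)     # between-class scatter
    # Generalized eigenproblem Sb w = lambda Sw w, solved via Sw^{-1} Sb.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:len(classes) - 1]
    return eigvecs[:, order].real, eigvals[order].real, Sw, Sb

# Hypothetical data: three classes in four dimensions.
rng = np.random.default_rng(3)
means = np.array([[0.0, 0.0, 0.0, 0.0],
                  [3.0, 0.0, 1.0, 0.0],
                  [0.0, 3.0, 0.0, 1.0]])
X = np.vstack([rng.normal(size=(50, 4)) + mu for mu in means])
y = np.repeat([0, 1, 2], 50)

W, lam, Sw, Sb = mda(X, y)
St = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))   # total scatter
Y = X @ W                                            # samples in the (c-1)-dim space
```

Here `St` equals `Sw + Sb`, and `W` has c-1 = 2 columns because $S_B$ has rank at most c-1, so no further discriminating directions exist.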