Why reduce the number of features?
Having D features, we want to reduce their number to n, where n << D.
Benefits:
- lower computational complexity
- improved classification performance
Danger:
- possible loss of information

Basic approaches to DR
Feature extraction:
- transform t : R^D → R^n
- creates a new feature space; the features lose their original meaning
Feature selection:
- selection of a subset of the original features

Principal Component Transform (Karhunen-Loeve)
- PCT is a feature extraction method; t is a rotation y = Tx, where T is the matrix of eigenvectors of the original covariance matrix C_x
- PCT creates D new uncorrelated features y, with C_y = T' C_x T
- the n features with the highest variances are kept
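As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the PCT: the eigenvectors of the sample covariance matrix C_x define the rotation, and the n eigenvectors with the largest eigenvalues are kept. The names pct, X and n_keep are illustrative.

```python
import numpy as np

def pct(X, n_keep):
    """Principal Component Transform: rotate the D-dimensional samples in X
    (shape (N, D), one sample per row) and keep the n_keep directions with
    the highest variance."""
    X_centered = X - X.mean(axis=0)           # remove the mean
    C_x = np.cov(X_centered, rowvar=False)    # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C_x)    # eigh: C_x is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    T = eigvecs[:, order[:n_keep]]            # leading eigenvectors as columns
    Y = X_centered @ T                        # project every sample onto them
    return Y, T, eigvals[order]
```

If all D components are kept (n_keep = D), np.cov(Y, rowvar=False) comes out numerically diagonal, which is the decorrelation property C_y = T' C_x T stated on the slide.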

Principal Component Transform (figure)

Applications of the PCT
- "optimal" data representation, compaction of the energy
- visualization and compression of multimodal images

PCT of multispectral images
Satellite image bands: B, G, R, nIR, IR, thermal IR
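A hedged sketch of how the pct() function above might be applied to such a multiband image for visualization or compression; the image size, the random placeholder data and the choice of 3 retained components are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 6-band satellite image (B, G, R, nIR, IR, thermal IR), H x W pixels.
H, W, D = 512, 512, 6
image = np.random.rand(H, W, D)          # placeholder for real satellite data

pixels = image.reshape(-1, D)            # each pixel becomes a 6-D feature vector
Y, T, variances = pct(pixels, n_keep=3)  # pct() from the sketch above

# The leading components carry most of the energy; they can be displayed as an
# RGB composite or stored instead of the 6 original bands (compression).
compressed = Y.reshape(H, W, 3)
```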

Why is PCT bad for classification purposes?
PCT evaluates the contribution of individual features solely by their variance, which may differ from their discriminative power.

Why is PCT bad for classification purposes? (figure)

Separability problem
Dimensionality reduction methods used for classification (here a two-class problem) must consider the discriminative power of individual features. The goal is to maximize the "distance" between the classes.

An example
3 classes, 3D feature space, reduction to 2D (figure: one 2D projection with high discriminability, another with low discriminability)

DR via feature selection
Two things are needed:
- a discriminability measure (Mahalanobis distance, Bhattacharyya distance), e.g.
  MD_12 = (m_1 - m_2)(C_1 + C_2)^(-1)(m_1 - m_2)'
- a selection strategy
Feature selection is thus an optimization problem.
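A minimal sketch of the Mahalanobis-type class distance from the slide, estimated from sample means and covariances; it assumes each class is given as an array with one sample per row, and the function name is illustrative.

```python
import numpy as np

def mahalanobis_distance(X1, X2):
    """MD_12 = (m_1 - m_2)(C_1 + C_2)^(-1)(m_1 - m_2)' for two classes,
    each given as an (N_i, D) array with one sample per row."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    C1 = np.atleast_2d(np.cov(X1, rowvar=False))  # atleast_2d handles D = 1
    C2 = np.atleast_2d(np.cov(X2, rowvar=False))
    diff = m1 - m2
    return diff @ np.linalg.inv(C1 + C2) @ diff
```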

Feature selection strategies
Optimal:
- full search, complexity D! / ((D-n)! n!)
- branch & bound
Sub-optimal:
- direct selection (optimal if the features are not correlated)
- sequential selection (SFS, SBS)
- generalized sequential selection (SFS(k), plus-k-minus-m, floating search)
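A minimal sketch of sequential forward selection (SFS), one of the sub-optimal strategies listed above: starting from an empty set, it greedily adds the feature that most increases the discriminability criterion, here the mahalanobis_distance() function from the previous sketch. This is an illustrative two-class version, not the authors' implementation.

```python
import numpy as np

def sequential_forward_selection(X1, X2, n):
    """Greedily pick n of the D original features for a two-class problem,
    using mahalanobis_distance() (defined above) as the criterion."""
    D = X1.shape[1]
    selected = []
    while len(selected) < n:
        best_feature, best_score = None, -np.inf
        for f in range(D):
            if f in selected:
                continue
            candidate = selected + [f]
            score = mahalanobis_distance(X1[:, candidate], X2[:, candidate])
            if score > best_score:
                best_feature, best_score = f, score
        selected.append(best_feature)       # keep the best feature found in this pass
    return selected
```

SBS works the same way in reverse, starting from all D features and greedily removing the least useful one; the floating variants allow backtracking after each step.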

A priori knowledge in feature selection
The above discriminability measures (MD, BD) require normally distributed classes; otherwise they can be misleading or even inapplicable.
Crucial questions in practical applications:
- Can the class-conditional distributions be assumed to be normal?
- What happens if this assumption is wrong?

A two-class example (figure)
Class 1 vs. Class 2: in one configuration x2 is selected, in the other x1 is selected.
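To make the point of such a two-class figure concrete, here is a hedged synthetic sketch (the numbers are invented for illustration, and it reuses mahalanobis_distance() from above): both classes share a large variance along x2, while only x1 separates them, so variance-based PCT favors x2 whereas the Mahalanobis criterion favors x1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
# x1: small variance but different class means  -> discriminative
# x2: huge variance, identical in both classes  -> useless for classification
class1 = np.column_stack([rng.normal(0.0, 1.0, N), rng.normal(0.0, 10.0, N)])
class2 = np.column_stack([rng.normal(4.0, 1.0, N), rng.normal(0.0, 10.0, N)])

pooled = np.vstack([class1, class2])
C = np.cov(pooled, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
print("first principal component:", eigvecs[:, np.argmax(eigvals)])  # ~ ±(0, 1): the x2 axis

md = [mahalanobis_distance(class1[:, [f]], class2[:, [f]]) for f in (0, 1)]
print("per-feature Mahalanobis distance:", md)                        # far larger for x1
```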

Conclusion
- PCT is optimal for representing "one-class" data (visualization, compression, etc.).
- PCT should not be used for classification purposes.
- Use feature selection methods based on a proper discriminability measure.
- If you still use PCT before classification, be aware of possible errors.