Dimensionality reduction CISC 5800 Professor Daniel Leeds

The benefits of extra dimensions: extra dimensions let a classifier find complex separations between classes that already exist in the data but are invisible in a lower-dimensional feature space.

The risks of too many dimensions: in high dimensions, kernel classifiers can over-fit to outlier data points; in two dimensions, the classifier simply ignores the outliers.

Training vs. testing [plot: error as a function of training epochs]

Goal: high performance, few parameters

An "information criterion" captures the performance/parameter trade-off. Variables to consider:
- L: likelihood of the training data after learning
- k: number of parameters (e.g., number of features)
- m: number of training data points

Popular information criteria:
- Akaike information criterion (AIC): log(L) - k
- Bayesian information criterion (BIC): log(L) - (k/2) log(m)
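As a worked illustration (the numeric values here are hypothetical, not from the slides), a minimal Python sketch of both criteria in the maximize form used above:

```python
import math

def aic(log_likelihood, k):
    """AIC score in the slide's maximize form: log(L) - k."""
    return log_likelihood - k

def bic(log_likelihood, k, m):
    """BIC score in the slide's maximize form: log(L) - (k/2) log(m)."""
    return log_likelihood - 0.5 * k * math.log(m)

# Hypothetical comparison: 40 features vs. 25 features on 1000 training points.
print(aic(-210.0, 40), aic(-223.0, 25))          # higher score is better
print(bic(-210.0, 40, 1000), bic(-223.0, 25, 1000))
```

BIC penalizes parameters more heavily than AIC once m is large, so it tends to prefer the smaller feature set.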

Goal: |Data| > |Parameters|, i.e., the number of training points should exceed the number of parameters to be learned.

Decreasing parameters
- Force parameter values to 0: L1 regularization (see the sketch after this list), support vector selection, feature selection/removal
- Consolidate the feature space: component analysis
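A minimal sketch, not from the slides, of how L1 regularization drives many parameter values to exactly 0, using scikit-learn's Lasso on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))            # 100 points, 50 features
w_true = np.zeros(50)
w_true[:5] = 2.0                          # only 5 features actually matter
y = X @ w_true + 0.1 * rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)        # alpha controls L1 strength
print((model.coef_ != 0).sum(), "of 50 weights remain non-zero")
```

With a suitable alpha, most of the 45 irrelevant weights are set exactly to zero, shrinking the effective parameter count.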

Feature removal (backward elimination)
Start with the full feature set F = {x_1, ..., x_k} and measure classifier performance with it: perform(F).
Loop:
- Measure performance with each single feature removed, and find i* = argmax_i perform(F - {x_i})
- Remove the feature whose removal causes the least decrease in performance: F = F - {x_i*}
Repeat, using AIC or BIC as the termination criterion.
AIC: log(L) - k    BIC: log(L) - (k/2) log(m)
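A sketch of this loop in Python, assuming hypothetical perform and log_lik callables that score a candidate feature set (neither is defined in the slides):

```python
def backward_elimination(features, perform, log_lik):
    """Greedy feature removal, scored by AIC in the slide's form log(L) - k.

    features: list of feature names.
    perform(S), log_lik(S): user-supplied (hypothetical) callables returning
    classifier performance and train log-likelihood for feature set S.
    """
    F = list(features)
    best_aic = log_lik(F) - len(F)
    while len(F) > 1:
        # Try removing each feature; keep the removal that hurts performance least.
        scores = {x: perform([f for f in F if f != x]) for x in F}
        x_star = max(scores, key=scores.get)
        candidate = [f for f in F if f != x_star]
        cand_aic = log_lik(candidate) - len(candidate)
        if cand_aic <= best_aic:   # AIC stopped improving: terminate
            break
        F, best_aic = candidate, cand_aic
    return F
```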

AIC testing: log(L) - k

Features                       k (num features)   L (likelihood)   AIC
F
F - {x_3}
F - {x_3, x_24}
F - {x_3, x_24, x_32}
F - {x_3, x_24, x_32, x_15}

Feature selection (forward selection)
Measure classifier performance for each single-feature set and find i* = argmax_i perform({x_i}).
Start with the highest-performing feature: F = {x_i*}.
Loop:
- Measure performance with each one new feature added, and find i* = argmax_i perform(F + {x_i})
- Add the feature giving the highest performance increase: F = F + {x_i*}
Repeat, using AIC or BIC as the termination criterion.
AIC: log(L) - k    BIC: log(L) - (k/2) log(m)
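The forward variant admits the same kind of sketch, reusing the hypothetical perform and log_lik callables from the backward-elimination example above:

```python
def forward_selection(features, perform, log_lik):
    """Greedy feature addition, scored by AIC in the slide's form log(L) - k."""
    remaining = list(features)
    F = []
    best_aic = float("-inf")
    while remaining:
        # Try adding each remaining feature; keep the one that helps most.
        scores = {x: perform(F + [x]) for x in remaining}
        x_star = max(scores, key=scores.get)
        candidate = F + [x_star]
        cand_aic = log_lik(candidate) - len(candidate)
        if cand_aic <= best_aic:   # adding more features no longer pays off
            break
        F, best_aic = candidate, cand_aic
        remaining.remove(x_star)
    return F
```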

Defining new feature axes

Defining data points with new axes

Component analysis

Component analysis: examples [figure: components and the data they reconstruct]

Component analysis: examples. "Eigenfaces", learned from a set of face images: u gives nine learned components; each data point x_i is reconstructed as a weighted combination of those components.
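A minimal eigenfaces-style sketch using scikit-learn's PCA; the faces array here is a random stand-in for a real face dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: n flattened grayscale face images, shape (n, height * width)
faces = np.random.rand(200, 64 * 64)      # stand-in for a real face dataset

pca = PCA(n_components=9)                 # nine components, as on the slide
weights = pca.fit_transform(faces)        # each face summarized by 9 numbers
eigenfaces = pca.components_              # shape (9, 4096): the "eigenfaces"

# Reconstruct face 0 from its 9 component weights plus the mean face
x_hat = pca.mean_ + weights[0] @ eigenfaces
```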

Types of component analysis

Principal component analysis (PCA)
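As a sketch of the standard construction (the function and variable names here are illustrative, not from the slides), PCA components are the top eigenvectors of the sample covariance matrix:

```python
import numpy as np

def pca_components(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort, largest variance first
    return eigvecs[:, order[:n_components]], eigvals[order]

X = np.random.rand(100, 5)
components, variances = pca_components(X, 2)
Z = (X - X.mean(axis=0)) @ components        # project onto the top 2 axes
```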

Independent component analysis (ICA)
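A minimal ICA sketch, not from the slides, using scikit-learn's FastICA to unmix two synthetic independent sources:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]  # two independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])            # mixing matrix
X = S @ A.T                                       # observed linear mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)   # recovered sources (up to scale and ordering)
```

Where PCA finds orthogonal directions of maximum variance, ICA instead seeks statistically independent source signals.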

Evaluating components
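One common evaluation (an assumption here, since only the slide title survives) is the fraction of total variance each component explains; a short sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 20)                     # stand-in dataset
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)  # variance captured so far
k = int(np.searchsorted(cum, 0.95)) + 1         # components for 95% variance
print(k, "components explain 95% of the variance")
```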