ECE 471/571 – Pattern Recognition
Lecture 6 – Dimensionality Reduction – Fisher's Linear Discriminant
Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science, University of Tennessee, Knoxville
http://www.eecs.utk.edu/faculty/qi
Email: hqi@utk.edu

Pattern Classification
- Statistical Approach
  - Supervised
    - Basic concepts: Bayesian decision rule (MPP, LR, discriminant functions)
    - Parameter estimation (ML, BL)
    - Non-parametric learning (kNN)
    - LDF (Perceptron)
    - NN (BP)
    - Support Vector Machine
    - Deep Learning (DL)
  - Unsupervised
    - Basic concepts: distance, agglomerative method
    - k-means
    - Winner-takes-all
    - Kohonen maps
    - Mean-shift
- Non-Statistical Approach
  - Decision tree
  - Syntactic approach
- Cross-cutting topics
  - Dimensionality Reduction: FLD, PCA
  - Performance Evaluation: ROC curve (TP, TN, FN, FP), cross validation
  - Stochastic Methods: local optimization (GD), global optimization (SA, GA)
  - Classifier Fusion: majority voting, NB, BKS

Review - Bayes Decision Rule
Maximum Posterior Probability (MPP): if P(ω_1|x) > P(ω_2|x), then x belongs to class 1, otherwise class 2.
Likelihood Ratio: equivalently, if p(x|ω_1)P(ω_1) > p(x|ω_2)P(ω_2), then x belongs to class 1, otherwise class 2.
Discriminant Function g_i(x), assuming Gaussian class-conditional densities:
Case 1: Minimum Euclidean Distance classifier (Linear Machine), Σ_i = σ²I
Case 2: Minimum Mahalanobis Distance classifier (Linear Machine), Σ_i = Σ
Case 3: Quadratic classifier, Σ_i arbitrary
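
As a refresher, here is a minimal NumPy sketch of the Case 3 (quadratic) discriminant. All means, covariances, and priors below are made up purely for illustration:

import numpy as np

# Hypothetical two-class example (illustrative numbers only)
mu = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]         # class means
Sigma = [np.eye(2), np.array([[1.0, 0.3], [0.3, 2.0]])]   # class covariances
prior = [0.5, 0.5]                                          # prior probabilities

def g(x, i):
    """Quadratic discriminant g_i(x) (Case 3: arbitrary Sigma_i)."""
    d = x - mu[i]
    Sinv = np.linalg.inv(Sigma[i])
    return (-0.5 * d @ Sinv @ d
            - 0.5 * np.log(np.linalg.det(Sigma[i]))
            + np.log(prior[i]))

x = np.array([1.0, 0.5])
label = max(range(2), key=lambda i: g(x, i))   # MPP: pick the class with the larger posterior
print("x assigned to class", label + 1)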

The Curse of Dimensionality – 1st Aspect: The Number of Training Samples
What would the probability density function look like if the dimensionality is very high? For a 7-dimensional space where each variable can take 20 possible values, the 7-d histogram contains 20^7 ≈ 1.28 billion cells. Distributing a training set of any reasonable size (say, 1000 samples) among this many cells leaves virtually all of the cells empty.
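
A quick back-of-the-envelope sketch (plain Python, illustrative numbers only) of how the cell count explodes with dimensionality:

# Cells in a d-dimensional histogram with b bins per axis, shared by n_train samples
b, n_train = 20, 1000
for d in (1, 3, 5, 7):
    cells = b ** d
    print(f"d={d}: {cells:,} cells, ~{n_train / cells:.2e} samples per cell")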

Curse of Dimensionality – 2nd Aspect: Accuracy and Overfitting
In theory, the higher the dimensionality, the lower the error and the better the performance. However, in realistic pattern recognition problems the opposite is often true. Why?
The assumption that the pdf behaves like a Gaussian is only approximately true.
When increasing the dimensionality, we may be overfitting the training set.
Problem: excellent performance on the training set, but poor performance on new data points that are in fact very close to the data within the training set.
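
The sketch below illustrates this effect on synthetic data, assuming NumPy and scikit-learn are available: only the first two dimensions carry class information, and as pure-noise dimensions are added, the quadratic classifier's training accuracy climbs while its test accuracy drops.

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200                                    # samples per class

def make_data(d):
    # Only the first two dimensions separate the classes; the rest is noise.
    X0 = rng.normal(0.0, 1.0, size=(n, d))
    X1 = rng.normal(0.0, 1.0, size=(n, d))
    X1[:, :2] += 1.5
    X = np.vstack([X0, X1])
    y = np.r_[np.zeros(n), np.ones(n)]
    return train_test_split(X, y, test_size=0.5, random_state=0)

for d in (2, 10, 40, 80):
    Xtr, Xte, ytr, yte = make_data(d)
    clf = QuadraticDiscriminantAnalysis().fit(Xtr, ytr)
    print(f"d={d:2d}  train acc={clf.score(Xtr, ytr):.2f}  test acc={clf.score(Xte, yte):.2f}")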

Curse of Dimensionality - 3rd Aspect: Computational Complexity
The cost of estimating the parameters and evaluating the discriminant functions grows rapidly with the dimensionality d; for example, each class-conditional Gaussian already requires d mean components and d(d+1)/2 covariance entries.

Dimensionality Reduction
Fisher's linear discriminant (FLD) – best discriminates the data
Principal component analysis (PCA) – best represents the data
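
A small sketch of the two ideas side by side, assuming scikit-learn is available (the Iris data set is just a convenient stand-in):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: directions of maximum variance, ignores the class labels ("best representation")
Xp = PCA(n_components=2).fit_transform(X)

# FLD/LDA: directions that maximize class separation ("best discrimination")
Xl = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print("PCA projection shape:", Xp.shape)
print("LDA projection shape:", Xl.shape)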

Fisher's Linear Discriminant
For the two-class case, project the data from the d-dimensional space onto a line: y = w^T x.
Principle: we'd like to find the vector w (the direction of the line) such that the projected data set can be best separated.
Sample mean of class i: m_i = (1/n_i) Σ_{x∈D_i} x
Projected mean: m̃_i = (1/n_i) Σ_{y∈Y_i} y = w^T m_i

Other Approaches?
Solution 1: make the projected means as far apart as possible, i.e., maximize |m̃_1 - m̃_2| = |w^T(m_1 - m_2)|.
Solution 2: also account for the scatter within each class – Fisher's approach maximizes the separation of the projected means relative to the within-class scatter.
Between-class scatter matrix: S_B = (m_1 - m_2)(m_1 - m_2)^T
Within-class scatter matrix: S_W = S_1 + S_2, where S_i = Σ_{x∈D_i} (x - m_i)(x - m_i)^T

*The Generalized Rayleigh Quotient
Fisher's criterion J(w) = (w^T S_B w) / (w^T S_W w) is a generalized Rayleigh quotient. It is maximized by the generalized eigenvector of S_B w = λ S_W w with the largest eigenvalue; for the two-class case this gives, up to scale, w = S_W^{-1}(m_1 - m_2). The projection y = w^T x onto this direction is called the canonical variate.
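
The following self-contained NumPy sketch puts the last three slides together on made-up two-class data: sample means, within- and between-class scatter, and the closed-form Fisher direction w ∝ S_W^{-1}(m_1 - m_2). All numbers are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-class data in d = 3 dimensions
X1 = rng.normal([0.0, 0.0, 0.0], 1.0, size=(40, 3))
X2 = rng.normal([2.0, 1.0, 0.0], 1.0, size=(40, 3))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)                  # sample means m_i

# Within-class scatter S_W = S_1 + S_2, with S_i = sum_x (x - m_i)(x - m_i)^T
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Between-class scatter S_B = (m_1 - m_2)(m_1 - m_2)^T
dm = (m1 - m2).reshape(-1, 1)
S_B = dm @ dm.T

# Fisher direction: maximizer of J(w) = (w^T S_B w) / (w^T S_W w)
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)

J = (w @ S_B @ w) / (w @ S_W @ w)                          # generalized Rayleigh quotient
print("Fisher direction w:", w)
print("J(w) =", J)

# Canonical variates: 1-D projections of the two classes
y1, y2 = X1 @ w, X2 @ w
print("projected means:", y1.mean(), y2.mean())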

Some Math Preliminaries
Positive definite: a matrix S is positive definite if x^T S x > 0 for all x in R^d except x = 0. The scalar x^T S x is called the quadratic form; its derivative is particularly useful: for symmetric S, d(x^T S x)/dx = 2Sx.
Eigenvalue and eigenvector: x is an eigenvector of A iff x is nonzero and Ax = λx; λ is the eigenvalue associated with x.
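
A tiny NumPy check of these facts (the matrix below is just an illustrative symmetric positive-definite example):

import numpy as np

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # symmetric, positive definite (illustrative)

x = np.array([1.0, -2.0])
print("quadratic form x^T S x =", x @ S @ x)    # positive for every nonzero x when S is p.d.
print("gradient 2 S x =", 2 * S @ x)            # derivative of the quadratic form (symmetric S)

# Eigenvalues / eigenvectors: each column v of V satisfies S v = lambda v
lam, V = np.linalg.eigh(S)
print("eigenvalues:", lam)                      # all > 0 confirms positive definiteness
print("S V == V diag(lambda):", np.allclose(S @ V, V * lam))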

* Multiple Discriminant Analysis
For the c-class problem, the projection is from the d-dimensional space to a (c-1)-dimensional space (assuming d >= c). Sec. 3.8.3
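
As a sketch of the c-class case, scikit-learn's LinearDiscriminantAnalysis (assuming it is available) projects onto at most c - 1 discriminant axes; the Wine data set (c = 3 classes, d = 13 features) is used here only as a convenient example:

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)                  # c = 3 classes, d = 13 features
lda = LinearDiscriminantAnalysis(n_components=2)   # at most c - 1 = 2 discriminant axes
Z = lda.fit_transform(X, y)
print("original dimension:", X.shape[1], "-> projected dimension:", Z.shape[1])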