Review of Statistical Pattern Recognition Wen-Hung Liao 9/22/2009
Review Paper
- A.K. Jain, R.P.W. Duin, and J. Mao, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 22, No. 1, pp. 4-37, Jan. 2000.
- More review papers: http://www.ph.tn.tudelft.nl/PRInfo/revpapers.html
Statistical Approach in PR
- Each pattern is represented in terms of d features and is viewed as a point in a d-dimensional feature space.
- Goal: establish decision boundaries to separate patterns belonging to different classes.
- Need to specify/estimate the probability distributions of the patterns.
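To make the feature-space view concrete, here is a minimal sketch of a decision boundary between two classes: each pattern is a point in a 2-dimensional feature space, and a nearest-class-mean rule (which induces a linear boundary between any two classes) assigns a new point to a class. The data and class names are invented for illustration.

```python
# Sketch: patterns as points in a d-dimensional feature space,
# separated by the linear boundary implied by a nearest-mean rule.
# All data below are toy values, not from the paper.

def mean(points):
    d = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(d)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_mean_classify(x, class_means):
    # Assign x to the class whose mean is closest; the set of points
    # equidistant from two means is a hyperplane (a linear boundary).
    return min(class_means, key=lambda c: sq_dist(x, class_means[c]))

# Two toy classes in a 2-d feature space
class_means = {
    "w1": mean([[0.0, 0.0], [1.0, 1.0]]),
    "w2": mean([[5.0, 5.0], [6.0, 6.0]]),
}
print(nearest_mean_classify([0.5, 0.2], class_means))  # -> w1
```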
Various Approaches in Statistical PR
Links Between Statistical and Neural Network Methods
- Linear Discriminant Function <-> Perceptron
- Principal Component Analysis <-> Auto-Associative Networks
- Nonlinear Discriminant Function <-> Multilayer Perceptron
- Parzen Window Density-based Classifier <-> Radial Basis Function Network
Model for Statistical Pattern Recognition (diagram: two modes)
- Training mode: Preprocessing -> Feature Extraction/Selection -> Learning
- Classification mode: Preprocessing -> Feature Measurement -> Classification
The Curse of Dimensionality
- The performance of a classifier depends on the relationship between sample size, the number of features, and classifier complexity.
- The number of training data points should grow exponentially with the dimension of the feature space.
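A quick numerical illustration of the exponential growth: if we want roughly k sample points per feature axis to cover the space at a fixed resolution, the total number of training points needed is on the order of k^d. The value k = 10 is an arbitrary assumption for the sketch.

```python
# Curse of dimensionality: training-set size needed to keep a fixed
# sampling resolution grows as k**d with feature dimension d.
k = 10  # assumed points per feature axis
for d in (1, 2, 3, 10):
    print(f"d = {d:2d}: about {k**d:,} training points")
```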
Class-Conditional Probability
- Feature vector of length d: x = (x1, x2, ..., xd)
- c classes (or categories): w1, w2, ..., wc
- Class-conditional probability p(x|wi): the probability of x occurring given that it belongs to class wi.
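As a concrete sketch of a class-conditional density, a common modeling choice is a Gaussian p(x|wi) with class-specific parameters. The one-dimensional case, with invented means and standard deviations for two classes, looks like this:

```python
import math

# Sketch: 1-d class-conditional densities p(x|wi) modeled as Gaussians.
# The class parameters below are assumptions for illustration only.

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

params = {"w1": (0.0, 1.0), "w2": (3.0, 1.0)}  # (mean, std) per class
x = 0.5
for w, (mu, sigma) in params.items():
    print(f"p(x|{w}) = {gaussian_pdf(x, mu, sigma):.4f}")
```

A point near 0 gets a much higher density under w1 than under w2, which is exactly the information a Bayes classifier combines with the priors.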
How Many Features are Enough?
- Question: do more features give better classification?
- Answer: yes, if the class-conditional densities are completely known; no, if we need to estimate the class-conditional densities.
Dimensionality Reduction
- Keep the number of features as small as possible (but not too small), due to:
  - measurement cost
  - classification accuracy
- There is always some trade-off.
Feature Extraction/Selection
- Feature extraction: extract features from the sensed data.
- Feature selection: select (hopefully) the best subset of the input feature set.
- Feature extraction usually precedes selection.
- Both are application-domain dependent.
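The distinction between the two operations can be sketched on toy data: extraction derives new features (here via PCA, one common choice), while selection keeps a subset of the original ones (here, the highest-variance features — a deliberately simple criterion for illustration). The data and criteria are assumptions, not the paper's.

```python
import numpy as np

# Sketch: feature extraction (PCA projections = new, derived features)
# versus feature selection (a subset of the original features).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))  # 50 patterns, 4 original features

# Extraction: project centered data onto the top-2 principal components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:2].T   # derived features (linear combinations)

# Selection: keep the 2 original features with the largest variance
keep = np.argsort(X.var(axis=0))[-2:]
X_selected = X[:, keep]       # unchanged original features

print(X_extracted.shape, X_selected.shape)  # (50, 2) (50, 2)
```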
Example: Chernoff Faces
- Three classes of faces.
- Feature set: nose length, mouth curvature, eye size, face shape.
- 150 four-dimensional patterns, 50 patterns per class.
Chernoff Faces (figure)