ECSE 6610 Pattern Recognition, Professor Qiang Ji, Spring 2011
Pattern Recognition Overview
Feature extraction: extract the most discriminative features to concisely represent the original data, typically involving dimensionality reduction.
Training/learning: learn a mapping function that maps input features to output values.
Classification/regression: map the input to a discrete output value for classification, and to a continuous output value for regression.
[Diagram: Training path: Raw Data → Feature extraction → Features → Training → Learned Classifier/Regressor. Testing path: Raw Data → Feature extraction → Features → Classification/Regression with the learned classifier/regressor → Output Values.]
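The pipeline above can be made concrete with a minimal sketch in Python (NumPy). The mean/std feature extractor and the nearest-centroid classifier are illustrative stand-ins (assumptions), not the course's method:

    import numpy as np

    def extract_features(raw):
        # Placeholder feature extraction: summarize each raw sample
        # by its mean and standard deviation (dimensionality reduction).
        return np.stack([raw.mean(axis=1), raw.std(axis=1)], axis=1)

    def train(features, labels):
        # Learn a mapping from features to labels: here, one centroid per class.
        classes = np.unique(labels)
        centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
        return classes, centroids

    def classify(features, model):
        classes, centroids = model
        # Assign each sample to the class of the nearest centroid.
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        return classes[np.argmin(d, axis=1)]

    # Training phase: raw data -> features -> learned classifier
    rng = np.random.default_rng(0)
    raw_train = np.concatenate([rng.normal(0, 1, (50, 20)), rng.normal(3, 2, (50, 20))])
    y_train = np.array([0] * 50 + [1] * 50)
    model = train(extract_features(raw_train), y_train)

    # Testing phase: raw data -> features -> predicted output values
    raw_test = rng.normal(3, 2, (5, 20))
    print(classify(extract_features(raw_test), model))  # expect mostly 1s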
Pattern Recognition Overview (cont’d)
Supervised learning: both input (features) and output (class labels) are provided.
Unsupervised learning: only the input is given.
  Clustering
  Dimensionality reduction
  Density estimation
Semi-supervised learning: some inputs have output labels and the rest do not.
Examples of Pattern Recognition Applications
Computer/machine vision: object recognition, activity recognition, image segmentation, inspection
Medical imaging: cell classification
Optical character recognition: machine- or hand-written character/digit recognition
Brain-computer interface: classify human brain states from EEG signals
Speech recognition: speaker recognition, speech understanding, language translation
Robotics: obstacle detection, scene understanding, navigation
Computer Vision Example: Facial Expression Recognition
Machine Vision Example
Example: Handwritten Digit Recognition
Probability Calculus
P(X ∨ Y) = P(X) + P(Y) − P(X ∧ Y)
U is the sample space; an event X is a subset of U, i.e., a set of outcomes. If X and Y are mutually exclusive, P(X ∧ Y) = 0 and the rule reduces to P(X ∨ Y) = P(X) + P(Y).
Probability Calculus (cont’d)
Conditional independence: A and B are conditionally independent given C if P(A, B | C) = P(A | C) P(B | C), or equivalently P(A | B, C) = P(A | C).
The chain rule: given three events A, B, C, P(A, B, C) = P(A) P(B | A) P(C | A, B).
The Rules of Probability
Sum rule: p(X) = Σ_Y p(X, Y)
Product rule: p(X, Y) = p(Y | X) p(X)
Bayes’ Theorem
p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
posterior ∝ likelihood × prior
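A small numerical check of the sum rule, product rule, and Bayes’ theorem; the joint table below is an arbitrary illustrative assumption:

    import numpy as np

    # Joint distribution p(X, Y) over X in {0,1} (rows) and Y in {0,1,2} (cols).
    p_xy = np.array([[0.10, 0.20, 0.10],
                     [0.25, 0.15, 0.20]])

    p_x = p_xy.sum(axis=1)             # sum rule: p(X) = sum_Y p(X, Y)
    p_y = p_xy.sum(axis=0)             # sum rule: p(Y) = sum_X p(X, Y)
    p_y_given_x = p_xy / p_x[:, None]  # product rule rearranged: p(Y|X) = p(X,Y)/p(X)

    # Bayes' theorem: p(X|Y) = p(Y|X) p(X) / p(Y)
    p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]
    # Same result obtained directly from the joint:
    assert np.allclose(p_x_given_y, p_xy / p_y[None, :])
    print(p_x_given_y)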
Bayes Rule
Based on the definition of conditional probability:
p(A | B) = p(A, B) / p(B) = p(B | A) p(A) / p(B)
For a partition A_1, …, A_n of the sample space and evidence E:
p(A_i | E) = p(E | A_i) p(A_i) / p(E) = p(E | A_i) p(A_i) / Σ_i p(E | A_i) p(A_i)
p(A_i | E) is the posterior probability given evidence E; p(A_i) is the prior probability; p(E | A_i) is the likelihood of the evidence given A_i; p(E) is the probability of the evidence.
[Figure: sample space partitioned into events A1–A6, with the evidence E overlapping the partition.]
Bayesian Rule (cont’d)
p(H | E1, E2) = p(E2 | H, E1) p(H | E1) / p(E2 | E1)
Assuming E1 and E2 are independent given H, the above equation may be written as
p(H | E1, E2) = p(E2 | H) p(H | E1) / p(E2 | E1)
where p(H | E1) is the prior and p(E2 | H) is the likelihood of H given E2.
A Simple Example
Consider two related variables:
1. Drug (D) with values y or n
2. Test (T) with values +ve or −ve
And suppose we have the following probabilities:
P(D = y) =
P(T = +ve | D = y) = 0.8
P(T = +ve | D = n) = 0.01
These probabilities are sufficient to define a joint probability distribution.
Suppose an athlete tests positive. What is the probability that he has taken the drug?
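A worked Bayes-rule computation for this example; since the value of the prior P(D = y) is not given above, the value 0.005 below is an assumption for illustration only:

    # Bayes rule for the drug-test example. The prior value below is an
    # ASSUMPTION; the slide's actual value is not given.
    p_d = 0.005                # assumed P(D = y)
    p_pos_given_d = 0.8        # P(T = +ve | D = y), from the slide
    p_pos_given_nd = 0.01      # P(T = +ve | D = n), from the slide

    # p(T = +ve) by the sum rule, then the posterior by Bayes rule.
    p_pos = p_pos_given_d * p_d + p_pos_given_nd * (1 - p_d)
    p_d_given_pos = p_pos_given_d * p_d / p_pos
    print(p_d_given_pos)  # ~0.287: even a positive test leaves substantial doubt

With a rare condition, most positives come from the large drug-free population, so the posterior stays well below the test's 0.8 sensitivity.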
Expectation (or Mean)
For a discrete RV X: E[X] = Σ_x x p(x)
For a continuous RV X: E[X] = ∫ x p(x) dx
Conditional expectation: E[X | Y = y] = Σ_x x p(x | y)
Expectations
Conditional expectation (discrete): E_x[f | y] = Σ_x p(x | y) f(x)
Approximate expectation (discrete and continuous): E[f] ≈ (1/N) Σ_{n=1}^{N} f(x_n), with the x_n drawn from p(x)
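A quick sketch of the approximate (Monte Carlo) expectation; the choice p(x) = N(0, 1) and f(x) = x², for which E[f] = 1 exactly, is an assumption for illustration:

    import numpy as np

    # E[f] ≈ (1/N) * sum_n f(x_n), with x_n drawn from p(x).
    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000)   # samples from p(x) = N(0, 1)
    print(np.mean(x**2))               # ≈ 1.0, the true E[x^2]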
Variance
The variance of a RV X: var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
Standard deviation: σ = √var(X)
Covariance of RVs X and Y: cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
Chebyshev inequality: P(|X − E[X]| ≥ kσ) ≤ 1/k²
Variances and Covariances
For a function f: var[f] = E[(f(x) − E[f(x)])²] = E[f(x)²] − (E[f(x)])²
For random vectors x and y: cov[x, y] = E[(x − E[x])(yᵀ − E[yᵀ])]
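A sample-based check of the variance and covariance identities; the correlated data is synthetic, constructed for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(10_000)
    y = 2.0 * x + rng.standard_normal(10_000)  # correlated with x by construction

    # var(X) = E[X^2] - E[X]^2, estimated from samples, vs. NumPy's estimate
    print(np.mean(x**2) - np.mean(x)**2, np.var(x))
    # cov(X, Y) = E[(X - E[X])(Y - E[Y])], vs. NumPy's estimate (~2.0 here)
    print(np.mean((x - x.mean()) * (y - y.mean())), np.cov(x, y, bias=True)[0, 1])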
Independence
If X and Y are independent, then
p(X, Y) = p(X) p(Y)
E[XY] = E[X] E[Y]
cov(X, Y) = 0
Probability Densities
p(x) is the density function, while P(x) is the cumulative distribution: P(x) = ∫_{−∞}^{x} p(t) dt, so p(x) = dP(x)/dx.
P(x) is a non-decreasing function; p(x) ≥ 0 and ∫ p(x) dx = 1.
Transformed Densities
Under a change of variables x = g(y), a density transforms with the derivative of the map: p_y(y) = p_x(g(y)) |g′(y)|.
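A sketch checking the change-of-variables formula by simulation; the choice y = x² with x uniform on (0, 1) is an assumption for illustration:

    import numpy as np

    # With y = x**2 and x ~ Uniform(0, 1), the inverse map is x = g(y) = sqrt(y),
    # so p_y(y) = p_x(g(y)) * |g'(y)| = 1 * 1/(2*sqrt(y)).
    rng = np.random.default_rng(0)
    y = rng.uniform(0, 1, 1_000_000) ** 2

    hist, edges = np.histogram(y, bins=20, range=(0.0, 1.0), density=True)
    mids = 0.5 * (edges[:-1] + edges[1:])
    print(np.round(hist[5:10], 2))                            # empirical density
    print(np.round(1.0 / (2.0 * np.sqrt(mids[5:10])), 2))     # analytic p_y(y)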
The Gaussian Distribution
N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
Gaussian Mean and Variance
E[x] = μ, E[x²] = μ² + σ², var[x] = E[x²] − (E[x])² = σ²
The Multivariate Gaussian
N(x | μ, Σ) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
μ is the mean vector; Σ is the covariance matrix.
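A direct implementation sketch of the density above; the test point and parameters are arbitrary assumptions:

    import numpy as np

    def mvn_pdf(x, mu, Sigma):
        # N(x | mu, Sigma) = (2*pi)^(-D/2) |Sigma|^(-1/2)
        #                    * exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))
        D = len(mu)
        diff = x - mu
        quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
        norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
        return np.exp(-0.5 * quad) / norm

    mu = np.array([0.0, 1.0])
    Sigma = np.array([[2.0, 0.3],
                      [0.3, 1.0]])
    print(mvn_pdf(np.array([0.5, 0.5]), mu, Sigma))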
Minimum Misclassification Rate
Two types of mistakes:
False positive (type 1)
False negative (type 2)
p(mistake) = p(x ∈ R₁, C₂) + p(x ∈ R₂, C₁) = ∫_{R₁} p(x, C₂) dx + ∫_{R₂} p(x, C₁) dx
The above is called the Bayes error. It is minimized by assigning each x to the class with the larger posterior p(C_k | x); the minimum Bayes error is achieved when the decision boundary is placed at x₀, where p(x, C₁) = p(x, C₂).
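A sketch locating the decision boundary x₀ and the Bayes error numerically; the two 1-D Gaussian class-conditionals with equal priors are assumed parameters for illustration:

    import numpy as np

    def gauss(x, mu, s):
        return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

    x = np.linspace(-6.0, 10.0, 100_001)
    joint1 = 0.5 * gauss(x, 0.0, 1.0)   # p(x, C1) = p(x | C1) P(C1)
    joint2 = 0.5 * gauss(x, 3.0, 1.5)   # p(x, C2) = p(x | C2) P(C2)

    # Decide C1 wherever p(x, C1) > p(x, C2); x0 is the crossing between the means.
    mask = (x > 0.0) & (x < 3.0)
    x0 = x[mask][np.argmin(np.abs(joint1 - joint2)[mask])]

    # Bayes error: at every x, the smaller joint is the mass that gets misclassified.
    bayes_error = np.trapz(np.minimum(joint1, joint2), x)
    print(x0, bayes_error)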
Generative vs Discriminative
Generative approach: model the class-conditional density p(x | C_k) and the prior p(C_k), then use Bayes’ theorem to obtain the posterior p(C_k | x) (e.g., naive Bayes).
Discriminative approach: model the posterior p(C_k | x) directly (e.g., logistic regression).