1
Introduction to Pattern Recognition (การรู้จํารูปแบบเบื้องต้น)
2
Week 2-3
Pattern Recognition Systems
The Design Cycle
Learning and Adaptation
Classifier Based on Bayes Decision Theory
3
Pattern Recognition Systems
4
Pattern Recognition Systems
Sensing
Use of a transducer (camera or microphone). The performance of the PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer.
Segmentation and grouping
Patterns should be well separated and should not overlap.
5
Pattern Recognition Systems
Feature extraction
Discriminative features; features invariant with respect to translation, rotation, and scale.
Classification
Use the feature vector provided by the feature extractor to assign the object to a category.
Post-processing
Exploit context (input-dependent information other than the target pattern itself) to improve performance.
6
The Design Cycle
Data Collection
Feature Choice
Model Choice
Training
Evaluation
Computational Complexity
7
The Design Cycle Data Collection
How do we know when we have collected an adequately large and representative set of examples for training and testing the system?
Feature Choice
Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.
8
The Design Cycle Model Choice
We may be unsatisfied with the performance of our fish classifier and want to jump to another class of model.
Training
Use the data to determine the classifier. There are many different procedures for training classifiers and choosing models.
9
The Design Cycle Evaluation
Measure the error rate (or performance) and switch from one set of features to another.
Computational Complexity
What is the trade-off between computational ease and performance? (How does an algorithm scale as a function of the number of features, patterns, or categories?)
10
Learning and Adaptation
Supervised learning
A teacher provides a category label or cost for each pattern in the training set (classification).
Unsupervised learning
The system forms clusters or "natural groupings" of the input patterns (clustering).
11
Classifier Based on Bayes Decision Theory
12
Classifier Based on Bayes Decision Theory
Bayes Decision Theory
The Gaussian Probability Density Function
Minimum Distance Classifiers: Euclidean, Mahalanobis
13
Bayes Decision Theory
14
Bayes Decision Theory
Recall our example of classifying two fish as salmon or sea bass, and our agreement that any given fish is either a salmon or a sea bass; we call this the state of nature of the fish. Let us define a (probabilistic) variable ω that describes the state of nature:
ω = ω_1 for sea bass
ω = ω_2 for salmon
We assume this two-class case.
15
Bayes Decision Theory
The a priori or prior probability reflects our knowledge of how likely we expect a certain state of nature to be before we can actually observe it. In the fish example, it is the probability that we will see either a salmon or a sea bass next on the conveyor belt.
16
Bayes Decision Theory
Note: The prior may vary depending on the situation. If we get equal numbers of salmon and sea bass in a catch, then the priors are equal (uniform). Depending on the season, we may get more salmon than sea bass, for example.
17
Bayes Decision Theory
We write P(ω = ω_1), or just P(ω_1), for the prior probability that the next fish is a sea bass. The priors must be exclusive and exhaustive. For c states of nature, or classes:
\sum_{i=1}^{c} P(\omega_i) = 1
18
Bayes Decision Theory
A feature is an observable variable.
A feature space is a set from which we can sample or observe values.
Examples of features: length, width, lightness, location of the dorsal fin.
For simplicity, let us assume that our features are all continuous values. Denote a scalar feature as x and a vector feature as x (a column vector). For an l-dimensional feature space, x ∈ R^l.
19
Bayes Decision Theory
In a classification task, we are given a pattern and the task is to classify it into one of c classes. The number of classes, c, is assumed to be known a priori. Each pattern is represented by a set of feature values x(i), i = 1, 2, …, l, which make up the l-dimensional feature vector x = [x(1), x(2), …, x(l)]^T ∈ R^l. We assume that each pattern is represented uniquely by a single feature vector and that it can belong to only one class.
20
Bayes Decision Theory
Also, we let the number of possible classes be c, that is, ω_1, …, ω_c. According to Bayes decision theory, x is assigned to the class ω_i if
P(\omega_i \mid x) > P(\omega_j \mid x), \quad \forall j \neq i
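As a quick illustration (not part of the original slides), the decision rule is just an argmax over the class scores; the priors, likelihood values, and feature vector below are made-up numbers.

    import numpy as np

    # Hypothetical priors P(w_i) and class-conditional likelihoods p(x | w_i) at some observed x
    priors = np.array([0.5, 0.5])
    likelihoods = np.array([0.079, 0.037])

    # P(w_i | x) is proportional to p(x | w_i) * P(w_i); the common factor p(x) does not change the argmax
    scores = likelihoods * priors
    print("assign x to class", np.argmax(scores) + 1)   # -> class 1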
21
The Gaussian Probability Density Function
22
The Gaussian Probability Density Function
The Gaussian pdf is extensively used in pattern recognition because of its mathematical tractability as well as because of the central limit theorem.
23
The Gaussian Probability Density Function
The multidimensional Gaussian pdf has the form
p(x) = \frac{1}{(2\pi)^{l/2} |S|^{1/2}} \exp\left( -\frac{1}{2} (x - m)^T S^{-1} (x - m) \right)
where
m is the mean vector
S is the covariance matrix
|S| is the determinant of S
S^{-1} is the inverse of S
l is the number of dimensions
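A direct NumPy translation of this formula (a minimal sketch; the helper name gaussian_pdf is ours, not from the slides):

    import numpy as np

    def gaussian_pdf(x, m, S):
        """Value of the multivariate Gaussian pdf N(m, S) at the point x."""
        x, m, S = np.asarray(x, float), np.asarray(m, float), np.asarray(S, float)
        l = m.size
        diff = x - m
        norm = (2 * np.pi) ** (l / 2) * np.sqrt(np.linalg.det(S))
        return np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff) / norm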
24
The Gaussian Probability Density Function
Example 1: Compute the value of a Gaussian pdf, N(m, S), at x_1 = [0.2, 1.3]^T and x_2 = [2.2, -1.3]^T, where m = [0, 1]^T and S is the 2×2 identity matrix.
p(x_1) = ?  p(x_2) = ?
Answers: p(x_1) = 0.1491, p(x_2) = 0.001
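The stated answers can be reproduced with SciPy's multivariate normal, assuming (as above) that S is the 2×2 identity matrix:

    import numpy as np
    from scipy.stats import multivariate_normal

    m = np.array([0.0, 1.0])
    S = np.eye(2)                        # identity covariance, consistent with the stated answers
    rv = multivariate_normal(mean=m, cov=S)
    print(rv.pdf([0.2, 1.3]))            # ~ 0.1491
    print(rv.pdf([2.2, -1.3]))           # ~ 0.0010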
25
The Gaussian Probability Density Function
Example 2: Consider a 2-class classification task in the 2-dimensional space, where the data in the two classes, ω_1 and ω_2, are distributed according to the Gaussian distributions N(m_1, S_1) and N(m_2, S_2), respectively. Let m_1 = [1, 1]^T, m_2 = [3, -3]^T, and S_1 = S_2 = S (a common covariance matrix). Assuming that P(ω_1) = P(ω_2) = 1/2, classify x = [1.8, 1.8]^T into ω_1 or ω_2.
P(ω_1 | x) = ?  P(ω_2 | x) = ?
Answers: P(ω_1 | x) = 0.042, P(ω_2 | x) = 0.0189, so classify x into ω_1.
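A sketch of the computation: pick the class with the larger P(ω_i) p(x | ω_i). The covariance matrix below is an illustrative placeholder, not the one from the slide, so the printed scores will differ from the stated answers (the decision is still ω_1).

    import numpy as np
    from scipy.stats import multivariate_normal

    m1, m2 = np.array([1.0, 1.0]), np.array([3.0, -3.0])
    S = np.eye(2)                        # hypothetical common covariance S1 = S2
    priors = np.array([0.5, 0.5])
    x = np.array([1.8, 1.8])

    scores = priors * np.array([multivariate_normal(m, S).pdf(x) for m in (m1, m2)])
    print("assign x to class", np.argmax(scores) + 1)   # -> class 1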
26
Mean Vector and Covariance Matrix
27
Mean Vector and Covariance Matrix
The first step in analyzing multivariate data is computing the mean vector and the variance-covariance matrix. Consider a sample data matrix x whose rows are observations of the variables.
28
Mean Vector and Covariance Matrix
Each row vector x_i is another observation of the three variables (or components). The mean vector consists of the means of each variable; the variance-covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions.
Cov(x) = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{bmatrix}
29
Mean Vector and Covariance Matrix
The formula for computing the covariance of the variables X and Y is
Cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{x})(Y_i - \bar{y})}{n - 1}
with \bar{x} and \bar{y} denoting the means of X and Y, respectively. n = 5 for this example.
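In NumPy the mean vector and the (n − 1 normalized) covariance matrix of a data matrix with observations in rows can be computed directly; the 5×3 matrix below is a made-up stand-in for the slide's sample data.

    import numpy as np

    # Hypothetical data: 5 observations (rows) of 3 variables (columns)
    X = np.array([[4.0, 2.0, 0.60],
                  [4.2, 2.1, 0.59],
                  [3.9, 2.0, 0.58],
                  [4.3, 2.1, 0.62],
                  [4.1, 2.2, 0.63]])

    mean_vector = X.mean(axis=0)            # mean of each variable
    cov_matrix = np.cov(X, rowvar=False)    # divides by n - 1, as in the formula above
    print(mean_vector)
    print(cov_matrix)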
30
Mean Vector and Covariance Matrix
The results are the mean vector \bar{x} and the variance-covariance matrix S = Cov(x).
31
Mean Vector and Covariance Matrix
Example: mean m = [0, 0]^T, covariance matrix S, N = 500.
Generate random numbers following the Gaussian distribution (MATLAB): X = mvnrnd(m, S, N)
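The NumPy equivalent of MATLAB's mvnrnd is np.random.multivariate_normal; the covariance matrix below is an illustrative choice since the slide's S is not reproduced here.

    import numpy as np

    m = np.array([0.0, 0.0])
    S = np.array([[1.0, 0.0],
                  [0.0, 1.0]])             # hypothetical covariance matrix
    N = 500
    X = np.random.multivariate_normal(m, S, size=N)   # N x 2 matrix of samples
    print(X.shape)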
32
Mean Vector and Covariance Matrix
[Plots of the generated samples for three different choices of the covariance matrix S.]
33
Mean Vector and Covariance Matrix
[Plots of the generated samples for three further choices of the covariance matrix S.]
34
Mean Vector and Covariance Matrix
[Plots of the generated samples for S = [0.3, −0.5; −0.5, 2] and one further choice of S.]
35
Minimum Distance Classifiers
Euclidean Mahalanobis
36
Minimum Distance Classifiers
Template matching can be expressed mathematically through a notion of distance. Let x be the feature vector for the unknown input, and let m_1, m_2, …, m_c be the means for the c classes. The error in matching x against m_i is given by the distance between x and m_i. Choose the class for which this error is minimum. This technique is called minimum distance classification.
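A minimal sketch of this idea, with made-up class means and the distance left as a pluggable function:

    import numpy as np

    def minimum_distance_classify(x, means, distance):
        """Index (0-based) of the class whose mean is closest to x under the given distance."""
        return int(np.argmin([distance(x, m) for m in means]))

    euclidean = lambda a, b: np.linalg.norm(a - b)
    means = [np.zeros(3), 0.5 * np.ones(3)]           # hypothetical m1, m2
    x = np.array([0.1, 0.5, 0.1])
    print(minimum_distance_classify(x, means, euclidean))   # -> 0, i.e. class w_1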
37
Minimum Distance Classifiers
[Block diagram: the input x is compared with each mean m_1, m_2, …, m_c by a distance computation; a minimum selector outputs the class of x.]
38
The Euclidean Distance Classifier
The minimum Euclidean distance classifier uses
d_E(x, m_i) = \sqrt{(x - m_i)^T (x - m_i)}
That is, given an unknown x, assign it to class ω_i if
\|x - m_i\| \equiv \sqrt{(x - m_i)^T (x - m_i)} < \|x - m_j\|, \quad \forall j \neq i
where m_i is the mean of class i and m_j is the mean of class j.
39
The Euclidean Distance Classifier
The Euclidean classifier is often used because of its simplicity. It assigns a pattern to the class whose mean is closest to it with respect to the Euclidean norm.
40
The Euclidean Distance Classifier
Example: Consider a 2-class classification task in the 3-dimensional space, where the two classes ω_1 and ω_2 are modeled by Gaussian distributions with means m_1 and m_2, respectively: m_1 = [0, 0, 0]^T, m_2 = [0.5, 0.5, 0.5]^T. Given the point x = [0.1, 0.5, 0.1]^T, classify x according to the Euclidean distance classifier.
Answers: d_E(x, m_1) = \sqrt{0.27} ≈ 0.52 and d_E(x, m_2) = \sqrt{0.32} ≈ 0.57. The point x is classified to the ω_1 class.
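Checking the two distances in NumPy:

    import numpy as np

    m1, m2 = np.zeros(3), np.array([0.5, 0.5, 0.5])
    x = np.array([0.1, 0.5, 0.1])
    print(np.linalg.norm(x - m1))   # sqrt(0.27) ~ 0.52
    print(np.linalg.norm(x - m2))   # sqrt(0.32) ~ 0.57, so x goes to class w_1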
41
The Mahalanobis Distance Classifier
The minimum Mahalanobis distance classifier uses
d_M(x, m_i) = \sqrt{(x - m_i)^T S^{-1} (x - m_i)}
That is, given an unknown x, it is assigned to class ω_i if
(x - m_i)^T S^{-1} (x - m_i) < (x - m_j)^T S^{-1} (x - m_j), \quad \forall j \neq i
where S is the common covariance matrix and S^{-1} is its inverse. The presence of the covariance matrix accounts for the shape of the Gaussians.
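A corresponding sketch for the Mahalanobis case; the covariance matrix here is an illustrative assumption, not one from the slides.

    import numpy as np

    def mahalanobis_distance(x, m, S_inv):
        diff = x - m
        return np.sqrt(diff @ S_inv @ diff)

    S = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.2, 0.1],
                  [0.1, 0.1, 0.2]])        # hypothetical common covariance matrix
    S_inv = np.linalg.inv(S)
    m1, m2 = np.zeros(3), 0.5 * np.ones(3)
    x = np.array([0.1, 0.5, 0.1])
    d1, d2 = mahalanobis_distance(x, m1, S_inv), mahalanobis_distance(x, m2, S_inv)
    print("assign x to class", 1 if d1 < d2 else 2)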
42
The Mahalanobis Distance Classifier
Example: Consider a 2-class classification task in the 3-dimensional space, where the two classes ω_1 and ω_2 are modeled by Gaussian distributions with means m_1 and m_2, respectively: m_1 = [0, 0, 0]^T, m_2 = [0.5, 0.5, 0.5]^T, with a common covariance matrix S. Given the point x = [0.1, 0.5, 0.1]^T, classify x according to the Mahalanobis distance classifier.
Answers: d_M(x, m_1) > d_M(x, m_2), so the point x is classified to the ω_2 class.