Introduction to Pattern Recognition


Week 2-3
- Pattern Recognition Systems
- The Design Cycle
- Learning and Adaptation
- Classifier Based on Bayes Decision Theory

Pattern Recognition Systems

Pattern Recognition Systems
- Sensing: use of a transducer (camera or microphone). The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer.
- Segmentation and grouping: patterns should be well separated and should not overlap.

Pattern Recognition Systems
- Feature extraction: discriminative features; features invariant with respect to translation, rotation, and scale.
- Classification: use the feature vector provided by the feature extractor to assign the object to a category.
- Post-processing: exploit context, i.e., input-dependent information other than the target pattern itself, to improve performance.

The Design Cycle
- Data collection
- Feature choice
- Model choice
- Training
- Evaluation
- Computational complexity

The Design Cycle
- Data collection: How do we know when we have collected an adequately large and representative set of examples for training and testing the system?
- Feature choice: depends on the characteristics of the problem domain; features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.

The Design Cycle
- Model choice: we may be unsatisfied with the performance of our fish classifier and want to jump to another class of model.
- Training: use the data to determine the classifier; there are many different procedures for training classifiers and choosing models.

The Design Cycle
- Evaluation: measure the error rate (or performance) and switch from one set of features to another.
- Computational complexity: What is the trade-off between computational ease and performance? (How does an algorithm scale as a function of the number of features, patterns, or categories?)

Learning and Adaptation
- Supervised learning: a teacher provides a category label or cost for each pattern in the training set (classification).
- Unsupervised learning: the system forms clusters or "natural groupings" of the input patterns (clustering).

Classifier Based on Bayes Decision Theory

Classifier Based on Bayes Decision Theory
- Bayes decision theory
- The Gaussian probability density function
- Minimum distance classifiers: Euclidean, Mahalanobis

Bayes Decision Theory

Bayes Decision Theory
Recall our example of classifying fish as salmon or sea bass, and our agreement that any given fish is either a salmon or a sea bass; we call this the state of nature of the fish. Let's define a (probabilistic) variable ω that describes the state of nature:
ω = ω_1 for sea bass
ω = ω_2 for salmon
Let's assume this two-class case.

Bayes Decision Theory
The a priori or prior probability reflects our knowledge of how likely we expect a certain state of nature to be before we actually observe it. In the fish example, it is the probability that the next fish on the conveyor belt is a salmon or a sea bass.

Bayes Decision Theory
Note: the prior may vary depending on the situation. If we get equal numbers of salmon and sea bass in a catch, then the priors are equal (uniform). Depending on the season, we may get more salmon than sea bass, for example.

Bayes Decision Theory
We write P(ω = ω_1), or just P(ω_1), for the prior probability that the next fish is a sea bass. The priors must satisfy exclusivity and exhaustivity; for c states of nature, or classes:
Σ_{i=1}^{c} P(ω_i) = 1

Bayes Decision Theory
A feature is an observable variable. A feature space is a set from which we can sample or observe values. Examples of features: length, width, lightness, location of the dorsal fin. For simplicity, let's assume that our features are all continuous values. Denote a scalar feature as x and a vector feature as x. For an l-dimensional feature space, x ∈ R^l.

Bayes Decision Theory
In a classification task, we are given a pattern and the task is to classify it into one out of c classes. The number of classes, c, is assumed to be known a priori. Each pattern is represented by a set of feature values x(i), i = 1, 2, …, l, which make up the l-dimensional feature vector x = [x(1), x(2), …, x(l)]^T ∈ R^l. We assume that each pattern is represented uniquely by a single feature vector and that it can belong to only one class.

Bayes Decision Theory
Also, we let the possible classes be ω_1, …, ω_c. According to Bayes decision theory, x is assigned to the class ω_i if
P(ω_i | x) > P(ω_j | x), ∀ j ≠ i
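As a minimal sketch of this rule (not part of the original slides), the posteriors can be obtained from Bayes' theorem, P(ω_i | x) ∝ p(x | ω_i) P(ω_i), and the class with the largest posterior is chosen. The prior and likelihood values below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical two-class example: priors and class-conditional
# likelihoods p(x | w_i) evaluated at some observed feature x.
priors = np.array([0.6, 0.4])         # P(w_1), P(w_2) -- assumed values
likelihoods = np.array([0.05, 0.22])  # p(x | w_1), p(x | w_2) -- assumed values

# Unnormalized posteriors via Bayes' theorem, then normalize.
joint = likelihoods * priors
posteriors = joint / joint.sum()

# Bayes decision rule: pick the class with the largest posterior.
decision = np.argmax(posteriors) + 1
print(posteriors, "-> assign x to class w_%d" % decision)
```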

The Gaussian Probability Density Function

The Gaussian Probability Density Function
The Gaussian pdf is used extensively in pattern recognition because of its mathematical tractability and because of the central limit theorem.

The Gaussian Probability Density Function
The multidimensional Gaussian pdf has the form
p(x) = (1 / ((2π)^{l/2} |S|^{1/2})) exp(−(1/2) (x − m)^T S^{−1} (x − m))
where
m is the mean vector
S is the covariance matrix
|S| is the determinant of S
S^{−1} is the inverse of S
l is the number of dimensions

The Gaussian Probability Density Function
Example 1: Compute the value of a Gaussian pdf, N(m, S), at x_1 = [0.2, 1.3]^T and x_2 = [2.2, −1.3]^T, where
m = [0, 1]^T, S = [1 0; 0 1]
p(x_1) = ?  p(x_2) = ?
Answers: p(x_1) = 0.1491, p(x_2) = 0.001
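The following is a small NumPy sketch (not part of the original slides) that implements the pdf formula above and reproduces the Example 1 values.

```python
import numpy as np

def gaussian_pdf(x, m, S):
    """Multidimensional Gaussian pdf N(m, S) evaluated at x."""
    x, m = np.asarray(x, float), np.asarray(m, float)
    S = np.asarray(S, float)
    l = m.size
    diff = x - m
    norm = 1.0 / np.sqrt((2 * np.pi) ** l * np.linalg.det(S))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(S, diff))

m = [0.0, 1.0]
S = [[1.0, 0.0], [0.0, 1.0]]
print(gaussian_pdf([0.2, 1.3], m, S))   # ~0.1491
print(gaussian_pdf([2.2, -1.3], m, S))  # ~0.001
```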

The Gaussian Probability Density Function
Example 2: Consider a 2-class classification task in the 2-dimensional space, where the data in the two classes, ω_1 and ω_2, are distributed according to the Gaussian distributions N(m_1, S_1) and N(m_2, S_2), respectively. Let
m_1 = [1, 1]^T, m_2 = [3, 3]^T, S_1 = S_2 = [1 0; 0 1]
Assuming that P(ω_1) = P(ω_2) = 1/2, classify x = [1.8, 1.8]^T into ω_1 or ω_2.
P(ω_1 | x) = ?  P(ω_2 | x) = ?
Answers: P(ω_1 | x) ∝ p(x | ω_1) P(ω_1) = 0.042 and P(ω_2 | x) ∝ p(x | ω_2) P(ω_2) = 0.0189, so classify x into ω_1.
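A short sketch (not from the slides) reproducing Example 2 with SciPy's multivariate normal; comparing p(x | ω_i) P(ω_i) is enough because the normalizing term p(x) is the same for both classes.

```python
import numpy as np
from scipy.stats import multivariate_normal

m1, m2 = np.array([1.0, 1.0]), np.array([3.0, 3.0])
S = np.eye(2)
priors = np.array([0.5, 0.5])
x = np.array([1.8, 1.8])

# Unnormalized posteriors p(x | w_i) * P(w_i); the evidence p(x) cancels.
scores = np.array([multivariate_normal(m1, S).pdf(x),
                   multivariate_normal(m2, S).pdf(x)]) * priors
print(scores)                                       # ~[0.042, 0.0189]
print("assign x to class", np.argmax(scores) + 1)   # class 1
```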

Mean Vector and Covariance Matrix

Mean Vector and Covariance Matrix
The first step in analyzing multivariate data is computing the mean vector and the variance-covariance matrix. Consider the following sample data matrix (5 observations of 3 variables):
x = [4.0  2.0  0.60;
     4.2  2.1  0.59;
     3.9  2.0  0.58;
     4.3  2.1  0.62;
     4.1  2.2  0.63]

Mean Vector and Covariance Matrix
Each row vector x_i is one observation of the three variables (or components). The mean vector consists of the means of each variable; the variance-covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions:
Cov(x) = [σ_11 σ_12 … σ_1n;
          σ_21 σ_22 … σ_2n;
          ⋮    ⋮    ⋱  ⋮ ;
          σ_n1 σ_n2 … σ_nn]

Mean Vector and Covariance Matrix
The formula for computing the covariance of the variables X and Y is
Cov(X, Y) = Σ_{i=1}^{n} (X_i − x̄)(Y_i − ȳ) / (n − 1)
with x̄ and ȳ denoting the means of X and Y, respectively. n = 5 for this example.

Mean Vector and Covariance Matrix
The results are:
mean vector x̄ = [4.10, 2.08, 0.604]
variance-covariance matrix
S = Cov(x) = [0.025   0.0075  0.00175;
              0.0075  0.0070  0.00135;
              0.00175 0.00135 0.00043]
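A quick NumPy check (not part of the slides) of these numbers; with the n − 1 divisor the (3,3) entry works out to 0.00043.

```python
import numpy as np

X = np.array([[4.0, 2.0, 0.60],
              [4.2, 2.1, 0.59],
              [3.9, 2.0, 0.58],
              [4.3, 2.1, 0.62],
              [4.1, 2.2, 0.63]])

mean_vec = X.mean(axis=0)            # [4.10, 2.08, 0.604]
cov_mat = np.cov(X, rowvar=False)    # unbiased estimate, divisor n - 1
print(mean_vec)
print(cov_mat)
```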

Mean Vector and Covariance Matrix
Example: mean m = [0, 0]^T, covariance matrix S (the matrices shown below), N = 500. Generate random numbers following the Gaussian distribution:
X = mvnrnd(m, S, N)
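mvnrnd is the MATLAB routine named on the slide; a NumPy equivalent (a suggestion of this note, not from the slides) would be:

```python
import numpy as np

m = np.zeros(2)
S = np.array([[1.0, 0.0], [0.0, 1.0]])  # one of the covariance matrices listed below
N = 500

# NumPy counterpart of MATLAB's mvnrnd(m, S, N): N samples from N(m, S).
X = np.random.default_rng(0).multivariate_normal(m, S, size=N)
print(X.shape)         # (500, 2)
print(X.mean(axis=0))  # close to m for large N
```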

Mean Vector and Covariance Matrix
[Scatter plots of the generated samples for different covariance matrices; the matrices shown are:]
S = [2 0; 0 2],   S = [1 0; 0 1],   S = [0.2 0; 0 0.2]
S = [1 0.5; 0.5 1],   S = [0.2 0; 0 2],   S = [2 0; 0 0.2]
S = [0.3 −0.5; −0.5 2],   S = [0.3 0.5; 0.5 2]
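To visualize how the covariance matrix shapes the sample cloud, a plotting sketch (not from the slides, using matplotlib) could be:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
m = np.zeros(2)
covs = [np.array([[2.0, 0.0], [0.0, 2.0]]),
        np.array([[0.2, 0.0], [0.0, 2.0]]),
        np.array([[0.3, 0.5], [0.5, 2.0]])]  # a few of the matrices listed above

fig, axes = plt.subplots(1, len(covs), figsize=(12, 4))
for ax, S in zip(axes, covs):
    X = rng.multivariate_normal(m, S, size=500)
    ax.scatter(X[:, 0], X[:, 1], s=5)
    ax.set_title(f"S = {S.tolist()}")
    ax.set_aspect("equal")
plt.show()
```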

Minimum Distance Classifiers
- Euclidean
- Mahalanobis

Minimum Distance Classifiers
Template matching can be expressed mathematically through a notion of distance. Let x be the feature vector for the unknown input, and let m_1, m_2, …, m_c be the mean vectors of the c classes. The error in matching x against m_i is given by the distance between x and m_i. Choose the class for which this error is minimum. This technique is called minimum distance classification.

Minimum Distance Classifiers
[Block diagram: the input feature vector x is compared against each class mean m_1, m_2, …, m_c by a distance computation; a minimum selector then outputs the class of the nearest mean.]

The Euclidean Distance Classifier
The minimum Euclidean distance classifier uses
d_E(x, m_i) = sqrt((x − m_i)^T (x − m_i))
That is, given an unknown x, assign it to class ω_i if
||x − m_i|| ≡ sqrt((x − m_i)^T (x − m_i)) < ||x − m_j||, ∀ j ≠ i
where m_i is the mean of class i and m_j is the mean of class j.

The Euclidean Distance Classifier
The Euclidean classifier is often used because of its simplicity. It assigns a pattern to the class whose mean is closest to it with respect to the Euclidean norm.

The Euclidean Distance Classifier
Example: Consider a 2-class classification task in the 3-dimensional space, where the two classes, ω_1 and ω_2, are modeled by Gaussian distributions with means m_1 and m_2, respectively:
m_1 = [0, 0, 0]^T, m_2 = [0.5, 0.5, 0.5]^T
Given the point x = [0.1, 0.5, 0.1]^T, classify x according to the Euclidean distance classifier.
Answers: d_E(x, m_1) = 0.51962, d_E(x, m_2) = 0.56569, so x is classified to class ω_1.
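A small NumPy sketch (not from the slides) of the minimum Euclidean distance rule, reproducing this example:

```python
import numpy as np

def euclidean_classify(x, means):
    """Return (class index starting at 1, distances) for the nearest-mean rule."""
    x = np.asarray(x, float)
    d = np.array([np.linalg.norm(x - np.asarray(m, float)) for m in means])
    return int(np.argmin(d)) + 1, d

means = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
cls, d = euclidean_classify([0.1, 0.5, 0.1], means)
print(d)    # ~[0.5196, 0.5657]
print(cls)  # 1 -> class w_1
```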

The Mahalanobis Distance Classifier
The minimum Mahalanobis distance classifier uses
d_M(x, m_i) = sqrt((x − m_i)^T S^{−1} (x − m_i))
That is, given an unknown x, it is assigned to class ω_i if
(x − m_i)^T S^{−1} (x − m_i) < (x − m_j)^T S^{−1} (x − m_j), ∀ j ≠ i
where S is the common covariance matrix and S^{−1} is the inverse of S. The presence of the covariance matrix accounts for the shape of the Gaussians.

The Mahalanobis Distance Classifier
Example: Consider a 2-class classification task in the 3-dimensional space, where the two classes, ω_1 and ω_2, are modeled by Gaussian distributions with means m_1 and m_2, respectively:
m_1 = [0, 0, 0]^T, m_2 = [0.5, 0.5, 0.5]^T
The common covariance matrix of the distributions is
S = [0.8  0.01 0.01;
     0.01 0.2  0.01;
     0.01 0.01 0.2]
Given the point x = [0.1, 0.5, 0.1]^T, classify x according to the Mahalanobis distance classifier.
Answers: d_M(x, m_1) = 1.133393, d_M(x, m_2) = 0.991780, so x is classified to class ω_2.
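And a matching NumPy sketch (again not from the slides) for the Mahalanobis rule, reproducing the example above:

```python
import numpy as np

def mahalanobis_classify(x, means, S):
    """Return (class index starting at 1, distances) for the minimum Mahalanobis rule."""
    x = np.asarray(x, float)
    S_inv = np.linalg.inv(np.asarray(S, float))
    d = np.array([np.sqrt((x - np.asarray(m, float)) @ S_inv @ (x - np.asarray(m, float)))
                  for m in means])
    return int(np.argmin(d)) + 1, d

means = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
S = [[0.8, 0.01, 0.01],
     [0.01, 0.2, 0.01],
     [0.01, 0.01, 0.2]]
cls, d = mahalanobis_classify([0.1, 0.5, 0.1], means, S)
print(d)    # ~[1.1334, 0.9918]
print(cls)  # 2 -> class w_2
```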