Covariance matrices for all of the classes are identical, but otherwise arbitrary (the same Σ is shared by every class).

Here the quadratic term x^T Σ^{-1} x and the terms (d/2) ln 2π and (1/2) ln|Σ| are independent of i. So the discriminant function becomes linear: g_i(x) = w_i^T x + w_i0, with w_i = Σ^{-1} μ_i and w_i0 = -(1/2) μ_i^T Σ^{-1} μ_i + ln P(ω_i).
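A minimal MATLAB sketch of this linear discriminant is given below; the shared covariance, class means, priors, and test point are assumed toy values, not numbers from the slides.

% Linear discriminant for identical covariance matrices (Sigma_i = Sigma).
% All numeric values are assumed for illustration only.
Sigma = [2 0; 0 1];            % shared covariance matrix (assumed)
mu    = {[3; 6], [3; -2]};     % class means (assumed)
P     = [0.5 0.5];             % prior probabilities (assumed)
x     = [2; 4];                % feature vector to classify (assumed)
g = zeros(1, 2);
for i = 1:2
    w  = Sigma \ mu{i};                               % w_i = Sigma^{-1} mu_i
    w0 = -0.5 * mu{i}' * (Sigma \ mu{i}) + log(P(i)); % w_i0
    g(i) = w' * x + w0;                               % g_i(x) = w_i^T x + w_i0
end
[~, decision] = max(g);        % assign x to the class with the largest g_i(x)
fprintf('x assigned to class %d\n', decision);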

Covariance matrices are different for each category.

Here only the term (d/2) ln 2π is independent of i. So the discriminant function stays quadratic: g_i(x) = x^T W_i x + w_i^T x + w_i0, with W_i = -(1/2) Σ_i^{-1}, w_i = Σ_i^{-1} μ_i, and w_i0 = -(1/2) μ_i^T Σ_i^{-1} μ_i - (1/2) ln|Σ_i| + ln P(ω_i).
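The corresponding MATLAB sketch for the quadratic case; again all numeric values are assumed toy numbers, not values from the slides.

% Quadratic discriminant for arbitrary per-class covariance matrices.
Sigma = {[2 0; 0 1], [1 0.5; 0.5 2]};   % per-class covariances (assumed)
mu    = {[3; 6], [3; -2]};              % class means (assumed)
P     = [0.5 0.5];                      % priors (assumed)
x     = [2; 4];                         % feature vector to classify (assumed)
g = zeros(1, 2);
for i = 1:2
    W  = -0.5 * inv(Sigma{i});                       % W_i
    w  = Sigma{i} \ mu{i};                           % w_i
    w0 = -0.5 * mu{i}' * (Sigma{i} \ mu{i}) ...
         - 0.5 * log(det(Sigma{i})) + log(P(i));     % w_i0
    g(i) = x' * W * x + w' * x + w0;                 % quadratic g_i(x)
end
[~, decision] = max(g);
fprintf('x assigned to class %d\n', decision);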

EXAMPLE:

EXAMPLE (cont.):

Find the discriminant function for the first class.

EXAMPLE (cont.): Find the discriminant function for the first class.

EXAMPLE (cont.): Similarly, find the discriminant function for the second class.

EXAMPLE (cont.): The decision boundary:

EXAMPLE (cont.): The decision boundary:

EXAMPLE (cont.): Using MATLAB we can draw the decision boundary:
>> s = 'x^2 - 10*x - 4*x*y + 8*y - 1 + 2*log(2)';
>> ezplot(s)
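In recent MATLAB releases the string form of ezplot is discouraged; assuming R2016b or later, an equivalent sketch uses fimplicit with an anonymous function handle:

>> f = @(x, y) x.^2 - 10*x - 4*x.*y + 8*y - 1 + 2*log(2);  % decision boundary expression from the slide
>> fimplicit(f)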

EXAMPLE (cont.): Using MATLAB we can draw the decision boundary:

EXAMPLE (cont.):

Error Probabilities and Integrals. For two classes, an error occurs when x falls in region R_2 while the true state of nature is ω_1, or in R_1 while it is ω_2: P(error) = ∫_{R_2} p(x | ω_1) P(ω_1) dx + ∫_{R_1} p(x | ω_2) P(ω_2) dx.
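As a numerical illustration, a MATLAB sketch that evaluates these error integrals for two 1-D Gaussian class-conditional densities; the means, common standard deviation, priors, and threshold are assumed toy values.

% Two 1-D Gaussian classes split by a single decision threshold t:
% P(error) = P(w1) * P(x > t | w1) + P(w2) * P(x <= t | w2)
mu1 = 0;  mu2 = 3;  sigma = 1;     % assumed class-conditional densities N(mu, sigma^2)
P1  = 0.5;  P2 = 0.5;              % assumed priors
t   = (mu1 + mu2) / 2;             % assumed threshold (midpoint, equal priors)
tail = @(a) 0.5 * erfc(a / sqrt(2));   % P(Z > a) for a standard normal Z
Perror = P1 * tail((t - mu1) / sigma) + P2 * (1 - tail((t - mu2) / sigma));
fprintf('P(error) = %.4f\n', Perror);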

Receiver Operating Characteristics Another measure of the distance between two Gaussian distributions; it has found great use in medicine, radar detection, and other fields.

Receiver Operating Characteristics

If both the diagnosis and the test are positive, it is called a true positive (TP). The probability of a TP is estimated by counting the true positives in the sample and dividing by the sample size. If the diagnosis is positive and the test is negative, it is called a false negative (FN). False positives (FP) and true negatives (TN) are defined similarly.

Receiver Operating Characteristics The values described above are used to calculate different measures of the quality of the test. The first one is sensitivity, SE, the probability of a positive test among the patients who have a positive diagnosis: SE = TP / (TP + FN).

Receiver Operating Characteristics Specificity, SP, is the probability of a negative test among the patients who have a negative diagnosis: SP = TN / (TN + FP).
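A small MATLAB sketch of these two measures; the four counts below are assumed example values, not data from the slides.

TP = 95;  FN = 5;     % assumed counts among patients with a positive diagnosis
TN = 80;  FP = 20;    % assumed counts among patients with a negative diagnosis
SE = TP / (TP + FN)   % sensitivity: P(positive test | positive diagnosis)
SP = TN / (TN + FP)   % specificity: P(negative test | negative diagnosis)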

Receiver Operating Characteristics Example:

Receiver Operating Characteristics Example (cont.):

Receiver Operating Characteristics Overlap in distributions:

Bayes Decision Theory – Discrete Features Assume x = (x_1, ..., x_d), where the components of x are binary or integer valued, so x can take only one of m discrete values v_1, v_2, ..., v_m.

Bayes Decision Theory – Discrete Features (Decision Strategy) Maximize the class posterior using the Bayes decision rule: P(ω_j | x) = P(x | ω_j) P(ω_j) / Σ_j P(x | ω_j) P(ω_j). Decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i; or minimize the overall risk by selecting the action α* = arg min_i R(α_i | x).
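A minimal MATLAB sketch of this decision strategy; the likelihood table, priors, and the observed value are assumed toy numbers.

% Discrete-feature Bayes rule: x takes one of m = 3 values v_1, v_2, v_3.
Pxw   = [0.6 0.3 0.1;    % P(x = v_k | w_1), k = 1..3 (assumed)
         0.2 0.3 0.5];   % P(x = v_k | w_2) (assumed)
prior = [0.4; 0.6];      % P(w_j) (assumed)
k     = 2;               % observed value: x = v_2 (assumed)
joint     = Pxw(:, k) .* prior;      % P(x = v_k | w_j) * P(w_j)
posterior = joint / sum(joint);      % P(w_j | x = v_k) by the Bayes rule
[~, j] = max(posterior);             % decide the class with the largest posterior
fprintf('Decide w_%d (posterior %.3f)\n', j, posterior(j));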

Bayes Decision Theory – Discrete Features (Independent binary features in the 2-category problem) Let the components x_i ∈ {0, 1} be independent, with p_i = P(x_i = 1 | ω_1) and q_i = P(x_i = 1 | ω_2). The discriminant is linear, g(x) = Σ_i w_i x_i + w_0, with w_i = ln[ p_i (1 - q_i) / (q_i (1 - p_i)) ] and w_0 = Σ_i ln[ (1 - p_i) / (1 - q_i) ] + ln[ P(ω_1) / P(ω_2) ]; decide ω_1 if g(x) > 0 and ω_2 otherwise.

Bayes Decision Theory – Discrete Features (Independent binary features in 2-category problem) EXAMPLE

APPLICATION EXAMPLE Bit matrix for machine-printed characters. Each pixel of the matrix may be taken as a binary feature. For the above problem, the class-conditional probability that a given pixel is on is estimated separately for each letter A, B, ...
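A minimal MATLAB sketch of this idea for two letters, using the independent-binary-feature discriminant above; the pixel probabilities, priors, and the observed bit vector are assumed toy values, not estimates from real character data.

% Two-letter (A vs. B) classification of a pixel bit vector.
p  = [0.9 0.8 0.1 0.7 0.2 0.6];   % P(pixel i = 1 | A) (assumed)
q  = [0.2 0.3 0.8 0.1 0.7 0.5];   % P(pixel i = 1 | B) (assumed)
PA = 0.5;  PB = 0.5;              % priors (assumed)
x  = [1 1 0 1 0 1];               % observed bit vector (assumed)
w  = log(p .* (1 - q) ./ (q .* (1 - p)));          % feature weights w_i
w0 = sum(log((1 - p) ./ (1 - q))) + log(PA / PB);  % bias w_0
g  = sum(w .* x) + w0;                             % decide A if g(x) > 0
if g > 0
    disp('Decide: letter A');
else
    disp('Decide: letter B');
end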

Summary To minimize the overall risk, choose the action that minimizes the conditional risk R(α_i | x). To minimize the probability of error, choose the class that maximizes the posterior probability P(ω_j | x). If there are different penalties for misclassifying patterns from different classes, the posteriors must be weighted according to those penalties before taking action. Do not forget that these decisions are the optimal ones under the assumption that the "true" values of the probabilities are known.

References
R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., New York: John Wiley & Sons, 2001.
Selim Aksoy, Pattern Recognition Course Materials.
M. Narasimha Murty and V. Susheela Devi, Pattern Recognition: An Algorithmic Approach, Springer.
"Bayes Decision Rule – Discrete Features" (online resource).