Special Topics in Scientific Computing: Pattern Recognition & Data Mining
Lecture 2: Bayesian Decision Theory

References: Bishop, Section 1.5; Duda et al., Sections 2.1-2.2

Decision Theory
Consider, for example, a medical diagnosis problem in which we have taken an X-ray image of a patient and wish to determine whether or not the patient has cancer. The input vector x is the set of pixel intensities in the image, and the output variable t represents the presence of cancer, denoted by class C1, or its absence, denoted by class C2 (for instance, t = 0 for class C1 and t = 1 for class C2). The joint distribution p(x, t) gives us the most complete probabilistic description of the situation.

Minimizing the misclassification rate
Example: consider two classes C1 and C2, with decision regions R1 and R2 (all points falling in Rk are assigned to class Ck). The probability of a misclassification is
p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_R1 p(x, C2) dx + ∫_R2 p(x, C1) dx.
A good decision rule should minimize p(mistake): we should assign x to C1 if p(x, C1) > p(x, C2).
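
A minimal numerical sketch of this rule, assuming illustrative Gaussian joint densities and priors (these numbers are not from the lecture): assign x to C1 wherever p(x, C1) > p(x, C2), then estimate p(mistake) by integrating the "losing" joint density over each decision region.

```python
import numpy as np
from scipy.stats import norm

# Illustrative joint densities p(x, Ck) = p(x | Ck) * P(Ck)  (assumed Gaussians)
P_C1, P_C2 = 0.6, 0.4
p_x_C1 = lambda x: norm.pdf(x, loc=-1.0, scale=1.0) * P_C1   # p(x, C1)
p_x_C2 = lambda x: norm.pdf(x, loc=+2.0, scale=1.0) * P_C2   # p(x, C2)

x = np.linspace(-8, 10, 20001)
dx = x[1] - x[0]

# Decision regions: R1 = {x : p(x, C1) > p(x, C2)}, R2 = complement
in_R1 = p_x_C1(x) > p_x_C2(x)

# p(mistake) = integral over R1 of p(x, C2) + integral over R2 of p(x, C1)
p_mistake = np.sum(p_x_C2(x)[in_R1]) * dx + np.sum(p_x_C1(x)[~in_R1]) * dx
print(f"estimated p(mistake) = {p_mistake:.4f}")
```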

Using the product rule, p(x, C1) = p(C1 | x) p(x); since p(x) is common to both sides of the comparison, the optimal decision becomes: assign x to C1 if p(C1 | x) > p(C2 | x).

General form: for the more general case of K classes, it is slightly easier to maximize the probability of being correct, which is given by
p(correct) = Σ_k p(x ∈ Rk, Ck) = Σ_k ∫_Rk p(x, Ck) dx.
Optimal rule: assign x to class Ci with i = argmax_k p(x, Ck), or equivalently i = argmax_k P(Ck | x), k = 1, ..., K.
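
A sketch of the K-class rule (the posterior values below are made-up numbers for illustration; in practice they would come from a model of p(x | Ck) and P(Ck)): pick the class with the largest posterior at each input.

```python
import numpy as np

# Posteriors P(Ck | x) for 4 inputs and K = 3 classes (illustrative values)
posteriors = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.40, 0.40, 0.20],
    [0.05, 0.90, 0.05],
])

# Assign each x to class i = argmax_k P(Ck | x); this maximizes p(correct)
decisions = np.argmax(posteriors, axis=1)
print(decisions)   # class index for each input, e.g. [0 2 0 1]
```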

Minimizing the expected loss
For many applications, our objective will be more complex than simply minimizing the number of misclassifications. Consider the medical diagnosis problem: if a patient who does not have cancer is incorrectly diagnosed as having cancer, the consequences may be some patient distress plus the need for further investigations. Conversely, if a patient with cancer is diagnosed as healthy, the result may be premature death due to lack of treatment. The consequences of these two types of mistake can thus be dramatically different, and it would clearly be better to make fewer mistakes of the second kind, even at the expense of making more mistakes of the first kind.

Minimizing the expected loss: the loss function
We formalize this with a loss matrix L, whose element L_kj is the loss incurred when an input truly belonging to class Ck is assigned to class Cj. The expected loss is
E[L] = Σ_k Σ_j ∫_Rj L_kj p(x, Ck) dx.
Optimal decision: choose the decision regions that minimize E[L].

Minimization of E[L], in the notation of the Duda book: minimizing E[L] is equivalent to minimizing, for each x, the conditional risk
R(αi | x) = Σ_j λ(αi | ωj) P(ωj | x), i = 1, ..., K,
where αi denotes the action "decide class Ci".
Optimal decision: assign x to Ck with k = argmin_i R(αi | x), i = 1, ..., K.
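
A sketch of risk minimization with an explicit loss matrix (the loss entries and the posterior values are illustrative assumptions): compute R(αi | x) = Σ_j λ(αi | ωj) P(ωj | x) for every action and take the one with minimum conditional risk.

```python
import numpy as np

# Loss matrix: loss[i, j] = lambda(alpha_i | omega_j), illustrative values
# (rows = actions "decide Ci", columns = true classes; zero loss on the diagonal)
loss = np.array([
    [0.0, 10.0],   # cost of deciding C1 when the truth is C1 / C2
    [1.0,  0.0],   # cost of deciding C2 when the truth is C1 / C2
])

posterior = np.array([0.3, 0.7])   # P(C1 | x), P(C2 | x) (illustrative)

# Conditional risk R(alpha_i | x) = sum_j loss[i, j] * P(Cj | x)
risk = loss @ posterior
best_action = np.argmin(risk)      # Bayes decision: minimum conditional risk
print(risk, "-> take action", best_action)
```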

Two-category classification
α1: deciding ω1; α2: deciding ω2. λij = λ(αi | ωj) is the loss incurred for deciding αi when the true state of nature is ωj.
Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)

The Bayes decision rule is then: if R(α1 | x) < R(α2 | x), take action α1 ("decide ω1"). This results in the equivalent rule: decide ω1 if
(λ21 - λ11) p(x | ω1) P(ω1) > (λ12 - λ22) p(x | ω2) P(ω2),
and decide ω2 otherwise.
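
A hedged numerical sketch of this two-category rule (the densities, priors, and loss values below are assumed for illustration): with an asymmetric loss, the boundary shifts so that the costly mistake becomes rarer.

```python
from scipy.stats import norm

# Assumed class-conditional densities, priors, and losses (illustrative)
p_x_w1 = lambda x: norm.pdf(x, loc=0.0, scale=1.0)    # p(x | w1)
p_x_w2 = lambda x: norm.pdf(x, loc=3.0, scale=1.0)    # p(x | w2)
P_w1, P_w2 = 0.5, 0.5
lam11, lam12, lam21, lam22 = 0.0, 10.0, 1.0, 0.0      # lambda(alpha_i | w_j)

def decide(x):
    # decide w1 iff (l21 - l11) p(x|w1) P(w1) > (l12 - l22) p(x|w2) P(w2)
    lhs = (lam21 - lam11) * p_x_w1(x) * P_w1
    rhs = (lam12 - lam22) * p_x_w2(x) * P_w2
    return "w1" if lhs > rhs else "w2"

for x in (0.5, 1.5, 2.5):
    print(x, "->", decide(x))
```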

Reject option
Reject x (refuse to classify it) if max_k P(Ck | x) < θ, where θ is the rejection threshold.
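
A sketch of the reject option (the threshold and posterior values are illustrative): refuse to classify whenever the largest posterior falls below the rejection threshold.

```python
import numpy as np

def classify_with_reject(posteriors, threshold=0.8):
    """Return the argmax class index, or None (reject) if max_k P(Ck|x) < threshold."""
    posteriors = np.asarray(posteriors)
    k = int(np.argmax(posteriors))
    return k if posteriors[k] >= threshold else None

print(classify_with_reject([0.95, 0.03, 0.02]))   # -> 0    (confident)
print(classify_with_reject([0.55, 0.40, 0.05]))   # -> None (rejected)
```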

Decision approaches
Generative models: first model the class-conditional densities p(x | Ck) and the priors P(Ck) (or the joint p(x, Ck)), then use Bayes' theorem to obtain the posteriors P(Ck | x) and make the decision.
Discriminative models: model the posterior P(Ck | x) directly, e.g. logistic regression.
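
As a concrete illustration of the discriminative route, a minimal sketch using scikit-learn's LogisticRegression (the toy data and the choice of scikit-learn are assumptions added for illustration, not part of the lecture): the model estimates P(Ck | x) directly from labelled examples, without ever modelling p(x | Ck).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: class 0 centred at -1, class 1 centred at +2 (illustrative assumption)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-1, 1, 50), rng.normal(2, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)      # models P(Ck | x) directly
print(clf.predict_proba([[0.5]]))         # estimated posteriors for a new input
print(clf.predict([[0.5]]))               # decision: argmax of the posterior
```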

Decision approaches: discriminant functions
Discriminant functions: learn a function f(x) that maps each input x directly onto a class label, with no explicit computation of posterior probabilities.

Terminology for the optimal decision (assign x to C1 if P(C1 | x) > P(C2 | x)):
P(Ck): prior probability
p(x | Ck): likelihood (class-conditional density)
P(Ck | x): posterior probability

Example: from the sea bass vs. salmon example to an "abstract" decision-making problem
State of nature and a priori (prior) probability: the state of nature (which type of fish will be observed next) is unpredictable, so it is a random variable. If the catch of salmon and sea bass is equiprobable, then P(ω1) = P(ω2) (uniform priors), with P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity). The prior probability reflects our prior knowledge of how likely we are to observe a sea bass or a salmon; these probabilities may depend on the time of year or the fishing area.

Bayes decision rule with only the prior information: decide ω1 if P(ω1) > P(ω2), otherwise decide ω2; the resulting error rate is min{P(ω1), P(ω2)}.
Suppose now we have a measurement or feature of the state of nature, say the fish's lightness value x. The class-conditional probability densities p(x | ω1) and p(x | ω2) describe the difference in lightness between the populations of sea bass and salmon.
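
A quick numerical check of the prior-only rule (the prior values and the Monte Carlo simulation are illustrative assumptions): always deciding the class with the larger prior gives an error rate close to min{P(ω1), P(ω2)}.

```python
import numpy as np

P_w1, P_w2 = 0.7, 0.3                     # assumed priors
rng = np.random.default_rng(1)

# Simulate the true state of nature for many independent observations
truth = rng.choice([1, 2], size=100_000, p=[P_w1, P_w2])

# Prior-only rule: always decide the class with the larger prior (here w1)
decision = 1 if P_w1 > P_w2 else 2
error_rate = np.mean(truth != decision)

print(error_rate, "vs min(P_w1, P_w2) =", min(P_w1, P_w2))   # both close to 0.3
```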

Maximum likelihood decision rule: assign the input pattern x to class ω1 if p(x | ω1) > p(x | ω2), otherwise to ω2.
How does the feature x influence our attitude (prior) concerning the true state of nature? This leads to the Bayes decision rule.

Posterior probability, likelihood, evidence
p(ωj, x) = P(ωj | x) p(x) = p(x | ωj) P(ωj)
Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x), where the evidence is p(x) = Σ_j p(x | ωj) P(ωj).
In words: posterior = (likelihood × prior) / evidence.

Optimal Bayes decision rule: decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2.
Special cases:
(i) if P(ω1) = P(ω2): decide ω1 if p(x | ω1) > p(x | ω2), otherwise ω2;
(ii) if p(x | ω1) = p(x | ω2): decide ω1 if P(ω1) > P(ω2), otherwise ω2.
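
Putting the pieces together for the fish example, a hedged sketch (the Gaussian lightness densities and the priors are made-up numbers, not from the lecture): compute the posterior via Bayes' formula and apply the optimal rule.

```python
from scipy.stats import norm

# Assumed lightness model: p(x | sea bass) and p(x | salmon) as Gaussians
P_w1, P_w2 = 0.5, 0.5                                 # priors (equiprobable catch)
lik_w1 = lambda x: norm.pdf(x, loc=4.0, scale=1.0)    # p(x | w1), sea bass
lik_w2 = lambda x: norm.pdf(x, loc=7.0, scale=1.5)    # p(x | w2), salmon

def posterior(x):
    evidence = lik_w1(x) * P_w1 + lik_w2(x) * P_w2    # p(x)
    return lik_w1(x) * P_w1 / evidence, lik_w2(x) * P_w2 / evidence

for x in (3.0, 5.5, 8.0):
    p1, p2 = posterior(x)
    label = "sea bass (w1)" if p1 > p2 else "salmon (w2)"
    print(f"x = {x}: P(w1|x) = {p1:.3f}, P(w2|x) = {p2:.3f} -> {label}")
```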