236607 Visual Recognition – Tutorial 3: Bayesian Decision Making

Presentation transcript:

Tutorial 3 – the outline
Bayesian decision making with discrete probabilities – an example
Looking at continuous densities
Bayesian decision making with continuous probabilities – an example
The Bayesian Doctor Example

Example 1 – checking on a course
A student needs to decide which courses to take, based only on the first lecture. From his previous experience, he knows:

Quality of the course    good    fair    bad
Probability (prior)      0.2     0.4     0.4

These are the prior probabilities.

Visual Recognition Tutorial3 Example 1 – continued The student also knows the class-conditionals: The loss function is given by the matrix Pr(x|  j ) goodfairbad Interesting lecture Boring lecture (a i |  j ) good coursefair coursebad course Taking the course 0510 Not taking the course 2050

Example 1 – continued
The student wants to make an optimal decision. The probability of getting an "interesting lecture" (x = interesting):
Pr(interesting) = Pr(interesting|good course)*Pr(good course)
 + Pr(interesting|fair course)*Pr(fair course)
 + Pr(interesting|bad course)*Pr(bad course)
 = 0.8*0.2 + 0.5*0.4 + 0.1*0.4 = 0.4
Consequently, Pr(boring) = 1 - 0.4 = 0.6.
Suppose the lecture was interesting. Then we want to compute the posterior probabilities of each of the 3 possible "states of nature".
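As a quick sanity check, here is a short Python sketch of the total-probability step; the priors and class-conditionals are the reconstructed values from the tables above.

```python
# Example 1: evidence via total probability (values reconstructed above).
priors = {"good": 0.2, "fair": 0.4, "bad": 0.4}
p_interesting_given = {"good": 0.8, "fair": 0.5, "bad": 0.1}

# Pr(interesting) = sum_j Pr(interesting | course_j) * Pr(course_j)
p_interesting = sum(p_interesting_given[c] * priors[c] for c in priors)
print(p_interesting)        # ~0.4
print(1.0 - p_interesting)  # Pr(boring) ~0.6
```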

Example 1 – continued
Applying Bayes' rule:
Pr(good|interesting) = Pr(interesting|good)*Pr(good) / Pr(interesting) = 0.8*0.2 / 0.4 = 0.4
Pr(fair|interesting) = Pr(interesting|fair)*Pr(fair) / Pr(interesting) = 0.5*0.4 / 0.4 = 0.5
We can get Pr(bad|interesting) = 0.1 either by the same method, or by noting that it complements the above two to 1.
Now we have all we need for making an intelligent decision about an optimal action.

Example 1 – conclusion
The student needs to minimize the conditional risk; he can either take the course:
R(taking|interesting) = Pr(good|interesting)*λ(taking|good course)
 + Pr(fair|interesting)*λ(taking|fair course)
 + Pr(bad|interesting)*λ(taking|bad course)
 = 0.4*0 + 0.5*5 + 0.1*10 = 3.5
or drop it:
R(not taking|interesting) = Pr(good|interesting)*λ(not taking|good course)
 + Pr(fair|interesting)*λ(not taking|fair course)
 + Pr(bad|interesting)*λ(not taking|bad course)
 = 0.4*20 + 0.5*5 + 0.1*0 = 10.5
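The same decision computed as a Python sketch (again with the reconstructed numbers): posteriors via Bayes' rule, then the conditional risk of each action.

```python
# Example 1: posteriors and conditional risks given an interesting lecture.
priors = {"good": 0.2, "fair": 0.4, "bad": 0.4}
p_interesting_given = {"good": 0.8, "fair": 0.5, "bad": 0.1}
loss = {                       # loss[action][course quality]
    "take":     {"good": 0,  "fair": 5, "bad": 10},
    "not take": {"good": 20, "fair": 5, "bad": 0},
}

# Bayes' rule: Pr(course | interesting) = Pr(interesting | course) * Pr(course) / Pr(interesting)
evidence = sum(p_interesting_given[c] * priors[c] for c in priors)
posterior = {c: p_interesting_given[c] * priors[c] / evidence for c in priors}

# Conditional risk of each action given the interesting lecture
risk = {a: sum(loss[a][c] * posterior[c] for c in priors) for a in loss}
print(posterior)   # ~{'good': 0.4, 'fair': 0.5, 'bad': 0.1}
print(risk)        # ~{'take': 3.5, 'not take': 10.5}  -> take the course
```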

Constructing an optimal decision function
So, if the first lecture was interesting, the student will minimize the conditional risk by taking the course. In order to construct the full decision function, we also need to define the risk-minimizing action for the case of a boring lecture. Do it!

Example 2 – continuous density
Let X be a real-valued r.v. representing a number randomly picked from the interval [0,1]; its distribution is known to be uniform. Then let Y be a real r.v. whose value is chosen at random from [0, X], also with uniform distribution. We are presented with the value of Y, and need to "guess" the most "likely" value of X.
In a more formal fashion: given the value of Y, find the probability density function (p.d.f.) of X and determine its maxima.

Example 2 – continued
Let w_x denote the "state of nature" when X = x; what we look for is p(w_x | Y = y) – that is, the p.d.f. of X given the observation Y = y.
The class-conditional density (given the value of X) is uniform on [0, X]:
p(y|x) = 1/x for 0 <= y <= x, and 0 otherwise.
For the given evidence (using total probability):
p(y) = ∫ p(y|x) p(x) dx = ∫_y^1 (1/x) dx = ln(1/y).

Example 2 – conclusion
Applying Bayes' rule:
p(x | Y = y) = p(y|x) p(x) / p(y) = (1/x) / ln(1/y), for y <= x <= 1.
This is a monotonically decreasing function of x over [y, 1]. So (informally) the most "likely" value of X (the one with the highest probability density value) is X = y.
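A Monte Carlo sketch that checks this conclusion numerically; the choice y = 0.3 and the window width eps are arbitrary illustration values, not from the slides.

```python
import numpy as np

# Example 2: sample X ~ U(0,1), then Y ~ U(0,X); condition on Y ~= y and compare
# the histogram of X with the analytic posterior p(x | Y=y) = 1 / (x * ln(1/y)).
rng = np.random.default_rng(0)
n, y, eps = 2_000_000, 0.3, 0.01

X = rng.uniform(0.0, 1.0, n)
Y = rng.uniform(0.0, X)                  # Y | X=x is uniform on [0, x]
sel = X[np.abs(Y - y) < eps]             # keep samples with Y close to y

hist, edges = np.histogram(sel, bins=50, range=(y, 1.0), density=True)
analytic = 1.0 / (edges[:-1] * np.log(1.0 / y))

print(hist[0], analytic[0])              # both are largest near x = y (~2.7 here)
```

The empirical histogram decreases over [y, 1] and peaks at x = y, matching the analytic posterior.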

Illustration – conditional p.d.f.
[figure: plot of the conditional p.d.f. p(x | Y = y), decreasing over [y, 1]]

Example 3: hiring a secretary
A manager needs to hire a new secretary, and a good one. Unfortunately, good secretaries are hard to find: Pr(w_g) = 0.2, Pr(w_b) = 0.8.
The manager decides to use a new test. The grade is a real number in the range from 0 to 100.
The manager's estimation of the possible losses:

λ(decision, w_i)   w_g    w_b
Hire               0      20
Reject             5      0

Example 3: continued
The class-conditional densities p(x|w_g) and p(x|w_b) are known to be approximated by normal p.d.f.s, one Gaussian for each class.

Example 3: continued
The resulting probability density for the grade is the mixture
p(x) = p(x|w_b)*Pr(w_b) + p(x|w_g)*Pr(w_g)
[figure: the mixture density p(x) over the grade range 0–100]

Example 3: continued
We need to know for which grade values hiring the secretary would minimize the risk, i.e., for which x
R(hire|x) = 20*Pr(w_b|x) is smaller than R(reject|x) = 5*Pr(w_g|x).
The posteriors are given by Bayes' rule:
Pr(w_g|x) = p(x|w_g)*Pr(w_g) / p(x),  Pr(w_b|x) = p(x|w_b)*Pr(w_b) / p(x).

Example 3: continued
The posteriors, scaled by the losses (20*Pr(w_b|x) versus 5*Pr(w_g|x)), look like:
[figure: the two scaled posterior curves over the grade range, crossing at the decision threshold]

Example 3: continued
Numerically, we need to solve 20*Pr(w_b|x) = 5*Pr(w_g|x), or equivalently 20*p(x|w_b)*Pr(w_b) = 5*p(x|w_g)*Pr(w_g). Solving numerically yields one solution (a threshold grade) in [0, 100].
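Since the slide's actual Gaussian parameters are not reproduced here, the following Python sketch only illustrates the procedure: it plugs hypothetical means and variances into the risk-equality condition and solves for the threshold grade numerically.

```python
from scipy.optimize import brentq
from scipy.stats import norm

# Example 3 (sketch): the Gaussian parameters below are hypothetical placeholders,
# NOT the values from the original slides.
P_g, P_b = 0.2, 0.8                      # priors from the slides
mu_g, sigma_g = 80.0, 10.0               # hypothetical p(x|w_g)
mu_b, sigma_b = 40.0, 15.0               # hypothetical p(x|w_b)

def risk_gap(x):
    """R(hire|x) - R(reject|x), up to the common positive factor 1/p(x)."""
    return 20.0 * norm.pdf(x, mu_b, sigma_b) * P_b - 5.0 * norm.pdf(x, mu_g, sigma_g) * P_g

threshold = brentq(risk_gap, 0.0, 100.0)  # grade at which the two risks are equal
print(f"hire whenever the grade exceeds ~{threshold:.1f}")
```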

The Bayesian Doctor Example
A person doesn't feel well and goes to the doctor. Assume two states of nature:
ω1: the person has a common flu.
ω2: the person is really sick (a vicious bacterial infection).
The doctor's prior is: P(ω1) = 0.9, P(ω2) = 0.1.
This doctor has two possible actions: "prescribe" hot tea or antibiotics. The doctor can use the prior and predict optimally: always flu. Therefore the doctor will always prescribe hot tea.

The Bayesian Doctor - Cntd.
But there is a very high risk: although this doctor can diagnose with a very high rate of success using the prior, (s)he can lose a patient once in a while.
Denote the two possible actions:
a1 = prescribe hot tea
a2 = prescribe antibiotics
Now assume the following cost (loss) matrix:

λ(a_i, ω_j)        ω1 (flu)    ω2 (infection)
a1: hot tea        0           10
a2: antibiotics    1           0

The Bayesian Doctor - Cntd.
Choosing a1 results in an expected risk of
R(a1) = 0*P(ω1) + 10*P(ω2) = 10*0.1 = 1.0
Choosing a2 results in an expected risk of
R(a2) = 1*P(ω1) + 0*P(ω2) = 0.9
So, considering the costs, it is much better (and optimal!) to always give antibiotics.
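In code (a sketch with the priors and loss matrix reconstructed above):

```python
# Bayesian doctor: prior-only expected risk of each fixed strategy.
priors = {"flu": 0.9, "infection": 0.1}
loss = {"hot tea":     {"flu": 0, "infection": 10},
        "antibiotics": {"flu": 1, "infection": 0}}

for action, costs in loss.items():
    r = sum(costs[w] * priors[w] for w in priors)
    print(action, r)   # hot tea 1.0, antibiotics 0.9 -> always-antibiotics is better
```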

The Bayesian Doctor - Cntd.
But doctors can do more. For example, they can take some observations. A reasonable observation is to perform a blood test. Suppose the possible results of the blood test are:
x1 = negative (no bacterial infection)
x2 = positive (infection)
But blood tests can often fail. Suppose
P(x2|ω1) = 0.2 (so P(x1|ω1) = 0.8) and P(x2|ω2) = 0.7 (so P(x1|ω2) = 0.3).
(These are called class-conditional probabilities.)

The Bayesian Doctor - Cntd.
Define the conditional risk given the observation:
R(a_i|x) = Σ_j λ(a_i, ω_j)*P(ω_j|x)
We would like to compute the conditional risk for each action and observation, so that the doctor can choose an optimal action that minimizes it.
How can we compute P(ω_j|x)? We use the class-conditional probabilities and Bayes' inversion rule.

The Bayesian Doctor - Cntd.
Let's first calculate p(x1) and p(x2):
p(x1) = P(x1|ω1)*P(ω1) + P(x1|ω2)*P(ω2) = 0.8*0.9 + 0.3*0.1 = 0.75
p(x2) is complementary to p(x1), so p(x2) = 0.25.
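The same evidence and posterior computation as a Python sketch (using the reconstructed test probabilities):

```python
# Bayesian doctor: evidence p(x) and posteriors P(w | x) for the blood test.
priors = {"flu": 0.9, "infection": 0.1}
p_test = {"neg": {"flu": 0.8, "infection": 0.3},   # x1 = negative
          "pos": {"flu": 0.2, "infection": 0.7}}   # x2 = positive

evidence = {x: sum(p_test[x][w] * priors[w] for w in priors) for x in p_test}
posterior = {x: {w: p_test[x][w] * priors[w] / evidence[x] for w in priors} for x in p_test}

print(evidence)            # ~{'neg': 0.75, 'pos': 0.25}
print(posterior["neg"])    # ~{'flu': 0.96, 'infection': 0.04}
print(posterior["pos"])    # ~{'flu': 0.72, 'infection': 0.28}
```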

The Bayesian Doctor - Cntd.
Given a negative test (x1), the posteriors are
P(ω1|x1) = 0.8*0.9 / 0.75 = 0.96, P(ω2|x1) = 0.3*0.1 / 0.75 = 0.04,
so the conditional risks are
R(a1|x1) = 0*0.96 + 10*0.04 = 0.4
R(a2|x1) = 1*0.96 + 0*0.04 = 0.96.

The Bayesian Doctor - Cntd.
Given a positive test (x2), the posteriors are
P(ω1|x2) = 0.2*0.9 / 0.25 = 0.72, P(ω2|x2) = 0.7*0.1 / 0.25 = 0.28,
so the conditional risks are
R(a1|x2) = 0*0.72 + 10*0.28 = 2.8
R(a2|x2) = 1*0.72 + 0*0.28 = 0.72.

The Bayesian Doctor - Cntd.
To summarize: whenever we encounter an observation x, we can minimize the expected loss by minimizing the conditional risk. This makes sense: the doctor chooses hot tea if the blood test is negative, and antibiotics otherwise.

Optimal Bayes Decision Strategies
A strategy or decision function δ(x) is a mapping from observations to actions.
The total risk of a decision function is given by
E[R] = Σ_x p(x)*R(δ(x)|x)
A decision function is optimal if it minimizes the total risk. This optimal total risk is called the Bayes risk.
In the Bayesian doctor example:
–Total risk if the doctor always gives antibiotics: 0.9
–Bayes risk: 0.48
How have we got it? Total risk of the test-based rule = p(x1)*R(a1|x1) + p(x2)*R(a2|x2) = 0.75*0.4 + 0.25*0.72 = 0.48.
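A Python sketch that ties the whole doctor example together and reproduces the two numbers above (0.9 and 0.48); all inputs are the values reconstructed earlier in this tutorial.

```python
# Bayesian doctor: conditional risks, optimal decision rule, and total/Bayes risk.
priors = {"flu": 0.9, "infection": 0.1}
loss = {"hot tea":     {"flu": 0, "infection": 10},
        "antibiotics": {"flu": 1, "infection": 0}}
p_test = {"neg": {"flu": 0.8, "infection": 0.3},
          "pos": {"flu": 0.2, "infection": 0.7}}

evidence = {x: sum(p_test[x][w] * priors[w] for w in priors) for x in p_test}
posterior = {x: {w: p_test[x][w] * priors[w] / evidence[x] for w in priors} for x in p_test}

# Conditional risk R(a | x) = sum_w loss(a, w) * P(w | x)
cond_risk = {x: {a: sum(loss[a][w] * posterior[x][w] for w in priors) for a in loss}
             for x in p_test}
decision = {x: min(cond_risk[x], key=cond_risk[x].get) for x in p_test}
print(decision)            # {'neg': 'hot tea', 'pos': 'antibiotics'}

# Total risk E_x[ R(delta(x) | x) ] for the test-based rule and the fixed rule
bayes_risk = sum(evidence[x] * cond_risk[x][decision[x]] for x in p_test)
always_antibiotics = sum(evidence[x] * cond_risk[x]["antibiotics"] for x in p_test)
print(always_antibiotics)  # ~0.9
print(bayes_risk)          # ~0.48
```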