Special Topics in Scientific Computing: Pattern Recognition & Data Mining
Lecture 2: Bayesian Decision Theory
Ref: Bishop 1.5; Duda 2.1-2.2
Decision Theory
Consider, for example, a medical diagnosis problem in which we have taken an X-ray image of a patient and wish to determine whether the patient has cancer or not. The input vector x is the set of pixel intensities in the image, and the output variable t represents the presence of cancer, denoted by class C1, or its absence, denoted by class C2 (class C1: t = 0; class C2: t = 1). The joint distribution p(x, t) gives us the most complete probabilistic description of the situation.
Minimizing the misclassification rate
Example: consider two classes C1 and C2, and let R1 and R2 be the decision regions assigned to C1 and C2 respectively. The probability of a misclassification is
p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1) = ∫_R1 p(x, C2) dx + ∫_R2 p(x, C1) dx
A good decision rule should minimize p(mistake): we should assign x to C1 if p(x, C1) > p(x, C2).
Decision theory has many applications, e.g. portfolio optimization. Since p(x, C1) = P(C1|x) p(x), and the factor p(x) is common to both classes, the optimal decision is: assign x to C1 if P(C1|x) > P(C2|x).
General form: for the more general case of K classes, it is slightly easier to maximize the probability of being correct. Optimal decision: assign x to class Ci with i = argmax_k p(x, Ck), or equivalently i = argmax_k P(Ck|x), for k = 1, …, K.
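As a minimal illustration of this rule (not from the lecture; the posterior values below are made up for the example), the following Python sketch assigns each input to the class with the largest posterior probability:

```python
import numpy as np

# Hypothetical posterior probabilities P(Ck | x) for three inputs and K = 3 classes.
# Each row corresponds to one input x and sums to 1.
posteriors = np.array([
    [0.70, 0.20, 0.10],
    [0.25, 0.60, 0.15],
    [0.30, 0.35, 0.35],
])

# Minimum-misclassification rule: assign each x to the class with the largest posterior.
decisions = np.argmax(posteriors, axis=1)
print(decisions)  # [0 1 1] -> classes C1, C2, C2 (0-based indices)
```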
Minimizing the expected loss
For many applications, our objective will be more complex than simply minimizing the number of misclassifications. Consider the medical diagnosis problem: if a patient who does not have cancer is incorrectly diagnosed as having cancer, the consequences may be some patient distress plus the need for further investigations. Conversely, if a patient with cancer is diagnosed as healthy, the result may be premature death due to lack of treatment. The consequences of these two types of mistake can thus be dramatically different, and it would clearly be better to make fewer mistakes of the second kind, even at the expense of making more mistakes of the first kind.
Minimizing the expected loss: loss function
Introduce a loss matrix with elements L_kj, the loss incurred when an input belonging to class Ck is assigned to class Cj. The expected loss is
E[L] = Σ_k Σ_j ∫_Rj L_kj p(x, Ck) dx
Optimal decision: choose the decision regions Rj so as to minimize E[L].
Minimization of E[L], in the notation of the Duda book: minimizing E[L] is equivalent to minimizing the conditional risk
R(αi | x) = Σ_k λ(αi | ωk) P(ωk | x), for i = 1, …, K.
Optimal decision: assign x to class Ck with k = argmin_i R(Ci | x), i = 1, …, K.
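A small sketch of this minimum-risk rule under an assumed loss matrix (the loss values and posteriors are illustrative only, not from the lecture):

```python
import numpy as np

# Illustrative loss matrix: rows = decided class, columns = true class.
# Zero loss on the diagonal; the off-diagonal values are assumptions for the example.
L = np.array([
    [0.0, 10.0],   # loss of deciding C1 when the true class is C1 / C2
    [1.0,  0.0],   # loss of deciding C2 when the true class is C1 / C2
])

# Posterior probabilities P(Ck | x) for a single input x (assumed values).
posterior = np.array([0.3, 0.7])

# Conditional risk R(Ci | x) = sum_k L[i, k] * P(Ck | x); decide by argmin.
risks = L @ posterior
decision = np.argmin(risks)
print(risks, decision)  # risks [7.0, 0.3] -> decide class index 1 (C2)
```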
Two-category classification
α1: deciding ω1; α2: deciding ω2.
λik = λ(αi | ωk): loss incurred for deciding αi when the true state of nature is ωk.
Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
Example
The Bayes decision rule is stated as: take action α1 ("decide ω1") if R(α1 | x) < R(α2 | x).
This results in the equivalent rule: decide ω1 if
(λ21 - λ11) p(x | ω1) P(ω1) > (λ12 - λ22) p(x | ω2) P(ω2)
and decide ω2 otherwise.
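For reference, the equivalent rule follows by substituting the two conditional risks above and applying Bayes' theorem (a standard rearrangement, sketched here in the same notation):

```latex
% Decide \omega_1 when R(\alpha_1 \mid x) < R(\alpha_2 \mid x):
\lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)
  < \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)
% Collect terms:
(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid x)
% Replace posteriors via Bayes' theorem, P(\omega_j \mid x) = p(x \mid \omega_j) P(\omega_j) / p(x),
% and cancel the common positive factor p(x):
(\lambda_{21} - \lambda_{11})\, p(x \mid \omega_1)\, P(\omega_1)
  > (\lambda_{12} - \lambda_{22})\, p(x \mid \omega_2)\, P(\omega_2)
```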
Reject option: reject x if max_k P(Ck | x) < θ, where θ is the rejection threshold.
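A minimal sketch of the reject option (the threshold value and the posteriors are assumptions for the example):

```python
import numpy as np

def decide_with_reject(posteriors, theta=0.8):
    """Return the index of the class with the largest posterior, or None to reject.

    posteriors: 1-D array of P(Ck | x); theta: rejection threshold.
    """
    k = int(np.argmax(posteriors))
    return k if posteriors[k] >= theta else None

print(decide_with_reject(np.array([0.95, 0.03, 0.02])))  # 0 (confident decision)
print(decide_with_reject(np.array([0.45, 0.40, 0.15])))  # None (rejected)
```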
Decision approaches:
Generative models: model the class-conditional densities p(x | Ck) and the priors P(Ck), then use Bayes' theorem to obtain the posteriors P(Ck | x).
Discriminative models: model the posterior P(Ck | x) directly, e.g. logistic regression.
Discriminant functions: learn a function that maps each input x directly to a class label, without computing probabilities.
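As an illustration of the discriminative route, a sketch only: the synthetic one-dimensional data and the use of scikit-learn's LogisticRegression are assumptions, not part of the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic 1-D feature (e.g. "lightness") for two classes.
x1 = rng.normal(loc=2.0, scale=1.0, size=100)   # class C1
x2 = rng.normal(loc=5.0, scale=1.0, size=100)   # class C2
X = np.concatenate([x1, x2]).reshape(-1, 1)
t = np.concatenate([np.zeros(100), np.ones(100)])

# Logistic regression models P(C2 | x) directly, with no class-conditional densities.
clf = LogisticRegression().fit(X, t)
print(clf.predict_proba([[3.5]]))  # approximate posteriors for a new input
```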
Optimal decision: assign x to C1 if P(C1 | x) > P(C2 | x).
P(Ck): prior probability
p(x | Ck): class-conditional density (likelihood)
P(Ck | x): posterior probability
Example: from the sea bass vs. salmon example to an "abstract" decision-making problem.
State of nature; a priori (prior) probability. The state of nature (which type of fish will be observed next) is unpredictable, so it is a random variable. If the catch of salmon and sea bass is equiprobable, then P(ω1) = P(ω2) (uniform priors), and P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity). The prior probability reflects our prior knowledge of how likely we are to observe a sea bass or a salmon; these probabilities may depend on the time of year or the fishing area!
Example: Bayes decision rule with only the prior information: decide ω1 if P(ω1) > P(ω2), otherwise decide ω2. Error rate = min{P(ω1), P(ω2)}. Suppose now we have a measurement or feature of the state of nature, say the fish lightness value. The class-conditional probability densities p(x | ω1) and p(x | ω2) describe the difference in lightness between the populations of sea bass and salmon.
Maximum likelihood decision rule: assign input pattern x to class ω1 if p(x | ω1) > p(x | ω2), otherwise to ω2. How does the feature x influence our attitude (prior) concerning the true state of nature? This leads to the Bayes decision rule.
Posterior probability, likelihood, evidence:
p(ωj, x) = P(ωj | x) p(x) = p(x | ωj) P(ωj)
Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x), where p(x) = Σ_j p(x | ωj) P(ωj).
Posterior = (Likelihood × Prior) / Evidence
Optimal Bayes decision rule: decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2.
Special cases:
(i) P(ω1) = P(ω2): decide ω1 if p(x | ω1) > p(x | ω2), otherwise ω2.
(ii) p(x | ω1) = p(x | ω2): decide ω1 if P(ω1) > P(ω2), otherwise ω2.
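Tying the last few slides together, a sketch of the generative route for the fish example; the Gaussian lightness densities, their parameters, and the priors below are assumptions for illustration, not estimates from real data.

```python
import numpy as np
from scipy.stats import norm

# Assumed priors and Gaussian class-conditional densities p(x | w_j) for "lightness" x.
priors = {"sea bass": 0.5, "salmon": 0.5}
likelihoods = {
    "sea bass": norm(loc=6.0, scale=1.0),
    "salmon":   norm(loc=4.0, scale=1.0),
}

def posterior(x):
    """Bayes formula: P(w_j | x) = p(x | w_j) P(w_j) / p(x)."""
    joint = {w: likelihoods[w].pdf(x) * priors[w] for w in priors}
    evidence = sum(joint.values())            # p(x) = sum_j p(x | w_j) P(w_j)
    return {w: joint[w] / evidence for w in joint}

post = posterior(4.6)
print(post)
print("decide:", max(post, key=post.get))     # class with the larger posterior
```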