
Naive Bayes Classifier



Presentation on theme: "Naive Bayes Classifier" — Presentation transcript:

1 Naive Bayes Classifier
Comp3710 Artificial Intelligence, Computing Science, Thompson Rivers University

2 Naive Bayes Classifier
Course Outline
Part I – Introduction to Artificial Intelligence
Part II – Classical Artificial Intelligence
Part III – Machine Learning
    Introduction to Machine Learning
    Neural Networks
    Probabilistic Reasoning and Bayesian Belief Networks
    Artificial Life: Learning through Emergent Behavior
Part IV – Advanced Topics

3 Learning Outcomes
...
Finding a Maximum a Posteriori (MAP) hypothesis
Use of naïve Bayes classifier
Use of Bayes optimal classifier

4 Unit Outline
Maximum a posteriori
Naïve Bayes classifier
Bayesian Networks

5 References
...
Textbook: Bayesian Learning – Bayes Classifiers

6 1. Maximum a Posteriori (MAP)
View learning as Bayesian updating of a probability distribution over the hypothesis space.
H is the hypothesis variable, with values h1, h2, …
E.g., given a positive lab test result, does the patient have cancer (h1) or not (h2)?
Another example: my car won't start. Is the starter bad? Is the fuel pump bad?
We only need to know whether the chance of having cancer is higher than the chance of not having cancer, and whether the chance of a bad starter is higher than the chance of a bad fuel pump.
We do not need to compute the exact probabilities P(cancer | lab_test), P(~cancer | lab_test), P(bad_starter | wont_start), P(bad_fuel_pump | wont_start).
Generally we want the most probable hypothesis given the training data.

7 H is the hypothesis variable, values h1, h2, …
Generally we want the most probable hypothesis given the training data D.
The maximum a posteriori (MAP) hypothesis is
    hMAP = argmax_h P(h | D) = argmax_h P(D | h) P(h) / P(D) = argmax_h P(D | h) P(h)
If we assume that all hypotheses have the same prior probabilities, we can simplify even further and choose the maximum likelihood (ML) hypothesis:
    hML = argmax_h P(D | h)
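As a minimal sketch of this argmax, here is the car example in Python with invented priors and likelihoods; none of these numbers are from the slides, they only illustrate why P(D) can be ignored:

    # Hypothetical numbers for the car example; illustrative only.
    hypotheses = {
        "bad_starter":   {"prior": 0.10, "likelihood": 0.90},   # P(won't start | bad starter)
        "bad_fuel_pump": {"prior": 0.05, "likelihood": 0.95},   # P(won't start | bad fuel pump)
    }

    # hMAP = argmax_h P(D | h) P(h); the evidence P(D) is the same for every h,
    # so it can be dropped when all we need is the argmax.
    h_map = max(hypotheses, key=lambda h: hypotheses[h]["likelihood"] * hypotheses[h]["prior"])
    print(h_map)   # 'bad_starter' under these invented numbers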

8 P(cancer | +)??? P(¬cancer | +)???
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.
[Q] If a new patient comes in with a positive test result, what is the chance that he has the cancer? P(cancer | +)??? P(¬cancer | +)???
[Q] So, among the two hypotheses (cancer, ¬cancer) given +, then?

9 So, among the two hypotheses (cancer, ¬cancer) given +,
[Q] If a new patient comes in with a positive test result, what is the chance that he has cancer?
So, among the two hypotheses (cancer, ¬cancer) given +:
    P(+ | cancer) P(cancer) = 0.98 × 0.008 = 0.00784
    P(+ | ¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.02976
Hence hMAP = ¬cancer. Actually, we don't have to compute P(+) to decide hMAP.
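A minimal Python sketch of this calculation, using only the figures stated on the previous slide (the variable names are mine):

    # Lab-test example: P(cancer) = 0.008, P(+ | cancer) = 0.98, P(+ | ~cancer) = 0.03.
    p_cancer = 0.008
    p_pos_given_cancer = 0.98
    p_pos_given_not_cancer = 0.03

    # Unnormalized scores P(+ | h) P(h) are enough to pick hMAP ...
    score_cancer = p_pos_given_cancer * p_cancer                 # 0.00784
    score_not_cancer = p_pos_given_not_cancer * (1 - p_cancer)   # 0.02976

    # ... and normalizing by P(+) gives the actual posterior probability.
    p_pos = score_cancer + score_not_cancer
    print("hMAP:", "cancer" if score_cancer > score_not_cancer else "not cancer")
    print("P(cancer | +) =", score_cancer / p_pos)               # ~0.21

So even with a positive test, the posterior probability of cancer is only about 21%, and hMAP is ¬cancer.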

10 2. Naive Bayes Classifier
When to use:
    There is a large set of training examples.
    The attributes that describe instances are conditionally independent given the classification.
It has been used in many applications, such as diagnosis and the classification of text documents.
[Q] Why do we not use the k-nearest neighbor algorithm?
    A large set of training examples -> more computation of distances.
    It is not always possible to compute distances, especially when attributes are ordinal or categorical.

11 [Q] E.g., given (2, 3, 4) in the table, classification?
P(A) = ???
P(B) = ???
P(C) = ???
[Training table: examples x1 … x15, each with attribute values d1, d2, d3 and a class label A, B, or C]

12 [Q] E.g., given (2, 3, 4) in the table, classification?
P(A) = 8 / 15
P(B) = 4 / 15
P(C) = 3 / 15
[Training table: examples x1 … x15, each with attribute values d1, d2, d3 and a class label A, B, or C]

13 [Q] E.g., given d = (2, 3, 4) in the table, classification?
A vector of data d = (d1, …, dn) is assigned a single classification ci.
The classification with the highest posterior probability P(ci | d1, …, dn) is chosen; that is, we are looking for the MAP classification:
    cMAP = argmax_ci P(ci | d1, …, dn) = argmax_ci P(d1, …, dn | ci) P(ci) / P(d1, …, dn)
Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it and simply aim to find the classification ci for which the following is maximised:
    cMAP = argmax_ci P(d1, …, dn | ci) P(ci)
We now assume that the attribute values d1, …, dn are conditionally independent given the classification, so that cMAP can be rewritten as
    cMAP = argmax_ci P(ci) × P(d1 | ci) × P(d2 | ci) × … × P(dn | ci)
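A minimal Python sketch of this rule, estimating every probability by counting over a training set of (attributes, class) pairs; the function and variable names are mine, not from the slides:

    from collections import Counter, defaultdict

    def train_naive_bayes(examples):
        """examples: list of (attribute_tuple, class_label) pairs."""
        class_counts = Counter(c for _, c in examples)
        # value_counts[c][j][v] = number of class-c examples whose j-th attribute equals v
        value_counts = defaultdict(lambda: defaultdict(Counter))
        for attrs, c in examples:
            for j, v in enumerate(attrs):
                value_counts[c][j][v] += 1
        priors = {c: n / len(examples) for c, n in class_counts.items()}
        return priors, value_counts, class_counts

    def classify(d, priors, value_counts, class_counts):
        """Return cMAP = argmax_c P(c) * prod_j P(d_j | c), with probabilities from counts."""
        best_class, best_score = None, -1.0
        for c, prior in priors.items():
            score = prior
            for j, v in enumerate(d):
                score *= value_counts[c][j][v] / class_counts[c]
            if score > best_score:
                best_class, best_score = c, score
        return best_class

With the 15-example table from these slides, classify((2, 3, 2), ...) would return A, matching the worked example on the next slide.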

14 From the training data, [Q] for example x1 = (2, 3, 2)
From the training data: P(A) = 8/15; P(B) = 4/15; P(C) = 3/15.
[Q] For example, x1 = (2, 3, 2):
    P(A) × P(d1=2 | A) × P(d2=3 | A) × P(d3=2 | A) = 8/15 × 5/8 × 2/8 × 2/8
    P(B) × P(d1=2 | B) × P(d2=3 | B) × P(d3=2 | B) = 4/15 × 1/4 × 1/4 × 0/4
    P(C) × P(d1=2 | C) × P(d2=3 | C) × P(d3=2 | C) = 3/15 × 1/3 × 2/3 × 0/3
    cMAP for x1 = A
[Q] For example, y = (2, 3, 4)???
[Q] For example, y = (4, 3, 2)???
[Training table: examples x1 … x15, each with attribute values d1, d2, d3 and a class label A, B, or C]
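As a quick arithmetic check of the three products above (plain Python, using only the probabilities read off the slide):

    # Unnormalized naive Bayes scores for x1 = (2, 3, 2).
    score_A = 8/15 * 5/8 * 2/8 * 2/8   # ~= 0.0208
    score_B = 4/15 * 1/4 * 1/4 * 0/4   # = 0, since no class-B example has d3 = 2
    score_C = 3/15 * 1/3 * 2/3 * 0/3   # = 0, since no class-C example has d3 = 2
    print(max([("A", score_A), ("B", score_B), ("C", score_C)], key=lambda t: t[1]))
    # ('A', 0.0208...)  so cMAP for x1 is A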

15 3. Bayes’ Optimal Classifier
Given a new instance y, what is the most probable classification?
[Q] For example, suppose there are three possible hypotheses h1, h2, h3 given a training set X, and two classes: +, –.
Given a new instance y, what is the most probable classification of y, + or –?

16 The probability that the new item of data y should be given classification cj is defined by the following:
    P(cj | X) = Σ_i P(cj | hi) P(hi | X)
P(cj | hi) means that hi says y is classified as cj with this probability.
The optimal classification for y is the classification cj for which P(cj | X) is highest.

17 The hypothesis h1 classifies y as +, while h2 and h3 classify y as –.
In our case, c1 = + and c2 = –. The hypothesis h1 classifies y as +, while h2 and h3 classify y as –. Hence
    P(+ | X) = P(+ | h1) P(h1 | X) = P(h1 | X)
    P(– | X) = P(– | h2) P(h2 | X) + P(– | h3) P(h3 | X) = P(h2 | X) + P(h3 | X)
The optimal classification for y is –.
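A minimal numeric sketch of this comparison in Python. The slide's actual values of P(hi | X) are not shown here, so the posteriors below (0.4, 0.3, 0.3) are assumptions chosen only to make the point:

    # Assumed hypothesis posteriors P(hi | X); not taken from the slides.
    posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
    # Each hypothesis classifies y deterministically: h1 says +, h2 and h3 say -.
    votes = {"h1": "+", "h2": "-", "h3": "-"}

    # Bayes optimal classification: P(cj | X) = sum_i P(cj | hi) P(hi | X)
    p_class = {"+": 0.0, "-": 0.0}
    for h, p in posterior.items():
        p_class[votes[h]] += p

    print(p_class)                         # {'+': 0.4, '-': 0.6}
    print(max(p_class, key=p_class.get))   # '-', even though h1 alone is the MAP hypothesis

This is the sense in which the Bayes optimal classifier can disagree with the single MAP hypothesis.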

18 Another Example Suppose there are five kinds of bags of candies:
h1: 100% cherry candies
h2: 75% cherry candies + 25% lime candies
h3: 50% cherry candies + 50% lime candies
h4: 25% cherry candies + 75% lime candies
h5: 100% lime candies
The prior probabilities of the bags are 10%: h1, 20%: h2, 40%: h3, 20%: h4, and 10%: h5.

19 Then we observe candies drawn from some bag:
Suppose the first 10 candies drawn are all lime (l = lime, c = cherry).
[Q] MAP hypothesis – What kind of bag is it?
[Q] Bayes optimal classification – What flavor will the next candy be?
    P(l | d) = P(l | h1) P(h1 | d) + P(l | h2) P(h2 | d) + P(l | h3) P(h3 | d) + P(l | h4) P(h4 | d) + P(l | h5) P(h5 | d)
    P(c | d) = …
    P(hi | d) = α P(d | hi) P(hi) = α P(l, l, l, l, l, l, l, l, l, l | hi) P(hi) = α P(l | hi)^10 P(hi)
    P(h4 | d) = α P(l | h4)^10 P(h4) = α × 0.75^10 × 0.2

20 Posterior probability of hypotheses
For example, after the first two lime candies d = (d1, d2):
    P(h4 | d) = P(d1, d2 | h4) P(h4) / P(d1, d2) = (0.75 × 0.75 × 0.2) / Σ_i P(d1, d2 | hi) P(hi) = 0.1125 / 0.325 ≈ 0.35
In general, P(hi | d) = α P(d | hi) P(hi), where α = 1 / P(d) normalizes the posteriors so they sum to 1.

21 Prediction probability
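A short Python sketch that reproduces both quantities for the candy example, assuming the drawn candies are all lime as on slide 19 (the variable names are mine):

    # Priors over the five bag hypotheses and P(lime | hi) for each bag.
    priors = {"h1": 0.10, "h2": 0.20, "h3": 0.40, "h4": 0.20, "h5": 0.10}
    p_lime = {"h1": 0.00, "h2": 0.25, "h3": 0.50, "h4": 0.75, "h5": 1.00}

    for n in range(11):   # after observing n lime candies in a row
        # Unnormalized posteriors P(l^n | hi) P(hi), normalized by their sum (the alpha above).
        scores = {h: (p_lime[h] ** n) * priors[h] for h in priors}
        alpha = 1.0 / sum(scores.values())
        posterior = {h: alpha * s for h, s in scores.items()}
        # Bayes optimal prediction: P(next candy is lime | d) = sum_i P(lime | hi) P(hi | d)
        p_next_lime = sum(p_lime[h] * posterior[h] for h in priors)
        print(n, round(posterior["h4"], 3), round(p_next_lime, 3))
    # n = 2 gives P(h4 | d) ~= 0.35; by n = 10 the all-lime bag h5 dominates
    # and P(next is lime | d) approaches 1.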

