Naive Bayes Classifier

Presentation transcript:

Naive Bayes Classifier
Comp3710 Artificial Intelligence, Computing Science, Thompson Rivers University

Course Outline
Part I – Introduction to Artificial Intelligence
Part II – Classical Artificial Intelligence
Part III – Machine Learning: Introduction to Machine Learning; Neural Networks; Probabilistic Reasoning and Bayesian Belief Networks; Artificial Life: Learning through Emergent Behavior
Part IV – Advanced Topics

Learning Outcomes
…
Finding a maximum a posteriori (MAP) hypothesis
Use of the naïve Bayes classifier
Use of the Bayes optimal classifier
…

Unit Outline
Maximum a posteriori
Naïve Bayes classifier
Bayesian networks

References
…
Textbook
Bayesian Learning – http://129.252.11.88/talks/bayesianlearning/

1. Maximum a Posteriori (MAP)
View learning as Bayesian updating of a probability distribution over the hypothesis space. H is the hypothesis variable, with values h1, h2, …
E.g., given a positive lab test result, does the patient have cancer (h1) or not (h2)? Another example: my car won’t start. Is the starter bad? Is the fuel pump bad?
We only need to know whether the chance of having cancer is higher than the chance of not having cancer, and whether the chance of a bad starter is higher than the chance of a bad fuel pump. We do not need to compute the exact probabilities P(cancer | lab_test), P(~cancer | lab_test), P(bad_starter | wont_start), P(bad_fuel_pump | wont_start).
Generally we want the most probable hypothesis given the training data.

H is the hypothesis variable, with values h1, h2, … Generally we want the most probable hypothesis given the training data D. The maximum a posteriori (MAP) hypothesis is
hMAP = argmax_h P(h | D) = argmax_h P(D | h) P(h) / P(D) = argmax_h P(D | h) P(h)
(the evidence P(D) is the same for every hypothesis, so it can be dropped). If we assume that all hypotheses have the same prior probability, we can simplify even more and choose the maximum likelihood (ML) hypothesis hML = argmax_h P(D | h).

[Q] P(cancer | +)? P(~cancer | +)? A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.
[Q] If a new patient comes in with a positive test result, what is the chance that he has the cancer?
[Q] So, among the two hypotheses (cancer, ~cancer) given +, which is the MAP hypothesis?

So, among the two hypotheses (cancer, ~cancer) given +:
P(+ | cancer) P(cancer) = 0.98 × 0.008 = 0.00784
P(+ | ~cancer) P(~cancer) = 0.03 × 0.992 = 0.02976
Hence hMAP = ~cancer: even with a positive test result, not having the cancer is the more probable hypothesis. Note that we did not actually have to compute P(+) to decide hMAP.
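A quick check of this arithmetic in Python, together with the normalised posterior P(cancer | +) that the question asks for (the variable names are just for readability):

```python
# Likelihood-times-prior for each hypothesis, and the normalised
# posterior P(cancer | +). The numbers are those stated on the slide.
p_pos_given_cancer, p_cancer = 0.98, 0.008
p_pos_given_no_cancer = 1 - 0.97              # false-positive rate

score_cancer = p_pos_given_cancer * p_cancer              # 0.00784
score_no_cancer = p_pos_given_no_cancer * (1 - p_cancer)  # 0.02976

print(score_cancer < score_no_cancer)                     # True: hMAP = ~cancer
print(score_cancer / (score_cancer + score_no_cancer))    # P(cancer | +) ~ 0.21
```

So even with a positive test, the probability of actually having the cancer is only about 21%, because the disease is so rare in the population.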

2. Naive Bayes Classifier
The naive Bayes classifier is applicable when there is a large set of training examples and the attributes that describe the instances are conditionally independent given the classification. It has been used in many applications, such as diagnosis and the classification of text documents.
[Q] Why not use the k-nearest neighbor algorithm instead? A large set of training examples means many distance computations, and it is not always possible to compute distances at all, especially when attributes are ordinal or categorical.

[Q] E.g., given (2, 3, 4) in the table, what is the classification? P(A) = ??? P(B) = ??? P(C) = ???
(The slide shows a table of 15 training examples x1, …, x15, each with three attribute values d1, d2, d3 and a class label A, B, or C; the individual cell values are not recoverable from this transcript.)

[Q] E.g., given (2, 3, 4) in the table, what is the classification? From the class column of the table: P(A) = 8/15, P(B) = 4/15, P(C) = 3/15.

[Q] E.g., given d = (2, 3, 4) in the table, what is the classification? A vector of data is classified with a single classification: we want P(ci | d1, …, dn), where d = (d1, …, dn), and the classification with the highest posterior probability is chosen. In other words, we are looking for the MAP classification:
cMAP = argmax_ci P(ci | d1, …, dn) = argmax_ci P(d1, …, dn | ci) P(ci) / P(d1, …, dn)
Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it and simply aim to find the classification ci for which P(d1, …, dn | ci) P(ci) is maximised. We now assume that the attributes d1, …, dn are conditionally independent given the class, so that cMAP can be rewritten as the naive Bayes classification
cNB = argmax_ci P(ci) P(d1 | ci) P(d2 | ci) … P(dn | ci)
where each attribute value contributes one factor P(dj | ci), estimated from the training data as the fraction of class-ci examples having that value.
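A minimal sketch of this rule in Python, assuming categorical attribute values and a small in-memory training set; the function names and the toy rows at the bottom are illustrative, not the table from the slides:

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (attribute_tuple, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    # value_counts[c][j][v] = number of class-c examples whose j-th attribute is v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for attrs, label in examples:
        for j, v in enumerate(attrs):
            value_counts[label][j][v] += 1
    return class_counts, value_counts, len(examples)

def classify_nb(model, attrs):
    class_counts, value_counts, n = model
    best_class, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n                               # P(c)
        for j, v in enumerate(attrs):
            score *= value_counts[c][j][v] / cc      # P(d_j | c), zero if unseen
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score

# Hypothetical usage with made-up training rows (not the slide's table):
data = [((2, 3, 2), "A"), ((4, 1, 2), "B"), ((2, 3, 4), "A"), ((4, 3, 2), "C")]
model = train_nb(data)
print(classify_nb(model, (2, 3, 4)))
```

Note that an attribute value never seen with a class gives a zero factor for that class, exactly as in the worked example on the next slide.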

Each attribute value contributes one conditional-probability factor. From the training data, P(A) = 8/15, P(B) = 4/15, P(C) = 3/15.
[Q] For example, for x1 = (2, 3, 2):
P(A) × P(2|A) × P(3|A) × P(2|A) = 8/15 × 5/8 × 2/8 × 2/8
P(B) × P(2|B) × P(3|B) × P(2|B) = 4/15 × 1/4 × 1/4 × 0/4
P(C) × P(2|C) × P(3|C) × P(2|C) = 3/15 × 1/3 × 2/3 × 0/3
The first product is the largest, so cMAP for x1 is A.
[Q] What is the classification for y = (2, 3, 4)? [Q] And for y = (4, 3, 2)?
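A one-line check of these three products, using only the counts quoted on the slide:

```python
# Unnormalised naive Bayes scores for x1 = (2, 3, 2); the zero factors for
# B and C come from attribute values never seen with those classes.
scores = {
    "A": 8/15 * 5/8 * 2/8 * 2/8,   # ~ 0.0208
    "B": 4/15 * 1/4 * 1/4 * 0/4,   # = 0.0
    "C": 3/15 * 1/3 * 2/3 * 0/3,   # = 0.0
}
print(max(scores, key=scores.get), scores)   # A wins
```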

3. Bayes’ Optimal Classifier
Given a new instance y, what is the most probable classification (as opposed to the most probable hypothesis)?
[Q] For example, suppose we have three possible hypotheses h1, h2, h3 given a training set X, and two classes, + and –. Given a new instance y, what is the most probable classification of y, + or –?

The probability that the new item of data, y, should be given classification cj is
P(cj | X) = Σi P(cj | hi) P(hi | X)
where P(cj | hi) means that hi says y is classified as cj with this probability. The optimal classification for y is the classification cj for which P(cj | X) is the highest.

In our case, c1 = + and c2 = –. The hypothesis h1 classifies y as +, while h2 and h3 classify y as –. Hence
P(+ | X) = P(+ | h1) P(h1 | X)
P(– | X) = P(– | h2) P(h2 | X) + P(– | h3) P(h3 | X)
and with the hypothesis posteriors used in this example, P(– | X) > P(+ | X), so the optimal classification for y is –.
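The slide's actual posterior values P(hi | X) are not preserved in this transcript, so the sketch below uses the classic illustrative values 0.4, 0.3 and 0.3 from the standard textbook version of this example; everything else follows the formula above:

```python
# Bayes optimal classification: weight each hypothesis's prediction for y
# by that hypothesis's posterior probability, then pick the heaviest class.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h_i | X), assumed values
predictions = {"h1": "+", "h2": "-", "h3": "-"}  # class each h_i assigns to y

def bayes_optimal(posteriors, predictions, classes=("+", "-")):
    # P(c | X) = sum_i P(c | h_i) P(h_i | X); here P(c | h_i) is 1 or 0
    scores = {c: sum(p for h, p in posteriors.items() if predictions[h] == c)
              for c in classes}
    return max(scores, key=scores.get), scores

print(bayes_optimal(posteriors, predictions))    # ('-', {'+': 0.4, '-': 0.6})
```

With these values the MAP hypothesis (h1) would have classified y as +, yet the Bayes optimal classification is –: the combined posterior weight of the hypotheses that say – outweighs the single most probable hypothesis.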

Another Example
Suppose there are five kinds of bags of candies:
h1: 100% cherry candies
h2: 75% cherry candies + 25% lime candies
h3: 50% cherry candies + 50% lime candies
h4: 25% cherry candies + 75% lime candies
h5: 100% lime candies
The prior probabilities of the bag types are 10% for h1, 20% for h2, 40% for h3, 20% for h4, and 10% for h5.

Then we observe candies drawn from some bag, say 10 lime candies in a row, d = (l, l, …, l).
[Q] MAP hypothesis: what kind of bag is it?
[Q] Bayes optimal classification: what flavor will the next candy be?
P(l | d) = P(l | h1) P(h1 | d) + P(l | h2) P(h2 | d) + P(l | h3) P(h3 | d) + P(l | h4) P(h4 | d) + P(l | h5) P(h5 | d), and similarly for cherry, P(c | d) = …
P(hi | d) = α P(d | hi) P(hi) = α P(l, l, …, l | hi) P(hi) = α P(l | hi)^10 P(hi)
For example, P(h4 | d) = α P(l | h4)^10 P(h4) = α × 0.75^10 × 0.2
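A quick check of this computation, using the priors 0.1, 0.2, 0.4, 0.2, 0.1 and the lime fractions 0, 0.25, 0.5, 0.75, 1.0 stated on the previous slide; the function names are just for this sketch:

```python
priors = {"h1": 0.1, "h2": 0.2, "h3": 0.4, "h4": 0.2, "h5": 0.1}
p_lime = {"h1": 0.0, "h2": 0.25, "h3": 0.5, "h4": 0.75, "h5": 1.0}

def posteriors_after_limes(n):
    """P(h_i | d) after observing n lime candies in a row."""
    unnorm = {h: (p_lime[h] ** n) * priors[h] for h in priors}
    z = sum(unnorm.values())                 # the normalising constant (1 / alpha)
    return {h: v / z for h, v in unnorm.items()}

def p_next_lime(n):
    """Bayes optimal prediction P(next candy is lime | d)."""
    post = posteriors_after_limes(n)
    return sum(p_lime[h] * post[h] for h in post)

print(posteriors_after_limes(10))   # h5 dominates: the MAP hypothesis is h5
print(p_next_lime(10))              # ~ 0.97: the next candy is almost surely lime
```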

Posterior probability of hypotheses: P(hi | d) = α P(d | hi) P(hi). For example, after observing two lime candies, d = (d1, d2),
P(h4 | d1, d2) = P(d1, d2 | h4) P(h4) / P(d1, d2) = 0.75 × 0.75 × 0.2 / P(d1, d2)
where P(d1, d2) = Σi P(d1, d2 | hi) P(hi) = 0.25² × 0.2 + 0.5² × 0.4 + 0.75² × 0.2 + 1² × 0.1 = 0.325, giving P(h4 | d1, d2) ≈ 0.35. (The two draws are independent only given the hypothesis, so P(d1, d2) is not simply P(d1) P(d2) = 0.5 × 0.5.)

Prediction probability: P(next candy is lime | d) as a function of the number of observations. (The plot from the slide is not reproduced in this transcript; the predicted probability rises toward 1 as more lime candies are observed.)