
Bayesian Learning Evgueni Smirnov

Overview
- Bayes Theorem
- Maximum A Posteriori Hypothesis
- Naïve Bayes Classifier
- Learning Text Classifiers

Thomas Bayes (c. 1701-1761)
The Bayesian theory of probability was set out in his essay, published posthumously in 1763. His conclusions were accepted by Laplace in 1781, rediscovered by Condorcet, and remained unchallenged until Boole questioned them.

Bayes Theorem
Goal: to determine the posterior probability P(h|D) of hypothesis h given the data D from:
- Prior probability of h, P(h): it reflects any background knowledge we have about the chance that h is a correct hypothesis (before having observed the data).
- Prior probability of D, P(D): it reflects the probability that training data D will be observed given no knowledge about which hypothesis h holds.
- Conditional probability of observation D, P(D|h): it denotes the probability of observing data D given some world in which hypothesis h holds.

- Posterior probability of h, P(h|D): it represents the probability that h holds given the observed training data D. It reflects our confidence that h holds after we have seen the training data D, and it is the quantity that data-mining researchers are interested in.
- Bayes Theorem allows us to compute P(h|D):

  P(h|D) = P(D|h) P(h) / P(D)

Maximum a Posteriori Hypothesis (MAP)
In many learning scenarios, the learner considers a set of hypotheses H and is interested in finding the most probable hypothesis h ∈ H given the observed data D. Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis:

  h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h) / P(D) = argmax_{h ∈ H} P(D|h) P(h)

The last step drops P(D), which does not depend on h.

Example
Consider a cancer test with two outcomes: positive [+] and negative [-]. The test returns a correct positive result in 98% of the cases in which the disease is actually present, and a correct negative result in 97% of the cases in which the disease is not present. Furthermore, 0.008 of all people have this cancer.

  P(cancer) = 0.008          P(¬cancer) = 0.992
  P([+] | cancer) = 0.98     P([-] | cancer) = 0.02
  P([+] | ¬cancer) = 0.03    P([-] | ¬cancer) = 0.97

A patient got a positive test [+]. The maximum a posteriori hypothesis is found by comparing:

  P([+] | cancer) P(cancer) = 0.98 x 0.008 = 0.0078
  P([+] | ¬cancer) P(¬cancer) = 0.03 x 0.992 = 0.0298

  h_MAP = ¬cancer
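As a quick sanity check, the same MAP calculation can be written as a minimal Python sketch (the variable names are illustrative, not from the slides):

```python
# MAP hypothesis for the cancer test example, given a positive test result [+].
priors = {"cancer": 0.008, "no_cancer": 0.992}         # P(h)
likelihood_pos = {"cancer": 0.98, "no_cancer": 0.03}   # P([+] | h)

# Unnormalized posteriors P([+] | h) * P(h); the evidence term P([+]) is the
# same for both hypotheses, so it can be ignored when picking the MAP hypothesis.
scores = {h: likelihood_pos[h] * priors[h] for h in priors}

h_map = max(scores, key=scores.get)
print(scores)   # {'cancer': 0.00784, 'no_cancer': 0.02976}
print(h_map)    # 'no_cancer'
```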

Naïve Bayes Classifier
Let each instance x of a training set D be described by a conjunction of n attribute values <a_1, a_2, ..., a_n>, and let V be a finite set of possible classes (concepts). The most probable class is:

  v_MAP = argmax_{v ∈ V} P(v | a_1, ..., a_n) = argmax_{v ∈ V} P(a_1, ..., a_n | v) P(v)

The naïve Bayes assumption is that the attributes are conditionally independent given the class, which gives the naïve Bayes classifier:

  v_NB = argmax_{v ∈ V} P(v) ∏_i P(a_i | v)

Example
Consider the weather data, where we have to classify the instance:

  <Outlook = sunny, Temperature = cool, Humidity = high, Windy = true>

The task is to predict the value (yes or no) of the concept PlayTennis. We apply the naïve Bayes rule:

  v_NB = argmax_{v ∈ {yes, no}} P(v) P(sunny|v) P(cool|v) P(high|v) P(true|v)

Example: Estimating Probabilities

  Outlook:  P(sunny|yes) = 2/9      P(sunny|no) = 3/5
            P(overcast|yes) = 4/9   P(overcast|no) = 0
            P(rain|yes) = 3/9       P(rain|no) = 2/5
  Temp:     P(hot|yes) = 2/9        P(hot|no) = 2/5
            P(mild|yes) = 4/9       P(mild|no) = 2/5
            P(cool|yes) = 3/9       P(cool|no) = 1/5
  Hum:      P(high|yes) = 3/9       P(high|no) = 4/5
            P(normal|yes) = 6/9     P(normal|no) = 2/5
  Windy:    P(true|yes) = 3/9       P(true|no) = 3/5
            P(false|yes) = 6/9      P(false|no) = 2/5

  P(yes) = 9/14    P(no) = 5/14

Example

  P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(true|yes) = 9/14 x 2/9 x 3/9 x 3/9 x 3/9 ≈ 0.0053
  P(no) P(sunny|no) P(cool|no) P(high|no) P(true|no) = 5/14 x 3/5 x 1/5 x 4/5 x 3/5 ≈ 0.0206

Thus, the naïve Bayes classifier assigns the value no to PlayTennis!
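The same calculation can be expressed as a short Python sketch; the dictionaries below simply encode the probability tables from the previous slide (the key names are illustrative):

```python
# Naïve Bayes prediction for the weather example using the estimated probabilities.
priors = {"yes": 9/14, "no": 5/14}

cond = {
    "yes": {"outlook=sunny": 2/9, "temp=cool": 3/9, "hum=high": 3/9, "windy=true": 3/9},
    "no":  {"outlook=sunny": 3/5, "temp=cool": 1/5, "hum=high": 4/5, "windy=true": 3/5},
}

instance = ["outlook=sunny", "temp=cool", "hum=high", "windy=true"]

scores = {}
for v in priors:
    score = priors[v]
    for attr_value in instance:
        score *= cond[v][attr_value]   # naïve conditional independence assumption
    scores[v] = score

print(scores)                       # {'yes': ~0.0053, 'no': ~0.0206}
print(max(scores, key=scores.get))  # 'no'
```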

Estimating Probabilities
To estimate the probability P(A=v|C) of an attribute value A = v for a given class C we use:
- Relative frequency: n_c / n, where n_c is the number of training instances that belong to the class C and have value v for the attribute A, and n is the number of training instances of the class C;
- m-estimate of accuracy: (n_c + m p) / (n + m), where n_c and n are as above, p is the prior probability P(A=v), and m is the weight of p.
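A minimal Python sketch of these two estimators (the function names are illustrative, not from the slides); the example shows why the m-estimate helps when a count is zero:

```python
def relative_frequency(n_c: int, n: int) -> float:
    """Relative-frequency estimate n_c / n of P(A=v | C)."""
    return n_c / n

def m_estimate(n_c: int, n: int, p: float, m: float) -> float:
    """m-estimate (n_c + m*p) / (n + m); p is the prior for A=v, m is its weight."""
    return (n_c + m * p) / (n + m)

# Example: P(Outlook=overcast | no) from the weather data (n_c = 0, n = 5).
print(relative_frequency(0, 5))      # 0.0 -- a zero wipes out the whole product
print(m_estimate(0, 5, p=1/3, m=3))  # 0.125, assuming a uniform prior over 3 outlook values
```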

Learning to Classify Text
- Each document is represented by a vector of word attributes w_k; the values of the word attributes w_k are the frequencies with which the words occur in the text.
- To estimate the probability P(w_k | v) we use:

  P(w_k | v) = (n_k + 1) / (n + |Vocabulary|)

where n is the total number of word positions in all the documents (instances) whose target value is v, n_k is the number of times word w_k is found in these n word positions, and |Vocabulary| is the total number of distinct words found in the training data.
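A small Python sketch of this estimate (the tiny corpus and the helper name are made up for illustration):

```python
from collections import Counter

def word_probabilities(docs_of_class, vocabulary):
    """Estimate P(w_k | v) = (n_k + 1) / (n + |Vocabulary|) for every word.

    docs_of_class: list of tokenized documents whose target value is v.
    vocabulary: set of all distinct words in the training data.
    """
    counts = Counter(word for doc in docs_of_class for word in doc)
    n = sum(counts.values())  # total word positions in the documents of class v
    return {w: (counts[w] + 1) / (n + len(vocabulary)) for w in vocabulary}

# Tiny illustrative corpus for a class v = "sports"
docs = [["great", "match", "today"], ["match", "highlights"]]
vocab = {"great", "match", "today", "highlights", "politics"}
probs = word_probabilities(docs, vocab)
print(probs["match"])     # (2 + 1) / (5 + 5) = 0.3
print(probs["politics"])  # (0 + 1) / (5 + 5) = 0.1
```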

Summary
- Bayesian methods provide the basis for probabilistic learning methods that use knowledge about the prior probabilities of hypotheses and about the probability of observing data given the hypothesis;
- Bayesian methods can be used to determine the most probable hypothesis given the data;
- The naïve Bayes classifier is useful in many practical applications.