1 Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks

2 Chapter 12 Contents
• Probabilistic Reasoning
• Joint Probability Distributions
• Bayes’ Theorem
• Simple Bayesian Concept Learning
• Bayesian Belief Networks
• The Noisy-∨ Function
• Bayes’ Optimal Classifier
• The Naïve Bayes Classifier
• Collaborative Filtering

3 Probabilistic Reasoning
• Probabilities are expressed in a notation similar to that of predicates in FOPC:
  – P(S) = 0.5
  – P(T) = 1
  – P(¬(A ∧ B) ∨ C) = 0.2
• 1 = certain; 0 = certainly not.

4 Conditional Probability
• Conditional probability refers to the probability of one thing given that we already know another to be true:

      P(B | A) = P(A ∧ B) / P(A)

• This states the probability of B, given A.

5 Joint Probability Distributions
• A joint probability distribution represents the combined probabilities of two or more variables, for example:

                 B       ¬B
        A       0.11    0.63
        ¬A      0.09    0.17

• This table shows, for example, that
      P(A ∧ B) = 0.11
      P(¬A ∧ B) = 0.09
• Using this, we can calculate P(A):
      P(A) = P(A ∧ B) + P(A ∧ ¬B) = 0.11 + 0.63 = 0.74
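
The marginal probability P(A) can be checked with a short sketch; the same dictionary also gives the conditional probability P(B | A) from the previous slide (the variable names are illustrative):

    # Joint probability distribution over two Boolean variables A and B,
    # keyed by (value of A, value of B), using the values from the table above.
    joint = {
        (True, True): 0.11,  (True, False): 0.63,
        (False, True): 0.09, (False, False): 0.17,
    }

    # Marginal probability P(A) = P(A ∧ B) + P(A ∧ ¬B)
    p_a = sum(p for (a, b), p in joint.items() if a)

    # Conditional probability P(B | A) = P(A ∧ B) / P(A)
    p_b_given_a = joint[(True, True)] / p_a

    print(p_a)          # ≈ 0.74
    print(p_b_given_a)  # ≈ 0.149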

6 Bayes’ Theorem
• Bayes’ theorem lets us calculate a conditional probability:

      P(B | A) = P(A | B) · P(B) / P(A)

• P(B) is the prior probability of B.
• P(B | A) is the posterior probability of B.
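
As a quick check, the theorem can be applied to the joint distribution from the previous slide (these derived numbers do not appear on the slide itself): P(B) = 0.11 + 0.09 = 0.20, P(A | B) = 0.11 / 0.20 = 0.55, and P(A) = 0.74.

    def posterior(p_a_given_b, p_b, p_a):
        """Bayes' theorem: P(B | A) = P(A | B) * P(B) / P(A)."""
        return p_a_given_b * p_b / p_a

    # Values derived from the joint distribution on the previous slide.
    print(posterior(0.55, 0.20, 0.74))  # ≈ 0.149, matching P(A ∧ B) / P(A)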

7 Simple Bayesian Concept Learning (1)
• P(H | E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
• Let us suppose we have a set of hypotheses H₁ … Hₙ. For each Hᵢ:

      P(Hᵢ | E) = P(E | Hᵢ) · P(Hᵢ) / P(E)

• Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.

8 Simple Bayesian Concept Learning (2)
• In fact, this can be simplified. Since P(E) does not depend on Hᵢ, it will have the same value for each hypothesis.
• Hence, it can be ignored, and we can find the hypothesis with the highest value of:

      P(E | Hᵢ) · P(Hᵢ)

• We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E | Hᵢ). This is the likelihood of E given Hᵢ.
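
A minimal sketch of this selection rule, assuming the priors P(Hᵢ) and likelihoods P(E | Hᵢ) are already known; all names and numbers below are made up for illustration:

    # Hypothetical priors P(H) and likelihoods P(E | H) for three hypotheses.
    priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
    likelihoods = {"H1": 0.10, "H2": 0.40, "H3": 0.35}

    # MAP choice: maximise P(E | H) * P(H); the common factor P(E) is ignored.
    map_hypothesis = max(priors, key=lambda h: likelihoods[h] * priors[h])

    # If all priors are equal, just maximise the likelihood P(E | H) instead.
    ml_hypothesis = max(likelihoods, key=likelihoods.get)

    print(map_hypothesis, ml_hypothesis)  # "H2 H2" with these numbers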

9 Bayesian Belief Networks (1)
• A belief network shows the dependencies between a group of variables.
• Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
• In the example network, C and D are dependent on A; D and E are dependent on B.
• The Bayesian belief network has probabilities associated with each link, e.g. P(C | A) = 0.2, P(C | ¬A) = 0.4.

10 Bayesian Belief Networks (2)
• A complete set of probabilities for this belief network might be:
  – P(A) = 0.1
  – P(B) = 0.7
  – P(C | A) = 0.2
  – P(C | ¬A) = 0.4
  – P(D | A ∧ B) = 0.5
  – P(D | A ∧ ¬B) = 0.4
  – P(D | ¬A ∧ B) = 0.2
  – P(D | ¬A ∧ ¬B) =
  – P(E | B) = 0.2
  – P(E | ¬B) = 0.1

11 Bayesian Belief Networks (3)
• We can now calculate conditional probabilities. By the chain rule, the joint probability of the five variables is:

      P(A ∧ B ∧ C ∧ D ∧ E) = P(E | A ∧ B ∧ C ∧ D) · P(D | A ∧ B ∧ C) · P(C | A ∧ B) · P(B | A) · P(A)

• In fact, we can simplify this, since there are no dependencies between certain pairs of variables – between E and A, for example. Hence:

      P(A ∧ B ∧ C ∧ D ∧ E) = P(E | B) · P(D | A ∧ B) · P(C | A) · P(B) · P(A)
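
A short sketch of evaluating the simplified factorisation for one complete assignment, using the probabilities listed on the previous slide (the unspecified entry P(D | ¬A ∧ ¬B) is not needed for this case):

    # Conditional probability tables from the previous slide, indexed by parent values.
    p_a = 0.1
    p_b = 0.7
    p_c_given_a = {True: 0.2, False: 0.4}            # P(C | A), P(C | ¬A)
    p_d_given_ab = {(True, True): 0.5, (True, False): 0.4, (False, True): 0.2}
    p_e_given_b = {True: 0.2, False: 0.1}            # P(E | B), P(E | ¬B)

    # P(A ∧ B ∧ C ∧ D ∧ E) = P(E | B) · P(D | A ∧ B) · P(C | A) · P(B) · P(A)
    joint = p_e_given_b[True] * p_d_given_ab[(True, True)] * p_c_given_a[True] * p_b * p_a
    print(joint)  # 0.2 * 0.5 * 0.2 * 0.7 * 0.1 = 0.0014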

12 Bayes’ Optimal Classifier
• A system that uses Bayes’ theorem to classify data.
• We have a piece of data y, and are seeking the correct hypothesis from H₁ … H₅, each of which assigns a classification to y.
• The probability that y should be classified as cⱼ is:

      P(cⱼ | x₁, …, xₙ) = Σᵢ P(cⱼ | Hᵢ) · P(Hᵢ | x₁, …, xₙ)   (summing over the m hypotheses)

• x₁ to xₙ are the training data, and m is the number of hypotheses.
• This method provides the best possible classification for a piece of data.
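
A minimal sketch of this rule, assuming the hypothesis posteriors P(Hᵢ | x₁, …, xₙ) and the class probabilities P(cⱼ | Hᵢ) are already available; all names and numbers are illustrative:

    # Posterior probability of each hypothesis given the training data (illustrative).
    hypothesis_posteriors = {"H1": 0.4, "H2": 0.3, "H3": 0.3}

    # Probability each hypothesis assigns to each class for the new item y (illustrative).
    class_given_hypothesis = {
        "H1": {"pos": 0.9, "neg": 0.1},
        "H2": {"pos": 0.2, "neg": 0.8},
        "H3": {"pos": 0.3, "neg": 0.7},
    }

    # Bayes optimal rule: weight each hypothesis's class probabilities by its posterior.
    def class_probability(c):
        return sum(class_given_hypothesis[h][c] * p for h, p in hypothesis_posteriors.items())

    best = max(["pos", "neg"], key=class_probability)
    print(class_probability("pos"), class_probability("neg"), best)  # ≈ 0.51, ≈ 0.49, "pos"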

13 The Naïve Bayes Classifier (1)
• A vector of data is classified as a single classification: we want P(cᵢ | d₁, …, dₙ).
• The classification with the highest posterior probability is chosen.
• The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis. In this case, we are looking for the MAP classification.
• Bayes’ theorem is used to find the posterior probability:

      P(cᵢ | d₁, …, dₙ) = P(d₁, …, dₙ | cᵢ) · P(cᵢ) / P(d₁, …, dₙ)

14 The Naïve Bayes Classifier (2)
• Since P(d₁, …, dₙ) is a constant, independent of cᵢ, we can eliminate it, and simply aim to find the classification cᵢ for which the following is maximised:

      P(d₁, …, dₙ | cᵢ) · P(cᵢ)

• We now assume that all the attributes d₁, …, dₙ are independent of each other, given the classification, so P(d₁, …, dₙ | cᵢ) can be rewritten as a product and we maximise:

      P(cᵢ) · ∏ⱼ P(dⱼ | cᵢ)

• The classification for which this is highest is chosen to classify the data.
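
A minimal naïve Bayes sketch under that independence assumption; the class priors and per-attribute conditional probabilities below are made up for illustration:

    import math

    # Illustrative model: class priors P(c) and attribute probabilities P(d_j | c).
    priors = {"spam": 0.3, "ham": 0.7}
    cond = {
        "spam": {"contains_offer": 0.6, "contains_link": 0.7},
        "ham":  {"contains_offer": 0.1, "contains_link": 0.4},
    }

    def naive_bayes_classify(attributes, priors, cond):
        """Return the class maximising log P(c) + sum of log P(d_j | c)."""
        def score(c):
            return math.log(priors[c]) + sum(math.log(cond[c][a]) for a in attributes)
        return max(priors, key=score)

    # An item in which both attributes are observed to be true.
    print(naive_bayes_classify(["contains_offer", "contains_link"], priors, cond))
    # "spam": 0.3 * 0.6 * 0.7 = 0.126 beats "ham": 0.7 * 0.1 * 0.4 = 0.028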

15 Collaborative Filtering
• A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
• For example, if we know that Anne and Bob both like A, B and C, and that Anne likes D, then we guess that Bob would also like D.
• Can be calculated using decision trees.
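
The slide’s decision-tree formulation is not reproduced here; the sketch below illustrates the underlying idea in a much simpler way, estimating how likely Bob is to like D from other users’ overlapping interests (all users and items are made up):

    # Which items each user is known to like (illustrative data).
    likes = {
        "Anne":  {"A", "B", "C", "D"},
        "Bob":   {"A", "B", "C"},
        "Carol": {"A", "D"},
        "Dave":  {"B", "C"},
    }

    def probability_likes(user, item, likes):
        """Crude estimate of P(user likes item): fraction of other users who like it,
        weighted by how many known interests they share with this user."""
        total_weight = weighted_likes = 0.0
        for other, their_items in likes.items():
            if other == user:
                continue
            overlap = len(likes[user] & their_items)   # shared known interests
            total_weight += overlap
            weighted_likes += overlap * (item in their_items)
        return weighted_likes / total_weight if total_weight else 0.0

    print(probability_likes("Bob", "D", likes))  # ≈ 0.67, driven mostly by Anne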