CISC 4631 Data Mining
Lecture 06: Bayes Theorem
These slides are based on the slides by Tan, Steinbach and Kumar (textbook authors), Eamonn Keogh (UC Riverside) and Andrew Moore (CMU/Google).

Naïve Bayes Classifier
We will start off with a visual intuition, before looking at the math… [Portrait: Thomas Bayes]

Remember this example? Let's get lots more data…
[Scatter plot: Antenna Length vs. Abdomen Length, with Grasshoppers and Katydids as the two classes]

With a lot of data, we can build a histogram. Let us just build one for "Antenna Length" for now…
[Histograms of Antenna Length for Katydids and Grasshoppers]

We can leave the histograms as they are, or we can summarize them with two normal distributions. Let us use two normal distributions for ease of visualization in the following slides…

p(c_j | d) = probability of class c_j, given that we have observed d
[Figure: the two class distributions, with an observed antenna length of 3 marked]
We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it? We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid? There is a formal way to discuss the most probable classification…
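A minimal sketch of that comparison in Python, assuming made-up means and standard deviations for the two fitted normals (the slides do not give the actual parameters):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

antenna_length = 3.0
# Hypothetical fitted parameters, one normal per class (assumptions,
# not the values behind the slides' figures).
p_d_given_grasshopper = normal_pdf(antenna_length, mu=4.0, sigma=1.5)
p_d_given_katydid = normal_pdf(antenna_length, mu=8.0, sigma=2.0)

print("p(d | Grasshopper) =", p_d_given_grasshopper)  # about 0.213
print("p(d | Katydid)     =", p_d_given_katydid)      # about 0.009
# Under these assumed distributions, an antenna length of 3 is far more
# probable for a grasshopper, so that is the more plausible class.
```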

Bayes Classifier
A probabilistic framework for classification problems. Often appropriate because the world is noisy and also some relationships are probabilistic in nature – Is predicting who will win a baseball game probabilistic in nature?
Before getting to the heart of the matter, we will go over some basic probability. We will review the concept of reasoning with uncertainty, also known as probability.
– This is a fundamental building block for understanding how Bayesian classifiers work
– It's really going to be worth it
– You may find a few of these basic probability questions on your exam
– Stop me if you have questions!

Discrete Random Variables
A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs. Examples:
– A = The next patient you examine is suffering from inhalational anthrax
– A = The next patient you examine has a cough
– A = There is an active terrorist cell in your city

Probabilities
We write P(A) as "the fraction of possible worlds in which A is true". We could at this point spend 2 hours on the philosophy of this. But we won't.

Visualizing A
[Diagram: the event space of all possible worlds, with total area 1; a reddish oval marks the worlds in which A is true, the rest are worlds in which A is false]
P(A) = area of the reddish oval

The Axioms of Probability
0 <= P(A) <= 1
P(True) = 1
P(False) = 0
P(A or B) = P(A) + P(B) - P(A and B)

Interpreting the axioms
The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.
The area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true.
[Venn diagram: overlapping events A and B; P(A or B) is the total covered area and P(A and B) is the overlap, so the union axiom is simple addition and subtraction]

Another important theorem
From the axioms (0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B)) we can prove:
P(A) = P(A and B) + P(A and not B)
[Venn diagram: A split into its overlap with B and its part outside B]
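A small brute-force check of the union axiom and this identity over a toy space of equally likely worlds; the space of four (A, B) worlds is made up for illustration:

```python
# Each "world" is a pair (a, b) saying whether events A and B hold in it.
from itertools import product

worlds = list(product([True, False], repeat=2))  # all four (A, B) combinations

def prob(predicate):
    """Fraction of possible worlds in which the predicate is true."""
    return sum(predicate(a, b) for a, b in worlds) / len(worlds)

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_and_b = prob(lambda a, b: a and b)
p_a_or_b = prob(lambda a, b: a or b)

# Union axiom: P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_or_b == p_a + p_b - p_a_and_b
# Marginalization: P(A) = P(A and B) + P(A and not B)
assert p_a == p_a_and_b + prob(lambda a, b: a and not b)
print("Both identities hold on this toy space.")
```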

Conditional Probability
P(A|B) = fraction of worlds in which B is true that also have A true.
[Venn diagram: overlapping events F and H]
H = "Have a headache"; F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2.
"Headaches are rare and flu is rarer, but if you're coming down with flu there's a chance you'll have a headache."

With the same picture:
P(H|F) = fraction of flu-inflicted worlds in which you have a headache
= (#worlds with flu and headache) / (#worlds with flu)
= (area of "H and F" region) / (area of "F" region)
= P(H and F) / P(F)

Definition of Conditional Probability
P(A|B) = P(A and B) / P(B)
Corollary, the Chain Rule: P(A and B) = P(A|B) P(B)

Probabilistic Inference
H = "Have a headache"; F = "Coming down with flu". P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2.
One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

With the same numbers we can compute: P(F and H) = … P(F|H) = …

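Filling in those ellipses is a direct application of the chain rule and the definition of conditional probability; a short check of the arithmetic:

```python
# Worked inference for the headache/flu example, using only the numbers
# stated above and the chain rule.
p_h = 1 / 10          # P(H)
p_f = 1 / 40          # P(F)
p_h_given_f = 1 / 2   # P(H|F)

# Chain rule: P(F and H) = P(H|F) * P(F)
p_f_and_h = p_h_given_f * p_f          # 1/80 = 0.0125
# Definition of conditional probability: P(F|H) = P(F and H) / P(H)
p_f_given_h = p_f_and_h / p_h          # 1/8 = 0.125

print("P(F and H) =", p_f_and_h)
print("P(F|H)     =", p_f_given_h)
# A headache raises the chance of flu from 1/40 to 1/8 -- nowhere near the
# 50% the naive reasoning suggested, so the reasoning was not good.
```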

What we just did…
P(B|A) = P(A and B) / P(A) = P(A|B) P(B) / P(A)
This is Bayes Rule.
Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
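The rule packaged as one reusable helper (a sketch; the function and argument names are my own, checked against the flu example above):

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes Rule: P(B|A) = P(A|B) * P(B) / P(A)."""
    return likelihood * prior / evidence

# The flu example again: P(F|H) = P(H|F) * P(F) / P(H)
print(bayes_posterior(likelihood=1/2, prior=1/40, evidence=1/10))  # 0.125
```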

Some more terminology
The prior probability is the probability assuming no specific information.
– Thus we would refer to P(A) as the prior probability of event A occurring
– We would not say that P(A|C) is the prior probability of A occurring
The posterior probability is the probability given that we know something.
– We would say that P(A|C) is the posterior probability of A (given that C occurs)

Example of Bayes Theorem
Given:
– A doctor knows that meningitis causes a stiff neck 50% of the time
– The prior probability of any patient having meningitis is 1/50,000
– The prior probability of any patient having a stiff neck is 1/20
If a patient has a stiff neck, what's the probability he/she has meningitis?
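The slide leaves the arithmetic to the reader; applying Bayes rule to the numbers given:

```python
# Bayes rule for the meningitis example, with the numbers stated above.
p_s_given_m = 0.5      # P(S|M): stiff neck given meningitis
p_m = 1 / 50_000       # P(M): prior probability of meningitis
p_s = 1 / 20           # P(S): prior probability of a stiff neck

# P(M|S) = P(S|M) * P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print("P(M|S) =", p_m_given_s)  # 0.0002, i.e. 1 in 5,000
# Even though meningitis causes a stiff neck half the time, a stiff neck
# only makes meningitis 1-in-5000 likely, because the prior is so small.
```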

Another Example of BT
[Images: a menu from a bad-hygiene restaurant and a menu from a good-hygiene restaurant]
You are a health official, deciding whether to investigate a restaurant. You lose a dollar if you get it wrong; you win a dollar if you get it right.
Half of all restaurants have bad hygiene. In a bad restaurant, 3/4 of the menus are smudged. In a good restaurant, 1/3 of the menus are smudged. You are allowed to see a randomly chosen menu.
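Working the posterior out with Bayes rule and the law of total probability (the 9/13 agrees with the value in the terminology table below):

```python
# Bayes rule for the restaurant example: what is P(Bad | Smudge)?
p_bad = 1 / 2                # prior: half of all restaurants are bad
p_smudge_given_bad = 3 / 4   # P(Smudge | Bad)
p_smudge_given_good = 1 / 3  # P(Smudge | not Bad)

# Total probability: P(S) = P(S|Bad)P(Bad) + P(S|not Bad)P(not Bad)
p_smudge = (p_smudge_given_bad * p_bad
            + p_smudge_given_good * (1 - p_bad))

p_bad_given_smudge = p_smudge_given_bad * p_bad / p_smudge
print("P(Bad | Smudge) =", p_bad_given_smudge)  # 9/13, about 0.692
# Since 9/13 > 1/2, seeing a smudged menu makes "investigate" the better bet.
```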


Bayesian Diagnosis
Buzzword – Meaning – In our example
True State – The true state of the world, which you would like to know – Is the restaurant bad?
Prior – Prob(true state = x) – P(Bad) = 1/2
Evidence – Some symptom, or other thing you can observe – Smudge
Conditional – Probability of seeing the evidence if you did know the true state – P(Smudge|Bad) = 3/4; P(Smudge|not Bad) = 1/3
Posterior – The Prob(true state = x | some evidence) – P(Bad|Smudge) = 9/13
Inference, Diagnosis, Bayesian Reasoning – Getting the posterior from the prior and the evidence
Decision theory – Combining the posterior with known costs in order to decide what to do

Why Bayes Theorem at all?
Why model P(C|A) via P(A|C)? Why not model P(C|A) directly?
The P(A|C)P(C) decomposition allows us to be "sloppy":
– P(C) and P(A|C) can be trained independently
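A minimal sketch of what training the two pieces independently looks like, using a tiny invented dataset of (attribute, class) pairs:

```python
# Estimate P(C) and P(A|C) separately from labeled data, then combine them
# with Bayes rule. The tiny dataset below is invented for illustration,
# e.g. (menu smudged?, hygiene class).
from collections import Counter

data = [(True, "bad"), (True, "bad"), (False, "bad"),
        (True, "good"), (False, "good"), (False, "good")]

# P(C): estimated from the class labels alone.
class_counts = Counter(c for _, c in data)
p_c = {c: n / len(data) for c, n in class_counts.items()}

# P(A=True|C): estimated within each class separately.
p_a_given_c = {c: sum(1 for a, cc in data if a and cc == c) / n
               for c, n in class_counts.items()}

# Bayes rule: P(C|A=True) is proportional to P(A=True|C) * P(C).
scores = {c: p_a_given_c[c] * p_c[c] for c in p_c}
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(posterior)  # {'bad': 2/3, 'good': 1/3} for this toy data
```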

Crime Scene Analogy
A is a crime scene; C is a person who may have committed the crime.
– P(C|A): look at the scene – who did it?
– P(C): who had a motive? (Profiler)
– P(A|C): could they have done it? (CSI: transportation, access to weapons, alibi)