COMP 2208 Dr. Long Tran-Thanh University of Southampton Bayes’ Theorem, Bayesian Reasoning, and Bayesian Networks.


Classification [Agent-architecture diagram: Environment → Perception → categorize inputs, update belief model, update decision-making policy → Decision making → Behaviour]

Reasoning [Agent-architecture diagram: Environment → Perception → categorize inputs, update belief model, update decision-making policy → Decision making → Behaviour]

Reasoning Logic/rule-based: build up basic rules (axioms) using some form of logic; other rules (reasoning) can be derived from these. Functional or declarative programming (LISP, ML, Prolog, etc.) Stochastic reasoning: frequentist (non-Bayesian) or Bayesian Some bridging efforts: e.g., Markov logic (see, e.g., Pedro Domingos)

The right way to do reasoning? Debate 1: logic-based vs. stochastic E.g., Noam Chomsky vs. Peter Norvig Debate 2: frequentist vs. Bayesian Many vs. many Today we talk about Bayesian reasoning (because it’s simple to understand and elegant)

The Bayesian way Bayes’ Theorem Bayesian belief update Inference in Bayesian networks

Some probability theory Space of all possible world models = area equal to 1

Some probability theory Probability of event A = fraction of worlds in which A happens What does it mean that P(A) = 0.2? What does it mean that P(A) = 0? Or P(A) = 1?

Some probability theory Probability of A not happening = complement of P(A): P(not A) = 1 − P(A)

Some probability theory [Venn diagram: two overlapping events A and B in the space of worlds]

Basic axioms in probability theory Domain of the probability value: 0 ≤ P(A) ≤ 1 Constants: P(true) = 1, P(false) = 0 Connection of AND and OR: P(A or B) = P(A) + P(B) − P(A and B)

Conditional probability Only consider worlds in which A happens -> new space of worlds Consider worlds in which B happens, but only within the new space (B|A) P(B|A) = fraction of worlds with B within the new space

Conditional probability [Venn diagram: B|A shown as the part of B lying inside the new space A]

Conditional probability We have: P(B|A) = P(A and B) / P(A) Chain rule: P(A and B) = P(B|A) P(A) Law of total probability: P(B) = Σ_X P(B|X) P(X), summed over a partition of the space into events X

Bayes’ Theorem (Bayes’ rule) Use chain rule twice for P(A and B): P(A and B) = P(B|A) P(A) and P(A and B) = P(A|B) P(B) The right hand sides must be the same! Bayes’ rule: P(B|A) = P(A|B) P(B) / P(A)

The beauty of Bayes’ Theorem A = evidence (observation); B = hypothesis Prior: captures our prior knowledge/belief Likelihood: how likely we are to observe the evidence, if the hypothesis were true P(evidence) = probability of observing the evidence in general (aggregated over all possible hypotheses) We update our belief after observing some evidence

Example: the Monty Hall problem The game: At the beginning, all doors are closed The prize is behind 1 door (with equal probability) You choose 1 door (say Door 1) The host opens a door (say Door 2) which has a goat behind it Let’s make a deal: would you swap your choice to Door 3?

Solution of the Monty Hall problem Your choice: Door 1; Offer: choose Door 3 instead What are the chances of each option for winning (getting the prize)? A = hypothesis: prize behind Door 1 B = host chooses Door 2 to open (and we see a goat) Bayes’ rule: P(A|B) = P(B|A) P(A) / P(B) P(A) = 1/3, P(B|A) = 1/2, P(B) = 1/2 (why?) P(A|B) = (1/2 × 1/3) / (1/2) = 1/3 Chance of winning the prize for staying with Door 1 = 1/3 Chance of winning the prize for switching to Door 3 = 2/3

Calculating the denominator Bayes’ rule: P(A|B) = P(B|A) P(A) / P(B) A = prize behind Door 1; B = host chooses Door 2 (between Doors 2 and 3) Use law of total probability: P(B) = Σ_X P(B|X) P(X), X = door with prize X = Door 1: P(B|X) = 1/2 (host picks between Doors 2 and 3 at random) X = Door 2: P(B|X) = 0 (host never opens the prize door) X = Door 3: P(B|X) = 1 (Door 2 is the only goat door left to open) So P(B) = 1/2 · 1/3 + 0 · 1/3 + 1 · 1/3 = 1/2
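The 1/3 vs. 2/3 result can also be checked by simulation (a quick sketch; the host follows the rules of the game as described above, and all function names are mine):

```python
import random

def monty_hall_trial(switch: bool) -> bool:
    """One round: prize placed uniformly at random; player picks Door 0;
    host opens a goat door among the doors the player did not pick."""
    prize = random.randrange(3)
    pick = 0
    # Host opens a door that is neither the player's pick nor the prize
    openable = [d for d in range(3) if d != pick and d != prize]
    opened = random.choice(openable)
    if switch:
        # Switch to the one remaining closed door
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == prize

def win_rate(switch: bool, n: int = 100_000) -> float:
    random.seed(0)  # fixed seed so the estimate is reproducible
    return sum(monty_hall_trial(switch) for _ in range(n)) / n

# Staying wins about 1/3 of the time; switching wins about 2/3
```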

Example 2: the HIV test problem HIV lab tests are quite accurate: 99% sensitivity: if a patient is HIV+, then the probability that the test result is positive is 0.99 99% specificity: if a patient is HIV−, then the probability that the test result is negative is also 0.99 HIV is rare in patients in our population: about 1 out of 1000 (even among those who get tested) Situation: a patient does an HIV test and gets a positive result Question: what are the chances that the patient is indeed HIV+?

Solution for the HIV test problem A = test was positive; B = patient is HIV+ We want to calculate P(B|A) Prior: P(B) = 0.001 (1 out of 1000 is HIV+) Likelihood: P(A|B) = 0.99 (99% sensitivity) What about P(A)?

Solution for the HIV test problem Calculation of P(A): use law of total probability P(A) = P(A|B) P(B) + P(A|not B) P(not B) Term 1: P(A|B) P(B) = 0.99 × 0.001 = 0.00099 Term 2: P(A|not B) P(not B) = 0.01 × 0.999 = 0.00999 P(A) = 0.00099 + 0.00999 = 0.01098

Solution for HIV test Calculating P(B|A) P(B) = 0.001; P(A|B) = 0.99; P(A) = 0.01098 P(B|A) = P(A|B) P(B) / P(A) = 0.99 × 0.001 / 0.01098 ≈ 0.09 This means that even if the test is positive, there is only about a 9% chance that the patient is indeed HIV+
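The whole calculation fits in a few lines (a minimal sketch; the function name and parameter names are mine, the numbers are from the slides):

```python
def positive_posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(HIV+ | positive test) via Bayes' rule.
    The denominator P(positive) comes from the law of total probability."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

p = positive_posterior(prior=0.001, sensitivity=0.99, specificity=0.99)
# p ≈ 0.09: a positive result still leaves only about a 9% chance of HIV+
```

Note how the tiny prior (0.001) dominates the high test accuracy: most positive results come from the large HIV− population.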

Some discussions Only 15% of doctors get this right Most doctors think that if an HIV test is positive, there’s a high chance that the patient is HIV+ Why? They typically focus on the accuracy (sensitivity) of the test They’re neglecting the background, or base rate, of HIV prevalence (the prior)

Bonus question Russian roulette with 2 bullets You put 2 bullets into a revolver, such that they are next to each other Your opponent spins, pulls the trigger … and survives. Now it’s your turn! Question: should you spin the revolver as well, or shouldn’t you spin it? Question 2: what if there’s only 1 bullet in the revolver?

Belief update London – Rome flight

Belief update You don’t know over which area you are flying … Candidate locations: near London, near Rome, near Paris, near Monaco You look out the window, and you see… land… and sea… and high mountains But you know that you’ve been flying for a while [Slide table: each observation shifts each candidate location between “unlikely”, “maybe”, and “probably”]

Bayesian belief update Prior: probability that the model is true (before the observation) Likelihood: how likely to have the observed event, if the model was true Denominator: marginal likelihood (or model evidence) Left hand side = posterior: probability that the model is true, after we have seen the observations Belief: probability distribution over all the possible models Captures our knowledge + uncertainty about the true world model How to update our belief after each observation?

Bayesian belief update Prior probability → prior distribution = prior belief Posterior probability → posterior distribution = posterior belief
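The update rule above can be sketched on the flight example. The slides give only qualitative labels (“unlikely”, “maybe”, “probably”), so the priors and likelihoods below are assumed purely for illustration:

```python
def update_belief(prior: dict, likelihood: dict) -> dict:
    """One Bayesian update: posterior(m) ∝ P(observation | m) * prior(m)."""
    unnorm = {m: likelihood[m] * p for m, p in prior.items()}
    evidence = sum(unnorm.values())  # marginal likelihood / model evidence
    return {m: v / evidence for m, v in unnorm.items()}

# Uniform prior over the candidate locations (assumed)
prior = {"London": 0.25, "Rome": 0.25, "Paris": 0.25, "Monaco": 0.25}
# Assumed P(see land + sea + high mountains | near each city)
likelihood = {"London": 0.05, "Rome": 0.30, "Paris": 0.02, "Monaco": 0.60}

posterior = update_belief(prior, likelihood)
# With these assumed numbers, the posterior belief concentrates on Monaco
```

To process a sequence of observations, feed each posterior back in as the next prior.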

Bayesian belief update example Search for a crashed airplane using Bayesian updating Imagine you’re designing a search-and-rescue UAV. Its job is to autonomously look for aircraft wreckage It is easier to detect wreckage in some terrain types than others

Bayesian belief update example Difficulty model: what’s the probability of finding the wreckage in each area

Bayesian belief update example Prior belief: based on the last known location

Bayesian belief update example Search: always go to the point with highest probability At the beginning:

Bayesian belief update example Search: always go to the point with highest probability After 10 steps:

Bayesian belief update example Search: always go to the point with highest probability After 50 steps:

Bayesian belief update example Search: always go to the point with highest probability After 250 steps:

Bayesian belief update example Search: always go to the point with highest probability After 500 steps:

Bayesian belief update example Search: always go to the point with highest probability After 1000 steps:

Bayesian belief update example Search: always go to the point with highest probability After 2000 steps:
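The search loop shown across these slides can be sketched as follows. Assumptions (not from the slides): the wreckage is never actually found, so we only track how the belief spreads after repeated failed searches; the prior is a Gaussian bump around a hypothetical last known location; and the terrain-difficulty pattern is made up:

```python
import math

def search_step(belief: dict, p_detect: dict):
    """Search the most probable cell; assuming the wreckage is NOT found there,
    down-weight that cell by the failed-search likelihood and renormalize."""
    best = max(belief, key=belief.get)
    belief[best] *= 1.0 - p_detect[best]  # P(no detection | wreckage in cell)
    z = sum(belief.values())              # Bayes: renormalize over all cells
    for c in belief:
        belief[c] /= z
    return best

# Toy 20x20 grid: Gaussian prior around a hypothetical last known location (10, 10)
cells = [(x, y) for x in range(20) for y in range(20)]
belief = {(x, y): math.exp(-((x - 10) ** 2 + (y - 10) ** 2) / 20.0)
          for x, y in cells}
z = sum(belief.values())
belief = {c: v / z for c, v in belief.items()}

# Assumed difficulty model: wreckage is easier to spot towards the east
p_detect = {(x, y): 0.3 + 0.5 * x / 19.0 for x, y in cells}

for _ in range(50):  # 50 failed searches gradually spread the belief outwards
    search_step(belief, p_detect)
```

After each failed search the searched cell loses probability mass, so the greedy “go to the most probable point” policy naturally fans out from the last known location, as in the slide sequence.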

Complex knowledge representation So far we have dealt with simple correlations between probabilities (e.g., between two events A and B) We use probabilities to capture uncertainty in our knowledge Inference: derive extra information/conclusions from observed data Inference in simple systems: use Bayes’ rule What if we have a much more complicated network of correlations? How to do inference in complex networks? How to use Bayes’ rule there?

Inference with joint distribution The simplest way to do inference is to look at the joint distribution of the random variables Joint distribution captures all the interconnections + dependencies An example from an employment survey M: is the person male? L: does the person work long hours? R: is the person rich? But before Bayesian nets …

Inference with joint distribution Truth table: if all the variables are binary [slide shows the 8-row joint-probability table over M, L, R]

Inference with joint distribution P(the person is rich) = ? = 0.26

Inference with joint distribution P(L|M) = ? P(L|M) = P(L and M)/P(M) = ( )/( ) = 0.35
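A minimal sketch of joint-distribution inference. The slide’s truth table is an image whose individual entries are not reproduced here, so the numbers below are illustrative, chosen only to sum to 1 and to be consistent with the answers quoted on the slides (P(R) = 0.26, P(L|M) = 0.35):

```python
# Joint distribution P(M, L, R), keyed by (male, long_hours, rich).
# Illustrative entries (assumed), consistent with P(R)=0.26 and P(L|M)=0.35.
joint = {
    (1, 1, 1): 0.050, (1, 1, 0): 0.125, (1, 0, 1): 0.100, (1, 0, 0): 0.225,
    (0, 1, 1): 0.030, (0, 1, 0): 0.120, (0, 0, 1): 0.080, (0, 0, 0): 0.270,
}

def prob(event) -> float:
    """Probability of any event: sum the rows of the table where it holds."""
    return sum(p for (m, l, r), p in joint.items() if event(m, l, r))

p_rich = prob(lambda m, l, r: r == 1)                     # P(R) = 0.26
p_long_given_male = (prob(lambda m, l, r: m and l)
                     / prob(lambda m, l, r: m))           # P(L|M) = 0.35
```

Any query over M, L, R can be answered this way, which is exactly why the table grows as 2^n in the number of variables.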

Inference with joint distribution We can do any inference from the joint distribution Issues: In practice: doesn’t scale well (brute-force solution) E.g., with 30 variables, we need 2^30 probabilities … (about 1 billion) In theory: the joint distribution doesn’t show the structure of the relationships We might want to exploit the structure of relationships to simplify the calculations E.g., if R (rich) is independent from M (male) and L (working long hours), then we can drop M and L when we do inference about R

Independence Definition: Two random variables are independent if their joint probability is the product of their probabilities: P(A and B) = P(A) P(B) Similarly: P(A|B) = P(A) Another property: P(B|A) = P(B) (both conditional forms require P(A), P(B) > 0)
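The product definition can be checked by brute-force enumeration on a small example (my own illustration, not from the slides): two fair dice, where A = “first die is even” and B = “second die shows 5 or 6”.

```python
from itertools import product

# Two fair dice: 36 equally likely worlds
outcomes = list(product(range(1, 7), repeat=2))

def p(event) -> float:
    """Probability = fraction of worlds in which the event happens."""
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

A = lambda w: w[0] % 2 == 0   # first die even  -> P(A) = 1/2
B = lambda w: w[1] > 4        # second die 5 or 6 -> P(B) = 1/3

p_a, p_b = p(A), p(B)
p_ab = p(lambda w: A(w) and B(w))
# Independence: P(A and B) = P(A) P(B), equivalently P(B|A) = P(B)
```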

Bayesian networks Use graphical representation to capture the dependencies between the random variables Studied for the exam Lecturer is in good mood High exam result P(M) = 0.3 P(cM) = 0.7 P(S) = 0.8 P(cS) = 0.2 P(H|M, S) = 0.9 P(H|cM,S) = 0.4 P(H|M, cS) = 0.5 P(H|cM,cS) = 0.05

Bayesian networks Inference: given high exam result, what is the probability that the lecturer was in a good mood? (P(M|H) = ?) Studied for the exam Lecturer is in good mood High exam result P(M) = 0.3 P(cM) = 0.7 P(S) = 0.8 P(cS) = 0.2 P(H|M, S) = 0.9 P(H|cM,S) = 0.4 P(H|M, cS) = 0.5 P(H|cM,cS) = 0.05

Bayesian networks P(M) = 0.3 P(cM) = 0.7 P(S) = 0.8 P(cS) = 0.2 P(H|M, S) = 0.9 P(H|cM,S) = 0.4 P(H|M, cS) = 0.5 P(H|cM,cS) = 0.05 P(M|H) = ? P(M) = 0.3 P(H|M) = P(H|M, S)P(S) + P(H|M,cS)P(cS) = 0.9*0.8 + 0.5*0.2 = 0.82 P(H) = ?

Bayesian networks P(H) = P(H|M,S)P(M and S) + P(H|M,cS)P(M and cS) + P(H|cM,S)P(cM and S) + P(H|cM,cS)P(cM and cS) M and S are independent = P(H|M,S)P(M)P(S) + P(H|M,cS)P(M)P(cS) + P(H|cM,S)P(cM)P(S) + P(H|cM,cS)P(cM)P(cS) P(H) = 0.9*0.3*0.8 + 0.5*0.3*0.2 + 0.4*0.7*0.8 + 0.05*0.7*0.2 = 0.216 + 0.03 + 0.224 + 0.007 = 0.477

Bayesian networks P(M) = 0.3 P(cM) = 0.7 P(S) = 0.8 P(cS) = 0.2 P(H|M, S) = 0.9 P(H|cM,S) = 0.4 P(H|M, cS) = 0.5 P(H|cM,cS) = 0.05 P(M|H) = ? P(M) = 0.3 P(H|M) = 0.82 P(H) = 0.477 P(M|H) = P(H|M)P(M)/P(H) = 0.82*0.3/0.477 ≈ 0.516
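The inference can be checked numerically with the CPTs from the slide (a short sketch; the variable names are mine):

```python
# CPTs from the slides: M = lecturer in a good mood, S = student studied,
# H = high exam result. M and S are independent parents of H.
p_m, p_s = 0.3, 0.8
p_h_given = {(1, 1): 0.9, (0, 1): 0.4, (1, 0): 0.5, (0, 0): 0.05}  # P(H|M,S)

def pm(m: int) -> float: return p_m if m else 1 - p_m
def ps(s: int) -> float: return p_s if s else 1 - p_s

# P(H): sum over the joint, exploiting independence of M and S
p_h = sum(p_h_given[(m, s)] * pm(m) * ps(s)
          for m in (0, 1) for s in (0, 1))

# P(H|M): marginalize out S
p_h_given_m = p_h_given[(1, 1)] * p_s + p_h_given[(1, 0)] * (1 - p_s)

# Bayes' rule: P(M|H) = P(H|M) P(M) / P(H)
p_m_given_h = p_h_given_m * p_m / p_h
```

Note that we never had to build the full 8-row joint table explicitly: the network’s local CPTs and the independence of M and S give us everything we need.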

Building Bayesian networks With a domain expert: Bayesian nets are sometimes built manually, consulting domain experts for both structure and probabilities. More often, the structure is supplied by domain experts (i.e., they specify what affects what) but the probabilities are learned from data. Building from data: Sometimes both structure and probabilities are learned from data. This is a difficult problem: it puts the AI program in a similar position to a scientist trying out different hypotheses. We need a method that rewards a proposed net structure for matching the data, but penalizes excessive complexity (Occam’s razor).

Properties of Bayesian networks Bayesian networks must be directed acyclic graphs. The major efficiency of a Bayesian network is that we economize on memory: a few small conditional probability tables replace the full joint distribution. They are also easier for human beings to interpret than the raw joint distribution.