Bayesian Statistics and Belief Networks

Overview
Book: Ch 8.3
Refresher on Bayesian statistics
Bayesian classifiers
Belief Networks / Bayesian Networks

Why Should We Care?
Theoretical framework for machine learning, classification, knowledge representation, and analysis.
Bayesian methods can handle noisy, incomplete data sets.
Bayesian methods are in common use today.

Bayesian Approach To Probability and Statistics
Classical probability: a physical property of the world (e.g., the 50% chance of heads on a flip of a fair coin); the "true" probability.
Bayesian probability: a person's degree of belief in event X; a personal probability.
Unlike classical probabilities, Bayesian probabilities benefit from, but do not require, repeated trials; they can focus on a single next event, e.g., the probability that the Seawolves win their next game.

Bayes Rule
Product rule: P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
Equating the two right-hand sides and dividing by P(B) gives Bayes' rule:
P(A|B) = P(B|A) P(A) / P(B)
In classification terms: P(Class|evidence) = P(evidence|Class) P(Class) / P(evidence).
All classification methods can be seen as estimates of Bayes' rule, with different techniques used to estimate P(evidence|Class).

Simple Bayes Rule Example
Probability your computer has a virus: P(V) = 1/1000 = 0.001.
If virused, the probability of a crash that day: P(C|V) = 4/5 = 0.8.
Probability your computer crashes in one day: P(C) = 1/10 = 0.1.
P(V|C) = P(C|V) P(V) / P(C) = (0.8)(0.001) / (0.1) = 0.008.
Even though a crash is a strong indicator of a virus, we expect only 8/1000 crashes to be caused by viruses.
Why not estimate P(V|C) directly from data? Causal vs. diagnostic knowledge: causal knowledge such as P(C|V) is more stable than diagnostic knowledge such as P(V|C). Consider what happens if P(C) suddenly drops: P(C|V) is unchanged, while a directly estimated P(V|C) becomes invalid.
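A minimal sketch of this calculation in Python (the variable names are illustrative, not part of the slides):

```python
# Bayes rule for the virus/crash example: P(V|C) = P(C|V) * P(V) / P(C)
p_virus = 0.001            # P(V): prior probability of a virus
p_crash_given_virus = 0.8  # P(C|V): probability of a crash given a virus
p_crash = 0.1              # P(C): overall daily crash probability

p_virus_given_crash = p_crash_given_virus * p_virus / p_crash
print(f"P(V|C) = {p_virus_given_crash:.3f}")  # prints P(V|C) = 0.008
```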

Bayesian Classifiers
If we're selecting the single most likely class, we only need to find the class that maximizes P(e|Class) P(Class); the denominator P(e) is the same for every class.
The hard part is estimating P(e|Class).
The evidence e typically consists of a set of observations e1, e2, …, en.
The usual simplifying assumption is conditional independence: P(e|Class) = P(e1|Class) P(e2|Class) … P(en|Class).

Bayesian Classifier Example

Probability       C=Virus   C=Bad Disk
P(C)              0.4       0.6
P(crashes|C)      0.1       0.2
P(diskfull|C)     0.6       0.1

Given a case where the disk is full and the computer crashes, the classifier chooses Virus as the most likely class, since (0.4)(0.1)(0.6) = 0.024 > (0.6)(0.2)(0.1) = 0.012.
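A short naive Bayes sketch of this selection in Python (the dictionary layout and function name are illustrative assumptions; the table values are from the slide):

```python
# Class priors and per-feature likelihoods from the example table.
priors = {"Virus": 0.4, "BadDisk": 0.6}
likelihoods = {
    "Virus":   {"crashes": 0.1, "diskfull": 0.6},
    "BadDisk": {"crashes": 0.2, "diskfull": 0.1},
}

def naive_bayes_score(cls, observations):
    """P(Class) times the product of P(e_i|Class), assuming conditional independence."""
    score = priors[cls]
    for obs in observations:
        score *= likelihoods[cls][obs]
    return score

evidence = ["crashes", "diskfull"]
scores = {c: naive_bayes_score(c, evidence) for c in priors}
print(scores, "->", max(scores, key=scores.get))
# approximately {'Virus': 0.024, 'BadDisk': 0.012} -> Virus
```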

Beyond Conditional Independence
(Figure: a linear classifier separating two classes C1 and C2.)
One way to relax the independence assumption is to include second-order dependencies, i.e., pairwise combinations of evidence variables via joint probabilities such as P(ei ∧ ej | Class), used as correction factors to the independence product.
Difficult to compute: the number of pairwise joint probabilities to estimate grows quadratically with the number of evidence variables.

Belief Networks
A DAG that represents the dependencies between variables and specifies the joint probability distribution.
Random variables make up the nodes.
Directed links represent direct causal influences.
Each node has a conditional probability table quantifying the effects of its parents.
No directed cycles.

Burglary Alarm Example
Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls.

Burglary:    P(B) = 0.001
Earthquake:  P(E) = 0.002

Alarm:
B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

JohnCalls:
A  P(J|A)
T  0.90
F  0.05

MaryCalls:
A  P(M|A)
T  0.70
F  0.01
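As a sketch, these tables can be encoded in Python as plain dictionaries (the layout and names below are assumptions for illustration; the later sketches in this transcript reuse them):

```python
# Priors for the root nodes.
P_B = {True: 0.001, False: 0.999}   # Burglary
P_E = {True: 0.002, False: 0.998}   # Earthquake

# P(Alarm=True | Burglary, Earthquake), keyed by (B, E).
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

# P(JohnCalls=True | Alarm) and P(MaryCalls=True | Alarm), keyed by A.
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
```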

Sample Bayesian Network

Using The Belief Network
(Same network and CPTs as above.)
The joint probability of any complete assignment is the product, over all nodes, of each node's CPT entry given its parents' values.
Probability that the alarm sounds, there is no burglary and no earthquake, and both John and Mary call:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E) = (0.90)(0.70)(0.001)(0.999)(0.998) ≈ 0.00063
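Continuing the sketch (and reusing the CPT dictionaries above), the full-joint entry is just the product of one CPT lookup per node:

```python
def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain rule implied by the network."""
    p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    p_j = P_J[a] if j else 1.0 - P_J[a]
    p_m = P_M[a] if m else 1.0 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# Alarm sounds, no burglary, no earthquake, both John and Mary call.
print(round(joint(False, False, True, True, True), 6))  # ~0.000628
```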

Belief Computations
Two types of query; both are NP-hard in general.
Belief Revision: models explanatory/diagnostic tasks. Given evidence, what is the most likely assignment to the remaining variables that explains the evidence? Also called abductive reasoning.
Belief Updating: given evidence, what is the probability of some other random variable taking a particular value?

Belief Revision
Given some evidence variables, find the assignment to all other variables that maximizes the joint probability.
E.g., we know John calls but Mary does not. What is the most likely state of the remaining variables? Consider only assignments where J=T and M=F, and maximize the joint probability.
Best: B=F, E=F, A=F (with J=T, M=F), with joint probability ≈ 0.049; John's call is most likely a false alarm.
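A brute-force belief-revision sketch for this query, reusing joint() from the previous sketch; exhaustive enumeration is only practical because the network is tiny:

```python
from itertools import product

names = ("B", "E", "A", "J", "M")
evidence = {"J": True, "M": False}   # John calls, Mary does not

best_world, best_p = None, -1.0
for values in product([True, False], repeat=len(names)):
    world = dict(zip(names, values))
    if any(world[v] != val for v, val in evidence.items()):
        continue  # skip assignments inconsistent with the evidence
    p = joint(world["B"], world["E"], world["A"], world["J"], world["M"])
    if p > best_p:
        best_world, best_p = world, p

print(best_world, round(best_p, 4))
# {'B': False, 'E': False, 'A': False, 'J': True, 'M': False} 0.0493
```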

Belief Updating
Four kinds of queries, distinguished by where the evidence E lies relative to the query Q:
Causal inferences (E is an ancestor of Q)
Diagnostic inferences (E is a descendant of Q)
Intercausal inferences (E and Q are causes of a common observed effect)
Mixed inferences (combinations of the above)

Causal Inferences
Inference from cause to effect, using the same network as above. E.g., given a burglary, what is P(J|B)?
P(A|B) = P(A|B,E) P(E) + P(A|B,¬E) P(¬E) = (0.95)(0.002) + (0.94)(0.998) ≈ 0.94
P(J|B) = P(J|A) P(A|B) + P(J|¬A) P(¬A|B) = (0.90)(0.94) + (0.05)(0.06) ≈ 0.85
P(M|B) ≈ 0.66 via similar calculations.
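These sums can be checked by brute-force enumeration over the full joint, again reusing joint() from the earlier sketch (the query() helper is an illustrative name, not from the slides):

```python
from itertools import product

def query(target_var, evidence):
    """P(target_var=True | evidence) by summing over the full joint distribution."""
    names = ("B", "E", "A", "J", "M")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[v] != val for v, val in evidence.items()):
            continue
        p = joint(world["B"], world["E"], world["A"], world["J"], world["M"])
        denominator += p
        if world[target_var]:
            numerator += p
    return numerator / denominator

print(round(query("A", {"B": True}), 3))  # P(A|B) ~ 0.940
print(round(query("J", {"B": True}), 3))  # P(J|B) ~ 0.849
print(round(query("M", {"B": True}), 3))  # P(M|B) ~ 0.659
```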

Diagnostic Inferences
From effect to cause. E.g., given that John calls, what is P(B|J)? We need P(A) and P(J) first:
P(A) = Σ over B,E of P(A|B,E) P(B) P(E) ≈ 0.0025
P(J) = P(J|A) P(A) + P(J|¬A) P(¬A) ≈ (0.90)(0.0025) + (0.05)(0.9975) ≈ 0.052
P(B|J) = P(J|B) P(B) / P(J) ≈ (0.85)(0.001) / 0.052 ≈ 0.016
Many false positives: most of John's calls happen when there is no burglary.
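The same query() helper reproduces the diagnostic numbers (assuming the previous sketches are in scope):

```python
print(round(query("A", {}), 4))           # P(A)   ~ 0.0025
print(round(query("J", {}), 3))           # P(J)   ~ 0.052
print(round(query("B", {"J": True}), 3))  # P(B|J) ~ 0.016: most calls are false alarms
```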

Intercausal Inferences
"Explaining away" inferences. Given an alarm, P(B|A) ≈ 0.37. But if we add the evidence that an earthquake occurred, then P(B|A ∧ E) ≈ 0.003.
Even though B and E are unconditionally independent, once their common effect A is observed, evidence for one cause makes the other less likely: the earthquake explains away the alarm.
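The explaining-away effect can be verified with the same query() helper:

```python
print(round(query("B", {"A": True}), 3))              # P(B|A)    ~ 0.374
print(round(query("B", {"A": True, "E": True}), 3))   # P(B|A,E)  ~ 0.003
```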

Mixed Inferences
Simultaneous intercausal and diagnostic inference, e.g., querying a variable given that John calls and that Earthquake is false.
Computing these values exactly by hand becomes somewhat complicated; the algorithms below automate it.

Exact Computation: The Polytree Algorithm (Judea Pearl, 1982)
Only works on singly connected networks: at most one undirected path between any two nodes.
A backward-chaining, message-passing algorithm for computing posterior probabilities for a query node X:
Compute causal support for X from evidence variables "above" X.
Compute evidential support for X from evidence variables "below" X.

Polytree Computation
(Figure: query node X with parents U(1) … U(m), children Y(1) … Y(n), and the children's other parents Z(i,j).)
The algorithm is recursive, with a message-passing chain.

Other Query Methods
Exact algorithms:
Clustering: combine nodes into cluster nodes so the network becomes singly connected, then message-pass over the clustered network.
Symbolic Probabilistic Inference: uses d-separation to find which expressions to combine.
Approximate algorithms: select a sampling distribution, conduct trials sampling from the root nodes down to the evidence nodes, accumulating a weight for each node; still tractable even for dense networks where exact inference is not.
Forward Simulation
Stochastic Simulation
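As one concrete illustration of the sampling idea, here is a likelihood-weighting sketch for the burglary network, reusing the CPT dictionaries from the earlier sketches (the function name and sample count are assumptions; this fixes and weights evidence nodes during forward sampling, which may differ in detail from the variant the slide had in mind):

```python
import random

def likelihood_weighting(target_var, evidence, n_samples=100_000):
    """Estimate P(target_var=True | evidence): sample non-evidence variables in
    topological order, fix evidence variables, and weight each sample by the
    probability of the evidence values it was forced to take."""
    total = hit = 0.0
    for _ in range(n_samples):
        sample, weight = {}, 1.0
        for var in ("B", "E", "A", "J", "M"):
            if var == "B":   p_true = P_B[True]
            elif var == "E": p_true = P_E[True]
            elif var == "A": p_true = P_A[(sample["B"], sample["E"])]
            elif var == "J": p_true = P_J[sample["A"]]
            else:            p_true = P_M[sample["A"]]
            if var in evidence:
                sample[var] = evidence[var]
                weight *= p_true if evidence[var] else 1.0 - p_true
            else:
                sample[var] = random.random() < p_true
        total += weight
        if sample[target_var]:
            hit += weight
    return hit / total

# Estimate P(Burglary | JohnCalls, MaryCalls); the exact value is about 0.284.
print(round(likelihood_weighting("B", {"J": True, "M": True}), 2))
```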

Summary
Bayesian methods provide a sound theory and framework for implementing classifiers.
Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables.
Computing exact values is NP-hard in general, so it is typical to make simplifying assumptions or use approximate methods.
Many Bayesian tools and systems exist.

References
Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.
Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Research Technical Report MSR-TR-95-06.
Internet resources on Bayesian networks and machine learning: http://www.cs.orst.edu/~wangxi/resource.html